
Study: Teacher evaluation reforms failed to improve student outcomes

Broadest assessment of reform efforts finds near-zero impact

Reforms of teacher evaluation systems across the country during the last dozen years have largely failed to achieve their primary goal: raising student academic performance.

That’s one of the findings of a new study co-authored by UNC School of Education researcher Matthew Springer, Ph.D., published in December as a working paper by the Annenberg Institute at Brown University.

The study, which its authors say provides the broadest and most generalizable evidence of the efficacy of teacher evaluation reforms in the U.S., concludes that despite billions of dollars spent reforming teacher evaluation systems, the reforms have had almost no positive effect on student outcomes.

“These data show that on average across the country, teacher evaluation reforms haven’t had their intended effect,” said Springer, the Robena and Walter E. Hussman, Jr. Distinguished Professor of Education Reform. “We found that while linking teacher evaluations to student performance has worked in a few places, it has proved to be very difficult for most school districts to establish these systems in ways that contribute to better academic outcomes for students.”

Racing to nowhere

Before the reforms, researchers have documented, teacher evaluations relied primarily on observations, had little direct connection to teacher compensation or employment, and gave nearly all teachers satisfactory ratings, leaving no way to differentiate among teachers' performance.

Joshua Bleiberg, Eric Brunner, Erica Harbatkin, Matthew A. Kraft, and Matthew Springer. (2021). The Effect of Teacher Evaluation on Achievement and Attainment: Evidence from Statewide Reforms. (EdWorkingPaper: 21-496). Retrieved from Annenberg Institute at Brown University:

Reform proponents argued that teacher evaluation systems that take student performance into account would make it possible for school districts to reward effective teachers while also identifying lower-performing teachers who needed professional development or should be removed from their jobs.

Incentivized by the federal government’s Race to the Top grant competitions between 2009 and 2017, 44 states and the District of Columbia implemented reforms aimed at linking the evaluations of teachers to the academic performance of their students.

A team of researchers (Springer and colleagues from Michigan State University, Brown University, and the University of Connecticut) set out to analyze the effects of the reforms. They measured student performance on standardized mathematics and English Language Arts exams from 2009 to 2018 and supplemented those results with data on two attainment outcomes: high school graduation and college enrollment.

The bottom line: The reforms had no discernible effect on student achievement in mathematics or English Language Arts and little effect on educational attainment.

The team went on to examine whether differences among teacher evaluation systems produced different results, finding that they did not.

Why didn’t the reforms work?

Previous studies have found that teacher evaluation reforms implemented in a few individual school districts and states (such as Washington, D.C., Chicago, Denver, Newark, Dallas, Tennessee, and New Mexico) had a positive impact on student achievement.

Analysis by Springer and team confirmed those findings, giving the team confidence in the validity of their analytical methods.

But, the team said, while successful reforms in a few places demonstrate that it is possible to create teacher evaluation systems that take student outcomes into account, the scarcity of such successes highlights how difficult it is to do so. The experiences in those few districts and states are not generalizable to the nation as a whole, the researchers said.

Across the country, the actual design and implementation of reformed evaluation systems frequently failed to follow proven best practices for performance management systems, producing systems that only vaguely resembled what reformers had envisioned, the team said. As a result, reformed evaluation systems often were not meaningfully different from the status quo. Additionally, states that did adopt more rigorous features in their evaluation systems typically failed to sustain them over time.

The reform efforts may also have had the unintended consequences of lowering job satisfaction among educators and imposing burdensome demands on administrators' time, perhaps displacing other, more productive activities, the team said.

Another word with Matthew Springer

Following is a Q&A with Springer regarding the findings of the study:

Why have teacher evaluation reforms generally failed to lift student achievement?

Springer: My hunch is there are two primary culprits — implementation and design. Successful implementation of top-down policies and programs like the one studied in this paper is highly dependent on a change in the behavior of key actors, namely the principals and teachers responsible for student performance. A large amount of research has documented the failure of top-down policy reforms, particularly in the education sector where “mandates” filter from federal to state and district levels and eventually reach schools, classrooms, teachers, and students.

More than 45 states and the District of Columbia have invested in so-called next-generation teacher evaluation systems, which include tenure reforms, widespread use of standards-based teacher performance rubrics, and more frequent and structured observations. But at the same time, the federal government has provided design waivers to states, which, as we note in the paper, essentially allowed some districts and states to water down the implementation of these reforms.

This study looked at the effects of teacher evaluations on students’ academic achievement and attainment. But what about teacher compensation? You’ve done other work studying the use of compensation practices, particularly incentives to reward highly effective teachers, finding that those systems can lead to higher student achievement.

What more do we need to learn about how to make effective incentive programs that can more widely support student achievement and educational attainment?

Springer: A growing body of research documents the important role strategic compensation policies can play in retaining highly effective educators and, ultimately, improving educational opportunities for students. My work with Luis Rodriguez of New York University and Walker Swain of the University of Georgia shows that a retention bonus can shift teachers’ decisions to persist in the challenging work environments of high-accountability, high-poverty, racially isolated schools, and promote higher levels of learning than would have occurred had these teachers left.

However, we have to remember that for many teachers, additional pay alone is inadequate to overcome pressures to leave, and it only affects the underlying learning and working conditions to the extent that retained teachers improve the leadership culture in the building. Moving forward, we need to gain a better understanding of the role of non-financial incentives, such as the interactions between working conditions and simple salary improvements, as well as how financial incentives can improve teacher supply.

Should policymakers and administrators shift away from pressing for high-stakes teacher evaluation systems? Or are there workable ways to make these systems more effective?

Springer: A first-order concern is how districts and states respond to changes in federal guidance regarding teacher evaluation. Even though places like Cincinnati, Chicago, and Washington, D.C., demonstrate that teacher evaluation reforms can realize their intended purpose, states are starting to back off on teacher evaluation reforms and related components. If states continue to disinvest in these policies and related infrastructure, their potential utility will ultimately fade. That includes losing one of the most critical components of teacher evaluation systems today: post-observation performance feedback. Unlike other aspects of state evaluation systems, feedback takes an explicitly developmental approach to achieve better outcomes: Teachers develop as professionals and improve their skills in response to direct feedback on their practice.

In another related study, with my colleague Seth Hunter of George Mason University, we conducted the first large-scale study of post-observation performance feedback provided to early-career teachers and examined how it relates to measures of teacher human capital. While prior research from outside the education sector shows feedback can be an important driver of improved employee performance, we find that few teachers are receiving the individually tailored and substantive feedback that can help them improve their practice.

The bottom line is that if next-generation teacher evaluation policies are to be successful, we need to pay close attention to proper design and implementation.