Making the grade

Essay: Rankings of schools take focus from grappling with deeper problems

Researcher: Ethan Hutt

U.S. News & World Report announced in October 2021 that in addition to its annual rankings of high schools, colleges, and graduate schools, the company was releasing a list ranking the top elementary and middle schools in the country. Immediately, teachers, school officials, scholars, and commentators on both the left and the right overwhelmingly panned the announcement.

The Edge: Rankings of elementary and middle schools, such as those by U.S. News & World Report, too frequently describe a simplified view of schools that is divorced from reality, says Ethan Hutt, the Gary Stuck Faculty Scholar in Education. In a column published by The Washington Post, Hutt says the rankings encourage a chase after statistical trends rather than close examination of actual school needs. In a Q&A, Hutt talks about what kinds of data would be helpful.

Although U.S. News has long been criticized for distorting perceptions of schools and the choices of school leaders by ranking schools based on a handful of “performance” variables, Americans have always been concerned with the quality of their schools, a concern that has frequently led them to quantify schooling. In fact, the impulse driving the U.S. News rankings, which so many found disturbing, also drives much of contemporary education policy.

Evaluation and comparison have always been ways to assuage anxieties and uncertainties about educating the young. The first standardized tests in American public schools were given in Boston in 1845. The brainchild of Horace Mann and his colleague, Samuel Gridley Howe, the tests were intended to demonstrate the need for serious changes in public schools.

When test score data wasn’t available, school officials sought other information as proxies for school quality. For instance, the percentage of enrolled students attending school on a daily basis and the number of “over-aged” students in a grade were understood as indicators of school efficiency and objective points of comparison across school systems.

Throughout the first decades of the 20th century, school statistics and ranked lists became common features of annual reports and newspaper coverage as the public sought to understand how well their schools were doing.

But even as these statistics circulated, experts recognized their considerable limitations. Education was an inherently local affair, so information was valuable only in the context of local decision-making. Without standard curriculums, textbooks, funding formulas, teacher licensure, or graduation requirements across states or even districts, how useful could statistical comparisons really be?

Indeed, in 1959, after conducting a nationwide study of high schools, Harvard University President James B. Conant concluded it was “impossible” to discriminate among them. There were “too many high schools of too many different types” to allow for generalizations — one could “make valid judgments about American secondary education, but only school by school.”

Conant’s warning went unheeded. Throughout the 1950s and ’60s, policymakers and analysts became increasingly convinced that schools, like businesses, were just “systems” that brought various inputs together to produce desired outputs. In this view, the system could be optimized simply by measuring, monitoring, and adjusting the inputs. Such a view was deeply appealing to federal policymakers who, in the midst of the Cold War, had become interested in maximizing the development of American brainpower.

The problem: It was deeply out of step with the reality of American schooling and required standardized information that simply didn’t exist.

In 1958, researchers with the New York Quality Measurement Project, one of the first federally funded projects to scrutinize the relationships between inputs and outputs, found that even in a state with a relatively centralized school system like New York, there was too much variation in district record-keeping to collect usable information.

The result was an effort that would be repeated regularly over the next half-century: In the absence of a standardized system of schools, researchers produced standardized data about schools — about the quality of their institutional resources, the character of their communities, the performance of their students — to stand in its place and enable statistical analysis.

The stylized statistical portraits of the U.S. school system created by these data sets provided descriptive insight but imposed an artificial order on an inevitably messy reality. For instance, in the absence of a national curriculum, the National Assessment of Educational Progress (NAEP), which was first given in 1969, promised a nationalized picture of student achievement.

But the administered test didn’t resemble the curriculum students were exposed to. How could it in a country with no national standards and no national curriculum? As a result, the information produced could not be used to improve school quality. One researcher remarked that even if uniform data about the nation’s schools could be produced, little would be gained through its analysis because “Chicago and San Francisco differ on so many dimensions that it is not an interpretable comparison.”

Nevertheless, the possibility of collecting and mining mountains of newly available data proved extremely appealing to a new generation of policymakers and analysts trained in quantitative analysis, as well as to reformers coming to terms with the dimming prospects for radical systemic change. In the absence of big budgets or social movements for change, policymakers narrowed their focus to variables — class size, algebra for all, teacher credentials — that required only organizational change and, therefore, were available irrespective of local context or the prevailing politics.

In 1972, Harvard sociology professor Christopher Jencks’s book “Inequality” sought to draw on the newly available school performance data to argue that school systems were fundamentally incapable of addressing social inequality.

Unsatisfied with this “new quantification,” a group of Black scholars, including Ronald Edmonds, Andrew Billingsley, and James Comer, pointed out the local realities that these generalized statistical accounts ignored but that profoundly shaped racial inequality in schools.

“We hasten to point out,” they wrote, “that public schools are not now, nor have they ever been committed to the radical notion that they are obliged to teach certain minimum school skills … to all pupils.”

To announce a statistical relationship between schools and inequality without consideration of past and present inequities was, in their view, to absolve society of its obligation to provide quality education to all children. These scholars worried the statistics would be used to short-circuit the political push for equality.

Those concerns proved well-founded.

Indeed, the foundational premise of the No Child Left Behind Act of 2001 and its successor, the Every Student Succeeds Act of 2015, was that schools can achieve equality in student test scores, irrespective of history or place. Attempts to excuse or explain variation by pointing to historical injustice or contemporary inequality were taken as a sign of what President George W. Bush described in 2000 as the “soft bigotry of low expectations.”

Today, the production of quantitative data about schools — the same data used to compute the U.S. News rankings — has become the backbone of U.S. school overhauls. Whereas a century ago, these data served as a basis for political debates about schools, their production is now often seen as an end in itself — not to facilitate public debate but to enable private decision-making, often through parent choice.

But numbers narrow attention and shift blame: The data displayed in ranking tables imply that if only the leaders of a particular school would offer more AP classes, improve student-teacher ratios, or raise test scores, then theirs could be among the “best” schools. This simplicity is appealing: It implies a clear silver bullet for school improvement. It also paints a picture of schooling that is divorced from reality.

Regardless of the picture presented in the ranking tables, the performance of schools cannot be understood separated from place, politics, and history.

Our willingness to accept comparisons at face value, without interrogating the historical and contemporary processes that produced them, has left us chasing statistical trends instead of taking on the political challenges necessary to improve our schools.

A version of this article, with the headline “This is the problem with ranking schools,” appeared in The Washington Post’s “Made By History” series on Oct. 22, 2021.

More from Ethan Hutt

Following is a Q&A with Ethan Hutt regarding his essay:

Edge: Informed by knowledge of the history of the use of data regarding schools, what do you anticipate will be some of the consequences of these rankings?

Hutt: We can reasonably expect two different consequences from these rankings. First, we should expect that the information produced by rankings will exacerbate inequality and economic, if not racial, segregation in our schools. Though the idea of publicly available school information might appear egalitarian in theory, in practice, we have seen time and again that parents with more resources are in a much better position to act on this information.

Wealthier parents are much more likely, for instance, to have the resources to buy into the neighborhood with the more highly ranked schools or, in the context of school choice systems, have the time and access to transportation necessary to drive their children across town to the school of their choice. Because wealth is tied to student and school performance, this becomes a self-fulfilling cycle that exacerbates inequality and segregation in our schools.

The second consequence in part follows from the first: To the extent that parents respond to the information in the rankings, we should expect schools to try to improve their standing in the rankings. Again, in the abstract this might seem like a positive development: Don’t we want schools improving their metrics, maybe even competing with each other to do so?

In practice, this almost always ends badly. That’s because school reform is hard, but gaming metrics is easy.

We’ve seen this at literally every level of schooling from K-12 to colleges and graduate schools: Schools respond to rankings pressures by goosing the numbers. Whether they do this through single-minded attention or outright fraud, the result is almost never substantive improvement.

The rankings don’t provide additional resources for schools to improve, so the result is that schools do what they can to make themselves appear stronger on the metrics that count.

Edge: In a paper you co-authored in 2020, you described how greater access to information about schools can exacerbate inequality as affluent families are more likely to seek the information and have more ability and resources to act on it. How might that play out with rankings of elementary and middle schools?

Hutt: In that paper, we point out that producing information about schools is a time-honored tradition in American public education going all the way back to the 19th century. But in the last three decades, in particular, it has been a more explicit goal of public policy.

The original ideas of accountability, transparency, and a generally informed public have been supplanted by a more explicit rhetoric of empowering and facilitating informed choices by parents and, to a lesser degree, a general notion of competition and holding schools accountable for performance.

We call this latter logic “public accountability” (as opposed to high-stakes, sanctions-based accountability): the public, through their choices and, perhaps, through political action, will hold schools accountable for performance.

It is not surprising that when families make private choices for their children, the benefits of those choices tend to accrue to those families best able to leverage the information and enact their preferences. There is very little evidence that these benefits spill over into a more general public good. In fact, we see the opposite.

School quality gets reflected in home prices and philanthropic giving in a way that exacerbates inequalities along racial and socioeconomic lines. Real estate websites already include general information about school quality in house listings, and I can’t imagine these websites won’t soon incorporate ranking information as well. Since these are likely just proxies for school demographics, the effects will only push in one direction: toward more stratification.

Edge: In that paper, you proposed a framework for policies around disclosure of information about schools that takes into account how actionable the information is, meaning how able families are to make choices based on that data, and whether benefits from the disclosure are of a public or private nature. Given that, how could school district leaders or other education policymakers respond to these U.S. News rankings? Are there additional data or other information they could share to help inform families about schools?

Hutt: No one is going to argue in favor of less information or less transparency, and that’s probably a good thing. The distinction we were trying to draw in the paper was between actionable information where the benefit was likely to accrue for individual families versus the larger public.

For instance, before you decide to publish in the newspaper the value-added score for every third-grade teacher, you might ask: What is the most likely use of this information? The answer is almost certainly going to be: Individual parents with a particular disposition — and one might say sense of entitlement — are going to lobby the principal or whoever else to ensure their child gets the best teacher or, at the very least, avoids the worst one.

Does this produce a public benefit? Almost certainly not. The benefit here is to particular children and families.

There is other information a school could produce, however, that is much more likely to produce a public discussion and public benefit. For instance, we’ve learned a lot about school discipline policies as a result of the collection and public release of information about school suspensions and referrals, especially as it concerns the racial disparities in school discipline. The reform of school discipline policies, of course, produces certain private benefits to individuals whose punishments might have been more severe in the absence of reform, but it also establishes greater equity in disciplinary practices, which is a public benefit.

To take another example, if a school or district produced survey information from students, teachers, and parents on school climate or demographic information about honors/Advanced Placement enrollments, this information is unlikely to trigger a cycle of lobbying resulting in private benefits. It is much more likely to spark a community conversation about what it means to be a good, supportive school or about the barriers to equal educational access.

My general advice is always that more metrics are better than fewer metrics because more measures mean, hopefully, a more holistic, substantive view of school quality. If we reduce measures of school quality to test scores in two areas, for instance, that contributes to a pretty narrow view of schooling.

The best way to figure out what information to put out is to engage the local community in conversations about what it means to have a good school and then try to measure those things. I doubt very much that communities would come up with uniform answers to that question and doubt still more that they would produce the answers that are embedded in current rankings.

Quantification is a powerful force, so we need to wield it carefully. We must ensure, to the maximum degree possible, that our measures reflect, rather than degrade, our core commitments about our schools.

Hutt, E., & Polikoff, M. S. (2020). Toward a Framework for Public Accountability in Education Reform. Educational Researcher, 49(7), 503–511.

Hutt, E. L., & Polikoff, M. S. (2018). Reasonable expectations: A reply to Elmendorf and Shanske 2018. University of Illinois Law Review Online, 2018, 194–208.