Invigorating methods of measurement and building better theory

Researcher: Gregory Cizek

When a student takes a modern, well-developed test in school today, we can trust that higher scores indicate a sufficient grasp of a subject such as reading or math, while lower scores suggest a need for … something. But who decides what that something is, and how can we have confidence that the something will truly benefit the students or teachers who must own the scores – and the consequences?

There are few, if any, rigorous approaches for answering such questions, says Gregory Cizek, Guy B. Phillips Distinguished Professor of Educational Measurement and Evaluation, and his research is addressing that gap. It’s the lesser-studied side of validity theory, which Cizek says needs a thorough reboot to help schools, systems and teachers investigate whether decisions based on test scores actually meet the school’s intended goals and whether those uses of the scores are justified.

The Edge
The use of standardized testing in education is a given, and there is a tremendous amount of research on and evidence for what test scores mean when it comes to measuring a student’s grasp of a concept. Determining which uses of that information are relevant and effective is an understudied area of validity theory, and it is where Gregory Cizek is breaking new ground. Educators have little literature to rely on when it comes to justifying the way they use test scores. Cizek’s revitalization of this area of research can bring new ways of thinking and strategizing about what is fair and equitable when it comes to student learning and achievement, as well as the evaluation of educators. This new research could be essential to developing more transparent, objective testing practices that invite trust from parents, students and teachers.

Cizek is a national authority on educational measurement and evaluation, having conducted research for more than 30 years in the field of applied assessment with specializations in standard setting, validity and test security. Prior to joining the faculty at Carolina’s School of Education, Cizek managed national licensure and certification testing programs for American College Testing, served as a test development specialist for a statewide assessment program, and taught elementary school for five years in Michigan.

He has written extensively on these subjects, including authoring or editing books such as “Setting Performance Standards: Foundations, Methods, and Innovations.” He has served as president of the National Council on Measurement in Education. He is currently serving on the National Assessment Governing Board, which helps set policy for the National Assessment of Educational Progress, also known as the “Nation’s Report Card.”

Cizek regularly examines questions about the development and use of tests and their results.

Relying on little research

Imagine your school offers two choices for third grade students whose year-end reading scores don’t meet the standards for promotion to fourth grade: retention or reading camp.

In order to avoid repeating third grade, most would choose the reading camp. But that choice alone leads to an entirely new set of considerations that go beyond books.

If you’re a parent, you might be worried about disrupting summer schedules and vacations set long ago. If you’re a teacher, you might wonder if the overtime is worth the interruption to a well-earned break. Administrators may consider the cost to cool the buildings during the summer, cut the grass or run the buses out to rural areas. And, of course, the third grader complains he’ll be kissing his summer goodbye.

It would be important to verify, before such school policy decisions are even made, that they are going to work. There needs to be a considerable amount of certainty that summer reading camp is an appropriate response to low reading scores.

It’s difficult to turn to the research for advice when there’s not a lot of it out there, says Cizek. For example, considering the plight of the third grade student just described, there is often strong evidence that the lower test score means the child needs more instruction to be prepared for success in fourth grade, but there is often too little concern about gathering the evidence needed to show that using that score to make summer placements is justified.

“If the end-of-summer retests show growth in the students, then the camp was a success, and some of the annoyances surrounding it may be worth it because it helped the students master the subject,” he said. “But, if no one thinks far enough down the line beforehand to consider the consequences – both good and bad – or even alternatives, you risk making decisions that cost both the school and the families.”

Inspiring trust in tests

And when those decisions cost schools and families but do little to effect improvement in a child’s reading level, you can count on discontent.

In education, few topics inspire such quick controversy as standardized testing, especially the end-of-grade tests that can determine if a student meets the benchmarks for promotion. No matter how you feel about testing, you can feel the tension rise when parents and teachers start to talk about it.

Cizek’s work in measurement doesn’t address whether schools should use standardized testing in classrooms. Instead, it studies how that testing can be fair and equitable when it comes to student learning and achievement, as well as in the evaluation of educators.

“Testing and measurements should be used to make sure educational decisions are fair,” says Cizek. “We gather information that is useful in making important decisions like assigning letter grades, awarding diplomas, or issuing certifications and promotions. In order to be fair, we need to make sure this information is accurate in order to make these decisions.”

Validity theory in education evaluates the extent to which a test yields scores that have the meaning they are intended to have. It uses known, credible sources of evidence to support those interpretations of test scores. Nearly half a century of educational research, guidelines and traditions on measurement means there are plenty of verified resources available, such as the Standards for Educational and Psychological Testing, for making reasonable choices when investigating and confirming the meaning of scores.

Developing the side of validity theory that focuses on justifying score use is essential when working toward a more comprehensive and objective system of testing that invites trust, says Cizek. Bringing some certainty to the practices schools and systems adopt in response to scores will inspire confidence among worried parents who wonder how testing affects their child.

Teachers who may be evaluated positively or negatively based on a class’s test scores, or who are responsible for developing plans that would bring those students up to grade level, need better guidance to support them in using the best practices.

“There is a lot of educator discontent surrounding testing, especially when we use a class’s performance on a test for teachers’ evaluations, promotions and bonuses, but we haven’t done a good job of backing up those choices with evidence showing they really do fit,” says Cizek. “A lower score can tell us a student needs help, and we’re confident in that knowledge. But a teacher using those test scores to divide students into different reading groups based on reading level or myriad other important decisions still doesn’t have clear guidelines for when it’s justified to use a score for a specified purpose.”

Modernizing the method

Validity theory is an ever-evolving process, and it should be, says Cizek. As with score meaning, unbiased, objective guides for score use protect learners and educators from any single voice deciding how to use that information. Decisions should be based on data and broad input, and not on hierarchies within school systems or the concerns of a few.

Modernizing validity theory to make a clear distinction between justifying score meaning and justifying score use can open doors to new work on the latter part of that theory, which Cizek says can make tests more valuable and useful for all involved.

That’s why Cizek is hoping to see this new area of study grow. Through his research and publications, Cizek has invited his peers in the national education community to explore this part of validity theory and to call for a more systematic and rigorous process for justifying score use, one that balances the existing work on validating score meaning.

“This part of validity theory needs some real revision and thought because it should include many voices – teachers, parents, academics – as we build these traditions for justifying test score use,” he said. “Our stakeholders have to be involved in helping us create the educational policies that affect them, because if their voices aren’t heard, we alienate them from the start.”

