Imagine that the winner of the Final Four is decided by analyzing the past records of the coaches and teams, the offensive and defensive strategies, the players’
cumulative statistics, and other factors that go into a basketball program. Imagine choosing the champion not by actually playing the games but by evaluating the inputs that went into building the teams.
Now imagine that schools of education were evaluated similarly on the inputs – course syllabi, program policies, weeks spent student teaching, etc. – rather than on the performance of their students and graduates in actual classrooms with actual students.
Yet, some critics continue to use such an outmoded and invalid method for evaluating educator preparation programs. The results tell us nothing about how our graduates are actually performing — only whether they met someone’s idea about the policies and practices that should go into teacher preparation.
Our School of Education and others around the state and the country have moved well beyond the “input” method of determining program quality and are generating the data needed to improve educator preparation. We regularly receive data on our programs through the UNC General Administration-supported Teacher Quality Project (TQP). TQP researchers have built a database that includes the end-of-grade or end-of-course results for every North Carolina teacher who teaches a tested grade or subject. This enables the researchers to determine how much students’ scores change during the time they are with a specific teacher.
Questions about how much of students’ results can be traced back to their teacher preparation program, the stability of individual teachers’ results over time, the validity of the state student assessment, and the absence of data on teachers who don’t teach tested grades or subjects all urge caution in interpreting value-added data. At the same time, however, the data help us gauge where we stand in comparison to other campuses in the UNC system as well as to teachers prepared out of state or through alternate routes. (By the way, our teachers generally outperform teachers in the latter two categories.) The data are also helpful in identifying program elements that we need to improve but are of limited value in figuring out how to improve them.
More useful for program improvement are the data we at the School collect on our students and graduates. I have previously written about our adoption of the edTPA – Teacher Performance Assessment – which both evaluates our teacher candidates’ preparedness for the classroom and provides program faculty with information for improving the preparation experience. Three states now require all university-based teacher preparation programs to assess their candidates with the edTPA, and another 26 states are at various stages of implementation. In North Carolina, five schools of education in the UNC system are piloting the portfolio-based candidate assessment, while three – including ours – are now using it in all their programs.
In the edTPA process, candidates collect artifacts from a three- to five-day teaching event or unit. These include videos of their teaching, student work samples, lesson plans, reflections, and so on. Faculty can then analyze the fine-grained data in students’ portfolios to better understand what candidates are actually “taking up” and using in the classroom. This process often leads faculty to rethink and redesign both individual courses and the program as a whole.
In addition, some institutions send a sample of portfolios to Pearson to be scored by trained raters – half of whom are P-12 teachers and half of whom come from higher education. The results are returned to the institutions so that local faculty can compare their ratings with those from the national raters. This helps them calibrate their local standards to those at the national level.
The School of Education has also created its own means of collecting fine-grained data on program quality. We invite recent graduates back to campus four times a year and ask them to bring specific “problems of practice” to share. They then meet in small groups of graduates with similar jobs, facilitated by our graduate students. While one graduate student facilitates, a second records the conversation. Faculty subsequently review summaries of these notes to identify the problems their graduates confront and how well they are prepared to deal with them.
Faculty have found these data so useful that some of the “problems of practice” have been compiled into a case book that is now used in our preparation courses.
In addition to these data, the Department of Public Instruction provides observational data on our graduates’ classroom performance. Fully instituted last year, the North Carolina “Teacher Evaluation Process” includes principals’ observations of teachers, who are rated on each of the state’s teaching standards. This year’s results for our graduates were encouraging: 92 percent were judged either “Proficient” or “Accomplished,” 4 percent were deemed “Distinguished,” and only 4 percent “Developing.” None of our graduates appeared in the lowest category, “Not Demonstrated.”
Like all other evaluative tools, the observational protocol has flaws. This is why researchers in the Gates Foundation-supported “Measures of Effective Teaching” (MET) project concluded that the most valid picture of effective teaching requires data from value-added models, classroom observations, and student surveys.
Finally, the UNC system’s General Administration is also supporting a survey of new graduates’ evaluations of their programs. The low response rates to the first attempt to survey graduates across the system limit the value of the data. The potential value, however, makes it worth continuing efforts to improve the response rate.
Again, the flaws of such self-reported data are well known. Used in conjunction with other sources of data – like those described above – such surveys do add value; indeed, the MET study touts student surveys as a critical component of teacher evaluation systems.
In sum, a seismic shift in teacher preparation has occurred over the past couple of decades. The field has rejected the “evaluating inputs” model of determining program quality and is now sharply focused on evaluating candidates’ and graduates’ actual classroom performance as the measure of program quality.
Other fields of professional education – medicine, nursing, pharmacy, public health, and so on – have made a similar shift. Yet I don’t believe any of these fields is collecting a similarly broad array of data on its candidates and graduates.
This signals our agreement with many of our critics that we need to do a much better job of preparing teachers. To do that, we need more and better data on our candidates’ and graduates’ classroom performance to inform our program improvement efforts. The data we are receiving tell us that many of our graduates appear to be making a real difference for their students but that we still have a ways to go in specific areas such as elementary reading.
We are committed to making programmatic changes within our School of Education aimed at making our students more effective when they go to work. The increasing use of multiple measures of program quality reflects the commitment of teacher educators to ensuring that their charges are prepared for the demands of 21st century schools and classrooms.
Like any Final Four team, we ask that we be judged not by selected inputs but rather by our graduates’ performance “on the court.”
Bill McDiarmid is dean of the School of Education at UNC-Chapel Hill.