
Assessment

State Test Results Are In. Are They Useless?

By Catherine Gewertz | October 21, 2021 | 9 min read
File photo: a sign at the entrance to a hall for a college test-preparation class in Bethesda, Md., Jan. 17, 2016.

Educators have been bracing for them, and now they're here: the first state test results since COVID-19 interrupted K-12 schooling. Districts, states, and schools are poring over the data from spring 2021 tests, hoping to understand exactly how, and how badly, the pandemic affected children's learning.

But even though educators are hungry for insight, assessment experts are urging caution. This year, more than any in recent memory, calls for extreme care and restraint when analyzing statewide test scores, drawing conclusions, and taking action, they say.

Like schooling itself, standardized testing was deeply disrupted in many ways last spring, which may have distorted the meaning and utility of the results. In some cases, state test data will be virtually useless, the experts say. In others, with thoughtful analysis, the data can yield insights that could help leaders and educators allocate resources and help children rebuild academic muscle.

Here are some key considerations, and important cautions, for state, district, and school leaders, and teachers, to bear in mind as they review state test scores.

A lot happened with state tests in 2021 that could affect the results

In 2020, the U.S. Department of 91 allowed states to skip federally required assessments. In 2021, however, states had to administer those tests. But that doesn't mean it was business as usual.

In a handful of states, some students took tests remotely, while others took them in person. Massachusetts, for instance, allowed students in grades 3-8 to take remote tests if their schools were in remote learning mode, and more than 15 percent of those students did so.

Some states made other changes to their testing regimens. A few gave shortened versions of their tests. One state gave its English/language arts test only in grades 3, 5, and 7 and its math test only in grades 4, 6, and 8. In another, some districts gave the Smarter Balanced test, and others used assessments of their choosing.

Many states saw fewer students take the test than usual, though, and that is the factor poised to exert the most widespread influence on the validity and comparability of state test data. Participation dropped markedly in many states, according to the Center on Reinventing Public 91, which has been monitoring states' responses to COVID-19.

Some states reported participation rates as low as 10 percent (New Mexico) and 30 percent (Oregon). Participation also varied markedly within states: Colorado reported regional participation rates ranging from 51 percent to 88 percent.

A number of factors fueled low participation rates, including many parents who chose not to send their children into school buildings simply to take a test. And schools likely felt less pressure to insist that students show up for testing, since the 91 Department waived its accountability rules that normally penalize schools for testing fewer than 95 percent of their students.

"There was a wide variety in the ways testing played out," said Terra Wallin, who advised the 91 Department on assessment and accountability from 2014 to 2017 and now oversees those issues for 91 Trust, a civil rights advocacy group. "There are still ways states could look at general patterns [in test-score data], do a higher-level examination, to help them think about how best to use federal funding for recovery, but they need to proceed with caution."

Ask key questions before deciding how to use the data

Experts say it鈥檚 important to ask three crucial questions about your state test data.

  • Did any of our students take the test remotely? If so, those scores shouldn't be viewed as comparable to the scores of students who took it in person. That "mode effect" is a key tenet of assessment: Whether a student takes a test online or with paper and pencil can influence the results.
  • Did we use the same test as in 2019? If you switched tests, or changed the length or frequency of your test, a detailed expert analysis could be needed to confirm the validity of the 2021 results (were there enough questions in each strand of the academic standards, for instance, to generate a valid score?) and to establish that those results can be compared with 2019 results.
  • How many of our students, and which ones, took the test? This "participation rate," experts say, is very important in understanding what state tests say, or can't say, about student learning. They urge educators to dig deeper than the overall state or district participation rate and find out who took the test and who didn't.

Imagine that an analysis shows that the students who skipped the test were disproportionately those who scored low in previous years. That would skew test results artificially high, and stalled progress might appear less severe than it actually is.
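The arithmetic of that skew is easy to demonstrate with a toy simulation (all numbers here are hypothetical, invented for illustration): scores are drawn from a bell curve, and lower-scoring students are assumed likelier to skip the test, which pushes the average of the observed scores above the true population average.

```python
import random

random.seed(0)

# Hypothetical population: 10,000 students, scores centered at 500 (SD 100).
scores = [random.gauss(500, 100) for _ in range(10_000)]

def took_test(score):
    """Toy opt-out model: students scoring below 450 skip the test
    more often than higher scorers. Rates are invented for illustration."""
    p_opt_out = 0.6 if score < 450 else 0.2
    return random.random() > p_opt_out

observed = [s for s in scores if took_test(s)]

true_mean = sum(scores) / len(scores)
observed_mean = sum(observed) / len(observed)

print(f"true mean of all students:    {true_mean:.1f}")
print(f"observed mean of test-takers: {observed_mean:.1f}")
```

Because the missing students are disproportionately low scorers, the observed mean overstates the population's performance, and the gap shrinks only as participation rises.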

That isn't just speculation, either. It's likely that remote learners account for many missed tests, and it became increasingly apparent during the pandemic that low-income, Black, and Latino students were far likelier to be learning remotely than other students. And emerging research on state test results is finding that COVID's impact on learning isn't concentrated just in elementary schools, or among traditionally low-performing students, as early analyses of interim tests suggested; it's broader, affecting students at all grades and achievement levels.

Enrollment declines, widely documented in many grades, can also play havoc with sound interpretations of test scores. Again, it鈥檚 important to understand the academic and demographic profiles of who stopped coming to school, experts say.

"If you aren't paying attention to how the population is changing, you're misinterpreting your scores," said Andrew Ho, a Harvard University professor of education who focuses on assessment. He urges state leaders to perform a matched analysis of their test scores to ensure valid comparisons. This is done by separately comparing each group (the students who took tests and those who didn't) only to groups who performed similarly in the past.
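That idea can be sketched in a few lines. In this simplified illustration (hypothetical scores, not Ho's actual methodology), students are grouped into bands by prior-year achievement, and 2021 results are summarized only within each band, alongside a count of who skipped the test:

```python
# Each record: (prior 2019 score, 2021 score or None if the student skipped).
# Hypothetical data for illustration only.
students = [
    (420, None), (430, 410), (445, None), (460, 455),
    (480, 470), (510, 500), (530, 525), (550, None),
    (575, 570), (600, 590),
]

def band(prior, width=50):
    """Assign each student to a prior-achievement band, e.g. 400-449."""
    return (prior // width) * width

# Summarize 2021 scores only within bands of similar prior performance.
by_band = {}
for prior, score_2021 in students:
    b = by_band.setdefault(band(prior), {"tested": [], "skipped": 0})
    if score_2021 is None:
        b["skipped"] += 1
    else:
        b["tested"].append(score_2021)

for b in sorted(by_band):
    tested = by_band[b]["tested"]
    avg = sum(tested) / len(tested) if tested else float("nan")
    print(f"band {b}-{b + 49}: n_tested={len(tested)}, "
          f"n_skipped={by_band[b]['skipped']}, mean_2021={avg:.1f}")
```

Reading results band by band keeps a surge of missing low scorers from silently inflating the overall average, because each band is compared only against students with similar prior performance.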

"We've just got to avoid a naïve analysis" of 2021 test-score data, said Derek Briggs, a University of Colorado professor who leads the National Council on Measurement in 91, whose members design and study K-12 assessments.

"The danger here is that we report 2021 scores as observed in 2021, without doing any other analysis. People want to compare them to 2019, and they're going to interpret the difference as the effect of COVID." But the pool of students who took the tests in 2021 changed, and that requires deeper analysis than in other years, he said.

Briggs is worried that districts and states won't take the shifting test pool into account, and they'll take reassurance from a falsely rosy picture. That's a particular danger in any state or district where fewer than 90 percent of students took the test, he said. The smaller the share of missing students, the less chance those missing scores skew the overall results.

Participation rates below 50 percent would make it tough to draw any meaningful conclusions from test results, said Marianne Perie, the president of Measurement in Practice, which advises states on test design and use.
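The link between participation and confidence can be made concrete with a worst-case bound. Assuming only that missing students' scores lie somewhere on the test's scale (the 200-800 scale and the observed mean below are hypothetical), the possible range for the true population mean widens rapidly as participation falls:

```python
def mean_bounds(observed_mean, participation, score_min, score_max):
    """Worst-case range for the full-population mean when a fraction of
    students is missing and their scores are known only to lie between
    score_min and score_max."""
    missing = 1 - participation
    low = participation * observed_mean + missing * score_min
    high = participation * observed_mean + missing * score_max
    return low, high

# Hypothetical 200-800 scale, observed mean of 500 among test-takers.
for participation in (0.95, 0.90, 0.50):
    low, high = mean_bounds(500, participation, 200, 800)
    print(f"{participation:.0%} tested -> true mean in [{low:.0f}, {high:.0f}]")
```

At 95 percent participation the bound stays within about 15 points of the observed mean; at 50 percent it spans hundreds of points, which is one way to see why conclusions below that threshold are so hard to defend.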

Sean Reardon, who leads a Stanford University project that analyzes the links between test scores and children鈥檚 learning opportunities, said the insight into learning offered by last spring鈥檚 test scores is very limited because of all the factors influencing the scores.

"If you had a random sample of kids [in the testing pool], then that would be fine," he said. "But testing in 2021 wasn't random. Kids and families chose whether they took the test. Unless you have a lot of information to support a claim of comparability, I think the default assumption for 2021 is that they're not comparable [to 2019 test scores]. I wouldn't draw too many conclusions based on them and I'd use a lot of caveats."

Consider ways to get insight into motivation and learning conditions

Ellen Forte, the chief executive officer and chief scientist at edCount, which advises states and districts on testing, said educators should bear in mind that millions of students, anxiety-riddled during COVID-19, were likely less motivated to do well on tests. Given that distortion, and the fact that state tests are not designed to yield highly detailed pictures of students' achievement, she wouldn't want to see students' test scores used to make instructional decisions.

"Remember, these tests were designed for accountability," Forte said. "The unit of focus should be the school, district, or state. Not the student."

It also would behoove educators to understand more about the conditions in which students were learning, said Scott Marion, the executive director of the Center for Assessment, a consultant to states on testing. The organization has helped several states create student surveys that asked about things like their access to livestreamed instruction and how much they'd learned compared with the previous year. Teachers were asked, among other things, whether they'd been adequately supported with good professional development during the pandemic.

In a year like 2021, "I think it's important," Marion said. If a child tested in 2021 under conditions similar to 2019, educators can probably make sound, if very general, inferences about whether she gained or lost ground in those two years, Marion said. But what's missing is the "why." Gathering other data, from surveys, teacher observations, formative strategies, and interim assessments embedded in good curriculum, can shed light on "why my kids did poorly and what I might need to do differently," he said.

Takeaway message: Multiple sources of data are more important than ever

Most experts consulted for this story agreed that with the right kinds of analyses, states can probably glean valuable information about patterns of low achievement so they can provide appropriate supports. They urged districts to press their states for detailed information and analysis to guide similar decisions at the district level.

In the classroom, though, experts differed on the role state test data should play in guiding instructional decisions for groups or individual students. Perie of Measurement in Practice said she wouldn鈥檛 want to see scores used for high-stakes decisions like grade promotion but thinks they could help teachers create flexible groupings in math or reading or dive more deeply into strands where class scores seemed weak.

Even better, Perie and other experts said, would be to blend test-score information with a portfolio of other data from formative or diagnostic tests, reports from students' previous teachers, and other sources. This year, "you've got to triangulate, leveraging other measures like you never have before," Harvard's Ho said.

Superintendents understand this, said Dan Domenech, the executive director of AASA, the School Superintendents Association. They know it's "critical to ascertain how much loss has taken place so they know where to begin," but they recognize that standardized tests, while valuable, provide only "a general overview." Accordingly, teachers will rely heavily on quizzes and other formative strategies to understand what their students need, he said.


A version of this article appeared in the November 17, 2021 edition of 91 Week as State Test Results Are In. Are They Useless?
