A sociologist provides some insights into how firms “norm” test scores from year to year and what this means about how to interpret the results:
The most challenging part of this process, though, is trying to place this year’s test results on the same scale as last year’s results, so that a score of 650 on this year’s test represents the same level of performance as a score of 650 on last year’s test. It’s this process of equating the tests from one year to the next which allows us to judge whether scores this year went up, declined or stayed the same. But it’s not straightforward, because the test questions change from one year to the next, and even the format and content coverage of the test may change.
Different test companies even have different computer programs and statistical techniques to estimate a student’s score and, hence, the overall picture of how a student, school or state is performing. (Teachers too, but that’s a subject for another day.)
All of these variables – different test questions from year to year; variations in test length, difficulty and content coverage; and different statistical procedures to calculate the scores – introduce some uncertainty about what the “true” results are…
In testing, every year is like changing labs, in somewhat unpredictable ways, even if a state hires the same testing contractor from one year to the next. For this reason, I urge readers to not react too strongly to changes from last year to this year, or to consider them a referendum on whether a particular set of education policies – or worse, a particular initiative – is working.
One-year changes have many uncertainties built into them; if there’s a real positive trend, it will persist over a period of several years. Schooling is a long-term process, the collective and sustained work of students, teachers and administrators; and there are few “silver bullets” that can be counted on to elevate scores over the period of a single school year.
Overall, this piece gives us some important things to remember. One data point is hard to put into context. You can draw a trend line between two data points, but two points cannot tell you how much of the change is noise. Having more data points gives you a better indication of what is happening over time. However, just having statistics isn’t enough; we also need to consider the reliability and validity of the data. Politicians and administrators seem to like test scores because they offer concrete numbers which can help them point out progress or suggest that changes need to be made. Yet just because these are numbers doesn’t mean that there isn’t a process behind them, or that we don’t need to understand exactly what the numbers involve.
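To make the point about data points concrete, here is a minimal sketch in Python. The scores are invented for illustration and do not come from any real test, and `fit_line` is a hypothetical helper that fits an ordinary least-squares line. It is only one simple way to summarize a trend, not the method any testing company uses.

```python
def fit_line(xs, ys):
    """Fit an ordinary least-squares line; return (slope, intercept, rss)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual sum of squares: how far the points sit from the fitted line.
    rss = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    return slope, intercept, rss


# Hypothetical average scale scores for six consecutive years (made-up data).
years = [0, 1, 2, 3, 4, 5]
scores = [650, 658, 649, 655, 652, 657]

# "Last year vs. this year": only the first two points.
two_slope, _, two_rss = fit_line(years[:2], scores[:2])

# The full six-year series.
full_slope, _, full_rss = fit_line(years, scores)

print(f"two-point slope: {two_slope:.2f} per year, residual: {two_rss:.2f}")
print(f"six-point slope: {full_slope:.2f} per year, residual: {full_rss:.2f}")
```

With these made-up numbers, the first two years alone suggest a gain of 8 points per year, and the line fits them perfectly because any two points define a line exactly. Over all six years the fitted slope is well under 1 point per year, and the nonzero residual shows how much year-to-year wobble there is around that trend.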