Test driven
Mistakes happen

Illustration: Chris Sharp
Add rainy test days to the list of things that can give SAT takers sweaty palms. The College Board says humidity warped more than 4,400 answer sheets from the October 2005 SAT, skewing scores, some by nearly 400 points. But weather aside, faulty scores are no fluke in the testing industry, where errors are proliferating, according to researchers at BC’s Center for the Study of Testing, Evaluation, and Education Policy. In “Errors in Standardized Tests: A Systemic Problem,” released in May 2003 (data is currently being collected for an update), Kathleen Rhoades, a research associate, and George Madaus, a professor at the Lynch School of Education, described the standardized testing industry as “shrouded in secrecy” and stretched thin.
In their initial study, Rhoades and Madaus counted 103 publicized errors on state and national standardized tests between 1976 and early 2003. More than two-thirds occurred in the final four years of the study. In 1999, for instance, a programming error in a CTB McGraw Hill test lowered the lowest scores and raised the highest, with the result that 8,668 New York City students were needlessly required to attend summer school; in 2000 in Minnesota, 50 high school seniors were denied diplomas owing to scoring errors in a National Computer Systems math test that the state required students to pass for graduation. Prospective lawyers have failed the New York Bar Exam, public schools in Florida have seen their funding cut, and teaching candidates have been denied licenses owing to testing errors. The mistakes vary: a mis-keyed answer (on a math exam in Arizona), inconsistent scoring (on a Maryland writing test), questions that appear more than once (on the SAT), or poorly framed math problems.
The researchers attribute the surge in errors to the increased demand for high-stakes testing. In particular, under the 2002 federal No Child Left Behind legislation, every state must institute a standardized testing program and show annual progress in overall scores. The resulting growth in testing has been enormous.
In Massachusetts alone, six required tests have been added in response to the federal law, according to Kit Viator, director of student assessment for the Massachusetts Department of Education, and 10 million student responses must be scored every year.
With school funding and individual advancement hanging in the balance, new test items must be developed and piloted each year, involving new “norming” groups (in which students should not be too high achieving) and new cut scores (calibrated to enable year-to-year comparisons). Educators administer the tests as late in the school year as possible and then require fast turnarounds from the testing companies so graduations and promotions can be set before school ends; states can fine companies for late results. Meanwhile, says Rhoades, the competition for large state contracts pushes testing companies into cost-cutting measures, stinting on permanent staff, for example, despite increasingly heavy workloads.
Rhoades and Madaus suspect the error rate is rising but say no systematic review has been possible. “No one is allowed to even examine [the tests] in most states,” says Rhoades. Some errors have come to light only through court orders, sought by aggrieved test takers or parents. The researchers argue for oversight of the industry by a federal or other outside agency with power to enforce quality controls. That proposal has few backers in the business. Stuart Kahl is cofounder, president, and CEO of New Hampshire–based Measured Progress, which last year began a five-year, $118 million contract to administer the Massachusetts MCAS tests, in addition to its contracts with about 20 other states. According to Kahl, an outside auditor would be “amazed” at the quality controls in place at his company, from the prescreening of test items for bias to running mock data through scoring software to redundant analyses by staff teams and specialists at UMass-Boston. “And yet errors still happen,” he says. “It’s still a human process.”
Rhoades and Madaus don’t expect oversight to fully eliminate testing errors, but they say industry self-regulation can’t work when the stakes are so high and the competition so fierce. They will continue updating their inventory of testing errors, because, says Rhoades, “no one else is doing this and it needs to be done.”
Chris Berdik is a writer based in Boston.

