Eliminating the Penalty for Incorrect Responses
Authored by: Carrie Cross, CAS, Chad Buckendahl & Susan Davis-Becker, ACS Ventures, LLC
One of the concerns about multiple-choice questions is that candidates can randomly guess and get a correct response. Although true in theory, well-constructed multiple-choice questions include a correct answer and alternative options, often called distractors, that are plausible but incorrect. Distractors draw their plausibility from representing common errors that candidates make; as a result, certain distractors will appear more attractive to candidates who make one of those errors. To discourage guessing, some testing programs have historically deducted a fraction of a point, often 0.25, for each incorrect answer. Prior to March 2016, the SAT, a well-known college admissions test published by The College Board, had such a guessing penalty for multiple-choice items. The program eliminated this scoring approach in part because studies suggested that the policy may have had an unintended effect on the interpretation of examinees' scores. In contrast, the ACT, another widely used college admissions test, never had a guessing penalty for items on its exam.
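The 0.25 figure is not arbitrary. Under so-called formula scoring, a penalty of 1/(k - 1) points per wrong answer on a k-option item makes the expected score of a purely random guess exactly zero; 0.25 corresponds to a five-option format like the pre-2016 SAT's. The short sketch below assumes this conventional scoring rule, not any program's published algorithm, and makes the arithmetic explicit.

```python
# Illustrative sketch of formula ("rights minus wrongs") scoring.
# Assumptions: 1 point per correct answer, a penalty of 1/(k - 1) points
# per wrong answer on a k-option item, and 0 points for an omission.

def expected_random_guess(k: int) -> float:
    """Expected score of a purely random guess on a k-option item."""
    penalty = 1 / (k - 1)
    return (1 / k) * 1 + (1 - 1 / k) * (-penalty)

for k in (4, 5):
    print(f"{k} options: penalty = {1 / (k - 1):.2f}, "
          f"expected score of a random guess = {expected_random_guess(k):.2f}")
# -> 4 options: penalty = 0.33, expected score of a random guess = 0.00
# -> 5 options: penalty = 0.25, expected score of a random guess = 0.00
```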
A growing body of research on this topic suggests that applying a guessing penalty to multiple-choice questions can have differential effects on scores related to candidates' risk tolerance (Lang, 2019). This means the test score no longer purely represents whether the candidate knows the material; it is also influenced by an extraneous factor that psychometricians call "construct-irrelevant variance." To put it another way, a candidate's risk tolerance is not something the CAS wants to measure.
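To see how risk tolerance can leak into a score, consider the hypothetical simulation below. The knowledge profile, item count, and penalty are all assumed for illustration (they are not CAS data and do not come from Lang, 2019): the two simulated candidates know exactly the same material and differ only in their willingness to guess, yet earn systematically different expected scores.

```python
import random

# Hypothetical simulation with assumed numbers: two candidates with
# identical knowledge differ only in willingness to guess, yet the
# guessing penalty yields systematically different expected scores.
# This gap is the construct-irrelevant variance described above.

random.seed(0)
PENALTY = 0.25            # conventional penalty for five-option items

def formula_score(willing_to_guess: bool, n_items: int = 100) -> float:
    """Formula score for one sitting of a hypothetical 100-item exam.

    Assumed knowledge profile (identical for both candidates): 60% of
    items known outright, 25% partially known (two distractors can be
    ruled out, leaving three options), 15% complete uncertainty.
    """
    score = 0.0
    for _ in range(n_items):
        r = random.random()
        if r < 0.60:                            # known item: always correct
            score += 1
        elif willing_to_guess and r < 0.85:     # partial knowledge: guess among 3
            score += 1 if random.random() < 1 / 3 else -PENALTY
        elif willing_to_guess:                  # no knowledge: guess among 5
            score += 1 if random.random() < 1 / 5 else -PENALTY
        # the risk-averse candidate omits every uncertain item (adds 0)
    return score

trials = 2000
tolerant = sum(formula_score(True) for _ in range(trials)) / trials
averse = sum(formula_score(False) for _ in range(trials)) / trials
print(f"mean score, risk-tolerant candidate: {tolerant:.1f}")   # ~64
print(f"mean score, risk-averse candidate:   {averse:.1f}")     # ~60
```

Neither simulated candidate knows more than the other; the roughly four-point gap reflects test-taking strategy rather than knowledge of the material.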
A related concern is that although this scoring approach may be characterized as a guessing penalty, it is not clear whether candidates are randomly guessing or whether distractors representing common errors are revealing misconceptions within the candidate population. For example, Budescu and Bar-Hillel (1993) describe how a correction for guessing could work in theory, but note that candidates are often miscalibrated about what they do and do not understand. This miscalibration leads less risk-averse candidates to guess more on items they do not actually know, and more risk-averse candidates to omit items they might have answered correctly.
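Budescu and Bar-Hillel's decision-theoretic point can be made concrete by extending the expected-value arithmetic above. Assuming the conventional 0.25 penalty on a five-option item (our numbers, not the paper's), guessing never hurts in expectation and strictly helps as soon as even one distractor can be ruled out, so a well-calibrated candidate should answer every item; risk-averse candidates who omit anyway forfeit those expected points.

```python
# Expected value of a guess under a 0.25 penalty on a five-option item,
# as a function of how many distractors the candidate can rule out.
# Illustrative arithmetic in the spirit of Budescu and Bar-Hillel (1993);
# the numbers are not taken from the paper itself.

PENALTY = 0.25

def ev_of_guess(options_remaining: int) -> float:
    """Expected score of guessing uniformly among the remaining options."""
    p_correct = 1 / options_remaining
    return p_correct * 1 + (1 - p_correct) * (-PENALTY)

for ruled_out in range(5):
    print(f"distractors ruled out = {ruled_out}: "
          f"E[guess] = {ev_of_guess(5 - ruled_out):+.3f}")
# -> 0 ruled out: +0.000; 1 ruled out: +0.063; ...; 4 ruled out: +1.000
```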
Differences in guessing behavior become even more apparent across genders and cultures. To illustrate, Coffman and Klinowski (2020) evaluated a policy change that removed guessing penalties from a national college entrance exam in Chile. After the penalties were removed, the gender gap in the number of omitted items shrank substantially, and the gender gaps in performance on the math, social studies, and science exams narrowed as well. Extending this finding to the CAS exams suggests that removing the guessing penalty could help reduce any similar differences that may currently exist within the CAS population of test-takers.
Budescu, D., and Bar-Hillel, M. (1993). To guess or not to guess: A decision-theoretic view of formula scoring. Journal of Educational Measurement, 30(4), 277–291. https://doi.org/10.1111/j.1745-3984.1993.tb00427.x
Coffman, K. B., and Klinowski, D. (2020). The impact of penalties for wrong answers on the gender gap in test scores. Proceedings of the National Academy of Sciences, 117(16), 8794–8803. https://doi.org/10.1073/pnas.1920945117
Lang, D. (2019). Strategic omission and risk aversion: A bias-reliability tradeoff [Paper presentation]. International Conference on Learning Analytics & Knowledge, Tempe, AZ. https://files.eric.ed.gov/fulltext/ED594761.pdf