Eliminating the Penalty for Incorrect Responses
Authored by: Carrie Cross, CAS, Chad Buckendahl & Susan Davis-Becker, ACS Ventures, LLC
One concern about multiple choice questions is that candidates can guess randomly and still get a correct response. Although this is true in theory, well-constructed multiple choice questions pair the correct response with alternative response options, often called distractors, that are plausible but incorrect. Distractors are plausible because they are written to represent common errors that candidates make, so a candidate who makes one of these errors will find the corresponding distractor attractive. To counteract random guessing, some testing programs have historically deducted a penalty, often 0.25 points, for each incorrect response. Prior to March 2016, the SAT, a well-known college admissions test published by The College Board, had such a guessing penalty for multiple choice items. The program eliminated this scoring approach in part because some studies suggested that the policy may have had an unintended effect on the interpretation of examinees’ scores. In contrast, the ACT, another widely used college admissions test, never had a guessing penalty for items on its exam.
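As a rough illustration of why a 0.25 penalty has been a common choice, the sketch below computes the expected score from one purely random guess. It assumes five equally likely response options and one point per correct answer. Neither assumption comes from the CAS exam specifications, and the function name is made up for this example.

```python
from fractions import Fraction

def expected_guess_score(num_options, penalty):
    """Expected points from one blind guess among equally likely options."""
    p_correct = Fraction(1, num_options)
    # One point if right, minus the penalty if wrong.
    return p_correct * 1 - (1 - p_correct) * penalty

# Assumed for illustration: five options, 0.25 deducted per wrong answer.
with_penalty = expected_guess_score(5, Fraction(1, 4))
without_penalty = expected_guess_score(5, Fraction(0))

print(with_penalty)     # 0
print(without_penalty)  # 1/5
```

Under these assumptions the penalty makes a blind guess worth zero points on average, the same as leaving the item blank. Without the penalty a blind guess is worth one fifth of a point on average.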
A growing body of research on this topic suggests that application of a guessing penalty for multiple choice questions on a test can have differential effects on scores that are related to participants’ risk tolerance (Lang, 2019). This means that the candidate’s test score is changed from purely representing whether they know the material to being influenced by an extraneous factor that psychometricians call “construct irrelevant variance.” To put it another way, a candidate’s risk tolerance is not something the CAS wants to measure.
A related concern is that, although this scoring approach is characterized as a guessing penalty, it is not clear whether candidates are guessing randomly or whether distractors representing common errors are revealing misconceptions within the candidate population. For example, Budescu and Bar-Hillel (1993) describe how a correction for guessing could work in theory, but candidates are often miscalibrated about what they do and do not understand. This miscalibration often leads less risk-averse candidates to guess more on items they do not actually know, and more risk-averse candidates to omit more items they might have answered correctly.
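The cost of omitting can be sketched with a small calculation. The example assumes a 0.25 deduction per wrong answer, one point per correct answer, and a hypothetical candidate who can rule out two of five options and then picks among the remaining three. These numbers and function names are illustrative only and are not taken from the cited studies.

```python
from fractions import Fraction

PENALTY = Fraction(1, 4)  # assumed deduction per wrong answer

def expected_score_if_answered(p_correct, penalty=PENALTY):
    """Expected points from answering when the chance of being right is p_correct."""
    return p_correct - (1 - p_correct) * penalty

def break_even_probability(penalty=PENALTY):
    """Chance of being right at which answering averages 0 points, same as omitting."""
    return penalty / (1 + penalty)

# Hypothetical candidate: rules out two of five options, picks among the other three.
p_partial = Fraction(1, 3)
gain_from_answering = expected_score_if_answered(p_partial)
threshold = break_even_probability()

print(threshold)            # 1/5
print(gain_from_answering)  # 1/6
```

In this sketch, any candidate whose chance of being right exceeds one in five gains on average by answering. A risk-averse candidate who omits the item instead gives up about a sixth of a point per item for reasons unrelated to what they know.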
Differences in guessing behavior become more apparent across genders and cultures. To illustrate, Coffman and Klinowski (2020) evaluated the effect of a policy change that removed guessing penalties on a national college entry exam in Chile. After removing these penalties, there was a large reduction in the gender gap when comparing the number of items that were omitted on the test. There was also a narrowing of the gender gaps in performance on math, social studies, and science exams. Extending this concept to the CAS exams means that removing the guessing penalty scoring practice could help reduce any potential differences that may be currently observed within the CAS population of test-takers.
References
Budescu, D., and Bar-Hillel, M. (1993). To Guess or Not to Guess: A Decision-Theoretic View of Formula Scoring. Journal of Educational Measurement, 30(4), 277-291. https://doi.org/10.1111/j.1745-3984.1993.tb00427.x
Coffman, K. B., and Klinowski, D. (2020). The Impact of Penalties for Wrong Answers on the Gender Gap in Test Scores. Proceedings of the National Academy of Sciences, 117(16), 8794-8803. https://doi.org/10.1073/pnas.1920945117
Lang, D. (2019). Strategic Omission and Risk Aversion: A Bias-Reliability Tradeoff [Paper presentation]. International Conference on Learning Analytics & Knowledge, Tempe, AZ. https://files.eric.ed.gov/fulltext/ED594761.pdf
An example of the dumbing down of the CAS.
CAS exams are not the SATs.
A majority of CAS multiple choice questions do not have distractors.
CAS Exams are hard, and there are a significant number of questions for which many candidates do not have a clue.
Now they will always just guess.
Lucky guesses will often lead to a pass.
The CAS does not want to measure the luckiness of candidates.
From a former head of the Examination Committee.
Hi Howard,
We received several comments on this blog post and appreciate those who took time to share feedback. While the wrong answer penalty used to be the norm on most competitive standardized tests, in recent years test industry best practices have shifted. The use of strong, plausible distractors reduces the risk that unqualified or unprepared candidates will pass an exam. The MAS I & II multiple choice questions are written to feature strong, plausible distractors, and exam results are constantly analyzed to ensure every exam question is performing fairly and at the right difficulty level to uphold the integrity of the exam.
Additionally, our psychometric consultants, ACS Ventures, note the following:
The MAS-I and MAS-II exams are designed to measure candidates’ actuarial knowledge and skills; not construct-irrelevant traits or behaviors such as a candidate’s risk tolerance or confidence. It is a common misperception to believe that candidates would be able to guess their way to a passing score. However, in test development, there are multiple safeguards in place. High quality multiple-choice test questions use plausible distractors that represent common errors in practice. These distractors help distinguish between the minimally qualified candidate and those who are unable to demonstrate the necessary knowledge and skills. It is also important to note that passing scores are systematically established through a process called standard setting which applies an expected performance level definition (i.e., minimally qualified) to the examination, considering the difficulty of each question. The comparability of the passing score across examinations is then maintained through the statistical process of equating. Because of the potential differential impact of the guessing penalty for groups of examinees, removing the guessing penalty is an important step in CAS’s continuing efforts to develop exams that are valid, reliable, and fair for all candidates.
Although the post used the SAT as one example, there are many professional credential exams that have no wrong answer penalty for multiple choice questions. A few examples include the Certified Public Accountant (CPA) exams and the American Board of Internal Medicine (ABIM) certification exam.
Let us take a second and ignore the differences between the current CAS curriculum and the exams administered in your era of the CAS (1981). How is it not feasible to simply raise the passing mark and not have the same effective passing rate? This would increase the difficulty of questions, which would lead to educated guessing instead of easier questions and testing one’s risk tolerance on a question. Throwing terms around like “dumbing down” simply relays how out of touch you are with the current exam experience. Your knowledge of taking exams 40+ years ago and participation in the Examination Committee does not emulate sitting in a chair at a computer-based testing center and being tested on concepts that aren’t easily comparable to those you were tested on. Furthermore, you can’t remove luck as a factor on multiple choice questions or easily measure the reduction in risk with or without the guessing penalty. Multiple choice exams are used as an easily graded measure of one’s mastery of any particular discipline. I will side with the larger sample size and body of the SAT organization that, although established a decade after the CAS, has a lot more resources to evaluate the effect of a guessing penalty than your anecdotal assertion.