As you probably know by now, the GMAT is a Computer Adaptive Test (“CAT”). This means that the questions that you see on the exam are selected by the computer based on your performance on earlier questions. For example, if you answer a question correctly, your next question will be harder. If you answer a question incorrectly, your next question will be easier. The exam is trying to gauge your ability level by seeing how well you do with questions (known as “items” in testing parlance) of varying degrees of difficulty. Generally speaking, the harder the questions you answer correctly, the better your score will be.
Difficulty level is not the only factor that influences which items appear on a particular exam. Others include question type (data sufficiency vs. problem solving, for example), content (algebra, ratios, assumptions, and so on), and exposure (how many times other test-takers have already seen the question that month). But difficulty level is arguably the most important.
The CAT does not “bucket” items into “easy”, “medium”, and “hard” categories. Instead, each item can be considered easy, medium, or hard depending on the person to whom it is given. Each item is first tried out for a period as an unscored “experimental” item on the actual exams of people taking the GMAT. After a sufficiently large sample of test-takers has answered an item, ETS compares the overall scores of those test-takers with their performance on the experimental item.
If, say, fifty percent of all test-takers scoring in the 600-620 range got a particular experimental item right, that item would be considered of medium difficulty for that ability level. If ninety percent of those scoring in the 700-720 range got the item right, it would be considered easy for that ability level. When the item is then presented as a real scored question on subsequent exams, the computer uses the experimental data to determine whether the item is appropriately difficult for someone performing at a given level thus far in the exam. The computer tries to give you questions that you have a 50/50 shot at, based on your performance up to that point. The better you do, the harder your 50/50 items will be.
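The selection idea described above can be sketched in a few lines of Python. This is only an illustration, not the actual GMAT algorithm: the item names, the score bands, and every success rate in `pool` are made up, and the function `pick_item` is a hypothetical helper. It simply returns the item whose experimental success rate for a given ability band is closest to 50%.

```python
# Hypothetical experimental data: the fraction of test-takers in each score
# band who answered each item correctly. All names and numbers are made up.
pool = {
    "item_A": {500: 0.30, 600: 0.45, 700: 0.90},
    "item_B": {500: 0.55, 600: 0.80, 700: 0.95},
    "item_C": {500: 0.22, 600: 0.30, 700: 0.52},
}

def pick_item(pool, band):
    """Return the item whose success rate in this band is closest to 50%."""
    return min(pool, key=lambda name: abs(pool[name][band] - 0.5))

for band in (500, 600, 700):
    print(band, pick_item(pool, band))
```

In this made-up pool, the item chosen changes as the ability band rises, which mirrors the statement that the better you do, the harder your 50/50 items will be.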
Each item has an “item characteristic curve” that graphs the likelihood of answering that item correctly, based on the experimental data. The curve looks like this:
The Curve and Your Performance
This curve is the driving force behind the CAT’s assessment of your ability. What does the shape of an item’s curve indicate about one’s performance at that point in the exam?
Each question that you see on your GMAT has a curve like this. The curve contains information about the likelihood of getting the question right based on data collected during that question’s experimental phase. For this particular item, for example, someone with a 500-level ability (meaning that he or she performs at a level consistent with an overall score of 500) has a 30% chance of answering correctly. Someone with a 600-level ability has a 45% chance, someone with a 700-level ability has a 90% chance, and someone with an 800-level ability has a 95% chance. Since the CAT seeks always to present items for which you have a 50% chance of success, this item would be presented to someone who is performing at a level that the CAT is estimating to be somewhere between 600 and 700, since the 50% mark falls between those levels for this item.
The curve does not begin at 0% probability because of what is known as the “guessing parameter”, which is just a fancy way of describing the minimum probability of answering correctly with a random guess. Since there are five answer choices, a random guesser has a 1/5 chance of answering correctly. So the baseline probability for answering an item correctly is 20%.
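One common way to model a curve with this shape is a three-parameter logistic function. The article does not say which formula the GMAT uses, so the sketch below is an assumption: the function name `icc`, the `difficulty` and `slope` parameters, and the values 640 and 0.03 are all chosen purely for illustration. The `guess` parameter plays the role of the guessing parameter and sets the 20% floor.

```python
import math

def icc(ability, difficulty, slope, guess=0.2):
    """Chance of a correct answer under an assumed three-parameter logistic curve.

    guess      -- floor of the curve (1/5 for five answer choices)
    difficulty -- ability level at which the curve is steepest
    slope      -- how sharply the curve rises around that level
    """
    return guess + (1.0 - guess) / (1.0 + math.exp(-slope * (ability - difficulty)))

# Hypothetical item whose steep part sits near 640 on the 200-800 scale.
for level in (200, 500, 600, 700, 800):
    print(level, round(icc(level, difficulty=640, slope=0.03), 2))
```

At very low ability levels the result approaches the 20% guessing floor rather than zero, and it climbs toward 100% as ability rises.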
Notice that the probability of answering correctly jumps significantly in the curve above as ability level increases from 600 to 700 (45% to 90%). The steepness of the curve between these ability levels indicates that this particular item can be used most effectively to zero in on a precise level between those points. During the item’s experimental phase, performance on this item changed dramatically as ability level increased from 600 to 700. At the extreme ends of the score range, however, the probabilities do not change that much. Therefore, this item is not as useful to distinguish a 700-level test taker from an 800-level one, since both are quite likely to answer this item correctly. However, someone with a 600-level ability is much less likely to answer correctly than is someone at the 700 level.
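To make the idea of steepness concrete, the short Python sketch below takes the four probabilities quoted for this item and finds the pair of neighboring ability levels between which the probability rises the most. The function name `steepest_interval` is a hypothetical helper, and comparing only four points is a rough stand-in for examining the full curve.

```python
# Probabilities of a correct answer quoted in the text for the example item.
curve = {500: 0.30, 600: 0.45, 700: 0.90, 800: 0.95}

def steepest_interval(curve):
    """Return the pair of adjacent ability levels with the largest rise."""
    levels = sorted(curve)
    pairs = zip(levels, levels[1:])
    return max(pairs, key=lambda pair: curve[pair[1]] - curve[pair[0]])

print(steepest_interval(curve))
```

The rise is 15 percentage points from 500 to 600, 45 from 600 to 700, and only 5 from 700 to 800, so the 600-700 interval is where this item separates test-takers best.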
An item such as this one, whose curve has its steepest part between 600 and 700, can be considered a “threshold” item for the 600-700 range (this is not ETS terminology). This means that when the CAT needs to determine whether its estimate of your ability should be closer to 600 or to 700, it may very well select this item to refine its image of your capabilities. If you get it right, the CAT will move its estimate of your ability towards 700. If you get it wrong, your estimate will move towards 600. In essence, your estimated ability is a fluid concept to the CAT: it is always changing, based on your performance on the “threshold” items for various ability levels.
The curve retains its basic shape from item to item as the exam progresses, though the location of the steep part changes.
Steepness of the Curve
Here is the curve again:
Notice that the curve suddenly becomes steep between 600 and 700 along the ability axis. The sudden pitch at this point indicates that ETS considers the difficulty of this item (remember that ETS refers to questions as “items”) to correlate with an overall ability level between 600 and 700. In other words, ETS believes that this item represents the greatest level of difficulty that someone scoring in the 600 to 700 range could handle with a greater than 50% chance of success.
As items increase in difficulty, the steep part of the curve moves farther to the right. So when you answer an item correctly, the next item you see will have the steep part of its curve farther to the right. By the same token, when you answer an item incorrectly, the next item you see will have the steep part of its curve farther to the left.
So let’s assume a certain test-taker has been performing at a relatively high level, corresponding roughly to an overall score of 640. The next item she sees could very well have the curve shown above. Now let’s assume that she answers this item correctly. The CAT will then select an item whose curve has its steep part slightly farther to the right, indicating that a correct answer on that item demonstrates an ability consistent with a slightly higher overall score. Keep in mind, however, that items appearing earlier in the exam contribute more assessment information than do later items. This means that a correct answer early in the exam moves the steep part of the curve farther to the right than does a correct answer later in the exam. For example, if the item shown above were, say, the fourth item on an exam, a correct answer would probably move the steep part to a range corresponding to an overall score of 680 or 690. If this item appeared as the twenty-fifth question, a correct answer might move the steep part to a range corresponding to an overall score of “only” 660.
Why is this? Because by the time you have answered twenty-five questions, the CAT has gained significantly more information about your ability than after only four questions. Correct answers later in the exam will not cause the steep part of the curve to move as far to the right because the CAT is already zeroing in on your precise level. Earlier in the exam, the steep part moves farther because the exam is giving you the “benefit of the doubt.” That is, the CAT “thinks” to itself, “Because I do not know you, I was not expecting you to answer this item correctly, so perhaps your ability level is 70 points higher than I thought.” After twenty-five items (a somewhat arbitrary number), the CAT “thinks” to itself, “I have seen you answer twenty-five items, so the fact that you answered this item correctly makes a small difference in my assessment of your ability.”
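A simplified calculation shows why the same correct answer carries less weight later on. The Python sketch below is not the CAT's actual procedure. It assumes, purely for illustration, that every item shares one logistic curve centered at 640 with no guessing floor, in which case the most likely ability level depends only on the fraction of items answered correctly. The function names and the `slope` value of 0.01 are hypothetical.

```python
import math

def mle_ability(n_right, n_wrong, difficulty=640.0, slope=0.01):
    """Most likely ability when all items share one logistic curve (no guessing).

    Under that assumption the best-fitting chance of success equals the
    observed fraction correct. Needs at least one right and one wrong answer.
    """
    p = n_right / (n_right + n_wrong)
    return difficulty + math.log(p / (1.0 - p)) / slope

def shift_from_one_more_correct(n_right, n_wrong):
    """How far the estimate moves when one more correct answer is added."""
    return mle_ability(n_right + 1, n_wrong) - mle_ability(n_right, n_wrong)

print(shift_from_one_more_correct(2, 2))    # after 4 answers
print(shift_from_one_more_correct(12, 12))  # after 24 answers
```

With these assumed numbers, one more correct answer moves the estimate by roughly 40 points after four answers but by only about 8 points after twenty-four, because a single response is a much smaller share of the evidence.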
Imagine the exam as a baseball game. If a player hits a homerun the first time at bat, your initial impression of his ability will be pretty high, though it is certainly possible that he will lose your goodwill by fumbling easy plays later on. If, however, that player is mediocre for eight innings, a homerun in the ninth (let’s assume it is not a winning run) will certainly improve your assessment of his ability, but you are already predisposed to think of him as a player of a lower caliber and the homer will seem like a fluke.
What happens, though, when you answer an item incorrectly?
Inverse Item Characteristic Curve
Now we will discuss the “inverse item characteristic curve” as well as the “estimator curve,” both of which serve to help the CAT find your score.
If you recall, the item characteristic curve displays the probabilities of answering a particular item correctly for different ability levels.
The “inverse item characteristic curve”, on the other hand, displays the probabilities of answering a particular item incorrectly for different ability levels. If, for example, the curve for an item indicates that someone performing at the 600 level has a 45% chance of answering the item correctly, the inverse curve for that item would indicate a 55% chance of answering the item incorrectly. This is because the probability of answering correctly and the probability of answering incorrectly must sum to 100%, since those are the only possibilities. In essence, then, each point on the inverse curve is simply 100% minus the probability of answering the item correctly at that ability level.
What is the purpose of the inverse curve? After every question, the CAT takes all the regular curves for the items you answered correctly and all the inverse curves for the items you answered incorrectly and multiplies all the curves to arrive at the “estimator curve” of your ability. Basically, the CAT determines the probability of your answer pattern by multiplying the regular and inverse curves. The new “estimator curve” has a humped shape, the highest part of which corresponds to the CAT’s best estimate of your ability at that point in the exam. As you proceed through each section, your estimator curve changes according to the items you get right and wrong. If you answer hard items correctly, the hump shifts to the right, indicating a higher estimated ability. If you answer easy items incorrectly, the hump shifts to the left, indicating a lower estimated ability.
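The multiplication described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the CAT's actual implementation: the logistic formula in `icc`, its `slope` and `guess` values, the ability grid, and the list of `responses` are all hypothetical. For each ability level on the grid, the code multiplies the regular curve for every correct answer and the inverse curve for every incorrect answer, then reports where the resulting hump peaks.

```python
import math

def icc(ability, difficulty, slope=0.03, guess=0.2):
    """Assumed item characteristic curve: chance of a correct answer."""
    return guess + (1.0 - guess) / (1.0 + math.exp(-slope * (ability - difficulty)))

def estimator_curve(responses, grid):
    """Multiply regular curves (right answers) and inverse curves (wrong answers).

    responses -- list of (item_difficulty, answered_correctly) pairs
    grid      -- ability levels at which to evaluate the product
    """
    curve = []
    for ability in grid:
        likelihood = 1.0
        for difficulty, correct in responses:
            p = icc(ability, difficulty)
            likelihood *= p if correct else (1.0 - p)  # inverse curve is 1 - p
        curve.append(likelihood)
    return curve

def best_estimate(responses, grid):
    """Ability level at the top of the hump."""
    curve = estimator_curve(responses, grid)
    return grid[curve.index(max(curve))]

grid = list(range(200, 801, 10))
# Made-up answer pattern: (item difficulty, answered correctly?)
responses = [(550, True), (600, True), (650, False), (620, True), (660, False)]
print(best_estimate(responses, grid))
print(best_estimate(responses + [(700, True)], grid))   # add a hard item answered right
print(best_estimate(responses + [(500, False)], grid))  # add an easy item answered wrong
```

Adding a correct answer on a hard item can only keep the peak where it is or push it to the right, and adding an incorrect answer on an easy item can only keep it or push it to the left, which matches the behavior of the hump described above.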
At the end of each section, the CAT converts its current estimate of your ability (based on the hump in its final estimator curve) to a “paper-test equivalent”. This equivalent takes into account the number of questions you left unanswered at the end of the section (if you ran out of time, for example). There is a penalty for unanswered questions so it is better to complete all the questions (randomly guessing at the end, if necessary) than to leave questions at the end of a section blank. The equivalent is then converted to a “scaled score” (out of 60 points) for that section. After both sections are complete, the CAT takes the two scaled scores (quant and verbal) and converts them to an overall score out of 800 points.
Now you have seen the process by which the CAT determines your score. We hope this series has dispelled some of the mystery surrounding the GMAT.