If examination items are the building materials used to construct a test, then an Examination Specifications Document is the blueprint for building it. In any building, materials need to be repaired or replaced due to wear-and-tear, unexpected damage, and changes to building standards and regulatory requirements. Likewise, examination items have a shelf life that is affected by regular exposure to examinees, exam security incidents, and changes in the practice area being assessed. It follows, then, that two of the pressing questions in designing an exam development process are: How many items do we need, and how often do we create more?
At a bare minimum, an examination program requires enough items to publish an examination that conforms to the Examination Specifications Document. If we have told candidates and other stakeholders that there will be 80 scored items and 20 unscored items and that there will be a set number of items allocated to certain content areas, then we are bound by that. That said, anyone who has worked with knowledge-focused examinations will tell you that items on these exams are not evergreen. There is always some attrition whenever we evaluate items, because statistical analysis and content review surface content-related and performance-related issues with items. As such, having exactly 100 items today and no more leaves us in a precarious position for the future unless additional item development is underway.
There will always be a need for more exam items, but how many more?
Well, it all comes down to item usage, which is sometimes envisioned as exposure (or the number of eyeballs looking at your items). Imagine that we expect 1500 examinees to take our 100-item exam from above in the coming year. It would be folly to have only one form of this exam available at a time because we can anticipate that items will be increasingly discussed by examinees over time and there is always the risk of security incidents that could compromise content. In this case, we know that we need more than one form at a time, but the exact number of forms needed is determined by the amount of item exposure that the certification program can tolerate, balanced against the resources available for exam development. If we have a low risk appetite or suspect that cheating is more likely to occur, we may opt for four forms, which caps the number of views per item at 375 (1500 examinees spread evenly across four forms that share no items). If fewer resources (e.g., SMEs, dollars, pretesting capability) are available or we have a higher risk appetite, we may opt for two forms because we feel comfortable with 750 views per item in the coming year.
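The arithmetic behind those figures can be sketched in a few lines of Python. This is only an illustration: the function names are made up for this example, and it assumes that examinees are spread evenly across forms and that forms share no items.

```python
import math


def views_per_item(examinees: int, forms: int) -> float:
    """Expected views per item when examinees are split evenly across forms.

    Assumption (not from any standard): forms do not share items.
    """
    if forms < 1:
        raise ValueError("forms must be at least 1")
    return examinees / forms


def forms_needed(examinees: int, max_views_per_item: int) -> int:
    """Smallest number of forms that keeps each item at or under the cap."""
    if max_views_per_item < 1:
        raise ValueError("max_views_per_item must be at least 1")
    return max(1, math.ceil(examinees / max_views_per_item))


# The 1500-examinee exam from the example above.
for forms in (1, 2, 4):
    print(f"{forms} form(s): about {views_per_item(1500, forms):.0f} views per item")
```

Running the question the other way, forms_needed(1500, 375) returns 4, which matches the low-risk-appetite scenario above.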
There is a flipside to this: If we have another exam that has 50 examinees in a year, then more than one form would be ill-advised. There is little to gain from limiting exposure even further, and statistical analysis becomes harder to interpret if the data we receive is based on 25 examinees per item rather than 50. Our quest to improve exam security shouldn’t introduce other serious issues, like reduced score reliability!
It should be noted that an increasing number of credentialing programs look to pool-based approaches like linear on-the-fly testing (LOFT) rather than using static fixed forms because these approaches allow for dynamic updates to examinations. In any case, the same line of thinking allows us to determine the size of the item pool. If we have 1500 candidates who each see 100 items, that is 150,000 item views in total. If we want to ensure that items are not viewed more than 500 times on average, then we would look to create a pool of 300 items (or 3x, as in three times the size of what is called for by the examination specifications).
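The pool-size reasoning can be written as a small calculation. The sketch below is illustrative only: the function name is hypothetical, and it assumes items are drawn from the pool roughly uniformly, so that total item views are spread evenly across the pool. Real LOFT assembly is constrained by content areas, so actual exposure will vary from item to item.

```python
import math


def pool_size(candidates: int, items_per_exam: int, max_views_per_item: int) -> int:
    """Minimum pool size that keeps average views per item at or under the cap.

    Assumption: items are selected uniformly, so views spread evenly.
    """
    if max_views_per_item < 1:
        raise ValueError("max_views_per_item must be at least 1")
    total_views = candidates * items_per_exam  # every candidate sees every item on their exam
    return math.ceil(total_views / max_views_per_item)


# 1500 candidates, a 100-item exam, and a cap of 500 views per item.
size = pool_size(1500, 100, 500)
multiplier = size / 100  # pool size relative to the exam specifications
print(size, multiplier)
```

With these inputs the pool comes out at 300 items, or three times the length of the exam. Tightening the cap to 375 views would raise the requirement to 400 items.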
The frequency of exam development depends on the expected rate of change to examination content. For example, a program that assesses technology-focused competencies may opt for exam development to occur every six months because changes in practice occur very frequently, while a program that assesses interpersonal competencies may opt for exam development to occur every two years because changes in practice occur infrequently or very gradually. In either case, we need to ensure that we have enough forms or a large enough item pool to accommodate the length of time in between publications.
In our metaphorical building, it would be fair to expect that areas more frequently used (like the kitchen or living room) will need more frequent upkeep than less-used areas (like the basement and attic). We know we can’t take our responsibility to examinees and just brush it under the rug, so we continue to rebuild, repair, and refresh our metaphorical home for examinees. That said, tidy and well-behaved house guests leave less work for us to do than guests who outstay their welcome. And who can forget that there is only so much money that we can spend? Exam items don’t grow on trees after all!