As the amount of testing data we have access to has grown, so too have the tools we use to analyze that data. While the analysis of patterns among item response choices or scores has long been a focus of test security monitoring, our data forensics toolkit has increasingly turned to modeling and analyzing item response times. In particular, attention has been directed at the Lognormal Response Time (LNRT) model, which measures the speed of a test taker’s responses to test items compared to the norm.
There are multiple reasons for modeling response times, such as estimating the appropriate time limit for a test or improving detection of anomalous test taker behavior. For the purposes of test security, a fast or otherwise unusual response time pattern might indicate fraudulent activity during test taking.
While it is well established that the LNRT model is useful for measuring speed and giving a general index of model “misfit” for test takers, my PSI colleague Regi Mucino and I hypothesized that there might be more information to be gleaned from this model. So we conducted a study to further explore model residuals (the differences between statistically expected and actual response times) along with their profile dispersion (the spread of a test taker’s residuals) and shape (how the pattern of those residuals relates to normal patterns).
The headline for testing organizations wanting to enhance their test security is our finding that distinguishing between residual dispersion and shape provides meaningful information for identifying anomalous testing behavior that might indicate the test taker was engaged in some form of cheating. This blog shares more details of our study, which was published in the Journal of Educational Measurement (May 2024).
Existing methods for analyzing response times
There are several methods for evaluating and modeling item response times. These include means (average of a data set), medians (middle number in a sequence), and standard deviations (dispersion of a data set relative to its mean) of test taker response times. These measures can be useful in providing a general sense of how quickly test takers are completing items and help identify outliers. They can also help establish how much time it normally takes to respond to each specific item, a building block for the statistical expectations used when modeling other test takers’ response times to those same items.
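As a simple illustration, here is a minimal sketch in Python (with entirely hypothetical, simulated data) of computing these per-item summaries from a normative sample and using them to flag unusually fast responses from a new test taker:

```python
import numpy as np

# Hypothetical response time matrix: rows = test takers, columns = items
# (seconds spent on each item by a normative sample).
rng = np.random.default_rng(0)
times = rng.lognormal(mean=4.0, sigma=0.5, size=(500, 10))  # ~55s median per item

# Per-item summaries: how long does each item normally take?
item_means = times.mean(axis=0)
item_medians = np.median(times, axis=0)
item_sds = times.std(axis=0, ddof=1)

# Flag a new test taker's times that fall below the 5th percentile of the
# normative sample for each item (unusually fast responses).
p05 = np.percentile(times, 5, axis=0)
new_times = rng.lognormal(mean=4.0, sigma=0.5, size=10)
fast_flags = new_times < p05

print(np.column_stack([item_means, item_medians, item_sds]).round(1))
print("Unusually fast items:", np.where(fast_flags)[0])
```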
More formal response time models have been developed that can serve multiple purposes. A prominent example is the LNRT model, which estimates the time demand, or intensity, of each item from the time spent by a normative sample of test takers. This, along with a discrimination parameter (the expected variation in times on an item, also derived from a normative sample), is used with the logs of observed response times to estimate each test taker’s overall speed. Once a test taker’s speed is estimated, it can be used to derive their expected log time on each item, and their actual times can be flagged when they deviate far from those statistical expectations.
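To make that logic concrete, here is a minimal sketch of the LNRT calculations, assuming the item time-intensity and discrimination parameters have already been estimated from a normative sample; the data, parameter values, and flagging threshold below are all hypothetical:

```python
import numpy as np

def estimate_speed(log_times, beta, alpha):
    """Precision-weighted speed estimate under the LNRT model:
    log T_i ~ Normal(beta_i - tau, 1 / alpha_i**2)."""
    w = alpha ** 2
    return np.sum(w * (beta - log_times)) / np.sum(w)

# Hypothetical item parameters from a normative sample.
beta = np.array([3.8, 4.1, 3.5, 4.4, 3.9])   # time intensity (log seconds)
alpha = np.array([1.6, 1.4, 1.8, 1.2, 1.5])  # discrimination (1 / SD of log time)

# One test taker's observed times (seconds); the 3rd is suspiciously fast.
times = np.array([42.0, 55.0, 6.0, 75.0, 47.0])
log_t = np.log(times)

tau = estimate_speed(log_t, beta, alpha)
expected = beta - tau                     # expected log time per item
resid = alpha * (log_t - expected)        # standardized residuals

# Flag items whose times deviate far from expectation (|z| > 2 is illustrative).
print(f"speed estimate tau = {tau:.2f}")
print("standardized residuals:", resid.round(2))
print("flagged items:", np.where(np.abs(resid) > 2)[0])
```

The precision-weighted speed estimate above is one standard way to score a test taker under the lognormal model; operational implementations typically estimate all parameters jointly from the full data set.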
New data forensics to improve test security
However, the LNRT model assumes that test takers work at a constant speed throughout their test. It is well known that test takers may vary in their working speed due to factors such as rapid guessing, fatigue, mental breaks, and possibly cheating. For example, item pre-knowledge may lead to fast response times on memorized items, producing localized changes in working speed for certain items that may have no noticeable impact on the overall speed value. In such cases, patterns of variability in working speed are passed into the model residuals (the differences between observed response times and the model’s expectations). These residuals are often treated as random deviations from statistical expectations, but in fact they can contain valuable systematic information about anomalies in a test taker’s response time behavior.
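To illustrate, the following sketch (a hypothetical simulation, not the data from our study) shows how pre-knowledge on a handful of items can leave the overall speed estimate nearly unchanged while leaving a clear systematic trace in the residuals:

```python
import numpy as np

rng = np.random.default_rng(1)
n_items = 40
beta = rng.normal(4.0, 0.4, n_items)       # item time intensities (log seconds)
alpha = np.full(n_items, 1.5)              # common discrimination, for simplicity

def simulate(tau, preknown=()):
    """Log times under the LNRT; pre-known items are answered much faster."""
    log_t = beta - tau + rng.normal(0, 1 / alpha)
    log_t[list(preknown)] -= 1.5           # memorized items: sharply reduced time
    return log_t

def speed(log_t):
    w = alpha ** 2
    return np.sum(w * (beta - log_t)) / np.sum(w)

honest = simulate(tau=0.0)
cheater = simulate(tau=0.0, preknown=range(5))  # pre-knowledge on 5 of 40 items

for label, log_t in [("honest", honest), ("pre-knowledge", cheater)]:
    resid = alpha * (log_t - (beta - speed(log_t)))
    print(f"{label:14s} speed={speed(log_t):+.2f} "
          f"resid SD={resid.std(ddof=1):.2f} "
          f"min resid={resid.min():+.2f}")
```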
While the LNRT model parameters account for “normal” response time behavior and give a global “sum of squared residuals” index of a test taker’s deviations from model expectations, deeper analysis of the patterns among a test taker’s residuals allows for enhanced detection of abnormalities that might call their test taking behavior into question. To this end, our research built on the summed-residuals approach while demonstrating connections between the LNRT model and methods of assessing similarities between score profiles. By recognizing the connections between the LNRT and a classic model of profile similarity, new insights and parameters emerged for the LNRT. Namely, after accounting for profile level (speed), a test taker’s residuals can be split into overall variability in working speed (dispersion) and patterns of dissimilarity to normal item time demands (shape), each of which may be useful in understanding their behavior. The existing LNRT model conflates these dispersion and shape features in its global index of model misfit.
This expanded approach gives us additional diagnostic tools for the LNRT, including:
- A new measure of person misfit: isolating shape dissimilarity shows how far a test taker’s response time pattern deviates from the normative pattern for the items.
- New model parameters: directly quantifying a test taker’s relative dispersion and shape gives insight into their behavior beyond the assessment of speed, as sketched below.
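To make these measures concrete, here is a minimal sketch of one way to separate a log response time profile into level, dispersion, and shape, in the spirit of classic profile-similarity analysis; the setup is hypothetical and the exact estimators in our published study may differ:

```python
import numpy as np

def profile_features(log_t, beta):
    """Level / dispersion / shape decomposition of a log response time
    profile against the normative item time intensities (beta)."""
    level = beta.mean() - log_t.mean()          # speed (tau) as level difference
    s_t, s_b = log_t.std(ddof=1), beta.std(ddof=1)
    dispersion = s_t / s_b                      # variability in working speed
    shape = np.corrcoef(log_t, beta)[0, 1]      # similarity to item time demands

    # Global misfit: sum of squared centered residuals. The identity
    # SS = (n - 1) * (s_t**2 + s_b**2 - 2 * shape * s_t * s_b)
    # shows that it mixes dispersion and shape into a single number.
    resid = (log_t - log_t.mean()) - (beta - beta.mean())
    ss = np.sum(resid ** 2)
    return level, dispersion, shape, ss

rng = np.random.default_rng(2)
beta = rng.normal(4.0, 0.4, 40)
honest = beta + rng.normal(0, 0.2, 40)          # tracks the item time demands
cheater = honest.copy()
cheater[:5] -= 1.5                               # pre-knowledge on 5 items

for label, log_t in [("honest", honest), ("pre-knowledge", cheater)]:
    level, disp, shape, ss = profile_features(log_t, beta)
    print(f"{label:14s} level={level:+.2f} dispersion={disp:.2f} "
          f"shape={shape:.2f} misfit SS={ss:.2f}")
```

The identity noted in the comments is the key point: the global sum-of-squared-residuals misfit index mixes dispersion and shape together, whereas reporting them separately shows which aspect of the profile is driving the misfit.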
Study evaluation
At this point in our study, we had demonstrated that the existing LNRT model can be extended with additional measures. We then applied the full set of measures to two real datasets to explore their behavior and assess the value of the new indices beyond those in the existing LNRT.
Results from a dataset in which many test takers had pre-knowledge of test items revealed that profile shape, not currently isolated in the LNRT model, was the response time index most sensitive to abnormal test taking behavior. Of particular importance, overall speed and overall variability in speed were not as sensitive as the new shape-based measures at detecting abnormal patterns of item-level response times relative to the items’ time demands. These results strongly support extending the LNRT model to measure not only each test taker’s speed, but also to distinguish between the dispersion and shape of their response time profiles when analyzing model residuals.
Our conclusion
This extended approach to the LNRT model is a powerful addition to our data forensics toolkit. It gives a more comprehensive assessment of test performance by adding information about test taker behavior, revealing more about the time test takers spend choosing among the possible responses to test items.
Read our guide on how to use data forensics in the assessment lifecycle to increase test security.
Further research into the use and application of the response time metrics defined in this study will likely expand our approach even further. While the shape feature was the strongest indicator of the pre-knowledge breach in this study, does that finding generalize to other types of security breach? And will the analysis of shape improve other methods that currently combine measures of response speed with measures of abnormal response choice patterns (person fit)? Watch this space…