Creating and maintaining legally defensible tests


Pamela Ing, PhD, ICE-CCP


In high-stakes assessment, stakeholders such as testing organizations and test takers rely on rigorous processes to ensure the validity, reliability, and fairness of test scores. When an assessment is perceived as unfair or lacking credibility, legal challenges can arise.

With fairness as a primary concern, we prioritize equitable treatment for every test taker. We also support testing organizations in the event of a legal challenge. For some protected groups, equity is a legal requirement, and simply taking steps toward validity and fairness is not sufficient. In the event of a challenge, you must provide documentation that demonstrates the fairness and validity of the assessment.

While challenges to testing have originated from various sources, they often result from differences in scores across test taker subgroups, complaints about access to testing and accommodations, and data privacy concerns. Let’s begin with test development and how to produce science-based, legally defensible content that is fair for all test takers.

A rigorous process 

High-stakes assessments for certification, licensing, qualifications, and admissions require intentional action to ensure validity and fairness, and you must be able to provide evidence that you have taken it. This involves a rigorous and objective methodology, committed to minimizing the risk of bias, for defining the essential elements of an occupation or educational curriculum that need to be tested. The result is a thorough process that builds a valid measurement of the knowledge, skills, abilities, and other relevant characteristics (KSAOs) needed to perform a role or achieve a certain standard.

Test development fundamentals 

Be intentional when recruiting Subject Matter Experts (SMEs). Engaging stakeholders who are representative of the diversity of your test taker population is essential to minimizing the risk of bias in test content. At each step in the test development process, it is vital to involve individuals who represent test taker demographics, such as age, gender, race, or ethnicity.

Rigorous test development requires alignment with industry best practices and collaboration with our clients during every stage of the test development life cycle. To ensure test content will stand up to scrutiny in the event of a legal challenge, here are five fundamental steps to safeguard validity and fairness and support the credibility of your testing program. 

  • Job analysis – engage a diverse, representative group of SMEs from the industry or discipline to objectively define exactly what is essential to perform the role or achieve the required standard.  The job analysis results serve as the foundation for building test specifications, which define the depth and breadth of the domains to be assessed and the level of complexity required. 
  • Item development – work with a representative, diverse team of SMEs to generate, review, and approve for use items that align with the test specifications. 
  • Psychometric evaluation – pretest items using real test taker data. Use psychometric modeling and statistics to calibrate item difficulty and determine score reliability, then construct a test form (or multiple equivalent test forms) in accordance with the test specifications. When reasonable, a differential item functioning (DIF) analysis can be implemented to identify items with content that is potentially biased.
  • Fairness analysis – conduct a thorough cultural fairness review to evaluate item content for the presence of potentially biased language or stereotypes.
  • Standard setting – establish a passing score, based on the level of competency required to meet assessment objectives, industry standards, and/or legal requirements to ensure public safety. 
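
As an illustration of the DIF analysis mentioned under psychometric evaluation, the sketch below computes the Mantel-Haenszel common odds ratio for a single item and converts it to the ETS delta metric (MH D-DIF = -2.35 × ln of the odds ratio). It is a minimal example rather than a production procedure: the function name, the "ref" and "focal" group labels, and the record layout are assumptions made for this example, and an operational analysis would add significance testing and SME review of any flagged item.

```python
import math
from collections import defaultdict


def mantel_haenszel_dif(records):
    """Return (common odds ratio, MH D-DIF) for one item.

    records: iterable of (group, stratum, correct) tuples, where group is
    "ref" or "focal" (hypothetical labels), stratum is the matching score
    level (for example a total-score band), and correct is 1 or 0.
    """
    # One 2x2 table per stratum:
    # [ref correct, ref incorrect, focal correct, focal incorrect]
    tables = defaultdict(lambda: [0, 0, 0, 0])
    for group, stratum, correct in records:
        if group not in ("ref", "focal"):
            raise ValueError(f"unknown group: {group!r}")
        column = (0 if group == "ref" else 2) + (0 if correct else 1)
        tables[stratum][column] += 1

    numerator = denominator = 0.0
    for a, b, c, d in tables.values():
        n = a + b + c + d
        numerator += a * d / n
        denominator += b * c / n
    if numerator == 0 or denominator == 0:
        raise ValueError("odds ratio is undefined for these data")

    odds_ratio = numerator / denominator
    # Negative delta: the item is relatively harder for the focal group
    # after matching on ability; values near zero suggest little DIF.
    return odds_ratio, -2.35 * math.log(odds_ratio)
```

A statistical flag of this kind does not by itself show that an item is biased. It identifies items whose content should go to the fairness review described in the next step.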

By involving diverse, representative SMEs in these steps, and openly sharing the rigorous process, you provide assurances to test takers of the validity and fairness of the assessment. This supports the credibility of your assessment and reduces the likelihood of challenges to test scores related to bias. 

Ongoing evaluation 

The validity of an assessment is demonstrated by evidence that it measures what it was intended to measure. Implementing the test development steps above will help establish the initial validity and fairness of an assessment. Of course, every industry and discipline evolves over time. Best practices are updated to reflect changes in legislation, advances in technology, and changes to education and training, and your tests need to reflect these changes. To keep content current and relevant, job analyses and standard settings need to be conducted at a cadence that matches the pace of the field (e.g., every 5 years) so that necessary updates are made to the test specifications and passing standards. Evaluation of item content is equally important. On a regular basis (e.g., annually), item performance needs to be evaluated, poorly performing items need to be retired from use, and new items need to be written, reviewed for fairness, and pretested for use as scored items on future test forms.
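
The routine item review described above can be sketched with two classical statistics: item difficulty (the proportion of test takers answering correctly) and item discrimination (the correlation between the item score and the score on the rest of the test). The example below is only a sketch. The function names are hypothetical, and the review thresholds are placeholder assumptions, not values recommended by the source; each program would set its own.

```python
import math
from statistics import mean


def pearson(x, y):
    """Pearson correlation, or None when either variable has no variance."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return None
    return sxy / math.sqrt(sxx * syy)


def item_statistics(responses):
    """responses: one row per test taker, each a list of 0/1 item scores.

    Returns one (difficulty, discrimination) pair per item.
    """
    totals = [sum(row) for row in responses]
    stats = []
    for j in range(len(responses[0])):
        item = [row[j] for row in responses]
        # Remove the item from the total so it is not correlated with itself.
        rest = [total - score for total, score in zip(totals, item)]
        stats.append((mean(item), pearson(item, rest)))
    return stats


def flag_items(stats, p_min=0.20, p_max=0.95, r_min=0.10):
    """Indices of items outside the (placeholder) review thresholds."""
    flagged = []
    for index, (p, r) in enumerate(stats):
        if p < p_min or p > p_max or r is None or r < r_min:
            flagged.append(index)
    return flagged
```

Flagged items are candidates for SME review, revision, or retirement. The statistics indicate that something may be wrong with an item but not what is wrong with it.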

Science expertise 

To be clear, legal challenges arise infrequently and no test developed by PSI, when used as recommended by PSI, has ever been successfully challenged in court. A legal challenge can be costly, especially if it doesn’t go your way. Not just financially – the credibility of your program is at stake. 

The PSI team provides the depth and breadth of expertise across the various competencies required to effectively develop valid and fair assessments to assure the defensibility of your testing programs and organization. We combine our expertise in science and technology to develop and deliver defensibility in high-stakes testing. 

The fairness and integrity of an assessment may also be impacted by the methods of administration. Given widespread use of technology-based assessments, it is important to ensure that test takers are prepared to use and able to access available testing options in an equitable way that prevents favoring or disadvantaging any test taker subgroups. With the increasing use of remote online testing in combination with test centers for multi-modal test delivery, it is important to consider how the use of technology and testing modalities may differentially affect test takers. We must also consider the concerns that test takers and other stakeholders may have regarding data privacy and security. Any of these factors could prompt legal challenges to the testing program. 

Online vs. test center delivery of high-stakes exams

Assuring the security of exams and the integrity of the scores they produce are key concerns, especially when the results are used in high-stakes programs such as qualifications, admissions, licensing, and certification. Typical security methods include ID authentication and monitoring by a proctor, video surveillance and recording, and capturing test taker response data to analyze potential misconduct. With remote online testing, test takers use their own technology and are monitored via online proctoring to re-create the experience of in-person proctored testing. Test centers, by contrast, offer a physical location and standard technology for test takers who need them and may not otherwise have access to either. Online testing offers several advantages, including flexibility in testing times, no travel costs, and additional access to testing programs.

With the increase in online remote testing in multi-modal delivery, testing organizations and other stakeholders have raised questions about the comparability, security, and integrity of online remote testing and the impact on the test taker experience. To this end, research studies have begun to appear in the scientific testing literature, some produced by PSI, examining these issues. The results of several studies offer promising evidence that programs using multi-modal test delivery provide psychometrically sound measurement and comparable test scores across online vs. test center, with no appreciable differences in test security issues flagged, and equally favorable test taker experience ratings across testing modes. 

Another consideration in online proctored testing is data privacy. Testing organizations should be aware of regulations in the US and Europe regarding access to, storage, and use of test taker data, particularly in the context of secure proctored testing. A good source of information about these laws and regulations is the Association of Test Publishers. Careful use and management of test taker data is important in maintaining a legally defensible exam program. 

Test Taker Preparation 

It is vital to provide guidance to test takers as they prepare for their assessments. Give them advance information on the technology they will need to access and be familiar with, and explain the security measures that will be taken to protect the integrity of the exam. Here are some recommendations to address potential test taker concerns and to help protect testing organizations:

  • Prepare test takers by providing clear and complete information on technology requirements and providing practice opportunities to use the testing delivery interface. 
  • Educate test takers and stakeholders on the online proctoring process and its benefits. 
  • Communicate with test takers about the steps involved with online proctoring, including how their identity will be confirmed, how they will be observed during a test, and the test taking behaviors they should avoid (i.e., behaviors that will be flagged by proctors). 
  • Inform test takers about the security measures taken to collect, store, transfer, and protect data to the highest possible standards. 
  • Gather and record explicit consent for any data or personal information collected and stored. 
  • Reassure test takers that a secure browser is in use and the online proctor will not have remote access to the test taker’s personal device. 

Enhancing security in multi-modal programs 

Additional steps may be taken to help assure the quality and integrity of testing programs using multi-modal test delivery, and to further assure defensibility. 

  • Evaluate the psychometric quality of exams across test delivery modes to ensure comparability of scores and pass rates, and to confirm that test takers do not experience advantages or disadvantages based on delivery modality. 
  • Monitor test taker performance through data forensics methods, to detect potential misconduct and assure fairness for all test takers. Data forensics can complement other security methods to provide a powerful defense against security risks.
  • Use automated test assembly methods to administer unique combinations of items to test takers. This minimizes content exposure and prevents test takers from gaining an unfair advantage through advance knowledge of item content. For example, methods such as linear-on-the-fly testing (LOFT) and multi-stage adaptive testing can be used to generate equivalent test forms with minimum overlap, providing a unique and fair experience for each test taker.
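
To make the idea of automated test assembly concrete, the sketch below draws a unique form for each test taker from an item bank, matching a content blueprint (items per domain) and keeping the form's mean difficulty within a tolerance of a target. It is a simplified illustration, not a description of any vendor's LOFT engine: the function name, the item fields ("id", "domain", "difficulty"), and the retry-until-acceptable strategy are all assumptions for this example. Operational systems typically use more sophisticated constraint handling and exposure control.

```python
import random


def assemble_loft_form(bank, blueprint, target_difficulty, tolerance,
                       rng, max_tries=1000):
    """Randomly assemble one form that satisfies the blueprint.

    bank: list of dicts with "id", "domain", and "difficulty" keys.
    blueprint: dict mapping domain name to the number of items required.
    rng: a random.Random instance, passed in so draws can be reproduced.
    """
    by_domain = {}
    for item in bank:
        by_domain.setdefault(item["domain"], []).append(item)

    for _ in range(max_tries):
        form = []
        for domain, count in blueprint.items():
            # sample() draws without replacement, so no item repeats.
            form.extend(rng.sample(by_domain[domain], count))
        mean_difficulty = sum(i["difficulty"] for i in form) / len(form)
        if abs(mean_difficulty - target_difficulty) <= tolerance:
            return form
    raise RuntimeError("no form met the constraints; relax them or grow the bank")
```

Because every form follows the same blueprint and difficulty target, forms stay comparable, while the random draw limits how much content any two test takers share.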


Multi-modal delivery options, including online remote proctored testing, are now part of high-stakes testing. We have reviewed key considerations that can give testing organizations confidence in using online testing methods that offer many benefits to test takers. With the requisite expertise in science and technology, it’s possible to ensure the rigor needed to safeguard fairness and defensibility in multi-modal test delivery.