Reflections on the eAA Conference – AI in testing: what’s real and what’s not?


Eunice McAllister


At the recent e-Assessment Association (eAA) Conference in London, the potential for Artificial Intelligence (AI) to transform the testing industry was a major focus. However, despite the significant buzz around AI and testing, we are still in the early days of fully integrating these technologies into the assessment lifecycle. The Conference shed light on the promising yet complex journey ahead for our industry when it comes to AI.

Discussions at the Conference underscored that while there is a lot of excitement about AI’s capabilities, it can be challenging to distinguish between the reality and the hype. It’s crucial that we navigate this evolving landscape with a clear understanding of both the opportunities and the challenges.

This piece aims to demystify AI in the context of test development and test delivery, providing you with key insights from the eAA Conference.

AI in test development

Automatic Item Generation (AIG)

One of the areas discussed was Automatic Item Generation (AIG). Using a template with placeholders for variables, and logic to determine how different variables interact to form correct conclusions, AIG creates multiple versions of test items simultaneously. Even before the advent of AI, AIG was able to create test items swiftly and efficiently.
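To make the template-and-logic idea concrete, here is a minimal sketch of pre-AI, rule-based AIG. The item template, variable ranges, and distractor rules are purely illustrative, not examples from the Conference:

```python
import itertools

# Illustrative AIG template: a distance-rate-time word problem.
# Placeholders {speed} and {hours} are filled from defined value ranges,
# and simple logic derives the correct answer and plausible distractors.
TEMPLATE = "A train travels at {speed} km/h for {hours} hours. How far does it travel?"

def generate_items(speeds, hours_options):
    items = []
    for speed, hours in itertools.product(speeds, hours_options):
        correct = speed * hours
        # Distractors modeled on common errors (e.g. adding instead of multiplying).
        distractors = sorted({speed + hours, correct + speed, correct - hours} - {correct})
        items.append({
            "stem": TEMPLATE.format(speed=speed, hours=hours),
            "key": correct,
            "distractors": distractors,
        })
    return items

items = generate_items(speeds=[60, 80], hours_options=[2, 3])
print(len(items))  # 4 item variants from a single template
```

One template with two variables and a handful of values already yields many parallel items, which is why AIG scales so well once AI helps author the templates themselves.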

AIG’s potential to leverage AI was highlighted in several sessions at the Conference. The combination of AIG and AI further reduces the time and cost associated with item creation, while ensuring a diverse and extensive item pool. What’s more, the larger number of items created by AI-enabled AIG helps to maintain the security and integrity of tests by minimizing item exposure and enabling the continuous refreshment of item banks. This is particularly beneficial for large-scale testing programs managing high volumes of items.

Further benefits of leveraging AI with AIG were discussed, while acknowledging that the testing industry is in the early stages of adopting this technology for high- and medium-stakes testing, and that human oversight is still required. AI-enabled AIG can:

  • Enhance the quality of tests by generating items tailored to different difficulty levels and learning objectives.
  • Generate personalized items tailored to each test taker’s abilities and learning history, making tests more adaptive and relevant.
  • Reduce bias with AI algorithms designed to minimize human biases in item creation, promoting fairness and inclusivity in testing.
  • Create interactive and engaging test items, such as simulations and scenario-based questions, which can provide a more accurate measure of a test taker’s abilities.

Generative AI for test content creation

Another significant topic discussed at the Conference was the use of AI in test development, outside of AIG. Through large language models (LLMs), generative AI is transforming how new test items are developed. These advanced AI systems can generate human-like text, making them highly suited to support the testing industry by integrating into existing systems and processes.

This technology enhances the accuracy and objectivity of job analyses and enables the creation of diverse item types beyond traditional multiple-choice questions, such as Situational Judgment Tests (SJTs) and Virtual Reality assessments. Additionally, generative AI can efficiently tag items in large item banks, facilitating easier program maintenance and health checks.

This innovation not only saves time and reduces costs, allowing resources to be allocated more effectively, but it also improves practice tests by incorporating fresh content rather than relying on outdated material. However, Conference speakers stressed that AI should always work in tandem with human oversight, to ensure the quality and relevance of the generated items.

Low stakes practice tests

For testing organizations that are cautious about using AI in test content creation, one application highlighted at the Conference is to pilot it in low-stakes practice tests. This presents numerous opportunities for enhancing the test taker experience and improving test preparation.

For example, AI can personalize learning by generating recommendations alongside an overall result, allowing the test taker to focus on their weaker areas. The ability to provide instant, detailed feedback on performance helps test takers understand their mistakes immediately and learn from them. This makes practice tests more interactive and engaging, and might also include features like gamification or varied item formats that enhance motivation.
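As a simple illustration of this kind of feedback loop, the sketch below turns practice-test responses into per-topic scores and study recommendations. The topic names and the pass threshold are hypothetical, chosen only for the example:

```python
from collections import defaultdict

def score_by_topic(responses):
    """responses: list of (topic, is_correct) tuples from a practice test."""
    totals, correct = defaultdict(int), defaultdict(int)
    for topic, ok in responses:
        totals[topic] += 1
        correct[topic] += int(ok)
    # Fraction correct per topic.
    return {t: correct[t] / totals[t] for t in totals}

def recommend(scores, threshold=0.7):
    # Recommend topics scoring below the threshold, weakest first.
    return sorted((t for t, s in scores.items() if s < threshold),
                  key=lambda t: scores[t])

responses = [("algebra", True), ("algebra", False),
             ("geometry", True), ("geometry", True),
             ("statistics", False), ("statistics", False)]
scores = score_by_topic(responses)
print(recommend(scores))  # ['statistics', 'algebra']
```

A production system would of course use richer models than a raw percentage, but even this simple rule shows how item-level data can drive "focus on your weaker areas" guidance.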

AI in test delivery

AI in live remote proctoring

AI in online proctoring is becoming increasingly significant in the testing industry, offering both substantial benefits and notable challenges. Conference sessions covered the benefits of integrating AI into live proctoring to enhance the security, scalability, and accessibility of remote tests:

  • Security: AI-powered live proctoring can monitor test takers in real-time, detecting suspicious behaviors and potential cheating with greater accuracy than human proctors alone. This ensures a higher level of test integrity.
  • Scalability and accessibility: AI in live proctoring enables the scaling of remote tests to accommodate large numbers of test takers simultaneously, making it easier for organizations to administer tests to a global audience without the need for physical test centers.
  • Cost efficiency: By reducing the need for a large number of human proctors, AI-driven live proctoring can lower the overall costs associated with testing, providing a more economical solution for both testing organizations and test takers.

Challenges of AI in live proctoring

However, there are challenges to the adoption of AI in test delivery and these were well covered at the Conference, alongside advice about how to address them. This included a session on effective AI governance and strategy and a panel discussion involving PSI Chief Assessment Officer, Isabelle Gonthier, and David Yunger from Vaital.

As we saw during the pandemic, the use of AI in live proctoring raises significant privacy issues, as continuous monitoring and data collection can be perceived as intrusive. Ensuring that systems comply with data protection regulations and ethical standards is critical.

In addition, AI algorithms may inadvertently introduce biases or make errors in detecting misconduct, which can lead to unfair consequences for test takers. Ongoing checks and refinement of these algorithms are necessary to mitigate such risks. Most importantly, human oversight is always essential when decisions are being made about a test taker’s future.

Key considerations for the application of AI shared at the Conference include:

  • User experience: Consider the impact of AI on the test taker experience, including potential stress or discomfort caused by the involvement of AI.
  • Transparency and communication: Ensure clear communication with test takers about how AI is used in proctoring, what data is collected, and how it will be used and protected.
  • Continuous improvement: The industry must commit to ongoing research and development to improve AI capabilities in proctoring, addressing current limitations and enhancing overall effectiveness.

New threats to test security

Several speakers explored how threats to test security have evolved with the advancement of AI and other technologies. When polled, the audience identified the threats they are most concerned about: deepfakes, remote access tools, AI-assisted cheating, undetectable cameras, hidden devices, cyber-attacks, denial-of-service attacks, VPN tunnels, and hacking. Notably, one session showed how, if a secure browser is not used, generative AI can answer multiple-choice questions (MCQs) as a page loads, presenting a significant challenge.

The takeaway is that a holistic approach to test security is needed to face these constantly evolving threats: an approach that covers the whole assessment lifecycle, from test content creation through to post-test data forensics. While it is impossible to eliminate malpractice entirely, our goal should be to minimize its likelihood and impact.

Conference wrap up

The numerous sessions and discussions about AI at the eAA Conference demonstrate the tremendous potential it holds to enhance every stage of the assessment lifecycle, from AIG and AI-enabled item creation to improving test security and meeting the diverse needs of test takers. The insights shared highlight both the transformative possibilities and the inherent challenges.

What is clear is that we are still at a point where human oversight is crucial at every stage. Ensuring the accuracy, fairness, and ethical use of AI in testing requires continuous human involvement and vigilance. As we move forward, the collaboration between AI technologies and human expertise will be key to realizing the full benefits of AI while maintaining the integrity and trustworthiness of our tests.