The impact of ChatGPT in the testing industry

Contributors: Sean Gasperson and Steve Trollinger

Editor’s note: as an illustration of the rapid pace of change in the AI space, we’ve updated our original post to reflect exciting and dramatic advancements in ChatGPT’s capabilities with GPT-4 and Anthropic’s Claude over the past several days. As ever, we keep our eyes on the future in support of better test taker outcomes.

Opportunities and risks with ChatGPT in testing

As in almost every other sector, the testing community is seeing both opportunities and risks in artificial intelligence tools. One example hitting the headlines is ChatGPT, a natural language processing (NLP) model designed to provide conversational responses to plain-language queries. PSI clients in both the licensure and certification sectors have expressed concerns about the implications of tools like ChatGPT for test integrity and security. Indeed, when asked about the risks it presents to secure tests and exams, ChatGPT admits: “It is possible for students to use the chatbot to generate answers to exam questions. While this may seem like a convenient way to get answers, it could lead to cheating and undermine the integrity of the exam.” At PSI, we are not only proactive about addressing the potential misuse of this technology but also eager to use AI ethically and responsibly. AI is not uncharted territory for us: we already use it to improve our services where it benefits our clients and their test takers.

ChatGPT and Search Engines: Are they really that different?

Pose similar queries to a search engine and to a conversational AI tool like ChatGPT, and the responses you receive will generally differ in several ways. Search engine results pages (SERPs) use keywords from the search phrase to display several relevant web pages and structured snippets from prioritized sources, requiring users to browse the options and decide on the best answer. In contrast, ChatGPT attempts to provide a single, definitive response, though not always an accurate one. Biased algorithms and training data in many AI models, and the secrecy surrounding those details, can hinder academic development and critical thinking compared with traditional search and research methods that require users to compare, contrast, and vet sources. Where SERP listings are short summaries of information sourced from websites of varying accuracy, relevance, and authority, ChatGPT uses its natural language processing capabilities to generate a human-like response that, while tailored to your query, may vary dramatically depending on the type and structure of your prompt. Ask the right question in the right form and you may get remarkably accurate results. The opposite, of course, can also be true.

What’s the risk of ChatGPT in testing?

There is significant concern about the potential for cheating on tests and exams using ChatGPT, especially given the AI chatbot’s proven ability to generate text that is difficult to distinguish from human-written content. ChatGPT’s latest advancement, GPT-4, has also displayed impressive accuracy when given test questions to complete, raising understandable concerns about test integrity. At PSI, we have always prioritized test security, and the advent of ChatGPT and other conversational AI tools has made secure test delivery even more critical. Currently, specific concerns are largely focused on ChatGPT’s use as a browser extension or plugin that could, in theory, be simple for test takers to access during a test or exam. These fears, however, are largely mitigated by PSI’s secure browser, which blocks access to unauthorized applications during secure onsite and remote testing.

Will the risk increase with the introduction of a ChatGPT app for desktop?

There is currently no standalone ChatGPT app, and any attempt to access ChatGPT via a browser would be blocked by PSI’s secure browser. Additionally, advancements in the flexibility of PSI’s test delivery system mean that future ChatGPT developments, whether in-app integrations (think Office 365) or standalone desktop apps, can be quickly mitigated through rapid updates. Beyond the secure browser, PSI’s agile approach to test security, which is applied to any new technology that might be exploited for cheating, includes:
  • Linear on the Fly Testing (LOFT) to create unique and equivalent test forms, therefore reducing test content exposure.
  • Human proctoring, both online and in person, to ensure test takers are not involved in malpractice of any kind.
  • Data forensics and web crawling to detect and investigate any suspicious activity that does take place.
No doubt, mass market familiarity and engagement with platforms like ChatGPT are only just beginning. We will conduct further research to fully understand the technology’s implications and devise strategies to combat the risk of cheating with these tools as they develop.

What is the potential of ChatGPT for test content generation?

Although ChatGPT could be used to generate test content, it would be critical for the user first to fully understand the policies and procedures of OpenAI (the company behind ChatGPT) regarding data privacy and security, content ownership, and source material. Additionally, Subject Matter Experts should review and verify any test content generated by ChatGPT. While there is great potential, much must be considered before fully engaging ChatGPT or similar tools to create sensitive test content that meets our high standards of security and psychometric rigour.

How might ChatGPT impact the practice test experience?

There are two primary applications for a platform like ChatGPT in the area of practice tests – how the test taker could use it and how PSI might.
  • To state the obvious, there is ultimately no benefit to a test taker in cheating on a practice test. While ChatGPT or similar tools could, of course, be used to help answer questions correctly, there are two fundamental considerations here. First, cheating during a practice test is counterproductive to passing the actual test, and second, test takers already have access to search engines – a far more relevant and speedier avenue for sourcing correct multiple-choice answers in most cases.
  • From PSI’s point of view, ChatGPT offers significant opportunities for advancing the pre-testing preparation process. Coupling subject matter expertise with GPT-4 content creation capabilities, PSI is innovating a comprehensive suite of test prep and practice test products designed to drive more positive test taker outcomes. By offering a real-world testing experience and targeted study opportunities while ensuring that no live test questions appear outside of the actual test, PSI is developing a scalable suite of pre-knowledge products that simultaneously supports test takers and protects high-stakes test integrity.

What about test takers using ChatGPT to prepare for a test?

ChatGPT could potentially be used to reproduce or replicate test items in a format similar or identical to actual test items. Although potentially advantageous for the test taker, this poses a risk to test reliability, as not all test takers have equal access to, or would benefit equally from, this resource. To assess the risk of specific test content being compromised by a tool like ChatGPT, it’s crucial to understand how AI tools source their knowledge. The training data sets fed to large language models (LLMs) like ChatGPT, Anthropic’s Claude, and Google’s Bard, while broad in scope, still represent a small fraction of the internet. Running these models is incredibly expensive, so reputable and popular sources take precedence. If test items were leaked on the internet, they would most likely appear on low-repute websites not included in the training data sets of commercially available LLMs. However, many new products, like GPT-4-powered Bing, pull context from real-time web search results to enhance the AI’s response. The potential risk of items being compromised is therefore best described as a general copyright infringement concern on the internet, not a new concern created by chatbots. This scenario highlights the importance of PSI’s comprehensive Data Forensics program and web crawling service in identifying and mitigating risks associated with potentially exposed questions.

PSI is helping to lead the way

While applications for natural language AI platforms like ChatGPT in the prep, practice, and test development processes are still largely academic, PSI and its partners are leading the way in developing protocols designed to unlock their potential responsibly. The future is bright as we continue to explore opportunities to benefit our clients and support more successful test taker outcomes.