How Does Web Crawling Protect the Integrity of a Test?

Nicole Tucker


Do you use Google to ask some rather random questions? You’re not alone. A search for the most frequently asked “why” questions on Google turns up strong contenders like “why is there a leap year?” and “why is the sky blue?” Much like the average person, I run multiple internet searches every day. And it’s such a routine act that most of us never stop to consider how it works.

It all comes down to web crawlers – sometimes referred to as spiders or bots – computer programs that systematically browse the internet in search of specific content. And importantly for anyone involved in delivering high-stakes tests and exams, web crawlers aren’t just useful for search engines; they also play an important role in online test security.

Protect test content

So how do web crawlers work? For the purposes of test security, web crawling is the process of identifying potentially compromised test content posted to the web. The content can then be matched to actual test content for confirmation, and if it is found to be exposed, steps can be taken to get the content removed.

The use of web crawlers to search for proprietary test content is a highly proactive approach to test security. With the accelerating move to online and remote testing, this has become an important precautionary step for test content protection that complements other elements of a test security program.

How it works

PSI’s web crawling service starts by identifying key search terms within your active item bank. A watch-list of websites is compiled that includes known actors who have posted or attempted to sell leaked test content for other high-stakes exams as well as sites that advertise “real” test questions. This list of suspicious sites is continually updated as new sites are identified by means such as social media listening and active searches. A web crawler is programmed to browse these sites to search for the specific terms and any references to proprietary test content, including content hidden in PDF documents.
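The scanning step described above can be sketched in a few lines of Python. This is a minimal illustration, not PSI's actual implementation: the key terms, watch-list URLs, and page text are all hypothetical, and a real crawler would fetch pages over HTTP and extract text from HTML and PDF documents before scanning.

```python
import re

# Hypothetical key terms identified from an active item bank (assumption:
# distinctive phrases from test items, not real exam content)
KEY_TERMS = ["which of the following best describes", "sample stem 47-B"]

def scan_text_for_terms(text, terms):
    """Return the search terms that appear in the page text,
    ignoring case and collapsing whitespace and line breaks."""
    normalized = re.sub(r"\s+", " ", text).lower()
    return [t for t in terms if t.lower() in normalized]

def crawl_watchlist(pages, terms=KEY_TERMS):
    """Scan a mapping of URL -> extracted page text and report which
    watch-list pages contain suspected proprietary content. Here the
    fetching and PDF extraction are stubbed out; `pages` stands in for
    the text a real crawler would download."""
    hits = {}
    for url, text in pages.items():
        found = scan_text_for_terms(text, terms)
        if found:
            hits[url] = found
    return hits

# Stubbed page content for two hypothetical watch-list sites
pages = {
    "https://example.com/braindump": "Real exam! Which of the\nfollowing best describes ...",
    "https://example.com/forum": "General study tips, nothing leaked here.",
}
print(crawl_watchlist(pages))
```

In practice, the normalization step matters: leaked content is often reformatted, pasted across line breaks, or embedded in PDFs, so matching raw strings without collapsing whitespace would miss many exposures.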

In addition to finding test content on websites, the web crawler also discovers proprietary content on discussion boards and social media. This includes stolen items offered for sale and posts by past test takers discussing test content. Suspected compromised content is purchased if necessary and then matched against actual test content.
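The matching step, comparing suspected content against the real item bank, can be sketched with a simple similarity measure. This is an illustrative assumption, not the source's method: the item bank, suspect text, and 0.8 threshold are hypothetical, and Python's standard-library `difflib.SequenceMatcher` stands in for whatever matching technique is actually used.

```python
from difflib import SequenceMatcher

def match_to_item_bank(suspect_text, item_bank, threshold=0.8):
    """Compare suspected leaked text against each item in the bank and
    return (item_id, similarity) pairs at or above the threshold,
    best match first. Normalization and threshold are illustrative."""
    suspect = " ".join(suspect_text.lower().split())
    matches = []
    for item_id, item_text in item_bank.items():
        item = " ".join(item_text.lower().split())
        score = SequenceMatcher(None, suspect, item).ratio()
        if score >= threshold:
            matches.append((item_id, round(score, 2)))
    return sorted(matches, key=lambda m: m[1], reverse=True)

# Hypothetical item bank and a suspect posting with minor differences
item_bank = {
    "ITEM-001": "What is the primary function of a load balancer?",
    "ITEM-002": "Which protocol operates at the transport layer?",
}
suspect = "what is the primary function of a load balancer"
print(match_to_item_bank(suspect, item_bank))
```

A fuzzy comparison rather than exact string equality reflects the reality that leaked items are often retyped from memory or lightly paraphrased, so near matches are exactly what an investigation needs surfaced for human confirmation.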


Next steps

If suspicious content is found and confirmed as a match for actual test content, the next steps are similar to any suspected breach in test security. The testing organization is notified immediately, and a detailed review and investigation then takes place.

At PSI, we work closely with our clients to gather as much information as possible during the investigation. For example, a client may already have suspicions relevant to the investigation about a particular educator or specific test. Depending on this intelligence and our own findings from data forensics and web crawling, further investigations can be conducted at a test taker, test, or test center level.

If investigations prove that test content has been compromised, we work with a testing organization to develop a response plan. This might involve:

  • Steps to get content removed from the internet, including legal intervention if necessary
  • Deactivation of a test form or removal of specific items from the bank that have been compromised
  • Action against test takers, educators, or other individuals involved in leaking test content

Whole life cycle

An effective test security program will involve steps to protect proprietary content across the whole test life cycle. This begins with secure content storage and transfer during test development and continues through online proctoring and the use of alternate or unique forms during test delivery. And finally, detection measures should be in place in case misconduct does occur. This is where data forensics and operational tools such as web crawling are invaluable.