What is the Turing Test?

The Turing Test is a thought experiment proposed by Alan Turing in 1950 to decide whether a machine can think. A human interrogator chats over a text-only link with both a person and a computer; if the interrogator cannot reliably tell which is which, the machine is said to have passed.

Who invented the Turing Test?

British mathematician and computer scientist Alan Turing. He gave a 1947 lecture to the London Mathematical Society on machine intelligence, expanded the ideas in his 1948 NPL report Intelligent Machinery, and formally introduced the test in Computing Machinery and Intelligence in the October 1950 issue of the journal Mind.

What is the imitation game?

The original imitation game in Turing’s 1950 paper had three players: a man (A), a woman (B), and an interrogator (C) of either sex. The interrogator, communicating only by text, has to decide which of A and B is the woman. Turing then asked: what happens if the role of A is taken by a machine? Will the machine fool the interrogator as well as a man would?

Has any AI passed the Turing Test?

Earlier claims like the 2014 Eugene Goostman chatbot were heavily disputed. In 2024, a UC San Diego study found GPT-4 fooled interrogators in 49.7% of two-party games. In a 2025 follow-up using a stricter three-party setup, GPT-4.5 with a persona prompt was judged human 73% of the time, more often than the actual humans — the first empirical evidence that any AI passes a standard Turing Test.

Is the Turing Test still relevant?

It remains the most quoted question in AI, but most researchers no longer treat it as the gold standard. Modern systems are evaluated against capability-specific benchmarks like MMLU, HumanEval, ARC, and HELM, which test reasoning, coding, and broad knowledge rather than mere ability to impersonate.

Turing Test: Definition, History, Explanation, Examples And Games


The Turing Test, proposed by Alan Turing in his 1950 paper Computing Machinery and Intelligence, is a thought experiment for deciding whether a machine can think. A human interrogator chats over a text link with one human and one computer and has to tell which is which; if the interrogator cannot reliably do better than chance, the machine is said to have passed. In a 2025 study, GPT-4.5 was reported to pass a standard three-party version of the test, which the authors described as the first empirical evidence of any AI system doing so.

Today’s computers are a testament to what a multipurpose machine can be. Yet however far computing has come, one milestone that Computer Science and Artificial Intelligence researchers would still like to reach is a machine with human-level intelligence. This is far easier said than done, but a test for exactly this qualification was proposed back in 1950: a test to assess machine intelligence. The Turing Test was long regarded as the benchmark for judging machines, and its rules still frame much of the debate, even though most researchers no longer treat it as the gold standard. Before we get into what the Turing Test is, let’s first take a look at the history of this legendary test.

History Of Turing Test

The term ‘machine intelligence’ was in loose use years before the field of Artificial Intelligence was formally christened at Dartmouth in 1956. It was a common topic among the members of the Ratio Club, a London-based gathering of cybernetics, mathematics, and neuroscience researchers active from 1949 onward, and Alan Turing was part of this group. Turing himself had been mulling over machine intelligence with his wartime colleagues at Bletchley Park, and he gave what is probably the first public scientific lecture on the topic to the London Mathematical Society in February 1947. He expanded those ideas into the report Intelligent Machinery for the National Physical Laboratory in 1948, and then into the paper Computing Machinery and Intelligence, published in the journal Mind in October 1950, where the Turing Test was formally introduced. Several variations of the test are in use today to assess the intelligence of a machine.

Imitation Game And Standard Interpretation

It is widely accepted that there are three versions of the Turing Test. The first two come from “Computing Machinery and Intelligence” itself, as Alan Turing set them out. The third, which remains widely debated, is known as the “Standard Interpretation.” The debate is over whether the Standard Interpretation is a legitimate variation of the test or nothing more than a misreading of Turing’s paper. Each of these variations has its own strengths and weaknesses.

Figure: Turing test diagram (Photo Credit: Juan Alberto Sánchez Margallo/Wikimedia Commons), https://en.wikipedia.org/wiki/Turing_test#/media/File:Turing_test_diagram.png

Turing’s original paper framed the test around a simple party game involving three players. Let’s name these players A, B, and C. Player A is male, player B is female, and player C, who plays the role of the interrogator, can be of either sex. Players A and B are kept out of the interrogator’s sight, and the interrogator does not know which is which. By asking both of them a series of questions, player C tries to work out which is the man and which is the woman. Player A’s role is to trick the interrogator into deciding wrongly, while player B tries to help the interrogator decide correctly. Turing then asks what would happen if a machine took the place of player A in this game: will the interrogator decide wrongly as often as when the game is played between a man and a woman? He offered this question as a replacement for the vaguer “Can machines think?” A second version appears later in the same 1950 paper. Here too the role of player A is performed by a computer, but the role of player B is played by a man rather than a woman.

Now, the point of the Turing Test must be clearly understood. It is not about fooling the interrogator for its own sake, but about measuring how closely a computer can behave like a human. In the standard interpretation, player A is a computer and player B is a person of either sex. The interrogator’s task is not to determine which is male and which is female, but which is the computer and which is the human. The machine passes if the interrogator cannot reliably tell the two apart. How long the questioning should last is left open; the standard interpretation generally assumes only that the duration be reasonable.
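The scoring behind the standard interpretation can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Round` record, the labels "X" and "Y" for the two hidden players, and the function name are all hypothetical, and the example rounds are made up rather than taken from any study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Round:
    """One three-party game with two hidden players labelled 'X' and 'Y'."""
    machine_label: str   # label the computer was hidden behind (hypothetical field)
    guessed_human: str   # label the interrogator picked as the human


def machine_judged_human_rate(rounds: list[Round]) -> float:
    """Fraction of rounds in which the interrogator picked the computer as the human."""
    if not rounds:
        raise ValueError("need at least one round")
    fooled = sum(1 for r in rounds if r.guessed_human == r.machine_label)
    return fooled / len(rounds)


# Made-up illustrative rounds, not data from any experiment.
example_rounds = [
    Round(machine_label="X", guessed_human="X"),  # interrogator fooled
    Round(machine_label="X", guessed_human="Y"),  # interrogator correct
    Round(machine_label="Y", guessed_human="Y"),  # interrogator fooled
    Round(machine_label="Y", guessed_human="X"),  # interrogator correct
]
print(machine_judged_human_rate(example_rounds))  # 0.5
```

A rate near 0.5 is what pure guessing would produce, which is why "cannot reliably tell the two apart" is usually read as "does no better than chance."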

For decades, no machine came close to a clean pass. That changed with modern large language models. In a 2023–2024 study by Cameron Jones and Benjamin Bergen at UC San Diego, the best GPT-4 prompt fooled human interrogators in 49.7% of two-party games, better than ELIZA (22%) and GPT-3.5 (20%) but still below the human baseline. The same group’s 2025 follow-up, using a stricter three-party setup, found that GPT-4.5 with a "persona" prompt was judged to be the human 73% of the time, significantly more often than the actual humans, while Meta’s LLaMa-3.1 hit a roughly even 56%. The authors described this as the first empirical evidence that any artificial system passes a standard three-party Turing Test.
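A percentage like 73% only means something relative to chance, and how convincing it is depends on how many games were played. The sketch below computes an exact binomial tail probability for that comparison. The sample size of 100 games is purely hypothetical, since only percentages are given here, and the function name is an illustrative choice rather than anything from the studies.

```python
from math import comb


def binomial_tail(successes: int, trials: int, p: float = 0.5) -> float:
    """Exact probability of at least `successes` hits in `trials` independent
    games when each game is a hit with probability `p` (0.5 = pure guessing)."""
    if not 0 <= successes <= trials:
        raise ValueError("successes must be between 0 and trials")
    return sum(
        comb(trials, k) * p**k * (1 - p) ** (trials - k)
        for k in range(successes, trials + 1)
    )


# Hypothetical sample size: the real number of games is not stated in this article.
HYPOTHETICAL_GAMES = 100

for judged_human in (73, 50):
    tail = binomial_tail(judged_human, HYPOTHETICAL_GAMES)
    print(f"at least {judged_human}/{HYPOTHETICAL_GAMES} by guessing alone: {tail:.6f}")
```

With these assumed numbers, a 73-out-of-100 result would be extremely unlikely under pure guessing, while 50 out of 100 would be entirely unremarkable. A real analysis would use the actual game counts and the statistical tests chosen by the study authors.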

Even with that result on the books, the Turing Test has its known weaknesses — it measures impersonation, not understanding, and modern LLMs can be confidently wrong while still sounding convincingly human. Researchers now lean on benchmarks like MMLU, HumanEval, ARC, and HELM to probe specific capabilities. Still, after 75 years, Turing’s "Can machines think?" remains the most quoted question in artificial intelligence, and a passable answer to it now comes out of a chat box.
