# Explainer | When ‘cheating’ in chess becomes a matter of statistics in the courtroom

**The story so far:**

The chess world was thrown into turmoil in late 2022 when Magnus Carlsen, the current world champion, accused Hans Niemann, a 19-year-old US chess grandmaster, of cheating with a chess-playing artificial intelligence (AI) system. Niemann beat Carlsen, prompting Carlsen’s accusation; Niemann insisted that he had beaten Carlsen fairly even though he later admitted that he cheated twice in online chess games at the ages of 12 and 16.

A month later, a 72-page investigative report produced by Chess.com claimed that Niemann “probably cheated” more than a hundred times while playing online chess. But the report also said, “There is no direct evidence to prove that Hans cheated in the September 4, 2022, game with Magnus.”

Cheating in chess has become a huge problem, especially in the online era. Of the more than 500,000 accounts Chess.com has terminated for cheating, more than 500 belong to titled players (titling is a sign of skill). By the beginning of 2024, the site expects to close more than one million accounts.

#### How do you know if a player has cheated?

First, researchers build a statistical model using a database of millions of completed chess games. They then use the fitted model to estimate the probability that a human player’s move coincides with the move a chess engine would make.

It’s a bit like DNA crime scene analysis for every chess player in the world. Chess engines like Leela Chess Zero and Stockfish are not only better players than their human counterparts (on average) but also play differently. Stockfish has an Elo rating of over 3,500, compared to Carlsen’s 2014 Elo rating of 2,882, the highest ever achieved by a human. Additionally, machines’ playing styles can seem to be from another planet, because engines arrive at their styles very differently from the way humans do. Thus the probability of cheating is said to increase as the correlation between a player’s moves and the engines’ preferred moves increases.
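
The idea of correlation with an engine can be made concrete with a small sketch. It assumes the simplest possible measure, the share of a player’s moves that coincide with a single engine’s top choice; the function name and the move lists are hypothetical, and real screening tools are considerably more elaborate.

```python
def engine_match_rate(player_moves, engine_top_moves):
    """Return the fraction of moves that equal the engine's first choice.

    Both arguments are equal-length lists of moves in the same notation.
    This is a simplified illustration, not any site's actual method.
    """
    if len(player_moves) != len(engine_top_moves):
        raise ValueError("move lists must have the same length")
    if not player_moves:
        raise ValueError("need at least one move")
    matches = sum(p == e for p, e in zip(player_moves, engine_top_moves))
    return matches / len(player_moves)


# Hypothetical data: the player deviates from the engine once in six moves.
player = ["e4", "Nf3", "Bb5", "Ba4", "O-O", "Re1"]
engine = ["e4", "Nf3", "Bb5", "Ba4", "d3", "Re1"]

print(f"Engine match rate: {engine_match_rate(player, engine):.0%}")
```

A high match rate on its own proves nothing: forced or well-known opening moves will match the engine for any strong player, which is why the models compare a player against what is expected at that skill level.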

By feeding records of Niemann’s games into chess engines, some experts found that Niemann played long series of AI-recommended moves in tournament games and that his tactics often resembled a computer’s. But other experts insist that the moves humans play over the board in real games can resemble those of an AI, since human players’ training, preparation and skills are already shaped by these machines.

The Carlsen-Niemann dispute may finally be settled in court: Niemann is suing Carlsen, Chess.com and chess prodigy Hikaru Nakamura, who also accused Niemann of cheating in online games, for $100 million for slander. This isn’t the first time statistics have mattered in legal proceedings. There are many instances in the US, the UK and other countries where statistical theories, mainly those related to the calculation of probabilities, have been applied in both good and bad ways.

#### How reliable are the statistics? What is the case of Sally Clark?

The use of statistics in court requires great care and expertise. An infamous criminal case from the UK involving a woman named Sally Clark is a prime example of how the use of flawed statistics can result in an injustice.

Following the untimely deaths of two of her infant sons from sudden infant death syndrome (SIDS) on separate occasions, Clark was charged with murder. A paediatrician said the odds of a random SIDS death in a family where the mother is older than 26, affluent and a non-smoker were 1 in 8,543. So the probability of two such deaths, the expert continued, was (1/8,543)^2, or about 1 in 73 million. Clark was promptly convicted in 1999.

But the Royal Statistical Society disagreed and said there was “no statistical basis” for the paediatrician’s figure. In fact, the paediatrician committed the ‘prosecutor’s fallacy’ by mistakenly treating the two deaths as independent. When Ray Hill, a professor of mathematics at the University of Salford, analysed additional data in 2002, he concluded that the chance of a second child dying of SIDS, given that the first child had died of SIDS, could be as high as 1 in 60! Clark was thus released from prison in 2003.
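
The gap between the two calculations is easy to check. The sketch below simply multiplies the figures quoted above; combining the 1-in-8,543 figure with Hill’s upper-end 1-in-60 estimate is an assumption made here for illustration, not a calculation presented in court.

```python
from fractions import Fraction

# Figures quoted in the text above.
p_first = Fraction(1, 8543)             # one SIDS death in such a family
p_second_given_first = Fraction(1, 60)  # upper-end estimate for a second death

# The court figure: both deaths treated as independent events.
naive_joint = p_first * p_first

# Illustrative alternative: the second death depends on the first.
dependent_joint = p_first * p_second_given_first


def one_in(probability):
    """Express a probability as the N in '1 in N'."""
    return round(1 / probability)


print(f"Assuming independence: 1 in {one_in(naive_joint):,}")
print(f"Allowing dependence:   1 in {one_in(dependent_joint):,}")
```

Under these assumptions, two SIDS deaths in one family go from about 1 in 73 million to about 1 in 513,000, a difference of more than a hundredfold.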

In a 2011 paper, Norman Fenton, a professor of risk information management at Queen Mary, London, wrote, “Most of the common fallacies of statistical reasoning can be avoided by applying Bayes’ theorem, a rule allowing evidence to be weighed.”

Let’s say a crime scene sample yields a partial DNA profile that matches the corresponding parts of Swami’s profile, and that the probability of a random person matching is 2 in 1,000. The prosecutor then claims there is a 99.8% probability that Swami committed the crime, because only 0.2% of people would have such a DNA match. Consider, however, that 10,000 people could have been at the scene of the crime. About 20 of them (0.2% of 10,000) would be expected to match, so Swami is only one of about 20 possible sources. Instead of 99.8%, then, the probability that Swami committed the crime is only about 5%.

(Note that this method assumes that each of the 10,000 potential sources has an equal probability of being the source.)
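
The arithmetic can be reproduced in a few lines. The sketch below uses the numbers from the example and, as an added comparison that is not part of the original example, also applies Bayes’ theorem directly under the same equal-prior assumption; the variable names are illustrative.

```python
from fractions import Fraction

match_prob = Fraction(2, 1000)  # chance that a random person matches
population = 10_000             # people who could have been at the scene

# Shortcut used in the text: about 20 people are expected to match,
# and Swami is just one of them.
expected_matches = population * match_prob
posterior_simple = 1 / expected_matches

# Bayes' theorem with an equal prior for every potential source
# (an assumption, as the note above says). The true source always matches.
prior = Fraction(1, population)
posterior_bayes = prior / (prior + (1 - prior) * match_prob)

print(f"Prosecutor's claim:      {float(1 - match_prob):.1%}")
print(f"Shortcut (1 in 20):      {float(posterior_simple):.1%}")
print(f"Bayes with equal priors: {float(posterior_bayes):.1%}")
```

The exact Bayes figure comes out slightly below 5% because Swami is counted in addition to the roughly 20 chance matches expected among the other 9,999 people; either way, it is nowhere near 99.8%.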

In a lecture in July 2021, Justice Lady Rose of the UK Supreme Court said, “There are some areas where people are particularly wrong about using statistics to make rational decisions. An important one is in assessing risk and probability.”

Carlsen expressed the belief that cheating is “an existential threat” in chess. It may be tempting, against this backdrop, to see the future of this 1,500-year-old game in the hands of the Carlsen-Niemann case, particularly in the proper use of statistics and their interpretation. But there will be several ways to calculate and interpret them, just as the case itself can swing either way.

For example, according to the analysis of an anonymous Chessbase user called gambit-man, Niemann has an unusually high number of games with 100% engine correlation. Niemann’s defense may be that his play is less computer-like than Carlsen’s in the past.

There is a metric called centipawn loss: it measures, in hundredths of a pawn, how much worse a player’s moves are than the machine’s top choice. A lower value indicates play closer to the engine’s selections. There is also a setting called depth: the number of moves ahead that a chess engine searches when evaluating a position. Measured against the open-source chess engine Stockfish (v. 15) at depth 18, Niemann’s and Carlsen’s centipawn losses are 25.6 and 16.9, respectively.
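
As a rough illustration of how such an average might be computed, the sketch below assumes that an engine has already scored, in centipawns from the mover’s point of view, both its own top choice and the move actually played. The function name and the evaluations are hypothetical; they are not taken from Niemann’s or Carlsen’s games.

```python
def average_centipawn_loss(best_evals, played_evals):
    """Average gap between the engine's best move and the move played.

    Each list holds one evaluation per move, in centipawns, from the
    mover's point of view. A played move is never credited with a
    negative loss, so each gap is floored at zero.
    """
    if len(best_evals) != len(played_evals):
        raise ValueError("evaluation lists must have the same length")
    if not best_evals:
        raise ValueError("need at least one move")
    losses = [max(0, best - played)
              for best, played in zip(best_evals, played_evals)]
    return sum(losses) / len(losses)


# Hypothetical evaluations for four moves by one player.
best = [35, 40, 10, 120]    # engine's top choice on each move
played = [35, 15, 10, 60]   # the move the player actually made

print(average_centipawn_loss(best, played))  # (0 + 25 + 0 + 60) / 4
```

The result depends on the engine, its version and the search depth, which is why figures such as those above are quoted together with those settings.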

#### So did Niemann win the argument or did Carlsen?

Hard to say. We will probably never know if Niemann actually cheated because statistical tests only suggest whether cheating might have occurred; they do not give an absolute verdict. Experts will examine every aspect of these analyses, including their statistical rationale, appropriateness and interpretation, and meet them with similarly valid arguments and counter-arguments.

The only thing we can be reasonably certain of is that, whoever wins the case, the honest game of chess need not hang in the balance, at least not for the reasons that Carlsen worries about.

*Atanu Biswas is professor of statistics, Indian Statistical Institute, Kolkata.*