Reviewing the statistical evidence
This is most welcome. I had not realized that on top of the massive confirmation bias of counting any of multiple cloud engines as a hit at any time, a big feedback loop was operating.
There is one vital factor you're still missing about average centipawn loss: it grows in proportion to how far one side is ahead. Simply put, if you're a queen ahead, a one-pawn difference is minimal. One cannot avert this by cutting off at a 2.00 or 4.00 advantage, say, because the proportional effect goes clear down to zero. And as I demonstrated in my article https://rjlipton.wpcomstaging.com/2016/11/30/when-data-serves-turkey/, not correcting for this gives you an unrealistic y-intercept for the rating of perfect play.
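The proportionality effect described above can be sketched as a normalization step. This is a toy model only: the scaling function and the 100-centipawn scale constant are my own illustrative assumptions, not the correction used in the linked article.

```python
# Illustrative only: a toy correction for the fact that raw centipawn
# loss inflates when one side is already far ahead.  The decay model
# and the 100 cp scale constant are assumptions, not the method from
# the linked article.

def scaled_loss(raw_loss_cp, eval_before_cp, scale_cp=100):
    """Shrink raw centipawn loss in proportion to the advantage
    already on the board: a 100 cp slip matters less at +900 than at 0."""
    # Weight decays as |evaluation| grows; at eval 0 the loss is
    # unchanged, at |eval| = scale_cp it is halved, and so on.
    weight = scale_cp / (scale_cp + abs(eval_before_cp))
    return raw_loss_cp * weight

# The same 100 cp raw loss, judged at different evaluations:
print(scaled_loss(100, 0))    # 100.0 -> full weight in a level position
print(scaled_loss(100, 900))  # 10.0  -> heavily discounted a queen up
```

The point of any such correction is that uncorrected ACPL punishes players who convert winning positions pragmatically rather than with engine-exact moves.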
Great analysis! Until someone comes forward with substantial evidence of cheating by Niemann, all I see is some of the most powerful entities in chess hammering on a 19-year-old. It's a witch hunt.
And THIS is why I subscribe.
Nice work, Nate. The most useful analysis I've seen.
Very clear analysis and exposition of both the statistical issues and the larger context. Sharing immediately with some statistics-savvy non-chess people who have become interested in this whole scandal.
Thank you for the thoughtful analysis. I have waffled so much on this, and I'm probably now in the best position: I have no idea and shouldn't speculate. But I do trust the top players' analyses and intuition, though they're not free from bias, as you mention. I thought Fabi gave a pretty balanced take on the C-Squared podcast, though he admits that he thinks integrity in professional chess is a huge issue, sharing an experience where he knows for a fact that someone cheated and nothing came of it.

The only thing I can compare it to is my own intuition after teaching intro statistics for 6 years. When a student hands in a report talking about "heteroscedasticity" and "leptokurtosis," I "know" that they received outside assistance (and that they're oblivious to how out-of-place those terms are in an intro class), though I don't know how or from whom. I give them a chance to explain it, and when they can't, I have no qualms giving them a zero and saying they failed to convince me it was their original work. Some confess and apologize, and some double down, never admit it, and when I enforce the policy, threaten me, complain to my chair and dean, and proceed to destroy my reputation online. But I still sleep at night because I also take integrity in my profession very seriously, and there is simply no doubt in my mind in those cases (if there were, I would err on the side of a Type II error--sorry, couldn't resist).

When the top GMs think something strange is going on, including the time taken for each move, I listen, but that's all it is for now, and no one has said "there is no doubt" that Niemann cheated. What is exceedingly clear is that all of this could have been handled better. Hopefully it gets cleared up soon.
Without convincing proof, Carlsen implicated Niemann as a cheat at the Sinquefield Cup; Carlsen withdrew, Niemann was allowed to stay in the tourney and finished in 7th place of the 9 remaining players. Niemann’s prior online cheating caused Carlsen to be suspicious, and that prior cheating complicates the resolution of the matter. But Carlsen was wrong to publicly imply Niemann cheated OTB; he should have privately reported his concerns to the SC TD and to FIDE, let them handle the problem, and stayed in the SC. Niemann, if he could prove $$$ damages, could win a defamation lawsuit against Carlsen. But if Niemann keeps getting tournament invites, where is the economic harm? Perhaps Niemann could win $$$ damages from Carlsen using a legal theory of intentional infliction of mental distress.
Just saw this from highly respected chess author and three-time Scottish chess champion GM Rowson. He writes:
Chessdotcom's report is marketing.
1. It's 20 pages, not 72.
2. No evidence of OTB cheating at all, and their attempt to whip it up is embarrassing.
3. It challenges Hans's version of prior cheating, but mostly w' online blitz games in 2020, and reasonable doubt.#chessdrama https://t.co/Ehc3vuM0Iu
— Jonathan Rowson (@Jonathan_Rowson) October 5, 2022
Subscribed to thank you for this.
The analysis by Yosha Iglesias was frustrating to watch because no one, myself included, understood what the figures were being compared to; no one explained what Let’s Check actually measures.
In truth, any statistical analysis would have to be done through experimentation with several control groups:
- one group of players rated 2600-2800 we know are not cheating;
- one group of players rated 2600-2800 with at least one cheater receiving all his moves from an engine;
- one group of players rated 2600-2800 with one cheater receiving some moves from an engine;
- one group of players rated 2600-2800 with one cheater rated 2000-2200 receiving some moves from an engine.

In other words, how do we know any comparison makes any sense? What if Ian is cheating as well? What if everyone except Hans is cheating? Any attempt to solve this statistically without clear data sets allowing for clear comparisons is destined to fail…
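To make the control-group idea concrete, here is a minimal synthetic simulation of the first two groups. Every number is invented for illustration; a real experiment would need per-move loss distributions estimated from actual 2600-2800 games.

```python
# Toy simulation of two control groups: clean strong players versus a
# player relaying every move from an engine.  The exponential loss
# models (mean 15 cp clean, mean 2 cp engine-assisted) are invented
# placeholders, not measured distributions.
import random

random.seed(0)

def acpl(losses):
    """Average centipawn loss over one game's moves."""
    return sum(losses) / len(losses)

def clean_player_losses(n_moves):
    # Hypothetical per-move loss for an honest 2600-2800 player.
    return [random.expovariate(1 / 15) for _ in range(n_moves)]

def full_engine_losses(n_moves):
    # A player relaying every engine move loses essentially nothing.
    return [random.expovariate(1 / 2) for _ in range(n_moves)]

clean_group = [acpl(clean_player_losses(40)) for _ in range(50)]
cheater = acpl(full_engine_losses(40))

print(f"clean group mean ACPL: {sum(clean_group) / len(clean_group):.1f}")
print(f"full-engine ACPL:      {cheater:.1f}")
```

The full-engine case separates cleanly in a simulation like this; the commenter's point is that the partial-assistance groups (some moves, or a weaker cheater) would overlap the clean distribution, which is exactly why real control data would be needed.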
Did you see the analysis by MilkyChess - https://www.youtube.com/watch?v=YnnJ0Da4Rp0 and https://www.youtube.com/watch?v=Q5nEFaRdwZY ? These are, IMO, the most significant analyses so far on this topic.
> "The thing that struck me when looking over Niemann’s games is his aggression. Most of the top grandmasters like to avoid risks when possible. Niemann seems more willing to take the game into murky territory, and especially to sacrifice material. Maybe he feels confident doing that because he has engine assistance. Or maybe he’s just an unusually aggressive, intuitive player."
I think there's a third possibility that I don't think I've seen anyone really talk about: Hans might have known that there were rumours about him cheating in the past (at least on Chess.com), and might well have exploited his over-the-board opponents' suspicions and "tilt" to his (otherwise clean) advantage.
This is evidenced by his intentionally vague statements ("I don't even know why I even looked at that line", "it was a miracle", etc.) after his Round 3 game against Carlsen regarding opening prep, which he later (Round 5, if I'm not mistaken) walked back by saying "it wasn't a miracle", despite earlier calling it a miracle *multiple* times in the Round 3 interview. A lot of people seem to have somehow forgotten his contradictions in those interviews just days apart: he has to have been less than truthful in at least *one* of them; both cannot be fully honest at the same time. All of this doubt that he helped plant in his opponents' minds would have worked to his advantage. Cheating wouldn't have been necessary if, some of the time, he merely needed to put his opponent off enough to make them /think/ he cheats (while maintaining plausible deniability).
But now, especially after his post-Firouzja game interview and all the attention, every top player will recognise his style as well as his "psychological" tricks, and I think this alone will make it really tough for him to remain around 2700 in the future.
Niemann did not cheat against Carlsen. But I do wonder if I could have believed the nonsense before becoming more educated. I spent ten years with a mere high school diploma--then 9 years of higher education (to complete just 7 years of schooling). The fantastic BS I believed during those 10 years with a mere high school diploma is embarrassing.
But when you can't rely on yourself to research and understand issues, you might be tempted to rely on other popular characters--like Magnus Carlsen (or Donald Trump, Tom Cruise, Kanye West, flat-earthers, etc.). It can be difficult and isolating to be disloyal to your idols in favor of being loyal to the truth. Indeed, I wish I could return to idolizing Magnus...
However, I'm not saying I know the truth, per se. But I believe there is far too much uncertainty, obfuscation, and gaslighting to assert that Niemann cheated. Instead, I find sufficient evidence to believe Niemann did not cheat.
Another misleading thing about centipawn accuracy is the level of subtlety of the move. If I respond to ...Rxd1+ with Raxd1, I will get a perfect accuracy score on that move, and yet every 900-level player would find it. Further, move subtlety is very complicated: a quiet position might require a nuanced long-term planning move that is, on deep analysis, superior to many other quiet moves, whereas a dangerous position might require a precise refutation. And yet some quiet positions have many near-top moves that are guessable by a club player.
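One crude way to act on this objection is to drop "trivial" positions (forced recaptures, only moves) before computing an accuracy statistic. This is a sketch under my own assumptions: the input format and the 200 cp "obviousness" gap are invented, not anyone's published method.

```python
# Sketch: exclude trivial positions (e.g. forced recaptures) before
# computing ACPL.  Each position is given as a list of engine
# evaluations for the legal moves, best first; the 200 cp gap
# threshold is an invented placeholder.

def is_trivial(candidate_evals_cp, gap_cp=200):
    """A position where the best move towers over the alternatives
    (or is the only legal move) tells us nothing about the player."""
    if len(candidate_evals_cp) < 2:
        return True  # only move
    return candidate_evals_cp[0] - candidate_evals_cp[1] >= gap_cp

def filtered_acpl(positions):
    """ACPL over non-trivial positions only.
    Each position: (candidate evals best-first, eval of move played)."""
    losses = [evals[0] - played
              for evals, played in positions
              if not is_trivial(evals)]
    return sum(losses) / len(losses) if losses else 0.0

positions = [
    ([500, -400], 500),        # forced recapture: excluded
    ([30, 25, 10, -80], 25),   # quiet position: counts, loss 5
    ([120, 110, 40], 110),     # counts, loss 10
]
print(filtered_acpl(positions))  # 7.5
```

A single eval-gap threshold still misses the subtler cases the commenter raises, such as quiet positions with several near-equal good moves, which is precisely why move subtlety is hard to model.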
Thanks for the balanced article.
Nice analysis and post!
Is it fair to say that the conclusion is that the data do not suggest Hans has been cheating in ALL games?
In other words, what would happen if he only cheated in some games? He is certainly not a beginner, and maybe cheating is only an extra boost used from time to time. In that case, analysing only his best games might reveal patterns that are hidden within the aggregate average performance, and we could try to answer a different question: did Hans cheat FROM TIME TO TIME?
One thing I have understood in all this murky story is that it is very difficult to catch a smart high-level chess player who is not a super GM but can use help to achieve such a level. If he knows how cheaters are statistically identified by existing methods, he will avoid reaching that degree of accuracy. He will not use the best engines, or the top lines, but older engines that are still good enough to outplay super GMs, and second- or third-best lines unless he needs the best line to avoid losing or getting worse. I would say that what would characterise such a player is not his absolute accuracy by comparison with the best engine and best lines, but the absence of blunders, the consistency in not making mistakes, and the fact of finding very good moves (top lines) only when it is needed.
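The "smart cheater" signature suggested here can be sketched as a small profile of per-game statistics: blunder frequency and consistency rather than raw accuracy. The thresholds and example loss sequences are illustrative assumptions, not measured data.

```python
# Sketch of the "smart cheater" signature: look at blunder frequency
# and steadiness of play, not accuracy versus a top engine.  The
# 100 cp blunder threshold and the sample games are invented.
from statistics import pstdev

def profile(losses_cp, blunder_cp=100):
    """Summarize a game's per-move centipawn losses."""
    n = len(losses_cp)
    return {
        "acpl": sum(losses_cp) / n,
        "blunder_rate": sum(l >= blunder_cp for l in losses_cp) / n,
        "consistency": pstdev(losses_cp),  # low = suspiciously steady
    }

# A human-looking game: mostly small losses, one blunder.
human = profile([5, 0, 20, 10, 140, 0, 30, 15])
# A hypothetical "smart cheater" game: never the engine's top line,
# but never a blunder either.
smart = profile([15, 10, 20, 15, 10, 20, 15, 10])

print(human)
print(smart)
```

Note that the smart-cheater profile can show a *worse* ACPL than an honest brilliancy while still standing out through its near-zero blunder rate and low variance, which is the commenter's point.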
The interest of the ChessBase methodology, while it has evident flaws well highlighted in the article, is that it might be the right weapon to catch a "smart cheater": the most important point here is that when you compare Hans with all the other top players using this same methodology, you see that by these standards Hans's accuracy as a "smart cheater" is incredibly high. I do not know if this makes sense, but looking at Hans's games with Stockfish 15 and expecting to see strong correlations if he was cheating does not sound right. If that were the case, then even the FIDE statistical system would have caught him (assuming of course that he is a cheater). Remember, Caruana stated on YouTube that he was 100% sure that somebody was cheating OTB, but the FIDE system did not find anything suspicious about that player. So far, I have not heard a single active super GM defending Hans or blaming Magnus for his behaviour, which is quite exceptional.
A few additional factors can explain Carlsen's gambit against Hans (in my opinion he has much more to lose than to win by taking this open position). First, Magnus played some casual OTB games with Niemann on the beach (an impossible place to cheat) during the Miami tournament, and according to Anish's account of the two games, Magnus completely crushed Niemann with black in both, while Hans made several blunders and inaccuracies. Second, Magnus's company was acquired by chess.com a few weeks before the Sinquefield Cup, and according to some sources Magnus had access to chess.com's anti-cheating analysis of Hans's games, including very recent ones with evidence of systematic cheating. We will see what the chess.com statement says in the following days. Third, considering all the above points, Niemann's post-game interviews, and probably his capacity for talking about and calculating chess lines with his peers, might have added further "evidence" to the case.
Thank you for this analysis. In addition to ACPL, we should also look at its standard deviation; this could be an additional dimension. Also, you have probably by now seen the analysis of Milky Chess (a Brazilian YouTuber), who uses the same approach but with SF15 at a depth of 20. Unfortunately, although he starts with more than 440K observations (centipawn loss per move), he finally aggregates the data into bins of 100 Elo points, which clearly reduces the significance of the correlation. If one performed the same analysis on each move at each player's Elo rating at the time of play, we should obtain more meaningful data (both for the average and the standard deviation).
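The binning objection can be illustrated with synthetic data: aggregating per-move observations into 100-Elo bins collapses thousands of data points into a handful of bin means. All numbers below are invented; a real analysis would use the actual per-move losses and ratings.

```python
# Synthetic illustration of the binning objection: 5000 per-move
# observations collapse into 8 bin means under 100-Elo binning.
# The linear loss-vs-Elo model and noise level are invented.
import random

random.seed(1)

# Per-move observations: (elo, centipawn loss), loss shrinking with Elo.
moves = [(elo, max(0.0, random.gauss(80 - 0.02 * elo, 25)))
         for elo in (random.randint(2000, 2799) for _ in range(5000))]

# 100-Elo bins, as in the criticized analysis:
bins = {}
for elo, loss in moves:
    bins.setdefault(elo // 100, []).append(loss)
binned = [(b * 100, sum(ls) / len(ls)) for b, ls in sorted(bins.items())]

print(f"raw observations: {len(moves)}")
print(f"binned points:    {len(binned)}")  # only a handful of bins
```

A correlation fitted to the 8 bin means looks artificially clean (averaging removes the per-move noise) while resting on almost no effective sample size, which is why the per-move regression the commenter proposes would be more meaningful.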