38 Comments

This is most welcome. I had not realized that on top of the massive confirmation bias of counting any of multiple cloud engines as a hit at any time, a big feedback loop was operating.

There is one vital factor you're still missing about average centipawn loss. It grows in proportion to how much one side is ahead. Simply said, if you're a Queen ahead, a pawn difference is minimal. One cannot avert this by cutting off at 2.00 or 4.00 advantage, say, because the proportional effect goes clear down to zero. And as I demonstrated in my article https://rjlipton.wpcomstaging.com/2016/11/30/when-data-serves-turkey/, not correcting for this gives you an unrealistic y-intercept for the rating of perfect play.
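To make the proportionality point concrete, here is a minimal sketch of one way to damp centipawn loss by the size of the advantage. The logarithmic map `damp` and the helper `scaled_loss` are illustrative assumptions, not necessarily the correction used in the linked article.

```python
import math

def damp(eval_pawns: float) -> float:
    """Odd, increasing map that compresses large evaluations: sign(x) * ln(1 + |x|).

    This particular curve is an assumption chosen for illustration.
    """
    return math.copysign(math.log1p(abs(eval_pawns)), eval_pawns)

def scaled_loss(best_eval: float, played_eval: float) -> float:
    """Loss of the played move relative to the best move, on the damped scale."""
    return damp(best_eval) - damp(played_eval)

# The same one-pawn drop counts for much less when a Queen ahead.
loss_when_equal = scaled_loss(0.0, -1.0)   # ln(2)
loss_when_winning = scaled_loss(9.0, 8.0)  # ln(10/9)
```

Because the damping acts at every evaluation, not only beyond a cutoff such as 2.00 or 4.00, it matches the observation that the effect goes all the way down to zero.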

author

Thanks for your insight! Do you think converting centipawns to win% based on how cp differences correlate with win expectancy in real games is a good way to mitigate this? This is the approach taken by Lichess: https://lichess.org/page/accuracy#:~:text=The%20Accuracy%20metric%20indicates%20how,all%20the%20preferred%20Stockfish%20moves.
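As a rough sketch of the conversion being asked about: the logistic formula and its coefficient below are quoted from memory of the Lichess accuracy page, so treat both as assumptions and check them against the page itself. The helper `win_percent_loss` is a hypothetical name.

```python
import math

def win_percent(centipawns: float) -> float:
    """Map an engine evaluation (in centipawns) to an expected win percentage.

    Coefficient recalled from the Lichess accuracy page; an assumption here.
    """
    return 50 + 50 * (2 / (1 + math.exp(-0.00368208 * centipawns)) - 1)

def win_percent_loss(cp_before: float, cp_after: float) -> float:
    """Win% given up by a move, from the mover's point of view."""
    return max(0.0, win_percent(cp_before) - win_percent(cp_after))

# A 100 cp slip costs far more Win% in an equal position than when 9 pawns up.
slip_when_equal = win_percent_loss(0, -100)
slip_when_winning = win_percent_loss(900, 800)
```

On this scale a fixed centipawn slip shrinks as the advantage grows, which is the mitigation the question refers to.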


That takes care of much of it---I notice that page links to your article https://zwischenzug.substack.com/p/centipawns-suck in turn---but there's a further foible. The Win% curve (I say "points expectation curve" to include draws at 0.5) itself depends on the rating of the player. Simply put, 2800 vs. 2800 will convert a 2.00 advantage more often than 1200 vs. 1200. AlphaZero need not care, but when forecasting players of all ranks it's a sizable issue. I discoursed on this in my article https://rjlipton.wpcomstaging.com/2018/09/07/sliding-scale-problems/, which I wrote at the time I discontinued my policy of converting the evaluations of various chess engines onto a common scale. This---building the model separately for each engine---removed a discrepancy in an actual cheating case at the time.

This leads to a second issue, which is my pet example of a well-motivated and actually-effective idea that yet has a technical hole one can drive a tank through. My article links a 2012 techreport by Amir Ban (Deep Junior programmer and USB flash drive co-inventor) showing that having chess engine evaluations conform to the points expectation curve maximizes the learning rate. But engines are free to apply any order-preserving post-processing to their evaluations they please, and not only will it preserve the final ranks of moves, it will not affect the internal minimax search. Where the Houdini website advertises its "calibrated evaluations", I think this is going on. And some further sleuthing and private info revealed this as the explanation for the anomaly in my article https://rjlipton.wpcomstaging.com/2016/12/08/magnus-and-the-turkey-grinder/, which is the "Part 2" of the article on ACPL in my first reply.
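The order-preserving point can be sketched in a few lines. The move names, evaluations, and the `squash` function below are all hypothetical; the only property that matters is that `squash` is strictly increasing.

```python
import math

def rank_moves(evals):
    """Return moves ordered from best to worst evaluation."""
    return sorted(evals, key=evals.get, reverse=True)

def squash(cp):
    """Hypothetical strictly increasing post-processing of a centipawn value."""
    return 2 / (1 + math.exp(-cp / 250.0)) - 1

# Made-up evaluations in centipawns, for illustration only.
evals = {"Nf3": 35.0, "d4": 30.0, "h4": -40.0, "g4": -120.0}
rescaled = {move: squash(cp) for move, cp in evals.items()}

same_ranking = rank_moves(evals) == rank_moves(rescaled)

# The ranking survives, but the size of the gap between moves does not:
raw_gap = evals["Nf3"] - evals["h4"]            # 75 centipawns
rescaled_gap = rescaled["Nf3"] - rescaled["h4"]  # different units and size
```

So any statistic built on the size of evaluation differences, such as ACPL, depends on which scale a given engine happens to report.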


The articles Kenneth refers to and your excellent article give very good reasons why centipawns are not a good barometer of a player's strength. However, the results of Rafael Vleite's analysis appear to show a very consistent inverse relationship between rating and AVERAGE centipawn loss.

Furthermore, it is very strange that Hans Niemann's correlation is entirely different from that of numerous other players who have ascended to the lofty heights of 2700.

Rafael Vleite appears to have indeed shown a novel correlation with the single exception being none other than HN.

So as flawed as centipawn loss is as a barometer of how well a player is performing, the correlation with this flawed metric is still striking, and that is the fundamental question that needs to be addressed.

I have seen no material offered that refutes Rafael Vleite's presentation.

Your article argues that ChessBase's approach of using multiple engines introduces a confirmation bias into the definition of the strongest move, and is therefore bad methodology.

Along the same lines, if the same flawed methodology is applied to all of the players in the comparison, it seems odd that HN's results are again anomalous compared to the other players, just as they were under Rafael Vleite's flawed centipawn analysis.

Some have mentioned that Rafael Vleite's presentation is somewhat "snarky". I ascribe that to youthful hubris. More to the point, let's hear a direct refutation of this presentation.

https://medium.com/@rafaelvleite82/how-i-found-perfect-correlation-between-chess-player-rating-and-acpl-and-stdcpl-bea9485055de

Your thoughts on this?


Great analysis! Until someone comes up with substantial evidence of cheating by Niemann, all I see is some of the most powerful entities in chess hammering on a 19-year-old. It's a witch hunt.


And THIS is why I subscribe.

Nice work, Nate. The most useful analysis I've seen.


Very clear analysis and exposition of both the statistical issues and the larger context. Sharing immediately with some statistics-savvy non-chess people who have become interested in this whole scandal.


Thank you for the thoughtful analysis. I have waffled so much on this, and I'm probably now in the best position: I have no idea and shouldn't speculate. But I do trust the top players' analyses and intuition, though they're not free from bias, as you mention. I thought Fabi gave a pretty balanced take on the C-squared podcast, though he admits that he thinks integrity in professional chess is a huge issue, sharing an experience where he knows for a fact that someone cheated and nothing came of it. The only thing I can think to compare it to is my own intuition after teaching intro statistics for 6 years. When a student hands in a report talking about "heteroscedasticity" and "leptokurtosis," I "know" that they received outside assistance (and that they're oblivious to how out-of-place those terms are in an intro class), though I don't know how or from whom. I give them a chance to explain it, and when they can't, I have no qualms giving them a zero and saying they failed to convince me it was their original work. Some confess and apologize, and some double-down, never admit it, and when I enforce the policy, threaten me, complain to my chair and dean, and proceed to destroy my reputation online. But I still sleep at night because I also take integrity in my profession very seriously, and there is simply no doubt in my mind in those cases (if there were, I would err on the side of a Type II error--sorry, couldn't resist). When the top GMs think something strange is going on, including time for each move, I listen, but that's all it is for now, and no one has said "there is no doubt" that Niemann cheated. What is exceedingly clear is that all of this could have been handled better. Hopefully it gets cleared up soon.

author

It certainly seems that many GMs are suspicious of Hans. Almost like there's an open secret within the GM community that he cheated. But what's the source of this belief? Nothing has come out publicly that is anywhere close to solid proof of Hans cheating OTB. Is there some secret evidence that hasn't been revealed, or a case of groupthink and confirmation bias? That's very hard to evaluate from where I'm sitting.


Without convincing proof, Carlsen implicated Niemann as a cheat at the Sinquefield Cup, Carlsen withdrew, Niemann was allowed to stay in the tourney, and finished in 7th place of the 9 remaining players. Niemann’s prior online cheating caused Carlsen to be suspicious, and the prior cheating complicates the resolution of the matter. But Carlsen was wrong to publicly imply Niemann cheated OTB, he should have privately reported his concerns to the SC TD, and to FIDE, and let them handle the problem, and Carlsen should have stayed in the SC. Niemann, if he could prove $$$ damages, could win a defamation lawsuit against Carlsen. But if Niemann keeps getting tournament invites, where is the economic harm? Perhaps Niemann could win $$$ damages from Carlsen using a legal theory of intentional infliction of mental distress.


He finished in 6th place, I believe. And had Magnus not quit, then Hans would have kept the full point he earned by defeating Magnus.

If Magnus had then beaten Wesley So and Fabiano Caruana, then Hans would have tied for 3rd place.


Carlsen never stated that HN cheated OTB. He said that it seems HN cheated more frequently and more recently than he admits, which is: "I've cheated online when I was 12 and later at 16, in some random chess.com games".


There’s no need to parse Carlsen’s statements for the bald declaration that Niemann is an OTB cheat, since a person can be defamed by false implications and can suffer mentally from the same false implications. Everything that Carlsen has said and done re Niemann since 9/1/22 adds up to a public statement of Carlsen’s belief that Niemann is an OTB cheat, and that’s the message the media is passing on. So, does Carlsen’s misbehavior give Niemann tort theories supporting a $$$ judgment against Carlsen? We’ll see, as I expect Niemann’s lawyers will be filing suit soon.


Good point. Found this online:

The next level of complexity deals with what used to be known as “false light” defamation. False light does not necessarily involve false facts, but rather, organizing the facts in such a manner which causes the reader or hearer to form a false impression. In 2008, the Florida Supreme Court, in a landmark decision, Jews for Jesus v. Rapp, No. SC06-2491 (Fla 2008) abolished the separate cause of action of false light and merged it into Defamation by implication. The Court stated:

“Defamation by implication arises, not from what is stated, but from what is implied when a defendant ‘(1) juxtaposes a series of facts so as to imply a defamatory connection between them, or (2) creates a defamatory implication by omitting facts, [such that] he may be held responsible for the defamatory implication ․’ ” (quoting W. Page Keeton et al., Prosser & Keeton on the Law of Torts § 116, at 117 (5th ed. Supp.1988))); Mohr v. Grant, 153 Wash.2d 812, 108 P.3d 768, 774-76 (2005) (same); Guilford Transp. Indus., Inc. v. Wilner, 760 A.2d 580, 596 (D.C.2000) (“[B]ecause the Constitution provides a sanctuary for truth, ․ [t]he [defamatory] language must not only be reasonably read to impart the false innuendo, but it must also affirmatively suggest that the author intends or endorses the inference.” (quoting Chapin v. Knight-Ridder, 993 F.2d 1087, 1092-93 (4th Cir.1993))); Armstrong v. Simon & Schuster, Inc., 85 N.Y.2d 373, 625 N.Y.S.2d 477, 649 N.E.2d 825, 829-30 (1995) (“ ‘Defamation by implication’ is premised not on direct statements but on false suggestions, impressions and implications arising from otherwise truthful statements.”); see also Milkovich v. Lorain Journal Co., 497 U.S. 1, 13, 20, 110 S.Ct. 2695, 111 L.Ed.2d 1 (1990) (recognizing that defamation can arise where a statement of opinion reasonably implies false and defamatory facts); Cooper v. Greeley & McElrath, 1 Denio 347, 348 (N.Y.Sup.Ct.1845) (holding that a publisher was liable to James Fennimore Cooper for a publication that implied Fennimore had a poor reputation); Restatement (Second) of Torts § 566 (“A defamatory communication may consist of a statement in the form of an opinion, but a statement of this nature is actionable only if it implies the allegation of undisclosed defamatory facts․”).”


Thank you!! You would not believe the number of online commentators -- some attorneys -- who seem to think that putting "my opinion" in front of a defamatory statement makes it immune to litigation. Truth is a strong defense against defamation/libel, but it is not an absolute defense. I am glad to see someone post specifics about how "truth" can be arranged to present a false narrative or false implication.... and how an "opinion" can still be defamatory.


Well, don't doubt that Carlsen's statement was also written by his lawyers. The fact is that Carlsen asked HN's permission to reveal more information, and HN's silence speaks for itself. Anyway, Nate's analysis showing that HN has been playing with the same percentile error over the last 3 years also speaks for itself. I mean, his rating is improving dramatically but the quality of his play isn't 🤔


I think people are losing sight of the most important question, which has to do with Magnus and not Hans Niemann. There is no evidence that Hans cheated in the Sinquefield Cup, other than his prior history of (maybe) showing a propensity to cheat (this is the kind of evidence, by the way, that is not allowed in criminal trials in the US, except for charged sex offenses). And despite Magnus' extremely thin veil of wordsmithing, he has clearly and forcefully alleged that Hans cheated against him in the Sinquefield Cup. This is atrocious behavior, especially given the power Magnus has.


Nobody has sued HN over his cheating. It is not a "maybe": he confessed to it himself, which makes it a clear fact. Chess.com closed his account privately (not for the first time), and Magnus clearly showed that he doesn't want to play against him, OTB or online. You can't oblige a player to play against another on the excuse that, by refusing, he damages the image of potential opponents. BTW, in the Julius Baer, I haven't heard anybody comment that Ivanchuk used 3 minutes to play some 20 moves, gave up both bishops for knights, and resigned. This is also giving up a game, in a more disguised way. Ivanchuk is retired and is never going to face HN again. He just didn't want to waste his time playing someone he considered a cheater. And I respect the right of Magnus, Ivanchuk and any other player to choose whom they want to play and whom not.


That is not the way it works --- professional players sign contracts to play in events. They are expected to complete their schedules (otherwise they may be breaking their contracts).

Players who break contracts may not be invited to play in future events. They are subject to other sanctions.

Players who accuse other players of cheating or other misdeeds -- especially without direct evidence -- can expect both chess sanctions and possibly lawsuits.

Past history of online misdeeds is not evidence about specific circumstances at the 2022 S Cup. It does not matter -- 1 time, 2 times, 10 times -- all that matters is that the claim (MC's claim) is about OTB games at the 2022 S Cup. And there is no evidence, no mechanism, no method, no accomplice shown to date. That is very bad for the MC side.


Just saw this from highly respected chess author and three-time Scottish chess champion GM Rowson. He writes:

Chessdotcom's report is marketing.

1. It's 20 pages, not 72.

2. No evidence of OTB cheating at all, and their attempt to whip it up is embarrassing.

3. It challenges Hans's version of prior cheating, but mostly w' online blitz games in 2020, and reasonable doubt.#chessdrama https://t.co/Ehc3vuM0Iu

— Jonathan Rowson (@Jonathan_Rowson) October 5, 2022


Subscribed to thank you for this.

The analysis made by Yosha Iglesias was frustrating to watch because no one, including myself, understood what the figures were being compared to, since no one explained what Let’s Check is.

In truth any statistical analysis will have to be done through experimentation with several control groups:

- one group of players rated between 2600-2800 we know are not cheating;

- one group of players rated between 2600-2800 with at least one cheater receiving all his moves from an engine

- one group of players rated between 2600-2800 with one cheater receiving some moves from an engine

- one group of players rated between 2600-2800 with one cheater rated 2000-2200 receiving some moves from an engine

In other words, how do we know any comparison makes any sense? What if Ian is cheating as well? What if everyone except Hans is cheating? Any attempt to solve this statistically with clear data sets allowing for clear comparisons is destined to fail…


Did you see the analyses by MilkyChess - https://www.youtube.com/watch?v=YnnJ0Da4Rp0 and https://www.youtube.com/watch?v=Q5nEFaRdwZY ? These are, IMO, the most significant analyses so far on this topic.

s.

author

Yes, this is an interesting line of analysis. I haven't had time to properly evaluate it yet.


Please do; many would love to hear what the issues, if any, are.


Thanks for these. Definite discrepancy wrt Hans. Experts should consider this and comment.


> "The thing that struck me when looking over Niemann’s games is his aggression. Most of the top grandmasters like to avoid risks when possible. Niemann seems more willing to take the game into murky territory, and especially to sacrifice material. Maybe he feels confident doing that because he has engine assistance. Or maybe he’s just an unusually aggressive, intuitive player."

I think there's a third possibility that I don't think I've seen anyone really talk about: Hans might have known that there were rumours about him cheating in the past (at least on Chess.com), and might well have exploited his over the board opponents' suspicions and "tilt" to his otherwise (clean) advantage.

This is evidenced by his intentionally vague statements ("I don't even know why I even looked at that line", "it was a miracle", etc.) after his Round 3 game against Carlsen regarding opening prep, which he later (Round 5, if I'm not mistaken) reneged on by saying "it wasn't a miracle", despite earlier saying that it was a miracle *multiple* times in the round 3 interview. A lot of people seem to have somehow forgotten his contradictions in those interviews just days apart - he has to have been less than truthful in at least *one* of those interviews. Both cannot be fully honest at the same time. All of this doubt that he helped plant in his opponents' minds would have worked to his advantage. Cheating wouldn't have been necessary if some of the time, he merely needs to put his opponent off enough to make them /think/ he cheats (while maintaining plausible deniability).

But now, especially after his post-Firouzja game interview and all the attention, every top player will recognise his style as well as "psychological" tricks, and I think this alone will make it really tough for him to remain about 2700 in the future.


Niemann did not cheat against Carlsen. But I do wonder if I could have believed the nonsense before becoming more educated. I spent ten years with a mere high school diploma--then 9 years of higher education (to complete just 7 years of schooling). The fantastic BS I believed during those 10 years with a mere high school diploma is embarrassing.

But when you can't rely on yourself to research and understand issues, you might be tempted to rely on other popular characters--like Magnus Carlsen (or Donald Trump, Tom Cruise, Kanye West, flat-earthers, etc.). It can be difficult and isolating to be disloyal to your idols in favor of being loyal to the truth. Indeed, I wish I could return to idolizing Magnus...

However, I'm not saying I know the truth, per se. But I believe there is far too much uncertainty, obfuscation, and gaslighting to assert that Niemann cheated. Instead, I find sufficient evidence to believe Niemann did not cheat.


Another misleading thing about centipawn accuracy is the level of subtlety of the move. If I respond to Rd8xRd1+ with RaxRd1, I will get a perfect accuracy score on that move, and yet every 900-level player will find the move. Further, move subtlety is very complicated: a quiet position might require a nuanced long-term planning move that is, on deep analysis, superior to many other quiet moves, whereas a dangerous position might require a precise refutation. And yet, some quiet positions have many near-top moves that are guessable by a club player.


Thanks for the balanced article.


Nice analysis and post!

Is it fair to say that the conclusion is that the data do not suggest Hans has been cheating in ALL games?

In other words, what would happen if he only cheated in some games? He is for sure not a beginner, and maybe cheating is only an extra boost used from time to time. In this case, analysing the best games, there could be patterns that are hidden within the aggregate average performance. In this case we could try to answer a different question: did Hans cheat FROM TIME TO TIME?
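A small arithmetic sketch shows how a few assisted games can hide inside an aggregate average. Every number below is hypothetical and chosen only for illustration; none of it is taken from real games.

```python
from statistics import mean

# Hypothetical per-game ACPL values, for illustration only.
honest_acpl = 25.0    # assumed typical unassisted game
assisted_acpl = 8.0   # assumed engine-assisted game

# 40 games, of which 4 are assisted.
games = [honest_acpl] * 36 + [assisted_acpl] * 4

overall = mean(games)            # aggregate average over all games
shift = honest_acpl - overall    # how far the aggregate moves

# Looking at individual games instead of the aggregate makes them stand out.
outliers = [g for g in games if g <= honest_acpl / 2]
```

Under these assumptions the aggregate moves by less than two centipawns, while the per-game view isolates exactly the four assisted games. Real games are far noisier, so this only shows why the "from time to time" question needs a per-game analysis rather than a career average.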


One thing I have understood in all this murky story is that it is very difficult to catch a smart high-level chess player who is not a super GM but can use help to reach that level. If he knows how cheaters are statistically identified by existing methods, he will avoid achieving that degree of accuracy. He will not use the best engines, or the top lines, but older engines that are good enough to outplay super GMs, and second- or third-best lines unless he needs the best line to avoid losing or getting worse. I would say that what would characterise such a player would not be his absolute accuracy by comparison with the best engine and best lines, but the absence of blunders, the consistency in not making mistakes, and the fact of finding very good moves (top lines) only when needed.

The interest of the ChessBase methodology, while it has evident flaws well highlighted in the article, is that it can be the right weapon to catch a "smart cheater": the most important point here is that when you compare Hans with all the other top players using this same methodology, you see that by these standards Hans's accuracy as a "smart cheater" is incredibly high. I do not know if this makes sense, but looking at Hans's games with Stockfish 15 and expecting to see strong correlations if he was cheating does not sound right. If that were the case, then even the FIDE statistical system would have caught him (assuming of course that he is a cheater). Remember, Caruana stated on YouTube that he was 100% sure that somebody was cheating OTB, but the FIDE system did not find anything suspicious about that player. So far, I have not heard a single active super GM defending Hans or blaming Magnus for his behaviour, which is quite exceptional.

A few additional factors can explain Carlsen's gambit against Hans (in my opinion he has much more to lose than to win by taking this open position). First, Magnus played some casual OTB games with Niemann on the beach (an impossible place to cheat) during the Miami tournament, and according to Anish's record of the two games Magnus completely crushed Niemann; in both games Magnus had black, and Hans made several blunders and inaccuracies. Second, Magnus's company was acquired by chess.com a few weeks before the Sinquefield Cup, and according to some sources Magnus had access to chess.com's anti-cheating analysis of Hans's games, including very recent ones, with evidence of systematic cheating. We will see what the chess.com statement will say in the following days. Third, considering all the above points, the post-game interviews of Niemann, and probably his capacity for talking through and calculating chess lines with his peers, might have added further "evidence" to the case.


Thank you for this analysis. In addition to ACPL, we should also look at their standard deviations. This could be an additional dimension to look at. Also, you have probably by now also seen the analysis of Milky Chess (Brazilian YouTuber) who uses the same approach but with SF15 at a depth of 20. Unfortunately, although he starts with more than 440K observations (centipawn loss per move), he finally aggregates the data in bins of 100 Elo points, which clearly reduces the significance of the correlation. If you performed the same analysis based on each move for each Elo rating at the time of playing, we should obtain more meaningful data (both for average and standard deviation).
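One way to read the binning concern is sketched below with synthetic numbers (nothing here comes from the MilkyChess data). Averaging within 100-Elo bins removes the scatter, so the binned correlation looks near-perfect even when the underlying observations correlate only weakly. The trend and offsets are assumptions picked so the arithmetic is easy to follow.

```python
from statistics import mean

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

# Synthetic observations: a mild downward trend plus large symmetric scatter.
ratings, losses = [], []
for elo in range(2000, 2800, 100):        # 8 rating levels
    trend = 60 - 0.02 * (elo - 2000)      # assumed average loss at this level
    for offset in (-30, 0, 30):           # assumed scatter around the trend
        ratings.append(elo)
        losses.append(trend + offset)

raw_r = pearson(ratings, losses)          # correlation over every observation

# Collapse each rating level to its mean, as a 100-Elo binning would.
bins = sorted(set(ratings))
bin_means = [mean(l for r, l in zip(ratings, losses) if r == b) for b in bins]
binned_r = pearson(bins, bin_means)       # correlation over bin averages
```

With these inputs the binned coefficient is essentially -1 while the raw one stays weak, which is why keeping per-move observations together with their standard deviation gives a more honest picture.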
