Analysis vs. Analytics

Chess is ready for its Moneyball moment

Jan 03, 2021

Moneyball tells the story of how analytics infiltrated baseball. Nerds with spreadsheets and Harvard math degrees pushed out grizzled scouts who had been watching and evaluating players for decades. Analytics has also transformed basketball, poker, and the stock market. But not, as of yet, chess.

But wait a second. In chess, aren’t we all nerds? And haven’t we been analyzing non-stop all the way back to Gioachino Greco in the 1600s? Well yes, but I’d like to draw a distinction between analysis and analytics.

Why is Analytics Valuable?

If you happen to be a human who plays against other humans, you might be very interested in what has happened when humans have played against each other in the past. In particular, the way humans experience a position is often very different from the “objective” computer evaluation. In his French Defense course, Anish Giri said a healthy position is one where many moves are viable. Yet when engines evaluate a position, their evaluation is based solely on what they consider best play - inferior lines do not factor in at all. So if there is a narrow path to equality, the computer will find it and evaluate the position as equal, but in your own games you will inevitably make mistakes. By looking at practical results you can spot positions that, even though they are evaluated as equal by the computer, are not healthy for humans.

Additionally, even casually playing around with the filters on the lichess opening explorer will convince you that openings perform very differently depending on the time control and the level of the players involved. My sense is that many players are aware of this in a general sense, but don’t take these factors very seriously when putting together an opening repertoire.

For example, you might expect something like the Göring Gambit (1. e4 e5 2. Nf3 Nc6 3. d4 exd4 4. c3) to perform better at lower rating levels and in faster time controls.

Intuitively this makes sense - according to the computer, this gambit isn’t really sound, but it does give white easy development and a small initiative, so it seems like it should do better in contexts where black will struggle to defend accurately. But intuition can easily be wrong, so it’s a good idea to check the data. I got these numbers by manipulating the settings in the lichess opening explorer.
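For anyone who would rather script the lookup than click through filters, here is a minimal Python sketch. It assumes the public opening explorer endpoint at explorer.lichess.ovh with `play`, `speeds`, and `ratings` query parameters, and a JSON response carrying `white`, `draws`, and `black` game counts. Check those details against the current lichess API documentation before relying on them. The function names are my own, for illustration only.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed endpoint of the public lichess opening explorer API.
EXPLORER_URL = "https://explorer.lichess.ovh/lichess"

# The Göring Gambit (1. e4 e5 2. Nf3 Nc6 3. d4 exd4 4. c3) in UCI notation.
GORING_GAMBIT = ["e2e4", "e7e5", "g1f3", "b8c6", "d2d4", "e5d4", "c2c3"]


def build_query(moves, speeds, ratings):
    """Build an explorer URL for a move sequence, filtered by speed and rating."""
    params = {
        "variant": "standard",
        "play": ",".join(moves),
        "speeds": ",".join(speeds),
        "ratings": ",".join(str(r) for r in ratings),
    }
    # safe="," keeps the comma-separated lists readable in the URL.
    return EXPLORER_URL + "?" + urlencode(params, safe=",")


def white_win_pct(result):
    """White's win percentage from a dict of 'white', 'draws', 'black' counts."""
    total = result["white"] + result["draws"] + result["black"]
    if total == 0:
        return None  # no games in this filter combination
    return 100.0 * result["white"] / total


def fetch(url):
    """Download and decode one explorer response (needs network access)."""
    with urlopen(url, timeout=10) as resp:
        return json.load(resp)
```

Something like `white_win_pct(fetch(build_query(GORING_GAMBIT, ["blitz"], [1600])))` would then give white's win rate for one filter setting, assuming the endpoint answers as described.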

The rating data confirms my suspicions: every time you step up to a higher rating group, the win percentage goes down. I would agree with Greg Shahade that the first thing to ask yourself when building an opening repertoire is, “What are my goals in chess?” Based on this data, if you are aiming for Master level or above, the Göring Gambit is probably a bad choice. None of this is surprising so far, but let’s also check the results by time control.

In contrast to the rating data, the time control data goes against my initial intuition. The Göring Gambit actually performs better at slower time controls. I have to admit, I found this pretty surprising. Maybe being down a pawn puts pressure on white to come up with a concrete follow-up, so the opening works better when you have more time to calculate.

You could speculate on the reasons for this all day, but the important point is that the data clearly showed that one of my initial intuitions was wrong, in a way that directly bears on the advice I would give to someone building an opening repertoire. Before looking at the data I would have said, “Don’t make the Göring a central part of your repertoire if you are aiming for Master level or play slow time controls.” After looking at the data, I would still advise against using it if your long-term ambitions are to reach Master or higher, but if not, it seems to be totally fine at slower time controls.

This is really the core of analytics. Looking at what actually happened over a large number of games allows you to reality-check your intuition. In many cases, doing this kind of analytics work is not only possible, but actually pretty easy. It only took me a few minutes of clicking around on lichess to gather the data for this section. I’m excited about building up the tooling for analytics that is more nuanced, repeatable, and actionable.
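As one small step toward repeatable tooling, here is a hedged Python sketch that tabulates white's win percentage for every combination of rating group and time control, rather than looking at one filter at a time. The two filters may not be independent, so a grid like this can show whether a time-control effect holds up within a single rating group. The `fetch_counts` argument is a hypothetical caller-supplied function (for instance, a wrapper around an opening explorer query) and is not part of any library.

```python
from itertools import product


def win_pct_grid(fetch_counts, rating_groups, speeds):
    """Return {(rating, speed): (white win %, games)} for every combination.

    fetch_counts(rating, speed) is a hypothetical caller-supplied function
    that returns a dict with 'white', 'draws', and 'black' game counts.
    """
    grid = {}
    for rating, speed in product(rating_groups, speeds):
        counts = fetch_counts(rating, speed)
        games = counts["white"] + counts["draws"] + counts["black"]
        # Keep the sample size next to the percentage: a win rate built on a
        # handful of games should not be read the same way as one on thousands.
        pct = round(100.0 * counts["white"] / games, 1) if games else None
        grid[(rating, speed)] = (pct, games)
    return grid


def format_grid(grid, rating_groups, speeds):
    """Render the grid as plain-text rows, one per rating group."""
    lines = ["rating  " + "  ".join(f"{s:>16}" for s in speeds)]
    for rating in rating_groups:
        cells = []
        for speed in speeds:
            pct, games = grid[(rating, speed)]
            cell = "n/a" if pct is None else f"{pct}% ({games})"
            cells.append(f"{cell:>16}")
        lines.append(f"{rating:<8}" + "  ".join(cells))
    return "\n".join(lines)
```

Passing the fetching step in as a function keeps the tabulation separate from the data source, so the same grid could be filled from the explorer, from a local game database, or from a stub while testing.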

Who’s doing analytics?

Of course, none of this is exactly new. I’m just gathering up various threads and framing them through this analysis/analytics lens to try to show you that there’s a whole category of questions in chess that have received surprisingly little attention, have a lot of potential to improve competitive results, and are surprisingly easy to answer with modern tools.

Some sites are already leading the charge on analytics. Aimchess provides a service that downloads your games from lichess or chess.com and analyzes them on dimensions like advantage capitalization, long thinking outcome, and resourcefulness. The thing I like about this site is that they’ve tried to make the insights actionable: each claim about your results comes with a recommendation for improving that dimension.

Chessgoals offers study plans based on data from hundreds of players. I like that they put their money where their mouth is with specific, concrete study plans. A lot of people make chess improvement sound easy, but get curiously vague when it comes to the actual steps you should take. Chessgoals has very specific plans and rationales for each rating category.

And I’ve tried to get in on the act myself with some posts over on my programming blog. Nonetheless, I feel like all this is only the beginning. With chess booming, there’s more data than ever, and it’s never been easier to access and analyze it. Those looking to gain an edge in competitive chess in 2021 might do well to think less about the eternal truths of the game, and more about the patterns that emerge when fallible humans slug it out over the board.

Eric Jensen

Jan 4, 2021

I was wondering about cross correlation between the variables. I suspect that far more long time control games feature lower rated opponents. I'm not aware of many titled players that play a significant amount of long time control games on lichess, for example. The ratings alone might explain why the gambit seems to perform so much better at long time controls.
