Are you best in the opening, middlegame, or endgame? How sure are you about that?
In a previous post on my programming blog, I tried to answer this question for myself by pulling 1000 of my games from lichess and using an engine to evaluate the position after each move. (I also did some more math to make the results more interpretable, which you can read about in more detail in the original post if you’re interested.) In this way, for each move of the game, I was able to see if my position got better or worse, on average, over many games.
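As a rough illustration of the averaging step, here is a minimal sketch. It assumes the engine evaluations are already in hand as one list per game, in centipawns from the player's point of view. The function name, the data layout, and the sample numbers are all hypothetical, and the sketch skips the extra math from the original post.

```python
from collections import defaultdict
from statistics import mean


def average_eval_change_by_move(games):
    """Average the change in evaluation at each move number across games.

    `games` is a list of games; each game is a list of engine evaluations
    (centipawns, from the player's point of view) taken after each of the
    player's moves. This layout is an assumption made for the sketch.
    """
    changes = defaultdict(list)
    for evals in games:
        # Start at the second evaluation so there is always a previous one.
        for index in range(1, len(evals)):
            move_number = index + 1  # 1-based move number
            changes[move_number].append(evals[index] - evals[index - 1])
    # Games of different lengths simply contribute to fewer move numbers.
    return {move: mean(deltas) for move, deltas in sorted(changes.items())}


# Made-up evaluations for three short games, purely for illustration.
sample_games = [
    [20, 30, 10, 50],
    [10, 0, 40],
    [30, 50, 60, 20],
]

averages = average_eval_change_by_move(sample_games)
for move, change in averages.items():
    print(f"move {move}: average change {change:+.1f} centipawns")
```

A positive average at a given move number means the position tended to improve there, and a negative one means it tended to get worse. Plotting these averages against move number would give a curve along the lines of the graphs discussed in this post.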
The upside of this approach is that I don’t have to rely on my subjective interpretation of what’s happening in my games. The computer evaluates the position after every single move in every single game.
As a coach, I’ve found that self-assessments are often flawed. For example, maybe a student comes to you and complains that she always misses tactics. You go over her games and find that she has a very aggressive style, which often works out, but sometimes she does end up on the wrong end of tactics. But perhaps more importantly, she seems to be unaware of certain positional principles, and this is causing her to get into bad positions time and again. Tactics are actually a strength of her game, but she’s hyper-aware of her tactical mistakes because, basically, she’s obsessed with tactics. The lowest-hanging fruit is to work on applying positional principles.
I’ve found cases like this are very common. Often, your biggest weakness is something you’re not aware of at all. In fact, one of the best reasons to get a coach is to get an outside perspective on your strengths and weaknesses.
After my last article, GM Eugene Perelshteyn got in touch with me. He was interested in running the analysis on his own games. As a grandmaster, Eugene obviously needs to be proficient in all phases of the game, but as a content creator he’s probably best known for the opening and endgame. For that reason, I was surprised when the analysis indicated he does best in the middlegame! Was this a valuable insight, or a glitch in the system?
As Eugene pointed out, the best way to find out would be to run the same analysis for more players. We can’t really tell if it’s working without a better baseline for what “normal” looks like. So I decided to run the same analysis for various players and see how the results compared to my expectations.
Before we get into the results, it’s worth discussing what we’re looking for. There’s a weird sort of paradox here. If the metric confirms what we already know, what good is it? But if it runs counter to what we think we know, isn’t it likely just wrong?
The NBA stats community has had to grapple with the same question. One of the biggest trends in NBA analytics over the last few years has been the rise of all-in-one metrics, which try to express a player’s value in a single number. The same issue applies: do you want the metric to assign high scores to players you already think are good, or pick out underrated players? How do you know if it’s working?
When I discussed this with Kostya Medvedovsky, the creator of the DARKO metric, he suggested that a good metric should follow the 80-20 rule: 80% of the time it should confirm what you already know, 20% of the time it should tell you something surprising. Ideally, those 20% of surprising predictions should make sense when you dig into them.
NBA all-in-one metrics are an example of this. Mostly, they confirm what we already know, that superstars like LeBron James and Steph Curry are the most valuable players. But it raised eyebrows when some all-in-one metrics flagged Nikola Jokić, a slow, doughy, unconventional center from Serbia, as one of the best players in the league. Several years later he’s recognized as perhaps the best passing center of all time and an undisputed star.
So this is what we’re looking for with this chess metric. Ideally, it should mostly confirm what we already know, but occasionally tell us something surprising, which actually makes a lot of sense when we dig into it.
I ran the analysis on four other players: Carlsen, Caruana, Firouzja, and Kasparov. Going in, my expectations for the strongest phase of the game for each player were:
Carlsen - Endgame
Caruana - Middlegame
Firouzja - Endgame
Kasparov - Opening
Here are the graphs for those four players. It might be fun to try to guess which is which before looking.
Ready?
…
…
…
Okay, the four graphs from top to bottom are:
Firouzja
Kasparov
Caruana
Carlsen
So, did it work? Well, it’s not really clear. After doing the analysis, I realized there are a few more issues. First, it’s not equally easy to gain ground in each phase of the game. It’s pretty hard to beat your opponent outright in the opening unless they really mess up. It seems logical that, in general, the biggest changes in evaluation would happen in the middlegame. (Maybe this explains Eugene’s graph.) Interestingly, I found it easy to come up with players known especially for their opening or endgame play, but hard to think of anyone known especially for the middlegame.
Another issue is that players this strong need to be good at all phases of the game. They’re also playing against really strong opposition, so you wouldn’t expect huge gaps in any phase. I suspect that if I did the same analysis with weaker players, the gaps between different phases might be bigger.
Having said all that, there are some interesting patterns. In particular, Kasparov, the one player I expected to do best in the opening, had a graph with a markedly different shape from the other players’, one that peaked earlier.
A quick aside on Kasparov’s opening prowess. Kasparov was fanatical about working on the opening and, as world champion, he could command a team of strong grandmasters supporting him. This was especially critical in an era where computers weren’t nearly as strong as they are today. These days, anyone can access a strong enough computer to check their opening preparation. If the computer says your line works, it works. Your opponent might take you out of your comfort zone with a weird move, but there is very little chance that they refute your opening outright. This wasn’t the case in Kasparov’s day. There was a chance you would simply miss something in your preparation and be blown out of the water. Kasparov’s opponents lived in constant fear of this. He had an opening edge that is simply impossible in today’s game.
For that reason, I find it encouraging that Kasparov’s graph peaks earlier than the other players’. The other three players’ graphs all look somewhat similar, which is neither particularly surprising nor reassuring. As a next step, I’d like to try the analysis on club players, who might have bigger gaps in their abilities in different parts of the game.