My New AI Chess Coach
ChatGPT can plausibly answer nearly any question. Can it annotate a chess game?
In case you haven’t heard, our new AI overlords are here. ChatGPT is an artificial intelligence developed by OpenAI that can generate human-like text with impressive accuracy. But as impressive as ChatGPT is, it's not perfect. For one thing, it can be a bit of a bullshitter at times, spouting off made-up or false information with confidence. And while efforts have been made to ensure that ChatGPT aligns with human values and gives helpful advice, it's still possible for it to be subverted into giving harmful recommendations. Despite these flaws, interacting with ChatGPT is a mind-blowing experience that allows us to glimpse the potential of AI and what it could mean for our future.
So what did I use this newfound power for? Curing cancer or reducing poverty? No, I wanted to see if it could annotate my chess games. I had already tried this with a previous version of the model, with mixed results. I entered the PGN of a classic game to see how it would react.
It’s important to understand that Large Language Models, the family of models ChatGPT belongs to, don’t really understand the meaning of the text. Rather, they work by learning statistical patterns of text sampled from the internet. Asking the model to complete a prompt is effectively asking, “Here’s a fragment of text. Tell me how this fragment might go on. According to your model of the statistics of human language, what words are likely to come next?”
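To make the "what words are likely to come next?" idea concrete, here is a toy sketch. It is not how ChatGPT works internally (real language models use neural networks over tokens, not raw word counts), and the corpus and function names are made up for illustration. It only shows the basic move of predicting the next word from statistics of previously seen text.

```python
from collections import Counter, defaultdict


def train_bigram_model(text):
    """Count, for each word, which words follow it in the training text."""
    words = text.lower().split()
    follower_counts = defaultdict(Counter)
    for current_word, next_word in zip(words, words[1:]):
        follower_counts[current_word][next_word] += 1
    return follower_counts


def most_likely_next(model, word):
    """Return the most frequent follower of `word`, or None if it was never seen."""
    followers = model.get(word.lower())
    if not followers:
        return None
    return followers.most_common(1)[0][0]


# Tiny made-up corpus, purely for illustration.
corpus = (
    "white opens with the king pawn . "
    "black answers with the sicilian defense . "
    "white develops the knight . "
    "black develops the knight ."
)
model = train_bigram_model(corpus)

print(most_likely_next(model, "the"))       # "knight" follows "the" most often here
print(most_likely_next(model, "zugzwang"))  # None: the word never appears in the corpus
```

Notice that the toy model has no idea what a knight is or whether the move is legal. It only knows which word tended to come next.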
In this case, the model has apparently learned that a PGN is often followed by HTML for a new section (that’s the bracketed part at the beginning of the response). It says this is from the “fourth and final” match between McDonnell and La Bourdonnais, but according to Wikipedia they actually played six matches. Presenting this kind of incorrect detail with a perfect poker face is quite characteristic.
When it comes to the chess, it starts out well enough, correctly identifying the opening as the Sicilian Defense, but gets the variation wrong. It says it’s the Sveshnikov Variation when in fact it’s a Löwenthal. To be fair, the Sveshnikov and Löwenthal are closely related, both being defined by an early pawn to e5. The only difference is that in the Sveshnikov it’s preceded by Nf6 and Nc3 (Nc6 is played in both). Even an experienced human player could easily mix these up.
But it goes downhill from there. It says McDonnell played the Maroczy Bind, which is characterized by White pawns on e4 and c4. In fact, McDonnell put a pawn on e4 but a bishop on c4. This may sound like another subtle difference, like the Sveshnikov/Löwenthal distinction, but the Maroczy is all about those twin pawns and swapping one out for a bishop leads to a totally different kind of position. A knowledgeable human wouldn’t make this mistake. Key word: knowledgeable. You could argue that, if anything, mixing up moves makes the bot more human-like.
Speaking of human-like, the bot uses the phrase “open up the position” twice, at the end of each of the last two paragraphs. I’ve found this phrase to be very common among intermediate players when they want to change the position but aren’t quite clear on the strategic goal they should be aiming for. “Open up the position” sounds like you’re doing something, but is vague enough to apply to almost any position. Similarly, it’s almost as though the bot knows it doesn’t really understand the position but is trying to bullshit its way through.
It correctly identifies that McDonnell created a passed pawn on d2, but from the human perspective what makes this game unique isn’t the passed pawn – that’s common enough – but three passed pawns side-by-side on the second rank. Now that’s something special. (“What a pawn phalanx!” is Kasparov’s comment in My Great Predecessors.)
Finally, it says the game ended in a checkmate, when in fact White resigned. So all in all, the bot knows some historical details about the game and is able to speak about chess in a plausible-sounding way, but doesn’t really know what’s happening on the board.
Of course, computers are great at chess – much better than humans – just not this computer. So I had an idea: What if I simply asked it to use Stockfish to analyze the position? Can it do that? When I asked, it duly produced a sequence of alternative moves and their Stockfish evaluations.
But when I checked the moves with my Stockfish, it didn’t add up. The moves don’t make sense and the evaluations are all wrong. It was totally lying! It can’t actually run Stockfish, but it bluffed that it could.
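For anyone who wants to run the same kind of check, here is a minimal sketch of asking a real engine for an evaluation over the UCI protocol, using only the Python standard library. The assumptions are mine, not part of the experiment above: a `stockfish` binary on your PATH, moves given in UCI coordinate form (like `e2e4`) rather than the algebraic notation found in a PGN, and the helper names `parse_uci_score` and `evaluate`, which are invented for this example.

```python
import subprocess


def parse_uci_score(info_line):
    """Extract ('cp', n) or ('mate', n) from a UCI 'info' line, else None."""
    tokens = info_line.split()
    if "score" not in tokens:
        return None
    i = tokens.index("score")
    if i + 2 >= len(tokens) or tokens[i + 1] not in ("cp", "mate"):
        return None
    try:
        return tokens[i + 1], int(tokens[i + 2])
    except ValueError:
        return None


def evaluate(moves, engine_path="stockfish", depth=12):
    """Return (score, best_move) after `moves` played from the starting position.

    Assumes `engine_path` points at a UCI engine. The score is reported
    from the point of view of the side to move, in centipawns or moves to mate.
    """
    engine = subprocess.Popen(
        [engine_path], stdin=subprocess.PIPE, stdout=subprocess.PIPE,
        text=True, bufsize=1,
    )

    def send(command):
        engine.stdin.write(command + "\n")
        engine.stdin.flush()

    try:
        send("uci")
        send("isready")
        for line in engine.stdout:  # wait until the engine is ready
            if line.strip() == "readyok":
                break
        position = "position startpos"
        if moves:
            position += " moves " + " ".join(moves)
        send(position)
        send(f"go depth {depth}")
        score, best_move = None, None
        for line in engine.stdout:
            line = line.strip()
            if line.startswith("info"):
                score = parse_uci_score(line) or score  # keep the deepest score seen
            elif line.startswith("bestmove"):
                best_move = line.split()[1]
                break
        return score, best_move
    finally:
        send("quit")
        engine.wait()
```

With an engine installed, `evaluate(["e2e4", "c7c5"])` would return something like a centipawn score and a suggested reply. The exact numbers depend on the engine version and search depth, which is exactly why made-up evaluations are easy to catch.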
So the previous model left something to be desired. Let’s see how the new model, ChatGPT, did with the same task.
It certainly seems eager to help. It requests the game in standard algebraic notation and even helpfully explains what that is. This is a bit odd though, because as we’ll soon see, it doesn’t understand standard algebraic notation.
Like the previous model, it starts well enough, correctly identifying the first few moves, but quickly loses the thread. Black does move a knight to c6, but this neither supports the pawn on d5 (there is no pawn on d5) nor prepares to castle kingside.
Like its predecessor, it struggles to understand what’s happening on the chess board, so I tried the same trick of asking it to use Stockfish.
Rather than playing along and pretending to use Stockfish, it correctly explains why it can’t do that. That’s a big upgrade! Except… it might be lying about this too. The restriction on running code seems to be hardcoded rather than an actual limit of what the model is capable of. Someone was already able to trick it into running JavaScript with a combination of clever prompts and a browser extension.
So we’ve got an AI that’s good at generating plausible-sounding text about chess, but doesn’t really have a clue about what’s happening on the board. And we already know we’ve got engines like Stockfish and Leela that far exceed humans in their chess ability. The obvious question is, could we combine these into an AI chess coach that would explain its superior chess understanding in a way we could understand?
And I don’t really see why not. You could, for example, pipe in Leela’s representation of the chess position as an input to a language model. OpenAI already allows you to fine-tune their models for specific tasks, but only by providing prompt-completion pairs. I did try to fine-tune it using a PGN version of My Great Predecessors – the unannotated PGN as the prompt, the PGN with Kasparov’s annotations as the completion – but it didn’t work. For now, its inability to really understand the rules of chess is still a blocker.
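Here is a rough sketch of how such prompt-completion pairs could be built. This is not the script I actually used. The function names are invented, the comment in the sample game is a placeholder rather than Kasparov's text, and the output is simply one JSON object per line with `prompt` and `completion` keys. The idea is to strip the annotations out of an annotated game to get the prompt, and keep the annotated original as the completion.

```python
import json
import re


def strip_annotations(pgn_movetext):
    """Remove {comments}, (variations), $NAG codes, and !/? marks from PGN movetext."""
    kept = []
    variation_depth = 0  # nesting depth of parenthesised variations
    in_comment = False   # True while inside a {...} comment
    for ch in pgn_movetext:
        if in_comment:
            if ch == "}":
                in_comment = False
        elif ch == "{":
            in_comment = True
        elif ch == "(":
            variation_depth += 1
        elif ch == ")":
            variation_depth = max(variation_depth - 1, 0)
        elif variation_depth == 0:
            kept.append(ch)
    text = "".join(kept)
    text = re.sub(r"\$\d+", " ", text)      # numeric annotation glyphs like $1
    text = re.sub(r"[!?]+", "", text)       # move suffixes like !? or ??
    text = re.sub(r"\d+\.\.\.", " ", text)  # "4..." markers that resume play after a comment
    return " ".join(text.split())


def make_training_line(annotated_movetext):
    """Build one JSON line: bare moves as the prompt, annotated moves as the completion."""
    return json.dumps({
        "prompt": strip_annotations(annotated_movetext),
        "completion": annotated_movetext,
    })


# Illustrative movetext with a placeholder comment, not a real annotation.
annotated = (
    "1. e4 c5 2. Nf3 Nc6 3. d4 cxd4 4. Nxd4 {A placeholder comment.} "
    "4... e5 $1 (4... Nf6 5. Nc3) 5. Nxc6 bxc6"
)
print(make_training_line(annotated))
```

Even with clean pairs like these, the model only sees text going in and text coming out. Nothing in this setup teaches it which moves are legal.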
For the moment, I still can’t use an AI to annotate my chess games for me. But I can use it to solve another annoying problem I deal with every week: writing a conclusion.
Eerily good conclusion, if a bit cookie-cutter. This part killed me: "It requests the game in standard algebraic notation and even helpfully explains what that is. This is a bit odd though, because as we’ll soon see, it doesn’t understand standard algebraic notation." I'm glad we can laugh at it now in its infancy, but this technology really does scare me.
Thanks, Nate, I was laughing so heartily at how you ended your blog post that my wife asked me what I was reacting to!