Discussion about this post

Tim

Great article. I thought a lot about this in grad school, where I used math models to predict dynamics of gene drives. I think even those working closely with the models tend to be overly trusting of the results, and I wrote a paper on the "reification" of these models. It was helpful to keep the humbling "all models are wrong, but some are useful" in mind.

For chess, I agree that the probability measure is at least more useful, and it seems similar to the calculation of win probability from the difference in Elo ratings. I'm reminded of a completely closed position I once saw where there was no way to make progress, but the engines would still give something like +2.0.

I wonder if Chess.com's Game Review uses the Q score, or some combination of Q and centipawns. I've noticed that in a won position you can go from +9 to +6 and it will give you the thumbs-up "good move," whereas anywhere else a drop of 3 would be a blunder. That makes sense, since it makes little difference to the winning probability when you are that far ahead.
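To make the two comparisons in this comment concrete, here is a minimal sketch. The Elo expected-score formula is the standard one. The centipawn-to-probability mapping is only an assumption for illustration: a plain logistic curve with a made-up `scale` constant, not the formula used by Chess.com, Stockfish, or any other engine or site. The function names are hypothetical as well.

```python
import math


def elo_expected_score(rating_diff: float) -> float:
    """Standard Elo expected score (win = 1, draw = 0.5, loss = 0)
    for a player rated `rating_diff` points above the opponent."""
    return 1.0 / (1.0 + 10.0 ** (-rating_diff / 400.0))


def win_probability(centipawns: float, scale: float = 0.004) -> float:
    """Illustrative logistic mapping from an engine evaluation to a
    win probability. `scale` is an assumed constant chosen for this
    sketch, not a value taken from any real engine or website."""
    return 1.0 / (1.0 + math.exp(-scale * centipawns))


def probability_loss(before_cp: float, after_cp: float, scale: float = 0.004) -> float:
    """Drop in win probability when the evaluation moves from
    `before_cp` to `after_cp` (both in centipawns)."""
    return win_probability(before_cp, scale) - win_probability(after_cp, scale)


if __name__ == "__main__":
    # Same three-pawn swing, measured in probability rather than pawns.
    print("From +9 to +6:", round(probability_loss(900, 600), 3))
    print("From +1.5 to -1.5:", round(probability_loss(150, -150), 3))
    print("Elo expected score at +200:", round(elo_expected_score(200), 3))
```

Under this assumed curve, the swing from +9 to +6 costs far less win probability than the same three-pawn swing around equality, which is consistent with a review tool that grades moves by probability rather than by centipawns.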

A. Nayar

Another confusing thing about the evaluation of the above position is that, despite Stockfish evaluating white's advantage as "worth" three and a half pawns, if you actually try to give black, say, a bad knight or a bishop or two pawns, the evaluation will immediately swing in black's favor (any light-squared bishop not immediately attacked will put the eval at -6 or so). I've heard tournament commentators for big events say stuff like "the engine thinks white's advantage is worth an extra rook" when the eval bar says +5, even though the evaluations don't work that way at all.
