Introduction to ChatGPT vs. Stockfish, and Game 1

I’ve been spending some time playing around with ChatGPT (the technology is fascinating, and I recommend you give it a try). And then I wondered how well ChatGPT would hold up against a chess engine. So I decided to find out. And now, I’m really curious to see how ChatGPT continues to learn the game of chess and how long it will take until ChatGPT can actually compete against an engine.

The goal of this mini blog is to casually explore the following questions: Will ChatGPT’s chess playing improve over time? And if so, how?

My hypothesis is, yes … it will improve. But I’m quite unsure how it will improve.

Ground Rules

In this not-so-experimental experiment, I’ve decided to establish a few ground rules to help stay consistent across games. They are as follows:

  • ChatGPT will play as white.
  • I will make the first move for ChatGPT with 1. e4 — This will give ChatGPT the advantage of tempo (initial control of the game) and it is well established that 1. e4 is an excellent first move for white (though I have much respect for the 1. d4 players!).
  • I will use the first legal move ChatGPT suggests. In my pilot work, ChatGPT tended to give multiple options to play, so I will simply go with the first one that is legal.
  • If none of the moves ChatGPT suggests are legal, white will resign.
  • If I make a mistake, I will tell you (transparency is good).

It’s possible I may deviate from these rules (or add to them). If so, I’ll be sure to let the reader know.
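For readers who want to script the move-selection rule, here is a minimal sketch. It assumes the legality check comes from somewhere else (a chess library or a hand-checked list); the names pick_move and legal_here are hypothetical and not part of any tool I used.

```python
from typing import Callable, Iterable, Optional


def pick_move(suggestions: Iterable[str],
              is_legal: Callable[[str], bool]) -> Optional[str]:
    """Return the first suggested move that is legal, or None (white resigns)."""
    for move in suggestions:
        if is_legal(move):
            return move
    return None


# Illustration based on move 16 of Game 1, where ChatGPT offered Rxa3, b3, c4.
# legal_here is a hand-written subset of white's legal moves, not the full list.
legal_here = {"c4", "c3", "Nf3", "Rb1"}
chosen = pick_move(["Rxa3", "b3", "c4"], legal_here.__contains__)
print(chosen or "White resigns")
```

Passing the legality test in as a function keeps the rule itself separate from however legality is decided.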

So let’s take a look at my first game using ChatGPT to play against a chess engine.

Game 1 (ChatGPT vs Stockfish Level 8)

This game took place on February 1, 2023. I decided to play against Stockfish Level 8 on lichess.org.

TL;DR: White resigns on move 18.

The game begins in a pretty standard way. 1. e4 c5 2. Nf3 d6 3. d4 cxd4 4. Nxd4 Nf6 5. Nc3 Nc6 6. Be2

Here, after ChatGPT’s (white’s) 6th move, we have the classical variation of the Sicilian Defense. A very standard line of play (Figure 1).

Figure 1

The dubious play begins two moves later, following 6. … g6 7. O-O Bg7. Here, white faces increasing pressure on the d4 square from the knight on c6 and a potential discovered attack from black’s dark-squared bishop (Figure 2). The engine recommends white move its dark-squared bishop to e3 to defend the d4 square. Probably not a bad idea.

Figure 2

But ChatGPT has other ideas and recommends 8. f4 (?) …

After 8. f4 (?) Nxe4, ChatGPT gets a bit skittish and retreats its d4 knight to f3. Then, black brings out the queen, playing 9. … Qb6+ and taking advantage of the open diagonal that white made possible with its f4 blunder (Figure 3).

Figure 3

ChatGPT Starts to Struggle at Move 12

Let’s take a look at the position after 11 moves (Figure 4). 1. e4 c5 2. Nf3 d6 3. d4 cxd4 4. Nxd4 Nf6 5. Nc3 Nc6 6. Be2 g6 7. O-O Bg7 8. f4 Nxe4 9. Nf3 Qb6+ 10. Kh1 Nxc3 11. bxc3 Bxc3

Here, black has a material advantage and the engine evaluates the position at -5.4 (roughly 5.4 pawns in black’s favor). In lay terms: it’s not looking good for white. And black’s dark-squared bishop is looking to make quick work of the rook on a1.

Figure 4
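To put a rough number on the material gap, here is a small sketch that counts piece values from a FEN string. Two assumptions to flag: the FEN is my own reconstruction of the position after 11. … Bxc3 from the move list, and the piece values (1, 3, 3, 5, 9) are the conventional textbook ones, not what the engine uses internally.

```python
# Conventional piece values in pawns (an assumption; engines weigh far more).
PIECE_VALUES = {"p": 1, "n": 3, "b": 3, "r": 5, "q": 9, "k": 0}


def material_balance(fen: str) -> int:
    """White material minus black material, read from the FEN piece placement."""
    placement = fen.split()[0]
    balance = 0
    for ch in placement:
        value = PIECE_VALUES.get(ch.lower())
        if value is None:
            continue  # digits (empty squares) and '/' (rank separators)
        balance += value if ch.isupper() else -value
    return balance


# Position after 11... Bxc3, reconstructed by hand from the move list.
fen_after_11 = "r1b1k2r/pp2pp1p/1qnp2p1/8/5P2/2b2N2/P1P1B1PP/R1BQ1R1K w kq - 0 12"
print(material_balance(fen_after_11))  # -2
```

By this count black is only two pawns ahead, so most of the -5.4 presumably reflects things a raw count misses, such as the bishop bearing down on the a1 rook.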

The engine recommends Bd2, but ChatGPT has other plans:

a common move for White is 12. Ba3, which aims to attack Black’s bishop on c3 and regain control of the d4 square.

Here, ChatGPT’s reasoning seems to be breaking down. Though Bd2 would certainly put pressure on the bishop on c3, moving the bishop to a3 certainly does not. Also… take a look back at Figure 4 and notice how white already has some command of the d4 square (courtesy of the queen and knight).

After 12. Ba3 Bxa1, ChatGPT recommends 13. Qxa1. This move makes total sense. Not only does it capture the bishop, but it also sets white’s queen to threaten black’s h8 rook along the long diagonal. However, the explanation from ChatGPT doesn’t make much sense, noting:

a typical move for white is 13. Qxa1, regaining the queen and material advantage.

Note, black has the material advantage, not white. And I am not sure what queen is being regained?

ChatGPT also suggested 13. Rxa1 as an alternative, noting:

Another option is 13. Rxa1, regaining the rook and material advantage.

13. Rxa1 is an illegal move, since its path is obstructed by white’s queen (Figure 5).

Figure 5
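The obstruction check is simple enough to sketch in a few lines. This is only an illustration of the idea for rooks, bishops, and queens, not a full legality test; squares_between and blockers are hypothetical helper names, and the occupied squares below are white’s back rank after 12. … Bxa1, taken from the move list.

```python
from typing import Dict, List


def squares_between(start: str, end: str) -> List[str]:
    """Squares strictly between two squares on a shared rank, file, or diagonal."""
    f0, r0 = ord(start[0]), int(start[1])
    f1, r1 = ord(end[0]), int(end[1])
    df, dr = f1 - f0, r1 - r0
    same_square = df == 0 and dr == 0
    on_line = df == 0 or dr == 0 or abs(df) == abs(dr)
    if same_square or not on_line:
        raise ValueError("squares are not on a common line")
    step_f = (df > 0) - (df < 0)  # -1, 0, or +1 along the files
    step_r = (dr > 0) - (dr < 0)  # -1, 0, or +1 along the ranks
    steps = max(abs(df), abs(dr))
    return [f"{chr(f0 + i * step_f)}{r0 + i * step_r}" for i in range(1, steps)]


def blockers(start: str, end: str, occupied: Dict[str, str]) -> List[str]:
    """Occupied squares lying between start and end."""
    return [sq for sq in squares_between(start, end) if sq in occupied]


# First rank after 12... Bxa1: black bishop a1, white queen d1, rook f1, king h1.
occupied = {"a1": "b", "d1": "Q", "f1": "R", "h1": "K"}
print(blockers("f1", "a1", occupied))  # ['d1']
```

A non-empty result means the sliding piece cannot reach its target, which is exactly why the f1 rook cannot capture on a1.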

Anyway, white plays Qxa1 and black replies by castling kingside: 13. Qxa1 O-O (Figure 6). White could certainly develop a threat by pairing the dark-squared bishop and queen along the long diagonal. And perhaps white’s remaining knight could get involved as well.

Figure 6

Another option is for white to move its rook to an open file, such as the d file. In some ways, ChatGPT suggests such a move, developing white’s rook. But not in the way you would expect:

a typical move for white is 14. Rad1, which develops a rook, prepares to castle kingside, and puts pressure on Black’s d6 pawn

Two important things come to mind here:
1. First, the rook on a1 was captured by black on move 12. So the move is illegal (it cannot be played).
2. Second, white has already castled kingside. White cannot castle kingside again.

It seems like ChatGPT is really beginning to lose track, despite having the full information about the game up to this point (Figure 7):

Figure 7

As you will note in Figure 7, the second suggestion, 14. f5, is legal. It’s not clear to me how much value there is in marching an undefended pawn up the board, but that’s what ChatGPT suggests.

As it turns out, I did play 14. Rd1 instead of 14. f5 … Oops. I’ll try not to make that mistake again. But at least it’s not a critical blunder as white really is in the hole here.

Following 14. Rd1 Qe3, ChatGPT suggests 15. Qf1. This move is illegal since the queen’s path is blocked by the rook on d1. As a second choice, ChatGPT then recommends 15. Nd2, which, in its own words, “develops the knight and prepares to castle kingside.”

Figure 8 depicts the position after 14 moves. The green arrow highlights ChatGPT’s main recommendation to play Qf1, which is illegal.

Figure 8

Following 15. Nd2 Qxa3, ChatGPT starts to unravel even further:

After the moves 1. e4 c5 2. Nf3 d6 3. d4 cxd4 4. Nxd4 Nf6 5. Nc3 Nc6 6. Be2 g6 7. O-O Bg7 8. f4 Nxe4 9. Nf3 Qb6+ 10. Kh1 Nxc3 11. bxc3 Bxc3 12. Ba3 Bxa1 13. Qxa1 O-O 14. Rd1 Qe3 15. Nd2 Qxa3, a typical move for White is 16. Rxa3, preserving material advantage. Another option is 16. b3, which prepares to develop the bishop and protect the d4 pawn. Another possibility is 16. c4, which aims to open up the position and control more space.

Rxa3 is an illegal move. And there is no pawn on b2 to move to b3. See Figure 9.

Figure 9

BUT! There is a c pawn! So the play continues on with 16. c4 Qe3

ChatGPT’s first suggestion for move 17 is to put the queen on c3 (undefended, might I add). And black promptly responds with Qxc3.

Figure 10 depicts the final position, move 18 for white. ChatGPT suggests 18. Qxc3 (WE JUST DID THAT!) and thinks, for some reason, the b pawn still exists, noting:

After the moves 1. e4 c5 2. Nf3 d6 3. d4 cxd4 4. Nxd4 Nf6 5. Nc3 Nc6 6. Be2 g6 7. O-O Bg7 8. f4 Nxe4 9. Nf3 Qb6+ 10. Kh1 Nxc3 11. bxc3 Bxc3 12. Ba3 Bxa1 13. Qxa1 O-O 14. Rd1 Qe3 15. Nd2 Qxa3 16. c4 Qe3, a typical move for White is 17. Qc3, which defends the c4 pawn and attacks Black’s queen. Another option is 17. Qf1, which defends the f3 knight and aims to preserve material advantage. Another possibility is 17. b3, which prepares to develop the bishop and protect the c4 pawn.

Figure 10

Note that ChatGPT still maintains white has a material advantage, which it certainly does not. Black is absolutely smothering white.

Given that none of the moves ChatGPT suggests are legal, white resigns, per the ground rules.

Full Game and Notation

1. e4 c5 2. Nf3 d6 3. d4 cxd4 4. Nxd4 Nf6 5. Nc3 Nc6 6. Be2 g6 7. O-O Bg7 8. f4 Nxe4 9. Nf3 Qb6+ 10. Kh1 Nxc3 11. bxc3 Bxc3 12. Ba3 Bxa1 13. Qxa1 O-O 14. Rd1 Qe3 15. Nd2 Qxa3 16. c4 Qe3 17. Qc3 Qxc3 { White resigns. } 0-1
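If you want to work with the game programmatically, the movetext above splits cleanly with the standard library. This is a minimal sketch; parse_movetext is a hypothetical helper that only handles simple movetext like this one (move numbers, moves, a brace comment, and a result), not variations or annotation glyphs.

```python
import re
from typing import List

RESULTS = {"1-0", "0-1", "1/2-1/2", "*"}


def parse_movetext(movetext: str) -> List[str]:
    """Split simple PGN movetext into a flat list of moves (plies)."""
    text = re.sub(r"\{[^}]*\}", " ", movetext)  # drop {comments}
    return [
        tok for tok in text.split()
        if tok not in RESULTS and not re.fullmatch(r"\d+\.+", tok)
    ]


game = (
    "1. e4 c5 2. Nf3 d6 3. d4 cxd4 4. Nxd4 Nf6 5. Nc3 Nc6 6. Be2 g6 "
    "7. O-O Bg7 8. f4 Nxe4 9. Nf3 Qb6+ 10. Kh1 Nxc3 11. bxc3 Bxc3 "
    "12. Ba3 Bxa1 13. Qxa1 O-O 14. Rd1 Qe3 15. Nd2 Qxa3 16. c4 Qe3 "
    "17. Qc3 Qxc3 { White resigns. } 0-1"
)

plies = parse_movetext(game)
white_moves, black_moves = plies[0::2], plies[1::2]
print(len(plies))  # 34
print(sum("x" in m for m in white_moves), sum("x" in m for m in black_moves))
```

The second line of output compares how many captures each side made, a crude but telling summary of how the game went.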

Reflection

In some ways, I thought ChatGPT would hold up against Stockfish. But at the same time, I’m not surprised it lost. What did surprise me is how ChatGPT seemed to lose track of the game after move 7. Given that the Sicilian Defense, in particular, is heavily studied, there are a number of lines/variations that exist for ChatGPT to pull from. It’s obviously not there yet.

In the future, I will look to reduce the difficulty of the engine to see if that matters at all. But my guess is it will not (at least in the immediate future), given how ChatGPT suggested moves that were illegal, either due to obstruction from other pieces or because those pieces did not exist (e.g., our good friend the b pawn [RIP]).

I will also try not to make a mistake in carrying out ChatGPT’s recommendation (oops).

If you have any thoughts or suggestions, or notice any errors, do not hesitate to e-mail me to let me know.