
> The 2 + 2 = 5 stuff is just an illustration that LLMs are fairly simple probability models and not, yet, understanding anything.

See for example this research paper[1]. The researchers trained a model on sequences of moves in the game of Othello[2]. The model started with no knowledge of the game and was fed only move sequences (e.g. "c4 c3 d3 e3"). The researchers were then able to look at the model's activations and figure out what it thought the board state was. When they updated those activations to represent a different board state, the model made moves that made sense for the altered board state but not the original one.
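The "read the board state out of the activations" step is essentially a probe: a small classifier trained to predict some property (here, a square's occupancy) from a hidden-state vector. Here's a minimal sketch of that idea on synthetic data — the dimensions, names, and the linearly-encoded setup are illustrative assumptions, not the paper's actual model:

```python
# Toy sketch of activation probing (synthetic data, illustrative only).
import numpy as np

rng = np.random.default_rng(0)
d_model = 64        # hypothetical hidden-state size
n = 500             # number of sampled game positions

# Pretend the model encodes one square's occupancy along a fixed direction
# in activation space, buried under noise from everything else it encodes.
true_direction = rng.normal(size=d_model)
labels = rng.integers(0, 2, size=n)                        # 0 = empty, 1 = occupied
signs = labels * 2 - 1                                     # map {0,1} -> {-1,+1}
acts = np.outer(signs, true_direction) + rng.normal(size=(n, d_model))

# Fit a linear probe by least squares: predict the label from the activation.
w, *_ = np.linalg.lstsq(acts, signs, rcond=None)
preds = (acts @ w > 0).astype(int)
accuracy = (preds == labels).mean()
print(accuracy)
```

If the probe's accuracy is high, the square's state is (linearly) decodable from the activations, which is the evidence the paper builds on. The original paper actually used nonlinear probes; the follow-up post[3] showed linear ones suffice with the right encoding.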

See also this post[3], which demonstrates that not only does the model have an internal representation of the board state, but that representation is remarkably simple. Specifically, for each square on the board there is a direction in activation space corresponding to "my color" vs. "opponent's color" and a direction corresponding to whether that square is blank. Changing the activations along those directions leads to exactly the outputs you would expect.
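The intervention itself is simple linear algebra: to flip the "my color vs. opponent" component of an activation, negate its projection along that direction and leave everything orthogonal to it alone. A toy sketch, with a made-up direction and activation standing in for the real ones recovered by probing:

```python
# Toy sketch of editing an activation along a learned direction
# (direction and activation are synthetic; the real ones come from probes).
import numpy as np

rng = np.random.default_rng(1)
d_model = 64
color_dir = rng.normal(size=d_model)
color_dir /= np.linalg.norm(color_dir)   # unit "mine vs. theirs" direction

act = rng.normal(size=d_model)           # some hidden-state activation

proj = act @ color_dir
edited = act - 2 * proj * color_dir      # reflect: mine <-> theirs

# The component along color_dir flips sign; everything orthogonal is untouched.
print(edited @ color_dir, proj)
```

Patching the edited activation back into the forward pass and seeing the model play legal moves for the *edited* board is what makes this causal evidence, not just correlation.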

Recall that this model has never seen an 8x8 board, just sequences of moves. It derived an accurate model of the board's geometry and rules from that data alone. If that doesn't count as "understanding", I'm not sure what would.

[1] https://arxiv.org/pdf/2210.13382.pdf

[2] https://en.wikipedia.org/wiki/Reversi

[3] https://www.alignmentforum.org/posts/nmxzr2zsjNtjaHh7x/actua...


