I think the authors misunderstand what's actually going on.
I think this is the crux:
>They are vastly more powerful than what you get on an iPhone, but the principle is similar.
This analogy is bad.
It is true that the _training objective_ of LLMs during pretraining is next-token prediction, but that doesn't make 'your phone's autocomplete' a good analogy, because systems can develop capabilities far beyond what their training objective suggests.
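To be concrete about what "next-token prediction" means, here is a minimal sketch of the pretraining loss, assuming a hypothetical `model` that maps token ids to per-position vocabulary logits (not any lab's actual code):

```python
# Illustrative sketch of the next-token prediction objective.
# Assumes `model(ids)` returns logits of shape (batch, seq_len, vocab_size).
import torch
import torch.nn.functional as F

def next_token_loss(model, token_ids):
    # token_ids: (batch, seq_len) integer tensor of a text corpus
    logits = model(token_ids[:, :-1])    # predictions made at positions 0..seq_len-2
    targets = token_ids[:, 1:]           # the actual "next token" at each position
    return F.cross_entropy(
        logits.reshape(-1, logits.size(-1)),  # flatten to (N, vocab)
        targets.reshape(-1),                  # flatten to (N,)
    )
```

The point is that this objective says nothing about what the model must learn internally in order to drive that loss down.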
Humans, optimised to spread their genes, developed far higher-level faculties than you would naively guess from the simplicity of that optimisation objective.
If the behavior of top LLMs hasn't already convinced you of this: they clearly develop far more powerful internal representations than an autocomplete does, and are far more capable.
I would point to work like the Othello-GPT paper, and to the mechanistic interpretability research from Anthropic and others, as very compelling evidence.
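For a sense of what that evidence looks like: the Othello-GPT result is, roughly, that a linear probe trained on the model's hidden activations can recover the board state, even though the model was only ever trained to predict move tokens. Below is a minimal sketch of that kind of probing experiment; the array names, shapes, and the layer the activations come from are all hypothetical, not the paper's actual code:

```python
# Illustrative linear-probe sketch in the spirit of the Othello-GPT experiments.
# `hidden_states` and `board_labels` are assumed inputs, not real data.
import torch
import torch.nn as nn

hidden_dim, n_squares, n_states = 512, 64, 3   # 3 classes per square: empty/black/white (assumption)

probe = nn.Linear(hidden_dim, n_squares * n_states)
opt = torch.optim.Adam(probe.parameters(), lr=1e-3)

def probe_step(hidden_states, board_labels):
    # hidden_states: (batch, hidden_dim) activations taken from some layer of the model
    # board_labels:  (batch, n_squares) integer class per square
    logits = probe(hidden_states).view(-1, n_squares, n_states)
    loss = nn.functional.cross_entropy(
        logits.reshape(-1, n_states), board_labels.reshape(-1)
    )
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()
```

If a simple probe like this recovers the board far above chance, the model has built an internal model of the game, which is exactly the kind of structure the "autocomplete" framing leads you not to expect.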
I think that, contrary to the authors, using words like 'understand' and 'think' for these systems is much more helpful than conceptualising them as autocomplete.
The irony is that many people are themselves autocompleting: from the training objective to the limits of the system, or from generally being right when they call BS on AI to concluding it's right to call BS here.