
> Take a multivariate regression model that predicts blood pressure from demographic data (age, sex, weight, etc). You can train a pretty accurate model for that kind of task if you have enough data (a few thousand data points). Does that model need to "reason" about human behaviour in order to be good at predicting BP? Nope. All it needs is a lot of data. That's how statistics works. So why is it different for a predictive model of BP and different for a next-token prediction model?
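For concreteness, the narrow model the quoted comment describes is roughly this (a minimal sketch with synthetic data and scikit-learn; the features and coefficients are invented purely for illustration):

    import numpy as np
    from sklearn.linear_model import LinearRegression

    # Synthetic demographics: age (years), sex (0/1), weight (kg).
    rng = np.random.default_rng(0)
    n = 5000
    X = np.column_stack([
        rng.uniform(20, 80, n),   # age
        rng.integers(0, 2, n),    # sex
        rng.normal(80, 15, n),    # weight
    ])
    # Fake "ground truth": systolic BP as a noisy linear function.
    y = 90 + 0.5*X[:, 0] + 3*X[:, 1] + 0.3*X[:, 2] + rng.normal(0, 8, n)

    model = LinearRegression().fit(X, y)
    print(model.predict([[55, 1, 85]]))  # predicted systolic BP, mmHg

That model's entire "world" is a mapping from three numbers to one number; the objective never asks it to generalize beyond that.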

For one, because the goal function for the latter is "predict output that makes sense to humans", in the fully broad, fully general sense of that statement.
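Concretely, the literal objective an LLM is trained on is next-token cross-entropy, averaged over essentially all text humans produce. A toy sketch of that loss for a single step (vocabulary and logits invented for illustration):

    import numpy as np

    # Toy vocabulary and the model's raw scores (logits) for the next token.
    vocab = ["the", "cat", "sat", "mat"]
    logits = np.array([1.2, 3.1, 0.3, -0.5])  # hypothetical model output
    target = vocab.index("cat")               # token that actually came next

    # Cross-entropy for this step: -log p(target | context), via softmax.
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    loss = -np.log(probs[target])
    print(loss)

The per-step loss is as simple as the BP model's, but the distribution it's minimized over is every genre, register, and error mode of human text at once, which is where the breadth comes from.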

It's not just one thing, like parse grocery lists, XOR write simple code, XOR write a story, XOR infer sentiment, XOR be a lossy cache for Wikipedia. It's all of them, separately or together, plus much more, plus correctly handling humor, sarcasm, surface-level errors (e.g. typos, misnamed things), implied rules, shorthands, and deep errors (think of a user who is confused and using terminology wrong; LLMs handle that fine), and an uncountable number of other things (because language is special, see below). This is quite obviously a different class of problem than a narrowly specialized model like a BP predictor.

And yes, language is special. Despite Chomsky's protestations to the contrary, it's not really formally structured; grammar, syntax, and vocabulary are merely classifications of high-level patterns that tend to occur (though the invention of print and public education definitely strengthened them). Any experience with learning a language, or actually talking to other people, makes it obvious that grammar and vocabulary are neither necessary nor sufficient for communication. At the same time, once established, the particular choices become another dimension that packs meaning (as becomes apparent when, e.g., pondering why some books or articles read better than others).

Ultimately, language is not a set of easy patterns you can learn (or encode symbolically!) - it's a dance people do when communicating, whose structure is fluid and bound by the reasoning capabilities of humans. Being able to reason this way is required to communicate with real humans in real, general scenarios. Now, this isn't proof that LLMs can do it, but the degree to which they excel at it is at least a strong suggestion that, qualitatively, they can.


