So if I ask ChatGPT about bears, and in the middle of explaining their diet it tells me how much they like porridge, and in the middle of explaining their habitat it tells me they live in a quaint cabin in the woods, that's ... true?
Statistically, we certainly have a lot of words about three bears and their love of porridge. That doesn't mean it's true; it just means it's statistically significant. If I asked someone a scientific question about bears and they told me about Goldilocks, I'd think it was bullshit.
If that were the case, then you'd be right. However, the current crop of LLMs seems to be good at understanding context.
A scientific data point about bears is unlikely to mention Goldilocks (unless it's about the evolution of life and the Goldilocks zone). You can argue that there is meaning hidden in words that isn't captured by the words themselves in a given context: psychic knowledge as opposed to reasoned-out knowledge. That is a philosophical debate.
Words don't carry meaning. Meaning exists in how words are or are not used together with other words. That is, in their... statistical relationships to each other!
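To make that concrete, here's a toy sketch of the distributional idea: count which words co-occur with which, and compare the resulting count vectors. The corpus, window size, and word choices below are all made up purely for illustration; only numpy is assumed.

```python
import numpy as np

# Tiny made-up corpus: a couple of fairy-tale lines, a couple of factual ones.
corpus = [
    "goldilocks tasted the porridge in the bears cottage",
    "the three bears ate porridge from their bowls",
    "wild bears eat berries fish roots and insects",
    "brown bears dig dens in forested mountain habitat",
    "oat porridge is a breakfast cereal made from oats",
]

vocab = sorted({w for line in corpus for w in line.split()})
idx = {w: i for i, w in enumerate(vocab)}

# Symmetric co-occurrence counts within a +/-2 word window.
window = 2
counts = np.zeros((len(vocab), len(vocab)))
for line in corpus:
    words = line.split()
    for i, w in enumerate(words):
        for j in range(max(0, i - window), min(len(words), i + window + 1)):
            if j != i:
                counts[idx[w], idx[words[j]]] += 1

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9)

# Each word's row is now a crude "meaning" vector: nothing but a record of
# what it shows up next to. On a corpus this small the numbers are noisy,
# but the mechanics are what real embeddings scale up.
for other in ["bowls", "oats", "dens", "habitat"]:
    sim = cosine(counts[idx["porridge"]], counts[idx[other]])
    print(f"porridge ~ {other}: {sim:.2f}")
```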
ChatGPT has enough dimensions in its latent space to represent and distinguish between the various meanings of "porridge", and it can be informed by the Goldilocks story without switching into it mid-sentence.
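If you want to poke at the latent-space point yourself, here's a minimal sketch. It mean-pools contextual embeddings from bert-base-uncased (an illustrative stand-in, nothing ChatGPT-specific; it assumes the transformers and torch packages are installed, and the sentences are my own examples) and prints how a fairy-tale sentence about bears sits relative to two more scientific ones.

```python
import torch
from transformers import AutoModel, AutoTokenizer

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")
model.eval()

def embed(sentence: str) -> torch.Tensor:
    """Mean-pool the final hidden states into one sentence vector."""
    enc = tok(sentence, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**enc).last_hidden_state  # (1, seq_len, hidden_dim)
    return hidden.mean(dim=1).squeeze(0)

sentences = {
    "fairy tale": "Goldilocks tasted the three bears' porridge and fell asleep in their cottage.",
    "diet": "Brown bears are omnivores whose diet includes berries, roots, insects and salmon.",
    "habitat": "Brown bears den in forested and mountainous habitat across the northern hemisphere.",
}

cos = torch.nn.functional.cosine_similarity
vecs = {name: embed(text) for name, text in sentences.items()}
for a in sentences:
    for b in sentences:
        if a < b:
            print(f"{a} vs {b}: {cos(vecs[a], vecs[b], dim=0).item():.3f}")
```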
It's actually a good example of what I have in mind when I say human text isn't random. The Goldilocks story may not be scientific, but it's still highly correlated with scientific truth about matters like food, bears, or the daily lives of people. Put yourself in the shoes of an alien trying to make heads or tails of that story, and you'll see just how many things in it are not arbitrary.