
> Like when someone says the only thing stopping LLMs is hallucinations… that is literally the last gap.

What you are missing here is that the "hallucinations" you don't like and the "results" you do like are, in terms of the underlying process, exactly the same thing. They are not an aberration you can remove. Producing these kinds of results without "hallucinations" is going to require fundamentally different techniques. It's not a "last gap".



That's not true. There is a technique; we just need to find it.

Humans have a condition called schizophrenia in which a person is literally incapable of differentiating hallucination from reality. What that differentiating capability is, is something we need to discover, both for ourselves and for LLMs.

For example: mathematically speaking, it's possible to know how far an inferred point is from a cluster of real-world data. That delta, when fed back into the neural network, can let the LLM know how speculative a response is. From there we can feed the response back into itself for refinement.
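To make the "how far is the inferred point from the data cluster" idea concrete, here is a minimal sketch with invented data. The names (`train_emb`, `speculation_score`) and the nearest-neighbor distance metric are assumptions for illustration, not anyone's actual implementation:

```python
import numpy as np

# Hypothetical sketch: score how "speculative" an inferred embedding is
# by its distance to the nearest training embeddings. All data is invented.
rng = np.random.default_rng(0)
train_emb = rng.normal(size=(1000, 64))   # stand-in cluster of real-world data
inside = train_emb.mean(axis=0)           # a point inside the cluster
outside = inside + 10.0                   # a point far from the cluster

def speculation_score(query, bank, k=10):
    # Mean Euclidean distance to the k nearest training embeddings.
    # A larger score means the point is farther from seen data, i.e. more speculative.
    dists = np.linalg.norm(bank - query, axis=1)
    return float(np.sort(dists)[:k].mean())

# The "delta" described above: this score could be fed back as a confidence signal.
assert speculation_score(inside, train_emb) < speculation_score(outside, train_emb)
```

In practice this resembles out-of-distribution detection on embeddings; the open question is wiring such a signal back into generation.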


You're conflating hallucination in humans, which is a purely sensory experience, with "hallucination" in LLMs, which seems to be used to describe every kind of mistake and deficiency an LLM makes.

And even if we were to cure schizophrenia in humans, what makes you think the cure would apply to LLMs? Having an extremely weak conceptual model of the world and being unable to reason through rather simple problems (as LLMs struggle to do) isn't schizophrenia.

This oversimplified explanation which posits that neural networks are just like human brains has truly gone too far now.

> Mathematically speaking it's possible to know how far away an inferenced point is away from a cluster of real world data.

And mathematically speaking, how would you accomplish this? As you probably know, LLMs don't operate on conceptual ideas; they operate on tokens. That's why LLMs tend to fail when asked to do things that aren't well represented in their training data: they don't have a working model of the world, even if they can fake one to a certain degree.


No, there is no conflation. When you hallucinate with schizophrenia, you both know things that are not true and sense things that are not true. The hallucinations involve knowledge as well as the senses.

A weak conceptual model of the world is the problem. But realize that humans also have a weak conceptual model of the world, and produce plenty of hallucinations based on that weak model. For example, many people still claim that LLMs are all stochastic parroting when it's been shown that they're not. That is a hallucination. Or take the people betting for (and against) the financial success of crypto or AI: we don't know how either will pan out, but people on both sides act as if they know definitively. A huge part of human behavior is driven by hallucinations that fill in gaps.

> And mathematically speaking, how would you accomplish this? As you probably know LLMs don't operate on conceptual ideas, they operate on tokens. That's why LLMs tend to fail when asked to do things that aren't well represented in their training data, they don't have a working model of the world even if they can fake it to a certain degree.

It’s not that the LLM has an incorrect model of the world while you have a correct one. Technically, both you and the LLM hold incorrect models, and both of you fake it. The best you can say is that the LLM’s approximation of the world is less accurate than yours, but both of you regularly hallucinate off your imperfect models. You also make up bullshit about things not well represented in your own model.

But like I said, we are sometimes (and sometimes not) aware of our own bullshit, so providing that awareness to the LLM quantitatively will help it too.

The LLM is not trained on random tokens; it’s trained on highly specific groups of tokens, and those groups represent conceptual ideas. So an LLM is absolutely trained on concepts; tokens are only an encoding of those concepts.

If a group of tokens is represented by a vector, then we can certainly calculate the distance between vectors. We also know that different kinds of vectors are represented at each layer of the feed-forward network, encoding reasoning and not just the syntactic order of the tokens.

There is literally not much training data of a human giving instructions to someone to write code along with the associated code diff. The fact that an LLM can do this to a usable degree without volumes of similar training data speaks to the fact that it knows concepts. This is the same tired argument that has been shown wrong: we already know LLMs aren’t just parroting training data, because most of the agentic coding operations we currently use LLMs for have no associated training data to copy.

Given that we know all of these embeddings from the training data (the model had to calculate the embeddings at some point), we can encode proximity into the model via vector arithmetic, and from that extract a number that measures the distance between vector embeddings.
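"A number that measures distance between vector embeddings" can be as simple as cosine distance. A toy sketch with made-up three-dimensional vectors (real embedding spaces are much larger):

```python
import numpy as np

def cosine_distance(u, v):
    # 1 - cosine similarity: 0 for same direction, 1 for orthogonal, 2 for opposite.
    return 1.0 - float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

a = np.array([1.0, 0.0, 0.0])
b = np.array([1.0, 0.0, 0.0])
c = np.array([0.0, 1.0, 0.0])
assert abs(cosine_distance(a, b)) < 1e-9        # identical direction -> distance 0
assert abs(cosine_distance(a, c) - 1.0) < 1e-9  # orthogonal -> distance 1
```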

Imagine a best-fit 2D curve through a scatter plot of data points, where the curve carries a gradient color along it: red indicates it's very close to existing data points, blue indicates it's far. We can definitely derive an algorithm that computes this additional "self-awareness" dimension, here encoded in color, and the idea can extend to the higher-dimensional encoding that is the LLM.
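The red/blue picture above can be sketched in a few lines. Everything here is invented for illustration: a curve fit to data that only covers part of the axis, plus a crude density proxy (`support`) that plays the role of the color:

```python
import numpy as np

# Hypothetical illustration of the red/blue coloring: observations only cover
# x in [0, 5], so curve positions without nearby data are "blue" (speculative).
rng = np.random.default_rng(1)
x = rng.uniform(0.0, 5.0, size=200)
y = np.sin(x) + rng.normal(scale=0.1, size=x.size)
coeffs = np.polyfit(x, y, deg=5)   # best-fit curve through the scatter

def support(x0, xs, radius=0.25):
    # Count of observed points within `radius` of x0: a crude density proxy.
    # High support = "red" (interpolation); zero support = "blue" (extrapolation).
    return int(np.sum(np.abs(xs - x0) < radius))

assert support(2.5, x) > 0    # mid-range: the curve is "red" here
assert support(9.0, x) == 0   # outside [0, 5]: the curve is "blue" here
```

The curve happily produces a value at x = 9 via `np.polyval(coeffs, 9.0)`; only the support count reveals that nothing backs it up.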

If an LLM is aware that an output is red or blue, then it can tell that a blue output is likely to be a hallucination.


> It’s not an incorrect model of the world as technically both you and an LLM ultimately have an incorrect model of the world and both you and the LLM fake it.

I should've said that the model is "missing", not "weak", when talking about LLMs; that was my mistake. Yes, I'm a human with an imperfect and in many respects incorrect conceptual model of the world. The following aren't real examples; they're hyperbolic to better illustrate the category of errors I'm talking about.

If someone asks me "can I stare into the sun without eye protection", my answer isn't going to change based on how the question is phrased because I conceptually understand that the radiation coming from the sun (and more broadly, intense visible radiation emitted from any source) causes irreversible damage to your eyes, which is a fact stored in my conceptual understanding of the world.

However, LLMs will flip-flop based on the tone and phrasing of your question. Asked normally, they will warn you about the dangers of staring into the sun, but if your question hints at disbelief, they might reply "No, you're right, staring into the sun isn't that bad".

I also know that mirrors reflect light, which allows me to intuitively understand that staring at the sun through a mirror is dangerous without being explicitly taught that fact.

If you ask an LLM whether staring into a mirror which is pointed at the sun (oriented such that you see the sun through the mirror) is safe, they might agree that it's safe to do so, even though they "know" that staring into the sun is dangerous, and they "know" that mirrors reflect light. Presumably this is because their training data doesn't explicitly state that staring at a mirror is dangerous.

The way the question is framed can completely change their answer, which betrays their lack of conceptual understanding. Those are distinctly different problems. You might say that humans do this too, but we don't call that intelligent behavior, and we tend to have a low opinion of those who exhibit it often.


No, it doesn’t. The conceptual understanding is there, but the LLM is not obligated to be correct. The fact that at one point it gave you the correct answer indicates that some aspect of it understands the concept.

Say I told it to solve a complex puzzle equation not in its training data, and it correctly solved that problem. From the low probability of arriving at that solution by random chance, we know the LLM must understand and reason to arrive at it.

Now you’re saying that if you perturb the input with some grammar changes but leave everything else the same, the LLM will produce a wrong answer. But this doesn’t change the fact that it was able to get the right answer.

Humans can be dumb and inconsistent. LLMs can be dumb and inconsistent too; this happens to be a quirk of the LLM. But you cannot deny that it is intelligent, for the sole reason that LLMs can produce output we know can only be arrived at through reasoning.


> The fact that at one point it gave you the correct answer is indicative that an aspect of it understands the concept.

Having a conceptual understanding means that you always provide the same answer to a conceptually equivalent question. Producing the wrong answer when a question is rephrased is indicative of rote memorization.

The fact that it provided the right answer at one point is only indicative of memorization, not understanding, and that is precisely the difference between sometimes getting it right and always getting it right.


>Having a conceptual understanding means that you always provide the same answer to a conceptually equivalent question. Producing the wrong answer when a question is rephrased is indicative of rote memorization.

False. I can lie, right? I can shift. I don't need to be consistent, and I don't need to consistently understand something. I can understand something right now and suddenly not understand it later. This FITS the definition of understanding a concept.

But if I give an answer that has a very low probability of being correct by chance, and the answer is correct, then it almost certainly was not arrived at by random chance. And if the answer wasn't arrived at by random chance, it must have come from reasoning AND understanding.

The logic is inescapable.


> I can understand something right now and suddenly not understand later. This FITS the definition of understanding a concept.

Not any definition that I would agree with, that's for sure.


You must agree with it. The fact that I can formulate a coherent sentence with it indicates that it fits the colloquial definition of the word. Every human recognizes it, even you; you’re just being stubborn.

When I say I can understand something now and then not understand it later, that doesn’t violate the definition of the word. Now you’re claiming that your personal definition of understanding is violated, but that’s also a lie. It’s highly unlikely.

First of all: death. I understand something now; then I die, and I don’t understand it later due to loss of consciousness.

Amnesia: I understand something now and don’t understand it later due to loss of memory.

In both cases someone understood something now and didn’t later. Every human understands this conceptually. Don’t lie to my face and say you don’t agree with the definition. This is fundamental.

The act of understanding something now and then not understanding it later exists not only as some virtual construct of human language but in REALITY.

What happened here is that when I pointed out the nuances of the logic, you were too stubborn to reformulate your conclusion. It’s typical human behavior: instead, you unconsciously re-scaffold the rationale to fit your pre-existing idea.

If you’re capable of thinking deeper, you’ll see that what I’m in essence talking about is this:

In the gap between prompt and response, the LLM is capable of understanding the prompt and reasoning about it. It does so on an ephemeral, momentary basis. We can’t control when it will do so, and that’s the major issue. But it does so often enough that we know the LLM has reasoning capabilities, however rudimentary and inconsistent, because the answers it arrives at from the prompt are too low-probability to be reached by any means OTHER than reasoning.



