
No. They're saying that the model they analyzed used mainly information on _how_ to solve math problems from its training data, rather than documents that contained the answers to the (identical) math problems:

> "We investigate which data influence the model’s produced reasoning traces and how those data relate to the specific problems being addressed. Are models simply ‘retrieving’ answers from previously seen pretraining data and reassembling them, or are they employing a more robust strategy for generalisation?"

> "When we characterise the top ranked documents for the reasoning questions qualitatively, we confirm that the influential documents often contain procedural knowledge, like demonstrating how to obtain a solution using formulae or code. Our findings indicate that the approach to reasoning the models use is unlike retrieval, and more like a generalisable strategy that synthesises procedural knowledge from documents doing a similar form of reasoning."

Example reasoning question:

> "Prompt: Calculate the answer: (7 - 4) * 7. Think step-by-step."
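
For comparison, here is a minimal sketch (my own illustration, not from the paper) of what a procedural, step-by-step answer to that prompt looks like, as opposed to retrieving "21" as a memorized fact:

    # Hypothetical illustration: evaluate "(7 - 4) * 7" procedurally,
    # one step at a time, rather than looking up a stored answer.
    step1 = 7 - 4                       # parenthesised subtraction first -> 3
    step2 = step1 * 7                   # multiply the intermediate result -> 21
    print(f"(7 - 4) * 7 = {step2}")     # prints: (7 - 4) * 7 = 21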



What I further got from this is that the models are learning the methods, but not evaluating themselves along the way. They don’t check for errors.

So once they go down a path they can’t properly backtrack.

This matches what I’ve experienced with LLMs to date.


I’ll add that when I say “learning” I mean memorization: memorizing at a higher level than facts.

I would love to spend the time to see how altering the query alters the reasoning path. How firmly does the model stick to a path once it’s chosen?

A high-level approach has the potential to be very compute-efficient.


> Memorizing on a higher level than facts

Which is not memorization, since memorization is defined by its limits: storing information based on its literal form, as opposed to some higher meaning.

It is called generalization. Learning specific examples with a shared memory too small to memorize all of them creates a gradient toward a more compact storage form: patterns. These, unlike memorized examples, can generate reasonable guesses for similar but previously unencountered problems.
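
A toy contrast in code (my own sketch, not from the paper): a memorized lookup table only covers the exact problems it has stored, while a compact procedure covers every instance of the pattern, including unseen ones.

    # Hypothetical sketch contrasting memorization with generalization.

    # Memorization: literal storage of specific (problem, answer) pairs.
    memorized = {"(7 - 4) * 7": 21, "(5 - 2) * 3": 9}

    def answer_by_memorization(problem):
        return memorized.get(problem)       # None for anything not stored

    # Generalization: one compact procedure for the whole pattern "(a - b) * c".
    def answer_by_procedure(a, b, c):
        return (a - b) * c                  # works for any a, b, c

    print(answer_by_memorization("(7 - 4) * 7"))   # 21 (seen before)
    print(answer_by_memorization("(9 - 1) * 2"))   # None (never stored)
    print(answer_by_procedure(9, 1, 2))            # 16 (covered by the pattern)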

Generalization does not require reasoning, nor is it required for reasoning. But they often complement each other.

Here, reasoning usually means some kind of flexible application of multiple steps: a sequence of steps, trying alternative steps, stepping forward to a solution, stepping back from the goal, accumulating ever larger solved subsets or substeps of the problem, etc.


> So once they go down a path they can’t properly backtrack.

That's what the specific training in o1 / r1 / qwq is addressing. The model outputs things like "I need to ... > thought 1 > ... > wait, that's wrong > I need to go back > thought 2 > ...", etc.
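
Roughly, the behaviour being trained for looks like a generate-check-backtrack loop. A toy sketch (my own, not the actual o1 / r1 / qwq mechanism; propose_step and verify are hypothetical stand-ins):

    import random

    def propose_step(problem):
        # Stand-in for the model proposing a candidate answer.
        a, b = problem
        return random.choice([a * b, a + b])

    def verify(problem, candidate):
        # Stand-in for an explicit self-check of the candidate.
        a, b = problem
        return candidate == a * b

    problem = (3, 7)                    # "compute 3 * 7"
    for attempt in range(5):
        candidate = propose_step(problem)
        if verify(problem, candidate):
            print(f"answer: {candidate}")                  # accept and stop
            break
        print(f"wait, {candidate} is wrong - backing up")  # backtrack, try again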



