
No. They're saying that the model they analyzed used mainly information on _how_ to solve math problems from its training data, rather than documents that contained the answers to the (identical) math problems:

> "We investigate which data influence the model’s produced reasoning traces and how those data relate to the specific problems being addressed. Are models simply ‘retrieving’ answers from previously seen pretraining data and reassembling them, or are they employing a more robust strategy for generalisation?"

> "When we characterise the top ranked documents for the reasoning questions qualitatively, we confirm that the influential documents often contain procedural knowledge, like demonstrating how to obtain a solution using formulae or code. Our findings indicate that the approach to reasoning the models use is unlike retrieval, and more like a generalisable strategy that synthesises procedural knowledge from documents doing a similar form of reasoning."

Example reasoning question:

> "Prompt: Calculate the answer: (7 - 4) * 7. Think step-by-step."
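
For comparison, here is a minimal sketch (my own illustration, not from the paper) of what a procedural, step-by-step answer to that prompt looks like, as opposed to retrieving "21" as a memorized fact:

    # Hypothetical illustration: evaluate "(7 - 4) * 7" procedurally,
    # one step at a time, rather than looking up a stored answer.
    step1 = 7 - 4                       # parenthesised subtraction first -> 3
    step2 = step1 * 7                   # multiply the intermediate result -> 21
    print(f"(7 - 4) * 7 = {step2}")     # prints: (7 - 4) * 7 = 21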



What I further got from this is that the models are learning the methods, but not evaluating themselves along the way. They don’t check for errors.

So once they go down a path they can’t properly backtrack.

This matches what I’ve experienced with LLMs to date.


I’ll add that when I say “learning” I mean memorization: memorizing at a higher level than facts.

I would love to spend the time to see how altering the query alters the reasoning path. How firmly does the model stick to a path once it’s chosen?

A high-level approach has the potential to be very compute-efficient.


> Memorizing on a higher level than facts

Which is not memorization, since memorization is defined by its limits: storing information based on its literal form, as opposed to some higher meaning.

It is called generalization. Learning specific examples with a shared memory too small to memorize all of them creates a gradient toward a more compact storage form: patterns. These, unlike memorized examples, can generate reasonable guesses for similar but previously unencountered problems.
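
A toy contrast in code (my own sketch, not from the paper): a memorized lookup table only covers the exact problems it has stored, while a compact procedure covers every instance of the pattern, including unseen ones.

    # Hypothetical sketch contrasting memorization with generalization.

    # Memorization: literal storage of specific (problem, answer) pairs.
    memorized = {"(7 - 4) * 7": 21, "(5 - 2) * 3": 9}

    def answer_by_memorization(problem):
        return memorized.get(problem)       # None for anything not stored

    # Generalization: one compact procedure for the whole pattern "(a - b) * c".
    def answer_by_procedure(a, b, c):
        return (a - b) * c                  # works for any a, b, c

    print(answer_by_memorization("(7 - 4) * 7"))   # 21 (seen before)
    print(answer_by_memorization("(9 - 1) * 2"))   # None (never stored)
    print(answer_by_procedure(9, 1, 2))            # 16 (covered by the pattern)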

Generalization does not require reasoning, nor is it required for reasoning. But they often complement each other.

Here, reasoning usually means some kind of flexible application of multiple steps: a sequence of steps, trying alternative steps, stepping forward to a solution, stepping back from the goal, accumulating ever larger solved subsets or substeps of the problem, etc.


> So once they go down a path they can’t properly backtrack.

That's what the specific training in o1 / r1 / qwq is addressing. The model outputs things like "I need to ... > thought 1 > ... > wait, that's wrong > I need to go back > thought 2 > ...", etc.
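
Roughly, the behaviour being trained for looks like a generate-check-backtrack loop. A toy sketch (my own, not the actual o1 / r1 / qwq mechanism; propose_step and verify are hypothetical stand-ins):

    import random

    def propose_step(problem):
        # Stand-in for the model proposing a candidate answer.
        a, b = problem
        return random.choice([a * b, a + b])

    def verify(problem, candidate):
        # Stand-in for an explicit self-check of the candidate.
        a, b = problem
        return candidate == a * b

    problem = (3, 7)                    # "compute 3 * 7"
    for attempt in range(5):
        candidate = propose_step(problem)
        if verify(problem, candidate):
            print(f"answer: {candidate}")                  # accept and stop
            break
        print(f"wait, {candidate} is wrong - backing up")  # backtrack, try again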



