In my experience with deepseek and o1, openai's big talk about (and investment i...

flir · 2025-01-28T14:26:09 1738074369

My experience gels with yours. Given the same code sample, DeepSeek has better, more creative suggestions about how to improve it, but it can't implement them without breaking the code. o1, generally, can implement DeepSeek's suggestions successfully. I think chaining them together might have quite interesting results.

wordpad25 · 2025-01-28T15:13:30 1738077210

Is there a tool that can automate chaining like that?

throwup238 · 2025-01-28T15:18:50 1738077530

Aider has an architect mode where it asks one model to plan out the changes and another to actually write the code.
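The architect/editor split described above can be sketched as a simple two-stage chain. This is a minimal illustration, not Aider's implementation or prompts: a "model" here is assumed to be any function from prompt text to response text, and the prompt wording and the toy `fake_architect` / `fake_editor` stand-ins are hypothetical.

```python
from typing import Callable

# Assumption: a "model" is any callable mapping a prompt string to a response string.
# In practice this would wrap a real API client; here it is left abstract.
Model = Callable[[str], str]


def architect_then_edit(architect: Model, editor: Model, code: str, request: str) -> str:
    """Two-stage chain: the architect plans the change, the editor writes the code."""
    # Stage 1: ask the planning model for a description of the change only.
    plan_prompt = (
        "Describe, step by step, how to change the code below to satisfy the request. "
        "Do not write the final code.\n\n"
        f"Request: {request}\n\nCode:\n{code}"
    )
    plan = architect(plan_prompt)

    # Stage 2: hand the plan plus the original code to the editing model.
    edit_prompt = (
        "Apply the following plan to the code and return only the full updated code.\n\n"
        f"Plan:\n{plan}\n\nCode:\n{code}"
    )
    return editor(edit_prompt)


# Hypothetical stand-ins so the chain can run without any API.
def fake_architect(prompt: str) -> str:
    return "Rename function foo to bar."


def fake_editor(prompt: str) -> str:
    # Toy "editor" that applies this one plan by string replacement.
    code = prompt.split("Code:\n", 1)[1]
    return code.replace("def foo", "def bar")


result = architect_then_edit(fake_architect, fake_editor, "def foo():\n    return 1\n", "rename foo")
print(result)
```

The design choice is that the editor never sees the original request, only the plan, which is one way to let a stronger "idea" model drive a more reliable "implementation" model.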

manmal · 2025-01-28T22:33:24 1738103604

I've used it today, with R1 as architect and Sonnet as editor model. So far, this works great. There's no need to use a reasoning model as editor IMO.

Alex (https://alexcodes.app) also does this now btw.

HarHarVeryFunny · 2025-01-28T15:30:22 1738078222

That's OK if all you want to know is which model you should use today, but a test like that is totally dependent on training data, and there is no reason to expect that either DeepSeek-V3 (the base model for R1) or the additional training data for R1 is the same as what OpenAI used for o1 and whatever base model it was built on.

The benchmark comparisons are perhaps, for now, the best way to compare the reasoning prowess of R1 vs o1, since it seems pretty certain both were trained for those cases.

I think the real significance of R1 isn't the released model/weights themselves, but rather the paper detailing (sans training data) how to replicate it, and how effective "distillation" (i.e. generating synthetic reasoning data for SFT) can be at enhancing reasoning even without using RL.
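As a rough illustration of "generate synthetic reasoning data for SFT", here is a minimal sketch, assuming a hypothetical teacher callable that returns a reasoning trace and a final answer. The correctness filter against a reference answer, the `<think>` wrapper, and the `prompt`/`completion` record layout are illustrative assumptions, not details taken from the comment above.

```python
import json
from typing import Callable, Iterable

# Hypothetical teacher: maps a question to (reasoning_trace, final_answer).
Teacher = Callable[[str], tuple[str, str]]


def build_sft_records(teacher: Teacher, problems: Iterable[tuple[str, str]]) -> list[dict]:
    """Generate synthetic reasoning examples, keeping only those whose answer matches the reference."""
    records = []
    for question, reference in problems:
        reasoning, answer = teacher(question)
        if answer.strip() != reference.strip():
            continue  # assumption: discard traces that reach the wrong answer
        records.append({
            "prompt": question,
            # Assumption: reasoning is wrapped in <think> tags before the final answer.
            "completion": f"<think>{reasoning}</think>\n{answer}",
        })
    return records


def to_jsonl(records: list[dict]) -> str:
    """Serialize records as JSON Lines, a common input format for SFT tooling."""
    return "\n".join(json.dumps(r) for r in records)


# Toy teacher that "reasons" about simple addition problems.
def toy_teacher(question: str) -> tuple[str, str]:
    a, b = map(int, question.split("+"))
    return f"{a} plus {b} is {a + b}.", str(a + b)


# The second reference answer is deliberately wrong, so that example is filtered out.
recs = build_sft_records(toy_teacher, [("2+3", "5"), ("4+4", "9")])
print(to_jsonl(recs))
```

The resulting records would then be used as ordinary supervised fine-tuning data for a smaller student model; no RL step is involved in this part of the pipeline.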