But given the option, do you choose bigger models or more reasoning? Or medium o...

paladin314159 · 2025-08-07T19:19:24 1754594364

If you need world knowledge, then bigger models. If you need problem-solving, then more reasoning.

But the specific nuance of picking nano/mini/main and minimal/low/medium/high comes down to experimentation and what your cost/latency constraints are.

impossiblefork · 2025-08-07T19:14:58 1754594098

I would have to get experience with them. I mostly use Mistral, so I have only the choice of thinking or not thinking.

gunalx · 2025-08-07T20:46:38 1754599598

Mistral also has small, medium, and large, with both small and medium having a thinking variant, plus Devstral, Codestral, and so on.

Not really that much simpler.

impossiblefork · 2025-08-07T20:49:19 1754599759

Ah, but I never route to these manually. I only use LLMs a little bit, mostly to try to see what they can't do.

namibj · 2025-08-07T19:05:49 1754593549

Depends on what you're doing.

addaon · 2025-08-07T19:08:09 1754593689

> Depends on what you're doing.

Trying to get an accurate answer (best correlated with objective truth) on a topic I don't already know the answer to (or why would I ask?). This is, to me, the challenge with the "it depends, tune it" answers that always come up in how to use these tools -- it requires the tools to not be useful for you (because there's already a solution) to be able to do the tuning.

wongarsu · 2025-08-07T20:06:10 1754597170

If cost is no concern (as in infrequent one-off tasks) then you can always go with the biggest model with the most reasoning. Maybe compare it with the biggest model with no/less reasoning, since sometimes reasoning can hurt (just as with humans overthinking something).

If you have a task you do frequently, you need some kind of benchmark. If you don't know the ground truth, that might just mean comparing how well the output of the smaller models holds up against the output of the bigger model.
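A rough sketch of that kind of comparison, assuming you have already collected each model's answers to the same list of prompts. The model names, the sample answers, and the exact-match scoring are all made up for illustration; real tasks would likely need a fuzzier comparison than normalized string equality.

```python
def normalize(answer: str) -> str:
    """Lowercase and collapse whitespace so trivial formatting differences don't count."""
    return " ".join(answer.lower().split())


def agreement_rate(candidate: list[str], reference: list[str]) -> float:
    """Fraction of prompts where the candidate's answer matches the reference's answer."""
    if len(candidate) != len(reference):
        raise ValueError("both models must answer the same prompts")
    if not reference:
        return 0.0
    matches = sum(normalize(c) == normalize(r) for c, r in zip(candidate, reference))
    return matches / len(reference)


def rank_by_agreement(outputs: dict[str, list[str]], reference_model: str) -> list[tuple[str, float]]:
    """Score every other model against the reference model, best agreement first."""
    reference = outputs[reference_model]
    scores = {
        name: agreement_rate(answers, reference)
        for name, answers in outputs.items()
        if name != reference_model
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical answers to the same four prompts, one list per model.
outputs = {
    "big-high": ["Paris", "42", "O(n log n)", "yes"],
    "mini-medium": ["paris", "42", "O(n log n)", "no"],
    "nano-minimal": ["Paris", "41", "O(n^2)", "no"],
}

ranking = rank_by_agreement(outputs, "big-high")
for name, score in ranking:
    print(f"{name}: {score:.2f}")
```

This only measures agreement with the biggest model, not correctness, so it inherits whatever the big model gets wrong.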

Breza · 2025-08-15T03:51:05 1755229865

I agree. Public benchmarks aren't very useful for a bunch of reasons. Any company relying on LLMs for a critical function should have its own internal benchmark system. I maintain such a system for my job. If you are able, use the same prompt every time. It's fun to be able to include models like the original Bard on our leaderboard.