
>with tool use

An LLM with tool use can solve anything. It is interesting to try to measure its capabilities without tools.



I don't think the first is true at all, unless you imagine some powerful oracle tools.

I think the second is interesting for comparing models, but not interesting for determining the limits of what models can automate in practice.

It's the prospect of automating labour which makes AI exciting and revolutionary, not its ability when arbitrarily restricted.


Isn't the point of automating labour, though, to automate what has not already been automated?

An LLM would draw on many previously written examples of the algorithm to write the code for solving Hanoi. To solve a novel problem with tool use, you need to work sequentially while staying on task, notice where you've gone wrong, and backtrack.
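For context, the Hanoi code in question is the kind of thing that appears in countless tutorials. A minimal Python sketch of the textbook recursive solver follows; the function name `hanoi` and the peg labels are illustrative choices, not anything taken from the paper.

```python
def hanoi(n, source="A", target="C", spare="B"):
    """Return the list of (from_peg, to_peg) moves that shifts n disks
    from source to target, using spare as the intermediate peg."""
    if n == 0:
        return []
    return (
        hanoi(n - 1, source, spare, target)    # park the n-1 smaller disks on spare
        + [(source, target)]                   # move the largest disk
        + hanoi(n - 1, spare, target, source)  # bring the n-1 disks onto it
    )


if __name__ == "__main__":
    for move in hanoi(3):
        print(move)
```

The move count grows as 2**n - 1, so the program stays tiny even when the written-out solution does not.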

I don't want to overstate the case here. I'm sure there is work where the overlap with existing material in the dataset is large enough, and the number of sequential steps required is small enough, that useful work can be done. But I don't know how much you've tried using this stuff as a labour-saving device: there's less low-hanging fruit than one might think, though more than zero.


There are decent labour savings to be had in code generation, but only under strict guidance with examples.

There are more substantial savings to be had in research scenarios. The AI can read and synthesize more, and faster, than I can on my own, and provide references for checking correctness.

I'm not confident enough to say that the approaches being taken now have a hard stopping point any time soon or are inherently bound to a certain complexity.

Human minds can only cope with a certain complexity too and need abstraction to chunk details into atomic units following simpler rules. Yet we've come a long way with our limited ability to cope with complexity.


Search is already a pretty powerful oracle, one that defers the answer to a human, and it is a tool most AI systems use today.

What current models can automate is not what the paper was trying to answer.


What current models can automate is why they are exciting, and the paper is getting attention because of how it cuts into this excitement. It follows logically that the attention is somewhat misplaced.



