Isn't the point of automating labour, though, to automate that which wasn't already automated?
It would draw on many previously written examples of algorithms to write the code for solving Hanoi. To solve a novel problem with tool use, though, it needs to work sequentially while staying on task, notice where it has gone wrong, and backtrack.
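For concreteness, the kind of thing it would be reproducing is the textbook recursive solution, roughly like this (a minimal Python sketch, not anything from the paper):

    def hanoi(n, source, target, spare, moves=None):
        # Textbook recursion: move n-1 discs out of the way,
        # move the largest disc, then stack the n-1 discs back on top of it.
        if moves is None:
            moves = []
        if n > 0:
            hanoi(n - 1, source, spare, target, moves)
            moves.append((source, target))
            hanoi(n - 1, spare, target, source, moves)
        return moves

    print(len(hanoi(10, "A", "C", "B")))  # 1023 moves, i.e. 2**10 - 1

Generating that is pattern completion; the sequential, self-correcting work is everything around it.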
I don't want to overstate the case here. I'm sure there is work where there's enough overlap with previously existing material in the dataset, and few enough sequential steps required, that useful work can be done. But I don't know how much you've tried using this stuff as a labour-saving device; there's less low-hanging fruit than one might think, though more than zero.
There are decent labour savings to be had in code generation, but only under strict guidance with examples.
There are more substantial savings to be had in research scenarios. The AI can read and synthesize more, and faster, than I can on my own, and it provides references for checking correctness.
I'm not confident enough to say that the approaches being taken now have a hard stopping point any time soon, or that they are inherently bounded at a certain level of complexity.
Human minds can only cope with so much complexity too, and need abstraction to chunk details into atomic units that follow simpler rules. Yet we've come a long way with that limited capacity.
What current models can automate is why they are exciting, and the attention the paper is getting comes from how it cuts into that excitement. Since that excitement rests on what the models do with tools, and the paper measures them without, it follows that the attention is somewhat misplaced.
An LLM with tool use can solve anything; it is interesting to try to measure its capabilities without tools.