> LLMs are perfectly capable of writing code to solve problems that are not in their training set.
Examples of these problems? You'll probably find that they're simply compositions of things already in the training set. For example, you might think that "here's a class containing an ID field and foobar field. Make a linked list class that stores inserted items in reverse foobar order with the ID field breaking ties" is something "not in" the training set, but it's really just a composition of the "make a linked list class" and "sort these things based on a field" problems.
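To make the composition point concrete, here's a minimal sketch of what that "novel" problem decomposes into; the `Item`, `id`, and `foobar` names, and the choice of ascending-ID tie-breaking, are assumptions invented for the example, not anything from a real training set.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Item:
    id: int      # tie-breaker
    foobar: int  # primary sort key (descending)

class Node:
    def __init__(self, item: Item):
        self.item = item
        self.next: Optional["Node"] = None

class ReverseFoobarList:
    """Linked list that keeps items in descending foobar order,
    breaking ties by ascending ID (an assumed tie-break direction)."""

    def __init__(self) -> None:
        self.head: Optional[Node] = None

    def insert(self, item: Item) -> None:
        node = Node(item)
        # Standard sorted-insert walk: stop at the first node the new
        # item should precede (smaller foobar, or equal foobar with a
        # larger ID). This is just "linked list" + "sort by a field".
        prev, cur = None, self.head
        while cur is not None and (
            cur.item.foobar > item.foobar
            or (cur.item.foobar == item.foobar and cur.item.id < item.id)
        ):
            prev, cur = cur, cur.next
        node.next = cur
        if prev is None:
            self.head = node
        else:
            prev.next = node

    def __iter__(self):
        cur = self.head
        while cur is not None:
            yield cur.item
            cur = cur.next

# Example: items come out as (2, 9), (1, 5), (3, 5)
lst = ReverseFoobarList()
for it in [Item(1, 5), Item(2, 9), Item(3, 5)]:
    lst.insert(it)
print([(i.id, i.foobar) for i in lst])
```

Nothing in it goes beyond the two stock sub-problems glued together, which is the point.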
>> Ie: a human with no experience of board games cannot reason about chess moves. A human with no math knowledge cannot reason about math problems.
Then how did the first humans solve math and chess problems, if there were no solved examples around to show them how in the first place?
Incrementally, in tiny steps, including a lot of doing first and only later realizing that what they did was relevant to some chess/math thing.
Also the idea of "problems" like "chess problems" and "math problems" is itself constructed. Chess wasn't created by stacking together enough "chess problems" until they turned into a game - it was invented and tuned as a game long before someone thought of distilling "problems" from it in order to aid learning the game; from there, it also spilled out into the space of logical puzzles in general.
This is true of every skill, too. You first have people who master something by experience, and then you have others who try to distill elements of that skill into "problems" or "exercise regimes" or such, in order to help others reach mastery quicker. "Problems" never come first.
Also: most "problems" are constructed around a known solution. So another answer to "how did the first humans solve them" is simply that one human back-constructed a problem around a solution, then gave it to a friend to solve. The problem couldn't be too hard either, as it's no fun to fail to solve it or to need too many hints. Hence, tiny increments.
The problem with this is that anything presented can be claimed to be in the training set, which is likely a zettabyte in size if not larger. However, the counterfactual, the LLM failing at a problem that is provably in its training set (there are many such cases), seems to carry no weight.