That's also something I often encounter with ChatGPT. It's often very wrong about the things I ask, because I tend to wonder about and try things off the beaten path. That's our advantage over LLMs.
No, the advantage is that you have context. That context is HUGE. That's why you can't, and won't be able to, have ChatGPT actually do your job for you. You still need to ask it questions. And to even know what questions to ask, you need to have context.
You don't just need context; you need to fine-tune a model around that context, since working memory isn't enough. That's what human workers do: they spend their first months fine-tuning their model around the company context before they can start to provide valuable code.
Once you can pay to get your own fine-tuned version of the big LLMs, maybe we can start to do real work with these.
However, my point is that having the context and the ability to use it is sooooooo far out of reach for a computer it's unfathomable. In much the same way that life is unfathomable at some level. It's this self-referencing infinity that somehow collapses into results while being uncollapsible.
Computers can't do that, not without something fundamentally different.
Purely probabilistically, trying things off the beaten path is just a matter of higher LLM temperature. Turning up GPT-4's temperature today is basically an expensive /dev/urandom pipe, but I don't see any fundamental reason why LLMs can't catch up. Maybe all it takes is tinkering with how temperature is applied.
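For the bystanders: temperature in LLM sampling is usually just a divisor on the logits before the softmax. Here's a minimal sketch of the standard trick (the function name and NumPy usage are mine, not any particular model's API) — low temperature concentrates mass on the top token, high temperature flattens the distribution toward uniform noise:

```python
import numpy as np

def sample_with_temperature(logits, temperature=1.0, rng=None):
    """Sample a token index from logits scaled by temperature.

    temperature -> 0 approaches greedy argmax; higher values flatten
    the distribution, so unlikely tokens get picked more often.
    """
    rng = rng or np.random.default_rng()
    scaled = np.asarray(logits, dtype=float) / temperature
    scaled -= scaled.max()                       # numerical stability
    probs = np.exp(scaled) / np.exp(scaled).sum()
    return int(rng.choice(len(probs), p=probs))
```

At temperature near zero this almost always returns the argmax; crank it way up and the "creativity" is literally just a more uniform random draw — which is the /dev/urandom point above.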
You've missed the metaphor, I think. Higher temperatures will make it "more creative," for lack of a better term, but there's a lot of specialist knowledge it doesn't have, and you can't give it that just by twiddling a dial.
It has a massive speed advantage that lets it read the whole internet, but it's dumb enough that it also needs to. And when even that doesn't give it enough examples, it's like asking me to improvise Latin based on what I recognise of the loanwords in English.