GPT-4 already seems better at reasoning than most people. It just has an unusual...

havercosine · on May 14, 2024

I was going to say the same thing. For some real world estimation tasks where I don't want 100% accuracy (example: analysing working capital of a business based on balance sheet, analysing some images and estimating inventory etc.) the job done by GPT-4o is better than fresh MBA graduates from tier 2/tier 3 cities in my part of world.

Job seekers currently in college have no idea what is about to hit them in 3-5 years.

j-krieger · on May 14, 2024

I agree. HN's and the tech bubble's bias many people are not noticing is that it's full of engineers comparing GPT-4 to software engineering tasks. In programming, the margin of error is incredibly slim in the way that a compiler either accepts entirely correct code (in its syntax of course) or rejects it. There is no in between, and verifying software to be correct is hard.

In any other industry where just need an average margin of error close to a human's work and verification is much easier than generating possible outputs, the market will change drastically.

happypumpkin · on May 14, 2024

On the other hand, programming and software engineering data is almost certainly over-represented on the internet compared to information from most professional disciplines. It also seems to be getting dramatically more focus than other disciplines from model developers. For those models that disclose their training data, I've been seeing decent sized double-digit percentages of the training corpus being just code. Finally, tools like copilot seem ideally positioned to get real-world data about model performance.

jahnu · on May 14, 2024

I’d love to see this! Can you give us a couple of concrete examples of this that we can check?

golol · on May 14, 2024

not really. Even a human bad at reasoning can take 1 hour of time to tinker around and figure things out. GPT-4 just does not have the deep planning/reasoning ability necessary for that.

theshrike79 · on May 14, 2024

Have you seen some people with technology? =)

They won't "take 1 hour of time", they try it once or twice and give up.

lynx23 · on May 14, 2024

I think you might be falling for selection bias. I guess you are surrounding yourself with a lot of smart people. "tinker around and figure things out" is definitely something certain humans (bad at reasoning) can't do. I already prefer the vision model when it comes to asking for a picture description (blind user) over many humans I personally know. The machine is usually more detailed, and takes the time to read the text, instead of trying to shortcut and decide for me whats important. Besides, people from the english speaking countries do not have to deal with foreign languages. Everyone else has to. "Aber das ist ja in englisch" is a common blocker for consuming information around here. I tell you, if we dont manage to ramp up education a few notches, we'll end up with even higher stddev when it comes to practical intelligence. We already have perfectly normal seeming humans absolutely unable to participate on the internet.

trashtester · on May 14, 2024

Reasoning and planning are different things. It's certainly getting quite good at deductive reasoning, especially when forced to check it's own arguments for flaws every time it states something. (I had a several hour chat with it yesterday, and I was very impressed about the progress.)

Planning is different in that it is an essential part of agency. That's what Q* is supposed to add. My guess is that planning is the next type of functionality to be added to GPT. I wouldn't be surprised if they already have a version internally with such functionality, but that they've decided to hold it back for now for reasons such as safety (some may care about the election this year) or simply that the inference costs are so huge they cannot possibly expose it publicly.

__MatrixMan__ · on May 14, 2024

Does it need those things if it can just tap into artifacts generated by humans who did spend that hour?

ben_w · on May 14, 2024

The only reason I still have a job is that it can't (yet) take full advantage of artefacts generated by humans.

"Intern of all trades, senior of none", to modernise the cliché.

bamboozled · on May 14, 2024

If everyone is average at reasoning then it must not be a very important trait or we’d all be at reasoning school getting better at it.

Really philosophy seems to be one of the least important subjects right now. Hardly anyone learns about it in school.

If it was so important to success in the wild than it would stand to reason we all work hard at improving our reasoning skills, but very few do.

ben_w · on May 14, 2024

What schools teach is what governments who set the curriculum like to think is important, which is why my English lessons had a whole section on the Shakespearean (400-year-old, English, Christian) take on the life and motivations of a Jewish merchant living in Venice, followed up with a 80 year old (at the time) English poem on exactly how bad it is to watch your friends choke to death as their lungs melt from chlorine gas in the trenches of the first world war.

These did not provide useful life-lessons for me.

(The philosophy A-level I did voluntarily seemed to be 50% "can you find the flaws in this supposed proof of the existence of god?")

andsoitis · on May 14, 2024

> These did not provide useful life-lessons for me.

Shakespeare is packed with insight.

ben_w · on May 14, 2024

None of the stuff we did at school showed any indication of insight into things of relevance to our world.

If I took out a loan on the value of goods being shipped to me, only for my ship to be lost at sea… it would be covered by insurance, and no bank would even consider acting like Shylock (nor have the motivation of being constantly tormented over religion) for such weird collateral, and the bank manager's daughters wouldn't get away with dressing up as lawyers (no chance their arguments would pass the sniff test today given the bar requirement) to argue against their dad… and they wouldn't need to because the collateral would be legally void anyway and rejected by any court.

The ships would also not then suddenly make a final act appearance to apologise for being late, to contradict the previous belief they were lost at sea, because we have radio now.

The closest to "relevant" that I would accept, is the extent to which some of the plots can be remade into e.g. The Lion King or Wyrd Sisters — but even then…

"Methinks, to employeth antiquated tongues doth render naught but confusion, tis less even than naughty, for such conceits doth veil true import with shadows."