I have no inside info, but I would be shocked to find out that OpenAI does not have several knobs for load shedding across their consumer products.

Had I been responsible for implementing that, the very first knob I'd reach for would be "effort": dynamically remapping what the various "reasoning effort" presets mean, along with the thinking-token budgets (where relevant).
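
To make that concrete, here's a minimal sketch (Python, with every name and number invented; this is just the shape of the knob, not anyone's actual code):

    # Each "reasoning effort" preset maps to a thinking-token budget,
    # and the mapping itself depends on the current load level.
    BUDGETS = {
        "normal":   {"low": 1024, "medium": 8192, "high": 32768},
        "elevated": {"low": 512,  "medium": 4096, "high": 16384},
        "shedding": {"low": 256,  "medium": 1024, "high": 4096},
    }

    def thinking_budget(effort: str, load_level: str) -> int:
        # Callers still ask for "high"; under load, "high" quietly
        # means a smaller budget.
        return BUDGETS[load_level][effort]

    thinking_budget("high", "normal")    # -> 32768
    thinking_budget("high", "shedding")  # -> 4096

The nice property is that no request ever fails: "high" effort just silently means less while the fleet is hot, which is exactly the kind of thing users would experience as the model "feeling dumb".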

The next thing I'd do is keep smaller distillations of their flagship models (the ones used in their consumer apps) available to be served in their place.
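
Same caveat as above: a hypothetical sketch of what that routing could look like, with made-up model names and a made-up threshold:

    FALLBACKS = {
        "flagship-large": "flagship-large-distill",
    }

    def pick_model(requested: str, fleet_utilization: float) -> str:
        # Past some utilization threshold, serve the cheaper
        # distillation in the flagship's place.
        if fleet_utilization > 0.90 and requested in FALLBACKS:
            return FALLBACKS[requested]
        return requested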

One or both of these being in place would explain every tweet asking "why does [ChatGPT|Claude Code] feel so dumb right now?" If they haven't taken my approach, it's because they figured out something smarter. But any approach like that would still produce the huge variability we all feel when using these products heavily.

(I want to reiterate I don't have any inside information, just drawing on a lot of experience building big systems with unpredictable load.)



I sort of always assumed OpenAI was constantly training the next new model.

I wonder what percentage of compute goes toward training vs. inference. If it's a meaningful share, you could possibly dial down training to make room for high inference load (if both run on the same hardware).
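
As a back-of-the-envelope sketch of that dial (invented numbers, and assuming one shared GPU pool where training jobs are checkpointed and preemptible):

    TOTAL_GPUS = 10_000
    TRAINING_FLOOR = 2_000  # never preempt training below this

    def gpus_for_inference(demand: int) -> int:
        # Serve the inference spike, but keep training at its floor.
        return min(demand, TOTAL_GPUS - TRAINING_FLOOR)

    def gpus_for_training(inference_demand: int) -> int:
        return TOTAL_GPUS - gpus_for_inference(inference_demand)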

I also wouldn’t be surprised if they’re overspending and throwing money at it to maximize the user experience. They’re still a high-growth company that doesn’t want anything to slow it down. They’re not in “optimize everything for profit margin” mode yet.


Agreed. I wasn't necessarily thinking of cost optimization, simply of capacity: whether because they're using a bunch of compute themselves (like you're saying, via training, for example) or otherwise.



