> I like to imagine certain billion/trillion-dollar mega corps had a back-room say regarding things that they would really prefer OpenAI's models not be able to emit. Microsoft is a big stakeholder and they might not want to get sued... Liability could explain a lot of it.
I don't think it's any of these things.
OpenAI and the company I work for have a very similar problem: the workload shape and size for a query isn't strictly determined by any analytically-derivable rule about some "query compile-time"-recognizable element of the query; rather, it's determined by the shape of connected data found during the initial steps of something that can be modelled as a graph search, done inside the query. And, for efficiency, that search must be done "in-engine", fused to the rest of the query, rather than being separated out and done first on its own such that its results could be legible to the "query planner."
This paradigm means that, for any arbitrary query you haven't seen before, you can't "predict spend": not just in the sense of charging the user, but also in the sense that you don't know how much capacity you'll have to reserve in order to schedule the query and have it successfully run to completion.
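To make that concrete, here's a toy Python sketch (the graph and names are invented, not anyone's real engine) of two queries that look identical to a planner but cost wildly different amounts, because the fan-out only becomes visible once you actually do the traversal:

```python
from collections import deque

# Toy graph: node -> neighbors. In a real engine this would live on disk or in
# an index, and the traversal would be fused into the larger query plan.
GRAPH = {
    "a": ["b"], "b": ["c"], "c": [],            # small connected component
    "x": [f"n{i}" for i in range(10_000)],      # hub node with a huge fan-out
    **{f"n{i}": [] for i in range(10_000)},
}

def reachable_count(start: str) -> int:
    """Breadth-first search whose cost is only knowable by doing the search."""
    seen, queue = {start}, deque([start])
    while queue:
        node = queue.popleft()
        for nbr in GRAPH.get(node, []):
            if nbr not in seen:
                seen.add(nbr)
                queue.append(nbr)
    return len(seen)

# Two queries with identical text/shape; the planner can't tell them apart,
# but one touches 3 nodes and the other touches 10,001.
print(reachable_count("a"))  # 3
print(reachable_count("x"))  # 10001
```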
Which means that sometimes, innocuous-looking queries come in that totally bowl over your backend. They suck up all the resources you have, run super-long, and maybe eventually spit out an answer (if they don't OOM the query-runner worker process first)... but often that answer takes so long that the user doesn't even want it any more. (Think: IDE autocomplete.) In fact, maybe the user got annoyed and refreshed the app; and since you can't control exactly how people integrate with your API, maybe that refresh caused a second, third, Nth request for the same heavyweight query!
What do you do in this situation? Well, what we did is make a block-list of specific data-values for parameters of queries that we have previously observed to cause our backend to fall over. Not because we don't want to serve these queries, but because we know we'll predictably fail to serve them within the constraints that would make them useful to anyone, so we may as well not spend the energy trying, and instead preserve that query capacity for everyone else. (For us, one of those constraints is a literal time-limit: we're behind Cloudflare, so if we take longer than 100s to respond to a [synchronous HTTP] API call, Cloudflare disconnects and sends the client a 524 error.)
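Roughly, the shape of it is a fail-fast check sitting in front of the engine. This is a hypothetical sketch; the names, blocklist entries, and handler shape are all invented for illustration:

```python
# Parameter values we've previously watched knock the backend over.
BLOCKLIST: set[str] = {"hub_node_42", "celebrity_account_99"}

# We're behind Cloudflare, so a synchronous HTTP response has ~100s to land
# before the client gets a 524 anyway.
RESPONSE_DEADLINE_SECONDS = 100

def handle_query(start_param: str) -> dict:
    if start_param in BLOCKLIST:
        # Fail fast: we already know this one won't finish inside the deadline,
        # so don't burn capacity that other queries could use.
        return {"error": "query rejected: parameter known to exceed resource limits"}
    return run_in_engine(start_param, deadline_s=RESPONSE_DEADLINE_SECONDS)

def run_in_engine(start_param: str, deadline_s: int) -> dict:
    # Stand-in for the real fused graph-search + query execution.
    return {"result": f"ran query for {start_param!r} within a {deadline_s}s budget"}
```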
"A block-list of specific data-values for parameters of queries" probably won't work for OpenAI — but I imagine that if they trained a text-classifier AI on what input text would predictably result in timeout errors in their backend, they could probably achieve something similar.
In short: their query-planner probably has a spam filter.