Hacker News

> Inference costs are in free fall. The recent optimizations from DeepSeek mean that all the available GPUs could cover a demand of 10k tokens per day from a frontier model for… the entire earth population. There is nowhere near this level of demand. The economics of selling tokens no longer work for model providers: they have to move higher up in the value chain.

I've been using Cline to get a feel for the pricing of these models, and it's insane how much goes into input context versus output. My most recent query on openrouter.ai used 26,098 input tokens -> 147 output tokens. I'm easily burning multiple dollars an hour. Without a doubt, there is still demand for cheaper inference.
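To make the asymmetry concrete, here is a back-of-the-envelope sketch of what a query like that costs. The per-million-token prices are illustrative assumptions, not any provider's actual rates:

```python
# Rough cost of a single request given token counts.
# Prices below are assumed for illustration, not real provider rates.
INPUT_PRICE_PER_M = 3.00    # $ per 1M input tokens (assumption)
OUTPUT_PRICE_PER_M = 15.00  # $ per 1M output tokens (assumption)

def query_cost(input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of one request at the assumed rates."""
    return (input_tokens * INPUT_PRICE_PER_M
            + output_tokens * OUTPUT_PRICE_PER_M) / 1_000_000

# The query mentioned above: 26,098 input tokens, 147 output tokens.
cost = query_cost(26_098, 147)
print(f"${cost:.4f} per query")
```

Even at these assumed rates, the input context accounts for nearly all of the bill, and an agentic tool firing dozens of such requests per hour plausibly reaches the "multiple dollars an hour" figure.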


