Better question: why does a simple search for “What color is a labrador retriever” require any compute time when the answer can be cached? This is a simple example, but 90% of my searches are simple questions that don’t need an LLM to process them.
One time I came across a git repo that let me download a gigabyte of prime numbers and I thought to myself, is that more or less efficient than me running a program locally to generate a gigabyte of prime numbers?
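If anyone wants to try the "generate locally" side of that comparison, a minimal sketch is a Sieve of Eratosthenes with a timer around it. The function name `primes_up_to` and the limit used below are my own choices for illustration, and this doesn't settle the question: the answer depends on your bandwidth, your machine, and how the primes are stored.

```python
import math
import time


def primes_up_to(n: int) -> list[int]:
    """Return all primes <= n using a Sieve of Eratosthenes."""
    if n < 2:
        return []
    is_prime = bytearray([1]) * (n + 1)  # one flag byte per integer 0..n
    is_prime[0] = is_prime[1] = 0
    for p in range(2, math.isqrt(n) + 1):
        if is_prime[p]:
            # Cross out multiples of p, starting at p*p.
            is_prime[p * p :: p] = bytearray(len(range(p * p, n + 1, p)))
    return [i for i, flag in enumerate(is_prime) if flag]


if __name__ == "__main__":
    limit = 2_000_000  # arbitrary; raise it to approach "a gigabyte of primes"
    start = time.perf_counter()
    primes = primes_up_to(limit)
    elapsed = time.perf_counter() - start
    print(f"{len(primes)} primes up to {limit} in {elapsed:.2f}s")
```

Compare the printed time, scaled up to the size you care about, against how long the download would take on your connection.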
The compute for a direct answer like that costs a fraction of a penny, so it might be better to generate answers on the fly than to store an index of every question anyone has asked (well, that's essentially what the weights are, after all).