> Every 10x increase in model size requires 10x more power
Does it? I’ll be the first to admit I’m far behind in this area, but isn’t this assuming the hardware isn’t improving over time as well? Or am I missing the boat here?
Hardware isn’t improving exponentially anymore, especially not on the FLOPS-per-watt metric.
That’s part of what motivated the transition to bfloat16 and even smaller minifloat formats, but you can only quantize so far before you’re just GEMMing noise.
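To make the "GEMMing noise" point concrete, here's a rough toy sketch (my own illustration, not how any real hardware or library does it). It rounds matrix entries to a given number of significant binary digits and measures how far the product drifts from the full-precision result. The helper names `round_mantissa`, `matmul` and `relative_error` are made up for this example, and the rounding ignores exponent range, saturation, and higher-precision accumulation, all of which matter in real minifloat formats. For reference, bfloat16 carries 8 significant bits, and the FP8 variants carry 3 or 4.

```python
import math
import random


def round_mantissa(x, bits):
    """Round x to `bits` significant binary digits (toy model, unlimited exponent)."""
    if x == 0.0:
        return 0.0
    m, e = math.frexp(x)  # x == m * 2**e with 0.5 <= |m| < 1
    scale = 1 << bits
    return math.ldexp(round(m * scale) / scale, e)


def matmul(a, b):
    """Plain triple-loop matrix product on lists of lists."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def relative_error(n, bits, seed=0):
    """Frobenius-norm relative error of an n x n product after rounding the inputs."""
    rng = random.Random(seed)
    a = [[rng.gauss(0, 1) for _ in range(n)] for _ in range(n)]
    b = [[rng.gauss(0, 1) for _ in range(n)] for _ in range(n)]
    exact = matmul(a, b)
    qa = [[round_mantissa(v, bits) for v in row] for row in a]
    qb = [[round_mantissa(v, bits) for v in row] for row in b]
    approx = matmul(qa, qb)
    num = sum((p - q) ** 2 for rp, rq in zip(approx, exact) for p, q in zip(rp, rq))
    den = sum(q ** 2 for rq in exact for q in rq)
    return math.sqrt(num / den)


if __name__ == "__main__":
    for bits in (8, 4, 3, 2):
        print(bits, "significant bits -> relative error", relative_error(16, bits))
```

Running it should show the error growing quickly as you strip bits. Real training setups soften this with scaling factors and wider accumulators, so treat the numbers as a qualitative picture only.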