
You're right, that's a good point. It is possible to make a model dumber via quantization.

But even going from F16 to llama.cpp Q4 (3.8 bits per weight) incurs negligible perplexity loss.
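
For context, perplexity is just the exponentiated average negative log-likelihood the model assigns to a held-out text, so "negligible perplexity loss" means the quantized model predicts that text almost exactly as well as the F16 original. A minimal sketch of the metric itself (illustrative only, not llama.cpp's implementation; the numbers in the comment are made up):

    import math

    def perplexity(token_logprobs):
        # token_logprobs: natural-log probabilities the model assigned
        # to each ground-truth token of a held-out corpus.
        avg_nll = -sum(token_logprobs) / len(token_logprobs)
        return math.exp(avg_nll)

    # Hypothetical example: if the F16 model scores 5.90 and the Q4
    # quant scores 5.97 on the same corpus, that ~1% gap is the kind
    # of "negligible" difference being described.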

Theoretically, a leading AI lab could quantize absurdly poorly after the initial release, once they know they're going to have huge usage.

Theoretically, they could be lying even though they said nothing changed.

At that point, I don't think there's anything to talk about. I agree both of those things are theoretically possible. But it would be very unusual: two colossal screwups, then active lying, with many observers not leaking a word.
