There is no “one size fits all” here. A bigger model is just a bigger hammer, and in many uses it’s too bulky and slow to be a proper solution.
At my job, I can’t casually fire up 8×A100 80 GB instances. And even if I could, they wouldn’t deliver the throughput I require to be useful. Big models are operationally far more expensive.
The smallest/fastest model that is accurate enough for your use case is ideal.