Well, the statement that GPT-4 has 1.8T parameters is a little misleading, since it's really an 8 x 220B MoE (according to the rumors, at least), so only a fraction of those parameters are actually active for any given token.
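For a rough back-of-the-envelope on where the headline number comes from (taking the rumored figures at face value and assuming top-2 routing, which is a common MoE setup but not confirmed; this also ignores that attention layers are typically shared across experts, so "8 x 220B" is itself a simplification):

```python
# Rough arithmetic for the rumored GPT-4 MoE configuration.
# All figures here are rumors/assumptions, not confirmed by OpenAI.

NUM_EXPERTS = 8            # rumored number of experts
PARAMS_PER_EXPERT = 220e9  # rumored ~220B parameters per expert
EXPERTS_PER_TOKEN = 2      # assumed top-2 routing (common MoE choice)

total_params = NUM_EXPERTS * PARAMS_PER_EXPERT          # ~1.76T, the "1.8T" headline figure
active_params = EXPERTS_PER_TOKEN * PARAMS_PER_EXPERT   # ~440B actually used per token

print(f"Total parameters: {total_params / 1e12:.2f}T")
print(f"Active per token: {active_params / 1e12:.2f}T")
```

So even if the total is in the ~1.8T ballpark, the compute per token looks more like a ~440B dense model under these assumptions.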
Also, the size of the model itself isn't the only factor that determines performance: Llama 3 70B outperforms Llama 2 70B even though they're the same size.