
Has anyone prepared a comparison to Mixtral 8x22B? (Life sure moves fast.)


The comparison with Mixtral 8x22B is in the official post.


Where? I only see comparisons to Mistral 7B and Mistral Medium, which are totally different models.


https://ai.meta.com/blog/meta-llama-3/ has it about a third of the way down. It's a little bit better on every benchmark than Mixtral 8x22B (according to Meta).


Oh cool! But at the cost of twice the VRAM and only 1/8th of the context, I suppose?


Llama 3 70B takes half the VRAM of Mixtral 8x22B, but it needs almost twice the FLOPS/bandwidth. Yes, Llama's context is smaller, although that should be fixable in the near future. Another difference is that Llama is English-focused while Mixtral is more multilingual.
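
A rough back-of-envelope sketch in Python of where those ratios come from. The parameter counts are approximate public figures treated as assumptions here: ~70B params, all active, for Llama 3 70B; ~141B total with ~39B active per token (2 of 8 experts) for Mixtral 8x22B.

    # Back-of-envelope: dense model vs. mixture-of-experts model.
    # Weight VRAM scales with total parameters; per-token compute scales
    # with the parameters actually activated for each token.

    def weight_vram_gb(total_params_b, bytes_per_param=2):
        # Weight memory in GB at fp16 (2 bytes/param), ignoring KV cache and overhead.
        return total_params_b * bytes_per_param

    def gflops_per_token(active_params_b):
        # Rough forward-pass cost: ~2 FLOPs per active parameter per token.
        return 2 * active_params_b

    models = {
        "Llama 3 70B (dense)": {"total": 70,  "active": 70},   # assumed counts
        "Mixtral 8x22B (MoE)": {"total": 141, "active": 39},   # assumed counts
    }

    for name, p in models.items():
        print(f"{name}: ~{weight_vram_gb(p['total']):.0f} GB weights (fp16), "
              f"~{gflops_per_token(p['active']):.0f} GFLOPs/token")

Under those assumptions this prints roughly 140 GB vs. 282 GB of weights and ~140 vs. ~78 GFLOPs per token, which is where "half the VRAM, almost twice the compute" comes from.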


also curious how it compares to WizardLM 2 8x22B



