https://ai.meta.com/blog/meta-llama-3/ has the comparison about a third of the way down. Llama 3 is a little better than Mixtral 8x22B on every benchmark (according to Meta).
Llama 3 70B takes half the VRAM of Mixtral 8x22B, but it does need almost twice the FLOPS/bandwidth. That's the dense-vs-MoE tradeoff: Mixtral has to keep all ~141B parameters in memory but only activates ~39B per token, while Llama 3 70B is dense, so every token runs through all 70B parameters (rough arithmetic below). Yes, Llama's context is smaller (8K vs Mixtral's 64K at launch), although that should be fixable in the near future. Another difference is that Llama is English-focused while Mixtral is more multilingual.
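A back-of-the-envelope sketch of that tradeoff, assuming the publicly stated parameter counts (70B dense for Llama 3 70B; ~141B total / ~39B active for Mixtral 8x22B), 16-bit weights, and the usual ~2 FLOPs per active parameter per decoded token. It ignores KV cache and activations, so treat it as rough orders of magnitude, not deployment numbers:

```python
# Rough dense-vs-MoE comparison. Mixtral is a sparse mixture-of-experts:
# all experts must sit in memory, but only a subset runs per token.
models = {
    # name: (total_params, active_params_per_token)
    "Llama 3 70B":   (70e9,  70e9),
    "Mixtral 8x22B": (141e9, 39e9),
}

BYTES_PER_PARAM = 2  # fp16/bf16 weights

for name, (total, active) in models.items():
    weight_gb = total * BYTES_PER_PARAM / 1e9
    # ~2 FLOPs per active parameter per token (one multiply + one add).
    tflops_per_token = 2 * active / 1e12
    print(f"{name}: ~{weight_gb:.0f} GB of weights, "
          f"~{tflops_per_token:.2f} TFLOPs/token")
```

This prints roughly 140 GB vs 282 GB of weights (half the VRAM) and 0.14 vs 0.08 TFLOPs/token (~1.8x the compute). At small batch sizes decoding is memory-bandwidth-bound, and the bytes of active weights read per token follow the same ~1.8x ratio, which is where the bandwidth claim comes from.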