Gemini likely uses something based on RingAttention to achieve its long context sizes. That requires massive inference clusters, so it can't be the same approach llama4 is using. Very curious how llama4 achieves its context length.
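For anyone unfamiliar, here's a rough single-process sketch of the core RingAttention idea (https://arxiv.org/abs/2310.01889) — this is just my illustration, not anything known about Gemini's internals. Each "device" owns one query block, and key/value blocks rotate around a ring so no device ever materializes the full attention matrix; partial results are merged with the same online-softmax trick FlashAttention uses:

    import numpy as np

    def ring_attention(q_blocks, k_blocks, v_blocks):
        """q_blocks, k_blocks, v_blocks: lists of [block_len, d] arrays,
        one per simulated device in the ring."""
        n = len(q_blocks)
        d = q_blocks[0].shape[-1]
        outputs = []
        for i in range(n):
            q = q_blocks[i]
            # Online-softmax accumulators: running max, running
            # normalizer, running weighted sum of values.
            m = np.full(q.shape[0], -np.inf)
            l = np.zeros(q.shape[0])
            acc = np.zeros_like(q)
            for step in range(n):
                # KV block that has "rotated" to device i at this step.
                j = (i + step) % n
                s = q @ k_blocks[j].T / np.sqrt(d)  # local score tile
                m_new = np.maximum(m, s.max(axis=-1))
                p = np.exp(s - m_new[:, None])
                scale = np.exp(m - m_new)           # rescale old stats
                l = l * scale + p.sum(axis=-1)
                acc = acc * scale[:, None] + p @ v_blocks[j]
                m = m_new
            outputs.append(acc / l[:, None])
        return np.concatenate(outputs)

    # Sanity check against naive full attention.
    rng = np.random.default_rng(0)
    blocks, blk, d = 4, 8, 16
    q = rng.normal(size=(blocks * blk, d))
    k = rng.normal(size=(blocks * blk, d))
    v = rng.normal(size=(blocks * blk, d))
    split = lambda x: [x[i * blk:(i + 1) * blk] for i in range(blocks)]
    out = ring_attention(split(q), split(k), split(v))
    s = q @ k.T / np.sqrt(d)
    ref = np.exp(s - s.max(-1, keepdims=True))
    ref = ref / ref.sum(-1, keepdims=True) @ v
    assert np.allclose(out, ref, atol=1e-6)

The point is that memory per device scales with one block of KV rather than the full sequence, which is why it wants a big cluster: context length grows with the number of devices in the ring.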

