Gemini likely uses something based on RingAttention to achieve its long context sizes. That requires massive inference clusters, so it can't be the same approach llama4 is using. Very curious how llama4 achieves its context length.
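For anyone unfamiliar, here's a rough single-process sketch of the core RingAttention idea (https://arxiv.org/abs/2310.01889) — this is just my illustration, not anything known about Gemini's internals. Each "device" owns one query block, and key/value blocks rotate around a ring so no device ever materializes the full attention matrix; partial results are merged with the same online-softmax trick FlashAttention uses:

    import numpy as np

    def ring_attention(q_blocks, k_blocks, v_blocks):
        """q_blocks, k_blocks, v_blocks: lists of [block_len, d] arrays,
        one per simulated device in the ring."""
        n = len(q_blocks)
        d = q_blocks[0].shape[-1]
        outputs = []
        for i in range(n):
            q = q_blocks[i]
            # Online-softmax accumulators: running max, running
            # normalizer, running weighted sum of values.
            m = np.full(q.shape[0], -np.inf)
            l = np.zeros(q.shape[0])
            acc = np.zeros_like(q)
            for step in range(n):
                # KV block that has "rotated" to device i at this step.
                j = (i + step) % n
                s = q @ k_blocks[j].T / np.sqrt(d)  # local score tile
                m_new = np.maximum(m, s.max(axis=-1))
                p = np.exp(s - m_new[:, None])
                scale = np.exp(m - m_new)           # rescale old stats
                l = l * scale + p.sum(axis=-1)
                acc = acc * scale[:, None] + p @ v_blocks[j]
                m = m_new
            outputs.append(acc / l[:, None])
        return np.concatenate(outputs)

    # Sanity check against naive full attention.
    rng = np.random.default_rng(0)
    blocks, blk, d = 4, 8, 16
    q = rng.normal(size=(blocks * blk, d))
    k = rng.normal(size=(blocks * blk, d))
    v = rng.normal(size=(blocks * blk, d))
    split = lambda x: [x[i * blk:(i + 1) * blk] for i in range(blocks)]
    out = ring_attention(split(q), split(k), split(v))
    s = q @ k.T / np.sqrt(d)
    ref = np.exp(s - s.max(-1, keepdims=True))
    ref = ref / ref.sum(-1, keepdims=True) @ v
    assert np.allclose(out, ref, atol=1e-6)

The point is that memory per device scales with one block of KV rather than the full sequence, which is why it wants a big cluster: context length grows with the number of devices in the ring.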

