
> I am confused what actually happens in the vectorized ADD and MULT instructions in the GPU with these quantized numbers.

I might be wrong, but I think LLMs are all about comparing distances between tokens. You can tell that -255 and +255 are very far apart, but you are also aware that -8 and +8 are very far apart. What matters is the relative separation, not the absolute range.
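To make that concrete, here is a rough sketch (my own toy example, not how any particular GPU kernel does it) that maps the same floats onto a ±255 grid and a ±8 grid. The `quantize` helper and the sample `weights` are made up for illustration.

```python
def quantize(values, levels):
    """Map floats to integers in [-levels, +levels] with one shared scale."""
    scale = max(abs(v) for v in values) / levels
    return [round(v / scale) for v in values], scale

# Hypothetical sample values, chosen only for illustration.
weights = [-1.0, -0.2, 0.0, 0.2, 1.0]

q255, s255 = quantize(weights, 255)   # fine grid
q8, s8 = quantize(weights, 8)         # coarse grid

# The integer codes differ, but the extremes stay the extremes and the
# ordering of the values is unchanged on both grids.
print(q255)  # [-255, -51, 0, 51, 255]
print(q8)    # [-8, -2, 0, 2, 8]

# Multiplying back by the scale recovers roughly the same gap between
# the smallest and largest value in both cases.
gap255 = (q255[-1] - q255[0]) * s255
gap8 = (q8[-1] - q8[0]) * s8
```

The coarse grid loses precision in the middle (0.2 becomes 2/8 = 0.25), but "far apart" stays far apart, which is the point above.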

Microsoft BitNet and Google TurboQuant show that, in the extreme, you can use just -1, 0, and +1.
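On the ADD/MULT question: with only -1, 0, +1 as weights, a dot product no longer needs a multiply per element. Here is a minimal sketch, assuming a simple mean-absolute-value scale followed by round-and-clip. The names `ternarize` and `ternary_dot` and the sample numbers are mine, and real kernels pack and process these values very differently.

```python
def ternarize(weights):
    """Round weights to {-1, 0, +1} after dividing by their mean absolute value."""
    scale = sum(abs(w) for w in weights) / len(weights) or 1.0  # avoid dividing by 0
    q = [max(-1, min(1, round(w / scale))) for w in weights]
    return q, scale

def ternary_dot(q, x):
    """Dot product with ternary weights: only additions and subtractions."""
    total = 0.0
    for qi, xi in zip(q, x):
        if qi == 1:
            total += xi      # weight +1: add the activation
        elif qi == -1:
            total -= xi      # weight -1: subtract the activation
        # weight 0: skip the activation entirely
    return total

# Hypothetical weights and activations, for illustration only.
weights = [0.9, -0.05, -1.1, 0.4]
x = [2.0, 5.0, 1.0, 3.0]

q, scale = ternarize(weights)
acc = ternary_dot(q, x)      # 2.0 - 1.0 + 3.0
approx = acc * scale         # one multiply at the end to restore magnitude
exact = sum(w * xi for w, xi in zip(weights, x))

print(q)    # [1, 0, -1, 1]
print(acc)  # 4.0
```

The result is only an approximation of the full-precision dot product (compare `approx` with `exact`), so the model has to be trained or calibrated to tolerate that. The inner loop, though, is pure add/subtract with a single scale multiply at the end.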


