
800M is a good size for mobile; 8B for graphics cards.

Bigger than that is also possible; performance isn't saturated yet, but training larger models needs more GPUs.



Do you know how the memory demands compare to LLMs at the same number of parameters? For example, Mistral 7B quantized to 4 bits works very well on an 8GB card, though there isn’t room for long context.


You can also use quantization, which lowers memory requirements at a small loss of performance.
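A back-of-envelope sketch of how quantization affects weight memory (weights only; real usage also needs room for the KV cache, activations, and framework overhead, which is why long context doesn't fit on an 8GB card):

```python
# Rough weight-only memory estimate for a model, ignoring
# KV cache, activations, and runtime overhead.
def weight_memory_gb(params_billions: float, bits_per_param: int) -> float:
    bytes_total = params_billions * 1e9 * bits_per_param / 8
    return bytes_total / 1e9

# Illustrative numbers for a 7B-parameter model:
for bits in (16, 8, 4):
    print(f"{bits}-bit: {weight_memory_gb(7.0, bits):.1f} GB")
# 16-bit: 14.0 GB
# 8-bit: 7.0 GB
# 4-bit: 3.5 GB
```

At 4 bits, 7B parameters come to roughly 3.5 GB of weights, which is why a quantized 7B model fits on an 8GB card with some headroom left for context.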



