Hacker News

With 400 billion parameters and 8 bits per parameter, wouldn't it be ~400 GB? Plus the context, which could be quite large.


He said "Q4", meaning 4-bit weights.


Ok but at 16-bit it would be 800GB+, right? Not 512.


Divide, not multiply. If a size is estimated at 8-bit, reducing to 4-bit halves the size (and the number of bits each value can carry). Going the other way, from 8-bit to 16-bit, doubles it. It's like the difference between INT_MAX and SHORT_MAX (assuming you have such defs).

I could be wrong too but that’s my understanding. Like float vs half-float.
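
For what it's worth, the arithmetic in this thread can be checked with a quick sketch. The function name `weight_size_gb` is made up for illustration, and the numbers cover only the raw weights under the assumption of 1 GB = 1e9 bytes. They leave out the context memory mentioned above, and real quantized formats typically store some extra scaling data on top of the raw bits.

```python
def weight_size_gb(num_params: float, bits_per_param: int) -> float:
    """Approximate raw weight storage in decimal gigabytes (1 GB = 1e9 bytes).

    Hypothetical helper: counts only the weights, not context or
    any per-block metadata a real quantization format might add.
    """
    bytes_total = num_params * bits_per_param / 8  # 8 bits per byte
    return bytes_total / 1e9


PARAMS = 400e9  # 400 billion parameters, the figure used in the thread

for bits in (16, 8, 4):
    print(f"{bits:>2}-bit: ~{weight_size_gb(PARAMS, bits):.0f} GB")
```

Under those assumptions, 16-bit comes to roughly 800 GB, 8-bit to roughly 400 GB, and 4-bit ("Q4") to roughly 200 GB.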



