
Strix Halo can only allocate 96GB RAM to the GPU. So GPT-OSS 120B can be run at Q6 at best (and even then, activations would need to partially spill into CPU memory).
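The arithmetic behind that claim, as a rough sketch (assuming all ~120B parameters are quantized uniformly, and ignoring activations, KV cache, and any mixed-precision layers):

```python
def model_size_gb(params_billions: float, bits_per_param: int) -> float:
    """Approximate weight footprint: params * bits / 8 bits-per-byte."""
    return params_billions * 1e9 * bits_per_param / 8 / 1e9

# Rough weight sizes for a 120B-parameter model at common quantizations.
for bits in (4, 6, 8, 16):
    print(f"{bits}-bit: ~{model_size_gb(120, bits):.0f} GB")
```

At 6 bits that's ~90GB of weights alone, which is why a 96GB ceiling leaves little room for activations; at 4 bits it's ~60GB and fits comfortably.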


The 96GB limit applies only on Windows; on Linux people have allocated up to 120GB. Here's one source: https://www.reddit.com/r/LocalLLaMA/comments/1nmlluu/comment...


GPT-OSS 120B uses a native 4-bit representation, so it fits fine.


I bet you're confusing VRAM (the old fixed carve-out) with GTT (dynamically allocated) memory. The Linux amdgpu driver handles GTT just fine; amdgpu_top is one monitoring app that shows the two pools separately.
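You can also read the two pools directly from sysfs without a monitoring app. A minimal sketch, assuming the GPU is card0 and the amdgpu driver exposes its usual mem_info_* files there (adjust the card number for your system):

```shell
#!/bin/sh
# Report the fixed VRAM carve-out and the dynamic GTT pool, in MiB.
# Falls back gracefully if card0 is not an amdgpu device.
for f in mem_info_vram_total mem_info_gtt_total; do
  p="/sys/class/drm/card0/device/$f"
  if [ -r "$p" ]; then
    printf '%s: %s MiB\n' "$f" "$(( $(cat "$p") / 1048576 ))"
  else
    printf '%s: not available (no amdgpu at card0?)\n' "$f"
  fi
done
```

The matching mem_info_vram_used / mem_info_gtt_used files give current usage, which is what amdgpu_top is reading under the hood.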

More: https://news.ycombinator.com/item?id=44859582


>Strix Halo can only allocate 96GB RAM to the GPU.

Are you referring to an exclusive or a shared allocation? I think shared allocation allows using all available memory.



