
When running inference workloads via something like llama.cpp, the model's layers are split across the GPUs and executed in sequence, so only one GPU is actively computing at a time: you would have 1 busy GPU and 4 idle GPUs. That should make the power usage less insane in practice than you might expect.
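For reference, this is llama.cpp's default multi-GPU behavior (layer split); a rough sketch of the relevant flags, with the model path and layer count as placeholders:

    # default layer split: each GPU holds a slice of the layers,
    # and token generation walks through them one GPU at a time
    ./llama-cli -m ./model.gguf -ngl 99 --split-mode layer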

