I'm not sure; I didn't run any benchmarks. As a ballpark figure: with both cards power-limited to 250 W and running a Qwen-30B FP8 model (the exact variant depends on the task), I get upwards of 60 tok/sec. It feels on par with the premium models, tbh.

Of course, this is in a single-user setup, with vLLM keeping the model warm.
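If you want a measured number instead of a ballpark, here is a minimal sketch for computing decode throughput from any stream of generated tokens (for example, the chunks a streaming client yields). The function name `decode_tokens_per_sec` and the injectable `clock` are hypothetical illustrations, not part of vLLM. Timing starts at the first token, so prompt processing is excluded and only steady-state generation is counted.

```python
import time
from typing import Callable, Iterable


def decode_tokens_per_sec(
    tokens: Iterable[str],
    clock: Callable[[], float] = time.perf_counter,
) -> float:
    """Return decode throughput (tokens/sec) for a stream of tokens.

    Hypothetical helper: timing starts when the first token arrives, so
    time-to-first-token (prompt processing) is not included.
    """
    count = 0
    first = last = 0.0
    for _ in tokens:
        now = clock()  # timestamp each token as it arrives
        if count == 0:
            first = now
        last = now
        count += 1
    # Need at least two tokens (one interval) to measure a rate.
    if count < 2 or last <= first:
        return 0.0
    # N tokens observed between the first and last timestamps span N - 1 intervals.
    return (count - 1) / (last - first)
```

Counting intervals rather than tokens avoids overstating the rate on short generations. Wall-clock numbers will still vary with power limits, batch size, and output length, so a single run stays a ballpark.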
