
Right on, they're closing in on "Open"AI's best models. Can this still be run on a GPU, or does it require a lot more VRAM?


It can be run on an A40 or A6000, as well as the largest A100s, but not on anything smaller than that.


You could use Microsoft's DeepSpeed to run the model for inference across multiple GPUs; see https://www.deepspeed.ai/tutorials/inference-tutorial/
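A minimal sketch of what that looks like (not from the thread; the model name, GPU count, and exact init_inference arguments are assumptions and may vary between DeepSpeed versions):

  # Sketch: shard a Hugging Face causal LM across two GPUs with DeepSpeed inference.
  # Model name and mp_size are placeholders; weights are loaded in fp16.
  import torch
  import deepspeed
  from transformers import AutoModelForCausalLM, AutoTokenizer

  model_name = "EleutherAI/gpt-neox-20b"  # assumed model; substitute your own
  tokenizer = AutoTokenizer.from_pretrained(model_name)
  model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.float16)

  # Split the weights across GPUs with tensor-model parallelism and swap in
  # DeepSpeed's fused inference kernels.
  ds_engine = deepspeed.init_inference(
      model,
      mp_size=2,                        # number of GPUs to shard over
      dtype=torch.float16,
      replace_with_kernel_inject=True,
  )
  model = ds_engine.module

  inputs = tokenizer("Hello, my name is", return_tensors="pt").to(torch.cuda.current_device())
  outputs = model.generate(inputs.input_ids, max_new_tokens=32)
  print(tokenizer.decode(outputs[0]))

Launched as something like `deepspeed --num_gpus 2 infer.py`, each process then holds roughly half the weights.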


How much VRAM does it use during inference?


~40 GB with standard optimization. I suspect you can shrink it down more with some work, but it would take significant innovation to cram it into the next most common card size down (24 GB, unless I'm misremembering).


Is that 40 GB already in float16?


Yes
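A quick back-of-the-envelope check of those numbers (a sketch only: the ~20B parameter count is an assumption inferred from the 40 GB fp16 figure, and it counts weights only, not activations or the KV cache):

  # Rough weights-only VRAM estimate; parameter count is an assumption.
  params = 20_000_000_000
  for dtype, nbytes in {"float32": 4, "float16": 2, "int8": 1}.items():
      print(f"{dtype}: {params * nbytes / 1e9:.0f} GB")
  # float32: 80 GB, float16: 40 GB, int8: 20 GB

Which lines up with the ~40 GB fp16 figure above, and suggests why something like 8-bit quantization would be needed to get anywhere near a 24 GB card.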



