
Right on, they're closing in on "Open"AI's best models. Can this still be run on a GPU, or does it require a lot more VRAM?


It can be run on an A40 or A6000, as well as the largest A100s, but not on anything smaller than that.


You could use Microsoft's DeepSpeed to run the model for inference across multiple GPUs; see https://www.deepspeed.ai/tutorials/inference-tutorial/
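A minimal sketch of what that looks like (not from the thread; the model name, GPU count, and exact init_inference arguments are assumptions and may vary between DeepSpeed versions):

  # Sketch: shard a Hugging Face causal LM across two GPUs with DeepSpeed inference.
  # Model name and mp_size are placeholders; weights are loaded in fp16.
  import torch
  import deepspeed
  from transformers import AutoModelForCausalLM, AutoTokenizer

  model_name = "EleutherAI/gpt-neox-20b"  # assumed model; substitute your own
  tokenizer = AutoTokenizer.from_pretrained(model_name)
  model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.float16)

  # Split the weights across GPUs with tensor-model parallelism and swap in
  # DeepSpeed's fused inference kernels.
  ds_engine = deepspeed.init_inference(
      model,
      mp_size=2,                        # number of GPUs to shard over
      dtype=torch.float16,
      replace_with_kernel_inject=True,
  )
  model = ds_engine.module

  inputs = tokenizer("Hello, my name is", return_tensors="pt").to(torch.cuda.current_device())
  outputs = model.generate(inputs.input_ids, max_new_tokens=32)
  print(tokenizer.decode(outputs[0]))

Launched as something like `deepspeed --num_gpus 2 infer.py`, each process then holds roughly half the weights.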


How much VRAM does it use during inference?


~40 GB with standard optimization. I suspect you can shrink it down more with some work, but it would take significant innovation to cram it into the next most common card size down (24 GB, unless I'm misremembering).


Is that 40 GB already in float16?


Yes
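A quick back-of-the-envelope check of those numbers (a sketch only: the ~20B parameter count is an assumption inferred from the 40 GB fp16 figure, and it counts weights only, not activations or the KV cache):

  # Rough weights-only VRAM estimate; parameter count is an assumption.
  params = 20_000_000_000
  for dtype, nbytes in {"float32": 4, "float16": 2, "int8": 1}.items():
      print(f"{dtype}: {params * nbytes / 1e9:.0f} GB")
  # float32: 80 GB, float16: 40 GB, int8: 20 GB

Which lines up with the ~40 GB fp16 figure above, and suggests why something like 8-bit quantization would be needed to get anywhere near a 24 GB card.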



