Hacker News
f38zf5vdt on Feb 2, 2022 | on: Announcing GPT-NeoX-20B
Right on, they're closing in on "Open"AI's best models. Can this still be run on a GPU, or does it require a lot more VRAM?

stellaathena on Feb 2, 2022
It can be run on an A40 or A6000, as well as the largest A100s. But other than that, no.

bm-rf on Feb 2, 2022
You could use Microsoft's DeepSpeed to run the model for inference on multiple GPUs; see https://www.deepspeed.ai/tutorials/inference-tutorial/

djoldman on Feb 2, 2022
How much VRAM does it use during inference?

stellaathena on Feb 2, 2022
~40 GB with standard optimization. I suspect you can shrink it down more with some work, but it would require significant innovation to cram it into the next largest common chip size (24 GB, unless I’m misremembering)

komuher on Feb 2, 2022
Is that 40 GB already in float16?

stellaathena on Feb 5, 2022
Yes
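
The ~40 GB figure is consistent with a back-of-the-envelope estimate: 20 billion parameters at 2 bytes each in float16 comes to roughly 37 GiB for the weights alone, before activations, KV cache, and framework overhead. A minimal sketch of that arithmetic (the parameter count is taken from the model name; the overhead terms are deliberately left out):

```python
# Back-of-the-envelope VRAM estimate for GPT-NeoX-20B weights in float16.
# Assumes 20e9 parameters (from the model name); activations, KV cache,
# and framework overhead are not counted, which is why the observed
# footprint (~40 GB) runs a bit higher than this lower bound.
params = 20_000_000_000
bytes_per_param = 2  # float16 = 16 bits = 2 bytes
weight_gib = params * bytes_per_param / 1024**3
print(f"weights alone: {weight_gib:.1f} GiB")
```

This also makes concrete why a 24 GB card (e.g. an RTX 3090) falls short even for the raw fp16 weights, matching the point above about the next common VRAM size down.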