
We recently trained GPT-3 (SMALL) at work on our GPU cluster for fun, took 4 days across a couple dozen machines...

Millions of dollars in CAPEX and OPEX just for one model



I'm curious: did you get results comparable to OpenAI's? I know a few people tried to train GPT-2 themselves (before it was openly released) and their results were quite inferior.


You're saying your project cost millions of dollars, or the big boys' projects did?


If "4 days across a couple dozen machines" cost millions, something is very wrong.


Not if it was a couple dozen of these machines:

https://www.hardwarezone.com.sg/tech-news-nvidia-dgx-a100-su...
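
For scale, a rough back-of-envelope sketch (assuming "a couple dozen" means 24 machines and the widely reported ~$199k launch price of a DGX A100, neither of which is stated in the thread):

    # Hypothetical CAPEX estimate -- figures are assumptions, not from the thread.
    machines = 24                 # "a couple dozen"
    price_per_machine = 199_000   # USD, assumed DGX A100 launch price

    capex = machines * price_per_machine
    print(f"Hardware CAPEX: ~${capex:,.0f}")   # roughly $4.8M, i.e. "millions of dollars"

That only covers the hardware outlay, not networking, power, or hosting, so the "millions" claim is plausible for the cluster itself even if a single 4-day run is cheap.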


Running your own DC is quite expensive with GPU hardware. One DGX-2 is $400k and draws something like 24 kW.


> draws something like 24 kW.

That number is off. The DGX-2 consumes 10 kW at peak [0] and the DGX-2H consumes 12 kW at peak [1].

[0] https://www.nvidia.com/content/dam/en-zz/Solutions/Data-Cent...

[1] https://www.nvidia.com/content/dam/en-zz/es_em/Solutions/Dat...
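
Using the corrected 10 kW peak figure, a quick sketch of the electricity cost of that 4-day run (machine count and electricity rate are assumptions, not from the thread):

    # Rough energy-cost estimate for the quoted run -- assumed inputs.
    machines = 24              # "a couple dozen"
    kw_per_machine = 10        # DGX-2 peak draw per the linked datasheet
    hours = 4 * 24             # "4 days"
    rate_usd_per_kwh = 0.10    # assumed industrial electricity rate

    energy_kwh = machines * kw_per_machine * hours
    cost = energy_kwh * rate_usd_per_kwh
    print(f"{energy_kwh:,.0f} kWh, roughly ${cost:,.0f} in electricity")
    # ~23,000 kWh, on the order of a few thousand dollars --
    # the millions are in the hardware, not the power bill for one run.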




