I can't find how much it will cost or how much power it will use. I know it will be a lot, and maybe only Google, Microsoft, and Facebook can afford it, but I still want to know.
The DGX A100 was $200k at launch. I found a DGX H100 in the mid-$300k range. And those are 8-GPU systems, so you'd need 32 of them, and each one will definitely cost more, plus networking. A super low estimate would be $500k each, for $16M total. But considering it's moving from 96GB to 480GB of RAM per GPU, it might be more like $1.5M per 8-GPU node; round it up to say $50M.
And at 1/8th the power per GB, you have 700 W / 96GB / 8 × 480GB, which comes to around 450 W per GPU, and about 115 kW for all 256.
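A quick back-of-envelope version of that math in Python (the per-node prices are guesses, and the wattage just scales the H100's 700 W by memory capacity, as above):

    # Cost: 256 GPUs = 32 eight-GPU systems (per-node prices are guesses)
    nodes = 256 // 8
    low_estimate  = nodes * 500_000      # ~$16M total
    high_estimate = nodes * 1_500_000    # ~$48M, round to $50M

    # Power: scale the H100's 700 W / 96GB, at 1/8th the power per GB, up to 480GB
    watts_per_gpu = 700 / 96 / 8 * 480   # ~437 W, call it 450 W
    total_kw = watts_per_gpu * 256 / 1000
    print(low_estimate, high_estimate, round(watts_per_gpu), round(total_kw))  # ~112 kW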
What does this mean for the AI race? For example, what if a newish company (newer than Google/Facebook/Microsoft/etc.) like Anthropic, Scale, Perplexity, or Stability is able to scrape together $5B USD in funding and spends its hardware budget on these things? Say they can buy $1B worth of them and spend the rest on hackers and operating expenses (idk if that's realistic). So maybe they could purchase and operate something like 20 of them. Say they spend six months doing experimental things and then the next six months training their Tsar Model. If they follow the Chinchilla scaling laws and normal architectures, how good will these models be?
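A very rough sketch of the Chinchilla math, where every input is an assumption: ~1e15 peak BF16 FLOPS per H100-class GPU, ~40% utilization, 20 systems of 256 GPUs, six months of training. Chinchilla says training compute C ≈ 6ND with D ≈ 20N tokens, so N ≈ sqrt(C/120):

    from math import sqrt

    gpus = 20 * 256                      # 20 systems of 256 GPUs
    flops_per_gpu = 1e15 * 0.4           # ~1 PFLOPS peak BF16 at ~40% utilization (assumed)
    seconds = 0.5 * 365 * 24 * 3600      # six months of training

    C = gpus * flops_per_gpu * seconds   # total training FLOPs, ~3e25
    N = sqrt(C / 120)                    # Chinchilla-optimal parameter count (C = 6 * N * 20N)
    D = 20 * N                           # Chinchilla-optimal token count
    print(f"~{N / 1e9:.0f}B params on ~{D / 1e12:.0f}T tokens")

That lands around a 500B-parameter model trained on ~10T tokens, roughly 100x GPT-3's training compute, though "how good" depends on a lot more than FLOPs.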
I have no expertise in GPU systems used for AI training, but would it be possible to buy a bunch of consumer cards and get the same performance?
Or is this not possible because consumer cards top out around 40-ish GB of RAM, so models would not fit, or would be "swapping" like crazy and be slow?
Consumer cards only have PCIe 4.0 and at most 24GB of VRAM, and the only recent model with NVLink, the RTX 3090, can only be connected to exactly one other card; it doesn't scale beyond that. So you are limited to PCIe 4.0 x16 speeds.
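To put numbers on that (the bandwidth figures are from Nvidia's specs; the 10GB gradient exchange is just an illustrative size):

    pcie4_x16 = 64e9    # ~64 GB/s bidirectional, PCIe 4.0 x16
    nvlink4   = 900e9   # ~900 GB/s bidirectional NVLink per H100

    grad_bytes = 10e9   # say 10GB of gradients exchanged per step (illustrative)
    print(f"PCIe 4.0: {grad_bytes / pcie4_x16 * 1000:.0f} ms")  # ~156 ms
    print(f"NVLink:   {grad_bytes / nvlink4 * 1000:.0f} ms")    # ~11 ms

Roughly a 14x gap on every gradient exchange, before you even hit the memory-capacity problem.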
The NVLink interconnect between all the GPUs is a huge part of it, and you cannot come even remotely close to that bandwidth with consumer hardware. Then the density of RAM relative to compute and power is huge: a single 4090 is 450 watts for 24GB, where this packs 20x the memory into the same watts. For the 20 systems above, that's 2.3 MW or so. If you assume $0.14/kWh, that's something like $325/hour in power costs to run, not counting the additional cooling you are definitely going to need. And I'm sure there's inefficiency this doesn't cover, but 240V at 10,000+ amps for that?
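Checking that arithmetic (the electricity rate is assumed):

    fleet_kw = 20 * 115   # twenty 256-GPU systems at ~115 kW each
    rate = 0.14           # $/kWh, assumed industrial rate
    print(f"${fleet_kw * rate:.0f}/hour")              # ~$322/hour
    print(f"{fleet_kw * 1000 / 240:,.0f} A at 240 V")  # ~9,583 A

So yes, on the order of 10,000 amps at 240V before cooling and conversion losses.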
It's in Nvidia's interest to avoid ending up in a situation where a small group of very big customers buys a large slice of their production. For Nvidia, a market of the same size with many small-to-medium customers is a lot better, as those customers have far less power to force Nvidia into anything that isn't in its interest.
I expect to see moves from Nvidia to help smaller players and open-source or semi-open models avoid being crushed by the big players. Not because they are nice, but because it is in their best interest.
When we first got our induction cooktop I was so excited about how ridiculously fast we could boil water, which is admittedly an odd thing to get excited about. It definitely isn't that powerful, though. That's a lot of power.
For a single Grace+Hopper node? I'd bet it fits in that budget: the Grace Hopper datasheet says the combo has a CPU + GPU + memory TDP programmable between 450W and 1000W, which leaves more than half of the room for the rest of the node's power budget. For the full DGX GH200? That's 18,432 CPU cores and 256 GPUs across 16 full racks of servers :p
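For scale, a rough check (the cooktop circuit is my assumption; the TDP range is from the datasheet):

    module_w_max = 1000   # top of the programmable CPU+GPU+memory TDP range (datasheet)
    cooktop_w = 7200      # a 240 V / 30 A induction cooktop circuit (assumed)
    print(f"module peaks at {module_w_max / cooktop_w:.0%} of the circuit")  # ~14%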