It's not an increase in the capacity of per-GPU "GPU memory" (the HBM directly attached to the H100 here is up to 96GB, versus 80GB in the previous generation); rather, it reflects the product of two things:
1. Each node here is a more tightly coupled CPU+GPU two-chip pairing, and the CPU side has a significantly larger pool of 480GB of LPDDR ("regular" RAM). So each GPU is part of a node that includes up to 480+96GB of total memory.
2. There are way more nodes: 256, up from 8 (rough totals sketched below).
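For scale, here is a quick back-of-the-envelope on those two factors. It uses only the figures quoted above; everything else in the snippet is illustrative:

```cuda
#include <cstdio>

int main() {
    // Figures from the comment above (not a spec sheet): 480GB of LPDDR on the
    // CPU side plus 96GB of HBM on the GPU side, times 256 nodes.
    const double lpddr_gb = 480.0;
    const double hbm_gb   = 96.0;
    const int    nodes    = 256;

    const double per_node_gb = lpddr_gb + hbm_gb;    // 576 GB per CPU+GPU node
    const double total_gb    = per_node_gb * nodes;  // 147,456 GB across the machine
    const double total_tb    = total_gb / 1024.0;    // ~144 TB if you pool everything

    std::printf("per node: %.0f GB, system total: %.0f GB (~%.0f TB)\n",
                per_node_gb, total_gb, total_tb);
    return 0;
}
```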
Is this memory unified like Apple Silicon? Meaning, can a model be deployed onto 576GB of total memory? Can the GPU read directly from the 480GB pool? Same question for the CPU being able to directly access the 96GB of HBM.
It should be mapped as one address space, so yes to the question about loading a model across the full pool. It's not fully unified, though; at this scale of computer it's simply impossible to put hundreds of GB on an SoC like that. Instead, the GPU and CPU have DMA over PCIe and NVLink, which is plenty fast for AI and scientific compute purposes. "Unified memory" doesn't make much sense for supercomputers this large.
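For a concrete picture of what "mapped as one address space" looks like to the programmer, here's a minimal sketch using plain CUDA managed memory: one pointer is valid on both the CPU and GPU, and the driver migrates or maps pages over the interconnect behind the scenes. Nothing here is GH200-specific; the sizes and names are arbitrary:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Trivial kernel that touches every element from the GPU side.
__global__ void scale(float* data, size_t n, float factor) {
    size_t i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

int main() {
    const size_t n = 1 << 20;
    float* data = nullptr;
    cudaMallocManaged(&data, n * sizeof(float));    // one pointer, valid on CPU and GPU

    for (size_t i = 0; i < n; ++i) data[i] = 1.0f;  // CPU writes through the shared mapping

    scale<<<(n + 255) / 256, 256>>>(data, n, 2.0f); // GPU reads/writes the same pointer
    cudaDeviceSynchronize();

    std::printf("data[0] = %f\n", data[0]);         // CPU reads the GPU's result back
    cudaFree(data);
    return 0;
}
```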
This device has a fully switched fabric allowing comms between any of the 256 "superchip" nodes at 900GB/s. That is dramatically faster than a direct host-to-GPU 32-lane PCIe connection (which is crazy), and obviously dwarfs any existing machine-to-machine connectivity. The actual usability of shared memory across the array is improved significantly.
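To make "shared memory across the array" a bit more concrete: within a single host, direct GPU-to-GPU transfers over NVLink are exposed through CUDA's peer-access API. A minimal sketch follows (device indices and buffer size are arbitrary, error checking is omitted, and this is the single-host API; across a multi-node fabric you'd normally reach for NCCL or NVSHMEM instead):

```cuda
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    // Whether the copy actually rides NVLink (rather than PCIe) depends on the
    // topology of the box; this only demonstrates the programming model.
    int ndev = 0;
    cudaGetDeviceCount(&ndev);
    if (ndev < 2) {
        std::printf("need at least two GPUs for this sketch\n");
        return 0;
    }

    int can01 = 0, can10 = 0;
    cudaDeviceCanAccessPeer(&can01, 0, 1);
    cudaDeviceCanAccessPeer(&can10, 1, 0);
    if (!can01 || !can10) {
        std::printf("no peer access between devices 0 and 1 on this box\n");
        return 0;
    }

    const size_t bytes = 256ull << 20;  // 256 MB test buffer
    float *src = nullptr, *dst = nullptr;

    cudaSetDevice(0);
    cudaDeviceEnablePeerAccess(1, 0);   // let device 0 address device 1's memory
    cudaMalloc(&src, bytes);

    cudaSetDevice(1);
    cudaDeviceEnablePeerAccess(0, 0);   // and vice versa
    cudaMalloc(&dst, bytes);

    // Device-to-device DMA: no bounce through host memory once P2P is enabled.
    cudaMemcpyPeer(dst, 1, src, 0, bytes);
    cudaDeviceSynchronize();
    std::printf("copied %zu MB from GPU 0 to GPU 1\n", bytes >> 20);

    cudaSetDevice(0);
    cudaFree(src);
    cudaSetDevice(1);
    cudaFree(dst);
    return 0;
}
```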
I mean... Nvidia has obviously been using DMA for decades. This isn't just DMA.
No, I mean the fact that Nvidia is now claiming that the memory the CPU has access to can be counted as memory for the GPU. The fabric is neat; the "we have 500GB of RAM per GPU" claim is questionable.