Your intuition is correct and the sibling comments are wrong. Modern LLM inference servers support hierarchical KV caches, where cached data is spilled from GPU VRAM to progressively slower tiers (host RAM, then SSD), often with pluggable backends. A popular open-source backend for the "slow" tier is Mooncake: https://github.com/kvcache-ai/Mooncake
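For concreteness, a "slow tier" backend mostly just needs to store and fetch cache blocks keyed by the token prefix they cover. Here's a hypothetical sketch of such a pluggable interface (the names `KVCacheBackend`, `put`, and `get` are mine, not Mooncake's or any particular server's API):

```cpp
#include <cstdint>
#include <optional>
#include <string>
#include <vector>

// Hypothetical interface for a pluggable "slow tier" KV-cache backend.
// The serving engine evicts cache blocks out of VRAM/host RAM into this
// tier and pulls them back on a prefix-cache hit. Names are illustrative.
class KVCacheBackend {
public:
    virtual ~KVCacheBackend() = default;

    // Persist one cache block (e.g. the KV tensors for a chunk of tokens),
    // keyed by a hash of the token prefix it covers.
    virtual bool put(const std::string& prefix_hash,
                     const std::vector<uint8_t>& block) = 0;

    // Fetch a previously stored block; empty optional on a miss.
    virtual std::optional<std::vector<uint8_t>> get(
        const std::string& prefix_hash) = 0;
};
```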
OK, that's pretty fascinating: it turns out Mooncake includes a trick that can populate GPU VRAM directly from an NVMe SSD, without the data having to pass through the host's CPU and RAM first!
> Transfer Engine also leverages the NVMeof protocol to support direct data transfer from files on NVMe to DRAM/VRAM via PCIe, without going through the CPU and achieving zero-copy.
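The NVMe-oF part of that quote covers remote storage; on a single host, the same "skip the host RAM bounce buffer" idea is what NVIDIA's GPUDirect Storage (cuFile) API exposes from user space. Below is a minimal sketch of that local path using cuFile; it is not Mooncake's code, and the file path and size are made up for illustration:

```cpp
// Minimal GPUDirect Storage sketch: read a file's bytes straight into VRAM
// with cuFile, so data moves NVMe -> PCIe -> GPU without being staged in a
// host RAM bounce buffer. Not Mooncake's code; just the same idea locally.
// Build (assumption): nvcc gds_read.cu -lcufile
#include <cufile.h>
#include <cuda_runtime.h>
#include <fcntl.h>
#include <unistd.h>
#include <cstdio>
#include <cstring>

int main() {
    const char* path = "/mnt/nvme/kv_block.bin";  // hypothetical cache file
    const size_t size = 16 << 20;                 // 16 MiB, for illustration

    if (cuFileDriverOpen().err != CU_FILE_SUCCESS) return 1;

    // O_DIRECT bypasses the page cache so the DMA can target VRAM directly.
    int fd = open(path, O_RDONLY | O_DIRECT);
    if (fd < 0) { perror("open"); return 1; }

    CUfileDescr_t descr;
    memset(&descr, 0, sizeof(descr));
    descr.handle.fd = fd;
    descr.type = CU_FILE_HANDLE_TYPE_OPAQUE_FD;

    CUfileHandle_t handle;
    if (cuFileHandleRegister(&handle, &descr).err != CU_FILE_SUCCESS) return 1;

    void* dev_ptr = nullptr;
    cudaMalloc(&dev_ptr, size);
    cuFileBufRegister(dev_ptr, size, 0);  // pin the VRAM buffer for DMA

    // The read lands in GPU memory; no intermediate copy through host RAM.
    ssize_t n = cuFileRead(handle, dev_ptr, size, /*file_offset=*/0,
                           /*dev_offset=*/0);
    printf("read %zd bytes into VRAM\n", n);

    cuFileBufDeregister(dev_ptr);
    cudaFree(dev_ptr);
    cuFileHandleDeregister(handle);
    close(fd);
    cuFileDriverClose();
    return 0;
}
```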