

gmueckl on May 31, 2023 | on: Nvidia DGX GH200: 100 Terabyte GPU Memory System

Not the GP, but I believe they are talking about the size of the training data set in relation to the model size.

mirekrusin on May 31, 2023

You don't need to, and can't really, load all the training data at once.

For LLMs you only need to load a single row of context size at a time; that's a vector of, e.g., 8k numbers, which is 32 kB as single-precision floats.
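A minimal Python sketch of the idea, under stated assumptions: a lazy generator stands in for the training corpus, "8k" is read as 8192 positions, and values are stored as 4-byte single-precision floats (`array` typecode `"f"`). The helper names `context_rows` and `row_bytes` are hypothetical, for illustration only. It shows that only one context-sized row is materialized at a time, and that such a row takes 8192 × 4 = 32,768 bytes, i.e. the 32 kB figure.

```python
from array import array
from itertools import islice
from typing import Iterable, Iterator

CONTEXT_LEN = 8 * 1024  # assumption: "8k" read as 8192 positions


def context_rows(values: Iterable[float], context_len: int = CONTEXT_LEN) -> Iterator[array]:
    """Yield consecutive fixed-length rows of float32 values from a stream.

    Only one row is held in memory at a time, so the full data set never
    has to fit in memory. A trailing partial row is dropped in this sketch.
    """
    it = iter(values)
    while True:
        # "f" is a C float: 4 bytes (single precision) on common platforms
        row = array("f", islice(it, context_len))
        if len(row) < context_len:
            return
        yield row


def row_bytes(row: array) -> int:
    """Bytes occupied by the values in one row."""
    return len(row) * row.itemsize


if __name__ == "__main__":
    # Hypothetical stand-in for a huge corpus: a lazy generator, never a full list.
    stream = (float(i % 50_000) for i in range(3 * CONTEXT_LEN + 100))
    first = next(context_rows(stream))
    print(len(first), row_bytes(first))  # 8192 32768
```

Feeding it 3 × 8192 + 100 values yields three full rows and drops the last 100; the byte count is the same whether the 4-byte values are float32 or int32.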
