“Run large language models like BLOOM-176B collaboratively — you load a small part of the model, then team up with people serving the other parts to run inference or fine-tuning.”

According to this excerpt, a node in the network doesn’t need to load the entire model. It loads only a part of it, and the other nodes serve the rest.
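To make the idea concrete, here is a minimal sketch of how a model could be split across nodes. It is a toy illustration and not the project's actual API: `assign_blocks`, `Node`, `run_pipeline`, and `make_block` are hypothetical names, the "blocks" are plain functions standing in for transformer layers, and the assumption that each node holds a contiguous range of blocks is mine.

```python
from typing import Callable, List

# A "block" stands in for one transformer layer: it maps an activation
# to a new activation. Here it is just a function on a float.
Block = Callable[[float], float]


def make_block(i: int) -> Block:
    """Build a toy block (hypothetical stand-in for real layer weights)."""
    return lambda x: 2 * x + i


def assign_blocks(num_blocks: int, num_nodes: int) -> List[range]:
    """Split block indices into contiguous, near-equal ranges, one per node."""
    base, extra = divmod(num_blocks, num_nodes)
    ranges, start = [], 0
    for node_idx in range(num_nodes):
        size = base + (1 if node_idx < extra else 0)
        ranges.append(range(start, start + size))
        start += size
    return ranges


class Node:
    """A participant that loads only its own slice of the model."""

    def __init__(self, all_blocks: List[Block], block_range: range):
        # Only the assigned blocks are kept; the rest are never stored here.
        self.blocks = [all_blocks[i] for i in block_range]

    def forward(self, x: float) -> float:
        for block in self.blocks:
            x = block(x)
        return x


def run_pipeline(nodes: List[Node], x: float) -> float:
    """Pass the activation through each node in order.

    In a real network this hand-off would happen over the wire.
    """
    for node in nodes:
        x = node.forward(x)
    return x


blocks = [make_block(i) for i in range(8)]  # the "full model"
nodes = [Node(blocks, r) for r in assign_blocks(len(blocks), 3)]
print([len(n.blocks) for n in nodes])  # how many blocks each node holds
print(run_pipeline(nodes, 1.0))        # same result as running all blocks locally
```

No single node holds all eight blocks, yet chaining the nodes gives the same output as running the whole model in one place. That is the property the excerpt describes.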