I think NVLink is a reconfiguration of the PCIe standard: it uses the same physical pathways but communicates over them in a different way.
Currently, PCIe devices can already talk directly to the memory controller and get direct RAM access (DMA) without CPU overhead. So the whole "GPUs only run as fast as the CPU can feed them memory" claim is a bit false (although since memory controllers are on-die now, I guess it's partly true?).
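As a rough illustration of what "direct RAM access without CPU overhead" looks like from the software side, here's a minimal CUDA-runtime sketch (names and buffer size are just for illustration): pinned host memory lets the GPU's DMA engine pull data over PCIe while the CPU is free to do other work.

```c
#include <cuda_runtime.h>
#include <stdio.h>

int main(void) {
    const size_t N = 1 << 24;   /* ~16M floats, arbitrary size for the example */
    float *host_buf = NULL, *dev_buf = NULL;

    /* Pinned (page-locked) host memory: the GPU's DMA engine can read it
       directly over PCIe, without the CPU staging each chunk. */
    cudaHostAlloc((void **)&host_buf, N * sizeof(float), cudaHostAllocDefault);
    cudaMalloc((void **)&dev_buf, N * sizeof(float));

    for (size_t i = 0; i < N; i++) host_buf[i] = (float)i;

    cudaStream_t stream;
    cudaStreamCreate(&stream);

    /* Async copy: the transfer is handled by the GPU's copy/DMA engine,
       so the CPU returns immediately and can do unrelated work. */
    cudaMemcpyAsync(dev_buf, host_buf, N * sizeof(float),
                    cudaMemcpyHostToDevice, stream);

    /* ... CPU could do other work here ... */

    cudaStreamSynchronize(stream);
    printf("copy done\n");

    cudaStreamDestroy(stream);
    cudaFree(dev_buf);
    cudaFreeHost(host_buf);
    return 0;
}
```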
If I had to guess, what this might be is dynamic lane allocation, where one GPU can free up its PCIe lanes to let other devices use them. So one GPU could use all ~90 (or however many) PCIe lanes an IBM POWER9 chip has, instead of four GPUs each getting ~22.
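I don't know whether lanes can actually be reassigned on the fly like that, but you can at least observe the negotiated link width per GPU with NVML. A minimal sketch, assuming the NVML headers and library are installed:

```c
#include <nvml.h>
#include <stdio.h>

int main(void) {
    if (nvmlInit() != NVML_SUCCESS) {
        fprintf(stderr, "failed to init NVML\n");
        return 1;
    }

    unsigned int count = 0;
    nvmlDeviceGetCount(&count);

    for (unsigned int i = 0; i < count; i++) {
        nvmlDevice_t dev;
        char name[NVML_DEVICE_NAME_BUFFER_SIZE];
        unsigned int cur = 0, max = 0;

        nvmlDeviceGetHandleByIndex(i, &dev);
        nvmlDeviceGetName(dev, name, sizeof(name));

        /* Current vs. maximum negotiated PCIe link width (in lanes) for this GPU. */
        nvmlDeviceGetCurrPcieLinkWidth(dev, &cur);
        nvmlDeviceGetMaxPcieLinkWidth(dev, &max);

        printf("GPU %u (%s): x%u of x%u lanes\n", i, name, cur, max);
    }

    nvmlShutdown();
    return 0;
}
```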