> Reading from a pointer could be from L1 cache, or it could be from a pci-e car... | Hacker News


dragontamer on Feb 2, 2022 | on: Failing to reach DDR4 bandwidth

> Reading from a pointer could be from L1 cache, or it could be from a pci-e card attached to another socket.

The fun one is the TLB (translation lookaside buffer) and the virtual memory system. Today's AMD cores have more L3 cache than the TLB can cover with 4 kB pages. In practice, you need to enable 2 MB or 1 GB hugepages to even access L3 cache at full speed...

EDIT: Milan-X has 96 MB of L3 cache per CCX. Mapping that with 4 kB pages would require about 24,000 (24-thousand) TLB entries. IIRC, Milan only has about 2,000 TLB entries. Hurraaahhhhhh....

------

CPUs are devilishly complicated. It makes optimization "fun". Apparently, running "memcpy" at full speed requires Ph.D.-level study these days.
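The arithmetic in the EDIT above can be checked with a quick back-of-the-envelope sketch. The 96 MB L3 size and the roughly 2,000-entry TLB are the figures quoted in the comment (the latter is an "IIRC" number, not a verified spec), and the helper names `entries_needed` and `tlb_reach` are made up for illustration. Sizes are treated as binary units (1 MB = 1024 * 1024 bytes), which is an assumption.

```python
KIB = 1024
MIB = 1024 * KIB
GIB = 1024 * MIB

L3_BYTES = 96 * MIB    # L3 size quoted in the comment
TLB_ENTRIES = 2000     # approximate entry count quoted in the comment ("IIRC")


def entries_needed(region_bytes, page_bytes):
    """Number of TLB entries needed to map region_bytes, rounded up."""
    return -(-region_bytes // page_bytes)


def tlb_reach(entries, page_bytes):
    """Bytes of memory that `entries` TLB entries can map at once."""
    return entries * page_bytes


if __name__ == "__main__":
    for name, page in [("4 KiB", 4 * KIB), ("2 MiB", 2 * MIB), ("1 GiB", GIB)]:
        need = entries_needed(L3_BYTES, page)
        reach_mib = tlb_reach(TLB_ENTRIES, page) / MIB
        print(f"{name} pages: {need} entries to cover L3, "
              f"TLB reach {reach_mib:.1f} MiB")
```

With 4 kB pages the TLB reach comes out to under 8 MiB, far short of the 96 MB of L3, while 2 MB pages need only 48 entries to cover the whole cache. That gap is the whole argument for hugepages here.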

my123 on Feb 2, 2022

On the same kind of funny subject, GPUs nowadays have full MMUs too, with TLBs and all…

ncmncm on Feb 3, 2022

Hugepages FTW
