On modern hardware a key lookup in a hash table isn't necessarily a single page read! Sure, it's a single virtual memory access, but if that page isn't in your TLB you need to read the page table... and if the page containing that part of the page table isn't in the TLB you need to read that page...
On modern hardware, every memory access looks very much like a B-tree lookup.
The translation hierarchy is not a B-tree, it is a trie (radix tree) in all of these CPUs. Very different layout (not balanced, and the depth and fan-out are fixed in hardware), and much faster on the happy path.
Making it a B-tree over a 48-bit address space would also have a bit of a memory problem, and would make the TLB huge.
And then the CPU cache is an array. A single virtual memory access is bound to be a single physical access as well, with minor exceptions when NUMA node boundaries are crossed.
> and if the page containing that part of the page table isn't in the TLB you need to read that page...
I thought page tables used physical addresses, which are accessed directly without any TLB lookup (except with nested paging during virtualization, which adds another level of indirection). Of course, the processor still needs to read each level into the data cache(s) while doing a page table walk.
Typically. Although on a virtualized Arm what the guest views as a physical address is really an intermediate physical address that must be translated by the second stage MMU. So it’s possible that reading the first stage page tables can cause a page fault in the second stage MMU. I suspect modern x86 works similarly, but I’m less familiar with that.
Page tables can have multiple levels. For example, in x86_64 you'd have 4 levels, i.e. the virtual->physical mapping is implemented as a tree with depth 4, where each leaf and internal node of the tree is 4 KB (the page size). (As usual, the details are more complicated than that.)
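For illustration, here's roughly how a 48-bit virtual address gets split across those 4 levels (standard x86_64 scheme with 4 KB pages, 9 bits of index per level; just a sketch, the example address is arbitrary):

```c
#include <stdint.h>
#include <stdio.h>

/* Sketch: splitting a 48-bit x86_64 virtual address into the four
 * 9-bit table indices plus a 12-bit page offset. Each table level
 * has 512 entries of 8 bytes = 4 KB, i.e. one page per node. */
int main(void) {
    uint64_t vaddr = 0x00007f1234567abcULL;   /* arbitrary example address */

    unsigned pml4_idx = (vaddr >> 39) & 0x1ff; /* level 4 (root) index */
    unsigned pdpt_idx = (vaddr >> 30) & 0x1ff; /* level 3 index */
    unsigned pd_idx   = (vaddr >> 21) & 0x1ff; /* level 2 index */
    unsigned pt_idx   = (vaddr >> 12) & 0x1ff; /* level 1 (leaf) index */
    unsigned offset   =  vaddr        & 0xfff; /* offset within the 4 KB page */

    printf("PML4=%u PDPT=%u PD=%u PT=%u offset=0x%x\n",
           pml4_idx, pdpt_idx, pd_idx, pt_idx, offset);
    return 0;
}
```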
Yes, and each level of the tree has the physical address of the next level, so no TLB lookup is necessary (the top of the tree, in the TTBRn or equivalent registers, is also a physical address).
The TLB is just one element of the process of resolving a virtual address into a physical one: it's a cache that holds the most recently resolved translations.
When the virtual address you're looking to resolve is not present in that cache (i.e. when you have a TLB miss), the CPU falls back to walking the page table hierarchy. At each level of the tree, the CPU reads the physical address of the next level and performs a memory fetch of that page table entry (in my previous comment I erroneously said a "page fetch", but it's actually only a cache-line-sized fetch), and it repeats this until it reaches the leaves of the tree, which contain the Page Table Entry holding the physical address of the (4 KB) physical page associated with the virtual page address you wanted to resolve.
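For the curious, here's a toy sketch of that fallback walk in C. The "physical memory", the entry format, and the helper names (phys_read64, etc.) are all made up for illustration; huge pages and permission bits are ignored, and on real hardware this walk is done by the MMU itself:

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Toy model of a 4-level page-table walk (the TLB-miss fallback).
 * "Physical memory" is just a byte array; the entry format is
 * simplified to a present bit plus the next level's physical address. */

static uint8_t phys_mem[5 * 4096];          /* 4 table pages + 1 data page */

#define PTE_PRESENT   0x1ULL
#define PTE_ADDR_MASK 0x000ffffffffff000ULL

static uint64_t phys_read64(uint64_t paddr) { /* stand-in for the hardware walker's fetch */
    uint64_t v;
    memcpy(&v, &phys_mem[paddr], sizeof v);
    return v;
}

static void phys_write64(uint64_t paddr, uint64_t v) {
    memcpy(&phys_mem[paddr], &v, sizeof v);
}

/* Each level's entry holds the *physical* address of the next level,
 * so no TLB lookup is needed along the way. */
static int walk(uint64_t root, uint64_t vaddr, uint64_t *paddr_out) {
    uint64_t table = root;
    for (int level = 3; level >= 0; level--) {
        unsigned idx = (vaddr >> (12 + 9 * level)) & 0x1ff;
        uint64_t entry = phys_read64(table + idx * 8); /* cache-line-sized fetch in practice */
        if (!(entry & PTE_PRESENT))
            return -1;                                 /* page fault */
        table = entry & PTE_ADDR_MASK;                 /* next level, or the final page */
    }
    *paddr_out = table + (vaddr & 0xfff);              /* add offset within the 4 KB page */
    return 0;
}

int main(void) {
    uint64_t vaddr = 0x00007f1234567abcULL;
    /* Build one mapping: table pages at physical 0x0000..0x3fff, data page at 0x4000. */
    uint64_t tables[4] = { 0x0000, 0x1000, 0x2000, 0x3000 };
    for (int level = 3; level >= 1; level--) {
        unsigned idx = (vaddr >> (12 + 9 * level)) & 0x1ff;
        phys_write64(tables[3 - level] + idx * 8, tables[4 - level] | PTE_PRESENT);
    }
    phys_write64(tables[3] + ((vaddr >> 12) & 0x1ff) * 8, 0x4000 | PTE_PRESENT);

    uint64_t paddr;
    if (walk(tables[0], vaddr, &paddr) == 0)
        printf("virtual 0x%llx -> physical 0x%llx\n",
               (unsigned long long)vaddr, (unsigned long long)paddr);
    return 0;
}
```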
Depends on your workload and how many TLB entries your CPU has for superpages. The Zen 2 TLB can hold tons (1000s) of 2 MB superpages but relatively few (64) 1 GB superpages. Older CPU models had worse capacity for 1 GB and 2 MB superpages. E.g., Haswell (2013) had only 32 entries for 2 MB superpages and 4 entries for 1 GB superpages (data).
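Back-of-the-envelope reach from those figures (taking "1000s" to mean roughly 2048 entries, which is an assumption on my part): Zen 2 can cover about 2048 × 2 MB ≈ 4 GB with 2 MB entries and 64 × 1 GB = 64 GB with 1 GB entries, whereas Haswell tops out around 32 × 2 MB = 64 MB and 4 × 1 GB = 4 GB. Touch more than that and you're back to page-table walks.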
In addition to the limited number of cache slots available for superpages (which varies depending on the CPU), remember that those entries can be invalidated (again, depending on the CPU). If you're ping-ponging processes on a single CPU, you won't necessarily have what you need in the TLB.
Depends on the design. At a minimum you're ping-ponging between userland and kernel; but you might also be bouncing between a transport layer unwrapper, an authentication front-end, the database core, and a storage back-end.