Real-world large filesystems are distributed across many thousands of hosts and multiple datacenters, not mounted as a Linux filesystem on a single host. Because whole racks and whole datacenters fail, not just disk drives.
The fact that real-world storage systems are distributed on the network bolsters the case for supporting 128-bit and even larger types.
Creating unified namespaces is really useful and a _great_ simplifier. We don't do it as often as we should because of limitations in various layers of modern software stacks, especially in the OS layers.
Unfortunately, AFAIU ZFS only supports 64-bit inodes. A large inode space, like 128-bit or even 256-bit, would be ideal for distributed systems.
Larger spaces for unique values are useful for more than just enumerating objects. IPv6 uses 128 bits not because anybody ever expected 2^128-1 devices attached to the network, but because a larger namespace is easier to segment. Routing tables are smaller with IPv6 because it's easier to create subnets with contiguous addressing, without practical constraints on the size of the subnet. Similarly, it's easier to create subnets of subnets (think Kubernetes clusters) with a very simple segmenting scheme and minimal centralized planning and control.
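To make that concrete, here's a minimal sketch using Python's stdlib ipaddress module, with a made-up documentation prefix, of how a roomy address space lets each layer carve out its own subnets with no coordination beyond "take the next index":

```python
import ipaddress

# A site gets a /48 (hypothetical documentation prefix), hands each cluster
# a /56, and each cluster hands each tenant a /64, all independently.
site = ipaddress.ip_network("2001:db8:1234::/48")

# Each cluster independently owns one /56 out of the site's 256.
clusters = list(site.subnets(new_prefix=56))

# Each cluster independently carves /64s for its tenants out of its own /56.
cluster0_tenants = list(clusters[0].subnets(new_prefix=64))

print(clusters[0])           # 2001:db8:1234::/56
print(cluster0_tenants[3])   # 2001:db8:1234:3::/64
```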
Similarly, content-addressable storage requires types much larger than 128 bits (e.g. 160 bits for Plan 9 Fossil using SHA-1). Not because you ever expect more than 2^128-1 objects, but because generating unique identifiers in a non-centralized manner is much easier. This is why almost everybody, knowingly or unknowingly, only generates version 4 UUIDs (usually improperly, because they randomly generate all 128 bits rather than preserving the 6 version and variant bits the standard requires).
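As a small illustration of both points, this sketch (Python stdlib, hypothetical object contents) derives a content address by hashing and contrasts a correctly structured v4 UUID with the all-random-bits mistake:

```python
import hashlib, os, uuid

# Content addressing: the identifier is just the hash of the bytes,
# so anyone can mint identifiers with no central allocator.
block = b"some object contents"
score = hashlib.sha1(block).hexdigest()   # 160-bit address, Fossil/Venti style

# Correct v4 UUID: the stdlib sets the 4 version bits and 2 variant bits.
good = uuid.uuid4()

# The common mistake: 128 fully random bits, which usually isn't a valid UUID;
# its version/variant fields are whatever the random bytes happened to be.
bad = uuid.UUID(bytes=os.urandom(16))

print(score)
print(good.version, good.variant)   # -> 4 specified in RFC 4122
print(bad.version, bad.variant)
```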
ZFS failed not by supporting a 128-bit type for describing sizes, but by only supporting a 64-bit type for inodes. And probably they did this because 1) changing the size of an inode would have been much more painful for the Solaris kernel and userland given Solaris' strong backward compatibility guarantees, and 2) because they were focusing on the future of attached storage through the lens of contemporary technologies like SCSI, not on distributed systems more generally.
Unified namespaces on many-petabyte filesystems are perfectly commonplace:
HDFS, QFS, ... even the old GFS.
You wouldn't make them Linux/FUSE mountpoints though; that's just an unneeded abstraction. Command line tools don't work with files that are 100TB each.
> Command line tools don't work with files that are 100TB each.
No, but they do work with small files, which presumably most would be if the number of objects visible in the namespace system were pushing 2^64.
100TB files are often databases in their own right, with many internal objects. But because we can't easily create a giant unified namespace that crosses these architectural boundaries, we can't abstract away those architectural boundaries like we should be doing and would be doing if it were easier to do so.
Just to be more specific, imagine inodes were 1024 bits. An inode could become a handle that not only describes a unique object but also encodes how to reach that object, which means every read/write operation would carry enough data to forward the operation through the stack of layers. Systems like FUSE can't scale well because of how they manage state, and one of the obvious ways to fix that is to embed state in the object identifier.
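Purely as a hypothetical illustration (the field layout below is invented, not any real filesystem's), a fat handle could be packed so that every layer can decode just the routing fields it needs and forward without consulting a broker:

```python
import struct

# 4 x 64-bit fields (cluster, shard, replica hint, object id) plus 96 reserved
# bytes = 128 bytes = 1024 bits. Layout is made up for illustration.
HANDLE_FMT = ">QQQQ96x"

def pack_handle(cluster: int, shard: int, replica_hint: int, object_id: int) -> bytes:
    return struct.pack(HANDLE_FMT, cluster, shard, replica_hint, object_id)

def route(handle: bytes) -> tuple[int, int, int, int]:
    # Any intermediate layer can decode the routing fields directly from the
    # identifier; no state table or metadata lookup is required.
    return struct.unpack(HANDLE_FMT, handle)

h = pack_handle(cluster=7, shard=42, replica_hint=1, object_id=2**63)
print(len(h) * 8, route(h))   # 1024 (7, 42, 1, 9223372036854775808)
```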
A real-world example is pointers on IBM i (the old AS/400 midrange systems). They're 128 bits. Not because there's a real 128-bit address space, but because the pointer also encodes information about the object, information used and enforced by both software and hardware to ensure safety and security. Importantly, this is language agnostic. An implementation of C in such an environment is very straightforward; you get object capabilities built directly into the language without even having to modify the semantics of the language or expose an API.
Language implementations like Swift, LuaJIT, and various JavaScript engines also make use of unused bits in 64-bit pointers for tagging data. On 32-bit targets this is either not possible, or they tag 64-bit doubles (NaN-boxing) instead of pointers. In any event, my point is that larger address spaces can actually make it much easier to optimize performance, because it's much simpler to encode metadata in a single, structured, shared identifier than to craft a system that relies on a broker to query metadata. Obviously you can't encode all metadata, but it's really nice to be able to encode some of the most important metadata, like type.
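Here's a minimal sketch of low-bit tagging, using plain Python integers to stand in for 8-byte-aligned 64-bit pointers; the tag assignments are invented for illustration:

```python
# Objects are 8-byte aligned, so the low 3 bits of a pointer are always zero
# and can carry a small type tag that is checked without any lookup.
TAG_MASK   = 0b111
TAG_INT    = 0b001
TAG_STRING = 0b010

def tag(addr: int, tag_bits: int) -> int:
    assert addr & TAG_MASK == 0, "address must be 8-byte aligned"
    return addr | tag_bits

def untag(ptr: int) -> tuple[int, int]:
    # Recover the real address and the type tag with two masks.
    return ptr & ~TAG_MASK, ptr & TAG_MASK

p = tag(0x7f00001000, TAG_STRING)
print(hex(p), untag(p))   # 0x7f00001002 (545460850688, 2)
```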
For IPv6, the 128 bits have their justification. They're supposed to enable proper hierarchical routing and to reduce the number of entries in the routing tables, which is the pain point where things get expensive. The idea is that no one, at any level, needs to request the allocation of a second subnet, because what they have is large enough by default. So you need more bits than strictly necessary to allow a little bit of "wastefulness" even after several layers of subnetting.
Moreover, the convention that no subnet should be smaller than a /64 enables stateless autoconfiguration for hosts. 64 bits is enough to fit common (supposedly unique) hardware identifiers, and is even large enough to assign random addresses (as with privacy extensions) with a very low probability of collisions.
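A small sketch of why that works, using made-up prefix and MAC values: a host can derive its own 64-bit interface identifier from its MAC (modified EUI-64) and append it to the advertised /64, with no server involved.

```python
import ipaddress

def slaac_address(prefix: str, mac: str) -> ipaddress.IPv6Address:
    # Modified EUI-64: flip the universal/local bit of the first octet and
    # insert ff:fe in the middle of the 48-bit MAC to get a 64-bit IID.
    b = bytearray(int(x, 16) for x in mac.split(":"))
    b[0] ^= 0x02
    iid = bytes(b[:3]) + b"\xff\xfe" + bytes(b[3:])
    net = ipaddress.ip_network(prefix)
    return ipaddress.IPv6Address(int(net.network_address) | int.from_bytes(iid, "big"))

print(slaac_address("2001:db8:1234:3::/64", "00:11:22:33:44:55"))
# -> 2001:db8:1234:3:211:22ff:fe33:4455
```

Note that the MAC is visible in the resulting address, which is exactly the privacy leak the next comment points out.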
That was the idea, but it didn't really turn out that way. Stateless autoconfiguration leaks your MAC address, which is a privacy issue. Most servers use static IPs and most desktops use random IPs, with checks for collisions.
IMHO, the 128-bit space was a big mistake. The addresses are twice as hard for humans to communicate, most languages and databases don't support the data type natively, and it complicates high-speed routing.
An average of 48 bits for a network and 15 for the host would have been better. For other reasons you almost never want more than a few hundred hosts on one layer-2 network anyway.
Except IPv6 being 128-bit makes fast hardware implementations much easier than if it had to deal with shorter prefix lengths. Nothing shorter than 128 really makes sense at all in an IPv4 replacement.
Consider a tiny piece of the internet with five devices and three hubs: an outlink and two smaller hubs, all connected in a triangle. With 4 bits, the left hub can have all the 0xxx addresses and the right hub can have all the 1xxx addresses. No matter where the devices connect, they can all get an IP, and the outlink only needs to remember a simple rule (starts with 1, go right; otherwise, go left).
Compare that to a 3-bit network. By moving IPs from hub to hub, all five devices can still always get an address, but the small hubs need to tell the outlink and each other which addresses they own in order to avoid address exhaustion on either hub. Routing a packet is slower because the routing is more complex.
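A toy sketch of the difference (addresses as small integers, hub names invented): the roomy plan routes on a single bit, while the tight plan needs a table of owned addresses that the hubs have to keep in sync.

```python
# 4-bit plan: the outlink only tests the top bit.
def route_4bit(addr: int) -> str:
    return "right" if addr & 0b1000 else "left"

# 3-bit plan: ownership must be tracked and exchanged as devices move.
owned_by_right = {0b101, 0b110, 0b111}   # changes whenever devices migrate
def route_3bit(addr: int) -> str:
    return "right" if addr in owned_by_right else "left"

print(route_4bit(0b1010), route_3bit(0b101))   # right right
```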
IPv6 is basically 64 bits for routing and 64 bits for the local network segment. It seems plausible that this is faster than trying to mask out the bits you need.
Hierarchy is nice, though. If you can model the bits as a tree, it becomes super quick to figure out where to route a packet. You can model stuff like that trivially with an FPGA.
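As an illustration of that tree view, here's a minimal sketch of a binary trie keyed on address bits, where routing is just a walk that remembers the last next-hop seen (longest-prefix match); the prefixes and next-hop names are invented:

```python
class TrieNode:
    def __init__(self):
        self.children = [None, None]   # one branch per address bit
        self.next_hop = None

def insert(root: TrieNode, prefix_bits: str, next_hop: str) -> None:
    node = root
    for b in prefix_bits:
        i = int(b)
        if node.children[i] is None:
            node.children[i] = TrieNode()
        node = node.children[i]
    node.next_hop = next_hop

def lookup(root: TrieNode, addr_bits: str) -> str:
    # Walk the address bits, remembering the most specific route seen.
    node, best = root, None
    for b in addr_bits:
        node = node.children[int(b)]
        if node is None:
            break
        if node.next_hop is not None:
            best = node.next_hop
    return best

root = TrieNode()
insert(root, "0", "left-hub")
insert(root, "1", "right-hub")
insert(root, "10", "fast-path")   # a more specific route wins
print(lookup(root, "1011"), lookup(root, "0110"))   # fast-path left-hub
```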
On the other hand, such committee bikeshedding seems to work rather well for PR. It lets them ignore hard problems and instead focus on things most people can understand and relate to, gaining more trust than a well-designed thing with nothing to understand or relate to would.
So they used 128 bit because of bikeshedding. Committees always make the most conservative decision possible. Like IPv6.