Is there a real fundamental difference between the two approaches? It seems to me that it's all about API, not underlying concepts. A filesystem may organize things hierarchically, such that the full path to a file is composed of the names of the various folders that contain it plus the name of the file itself, glued together with a path separator. Or a filesystem may organize things as a pure key-value store, where the file's name is the full path. Code that uses the filesystem could then treat it either way. You can address files purely by full path on a hierarchical system, and you can pick a "path separator" and treat it like there were directories on a pure key-value store.
Applied to code, it's really a matter of how you look up names. Prefixing is just the situation where there's no language assistance in looking up names. Namespacing is just prefixes plus some automatic help in looking up partial names.
> Is there a real fundamental difference between the two approaches?
Yes. With true directories, you can implement directory rename() with O(a+b) time complexity, where a and b are the depths of the paths to the directories. With a flat key/value store, directory rename() requires at least O(n) time complexity for n directory children, since each child must be renamed.
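To make the O(n) cost concrete, here is a minimal sketch (names and layout are illustrative, not any real filesystem's API) of a directory rename on a flat key/value store where each file's key is its full path:

```python
# Hypothetical sketch: directory rename on a flat key/value store,
# where a file's key is its full path. Renaming "a/b" to "a/B" must
# rewrite the key of every descendant -- O(n) in affected entries.
store = {
    "a/b/x.txt": "data1",
    "a/b/sub/y.txt": "data2",
    "a/c/z.txt": "data3",
}

def rename_dir(store, old_prefix, new_prefix):
    old_prefix += "/"
    new_prefix += "/"
    for key in list(store):  # scans every key in the store
        if key.startswith(old_prefix):
            store[new_prefix + key[len(old_prefix):]] = store.pop(key)

rename_dir(store, "a/b", "a/B")
print(sorted(store))  # ['a/B/sub/y.txt', 'a/B/x.txt', 'a/c/z.txt']
```

With true directories, the same operation is a single relink of the "b" entry inside "a", regardless of how many descendants it has.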
I'd be curious to know how to make it more efficient. Either the descendants of a directory include the directory's name in their identifier, or they don't. If they do, then to implement rename() you need to do something such that all descendants of the directory can be queried by a different identifier than the one they were given upon creation (where the difference is exactly the name change of an ancestor directory). You can:
* rename all descendants on directory rename (expensive--takes O(n) time for n descendants).
* log each rename() and translate a key with a new directory name into a key with the original directory name (requires O(k) space for k renames and O(d) time in the worst case for translating a path of depth d whose directories have all been renamed; introduces the need to garbage-collect log records when all descendants of a renamed directory get deleted).
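The second option could look roughly like this (a sketch under the assumptions above; the function names are made up for illustration): renames append to a log, and lookups translate a current path back to the original key.

```python
# Hypothetical sketch of the log-based option: files keep their
# original keys; directory renames are O(1) appends to a log, and
# lookups translate a current path back to the original key.
rename_log = []  # list of (current_prefix, original_prefix)
store = {"a/b/x.txt": "data"}

def rename_dir(old_prefix, new_prefix):
    rename_log.append((new_prefix, old_prefix))  # no keys rewritten

def resolve(path):
    # Worst case O(k) over k logged renames; with an index keyed by
    # prefix this can approach O(d) for a path of depth d.
    for current, original in reversed(rename_log):
        if path == current or path.startswith(current + "/"):
            path = original + path[len(current):]
    return path

rename_dir("a/b", "a/B")
print(resolve("a/B/x.txt"))         # 'a/b/x.txt' -- the original key
print(store[resolve("a/B/x.txt")])  # 'data'
```

This also makes the garbage-collection problem visible: the `("a/B", "a/b")` log entry must live until every descendant of the renamed directory is gone.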
Granted, this is more nuanced than what I said before about rename() requiring O(n) time complexity when implemented on top of a key/value store, but the second option is still more costly than rename() with true directories. I suspect (but cannot prove at this time) that we can't do better than the second option above, but would love to hear your thoughts.
I'm not debating that it's inefficient if you use the same data structure that is used now. That's why I suggested using a different one.
Instead of a flat list of file names for each directory, you store the file names in a trie, so that files with a shared prefix in their name have a shared ancestor node in the trie. To do a directory rename, you just have to find the node in the trie that corresponds to the directory, move that node to a different location in the trie and change the portion of the name that is stored in that node.
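A minimal sketch of that idea (the node layout here is illustrative; a real implementation would be a radix/patricia trie with more careful splitting): the rename touches one node, and the descendants are never visited.

```python
# Hypothetical sketch: file names stored in a trie where each node
# holds a fragment of the name. A directory rename is a local edit
# of one node, independent of how many descendants it has.
class Node:
    def __init__(self, fragment):
        self.fragment = fragment  # portion of the name stored here
        self.children = {}        # fragment -> Node

root = Node("")

def insert(path):
    node = root
    for part in path.split("/"):  # simplistic: one node per component
        part += "/"
        node = node.children.setdefault(part, Node(part))

insert("a/bc/d")
insert("a/bc/e")

# rename a/bc -> a/Bc: relabel and re-key one node; children untouched
bc = root.children["a/"].children.pop("bc/")
bc.fragment = "Bc/"
root.children["a/"].children["Bc/"] = bc
print(sorted(bc.children))  # ['d/', 'e/']
```

Path resolution is still O(d) for depth d, and the rename is O(a+b) for the source and destination paths, matching true directories.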
> Instead of a flat list of file names for each directory, you store the file names in a trie, so that files with a shared prefix in their name have a shared ancestor node in the trie.
But then, if they behave like directories, and the software treats them like directories, aren't you effectively implementing directories in everything but name? (I mean, of course current software doesn't actually treat directory names this way; but it seems like changing the data structure is going a long way to maintain a directory-less fiction.)
No, because you could also rename any shared prefix of file names, not just the parts that end with a slash. Instead of directories having a special significance, now any arbitrary prefix in a file name has the same significance.
I'm not seeing how this is any different from true directories in terms of time and space complexity, as far as rename and path resolution are concerned. Sure, the trie nodes don't have to be implemented as key/value pairs in the data store, but don't they otherwise serve the same purpose in the affected algorithms--pointing to key/value pairs or other directories? Doesn't this still mean that path resolution is O(d) for a path of depth d in the trie, and rename is O(a+b) for two paths of depth a and b?
The time and space complexity aren't different. That's the whole point I have been arguing: that you can support paths-as-arbitrary-strings semantics without losing efficiency on directory renames.
> No, because you could also rename any shared prefix of file names, not just the parts that end with a slash.
I think that
$ mkdir -p a/bc; touch a/bc/{d,e}
$ cd a; mv bc Bc
(changing just the 'b', not the whole 'bc') amounts to the same behaviour for directories. I guess you could argue that it's different because you still have to mention the `c`, even if you don't change it.
What if instead of just one file starting with b, you have multiple files starting with b?
If there are files with names bc, bd and be, `mv b B` now renames all of those to Bc, Bd and Be. With normal directories, there is no way to do that as one operation.
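In trie terms (again a rough sketch, not a real filesystem API): if bc, bd and be share a node for the prefix "b", then `mv b B` is one local edit to that node.

```python
# Hypothetical sketch: in a radix-style trie the shared prefix "b"
# of bc, bd, be lives in a single node, so renaming "b" -> "B" is
# one local edit that renames all three -- no per-file work.
class Node:
    def __init__(self, fragment):
        self.fragment = fragment
        self.children = {}

root = Node("")
b = Node("b")
root.children["b"] = b
for suffix in ("c", "d", "e"):
    b.children[suffix] = Node(suffix)

# rename the shared prefix "b" -> "B" in one step
del root.children["b"]
b.fragment = "B"
root.children["B"] = b

def names(node, prefix=""):
    prefix += node.fragment
    if not node.children:
        yield prefix
    for child in node.children.values():
        yield from names(child, prefix)

print(sorted(names(root)))  # ['Bc', 'Bd', 'Be']
```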
There's no efficiency requirement on atomicity. Who said we need POSIX filesystem semantics anyway? Is it so important that we should trade away scalability for it? Obviously that wasn't the S3 design choice.
Filesystem users (developers) already assume that renames are fast. Changing that behavior now would make some applications appear to hang, or behave strangely.
S3 users already know and should handle that.