DwarFS – Deduplicating Warp-Speed Advanced Read-Only File System (github.com/mhx)
211 points by pyinstallwoes on April 12, 2024 | 81 comments


Related:

DwarFS: A fast high compression read-only file system - https://news.ycombinator.com/item?id=32216275 - July 2022 (64 comments)

DwarFS: A fast high compression read-only file system - https://news.ycombinator.com/item?id=25246050 - Nov 2020 (111 comments)


This is really neat.

Feature request: Add a "library" option to give mkdwarfs a list of files that should be loaded into the dedup mechanism first, but not stored, allowing the image to be even smaller if the contents of the file can be retrieved from that library instead. Bonus points if you can specify a dwarfs image as a library and have it sensibly use the files contained in it.

Then you have the basis for a deduplicating incremental backup system. Currently, I have a system I wrote that will take a single file and a list of library files and produce a compressed deduplicated file that can re-create that single file using the library, which is great if you use tar to create that single file, but a little unwieldy when it comes to decompressing and restoring everything. The bonus of making it a proper mountable filesystem instead is that then it's a proper mountable filesystem and retrieving single files is a doddle.

My use case is that I have students, and I have given them coursework, which involves them logging in to a Linux machine and hacking away. I want to store regular snapshots of their work so that I can keep a backup for their sake but also so I can see a progression of development to try to work out if they are cheating (yes, I have had to deal with this), but I don't want to store 100 copies of the same fairly large files.


Borg and restic/rustic are alternatives that do this multi-layered backup thing. They can be mounted as a filesystem as well (the rustic reimplementation of restic, however, doesn't support this yet).

https://www.borgbackup.org/

https://restic.net/

https://github.com/rustic-rs/rustic


Can't say enough nice things about Borg and its dedup. Saves me a lot of money in cloud storage.


I think Kopia would be great for your use case

https://kopia.io/

It has a great system to snapshot files but only store data that has changed. I use it in an environment where I can't use something like ZFS to snapshot data because I don't have the ability to make decisions about what filesystem we're using. It's been amazing, love it so much!
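
For the snapshot-per-student use case, the basic flow is just a couple of commands (paths here are hypothetical, but these are the standard Kopia subcommands):

$ kopia repository create filesystem --path /backups/coursework-repo   # prompts for a repository password
$ kopia snapshot create /home/student42/project                        # run this on a schedule
$ kopia snapshot list /home/student42/project                          # shows the progression of snapshots

Each snapshot only stores chunks that changed since the last one, so keeping 100 near-identical copies is cheap.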


Would this be doable with OverlayFS, using a tmpfs to prepare a snapshot?
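
Something like this is what I have in mind (paths hypothetical; the lower layer could be a previous read-only snapshot, the upper a tmpfs):

$ mount -t tmpfs tmpfs /tmp/snap
$ mkdir /tmp/snap/upper /tmp/snap/work
$ mount -t overlay overlay \
    -o lowerdir=/mnt/last-snapshot,upperdir=/tmp/snap/upper,workdir=/tmp/snap/work \
    /srv/work
# work happens in /srv/work; /tmp/snap/upper then holds only the changes since the last snapshot

Not sure how well that plays with a DwarFS image as the lower layer, but overlayfs itself doesn't care as long as the lower dir is readable.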


I always thought a deduplicating squashfs might be really cool for a read-only MAME system. Since most of the ROM files seem to be variants of each other, instead of having zip archives for each game, just put all the raw files in a directory and create a deduplicated squashfs-type filesystem with all of them.


Yeah, the original rationale is kinda perversely interesting:

> my main use case and major motivation was that I had several hundred different versions of Perl

Okay... The question is whether you want to trust a GitHub filesystem or just spend the $dough to deal with it.


It's not an idle use case by the way. mhx wrote and maintains a library that provides important backwards compatibility for native (typically C based) extensions for Perl across decades of language releases.


No real doubt here that this guy is 100% serious, so thanks for the explanation.


I would like to archive every web page I ever visit. Something like this sounds very useful but it would need to be updated periodically.

I’m thinking a hybrid approach could make sense here. Historical data is thrown into a big archive dwarFS while new data of e.g. the current day is kept in a simple normal folder. Every now and then this would be merged together.

Skimming through the docs, I can't tell if it will be possible to create a new DwarFS image by taking an existing one and adding just a few more files, or if I would need to create a completely new one, which also means I'd need to temporarily have enough space for twice the archive size.
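
If there's no native way to append, the workaround I can think of is to mount the old image and union it with the new folder when rebuilding (names hypothetical; overlayfs allows a read-only mount with only lower layers):

$ dwarfs archive-old.dwarfs /mnt/archive-old
$ mkdir /tmp/merged
$ mount -t overlay overlay -o lowerdir=/srv/new-pages:/mnt/archive-old /tmp/merged   # new pages win on conflicts
$ mkdwarfs -i /tmp/merged -o archive-new.dwarfs

That still rewrites the whole image, but the temporary space needed is only for the new compressed image, not an uncompressed copy of everything.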


I want exactly the same.


How do you put the files in it if it's read-only?

To me it seems such a filesystem should be immutable-first rather than read-only, i.e. let you create files (by copying from another filesystem) which just can't ever be changed afterwards (though they can be moved). And that would be what I actually need, as I have a lot of big and relatively redundant files which aren't meant to change (e.g. video files, picture originals, distro and backup disk images etc.).


It's in the man page[1]; you put the files in at filesystem creation:

$ mkdwarfs -i /path/dir -o image.dwarfs

[1]: https://github.com/mhx/dwarfs/blob/main/doc/mkdwarfs.md
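
And then you mount the resulting image with the FUSE driver (mount point hypothetical):

$ dwarfs image.dwarfs /mnt/image
$ fusermount -u /mnt/image    # unmount when done (fusermount3 on newer FUSE setups)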


It's meant as a squashfs replacement, i.e. an image used as the rootfs on a live-boot system, with an overlayfs on top for a writable tmpfs, for example. But I imagine this being useful for many embedded applications in general where you need to save space.


Append-only files might be enough, to be fair. And still very useful.

Then you can dedup per block, even with a sliding window. The nicest part is that you can do it fully asynchronously, so real-time appends are not slowed down.


Then you add delete and you have a CoW filesystem.

Of which there are many.

Some with deduplication built in.


> How do you put the files in it if it's read-only?

The same way we [used to] use specialized software tools to assemble an ISO 9660 image with our data, and then burn that image to a [single-session] write-once optical disk like CD-R.


It's read-only after you create the image. And then you can use overlayfs on top of it as others mentioned -- useful for playing with live Linux distributions or embedded systems where you never want to change the root bootable image except when you are actually updating the firmware.

...Or, in the author's case, when you have hundreds of directories that have only small differences compared to each other.

As other posters in the bigger thread mentioned, this is also very useful for big arcade game collections.


So how much space would the entire English Wikipedia take up on this filesystem, I wonder.


I've got a (relatively old) snapshot of the English Wikipedia that I'm using for testing. The snapshot is around 200 GiB in 14,000,000 files and compresses down to an 11 GiB DwarFS image.


I guess it's without the pictures then? Because if I compare with the zim file format (which is optimized for this use case) https://kiwix.org/en/what-is-the-size-of-wikipedia/ I read: "As of October 2022, the Full English Wikipedia (ca. 6.5 million articles), with images will use up 91GB of storage space (German and French, the second-largest: 36 GB). (...) If you can do without the images (what we call the nopic version), then you are down to 46 GB."


Correct, there are no images in the data except for 68 PNGs. It's just HTML files.


How is it possible that a bunch of HTML files would add up to 200 GB? Is it because of some kind of overhead?

Would a database dump maybe be smaller?


Well, "a bunch" is an understatement, I bet they have a bit more than just a bunch! It does pass a sniff test, since from https://en.wikipedia.org/wiki/Wikipedia:Size_of_Wikipedia:

>As of May 2015, the current version of the English Wikipedia article / template / redirect text was about 51 GB uncompressed in XML format.

Compressed data at the same time was 11.5 GB. And that's data from 9 years ago, and just English Wikipedia.

For comparison, I collect leaked password dumps and they (combined, after deduplication) go into hundreds of GBs too. And that's for just username:password lines, not even text.


That's a substantial reduction. Thanks for testing this!


It's ever so slightly smaller than a .tar.xz of the same data. The main difference being that you don't have to fully extract it in order to access the data.


Only one way to find out :-)


Neat. This FS is used predominantly in torrents of pirated Steam games by a certain group that advertises their work with "play without extracting".


> "play without extracting".

This kind of feature is really important if you want to encourage people to seed even after their download is completed.


ipfs mount.

Since IPFS is content-addressable storage, it's read-only. If we made it CoW, encouraging people to seed would be easy.


IPFS is slow as molasses and their solution was to introduce paid pinning services. Complete failure.

And then they announced they will willingly and proactively delete any hash that any legislative agency tells them to, and it was dead in the next minute.


Most distributed systems simply don't live up to their theory. Having a hub is simply always easier and consumers hopping temporally from hub to hub is just the most efficient mechanism.


Agreed. The idea behind IPFS is nice but we need to combine it with all the good lessons from the torrent software and stuff like NNCP in order to have something truly distributed, resilient, fault-tolerant and automatically replicated.


I'm tempted to try using this for container images. I would imagine the savings could be pretty significant.
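
Roughly what I'd try first is flattening an image and repacking it (image name hypothetical; this loses the layer structure, which should be fine for a read-only runtime rootfs):

$ docker create --name tmp myimage
$ mkdir rootfs
$ docker export tmp | tar -xf - -C rootfs/
$ docker rm tmp
$ mkdwarfs -i rootfs -o myimage.dwarfs
$ dwarfs myimage.dwarfs /mnt/rootfs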


If you do, you definitely should blog about it and promote it. I'm very interested in knowing the results.


I'm curious to know, but I believe a major portion of this idea comes into play when the data is massively redundant. A typical container (say, a vhdx file) normally wouldn't apply.

Did I misunderstand?


No you didn't misunderstand, however, I do believe there would be quite a bit of redundant data in the layers of a docker image, especially when the Dockerfile hasn't been optimized and was just haphazardly constructed.


That's a reasonable assumption. I'm curious to know how that turns out if you ever feel like posting it. :)

Thanks


It's a userspace-only FS. Can we use that for container images?


This seems promising:

> "You need to mount the fuse drive with the option "allow_root"." https://serverfault.com/a/1085669


That was the thing I wasn't sure about. I'll try to find the answer.


I would love to see a comparison with EROFS (https://en.m.wikipedia.org/wiki/EROFS)
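
(For a quick local comparison on your own data, something along these lines should work; ./dataset is hypothetical and the flags are just reasonable defaults, not what was used for any published numbers:)

$ mkdwarfs -i ./dataset -o dataset.dwarfs
$ mkfs.erofs -zlz4hc dataset.erofs ./dataset
$ ls -lh dataset.dwarfs dataset.erofs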


https://github.com/mhx/dwarfs?tab=readme-ov-file#with-erofs

It's been a while since I did that comparison, so the results could be significantly different now.


FWIW, I've updated the comparison using the latest versions of EROFS and DwarFS. There definitely have been improvements to EROFS in the meantime. DwarFS is still orders of magnitude faster in creating the file system and can achieve significantly better compression, but throughput is very similar unless you start optimizing the EROFS image for better compression.


Thanks! One question though: why do you say "as it's pretty obvious that the domains in which EROFS and DwarFS are being used have extremely little overlap. DwarFS will likely never be able to run on embedded devices"? (I mean: as far as I know, EROFS was mostly used in smartphones, which are actually powerful devices!)


That's definitely a fair point! DwarFS actually works perfectly fine on 64-bit ARM and would likely work fine on a smartphone as well. Still (and I might be completely wrong about this), I think the primary goal of EROFS is to consume as little resources as possible when the file system is accessed and to be able to run on much less capable hardware, including 32-bit systems. DwarFS primarily cares about maximizing compression as long as it doesn't negatively impact performance (access times and throughput). This involves a certain amount of caching / pre-fetching and assumes there's plenty of memory available. This can be configured to a certain extent, but my pessimistic assumption is that DwarFS would use significantly more memory than EROFS. Might be worth actually backing this by numbers! :)


I miss ZipMagic, which let you use a zip file like a folder in Windows Explorer, including read/write operations. It is discontinued and its name has been reused by another tool, so I can only find its description on Amazon [1]

[1] https://www.amazon.com/Ontrack-Data-International-99-00030-0...


... and that's because the use case that immediately came to mind is compressing node_modules


Very cool for Linux gaming. Highly recommended; works great for me.

https://gitlab.com/jc141x/setup


What does the code in this repo actually do, and how does it use DwarFS? The README just explains how to install it.


Very cool idea. I wonder how it would work to first turn every file you want to store into a stream of symbol files using a fountain code like RaptorQ, and then trying to compress all of the symbol files together using something like Z-standard with a big dictionary. It would probably be too slow to use for many file system applications, but I would think it would be pretty good at exploiting redundancy across files and getting good compression ratios.
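
The dictionary half of that is easy to play with today, even without the fountain-code step (file names hypothetical):

$ zstd --train samples/*.bin -o shared.dict
$ zstd -19 -D shared.dict somefile -o somefile.zst
$ zstd -d -D shared.dict somefile.zst -o somefile.restored

The open question is whether expanding files into RaptorQ symbols first would expose more cross-file redundancy than the compressor can find on its own.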


This sounds like something that could be useful for immutable distros like Vanilla and others, but I wonder how practical it is if it’s not part of the kernel.


This looks really cool! Though it's a bit limited since it is a FUSE module and not a kernel driver, and unlikely to become a kernel module since it is written in C++ with large dependencies :-\

Would it be possible to take the core design changes here and apply them to squashfs, and maybe propose a next major version of the squashfs internal format to make all these things possible?


Given the prose in the README I don’t get the impression the author has much interest in that.


I've heard that SMB-fs is "better" than FUSE. More cross-platform, can also be implemented in user-space, and less likely to get jammed up/deadlocked by slow network calls inside the kernel.


You've heard from where? Are there any extant specialty filesystems that expose an SMB interface?


Whoops: WebDAV:

https://news.ycombinator.com/item?id=39417503

SeaweedFS supports WebDAV. https://github.com/seaweedfs/seaweedfs/wiki/WebDAV

I'm not able to find whether borg/restic supports mounting backups as WebDAV, but in theory there's nothing stopping you.

It's 100% user space (it exposes a REST service) and supported by a bunch of file browsers, with a bit of a network-aware component to it as well.


I feel like 9p is a better competitor in this space, and I've actually heard of people using it (e.g. the Windows Subsystem for Linux mounts host filesystems this way and so does Crostini, the ChromeOS equivalent).




Not to be confused with dwarffs – https://github.com/edolstra/dwarffs.


There are maybe gonna be a lot of misspelled pull requests and confusion.


All tech aside, this is the best name for a filesystem, period.


Can someone explain what this is and when I would use it in layman’s terms? I understand it’s a fuse filesystem but don’t understand quite what it does?


It fills a similar niche as https://en.wikipedia.org/wiki/SquashFS but has different data compression characteristics.

However, since it's a FUSE-only file system, it's difficult to see how it would be used in embedded system firmware; instead, it could perhaps see use as a distribution mechanism. Similar to tar or zip files, but possibly with (much) better performance for random access, should you need only a smaller portion of the whole archive.

The author indicates a need for keeping multiple similar copies of sets of unchanging files on their computer, and made this to reduce the space needed for them, while retaining access through the file system. So that is also a use case.


Today I discovered _mm512_mask_compressstoreu_epi64.

Neat instruction.


Pretty neat!

"Clustering of files by similarity using a similarity hash function" does anyone have an intuition of how similarity hashes work?



This is quite interesting for the similarity hash function alone.

But the self-congratulatory tone of the description is… unusual, to say the least.


Can you give an example of "self-congratulatory tone"? Is it things like "this is still 10 times larger than dwarfs"? Because a statement of fact cannot be insolent.

I actually only went through a significant portion of the README (through the CromFS part) because of your comment, and I just don't see it. I see a person who wrote actually useful software that is multi-platform and gives the positives and negatives of the software they wrote compared to alternatives available today.

In every test DwarFS compared favorably, and on tests where one aspect was marginal, the DwarFS code was better in other regards: power at the wall, extract/read times, etc.

How would it be better presented by a solo developer?


“While this is already impressive, it gets even better.”

I can almost hear “and if you call now, you get this amazing towel for free!”

But judging by the comments here, it seems I’m more sensitive to this tone than most.


That's fair, and I completely missed that. I remember HAProxy used to have similar verbiage on their main page, back when they were the only software load balancer able to sustainably manage 10 Gbit of throughput. I gave a quick scan of the current intro.txt and there's still some ...

> HAProxy offers a fairly complete set of load balancing features, most of which are unfortunately not available in a number of other load balancing products

It seems like a minor nit that could be "fixed", especially since the "it gets better" is in reference to "not only smaller, but also much faster", and it's not an insignificant performance increase; it's 100 times faster. Basically everyone (mostly) uses squashfs, and this absolutely trounces it - according to the author.

Anyhow, I hope my reply wasn't too extra.


Not at all. These things are cultural and on a person by person basis as well. I’m definitely on the low profile side, to a fault.


Thanks for the feedback! I was probably a little too excited when writing the documentation. Reading it again with some distance, I can certainly see why this might be a bit off-putting. I'll keep this in mind for the future!


Well it performs extremely well, and if I wrote it, I’d be proud of it, too.


So would I, but I’m uncomfortable with that tone even in reading, let alone writing. Just an observation, something that jumped out.


This will have huge benefits for Linux ISO images, reducing their size by using this instead of squashfs.


Kudos to the developer, and thanks for sharing this.


For rock and stone!


Yes, now let's put all public government data on one of these so that we can all inspect the public institutions which we use to rule over ourselves and achieve the next level of real digital democracy: full institutional public transparency.

...as if... now, let me spit out all this purple democratic kool aid



