Counterpoint: data hoarding is not like physical hoarding (or at least, it hasn't been up to this point), because we've lived through an era of exponentially increasing storage capacity (with file sizes to match, in many cases).
I still have a folder full of notes from several of my university courses, grouped by course. Some of it is source code (either the lecturer's or my own); some is assignment text (in a mixture of plain text, PDF, legacy .doc, etc.). There aren't any repositories because this was many years before Git existed and professors back then apparently didn't think we needed to be taught about the systems that did exist.
But why not keep it? The whole collection is smaller than, say, the OpenBLAS shared library that comes with a NumPy installation. It's maybe 1% of the size of the ISO for a modern desktop Linux distribution.
It's part of a folder with even older stuff - all the way back to toy Turing programs I wrote as a child. There are countless random files that are probably poorly organized internally and that I'll likely never have a good reason to revisit. But the whole thing is less data than I'd likely end up downloading if I spent an hour on YouTube or Twitch. The ability to store it permanently costs me literally pennies, amortized over the cost of the drive.
... And yet, the size of modern applications still bothers me. It feels almost disrespectful, somehow. Old habits die hard, I guess.
Indexing is an alternative to pruning that can be just as effective at increasing SNR without being as destructive. You can keep both a best-of collection and the whole archive, in case you really do feel like going through it someday, or want something very specific that you never would have guessed would become important to you again.
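To make that concrete, here's a minimal sketch of the idea (the paths and keyword scheme are made up, not anyone's actual tooling): leave the whole hoard untouched and keep a small, cheap-to-rebuild index next to it.

```python
#!/usr/bin/env python3
"""Toy sketch: keep the full archive as-is, but maintain a small
'best of' index alongside it. The layout and keywords here are
hypothetical -- adapt to whatever your own folders look like."""

import json
from pathlib import Path

ARCHIVE = Path.home() / "archive"      # the whole hoard, never pruned
INDEX = ARCHIVE / "best-of.json"       # lightweight index, disposable

def build_index(keywords=("thesis", "final", "assignment")):
    """Scan the archive and record files whose names hint they're worth
    surfacing later. The originals are never moved or deleted."""
    entries = []
    for path in ARCHIVE.rglob("*"):
        if path.is_file() and any(k in path.name.lower() for k in keywords):
            entries.append({
                "path": str(path.relative_to(ARCHIVE)),
                "bytes": path.stat().st_size,
            })
    INDEX.write_text(json.dumps(entries, indent=2))
    print(f"Indexed {len(entries)} files under {ARCHIVE}")

if __name__ == "__main__":
    build_index()
```

The index is the part you curate and regenerate; the archive itself stays exactly as it was.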
> And yet, the size of modern applications still bothers me. It feels almost disrespectful, somehow. Old habits die hard, I guess.
Data size != memory size, and even memory size != binary size. It's totally fair to rail against the program text, and associated application data, that have to be loaded onto your machine in order for you to do something as simple as send a message on Slack -- RAM, unlike cold storage space, has not grown quite so exponentially, and wasting that space is expensive. And of course, a larger binary tends to start more slowly, pin more memory, and play less nicely with the other programs on the system.
RAM has grown quite a bit too; the main reason it hasn't grown as exponentially is that it's essentially a cache for permanent storage. For caches, speed matters much more, and speed is still very much a trade-off.