Hacker News

Out of curiosity, how large would the time capsule need to be to contain 5TB of data encoded that way?


The storage density of A-format mass-market paperbacks containing dense UTF-8 text is roughly 4MB/kg. (A 400-page novel weighs around 250 grams and contains roughly 1MB. Source: I went and weighed one of my novels, of known word count.) We can up the density somewhat by bzip2-compressing and then uuencoding (or similar); maybe 10MB/kg is achievable.

Normal A-format paperbacks use acidic wood pulp for paper, but acid-free paper doesn't add a whole lot to the cost. So we get roughly 10GB/ton, and the whole kaboodle comes in at roughly 500 tons. As the density of wood pulp is close to that of water, this approximates to a 10 x 10 x 5 metre lump. Quite a large time capsule, but not unmanageable :)
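The paperback arithmetic above can be checked in a few lines. This is just a sketch of the same back-of-envelope figures (a ~250 g novel holding ~1MB of text, ~10MB/kg after compression); none of the numbers are more precise than the original estimates.

```python
# Rough check of the paperback figures above, under the stated assumptions:
# a 400-page novel weighs ~250 g and holds ~1 MB of UTF-8 text.
novel_bytes = 1_000_000              # ~1 MB of text per novel (estimate)
novel_kg = 0.25                      # ~250 g per paperback (weighed)

raw_density = novel_bytes / novel_kg          # bytes per kg
print(raw_density / 1e6)                      # → 4.0 (MB/kg)

compressed_density = 10e6                     # ~10 MB/kg after bzip2 (estimate)
archive_bytes = 5e12                          # the 5 TB target
tons = archive_bytes / (compressed_density * 1000)
print(tons)                                   # → 500.0 tons

# Wood pulp is close to water density (1 ton ≈ 1 m³), so 500 tons is
# about 500 m³ -- e.g. the 10 x 10 x 5 metre lump described above.
```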

However. If we posit the availability of decent optical scanners in 50 years' time, there's no reason not to go denser.

We can print the data as a bitmap at 300dpi and be reasonably sure of retrieving it using a 2400dpi scanner. Let's approximate 300dpi to 254dpi, and call it 10 bits/mm. Printed on reasonable-quality acid-free paper, we get 100Mbit per square metre, and that square metre weighs around 80 grams (it's 80gsm paper -- same as your laser printer runs on). That's 1.25Mbit/gram; call it 1Mbit/gram for margin. At this density, we can print the whole 5TB (or 40Tbit) on just 40 tons of paper, without compression; with compression, call it 20 tons. That's a 2.71 metre cube; just under 9' on an edge.
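The same sanity check works for the bitmap scheme. Again this is just the estimate above re-run as arithmetic, with the same assumed inputs (254dpi, 80gsm paper, 2:1 compression):

```python
# Sanity-check of the 300 dpi bitmap estimate above.
dpi = 254                              # ≈ 300 dpi, chosen so dots/mm is round
dots_per_mm = dpi / 25.4               # 10 dots (bits) per linear mm
bits_per_m2 = (dots_per_mm * 1000) ** 2
print(round(bits_per_m2 / 1e6))        # → 100 (Mbit per square metre)

paper_gsm = 80                         # 80 g/m² paper
bits_per_gram = bits_per_m2 / paper_gsm    # 1.25 Mbit/g; call it 1 for margin
tons = 40e12 / 1e6 / 1e6               # 40 Tbit at 1 Mbit/g, grams → tons
print(tons)                            # → 40.0 tons, uncompressed

edge_m = (tons / 2) ** (1 / 3)         # 2:1 compression; 1 ton ≈ 1 m³
print(round(edge_m, 2))                # → 2.71 metres on an edge
```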

This assumes a monochrome bitmap. If we go to colour and use archival-quality inks we can probably -- conservatively -- retrieve one bit each of red/blue/green per dot, and it's probably not unreasonable to expect to get a whole byte out in place of each bit in our mono bitmap. So we can probably shrink our archive to roughly 2.5 tons of paper -- a pallet-load.
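The colour step above is simply an 8x density gain -- one byte per dot instead of one bit:

```python
# One byte per dot instead of one bit per dot: an 8x shrink.
mono_tons = 20                 # compressed monochrome estimate from above
colour_tons = mono_tons / 8
print(colour_tons)             # → 2.5 tons -- roughly a pallet-load
```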


I wonder if anyone has actually attempted this and seen how densely you can pack it and still reliably recover it. I imagine you would need measures to counter small misalignments when rescanning, and imperfections in the physical media.


200kB per A4 page using a 600dpi b/w laser printer: http://ronja.twibright.com/optar/


Take a look at http://www.ollydbg.de/Paperbak/index.html -- it encodes binary data onto paper and uses Reed-Solomon ECC to restore unreadable data.

I've tested it out myself, and it only stops working once you start to crumple the paper. I tested it with an inkjet printer, though; a laser printer may stand up better.



So roughly half a Library of Congress.



