42.zip (2004) (unforgettable.dk)
120 points by geekam on June 14, 2014 | 49 comments


That's nothing. How about an infinite zip file :)

http://research.swtch.com/zip


Interesting.

I wonder if it might be possible to prove the existence of fixed points of compression programs using only general assumptions (which would be satisfied by the Lempel-Ziv implementations the author discusses so well).

Such a fixed-point theorem would probably not tell us how to find solutions but it would be illuminating if the assumptions could be made weak enough.


Here are some other compression oddities: http://www.maximumcompression.com/compression_fun.php

A 42-byte file that expands to 5 MB; a file that compresses with one compression program but not another; a file that compresses to itself; and some extreme compression software.


> a file that compresses to itself

Could this be used as a DoS against antivirus scanners such as the one Gmail uses?


Probably not, as antivirus programs these days all have protections against zip bombs [1].

[1] https://en.wikipedia.org/wiki/Zip_bomb


10 years ago most virus scanners were vulnerable to it.

Actually, you can build your own: make a big file of zeros (e.g. dd if=/dev/zero of=foo.txt bs=1M count=4096), zip it, copy the zip a bunch of times, zip the copies together, and keep nesting n levels deep. The individual zip files stay tiny, but once extracted they eat massive space (the idea is to fill the scanning server's disk and cause I/O errors, or to keep the server busy as a DoS). Servers now usually limit the depth they are willing to scan inside a zip to avoid this type of attack (workaround: only use up to 4 levels ...).
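For illustration, a rough Python sketch of the nesting step (the 16-copies-per-level, 5-level layout just mirrors 42.zip's rough shape; '0.zip' is assumed to be a small seed zip of zeros made beforehand):

    import os
    import zipfile

    COPIES, LEVELS = 16, 5           # illustrative layout

    def wrap(inner, level):
        out = 'level%d.zip' % level
        with zipfile.ZipFile(out, 'w', compression=zipfile.ZIP_DEFLATED) as zf:
            for i in range(COPIES):
                # the same tiny zip stored COPIES times under different names
                zf.write(inner, arcname='%d-%d.zip' % (level, i))
        return out

    current = '0.zip'                # small seed zip made beforehand
    for level in range(1, LEVELS + 1):
        current = wrap(current, level)

    print(current, os.path.getsize(current), 'bytes on disk')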


Here's an information-theory question: would there be a way to estimate the size of the decompressed output, just given the raw zip data and no size metadata, without attempting to extract it (i.e. without spending time or space proportional to the compression factor)?


It's trivial if you know the zip file contains just a single file. A zip file starts with a "local file header" (see [1]) that contains the compressed and uncompressed size metadata.

So you can decompress the first n bytes of the first file (where n = enough to get a complete header), look at the header, seek past it, then decompress the header of the next file, and so on, recursively.

Unless you meant doing this without reading the size information from the headers at all, in which case I don't think it would be possible without actually decompressing. There wouldn't be any need to spend "space proportional to the compression factor", though; you would decompress recursively with a pipelined function that just counts the data rather than emitting it. For a 42 KB zipfile, the time required would be near zero.

[1] http://www.pkware.com/documents/casestudies/APPNOTE.TXT
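For what it's worth, the metadata walk is only a few lines in Python. A rough sketch (it uses the zipfile module, which reads the central directory rather than the local headers, but that's the same size metadata; the pwd argument would only be needed if the nested zips are password-protected, as some copies of 42.zip reportedly are):

    import io
    import zipfile

    def advertised_size(zip_bytes, pwd=None, depth=0, max_depth=8):
        # Sum the uncompressed sizes recorded in the archive metadata,
        # recursing into nested .zip members, without expanding anything to disk.
        total = 0
        with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
            for info in zf.infolist():
                if info.filename.lower().endswith('.zip') and depth < max_depth:
                    # nested zips in a bomb are themselves tiny, so reading
                    # them into memory to recurse is cheap
                    inner = zf.read(info, pwd=pwd)
                    total += advertised_size(inner, pwd, depth + 1, max_depth)
                else:
                    total += info.file_size  # uncompressed size from the metadata
        return total

    with open('42.zip', 'rb') as f:  # assumes the archive is in the working directory
        print(advertised_size(f.read()))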



I don't know whether it's possible with ZIP, but there are definitely compression schemes for which this can be done in time proportional to the size of the compressed file.


So long as it uses the usual deflate algorithm, and you don't need to deal with recursive compression as in the article, yes. The size-counter program would be structured much like a decompressor, except that at the points where the compressed file says "repeat this many bytes from this location", a real decompressor has to copy that many bytes, whereas the counter only has to add the number to its total.
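You can get surprisingly close with stock zlib, even without a custom decoder: decompress the raw deflate stream in capped pieces and keep only a running total, so memory stays constant even though the whole stream is still walked. A rough sketch (the function name and cap are made up; raw is assumed to be the compressed bytes of a single deflated member):

    import zlib

    def deflate_output_size(raw, out_cap=1 << 20):
        # Count what a raw deflate stream (as stored inside a zip member)
        # expands to, never holding more than out_cap bytes of output at once.
        d = zlib.decompressobj(-15)  # -15 = raw deflate, no zlib/gzip header
        total = len(d.decompress(raw, out_cap))
        while d.unconsumed_tail:     # input left over because the output was capped
            total += len(d.decompress(d.unconsumed_tail, out_cap))
        return total + len(d.flush())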


Well, yes. The linked page shows you how to calculate it.


Your answer is wrong. He says "Just given the raw zipped data and no size metadata".

The page calculates it using exactly that size metadata he excluded.

The question, as I see it, is whether that size metadata (the 4.3 GB at the bottom) can be determined from the zip file without unzipping rather a lot of data.

It would be interesting if someone who knew the exact details of the zip format could comment.



Here's an old page with various compression bomb demos: http://www.aerasec.de/security/advisories/decompression-bomb...

It's not just archive formats: anything that uses a standard compression algorithm can be a compression bomb (e.g. PNG).
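For the curious, rolling your own PNG bomb is only a few lines, since the pixel data is just a zlib stream. A rough sketch for testing your own pipeline (dimensions and file name are arbitrary; how much memory a viewer actually burns on it depends on the decoder):

    import struct
    import zlib

    def chunk(tag, payload):
        # PNG chunk layout: length, type, payload, CRC over type + payload
        return (struct.pack('>I', len(payload)) + tag + payload
                + struct.pack('>I', zlib.crc32(tag + payload)))

    W = H = 20000                    # ~1.2 GB of raw RGB pixels once decoded
    ihdr = struct.pack('>IIBBBBB', W, H, 8, 2, 0, 0, 0)  # 8-bit/channel RGB

    row = b'\0' + b'\0' * (W * 3)    # filter byte + one all-black scanline
    co = zlib.compressobj(9)
    idat = b''.join(co.compress(row) for _ in range(H)) + co.flush()

    with open('bomb.png', 'wb') as f:
        f.write(b'\x89PNG\r\n\x1a\n')
        f.write(chunk(b'IHDR', ihdr))
        f.write(chunk(b'IDAT', idat))
        f.write(chunk(b'IEND', b''))

The result is roughly a megabyte on disk.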


http://www.aerasec.de/security/advisories/html-bomb/bomb-htm...

Modern computers don't choke on ten million X's on a page, but the fans do spin up wildly. Maybe smartphones die with this.


Any examples of a PNG bomb in the wild?


There's this [1]. eog doesn't choke on it, but it takes 4 GB of RAM while processing; afterwards it drops back to 2 GB. WARNING: decompressing it may consume up to 5 GB of RAM.

[1]: ftp://ftp.aerasec.de/pub/advisories/decompressionbombs/pictures/picture-1G-19000x19000.png


The link in parent comment has instructions for creating PNG bombs and a link to an ftp site with some downloads.


So, do we need to change our file uploaders and whatnot to autokill after a certain amount of time? Or just set `ulimit -m` or whatever judiciously?

This looks like an annoying attack vector for people that, say, resize image uploads to make thumbnails.


> So, do we need to change our file uploaders and whatnot to autokill after a certain amount of time? Or just set `ulimit -m` or whatever judiciously?

Most image formats specify the dimensions before any pixel data, so you can just check that w*h is below whatever limit you deem fit before decoding. The upload size should already have a "sane" limit so people don't upload truckloads of data; the problem with a compression bomb is that a very small upload eats the target's RAM on decoding.

> This looks like an annoying attack vector for people that, say, resize image uploads to make thumbnails.

It is, although for JPEG you don't need to fully decode the image to resize it (depending on how whatever you're using is implemented, it may still do so regardless).
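For PNG specifically the dimensions sit at a fixed offset in the first chunk, so the check is cheap. A rough sketch (the cap and the file name are placeholders; if you're already using Pillow, Image.open is lazy, so inspecting .size before load()/thumbnail() amounts to the same thing):

    import struct

    MAX_PIXELS = 50 * 1000 * 1000    # arbitrary cap, ~50 megapixels

    def png_pixel_count(path):
        # Width and height live in the IHDR chunk, which the spec puts
        # immediately after the 8-byte signature; no pixel data is read.
        with open(path, 'rb') as f:
            header = f.read(24)
        if header[:8] != b'\x89PNG\r\n\x1a\n' or header[12:16] != b'IHDR':
            raise ValueError('not a PNG')
        width, height = struct.unpack('>II', header[16:24])
        return width * height

    if png_pixel_count('upload.png') > MAX_PIXELS:  # 'upload.png' is a placeholder
        raise ValueError('refusing to thumbnail: image too large')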


That's a good test for pass-through antivirus appliances :)


Makes me wonder what the implications of the halting problem are for compression formats.


The halting problem only applies to full Turing machines, and their equivalents. So, a compression algorithm based on x86 assembly is uncheckable, but one based on a restricted format like .zip would be fine. Or, you could just give it a time limit.


I made the file DanBC referred to above, and wondered about how much computation you could do with iterated decompression (specifically, deflate-type decompression). Could you make a zip file representing a Turing machine, so that after unzipping it it represented the machine after one step of processing, and after unzipping the result, represented the machine after a second step, etc.

Or if not a direct translation of a Turing machine, then some other Turing complete system.

As far as I know, no one has done this, but if it is possible, one implication would be that you couldn't decide, in general, whether the process of recursively unzipping a file would continue forever or eventually halt.


Or for Google-scale storage systems!


Gmail won't allow you to send this file to anyone; it throws an error pointing you to https://support.google.com/mail/answer/6590

Ah, I almost forgot we are not being spied on. (Tried via a local email client.)


What? A lot of spam filters will block zip files and the like, including free-software and company-run ones.


Yes. And Google isn't blocking all zip files, or even all password-protected zip files. It just blocks password-protected zip files that contain another zip file.


> Mac users, sending a zip file which also contains another zip file increases your chances of file corruption. We recommend that you de-compress all files first and create only one zip file.

what the fuck?


Yeah, that's pretty crazy. If zipping a zip causes "corruption", they should probably file a bug at bugreport.apple.com


I like that they "recommend" something, but what they really mean is demand.


All files at the same "level" are identical. The 4 GB file contains only zeros. I think that data could be compressed to an even smaller size with different methods.


It would be interesting to see how small it could be made.

I posted a link earlier, but here it is again with a quote:

http://www.maximumcompression.com/compression_fun.php

> 'What is the best compressor to get really extreme compression?' is a question often asked on the internet. The achieved compression ratio of course depends on the quality of the compressor used, but the type of data that is being compressed is much more important. To show this I created a tiny 115 byte rar file. When you decompress it, it will turn into a textfile of almost 5 MB (a compression ratio of 99.998%). Here it is: test.rar. Note: Turtle 0.07 compresses the same file to 49 bytes, Hook09c (with switches 3 1 1 1) to 36 bytes and UHBC (with switches -b128m -m3) to 24 bytes!!.

I enjoy watching this "code golf"-style competition to optimise one feature of something.


There's this old discussion about zip bombs: https://news.ycombinator.com/item?id=4616081 A certain Caspian Maclean published a gzip quine on the Web in 2004, but I cannot seem to find it.


It's funny: I first got to know about this a few days ago, when I came across this interesting thread [1, 2] on Quora.

[1] http://qr.ae/sQnm5

[2] What is the most compressed file ever?


NOD32 detects it as an Arch-bomb. Made me think of the somewhat similar 'Billion laughs' XML bomb but for archives.


Now we have to ask, how did he create that file without using petabytes of disk space?


I would guess you would just need to create the single 4 GB file and zip it. Once it's zipped, copy it 16 times, put the copies in a zip file, and repeat.
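Even that single 4 GB file never has to exist uncompressed on disk: you can stream zeros straight into the compressor. A rough sketch in Python (names and the exact size are assumptions; ZipFile.open(..., mode='w') needs Python 3.6+):

    import zipfile

    CHUNK = b'\0' * (1 << 20)        # feed the zeros one megabyte at a time
    TARGET = 4 * 1024**3             # roughly the 4 GB bottom-level file

    with zipfile.ZipFile('0.zip', 'w', compression=zipfile.ZIP_DEFLATED) as zf:
        # force_zip64 because zipfile refuses streamed members over 2 GiB otherwise
        with zf.open('zeros.bin', mode='w', force_zip64=True) as member:
            written = 0
            while written < TARGET:
                member.write(CHUNK)
                written += len(CHUNK)

From there it's exactly what you describe: copy, zip, repeat.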


Precisely that.

It's rather a pain in the ass for antivirus products, which tend to have a nesting limit to prevent this kind of shenanigans.

There are much weirder things you can do. If you know what's packing it, and the packer doesn't bail out on files that expand (like, say, the LZMA2 used in XZ does, but the older LZMA doesn't), you can make a small file which tickles every worst-case performance/size wrinkle in the cruncher and comes out very large!

I thought executable compressors had reached a plateau until Mentor/TBC & Blueberry/Loonies came up with crinkler - http://crinkler.net/ - which is a compressing linker specialised for demoscene 4ktros, and cheats like a motherfucker.


Interesting. Is there a description of how it works, or of how it cheats?


By piping dd to zip?


So what's the content of the final files? Just a bunch of redundant zeros and ones?


Isn't that the contents of all files?


You think all file contents are redundant? Deep man, deep.


I'm sorry, but I have to say this: I like how this entry had 42 comments before this one: http://i.imgur.com/KmCodSM.png


42.zip has a Weissman score of 42


wow, that has a Weissman Score close to the theoretical limit


got nothing on Pied Piper



