42.zip (2004) (unforgettable.dk)
120 points by geekam on June 14, 2014 | 49 comments


That's nothing. How about an infinite zip file :)

http://research.swtch.com/zip


Interesting.

I wonder if it might be possible to prove the existence of fixed points of compression programs using only general assumptions (which would be satisfied by the Lempel-Ziv implementations the author discusses so well).

Such a fixed-point theorem would probably not tell us how to find solutions but it would be illuminating if the assumptions could be made weak enough.


Here are some other compression oddities: http://www.maximumcompression.com/compression_fun.php

A 42-byte file that expands to 5 MB; a file that compresses with one compression program but not another; a file that compresses to itself; and some extreme compression software.


> a file that compresses to itself

Could this be used as a DoS against antivirus scanners such as the one Gmail uses?


Probably not, as antivirus programs these days all have protections against zip bombs [1].

[1] https://en.wikipedia.org/wiki/Zip_bomb


10 years ago most virus scanners were vulnerable to it.

Actually, you can build your own: make a big file of zeros (e.g. dd if=/dev/zero of=foo.txt bs=1M count=4096), zip it, copy the zip a bunch of times, zip the copies together, and keep nesting n levels deep. The individual zip files stay tiny, but once extracted they eat massive space (the idea is to fill the scanning server's disk and cause I/O errors, or to keep the server busy as a DoS). Servers now usually limit the depth they are willing to scan inside a zip to avoid this type of attack (workaround: only use up to 4 levels ...).
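For illustration, a rough Python sketch of the nesting step (the 16-copies-per-level, 5-level layout just mirrors 42.zip's rough shape; '0.zip' is assumed to be a small seed zip of zeros made beforehand):

    import os
    import zipfile

    COPIES, LEVELS = 16, 5           # illustrative layout

    def wrap(inner, level):
        out = 'level%d.zip' % level
        with zipfile.ZipFile(out, 'w', compression=zipfile.ZIP_DEFLATED) as zf:
            for i in range(COPIES):
                # the same tiny zip stored COPIES times under different names
                zf.write(inner, arcname='%d-%d.zip' % (level, i))
        return out

    current = '0.zip'                # small seed zip made beforehand
    for level in range(1, LEVELS + 1):
        current = wrap(current, level)

    print(current, os.path.getsize(current), 'bytes on disk')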


Here's an information-theory question: would there be a way to estimate the size of the decompressed output, just given the raw zip data and no size metadata, without attempting to extract it (i.e. without spending time or space proportional to the compression factor)?


It's trivial if you know the zip file contains just a single file. A zip file starts with a "local file header" (see [1]) that contains the compressed and uncompressed size metadata.

So you can decompress the first n bytes of the first file (where n = enough to get a complete header), look at the header, seek past it, then decompress the header of the next file, and so on, recursively.

Unless you meant doing this without reading the size information from the headers at all, in which case I don't think it would be possible without actually decompressing. There wouldn't be any need to spend "space proportional to the compression factor", though; you would decompress recursively with a pipelined function that just counts the data rather than emitting it. For a 42 KB zipfile, the time required would be near zero.

[1] http://www.pkware.com/documents/casestudies/APPNOTE.TXT
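For what it's worth, the metadata walk is only a few lines in Python. A rough sketch (it uses the zipfile module, which reads the central directory rather than the local headers, but that's the same size metadata; the pwd argument would only be needed if the nested zips are password-protected, as some copies of 42.zip reportedly are):

    import io
    import zipfile

    def advertised_size(zip_bytes, pwd=None, depth=0, max_depth=8):
        # Sum the uncompressed sizes recorded in the archive metadata,
        # recursing into nested .zip members, without expanding anything to disk.
        total = 0
        with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
            for info in zf.infolist():
                if info.filename.lower().endswith('.zip') and depth < max_depth:
                    # nested zips in a bomb are themselves tiny, so reading
                    # them into memory to recurse is cheap
                    inner = zf.read(info, pwd=pwd)
                    total += advertised_size(inner, pwd, depth + 1, max_depth)
                else:
                    total += info.file_size  # uncompressed size from the metadata
        return total

    with open('42.zip', 'rb') as f:  # assumes the archive is in the working directory
        print(advertised_size(f.read()))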



I don't know whether it's possible with ZIP, but there are definitely compression schemes for which this can be done in time proportional to the size of the compressed file.


So long as it uses the usual deflate algorithm, and you don't need to deal with recursive compression as in the article, yes. The size-counter program would be structured much like a decompressor, except that at the points where the compressed file says "repeat this many bytes from this location", a real decompressor has to copy that many bytes, whereas the counter only has to add the number to its total.
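You can get surprisingly close with stock zlib, even without a custom decoder: decompress the raw deflate stream in capped pieces and keep only a running total, so memory stays constant even though the whole stream is still walked. A rough sketch (the function name and cap are made up; raw is assumed to be the compressed bytes of a single deflated member):

    import zlib

    def deflate_output_size(raw, out_cap=1 << 20):
        # Count what a raw deflate stream (as stored inside a zip member)
        # expands to, never holding more than out_cap bytes of output at once.
        d = zlib.decompressobj(-15)  # -15 = raw deflate, no zlib/gzip header
        total = len(d.decompress(raw, out_cap))
        while d.unconsumed_tail:     # input left over because the output was capped
            total += len(d.decompress(d.unconsumed_tail, out_cap))
        return total + len(d.flush())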


Well, yes. The linked page shows you how to calculate it.


Your answer is wrong. He says "Just given the raw zipped data and no size metadata".

The page calculates it using exactly that size metadata he excluded.

The question, as I see it, is whether that size metadata (the 4.3 GB at the bottom) can be determined from the zip file without unzipping rather a lot of data.

It would be interesting if someone who knew the exact details of the zip format could comment.



Here's an old page with various compression bomb demos: http://www.aerasec.de/security/advisories/decompression-bomb...

It's not just archive formats: anything that uses a standard compression algorithm can be a compression bomb (e.g. PNG).
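For the curious, rolling your own PNG bomb is only a few lines, since the pixel data is just a zlib stream. A rough sketch for testing your own pipeline (dimensions and file name are arbitrary; how much memory a viewer actually burns on it depends on the decoder):

    import struct
    import zlib

    def chunk(tag, payload):
        # PNG chunk layout: length, type, payload, CRC over type + payload
        return (struct.pack('>I', len(payload)) + tag + payload
                + struct.pack('>I', zlib.crc32(tag + payload)))

    W = H = 20000                    # ~1.2 GB of raw RGB pixels once decoded
    ihdr = struct.pack('>IIBBBBB', W, H, 8, 2, 0, 0, 0)  # 8-bit/channel RGB

    row = b'\0' + b'\0' * (W * 3)    # filter byte + one all-black scanline
    co = zlib.compressobj(9)
    idat = b''.join(co.compress(row) for _ in range(H)) + co.flush()

    with open('bomb.png', 'wb') as f:
        f.write(b'\x89PNG\r\n\x1a\n')
        f.write(chunk(b'IHDR', ihdr))
        f.write(chunk(b'IDAT', idat))
        f.write(chunk(b'IEND', b''))

The result is roughly a megabyte on disk.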


http://www.aerasec.de/security/advisories/html-bomb/bomb-htm...

Modern computers don't choke on ten million X's on a page, but the fans do spin up wildly. Maybe smartphones die with this.


Any examples of a PNG bomb in the wild?


There's this [1]. eog doesn't choke on it, but it takes 4 GB of RAM while processing; afterwards it drops back to 2 GB. WARNING: decompressing it may consume up to 5 GB of RAM.

[1]: ftp://ftp.aerasec.de/pub/advisories/decompressionbombs/pictures/picture-1G-19000x19000.png


The link in parent comment has instructions for creating PNG bombs and a link to an ftp site with some downloads.


So, do we need to change our file uploaders and whatnot to autokill after a certain amount of time? Or just set `ulimit -m` or whatever judiciously?

This looks like an annoying attack vector for people that, say, resize image uploads to make thumbnails.


> So, do we need to change our file uploaders and whatnot to autokill after a certain amount of time? Or just set `ulimit -m` or whatever judiciously?

Most image formats specify the dimensions before any pixel data, so you can just check that w*h is below whatever limit you deem fit before decoding. The upload size should already have a "sane" limit so people don't upload truckloads of data; the problem with a compression bomb is that a very small upload eats the target's RAM on decoding.

> This looks like an annoying attack vector for people that, say, resize image uploads to make thumbnails.

It is, although for JPEG you don't need to fully decode the image to resize it (depending on how whatever you're using is implemented, it may still do so regardless).
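For PNG specifically the dimensions sit at a fixed offset in the first chunk, so the check is cheap. A rough sketch (the cap and the file name are placeholders; if you're already using Pillow, Image.open is lazy, so inspecting .size before load()/thumbnail() amounts to the same thing):

    import struct

    MAX_PIXELS = 50 * 1000 * 1000    # arbitrary cap, ~50 megapixels

    def png_pixel_count(path):
        # Width and height live in the IHDR chunk, which the spec puts
        # immediately after the 8-byte signature; no pixel data is read.
        with open(path, 'rb') as f:
            header = f.read(24)
        if header[:8] != b'\x89PNG\r\n\x1a\n' or header[12:16] != b'IHDR':
            raise ValueError('not a PNG')
        width, height = struct.unpack('>II', header[16:24])
        return width * height

    if png_pixel_count('upload.png') > MAX_PIXELS:  # 'upload.png' is a placeholder
        raise ValueError('refusing to thumbnail: image too large')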


That's a good test for pass-through antivirus appliances :)


Makes me wonder what the implications of the halting problem are for compression formats.


The halting problem only applies to full Turing machines, and their equivalents. So, a compression algorithm based on x86 assembly is uncheckable, but one based on a restricted format like .zip would be fine. Or, you could just give it a time limit.


I made the file DanBC referred to above, and wondered about how much computation you could do with iterated decompression (specifically, deflate-type decompression). Could you make a zip file representing a Turing machine, so that after unzipping it it represented the machine after one step of processing, and after unzipping the result, represented the machine after a second step, etc.

Or if not a direct translation of a Turing machine, then some other Turing complete system.

As far as I know, no one has done this, but if it is possible, one implication would be that you couldn't decide, in general, whether the process of recursively unzipping a file would continue forever or eventually halt.


Or for Google-scale storage systems!


Gmail won't allow you to send this file to anyone; it throws an error pointing you to https://support.google.com/mail/answer/6590

Ah, I almost forgot we are not being spied on. (Tried via a local email client.)


What? A lot of spam filters will block zip files and the like, including free-software and company-run ones.


Yes. And Google isn't blocking all zip files, or even all password-protected zip files. It just blocks password-protected zip files that contain another zip file.


> Mac users, sending a zip file which also contains another zip file increases your chances of file corruption. We recommend that you de-compress all files first and create only one zip file.

what the fuck?


Yeah, that's pretty crazy. If zipping a zip causes "corruption", they should probably file a bug at bugreport.apple.com


I like that they "recommend" something, but what they really mean is demand.


All files at the same "level" are identical. The 4 GB file contains only zeros. I think that data could be compressed to an even smaller size with different methods.


It would be interesting to see how small it could be made.

I posted a link earlier, but here it is again with a quote:

http://www.maximumcompression.com/compression_fun.php

> 'What is the best compressor to get really extreme compression?' is a question often asked on the internet. The achieved compression ratio of course depends on the quality of the compressor used, but the type of data that is being compressed is much more important. To show this I created a tiny 115 byte rar file. When you decompress it, it will turn into a textfile of almost 5 MB (a compression ratio of 99.998%). Here it is: test.rar. Note: Turtle 0.07 compresses the same file to 49 bytes, Hook09c (with switches 3 1 1 1) to 36 bytes and UHBC (with switches -b128m -m3) to 24 bytes!!.

I enjoy watching this "code golf"-style competition to optimise one feature of something.


There's this old discussion about zip bombs: https://news.ycombinator.com/item?id=4616081 A certain Caspian Maclean published a gzip quine on the Web in 2004, but I cannot seem to find it.


It's funny: I first got to know about this a few days ago, when I came across this interesting thread [1, 2] on Quora.

[1] http://qr.ae/sQnm5

[2] What is the most compressed file ever?


NOD32 detects it as an Arch-bomb. Made me think of the somewhat similar 'Billion laughs' XML bomb but for archives.


Now we have to ask, how did he create that file without using petabytes of disk space?


I would guess you would just need to create the single 4 GB file and zip it. Once it's zipped, copy it 16 times, put the copies in a zip file, and repeat.
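Even that single 4 GB file never has to exist uncompressed on disk: you can stream zeros straight into the compressor. A rough sketch in Python (names and the exact size are assumptions; ZipFile.open(..., mode='w') needs Python 3.6+):

    import zipfile

    CHUNK = b'\0' * (1 << 20)        # feed the zeros one megabyte at a time
    TARGET = 4 * 1024**3             # roughly the 4 GB bottom-level file

    with zipfile.ZipFile('0.zip', 'w', compression=zipfile.ZIP_DEFLATED) as zf:
        # force_zip64 because zipfile refuses streamed members over 2 GiB otherwise
        with zf.open('zeros.bin', mode='w', force_zip64=True) as member:
            written = 0
            while written < TARGET:
                member.write(CHUNK)
                written += len(CHUNK)

From there it's exactly what you describe: copy, zip, repeat.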


Precisely that.

It's rather a pain in the ass for antivirus products, which tend to have a nesting limit to prevent this kind of shenanigans.

There are much weirder things you can do. If you know what's packing it, and the packer doesn't bail out on files that expand (like, say, the LZMA2 used in XZ does, but the older LZMA doesn't), you can make a small file which tickles every worst-case performance/size wrinkle in the cruncher and comes out very large!

I thought executable compressors had reached a plateau until Mentor/TBC & Blueberry/Loonies came up with crinkler - http://crinkler.net/ - which is a compressing linker specialised for demoscene 4ktros, and cheats like a motherfucker.


Interesting. Is there a description of how it works, or of how it cheats?


By piping dd to zip?


So what's the content of the final files? Just a bunch of redundant zeros and ones?


Isn't that the contents of all files?


You think all file contents are redundant? Deep man, deep.


I'm sorry, but I have to say this: I like how this entry had 42 comments before this one: http://i.imgur.com/KmCodSM.png


42.zip has a Weissman score of 42


wow, that has a Weissman Score close to the theoretical limit


got nothing on Pied Piper



