Hacker Newsnew | past | comments | ask | show | jobs | submitlogin
Finding Waldo in π (kundor.github.io)
234 points by Pxtl on March 31, 2022 | hide | past | favorite | 78 comments


> The trick is that we can reassign the colors using a palette. And in fact you always need to do this; you have to somehow decide which color each byte should represent.

Hmm. I don't know that I agree. There are certainly more objective choices here than just picking any old palette, and in fact there are choices where it's not obvious the result should be considered a "palette" at all.

For example, the obvious choice to me is to treat the π bitstream as 8 bit RGB data like you would see in the PPM image format. In other words, one pixel at a time, each 8 bits represents a number from 0-255 for the red, green, and blue channel in that pixel respectively.

Of course that's a much harder ask since the colors are in this case not arbitrary, but you could (and should!!) still cheat at this by arbitrarily selecting the width of the resulting rows of pixels.

I'm not mad at seeing this palette based solution to the problem though, it's a very fun hack! Probably doing it for real would take an impossible amount of CPU time. Maybe if you did 4-bit RGB values it would be possible?


That's still a choice of palette; you'll find it listed on Wikipedia [1] under "Regular RGB palettes".

There's no doubt it's a much more objective choice than the one I used! I did say I was cheating.

Going to 4-bit color won't make it feasible. Even with 1-bit black/white pixels on about the minimum possible 18x24 Waldo face, you have 432 bits to look for, and you're not going to find them without cheating somehow. This guy on Twitter tried pretty thoroughly: https://twitter.com/gsuberland/status/1508697913177915393

For the palette hack, you want to go the other way; it's easier with bigger pixels. I was able to create a perfect Waldo face from the first 988 bytes of π as a TIFF, where the standard supports 16-bit palette indices for the pixel data. Unfortunately nothing except imagemagick seems to support actually viewing these TIFFs.

[1] https://en.wikipedia.org/wiki/List_of_monochrome_and_RGB_col...


I made a video showing a slow zoom-in to Waldo in the first hundred million hex digits of pi, using your color palette. Feel free to use it if you want with a link back to this comment, I release it under CC-BY.

https://0x0.st/oq0Q.mp4

(The idea is basically that each consecutive 988 bytes gets put into an individual 19x26 8 bit image with your color palette, and these are then stacked in row order.)


I added a reduced preview of this to my blog post. I really like how it gives an impression of finding Waldo in a haystack of noise, just like in the picture books.

The preview is not as pretty as the real thing, but I thought embedding a 27MB video in the page might be a bit much.


Yeah I did my best to compress it but it's so noisy that there's very little compression that can be done without turning it into a blur. I can give you the original lossless PNG frames if you want them, but honestly I don't know that they'd be that useful.

Here's one lossless frame from near the end of the video, it's kinda nice because it gives you a little bit of context for the final Waldo without being overwhelming. https://i.imgur.com/wNZICnP.png

Here's also the full image as a 21.2 MiB PNG: https://ipfs.io/ipfs/QmUJaeon4KjE2RGVirXTG5xQuA6pCWV5T9GW2KS...

The full image contains the first 99969546 digits of pi, including the opening 3.


That's awesome!


I am really curious about the search algorithm. I love the palette hack, how did you find candidates to then start searching through the possible palettes?

I can sort of think of:

1) Collapse the colours in the palette to the minimum necessary to be seen as "Waldo". The more slack in the gif palette the better - 24bit colour vs 16 colours (or fewer) in the starting image?

2) For each substring in Pi, map the hex value to a colour (or close to the colour) to match the expected image.

3) find best match - how?

How to backtrack though?

Perhaps pin important fragments?

The glasses and chin seem more important?


Another pass.

We're looking for a substring with the largest number of unique values. If we have unique values, we can paint each value with a colour close enough to the expected value that humans will see them as the same.

Maybe?


Yes, it helps to have more unique values. We want each repeated byte to occur in part of the pattern where a color also repeats.

What I do is search for substrings with the fewest repeated bytes that don't match the target pattern. I prioritize first minimizing conflicts between light (white and tan) vs. dark (black and red). Reducing other conflicts is a tie-breaker.

The featured gif has 79 "mismatched" pixels out of 494, by the black/white metric. I've found candidates with as few as 75, but subjectively I didn't think they looked as good.


Would not the best choice of palette be the one immediately preceding Waldo in π?


With acknowledgement to the palette hack discussed in the other comments, I still think there's a ton of value in this.

So often, people observe patterns in nature that appear to be so unlikely as to be by design. I have family members that are superstitious: if a light flickers at the same time that they mention a recently deceased loved one, it must be "a sign". Similarly, they will point to some overwhelmingly unlikely occurrence in the news, and ask, "how do you explain that?" The answer is usually random chance (if not deceit). And this exercise is a good illustration of that.

"Waldo, in the digits of pi?!? What are the odds??" - To the outsider who is not scientifically minded, this just looks so coincidental as to be magic. But almost any pattern can be found in randomness. It's just that the size of the necessary search space explodes as the query becomes larger / more specific.

The average HN reader knows all this, but the Waldo story is still a cool way to describe it to other people.


This post reminds me of the once common meme[1] that pi contains all things you will see in your life, all books you will ever read and all songs you will ever hear, the first thing you saw when you were born, and the last thing you will see before you go away for one final time.

Off course, pi isn't proven to be normal (containing all possible sequences of decimal digits, if I remember that correctly) yet so this thought experiment needs an encoding aware of that to work, and any random bit source that unrepeatingly explores the combinatorial space of a symbolic alphabet would work, pi is not special here at all. This is just the library of babel + encoding arbitary data structures into numbers.

As you say, finding patterns is not hard, it's unsurprising that random strings contain those things, maximum-entropy information sources maximize the expected information as per Shannon. The whole point, though, is finding useful patterns: things that predict other things you care about, which you didn't know beforehand (or just knew in vague outlines), and which are cheaper to find and explore than simply directly observing/simulating the things they predict. None of this holds for the 'patterns' you will find in a typical library of babel. Evolution and evolutionary algorithms can be seen as a tool to cull the impossibly large search space of a library of babel. Also mental heuristics like Occam's razor can be seen as tree-pruning heuristics this way.

An even more beautiful idea than "An infinite random string contains all possible data structures" is "An infinite random string contains all possible programs and all possible computations (according to all possible semantics of those programs)", explored by the legendary Greg Egan in Permutation City. I can't do justice to Egan in this already too-long-of-a-comment, but I promise you will absolutely be mind blown.

[1] https://slate.com/technology/2013/04/pi-meme-on-reddit-and-g...


There are an infinite number of coincidences happening in every moment: a raindrop falling, a light turning off, a wind blowing on mars, a star rotating, and so on.

If one cared to find "something apparently significant" in any moment, one would therefore find an infinity of them.

By comparison, events which are directly causally connected are diminishingly few, and mostly indistinguishable from those coincidences. Hence, most things are actually unknowable, and what few beyond the ordinary, require extremely expensive and technologically advanced science to uncover.


You know, the most amazing thing happened to me tonight... I saw a car with the license plate ARW 357. Can you imagine? Of all the millions of license plates in the state, what was the chance that I would see that particular one tonight? Amazing!


> What are the odds?

Assuming pi is normal [1], the probability of any bit string occurring in pi is 1.

[1] https://en.wikipedia.org/wiki/Normal_number


But the probability of you actually finding where that string occurs before the sun goes nova is much, much smaller.


> But almost any pattern can be found in randomness.

Not quite. Although this is a popular misconception. It depends on the cardinality of randomness and the distribution that you're talking about.

As a casual example, an infinitely long randomly distributed sequence of set { A, B, C, D } will never contain the string "bkyiuuMbF."

You can take this example and apply it to other spaces. For instance, we may live in a universe where there really is no other set of possible interactions where life exists somewhere else in this supercluster. Maybe in a another one, though?

There are nearly 8 billion people on the planet at the time of writing, I think. There's a lot of random people in there, but only one you!


Part of this though is also that we are optimized to see faces.


I've been waiting a dozen years for this post! Did you like Sagan's book "Contact"? I'm a big fan. Check out the work I did to show pi as a series of images, and a cool "easter egg" I might have found buried in pi.

And yes, my web site looks like it was built 20 years ago and then allowed to rot ever since, since that is in fact what happened!

https://whiteis.com/whiteis/personal/programs/Pi/pi_images.s...


Oh and here is the page right above that one, that introduces the images:

https://whiteis.com/whiteis/personal/programs/Pi/index.shtml


That's a cool egg!

It's been a long time since I read Contact. Are the aliens supposed to have hacked the geometry of the universe to make the pattern show up in π?


Well the alien that Arroway meets on the beach says that they didn't create the worm holes, they found them, but have never found the creators, iirc. I just re read some of the parts that mention pi, and I was remembering it wrong. She goes looking for a picture, but I'm not sure she finds anything by the end of the book. I'm going to re read the whole book this weekend, it has been too long. Thanks for having a look!


So to your question, I think the book says that yes, either the universe was created with this cool egg in pi as a message for civilizations to find once the got computers that were good enough to scan for statistical anomalies 2*20 digits deep in pi, or else the egg was somehow hacked into it some time after creation.


Another victory for PIFS:

https://github.com/philipl/pifs


>Yeah, but what happens if lose my file locations?

>No problem, the locations are just metadata! Your files are still there, sitting in π - they're never going away, are they?


That's pretty clever and impressive.


Really fun! Eventually we'll find one with an embedded palette.

This deeply reminds me of Carl Sagan's science fiction novel, Contact.

(spoilers to the novel follow, so avoid if you haven't read it)

The film adaptation missed out on so much, and the treatment of pi is one such unfortunate omission (or simplification).

The novel's aliens were advanced, but they told the main character that even they were stumped by the clear messages left encoded deep within the universe's constants.

It made the ending to the book so much more profound and touching than the film.

Carl was a master of making us feel small, all the while opening up our imaginations to infinities beyond measure.


Could there be a number, such that for certain interesting data, you can compress the data more than normally achiveable, by indexing into the number?

I feel this should not be possible, but I don't know how to proove it without a circular argument about entropy.

On the other hand, if you make a string of common byte sequences, a couple GB long, and distribute it with every PC... then you surely can achive hyper compression with some clever indexing (if you don't count the magic string to the compressed size, of course).


Yes, to an extent that would work.

1) Take target image

2) Find the index in π (say) where a matchable sequence of bytes occur, like for Waldo. (If your image is big, this could take several years of computation time.)

3) Transmit the palette, width, height, and index: about 800 bytes.

4) Probably the index is too far out for everyone to have a copy of the data already (it would be beyond terabytes). So the recipient then spends several years computing the number out far enough.

5) Profit!

Note: the palette is optional; you could leave it out and only transmit about 32 bytes. Using a palette also means lossiness, because you're reducing to 256 colors and making compromises between pixels. But using one saves you several orders of magnitude of computation time.


PS. The fact that any image can be encoded in 32 bytes this way implies that there are only 2^256 possible images — about one hundred and sixteen quattuorvigintillion. That's obviously not literally true, but we are searching for a subsequence of π which is "close enough"; the number of possible images which humans would consider distinct is probably well less than that.

(That is, the vast majority of possible images are "color noise" which all look more or less the same.)


It does seem possible, e.g. rather than sending an uppercase alphabetic message as 8-bit ASCII bytes, you can send their position in the alphabet instead, so 01000001 becomes 00001.

Of course this only works when you're limiting yourself to a subset of all possible data (in this case, the uppercase alphabet). A general method where all possible binary combinations are indexed (and the indexes are sent rather than the data) would not save any data as the binary data would always be its own index.



> On the other hand, if you make a string of common byte sequences, a couple GB long, and distribute it with every PC...

This is what brotli does.


All the 4 images posted by the author are located precisely at the begiging of a hexadecimal digit.

Since a hexadecimal digit is 4-bit long, I guess we should be able to locate 4x more Waldos if we allow solutions to be start in the middle of a hex digit?

> The pixel data in this gif are the 23,074,248th through 23,075,235th hexadecimal digits of π! (Equivalently, the 184,593,977th through 184,601,880th bits).

The author assumes a hex digit is 8-bit long here btw.


You're right, that's a mistake! It should be the 92,296,989th through 92,300,940th bits.

Yes, I did only search at 4-bit aligned bytes. Doing otherwise would be more complicated, much harder for others to easily verify from the downloadable hex digit data file, and not really more likely to succeed. Looking further out is just as beneficial as looking at more offsets.


"Truth" is where you look for it; provided you're willing to do the proper preprocessing of your data. Zeroth law of Science anymore.


It’s kind of obvious that they are cheating: 23,074,248th through 23,075,235th hexadecimal digits is approximately 500 bytes. Finding 500 exact bytes in a random sequence would take a very long expected sequence :-)

They do explain what they do: https://kundor.github.io/Cheating-images/


I guess I disagree that it's "cheating" - it's more like they're playing a game with no actual rules, and therefore applying rules that are most convenient for their purposes to achieve victory.

I find their methodology for evaluating candidate Waldos to be as sufficient as any other.

What I mean is, they could certainly choose a much harder evaluation criteria and fail, but that's not much of a blog.


I read this but I'm not sure how it's done. Does he run an optimization step to select the palette that minimizes the error?

In that case I'd be cool to know the probability in finding a close enough image in that optimized space


In the target image, byte 0 is red, byte 1 is blue, byte 3 is red.

In the candidate stream, byte 0 is 4. The error is minimised if byte 2 is also 4. The values of the bytes don't matter, just the patterns of which bytes have the same value.


> Finding 500 exact bytes

But they're not finding exact bytes, they're finding something that very vaguely looks like something very vague. It also helps a lot that it's a face and human brains are very good at identifying faces.

The number of 500 bytes that would "look like" waldo is a lot higher than 1.


> The pixel data in this gif are the 23,074,248th through 23,075,235th hexadecimal digits of π!

You might be surprised to learn that π! (pi factorial) is a thing that makes sense

π! = 7.1880827289760327020821943451247587185593017639684371624100356994...

https://www.wolframalpha.com/input?i=pi%21


Well, for some definition of "make sense" yes, any arbitrary positive decimal number can have a factorial, and even negative non-integer numbers, thanks to the gamma function. I wouldn't really say it makes "sense" though, personally.


As extensions of functions with intuitive definitions go, the gamma function isn't terrible. Much worse than exponents, much better than the zeta function. Sort of middle of the road


Yep, it’s just going through the gamma function.


If you use a external pallete (one that is not coming from the pi digits themselves) I don't think it's that interesting. With any random set of data you can find a pallete that approximates the desired result


Can someone explain to me where the palette is defined? If it's not in the bytes of waldo.gif, is it provided by the website itself? If I save the gif file, I still see it as the same on my computer. Is it not true that the bytes of the gif actually 100% match the bytes of pi? Is there some extra metadata not included in the bytes of the file itself? If I were to send those bytes over the wire and save them to a file, would it look different?


The part that's from π is the actual pixel data, which is extracted in the Python snippet "waldo.tobytes()". More-or-less equivalently, you could use "list(waldo.getdata())".

The GIF file as stored on disk has extra headers, and the pixel data's been compressed. So the whole file isn't found in the digits of π, just the pixel data itself.

I'd guess the chance of actually finding a correctly formatted full GIF file in contiguous bytes of π is effectively nil.


I see, I did find it slightly suspicious that they loaded the file using Image.open instead of just reading the bytes directly. Thanks!


I agree. Relying on an external data for interpretation is less impressive. The most impressive would be finding an exact GIF87a file. But, the cheating technique used here isn't a simple task either.

I converted the smallest valid GIF file[1] (35 bytes) into decimal number: 540959129019042423917857241427143195931235689921032801204995597056286563376232988672

It's nowhere to be found in PI :) I couldn't even find "GIF87a". So, fuzzy search seems to be a must.

[1] https://stackoverflow.com/questions/2570633/smallest-filesiz...


> "It's nowhere to be found in PI :)"

Ooh, did you search all the way to the end?


I must admit, not all the way.


It is in pi, if pi is a normal number, as most think.


Not really true. It took 27M tries to find that one


Even if you could find an arbitrary file inside pi, to even describe the offset into pi, you would probably need an even larger file to just store the offset number, in most cases?


This is the whole joke of pifs.

https://github.com/philipl/pifs


There was a recent article about generating '8 note, 12 beat' melodies to help with copyright issues. Given the issues with computer/AI generated content not being copyrightable I wonder if this provides an alternative path. Under some encoding all those melodies exist in pi (Basically, locate each melody in pi).


Do the digits of pi contain all digits of any rational approximation of e? Is the same true in reverse?


Maybe. The digits of pi appear to be completely random, in which case it would contain every finite sequence of digits, but it hasn't been proven. See: https://blogs.sas.com/content/iml/2015/03/12/digits-of-pi.ht...


I wonder, but don’t have a strong instinct either way, whether this might be easier to do without cheating targeting a format intentionally designed for lossy compression and at least somewhat forgiving error correction (like JPEG, but certainly not limited to that).


Fuzzy matching is more tolerant to differences than lossy compression.


Dumb question, but is any arbitrary string of digits with length N, somewhere in pi?


Not exactly the same thing, but related: I think everyone believes Pi is a Normal Number, but it isn’t proven. https://en.m.wikipedia.org/wiki/Normal_number

I think if we could prove Pi is normal, we could probably also prove your statement to be true (but I’m not sure about that)


Normal is actually a stronger claim than "contains any finite string as a substring". That normal numbers contain any finite string as a substring is a straightforward consequence of the infinite monkey theorem: https://en.wikipedia.org/wiki/Infinite_monkey_theorem

To see that the converse does not always hold, you could take something like the Champernowne constant https://en.m.wikipedia.org/wiki/Champernowne_constant and pad it with 9s between each integer, ie

.192939495969798999109911991299...

so that you still contain every finite substring, but you have a >50% chance of a randomly selected digit being 9.


I think there is a subtlety here that makes this fail. Normal does not imply that any finite substring exists, just that the probability of such a string existing is uniformly distributed within the space of possible values. There isn't any guarantee that you will actually see such a string, though you almost surely will.


At the very least, it doesn’t follow from the fact that it’s infinite and nonrepeating.


I'm struggling to wrap my head around this: why doesn't it follow from that fact?


You can have infinite and nonrepeating sequences which don't contain every possible subsequence. For instance, 1010010001000... (where there is one more zero each time) never repeats itself, but it never even has the digit 2 in it.


Aha, makes sense, thank you!


Not dumb at all, an area of open research actually. The answer is that this is not known and you will be math-famous if you figure it out.


It's possible that the answer is no.


If you search sufficiently far down the digits of pi, you find an MPG with a version of Casablanca where Rick and Ilsa open a halal McDonald's.


I did something similar in the past, finding integers in pi:

https://pi.paradite.com/


What symbol is that? It doesn’t seem to be n or pi.

Edit: Weird. It is pi at least in the original article. The left and right tips are cut off for me on HN font.


We should ban Pi since obviously all the illegal images are there too!


Someone make a tool to find my name in Pi.



I wonder why there is a sudden dropoff in likelihood after 6 characters.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: