I don't know that the distrust in similarity between images of noise is unwarranted. The noise is pseudo-random, not purely random. The values in each image are not independent draws; they are a run of sequential outputs from a single generator, perhaps 16384 of them. The NSA is thought to have shortcuts for breaking encryption quickly, and is alleged to have salted public methods to make that job easier. Random number generation and encryption are related: a properly encrypted chunk of data looks almost exactly like pure random noise, as does the output of a good pseudo-random number generator. I would not be surprised if the mathematical methods found similarities that the human eyeball does not.
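To make the "sequential draws" point concrete, here is a minimal sketch, assuming the noise images come from something like Python's standard `random` module (a Mersenne Twister generator, which is not cryptographically secure). The image size of 128 x 128 = 16384 values, the seed, and the names `img_a`, `img_b`, and `pearson` are my own choices for illustration, not anything from the original experiment.

```python
import random

N = 16384  # assumed image size: 128 x 128 values


def pearson(a, b):
    """Plain Pearson correlation between two equal-length sequences."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / (var_a * var_b) ** 0.5


# Two "noise images" taken as consecutive runs from ONE generator stream.
rng = random.Random(0)  # hypothetical seed
img_a = [rng.random() for _ in range(N)]
img_b = [rng.random() for _ in range(N)]

# Linear correlation between the two runs; expected to be close to zero.
corr = pearson(img_a, img_b)

# The stream is fully determined by the seed: replaying it reproduces img_a.
rng_replay = random.Random(0)
replay = [rng_replay.random() for _ in range(N)]

print("correlation between consecutive runs:", corr)
print("replay matches first image:", replay == img_a)
```

A near-zero linear correlation only rules out the crudest kind of similarity. It says nothing about higher-order structure, which is the kind a learned similarity measure might conceivably pick up.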

This feels like a minimum description length problem. I think that if the agents had to use hierarchical descriptors, treating a cat as an assembly of tail, legs, body, head, eyes, ears, mouth, and so on, an internal hierarchy would show up in the communication. A divergence between the training hierarchy and the communicated hierarchy would then give a sharper contrast for showing an inferred structure.
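As a rough illustration of why a description length pressure might favor hierarchy, here is a toy sketch. It counts symbols in a lexicon as a crude stand-in for description length; a real MDL treatment would count bits for both the model and the data. The animals, the extra parts (`whiskers`, `snout`, `mane`), and the sub-assemblies `head` and `quadruped` are all hypothetical, made up for this example.

```python
def expand(name, lexicon):
    """Recursively expand a symbol into its primitive parts."""
    if name not in lexicon:
        return [name]  # primitive part, nothing further to expand
    parts = []
    for child in lexicon[name]:
        parts.extend(expand(child, lexicon))
    return parts


def lexicon_cost(lexicon):
    """Crude description length: total symbols used in all definitions."""
    return sum(len(definition) for definition in lexicon.values())


# Flat lexicon: every animal spells out all of its parts.
flat = {
    "cat": ("tail", "legs", "body", "eyes", "ears", "mouth", "whiskers"),
    "dog": ("tail", "legs", "body", "eyes", "ears", "mouth", "snout"),
    "horse": ("tail", "legs", "body", "eyes", "ears", "mouth", "mane"),
}

# Hierarchical lexicon: shared sub-assemblies are defined once and reused.
hier = {
    "head": ("eyes", "ears", "mouth"),
    "quadruped": ("tail", "legs", "body", "head"),
    "cat": ("quadruped", "whiskers"),
    "dog": ("quadruped", "snout"),
    "horse": ("quadruped", "mane"),
}

# Both lexicons describe the same objects.
same_objects = all(
    expand(animal, hier) == list(flat[animal]) for animal in flat
)

flat_cost = lexicon_cost(flat)
hier_cost = lexicon_cost(hier)
print("same objects:", same_objects)
print("flat cost:", flat_cost, "hierarchical cost:", hier_cost)
```

The saving grows as more objects share sub-assemblies, which is the sense in which agents under length pressure might be pushed toward a communicated hierarchy that could then be compared with the training one.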