I understand that Sora is very popular, so it makes sense to refer to it, but rather than saying it is similar to Sora, I guess it makes more sense to say that it uses a Diffusion Transformer (DiT) (https://arxiv.org/abs/2212.09748), like Sora does. We don't really know much more about Sora, while the original DiT paper has all the details.
Is anyone else struck by the similarities in textures between the images in the appendix of the above "Scalable Diffusion Models with Transformers" paper?
If you size the browser window right and page with the arrow keys (so the document doesn't scroll), you'll see (e.g., pages 20-21) that the texture of the parrot's feathers is almost identical to the texture of the bark on the tree behind the panda bear, and that the forest behind the red panda is very similar to the undersea environment.
Even if I'm misunderstanding something fundamental here about this technique, I still find this interesting!