Seems very limited. I wonder whether the same could be achieved with just Stable Diffusion and a latent walk between neighboring latents, taking very small steps. On the other hand, the interpolation techniques shown with GigaGAN's txt2img produce much higher-quality “videos” than this.
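
For anyone unsure what I mean by a small-step latent walk, here is a rough sketch of the idea, not anything taken from the work being discussed. It only builds the sequence of intermediate latents using spherical interpolation (slerp), which is a common choice for Gaussian noise latents. The names `slerp` and `latent_walk`, the toy dimension of 16, and the step count of 60 are all my own assumptions for illustration; actually decoding each latent into a frame with Stable Diffusion is left out.

```python
import math
import random


def slerp(t, a, b):
    """Spherically interpolate between equal-length vectors a and b at t in [0, 1]."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    cos_omega = max(-1.0, min(1.0, dot / (norm_a * norm_b)))
    omega = math.acos(cos_omega)
    s = math.sin(omega)
    if abs(s) < 1e-6:
        # Vectors are (anti)parallel: fall back to plain linear interpolation.
        return [(1 - t) * x + t * y for x, y in zip(a, b)]
    wa = math.sin((1 - t) * omega) / s
    wb = math.sin(t * omega) / s
    return [wa * x + wb * y for x, y in zip(a, b)]


def latent_walk(start, end, num_steps):
    """Return num_steps + 1 latents from start to end, endpoints included."""
    return [slerp(i / num_steps, start, end) for i in range(num_steps + 1)]


# Toy example: two random Gaussian "latents" (hypothetical size; real
# Stable Diffusion latents are far larger tensors).
rng = random.Random(0)
dim = 16
start = [rng.gauss(0, 1) for _ in range(dim)]
end = [rng.gauss(0, 1) for _ in range(dim)]

# More steps means smaller moves between neighboring frames.
frames = latent_walk(start, end, num_steps=60)
```

Each entry in `frames` would then be fed to the denoiser as the initial noise with the same prompt and settings, one image per entry. Whether neighboring frames come out temporally smooth is exactly the open question.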