But in an important way, they can't.
You need to define "look". How many nanoseconds does it last? How much does the lighting change during that time? How still is the photo? How still is the person's head? In deep learning, a training example in a batch is defined exactly, down to the bits. Once you try to define a training example for a human, you see that a "single static image" means something totally different. A human looking at a static image is the equivalent of training a model on at least thousands, if not millions, of training images.
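To make that concrete, here is a rough sketch of how one static image turns into many bit-level distinct inputs once you allow tiny viewing changes. Everything in it is an assumption for illustration: `simulate_views` is a made-up helper, head or eye movement is modeled as a whole-pixel shift, lighting drift as a small brightness gain, and the view count of 1000 is arbitrary. It is not a measurement of human vision.

```python
import numpy as np

def simulate_views(image, n_views, max_shift=2, gain_jitter=0.05, seed=0):
    """Return n_views slightly perturbed copies of one static image.

    Hypothetical helper: the shift and gain models are illustrative
    assumptions, not a model of the human visual system.
    """
    rng = np.random.default_rng(seed)
    views = []
    for _ in range(n_views):
        # Small head/eye movement, modeled as a whole-pixel shift.
        dy, dx = rng.integers(-max_shift, max_shift + 1, size=2)
        shifted = np.roll(image, shift=(int(dy), int(dx)), axis=(0, 1))
        # Lighting drift, modeled as a small multiplicative gain.
        gain = 1.0 + rng.uniform(-gain_jitter, gain_jitter)
        views.append(np.clip(shifted * gain, 0.0, 1.0))
    return np.stack(views)

# One "static" 32x32 grayscale image with values in [0, 1].
image = np.random.default_rng(42).random((32, 32))
views = simulate_views(image, n_views=1000)

# Count how many of the views differ from each other at the bit level.
distinct = len({v.tobytes() for v in views})
print(views.shape, distinct)
```

To a model, each of those arrays is a separate training example, even though a person would call all of them "the same picture".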