It's not exactly the same, as this is designed for compression, but there's an excerpt from Infinite Jest about the rise and fall of "videophony", wherein users developed Video-Physiognomic Dysphoria, which was simply the anxiety of suddenly having to be presentable in a previously audio-only interaction. The 'solution', if I recall correctly, was the marketing of videophony-specific makeup, followed by pre-made-up latex masks, followed by beautified filters of the user, finally ending up at full 3D-rendered, perfected representations of the user that covered the camera and screen, such that the only things viewing the interaction were each other's fake avatars.
One of Vernor Vinge's books has ultra-low-bandwidth videoconferences described in approximately the same way: the computer is given the absolute minimum of data and tries to generate an image that roughly approximates the interlocutor, and when the bitrate gets too low it can end up straight-up hallucinating.
In the movie Surrogates, people have physical robot bodies which they pilot through real-world interactions. The surrogate bodies can look like whatever you want. One poor guy has to go outside, and he gets severe social anxiety from people actually being able to look at him.
Very cool tech, especially for bandwidth reduction!
Also, I know a lot of people who do makeup / dress up for video meetings. The first-order model (e.g. https://github.com/alew3/faceit_live3) is not good enough for that. Maybe Nvidia's algorithm is? I wonder if there's a project that lets you record your styled self to train a model, which you can then use to transform your "out-of-bed" natural self into the styled version, haha :D
After a couple of months of staring at each other's faces, and probably having seen most "configurations" of one's appearance and room, people now just don't bother to switch on their cameras. Not sure people would want to stare at fake heads when they don't particularly enjoy the real ones...
Maybe it could be more fun if you could choose fantasy attributes for your character, e.g. armor, an extra head, two noses, etc.
Personally, and totally off topic... what I'd really like to see in a video synthesizer is something that takes my webcam input, detects my eye positions, and pastes googly eyes onto my head.
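Just for fun, here's a minimal sketch of how that could work with OpenCV's stock Haar eye cascade. Everything here is illustrative, and to get the result into an actual call you'd still need to route the output through a virtual camera (e.g. pyvirtualcam):

    # Hypothetical sketch: draw googly eyes over a webcam feed.
    import cv2

    eye_cascade = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_eye.xml")
    cap = cv2.VideoCapture(0)  # default webcam

    while True:
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        for (x, y, w, h) in eye_cascade.detectMultiScale(gray, 1.3, 5):
            cx, cy, r = x + w // 2, y + h // 2, max(w, h) // 2
            cv2.circle(frame, (cx, cy), r, (255, 255, 255), -1)          # white ball
            cv2.circle(frame, (cx, cy + r // 3), r // 3, (0, 0, 0), -1)  # pupil
        cv2.imshow("googly", frame)
        if cv2.waitKey(1) & 0xFF == ord("q"):
            break

    cap.release()
    cv2.destroyAllWindows()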
They should focus on the eyes next, because so far they make it too easy to tell which video is fake. Great development nonetheless!
Soon we'll be able to program an expert system to join a Zoom meeting in our place while we're free to do other things, and afterwards we'd get the meeting minutes and resolutions.
The transmitting side should be able to compare this synthesized version to the ground truth, and if it diverges too much, tell the receiving end to fall back to "normal" blocky compression artifacts instead of keeping the eerie ones.
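A rough sketch of what that sender-side check could look like, assuming a simple PSNR comparison and an arbitrary 30 dB threshold (both illustrative choices, not anything Nvidia has described):

    # Hypothetical sender-side divergence check: render the frame the
    # receiver would synthesize, compare it to the real camera frame,
    # and fall back to the conventional codec if it diverges too much.
    import numpy as np

    def psnr(a: np.ndarray, b: np.ndarray) -> float:
        mse = np.mean((a.astype(np.float64) - b.astype(np.float64)) ** 2)
        return float("inf") if mse == 0 else 10 * np.log10(255.0 ** 2 / mse)

    FALLBACK_THRESHOLD_DB = 30.0  # below this, the synthesized face is "too wrong"

    def choose_mode(camera_frame: np.ndarray, synthesized_frame: np.ndarray) -> str:
        if psnr(camera_frame, synthesized_frame) < FALLBACK_THRESHOLD_DB:
            return "conventional"  # honest blocky artifacts
        return "synthesized"       # keypoints-only, low bandwidth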
So long as the face-specific-compression faithfully reproduces the ground truth, it should be fine. In a way it’s similar to voice-specific compression for audio. Knowing what’s transmitted (a head) is information that should be used.
I’d love to see one of these algorithms applied to other context-specific areas where there is much less to “lose”: sports. Compressed streams of a grass pitch with players running after a ball have horrible compression artifacts when the camera pans at low bitrates. But the receiver should know what the pitch looks like where the camera pans to: it’s static, and we had it on screen a moment ago!
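A toy version of that idea, using OpenCV's MOG2 background subtractor, would cache the (mostly static) pitch at the receiver and only spend bits on the moving players. Note this sketch assumes a fixed camera; a panning broadcast camera would additionally need global motion compensation to align the cached background:

    # Hypothetical sketch: separate the static pitch from the moving players
    # so only occasional background refreshes plus the per-frame foreground
    # need to be transmitted. Not a real codec.
    import cv2

    subtractor = cv2.createBackgroundSubtractorMOG2(history=500, detectShadows=False)

    def encode_frame(frame):
        fg_mask = subtractor.apply(frame)             # 255 where players/ball move
        background = subtractor.getBackgroundImage()  # receiver can cache this
        foreground = cv2.bitwise_and(frame, frame, mask=fg_mask)
        return background, foreground, fg_mask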
If you're using a compression algorithm, you want one that compresses near-optimally, right? Well, this is a step in that direction.
I disagree. I am often in calls with half a dozen folks from a customer, none of whom I've met before, all with English names that I have more trouble telling apart than my native German names, and due to their audio setup and the language difference I may have trouble distinguishing their voices. Worse, Microsoft Teams just shows a bubble with initials in it. Having some kind of visual anchor, like this kind of virtual representation, would really help me remember people and calls a lot better, and nobody would have to share their video if they had just gotten out of bed.
On Google Hangouts, people can upload a static avatar image that shows in place of their initials. And when they speak, there is a visual sound level indicator so you can tell which person is talking. I haven't tried this in Teams but the process looks very similar.
I agree. In addition to a visual anchor, I find that it makes handoffs a lot easier - there are visual cues for when someone is about to talk, which makes it easier to avoid talking over each other and to know when to stop talking.
I'm hearing impaired. The visual channel contains useful information for people like me, at least when there's adequate framerate and quality. (I have a feeling synthesized video like this is going to be an accessibility anti-pattern in practice, though.)