A makeup influencer I follow noticed that YouTube and Instagram are automatically adding filters to his face in his videos without permission. If his content was about lip makeup they make his lips enormous and if it was about eye makeup the filters make his eyes gigantic. They're having AI detecting the type of content and automatically applying filters.

https://www.instagram.com/reel/DO9MwTHCoR_/?igsh=MTZybml2NDB...

The screenshots/videos of them doing it are pretty wild, and it's insane that they are editing creators' uploads without consent!





The video shown as evidence is full of compression artifacts. The influencer is non-technical and assumes it's an AI filter, but the output is obviously not good quality anywhere.

To me, this clearly looks like a case of a very high compression ratio with the motion blocks swimming around on screen. They might have some detail enhancement in the loop to try to overcome the blockiness which, in this case, results in the swimming effect.

It's strange to see these claims being taken at face value on a technical forum. It should be a dead giveaway that this is a compression issue because the entire video is obviously highly compressed and lacking detail.


You obviously didn't watch the video, the claims are beyond the scope of compression and include things like eye and mouth enlargement, and you can clearly see the filter glitching off on some frames.

Someone in the comments explained that this effect was in auto translated videos. Meta and YT apparently use AI to modify the videos to have people match the language when speaking. Which is a nightmare on its own, but not exactly the same.

I've come across these auto translated videos while traveling, and actually found them quite helpful. Lot of local "authentic" content that I wouldn't have seen otherwise.

It's all kinds of annoying if you’re bilingual. YouTube now autotranslates ads served in my mother tongue to English and I have not found a way to turn it off.

I really hate them. Once again, Google have completely failed to consider multi-lingual people. Like Google search, even if you explicitly tell it what languages it should show results in, it's often wrong and only gives results in Russian when searching in Cyrillic, even for words that do not exist in Russian but do in the language defined in the settings.

Also the voice is pretty unemotional and has nothing to do with the original voice. And it being a default that you can't even seem to disable...


Last night, I came across a video with a title in English and an "Autodubbed" tag. I assumed it would be dubbed into English (my language) from some other language. But it wasn't. It was in French, and clearly the creator's original voice. The automatic subtitles were also in French. I don't know what the "Autodubbed" tag meant, but clearly something wasn't working.

I am by no means fluent in French, but I speak it well enough to get by with the aid of the subtitles, so that was fine. In an ideal world, I'd have the original French audio with English subtitles, but that did not appear to be an option.


Recently they added a setting for default language

But I'm fluent in multiple, and wouldn't want a video in a language I'm fluent in to be shittily AI dubbed to another language.

There are some very clear examples elsewhere. It looks as if youtube applied AI filters to make compression better by removing artifacts and smoothing colors.

> There are some very clear examples elsewhere.

Such as?

This seems like such an easy thing for someone to document with screenshots and tests against the content they uploaded.

So why is the top voted comment an Instagram reel of a non-technical person trying to interpret what's happening? If this is common, please share some examples (that aren't in Instagram reel format from non-technical influencers)


The TFA.

Rhett Shull's video is quite high quality and shows it.

When it was published I went to Youtube's website and saw Rick Beato's short video mentioned by him and it was clearly AI enhanced.

I used to work with codec people and have them as friends for years so what TFA is talking about is definitely not something a codec would do.


> So why is the top voted comment an Instagram reel of a non-technical person trying to interpret what's happening?

It's difficult for me to read this as anything other than dismissing this person's views as being unworthy of discussing because they are "non-technical," a characterization you objected to, but if you feel this shouldn't be the top level comment I'd suggest you submit a better one.

Here's a more detailed breakdown I found after about 15m of searching, I imagine there are better sources out there if you or anyone else cares to look harder: https://www.reddit.com/r/youtube/comments/1lllnse/youtube_sh...

To me it's fairly subtle but there's a waxy texture to the second screenshot. This video presents some more examples, some of them are more textured: https://www.youtube.com/watch?v=86nhP8tvbLY


Upscaling and even de-noising is something very different to applying filters to increase size of lips/eyes...

It's a different diagnosis, but the problem is still, "you transformed my content in a way that changes my appearance and undermines my credibility." The distinction is worth discussing but the people levying the criticism aren't wrong.

Perhaps a useful analogy is "breaking userspace." It's important to correctly diagnose a bug breaking userspace to ship a fix. But it's a bug if it's a change that breaks userspace workflows, full stop. Whether it met the letter of some specification and is "correct" in that sense doesn't matter.

If you change someone's appearance in your post processing to the point it looks like they've applied a filter, your post processing is functionally a filter. Whether you intended it that way doesn't change that.


Well, this was the original claim: > If his content was about lip makeup they make his lips enormous and if it was about eye makeup the filters make his eyes gigantic. They're having AI detecting the type of content and automatically applying filters.

No need to downplay it.


I didn't downplay it, I just wasn't talking about that at all. The video I was talking about didn't make that claim, and I wasn't responding to the comment which did. I don't see any evidence for that claim though. I would agree the most likely hypothesis is some kind of compression pipeline with an upsampling stage or similar.

ETA: I rewatched the video to the end, and I do see that they pose the question about whether it is targeted at certain content at the very end of the video. I had missed that, and I don't think that's what's happening.


This is an unfair analysis. They discuss compression artifacts. They highlight things like their eyes getting bigger which are not what you usually expect from a compression artifact.

If your compression pipeline gives people anime eyes because it's doing "detail enhancement", your compression pipeline is also a filter. If you apply some transformation to a creator's content, and then their viewers perceive that as them disingenuously using a filter, and your response to their complaints is to "well actually" them about whether it is a filter or a compression artifact, you've lost the plot.

To be honest, calling someone "non-technical" and then "well actually"ing them about hair splitting details when the outcome is the same is patronizing, and I really wish we wouldn't treat "normies" that way. Regardless of whether they are technical, they are living in a world increasingly intermediated by technology, and we should be listening to their feedback on it. They have to live with the consequences of our design decisions. If we believe them to be non-technical, we should extend a lot of generosity to them in their use of terminology, and address what they mean instead of nitpicking.


> To be honest, calling someone "non-technical" and then "well actually"ing them about hair splitting details when the outcome is the same is patronizing, and I really wish we wouldn't treat "normies" that way.

I'm not critiquing their opinion that the result is bad. I also said the result was bad! I was critiquing the fact that someone on HN was presenting their non-technical analysis as a conclusive technical fact.

Non-technical is describing their background. It's not an insult.

I will be the first to admit I have no experience or knowledge in their domain, and I'm not going to try to interpret anything I see in their world.

It's a simple fact. This person is not qualified to be explaining what's happening, yet their analysis was being repeated as conclusive fact here on a technical forum.


"The influencer is non-technical" and "It's strange to see these claims being taken at face value on a technical forum," to me, reads as a dismissal. As in, "these claims are not true and this person doesn't have the background to comment." Non-technical doesn't need to be an insult to be dismissive. You are giving us a reason to down-weight their perspective, but since the outcome is the same regardless of their background, I don't think that's productive.

I don't really see where you said the output was "bad," you said it was a compression artifact which had a "swimming effect", but I don't really see any acknowledgement that the influencer had a point or that the transformation was functionally a filter because it changed their appearance above and beyond losing detail (made their eyes bigger in a way an "anime eyes" filter might).

If I've misread you I apologize but I don't really see where it is I misread you.


The difference is whether the effect is intentional or not.

"Non-technical" isn't an insult.

What you call "well actually"ing is well within limits on a technical forum.


From a technical standpoint it's interesting whether it's deliberate and whether it's compression, but it's not a fair criticism of this video, no. Dismissing someone's concerns over hair splitting is textbook "well actually"ing. I wouldn't have taken issue with a comment discussing the difference from a perspective of technical curiosity.

> Dismissing someone's concerns

I agreed that the output was bad! I'm not dismissing their concerns, I was explaining that their analysis was not a good technical explanation for what was happening.


I can hear the ballpoint pens now…

This is going to be a huge legal fight, as the terms of service you agree to on their platform are “they get to do whatever they want” (IANAL). Watch them try to spin this as a “user preference” that they just opted everyone into.


That’s the rude awakening creators get on these platforms. If you’re a writer or an artist or a musician, you own your work by default. But if you upload it to these platforms, they own it more or less. It’s there in the terms of service.

What are they going to do though, go to one of the ten competing video hosting platforms?

Yeah, we decided there is only YouTube and only YouTube.

Also, no one else can bear the sheer amount of traffic and cost.


nonexistent

What if someone else uploads your work?

Section 230 immunity for doing whatever they want, as long as they remove it if you complain.

Do they also remove it from the AI model weights they trained on it while it was uploaded?


One of the comments on IG explains this perfectly:

"Meta has been doing this; when they auto-translate the audio of a video they are also adding an AI filter to make the mouth of who is speaking match the audio more closely. But doing this can also add a weird filter over all the face."

I don't know why you have to get into conspiracy theories about them applying different filters based on the video content. That would be such a weird micro-optimization; why would they bother with that?


I doubt that’s what’s happening too but it’s not beyond the pale. They could be feeding both the input video and audio/transcript into their transformer and it has learned “when the audio is talking about lips the person is usually puckering their lips for the camera” so it regurgitates that.

Some random team or engineer does it to get a promo.

There is no option to turn that off? Or do they not even publish those things anywhere??

That's actually hilarious

This is ridiculous

This is an experiment in data compression.

What type of compression would change the relative scale of elements within an image? None that I'm aware of, and these platforms can't really make up new video codecs on the spot since hardware accelerated decoding is so essential for performance.

Excessive smoothing can be explained by compression, sure, but that's not the issue being raised there.


> What type of compression would change the relative scale of elements within an image?

Video compression operates on macroblocks and calculates motion vectors of those macroblocks between frames.

When you push it to the limit, the macroblocks can appear like they're swimming around on screen.

Some decoders attempt to smooth out the boundaries between macroblocks and restore sharpness.

The giveaway is that the entire video is extremely low quality. The compression ratio is extreme.
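
For anyone who hasn't looked inside an encoder, here is a minimal sketch of the block-matching idea described above. It is a toy, not how any real codec or YouTube's pipeline is implemented: the frame size, the 4x4 block size, the search range, and all function names are made up for illustration. For one block of the current frame it searches the previous frame for the best match and reports the offset as a motion vector.

```python
# Toy block-matching motion estimation on tiny grayscale frames
# (lists of lists of ints). Block size and search range are illustrative
# assumptions; real encoders are far more elaborate.

def sad(prev, cur, px, py, cx, cy, size):
    """Sum of absolute differences between a block in prev and a block in cur."""
    total = 0
    for dy in range(size):
        for dx in range(size):
            total += abs(prev[py + dy][px + dx] - cur[cy + dy][cx + dx])
    return total

def best_motion_vector(prev, cur, bx, by, size=4, search=2):
    """Find the (dx, dy) offset into prev that best predicts the block at (bx, by) in cur."""
    height, width = len(prev), len(prev[0])
    best = (0, 0)
    best_cost = sad(prev, cur, bx, by, bx, by, size)
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            px, py = bx + dx, by + dy
            # Skip candidate blocks that fall outside the previous frame.
            if px < 0 or py < 0 or px + size > width or py + size > height:
                continue
            cost = sad(prev, cur, px, py, bx, by, size)
            if cost < best_cost:
                best, best_cost = (dx, dy), cost
    return best, best_cost

def make_frame(width, height, square_x, square_y, size=4):
    """Black frame with one bright square, standing in for a moving object."""
    frame = [[0] * width for _ in range(height)]
    for y in range(square_y, square_y + size):
        for x in range(square_x, square_x + size):
            frame[y][x] = 255
    return frame

prev_frame = make_frame(12, 12, 2, 4)
cur_frame = make_frame(12, 12, 4, 4)  # the square moved 2 pixels to the right
vector, cost = best_motion_vector(prev_frame, cur_frame, 4, 4)
print(vector, cost)  # (-2, 0) 0
```

The encoder then only has to store the vector plus whatever residual is left after prediction. At very low bitrates most of that residual gets thrown away, so blocks are effectively copied around from earlier frames, which is where the "swimming" look comes from.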


One that represented compressed videos as an embedding that gets reinflated by having gen AI interpret it back into image frames.

AI models are a form of compression.

Neural compression wouldn't be like HEVC, operating on frames and pixels. Rather, these techniques can encode entire features and optical flow, which can explain the larger discrepancies. Larger fingers, slightly misplaced items, etc.

Neural compression techniques reshape the image itself.

If you've ever input an image into `gpt-image-1` and asked it to output it again, you'll notice that it's 95% similar, but entire features might move around or average out with the concept of what those items are.


Maybe such a thing could exist in the future, but I don't think the idea that YouTube is already serving a secret neural video codec to clients is very plausible. There would be much clearer signs - dramatically higher CPU usage, and tools like yt-dlp running into bizarre undocumented streams that nothing is able to play.

If they were using this compression for storage on the cache layer, it could allow more videos closer to where they serve them, but they decode them back to WebM or whatever before sending them to the client.

I don't think that's actually what's up, but I don't think it's completely ruled out either.


That doesn't sound worth it. Storage is cheap and encoding videos is expensive; caching videos in a more compact form but having to rapidly re-encode them into a different codec every single time they're requested would be ungodly expensive.

The law of entropy appears true of TikToks and Shorts. It would make sense to take advantage of this. That is to say, the content becomes so generic that it merges into one.

Storage gets less cheap for short-form tiktoks where the average rate of consumption is extremely high and the number of niches is extremely large.

A new client-facing encoding scheme would break utilization of hardware encoders, which in turn slows down everyone's experience, chews through battery life, etc. They won't serve it that way - there's no support in the field for it.

It looks like they're compressing the data before it gets further processed with the traditional suite of video codecs. They're relying on the traditional codecs to serve, but running some internal first pass to further compress the data they have to store.


The resources required for putting AI <something> inline in the input (upload) or output (download) chain would likely dwarf the resources needed for the non-AI approaches.

If any engineers think that's what they're doing they should be fired. More likely it's product managers who barely know what's going on in their departments except that there's a word "AI" pinging around that's good for their KPIs and keeps them from getting fired.

> If any engineers think that's what they're doing they should be fired.

Seriously?

Then why is nobody in this thread suggesting what they're actually doing?

Everyone is accusing YouTube of "AI"ing the content with "AI".

What does that even mean?

Look at these people making these (at face value - hilarious, almost "Kool-Aid" levels of conspiratorial) accusations. All because "AI" is "evil" and "big corp" is "evil".

Use Occam's razor. Videos are expensive to store. Google gets 20 million videos a day.

I'm frankly shocked Google hasn't started deleting old garbage. They probably should start culling YouTube of cruft nobody watches.


Videos are expensive to store, but generative AI is expensive to run. That will cost them more than storage allegedly saved.

To solve this problem of adding compute heavy processing to serving videos, they will need to cache the output of the AI, which uses up the storage you say they are saving.


https://c3-neural-compression.github.io/

Google has already matched H.266. And this was over a year ago.

They've probably developed some really good models for this and are silently testing how people perceive them.


If you want insight into why they haven't deleted "old garbage" you might try The Age of Surveillance Capitalism by Zuboff. Pretty enlightening.

I'm pretty sure those 12 year olds uploading 24 hour long Sonic YouTube poops aren't creating value.

I’m afraid to search… what exactly is a “24 hour long sonic Youtube poop?”

1000 years from now those will be very important. A bit like we are now wondering what horrible food average/poor people ate 1000 years ago.

Totally. Unfortunately it's not lossless and instead of just getting pixelated it's changing the size of body parts lol

Probably compression followed by regeneration during decompression. There's a brilliant technique called "Seam Carving" [1] invented two decades ago that enables content aware resizing of photos and can be sequentially applied to frames in a video stream. It's used everywhere nowadays. It wouldn't surprise me that arbitrary enlargements are artifacts produced by such techniques.

[1] https://github.com/vivianhylee/seam-carving
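
To make the seam carving idea concrete, here is a minimal pure-Python sketch of its core step: compute a per-pixel "energy", find the cheapest connected top-to-bottom path with dynamic programming, and delete it. This is an illustration under my own assumptions (a simple gradient energy, a tiny grayscale image as nested lists, hypothetical function names), not the code from the linked repo and not evidence of what any platform actually runs.

```python
# Minimal seam carving sketch: remove one vertical seam from a grayscale image.
# The energy function and the test image are illustrative assumptions.

def energy(img):
    """Gradient-magnitude energy: |right - left| + |down - up|, borders clamped."""
    h, w = len(img), len(img[0])
    e = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            left, right = img[y][max(x - 1, 0)], img[y][min(x + 1, w - 1)]
            up, down = img[max(y - 1, 0)][x], img[min(y + 1, h - 1)][x]
            e[y][x] = abs(right - left) + abs(down - up)
    return e

def find_vertical_seam(e):
    """Return one x index per row describing the lowest-energy connected seam."""
    h, w = len(e), len(e[0])
    cost = [row[:] for row in e]
    # Forward pass: cheapest cumulative cost to reach each pixel from the top row.
    for y in range(1, h):
        for x in range(w):
            lo, hi = max(x - 1, 0), min(x + 1, w - 1)
            cost[y][x] += min(cost[y - 1][lo:hi + 1])
    # Backward pass: start at the cheapest bottom pixel and walk upwards.
    x = min(range(w), key=lambda i: cost[h - 1][i])
    seam = [x]
    for y in range(h - 1, 0, -1):
        lo, hi = max(x - 1, 0), min(x + 1, w - 1)
        x = min(range(lo, hi + 1), key=lambda i: cost[y - 1][i])
        seam.append(x)
    seam.reverse()
    return seam

def remove_seam(img, seam):
    """Delete the seam pixel from every row, shrinking the width by one."""
    return [row[:x] + row[x + 1:] for row, x in zip(img, seam)]

# Flat background with a bright two-pixel-wide stripe (the "important" content).
image = [[10, 10, 10, 200, 200, 10, 10] for _ in range(4)]
seam = find_vertical_seam(energy(image))
carved = remove_seam(image, seam)
print(seam)       # [0, 0, 0, 0]
print(carved[0])  # [10, 10, 200, 200, 10, 10]
```

The image gets narrower but the stripe keeps its full width, so the stripe is now a larger fraction of the frame. That is the sense in which content-aware resizing changes the relative scale of things in an image, whether or not it has anything to do with what's happening here.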


I largely agree, I think that probably is all that it is. And it looks like shit.

Though there is a LOT of room to subtly train many kinds of lossy compression systems, which COULD still imply they're doing this intentionally. And it looks like shit.


It could be, but if compression is codecs, usually new codecs get talked about on a blog.

> This is an experiment

A legal experiment for sure. Hope everyone involved can clear their schedules for hearings in multiple jurisdictions for a few years.


As soon as people start paying Google for the 30,000 hours of video uploaded every hour (2022 figure), then they can dictate what forms of compression and lossiness Google uses to save money.

That doesn't include all of the transcoding and alternate formats stored, either.
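
For a rough sense of scale, here is the back-of-envelope arithmetic on that upload figure. The 30,000 hours per hour number is the one quoted above; the 5 Mbps average bitrate is purely my assumption for illustration, not a YouTube number, so treat the storage result as an order-of-magnitude guess.

```python
# Back-of-envelope estimate of raw upload volume per day.
HOURS_UPLOADED_PER_HOUR = 30_000  # figure quoted in the comment (2022)
ASSUMED_BITRATE_MBPS = 5.0        # hypothetical average bitrate, an assumption

def daily_upload_estimate(hours_per_hour, bitrate_mbps):
    """Return (hours of video per day, petabytes per day) for one stored copy."""
    hours_per_day = hours_per_hour * 24
    seconds_per_day = hours_per_day * 3600
    bytes_per_day = seconds_per_day * bitrate_mbps * 1_000_000 / 8
    return hours_per_day, bytes_per_day / 1e15

hours_per_day, petabytes_per_day = daily_upload_estimate(
    HOURS_UPLOADED_PER_HOUR, ASSUMED_BITRATE_MBPS
)
print(hours_per_day, round(petabytes_per_day, 2))  # 720000 1.62
```

Under those assumptions that is on the order of a petabyte or two per day for a single copy, before any of the transcodes.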

People signing up to YouTube agree to Google's ToS.

Google doesn't even say they'll keep your videos. They reserve the right to delete them, transcode them, degrade them, use them in AI training, etc.

It's a free service.


That's the difference between the US and European countries. When you have SO MUCH POWER like Google, you can't just go around and say ItSaFReeSeRViCe in Europe. With great power comes great responsibility, to say it in American words.

It's not the same when you publish something on my platform as when I publish something and put your name on it.

It is bad enough we can deepfake anyone. If we also pretend it was uploaded by you the sky is the limit.


"They're free to do whatever they want with their own service" != "You can't criticize them for doing dumb things"

Ye it is such a strange and common take. Like, "if you don't like it why complain?".


