I think you're missing the point. How does Stable Diffusion know what saurik looks like? The answer, of course, is that it has seen saurik's profile pic in its training data. Stable Attribution is not showing that.
As another comment[1] points out:
> This appears to be just looking for the nearest neighbors of the image in embedding space and calling those the source data.
Stylistically similar images are not the same as source images.
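For anyone curious, the nearest-neighbor criticism amounts to something like the sketch below. This is not Stable Attribution's actual code; "dataset" stands in for precomputed CLIP-style embeddings of training images, and the names and shapes are made up for illustration:

    import numpy as np

    def nearest_neighbors(query_emb, dataset, k=5):
        """Return indices of the k most cosine-similar dataset embeddings."""
        q = query_emb / np.linalg.norm(query_emb)
        d = dataset / np.linalg.norm(dataset, axis=1, keepdims=True)
        sims = d @ q                  # cosine similarity per dataset image
        return np.argsort(-sims)[:k]  # highest similarity first

    rng = np.random.default_rng(0)
    dataset = rng.normal(size=(10_000, 512))  # stand-in training-image embeddings
    query = rng.normal(size=512)              # embedding of the generated image
    print(nearest_neighbors(query, dataset))

An image ranks at the top of that list because it is close in embedding space, not because it demonstrably contributed to the output; similarity is all this measures.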
I think you may have a point. I can spot familiar references and recognize certain artists in AI art without help, yet when I look at the supposed source photos, for the most part I don't really see it.
Interesting. So it only shows images that were transformed / mixed to get the output, but does not show images used to learn how to transform / select them?
Sounds very much like what a human would do.
If I 'know' how to recognize saurik and I know what anime is supposed to look like, I can check my digital photo library for a picture of saurik and then use that picture as a template to draw an anime version of saurik. If someone later asked me what pictures I used, the photo is the only one I'd present: not the thousands of anime pictures I have seen that taught me what anime looks like, nor the picture my eyes took when meeting saurik.
I think saying exactly what an AI of sufficient complexity is doing is a matter for philosophy more than science. But I don't know if transforming or mixing is how I'd describe what this is doing. In particular, it truly does not have a complete representation of any of the images it was trained on; they just wouldn't fit. It does of course have an understanding of how embeddings relate to images that is informed by the images it's seen, so maybe that counts, but I'm not sure it's useful for understanding the limitations of models like this or how to improve them.
> i spoke to law profs about this - the analogy which kept coming up is the vcr. initially basically a piracy machine, it brought to life an enormous content market. had it been banned, creators would have been worse off in the long run.
It's called Sony v. Universal, and the fair use doctrine that resulted is a test for "commercially significant non-infringing uses"; inpainting to remove power lines, latent-space psychedelic visuals, and painterly photo-booth-style filters all qualify.
Imagine Stable Diffusion were made illegal. Someone accuses me of using this now-illegal tool for one of those non-infringing uses, i.e. to make an image that, as far as the court is concerned for copyright, doesn't look like anyone else's image. I put the image on my website. If the image itself is not at all infringing, then what is the evidence that Stable Diffusion was used? Should the police be issued a warrant to search my private property for proof that I used Stable Diffusion, without a shred of evidence, or on the say-so of a detection tool that will always have both false positives and false negatives?
I do want to clarify that I think Stable Diffusion and tools like it can engage in illegal copying. For example, it will happily produce infringing images of logos and even somewhat random other images (https://arxiv.org/pdf/2212.03860.pdf). It seems to devote an uneven amount of its weights to different images, but I remain unconvinced that copying is all it can do, or at least that it does so any more than a human artist does.
This is also what happens when you overtrain a model. Recent developments allow partial sets of model weights, called LoRAs, to be added to the diffusion model, and these adapters can be fine-tuned independently in under half an hour. If you set the learning rate too high, the result will start reproducing the source material with extremely high fidelity. That is what overfitting does.
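For concreteness, a LoRA is roughly a trainable low-rank delta bolted onto frozen base weights. A minimal PyTorch sketch of the idea (this is not the actual diffusers or kohya training code, and the rank, alpha, and learning rate values are just illustrative):

    import torch
    import torch.nn as nn

    class LoRALinear(nn.Module):
        """A frozen base Linear plus a trainable low-rank update B @ A."""
        def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 8.0):
            super().__init__()
            self.base = base
            for p in self.base.parameters():
                p.requires_grad_(False)      # base weights stay frozen
            self.A = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
            self.B = nn.Parameter(torch.zeros(base.out_features, rank))
            self.scale = alpha / rank

        def forward(self, x):
            # Output = frozen path + small low-rank correction.
            return self.base(x) + (x @ self.A.T @ self.B.T) * self.scale

    layer = LoRALinear(nn.Linear(768, 768))
    # Only A and B are trained; push the learning rate too high (or train too
    # long) and this small delta starts memorizing the fine-tuning images.
    opt = torch.optim.AdamW([layer.A, layer.B], lr=1e-4)

Only the small A and B matrices get gradient updates, which is why the learning rate and step count are the knobs that decide whether the adapter generalizes or just memorizes its fine-tuning images.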
My conclusion is that there is an argument to be made for infringement in some cases, but it's based on degrees rather than absolutes. If infringement is defined as "copyrighted works were used in this dataset", then at a certain point (a low enough learning rate) it becomes impossible to tell whether infringing data was used at all. You'd be dealing with weight changes so minuscule they could be rounding errors, yet by that definition they would still be infringing.
And since any arbitrary data can be trained in under some set of keywords, the standard for what constitutes "infringing" changes with each model. It would probably be hard to build a benchmark test that can definitively state "this model violates copyright": any number of keywords can be trained on to obfuscate the prompt needed to reproduce the data, assuming the learning rate was even high enough for the data to be reproduced closely enough in the first place.
I'm unsure whether there can ever be one standard for when a bunch of floating-point numbers crosses the threshold into infringement; that's applying an absolute standard to a fuzzy algorithm. It's like JPEG compression: at some point on the quality scale, a picture of Mickey Mouse becomes unintelligible. With JPEGs an unintelligible picture of Mickey Mouse isn't really useful, but a LoRA with its weights underfit just enough that the diffusion gives novel outputs can be extremely useful.
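To make the JPEG analogy concrete, here's a toy sweep using Pillow (the file name is a placeholder):

    from PIL import Image

    img = Image.open("mickey.png").convert("RGB")

    # Re-encode at decreasing quality: somewhere on this scale the character
    # stops being recognizable, but there is no single obvious threshold --
    # it degrades by degrees, much like an underfit LoRA.
    for quality in (95, 50, 20, 5, 1):
        img.save(f"mickey_q{quality}.jpg", "JPEG", quality=quality)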
> historically, creatives have been among the first to embrace new technologies. ever since the renaissance, artists have picked up every new tool as it's become available, and used it to make great things.
> these people aren't 'luddites'
This is just total bullshit. I know plenty of artists who are embracing this technology to make all sorts of things that tools like SD were not designed to do, like psychedelic music videos, etc.
What the author means is that a few loud blue check marks on Twitter who claim to be artists have been tweeting, get ready for it, inflammatory claims.
[1] https://news.ycombinator.com/item?id=34670483