
The important parts:

> Alsup ruled that Anthropic's use of copyrighted books to train its AI models was "exceedingly transformative" and qualified as fair use

> "All Anthropic did was replace the print copies it had purchased for its central library with more convenient space-saving and searchable digital copies for its central library — without adding new copies, creating new works, or redistributing existing copies"

It was always somewhat obvious that pirating a library would be copyright infringement. The interesting findings here are that scanning and digitizing a library for internal use is OK, and using it to train models is fair use.



You skipped quotes about the other important side:

> But Alsup drew a firm line when it came to piracy.

> "Anthropic had no entitlement to use pirated copies for its central library," Alsup wrote. "Creating a permanent, general-purpose library was not itself a fair use excusing Anthropic's piracy."

That is, he ruled that

- buying, physically cutting up, physically digitizing books, and using them for training is fair use

- pirating the books for their digital library is not fair use.


> buying, physically cutting up, physically digitizing books, and using them for training is fair use

So Suno would only really need to buy the physical albums and rip them to be able to generate music at an industrial scale?


Yes! Training and generation are fair use. You are free to train and generate whatever you want in your basement for whatever purpose you see fit. Build a music collection, go ham.

If the output from said model uses the voice of another person, for example, we already have a legal framework in place for determining if it is infringing on their rights, independent of AI.

Courts have heard cases of individual artists copying melodies, because melodies themselves are copyrightable: https://www.hypebot.com/hypebot/2020/02/every-possible-melod...

Copyright law is a lot more nuanced than anyone seems to have the attention span for.


> Yes!

But Suno is definitely not training models in their basement for fun.

They are a private company selling music, using music made by humans to train their models, to replace human musicians and artists.

We'll see what the courts say but that doesn't sound like fair use.


My understanding is that Suno does not sell music, but instead makes a tool for musicians to generate music and sells access to this tool.

The law doesn't distinguish between basement and cloud – it's a service. You can sell access to the service without selling songs to consumers.


That's like arguing that a restaurant doesn't sell food because it sells the service of cooking it.


Look up what a cloud kitchen is.


The restaurant is not responsible for E. coli if it’s found, are they? Just cooking it out of the food

Suno can’t prevent humans from copying other humans, it can only make sure that the direct output of its system isn’t infringing.


You may be reaching the limits of the metaphor here, but restaurants are absolutely responsible for the E. coli if it's found in significant quantities, whether it's in the initial ingredients or the cooked end product. A restaurant is required to vet its suppliers and ensure food safety protocols throughout the entire process with several independent checks at many points, and is ultimately directly responsible if a customer sues. A restaurant does not get to cook bad ingredients well and then point at the supplier. They will find themselves shut down immediately, and permanently if they do not resolve the situation.

In this context, this would be the equivalent of Suno explicitly placing stop points throughout the training, tokenization, and generation processes to verify that there was absolutely no chance of it generating copyrighted material through some kind of clean room reconstruction test. They would also need those tests to be audited at random by a third party governing body. Obviously they are not doing this, so the metaphor definitely does not track here.


The problem here is there is no “test” that is known to work here other than checking for direct infringement, which they have a responsibility to do (as they don’t have a license to the originals).

Anything remotely beyond that and we have teams of humans adjudicating specific cases: https://library.mi.edu/musiccopyright/currentcases


I mean, I was speaking more to the breakdown of the metaphor than the argument itself but if that's your response then it tells me that there is no reasonable way that Suno can ever really claim fair-use. I can't imagine being one of the artists whose material Suno has trained on and being told: "We have no idea when or if it will generate copyrighted content, or how to test for it. But we will continue to use your material and arbitrate on a case-by-case basis as it is brought to our attention." That sounds insane.

Surely, for Suno to claim fair use and be given free rein to build a commercial business off of literally anyone's original works, the bare minimum bar for allowing that usage would be: make a satisfactory test to prove that you're always doing something transformative and original, within practical limits.


They charge you by the amount of music you get from them. That's selling music. Selling a tool would be if they charge you once, you download the tool, and you can use it on your computer to generate as much music as you want to pay electricity for.


I can’t buy the music you generate using Suno, though, unless you take action to list it somewhere for sale.


You also can't buy a massage I receive. Does that mean I was sold access to a massage generation tool instead of a massage?


Yes! You were sold a service and not a good, and you cannot copyright the act of the massage itself.


That's because a massage is not a copyrightable expression (dance is), not because it's sold as a service.


We don't infringe - we just sell a service that enables users to create infringing works on demand.


Sure, but if you are just essentially making a copyright infringement tool, and then selling it to people so they can use it to infringe, and then they go and use it to infringe, you're a contributory infringer. Not saying this is exactly what Suno is doing, but just pointing out that you can be an infringer without "selling songs to consumers"


When you use a DAW to recreate a favorite song for learning, should the DAW show a warning that you’re infringing on a copyrighted melody? Should it let you make it? Export it? You promise the DAW it’s for personal use? It’s only a matter of time until this stuff is in DAWs.

When a general computer using agent recreates songs in Logic Pro in high fidelity, then what?

It’s called Fair Use for a reason – we let humans Use things generally and ask them to be Fair.

Or we can go in the direction of movies and TV where screenshots of protected content show up blank on my iPhone. Just in case someone wanted to, god forbid, clip the show.


I don't think anyone could reasonably characterize a DAW as a tool designed to infringe copyrights, so I don't think there is an issue. The fact that none of the labels have ever sued DAWs for this reason should be an intuition for you on this matter.

>It’s called Fair Use for a reason – we let humans Use things generally and ask them to be Fair.

So exhausted with people who come to these threads and try to discuss legal issues by only paying lip service to the words and not their meanings, let alone the actual law that they seem to want to debate. Then they go even further and turn it into some grand political statement, or hypothesize why copyright shouldn't exist at all. But there is absolutely no jurisprudence that would indicate a DAW is the kind of tool I described. I understand you came up with an argument in your head why it could be, but I'm letting you know that in the law, it's not what would be considered a reasonable argument and it would go nowhere.

DAWs are tools made to create music, generally. They do not contain banks of copyrighted materials to which the user ultimately pulls the copying "trigger" (that's the system I described).

I hope that helps.


It’s easy to fall back to known concepts to frame new things, but that is not accurate. LLMs do not hold “banks of copyrighted materials”, though they can recreate popular bits, in the same way a human can recall and hum the X Files theme but doesn’t actually have a recording of it in their brain. They are just a lot better at it.


I didn't describe an LLM. Read the thread. I described a particular type of service or machine where the maker would be liable as a contributory infringer without directly infringing. That's all. Read my post, I even said "Not saying this is exactly what Suno is doing"

Someone responded and said "Why not DAWs, then?" The answer is because a DAW is not that kind of service or machine.

>It’s easy to fall back to known concepts to frame new things, but that is not accurate. LLMs do not hold “banks of copyrighted materials”,

As an aside. That's clearly not true in some models given that in a number of the cases, the plaintiffs can recreate their works verbatim.


> DAWs are tools made to create music, generally. They do not contain banks of copyrighted materials to which the user ultimately pulls the copying "trigger" (that's the system I described).

You are quite literally describing sample packs (which are copyrighted). The only difference is that they figured out a fair licensing scheme for those. Is my understanding of copyright law wrong or poor here?

Imagine we invented some new hypothetical technology to take all of the sample packs in the world as input and produce new sample packs that humans haven't thought of before. Should we figure out how to license those packs fairly or pretend we never invented it?

Only so many artists have the patience to make each drum from scratch.


Sure, except that sample packs are original materials by their author (as opposed to whatever Suno contains, which is other people's work). And yes, I imagine that sample packs come with a license to use the samples commercially. Otherwise there would be no market for them. I just did some brief searching and it looks like some sample packs even require royalty kick-backs. So, yeah.


I come to copyright threads because I think Section 1201 of the DMCA is in direct violation of the Hacker Manifesto.


I’m more concerned with the fact that it's, if not a direct violation, an intentional end run around the First Amendment (Fair Use, while enshrined in statute now, being initially established as a limitation on the copyright power derived from the First Amendment.)


I don't think 1201 is invoked in this case and, as a copyright attorney, I don't really ever see it invoked anyway. I understand you have an axe to grind, but I don't see how your approach makes sense. Further, I'm not sure what obligation the law has to the "Hacker Manifesto" that it should be of any consequence anyway. All sorts of laws run against the manifesto. So what? The point of the manifesto wasn't to behave lawfully anyway, right? It's also my experience that so much of this copyright discourse is centered on incorrect assumptions about copyright that these axe-grinding missions are really counterproductive. I don't find it very productive to engage with posters who assume the conclusion that something is wrong and do not regard any of the related details or nuance.


I'm genuinely trying to engage and I'm curious where my preconceptions are "fundamentally wrong" versus not understanding "what makes a dance copyrightable where a massage is not".

Where are you on the continuum? Regarding training an AI model in my basement on purchased music, do you think I should:

- Not be allowed to train it

- Not be allowed to run it

- Not be allowed to share outputs from it anywhere

- Not be allowed to share outputs from it publicly

- Not be allowed to share outputs from it commercially

- Not be allowed to share its weights for others to run it

Or are you primarily focused on the current legal precedent?


>I'm genuinely trying to engage and I'm curious where my preconceptions are

Sure, I appreciate that. My point is that none of this has anything to do with § 1201 so there's really no point in coming to this with a kind of incredulity that is counterproductive stemming from your own beliefs about that one particular law. Not saying that is necessarily what you are doing, but I see that kind of approach so frequently here. A lot of not really knowing what a copyright protects, its limits, how they are adjudicated, etc, but then a lot of confidence about how it is all just wrong for society.

For starters, to answer your first question: copyright protects creative artistic expressions. What is covered is defined in the copyright statute, and the list does not include massages. So, that would be the reason why a massage is not protected. Why isn't "massage" on that list? Probably because no one can reasonably consider a massage a creative artistic expression. Choreography is the art in which that form of expression exists and would be covered. Could you copyright a dance that included massage movements? Yeah, sure. Could you copyright a dance that consisted entirely of massage movements? Sure. Could you use that copyright to prevent massage therapists from "performing" massages? No.

That's obviously a very surface-level take, and what is actually protected in a copyright isn't necessarily the entirety of the work but the aspects of it that are original expressions. There are other limitations too, like something being de minimis. You can't copyright "the sky was blue" (Scarlet Begonias, the Grateful Dead) and actually prohibit others from using the phrase. That phrase alone is too small (among other things). The Grateful Dead do have a copyright to the entirety of the lyrics to Scarlet Begonias and can control various kinds of uses of the lyrics.

>Or are you primarily focused on the current legal precedent?

All litigators are focused on current legal precedent. You cannot make arguments for how things should be without regard for how things are as that is the fundamental basis for what should be changed and why.

>Where are you on the continuum? Regarding training an AI model in my basement on purchased music, do you think I should:

Personally, I find AI abhorrent. I think it's wrong for it to be trained without any compensation to the authors of the works used in the training, and I think it's wrong for the output to be commercialized to the benefit of the owner of the model without any compensation to the authors of the works used in generating the outputs.


That doesn't seem to track in my mind. So you can't sell music but you can sell 10 second snippets of music you pirated? It doesn't math out.

But I guess I'm not surprised that 2025 has little respect for artists.


What does "fair use" even mean in a world where models can memorise and remix every book and song ever written? Are we erasing ownership?

The problem is, copyright law wasn't written for machines. It was written for humans who create things.

In the case of songs (or books, paintings, etc), only humans and companies can legally own copyright, a machine can't. If an AI-powered tool generates a song, there’s no author in the legal sense, unless the person using the tool claims authorship by saying they operated the tool.

So we're stuck in a grey zone: the input is human, the output is AI generated, and the law doesn't know what to do with that.

For me the real debate is: Do we need new rules for non-human creation?


why are you saying "memorize"? are people training AIs to regurgitate exact copies? if so, that's just copying. if they return something that is not a literal copy of the whole work, then there is established caselaw about how much is permitted. some clearly is, but not entire works.

when you buy a book, you are not acceding to a license to only ever read it with human eyes, forbearing to memorize it, never to quote it, never to be inspired by it.


> Specifically, the paper estimates that Llama 3.1 70B has memorized 42 percent of the first Harry Potter book well enough to reproduce 50-token excerpts at least half the time. (I’ll unpack how this was measured in the next section.)

> Interestingly, Llama 1 65B, a similar-sized model released in February 2023, had memorized only 4.4 percent of Harry Potter and the Sorcerer's Stone. This suggests that despite the potential legal liability, Meta did not do much to prevent memorization as it trained Llama 3. At least for this book, the problem got much worse between Llama 1 and Llama 3.

> Harry Potter and the Sorcerer's Stone was one of dozens of books tested by the researchers. They found that Llama 3.1 70B was far more likely to reproduce popular books—such as The Hobbit and George Orwell’s 1984—than obscure ones. And for most books, Llama 3.1 70B memorized more than any of the other models.
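
For anyone wondering what a figure like "42 percent, at 50-token excerpts, at least half the time" could mean in practice, here is a rough sketch. It is not the paper's code. It assumes you already have the model's per-token log-probabilities for the book's text, and it treats an excerpt as "memorized" when the joint probability of its 50 tokens is at least 0.5. The function name and the simplification (no separate prompt prefix) are my own assumptions.

```python
import math

def memorized_fraction(token_logprobs, window=50, threshold=0.5):
    """Share of `window`-token excerpts whose joint probability is >= threshold.

    token_logprobs: hypothetical list of log P(token | preceding text),
    one entry per token of the book, obtained from the model beforehand.
    """
    n = len(token_logprobs)
    if n < window:
        return 0.0

    # Prefix sums let us get each window's total log-probability in O(1).
    prefix = [0.0]
    for lp in token_logprobs:
        prefix.append(prefix[-1] + lp)

    log_threshold = math.log(threshold)
    total_windows = n - window + 1
    hits = 0
    for start in range(total_windows):
        window_logprob = prefix[start + window] - prefix[start]
        if window_logprob >= log_threshold:
            hits += 1
    return hits / total_windows
```

The point of the sketch is only that 50 tokens in a row at a joint probability of 0.5 requires the model to be nearly certain of every single token, which is why a high fraction reads as memorization rather than paraphrase.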


You are comparing AI to humans, but they're not the same. Humans don't memorise millions of copyrighted works and spit out similar content. AI does that.

Memorising isn't wrong but when machines memorise at scale and the people behind the original work get nothing, it raises big ethical questions.

The law hasn't caught up.


As a former musician, yes, we do. Any above average musician can play "Riders on the Storm" in the style of Johnny Cash, or Green Day, or Nirvana, etc. Successful above average musicians usually have almost encyclopedic knowledge of artists and albums at least in their favorite genre. This is how all art is made. Some artists will be more honest about this than others.


Again, you are comparing machines with humans. We're built for depth, not scale. Machines are built for scale, not depth.

I also play the guitar, and it took me 10 years to learn 30 or 40 songs. So I don't see how anyone can learn 7 million songs in a couple of minutes.


I have learned 100s of songs in a summer for various fill in gigs. Most music is extremely similar. You don't need to learn every song in existence to write suno pop.


Impressive. I rehearsed for a month before a gig where I played 12 songs. So, unfortunately, I can't relate.


And those bands can successfully sue you for that. Especially if you sell it for money. Double especially if your sales of their songs displace them in the market.


The vast majority of pirated copies are not literal copies. Movies and music get constantly transformed into different sizes and scales, with the majority using lossy transformations that change the work. A movie taken as raw format and transformed into 144p has far less than 1% of the original work, and is barely recognizable. Copyright law seems to recognize that as infringement.

Most AI seems much better at reproducing semi-identical copies of an original work than existing video/audio encoders.
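
A quick back-of-the-envelope check on the 144p figure, counting only pixels per frame. It assumes 144p means 256x144 (the usual 16:9 size) and ignores bitrate and compression, which would shrink the share further compared with a raw master.

```python
def pixel_fraction(small, large):
    """Share of the larger frame's pixels that remain after downscaling."""
    small_w, small_h = small
    large_w, large_h = large
    return (small_w * small_h) / (large_w * large_h)

# Assumed frame sizes (width, height); 256x144 for 144p is an assumption.
P144 = (256, 144)
P1080 = (1920, 1080)
UHD_4K = (3840, 2160)

print(f"144p vs 1080p: {pixel_fraction(P144, P1080):.2%}")  # 1.78%
print(f"144p vs 4K:    {pixel_fraction(P144, UHD_4K):.2%}")  # 0.44%
```

So on pixel count alone, "less than 1%" holds against a 4K source but not against 1080p. Either way the copy carries a small fraction of the original data and is still treated as a copy.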


If, as a human artist, I decide to train myself on the discography of a famous artist, then produce songs in his style and sell them for cheap so that others don't have to pay for the original artist, then I am sure it is fair use. It is done all the time.

Now, what if instead of training myself using real instruments, I train my AI and do the same. Is it different?

It is complicated, but there are many arguments in favor of fair use, probably more than there are against. But as you say, let the courts decide.

But in any case, piracy is illegal in every case. As a human, it is illegal for me to use pirate copies, whether it is for training myself as a musician, for training my AI, or for simply listening.


> Copyright law is a lot more nuanced than anyone seems to have the attention span for.

Copyright is probably the wrong body of law for regulating AI companies.


If it's fair use to train a model, that doesn't necessarily imply that the model can be legally used to generate anything.


I've been reading a bit more about this. The training might not be considered fair use if it's not considered transformative.

Claude has been considered transformative given it's not really meant to generate books but Suno or Midjourney are absolutely in another category.


really? so Suno or Midjourney can produce literal copies of works they were trained on?


Well I've been able to get Suno to do Beatles covers. It only works maybe 1/20 times, but you can do it. It's not an exact replica either, but you can get the same chords and melodies as the original.


Well, there was that legal company that trained an LLM on their opposition's legal documents and then generated their own. I don't think inputs or outputs were ruled legal in that regard.

But as long as the model isn't outputting infringing works, there's not really any issue there either.


this is funny and potentially accurate


Not sure we can infer that (or anything) about Suno from this ruling. The judge here said that Anthropic's usage was extremely transformative. Would Suno's also be considered that way?

Anthropic doesn't take books and use them to train a model that is intended to generate new books. (Perhaps it could do that, to some extent, but that's not its [sole] purpose.)

But Suno would be taking music to train a model in order to generate new music. Is that transformative enough? We don't know what a judge thinks, at least not yet.


Only if the physical albums don't have copy protection, otherwise you're circumventing it and that's illegal. Or is it, against the right to private copy? If anything, AI at least shows that all of the existing copyright laws are utter bullshit made to make Disney happy.

Do keep in mind though: this is only for the wealthy. They're still going to send the Pinkertons at your house if you dare copy a Blu-ray.


No, because they can just play the album for the AI to learn. AI training can be set up to exploit the analog hole. Same with images/movies


> They're still going to send the Pinkertons at your house if you dare copy a Blu-ray.

Hey woah now, that's a Hasbro play, not a Disney one.


With some minor exceptions, CDs don't have copy protection.



Same how it works in the Netherlands.


Yes.


Actually it remains to be seen.

If you read the ruling, training was considered fair use in part because Claude is not a book generation tool. Hence it was deemed transformative. Definitely not what Suno and Udio are doing.


So not only did they pirate works but they destroyed possibly collectible physical copies too. Kafkaesque.


Google set the precedent for this with an even less transformative use case: https://en.wikipedia.org/wiki/Authors_Guild,_Inc._v._Google,....


As they mentioned, the piracy part is obvious. It's the fair use part that will set an important precedent for being able to train on copyrighted works as long as you have legally acquired a copy.


Cue physical books being licensed, not sold, in the future with restricted agreements …



Also music, videos, photos, etc.


So all they have to do is go and buy a copy of each book they pirated. They will have ceased and desisted.


I'm trying to find the quote, but I'm pretty sure the judge specifically said that going and buying the book after the fact won't absolve them of liability. He said that for the books they pirated they broke the law and should stand trial for that, and they cannot go back and un-break it by buying a copy now.

Found it: https://www.nbcnews.com/tech/tech-news/federal-judge-rules-c...

> “That Anthropic later bought a copy of a book it earlier stole off the internet will not absolve it of liability for the theft,” [Judge] Alsup wrote, “but it may affect the extent of statutory damages.”


Is copyright in America different to Britain? There, it is legal to download books you don't own. Only distribution is a crime, which most torrenters break by seeding.


I think it's very similar in both countries, but you have got it wrong. Downloading a book without permission is copyright infringement in both countries, regardless of whether you distribute it.

In the UK it's a criminal offense if you distribute a copyrighted work with the intent to make gain or with the expectation that the owner will make a loss.

Gain and loss are only financial in this context.

Meaning that in both countries the copyright owner can sue you for copyright infringement.


What do you mean by 'it is legal'?

Do you mean:

A) It's not a criminal offence?

B) The copyright owner cannot file a civil suit for damages?

C) Something else?


> Only distribution is a crime


What relevance does that have to the present case? The judge, in this civil matter, said there would be a trial. He didn't say anything about it being a criminal trial. The strings 'crim' and 'felon' do not appear in the ruling.

> We will have a trial on the pirated copies used to create Anthropic’s central library and the resulting damages, actual or statutory (including for willfulness).


There can always be a trial, even if nothing was done to warrant it.

I think the distinction between civil and criminal trials is smaller in my home country. The fact that there is a trial at all implies that someone committed a ‘crime’.


Only distribution with the intent to make money is a crime. If you are doing it for free you are not criminally liable. Unless I am missing something.


Any distribution of copyrighted material can get you into big trouble.


It's not a crime in the US, either, I believe, but you can certainly be sued in civil court for it.


They also argued that they in no way could ever actually license all the materials they ingested


I love this argument so much. "But judge, there's no way I could ever afford to buy those jewels, so stealing them must be OK."


The argument is more along the lines of, negotiating with millions of individuals each over a single copy of a work would cause the transaction costs to exceed the payments, and that kind of efficiency loss is the sort of thing fair use exists to prevent. It's not socially beneficial for the law to require you to create $2 in deadweight loss in order to transfer $1, and the cost to the author of not selling a single additional copy is not the thing they were really objecting to.


I used to order books in English from the US before shipping costs became prohibitive and the cost of shipping the book went to about twice to thrice the cost of the book itself. Is it fair use for me to download books from Anna's Archive now considering that books in English are not available in my region through other means (including the vast majority of ebooks)?

Rhetorical question, we all know that me reading books is not "transformative" so it won't be considered fair use for me to yoink them (transformative as in transforming more damage to the society at large into more money for the already rich).


In the U.S. at least (obviously not the same everywhere), fair use doesn’t necessarily require your work to be transformative. It’s one of several aspects that gets considered, albeit a fairly significant one in many cases. Downloading books/research articles/pirated works in general wouldn’t be fair use as the purpose of the act (obtaining a book to read) directly impacts the market for the work (selling books). There could still be exceptions in some cases, mostly related to teaching I’d imagine.


What’s more interesting to me is if you can hire someone in the US to buy the book for you, cut the spine off with a bandsaw, and send you the scans and destroy the pages afterwards.


That's right, so I can't individually discuss terms with each and every media creator, so from now on, I can just pirate everything.


This is literally why a lot of people pirate content, yes. It’s pretty much always the only way to obtain the content, even if you are otherwise fine with paying for it.


Yes, and it's technically copyright infringement, even for private use. It's just that damages and enforcement are infeasible.

But if you tried to open a black market selling that media: you'd be hunted down to the ends of the earth. Or to China/North Korea, at least.


> But if you tried to open a black market selling that media

Why would you ever do that? Nobody would buy it. They'd just get it in the same place you did.


You’d be surprised, almost any flea market/swap meet will still have bootleg DVDs and “PlayStation 2s” preloaded with a billion games.

Everyone can put a disc in a DVD player; sailing the high seas is much trickier.


Undercut the competition, mainly. People will do a lot of things for a decent discount.


Needing a copy of one book you're going to spend a week reading has a lot less overhead than needing a copy of every book that you're going to process with a computer in bulk.


I like to glance at the cover art. I can do ten per second when I really get into my flow state. Sometimes I read them also, but that's incidental.


If you go to the book store and glance at all the cover art without buying any of them, do you expect to be sued for this?


If you do that and reproduce the covers or the protected elements thereof, you should absolutely expect to be sued.


So for example, if the bookstore has a nice 4k surveillance camera and you have access to it because you work there, sitting at home and using it to look at the cover art on all the books on display is something you'd expect to be sued over?


Probably not sued, but it's possible to be. They'd probably just fire you instead.

Having access to a camera doesn't permit you to take the footage home to review. The company still owns that footage, after all.

Now, if you had your own camera recording everything at your desk... I guess that falls into one or two party states.


Re-read my comment: "If you do that and reproduce the covers or the protected elements thereof"

This conversation becomes incredibly unenjoyable when you pull rhetorical techniques like completely ignoring the entirety of what I wrote.


They can. That's how any media service, from Spotify to Netflix to Audible, has to do things.

They simply don't want to and think they can skirt the law while the judges catch up.


What do you mean by "negotiating"? They can buy the books in paperback form from Amazon. And for e-books available for sale without DRM, they get to skip the cutting and scanning part.

If the book is out of print, then tough luck. That's not a license to infringe on the publisher's copyright. If we're not ok with that, we have legislative means to change that. A judge shouldn't be rewriting law in that manner.


> and that kind of efficiency loss is the sort of thing fair use exists to prevent.

No it's not. And you ever heard of a publishing house? They don't need to negotiate with every single author individually. That's preposterous.


>They don't need to negotiate with every single author individually.

Yeah they do. What do you think the employees of a publishing house do? They make deals, work with authors, and accept/reject pitches. They 100% need to make sure every work is under a negotiated contract.


The publishers could license the works in bulk, without the need for Anthropic to deal with the individual authors. Both sides pointed this out.


It kind of is though?

It's not the only reason fair use exists, but it's the thing that allows e.g. search engines to exist, and that seems pretty important.

> And you ever heard of a publishing house? They don't need to negotiate with every single author individually. That's preposterous.

There are thousands of publishing houses and millions of self-published authors on top of that. Many books are also out of print or have unclear rights ownership.


>It kind of is though?

No, it kinda isn't. Show me anything that supports this idea beyond your own immediate conjecture right now.

>It's not the only reason fair use exists, but it's the thing that allows e.g. search engines to exist, and that seems pretty important.

No, that's the transformative element of what a search engine provides. Search engines are not legal because they can't contact each licensor, they are legal because they are considered hugely transformative features.

>There are thousands of publishing houses and millions of self-published authors on top of that. Many books are also out of print or have unclear rights ownership.

Okay, and? How many customers does Microsoft bill on a monthly basis?


> Show me anything that supports this idea beyond your own immediate conjecture right now

It's inherent in the nature of the test. The most important fair use factor is the effect on the market for the work, so if the use would be uneconomical without fair use then the effect on the market is negligible because the alternative would be that the use doesn't happen rather than that the author gets paid for it.

> No, that's the transformative element of what a search engine provides. Search engines are not legal because they can't contact each licensor, they are legal because they are considered hugely transformative features.

To make a search engine you have to do two things. One is to download a copy of the whole internet, the other is to create a search index. I'm talking about the first one, you're talking about the second one.

> Okay, and? How many customers does Microsoft bill on a monthly basis?

Microsoft does this with an automated system. There is no single automated system where you can get every book ever written, and separately interfacing with all of the many systems needed in order to do it is the source of the overhead.


I think the notion that some sort of god-given right to "scale" can absolve you of laws is preposterous.

If your business model is not economically sustainable in the current legal landscape you operate in, the correct outcome is you go out of business.

There's lots and lots of potential businesses, infinite in fact, that fall into this understanding. They don't exist because they can't, because we don't want them to, so you never see them. Which might give the impression of a right to scale, but no, it does not exist.


>It's inherent in the nature of the test. The most important fair use factor is the effect on the market for the work, so if the use would be uneconomical without fair use then the effect on the market is negligible because the alternative would be that the use doesn't happen rather than that the author gets paid for it.

No, that's not the most important factor. The transformative factor is the most important. Effect on market for the work doesn't even support your argument anyway. Your argument is about the cost of making the end product, which is totally distinct from the market effects on the copyright holder when the infringer makes and releases the infringing product.

>To make a search engine you have to do two things. One is to download a copy of the whole internet, the other is to create a search index. I'm talking about the first one, you're talking about the second one.

So? That doesn't make you right. Go read the opinions, dude. This isn't something that's actually up for debate. Search engines are fair uses because of their transformative effect, not because they are really expensive otherwise. Your argument doesn't even make sense. By that logic, anything that's expensive becomes a fair use. It's facially ridiculous. Them being expensive is neither sufficient nor necessary for them to be a fair use. Their transformative nature is both sufficient and necessary to be found a fair use. Full stop.

>Microsoft does this with an automated system. There is no single automated system where you can get every book ever written, and separately interfacing with all of the many systems needed in order to do it is the source of the overhead.

Okay, and? They don't need to get every single book ever written. The libraries they pirated do not consist of "every single book ever written". It's hard to take this argument in good faith because you're being so ridiculous.


> No, that's not the most important factor. The transformative factor is the most important.

It's a four factor test because all of the factors are relevant, but if the use has negligible effect on the market for the work then it's pretty hard to get anywhere with the others. For example, for cases like classroom use, even making verbatim copies of the entire work is often still fair use. Buying a separate copy for each student to use for only a few minutes would make that use uneconomical.

> Effect on market for the work doesn't even support your argument anyway. Your argument is about the cost of making the end product, which is totally distinct from the market effects on the copyright holder when the infringer makes and releases the infringing product.

We're talking about the temporary copies they make during training. Those aren't being distributed to anyone else.

> So? That doesn't make you right.

Making a copy of everything on the internet is a prerequisite to making a search engine. It's something you have to do as a step to making the index, which is the transformative step. Are you suggesting that doing the first step is illegal or what do you propose justifies it?

> By that logic, anything that's expensive becomes a fair use. It's facially ridiculous.

Anything with unreasonably high transaction costs. Why is that ridiculous? It doesn't exempt any of the normal stuff like an individual person buying an individual book.

> They don't need to get every single book ever written.

They need to get as many books as possible, with the platonic ideal being every book. Whether or not the ideal is feasible in practice, the question is whether it's socially beneficial to impose a situation with excessively high transaction costs in order to require something with only trivial benefit to authors (potentially selling one extra copy).


>It's a four factor test because all of the factors are relevant, but if the use has negligible effect on the market for the work then it's pretty hard to get anywhere with the others. For example, for cases like classroom use, even making verbatim copies of the entire work is often still fair use. Buying a separate copy for each student to use for only a few minutes would make that use uneconomical.

All four factors are not equally relevant which is something described in pretty much every single fair use opinion. Educational uses are educational uses and considered fair because of their educational purpose (purpose is one of the factors), again, not because it's expensive. Maybe next time try googling or using ChatGPT "fair use educational".

>We're talking about the temporary copies they make during training. Those aren't being distributed to anyone else.

It's your argument. Not mine. You do not understand the market harm factor and it has nothing to do with Anthropic's transaction costs. That's just a fully, outright, absolutely incorrect application of law.

>Making a copy of everything on the internet is a prerequisite to making a search engine. It's something you have to do as a step to making the index, which is the transformative step. Are you suggesting that doing the first step is illegal or what do you propose justifies it?

The transformative step is why it's a fair use, not the "market harm" (which you misunderstand) or the made up argument that it's "too expensive". In fact, I said this like every single turn in our conversation so it's a bit perplexing to me that you can now ask me "do you mean that it being transformative is what makes it legal" when that was my exact argument three times.

>Anything with unreasonably high transaction costs. Why is that ridiculous? It doesn't exempt any of the normal stuff like an individual person buying an individual book.

It's ridiculous because of the example I gave. Things being expensive is not a defense to copyright infringement and copyright law has no obligation to make expensive business models work. Copyright has an obligation to make transformative business models work because of the overall good they provide to society. Describing it as a "transaction cost" just kicks the can down the road even further and doesn't deal with the substance, either. They could have gone to the major publishers and licensed books from them. They didn't. That's generally who they are being sued by. When they are being sued by copyright owners in the fringe examples you pointed to, they will become relevant then.

>They need to get as many books as possible, with the platonic ideal being every book. Whether or not the ideal is feasible in practice, the question is whether it's socially beneficial to impose a situation with excessively high transaction costs in order to require something with only trivial benefit to authors (potentially selling one extra copy).

Lol dude, it was your example, not mine. They do not need every single book. They aren't being sued over every single book anyway, so it's totally beside the point.


I don't even think their argument is about the money, I think it's more like we couldn't possibly find all these works in any other practical way.


Did they really steal if they didn't deprive anyone of their copy? I don't think copying is theft.


It's copyright infringement, which is not theft; they're legally distinct in the eyes of the law. This is partly why the "you wouldn't download a car" copyright ads were so widely mocked.


Fun fact, they didn't have the rights to use the font they used for those commercials: https://news.ycombinator.com/item?id=43775926


Or the music. It was originally made as a one off for a film festival. Movie industry defended the lawsuit over the music.


Agreed, the judge should avoid slang or even commonly accepted synonyms in an official ruling. The charge is not for theft.

Substitute infringement for theft.


They stole from the amount they would have legally paid to buy a copy from the copyright holder.

Think about it like sneaking into a movie theater and watching a movie without paying. The theater was going to play the movie anyway and, assuming it wasn't a packed theatre, I didn't deprive anyone else of their ability to watch. It's still theft because I'm getting something that costs money for free and depriving the theater of the money that they're owed.


It's fine that you think that way. But this is a discussion of the laws of the United States of America and rulings by American courts, not a discussion of your own legal theories.


The GP isn’t talking about some edge case legal dilemma that requires a lawyer or judge to comment. It’s already widely documented that copyright infringement is legally distinct from theft.


"Tell it to the Judge..."


You may not think it is but the law does.


The law says it’s copyright infringement, not theft.


> So all they have to do is go and buy a copy of each book they pirated.

No, that doesn't undo the infringement. At most, that would mitigate actual damages, but actual damages aren't likely to be important, given that statutory damages are an alternative and are likely to dwarf actual damages. (It may also figure into how the court assigns statutory damages within the very large range available for those, but that range does not go down to $0.)

> They will have ceased and desisted.

"Cease and desist" is just to stop incurring additional liability. (A potential plaintiff may accept that as sufficient to not sue if a request is made and the potential defendant complies, because litigation is uncertain and expensive. But "cease and desist" doesn't undo wrongs and neutralize liability when they've already been sued over them.)


> So all they have to do is go and buy a copy of each book they pirated.

For anyone else who wants to do the same thing though this is likely all they need to do.

Cutting up and scanning books is hard work, and actually doing the same thing digitally to ebooks isn't labor free either, especially when they have to be downloaded from random sites and cleaned from different formats. Torrenting a bunch of epubs and paying for individual books is probably cheaper.


Generally you don't want laws to work that way. You want to set the penalties so that they discourage violating the law.

Setting the penalty to what it would have cost to obey the law in the first place does the opposite.


That's for criminal laws where prosecutorial discretion can then (in principle) be used in borderline cases to prevent unjust outcomes.

If you give people a claim for damages which is an order of magnitude larger than their actual damages, it encourages litigiousness and becomes a vector for shakedowns because the excessive cost of losing pressures innocent defendants to settle even if there was a 90% chance they would have won.

Meanwhile both parties have the incentive to settle in civil cases when it's obvious who is going to win, because a settlement to pay the damages is cheaper than the cost of going to court and then having to pay the same damages anyway. Which also provides a deterrent to doing it to begin with, because even having to pay lawyers to negotiate a settlement is a cost you don't want to pay when it's clear that what you're doing is going to have that result.

And when the result isn't clear, penalizing the defendant in a case of first impression isn't just either, because it wasn't clear and punitive measures should be reserved for instances of unambiguous wrongdoing.


Statutory damages were written into the first federal copyright law in 1790, and earlier in state law (specified in Pounds because the dollar hadn't been invented yet).


The first federal copyright law in 1790:

https://copyright.gov/about/1790-copyright-act.html

Specified in dollars because dollars had been invented (in 1789), but in the amount of one half of one dollar, i.e. $0.50. That's 1790 dollars, of course, so a little under $20 today. (There was basically no inflation for the first 100+ years of that because the US dollar was still backed by precious metals then; a dollar was worth slightly more in 1900 than in 1790.)

That seems more like an attempt to codify some amount of plausible actual damages so people aren't arguing endlessly about valuations, rather than an attempt to impose punitive damages. Most notably because -- unlike the current method -- it scales with the number of sheets reproduced.


My fault for the hanging clause: nearly a dozen state laws preceded it and used pounds. Mostly because they were based on the British law and also because the war made a mess of the currency situation.

Statutory damages were added to reduce the burden on plaintiffs. Which encourages people to stay in line. How well this worked out and what it means when some company nobody heard of 4 years ago downloads a billion copyrighted pages and raises $3.5 billion against a $60 billion valuation...

Well suddenly $20/page still sounds about right.


The <$20/page was the same for maps and charts, i.e. things that typically have a single page in the entire work, and came from a time when printing was done a page at a time, i.e. you'd lay out a page and print as many copies of that page as you'd expect to make copies of the entire book, then hide them somewhere else while you print the next page. It was basically a proxy for the number of copies of the work they caught you trying to make, not an attempt to turn a single copy of a 1000 page book into a 1000x multiplier on liability. Notice that otherwise you're letting the infringer choose the amount of the damages, because a larger page size or tighter layout would fit more words per page and therefore have fewer pages per book. (How many "pages" is an HTML document with infinite scroll?)

> Statutory damages were added to reduce the burden on plaintiffs. Which encourages people to stay in line.

It encourages people to not spend a lot of resources speculating about damages. That doesn't mean you need the amount to be punitive rather than compensatory.


Agree that a photo of a celebrity and a film containing that celebrity shouldn't have the same number. But a large punitive number in the context of willful infringement seems right to me. And in practice it's all negotiated down anyway, as evidenced by Internet Archive's fourth 30-day stay of its pending $600+ million lawsuit.


"In practice it's negotiated down anyway" is precisely the issue. If they bring a questionable case against you and you think there's a significant chance you could win, but then there's a small chance you get bankrupted, there is unreasonable pressure for you to settle even if the plaintiffs are in the wrong.


I'm not sure what a "questionable case" for willful copyright infringement might look like. Or an example where someone was clearly in the right and got screwed. It isn't the debtor's prison era.

Four factor test seems to be working, even in this case. Don't love it (it goes against my values and what I need to do in my job) but I get it.

Edit: we've triggered HN's patience for this discussion and it's now blocking replies. You do seem a bit long on Google and short on practical experience here. How else would you propose these types of disagreements get sorted? ("Anyone can be sued for anything" notwithstanding.)

There are explicitly no punitive damages in US Copyright law. And the "willful" provision in practice means demonstrating ongoing disregard, after being informed. It's a long walk to the end of that plank.


> I'm not sure what a "questionable case" for willful copyright infringement might look like.

You did anything where it's not clear whether it's fair use or not. Willfulness is whether you knew you were doing it, not whether you knew whether it was fair use, which in many cases nobody knows until a court decides it, hence the problem.

You have to do it in order to get into court and find out if you're allowed to do it (a ridiculous prerequisite to begin with), and then if it goes against you, you have to pay punitive damages?


What sort of things do you think people do that it is often the case that there is a genuine question of whether or not the use was fair?

>You have to do it in order to get into court and find out if you're allowed to do it (a ridiculous prerequisite to begin with), and then if it goes against you, you have to pay punitive damages?

Nobody made you undertake the questionable fair use. If you're gonna fease, you better not malfease.


> What sort of things do you think people do that it is often the case that there is a genuine question of whether or not the use was fair

Every time some new technology or other change happens, things become possible that didn't used to be and then nobody knows what the law is going to be until a judge decides the case.

> Nobody made you undertake the questionable fair use. If you're gonna fease, you better not malfease.

But how are you supposed to know that before you get sued? If it's something you are allowed to do, but you don't know that yet for sure, what do you do to find out?


You can be sued for anything and asked to defend your position for anything.

You need to stop positing hypotheticals and start stating examples. Yes, technology advances and everything from abortion to transportation needs to be reconsidered on an ongoing basis. Admittedly the process for doing so hasn't changed much since the 18th century, but do you have a proposal for improving it?

Between the four factor test, being able to petition the Librarian of Congress for exemptions, and compulsory licensing, in many cases copyright is in a better state than most.

Aside from Internet Archive who continued to do something 400,000 times despite being formally asked to stop three times (scanning old records), I'm unaware of any similar acts of stupidity. Can you provide any examples of recent problems in the space?

Do you have some great idea that you're afraid to try because of lack of specificity in copyright law? There's no shortage of VCs who will take the risk sorting it out; they've made lots of money doing this before.

And you can always ask for permission first. But I suppose where's the fun in that.


> That is, he ruled that

> - buying, physically cutting up, physically digitizing books, and using them for training is fair use

> - pirating the books for their digital library is not fair use.

That seems inconsistent with one another. If it's fair use, how is it piracy?

It also seems pragmatically trash. It doesn't do the authors any good for the AI company to buy one copy of their book (and a used one at that), but it does make it much harder for smaller companies to compete with megacorps for AI stuff, so it's basically the stupidest of the plausible outcomes.


These are two separate actions that Anthropic did:

* They downloaded a massive online library of pirated books that someone else was distributing illegally. This was not fair use.

* They then digitised a bunch of books that they physically owned copies of. This was fair use.

This part of the ruling is pretty much existing law. If you have a physical book (or own a digital copy of a book), you can largely do what you like with it within the confines of your own home, including digitising it. But you are not allowed to distribute those digital copies to others, nor are you allowed to download other people's digital copies that you don't own the rights to.

The interesting part of this ruling is that once Anthropic had a legal digital copy of the books, they could use it for training their AI models and then release the AI models. According to the judge, this counts as fair use (assuming the digital copies were legally sourced).


> This part of the ruling is pretty much existing law. If you have a physical book (or own a digital copy of a book), you can largely do what you like with it within the confines of your own home, including digitising it. But you are not allowed to distribute those digital copies to others, nor are you allowed to download other people's digital copies that you don't own the rights to.

Can you point me to the US Supreme Court case where this is existing law?

It's pretty clear that if you have a physical copy of a book, you can lend it to someone. It also seems pretty reasonable that the person borrowing it could make fair use of it, e.g. if you borrow a book from the library to write a book review and then quote an excerpt from it. So the only thing that's left is, what if you do the same thing over the internet?

Shouldn't we be able to distinguish this from the case where someone is distributing multiple copies of a work without authorization and the recipients are each making and keeping permanent copies of it?


I cannot point to the case, because my entire knowledge about the legality of this stuff comes from vaguely following the articles about this case. But feel free to read the judgement in this case where it will be spelled out in much more detail.

Also, I don't quite understand how your example is relevant to the case. If you give a book to a friend, they are now the owner of that book and can do what they like with it. If you photocopy that book and give them the photocopy, they are not the owner of the book and you have reproduced it without permission. The same is, I believe, true of digital copies - this is how ebook libraries work.

In this case, Anthropic were the legal owners of the physical books, and so could do what they wanted with them. They were not the legal owners of the digital books, which means they can get prosecuted for copyright infringement.


> If you give a book to a friend, they are now the owner of that book and can do what they like with it.

We're talking about lending rather than ownership transfers, though of course you could regard lending as a sort of ownership transfer with an agreement to transfer it back later.

> If you photocopy that book and give them the photocopy, they are not the owner of the book and you have reproduced it without permission.

But then the question is whether the copy is fair use, not who the owner of the original copy was, right? For example, you can make a fair use photocopy of a page from a library book.

> They were not the legal owners of the digital books, which means they can get prosecuted for copyright infringement.

Even if the copy they make falls under fair use and the person who does own that copy of the book has no objection to their doing this?


You are talking about lending, but I'm not really sure why because it's not that relevant to the case.

If you photocopy a single page from a library book, this is often (but not always) fair use because you're copying only a limited part of the book. In the same way, you can quote a section or paragraph of a book under fair use. You cannot copy the whole book, though. Therefore:

> Even if the copy they make falls under fair use and the person who does own that copy of the book has no objection to their doing this?

If the copy had been made under fair use, then yes, this wouldn't be illegal. But it wasn't, because it was a reproduction and distribution of the entire book by someone who did not have the right to do that.


It is “established” law because the Copyright Act itself and a string of unanimous or near-unanimous appellate decisions (google ReDigi on digital transfers, and Sony and the first-sale doctrine on personal use and physical lending) uniformly apply the same principles, leaving no circuit split and no conflicting precedent for the Supreme Court to resolve. In the U.S. system, statutory text interpreted consistently by the Courts of Appeals becomes binding law nationwide unless and until the Supreme Court or Congress says otherwise.


Sony v. Universal is a Supreme Court case, but that's the one where they say that sort of thing is fair use rather than that it isn't. ReDigi isn't a Supreme Court case, and it seems rather inconsistent with the Sony case which is. To claim uniformity you'd then need all the other circuit courts coming to the same conclusion rather than just not having had any relevant cases there yet, but is that the case?


Do you think that Anthropic did not have the option of getting legal advice before they decided to pirate libraries of books for their own commercial purposes?

I understand that some of these things might be confusing to you, but Anthropic is absolutely in a position to afford attorneys and get good advice as to what they could legally do. I hope you also understand that good legal advice isn't being told what you want so you can do the thing you want to do without any regard for what are likely outcomes.

With that in mind, what do you think the inconsistency is between ReDigi and Sony?


The judge said they can train; however, I believe the judge did not make any ruling regarding model outputs.


Thanks for the clarification!


> You skipped quotes about the other important side:

He said:

> It was always somewhat obvious that pirating a library would be copyright infringement.

??


From my understanding:

> pirating the books for their digital library is not fair use.

"Pirating" is a fuzzy word and has no real meaning. Specifically, I think this is the crux:

> without adding new copies, creating new works, or redistributing existing copies

Essentially: downloading is fine, sharing/uploading is not. Which makes sense. The assertion here is that Anthropic (from this line) did not distribute the files they downloaded.


The legal context here is that "format shifting" has not previously been held to be sufficient for fair use on its own, and downloading for personal use has also been considered infringing. For examples, just look at the numerous media industry lawsuits against individuals that only mention downloading, not sharing.

It's a bit surprising that you can suddenly download copyrighted materials for personal use and it's kosher as long as you don't share them with others.


> the numerous media industry lawsuits against individuals that only mention downloading,

I never saw any of these. All the cases I saw were related to people using torrents or other P2P software (which aren't just downloading). These might exist, but I haven't seen them.

> It's a bit surprising that you can suddenly download copyrighted materials for personal use and it's kosher as long as you don't share them with others.

Every click on a link is a risk of downloading copyrighted material you don't have the rights to.

Searching the internet, it appears that it's a civil infraction, but it's also confused with the notion that "piracy" is illegal, a term that's used for many different purposes. I see "It is illegal to download any music or movies that are copyrighted." under legal advice, which I know as a statement is not true.

Hence my confusion.

I should note: I'm not arguing from the perspective of whether it's morally or ethically right. Only that even in the context of this thread, things are phrased that aren't clear.


I just checked the first individual suit I could find, which was BMG v. Gonzalez. She used P2P, but the case was specifically about her downloading, not redistributing.


Most P2P tools work in a way where you cannot download without simultaneously uploading.


Which is beside the point if the plaintiffs don't claim it as an issue. Take the Anthropic opinion in the article, where the judge explicitly calls out that there's an unresolved question of whether the model outputs might be infringing that can't be ruled on because the plaintiffs only talk about the inputs.

Gonzalez is a ruling about downloading even though there was also distribution.


Downloading and using pirated software in a company is fine then as long as it is not shared outside? If what you describe is legal it makes no sense to pay for software.


Sci-Hub suddenly becomes legal if all researchers adhere to one big company, apparently.

After all, illegally downloading research papers in order to write new ones is highly transformative.


> Downloading a document is fine as long as it is not shared outside?

I've fixed your question so that it accurately represents what I said and doesn't put words in my mouth.

If I click on a link and download a document, is that illegal?

I do not know if the person has the right to distribute it or not. IANAL, but when people were getting sued by the RIAA years back, it was never about downloading, but also distribution.

As I said, IANAL, but feel free to correct me, but my understanding is that downloading a document from the internet is not illegal.


> it was never about downloading, but also distribution.

Did you mean to write "but about distribution" here?


Yes, thank you for catching that. Unfortunately, I cannot edit it now.


Given that downloading requires you to copy the data to download it, I'd think it would fall under "adding new copies".


> All Anthropic did was replace the print copies it had purchased ... with more convenient space-saving and searchable digital copies for its central library — without adding new copies..."

That suggests otherwise.


They are using legal speak where I'm just talking about making copies. Their talk of making a copy "without adding new copies" only makes sense in this light.


I don't think that's new. Google set a precedent for that more than a decade ago. You're allowed to transform a book to digital.


How times change. They wanted to lock up Aaron Swartz for life for essentially doing the same thing Anthropic is doing.


Aaron Swartz wanted to provide the public with open access to paywalled journal articles, while Anthropic want to use other people's copyrighted material to train their own private models that they restrict access to via a paywall. It's wild (but unsurprising) that Aaron Swartz was prosecuted under the CFAA for this while Anthropic is allowed to become commercially successful.


AFAIK, Judge Vince Chhabria has countered that Fair Use argument in a later order involving Meta.

https://www.courtlistener.com/docket/67569326/598/kadrey-v-m...

Note: I am not a lawyer.


I'm not sure how I feel about what Anthropic did on merit as a matter of scale, but from a legalistic standpoint how is it different from using the book to train the meat model in my head? I could even learn bits by heart and quote them in context.


Not sure about the law, but if you memorize and quote bits of a book and fail to attribute them, you could be accused of plagiarism. If for example you were a journalist or researcher, this could have professional consequences. Anthropic is building tools to do the same at immense scale with no concept of what plagiarism or attribution even is, let alone any method to track sourcing--and they're still willing to sell these tools. So even if your meat model and the trained model do something similar, you have a notably different understanding of what you're doing. Responsibility might ultimately fall to the end user, but it seems like something is getting laundered here.


Machines do not currently have the rights that belong to humans.


Feels like information laundering to me.


Is the fruit of the poisonous tree rule applicable here?


That's only really applicable to evidence in criminal cases obtained by the government. No such doctrine exists for civil cases, for instance. It doesn't even bar the government from using evidence that others have collected illegally of their own volition.



