Don't · IP & Copyright

Using pirated material as training data

How you got the books matters as much as what you did with them. A pirate library is not a data pipeline.

By Cynical Sally · Issue Nº 1

Not legal advice. Sally roasts behaviour and use-cases in general, never your specific situation, and nothing here replaces a real lawyer. The cases are real; what you do about them is between you and someone licensed to tell you.

The use-case

Sourcing training data from pirated copies of books or other works, even if the training method itself might be defensible.

This actually happened · A real case, in full

The receipt · Settled

Bartz v. Anthropic

No. 3:24-cv-05417 (N.D. Cal.), settlement 2025 · US (N.D. California)

What happened

Authors sued over the use of their books as training data, including pirated copies. The court ruled that training on legally acquired books could be fair use, but that using pirated copies was not.

The outcome

Anthropic settled for roughly $1.5 billion, the largest US copyright settlement of its kind. The lesson is that how you acquire training data is decisive.


Why

A court found that training on legally acquired books could be fair use, but that using pirated copies was not. The company then settled for a sum reported at around 1.5 billion dollars, the largest US copyright settlement of its kind. The training method was not the fatal flaw. The supply chain was.

This is the part teams skip when they are moving fast: the provenance of the data. A defensible technique built on an indefensible source collapses the moment someone asks where the files came from.

“The method might have survived. The torrent did not, and it took 1.5 billion dollars down with it.”

What to do instead

01 Acquire training data legitimately. Provenance is now a first-class legal question, not a footnote.
02 Keep records of where every dataset came from and what rights you have to it.
03 Treat a pirated source as disqualifying, even if the rest of your pipeline is clean.
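For teams that want the three steps above to be more than a good intention, here is a minimal sketch of a provenance manifest in Python. Everything in it is an assumption made for illustration: the `DatasetRecord` fields, the `ALLOWED_ACQUISITIONS` labels, and the `audit` and `can_train` helpers are hypothetical names, not a standard and not a legal test. What counts as a legitimate acquisition is a question for a lawyer, not for a set literal.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical labels for illustration only. The real list of acceptable
# acquisition routes should come from counsel, not from this sketch.
ALLOWED_ACQUISITIONS = {"purchased", "licensed", "public_domain", "own_work"}


@dataclass(frozen=True)
class DatasetRecord:
    name: str             # human-readable dataset name
    source: str           # where the files came from
    acquisition: str      # how they were obtained
    rights_document: str  # pointer to the licence, invoice, or contract; "" if none


def audit(
    records: List[DatasetRecord],
) -> Tuple[List[DatasetRecord], List[Tuple[DatasetRecord, str]]]:
    """Split records into cleared ones and blocked ones with a reason."""
    cleared: List[DatasetRecord] = []
    blocked: List[Tuple[DatasetRecord, str]] = []
    for record in records:
        if record.acquisition not in ALLOWED_ACQUISITIONS:
            reason = f"acquisition '{record.acquisition}' is not on the allowed list"
            blocked.append((record, reason))
        elif not record.rights_document.strip():
            blocked.append((record, "no record of the rights held"))
        else:
            cleared.append(record)
    return cleared, blocked


def can_train(records: List[DatasetRecord]) -> bool:
    """One blocked dataset disqualifies the whole run (step 03)."""
    _, blocked = audit(records)
    return not blocked


if __name__ == "__main__":
    manifest = [
        DatasetRecord("novels-a", "publisher contract", "licensed", "contracts/novels-a.pdf"),
        DatasetRecord("novels-b", "shadow library mirror", "pirated", ""),
    ]
    _, blocked = audit(manifest)
    for record, reason in blocked:
        print(f"BLOCKED {record.name}: {reason}")
    print("OK to train" if can_train(manifest) else "Do not train")
```

The design choice worth noticing is that `can_train` fails the whole manifest on a single blocked record. That mirrors step 03: a clean pipeline does not launder one bad source. A dataset with an acceptable label but no paperwork is blocked too, because a record you cannot produce is not much of a record.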
