The June 2025 summary judgment in the case that produced the largest-ever copyright-infringement settlement in U.S. history determined that artificial intelligence company Anthropic adhered to copyright law when it amassed a central library of over seven million books to build and train its large language model (LLM), now called Claude.
According to the mixed ruling by Judge William Alsup of the U.S. District Court for the Northern District of California, most of the business conducted by Anthropic constitutes fair use under copyright law because the company paid for most of the books in its corporate library and, as Judge Alsup declared, with some linguistic flair (and a liberal use of the em dash, which linguists have pointed out is a ChatGPT favorite): The “‘purpose and character’ of using works to train LLMs was transformative—spectacularly so.”
To go from a mountain of inert books, many of them out-of-print and forgotten, to an autonomous cognition engine, well, that is a spectacular transformation indeed, and transformation is the catalyst of copyright; if a copied work is transformed, the resulting output earns its own protections under copyright.
Judge Alsup goes on to explain that copyright law does not protect copyright holders against competition, no matter what quarter that competition comes from. If anything, the law as designed is intended to protect competition and encourage innovation. And even though a book, as understood by Claude, is an input copied verbatim, it is the output that counts.
My debut novel is listed in the settlement. My carefully crafted language, pulled from a pirated digital copy, was used to build and train Claude. My plot and characters (often suggestively named), my ideas, fantasies, and figures of speech—some would say my very self—are part and parcel of that system, and I am far from alone. I am but one of the 367,824 infringed-upon authors who each stand to receive $3,000 per pirated book, split fifty-fifty with our publishers.
A knockoff novel that Claude regurgitates may resemble my novel, especially if Claude is prompted to write in my voice, but the result would constitute fair use, in the same way Mel Brooks’s Spaceballs is fair use of George Lucas’s Star Wars. The only difference is that, given our present sci-fi–seeming predicament, the clanker of a creator may as well be R2-D2.
But in the early going, with its financial position and market share not yet secure, Anthropic sought to cut corners and costs. The company knowingly downloaded some 482,460 pirated books hosted on a pair of illegal online libraries, Library Genesis and Pirate Library Mirror. This was Anthropic’s only clearly unlawful action (though other violations may ultimately come to light), and so the company, founded after a schism by former employees of OpenAI, agreed to the current settlement, north of $1.5 billion.
The costs of this landmark resolution pale in comparison to the money sunk—willingly and in advance—into the hardware that will continue to power Claude.
In November, Anthropic, backed financially by both Amazon and Alphabet, Google’s parent corporation, announced a $50 billion plan to build data centers in Texas and New York. To give these vast tech-industry numbers some scale, consider this: In 2022, when Penguin Random House, largest of the Big Five U.S. book publishers, put forward a bid to purchase Simon & Schuster, the third largest, the offer on the table amounted to $2.2 billion.
These colossal sums—1.5 billion, 50 billion, 2.2 billion—unfathomable to most workers in words, make Anthropic’s initial investment in its foundational source material look like a pittance. The company “spent many millions of dollars to purchase millions of print books, often in used condition. Then its service providers stripped the books from their bindings, cut their pages to size, and scanned the books into digital form—discarding the paper originals.” (Again, note Judge Alsup’s use of the em dash.)
Anthropic couldn’t be bothered to buy new editions of books, allowing authors the meager $1 to $2 royalty payment most of us earn for each new book sold. The spines of our books were broken. They were disarticulated, their pages trimmed to size, fed into a digital scanner, converted into readable and searchable files, then destroyed.
The irony is that this destructive process, according to Judge Alsup, only reinforced Anthropic’s claim of fair use; by destroying the originals, the company wasn’t generating an additional copy; “all Anthropic did was replace the print copies it had purchased for its central library with more convenient space-saving and searchable digital copies for its central library—without adding new copies, creating new works, or redistributing existing copies.” (Once again, the em dash is Alsup’s…or is it? Judge Alsup using ChatGPT to compose his ruling would count as irony of the highest order.)
And yet I can’t help but see the Anthropic CEO, Dario Amodei, as something of a modern-day Dr. Frankenstein, especially after reading how he hired Tom Turvey, former director of Google’s book-search project, to obtain “all the books in the world.”
Imagine the mad scientist bidding his minion, Fritz, to go find Claude a brain. This image comes not from Mary Shelley’s Frankenstein; or, The Modern Prometheus, certainly a part of Claude’s system, given that the novel is in the public domain. The scene was concocted for the transformative 1931 film adaptation, starring Boris Karloff as the Monster.
In it, Dr. Henry Frankenstein harvests source material from hanged criminals and buried bodies, but the Monster still needs a fresh brain. At Dr. Frankenstein’s alma mater, Goldstadt Medical College, Fritz breaks in after hours only to fumble the jar containing the “normal brain,” splashing its contents to the floor. Fritz must settle for the “abnormal brain.”
What distinguishes Claude from other LLMs is Anthropic’s keen focus on books, because, as John Milton pointed out nearly four hundred years ago, in his Areopagitica; A Speech of Mr. John Milton for the Liberty of Unlicenc’d Printing, to the Parlament of England: “[B]ooks are not absolutely dead things, but do contain a potency of life in them to be as active as that soul whose progeny they are; nay, they do preserve as in a vial the purest efficacy and extraction of that living intellect that bred them.”
Our music may better convey our emotions. Our movies and video games can present us in action, capturing our sights like no other medium. But our books best contain the best of us—and the most of us.
Books are the enduring foundation for nearly all our knowledge. Our books span thousands of years. Our books bottle human desire, imagination, and religion, and written language is the scrawl of the human spirit. No other medium comes close in scope, breadth, or depth. It is why the Bible is not a mural and why we don’t codify our laws in song.
Our written language is our humanity writ large, and it is nearly all there, gathered on pages tucked between the covers of our books, where we are our most thoughtful, our most fantastical, our most intimate. It is where, and how, our humanity is best expressed.
This foundational focus on books gave Anthropic a clear advantage over its competitors. At the time of this writing, Claude ranks among the top tier of AI models. It is considered the best at performing complex tasks and the safest, and it is among the most advanced models for writing. Is it any wonder? But what if Anthropic had ethically sourced all its ingredients, acquiescing to the costs of licensing, rather than going about its disruptive (and deceptive) business like a ham-handed thief in the night? And what if Anthropic were willing to pump as much money into its ongoing data collection as it is investing in its data centers?
Anthropic, OpenAI, and the like should try to see our books and their world-class language—data of the highest quality for an LLM—as more than simply the spare parts needed to assemble their models. Our books are the life-giving essence of their systems. Our complex language, sustained over the duration of hundreds of pages—multiplied millions of times—is the very energy that courses into and through Claude (“It’s alive! It’s alive!”), and if Anthropic better valued the work we writers do, investing in us and our output for its input, it might continue to lead in the AI arms race.
But now that Anthropic’s hand has been caught in the brain jar, it has become harder to secure more data of the highest order—books, books, and more books—and going forward, if Anthropic settles for what language is readily available—the self-generated, incestuous slop churned out by its own LLMs; the informative but uncrafted prose of Reddit; or, far worse, the endless drivel and dreck we encounter on social media—its systems will suffer from early-onset enshittification, as Cory Doctorow so succinctly puts it.
Anthropic fell far short in its bid to obtain “all the books in the world.” According to the now-public record, its central library stands at somewhere between 7 million and 8 million titles. At last count, in 2023, the Library of Congress catalogued 25.77 million books in its classification system, and Claude, when asked, claims that only about 10 percent of the total Library of Congress holdings have been digitized, a ballpark percentage that’s hard to verify. Claude says: “The 10 percent figure appears to be a rough estimate that’s been circulated, though I couldn’t find an official Library of Congress source stating that exact percentage.”
Okay, Claude. If you say so.
Whatever the actual figure, there are a great many books, our highest-quality data, still up for grabs. The precedent now set by Bartz v. Anthropic PBC makes a reasonably clear case—the AI genie is out of the bottle—and any effort to secure and employ such data would likely fall under fair use. There is little we writers can do to put an end to innovation or competition. We shouldn’t seek to. We cannot fend off the future. But it might be beneficial, for us all—Claude included—if, going forward, we writers receive just compensation for our substantial contribution to that future.
Jay Baron Nicorvo’s true-crime memoir, Best Copy Available (University of Georgia Press, 2024), was selected by Geoff Dyer as winner of the Sue William Silverman Prize for Creative Nonfiction, landed on the Indie Next list, and was named a best book of the month by the Los Angeles Times and a best memoir of the year by CrimeReads. His first novel, The Standard Grand (St. Martin’s Press, 2017), also made the Indie Next list, was one of Library Journal’s Great First Acts, Debut Novels, and was a best book of the year by the Brooklyn Rail. Find him at nicorvo.net.
Thumbnail credit: Thisbe Nissen






