Almost every author has heard the adage that to be a great writer you must be an avid reader. It turns out that the same maxim applies to language-generative artificial intelligence (AI). In order to mimic human writers, AI technologies such as ChatGPT (OpenAI’s chatbot) and LLaMA (Large Language Model Meta AI, a tool for AI developers created by Facebook’s parent company, Meta) were fed millions of copyrighted books, articles, essays, and poems, according to the Authors Guild. Revered authors such as Margaret Atwood, Stephen King, George Saunders, and Zadie Smith are among those whose books are being used to train generative AI, the Atlantic reported in August. These and other texts are responsible for AI’s uncanny ability to produce high-quality prose and verse on command, setting off alarm bells in writing communities worried about the exploitation of their work.
But authors are fighting back. Since June, authors have reportedly filed at least five class-action lawsuits against OpenAI and Meta for copyright infringement: Mona Awad, Michael Chabon, and comedian and memoirist Sarah Silverman are among the growing list of writers seeking damages for the alleged illicit use of their books to train the companies’ language-generative AI. Joseph Saveri and Matthew Butterick, lawyers representing authors in several of the suits, say these AI language models “remix the copyrighted works of thousands of authors...without consent, compensation, or credit,” according to a statement they issued in June. Many of the books likely came from websites known as “shadow libraries,” which distribute pirated books and publications, the attorneys say. Mary Rasenberger, a lawyer and the CEO of the Authors Guild, says plaintiffs in copyright lawsuits can claim up to $150,000 per work infringed. In the class-action cases against OpenAI and Meta, an unlimited number of class members can join the lawsuits, though it is unclear how authors’ works will be valued, she says.
“What we want to do is regulate AI, and as part of that regulation we need to make sure that writers are getting paid,” says Rasenberger.
To that end, the Authors Guild has been working with congressional leaders on measures to protect authors’ rights in the development of language-generative AI. For example, the organization has proposed creating a body called a collective management organization, which would grant licenses for authors’ work and negotiate “meaningful, fair fees” paid by AI companies. These fees would then be distributed to authors, remunerating them for both past and future uses of their writing.
“The amount should be significant enough that all authors whose works were used feel the benefit of it...not just pennies on the dollar,” says Rasenberger.
Compensation is particularly important at the moment. The Authors Guild reports that writers have seen their income decline by 40 percent over the past decade. In 2022 a full-time author earned a median income of $23,330, according to the Guild’s most recent survey.
“Would it kill these companies to shell out the measly price of thirty-three books?” Atwood quipped in an essay published in the Atlantic in August, where she disclosed that more than three dozen “pirated copies” of her books have been used by AI companies to train their language generators. “Beyond the royalties and copyrights, what concerns me is the idea that an author’s voice and mind are replicable,” she wrote.
Atwood was one of ten thousand signatories of an open letter published in June by the Authors Guild demanding that OpenAI, Meta, and other AI companies refrain from using authors’ works without their consent and without fairly crediting and compensating them. She was joined by Michael Chabon, Jonathan Franzen, Roxane Gay, Celeste Ng, and other literary luminaries and advocates. The letter claims that AI companies cannot defend their treatment of authors’ works with the “fair use” legal doctrine, which permits limited use of copyrighted works for purposes such as criticism, teaching, and research. The Authors Guild contends that fair-use claims are bogus because companies have relied upon “notorious piracy websites” to access authors’ words and because their AI tools result in a commercial repackaging of human works.
In the past, authors could determine whether ChatGPT had access to their work by prompting the chatbot to quote from it, Rasenberger says. Now, however, ChatGPT typically will not offer extensive quotations when prompted and denies being trained on “the full text of copyrighted books,” as the chatbot put it when queried in September. Presumably, if the class-action lawsuits move forward, attorneys will obtain documentation showing which authors’ works were used to train AI; those authors will then be able to join the suits as class members, Rasenberger says.
Michael Littman, a computer science professor at Brown University, says there’s an “ethical stickiness” in training AI with human writing. “When this type of work started, it was scientific, less problematic. Now it’s being turned around and threatening livelihoods,” he says. It might take a while before society finds the appropriate balance between advancing artificial intelligence and respecting the rights of human creators. But in the long-running conversation about intellectual property, one thing is not up for debate: “If people’s thoughts and creations are being used [without credit], it’s not right,” Littman says.
Enma Karina Elias is a writer living in the Pacific Northwest.