Aussie Authors Are ‘Outraged’ After Discovering Their Books In A Dataset Used To Train A US AI

Some of Australia’s most iconic authors may have had their work fed into an AI without their consent, with one novelist calling it “the biggest act of copyright theft in history”.

According to Guardian Australia, as many as 18,000 books were allegedly pirated into the US-based Books3 dataset, which has been used to train AI models, including Meta’s. It’s not yet clear how many were written by Aussie authors, but all of them carry Australian ISBNs.

“We’re still working through [the data] to work out the impact in terms of Australian authors,” Australian Publishers Association (APA) spokesperson Stuart Glover told Guardian Australia.

“This is a massive legal and ethical challenge for the publishing industry and for authors globally.”

Booker Prize-winning author Richard Flanagan found 10 of his books in the Books3 dataset and says his work was taken and used without his consent.

“I felt as if my soul had been strip mined and I was powerless to stop it,” he said in a statement.

“This is the biggest act of copyright theft in history.”

Other affected authors include Peter Carey, Helen Garner, Kate Grenville, Anna Funder, Christos Tsiolkas, Thomas Keneally, and dozens more.

Australian Society of Authors chief executive Olivia Lanchester said the use of the books was basically piracy, and that authors had every right to be outraged.

“The fact is this technology relies upon books, journals, essays written by authors, yet permission was not sought nor compensation granted,” she said.

“Turning a blind eye to the legitimate rights of copyright owners threatens to diminish already precarious creative careers.

“The enrichment of a few powerful companies is at the cost of thousands of individual creators. This is not how a fair market functions.”

This is particularly concerning given that multiple big-name authors (including the likes of George R. R. Martin) have already sued OpenAI, the creator of ChatGPT, over similar allegations that their work was used without permission to train its language models. So far, OpenAI maintains that its training doesn’t breach the copyright laws the authors claim it has.

The issue reaches well beyond books, too. Writers striking with the Writers Guild of America (WGA) will be watching that court case closely, because scripts being fed into AI without the copyright owner’s consent is an issue the WGA may end up leaving out of its deal.

While Australian copyright law does protect people’s original work from being used for data extraction without permission, it’s still really hard to enforce, especially when the copying happens overseas. How many authors have the capacity to take this issue to court?

The law is yet to catch up with the increasingly fast pace of the digital landscape, but let’s hope it does soon, because this is starting to feel dystopian.