Inside Project Panama: 2 million books scanned and destroyed by Anthropic AI to train machines
Information follows a simple rule: The more it is consumed, the more it is reshaped and shared. Human intelligence grows by absorbing knowledge, combining it, and passing it forward in new forms. From oral stories to inscriptions, from letters to books, and from computers to algorithms, each stage of progress has depended on preserving what came before and reworking it for what comes next. Artificial intelligence is built on the same principle. Systems designed to deliver answers on everything, everywhere, at once require access to the widest and most reliable record of human thought.
For centuries, the most trusted and durable form of information has been the book. Books record humanity’s long arc, from early tools and survival to rockets reaching Mars, from handwritten letters to digital networks, from foraging for food to ordering dinner in minutes. They preserve ideas across generations. Inside Anthropic, planners viewed books as a concentrated form of human knowledge, shaped by editors, authors, and time. They believed long-form texts could teach artificial intelligence how to reason and write more clearly than fragmented online content.
That belief led to an internal effort later known as Project Panama. Court filings unsealed in a copyright lawsuit reveal how it worked: Anthropic bought physical books in bulk, had them cut apart and scanned at high speed, and recycled the paper copies once they were digitised. The goal was to rapidly expand the amount of book-based data used to train the company’s AI systems. The scope of the project became public through a report by The Washington Post, which offered a rare glimpse into how aggressively AI firms pursued high-quality text as competition to build more capable chatbots intensified.
Internal documents show the company chose this approach instead of negotiating licences at scale. Executives argued that buying physical copies and digitising them internally was faster and more practical. The strategy also reflected the increasingly fierce race to dominate artificial intelligence, where each advance can translate directly into market share, investment, and revenue. In a field where new developments emerge almost daily, access to high-quality data has become one of the most valuable assets in turning AI capability into commercial power.
How Project Panama processed millions of books
Vendor proposals and court records indicate Anthropic sought scanning capacity for 500,000 to 2 million books over roughly six months. Although the precise final number remains redacted, filings repeatedly describe the purchase and destruction of millions of volumes, acquired in batches of tens of thousands. The project involved tens of millions of dollars in spending on books, logistics, and scanning services, underscoring how central books had become to AI training strategies.
Once purchased, books were sent to commercial vendors equipped for industrial document processing. Hydraulic cutting machines removed the spines, allowing pages to be scanned on high-speed production equipment. After digitisation, the paper copies were scheduled for recycling. The process was intentionally irreversible, leaving no physical archive behind. Preservation specialists note that this distinguishes Project Panama from earlier digitisation efforts, which typically retained original copies.
Court records suggest Anthropic viewed destructive scanning as a safer alternative to downloading large pirated digital libraries. The approach drew on lessons from earlier mass digitisation efforts, including Google Books, and was shaped in part by Tom Turvey, who previously worked on that project. Unlike Google Books, however, Project Panama prioritised speed and exclusivity over public access or preservation.
A federal judge later ruled that training AI models on books can qualify as fair use when the process is transformative. However, the court also found that Anthropic’s earlier downloads of pirated books raised separate copyright concerns, making clear that how training data is acquired remains legally significant even if the training itself is permitted.
Authors’ response and settlement
Authors reacted strongly to the disclosures, arguing that AI companies benefited from creative work without consent or compensation. Ed Newton-Rex, a former AI executive, said the case illustrated a growing imbalance between technology firms and creators whose work underpins modern AI systems. He and others have argued that existing copyright frameworks do not adequately address large-scale machine learning.
In 2025, Anthropic agreed to pay $1.5 billion to settle claims related to its earlier use of pirated books, without admitting wrongdoing. Under the agreement, authors whose works were included can seek compensation estimated at about $3,000 per title, though payouts vary. Anthropic has said the settlement addressed acquisition practices rather than the legality of AI training itself.
Part of a wider industry pattern
Project Panama is not an isolated case. Court filings in other lawsuits show Meta employees debated downloading large shadow libraries of books, while OpenAI has acknowledged downloading similar datasets in the past before deleting them. Google and Microsoft also face ongoing legal challenges over AI training data.
Legal scholar James Grimmelmann has said the industry carried academic data-use norms into a commercial arms race, only confronting legal risks after massive investments had already been made. By then, he noted, companies were effectively locked into data pipelines that would be difficult to unwind.
The unsealed records surrounding Project Panama offer one of the clearest views yet into how modern AI systems are built. They show that behind consumer-facing chatbots lies an industrial pipeline involving large capital outlays, legal risk, and irreversible data extraction. The case has also sharpened debate over whether current copyright law is equipped to handle machine learning at scale.
As courts continue to define the limits of fair use in AI training, Project Panama stands as a defining example of the pressures shaping artificial intelligence development and the unresolved tension between technological progress and the rights of creators.