Inside Project Panama: 2 million books scanned and destroyed by Anthropic AI to train machines

TOI Trending Desk | etimes.in | Feb 3, 2026, 03:12 IST

Information follows a simple rule: the more it is consumed, the more it is reshaped and shared. Human intelligence grows by absorbing knowledge, combining it, and passing it forward in new forms. From oral stories to inscriptions, from letters to books, and from computers to algorithms, each stage of progress has depended on preserving what came before and reworking it for what comes next. Artificial intelligence is built on the same principle. Systems designed to answer questions on any subject, on demand, require access to the widest and most reliable record of human thought.

For centuries, the most trusted and durable form of information has been the book. Books record humanity’s long arc, from early tools and survival to rockets reaching Mars, from handwritten letters to digital networks, from foraging for food to ordering dinner in minutes. They preserve ideas across generations. Inside Anthropic, planners viewed books as a concentrated form of human knowledge, shaped by editors, authors, and time. They believed long-form texts could teach artificial intelligence how to reason and write more clearly than fragmented online content.

That belief led to an internal effort later known as Project Panama. Court filings unsealed in a copyright lawsuit reveal how it worked. Anthropic bought physical books in bulk. The books were cut apart and scanned at high speed. Once digitised, the paper copies were recycled. The goal was to rapidly expand the amount of book-based data used to train the company’s AI systems. The scope of the project became public after a report by The Washington Post offered a rare glimpse into how aggressively AI firms pursued high-quality text as competition to build more capable chatbots intensified.

Internal documents show the company chose this approach instead of negotiating licences at scale. Executives argued that buying physical copies and digitising them internally was faster and more practical. The strategy also reflected the increasingly fierce race to dominate artificial intelligence, where each advance can translate directly into market share, investment, and revenue. In an industry moving at breakneck speed, new developments emerge almost daily, and access to high-quality data has become one of the most valuable assets in the push to turn AI capability into commercial power.

How Project Panama processed millions of books

Vendor proposals and court records indicate Anthropic sought scanning capacity for 500,000 to 2 million books over roughly six months. Although the precise final number remains redacted, filings repeatedly describe the purchase and destruction of millions of volumes, acquired in batches of tens of thousands. The project involved tens of millions of dollars in spending on books, logistics, and scanning services, underscoring how central books had become to AI training strategies.

Once purchased, books were sent to commercial vendors equipped for industrial document processing. Hydraulic cutting machines removed the spines, allowing pages to be scanned on high-speed production equipment. After digitisation, the paper copies were scheduled for recycling. The process was intentionally irreversible, leaving no physical archive behind. Preservation specialists note that this distinguishes Project Panama from earlier digitisation efforts, which typically retained original copies.

Court records suggest Anthropic viewed destructive scanning as a safer alternative to downloading large pirated digital libraries. The approach drew on lessons from earlier mass digitisation efforts, including Google Books, and was shaped in part by Tom Turvey, who previously worked on that project. Unlike Google Books, however, Project Panama prioritised speed and exclusivity over public access or preservation.

A federal judge later ruled that training AI models on books can qualify as fair use when the process is transformative. However, the court also found that Anthropic’s earlier downloads of pirated books raised separate copyright concerns, making clear that how training data is acquired remains legally significant even if the training itself is permitted.

Authors’ response and settlement

Authors reacted strongly to the disclosures, arguing that AI companies benefited from creative work without consent or compensation. Ed Newton-Rex, a former AI executive, said the case illustrated a growing imbalance between technology firms and creators whose work underpins modern AI systems. He and others have argued that existing copyright frameworks do not adequately address large-scale machine learning.

In 2025, Anthropic agreed to pay $1.5 billion to settle claims related to its earlier use of pirated books, without admitting wrongdoing. Under the agreement, authors whose works were included can seek compensation estimated at about $3,000 per title, though payouts vary. Anthropic has said the settlement addressed acquisition practices rather than the legality of AI training itself.

Part of a wider industry pattern

Project Panama is not an isolated case. Court filings in other lawsuits show Meta employees debated downloading large shadow libraries of books, while OpenAI has acknowledged downloading similar datasets in the past before deleting them. Google and Microsoft also face ongoing legal challenges over AI training data.

Legal scholar James Grimmelmann has said the industry carried academic data-use norms into a commercial arms race, only confronting legal risks after massive investments had already been made. By then, he noted, companies were effectively locked into data pipelines that would be difficult to unwind.

The unsealed records surrounding Project Panama offer one of the clearest views yet into how modern AI systems are built. They show that behind consumer-facing chatbots lies an industrial pipeline involving large capital outlays, legal risk, and irreversible data extraction. The case has also sharpened debate over whether current copyright law is equipped to handle machine learning at scale.

As courts continue to define the limits of fair use in AI training, Project Panama stands as a defining example of the pressures shaping artificial intelligence development and the unresolved tension between technological progress and the rights of creators.