The story of DeepSeek, Moonshot, MiniMax, and the industrial-scale heist that shook Silicon Valley

The Accusation That Rocked the AI World
Something big dropped on a Monday in February 2026. Anthropic, the San Francisco-based AI safety company behind the Claude chatbot, went public with a bombshell. Three prominent Chinese AI firms (DeepSeek, Moonshot AI, and MiniMax) allegedly ran coordinated, industrial-scale campaigns to secretly extract Claude’s capabilities and use them to train their own competing models.
This wasn’t a minor terms-of-service violation. It wasn’t a rogue developer poking around an API. According to Anthropic, these were deliberate, systematic operations. The three companies generated over 16 million exchanges with Claude through approximately 24,000 fraudulent accounts. The goal? To quietly reverse-engineer one of America’s most advanced AI systems without paying for it, without building it, and without anyone noticing.
The AI world took notice immediately. And the implications stretch far beyond corporate rivalry.
What Is Distillation And Why Does It Matter?
Let’s break this down. “Distillation” is a real, legitimate technique in machine learning. It involves training a smaller, less powerful “student” model on the outputs of a larger, more capable “teacher” model. Done properly, it helps companies build leaner, cheaper versions of their AI products. It’s widely used, well documented, and not inherently wrong.
But here’s where it gets complicated.
When you apply distillation to a competitor’s proprietary model without permission, at massive scale, using fake accounts to bypass access restrictions, it stops being a training technique. It becomes theft. Anthropic put it plainly: distillation “can also be used for illicit purposes,” including acquiring “powerful capabilities from other labs in a fraction of the time, and at a fraction of the cost, that it would take to develop them independently.”
Training a frontier AI model from scratch costs hundreds of millions of dollars in compute alone. Distillation from an existing model can shortcut most of that expense. That’s not a minor competitive edge. That’s a cheat code.
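For readers curious about the mechanics, the classic distillation recipe is simple: soften both models’ output distributions with a temperature, then push the student’s distribution toward the teacher’s. The sketch below (plain NumPy, illustrative only — the logits and temperature are arbitrary, and a real pipeline would backpropagate this loss rather than just compute it) shows the core loss:

```python
import numpy as np

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax; a higher T softens the distribution."""
    z = logits / temperature
    z = z - z.max(axis=-1, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL divergence from the softened teacher distribution to the student's.

    In real training, gradients of this scalar w.r.t. student_logits drive
    the student toward mimicking the teacher; here we only compute the value.
    """
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    kl = np.sum(p_teacher * (np.log(p_teacher) - np.log(p_student)), axis=-1)
    # The T**2 factor keeps gradient magnitudes comparable across temperatures.
    return float(np.mean(kl) * temperature ** 2)

# A student that matches the teacher incurs (near-)zero loss; a mismatched one does not.
teacher = np.array([[4.0, 1.0, 0.5]])
matched = distillation_loss(teacher.copy(), teacher)
mismatched = distillation_loss(np.array([[0.5, 1.0, 4.0]]), teacher)
print(matched, mismatched)  # matched is ~0, mismatched is clearly positive
```

The same idea scales up: with enough teacher outputs — which is exactly what millions of API exchanges provide — a student can absorb much of the teacher’s behavior without ever seeing its weights.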
The Numbers Are Staggering
Each of the three accused companies ran a distinct operation. The scale of each one tells its own story.
MiniMax — racked up more than 13 million exchanges with Claude. Their operation zeroed in on agentic coding and tool orchestration, the kind of complex, multi-step reasoning that makes Claude particularly valuable for developers. Anthropic says it detected this campaign while it was still active. When Anthropic released a new model version, MiniMax pivoted within 24 hours, redirecting nearly half their traffic to extract capabilities from the latest system. That’s not accidental. That’s a coordinated intelligence operation.
Moonshot AI — generated over 3.4 million requests, focusing on computer vision, data analysis, and agentic reasoning. This group used hundreds of varied accounts to obscure their coordinated activity. Anthropic says it matched request metadata to the public profiles of senior staff at the foreign laboratory. In a later phase, Moonshot allegedly attempted to extract and reconstruct Claude’s reasoning traces, essentially mapping out how the model thinks.
DeepSeek — the company that already sent shockwaves through the AI industry with its surprisingly capable, cost-efficient models — held over 150,000 exchanges with Claude. Their focus was on reasoning capabilities and rubric-based grading data. They prompted Claude to map out its internal logic step by step, generating massive volumes of chain-of-thought training data. They also extracted what Anthropic calls “censorship-safe alternatives to politically sensitive questions,” training their own system to steer conversations away from topics like dissidents, party leaders, and authoritarianism.
The “Hydra Cluster” Problem
How did they pull this off without getting caught sooner? The answer is infrastructure.
These operations didn’t use a handful of accounts. They deployed what Anthropic calls “hydra cluster” architectures: sprawling proxy networks that distribute traffic across APIs and third-party cloud platforms. One identified proxy network managed more than 20,000 fraudulent accounts simultaneously. These networks mixed distillation traffic with standard customer requests to evade detection.
The name “hydra” is apt. As Anthropic noted, “when one account is banned, a new one takes its place.” There’s no single point of failure. You cut off one head, two more grow back.
Because Anthropic blocks commercial access in China for national security reasons, the attackers bypassed regional restrictions by routing traffic through commercial proxy services that resell access to Western AI models. The result was an off-the-books pipeline that turned Claude into an unwilling teacher for models being developed inside China’s increasingly competitive AI sector.
A National Security Crisis in Disguise

This isn’t just a corporate IP dispute. Anthropic frames it as a national security threat, and the argument is hard to dismiss.
According to Anthropic, illicitly distilled models are “unlikely” to carry over the safety guardrails built into the original. U.S. developers build protections specifically to prevent state and non-state actors from using AI systems to develop bioweapons, run disinformation campaigns, or conduct mass surveillance. Cloned systems lack those protections entirely.
“Foreign labs that distill American models can then feed these unprotected capabilities into military, intelligence, and surveillance systems,” Anthropic wrote, “enabling authoritarian governments to deploy frontier AI for offensive cyber operations, disinformation campaigns, and mass surveillance.”
If those distilled models get open-sourced, the danger multiplies further. Capabilities spread freely, beyond any single government’s control.
WebProNews reported that Anthropic identified at least 16 distinct Chinese entities involved in distillation activity, not just the three named publicly. The Biden and Trump administrations have both imposed stringent export controls on advanced semiconductors to slow China’s AI progress. But distillation suggests Chinese firms may be finding alternative routes to capability gains: not by acquiring chips, but by extracting knowledge embedded in American-built AI models. Existing export controls were never designed to address this vector.
The Uncomfortable Irony
Here’s where the story gets messy. Anthropic’s accusations landed with a thud, but not everyone responded with sympathy.
Online reaction was notably skeptical. Commentators quickly pointed out that Anthropic itself has faced high-profile accusations of overreaching in its own data collection practices. In September 2025, Anthropic settled a landmark copyright lawsuit brought by thousands of authors for $1.5 billion, paying writers around $3,000 per book for roughly 500,000 works allegedly downloaded from shadow libraries to train its AI models. There’s also a separate case involving alleged unauthorized scraping of Reddit content.
“How the turn tables,” wrote one commenter on Reddit’s r/singularity thread, quoting the beloved The Office malapropism.
Fortune noted that OpenAI has similarly pushed for aggressive enforcement against foreign competitors accused of copying proprietary systems, even as it defends its own sprawling data collection under the banner of fair use. The uncomfortable symmetry at the heart of modern AI is hard to ignore. The entire industry was built on remixing human work. Now it’s upset that someone is remixing its work.
That said, the scale and intent of what Anthropic describes is qualitatively different from broad training data collection. A scripted pipeline issuing millions of targeted prompts to extract specific capabilities is not the same as training on publicly available text. The legal and ethical distinctions matter even if the optics are awkward.
How Do You Defend Against This?
Anthropic isn’t just pointing fingers. It’s also laying out a defense playbook.
The company advises implementing behavioral fingerprinting and traffic classifiers designed to identify distillation patterns in API traffic. IT leaders need to strengthen verification processes for common vulnerability pathways: educational accounts, security research programs, and startup organizations. Companies should integrate product-level and API-level safeguards to reduce the efficacy of model outputs for illicit distillation, without degrading the experience for legitimate users.
Detecting coordinated activity across large numbers of accounts is essential. This includes monitoring specifically for the continuous elicitation of chain-of-thought outputs, the step-by-step reasoning traces that make the richest training data.
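The monitoring idea can be illustrated with a toy heuristic. The sketch below is purely hypothetical — the log format, account IDs, category labels, and thresholds are all invented for illustration, and bear no relation to Anthropic’s actual detection systems — but it shows the general shape of flagging accounts whose traffic is dominated by reasoning-trace elicitation:

```python
from collections import Counter

# Hypothetical per-request log: (account_id, prompt_category).
# A real system would classify prompts with a trained model, not labels.
REQUEST_LOG = [
    ("acct_001", "chain_of_thought"), ("acct_001", "chain_of_thought"),
    ("acct_001", "chain_of_thought"), ("acct_001", "normal"),
    ("acct_002", "normal"), ("acct_002", "chain_of_thought"),
]

def flag_elicitation_accounts(log, min_requests=3, cot_ratio=0.5):
    """Flag accounts with enough traffic whose share of chain-of-thought
    elicitation exceeds a threshold ratio."""
    totals, cot = Counter(), Counter()
    for account, category in log:
        totals[account] += 1
        if category == "chain_of_thought":
            cot[account] += 1
    return sorted(
        account for account, n in totals.items()
        if n >= min_requests and cot[account] / n >= cot_ratio
    )

print(flag_elicitation_accounts(REQUEST_LOG))  # ['acct_001']
```

Per-account ratios are exactly what hydra clusters defeat, of course: spread the same traffic across 20,000 accounts and each one stays below any threshold, which is why cross-account correlation matters so much.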
But experts caution that these defenses are far from foolproof. Sophisticated actors can distribute queries across thousands of accounts to avoid rate limits, use prompt engineering to disguise systematic harvesting, and apply post-processing to strip watermarks. The cat-and-mouse dynamic is familiar to anyone who has followed cybersecurity. Defenders must protect every vector. Attackers need only find one gap.
What Anthropic Wants Washington to Do
Anthropic is calling on the AI industry, cloud providers, and lawmakers to act. The company is urging “rapid, coordinated action among industry players, policymakers, and the global AI community.”
Specifically, Anthropic argues that restricted chip access could limit both direct model training and the scale of illicit distillation. It’s also pushing for cross-industry intelligence sharing because these attacks are growing in intensity and sophistication, and no single company can fight them alone.
The legal framework for addressing cross-border AI model distillation remains underdeveloped. In the U.S., companies can pursue claims under the Defend Trade Secrets Act, the Computer Fraud and Abuse Act, and contractual terms of service. But enforcing any of these against entities based in China presents enormous practical difficulties. Chinese courts are unlikely to enforce U.S. judgments. The legal ambiguity currently benefits the alleged distillers.
Some legal scholars argue the U.S. government should treat large-scale model distillation as a form of economic espionage, potentially triggering sanctions or diplomatic responses. Others say the AI industry needs new international agreements analogous to treaties governing intellectual property in pharmaceuticals or semiconductors to establish clear rules of the road.
The Bigger Picture

Anthropic’s disclosure is notable not just for its substance, but for its timing and tone. The company has generally avoided aggressive public posturing. Its decision to go public, naming the scale of the alleged distillation and framing it as a systemic threat, signals that the problem has become severe enough to break from its usual restraint.
The disclosure also serves a clear policy purpose. By documenting the distillation threat in concrete terms, Anthropic strengthens the case for regulatory action at a moment when lawmakers are already grappling with how to govern AI.
For the Chinese firms involved, the accusations carry real reputational risk. DeepSeek, Moonshot, and MiniMax have attracted significant venture capital and government support on the strength of their technical achievements. If those achievements are shown to rest partly on distilled knowledge from American models, investor confidence could take a hit.
At the same time, China’s AI sector has made genuine progress driven by engineering talent, government investment, and access to enormous domestic datasets. Not all Chinese AI capability can be attributed to distillation. The challenge is distinguishing legitimate innovation from capability gains derived from unauthorized use of foreign models, a distinction that is technically difficult to make and politically charged on both sides of the Pacific.
One thing is clear. The AI arms race just got a lot more complicated.
Sources
- The Verge — Anthropic accuses DeepSeek and other Chinese firms of using Claude to train their AI
- AI News — Anthropic: Claude faces ‘industrial-scale’ AI model distillation
- Fortune — Anthropic claims 3 Chinese companies ripped it off
- WebProNews — Anthropic Alleges Chinese AI Labs Quietly Reverse-Engineered Claude
