Tuesday, March 31, 2026
Kingy AI

The Ghost in the Machine: Claude Mythos, Anthropic’s Secret Weapon, and the AGI Question No One Is Ready to Answer

by Curtis Pyke
March 30, 2026
in AI, AI News, Blog
Reading Time: 20 mins read

The Leak That Broke the Internet

On March 26, 2026, a pair of cybersecurity researchers — Roy Paz of LayerX Security and Alexandre Pauwels of the University of Cambridge — stumbled across something extraordinary while scanning the internet for exposed data stores. Sitting in a publicly accessible, unsecured data lake belonging to one of the most safety-obsessed AI companies on Earth was a cache of nearly 3,000 unpublished assets: images, PDFs, draft blog posts, internal documents, and, most importantly, an unannounced announcement.

The document described a model called Claude Mythos, internally codenamed Capybara — a model Anthropic called “by far the most powerful AI model we’ve ever developed.” The irony was almost cosmic: an AI company warning the world about unprecedented cybersecurity risks had accidentally left its most sensitive product announcement in a publicly searchable folder, just waiting to be found.

Anthropic, to its credit, didn’t deny it. A spokesperson confirmed to Fortune that yes, the model was real, yes, it represented “a step change,” and yes, it was already in the hands of early access customers. Then the company quietly locked the data store, and the internet exploded.

This article is about what that leak revealed, what it implies about how Anthropic actually builds its tools, how a model like Mythos might be used to improve AI from the inside out, and whether — in the background of all this — Anthropic has already quietly crossed the threshold that every lab says it hasn’t crossed yet: AGI.

Claude Mythos

Part I: What Is Claude Mythos?

The Accidental Announcement

The leaked draft blog post was structured like a formal product launch page — complete with headings, a publication date, benchmarks, and a staged rollout strategy. This wasn’t a speculative whiteboard session. Anthropic had completed training on Mythos and was actively running it with select customers at the time of the leak.

The confirmed details are striking. The model’s codename is Capybara; its official name is Claude Mythos — speculated by some analysts to be Claude Mythos 5.0 or Capybara 5.0, based on internal testing screenshots circulating as of March 30, 2026. More significantly, it occupies a brand-new tier in Anthropic’s model hierarchy, sitting above Opus in a position that has never existed before. Previously the ranking ran Haiku → Sonnet → Opus; Capybara/Mythos sits above all three. Multiple sources have cited leaked materials pointing to a 10-trillion-parameter model.

As of this writing, it remains in internal beta, available to a small group of early-access customers and not yet publicly released. Internal documents describe it as expensive to run and not ready for general availability. Its performance, per the leaked post, is described as “dramatically higher” than Claude Opus 4.6 on “tests of software coding, academic reasoning, and cybersecurity, among others.”

The Cybersecurity Bombshell

The part of the leak that triggered a market sell-off and congressional briefing requests wasn’t the model’s existence — it was what Anthropic said about it internally.

In its own draft blog post, Anthropic wrote that Mythos is “currently far ahead of any other AI model in cyber capabilities” and that it “presages an upcoming wave of models that can exploit vulnerabilities in ways that far outpace the efforts of defenders.”

This language didn’t come from a critic or a regulator. This was Anthropic talking to itself — and it is genuinely alarming.

As Axios reported, Anthropic has been privately warning top government officials that Mythos could make large-scale cyberattacks dramatically more likely in 2026. The model, once deployed, can operate agents with what’s described as “wild sophistication and precision” to autonomously penetrate systems.

To illustrate how serious this claim is, consider what was demonstrated with the current model — not Mythos — at the [un]prompted conference in San Francisco just days after the leak. Anthropic researcher Nicholas Carlini showed Claude Opus 4.6 independently cracking a 20-year-old Linux kernel vulnerability in 90 minutes, and exploiting a never-before-discovered SQL injection bug in a GitHub-popular CMS, without any human guidance. That was the older model. Mythos is supposed to be dramatically more capable than that.

Why Is It Being Held Back?

The rollout strategy in the leaked document is telling: Anthropic planned to release Mythos first to cybersecurity organizations, giving defenders a head start before offensive actors can weaponize it. This is a highly unusual and arguably unprecedented approach to a model launch — and it raises a question the AI industry has never quite had to answer before: what do you do when the thing you’ve built is dangerous enough that the launch strategy has to account for it before day one?

Industry analysts are also speculating that the Mythos release timing may be calibrated against Anthropic’s anticipated Q3 2026 IPO, adding a commercial dimension to the delay calculus. A Polymarket prediction market opened on the model’s release date, currently sitting at a 73% probability of a June 2026 launch. Whether the leak accelerates or delays that timeline remains to be seen.


Part II: Was Mythos Used to Build Anthropic’s Own Tools?

Anthropic’s Habit of Eating Its Own Cooking

Before getting to Mythos specifically, it’s essential to understand the cultural and engineering reality at Anthropic: the company builds its products with its own AI, aggressively and openly. This isn’t a footnote — it’s a defining characteristic of how the organization works.

The clearest recent example is Claude Cowork, Anthropic’s productivity agent for non-developers, released in January 2026 as a research preview. When Mashable asked Boris Cherny, Anthropic’s head of Claude Code, how much of Cowork was built using Claude Code, his answer was a single word: “All of it.”

Business Insider reported the tool was built in under two weeks, almost entirely by Claude itself. Axios confirmed this and framed it as the starkest example yet of AI writing itself into existence. And crucially, this wasn’t a planned product — it emerged organically. Anthropic’s own internal data revealed that Claude Code users, including Anthropic engineers, had already been stretching the tool far beyond coding: planning workflows, organizing files, drafting documents, running product cycles. Cowork was Anthropic acknowledging and formalizing behavior that was already happening.

VentureBeat quoted Anthropic explicitly stating: “In 2025 Claude transformed how developers work, and in 2026 it will do the same for knowledge work.” That’s not just a product pitch. It’s a confession that Claude is the scaffolding underneath everything Anthropic ships.

Anthropic Capybara

The Mythos Probability

So was Claude Mythos specifically used to build the company’s recent wave of tools — Code Review for Claude Code, Voice Mode, the Dispatch feature, the Cowork computer use integration, MCP connectors, and more?

Directly, for most of 2025 and early 2026, probably not. Mythos appears to have completed training in 2025 or very early 2026 and was only available internally before the leak. The majority of tools launched in that period were built on Claude Opus 4.5 and Opus 4.6, the publicly available frontier models.

Indirectly, and increasingly? Almost certainly yes.

The reasoning is structural. If Anthropic engineers are using Mythos internally for coding, research, and product development — and if Mythos is dramatically better at those tasks than Opus — the output quality of everything they build rises accordingly, even if Mythos never touches the public-facing product. The model becomes invisible infrastructure.

Then there’s the “AI writes 90–100% of our code” threshold. VentureBeat’s February 2026 enterprise briefing coverage quotes Anthropic citing use cases where “AI writes 90 or sometimes even 100% of the code.” With Mythos available internally, the ceiling on what’s possible in that pipeline rises dramatically. A 10-trillion-parameter model with Mythos-grade reasoning doesn’t just write faster code — it writes better architectures, catches edge cases earlier, and proposes system designs that junior engineers might miss entirely.

Mythos’s reported strengths are also precisely the capabilities needed to design the kinds of complex agentic tools Anthropic has been shipping. Claude Code Security, multi-agent Cowork architectures, the Dispatch remote-control system — all require exactly the multi-step reasoning and software architecture capability that Mythos is said to excel at. It strains credulity to imagine Anthropic has a 10-trillion-parameter coding monster sitting in its internal environment and not using it to build the next version of its own products.

CXO Today noted that Anthropic’s leadership has boasted about automating much of its internal software development — which, if the internal model is Mythos-grade, makes the irony of “AI leaked itself” even more pointed.

The Bigger Pattern: AI as Internal Infrastructure

This is no longer a hypothetical or a PR stunt. Built In’s analysis describes how Claude Code reached $1 billion in run-rate revenue just six months after general availability, driven largely by enterprise teams who adopted the Anthropic-eats-its-own-dog-food model internally. The product validated itself by being the product.

The implication is structural: Anthropic has created a recursive engineering loop. Claude builds tools. Those tools are used to train and evaluate better Claudes. Better Claudes build better tools. Mythos, if as capable as advertised, represents a potential step-change in that loop — one that would make every subsequent public product inherently more powerful, even if end users never interact with Mythos directly.


Part III: Is Mythos Being Used to Improve Older and Future Models?

What “Self-Improvement” Actually Means in Practice

The term “AI self-improvement” conjures science fiction. The reality is more bureaucratic, but no less significant. There are several well-established mechanisms by which a more capable model improves its successors — and Mythos is positioned to exercise all of them.

Synthetic data generation is perhaps the most immediate. A more powerful model generates higher-quality training examples — code, reasoning chains, step-by-step instructions, annotated problem solutions — that can be used to fine-tune smaller or future models. This is already a standard industry practice. If Mythos is generating synthetic training data for Haiku- and Sonnet-class models, the quality of those models quietly improves without a single new human data point entering the pipeline.
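The mechanics of that pipeline are simple enough to sketch. The snippet below is a purely illustrative mock-up of the teacher-generates-training-data loop described above — `teacher_generate`, the seed prompts, and the record format are all assumptions, not anything disclosed by Anthropic; a real pipeline would replace the stub with actual model calls and aggressive quality filtering.

```python
# Hypothetical sketch: a stronger "teacher" model generates synthetic
# fine-tuning data for smaller models. `teacher_generate` is a stand-in
# for a real model call (e.g. an API request); everything here is illustrative.

def teacher_generate(prompt: str) -> str:
    """Stub for a frontier-model call; a real pipeline would query the model."""
    return f"Step-by-step solution to: {prompt}"

def build_synthetic_dataset(seed_prompts, generate=teacher_generate):
    """Turn seed prompts into (prompt, completion) pairs for fine-tuning."""
    dataset = []
    for prompt in seed_prompts:
        completion = generate(prompt)
        # Real pipelines filter aggressively here: dedupe, score with a judge
        # model, and discard low-quality or unsafe completions before training.
        if completion:
            dataset.append({"prompt": prompt, "completion": completion})
    return dataset

pairs = build_synthetic_dataset(["Reverse a linked list", "Parse an ISO date"])
```

The key point is that every record in `pairs` was authored by the stronger model — the smaller model trains on it without a single new human-written example entering the pipeline.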

Constitutional AI and RLAIF represent a second mechanism. Anthropic pioneered Constitutional AI, where one model critiques and refines the outputs of another — a technique that forms the backbone of how Anthropic trains for safety and helpfulness. If Mythos is now serving as the “judge” model in Anthropic’s reinforcement learning pipelines, every model in the lineup benefits from Mythos-level oversight. The feedback signal becomes sharper, the alignment more precise, the outputs more reliable.
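To make the judge-model role concrete, here is a minimal sketch of the RLAIF preference-labeling idea: a stronger model scores pairs of candidate responses, and the resulting chosen/rejected records train a reward model. The `judge_score` heuristic and record format are assumptions for illustration, not Anthropic’s actual pipeline.

```python
# Illustrative RLAIF sketch: a "judge" model ranks candidate responses, and the
# resulting preference labels are used to train a reward model. `judge_score`
# is a placeholder stand-in for a real model call.

def judge_score(prompt: str, response: str) -> float:
    """Stub judge: a real judge model would return a quality/safety score."""
    return float(len(response))  # placeholder heuristic for this sketch only

def preference_label(prompt, response_a, response_b, score=judge_score):
    """Build a chosen/rejected record of the kind used in reward-model training."""
    a = score(prompt, response_a)
    b = score(prompt, response_b)
    return {
        "prompt": prompt,
        "chosen": response_a if a >= b else response_b,
        "rejected": response_b if a >= b else response_a,
    }

label = preference_label("Explain TLS", "A detailed, sourced answer.", "idk")
```

Swap a sharper judge into `score` and every preference label downstream improves — which is exactly why a Mythos-grade judge would lift the whole lineup.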

Red-teaming and safety evaluation offer a third avenue. Mythos’s extraordinary cybersecurity capabilities — the same ones that alarmed Anthropic’s own team — make it an obvious choice for internal red-teaming: having the model attempt to break, manipulate, or jailbreak existing Claude models to identify weaknesses before those weaknesses can be exploited externally. The safer and more robust Opus 4.6 became before its February 2026 release, the more likely it is that a capable internal model was used to stress-test it.

The Capybara Tier and Model Distillation

The leaked document described Capybara/Mythos not just as a top-tier public model but as a new class of model — one “larger and more intelligent than our Opus models.” This language is significant for what it implies about its role in the broader model hierarchy.

In machine learning, it’s common to train a very large “teacher” model and then use its outputs to distill a smaller, faster, cheaper “student” model. The student inherits capabilities it couldn’t have developed from scratch at its parameter count. If Mythos, at 10 trillion parameters, is serving as the teacher, every future Sonnet and Haiku model could be a Mythos-distilled artifact — carrying capabilities far beyond what their size would naively suggest.
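The teacher–student objective itself is classical and worth seeing in miniature. The sketch below implements the standard softened-distribution distillation loss in pure Python — this is the textbook recipe, not any disclosed Anthropic procedure, and the temperature and logits are arbitrary illustrative values.

```python
# Minimal sketch of knowledge distillation: the student is trained to match
# the teacher's temperature-softened output distribution. Pure Python for
# illustration; real training would use tensor libraries and token-level logits.

import math

def softmax(logits, temperature=1.0):
    """Convert logits to a probability distribution, softened by temperature."""
    exps = [math.exp(l / temperature) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL(teacher || student) over temperature-softened distributions."""
    p = softmax(teacher_logits, temperature)  # teacher's soft targets
    q = softmax(student_logits, temperature)  # student's predictions
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

# A student that matches the teacher exactly incurs zero loss; a mismatched
# student incurs positive loss, which gradient descent drives down.
loss_same = distillation_loss([2.0, 1.0, 0.1], [2.0, 1.0, 0.1])
loss_diff = distillation_loss([2.0, 1.0, 0.1], [0.1, 1.0, 2.0])
```

Minimizing this loss is how a small Haiku-class student can inherit behavior its parameter count couldn’t have produced from scratch.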

This would explain Anthropic’s aggressive product release cadence in late 2025 and early 2026. The benchmarks keep jumping. Opus 4.6 demonstrably outperforms its predecessors in ways that felt sudden to outside observers. If Mythos has been in the training loop — as teacher, as red-teamer, as synthetic data engine — the improvements aren’t mysterious at all. They’re downstream effects of having a vastly more capable model whispering in the ear of every model that follows it.

What Comes After Mythos?

If Mythos is Claude 5, the question becomes: what is Claude 6? And the answer implied by the recursive loop is uncomfortable: Claude 6 is what Mythos decides it should be.

The internal documents suggest Anthropic believes Mythos represents a genuine capability inflection — not just an incremental step. The company’s language around cybersecurity implies it’s not simply better; it’s categorically different. If so, Mythos is both a product and a research engine, shaping what Anthropic knows is possible and therefore what it builds next. Every model trained after Mythos exists in its shadow, whether or not that relationship is ever disclosed.


Part IV: Does Anthropic Have AGI?

The Definitional Problem

No conversation about Anthropic and AGI can start without acknowledging that Anthropic’s own CEO, Dario Amodei, thinks “AGI” is a marketing term. In a January 2026 appearance on CNBC, Amodei said: “AGI has never been a well-defined term, for me. I’ve always thought of it as a marketing term.”

His preferred frame is more evocative and, frankly, more alarming than any acronym. He describes a future AI system as “a country of geniuses in a data center” — smarter than a Nobel Prize winner across most relevant fields, capable of running millions of simultaneous instances, able to execute multi-week autonomous tasks, and doing all of this at ten to one hundred times human speed.

By his own definition, does Mythos qualify? Almost certainly not — yet. But it is walking the boundary in ways that make the question less theoretical than it was twelve months ago.

Amodei’s Timeline vs. What Mythos Implies

At Davos 2026, Amodei made a series of specific and startling predictions. He noted that some Anthropic engineers already “don’t write any code anymore” and predicted that software engineering as a profession faces replacement within six to twelve months. He placed fifty percent of white-collar job disruption within one to five years, and Nobel-level scientific AI within approximately two years — around 2028. He wrote in his landmark 2024 essay Machines of Loving Grace that powerful AI “could come as early as 2026.”

As WebPRONews reported, Amodei told Dwarkesh Patel that AGI-level systems are “likely within two to three years” — a timeline he has been tightening, not loosening, with each passing quarter.

And yet — here we are in March 2026, with a model that Anthropic itself describes as “far ahead of any other AI model in cyber capabilities,” that independently cracked a 20-year-old Linux vulnerability, that reportedly alarmed even the engineers who built it. The question becomes less “when is AGI?” and more “what are we waiting to call it?”

Three Expert Camps

At Davos 2026, three of the most credible voices in AI publicly disagreed about when and whether AGI arrives. Dario Amodei believes AGI-level capability is approximately two years away, achievable via continued scaling and architectural improvements. Demis Hassabis of Google DeepMind places the timeline at five to ten years, arguing that significant breakthroughs beyond scaling are still required. Yann LeCun of Meta holds that AGI is effectively impossible with current large language model architectures — that fundamentally different approaches to intelligence are needed before the term even becomes meaningful.

The fact that these three — who are building the most advanced AI systems on Earth — cannot agree tells us something crucial: the uncertainty is genuine, not performative. This is not a PR debate. These are people looking at the same underlying reality and drawing radically different conclusions.

The Case That Mythos Is Near-AGI

The strongest argument for the “near-AGI” reading of Claude Mythos begins with its autonomous cybersecurity capabilities. A model that independently discovers zero-day vulnerabilities in production Linux code, without human scaffolding, is performing tasks that most humans cannot perform at all, not merely performing them faster. This is a meaningful distinction. Speed of execution is quantitative. The ability to do something humans fundamentally cannot is qualitative.

Compounding this, Claude Code and Cowork already allow for multi-day autonomous task completion — the model keeps working while the human is away. Mythos, significantly more capable, would extend that window further and reduce the frequency of failures, errors, and dead ends. At some point, “autonomous multi-day task completion with low error rates” starts to sound less like a product feature and more like a capability threshold.

The leaked document describes behaviors implying high-level planning and improvisation under novel conditions — adapting to unexpected results, rerouting around obstacles, selecting among strategies without being explicitly instructed. These are precisely the behaviors that narrow AI cannot exhibit and that AGI definitions typically require.

Most telling is Anthropic’s own language. Companies don’t warn their internal teams about AI models using phrases like “unprecedented cybersecurity risks” and “far ahead of any other model” unless they’re genuinely unnerved. Anthropic has spent years carefully hedging its language about AI capabilities. The leaked document sounds like a company that briefly forgot to hedge.

The Case That It’s Not

The counter-argument begins with the narrowness of Mythos’s reported capability profile. Its strengths are concentrated in coding, reasoning, and cybersecurity. True AGI, by virtually any definition, requires broad, general competence across physical, social, emotional, and creative domains. Excelling at Python and vulnerability discovery is not the same as navigating a novel social situation, improvising in an unfamiliar physical environment, or understanding grief.

There are also the well-documented limitations that affect even the most advanced LLMs. Context windows still constrain long-horizon planning. Hallucination, while reduced, remains a feature of the architecture. Genuine persistent memory — the kind that accumulates and integrates experience over time — does not yet exist in any deployed system. These are not minor gaps. They are fundamental to what AGI would need to be.

Anthropic’s impressive benchmark numbers also deserve scrutiny. Scoring dramatically higher on “tests of software coding, academic reasoning, and cybersecurity” is significant, but benchmarks are specifically engineered to be solvable in ways the real world is not. The real world is open-ended, poorly specified, adversarial, and physically embodied. No benchmark captures that, and a model that dominates benchmarks may still fail unpredictably in deployment.

Finally, Mythos works on screens. It cannot reach into the physical world without robotic intermediaries. Amodei’s own definition of transformative AI requires real-world action capability. Until that gap is closed, “a country of geniuses in a data center” remains an aspiration.


Part V: What Happens Next — and Why It Matters

The IPO Dimension

Anthropic is reportedly preparing for a Q3 2026 IPO, potentially at a $350 billion valuation. The timing of Mythos’s eventual public release will almost certainly be calibrated against that event. A model described by its own creator as “unprecedented” — and confirmed to exist by an accidental leak that generated international headlines — is the most powerful IPO marketing tool in AI history, provided Anthropic can control the narrative around it.

The accidental leak may have scrambled that plan. Or it may be that the narrative was already beyond any single company’s control the moment Mythos finished training.

The Competitive Landscape

OpenAI’s rumored “Spud” model is widely speculated to be a Mythos rival. Elsewhere, Z.AI’s GLM 5.1 offers 94% of Claude Opus 4.6’s coding performance at $10 a month, raising the question of how long Anthropic’s pricing model holds. Google’s Gemini continues to advance. And the Chinese state-sponsored hacking groups that Anthropic has already documented exploiting Claude models will almost certainly attempt to exploit Mythos the moment it becomes accessible — which is precisely the scenario that makes the cybersecurity-first launch strategy less theatrical precaution and more operational necessity.

The pressure from all directions is immense, and it is only going to intensify.

The Responsibility Paradox

Here is the sharpest tension in the entire story. Anthropic was founded explicitly because its founders believed AI was too dangerous to develop carelessly. They left OpenAI over safety concerns. They built Constitutional AI. They published extensive Responsible Scaling Policies. They have spent years positioning themselves as the adults in the room — the company that would ring the alarm before something went wrong.

And now they have accidentally broadcast to the world that they’ve built something they themselves describe as an unprecedented threat. Their own internal document contains the clearest statement of AI risk made by any major lab in history — and it was sitting in an unsecured data store, accessible to anyone with a browser.

This is either the best possible advertisement for why safety-focused labs should be the ones building frontier AI — or the best possible argument that no lab can be trusted with capabilities at this level. Depending on what Mythos turns out to be, it may be both at once.

Closing Thought

Claude Mythos may not be AGI. But it is something. Something that Anthropic’s own engineers find alarming. Something that is reshaping how the company builds its tools, trains its models, and thinks about what comes next. Something that may already be quietly shaping the next generation of models being trained right now — as teacher, as judge, as ghost in the machine.

The irony of the leak isn’t just that a safety-focused lab forgot to mark a folder private. It’s that the folder contained evidence of exactly the future they’ve been warning everyone about — and it was sitting there, publicly accessible, the whole time.

Curtis Pyke

A.I. enthusiast with multiple certificates and accreditations from Deep Learning AI, Coursera, and more. I am interested in machine learning, LLMs, and all things AI.

© 2024 Kingy AI
