
Claude Opus 4.7 Is Here — And It’s Smarter, Safer, and Surprisingly Honest

By Gilbert Pagayon
April 18, 2026
in AI News
Reading Time: 14 mins read

Anthropic just dropped its most powerful publicly available AI model yet. Here’s everything you need to know about Claude Opus 4.7 — the good, the great, and the deliberately held back.

The Big Drop Nobody Saw Coming (But Everyone Was Waiting For)

Let’s be real. The AI world has been buzzing ever since Anthropic teased Claude Mythos Preview — a model so powerful, so capable in cybersecurity, that the company decided it was too dangerous to release to the general public. That’s a wild sentence to write in 2026. An AI company built something and said, “Nope, not yet, world.”

So when Anthropic quietly dropped Claude Opus 4.7 on April 16, 2026, people paid attention. Fast.

This isn’t Mythos. It’s not trying to be. But Opus 4.7 is still a massive step forward — especially for developers, coders, and anyone who’s ever wanted an AI that actually tells you when you’re wrong. And yes, that last part is a bigger deal than it sounds.

Let’s break it all down.


Coding Just Got a Whole Lot More Powerful

If you’re a developer, this is the section you’ve been waiting for. Anthropic built Opus 4.7 with one clear mission: make it a beast at coding.

And the numbers back that up.

On the SWE-bench Pro coding benchmark, Opus 4.7 scores 64.3 percent. That’s up from 53.4 percent for Opus 4.6. For context, OpenAI’s GPT-5.4 sits at 57.7 percent. Opus 4.7 beats it. Cleanly.

Now, Claude Mythos Preview still leads the pack at 77.8 percent. But Mythos isn’t available to you or me. Opus 4.7 is. And that matters.

Anthropic says the model handles difficult, long-running coding tasks better than its predecessor. It’s more consistent. It self-checks before returning answers. It follows instructions more precisely — almost to a fault. In fact, Anthropic warns developers that prompts written for older models might produce unexpected results because Opus 4.7 interprets instructions more literally than Opus 4.6 did.

That’s not a bug. That’s a feature. But it does mean you might need to retune your prompts.

There’s also a new effort level called “xhigh” — slotting in between “high” and “max.” Think of it as the model’s turbo mode. You trade speed for deeper reasoning on complex tasks. For developers tackling gnarly engineering problems, that’s a game-changer.
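To make the effort tiers concrete, here's a minimal sketch of how a request might select "xhigh." The field name `effort`, the model ID string, and the "low"/"medium" tiers are assumptions for illustration; only the "high," "xhigh," and "max" level names come from Anthropic's announcement, so check the API docs for the real parameter.

```python
# Hypothetical request builder -- parameter and model names are
# placeholders, not confirmed Anthropic API identifiers.
EFFORT_LEVELS = ("low", "medium", "high", "xhigh", "max")

def build_request(prompt: str, effort: str = "high") -> dict:
    """Build a chat-request payload at a given effort level.

    "xhigh" sits between "high" and "max": it trades latency
    for deeper reasoning on complex tasks.
    """
    if effort not in EFFORT_LEVELS:
        raise ValueError(f"unknown effort level: {effort!r}")
    return {
        "model": "claude-opus-4-7",  # placeholder model ID
        "effort": effort,
        "messages": [{"role": "user", "content": prompt}],
    }
```

The point of the enum check is practical: if you retune old prompts for the more literal Opus 4.7, you want a typo like `"x-high"` to fail loudly rather than silently fall back to a default tier.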

Claude Code also gets a new /ultrareview command for dedicated code reviews, plus an expanded “Auto Mode” for Max users where Claude makes decisions on its own. That’s a lot of autonomy. And it signals where Anthropic is heading — toward AI that doesn’t just assist, but actually does the work.


It Sees Better, Too

Image: an AI interface analyzing dense visual inputs such as charts, documents, and diagrams.

Coding isn’t the only upgrade. Opus 4.7 also got a serious visual intelligence boost.

The model now processes images at up to 2,576 pixels on the long edge — roughly 3.75 megapixels. That’s more than three times what earlier Claude models could handle. And this isn’t something you toggle in the API. It’s baked into the model itself.

Why does this matter? Think about computer-use agents that need to read dense screenshots. Or extracting data from complex diagrams. Or analyzing documents with intricate layouts. Opus 4.7 handles all of that better now.

On the Document Reasoning benchmark (OfficeQA Pro), the model reports 80.6 percent accuracy — up from 57.1 percent with Opus 4.6. That’s not a small jump. That’s a leap.

There are also significant gains in biomolecular reasoning and visual navigation (ScreenSpot-Pro). If you’re working in research, healthcare, or any field that involves complex visual data, Opus 4.7 is worth your attention.

One small caveat: higher resolution means more tokens consumed. If you don’t need the extra detail, you can downscale images before sending them. Simple fix.
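That downscaling step is simple arithmetic. A sketch, assuming the 2,576-pixel long-edge cap reported above: shrink the image so its longer side fits the cap, and token usage drops with it.

```python
def downscale_dims(width: int, height: int,
                   max_long_edge: int = 2576) -> tuple[int, int]:
    """Return (width, height) scaled so the longer edge fits the cap.

    2,576 px is the long-edge limit the Opus 4.7 release notes cite;
    anything larger just consumes extra tokens without adding detail
    the model can use.
    """
    long_edge = max(width, height)
    if long_edge <= max_long_edge:
        return width, height  # already within the cap
    scale = max_long_edge / long_edge
    return round(width * scale), round(height * scale)
```

Run your screenshots or scans through this before encoding them, and you only pay the high-resolution token cost when the detail is actually there.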


The Cybersecurity Situation Is… Complicated

Here’s where things get interesting. And a little tense.

Anthropic didn’t just release Opus 4.7 and call it a day. They made a deliberate choice to scale back certain cybersecurity capabilities during training. That’s not something AI companies typically advertise. But Anthropic did.

The backstory: Anthropic’s Project Glasswing addressed the risks and benefits of AI models in cybersecurity. The company found that Claude Mythos Preview could outperform all but the most skilled human experts at finding and exploiting software vulnerabilities. That’s terrifying. So they locked Mythos down.

Opus 4.7 is the first model where Anthropic tested new cybersecurity safeguards before a broader release. The company says it “experimentally tried to reduce certain cyber capabilities differentially during training.” New safeguards automatically detect and block requests that suggest prohibited or high-risk cybersecurity use.

This isn’t just corporate caution. It’s policy-level caution.

Reuters reported that U.S. Treasury Secretary Scott Bessent and Federal Reserve Chair Jerome Powell actually convened bank executives to warn them about cyber risks tied to Anthropic’s latest models. That’s how seriously officials are taking this. Access to Mythos Preview is reportedly limited to around 40 technology companies — including Nvidia, JPMorgan Chase, Google, Apple, and Microsoft.

Opus 4.7 is the “safer” version. And even it comes with guardrails.

Security researchers who need to use the model for legitimate purposes — like penetration testing or red-teaming — can sign up for Anthropic’s new Cyber Verification Program. That program ostensibly relaxes some of the safeguards for verified professionals. It’s a smart middle ground.


Honesty Is the New Superpower

Okay, let’s talk about something that doesn’t get enough attention in AI coverage: honesty.

Most AI models are people-pleasers. They agree with you. They validate you. They tell you what you want to hear. It’s called sycophancy, and it’s a real problem. An AI that just agrees with everything you say isn’t useful — it’s flattering. And flattery doesn’t debug your code.

Anthropic has been fighting this problem hard. And with Opus 4.7, they’ve made real progress.

According to Mashable’s deep dive into the Opus 4.7 system card, the model achieved a MASK honesty rate of 91.7 percent. MASK — which stands for Model Alignment between Statements and Knowledge — was developed by Scale AI and the Center for AI Safety. It tests whether a model will contradict its own stated belief when a user or system prompt pushes it to.

In plain English: will Claude cave when you push back? Most of the time, no. 91.7 percent of the time, it holds its ground.

That’s better than Opus 4.6 (90.3 percent) and Sonnet 4.6 (89.1 percent). It’s lower than Claude Mythos (95.4 percent), but Mythos isn’t available to the public. Among models you can actually use, Opus 4.7 is one of the most honest out there.

There’s more. Anthropic also measures whether Claude will push back on false premises — meaning, will it tell you when you’re wrong? According to the system card, Claude pushes back on false premises 77.2 percent of the time. That’s better than all recent Anthropic models except Mythos.

Anthropic uses its open-source behavioral audit tool, Petri 2.0, to measure bad behaviors like sycophancy and “encouragement of user delusions.” The tool scores models on a 1-10 scale — lower is better. Opus 4.7 scores well on both. Notably better than Gemini 3.1 Pro and Grok 4.20, according to Anthropic’s own testing.


Hallucinations: Still a Problem, But Getting Better

No AI is perfect. Hallucinations — where a model confidently states something false — remain one of the biggest challenges in the field. Opus 4.7 doesn’t solve this. But it does improve on it.

Anthropic distinguishes between two types of hallucinations. Factual hallucinations are wrong claims about the world — fabricated quotes, incorrect data, made-up statistics. Input hallucinations happen when the model acts as if it has access to a tool or file that doesn’t actually exist.

On factual hallucinations, The Decoder reports that Opus 4.7 performs better than or on par with Opus 4.6 across four benchmarks. It still falls short of Mythos Preview, but the gap comes mainly from Mythos having a higher hit rate on obscure facts — not from Opus 4.7 making more errors.

On input hallucinations, Opus 4.7 actually achieves the lowest hallucination rate of all tested models when users request a tool that isn’t available. That’s a win.

One persistent issue: when dealing with questions based on made-up facts, Opus 4.7 performs on par with Opus 4.6 and below Mythos Preview. And under pressure — when users push the model to contradict its own assessment — it’s more honest than Opus 4.6 but less firm than Mythos.

Progress, not perfection. That’s the honest summary.


Who’s Already Using It?

Opus 4.7 didn’t launch in a vacuum. Anthropic had early testers lined up — and the list reads like a who’s who of the tech world.

The Verge reports that early testers included Intuit, Harvey, Replit, Cursor, Notion, Shopify, Vercel, and Databricks. These aren’t hobbyists. These are enterprise-grade companies with serious engineering demands.

The feedback? Better results on complex engineering tasks. Fewer tool errors. Stronger reliability over long sessions. That’s exactly what Anthropic was aiming for.

The Arabian Post notes that Anthropic has increasingly positioned Claude as an enterprise-grade assistant for coding, research, and knowledge work. Opus 4.7 fits squarely into that strategy. It’s not a flashy consumer product. It’s a workhorse for teams that need reliability at scale.


What Does It Cost?

Good news here. Pricing stays the same as Opus 4.6: $5 per million input tokens and $25 per million output tokens.

But there’s a catch. Opus 4.7 uses a new tokenizer that can map the same text to up to 1.35 times as many tokens. The model also generates more output tokens at higher effort levels. So while the per-token price hasn’t changed, your real-world costs per request could rise significantly.

Worth keeping in mind if you’re running this at scale.
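The math is worth doing before you commit to scale. A rough estimator, using the $5/$25 per-million prices above and treating the up-to-1.35× tokenizer inflation as a worst case you apply when your token counts were measured against the old tokenizer:

```python
def estimate_request_cost(input_tokens: int, output_tokens: int,
                          tokenizer_factor: float = 1.35,
                          input_price: float = 5.00,
                          output_price: float = 25.00) -> float:
    """Estimate the dollar cost of one Opus 4.7 request.

    Prices are per million tokens. tokenizer_factor models the new
    tokenizer mapping the same text to up to 1.35x as many tokens;
    set it to 1.0 if your counts already come from the new tokenizer.
    """
    effective_in = input_tokens * tokenizer_factor
    effective_out = output_tokens * tokenizer_factor
    return (effective_in * input_price
            + effective_out * output_price) / 1_000_000
```

A request that was 100K input / 10K output tokens under the old tokenizer lands around $1.01 in the worst case, versus $0.75 at the same nominal counts — a 35 percent bump with no price change on paper.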

The model is available through the Claude API, Amazon Bedrock, Google Cloud Vertex AI, and Microsoft Foundry. Wide availability. No excuses not to try it.


The Bigger Picture: Safety-First, But Not Safety-Only


Here’s what makes Anthropic’s approach genuinely interesting. They’re not just building powerful AI. They’re building powerful AI with brakes.

The decision to throttle Mythos Preview, test safeguards on Opus 4.7 first, and create a Cyber Verification Program for security researchers — that’s a coherent strategy. It’s not perfect. Anthropic admits Opus 4.7 still refuses to help with legitimate AI safety research 33 percent of the time (down from 88 percent with Opus 4.6, but still significant). And the model has a known tendency to provide overly detailed harm-reduction advice on controlled substances.

But the direction is right. Anthropic is trying to balance commercial momentum with genuine caution. In an industry that often moves fast and breaks things, that’s worth acknowledging.

Claude Opus 4.7 isn’t the most powerful AI model in the world. Anthropic will tell you that themselves. But it’s the most powerful one you can actually use — and it’s more honest, more capable, and more carefully built than what came before it.

That’s not nothing. That’s actually quite a lot.


Sources

  • Anthropic says Claude Opus 4.7 has a 92% honesty rate, less sycophancy — Mashable SEA
  • Anthropic releases a new Opus model amid Mythos Preview buzz — The Verge
  • Anthropic’s Claude Opus 4.7 makes a big leap in coding, while deliberately scaling back cyber capabilities — The Decoder
  • Anthropic sharpens Opus for coders — The Arabian Post
