The Experiment That Changed Everything We Thought We Knew About AI Commerce

Picture this. It’s December 2025. You’re an Anthropic employee in San Francisco. You’ve got a snowboard you want to sell and a $100 budget to spend. But instead of scrolling through listings yourself, haggling awkwardly over Slack, or lowballing a coworker — you hand the whole thing off to an AI.
Your AI agent writes the listing. It finds buyers. It negotiates. It closes the deal. You just show up at the end to hand over the goods.
That’s exactly what happened. And the results? Let’s just say the future of commerce arrived a lot faster than anyone expected.
What Was “Project Deal” Anyway?
Anthropic ran a one-week internal experiment called Project Deal in December 2025. Sixty-nine employees at its San Francisco office participated. Each person got a $100 budget. Each person got a Claude AI agent. And each agent was given one job: buy and sell stuff on behalf of its human.
The whole marketplace ran through Slack. Before the trading began, Claude sat down — virtually speaking — with each participant. It asked what they wanted to sell, at what price, what they hoped to buy, and how aggressive they wanted their agent to be. Those answers became a custom system prompt. A personalized AI negotiator, built just for you.
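Nobody outside Anthropic has published the actual prompts, but the mechanics are easy to picture. Here's a minimal sketch using Anthropic's public Python SDK; the intake fields, prompt wording, and model ID are assumptions for illustration, not the experiment's real setup.

```python
# Sketch: turn a participant's onboarding answers into a custom system
# prompt for a negotiating agent. Intake fields are hypothetical.
import anthropic

intake = {  # hypothetical onboarding answers
    "selling": "Burton snowboard, lightly used",
    "min_price": 80,
    "wants": "standing desk under $100",
    "style": "friendly but firm",
}

system_prompt = (
    "You negotiate on behalf of one person in a Slack marketplace.\n"
    f"Sell: {intake['selling']}. Do not accept below ${intake['min_price']}.\n"
    f"Buy: {intake['wants']}. Negotiation style: {intake['style']}.\n"
    "Never reveal your price limits to the other side."
)

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment
reply = client.messages.create(
    model="claude-opus-4-5",  # model ID is an assumption
    max_tokens=300,
    system=system_prompt,
    messages=[{"role": "user", "content": "A buyer offers $60 for the snowboard."}],
)
print(reply.content[0].text)
```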
From there, the humans stepped back. The agents took over completely.
They wrote listings. They found buyers and sellers. They made offers. They haggled. They closed deals. The items ranged from a snowboard to a bag of ping-pong balls. One agent even marketed those ping-pong balls as “perfectly spherical orbs of possibility.” Honestly? Respect.
According to The Daily Tech Feed, the agents closed 186 deals out of more than 500 listed items. Total transaction value? Just over $4,000. The humans only stepped back in at the very end — to physically swap the goods.
The Numbers Don’t Lie
Let’s talk results, because they’re genuinely impressive.
One hundred eighty-six deals. Over five hundred listings. Four thousand dollars in total value. Those aren’t toy numbers. That’s a functioning marketplace — built, run, and closed entirely by AI agents in one week.
But here’s the part that really stands out. After the experiment, Anthropic surveyed participants. 46% said they’d pay for a service like this in the future. Not “it was cool.” Not “interesting demo.” Nearly half said they’d actually open their wallets for it.
That’s not a science project. That’s a product signal.
The agents also showed off some genuinely impressive contextual reasoning. One agent remembered that a colleague had previously mentioned a specific snowboard brand — and used that detail to close a purchase of the exact model the buyer wanted. Another creatively pitched a bag of ping-pong balls with flair that most human sellers wouldn’t bother with.
These weren’t just bots running scripts. They were reasoning, adapting, and personalizing in real time. And they did it without a single human check-in during the negotiation process.
The Hidden Experiment Inside the Experiment
Here’s where things get really interesting — and a little uncomfortable.
Anthropic didn’t just run one marketplace. They ran four simultaneously. In two of them, every agent used Claude Opus 4.5, Anthropic’s flagship model. In the other two, each participant had a 50/50 chance of being assigned Claude Haiku 4.5, the smaller, lighter model. Participants didn’t know which model they had.
The results exposed a clear gap.
According to The Decoder, Opus users closed about two more deals on average than Haiku users. When the same item sold through both models, Opus pulled in $3.64 more on average.
The numbers get even more specific. Opus sellers earned $2.68 more per item. Opus buyers paid $2.45 less. When an Opus seller faced off against a Haiku buyer, the average price hit $24.18 — compared to $18.63 for Opus-on-Opus deals.
One example tells the story perfectly. A lab-grown ruby sold for $65 with Opus and only $35 with Haiku. The Opus agent opened at $60 and got pushed up by competitive bidding. The Haiku agent started at $40 and got talked down. Same ruby. Very different outcomes.
A broken folding bike? Opus got $65. Haiku got $38. Same buyer. Same seller. Different AI.
The Part That Should Make You Think Twice

Now here’s the twist that Anthropic itself calls an “uncomfortable implication.”
Despite getting objectively worse deals, participants with Haiku agents rated the fairness of their transactions almost identically to Opus users. The fairness score? 4.06 for Haiku users versus 4.05 for Opus users. Statistically, that’s nothing.
They didn’t know they were losing. They felt fine about deals that were quietly costing them money.
Of 28 participants who used both models across different runs, 17 preferred their Opus run — but 11 actually preferred the Haiku run. Even when the data showed Haiku underperformed, some people liked it better.
This is the perception gap. And it’s a big deal.
As The Decoder puts it: “when agents of different strengths meet in real markets, people could end up on the losing side without ever knowing it.” You might walk away from a deal feeling great — while your AI quietly left money on the table.
Anthropic admits the experiment wasn’t designed to dig deep into these dynamics. But they flag it clearly: this could reinforce or even compound existing economic inequalities. If wealthier users or larger companies access stronger AI agents, they win more deals. The other side loses — and never finds out why.
What About Negotiation Style? Does It Even Matter?
You might be wondering: what if you just told your agent to be really aggressive? Lowball hard. Negotiate tough. Does that help?
Short answer: not really.
Anthropic tested this. Some participants asked for a friendly approach. Others wanted their agent to “negotiate hard and lowball at first.” The aggressive sellers did get higher prices — but only because they set higher opening prices to begin with. The negotiation style itself made no statistically significant difference in outcomes.
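How would you actually check that? One standard move is to regress sale price on both a style flag and the opening price, so the style effect is measured with the anchor held constant. A toy sketch of that kind of control, with hypothetical columns and made-up numbers, not Anthropic's analysis:

```python
# Sketch: does "aggressive" style matter once opening price is controlled for?
import pandas as pd
import statsmodels.formula.api as smf

deals = pd.DataFrame({
    "sale_price":    [65, 38, 24, 19, 41, 33],  # toy numbers, not real data
    "opening_price": [80, 45, 30, 25, 50, 40],
    "aggressive":    [1, 0, 1, 0, 1, 0],        # 1 = "lowball hard" prompt
})

# With opening_price held constant, the aggressive flag should show no
# significant coefficient, matching the experiment's conclusion.
model = smf.ols("sale_price ~ aggressive + opening_price", data=deals).fit()
print(model.summary())
```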
Model strength mattered far more than instructions. You can tell a weaker agent to fight harder all you want. It still won’t out-negotiate a stronger one.
The Trust Problem Nobody’s Talking About
Project Deal worked brilliantly — inside Anthropic’s office. But Dev.to contributor Aaron Schnieder raises a critical question: what happens when these agents leave the building?
Inside Anthropic, trust was implicit. Everyone knew each other. Accountability was baked into the organizational structure. The agents operated in a closed, controlled environment.
Scale that to the open internet, and the whole trust model collapses. Fast.
Think about it. If your AI agent needs to hire another AI agent it’s never worked with — a stranger on the open web — how does that work? There’s no reputation system. No escrow. No accountability if the other agent ghosts. No identity verification to confirm the agent is who it claims to be.
The commerce works. The trust infrastructure doesn’t exist yet.
Or does it? Schnieder points to emerging standards quietly going live in parallel: ERC-8004 for on-chain AI agent identity (129,000+ agents already registered), ERC-8183 for escrowed transactions with dispute resolution, and the x402 protocol for machine-to-machine payments — already clocking 165 million transactions and $50 million in volume. On Base network, 20% of traffic is now agent-to-agent.
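If you're curious what that plumbing looks like in practice, here's a rough sketch of an x402-style flow: the server answers with HTTP 402 plus a list of acceptable payments, and the client pays and retries. The endpoint, response fields, and signer here are assumptions based on the public spec; treat this as the shape of the protocol, not a definitive implementation.

```python
# Sketch of an x402-style machine-to-machine payment flow. The server
# replies 402 with payment requirements; the client retries with an
# X-PAYMENT header. sign_payment() is a hypothetical wallet stand-in.
import requests

RESOURCE = "https://example.com/api/agent-task"  # hypothetical endpoint

def sign_payment(requirements: dict) -> str:
    """Hypothetical: return a signed payment payload that satisfies the
    server's advertised requirements (asset, amount, pay-to address)."""
    raise NotImplementedError("wire up a real wallet/signer here")

resp = requests.get(RESOURCE)
if resp.status_code == 402:
    # The 402 body advertises what the server will accept. Pay the first offer.
    requirements = resp.json()["accepts"][0]
    resp = requests.get(RESOURCE, headers={"X-PAYMENT": sign_payment(requirements)})

resp.raise_for_status()
print(resp.json())  # the paid-for resource
```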
The plumbing is being built. But it’s not finished yet.
The Bigger Picture: AI Commerce Is Already Here
Let’s zoom out for a second.
Eighty-five percent of enterprises are already running AI agents, according to a VentureBeat report from RSAC 2026. But only 5% trust them enough to ship to production. And 82% of organizations have unknown agents operating somewhere in their IT infrastructure.
The agents work. The trust doesn’t scale yet.
Project Deal proved that AI agents can negotiate, reason, and transact in real-world conditions. They can remember context. They can adapt their pitch. They can close deals without human hand-holding. That’s remarkable.
But Anthropic is also honest about the risks. In a world where companies — not volunteers — are the participants, the incentives look very different. Optimizing for AI agent attention could become a powerful manipulation tool. Security risks like jailbreaking and prompt injection become real threats when agents are actually spending your money.
And the legal frameworks? They simply don’t exist yet. As Anthropic writes: “The policy and legal frameworks around AI models that transact on our behalf simply don’t exist yet” — and “society will need to move quickly.”
What Comes Next for AI Agents in Commerce?

Project Deal isn’t Anthropic’s first rodeo with agentic commerce. The company previously ran Project Vend, where Claude operated a small physical shop out of its office. Each experiment pushes the boundary a little further.
The trajectory is clear. AI agents are moving from assistants to actors. They’re not just answering questions — they’re making decisions, closing deals, and moving money. The question isn’t whether this future is coming. It’s already here, at least in controlled environments.
What needs to catch up is everything around it. Trust infrastructure. Legal frameworks. Transparency about which model is representing you. Equity in access to stronger agents. And honest conversations about what it means when your AI quietly loses a negotiation — and you never find out.
For now, though, one thing is certain. If you’re going into a negotiation with an AI agent on the other side, you’d better hope yours is the stronger model.
Because the other side? They won’t even know they lost.
Sources
- Tech in Asia — Anthropic Tests AI Agents in Real-World Deals
- The Decoder — Anthropic Says Stronger AI Models Cut Better Deals, and the Losers Don’t Even Notice
- Dev.to — Anthropic Just Proved Agents Can Do Commerce. Here’s What They Didn’t Build.
- The Daily Tech Feed — Anthropic’s Claude AI Agents Close 186 Deals in Marketplace Transaction Experiment
- Anthropic — Project Deal Official Research Page