A San Francisco startup just dropped a free, locally runnable coding AI — and it’s punching way above its weight class.

The AI Arms Race Just Got a New Contender
Let’s be honest. The AI space lately feels like watching a very expensive tennis match.
Anthropic lobs a shiny new model over the net — Claude Opus 4.7. OpenAI volleys back with GPT-5.5. Meanwhile, Chinese companies like DeepSeek and Xiaomi are playing a completely different game — open weights, lower costs, and near-frontier performance.
So what happens when a small, scrappy American startup decides to crash the party?
You get Poolside. And you get Laguna XS.2.
On April 28, 2026, this San Francisco-based AI lab — founded just three years ago — dropped two brand-new large language models on the world. Both are built specifically for agentic coding: AI that doesn’t just chat or generate text, but actually writes code, calls tools, and takes autonomous action like a real software engineer.
The bigger model is Laguna M.1. The smaller one is Laguna XS.2. And the smaller one? That’s the one everyone should be talking about.
So What Exactly Is Laguna XS.2?
Here’s the quick version: Laguna XS.2 is a 33-billion parameter Mixture of Experts (MoE) model with only 3 billion active parameters. It runs on a single GPU. You can download it right now on Hugging Face. It’s free. It’s open source under the Apache 2.0 license. And it was built entirely from scratch — not fine-tuned from someone else’s base model.
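If “33 billion parameters, only 3 billion active” sounds like accounting sleight of hand, it isn’t: in a Mixture of Experts model, a small router sends each token to just a few expert sub-networks, so only a fraction of the total weights does any work on any given token. Here’s a toy sketch of that routing idea (illustrative only, not Laguna’s actual architecture; every size below is made up):

import numpy as np

# Toy Mixture-of-Experts layer. Many experts exist, but each token is routed
# to only top_k of them, so the "active" parameter count stays small.
rng = np.random.default_rng(0)
d_model, n_experts, top_k = 64, 16, 2
router = rng.standard_normal((d_model, n_experts)) * 0.02
experts = [rng.standard_normal((d_model, d_model)) * 0.02 for _ in range(n_experts)]

def moe_forward(x):
    scores = x @ router                      # score every expert...
    chosen = np.argsort(scores)[-top_k:]     # ...but keep only the top_k
    weights = np.exp(scores[chosen]) / np.exp(scores[chosen]).sum()
    return sum(w * (x @ experts[i]) for w, i in zip(weights, chosen))

out = moe_forward(rng.standard_normal(d_model))
total_params = n_experts * d_model * d_model
active_params = top_k * d_model * d_model
print(f"active fraction per token: {active_params / total_params:.0%}")  # ~12% here; ~9% for Laguna XS.2 (3B of 33B)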
The “built from scratch” part matters more than you might think.
Some U.S. labs have been quietly building on top of Chinese AI giant Alibaba’s Qwen series. (Cough cough, Cursor, cough.) Poolside didn’t do that. They built Laguna from the ground up, inside their own internal training system called the Model Factory.
That’s a big deal — especially for government and enterprise clients who care deeply about where their AI actually comes from.
The Bigger Sibling: Laguna M.1

Before we go deeper on XS.2, let’s give its older sibling a quick shoutout.
Laguna M.1 is a monster. It’s a 225-billion parameter MoE model with 23 billion active parameters. It’s built for high-stakes enterprise and government environments — the kind of work that demands serious reasoning, long-horizon planning, and the ability to navigate complex, interconnected codebases.
Poolside is offering M.1 for free (for now!) through their API and through third-party partners like OpenRouter, Ollama, and Baseten. So if you want to test a frontier-level coding model without paying a dime, now’s your window.
But back to the star of the show.
How Did They Build This Thing?
Poolside’s training process is genuinely fascinating — and a little wild.
Everything runs inside their Model Factory, powered by an internal training engine called Titan. Think of Titan as the furnace that forges the model. But the secret sauce? A tool called the Muon optimizer.
Muon acts like a high-speed tutor. It helps the model learn roughly 15% faster than standard industry methods. It does this by making sure every update to the model’s internal “brain” is mathematically balanced and pointing in the right direction. No getting confused. No getting stuck. Just clean, efficient learning — at a scale of 30 trillion tokens.
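Muon itself isn’t secret in the proprietary sense: it’s a publicly documented optimizer that takes the usual momentum buffer for each weight matrix and approximately orthogonalizes it with a few Newton-Schulz iterations before applying the update, which is what keeps the step balanced across directions. A simplified sketch of that update, based on the public Muon recipe rather than Poolside’s internal code:

import numpy as np

def newton_schulz(M, steps=5):
    # Approximately orthogonalizes M so every direction of the update carries
    # similar weight. Coefficients come from the publicly shared Muon recipe.
    a, b, c = 3.4445, -4.7750, 2.0315
    X = M / (np.linalg.norm(M) + 1e-7)
    for _ in range(steps):
        A = X @ X.T
        X = a * X + (b * A + c * A @ A) @ X
    return X

def muon_step(W, grad, momentum, lr=0.02, beta=0.95):
    # Plain momentum accumulation, then orthogonalize before updating W.
    momentum = beta * momentum + grad
    W = W - lr * newton_schulz(momentum)
    return W, momentum

rng = np.random.default_rng(0)
W, m = rng.standard_normal((256, 128)), np.zeros((256, 128))
W, m = muon_step(W, rng.standard_normal((256, 128)), m)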
Now, 30 trillion tokens is a staggering amount of data. But Poolside doesn’t just dump the entire internet into the training pipeline. They use a system called AutoMixer to figure out the perfect recipe.
AutoMixer deploys a swarm of 60 proxy models to test different combinations of code, math, and general web data. It’s like having 60 chefs simultaneously testing different recipes to find the one that produces the best reasoning capabilities. Scientific. Deliberate. Smart.
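Poolside hasn’t published AutoMixer’s internals, but the general pattern, searching over data mixtures by training cheap proxy models and keeping whichever mix scores best, is easy to sketch. Everything below is hypothetical; train_and_score_proxy is a stand-in for actually training a small proxy on the sampled mixture:

import random

SOURCES = ["code", "math", "web"]

def sample_mixture():
    # Random mixture weights over the data sources, normalized to sum to 1.
    raw = [random.random() for _ in SOURCES]
    return {s: w / sum(raw) for s, w in zip(SOURCES, raw)}

def train_and_score_proxy(mix):
    # Stand-in for training a small proxy model on this mixture and measuring
    # reasoning quality; in reality the score would come from benchmark evals.
    return 0.50 * mix["code"] + 0.35 * mix["math"] + 0.15 * mix["web"]

random.seed(0)
candidates = [sample_mixture() for _ in range(60)]   # 60 proxy runs, as in the post
best = max(candidates, key=train_and_score_proxy)
print({k: round(v, 2) for k, v in best.items()})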
About 13% of the training data is synthetic — high-quality, custom-made material generated by other AIs to teach specific skills that are hard to find in the real world.
Once the basic training wraps up, the model enters what Poolside calls a virtual gym — a reinforcement learning phase where the AI practices solving real software engineering problems in a sandboxed environment. It gets a “reward” every time it successfully fixes a bug or writes working code. Trial, error, reward. Repeat. That’s how a text generator becomes a genuine coding agent.
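The gym’s internals aren’t public either, but the heart of a verifiable reward is easy to picture: run the agent’s proposed fix in a sandbox, execute the project’s own test suite, and pay out only if it passes. A hypothetical, stripped-down version of that reward function:

import subprocess
from pathlib import Path

def verifiable_reward(patched_repo: Path, test_cmd=("python", "-m", "pytest", "-q")) -> float:
    # Run the project's test suite against the agent's patched checkout.
    # The signal is binary and checkable: the tests pass or they don't.
    result = subprocess.run(test_cmd, cwd=patched_repo, capture_output=True, timeout=600)
    return 1.0 if result.returncode == 0 else 0.0

# Inside the RL loop, each attempt would get scored along the lines of:
#   reward = verifiable_reward(workdir_with_agent_patch)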
The Benchmarks: Small Model, Big Swagger
Okay, let’s talk numbers. Because the benchmarks are where things get really interesting.
On SWE-bench Pro — a benchmark that tests an AI’s ability to solve real-world software issues — Laguna M.1 scored 46.9%, putting it in the same neighborhood as the much larger Qwen3.5 and DeepSeek V4-Flash.
But here’s the jaw-dropper: Laguna XS.2 scored 44.5% on SWE-bench Pro. A 33-billion parameter model, with only 3 billion active, nearly matching its 225-billion parameter sibling. That’s not just impressive — that’s borderline absurd.
On SWE-bench Verified, M.1 scored 72.5%, beating out Devstral 2 (72.2%). Claude Sonnet 4.6 still leads that category at 79.6%, so Poolside isn’t claiming the crown just yet. But they’re clearly in the conversation.
For XS.2 specifically, the numbers tell an even more exciting story. Despite its tiny active parameter count, XS.2 outperforms Claude Haiku 4.5 (39.5%) on SWE-bench Pro, along with the Gemma 4 31B dense model (35.7%), which activates roughly ten times as many parameters per token. It also edges out Haiku 4.5 on Terminal-Bench 2.0 (30.1% vs. 29.8%).
Yes, specialized nano models like GPT-5.4 Nano still lead on Terminal-Bench 2.0 at 46.3%. XS.2 isn’t perfect. But for a model you can run locally on your laptop, these numbers are remarkable.
All benchmarking was done using the Harbor Framework with sandboxed execution — so these results reflect real-world, resource-constrained performance. No cherry-picking. No ideal conditions.
Can You Actually Run It at Home?
Yes — but you’ll need decent hardware.
For Mac users, you’ll want at least 36 GB of unified memory. That means a MacBook Pro with an M5 Max chip, or a Mac Studio/Mac Mini configured with 48 GB or 64 GB of RAM. (And no, the budget-friendly MacBook Neo won’t cut it — it’s capped at 8 GB.)
For PC and Linux users, the full model would normally require over 60 GB of VRAM. But thanks to 4-bit quantization (Q4), you can run it on consumer GPUs with 24–32 GB of VRAM — like the newly released RTX 5090. The RTX 4090 (24 GB) can also handle it with heavier quantization, though it’ll be slower on complex tasks.
You’ll also want to set aside at least 70 GB of storage for the full model, or around 20–35 GB for a compressed version.
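Those figures follow from back-of-the-envelope arithmetic: a 16-bit weight takes 2 bytes, a 4-bit weight takes half a byte, plus some headroom for the KV cache and runtime buffers. The 33-billion-parameter count comes from the release; the 20% overhead factor below is an assumption:

def estimated_gb(params_billion, bits_per_weight, overhead=1.2):
    # Weight memory plus a rough multiplier for KV cache and runtime buffers.
    return params_billion * bits_per_weight / 8 * overhead

print(f"FP16: ~{estimated_gb(33, 16):.0f} GB")   # ~79 GB, hence the 60+ GB VRAM figure
print(f"Q4:   ~{estimated_gb(33, 4):.0f} GB")    # ~20 GB, fits 24-32 GB consumer GPUs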
Poolside recommends using Ollama or their own terminal-based agent, pool, for the smoothest local experience. You can literally run:
ollama run laguna-xs.2
And you’re off to the races.
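Once the model is pulled, Ollama also exposes a local HTTP API on port 11434, handy if you’d rather call the model from your own scripts than from the terminal. A minimal example (the model tag follows this article; double-check the exact name on Ollama’s library page):

import requests

resp = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "laguna-xs.2",   # tag name assumed from the release; verify before use
        "prompt": "Write a Python function that reverses a linked list.",
        "stream": False,          # return one JSON object instead of a token stream
    },
    timeout=300,
)
print(resp.json()["response"])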
Meet pool and Shimmer: The Tools That Make It All Click
Models are only as good as the environments they live in. Poolside knows this — so they shipped two new developer tools alongside the Laguna models.
pool is a terminal-based coding agent designed for your local machine. It acts as an Agent Client Protocol (ACP) server — the same internal harness Poolside uses for their own reinforcement learning training. By releasing pool to the public, Poolside is essentially inviting developers to participate in the same “real-world gym” that trains their future models. That’s a clever move. The community becomes part of the feedback loop.
Shimmer is the cloud-native counterpart. It’s an instant-on Virtual Machine sandbox where you can build web apps, APIs, and CLIs in seconds. Unlike traditional IDEs like Visual Studio, Shimmer integrates the Poolside Agent directly into the workspace. Push to GitHub. Import existing repos. Iterate fast.
The most striking demo? Poolside’s Founding Designer Alasdair Monk showed Shimmer running entirely on a smartphone. A split-screen interface with the Poolside Agent generating code on one side and a full dev environment on the other — all on a mobile device.
The future of software engineering might not be tethered to a desk after all.
Why Go Open Source? The Philosophy Behind the Decision
This is the part that really sets Poolside apart from the crowd.
Poolside didn’t have to release XS.2 as open weights. They could have kept it behind an API, charged for access, and called it a day. Instead, they chose the Apache 2.0 license — one of the most permissive licenses in existence. Use it. Modify it. Build commercial products with it. No royalties. No restrictions.
Why? Because Poolside genuinely believes the West needs strong open-weight models. The open-weight AI ecosystem in the U.S. is still early. Poolside wants to change that.
As the company put it in their official blog post: “The fastest way for us to improve our work is to put it in the hands of people who’ll push it.”
That’s not just marketing speak. It’s a strategic bet. By putting XS.2 in the hands of researchers, startups, and individual developers, Poolside is ensuring their technology gets baked into the next generation of third-party tools — and that the community helps them make it better.
The Bigger Picture: What Poolside Is Really Building
Poolside’s core thesis is bold: software development is the ultimate proxy for general intelligence.
Writing code requires long-horizon planning, abstract reasoning, and the ability to manage complex systems — all traits central to human cognition. Most AI agents today are limited to calling pre-defined tools. Poolside’s agents are designed to write and execute their own code to solve problems. That’s a fundamentally different — and more powerful — approach.
The team of roughly 60 people in Poolside’s Applied Research organization spent three years and ran tens of thousands of experiments to get here. Their vision isn’t just about building smarter AI. It’s about creating abundance for humanity through software.
By focusing on a domain with verifiable rewards — test passes, compilation results, working code — they’ve built a self-improving feedback loop. Every success teaches the model something new. Every failure does too.
The Bottom Line

Poolside just made a serious statement. With Laguna XS.2, they’ve delivered a free, open-source, locally runnable coding AI that punches well above its weight class — and they’ve backed it up with real benchmark numbers.
Is it perfect? No. Claude Sonnet 4.6 still leads on SWE-bench Verified. GPT-5.4 Nano still wins on Terminal-Bench 2.0. But for a model you can run on your own hardware, completely offline, with full Apache 2.0 freedom to modify and deploy however you want? Laguna XS.2 is a genuinely exciting development.
The AI race just got a new player. And this one’s playing a very different game.
Sources
- VentureBeat — American AI startup Poolside launches free, high-performing open model Laguna XS.2 for local agentic coding
- Poolside Official Blog — Introducing Laguna XS.2 and Laguna M.1
- Poolside — A Deeper Dive (Technical Blog)
- Hugging Face — Laguna XS.2 Model Weights
- Ollama — Laguna XS.2
- OpenRouter — Poolside Models
- Poolside Platform API
- Shimmer — Cloud Dev Environment
- pool — Terminal-Based Coding Agent