The Anti-FDE Playbook: When Forward Deployed Engineering Fails — And What Mid-Market Companies Should Do Instead

The Forward Deployed Engineer has become the most romanticized role in enterprise AI. The Pragmatic Engineer called it “the hottest job in tech,” citing a16z’s framing of the role. Palantir pioneered it in the early 2010s under the internal name “Delta.” OpenAI stood up its own FDE team earlier this year under Colin Jarvis. Ramp has roughly 15 FDEs working in pods. Salesforce, Commure, Matta, Lindy, and Varick are all hiring for the same archetype.

The role is real, and it pays accordingly. At Palantir, base salaries for Forward Deployed Software Engineers (FDSEs) run $135,000 to $200,000 before equity and bonuses. At top AI labs, total comp goes considerably higher.

This article is not about whether FDEs are valuable. They are. It’s about a quieter question almost nobody is asking: what should you do if you’re the company writing the check, not cashing it?

For the Fortune 500, an FDE engagement is often the right call. For everyone else — mid-market companies, growth-stage startups, regional operators — the math is harder than the LinkedIn posts suggest. Here is a buyer-side playbook for getting FDE-grade outcomes without an FDE-grade invoice.

Part 1: The Economics Most Vendors Won’t Walk You Through

An FDE engagement is rarely a single line item. It is a stack:

A senior engineer’s loaded cost (often 25–50% on-site, per Palantir and Commure’s own descriptions)
Vendor margin on that engineer’s time
Custom agent maintenance after handoff
Eval and observability infrastructure
An ongoing retainer to keep the work alive

There is no public benchmark for what an Applied AI FDE engagement costs end-to-end — vendors don’t publish it, and contracts are usually NDA’d. But you can triangulate from public salary data and standard consulting markups, and the answer for a multi-month engagement is comfortably in the high six figures to low seven figures.
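To make that triangulation concrete, here is a minimal sketch of the arithmetic. Only the $200,000 base salary comes from the Palantir range cited above; the loaded-cost factor, vendor markup, team size, duration, and retainer are placeholder assumptions for illustration, not benchmarks. Swap in the numbers from your own quotes.

```python
from dataclasses import dataclass


@dataclass
class EngagementAssumptions:
    base_salary: float        # annual base salary per engineer, USD
    loaded_factor: float      # ASSUMPTION: multiplier for benefits and overhead
    vendor_markup: float      # ASSUMPTION: vendor bill rate divided by loaded cost
    engineers: int            # ASSUMPTION: engineers staffed on the engagement
    months: int               # ASSUMPTION: length of the build phase
    monthly_retainer: float   # ASSUMPTION: post-handoff retainer, USD per month
    retainer_months: int      # ASSUMPTION: how long the retainer runs


def estimate_engagement_cost(a: EngagementAssumptions) -> float:
    """Rough end-to-end cost: billed engineering time plus retainer."""
    billed_per_engineer_year = a.base_salary * a.loaded_factor * a.vendor_markup
    build_cost = billed_per_engineer_year * a.engineers * a.months / 12
    retainer_cost = a.monthly_retainer * a.retainer_months
    return build_cost + retainer_cost


# Hypothetical scenario; every value except base_salary is an assumption.
scenario = EngagementAssumptions(
    base_salary=200_000, loaded_factor=1.4, vendor_markup=2.5,
    engineers=2, months=6, monthly_retainer=15_000, retainer_months=6,
)
print(f"Estimated cost: ${estimate_engagement_cost(scenario):,.0f}")
```

The point of writing it down is less the total than the sensitivity: the markup and the retainer length move the answer more than the salary does.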

That cost can be justified — if the underlying use case is large enough to absorb it. The problem is that most companies running the math do so on optimistic ROI assumptions, not on the base rate of what happens to enterprise AI projects.

The base rate is not kind. MIT’s NANDA initiative, which studied 300+ deployments and surveyed 350 employees, found that 95% of enterprise AI pilots deliver zero measurable return on the P&L. IDC research puts the pilot-to-production failure rate at 88%. S&P Global reported that 42% of companies scrapped most of their AI initiatives in 2025, up from 17% the prior year. Gartner predicts over 40% of agentic AI projects will be cancelled by end of 2027.

None of those numbers are a knock on FDEs. They’re a knock on the assumption that hiring one — or contracting one — automatically lands you on the right side of the statistics.

Part 2: Three Failure Modes Inside the FDE Framework

The original FDE playbook — audit, evals, deployment — is genuinely good. The failures happen inside each phase, and they’re usually invisible until the engagement ends.

Audit Theater

A thoughtful FDE audit maps workflows, identifies bottlenecks, and decides what shouldn’t be automated. A theatrical one produces a 40-page deck restating what the operations team already told leadership in a stand-up.

The MIT NANDA research surfaces an uncomfortable detail: back-office automation delivers the highest ROI, but most AI budgets flow to sales and marketing pilots. The audit phase is where that misallocation either gets corrected or quietly rubber-stamped. If the audit ends with the engagement pointed at the same use cases the executive team was already excited about, it didn’t do its job.

Eval Theater

A good eval, as the original FDE article correctly argues, traces the human’s reasoning steps and grades the agent at each checkpoint. A theatrical eval grades final outputs against a golden dataset and reports “94% accuracy” to the steering committee.

The gap matters because agility-at-scale’s review of the research found that nearly one-third of CIOs lack clear metrics for AI proofs-of-concept, which means pilots are often measured on model accuracy rather than business outcomes. An agent can be 94% accurate and still fail to move a single P&L line. If your eval framework can’t answer the question “what dollar amount of work did this agent do this quarter?”, it’s measuring the wrong thing.
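The difference between the two kinds of eval is easy to see in a few lines. The sketch below is illustrative only: the Trace structure, the checkpoint names, and the exact-match grading are assumptions made here, and a real harness would likely use richer graders. The last helper is one hypothetical way to put a dollar figure on an agent's quarter, assuming time saved is the relevant measure.

```python
from dataclasses import dataclass, field


@dataclass
class Trace:
    """One agent run: its result at each named checkpoint, plus the final answer."""
    final: str
    steps: dict = field(default_factory=dict)  # checkpoint name -> agent's result


def final_output_score(trace: Trace, expected_final: str) -> float:
    """Theatrical eval: only the final answer is graded."""
    return 1.0 if trace.final == expected_final else 0.0


def checkpoint_score(trace: Trace, expected_steps: dict) -> float:
    """Fraction of the human's reasoning checkpoints the agent reproduced."""
    if not expected_steps:
        raise ValueError("expected_steps must not be empty")
    hits = sum(
        1 for name, want in expected_steps.items()
        if trace.steps.get(name) == want
    )
    return hits / len(expected_steps)


def quarterly_dollar_value(tasks_completed: int,
                           minutes_saved_per_task: float,
                           loaded_hourly_rate: float) -> float:
    """ASSUMPTION: value is labor time saved, priced at a loaded hourly rate."""
    return tasks_completed * (minutes_saved_per_task / 60) * loaded_hourly_rate


# Hypothetical invoice-approval run: right answer, but the vendor check was skipped.
expected = {"match_po": "matched", "check_vendor": "verified",
            "check_amount": "within_limit"}
run = Trace(final="approve",
            steps={"match_po": "matched", "check_amount": "within_limit"})

print("final-output score:", final_output_score(run, "approve"))
print("checkpoint score:  ", round(checkpoint_score(run, expected), 2))
print("dollar value:      ", round(quarterly_dollar_value(1_200, 10, 60.0)))
```

In this made-up run the final-output grader gives full marks while the checkpoint grader shows a skipped control, which is exactly the failure a golden-dataset accuracy number hides.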

Pilot Purgatory

This is the structural killer. Bonjoy’s analysis, cited by AI Assembly Lines, found that 88% of AI agents fail in production, with failures clustered around data fragmentation, integration complexity, and governance gaps. None of those show up in a pilot, because the pilot deliberately simplifies them.

The MIT data also surfaces a finding that should give every internal-build advocate pause: vendor-led solutions reach production roughly 67% of the time, while internal builds succeed about 33% of the time. The same research, however, shows that large enterprises run the most pilots and have the lowest pilot-to-scale conversion rates, averaging nine months to scale a successful pilot versus 90 days at mid-market firms.

In other words: hiring the best vendor doesn’t save you from organizational gravity. Knowing your own gravity does.

Part 3: The Shadow FDE Model

For most mid-market companies, the right answer isn’t “hire a vendor FDE” or “build an internal AI engineering team from scratch.” It’s one hybrid hire — what I’d call a Shadow FDE — who operates the same three-phase framework with one critical difference: institutional memory stays in-house.

The profile to look for:

A product manager who can write Python, or an engineer who can run a workshop without losing the room
Comfort with ambiguity over comfort with credentials
A demonstrated history of shipping something small that mattered, not architecting something large that didn’t

A reasonable first 90 days, mapped to the original framework:

Days 1–30 — Internal audit. Cheaper than an external audit because the hire already has org context, knows which managers tell the truth, and doesn’t need two weeks of stakeholder onboarding to find the actual bottleneck. The deliverable is a one-page document: three to five candidate workflows, ranked by annual hours consumed and rule-vs-judgment ratio.
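The one-page ranking can be as simple as a sorted list. The sketch below is a minimal illustration: the article names the two ranking inputs (annual hours consumed and rule-vs-judgment ratio) but not how to combine them, so multiplying hours by the share of rule-based steps is an assumption made here, and the example workflows are invented.

```python
from dataclasses import dataclass


@dataclass
class Workflow:
    name: str
    annual_hours: float   # hours per year the team spends on this workflow
    rule_steps: int       # steps that follow a fixed rule
    judgment_steps: int   # steps that need human judgment

    @property
    def rule_ratio(self) -> float:
        """Share of steps that are rule-based (0.0 if no steps were counted)."""
        total = self.rule_steps + self.judgment_steps
        return self.rule_steps / total if total else 0.0

    @property
    def score(self) -> float:
        # ASSUMPTION: automatable hours = annual hours x share of rule-based steps.
        return self.annual_hours * self.rule_ratio


def rank_candidates(workflows, top_n=5):
    """Return the top_n workflows by score, highest first."""
    return sorted(workflows, key=lambda w: w.score, reverse=True)[:top_n]


# Hypothetical audit results.
candidates = [
    Workflow("contract review", annual_hours=3_000, rule_steps=2, judgment_steps=8),
    Workflow("invoice matching", annual_hours=2_000, rule_steps=8, judgment_steps=2),
    Workflow("vendor onboarding", annual_hours=500, rule_steps=9, judgment_steps=1),
]
for w in rank_candidates(candidates):
    print(f"{w.name}: {w.score:.0f} automatable hours/year")
```

Note that the largest workflow by raw hours does not come out on top here, because most of its steps are judgment calls.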

Days 31–60 — Evals tied to one KPI. Not model accuracy. Not F1 score. One business KPI the CFO already tracks. If the agent can’t be tied to that number, it doesn’t get built. This single constraint eliminates most eval theater before it starts.

Days 61–90 — Deploy one narrow agent. The smallest possible unit of autonomy: read-only first, suggestion-only second, action-taking third. Instrument it heavily. Expand only after the metric moves.
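The read-only, suggestion-only, action-taking progression can be enforced in code rather than left to judgment. This is a minimal sketch under stated assumptions: the level names, the relative-improvement threshold, and the convention that a higher KPI is better are all choices made here for illustration.

```python
from enum import IntEnum


class Autonomy(IntEnum):
    READ_ONLY = 1   # agent observes and reports, changes nothing
    SUGGEST = 2     # agent proposes actions, a human approves each one
    ACT = 3         # agent takes actions on its own


def next_autonomy(current: Autonomy,
                  kpi_baseline: float,
                  kpi_now: float,
                  min_improvement: float) -> Autonomy:
    """Promote by one level only if the KPI improved by at least min_improvement.

    ASSUMPTION: higher KPI values are better, and min_improvement is a
    relative change (0.10 means 10% over baseline).
    """
    if kpi_baseline <= 0:
        raise ValueError("kpi_baseline must be positive")
    improvement = (kpi_now - kpi_baseline) / kpi_baseline
    if improvement >= min_improvement and current < Autonomy.ACT:
        return Autonomy(current + 1)
    return current


# Hypothetical review: the KPI moved from 100 to 112 against a 10% bar.
level = next_autonomy(Autonomy.READ_ONLY, kpi_baseline=100, kpi_now=112,
                      min_improvement=0.10)
print(level.name)
```

The function never skips a level and never promotes on a flat metric, which is the whole discipline of expanding only after the number moves.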

This pattern aligns with what agility-at-scale’s framework calls a gated rollout (sandbox, controlled pilot, department-wide, enterprise-wide), but compressed for an org that doesn’t have four phases’ worth of patience.

If you want a structured way to assess where you actually sit on this curve before you hire anyone, Kingy.ai’s AI Agent Directory & Readiness Scorecard is a reasonable starting point.

Part 4: When Agents Are the Wrong Answer

The honest version of the FDE pitch includes knowing when not to deploy AI. Three quick filters:

Predictable inputs + predictable outputs → use deterministic code or a Zapier-class tool. Wrapping if/else logic in an LLM call is expensive theater.
One LLM call in the middle of an otherwise deterministic pipeline → don’t call it an “agent.” Call it what it is: a function that uses a model. The naming matters because it changes how you evaluate, monitor, and price it.
Low volume (<100 runs/month) → manual stays cheaper. Token costs are low, but engineering, eval, and maintenance costs are largely fixed. Below a certain volume, those fixed costs never amortize.
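The three filters above can be written down as a small decision function, with the amortization point computed rather than guessed. This is a sketch, not a rule: the order in which the filters are checked, the return labels, and the cost figures in the example are assumptions made here. Only the 100 runs/month threshold comes from the list above.

```python
import math
from typing import Optional


def recommend(inputs_predictable: bool,
              outputs_predictable: bool,
              llm_calls_per_run: int,
              runs_per_month: int) -> str:
    """Apply the three filters; the checking order is an assumption."""
    if inputs_predictable and outputs_predictable:
        return "deterministic code"
    if runs_per_month < 100:
        return "keep manual"
    if llm_calls_per_run <= 1:
        return "function that uses a model"
    return "agent candidate"


def breakeven_runs_per_month(fixed_monthly_cost: float,
                             manual_cost_per_run: float,
                             agent_cost_per_run: float) -> Optional[int]:
    """Runs per month needed for per-run savings to cover fixed costs.

    Returns None when the agent is not cheaper per run, because then
    the fixed costs are never recovered at any volume.
    """
    saving_per_run = manual_cost_per_run - agent_cost_per_run
    if saving_per_run <= 0:
        return None
    return math.ceil(fixed_monthly_cost / saving_per_run)


# Hypothetical numbers: $4,000/month of engineering, eval, and maintenance,
# $12 of labor per manual run, $2 of tokens and review per agent run.
print(recommend(False, False, llm_calls_per_run=4, runs_per_month=60))
print(breakeven_runs_per_month(4_000, 12.0, 2.0))
```

If the break-even volume comes out far above what the workflow actually sees, that is the signal to leave it manual or reach for an off-the-shelf tool.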

If you’re trying to figure out which tools actually fit a given workflow before you commit to building anything custom, Kingy.ai’s Workflow Stack tool is useful for mapping job, budget, and skill level to off-the-shelf options first.

Part 5: The Buyer’s Checklist

If you do hire a vendor FDE engagement, these are the clauses worth fighting for at contract time, not after:

Eval ownership. You own the golden dataset and the eval harness. If the vendor leaves, the evals stay.
Source code and prompt handover. Including system prompts, tool schemas, and orchestration logic — not just a deployed binary.
Knowledge transfer milestones. At least one internal engineer trained to operate the system per quarter, with measurable competency tests.
Inspectable reasoning logs. No black-box orchestration layers you can’t audit. Per agility-at-scale’s MLOps guidance, session-level logging of every prompt, tool call, and response is table stakes.
Outcome-tied milestones. Tie at least part of the engagement to a KPI both sides agreed to before kickoff. If the vendor pushes back on this, that itself is information.
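For the inspectable-logs clause, it helps to know what the minimum acceptable artifact looks like. The sketch below writes one JSON object per line for every prompt, tool call, and response in a session. The class name, field names, and event kinds are illustrative assumptions, not a standard or any vendor's format.

```python
import io
import json
import time
import uuid
from typing import Any, Dict, Optional, TextIO

ALLOWED_KINDS = {"prompt", "tool_call", "response"}


class SessionLog:
    """Append-only JSON Lines log for a single agent session."""

    def __init__(self, sink: TextIO, session_id: Optional[str] = None):
        self.sink = sink
        self.session_id = session_id or str(uuid.uuid4())
        self._seq = 0  # monotonically increasing event counter

    def record(self, kind: str, payload: Dict[str, Any]) -> Dict[str, Any]:
        """Write one event and return it; rejects unknown event kinds."""
        if kind not in ALLOWED_KINDS:
            raise ValueError(f"unknown event kind: {kind!r}")
        self._seq += 1
        event = {
            "session_id": self.session_id,
            "seq": self._seq,
            "ts": time.time(),
            "kind": kind,
            "payload": payload,
        }
        self.sink.write(json.dumps(event) + "\n")
        return event


# Usage with an in-memory sink; in practice the sink would be a file you own.
buffer = io.StringIO()
log = SessionLog(buffer, session_id="demo-session")
log.record("prompt", {"text": "Match invoice 1042 to a purchase order."})
log.record("tool_call", {"name": "lookup_po", "args": {"invoice_id": 1042}})
log.record("response", {"text": "Matched to PO 77."})
print(buffer.getvalue())
```

A plain-text format like this can be read, searched, and replayed without the vendor's tooling, which is what makes it auditable after the engagement ends.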

None of these are radical. They’re standard in mature consulting categories — legal, accounting, classical management consulting. They’re missing from most AI engagements only because the category is young.

What the FDE Hype Gets Right, and What It Misses

The FDE role exists because, as the original article correctly points out, intelligence is becoming commoditized and the edge is in how and where you deploy it. That observation is sound. The leap that doesn’t follow is “therefore you need to hire an external FDE team.”

The actual edge is keeping deployment capability inside your own company. Vendors can accelerate it. Vendors can also obscure it, depending on the contract.

For a Fortune 500 with regulatory complexity, air-gapped environments, and the budget to absorb a six-figure-plus engagement, an external FDE team is usually the right tool. For a 200-person company trying to automate one expensive workflow, a single Shadow FDE hire — operating the same framework with the same discipline — will often outperform the vendor route, at roughly 10% of the cost, with the institutional memory still sitting at your standup the morning the engagement would have ended.

The framework is the asset. The headcount strategy is the choice.

TLDR

FDEs are a real, valuable role — for a narrow slice of the market.
The base rates for enterprise AI projects (88% pilot-to-production failure per IDC, 95% zero-ROI per MIT NANDA) apply whether or not you hire one.
For most mid-market companies, a Shadow FDE — one internal hybrid hire running the audit-evals-deployment framework — beats the vendor route on cost and on retained knowledge.
Know when AI isn’t the answer. Deterministic problems want deterministic code.
If you do hire a vendor, negotiate eval ownership, source handover, and knowledge transfer at contract time, not at handoff.

The companies that win this cycle won’t be the ones with the best AI vendors. They’ll be the ones who built the smallest viable internal capability and refused to outsource the muscle.