Most people still think the main AI skill is writing better prompts.
That is partly true. Prompting is not dead. A clear prompt still matters. If you cannot explain what you want, the model will usually guess, and the guess will often be wrong.
But advanced AI users are moving past one-off prompting as the center of the work. They are designing loops: repeatable systems where an AI model receives a goal, checks the current state, takes action, verifies the result, and decides whether to continue, retry, escalate, or stop.
That shift matters whether you are using Codex, Claude Code, ChatGPT, AI coding agents, research agents, SEO workflows, or AI launch intelligence systems.
The new skill is not only asking the model a better question. The new skill is designing the loop that prompts, checks, and improves the agent for you.
For readers building AI-assisted coding habits, this pairs naturally with Kingy AI’s AI Coding Foundations for Beginners and the course on MCP, AGENTS.md, and context engineering. The practical difference is simple: prompts ask for an answer. Loops create a repeatable way to keep working until a defined result is reached.

TL;DR
| Idea | Short version |
|---|---|
| AI loops | Repeatable cycles where an AI model uses context, tools, action, and checks to make progress. |
| Core formula | Goal → Context → Action → Result → Check → Continue or Stop |
| Why they matter | One-off prompts are fragile for long tasks. Loops make AI work more durable, testable, and reusable. |
| Best use cases | Coding, bug fixing, refactoring, PR review, research, SEO improvement, launch monitoring, content workflows, and operations. |
| Real unlock | Verification. A loop without checks is just a faster way to make repeated mistakes. |
| Main risk | Runaway loops can burn tokens, money, time, attention, and trust. Use budgets and stop conditions. |
| Practical framework | G-CAVS: Goal, Context, Action, Verification, Stop. |

What Is an AI Loop?
An AI loop is a repeatable cycle where an AI system keeps moving through a task until a condition is met.
The simplest version looks like this:
Goal → Context → Action → Result → Check → Continue or Stop
In plain English:
- Give the AI a goal.
- Give it the context it needs.
- Let it take an action.
- Capture the result.
- Check whether the result is good enough.
- Continue, retry, escalate, or stop.
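The six steps above can be sketched as a small control loop. This is a minimal illustration under stated assumptions, not how any particular product implements it. `act` and `check` are hypothetical stand-ins for a model or tool call and a verifier, and the three-attempt budget is an arbitrary default.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable


class Decision(Enum):
    CONTINUE = "continue"   # progress made, keep going with updated context
    RETRY = "retry"         # result rejected, try again from the same context
    ESCALATE = "escalate"   # blocked or ambiguous, hand off to a human
    STOP = "stop"           # verification passed, the goal is met


@dataclass
class LoopResult:
    decision: Decision
    attempts: int
    history: list = field(default_factory=list)


def run_loop(goal: str, context: dict,
             act: Callable[[str, dict], str],       # hypothetical model/tool call
             check: Callable[[str], Decision],      # hypothetical verifier
             max_attempts: int = 3) -> LoopResult:
    history = []
    for attempt in range(1, max_attempts + 1):
        result = act(goal, context)                 # Action -> Result
        decision = check(result)                    # Check against evidence
        history.append((attempt, result, decision))
        if decision in (Decision.STOP, Decision.ESCALATE):
            return LoopResult(decision, attempt, history)
        if decision is Decision.CONTINUE:
            context = {**context, "last_result": result}
    # Budget spent without passing: escalate rather than claim success
    return LoopResult(Decision.ESCALATE, max_attempts, history)


# Usage with toy stand-ins for the model and the checker
drafts = iter(["draft v1", "draft v2 with a concrete benefit"])
outcome = run_loop(
    goal="Improve the product description",
    context={"source": "product page"},
    act=lambda goal, ctx: next(drafts),
    check=lambda r: Decision.STOP if "benefit" in r else Decision.CONTINUE,
)
print(outcome.decision.value, outcome.attempts)  # stop 2
```

Note the last line of `run_loop`: running out of attempts returns ESCALATE, not STOP, so an exhausted budget is never mistaken for success.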
That is different from a normal prompt.
A prompt is one request. A loop is a working pattern. A prompt might say:
Write a better product description.
A loop says:
Improve this product description until it passes a checklist: clear audience, specific pain, concrete benefit, no unsupported claims, readable at or below a 9th-grade reading level, and aligned with the source page. After each revision, explain what changed and stop after three attempts or when the checklist passes.
That second version is not just “a better prompt.” It contains a goal, source context, an action policy, verification, and a stop condition.
The loop is the real interface.
This is not a fringe idea. Anthropic, in its widely cited engineering guide Building Effective Agents, describes agents in almost exactly these terms: “Despite their sophistication, agents are straightforward systems — LLMs working in loops with tools and environmental feedback.” The sophistication people imagine is mostly a tight loop wrapped around a capable model.
Why Loops Are Suddenly Important
Loops are not new. Programmers have used loops forever. Automation teams have scheduled jobs forever. Researchers have built iterative systems for decades.
What changed is that modern LLMs can now make useful judgments inside the loop.
Older automation was usually rigid:
Every morning at 8:00, fetch a report, transform a file, send an email.
That is still useful. But it is not the same as:
Every morning, inspect the AI launch tracker, identify meaningful launches, verify official sources, classify each launch by audience impact, draft a briefing, flag uncertain claims, and ask for human approval before publishing.
The second system repeats, but repetition is not the interesting part. The model reads, compares, summarizes, checks sources, identifies ambiguity, and decides what needs attention.
That is why power users are moving from “prompting the model” to “designing the system that prompts the model.”
OpenAI’s Codex documentation describes Codex as working in a loop when it handles prompts: it calls the model and performs actions such as file reads, file edits, and tool calls until the task completes or is cancelled. The same documentation describes the newer Goal mode as a way to give Codex a persistent objective and completion criteria across longer work. In OpenAI’s Follow a goal use case, the company frames the entire mechanic around persistence: “Use /goal when you want Codex to keep working toward one durable objective instead of stopping after one normal turn.”
Anthropic’s Claude Code documentation uses similar language. According to How Claude Code works, Claude Code moves through three blended phases — “gather context, take action, and verify results” — and “the loop adapts to what you ask. A question about your codebase might only need context gathering. A bug fix cycles through all three phases repeatedly.” The model “decides what each step requires based on what it learned from the previous step, chaining dozens of actions together and course-correcting along the way.”
Two of the most influential AI tools on the planet now describe their core behavior the same way: not as a prompt, but as a loop.

The Anatomy of a Good Loop: G-CAVS
If you want a memorable framework for designing loops, use G-CAVS:
- G — Goal. What outcome must be true when this is done?
- C — Context. What information, files, sources, and constraints does the model need before it acts?
- A — Action. What is the model allowed to do, and with which tools?
- V — Verification. How will success be checked against evidence, not vibes?
- S — Stop. When does the loop end, pause, or escalate to a human?
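One way to make G-CAVS operational is to refuse to start a loop until all five parts are written down. A minimal sketch; `LoopSpec`, `missing`, and `to_brief` are hypothetical names, not part of Codex, Claude Code, or any library.

```python
from dataclasses import dataclass, fields


@dataclass
class LoopSpec:
    goal: str          # G: what must be true when this is done
    context: str       # C: information, files, sources, constraints
    action: str        # A: what the model may do, and with which tools
    verification: str  # V: the evidence that proves success
    stop: str          # S: when to end, pause, or escalate

    def missing(self) -> list:
        """Names of G-CAVS parts that are empty or whitespace-only."""
        return [f.name for f in fields(self) if not getattr(self, f.name).strip()]

    def to_brief(self) -> str:
        """Render the spec as a plain-text brief, refusing incomplete specs."""
        gaps = self.missing()
        if gaps:
            raise ValueError("not a loop yet, missing: " + ", ".join(gaps))
        return "\n".join(f"{f.name.upper()}: {getattr(self, f.name)}" for f in fields(self))


spec = LoopSpec(
    goal="Product description passes the checklist",
    context="Source product page and style guide",
    action="Rewrite the description text only",
    verification="",
    stop="Checklist passes or three attempts",
)
print(spec.missing())  # ['verification']
```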
Most failed AI automations are missing at least one of these. A prompt with no verification produces confident nonsense. A loop with no stop condition burns money. A goal with no context produces work that technically completes but solves the wrong problem.
The strongest signal that you have a real loop and not just a wordy prompt is whether you can truthfully write one sentence: the agent can check its own progress against evidence. That test comes from a hands-on review by developer J.D. Hodges, who tried Codex /goal and put it bluntly: “If you can’t write that line, you don’t have a /goal. You have a prompt.”
Goal
A goal is a contract, not a wish. “Improve performance” is a wish. “Reduce p95 latency below 120 ms on the checkout benchmark while keeping the correctness suite green” is a contract, because it specifies the outcome (latency under 120 ms), the verification surface (the benchmark), and the constraint (tests stay green). OpenAI uses this exact example in its Using Goals in Codex cookbook, where it describes a goal as “a scoped, user-controlled completion contract.”
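A contract like this can be checked mechanically. A minimal sketch, assuming you already have per-request latency samples from the checkout benchmark and a pass/fail result from the correctness suite. It uses the nearest-rank definition of p95, and your benchmark tool may interpolate differently; the sample data is made up.

```python
def p95(samples_ms: list) -> float:
    """95th percentile by the nearest-rank method (one common definition)."""
    if not samples_ms:
        raise ValueError("no latency samples")
    ordered = sorted(samples_ms)
    rank = (95 * len(ordered) + 99) // 100   # integer ceil(0.95 * n), 1-based
    return ordered[rank - 1]


def goal_met(samples_ms: list, tests_green: bool, threshold_ms: float = 120.0) -> bool:
    """The contract: p95 strictly below the threshold AND the correctness suite green."""
    return tests_green and p95(samples_ms) < threshold_ms


fast = [100.0] * 96 + [150.0] * 4    # only 4% of requests are slow
slow = [100.0] * 90 + [150.0] * 10   # 10% slow pushes p95 to 150 ms
print(goal_met(fast, True), goal_met(slow, True), goal_met(fast, False))  # True False False
```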
Context
The model only knows what is in front of it. In Claude Code’s case, that context window holds your conversation history, file contents, command outputs, your CLAUDE.md instructions, auto memory, and system instructions. As the engineering breakdown from Rubric Labs puts it, “whatever fits in the model’s context window is what exists for this step.” Anything outside it “might as well not exist.” Good loops front-load the right context and keep it from being crowded out.
Action
Action is where the loop touches reality. Claude Code’s built-in tools fall into roughly five categories — file operations, search, execution, web, and code intelligence — and “each tool use returns information that feeds back into the loop, informing Claude’s next decision.” Tools are deliberately constrained: the editor tool, for instance, requires an exact string match before it will replace text, so an ambiguous edit fails loudly and kicks the agent back into the gather phase rather than guessing.
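The fail-loudly behavior is easy to reproduce in your own tooling. A minimal sketch of an exact-match replace; it illustrates the idea and is not Claude Code's actual editor implementation.

```python
class EditError(Exception):
    """Raised when an edit cannot be applied unambiguously."""


def exact_replace(text: str, old: str, new: str) -> str:
    """Replace `old` with `new` only if `old` occurs exactly once in `text`."""
    if not old:
        raise EditError("empty search string")
    count = text.count(old)
    if count == 0:
        raise EditError("no match: re-read the file before editing")
    if count > 1:
        raise EditError(f"{count} matches: include more surrounding text")
    return text.replace(old, new, 1)


source = "limit = 10\nretries = 10\n"
print(exact_replace(source, "limit = 10", "limit = 20"))
try:
    exact_replace(source, "= 10", "= 20")   # ambiguous: two matches
except EditError as err:
    print("edit refused:", err)             # edit refused: 2 matches: include more surrounding text
```

Refusing an ambiguous edit pushes the agent back to gathering context, which is cheaper than silently changing the wrong line.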
Verification
Verification is the part that separates a loop from a fast mistake machine. This is the single most important idea in the entire shift from prompts to loops, so it gets its own section below.
Stop
A loop without a stop condition is a liability. Anthropic’s Building Effective Agents guide explicitly lists “stopping conditions (like maximum iterations)” as a core control mechanism. Codex implements this directly: a goal can end on success, pause, clear, interruption, a budget limit, or a blocker that needs human input.
Verification Is the Real Unlock
A loop without checks is just a faster way to make repeated mistakes.
The thing that makes modern AI loops valuable is not that they repeat. It is that they repeat against evidence. Tests pass or fail. Benchmarks move or they do not. A build compiles or it errors. A checklist hits 200 of 200 or it does not.
This is why OpenAI’s cookbook insists that completion is “evidence-based, not just a model guess.” A goal “should be complete only after the objective is checked against the relevant files, tests, logs, benchmark output, generated artifacts, or other concrete evidence. That is the design center: Codex can keep moving, but the evidence decides.”
The most striking real-world example of why verification matters came from Hodges’s own test of /goal. He gave Codex a read-only font-matching task that required uploading image crops to external font-identification sites. Inside the sandbox, the browser tool was not exposed, no browsers were installed, and outbound DNS was blocked. A naive system might have hallucinated plausible-looking font matches.
Instead, Codex tried multiple fallbacks, failed honestly, and “wrote the final ranked report stating honestly that no external matcher upload was performed, rather than inventing candidate names.” As Hodges emphasized: “/goal did not fabricate matcher results when it couldn’t get them.” That behavior is only possible because the loop was built around verifiable evidence and an explicit stop signal, not a vague instruction.
Anthropic formalizes this idea as the evaluator-optimizer pattern: “one LLM call generates a response while another provides evaluation and feedback in a loop.” The company recommends it “when we have clear evaluation criteria and when iterative refinement provides measurable value” — for example, literary translation where an evaluator LLM catches nuances the first draft missed, or complex search where an evaluator decides whether more searching is warranted.
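The pattern fits in one function. A minimal sketch, assuming `generate` and `evaluate` would wrap two separate model calls in practice; here they are toy stand-ins so the control flow is visible, and using `None` to mean "passes" is a convention chosen for this example.

```python
from typing import Callable, Optional, Tuple


def evaluator_optimizer(task: str,
                        generate: Callable[[str, Optional[str]], str],  # hypothetical generator model
                        evaluate: Callable[[str], Optional[str]],       # hypothetical evaluator model
                        max_rounds: int = 3) -> Tuple[str, bool]:
    """Generate, critique, regenerate. Returns (last draft, whether it passed)."""
    feedback: Optional[str] = None
    draft = ""
    for _ in range(max_rounds):
        draft = generate(task, feedback)   # generator sees the previous critique
        feedback = evaluate(draft)         # None means the criteria are met
        if feedback is None:
            return draft, True
    return draft, False                    # out of rounds: report it, don't pretend


def toy_generate(task: str, feedback: Optional[str]) -> str:
    return task if feedback is None else task + " Saves two hours a week."


def toy_evaluate(draft: str) -> Optional[str]:
    return None if "hours" in draft else "Add a concrete, measurable benefit."


print(evaluator_optimizer("A calendar app for freelancers.", toy_generate, toy_evaluate))
# ('A calendar app for freelancers. Saves two hours a week.', True)
```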
The practical takeaway: if you cannot describe how the loop verifies its own output, you do not yet have a loop you can trust. You have an optimistic prompt running on autopilot.

Codex /goal: A Loop You Can Actually Configure
OpenAI’s Codex Goal mode is one of the clearest public examples of “loop, not prompt” thinking shipped as a real feature.
Goals are persistent objectives that “keep a thread working toward a defined outcome across turns,” giving Codex “a completion condition: what should be true, how success should be checked, and what constraints must stay intact,” per the cookbook. The mental model OpenAI uses is worth memorizing:
- Prompt: ask → work → result → wait
- Goal: work → check → continue or complete
Goals became available in Codex 0.128.0 and, at least early on, were gated behind a feature flag. According to OpenAI’s Follow a goal docs, if /goal does not appear, you enable it by adding goals = true under a [features] table in config.toml, or by running codex features enable goals. The lifecycle commands are deliberately small:
- /goal <objective>: set a goal
- /goal: check the current goal
- /goal pause: pause an active goal
- /goal resume: resume a paused goal
- /goal clear: remove the current goal
A word of caution that is easy to miss: because Goal mode is new, a wave of SEO content has invented commands that do not exist. Hodges’s review specifically calls out fabricated subcommands like /goal status and /goal budget, and a non-existent “Codex Managed Outcomes” product. To check status, you type /goal with no arguments. The honest reference point is the official docs at developers.openai.com, not third-party summaries — which is, fittingly, exactly the kind of source-verification discipline a good loop is supposed to enforce.
Under the hood, the persistence is real engineering, not a prompt trick. A community analysis of the open-source implementation found that Codex stores goal state in a thread_goals table with fields for the objective, status (active, paused, budget_limited, complete), a token budget, and tokens used, as documented in this GitHub gist. That is why the runtime can survive interruptions, resume across sessions, and stop on a budget — behavior a stateless prompt could never provide. Kingy AI’s own breakdown of the /goal release summarizes the design as “persisted /goal workflows with app-server APIs, model tools, runtime continuation, and TUI controls.”
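A rough sketch of why persisted state makes budgets and pause/resume possible. It assumes only the fields listed above (objective, status, token budget, tokens used); the class, method names, and transition rules are illustrative guesses, not Codex's code. In a real runtime this record would be written to storage such as the thread_goals table so it outlives any single process, and /goal clear would simply delete it. The numbers are made up.

```python
from dataclasses import dataclass
from typing import Optional

STATUSES = ("active", "paused", "budget_limited", "complete")


@dataclass
class ThreadGoal:
    """Hypothetical in-memory mirror of the fields described for thread_goals."""
    objective: str
    token_budget: Optional[int] = None   # None means no budget
    tokens_used: int = 0
    status: str = "active"

    def __post_init__(self) -> None:
        if self.status not in STATUSES:
            raise ValueError(f"unknown status: {self.status}")

    def record_usage(self, tokens: int) -> None:
        """Charge tokens to an active goal; reaching the budget halts it."""
        if self.status != "active":
            raise RuntimeError(f"goal is {self.status}, not active")
        self.tokens_used += tokens
        if self.token_budget is not None and self.tokens_used >= self.token_budget:
            self.status = "budget_limited"

    def pause(self) -> None:    # analogous to /goal pause
        if self.status == "active":
            self.status = "paused"

    def resume(self) -> None:   # analogous to /goal resume
        if self.status == "paused":
            self.status = "active"


goal = ThreadGoal("Raise src/auth coverage to 75%", token_budget=1_000)
goal.record_usage(600)
goal.pause()
goal.resume()               # usage so far survives because it lives in the record
goal.record_usage(500)
print(goal.status, goal.tokens_used)  # budget_limited 1100
```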
What /goal is good at, and what it is not
OpenAI is explicit that a good goal is “bigger than one prompt but smaller than an open-ended backlog.” It works for code migrations, large refactors, deployment retry loops, test-coverage expansion, performance tuning with a stop threshold, and prototypes the agent can keep improving. It does not work for “a loose list of unrelated work,” exploratory design, architecture decisions, or anything requiring taste and stakeholder judgment.
A useful set of community case studies, compiled by Chier Hu in a Medium write-up on Codex Goals, illustrates both the power and the limits. One engineer reportedly told Codex to “ship the 18 features in BACKLOG.md” and returned to find 14 of 18 implemented and passing CI. Another developer, Yannik Zuehlke, closed his laptop during a coding task, came back roughly five and a half hours later, and found four target end-to-end tests passing, with Codex reporting status TASK_COMPLETE. A third user converted an academic paper from NeurIPS to ICML format by handing Codex a ~200-point formatting checklist, turning a fuzzy “format this paper” task into a binary “all 200 rules satisfied” loop.
The lesson across all of these is not “the agent is magic.” It is the opposite. As Hu’s analysis stresses, “Goals do not replace human specification. The work is in defining ‘done.'” Zuehlke reportedly ran two separate AI “interviews” to refine the specification before launching the goal. The agent can run for hours unattended, but it cannot decide what “done” means on its own — and a vague goal will either quit early or wander indefinitely while quietly burning quota.
Claude Code: The Loop as a Product
Where Codex /goal exposes the loop as a configurable feature, Claude Code makes the loop its entire identity.
Anthropic describes Claude Code as an “agentic harness” around the model: it “provides the tools, context management, and execution environment that turn a language model into a capable coding agent.” When you say “fix the failing tests,” the documented loop looks like this:
- Run the test suite to see what’s failing
- Read the error output
- Search for the relevant source files
- Read those files to understand the code
- Edit the files to fix the issue
- Run the tests again to verify
“Each tool use gives Claude new information that informs the next step. This is the agentic loop in action,” per Anthropic’s docs.
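The six steps reduce to a run, read, edit, re-run cycle. A minimal sketch, assuming `propose_fix` is your own hypothetical wrapper around a model that reads the failure output and edits files; the test command is whatever your project uses.

```python
import subprocess
from typing import Callable, List


def fix_until_green(test_cmd: List[str],
                    propose_fix: Callable[[str], None],   # hypothetical: model edits files
                    max_fixes: int = 5) -> bool:
    """Run tests; on failure pass the output to propose_fix and re-run.

    Every fix is followed by a fresh test run, so success is decided by the
    test runner's exit code, never by the model's own claim.
    """
    for attempt in range(max_fixes + 1):
        run = subprocess.run(test_cmd, capture_output=True, text=True)
        if run.returncode == 0:
            return True
        if attempt == max_fixes:
            break                                   # budget spent
        propose_fix(run.stdout + run.stderr)        # read errors, search, edit
    return False                                    # escalate to a human
```

In a JavaScript project the call might look like `fix_until_green(["npm", "test"], agent_fix)`, with `agent_fix` standing in for that hypothetical wrapper.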
The Rubric Labs engineering breakdown describes the same mechanic from first principles as a “tight transaction loop: decide what to do next, do it, then use the result as new evidence.” Their summary of the whole cycle is elegant: “context → decide → tool → result → context, repeated until the evidence is strong enough.” They also place Claude Code on what Andrej Karpathy calls the “autonomy slider” — the industry’s move from autocomplete, to chat copilots, to longer-running agents that “can own larger parts of the loop.”
Several design choices make Claude Code’s loop safer and more durable than a raw chat session:
- Context management. Claude Code stores sessions locally as JSONL files, supports resuming and forking, and automatically compacts older content when the context window fills. Persistent rules belong in CLAUDE.md rather than fragile conversation history.
- Subagents. A Task spins up a separate agent instance with its own context window that does one scoped unit of work and returns a condensed result. This keeps deep exploration from polluting the main thread and enables parallelism.
- Checkpoints and permissions. Every file edit is reversible via snapshots, and permission modes range from asking before everything (Default), to read-only Plan mode, to fully autonomous operation for CI.
Anthropic’s own product page frames the human’s role precisely: “The developer sets the objective and retains control over what gets committed,” while execution is handed to the agent. That is a loop with a human at the boundary, which is exactly where the human belongs.
The Loop Patterns Worth Knowing
Once you accept that loops are the unit of work, the next question is which shape of loop fits your task. Anthropic’s Building Effective Agents guide catalogs a small set of composable patterns that cover most real workloads. You do not need a heavy framework to use them; most are achievable with a few lines of code around a model API.
- Prompt chaining. Decompose a task into a fixed sequence of steps, each processing the previous output. Good when the task cleanly splits into subtasks — for example, write an outline, check it against criteria, then write the document.
- Routing. Classify the input, then send it to a specialized follow-up. Useful when distinct categories (general questions, refunds, technical support) each deserve their own prompt and tools.
- Parallelization. Run independent subtasks at once (sectioning), or run the same task several times and aggregate (voting). Good for speed or for higher-confidence results.
- Orchestrator-workers. A central LLM dynamically breaks a task into subtasks, delegates them, and synthesizes results. Anthropic notes its own coding agents use this to “handle GitHub issues across multiple files without predefined subtasks.”
- Evaluator-optimizer. One model generates, another critiques, and the loop repeats until quality criteria are met.
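Two of these patterns are small enough to show directly. A minimal sketch: `vote` implements the voting flavor of parallelization and `route` implements routing, assuming the `ask` and classifier functions would wrap model calls in practice (here they are toy stand-ins). Falling back to a general handler for unknown categories is a design choice for this example.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Tuple


def vote(task: str, ask: Callable[[str], str], n: int = 5) -> Tuple[str, float]:
    """Parallelization (voting): run the same task n times, return majority answer and its share."""
    with ThreadPoolExecutor(max_workers=n) as pool:
        answers = list(pool.map(ask, [task] * n))
    winner, count = Counter(answers).most_common(1)[0]
    return winner, count / n


def route(ticket: str,
          classify: Callable[[str], str],
          handlers: Dict[str, Callable[[str], str]]) -> str:
    """Routing: classify the input, then hand it to a specialized handler."""
    category = classify(ticket)
    # Unknown categories fall back to the general handler (a choice for this sketch)
    return handlers.get(category, handlers["general"])(ticket)


def toy_classify(ticket: str) -> str:
    """Stand-in for a model-based classifier."""
    return "refund" if "refund" in ticket.lower() else "unknown"


handlers = {
    "refund": lambda t: "refund workflow",
    "technical": lambda t: "technical support workflow",
    "general": lambda t: "general answer",
}
print(route("I want a refund for my order", toy_classify, handlers))  # refund workflow
print(vote("Is 17 prime? Answer yes or no.", lambda t: "yes"))         # ('yes', 1.0)
```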
A recurring theme in Anthropic’s guidance, quoted by engineers Erik Schluntz and Barry Zhang, is restraint: “Success in the LLM space isn’t about building the most sophisticated system. It’s about building the right system for your needs.” Start with the simplest thing — often a single well-augmented LLM call — and only add loop complexity when simpler solutions demonstrably fall short. As LangChain CEO Harrison Chase frames the dividing line, “If the LLM can change your application’s control flow, it’s an agent. If the flow is fixed by your code, it’s not.”
Where Loops Shine: Practical Use Cases
Loops are not just for coding, though coding is where they matured fastest. Any task with a clear finish line and a way to check progress is a candidate.
- Bug fixing and flaky tests. A loop can run the suite, read failures, patch, and re-run until green. This is the canonical Claude Code workflow and a prime Codex /goal use case.
- Refactors and migrations. “Migrate this codebase from JavaScript to TypeScript with strict-mode compilation and no any types” gives the loop a binary success condition it can grind toward across many files.
- Performance tuning. Set a numeric target and a benchmark, and let the loop profile, change, and re-measure. The stop condition is the threshold.
- Research and analysis. A loop can decompose a question into claims, gather evidence for each, and explicitly separate confirmed findings from blocked ones — preserving an audit trail of what is certain versus uncertain.
- SEO and content workflows. Improve a page until it passes a checklist (audience clarity, supported claims, readability, alignment to source), revising and re-checking each pass.
- Launch monitoring and operations. Inspect a tracker, verify official sources, classify by impact, draft a briefing, flag ambiguity, and route to a human for approval before anything publishes.
The common thread is the same one OpenAI emphasizes: define a clear outcome, use automated tests, benchmarks, or checklists as the “oracle,” and let the loop iterate. The user provides the spec; the loop provides the persistence.
The Risks: Runaway Loops and the Cost of Autonomy
A loop that can keep going is, by definition, a loop that can keep spending.
This is the central trade-off. The same persistence that lets Codex grind for hours on a hard problem can also burn through tokens, money, time, and trust if the goal is vague. Hodges is blunt about it: “/goal is experimental and persistent. It is not safer than a regular Codex run. A vague goal can burn weekly quota or produce broad off-target changes since Codex keeps going until it decides it is done.” His advice: run it read-only or on a scratch branch, define one measurable stop condition, and use /goal pause or /goal clear if it drifts.
The numbers from community reports make the stakes concrete. Self-reported figures shared on Reddit’s r/codex and relayed in Hodges’s review describe one trading-app refactor running 6.5 hours and burning roughly 20% of a $100-plan weekly quota, and a larger in-game economy migration running 18 hours. These are anecdotes, not benchmarks — but they show that an unattended loop is a budget decision, not just a technical one.
There is also the verification trap. A loop that appears to verify but actually checks the wrong thing is worse than no loop, because it produces confident, evidence-shaped failure. This is why the evaluator-optimizer pattern depends on “clear evaluation criteria.” If your oracle is weak, your loop will faithfully optimize toward the wrong target.
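A small illustration of an oracle that is evidence-shaped without being evidence. Both functions are hypothetical and read a test-runner summary string; the "N passed, M failed" format is an assumption, so adapt the parsing to your runner's actual output, or prefer its exit code or a machine-readable report.

```python
import re


def weak_oracle(summary: str) -> bool:
    # Looks like verification, but "0 passed, 3 failed" also contains "passed"
    return "passed" in summary


def strong_oracle(summary: str) -> bool:
    """Require at least one passing test and zero failures, parsed from the summary."""
    passed = re.search(r"(\d+) passed", summary)
    failed = re.search(r"(\d+) failed", summary)
    n_passed = int(passed.group(1)) if passed else 0
    n_failed = int(failed.group(1)) if failed else 0
    return n_passed > 0 and n_failed == 0


print(weak_oracle("0 passed, 3 failed"), strong_oracle("0 passed, 3 failed"))  # True False
```

Requiring at least one passing test also catches the quiet failure where no tests ran at all.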
Enterprise guidance converges on a consistent set of guardrails. The patterns documented across Anthropic’s work and summarized in industry write-ups like AIMultiple’s analysis include:
- Stop conditions. Cap iterations (maxSteps), wall-clock time, or token budgets so the loop cannot run forever.
- Human checkpoints. Pause for approval before irreversible actions such as financial transactions, deletions, and production deploys.
- Escalation over guessing. When the loop hits something it cannot resolve, it should hand off to a human rather than fabricate a result.
- Observability. Log every tool call and decision point. If the loop fails, you need to trace exactly where it went wrong; “black box” autonomy is unacceptable for serious work.
- Scoped permissions. Give the loop the minimum access it needs, ideally in a sandbox or on a scratch branch.
The goal is to match the level of oversight to the risk of the task. Low-risk loops can run autonomously. High-stakes loops should require a human at the boundary.
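These guardrails can live in one object that the loop consults before and after every action. A minimal sketch; the class, its default limits, and the action names treated as irreversible are assumptions for illustration, with `max_steps` playing the role of the maxSteps cap mentioned above.

```python
import time
from dataclasses import dataclass, field


class BudgetExceeded(Exception):
    """A stop condition fired: the loop must end or escalate."""


class NeedsApproval(Exception):
    """An irreversible action needs a human checkpoint first."""


@dataclass
class Guardrails:
    max_steps: int = 20
    max_seconds: float = 600.0
    max_tokens: int = 200_000
    irreversible: frozenset = frozenset({"deploy", "delete", "purchase"})  # assumed names
    steps: int = 0
    tokens: int = 0
    started: float = field(default_factory=time.monotonic)
    log: list = field(default_factory=list)

    def before_action(self, action: str, approved: bool = False) -> None:
        """Call before every tool use; raises instead of letting the loop run on."""
        self.steps += 1
        self.log.append((self.steps, action))   # observability: trace every attempt
        if self.steps > self.max_steps:
            raise BudgetExceeded(f"step cap of {self.max_steps} reached")
        if time.monotonic() - self.started > self.max_seconds:
            raise BudgetExceeded("wall-clock cap reached")
        if action in self.irreversible and not approved:
            raise NeedsApproval(f"'{action}' requires a human checkpoint")

    def after_action(self, tokens_used: int) -> None:
        """Charge the tokens a step consumed against the budget."""
        self.tokens += tokens_used
        if self.tokens > self.max_tokens:
            raise BudgetExceeded(f"token budget of {self.max_tokens} exceeded")


rails = Guardrails(max_steps=3)
for action in ["read_file", "run_tests", "deploy"]:
    try:
        rails.before_action(action)
    except (BudgetExceeded, NeedsApproval) as reason:
        print("escalate:", reason)   # escalate: 'deploy' requires a human checkpoint
        break
```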
How to Start Designing Loops Instead of Prompts
You do not need Codex or Claude Code to think in loops. You can build the habit in any chat interface today. The shift is mostly mental.
1. Write the stop condition first. Before you describe the task, finish this sentence: “This is done when ___.” If you cannot, the task is exploratory and a normal prompt is the right tool. Hoping for a good answer is fine for a prompt, because you read the result before anything happens. It is not fine for an autonomous loop that keeps acting on its own output.
2. Name the verification surface. What evidence proves success? A passing test, a compiled build, a checklist, a metric threshold, a matching screenshot. If the only “verification” is your own gut feeling reading the output, you have a prompt, not a loop.
3. Constrain the action space. Specify what the model may and may not touch. “Only edit src/auth and its tests.” “Do not auto-download or purchase anything.” Constraints make the loop safe to run unattended.
4. Bound the budget. Set a maximum number of attempts, a time cap, or a token budget. “Stop after three revisions or when the checklist passes.”
5. Decide the escalation rule. What should trigger a pause and a handoff to you? A blocker, an ambiguous requirement, a needed paid action.
Concretely, compare the same task expressed two ways:
- Prompt: “Improve the auth code.”
- Loop: “Raise src/auth test coverage from 38% to 75%. Only edit src/auth and its tests. Stop when npm test passes and the coverage threshold is met. If a fix would require changing the public API, pause and ask me first.”
The second version contains a goal, context, an action policy, verification, and a stop condition. It is something you could hand to Codex /goal, to Claude Code, or to a custom loop built on a model API — and walk away from with reasonable confidence.
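The verification half of that loop prompt can be made mechanical. A minimal sketch, assuming the test run writes an Istanbul-style coverage-summary.json (Jest can emit one through its `json-summary` coverage reporter) and that you pass in the exit code of npm test; the function names are hypothetical and the numbers below are toy data. Summing the per-file entries under src/auth, rather than reading the project-wide `total`, keeps the check scoped the way the prompt is.

```python
import json


def scoped_line_coverage(summary: dict, prefix: str = "src/auth/") -> float:
    """Line coverage (%) summed over per-file entries whose path contains `prefix`."""
    covered = total = 0
    for path, metrics in summary.items():
        if path == "total" or prefix not in path.replace("\\", "/"):
            continue
        covered += metrics["lines"]["covered"]
        total += metrics["lines"]["total"]
    return 100.0 * covered / total if total else 0.0


def done(tests_exit_code: int, summary: dict, target_pct: float = 75.0) -> bool:
    """Stop condition from the prompt: tests pass AND scoped coverage meets the target."""
    return tests_exit_code == 0 and scoped_line_coverage(summary) >= target_pct


summary = json.loads("""{
  "total": {"lines": {"total": 400, "covered": 300}},
  "/repo/src/auth/login.ts": {"lines": {"total": 80, "covered": 64}},
  "/repo/src/auth/token.ts": {"lines": {"total": 20, "covered": 12}},
  "/repo/src/ui/app.ts": {"lines": {"total": 300, "covered": 224}}
}""")
print(scoped_line_coverage(summary), done(0, summary))  # 76.0 True
```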
The Bigger Picture: A New Operator Skill
The move from prompts to loops mirrors a larger shift in how the most effective people use AI.
Karpathy’s “autonomy slider” captures the trajectory: autocomplete suggested your next line, copilots helped you draft while you ran and verified everything, and agents now own larger stretches of the gather-act-verify cycle themselves. As you slide toward more autonomy, the human’s job changes. You spend less time generating each step and more time specifying outcomes, designing verification, and setting boundaries.
That is why the durable skill is not prompt-craft alone. It is loop design. The people getting the most out of Codex, Claude Code, and LLM agents are not the ones with the cleverest single prompts. They are the ones who can decompose a goal, define what “done” means in checkable terms, choose the right loop pattern, and put guardrails around autonomy.
Anthropic’s own conclusion, after cataloging every fancy pattern, is almost anticlimactic — and exactly right: “the most successful implementations use simple, composable patterns rather than complex frameworks.” A loop is a simple pattern. Its power comes from discipline, not complexity.
Prompts ask for an answer. Loops create a repeatable way to keep working until a defined result is reached. The first is a question. The second is a system. And as AI tools keep climbing the autonomy slider, the people who can build systems — not just ask questions — are the ones who will get the most real work done.
Sources and Further Reading
- OpenAI Codex — Follow a goal (use case)
- OpenAI Cookbook — Using Goals in Codex
- Anthropic — How Claude Code works
- Anthropic — Claude Code product page
- Anthropic — Building Effective Agents
- Rubric Labs — How does Claude Code actually work?
- J.D. Hodges — Codex /goal: How It Works, Setup, and What I Tested
- Chier Hu (Medium) — Using Goals in OpenAI Codex: Patterns and Case Studies
- GitHub gist (patleeman) — How Codex implements the /goal slash command
- Kingy AI — OpenAI Codex /goal: The New Long-Horizon Mode for Agentic Coding
- AIMultiple — Building AI Agents with Composable Patterns






