OpenAI pulls back the curtain on how its AI agents actually work, while researchers claim they’ve stopped writing code entirely—and a new tool aims to do for science what coding agents did for programming.

The artificial intelligence industry just got a rare glimpse behind the scenes. In an unusually transparent move, OpenAI has published detailed technical documentation explaining exactly how its Codex coding agent operates while simultaneously revealing that some of its own researchers have completely stopped writing code by hand. The timing couldn’t be more significant: as AI coding agents reach a new level of practical usefulness, OpenAI is positioning itself to replicate that success in scientific research with a brand-new tool called Prism.
The revelations come at a pivotal moment. AI coding assistants like Claude Code and OpenAI’s Codex have evolved from experimental curiosities into indispensable tools that can write code, run tests, and fix bugs with minimal human oversight. Now, OpenAI wants to bring that same transformation to the world of scientific research.
Inside the Agent Loop: How Codex Actually Works
On January 23, 2026, OpenAI engineer Michael Bolin published a comprehensive breakdown of the “agent loop,” the core logic that powers Codex CLI. This is the first in a series of posts that will explore various aspects of how Codex works, offering developers unprecedented insight into building AI agents.
At the heart of every AI agent lies something called the agent loop. It’s a repeating cycle that orchestrates interactions between the user, the AI model, and the tools the model invokes to perform meaningful work. The process starts when the agent takes input from a user and incorporates it into a set of textual instructions called a “prompt.” This prompt is then sent to the model for inference, the process of generating a response.
During inference, the text prompt gets converted into a sequence of input tokens (integers that index into the model’s vocabulary). These tokens are used to sample the model, producing a new sequence of output tokens that get translated back into text. Because tokens are produced incrementally, this translation can happen as the model runs, which is why many AI applications display streaming output.
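The text-to-integers round trip can be illustrated with a toy word-level vocabulary. Real models use subword tokenizers (such as BPE) with vocabularies of roughly 100,000 entries; this sketch only shows the shape of the encode/decode step, not any actual tokenizer.

```python
# Toy tokenizer: maps whitespace-separated words to vocabulary indices.
# Index 0 serves as the "unknown word" fallback.
VOCAB = ["<unk>", "run", "the", "tests", "and", "fix", "bugs"]
TOKEN_ID = {word: i for i, word in enumerate(VOCAB)}

def encode(text: str) -> list[int]:
    """Convert text into a sequence of input tokens (integers)."""
    return [TOKEN_ID.get(word, 0) for word in text.split()]

def decode(tokens: list[int]) -> str:
    """Translate output tokens back into text."""
    return " ".join(VOCAB[t] for t in tokens)

ids = encode("run the tests")
print(ids)          # [1, 2, 3]
print(decode(ids))  # run the tests
```

Because a real model emits output tokens one at a time, `decode` can be applied incrementally as tokens arrive, which is what makes streaming output possible.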
The model’s response takes one of two forms: either it produces a final answer for the user, or it requests a “tool call” that the agent must perform (like running a shell command or reading a file). When a tool call is requested, the agent executes it and appends the output to the original prompt. This updated prompt is used to query the model again, and the process repeats until the model stops requesting tools and instead produces an assistant message for the user.
This journey from user input to agent response is called one “turn” of a conversation. Each turn can include many iterations between model inference and tool calls. Every time you send a new message to an existing conversation, the entire conversation history gets included as part of the prompt for the new turn.
As Ars Technica reported, this means the prompt grows with every interaction, which has significant performance implications. The ever-growing prompt is directly related to the context window: the maximum number of tokens an AI model can process in a single inference call.
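The loop described above can be sketched in a few lines. Here `sample_model` and `run_tool` are hypothetical stand-ins for a real inference call and a real tool executor; the alternation between inference and tool calls is the point, not the message schema.

```python
# Minimal sketch of one "turn" of an agent loop, as described above.
# The prompt grows as tool calls and their outputs are appended.

def agent_turn(prompt: list[dict], sample_model, run_tool) -> list[dict]:
    """Alternate model inference and tool calls until the model
    produces a final assistant message. Returns the grown prompt."""
    while True:
        response = sample_model(prompt)           # one inference call
        prompt.append(response)
        if response["type"] == "assistant_message":
            return prompt                         # final answer for the user
        # Otherwise the model requested a tool call: execute it and
        # append the output to the prompt, then query the model again.
        output = run_tool(response["name"], response["args"])
        prompt.append({"type": "tool_output", "output": output})

# Demo with a scripted fake model: one tool call, then a final answer.
script = iter([
    {"type": "tool_call", "name": "shell", "args": "ls"},
    {"type": "assistant_message", "text": "done"},
])
history = agent_turn(
    [{"type": "user_message", "text": "list files"}],
    sample_model=lambda p: next(script),
    run_tool=lambda name, args: "README.md",
)
print(len(history))  # 4: user message, tool call, tool output, answer
```

On the next user message, this entire `history` would become the starting prompt for the following turn, which is exactly why the prompt keeps growing.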
The End of Hand-Written Code?
The technical revelations arrived alongside a bombshell announcement from within OpenAI itself. An OpenAI researcher known as “roon” publicly declared that Codex has officially taken over 100% of their code-writing work. When asked what percentage of coding work is based on the OpenAI model, roon responded bluntly: “100%. I don’t write code anymore.”
The statement carried extra weight because Sam Altman, OpenAI’s CEO, had previously posted that “roon is my alt account,” suggesting this wasn’t just any researcher but someone close to the company’s leadership.
Roon spoke about the transition in strikingly emotional terms: “Programming has always been painful, yet it was an inevitable path. I’m glad it’s finally over. I’m surprised how quickly I stepped out of programming’s shadow, and I don’t miss it at all. If anything, I regret that computers couldn’t do this sooner.”
According to reports from 36kr, another OpenAI researcher revealed that, with Codex’s assistance, they built OpenAI’s MCP server from scratch and completed scale testing in just three days, then shipped an Android app for Sora within three weeks. A large number of internal tools built, and even self-audited, by Codex are queued to go live.
The researcher also described their workflow: they spend considerable time writing specifications and visualizing what the output should look like, then launch a “4×Codex” batch of concurrent cloud tasks, which lets them compare multiple variations at once while filling in details they initially missed. After Codex finishes, humans step in for testing and verification.
This echoes a similar revelation from December 2025, when Boris Cherny, the creator of Claude Code, announced that 100% of his contributions to Claude Code were now completed by Claude Code itself, a “recursive” self-improvement that set off a frenzy of automated coding across Silicon Valley.
Prism: Bringing AI Agents to Scientific Research

While engineers were celebrating the death of hand-written code, OpenAI was preparing its next move: applying the same agent-based approach to scientific research. On January 27, 2026, the company released Prism, a free tool that embeds ChatGPT in a text editor designed specifically for writing scientific papers.
Prism builds on Crixet, a cloud-based LaTeX platform that OpenAI acquired. LaTeX is a typesetting system for formatting scientific documents and journals that nearly the entire scientific community relies on. While powerful, LaTeX can make some tasks, such as drawing diagrams with TikZ commands, extremely time-consuming.
Prism incorporates GPT-5.2, OpenAI’s best model yet for mathematical and scientific problem-solving, into an editor for writing documents in LaTeX. A ChatGPT chat box sits at the bottom of the screen, below a view of the article being written. Scientists can call on ChatGPT for anything they need: drafting text, summarizing related articles, managing citations, turning photos of whiteboard scribbles into equations or diagrams, or talking through hypotheses or mathematical proofs.
As Engadget reported, in a press demo, an OpenAI employee used Prism to find and incorporate scientific literature relevant to a paper they were working on, with GPT-5.2 automating the process of writing the bibliography. The employee also used Prism to generate a lesson plan for a graduate course on general relativity and a set of problems for students to solve.
Prism is available to anyone with a personal ChatGPT account and includes support for unlimited projects and collaborators. OpenAI plans to bring the software to organizations on ChatGPT Business, Team, Enterprise, and Education plans soon.
2026: The Year AI Transforms Science
Kevin Weil, head of OpenAI for Science, is making bold predictions. “I think 2026 will be for AI and science what 2025 was for AI in software engineering,” he said at a press briefing. At the beginning of 2025, using AI to write code made you an early adopter. Twelve months later, not using AI means falling behind. Weil believes the same shift is coming for scientists: “I think that in a year, if you’re a scientist and you’re not heavily using AI, you’ll be missing an opportunity to increase the quality and pace of your thinking.”
According to The Decoder, OpenAI claims that around 1.3 million scientists worldwide submit more than 8 million queries per week to ChatGPT on advanced topics in science and math. “That tells us that AI is moving from curiosity to core workflow for scientists,” Weil said.
Weil sees the model’s strengths primarily in making connections. “GPT-5.2 has read substantially every paper written in the last 30 years,” he explained. “And it understands not just the field that a particular scientist is working in; it can bring together analogies from other, unrelated fields.”
That’s powerful, Weil argues. “You can always find a human collaborator in an adjacent field, but it’s difficult to find, you know, a thousand collaborators in all thousand adjacent fields that might matter.” Additionally, the model works at night and can handle ten queries in parallel “which is kind of awkward to do to a human.”
However, Weil is careful to temper expectations after OpenAI executives had to delete posts in October falsely claiming GPT-5 solved unsolved math problems. Mathematicians quickly corrected them: GPT-5 had actually dug up existing solutions in old research papers. Now Weil strikes a more humble tone, emphasizing the model works best as a sparring partner rather than an oracle.
“I don’t think models are there yet” for making groundbreaking new discoveries, Weil admitted. “Maybe they’ll get there. I’m optimistic.” But that’s not the mission anyway: “Our mission is to accelerate science. And I don’t think the bar for the acceleration of science is, like, Einstein-level reimagining of an entire field.”
The Technical Architecture Behind the Magic
Bolin’s technical post reveals fascinating details about how Codex constructs and manages its context when querying the model. The Codex CLI sends HTTP requests to the Responses API to run model inference. The endpoint is configurable, so it can be used with any endpoint that implements the Responses API, including ChatGPT, OpenAI’s hosted models, or even local models running through ollama or LM Studio.
When building the initial prompt, Codex doesn’t simply pass the user’s words directly to the model. Instead, it actively “stitches” together a well-designed prompt structure that covers instructions from multiple roles. Each item in the prompt is associated with a role that determines its priority: system, developer, user, or assistant (in decreasing order of priority).
The Responses API takes a JSON payload with three core parameters: instructions (system or developer messages inserted into the model’s context), tools (a list of tools the model may call while generating a response), and input (a list of text, image, or file inputs to the model).
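A request along those lines might be assembled as below. The three top-level parameters (`instructions`, `tools`, `input`) come from the article; the model name and the exact tool schema are simplified illustrations, not the API’s authoritative shape, so consult the real API reference before relying on them.

```python
import json

# Sketch of a Responses API payload with the three core parameters
# described above. Field details are illustrative, not exhaustive.
payload = {
    "model": "gpt-5.2-codex",  # hypothetical model name for illustration
    "instructions": "You are a coding agent. Prefer small, testable changes.",
    "tools": [
        {
            "type": "function",
            "name": "shell",
            "description": "Run a shell command in the workspace",
            "parameters": {
                "type": "object",
                "properties": {"command": {"type": "string"}},
                "required": ["command"],
            },
        }
    ],
    "input": [
        {"role": "user", "content": "Run the test suite and fix any failures."}
    ],
}
body = json.dumps(payload)  # serialized JSON body of the HTTP request
```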
One of the most interesting revelations concerns performance optimization. Because Codex doesn’t use an optional “previous_response_id” parameter, every request is fully stateless, meaning it sends the entire conversation history with each API call rather than the server retrieving it from memory. This design choice simplifies things for API providers and makes it easier to support customers who opt into “Zero Data Retention,” where OpenAI doesn’t store user data.
While this approach means the agent loop is quadratic in terms of the amount of JSON sent to the Responses API over the course of a conversation, prompt caching mitigates this issue. Cache hits are only possible for exact prefix matches within a prompt, which is why Codex carefully structures prompts so that the old prompt is an exact prefix of the new prompt. When cache hits occur, sampling the model is linear rather than quadratic.
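The exact-prefix property is easy to demonstrate: if the conversation is serialized append-only, each request’s payload begins byte-for-byte with the previous request’s payload, so the server can reuse the cached prefix. This is a general illustration of the idea, not Codex’s actual serialization code.

```python
import json

def serialize(items: list[dict]) -> str:
    # Newline-delimited JSON: appending items only ever extends the
    # string, so older serializations remain exact prefixes.
    return "".join(json.dumps(item) + "\n" for item in items)

prompt = [{"role": "user", "content": "fix the bug"}]
first = serialize(prompt)  # payload of the first request

prompt.append({"role": "assistant", "content": "patched foo.py"})
prompt.append({"role": "user", "content": "now add a test"})
second = serialize(prompt)  # payload of the next request

print(second.startswith(first))  # True: the cached prefix can be reused
```

Conversely, mutating any earlier item (or reordering tools mid-conversation) would change the prefix and force the cache miss the article describes.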
According to the technical documentation, several operations can cause cache misses in Codex: changing the tools available to the model mid-conversation, changing the target model, or modifying the sandbox configuration, approval mode, or current working directory. The Codex team must be diligent when introducing new features to avoid compromising prompt caching.
To avoid running out of context window space, Codex automatically compacts conversations when the number of tokens exceeds a threshold. The Responses API has evolved to support a special /responses/compact endpoint that performs compaction more efficiently, returning a list of items that can replace the original input to continue the conversation while freeing up context window space.
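The trigger logic can be sketched as follows. Codex delegates the actual summarization to the server-side /responses/compact endpoint; here `compact` is a hypothetical local stand-in that keeps the leading instructions and the most recent items and replaces the middle with a summary placeholder, and the character-based `token_count` is a crude proxy for real token counting.

```python
# Hypothetical sketch of threshold-triggered conversation compaction.

def token_count(items: list[dict]) -> int:
    # Crude proxy: real systems count model tokens, not characters.
    return sum(len(str(item)) for item in items)

def compact(items: list[dict], keep_tail: int = 2) -> list[dict]:
    """Replace the middle of the conversation with a summary item,
    keeping the first item (instructions) and the last few items."""
    summary = {"role": "system", "content": "[summary of earlier conversation]"}
    return [items[0], summary] + items[-keep_tail:]

def maybe_compact(items: list[dict], threshold: int = 200) -> list[dict]:
    return compact(items) if token_count(items) > threshold else items

history = [{"role": "user", "content": "step %d" % i} for i in range(20)]
compacted = maybe_compact(history)
print(len(compacted))  # 4: first item, summary, last two items
```

The returned list can then replace the original input on subsequent requests, freeing context window space at the cost of one guaranteed cache miss.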
The Road Ahead

The convergence of these announcements (technical transparency about Codex, researchers abandoning hand-written code, and the launch of Prism) signals a significant shift in how OpenAI views AI agents. Rather than keeping their methods proprietary, the company is sharing implementation details, perhaps recognizing that the real competitive advantage lies not in secrecy but in execution and continuous improvement.
The company’s approach to scientific AI also reflects lessons learned from the coding agent space. Rather than promising a fully automated AI scientist that makes stunning new discoveries, OpenAI is focusing on practical tools that accelerate existing workflows. As Weil told MIT Technology Review: “I think more powerfully and with 100% probability there’s going to be 10,000 advances in science that maybe wouldn’t have happened or wouldn’t have happened as quickly, and AI will have been a contributor to that. It won’t be this shining beacon it will just be an incremental, compounding acceleration.”
OpenAI is also working on giving GPT-5 what Weil calls “epistemological humility”: teaching the model to present ideas as suggestions rather than definitive answers. “That’s actually something that we are spending a bunch of time on,” Weil said. “Trying to make sure that the model has some sort of epistemological humility.”
The company aims to develop an autonomous research agent by 2028, but for now, the focus is on augmenting human capabilities rather than replacing them. Whether this measured approach will satisfy critics who worry about AI-generated “slop” flooding scientific literature remains to be seen. But one thing is clear: the way we write code, and potentially the way we do science, is changing faster than most people anticipated.
As conversations in Silicon Valley increasingly turn to what comes after coding agents, OpenAI’s bet is that the answer lies in applying the same agent-based architecture to new domains. If Prism succeeds in making scientists as productive as Codex has made programmers, 2026 may indeed be remembered as the year AI transformed scientific research.
Sources
- Unrolling the Codex agent loop – OpenAI
- OpenAI spills technical details about how its AI coding agent works – Ars Technica
- Goodbye, human programmers. OpenAI reveals: No more writing a single line of code – 36kr
- Altman’s alt account leaked: OpenAI entrusts 100% of code work to Codex – 36kr
- OpenAI’s latest product lets you vibe code science – MIT Technology Review
- OpenAI releases Prism, a Claude Code-like app for scientific research – Engadget
- OpenAI manager Weil: 2026 will be for science what 2025 was for software engineering – The Decoder
- OpenAI Debuts New Tool for Scientists in Push for AI Discovery – Bloomberg Law