Appshots: Inside OpenAI Codex's New "Command-Command" Trick for macOS

On May 21, 2026, OpenAI slipped a new feature into the Codex desktop app that may quietly change how many Mac developers feed context to an AI coding agent. It is called Appshots, and on the surface it sounds almost too simple to write 2,400 words about: press both Command keys, and the frontmost Mac app window is attached to a Codex thread as a screenshot plus text, with no copying, pasting, or describing required.

But the more you read OpenAI’s actual documentation, the 9to5Mac coverage, and the Apple platform primitives this thing is sitting on top of, the more interesting the design choices become. Appshots is not just “screenshot to chat.” It is OpenAI’s bet on a very particular pattern of human–AI handoff on the desktop, and it has real consequences for privacy, enterprise governance, and how the Codex app is positioning itself against Anthropic’s Claude computer use, GitHub Copilot, and Cursor.

Here is the full breakdown.

What OpenAI actually shipped

Appshots arrived as part of Codex app build 26.519, listed in the Codex changelog on May 21, 2026. The changelog entry is short and direct:

“Press both Command keys to send the frontmost app window to Codex with a screenshot and available text, so Codex can work from context in another app without you copying, pasting, or describing it manually.”

The official Codex app overview lists Appshots as a first-class app feature alongside Computer Use, the in-app browser, worktrees, and the Chrome extension. The dedicated Appshots documentation page — which is the canonical reference — adds a few important details:

Trigger: press both Command keys, or a custom hotkey you set in Codex preferences.
Capture: the frontmost window only, not the whole desktop or all windows.
Payload: an image of the visible window plus available text, including text the app exposes outside the visible scroll area.
Threading: by default an Appshot starts a new thread, but if you interacted with a thread in the last 60 seconds, the Appshot joins that recent thread. Consecutive Appshots stack into the same thread.
Storage: an Appshot behaves like a normal Codex attachment and is stored locally in the session file.
CLI: you can resume a thread that already has an Appshot in the Codex CLI, but the CLI cannot create new Appshots.
Permissions: macOS may prompt for Screen & System Audio Recording and Accessibility.

That last detail is more revealing than it looks. We will come back to it.

Why this is not “just OCR”

Most “send screen to AI” features in 2025 and early 2026 were variants of the same recipe: take a bitmap of a region, run OCR (or a vision model) on it, send the result to a chat. Appshots looks like that, but it is not quite that.

The giveaway is the permissions model. To work end-to-end, Appshots asks for both Screen Recording and Accessibility. Screen Recording on macOS is what you need to capture the bitmap; Accessibility is what you need to read structured UI text from another app — labels, button titles, focused-window contents, and crucially text outside the current viewport, when the app exposes it through Apple’s accessibility tree.

In other words, Appshots almost certainly uses a hybrid capture path:

A visible-window screenshot, likely via Apple’s modern ScreenCaptureKit stack, which Apple positions as the official replacement for older screen-capture APIs and supports per-window capture with configurable resolution and quality.
A structured-text pull via Accessibility APIs — built on primitives like NSWorkspace.frontmostApplication and kAXFocusedWindowAttribute — that lets Codex see content the user has not actually scrolled into view.

OpenAI’s docs effectively confirm this by saying the text payload “includes content beyond what’s visible onscreen.” 9to5Mac quotes OpenAI using almost the same wording: “Codex gets both a screenshot and text from the window, including content beyond what’s visible onscreen.”

That is a meaningfully different design from naive screenshot-OCR. It also has a meaningful consequence: Appshots works best with apps that have strong accessibility implementations. Well-instrumented AppKit/WebKit apps will surface a lot of text; custom-rendered surfaces (Electron with weak a11y, canvas-heavy editors, game engines, custom OpenGL UI) will degrade toward “screenshot only” understanding. OpenAI publishes no accuracy benchmark for either case.
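To make the hybrid idea concrete, here is a minimal Python sketch of the text half of such a capture. Everything in it is an assumption for illustration: AXNode, collect_text, and capture_mode are hypothetical names, the tree is a toy stand-in for the macOS accessibility tree, and nothing here reflects how Codex is actually implemented. It only shows why a well-instrumented window yields off-screen text while a canvas-style window falls back to the screenshot alone.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class AXNode:
    """Toy stand-in for one accessibility element (hypothetical shape)."""
    role: str
    text: str = ""
    on_screen: bool = True  # False means scrolled out of the viewport
    children: list[AXNode] = field(default_factory=list)


def collect_text(root: AXNode) -> tuple[list[str], list[str]]:
    """Walk the tree depth-first and return (visible_text, offscreen_text)."""
    visible: list[str] = []
    offscreen: list[str] = []
    stack = [root]
    while stack:
        node = stack.pop()
        if node.text:
            (visible if node.on_screen else offscreen).append(node.text)
        # Push children reversed so they are visited in document order.
        stack.extend(reversed(node.children))
    return visible, offscreen


def capture_mode(root: AXNode) -> str:
    """Return 'hybrid' if the tree exposes any text, else 'screenshot-only'."""
    visible, offscreen = collect_text(root)
    return "hybrid" if (visible or offscreen) else "screenshot-only"


# A mail-like window that exposes its text, including a reply below the fold.
mail_window = AXNode("window", children=[
    AXNode("staticText", "Subject: Q3 invoice"),
    AXNode("scrollArea", children=[
        AXNode("staticText", "Hi team,"),
        AXNode("staticText", "Earlier reply in thread", on_screen=False),
    ]),
])

# A custom-rendered window that exposes a single opaque canvas element.
canvas_window = AXNode("window", children=[AXNode("canvas")])
```

In this toy model, capture_mode(mail_window) reports a hybrid capture and the off-screen reply lands in the second list, while capture_mode(canvas_window) has nothing to work with beyond the bitmap.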

The user flow, in five steps

The actual user experience is intentionally boring, which is the point:

Open the Codex app on macOS.
Bring the app window you want to share into focus.
Press ⌘⌘ (or your custom hotkey).
Grant the macOS permission prompts the first time.
Ask Codex what to do with the captured context.

By default, the Appshot opens a brand-new thread. The 60-second “recent thread” rule is a small UX flourish that matters more than it looks: it lets you take three or four Appshots in a row — say, three states of a buggy UI — and have Codex reason about all of them in one conversation, without you having to manually route each capture to the right place.
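The routing rule is simple enough to sketch in a few lines of Python. This is an illustration under stated assumptions, not OpenAI’s code: Thread, AppshotRouter, and RECENT_WINDOW_SECONDS are hypothetical names, the 60-second boundary is treated as inclusive, and each capture is assumed to count as an interaction that refreshes the window (the docs only say consecutive Appshots stack into the same thread).

```python
from __future__ import annotations

from dataclasses import dataclass, field

RECENT_WINDOW_SECONDS = 60.0  # documented "recent thread" window


@dataclass
class Thread:
    thread_id: int
    last_interaction: float  # seconds on any monotonic clock
    attachments: list[str] = field(default_factory=list)


class AppshotRouter:
    """Decides whether a new Appshot joins a recent thread or starts one."""

    def __init__(self) -> None:
        self.threads: list[Thread] = []

    def route(self, appshot: str, now: float) -> Thread:
        # Find the thread the user touched most recently, if any.
        recent = max(self.threads, key=lambda t: t.last_interaction, default=None)
        if recent is None or now - recent.last_interaction > RECENT_WINDOW_SECONDS:
            # No thread within the window: start a new one.
            recent = Thread(thread_id=len(self.threads) + 1, last_interaction=now)
            self.threads.append(recent)
        recent.attachments.append(appshot)
        # Assumption: taking an Appshot counts as interacting with the thread.
        recent.last_interaction = now
        return recent
```

Under these assumptions, three captures taken at 0, 30, and 80 seconds all land in one thread, because each capture refreshes the window; a fourth capture at 200 seconds starts a new thread.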

There is no documented preview-before-send sheet, no built-in redaction pass, and no “detach Appshot” UI in the public docs. Once captured, an Appshot is treated like any other attachment in the Codex thread.

Where Appshots sits in OpenAI’s stack

The most useful way to understand Appshots is by what it is not. OpenAI now ships several overlapping “give Codex visual context” surfaces on macOS, and Appshots is one of the lightest of them.

Surface	What it does	How it differs from Appshots
Manual image upload	Drag/drop an image file into a prompt	File-based, no off-screen text, no hotkey
In-app browser	Lets Codex see and comment on local dev servers and public pages	Browser only, no signed-in flows
Computer Use	Lets Codex click and type in Mac apps	Action-oriented, riskier, geo-restricted (no EEA/UK/Switzerland at launch)
Chronicle	Continuous, ambient screen capture as memory	Persistent, Pro-only, stored as unencrypted local Markdown
Appshots	On-demand frontmost-window capture + text	Passive context, no actions, one-shot

Two things stand out. First, Appshots is passive: it gives Codex something to reason about; it does not move your mouse or send keystrokes. Second, it is on-demand: unlike Chronicle, which TNW’s in-depth report on Codex Chronicle notes “periodically captures screenshots” and writes summaries to disk, Appshots only fires when you press the hotkey.

If Chronicle was OpenAI’s bet on ambient desktop AI — and a controversial one, given that Chronicle’s screen captures get processed in the cloud and its local memory files are stored unencrypted — then Appshots is the deliberate, scoped, “user-in-the-loop” counterpoint.

The privacy story is more complicated than it looks

Press ⌘⌘. The frontmost window goes to Codex. It feels frictionless. That is exactly the problem.

The convenience is real, but it removes the natural pause where a user might crop a screenshot, hide a sidebar, or think twice. And because Appshots can include text outside the current viewport, the model can end up reading more than the user can currently see on screen — the rest of a long email thread, the next 200 rows of a spreadsheet, the bottom of a CRM record.

OpenAI’s mitigations for this today are mostly manual and policy-based:

The capture is user-invoked (no background firing).
It targets a single frontmost window, not the full desktop.
The Appshots docs explicitly warn users to avoid using Appshots on sensitive content unless the task requires it.

What is not publicly documented, as of the launch, is any Appshots-specific automatic redaction, PII detection, field masking, or pre-send privacy review. There is also no documented Appshots-only admin disable switch in the Codex managed-configuration material — even though Computer Use already has documented feature pins that admins can use.
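For a sense of what even a basic pre-send redaction pass over extracted window text could look like, here is a minimal Python sketch. It is purely illustrative: Appshots exposes no documented hook where such a pass could run, the redact function and PATTERNS table are hypothetical, and the two regular expressions are deliberately crude examples rather than a real PII detector.

```python
import re

# Hypothetical, deliberately simple patterns; real PII detection needs far more.
PATTERNS = {
    "EMAIL": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    # Token-like strings with a few common-looking prefixes (an assumption).
    "SECRET": re.compile(r"\b(?:sk|ghp|AKIA)[A-Za-z0-9_-]{16,}\b"),
}


def redact(text: str) -> tuple[str, dict[str, int]]:
    """Mask every pattern match and report how many of each were replaced."""
    counts: dict[str, int] = {}
    for label, pattern in PATTERNS.items():
        text, replaced = pattern.subn(f"[{label} REDACTED]", text)
        counts[label] = replaced
    return text, counts


sample = "Contact ana@example.com, key sk-EXAMPLEEXAMPLEEXAMPLE1234"
cleaned, counts = redact(sample)
```

The counts dictionary is the interesting part for a reviewer: a pre-send sheet could show “1 email and 1 secret masked” and let the user decide whether to continue, which is exactly the pause the hotkey currently removes.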

For organizations, the real privacy layer sits at the workspace level. According to OpenAI’s enterprise privacy page, Codex within Business, Enterprise, and Edu workspaces inherits commitments like:

No training on business data by default.
AES-256 encryption at rest, TLS 1.2+ in transit.
RBAC, SAML SSO, MFA, audit logging via Compliance API.
Retention controls (Business: user-controlled; Enterprise/Edu: admin-controlled; deleted conversations removed within 30 days unless legally required otherwise).
Data residency in multiple supported regions for eligible Enterprise/Edu customers, with inference residency limited to the US, Europe, and UAE.

Those are strong baseline guarantees. But none of them are Appshots-specific. If you are a privacy officer, the honest answer is that Appshots inherits the security posture of Codex Local — which is good — without yet getting a dedicated control surface of its own. For now, governance comes through user training, workspace policy, and (if needed) blocking the Codex app outright in higher-risk contexts.

What developers can — and can’t — do with it

The bluntest finding here: Appshots is not (yet) a public API.

OpenAI’s docs are explicit that the feature is created from the Codex app on macOS, and the Codex CLI can only continue threads that already contain an Appshot. There is no take_appshot endpoint in the Codex CLI reference, no documented hook in the Codex SDK, and no MCP tool that synthesizes an Appshot programmatically.

What developers can do is use the adjacent surfaces:

The Codex app-server exposes a JSON-RPC interface for thread management, approvals, and streamed events. It accepts image inputs at the thread level, so a custom client can mimic some of what Appshots delivers — without the native frontmost-window targeting or accessibility-text extraction.
The Codex SDK (TypeScript, with an experimental Python SDK) is good for custom orchestration around threads, not for triggering Appshots.
The Codex CLI supports attaching image files via flags, which is a reasonable manual substitute on platforms where Appshots is not available — like Windows.
The Codex GitHub Action runs Codex in CI/CD and is the right surface for post-capture review workflows, not for capture itself.

There is also a clean architectural boundary worth flagging. OpenAI’s enterprise documentation defines Codex Local (the app, CLI, and IDE extension running in a sandbox on the developer’s machine) versus Codex Cloud (hosted agent features). Appshots is unambiguously a Codex Local capability — it depends on macOS permissions, the macOS accessibility tree, and local frontmost-window detection. There is no sign Appshots can be produced by Codex Cloud or by a remote agent without a macOS host.

Plan availability and rollout context

Appshots ships inside the Codex app, which is included on ChatGPT Plus, Pro, Business, Edu, and Enterprise plans. OpenAI’s launch communications around the broader Codex app rollout also extended limited-time Codex access to Free and Go users. Pricing details — including Pro at $100+/month with higher limits — are listed on the Codex pricing page.

Crucially, there is no Windows Appshots at launch. The Codex app itself is now available on both macOS and Windows, but Appshots requires the macOS accessibility and screen-recording stack. Windows users get the Codex app and most other features; they just don’t get the ⌘⌘ trick.

The wider rollout timeline puts Appshots in context:

Feb 2, 2026 — Codex app launches on macOS.
Mar 4, 2026 — Windows support added.
Apr 16, 2026 — “Codex for (almost) everything” lands Chronicle, Computer Use, in-app browser, and 90+ plugins.
May 14, 2026 — Remote access via the ChatGPT mobile app.
May 21, 2026 — Appshots, Goal mode GA, remote locked Computer Use, plugin sharing.

Read together, the trajectory is clear: OpenAI is steadily turning Codex into a multimodal, situated, ambient coding agent. Appshots is the most “considerate” piece of that puzzle — the one where the human stays explicitly in the loop.

How it stacks up against the competition

The closest competitor capability is Anthropic’s Claude computer use tool, which gives Claude full screenshot capture plus keyboard and mouse control via API. That is more powerful than Appshots — it can drive applications, not just look at them — but it is also more API-oriented and more dangerous. Appshots is deliberately narrower: a global hotkey on macOS that hands one window’s contents to a thread, full stop.

GitHub Copilot Chat and its cloud agent both support image attachments in prompts, including screenshots and Figma exports. The visual-context value is similar, but the activation model is different: Copilot wants a file or an issue attachment, not a global Mac hotkey targeting whichever window happens to be in focus. Cursor’s agent prompting docs similarly support image inputs but don’t, in the current public docs, expose anything like Appshots’ off-screen text capture.

In other words, Appshots is not the most powerful visual-context feature on the market, but it may be the lowest-friction one. That alone could make it a default habit for macOS-first dev teams.

Open questions

There is a non-trivial list of things OpenAI has not yet published about Appshots, and any honest write-up has to flag them:

No public screenshot resolution cap or quality preset. Apple’s ScreenCaptureKit configuration supports flexible capture parameters, but Codex’s exact settings aren’t disclosed.
No published OCR or accessibility-text accuracy benchmark.
No latency SLA for capture-to-attachment-to-inference.
No Appshots-specific admin disable switch in the published managed-configuration docs, despite Computer Use and Browser Use having documented pins.
No official Appshots-specific minimum macOS version.
No public Appshots SDK or API entry point.

These gaps don’t make Appshots a bad feature. They make it a young feature, and they are the kind of details that enterprise security reviews will absolutely ask about over the next quarter.

The bigger picture

Strip away the implementation details and Appshots is OpenAI making a quiet philosophical bet: that the most valuable place to apply AI is not in trying to be the user (Computer Use, agents), and not in trying to watch the user (Chronicle), but in collapsing the moment between “I see something on my screen” and “the model also sees it.”

If you’ve ever spent two minutes explaining to a chatbot what is in the error dialog, what the row in the dashboard says, or what fields a form is asking for, Appshots is aimed at exactly those two minutes. It’s not glamorous. It is, at minimum, going to save real time for real macOS developers, designers, PMs, and analysts who already live in the Codex app.

Whether OpenAI extends it with redaction tooling, an admin pin, a Windows equivalent, an API, and per-app allowlists will determine whether Appshots stays a clever shortcut — or becomes the default way humans hand context to AI on the desktop.

For now, it’s a ⌘⌘ away.