How spending as many AI tokens as possible became a badge of honor, a performance metric, and a cultural flashpoint all at once.
There’s a new scoreboard in Silicon Valley, and it doesn’t measure lines of code, pull requests merged, or products shipped. It measures tokens — the atomic units of computation that AI models process every time an engineer types a prompt.
The behavior has a name: tokenmaxxing. And in the spring of 2026, it exploded from an inside-baseball developer conversation into a full-blown cultural debate about productivity, performance measurement, and what it even means to “use AI” at work.
The New York Times broke the story in March 2026, reporting that at companies including Meta and OpenAI, employees were competing on internal leaderboards tracking how many AI tokens they consumed. The leaderboards weren’t just passive dashboards — they handed out titles, ranked employees, and in some companies, fed directly into performance reviews. Token use had become a proxy for ambition, a signal of being “AI-native,” and a competitive metric in a workforce increasingly defined by its relationship to large language models.
The reaction was immediate and split. Some of the most prominent names in tech — Nvidia CEO Jensen Huang, Meta CTO Andrew Bosworth, Y Combinator’s Garry Tan — embraced the concept enthusiastically. Others, including venture capitalists, newsletter writers, and engineers who’d seen this movie before, called it one of the most easily gameable, least meaningful metrics anyone had ever tried to use to measure human performance.
This is the story of tokenmaxxing: where it came from, what it actually means, who’s driving it, who’s fighting it, and what it tells us about the current moment in AI.

First, What Is a Token?
Before you can understand tokenmaxxing, you have to understand what a token is — and most people, even those who use AI tools daily, are hazier on this than they think.
When you type a message into an AI chatbot, your words don’t arrive as words. They arrive as tokens — small numerical units that the model processes through its neural network. OpenAI estimates that one token is roughly equivalent to about four characters of text, or about three-quarters of a word. A single short prompt — say, “What’s the capital of France?” — might use fewer than a dozen tokens. A rich, multi-paragraph prompt pasting in a full document for analysis might use thousands.
Tokens flow in both directions. The model receives input tokens (your prompt, any documents you paste in, the system instructions the app has pre-loaded) and generates output tokens (the response). Together, these are counted as the total token usage for any given interaction, and that number is how AI providers charge for their services.
The pricing varies by model and provider, but the structure is consistent: Anthropic, OpenAI, and Google all charge per million tokens consumed. More tokens equals more cost. This is why tokens are currency — not just metaphorically, but literally.
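A back-of-the-envelope sketch makes that structure concrete. The snippet below computes the cost of a single exchange from per-million-token rates; the rates, like everything else in it, are illustrative placeholders rather than any provider’s actual price list.

```python
# Back-of-envelope cost of one prompt/response exchange under per-million-token pricing.
# The rates below are illustrative placeholders, not any provider's actual prices.

INPUT_RATE_PER_MILLION = 3.00    # dollars per million input tokens (hypothetical)
OUTPUT_RATE_PER_MILLION = 15.00  # dollars per million output tokens (hypothetical)

def interaction_cost(input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of a single exchange: input and output tokens are billed separately."""
    return (input_tokens * INPUT_RATE_PER_MILLION +
            output_tokens * OUTPUT_RATE_PER_MILLION) / 1_000_000

print(f"${interaction_cost(20, 300):.4f}")        # short question, short answer
print(f"${interaction_cost(50_000, 2_000):.4f}")  # whole-document prompt, longer answer
```

At rates like these, any single interaction costs cents; the numbers only become interesting when multiplied across tens of thousands of employees and millions of interactions, which is exactly what the leaderboards measure.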
Context windows — the total number of tokens a model can hold in its “working memory” at once — have expanded dramatically in recent years. The original GPT-3 had a context window of roughly 2,000 tokens. Today’s frontier models routinely handle 128,000 tokens or more; Google’s Gemini 1.5 Pro has demonstrated a context window of one million tokens. This expansion is part of what has made token spending possible at scale: there’s simply a lot more room to fill.
Tokens, then, are a metered resource: finite but expanding, priced per unit, and billed to whoever runs the query. That’s the foundation you need to understand why “how many tokens you burn” has become a flashpoint.
The Linguistics of “Maxxing”
The “-maxxing” suffix deserves a moment of attention, because it’s doing real cultural work in this word.
“Maxxing” entered mainstream internet vocabulary through communities focused on self-optimization — most famously in terms like looksmaxxing (optimizing physical appearance), gymmaxxing (maximizing workout gains), and sleepmaxxing (optimizing sleep quality). The suffix implies obsessive, systematic, often competitive maximization of a particular trait or behavior. It carries a slight ironic edge — acknowledging the extremity of the optimization while also genuinely endorsing it.
Applying it to tokens — tokenmaxxing — captures something specific: not just using AI, but using it maximally, visibly, and competitively. The word implies that token consumption is something you can be good or bad at, that more is better, and that there’s a community of people racing each other to the top.
That framing is exactly what makes tokenmaxxing both exciting to its practitioners and alarming to its critics. If looksmaxxing is about trying harder to improve yourself, tokenmaxxing is about trying harder to improve — or at least appear to improve — your AI-assisted output.
The Leaderboard That Broke the Internet: Claudeonomics
The most vivid example of tokenmaxxing in the wild isn’t abstract — it has a name, a dataset, and a story.
As Fortune reported on April 9, 2026, a Meta employee independently built an internal leaderboard on the company’s intranet and named it Claudeonomics — a portmanteau of “Claude” (Anthropic’s AI model, widely used inside Meta despite the company having its own Llama models) and “economics,” reflecting that the dashboard was fundamentally about the economics of compute.
The leaderboard tracked AI token consumption across more than 85,000 Meta employees. It showed the top 250 token users and awarded them gamified titles: Token Legend for the highest achievers, Session Immortal for extraordinary session duration, Model Connoisseur for breadth of model usage, Cache Wizard for efficiency in reuse.
The numbers were staggering. According to data compiled from the dashboard, in a single 30-day period, Meta employees collectively consumed 60 trillion tokens. The top individual user alone averaged 281 billion tokens per day. At public pricing for Claude Opus 4.6, that one user’s consumption could theoretically have cost over $1.4 million in a single month.
And then, just two days after The Information broke the story, Claudeonomics went dark. The employee who built it posted a note: “We’ve really enjoyed building this app on Nest for everyone. It was meant to be a fun way for people to look at tokens, but due to data from this dashboard being shared externally, we’ve made the decision to shutter Claudeonomics for now.”
Meta told Fortune that “the employee took down the dashboard at their discretion; Meta did not request this action.” The company’s official AI usage dashboards for software engineers remain active, per reporting by The New York Times.
The two-day lifespan of Claudeonomics became, in itself, a cultural artifact. The speed with which it was built (a bottom-up employee initiative), the speed with which it went viral (instantly, the moment external media covered it), and the speed with which it was pulled down (within 48 hours of becoming news) all crystallized something real about the current moment in tech culture: tokens matter enough to compete over, but the competition is uncomfortable to be seen having.
The Executives Who Built the Culture Around It
Claudeonomics didn’t emerge from nowhere. It was the logical product of a culture that had been explicitly shaped, over months, by some of the most influential voices in enterprise technology.
Jensen Huang, Nvidia CEO, has been the most prominent advocate for treating token spending as a proxy for engineer productivity. Speaking at Nvidia’s GTC conference in San Jose in March 2026, Huang laid out his vision: “I could totally imagine in the future every single engineer in our company will need an annual token budget. They’re going to make a few 100,000 a year as their base pay. I’m going to give them probably half of that on top of it as tokens so that they could be amplified 10 times.”
Days later, Huang went further, saying he would be “deeply alarmed” if an engineer earning $500,000 a year didn’t consume at least $250,000 worth of tokens. The implication was stark: token spending isn’t a nice-to-have indicator of AI engagement. It’s a threshold. Fall below it and your value as an engineer is in question.
Andrew Bosworth, Meta CTO, made the same argument in concrete terms. At a tech conference in February, Bosworth described a top Meta engineer who spent the equivalent of his entire salary on AI tokens annually and had achieved roughly five to ten times productivity as a result. “It’s a no-brainer deal,” Bosworth said. “Keep doing it, with no upper limit.”
Garry Tan, CEO of Y Combinator, offered a three-word endorsement that became widely shared. Quoting a post criticizing companies that are “stingy” with token budgets, Tan replied: “We’ve been tokenmaxxing longer than most people.”
Andrej Karpathy, a former AI scientist at Tesla and OpenAI who now leads an AI education startup, articulated the aspirational version of the concept cleanly. On a podcast, Karpathy distilled the philosophy: “The name of the game is tokens. How can you maximize your token throughput and not be in the loop.” The phrase “not be in the loop” is key — Karpathy was arguing that the goal is to maximize autonomous AI action, with humans stepping back to supervise rather than directly intervening in each step.
Ali Ghodsi, CEO of Databricks, brought similar energy to his own engineering organization. According to the Engineering Leadership newsletter, Ghodsi highlighted an engineer who had spent more than $7,000 worth of AI tokens in just two weeks in January 2026. Rather than flagging this as excessive, Ghodsi used it as a positive example — calling for the whole engineering team to recognize and applaud this behavior.
And then there’s the institutional structure Meta built that gave Claudeonomics its legitimacy. Last year, Meta’s Chief People Officer Janelle Gale told employees that “AI-driven impact” would be a “core expectation” in 2026, according to Business Insider. In January, the company overhauled its performance review system to incentivize the highest performers with upward of 200% bonuses. Token usage didn’t exist in a vacuum — it existed inside a performance system that explicitly rewarded AI integration.

The Argument for Tokenmaxxing
Let’s steelman the case, because it’s not without merit.
At its core, the tokenmaxxing hypothesis is an economic argument. AI tokens have a cost. Human engineers also have a cost. If deploying tokens as a substitute or amplifier for human cognitive effort produces more output per dollar than adding headcount, then maximizing token usage is the rational strategy.
The Bosworth example — a single engineer who spends his entire salary on AI tokens and delivers five to ten times the output — isn’t implausible on its face. If true (and we’ll return to the measurement problem), it represents one of the most favorable cost structures in the history of knowledge work: you spend $400k on salary and $400k on tokens and get the equivalent output of five or ten engineers earning $400k each.
There’s also a genuine cultural logic to the leaderboard approach. In large organizations, adoption of new tools is a coordination problem. Not everyone will adopt AI tools just because they’re available. Leaderboards create social proof, competitive pressure, and visibility. A high-token engineer signals — to herself and to her peers — that she’s figured out how to integrate these tools into her workflow at scale. If high-token usage correlates with high AI fluency, the leaderboard is a rough measure of organizational AI maturity.
The token-as-productivity-proxy also fits neatly into a view of engineering work as a process of continuous iteration. A developer who generates 100 candidate solutions, evaluates them quickly, and selects the best one may be more productive than one who labors manually over a single carefully constructed solution. More tokens can mean more iterations, more options explored, more ground covered per unit of time.
The Hard Fork podcast, hosted by New York Times journalists covering AI, captured this tension well: there’s a version of tokenmaxxing that reflects genuine productivity transformation, and a version that is purely performative. The problem is that they look identical from the outside.
The Case Against
The critics of tokenmaxxing, it turns out, are just as formidable as its advocates.
Cristina Cordova, COO of Linear, put the counterargument in a single sentence on X: “Ranking engineers by token spend is like me ranking my marketing team by who spent the most money. Don’t mistake a high burn rate for a high success rate.”
The analogy is sharp. Marketing teams have budgets, but no competent manager rewards a marketer simply for spending more. The question is what the spending produced. Token spending, divorced from output quality and business impact, is a pure input metric.
Khosla Ventures partner Jon Chu was harsher: “Ranking engineers by token spend is an absolutely stupid policy.” His evidence was anecdotal but telling: “Plenty of my Meta friends told me folks have been building bots that just run in a loop burning tokens as fast as they can due to this policy.” The gamification of token leaderboards — which Claudeonomics exemplified — predictably produces gaming behavior. If the metric is tokens consumed, the rational response for a reward-maximizing employee is to consume tokens whether or not that consumption produces value.
Gergely Orosz, author of The Pragmatic Engineer newsletter (one of the most widely-read engineering industry publications), was blunt: “Devs game everything and anything seen as a target for more bonus or promos. This was no different.”
This is a direct application of Goodhart’s Law: when a measure becomes a target, it ceases to be a good measure. The history of software engineering metrics is littered with examples — lines of code (produces bloated, verbose code), story points (produces inflated story estimates), commit counts (produces meaningless micro-commits). Tokens now join a long list of metrics that seemed like good proxies until they became targets, at which point they stopped being good proxies.
Edwin Wee Arbus, an employee at Cursor, offered a more measured take: tokenmaxxing is “a useful, fast proxy, but slightly flawed.” He compared it to BMI — a metric that provides genuine health signal but doesn’t capture muscle composition, age, or baseline health. You’d be foolish to dismiss BMI entirely; you’d also be foolish to optimize purely for it.
Chester Zelaya (@chesterzelaya) made what might be the most interesting counter-argument on X: “tokenmaxxing is the most heinous heuristic I have ever heard — in fact, I’d argue that the better engineer can solve the problem with less tokens.” This inverts the tokenmaxxing logic entirely. Efficiency — precision, getting the right answer with minimal compute — might be a better signal of AI fluency than volume.
Arush Shankar, a software engineer who previously worked at Square and Microsoft, offered the most calibrated framing: “Token spend is always an output not an input. Worth looking at, but never in isolation. It’s a signal but not THE signal.”
The Gaming Problem in Practice
It’s worth dwelling on the gaming problem in more concrete terms, because it’s not just theoretical.
According to reporting by The Information (covered by Fortune), some Meta employees were leaving AI agents running for hours — continuously performing research tasks with no human supervision — specifically to inflate their position on the Claudeonomics leaderboard. These agents weren’t producing valuable output. They were burning tokens the way an idle car engine burns fuel: generating heat, producing nothing useful, inflating a number.
The Engineering Leadership newsletter collected similar accounts from engineers at other companies: “The best way to rack up tokens seems to be keeping a chat context going for a long time, telling it to read tons of code (multiple repos for extra points), and pasting as much code or text into the chat as you can.” Another engineer: “It’s official in the company I work for, but in our case, if you don’t reach an AI usage threshold each week, you are fired.”
That last data point is alarming. A mandate to reach a minimum AI usage threshold under penalty of termination is not a productivity policy. It’s compliance theater: it guarantees that employees will find ways to consume tokens, not that they’ll produce better work.
The Business Insider coverage captured a useful summary quote from the fintech company Ramp, which called rising AI spending a “$1 trillion blind spot,” citing Gartner data showing that monthly AI spending among businesses has quadrupled over the last year. That’s a real trend. Whether that quadrupling of spending represents a quadrupling of productive AI use, or a quadrupling of poorly-measured, compliance-driven token burning, is the central question that tokenmaxxing doesn’t answer.
What High Token Users Actually Do
Separate the performance anxiety from the practice and a more interesting question emerges: what does it actually look like when someone genuinely uses AI to transform their productivity? What are the high-token use cases that actually make sense?
The most legitimate high-token workflows tend to share a common structure: they involve loading large amounts of context before asking a question, because the quality of AI output scales meaningfully with the quality and quantity of the information provided.
For software engineers, this means pasting in entire codebases, full error logs, dependency specifications, and architectural documentation before asking a model to help debug or extend a system. A 1,000-token prompt that lacks context is far more likely to produce a hallucinated answer; a 50,000-token prompt that includes the relevant codebase can produce a response grounded in the actual system. This isn’t tokenmaxxing as performance — it’s tokenmaxxing as engineering judgment.
For researchers and analysts, it means loading complete papers, full datasets with documentation, and prior analyses before asking for synthesis. A model asked to compare three research papers it hasn’t read will produce garbage. A model given all three papers in full will produce genuinely useful comparative analysis.
For writers and content creators, it means loading comprehensive briefs, style guides, prior published work, and audience profiles before asking for drafts. The output quality difference between a sparse prompt (“write a blog post about AI”) and a rich one (here is our editorial style guide, here are 10 examples of our best-performing content, here is the research we want to draw from, here is the audience persona) is not marginal — it can be the difference between publishable work and something that needs to be rewritten from scratch.
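In code, the pattern is unglamorous: gather the relevant material, concatenate it into the prompt, then ask the question. The sketch below uses OpenAI’s Python SDK as one example; the model name, file paths, and question are placeholders, and the same pattern applies to any provider’s API.

```python
# Minimal sketch of a context-heavy prompt: load the relevant files, label them,
# then ask a question grounded in that material. Model name, paths, and the question
# are placeholders; the same pattern works with any provider's SDK.
from pathlib import Path
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def build_context(paths: list[str]) -> str:
    """Concatenate files into one labeled block of context."""
    return "\n\n".join(f"--- {p} ---\n{Path(p).read_text()}" for p in paths)

context = build_context(["src/auth.py", "src/session.py", "logs/error.log"])
question = "Given the code and the error log above, why do sessions expire right after login?"

response = client.chat.completions.create(
    model="gpt-4o",  # placeholder model name
    messages=[
        {"role": "system", "content": "You are helping debug a production web service."},
        {"role": "user", "content": f"{context}\n\n{question}"},
    ],
)
print(response.choices[0].message.content)
```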
This context-loading approach has real scientific grounding. The landmark chain-of-thought prompting paper by Jason Wei, Xuezhi Wang, and colleagues at Google (arXiv:2201.11903) demonstrated that asking models to generate intermediate reasoning steps — which uses more tokens than asking for direct answers — significantly improves performance on arithmetic, commonsense, and symbolic reasoning tasks. “The empirical gains can be striking,” the paper notes. More tokens, in this specific configuration, meant meaningfully better reasoning.
There’s also a well-documented phenomenon in long-context modeling sometimes called the “lost in the middle” problem — the tendency of language models to better attend to information at the beginning and end of long prompts, degrading performance on information placed in the middle. Understanding this means that high-token users who are actually trying to optimize their outputs need to think carefully about where they place key information, not just how much context they include. Volume alone isn’t the answer; strategic volume is.
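Put together, those two findings suggest a rough shape for a long prompt: framing at the top, the bulk of the material in the middle, and the actual question, with an explicit request to reason step by step, at the very end. The template below is an illustration of that shape, not a prescription; the section labels and wording are assumptions.

```python
# Illustrative prompt shape: framing first, bulk material in the middle, the question
# and a request for explicit reasoning at the end. Labels and wording are assumptions,
# not a fixed best practice.

def build_prompt(task: str, documents: list[str], question: str) -> str:
    doc_block = "\n\n".join(f"[Document {i + 1}]\n{d}" for i, d in enumerate(documents))
    return (
        f"Task: {task}\n\n"          # framing up front, where attention is strongest
        f"{doc_block}\n\n"           # the bulk of the tokens sit in the middle
        f"Question: {question}\n"    # restate the ask at the end, where it won't get lost
        "Work through the documents step by step before giving a final answer."
    )

prompt = build_prompt(
    task="Compare the methodology of the three papers below.",
    documents=["<paper 1 full text>", "<paper 2 full text>", "<paper 3 full text>"],
    question="Which paper's evaluation protocol is most robust, and why?",
)
print(prompt)
```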
The Economics: Who Pays and How Much
The tokenmaxxing debate would be purely academic if tokens were free. They aren’t.
Anthropic’s pricing for Claude Opus 4.6, its most capable model, sits at the high end of the market. At those rates, the 281-billion-token-per-day figure from Claudeonomics’s top user represents a staggering implied cost — over a million dollars a month, at public API prices, for one person.
Of course, Meta doesn’t pay public API prices. At hyperscale, enterprise agreements and internal infrastructure dramatically reduce per-token costs. But the directional reality remains: someone is paying for those tokens, and that someone is the company.
The Vucense analysis of Claudeonomics estimated that 60 trillion tokens in 30 days would cost approximately $9 billion at public pricing — a number that serves to illustrate scale more than actual cost. Even at 10% of that (a generous enterprise discount), the figure is nearly $1 billion per month, for token consumption at a single company, over a single month.
This is the “$1 trillion blind spot” that fintech firm Ramp flagged — the observation that corporate AI spending is growing faster than the ability to measure whether that spending is producing proportional value. Businesses are quadrupling their AI spend, Gartner data suggests, without necessarily quadrupling their ability to audit the returns.
For individual developers at startups, the math is different and more immediate. An engineer spending “thousands of dollars on tokens every week” — as one xAI employee publicly posted on X in April — is making a real personal or operational bet. If that spend produces ten times the output, it’s a bargain. If it produces twice the output with questionable quality, it’s an expensive habit.
The right framework, as several commentators suggested, is to treat token spending the way you treat any capital expenditure: measure the return, not just the outlay.
Tokenmaxxing Beyond the Enterprise: Individual Practitioners
The corporate leaderboard drama has dominated the media coverage, but tokenmaxxing has a parallel life in the individual developer and prompt engineering community — and there the conversation is substantially more technical.
For developers working with APIs directly, token optimization is an engineering discipline with two opposite schools of thought.
The maximalist school holds that bigger context means better output: more examples in a few-shot prompt, more background about the codebase, and more explicit instructions about edge cases all tend to make the model behave more reliably and accurately. This is the tokenmaxxing thesis at its most technically defensible. Context is information. More relevant information usually helps.
The minimalist school holds that precision beats volume: clean, tight prompts that focus the model on exactly the right problem, with minimal noise, produce more predictable and controllable outputs. Over-loaded contexts can dilute attention, introduce irrelevant information the model latches onto unhelpfully, and slow down iteration cycles.
Tools like Tiktoken, OpenAI’s open-source tokenization library, allow developers to count tokens before sending them — a practice that serious API users employ to optimize the precision and cost of their prompts. The developer community on r/LocalLLaMA and r/PromptEngineering has produced extensive testing of both approaches, with context-dependent conclusions: the right strategy varies by model, task type, and available budget.
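Counting tokens locally takes only a few lines. The sketch below uses tiktoken with the cl100k_base encoding, which covers many recent OpenAI models; other providers’ tokenizers will produce somewhat different counts.

```python
# Count tokens locally before sending a prompt. Counts depend on the encoding;
# other providers' tokenizers will give somewhat different numbers.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")  # encoding used by many recent OpenAI models

short_prompt = "What's the capital of France?"
pasted_document = "Here is our editorial style guide...\n" * 2_000  # stand-in for a real document

print(len(enc.encode(short_prompt)))     # a handful of tokens
print(len(enc.encode(pasted_document)))  # thousands of tokens
```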
What’s notable is that this technical debate has almost nothing to do with the corporate tokenmaxxing leaderboard debate. The engineers on Reddit arguing about optimal token strategies are optimizing for output quality. The engineers on Meta’s Claudeonomics leaderboard were, at least in part, optimizing for rank. These are fundamentally different objectives that happen to share a vocabulary.
What Tokenmaxxing Reveals About This Moment in AI
Pull back from the specific debate and tokenmaxxing starts to look like a Rorschach test for anxieties about the current AI moment.
For technology optimists, it represents the beginning of a genuine productivity revolution — the moment when AI tools became capable enough, and cheap enough, that organizations could restructure work around them. The leaderboards, the titles, the cultural pressure to maximize token use: these are the clunky early-stage mechanisms of a transition that will ultimately prove transformative. The metric is imperfect, but the underlying behavior change it’s driving is real.
For skeptics, it represents a familiar Silicon Valley pattern: a compelling narrative, a quantifiable proxy, and enormous incentives to perform the narrative rather than deliver the underlying value. The corporate enthusiasm for tokenmaxxing, in this reading, is not evidence that tokens produce proportional value — it’s evidence that in a moment of uncertainty about AI’s actual returns, token spending is the easiest number to point to.
For economists, it raises a genuine and interesting question about how labor markets respond to the availability of cognitive automation. If token budgets become a standard part of engineer compensation — as Jensen Huang explicitly envisioned — then the structure of knowledge work changes in ways we haven’t fully mapped yet. Engineers become managers of AI agents. Their value is less in doing technical work and more in knowing what work to assign, what context to provide, and how to evaluate the output.
For anyone who has lived through previous productivity metric manias in tech — lines of code, story points, commit counts, meeting attendance — tokenmaxxing triggers immediate and understandable skepticism. Not because AI isn’t powerful, but because the history of measuring knowledge work productivity is a history of metrics that worked until they became targets and then broke.
The Productive Middle Ground
The most useful perspectives on tokenmaxxing come from people who resist the binary — neither cheerleaders nor doomsayers.
Cursor employee Edwin Wee Arbus’ BMI analogy is worth returning to. A good health system doesn’t maximize BMI or minimize it — it monitors it as one signal among many, uses it to prompt further investigation, and evaluates it in context. A good AI productivity system would do the same with token usage: it’s a real signal, it tells you something, but it doesn’t tell you everything, and optimizing purely for it produces distorted behavior.
Arush Shankar’s framing is similarly useful: “Token spend is always an output not an input. Worth looking at, but never in isolation. It’s a signal but not THE signal.”
What does the right framework look like in practice? Probably something like this: track token consumption at an organizational level as one metric among several. Pair it with output metrics — code quality, customer satisfaction scores, product delivery speed, defect rates. Look for correlation between high token use and high output quality, and investigate cases where the correlation breaks down (which is where you’ll find the gaming). Use token data to identify developers who might benefit from more AI tool training, and to identify workflows where AI integration hasn’t taken hold yet.
What the right framework explicitly does not look like: a leaderboard, a minimum usage threshold tied to continued employment, or a public competitive ranking of individual token consumers.
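As a sketch of what the paired-metrics approach could look like in practice, the snippet below puts hypothetical per-engineer token spend next to output measures and flags the high-spend, low-output cases worth a conversation; every column name and value in it is made up for illustration.

```python
# Purely illustrative: pair token spend with output measures instead of ranking on spend alone.
# Every column name and value here is hypothetical.
import pandas as pd

usage = pd.DataFrame({
    "engineer":            ["a", "b", "c", "d", "e"],
    "tokens_per_week":     [2_000_000, 40_000_000, 35_000_000, 500_000, 12_000_000],
    "prs_merged":          [6, 14, 3, 5, 9],
    "defects_per_release": [1, 2, 9, 1, 2],
})

# Does heavy token use actually track output, or is it just burn?
print(usage[["tokens_per_week", "prs_merged", "defects_per_release"]].corr())

# Flag high-spend, low-output cases for a conversation, not a leaderboard.
suspect = usage[(usage.tokens_per_week > usage.tokens_per_week.median()) &
                (usage.prs_merged < usage.prs_merged.median())]
print(suspect)
```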
What Comes Next
The immediate trajectory of tokenmaxxing as a phenomenon seems likely to follow a well-worn arc: peak hype, backlash, recalibration, and eventual normalization into something more nuanced.
The backlash is already visible. The NYT Hard Fork podcast raised the question of whether this represents “AI-washing” — the performance of AI engagement rather than genuine AI transformation. The fact that Claudeonomics was taken down within 48 hours of becoming public news suggests even its creators recognized it had crossed a line. The volume of criticism from respected voices — Orosz, Cordova, Chu — suggests the token-as-productivity-metric will face increasing scrutiny.
The recalibration will likely produce something more useful: organizations that genuinely want to measure AI adoption will develop more sophisticated composite metrics, combining token usage data with output quality measures, project delivery data, and employee self-reporting. Companies like Anthropic, OpenAI, and Google, who have every incentive to help enterprise customers understand the value of their token spend, will likely develop better analytics tools for this purpose.
What won’t change is the underlying reality that tokens — as the currency of AI inference — will become an increasingly significant line item in enterprise technology budgets. Whether organizations treat that expenditure with the same rigor they apply to hiring, software licensing, or infrastructure spending will determine whether the tokenmaxxing era is remembered as a moment of genuine transformation or an expensive detour.
Conclusion: The Signal in the Noise
Tokenmaxxing is, at its best, a clumsy early articulation of something genuinely important: that in an era of powerful AI tools, how much you engage with those tools matters, and organizations that figure out how to maximize productive AI integration will have real advantages over those that don’t.
At its worst, it’s another chapter in the long history of tech industry metric mania — the substitution of a quantifiable proxy for the harder, less legible work of measuring whether anything valuable is actually being produced.
The truth, as usual, is somewhere between those poles. Sixty trillion tokens consumed in thirty days at Meta isn’t purely noise — it represents an enormous amount of human-AI interaction, and some fraction of that interaction is producing genuine value. The question that no leaderboard has yet answered, and that the industry badly needs to answer, is what fraction.
As Persona engineer Arush Shankar put it: token spend is a signal, not the signal. The organizations that thrive in the next decade of AI won’t be the ones that burned the most tokens. They’ll be the ones that best understood the difference.