A comprehensive review of NBER Working Paper No. 34255 by Chatterji, Cunningham, Deming, Hitzig, Ong, Shan, and Wadman (September 2025)
The Paper Nobody Knew We Needed
When ChatGPT launched on November 30, 2022, it triggered one of the most rapid technology adoption events in recorded history. Within five days, it had more than one million registered users. Within three years, it had 700 million weekly active users — roughly 10% of the world’s adult population, collectively firing off 18 billion messages per week, or approximately 29,000 messages every second.
And yet, despite this staggering scale, almost nobody knew what people were actually doing with it.
That is the central problem that “How People Use ChatGPT” — a new NBER Working Paper by Aaron Chatterji (Duke/OpenAI), Thomas Cunningham (OpenAI), David J. Deming (Harvard/NBER), Zoe Hitzig (OpenAI/Harvard), Christopher Ong (Harvard/OpenAI), Carl Yan Shan (OpenAI), and Kevin Wadman (OpenAI) — sets out to solve. Published in September 2025, this is the first economics paper to use internal ChatGPT message data, and the resulting findings challenge many of the most popular narratives about how generative AI is reshaping work, society, and the economy.
What they find is surprising, nuanced, and sometimes deeply counterintuitive. The picture that emerges is not of a world in which AI is automating jobs, writing code for armies of developers, or serving as a digital therapist to a lonely population. Instead, what people mostly use ChatGPT for is getting advice, getting information, and getting help with writing — activities that look less like task automation and more like having a very knowledgeable, very patient advisor on call around the clock.
What the Paper Actually Is
Before diving into the findings, it’s worth understanding what kind of research this is — and what makes it distinctive.
Most of what we know about how people use generative AI comes from surveys, where respondents self-report their behavior. As the authors note, citing Ling and Imas (2025), there are good reasons to expect systematic bias in self-reports: people may overstate “productive” uses (like professional work) and understate “frivolous” ones (like asking ChatGPT to write birthday card messages). This paper bypasses those biases entirely.
The research team had access to actual ChatGPT message data — not the content of messages, but classified metadata about approximately 1.1 million randomly sampled conversations from consumer plans (Free, Plus, and Pro) between May 2024 and June 2025. The classification was done entirely by automated LLM-based pipelines, with no human analyst ever reading the content of a user’s message. This is a crucial distinction: the privacy-preserving methodology is not just a legal nicety. It is the feature that makes the research possible at scale.
Two other notable datasets power the analysis: aggregate usage statistics covering ChatGPT’s entire consumer history from November 2022 through September 2025 (with basic demographic metadata), and a matched employment dataset for approximately 130,000 users, constructed through a secure Data Clean Room (DCR) that enforced strict aggregation thresholds — no individual-level demographic data was ever accessible to the research team, and no output cell contained fewer than 100 users.
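The minimum-cell-size rule the clean room enforces can be sketched in a few lines. This is an illustrative reconstruction, not the actual DCR implementation: the threshold of 100 comes from the paper, but the data layout and function names below are invented.

```python
# Illustrative sketch of a minimum-cell-size suppression rule like the one
# described for the Data Clean Room: any aggregate cell covering fewer than
# 100 users is withheld rather than released.
from collections import defaultdict

MIN_CELL_SIZE = 100  # threshold reported in the paper

def aggregate_with_suppression(records, key_fn, min_cell=MIN_CELL_SIZE):
    """Group user-level records into cells and drop any cell below min_cell."""
    cells = defaultdict(set)
    for rec in records:
        cells[key_fn(rec)].add(rec["user_id"])
    # Release only counts for sufficiently large cells; suppress the rest.
    return {k: len(users) for k, users in cells.items() if len(users) >= min_cell}

# Hypothetical example: 150 users in one occupation cell, 3 in another.
records = [{"user_id": i, "occ": "management"} for i in range(150)]
records += [{"user_id": 1000 + i, "occ": "rare_occ"} for i in range(3)]
released = aggregate_with_suppression(records, key_fn=lambda r: r["occ"])
# Only the "management" cell is released; "rare_occ" is suppressed.
```

The design choice is the important part: the researcher only ever sees counts for cells large enough that no individual can be singled out.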
The study was approved by the Harvard IRB (IRB25-0983), and the authors note that all analysis was conducted in accordance with OpenAI’s Privacy Policy.
This is, in short, a methodologically serious and carefully constructed piece of research. Its limitations are real and discussed honestly by the authors, but the core data is more direct and less subject to recall or social desirability bias than any prior public research on the topic.
The Growth Story: 700 Million Users and Counting
The growth statistics alone are worth sitting with for a moment. When ChatGPT launched in late 2022, it reached one million users in five days — a record at the time. But the trajectory since then has been even more remarkable.
By January 2023, ChatGPT had approximately 20 million weekly active users on consumer plans. By July 2023, that had grown to roughly 50 million. By July 2024, it had crossed 200 million. And by July 2025, it had surpassed 700 million. As the authors note, citing Bick, Blandin, and Deming (2024), this speed of diffusion has no historical precedent for any new technology.
The message volume numbers are, if anything, more staggering. Between July 2024 and July 2025, the total number of messages sent per day grew by a factor of more than five — from roughly 450 million to over 2.5 billion daily messages. By the end of July 2025, OpenAI was generating revenues at an annualized rate of $12 billion, according to Reuters.
One of the most important growth findings concerns not the aggregate numbers but the composition of growth across user cohorts. The authors show that earlier sign-up cohorts have consistently higher usage rates (messages per weekly active user) than newer cohorts — and that usage within every cohort has grown over time.
The paper interprets this dual pattern as evidence of two simultaneous forces: improvements in the underlying models making ChatGPT more capable and useful, and users slowly discovering new applications for existing capabilities. Both forces appear to be at work.
This cohort analysis also matters for a more specific claim the paper makes: the shift toward non-work usage is not primarily a story about who is joining ChatGPT. Even among the very first cohort of users — those who signed up in early 2023, presumably among the most technologically sophisticated early adopters — the share of non-work messages has increased substantially over time. The change is happening within users, not just across user pools.

How ChatGPT is Used: The Taxonomy
The intellectual heart of the paper is a set of automated classification schemes applied to the sampled messages. The authors use five distinct taxonomies, each defined by a prompt passed to an LLM classifier: (1) whether a message is work-related, (2) the topic of the conversation, (3) the type of user intent (Asking, Doing, or Expressing), (4) the O*NET Intermediate Work Activity associated with the message, and (5) an interaction quality metric derived from the user’s subsequent message.
None of these classifiers involve a human reading the messages. Each was validated against human-labeled samples drawn from WildChat, a publicly available dataset of user conversations with chatbots that users affirmatively consented to share for research.
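In spirit, each taxonomy is a constrained-label classification call over a message, with only the label retained downstream. The sketch below shows the shape of such a pipeline with a stubbed-out classifier; the keyword rules, label names, and function names are illustrative stand-ins, not the authors' actual prompts or code.

```python
# Illustrative shape of an automated, privacy-preserving classification
# pipeline: each message is mapped to a label from a fixed taxonomy, and
# only aggregate label counts (never the text) are retained downstream.

INTENT_LABELS = ("asking", "doing", "expressing")  # taxonomy (3) in the paper

def classify_intent(message_text):
    """Stub for an LLM classifier call; the real pipeline sends a prompt
    constraining the answer to a fixed label set. Keyword rules stand in here."""
    text = message_text.lower()
    if text.endswith("?") or text.startswith(("what", "how", "why")):
        return "asking"
    if text.startswith(("write", "rewrite", "draft", "generate")):
        return "doing"
    return "expressing"

def classify_batch(messages):
    """Return only aggregate label counts; the message text is discarded."""
    counts = {label: 0 for label in INTENT_LABELS}
    for msg in messages:
        counts[classify_intent(msg)] += 1
    return counts

sample = [
    "What should I look for when choosing a health plan?",
    "Rewrite this email to be more formal.",
    "I just feel like nothing is going right today.",
]
print(classify_batch(sample))
```

The pipeline's privacy property lives in `classify_batch`: message text goes in, and only counts come out.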
The validation results are reported honestly: performance is strong for binary classifiers (work/non-work Cohen’s κ = 0.83, substantially exceeding human-human agreement of 0.66) but more modest for finer-grained taxonomies like interaction quality (Cohen’s κ = 0.14, consistent with the inherent difficulty of inferring latent satisfaction from text). The authors flag these limitations explicitly rather than burying them.
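Cohen's κ, the statistic used in these validation checks, is raw agreement corrected for the agreement two raters would reach by chance. A self-contained sketch (the ten labels below are invented for illustration):

```python
# Cohen's kappa from first principles: observed agreement minus chance
# agreement, scaled by the maximum possible improvement over chance.
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    # Chance agreement: probability both raters pick the same label independently.
    expected = sum((freq_a[c] / n) * (freq_b[c] / n) for c in freq_a)
    return (observed - expected) / (1 - expected)

# Hypothetical validation sample: human vs. classifier labels on 10 messages.
human = ["work", "work", "non-work", "work", "non-work",
         "non-work", "work", "non-work", "work", "non-work"]
model = ["work", "work", "non-work", "non-work", "non-work",
         "non-work", "work", "non-work", "work", "work"]
print(round(cohens_kappa(human, model), 2))  # 0.6 for this sample
```

For context, κ = 1 is perfect agreement and κ = 0 is chance-level, which is why the 0.83 for work/non-work is strong and the 0.14 for interaction quality is barely above chance.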
Work vs. Non-Work
The most headline-grabbing finding in the paper is the shift toward non-work usage. In June 2024, approximately 53% of consumer ChatGPT messages were classified as non-work-related. By June 2025, that share had risen to 73%. Both categories grew in absolute terms — work messages more than tripled from roughly 213 million to 716 million per day — but non-work messages grew nearly eight-fold, from 238 million to 1.9 billion per day.
This finding runs directly counter to the dominant framing in the economics literature on AI, which has focused almost exclusively on labor market effects and workplace productivity. As the authors note, economic models of AI — from Acemoglu (2024) to Eloundou et al. (2025) — have concentrated on the question of what share of job tasks AI can perform. But if nearly three-quarters of actual ChatGPT usage has nothing to do with paid work, then a model focused purely on labor market substitution is missing most of the picture.
The authors connect this to Collis and Brynjolfsson (2025), who use choice experiments to estimate consumer willingness-to-pay for generative AI and arrive at a consumer surplus of at least $97 billion in the United States alone in 2024. The non-work dimensions of AI usage — the tutor who helps a teenager understand calculus, the advisor who helps someone navigate a health insurance decision, the writing assistant who helps a non-native speaker express themselves more fluently — represent genuine welfare gains that do not show up in GDP or productivity statistics.
Conversation Topics: What Are People Actually Talking About?
The paper uses a 24-category classifier (aggregated into 7 broad topic areas) to characterize the content of conversations. The three dominant categories are:
Practical Guidance (28.8% of all messages in June 2025) — This covers tutoring and teaching, how-to advice, creative ideation, and health/fitness guidance. The defining characteristic is that the response is customized to the user and can be refined through follow-up conversation. Tutoring and teaching alone account for 10.2% of all messages — making education one of ChatGPT’s largest single use cases, and a particularly striking finding given how little attention it has received relative to coding or job automation.
Seeking Information (24.4%) — This includes specific factual queries, product searches, recipe lookups, and questions about current events. The authors describe this as “a very close substitute for web search” — and notably, it has grown from 14% to 24% of usage over the study period, suggesting that ChatGPT is increasingly displacing traditional search engines for at least some query types.
Writing (23.3%) — Email drafting, document creation, editing, translation, summarization, and similar tasks. This is the most important work-related category, accounting for 40% of work-related messages in June 2025. Critically, about two-thirds of Writing messages are requests to modify user-provided text rather than to generate new text from scratch — meaning most writing interactions are more like “improve this draft” than “write me an essay.”
Together, these three topics account for roughly 77% of all ChatGPT conversations. The remaining categories — Technical Help (5.1%, which includes coding), Multimedia (7.3%), Self-Expression (5.3%), and Other/Unknown (3.2%) — collectively make up less than a quarter of usage.
Two specific findings deserve emphasis because they push back hard against popular narratives.

Computer programming is a small share of consumer ChatGPT usage. Only 4.2% of messages in the study are related to computer programming. This stands in stark contrast to Handa et al. (2025), who found that 33% of work-related Claude conversations involved computer programming or IT tasks. The authors suggest the discrepancy reflects genuine differences between the ChatGPT and Claude user bases — developers and technical users may disproportionately prefer Claude — but also note that LLM usage for coding has increasingly migrated to specialized tools, APIs, and agentic coding environments outside of ChatGPT’s consumer interface. The implication is that coding-focused estimates of AI’s economic impact may not generalize to AI’s overall footprint.
Companionship and emotional support are also a small share. Only 1.9% of messages fall under “Relationships and Personal Reflection,” and 0.4% under “Games and Role Play.” This directly contradicts Zao-Sanders (2025) in Harvard Business Review, who estimated that “Therapy/Companionship” is the most prevalent use case for generative AI. The authors attribute this discrepancy to Zao-Sanders’s methodology — manual collection and labeling of Reddit posts, Quora threads, and online articles — which would tend to over-represent dramatic and discussable use cases at the expense of mundane, everyday usage. When you look at a representative sample of actual messages rather than online discussions about AI usage, the therapy narrative largely evaporates.
Asking, Doing, and Expressing: A New Taxonomy of Intent
One of the more conceptually novel contributions of the paper is the introduction of an “Asking, Doing, Expressing” taxonomy of user intent. Rather than classifying by topic, this taxonomy classifies by what kind of output the user is seeking.
- Asking (49% of messages): The user is seeking information, advice, or guidance to inform a decision. “What should I look for when choosing a health plan?” “What’s the difference between correlation and causation?”
- Doing (40% of messages): The user wants the model to produce an output. “Rewrite this email to be more formal.” “Write a Dockerfile for this app.”
- Expressing (11% of messages): The user is expressing views or feelings without seeking information or action.
The Asking/Doing distinction maps onto two separate traditions in the economics of AI. Doing conversations correspond to the classic “task-based” model of technological change associated with Autor, Levy, and Murnane (2003) — AI as a tool that performs tasks that would otherwise require human labor. Asking conversations correspond to models of AI as decision support or “copilot,” as developed by Garicano and Rossi-Hansberg (2006), Deming (2021), and most recently Ide and Talamas (2025) in the Journal of Political Economy — AI that improves the quality of human decision-making rather than replacing the human decision-maker.
The trend lines here are particularly illuminating. As of July 2024, Asking and Doing messages were roughly equal in share (both around 44%). By June 2025, Asking had grown to 51.6% while Doing had shrunk to 34.6%. Expressing grew from below 8% to 13.8%. The growth of Asking relative to Doing is consistent with the hypothesis that ChatGPT’s primary economic value lies in enhancing human cognition and decision-making rather than automating outputs.
This interpretation is reinforced by the interaction quality data. Asking messages are substantially more likely to receive a “good” rating (a good-to-bad ratio of 4.05) than Doing messages (2.76). The model is, by this measure, better at advising than at executing — at least from the user’s perspective.

Mapping to the Labor Market: The O*NET Analysis
To connect ChatGPT usage patterns to the structure of the labor market, the authors map messages to the O*NET Occupational Information Network, a U.S. Department of Labor taxonomy that classifies jobs by the skills, tasks, and work activities they require. Specifically, they map messages to 332 Intermediate Work Activities (IWAs), which are then aggregated up to 41 Generalized Work Activities (GWAs).
The results show a striking concentration in information-processing activities. Nearly half of all messages (45.2%) fall under just three GWAs:
- Getting Information: 19.3%
- Interpreting the Meaning of Information for Others: 13.1%
- Documenting/Recording Information: 12.8%
The next tier — Providing Consultation and Advice (9.2%), Thinking Creatively (9.1%), and Making Decisions and Solving Problems (8.5%) — rounds out a picture in which ChatGPT is predominantly being used as an information intermediary and thinking partner rather than as an autonomous executor of physical or digital tasks.
For work-related messages specifically, the picture shifts somewhat toward execution and creativity. Documenting/Recording Information is the top GWA at 18.4%, followed by Making Decisions and Solving Problems (14.9%), Thinking Creatively (13.0%), and Working with Computers (10.8%). But the fundamental pattern holds: ChatGPT at work is mostly about knowledge work — getting, interpreting, organizing, and applying information.
Perhaps the most remarkable finding in this section is the similarity across occupations. Making Decisions and Solving Problems appears in the top two GWAs for nearly every occupation group. Documenting and Recording Information is in the top four across all groups. Thinking Creatively is third in ten of the thirteen occupational groups where at least three GWAs can be ranked. Working with Computers is, predictably, most common in computer-related occupations, but the general picture is that ChatGPT is being used for the same basic cognitive functions across a remarkably wide range of job types — from management to education to food service.
The authors interpret this as evidence that “obtaining, documenting, and interpreting information” and “making decisions, giving advice, solving problems, and thinking creatively” are two broad functions that ChatGPT serves across the economy as a whole. The tool is not specialized. Its applicability is general.
Who Uses ChatGPT: Demographics and Inequality
Gender
The gender analysis is one of the more surprising demographic findings in the paper. Using a first-name-based classification approach drawing on the World Gender Name Dictionary and other public datasets — a methodology used in prior academic work including Hofstra et al. (2020) and West et al. (2013) — the authors find that in the first months after ChatGPT’s launch, approximately 80% of weekly active users had typically masculine first names. By June 2025, that figure had reversed: more than half of weekly active users had typically feminine first names.
This is a remarkable shift and one that has received almost no attention in public discussion. The technology that was widely described as a tool for coders and techies — a predominantly male demographic — has, within three years, become one that women use at least as much as men, and possibly more.
There are meaningful differences in how different genders use ChatGPT. Users with typically female first names are relatively more likely to send messages related to Writing and Practical Guidance. Users with typically male first names are more likely to use ChatGPT for Technical Help, Seeking Information, and Multimedia (e.g., image creation and modification). These patterns are consistent with broader gender differences in occupational distribution and communication styles, but the headline finding — parity in overall usage — is a major empirical fact that contradicts many assumptions embedded in public discourse about AI adoption.
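The first-name classification described above amounts to a dictionary lookup with an explicit "Unknown" fallback for ambiguous or unrecognized names. A minimal sketch: the tiny name table and confidence threshold below are invented for illustration, not drawn from the World Gender Name Dictionary.

```python
# Minimal sketch of first-name-based gender inference with an "Unknown"
# fallback, in the spirit of the paper's methodology. The lookup table is
# a toy stand-in for resources like the World Gender Name Dictionary.
NAME_TABLE = {
    # name -> (share of bearers coded female, share coded male) in some corpus
    "maria": (0.99, 0.01),
    "james": (0.01, 0.99),
    "taylor": (0.55, 0.45),  # ambiguous name
}

CONFIDENCE_THRESHOLD = 0.9  # classify only when one gender clearly dominates

def classify_name(first_name):
    shares = NAME_TABLE.get(first_name.strip().lower())
    if shares is None:
        return "unknown"          # name not in the dictionary
    female_share, male_share = shares
    if female_share >= CONFIDENCE_THRESHOLD:
        return "typically_feminine"
    if male_share >= CONFIDENCE_THRESHOLD:
        return "typically_masculine"
    return "unknown"              # ambiguous names are left unclassified

print(classify_name("Maria"))   # typically_feminine
print(classify_name("Taylor"))  # unknown (ambiguous)
print(classify_name("Ngozi"))   # unknown (not in table)
```

The fallback branches are exactly where the methodology's known limitations live: non-Western names and gender-ambiguous names land in "Unknown" rather than being forced into a binary.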
Age
Among users who self-report their age, the study finds that approximately 46% of all messages are sent by users between 18 and 25. This is a dramatic concentration among young adults, though the authors note that age gaps have narrowed somewhat in recent months.
The work-relatedness of messages increases with age, rising from 23% for users under 26 to 29–31% for users in the 26–55 age range, before dropping to 16% for users 66 and older. This age gradient is unsurprising — younger users are more likely to use ChatGPT for learning and personal tasks — but it reinforces the finding that most ChatGPT usage is not work-related.
Geography: Faster Growth in Lower-Income Countries
Plotting ChatGPT weekly active users as a share of the internet-enabled population against GDP per capita, the paper finds that wealthier countries had higher ChatGPT adoption in May 2024 — but that by May 2025, growth had been disproportionately fast in lower- and middle-income countries (those with GDP per capita in the $10,000–$40,000 range). This pattern has implications for how we think about AI’s global economic impact: if adoption is accelerating fastest in countries with less access to expensive professional services (lawyers, doctors, financial advisors), the welfare gains from AI-as-advisor could be particularly large in those settings.
Education and Occupation: Who Uses It for Work
The employment analysis, conducted through the secure Data Clean Room, confirms and extends findings from prior surveys. Educated users are significantly more likely to use ChatGPT for work: 37% of messages are work-related for users without a bachelor’s degree, rising to 46% for bachelor’s degree holders and 48% for those with graduate education. These differences remain statistically significant even after regression adjustment for occupation, age, gender, seniority, company size, and industry, though they shrink by about half.
Occupation shows even starker patterns. Computer-related occupations have the highest work-related share at 57%, followed by management/business (50%), engineering/science (48%), other professional occupations like law and healthcare (44%), and non-professional occupations (40%). The gaps remain highly significant after regression adjustment.
Within work-related usage, professionals in technical occupations are more likely to send Asking messages (47% in computer-related fields vs. 32% in non-professional occupations). Writing accounts for half or more of work-related messages for management/business (52%), non-professional (50%), and other professional occupations (49%), while Technical Help accounts for 37% of work-related messages in computer-related occupations.
The Economic Interpretation: AI as Decision Support
The paper concludes with a theoretical interpretation of its findings that deserves careful attention. The question they ask is: how does ChatGPT provide economic value, and for whom is that value greatest?
The authors’ answer is that ChatGPT primarily provides value through decision support — improving the quality of human decision-making rather than directly replacing human outputs. This is consistent with several strands of economic theory. Deming (2021) argues that the growing importance of decision-making skills in the labor market means that tools which improve decision quality should have particularly large effects on productivity in knowledge-intensive jobs. Caplin, Deming, Leth-Petersen, and Weidmann (2023) provide evidence that economic decision-making skill predicts income, suggesting that AI which improves that skill could have significant distributional implications.
The most theoretically sophisticated framing comes from Ide and Talamas (2025), who develop a formal model distinguishing between AI as “co-worker” (producing output directly) and AI as “co-pilot” (giving advice and improving human problem-solving). The empirical findings in this paper — the dominance of Asking over Doing, the higher satisfaction ratings for Asking interactions, the concentration of usage in information-gathering and decision-support GWAs — all point toward the co-pilot model as the more accurate description of how ChatGPT is actually being used.
This has important distributional implications. If the primary benefit of ChatGPT is decision support, and if better decision-making is most productive in knowledge-intensive, high-skill jobs, then the workers who benefit most from ChatGPT are likely to be those who were already well-positioned in the labor market. The finding that educated professionals in well-paid occupations are disproportionately likely to use ChatGPT for work is consistent with this concern, though the authors note that the rapid growth of adoption in lower-income countries and among non-professional users suggests the picture is more complex.
At the same time, the Practical Guidance category — which includes tutoring, how-to advice, and health guidance, and which accounts for 29% of all usage — represents a form of decision support that is potentially democratizing. Access to high-quality, customized advice on health, finances, education, and life decisions has historically been a privilege of the wealthy. If ChatGPT can credibly provide some version of that advice at zero marginal cost to a user in rural Bangladesh or suburban Ohio, the welfare implications extend well beyond the professional labor market.
Limitations and Critical Assessment
The paper is honest about its limitations, and a fair reading demands they be noted.
Selection into consumer plans. The data covers only consumer ChatGPT users (Free, Plus, and Pro). It explicitly excludes Business, Enterprise, and Education plans. This matters because enterprise and education users may have very different usage patterns — more structured, more work-focused, and potentially more coding-intensive. The finding that only 4.2% of messages are about computer programming, for instance, might look quite different if enterprise API usage were included.
Opt-out bias. Users who opted out of message sharing for model training are excluded from the classified message datasets. The authors are transparent about this, but to the extent that privacy-conscious users differ systematically from other users (in demographics, usage patterns, or the sensitivity of their queries), the sample may not be fully representative.
Name-based gender inference. The gender analysis relies on first-name-based classification, which cannot capture users who identify as non-binary, who use names that don’t map cleanly to gender, or whose names fall outside the training databases. The authors classify many names as “Unknown” and appropriately caveat their findings, but the methodology has well-known limitations in non-Western name conventions.
Classifier imperfections. The automated classifiers perform well on some dimensions (work/non-work) and less well on others (interaction quality). The coarse Conversation Topic classifier has moderate human agreement (Cohen’s κ = 0.56), and systematic biases are documented — the model under-labels Seeking Information and over-labels Practical Guidance. These imperfections should temper confidence in precise percentages, though the broad ordering of categories (writing, practical guidance, seeking information as the dominant use cases) is unlikely to be an artifact of classifier error.
Causal inference is limited. This is a descriptive paper. It does not — and cannot, from this data — tell us how ChatGPT changes worker productivity, whether it displaces or complements human labor, or what its net effects on wages and employment will be. Those are important questions that require different research designs.
What This Paper Changes
Despite these limitations, “How People Use ChatGPT” is a landmark contribution. It is the first paper to provide a large-scale, empirically grounded, privacy-preserving account of how the world’s largest LLM chatbot is actually being used — not how people say they use it, not what researchers think it might be used for, but what actually shows up in a representative sample of conversations.
The findings should recalibrate several important debates. The narrative that AI is primarily an automation tool — a technology that performs tasks humans used to do, displacing labor in predictable ways — fits poorly with evidence that nearly three-quarters of usage is non-work-related and that the dominant work use case is something more like “help me think through this” than “do this for me.”
The narrative that AI companionship is reshaping human relationships fits poorly with evidence that fewer than 2% of messages are about personal relationships or emotional support. The narrative that AI is primarily a coding tool fits poorly with evidence that coding is only 4.2% of consumer ChatGPT usage.
What the data actually shows is that ChatGPT has become a general-purpose cognitive infrastructure — a tool that hundreds of millions of people turn to for guidance, information, help with writing, and advice on decisions. It is, in the most literal sense of the phrase, a democratized knowledge worker: available at any hour, willing to engage with any question, and increasingly capable of providing useful responses across an enormous range of topics.
Whether that is more like a search engine, a tutor, a writing assistant, or a personal advisor depends on the user and the moment. But the striking finding is that it is all of these things simultaneously, at a scale that no other technology has achieved so quickly. Understanding that fact — empirically, carefully, with appropriate humility about what we still don’t know — is the essential starting point for good policy, good economics, and good thinking about what the AI revolution actually means.
This paper provides that starting point. It is, in that sense, both long overdue and exactly on time.
The full paper is available as NBER Working Paper No. 34255. Related research on AI adoption includes Bick, Blandin, and Deming (2024) on rapid adoption patterns, Handa et al. (2025) on Claude conversation analysis, and Ide and Talamas (2025) on the theoretical framework for AI as co-pilot. Consumer surplus estimates come from Collis and Brynjolfsson (2025), as reported in the Wall Street Journal.





