The paper, titled “Which Economic Tasks are Performed with AI? Evidence from Millions of Claude Conversations,” presents a comprehensive empirical investigation into how advanced artificial intelligence systems are integrated into everyday economic tasks. Authored by Kunal Handa, Alex Tamkin, Miles McCain, Saffron Huang, Esin Durmus, and others from Anthropic, the study harnesses millions of real-world interactions on Claude.ai—a text-based AI assistant—to map AI usage onto the occupational task framework provided by the U.S. Department of Labor’s O*NET Database. The research is both timely and methodologically innovative, blending privacy-preserving techniques with large-scale task classification to reveal emerging patterns in AI utilization across various sectors of the economy.
Introduction and Motivation
Rapid advancements in artificial intelligence have sparked both excitement and apprehension about the future of work. Over the past few years, theoretical models and predictive studies have speculated about job automation, labor market disruptions, and the evolving roles of human workers. However, limited systematic empirical evidence has been available to pinpoint which economic tasks are currently being supported—or even transformed—by these AI systems. This paper aims to fill that gap by providing a granular assessment of the actual use of AI in tasks performed by workers in different occupations.
Using millions of conversations generated on Claude.ai during December 2024 and January 2025, the authors leverage the Clio system—a privacy-preserving analytical tool—to map these interactions onto nearly 20,000 task statements outlined in the O*NET Database. By designing a hierarchical classification system to navigate the extensive list of tasks, the study not only identifies prevalent usage patterns but also discerns nuanced differences in how AI augments human capabilities versus automating tasks. In doing so, the paper sheds light on which occupational tasks are receiving the most AI support, the role of AI in augmenting or replacing human labor, and the potential economic implications of these changes.
The authors underscore that despite the dynamic nature of AI capabilities, the present usage of AI remains concentrated in specific sectors. For instance, tasks associated with software development, technical writing, and various analytical tasks account for nearly half of the documented usage, while fields that demand extensive physical manipulation or highly specialized expertise appear less affected. Interested readers can explore the dataset and additional details at Hugging Face.

Methodological Approach and Data Acquisition
At the heart of the paper lies a novel empirical framework that combines natural language processing and hierarchical clustering techniques to map individual Claude.ai conversations onto occupational tasks. The authors begin by collecting a large dataset—spanning over one million conversations from both free and pro versions of Claude.ai—ensuring that user privacy is preserved through stringent aggregation thresholds (e.g., requiring at least 15 conversations and five unique user accounts per task to be included in the analysis).
The methodology unfolds in several key steps:
- Screening and Task Mapping:
Each conversation is first screened for its relevance to occupational tasks using a prompt-based approach. Only those conversations deemed occupationally relevant are subsequently processed through a hierarchical classification system that maps them to specific tasks taken from the O*NET database. This mapping is performed through a tree-based search where tasks are embedded into vector spaces using models such as the all-mpnet-base-v2 sentence transformer. A clustering algorithm—specifically k-means clustering—is then employed to group similar tasks into neighborhoods, simplifying the otherwise massive space of nearly 20,000 task descriptions.
- Hierarchical Taxonomy Construction:
To overcome the limitations posed by the model’s context window, the researchers devise a three-level task hierarchy. At the top level, the classification yields broad occupational themes, while the middle level features more detailed categories. The base level directly corresponds to the granular O*NET task descriptions. Claude is used iteratively during this process to propose candidate task names, deduplicate across neighborhoods, and finally assign each task to its most appropriate higher-level category. This systematic approach is illustrated in Figure 9 of the paper and ensures that the classification aligns well with the underlying data distribution.
- Connection to Occupational Categories:
After assigning tasks to their respective clusters, the analysis connects these tasks to individual occupations by leveraging the associations provided in O*NET. In instances where a single task applies to multiple occupations, the conversation count is split proportionately, which provides an aggregate occupational measure of AI usage.
- Additional Facets for Analysis:
Beyond mapping tasks, the study explores several dimensions of AI usage. These include a detailed breakdown of occupational skills exhibited in Claude.ai conversations (e.g., critical thinking, programming, writing), an analysis of AI usage relative to median wage and barriers to entry (using Job Zones from O*NET), and an examination of whether the AI acts primarily in an automation capacity or as an augmentative tool that collaborates with human users. The taxonomy for human-AI collaboration is explicitly defined with categories such as Directive, Feedback Loop, Task Iteration, Learning, and Validation. Complete prompts for each classification step are provided in the appendices, ensuring transparency and reproducibility—for further details, consult Appendix F in the paper.
- Robustness and Validation:
The rigor of the methodology is further established through parallel analyses comparing conversation-level and account-level data, as shown in Figure 10, as well as through cluster-based reconstruction analyses (Section G). These additional validation steps indicate high correlations between different aggregation methods and bolster confidence in the paper’s findings.
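The embedding-and-clustering step in the task-mapping pipeline can be sketched in miniature. This is a self-contained illustration, not the paper's code: random placeholder vectors stand in for all-mpnet-base-v2 sentence embeddings, and a minimal hand-rolled k-means groups task statements into "neighborhoods."

```python
import numpy as np

# Placeholder task statements in the spirit of O*NET (invented examples).
task_statements = [
    "Write, update, and maintain computer programs",
    "Debug and correct errors in program code",
    "Develop lesson plans for language instruction",
    "Prepare course materials and assignments",
    "Analyze market research data",
    "Compile statistical reports on sales trends",
]

# In the study, tasks are embedded with the all-mpnet-base-v2 sentence
# transformer; here we substitute random vectors so the sketch runs anywhere.
rng = np.random.default_rng(0)
embeddings = rng.normal(size=(len(task_statements), 16))

def kmeans(X, k, iters=20, seed=0):
    """Minimal k-means: group task embeddings into k neighborhoods."""
    r = np.random.default_rng(seed)
    centers = X[r.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        # Assign each task to its nearest center (squared Euclidean distance).
        labels = np.argmin(((X[:, None] - centers[None]) ** 2).sum(-1), axis=1)
        # Move each center to the mean of its assigned tasks.
        for j in range(k):
            if (labels == j).any():
                centers[j] = X[labels == j].mean(axis=0)
    return labels

neighborhoods = kmeans(embeddings, k=3)
```

With real embeddings, semantically similar task statements would land in the same neighborhood, shrinking the nearly 20,000-item search space that the hierarchical classifier must navigate.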
By employing a methodologically robust framework that blends machine learning techniques with structured economic theory, the study effectively captures a dynamic portrait of AI integration in the workplace.
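The attribution rule in the "Connection to Occupational Categories" step can also be sketched. Assuming "split proportionately" means an even split of a task's conversation count across its associated occupations (one plausible reading; task and occupation names below are invented):

```python
from collections import defaultdict

# Hypothetical task-to-occupation associations in the style of O*NET.
task_to_occupations = {
    "debug program code": ["Software Developers"],
    "draft lesson plan": ["Foreign Language Teachers", "Tutors"],
}
task_conversation_counts = {"debug program code": 90, "draft lesson plan": 30}

# Split each task's conversation count evenly among its occupations,
# then accumulate per-occupation usage.
occupation_usage = defaultdict(float)
for task, occs in task_to_occupations.items():
    share = task_conversation_counts[task] / len(occs)
    for occ in occs:
        occupation_usage[occ] += share
```

Summing the resulting per-occupation totals yields the aggregate occupational measure of AI usage described above.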
Key Findings on AI Usage Patterns Across Occupations
A central revelation of the paper is that AI usage is not uniformly distributed across all economic tasks; instead, it exhibits a highly skewed pattern. The analysis reveals that:
- Concentration in Technology and Content Generation:
Tasks related to software development, technical writing, and analytical problem solving dominate the AI usage landscape, with computer and mathematical occupations accounting for roughly 37.2% of all queries. Moreover, content creation tasks in fields such as arts, design, and communication—representing around 10.3% of usage—highlight the AI’s role in fostering creative outputs and marketing-related activities.
- Spread Across Diverse Occupations:
While the highest levels of usage are observed in technology-centric roles, the data also indicate that AI is gradually diffusing into other sectors. Approximately 36% of occupations exhibit AI support for at least 25% of their associated tasks, suggesting that AI tools are becoming embedded across a broad spectrum of job functions. Nonetheless, only about 4% of occupations show deep integration (i.e., AI supporting at least 75% of tasks), which implies that for most jobs, the role of AI remains selective rather than comprehensive.
- Task-Level vs. Job-Level Impact:
An important nuance emphasized by the authors is that AI is primarily used on a task-by-task basis rather than automating entire job roles. For example, in occupations such as Foreign Language and Literature Teachers or Marketing Managers, AI assistance is visible in tasks like course planning or market research analysis, respectively, but it is not uniformly applied to all components of these jobs.
- Detailed Occupational Breakdown:
Figure 3 of the paper compares the distribution of occupational representation in Claude.ai usage data with the actual U.S. workforce composition. It shows that while occupations requiring extensive physical labor (e.g., transportation or healthcare support) are under-represented, those in technical and creative fields see substantial AI engagement. This discrepancy highlights the human complementarity at play: current AI systems are better suited to tasks involving cognitive processing than to those requiring physical manipulation.
These patterns reinforce the view that present-day AI is best understood not as a wholesale replacement of human roles but as a powerful augmentation tool—augmenting human capabilities in fields that demand computational, analytical, and creative thinking.
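The depth-of-integration figures above (36% of occupations with AI support for at least a quarter of their tasks, about 4% with support for three-quarters or more) reduce to a simple coverage computation. The sketch below is a hypothetical Python illustration with invented occupations and tasks, not the paper's code or data.

```python
# For each occupation, which of its tasks show any AI usage (True/False).
# Occupation and task names are invented for illustration.
occupation_tasks = {
    "Software Developers": {
        "write code": True, "debug": True, "review design": True, "deploy": False,
    },
    "Truck Drivers": {
        "drive": False, "inspect vehicle": False, "log hours": True, "load cargo": False,
    },
}

def task_coverage(tasks):
    """Fraction of an occupation's tasks with observed AI usage."""
    return sum(tasks.values()) / len(tasks)

def share_above(threshold):
    """Share of occupations whose task coverage meets the threshold."""
    covered = [task_coverage(t) >= threshold for t in occupation_tasks.values()]
    return sum(covered) / len(covered)
```

In this toy data, both occupations clear the 25% threshold but only one clears 75%, mirroring the paper's broad-but-shallow integration pattern.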
Linking these findings to broader economic trends, the paper also challenges some earlier predictions. For instance, while some studies forecasted high levels of AI exposure in occupations at the extreme high end of the wage spectrum, the empirical data show that peak usage occurs in the mid-to-high wage range, with lower representation in both the very highest and lowest wage brackets. Such insights have significant implications for policymakers, who must now consider more nuanced approaches to managing labor market effects in the era of generative AI.
Analysis of Occupational Skills and Work Characteristics
Another important contribution of the paper is its analysis of the specific occupational skills that appear in AI-assisted interactions. Using the O*NET Database’s list of 35 occupational skills, the authors map out which abilities are most frequently exhibited by Claude.ai within practical contexts.
The analysis identifies that cognitive skills—critical thinking, reading comprehension, programming, and writing—are prevalent in the majority of conversations. In contrast, skills that necessitate physical interaction, such as equipment maintenance and installation, register with markedly lower frequency. This distribution is unsurprising given that Claude.ai primarily processes text and thus favors abstract, computational problem solving over manual, physical tasks. The paper also notes that certain skills, like active listening, may appear to be common not because they are actively sought by users but rather because they are intrinsic to the conversational style of the model.
Furthermore, the study explores how the usage of AI relates to occupational wages and barriers to entry. By mapping AI usage against median salary figures sourced from the U.S. Census Bureau and considering the Job Zones defined by O*NET, the authors demonstrate that AI usage peaks in occupations where a bachelor’s degree or equivalent preparation is required (Job Zone 4). In contrast, occupations at both the minimal preparation end (Job Zone 1) and those requiring extensive advanced degrees (Job Zone 5) show lower levels of AI engagement. Table 2 in the paper quantitatively articulates these differences using a representation ratio that compares the observed AI usage percentages to occupational baseline distributions.
This nuanced breakdown illuminates how AI’s current capabilities align best with tasks that require structured, analytical input rather than high-touch human interaction or advanced expert knowledge. It suggests that while AI tools are highly effective in supporting certain cognitive tasks, their present limitations restrict their applicability in domains heavily reliant on physical interaction or intricate professional training.
For readers interested in further details, the O*NET OnLine database can be accessed at ONET Online, which provides additional context and data on occupational tasks and required skills.
Automation Versus Augmentation: The Nature of AI Assistance
A particularly compelling aspect of the research is its detailed exploration of how AI collaboration with users can be classified into two broad categories: automation and augmentation. The paper distinguishes between scenarios where AI directly executes tasks with minimal human input (automation) and those where AI interacts iteratively with the user to refine outputs and enhance decision-making (augmentation).
To systematically classify interactions, the authors develop a taxonomy that bins conversations into five collaboration patterns:
• Directive (complete task delegation),
• Feedback Loop (iterative completion using environmental feedback),
• Task Iteration (collaborative refinement),
• Learning (knowledge acquisition), and
• Validation (work verification).
An empirical breakdown of these modes reveals that approximately 43% of conversations exhibit automation-oriented behaviors, whereas roughly 57% are augmentative. This nearly even split, with a slight edge in favor of augmentation, suggests that users often prefer a collaborative dynamic over simple task outsourcing. For instance, programmers debugging code or professionals seeking clarifications frequently engage in feedback loops and task iteration dialogues, emphasizing the role of human oversight and the iterative refinement of outputs.
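Computing these shares amounts to counting per-conversation labels. The sketch below assumes the paper's grouping of Directive and Feedback Loop as automation and of Task Iteration, Learning, and Validation as augmentation; the label list itself is invented toy data.

```python
from collections import Counter

# Grouping of the five collaboration patterns into the two broad modes.
AUTOMATION = {"directive", "feedback_loop"}
AUGMENTATION = {"task_iteration", "learning", "validation"}

# Hypothetical per-conversation labels (one label per conversation).
labels = [
    "directive", "task_iteration", "learning", "feedback_loop",
    "validation", "task_iteration", "directive", "learning",
]

counts = Counter(labels)
auto_share = sum(counts[p] for p in AUTOMATION) / len(labels)
aug_share = sum(counts[p] for p in AUGMENTATION) / len(labels)
```

Run over millions of labeled conversations, the same aggregation yields the roughly 43%/57% automation-versus-augmentation split reported in the paper.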
The findings underscore that even tasks which might traditionally be automated are being transformed into hybrid workflows, wherein the strengths of AI (speed and consistency) are coupled with human intuition and domain expertise. This insight is critical for future discussions about productivity and the impact of AI on job quality, as it reinforces the notion that AI’s role is complementary and context-dependent rather than unilaterally substitutive.
Comparative Analysis of Claude Models: Opus Versus Sonnet
The paper also examines differing usage patterns between two variants of the Claude model: Claude 3 Opus and Claude 3.5 Sonnet (new). Released at distinct times in 2024, these models exhibit specialized strengths that align with different occupational tasks. According to the research, Claude 3.5 Sonnet is more frequently associated with tasks in coding and technical domains such as software development and debugging, whereas Claude 3 Opus tends to be preferred for creative, educational, and content-generation tasks.
This dichotomy is visually supported by Figure 8 in the paper, which underlines the specialization in usage patterns between the models. The divergence in usage resonates with external evaluations and widespread observations from users, suggesting that even within the same overarching AI platform, variations in model design can significantly influence task-specific performance. The comparative analysis not only validates the importance of continued model-specific research but also hints at the future potential for customized AI agents tailored to particular professional domains.
For readers seeking further technical details on model capabilities, additional information can be found on Anthropic’s research page at Claude Character.
Robustness Checks and Cluster-Based Reconstruction
Recognizing the complexity of mapping millions of discrete interactions to meaningful occupational tasks, the authors conduct several robustness checks and alternative analyses. In addition to the primary conversation-level analysis, the study performs cluster-based reconstruction analyses that aggregate related conversations into clusters. This additional methodological layer serves as a reliability test—comparing the occupational distributions obtained from direct assignment with those derived from clustering methods.
The analysis shows strong correlation metrics (with Pearson correlations exceeding 0.95 in some cases and only minor discrepancies at higher aggregation levels) when comparing direct assignment to cluster-based approaches. Figures 14 through 19 in the paper illustrate these relationships, demonstrating that while there is some intrinsic noise at the granular task level (due to overlapping task descriptions), the aggregated occupational trends remain remarkably stable. The authors therefore argue that their framework is robust and well-suited for tracking AI integration across diverse occupational tasks despite the inherent complexity of the data.
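The robustness comparison itself is a correlation between two occupational usage distributions: one from direct task assignment, one from cluster-based reconstruction. A hedged sketch, using invented toy shares rather than the paper's figures:

```python
import numpy as np

# Hypothetical per-occupation usage shares from the two aggregation methods.
direct = np.array([0.37, 0.10, 0.07, 0.05, 0.03])
clustered = np.array([0.36, 0.11, 0.07, 0.04, 0.03])

def pearson(x, y):
    """Pearson correlation between two equal-length vectors."""
    x, y = x - x.mean(), y - y.mean()
    return float((x @ y) / np.sqrt((x @ x) * (y @ y)))

r = pearson(direct, clustered)
```

A value of r near 1, as the paper reports for its much larger distributions, indicates that the two aggregation routes recover essentially the same occupational picture.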
Furthermore, the authors discuss additional human validation exercises. A sample of 150 examples was manually evaluated for both the task hierarchy classifications and the automation versus augmentation labels, yielding validation accuracies exceeding 86% for task assignments and over 90% for collaboration pattern classifications. These validation efforts lend considerable credibility to the overall framework and ensure that the observed trends are not artifacts of the classification process.
Discussion, Policy Implications, and Future Directions
The paper’s discussion section synthesizes the empirical findings and situates them within the broader context of labor market dynamics and technology policy. The authors argue that understanding how AI tools are used at the micro-level—at the granularity of individual tasks rather than whole jobs—is essential for both predicting future economic trends and designing appropriate policy responses.
Several important points emerge in this context:
- Gradual Transformation Over Displacement:
The findings suggest that while AI is touching many jobs, the transformation is currently selective. Rather than signaling the wholesale automation of entire occupations, the data show that AI is complementing human capabilities in specific tasks. This nuanced understanding challenges some of the more alarmist narratives surrounding AI and job displacement.
- Implications for Wage Inequality and Job Quality:
As the analysis of wage data indicates, AI usage peaks in mid-to-high wage sectors and is less pronounced at both extremes of the wage distribution. This pattern may reflect not only technical feasibility but also factors such as implementation costs, regulatory constraints, and organizational readiness. Policymakers must therefore consider these subtleties when designing interventions aimed at mitigating inequality and improving job quality.
- Policy Intervention and Worker Training:
Given that a significant portion of AI interactions reflects augmentative behavior, policies should emphasize the development of collaborative AI interfaces and worker training programs that help employees leverage AI tools effectively. The study provides an empirical basis for targeted interventions that can both support productivity improvements and safeguard against potential disruption.
- Dynamic Tracking and Longitudinal Analysis:
The authors call for the establishment of dynamic, real-time frameworks for tracking AI integration as both technologies and usage evolve. Such continuous monitoring—as exemplified in this research—will be critical for providing early indicators of economic transitions and enabling proactive responses. The framework presented here lays the groundwork for future studies that could incorporate additional modalities (e.g., image and video interactions) as AI systems become increasingly multimodal.
The paper is careful to acknowledge several limitations, including the representativeness of the sampled data, potential noise in the classification schema, and the inability to fully capture the downstream uses of AI outputs (such as whether a piece of code is actually deployed or a written draft is refined further). The authors stress that while their methodological approach is powerful, it is only a snapshot of current usage patterns on a single platform. Future research will need to integrate data from a variety of AI systems and deployment contexts to develop a more comprehensive picture of AI’s economic impact.
Conclusion
In its conclusion, the paper reiterates that understanding the evolving role of AI in work requires an empirical approach grounded in real-world usage data rather than solely in theoretical forecasting. The findings reveal that advanced AI systems, as exemplified by Claude.ai, are already playing a significant role in supporting economic tasks—especially in domains such as software development, technical writing, and analytical research. While AI integration is currently selective (with deep usage observed in only a few occupations), the trends point toward increasingly sophisticated applications and a dynamic interplay between automation and augmentation.
This study not only contributes important empirical insights but also establishes a methodological framework that both researchers and policymakers can use to monitor future changes. With links to valuable resources such as the O*NET Online database and additional material on model characteristics available through Anthropic’s research portal, the work stands as a landmark contribution to our understanding of AI’s role in shaping the future of work.
As AI systems continue to evolve—expanding from text-based interactions to incorporate video, speech, and even robotics—the implications for labor markets are expected to grow more profound and complex. The ability to track these shifts at a granular, task-level resolution will be crucial for designing effective policy responses. Overall, the paper makes a persuasive case that while many economic sectors are already benefiting from AI augmentation, strategic and evidence-based policymaking is essential to ensure that the transformative potential of these technologies is harnessed in ways that are both economically beneficial and socially equitable.
In closing, the study underscores the need to view AI not as a singular disruptive force, but as one component of a broader evolution in how work is structured, performed, and conceptualized. By systematically mapping the intersection of advanced AI usage and occupational tasks, the paper provides a detailed framework for understanding current trends and anticipating future developments in the digital economy.
Final Reflections
This paper is characterized by a high degree of methodological sophistication and a rich empirical grounding that challenges previous predictions and refines our understanding of AI’s economic impact. Its use of the Clio system to analyze millions of real-world conversations represents a significant step forward in the empirical study of human-AI interaction, and its detailed dissection of occupational tasks provides valuable insights for both academia and public policy.
With a blend of complex statistical analyses, hierarchical task mapping, and rigorous validation efforts, the study exemplifies how cutting-edge techniques can be applied to issues of immense practical importance. The nuanced distinction between automation and augmentation highlights an important lesson: the integration of AI into work is not a story of simple replacement but one of transformation, with human decision-making and oversight remaining vital to ensuring that AI serves as an effective and ethical partner in the workplace.
Given the pace of AI development, the framework presented in this paper will undoubtedly be a touchstone for future research as the contours of work continue to evolve. Policymakers, economists, and technologists alike would do well to heed its insights, ensuring that the benefits of AI are widely distributed while mitigating the risks associated with rapid technological change.