Anthropic Measures AI Agent Autonomy in the Wild
Anthropic analyzed millions of human-agent interactions across Claude Code and its public API to answer a question nobody had empirical data on: how much autonomy do people actually grant AI agents? The findings suggest we are dramatically underutilizing what these systems can do.
The Autonomy Gap
The most striking finding is what Anthropic calls a deployment overhang. METR estimates Claude Opus 4.5 can complete tasks that would take a human nearly 5 hours. But in practice, the 99.9th percentile turn duration in Claude Code is just 42 minutes, and the median is 45 seconds.
That gap is narrowing. Between October 2025 and January 2026, the longest Claude Code sessions nearly doubled from under 25 minutes to over 45 minutes. The increase was smooth across model releases rather than jumping at each new model, suggesting it is driven by growing user trust, not just model capability.
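To make the metric concrete, here is a minimal sketch of how median and 99.9th-percentile turn durations could be computed from session logs. The toy data and the nearest-rank percentile helper are assumptions for illustration; Anthropic's actual telemetry schema is not public.

```python
import statistics

# Hypothetical per-turn durations in seconds; toy data, not Anthropic's logs.
turn_durations = [8.0, 30.0, 45.0, 60.0, 240.0, 900.0, 2520.0]

def percentile(values, p):
    """Nearest-rank percentile: the value at the p-th position of the sorted sample."""
    ordered = sorted(values)
    k = min(len(ordered) - 1, round(p / 100 * (len(ordered) - 1)))
    return ordered[k]

print("median turn:", statistics.median(turn_durations), "s")  # p50
print("p99.9 turn: ", percentile(turn_durations, 99.9), "s")   # long tail
```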
The Trust Paradox
Experienced users auto-approve more and interrupt more. New users auto-approve about 20% of sessions; by 750 sessions, that jumps to over 40%. But interruption rates climb in parallel, from 5% to 9%.
This reflects a shift in oversight strategy. New users approve each action beforehand. Veterans let the agent run freely, then course-correct when something goes wrong.
Effective oversight does not mean approving every action; it means being in a position to intervene when it matters.
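A minimal sketch of what that run-freely-but-interruptible pattern might look like in an agent harness. The risk tiers, tool names, and interrupt flag are hypothetical assumptions; this is not Claude Code's actual permission system.

```python
import threading
from dataclasses import dataclass, field

LOW_RISK = {"read_file", "grep", "list_dir"}  # assumed tool names

@dataclass
class OversightGate:
    # A UI thread can set this flag at any time to pull the agent back.
    interrupted: threading.Event = field(default_factory=threading.Event)

    def approve(self, tool: str) -> bool:
        if self.interrupted.is_set():
            return False                       # user course-corrected mid-run
        if tool in LOW_RISK:
            return True                        # auto-approve: no prompt
        # Anything risky or unknown still goes to the human.
        return input(f"Allow {tool}? [y/N] ").strip().lower() == "y"

gate = OversightGate()
# gate.interrupted.set()  # called from elsewhere to interrupt the agent
print(gate.approve("read_file"))  # True, approved without a prompt
```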
Claude Knows When It Does Not Know
Claude Code pauses for clarification more often than humans interrupt it. On complex tasks, Claude stops to ask questions more than twice as often as on simple tasks. The top reasons for pausing: presenting choices (35%), gathering diagnostics (21%), clarifying vague requests (13%), requesting credentials (12%), and seeking approval (11%).
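One way an agent loop can support this behavior: let the model return an explicit ask-user action instead of a tool call, handing control back to the human. The action type and handler below are illustrative assumptions, not Claude Code's real control flow.

```python
from dataclasses import dataclass

@dataclass
class Action:
    kind: str      # "tool_call" or "ask_user" -- assumed action types
    payload: str   # tool arguments, or the clarifying question to show

def run_tool(args: str) -> str:
    return f"ran tool with {args}"  # stub dispatcher for the sketch

def handle(action: Action) -> str:
    if action.kind == "ask_user":
        # Pause the loop and surface the question, mirroring the
        # clarification requests counted in the study.
        return input(f"Agent asks: {action.payload}\n> ")
    return run_tool(action.payload)

print(handle(Action("tool_call", "grep -r TODO src/")))
```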
What Agents Actually Do
Software engineering accounts for nearly 50% of agentic API activity, but usage is emerging in healthcare, finance, and cybersecurity. Safeguards are widespread: 80% of tool calls have at least one, 73% have a human in the loop, and only 0.8% of actions are irreversible.
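A back-of-the-envelope sketch of how coverage numbers like these could be tallied from tool-call records. The record fields here are hypothetical, not the study's actual schema.

```python
# Toy tool-call records; field names are assumptions for this sketch.
calls = [
    {"tool": "read_file",  "safeguards": ["sandbox"],             "human_in_loop": True,  "irreversible": False},
    {"tool": "send_email", "safeguards": ["approval", "sandbox"], "human_in_loop": True,  "irreversible": True},
    {"tool": "list_dir",   "safeguards": [],                      "human_in_loop": False, "irreversible": False},
]

n = len(calls)
print(f"any safeguard: {sum(bool(c['safeguards']) for c in calls) / n:.0%}")
print(f"human in loop: {sum(c['human_in_loop'] for c in calls) / n:.0%}")
print(f"irreversible:  {sum(c['irreversible'] for c in calls) / n:.0%}")
```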
Why This Matters
This is the first large-scale empirical study of agent autonomy in production. Models are ready for more autonomy than they get. Users naturally develop sophisticated oversight strategies. And models are learning to recognize their own uncertainty, a crucial safety property.
As agents expand beyond software engineering, Anthropic recommends post-deployment monitoring, training models to recognize uncertainty, and designing products with real-time agent visibility.
Read the full research at anthropic.com.
This article was ultrathought.