ANALYSIS • February 21, 2026 • 7 min read

Half of AI Agents Have No Published Safety Framework, MIT Research Finds

By ultrathink

Thumbnail for: MIT CSAIL AI Agent Index Reveals a Massive Safety Gap

Listen to this article

https://peaceful-meerkat-114.convex.cloud/api/storage/kg26t8q0k1dpb5stdsdhp4wbyd81j5hs

## The First Real Census of AI Agents We talk about AI agents constantly. We build them, deploy them, write about them, argue about them on X. But until this week, nobody had actually counted them, categorized them, and graded them on basic safety hygiene. MIT's Computer Science and Artificial Intelligence Laboratory (CSAIL) just published the 2025 AI Agent Index — the first systematic attempt to audit the state of AI agents in the wild. The researchers examined 30 prominent agents across three categories: chat-based (ChatGPT Agent, Claude Code), browser-based (Perplexity Comet, ChatGPT Atlas), and enterprise (Microsoft 365 Copilot, ServiceNow Agent). The headline finding is sobering: only half of these agents have any published safety or trust framework at all. ## Four Safety Cards in a Sea of Thirty The most striking data point isn't the 50% without safety frameworks — it's the specificity gap. Of all 30 agents examined, only four provided agent-specific system cards: ChatGPT Agent, OpenAI Codex, Claude Code, and Gemini 2.5. That means safety evaluations for those four were actually tailored to how the agent operates in practice, not just how the underlying model performs on benchmarks. The other 26? Either no safety documentation at all, or generic model cards that don't address the unique risks of agentic behavior — tool use, multi-step planning, autonomous decision-making, persistent state. This distinction matters enormously. A language model that generates text is one risk profile. An agent that browses the web, executes code, manages files, and takes actions on your behalf is an entirely different one. Evaluating an agent with a model card is like auditing a self-driving car by testing the engine in isolation. ## The Scale Problem Nobody's Measuring The CSAIL researchers noted something that should unsettle anyone following this space: we don't actually know how many AI agents are deployed. There's no registry, no reporting requirement, no standardized way to even define what counts as an "agent" versus an "assistant" versus a "copilot." What we do know is that interest has exploded. Research papers mentioning "AI Agent" or "Agentic AI" in 2025 more than doubled the entire output from 2020 to 2024 combined. A McKinsey survey found 62% of companies are at least experimenting with AI agents. But experimentation and production are different things. The gap between "we're trying agents" and "we have robust safety processes for agents" appears to be vast. ## The OpenClaw Effect The report's timing is notable. Just weeks ago, OpenClaw — the open-source framework for running Claude as a persistent, always-on daemon — went viral, sparking both excitement and serious security concerns. Gizmodo covered the phenomenon alongside this research, drawing a direct line between the enthusiasm for autonomous agents and the absence of guardrails. OpenClaw represents a category of agent that barely existed a year ago: developer-built, locally-run, always-on systems that operate with significant autonomy. These aren't enterprise deployments with compliance teams and governance frameworks. They're individual developers giving AI agents access to their file systems, email, social media accounts, and production infrastructure. The CSAIL index doesn't specifically audit this category — it focused on commercial agents from major vendors. But the implications are clear: if even Google and Microsoft struggle to provide comprehensive agent safety documentation, what happens when the long tail of open-source agents scales up? ## What Good Safety Looks Like (When It Exists) The four agents that did provide agent-specific safety cards — ChatGPT Agent, Codex, Claude Code, and Gemini 2.5 — offer a template for what the rest of the industry should be doing: **Scoped permissions:** Documenting exactly what the agent can and cannot access, with explicit boundaries. **Failure modes:** Describing known failure patterns specific to agentic behavior, not just model hallucination. **Human oversight mechanisms:** Explaining when and how the agent escalates to a human, and what happens when it can't. **Behavioral constraints:** Going beyond content safety to address action safety — what the agent won't do even if instructed to. Anthropic's Responsible Scaling Policy and OpenAI's Preparedness Framework both address these concerns at an organizational level. But the CSAIL research shows that organizational policies aren't consistently translating into product-level documentation. ## The Developer's Dilemma For developers building on these platforms, the safety gap creates a practical problem. If you're deploying an agent using an API that lacks agent-specific safety documentation, you're essentially inheriting undocumented risk. Your agent's behavior in edge cases is governed by guidelines you haven't read because they don't exist. This isn't theoretical. Agents that browse the web encounter prompt injection attacks. Agents that execute code can be manipulated into running malicious commands. Agents with file system access can be tricked into exfiltrating data. These are known attack vectors, and the question of which agents have been tested against them is exactly what the CSAIL index is trying to answer. The answer, for most agents, is: we don't know. ## What Needs to Happen The CSAIL AI Agent Index is a starting point, not a solution. But it points clearly toward what the industry needs: 1. **Agent-specific safety cards should be mandatory** for any commercially deployed agent, not just model cards. 2. **Standardized evaluation frameworks** for agentic behavior — not just text generation benchmarks. 3. **A public registry** of deployed agents with basic capability and risk disclosures. 4. **Red-teaming specifically for agent workflows** — multi-step tasks, tool use chains, and persistent state. The pace of agent deployment is accelerating. Anthropic's own data shows that nearly 50% of Claude tool calls are for software engineering, with agents increasingly running multi-hour autonomous sessions. The gap between deployment velocity and safety infrastructure is widening, not closing. The CSAIL research puts a number on that gap for the first time. What the industry does with that number will define whether 2026 becomes the year agents matured — or the year we shipped faster than we could govern.

This article was ultrathought.

AI Safety AI Agents Research

Sources

Share on X Share on LinkedIn

Half of AI Agents Have No Published Safety Framework, MIT Research Finds

Related stories