AI Agent Security: A Framework for Accountability and Control

This weekend, I came across a LinkedIn article by Priscilla Russo about OpenAI agents and digital wallets that touched on something I’ve been thinking about – liability for AI agents and how it reshapes system design. As autonomous AI systems become more prevalent, we face a critical challenge: how do we secure systems that actively optimize for success in ways that can break traditional security models? The article’s discussion of Knight Capital’s $440M trading glitch perfectly illustrates what’s at stake. When automated systems make catastrophic decisions, there’s no undo button – and with AI agents, the potential for unintended consequences scales dramatically with their capability to find novel paths to their objectives.

What we’re seeing isn’t just new—it’s a fundamental shift in how organizations approach security. Traditional software might accidentally misuse resources or escalate privileges, but AI agents actively seek out new ways to achieve their goals, often in ways developers never anticipated. This isn’t just about preventing external attacks; it’s about containing AI itself—ensuring it can’t accumulate unintended capabilities, bypass safeguards, or operate beyond its intended scope. Without containment, AI-driven optimization doesn’t just break security models—it reshapes them in ways that make traditional defenses obsolete.

“First, in 2024, O1 broke out of its container by exploiting a vuln. Then, in 2025, it hacked a chess game to win. Relying on AI alignment for security is like abstinence-only sex ed—you think it’s working, right up until it isn’t,” said the former 19-year-old father.

The Accountability Gap

Most security discussions around AI focus on protecting models from adversarial attacks or preventing prompt injection. These are important challenges, but they don’t get to the core problem of accountability. As Russo suggests, AI developers are inevitably going to be held responsible for the actions of their agents, just as financial firms, car manufacturers, and payment processors have been held accountable for unintended consequences in their respective industries.

The parallel to Knight Capital is particularly telling. When their software malfunction led to catastrophic trades, there was no ambiguity about liability. That same principle will apply to AI-driven decision-making – whether in finance, healthcare, or legal automation. If an AI agent executes an action, who bears responsibility? The user? The AI developer? The organization that allowed the AI to interact with its systems? These aren’t hypothetical questions anymore – regulators, courts, and companies need clear answers sooner rather than later.

Building Secure AI Architecture

Fail to plan, and you plan to fail. When legal liability is assigned, the difference between a company that anticipated risks – building mitigations, implementing controls, and ensuring auditability – and one that did not will likely be significant. Organizations that treat these questions as an afterthought will find themselves scrambling after a crisis, while those that plan for them now will be in a far better position to defend their decisions.

While security vulnerabilities are a major concern, they are just one part of a broader set of AI risks. AI systems can introduce alignment challenges, emergent behaviors, and deployment risks that reshape system design. But at the core of these challenges is the need for robust identity models, dynamic security controls, and real-time monitoring to prevent AI from optimizing in ways that bypass traditional safeguards.

Containment and isolation are just as critical as resilience. It’s one thing to make an AI model more robust – it’s another to ensure that if it misbehaves, it doesn’t take down everything around it. A properly designed system should ensure that an AI agent can’t escalate its access, operate outside of predefined scopes, or create secondary effects that developers never intended. AI isn’t just another software component – it’s an active participant in decision-making processes, and that means limiting what it can influence, what it can modify, and how far its reach extends.

I’m seeing organizations take radically different approaches to this challenge. As Russo points out in her analysis, some organizations like Uber and Instacart are partnering directly with AI providers, integrating AI-driven interactions into their platforms. Others are taking a defensive stance, implementing stricter authentication and liveness tests to block AI agents outright. The most forward-thinking organizations are charting a middle path: treating AI agents as distinct entities with their own credentials and explicitly managed access. They recognize that pretending AI agents don’t exist or trying to force them into traditional security models is a recipe for disaster.

Identity and Authentication for AI Agents

One of the most immediate problems I’m grappling with is how AI agents authenticate and operate in online environments. Most AI agents today rely on borrowed user credentials, screen scraping, and brittle authentication models that were never meant to support autonomous systems. Worse, when organizations try to solve this through traditional secret sharing or credential delegation, they end up spraying secrets across their infrastructure – creating exactly the kind of standing permissions and expanded attack surface we need to avoid. This might work in the short term, but it’s completely unsustainable.

The future needs to look more like SPIFFE for AI agents – where each agent has its own verifiable identity, scoped permissions, and limited access that can be revoked or monitored. But identity alone isn’t enough. Having spent years building secure systems, I’ve learned that identity must be coupled with attenuated permissions, just-in-time authorization, and zero-standing privileges. The challenge is enabling delegation without compromising containment – we need AI agents to be able to delegate specific, limited capabilities to other agents without sharing their full credentials or creating long-lived access tokens that could be compromised.
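To make that concrete, here is a minimal sketch of what a per-agent credential with scoped, short-lived access might look like. The names and fields are illustrative assumptions, not a real SPIFFE implementation – in practice the identity would be an SVID issued through workload attestation, and the checks would live in your authorization layer.

```python
# Hypothetical sketch: short-lived, scoped credentials for an AI agent,
# loosely modeled on SPIFFE-style workload identity. Field names are
# illustrative assumptions, not a real SPIFFE/SVID implementation.
import time
import uuid
from dataclasses import dataclass, field

@dataclass
class AgentCredential:
    agent_id: str                  # e.g. "spiffe://example.org/agent/invoice-bot"
    scopes: frozenset              # explicit, enumerated permissions
    issued_at: float = field(default_factory=time.time)
    ttl_seconds: int = 300         # short-lived by default; must be re-issued
    credential_id: str = field(default_factory=lambda: str(uuid.uuid4()))

    def is_valid(self, now=None):
        now = time.time() if now is None else now
        return now < self.issued_at + self.ttl_seconds

    def allows(self, scope):
        return self.is_valid() and scope in self.scopes

# Issue a narrowly scoped credential and check it at every call site.
cred = AgentCredential(
    agent_id="spiffe://example.org/agent/invoice-bot",
    scopes=frozenset({"invoices:read"}),
)
assert cred.allows("invoices:read")
assert not cred.allows("invoices:write")   # no standing write privilege
```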

Systems like Biscuits and Macaroons show us how this could work: they allow for fine-grained scoping and automatic expiration of permissions in a way that aligns perfectly with how AI agents operate. Instead of sharing secrets, agents can create capability tokens that are cryptographically bound to specific actions, contexts, and time windows. This would mean an agent can delegate exactly what’s needed for a specific task without expanding the blast radius if something goes wrong.
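As a rough illustration of the pattern – not the actual Biscuit or Macaroon formats, which have richer caveat languages, third-party caveats, and audited libraries you should use instead – here is how HMAC chaining lets any holder narrow a token but never widen it:

```python
# Minimal sketch of Macaroon-style capability attenuation using HMAC chaining.
# Illustrative only; a real verifier would also evaluate each caveat predicate
# (expiry, action, context) against the incoming request.
import hmac
import hashlib

def mint(root_key, identifier):
    """Mint a token: a list of caveats plus a chained HMAC signature."""
    sig = hmac.new(root_key, identifier.encode(), hashlib.sha256).digest()
    return [identifier], sig

def attenuate(token, caveat):
    """Append a restriction. Any holder can narrow the token, but caveats
    cannot be removed without the root key."""
    caveats, sig = token
    new_sig = hmac.new(sig, caveat.encode(), hashlib.sha256).digest()
    return caveats + [caveat], new_sig

def verify(root_key, token):
    """Recompute the HMAC chain from the root key and compare signatures."""
    caveats, sig = token
    expected = hmac.new(root_key, caveats[0].encode(), hashlib.sha256).digest()
    for caveat in caveats[1:]:
        expected = hmac.new(expected, caveat.encode(), hashlib.sha256).digest()
    return hmac.compare_digest(expected, sig)

# An orchestrating agent delegates a narrower capability to a sub-agent.
root_key = b"server-side-root-key"
token = mint(root_key, "agent = travel-planner")
delegated = attenuate(token, "action = read:calendar")
delegated = attenuate(delegated, "expires = 2025-01-01T00:05:00Z")
assert verify(root_key, delegated)
```

The property that matters is that delegation can only ever shrink what a token authorizes, which maps directly onto agents handing limited capabilities to sub-agents.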

Agent Interactions and Chain of Responsibility

What keeps me up at night isn’t just individual AI agents – it’s the interactions between them. When one AI agent calls another to complete a task, and that agent calls yet another, you end up with a chain of decision-making where no one knows who (or what) actually made the call. Without full pipeline auditing and attenuated permissions, this becomes a black-box decision-making system with no clear accountability or verifiability. That’s a major liability problem – one that organizations will have to solve before AI-driven processes become deeply embedded in financial services, healthcare, and other regulated industries.

This is particularly critical as AI systems begin to interact with each other autonomously. Each step in an AI agent’s decision-making chain must be traced and logged, with clear accountability at each transition point. We’re not just building technical systems—we’re building forensic evidence chains that will need to stand up in court.
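One way to sketch this – with field names that are assumptions for illustration, not a standard – is a hash-chained audit log where every hop records which agent acted, on whose behalf, and what it decided, so the chain can be reconstructed and checked for tampering later:

```python
# Illustrative sketch: a tamper-evident, hash-chained audit log for
# agent-to-agent call chains. A production system would additionally sign
# entries and ship them to append-only storage.
import hashlib
import json
import time

class AuditChain:
    def __init__(self):
        self.entries = []
        self._prev_hash = "0" * 64  # genesis value

    def record(self, actor, on_behalf_of, action, decision):
        entry = {
            "timestamp": time.time(),
            "actor": actor,                # which agent acted
            "on_behalf_of": on_behalf_of,  # the principal it acted for
            "action": action,
            "decision": decision,
            "prev_hash": self._prev_hash,
        }
        entry_hash = hashlib.sha256(
            json.dumps(entry, sort_keys=True).encode()
        ).hexdigest()
        entry["hash"] = entry_hash
        self._prev_hash = entry_hash
        self.entries.append(entry)
        return entry

    def verify(self):
        prev = "0" * 64
        for e in self.entries:
            body = {k: v for k, v in e.items() if k != "hash"}
            if body["prev_hash"] != prev:
                return False
            recomputed = hashlib.sha256(
                json.dumps(body, sort_keys=True).encode()
            ).hexdigest()
            if recomputed != e["hash"]:
                return False
            prev = e["hash"]
        return True

# Each delegation hop appends an entry, so the full chain is attributable.
log = AuditChain()
log.record("agent:planner", "user:alice", "book_flight", "delegated to agent:payments")
log.record("agent:payments", "user:alice", "charge_card", "approved under limit")
assert log.verify()
```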

Runtime Security and Adaptive Controls

Traditional role-based access control models fundamentally break down with AI systems because they assume permissions can be neatly assigned based on predefined roles. But AI doesn’t work that way. Through reinforcement learning, AI agents optimize for success rather than security, finding novel ways to achieve their goals – sometimes exploiting system flaws in ways developers never anticipated. We have already seen cases where AI models learned to game reward systems in completely unexpected ways.

This requires a fundamental shift in our security architecture. We need adaptive access controls that respond to behavior patterns, runtime security monitoring for unexpected decisions, and real-time intervention capabilities. Most importantly, we need continuous behavioral analysis and anomaly detection that can identify when an AI system is making decisions that fall outside its intended patterns. The monitoring systems themselves must evolve as AI agents find new ways to achieve their objectives.
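As a simplified sketch – with placeholder thresholds standing in for a real behavioral model – a runtime guard might combine a hard containment boundary with a softer behavioral signal that escalates to a human rather than failing silently:

```python
# Rough sketch of a runtime policy gate: deny out-of-scope actions outright,
# and escalate unusual-but-in-scope behavior for review. Thresholds and the
# baseline are illustrative placeholders, not a specific anomaly-detection
# technique.
from collections import Counter

class RuntimeGuard:
    def __init__(self, allowed_actions, rate_limit=10):
        self.allowed_actions = allowed_actions
        self.rate_limit = rate_limit    # max calls per action per window
        self.observed = Counter()

    def check(self, agent_id, action):
        # Hard containment boundary: anything outside the declared scope is denied.
        if action not in self.allowed_actions:
            return f"DENY: {agent_id} attempted out-of-scope action '{action}'"
        self.observed[action] += 1
        # Soft behavioral signal: unusual volume triggers human review,
        # preserving a real-time intervention point.
        if self.observed[action] > self.rate_limit:
            return f"ESCALATE: {agent_id} exceeded expected rate for '{action}'"
        return "ALLOW"

guard = RuntimeGuard(allowed_actions={"read:tickets", "update:ticket_status"})
print(guard.check("agent:support-bot", "read:tickets"))     # ALLOW
print(guard.check("agent:support-bot", "delete:database"))  # DENY
```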

Compliance by Design

Drawing from my years building certificate authorities (CAs), I’ve learned that continuous compliance can’t just be a procedural afterthought – it has to be designed into the system itself. The most effective compliance models don’t just meet regulatory requirements at deployment; they generate the artifacts needed to prove compliance as natural byproducts of how they function.

The ephemeral nature of AI agents actually presents an opportunity here. Their transient access patterns align well with modern encryption strategies – access should be temporary, data should be encrypted at rest and in motion, and only the agent authorized for a specific action should be able to decrypt the specific information it needs for that task.
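A minimal sketch of that pattern – using the third-party `cryptography` package and a hypothetical per-task key – might look like this:

```python
# Sketch of task-scoped, time-limited data access using Fernet from the
# third-party `cryptography` package. The key is generated per task and handed
# only to the agent authorized for that task; the ttl on decrypt enforces that
# access is transient.
from cryptography.fernet import Fernet, InvalidToken

task_key = Fernet.generate_key()   # per-task key, discarded when the task ends
fernet = Fernet(task_key)

ciphertext = fernet.encrypt(b"customer record needed for this one task")

try:
    # Decryption succeeds only within the allowed window (here, 5 minutes).
    plaintext = fernet.decrypt(ciphertext, ttl=300)
except InvalidToken:
    # Expired or tampered with: the agent's access has lapsed.
    plaintext = None
```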

The Path Forward

If we don’t rethink these systems now, we’ll end up in a situation where AI-driven decision-making operates in a gray area where no one is quite sure who’s responsible for what. And if history tells us anything, regulators, courts, and companies will eventually demand a clear chain of responsibility – likely after a catastrophic incident forces the issue.

The solution isn’t just about securing AI – it’s about building an ecosystem where AI roles are well-defined and constrained, where actions are traceable and attributable, and where liability is clear and manageable. Security controls must be adaptive and dynamic, while compliance remains continuous and verifiable.

Organizations that ignore these challenges will find themselves scrambling after a crisis. Those that proactively integrate identity controls, permissioning models, and AI-specific security frameworks will be far better positioned to defend their decisions and maintain control over their AI systems. The future of AI security lies not in building impenetrable walls, but in creating transparent, accountable systems that can adapt to the unique challenges posed by autonomous agents.

This post lays out the challenges, but securing AI systems requires a structured, scalable approach. In “Containing the Optimizer: A Practical Framework for Securing AI Agent Systems,” I outline a five-pillar framework that integrates containment, identity, adaptive monitoring, and real-time compliance to mitigate these risks.
