The CA/Browser Forum is having its first serious conversation about whether publicly trusted client authentication certificates deserve their own Baseline Requirements. Nick France kicked off the discussion on the public list last week, asking for concrete use cases, and the responses so far have been a useful window into how the industry thinks about this problem. Or rather, how it doesn’t.
The timing isn’t accidental. Chrome Root Program Policy v1.6 is forcing a structural realignment of the WebPKI, and client authentication is caught in the middle. All PKI hierarchies in the Chrome Root Store must now be dedicated solely to TLS server authentication. Chrome stopped accepting new intermediate CA applications with mixed EKUs in June 2025, and by June 15, 2026, Chrome will distrust any newly issued leaf certificate containing clientAuth EKU from a Chrome Root Store hierarchy. Multi-purpose roots get phased out entirely. Mozilla, Apple, and Microsoft are all aligning with this direction. Every major public CA has published a sunset schedule. Sectigo stopped including clientAuth by default in September 2025, DigiCert followed in October, and Let’s Encrypt is phasing it out through ACME profiles. By mid-2026, you will not be able to get a publicly trusted TLS certificate that also works for client authentication.
This is the right call. The historical practice of stuffing both serverAuth and clientAuth into the same certificate, from the same hierarchy, created exactly the kind of entanglement that makes the WebPKI brittle. The SHA-1 migration is the canonical example. Payment terminals that relied on client auth from the same roots as server certs couldn’t upgrade, holding back the entire transition for years. Today, Cisco Expressway is the poster child for the same problem, using a single certificate for both server and client auth in SIP mTLS connections and scrambling to decouple them before the deadline. Dedicated hierarchies for dedicated purposes. It’s a principle the WebPKI should have enforced from the start.
What to do about it
What’s emerging is a clearer, more honest WebPKI, but one with a gap that nobody is cleanly addressing. If you’re currently relying on publicly trusted certificates for client authentication, the path forward depends on your use case.
If the client auth is internal to your organization (VPN access, Wi-Fi onboarding, device authentication, mTLS between your own services), you should be moving to private PKI. This was always the right answer for internal use cases, and modern private CA solutions have made it far more practical than it used to be. You get full control over certificate profiles, lifetimes, and revocation without being subject to external root program policy changes. The blast radius of a private CA is contained to your organization, which is exactly what you want for internal trust.
If the client auth is between your organization and a small number of known partners, like B2B API integrations or supply chain connections, private PKI still works well. You exchange trust anchors with your partners and configure your systems to trust their specific CA. This is how most of these integrations should have been built in the first place. The “convenience” of using publicly trusted certs for this was always a false economy, because you were accidentally opening your trust boundary to every entity that could buy a cert from the same CA.
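As a rough illustration of what exchanging trust anchors looks like on the receiving side, here is a minimal sketch using Python's standard ssl module. The function name and the idea of passing a list of partner CA files are assumptions made for the example; the point is only that the trust store starts empty and contains nothing but the anchors you deliberately load.

```python
import ssl

def build_partner_mtls_context(partner_ca_files):
    """Server-side TLS context that accepts client certs only from the listed partner CAs."""
    # A bare SSLContext starts with an empty trust store: no public roots are loaded.
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    # Require every client to present a certificate that chains to a loaded anchor.
    ctx.verify_mode = ssl.CERT_REQUIRED
    for ca_file in partner_ca_files:
        # Each file is a PEM bundle a partner handed over out of band (hypothetical paths).
        ctx.load_verify_locations(cafile=ca_file)
    # The server's own certificate would be loaded separately with ctx.load_cert_chain(...).
    return ctx
```

Because no public roots are ever loaded, holding a certificate from a public CA gets a client nowhere with this listener, which is the trust boundary the partner case wants.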
But if the client auth needs to work across organizational boundaries at scale, meaning you can’t reasonably pre-configure trust anchors for every potential counterparty, this is where it gets interesting and where the current alternatives fall short. Private PKI doesn’t solve this. You need some form of shared trust anchor, which is what public PKI provides for server authentication today. The question is whether a similar model can work for client authentication with properly scoped identifiers and validation methods.
The human identity case is the relatively easy part
On the CA/B Forum list, Sebastian Nielsen argued that public CAs shouldn’t issue client auth certificates at all, pointing to the name collision problem. He makes a fair point, but the conclusion is too broad. I’m Ryan Hurst the security practitioner, and there’s also Ryan Hurst the actor (Remember the Titans, Sons of Anarchy). A public CA asserting “Ryan Hurst” in a DN doesn’t help a relying party figure out which one of us is authenticating. The DN is a vestige of the X.500 global directory that never materialized. There is no global directory. Even local directories that correspond to DN structures don’t exist in any meaningful density. Identity in the WebPKI belongs in the SAN, where we have identifiers that are both globally unique and reachable.
S/MIME already handles the human case correctly. The rfc822Name in the SAN is at least unique at the time of issuance. More importantly, it’s reachable. You can send a challenge to an email address and get a response. You can’t send a challenge to a social security number. You can’t send a challenge to “Ryan Hurst, US.” The broad intent of the WebPKI is to make things reachable in an authenticated way. DNS names and email addresses fit that model. DNs do not.
Even with email, there’s a temporal problem. Addresses get reassigned, domains lapse, providers recycle accounts, and throwaway addresses exist by design. CAs can’t monitor for reassignment, so these are inherently short-lived assertions. The certificate lifetime is the outer bound of your trust in that binding. Broader questions around PII and auditability are really about how Key Transparency can be bolted into the ecosystem. I wrote about that previously.
There is valuable work happening in this space. Ballot SMC015v2 enabling mDLs and EU digital identity wallets for S/MIME identity proofing shows this evolving in a meaningful direction. Client authentication and signed email under S/MIME belong together. Apple has argued that emailProtection EKU should mean mandatory S/MIME BR compliance, closing the loophole where CAs omit email addresses from emailProtection certificates to avoid the BRs. I think that’s the right direction. One nuance worth calling out though. S/MIME bundles signing, authentication, and encryption, and I think that’s right for the first two but not the third. Signing and authentication are real-time assertions that work well as short-lived credentials. Encryption is different. The key is bound to an identifier that may not be durable, and without frequent rotation you risk bygone-SSL style attacks where a new holder of an email address could access messages intended for the previous one. The encryption case deserves its own careful treatment around key lifecycle and rotation.
Browsers are actively looking to remove client auth from TLS certificates, and I don’t disagree given how poorly specified and unconstrained it has been. That signals whatever comes next needs to be much more tightly defined. The human client auth case is covered by S/MIME, browser-based client auth is on its way out for good reason, and a new working group doesn’t need to revisit the human case.
The machine identity gap
Where it gets interesting is cross-organizational service-to-service authentication on the public internet. Today this is mostly handled with API keys, OAuth client credentials, or IP allowlisting, all with well-known limitations. mTLS with publicly trusted client certs could fill a real gap, but only if the identity model is built correctly.
Many current uses of mTLS with publicly trusted client certs are misplaced. Organizations are often assuming a level of assurance they don’t actually get when they accidentally cross security domains by relying on the public WebPKI for what is fundamentally a private trust relationship. A publicly trusted cert for payments.example.com tells you that the entity controlling that domain authenticated, nothing more. It does not mean they are your trusted partner, your approved vendor, or anyone you intended to grant access to. Public trust gives you authenticated identity, not authorization. Organizations that conflate the two will accidentally open up access based solely on someone having obtained a client cert. The examples collected on the list so far, Cisco Expressway and EPP, are mostly legacy compatibility problems being fixed. A working group built on those foundations would produce weak Baseline Requirements.
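The authentication versus authorization split can be made concrete with a small sketch. It assumes the TLS handshake has already verified the client certificate and that the peer certificate is available as a dict shaped like the one Python's ssl.SSLSocket.getpeercert() returns. The function names and the allowlist are hypothetical.

```python
def san_identities(peer_cert):
    """Return the set of (type, value) SAN entries from a getpeercert()-style dict."""
    return {
        (kind, value.lower() if kind == "DNS" else value)
        for kind, value in peer_cert.get("subjectAltName", ())
    }

def is_authorized(peer_cert, allowed_identities):
    # Authentication already happened in the TLS handshake. This is the separate
    # authorization step: is any presented identity one we chose to grant access to?
    return bool(san_identities(peer_cert) & set(allowed_identities))

# Usage: both certs could chain to the same public CA, but only one is on our list.
allowed = {("DNS", "payments.example.com")}
partner = {"subjectAltName": (("DNS", "payments.example.com"),)}
stranger = {"subjectAltName": (("DNS", "payments.attacker.example"),)}
print(is_authorized(partner, allowed), is_authorized(stranger, allowed))
```

A relying party that skips the second function and accepts any certificate that validates is making exactly the conflation described above.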
The better foundation is the emerging need for authenticated service-to-service communication across organizational boundaries. Consider SMTP. Mail servers already authenticate to each other over the public internet using TLS, and MTA-STS is pushing that toward authenticated connections. The logical next step is mutual authentication, where the receiving mail server can cryptographically verify the sending server’s identity, not just the other direction. SMTP and mTLS go together like peanut butter and jelly, but there’s no clean way to do it with publicly trusted client certs today. Or consider vendor supply chains. If a manufacturer’s procurement system needs to query a supplier’s inventory API, or a logistics provider needs to authenticate to a retailer’s fulfillment service, the options today are API keys, OAuth flows, or standing up an industry-specific trust framework just so machines can talk to each other. mTLS with publicly trusted client certs would let these systems authenticate directly, without building bespoke trust infrastructure for every partnership.
And this need is accelerating beyond any single industry. As AI agents increasingly act as user agents on the open internet, calling APIs, negotiating with services, and transacting across organizational boundaries on behalf of users, mutual authentication between machines that have no pre-established trust relationship is becoming a practical necessity, not a theoretical concern. You can’t pre-configure trust anchors for every service an agent might need to interact with any more than you can pre-configure them for every website a browser might visit. I wrote about this dynamic previously, and the trajectory is clear. The machine-to-machine authentication problem on the open internet is starting to look a lot like the server authentication problem that the WebPKI was built to solve, just in both directions.
For machines, the name collision problem largely disappears. DNS names are globally unique by design. A client cert with a dNSName SAN of payments-api.example.com or registry-client.registrar.example.net doesn’t have an ambiguity problem. The relying party knows exactly what organization controls that name. Nick’s original question on the list asked about what parts of the DN the relying party verifies. I’d argue that’s almost the wrong framing. There is no global X.500 directory. The question should be, what SAN types are needed, and what validation methods can we define for them?
For straightforward service identification, dNSName works today with no new validation methods needed.
payments-api.example.com
erp-connector.supplier.example.net
registry-client.registrar.example.com
For more expressive service identification, uniformResourceIdentifier SANs encode not just the organization but the specific service.
https://example.com/services/payments
urn:example:service:billing:v2
This URI-based approach isn’t speculative. SPIFFE already uses URI SANs (spiffe://cluster.local/ns/production/sa/checkout) to represent service identities in Kubernetes mTLS contexts. The pattern is proven and widely deployed within private PKI. Extending it to public trust for cross-organizational federation is a natural evolution of an approach the industry has already validated. URI SANs can be validated through .well-known challenge methods (like ACME HTTP-01 scoped to a URI path) and ALPN-based methods, extending battle-tested ACME-era infrastructure rather than building from X.500-era assumptions.
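To see why URI SANs fit a domain-control model, it helps to pull them apart. The sketch below uses only the standard library; the function name is hypothetical. It returns the DNS authority embedded in a URI, which is the part a CA could plausibly validate with existing domain control methods, and returns None where there is no such authority.

```python
from urllib.parse import urlsplit

def uri_san_authority(uri):
    """Return (scheme, dns_authority_or_None, path) for a URI SAN value."""
    parts = urlsplit(uri)
    if not parts.scheme:
        raise ValueError(f"not an absolute URI: {uri!r}")
    # hostname is None when the URI has no authority component (for example a URN).
    return parts.scheme, parts.hostname, parts.path

for san in (
    "spiffe://cluster.local/ns/production/sa/checkout",
    "https://example.com/services/payments",
    "urn:example:service:billing:v2",
):
    print(uri_san_authority(san))
```

The first two forms carry a DNS name that ties the identifier to a namespace someone demonstrably controls. The URN form does not, so it would need a validation method of its own.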
What the industry is doing instead
Almost all the CA and vendor messaging right now says “move to private PKI.” That’s the right answer for internal use cases, but it doesn’t address cross-organizational trust. The most interesting alternative emerging is the DigiCert X9 PKI, launched in partnership with ASC X9, the financial standards body. X9 PKI is a completely independent trust framework, governed by X9’s policy committee rather than the CA/Browser Forum or browser root programs. It supports both clientAuth and serverAuth EKUs, uses a common root of trust for cross-organizational interoperability, and is WebTrust audited. It’s specifically designed for the financial sector’s mTLS needs, though they’re expanding to other sectors.
X9 PKI is essentially a “public PKI that isn’t the WebPKI” for service-to-service auth. It validates the premise that there’s a real need for cross-organizational client authentication with a shared trust anchor. But it’s sector-specific and governed outside the CA/Browser Forum, which means it doesn’t solve the general case. The EU’s eIDAS QWAC framework is another sector-specific approach. These are workarounds for the absence of a general-purpose, properly scoped public client auth certificate type.
If this moves forward
I’m not advocating for or against a working group at the CA/Browser Forum. But if the Forum does decide to take this on, the scope needs to be narrow IMHO. Machine and service client auth only, with identity in the SAN using dNSName and uniformResourceIdentifier. DN fields should not be relied upon for authentication decisions. Validation methods should build on existing domain control mechanisms. Human client auth stays in S/MIME where it belongs. The BRs should address the authentication versus authorization distinction explicitly, so relying parties understand that a publicly trusted client cert tells them who is connecting, not whether that entity should be granted access. This is already how server certificates work, and client auth should follow the same model. And the issuing CAs need to be dedicated, separate from server auth hierarchies. The SHA-1 payment terminal debacle, the Cisco Expressway mess. Every time client and server auth are entangled in the same hierarchy, one use case holds back progress on the other. Don’t repeat that.
The bigger picture
What we’re watching is a structural realignment of the WebPKI’s purpose. The WebPKI is being narrowed to mean “TLS server authentication for web browsers,” full stop. Everything else, client auth, S/MIME, code signing, is being pushed to dedicated hierarchies, private PKI, or alternative trust frameworks. That’s mostly the right direction. But the service-to-service authentication gap is real, growing, and not well served by any of the current alternatives. Private PKI doesn’t solve cross-organizational trust. X9 PKI is sector-specific. The CA/Browser Forum has the institutional knowledge, the validation infrastructure, and the trust framework to define something that works here. Whether they choose to is another question.
The conversation is happening now on the public list. If you have concrete use cases for cross-organizational service authentication with publicly trusted client certificates, this is the time to share them. The shape of what comes next depends on whether the use cases justify the effort, and right now the list is thin.
Let’s Encrypt announced DNS-PERSIST-01 support this week. That is worth noting on its own. But the announcement landed in a way that made me want to trace the longer arc, because what DNS-PERSIST-01 represents is not just a new ACME method. It is the last piece of a transition that took the ecosystem roughly three decades to complete.
That transition was simple in concept and genuinely hard in practice. Stop guessing who answers the phone and start proving who controls the namespace.
What “domain control validation” actually meant in the early days
If you were issuing or auditing certificates in the early web era, domain control validation was less a cryptographic proof than an act of institutional faith. The certificate authority (CA) would send a challenge to webmaster@, admin@, or hostmaster@ at the subject domain, or sometimes look up a fax number in WHOIS and send something there. If a human responded, the certificate got issued.
The model made a bet, a bet that there was a stable, security-relevant human role behind each domain, reachable through a stable channel, and that the person on the other end was both authorized and paying attention.
That bet was always shakier than it looked. What actually happened over time was that the alias went to a ticketing system, or an outsourcer, or a shared mailbox that someone forgot to audit, or just the wrong person entirely. The certificate still got issued. The CA had checked the box. No one had actually verified control of anything.
The worst failures in this period were not exotic cryptographic breaks. They were governance failures and operational drift. The “webmaster takeover” class of problem. The role stopped being real long before the method stopped being allowed. The Baseline Requirements, the industry rules governing what certificate authorities are allowed to do, carried these validation approaches forward because nobody volunteers to own the deprecation, and someone always depends on the thing you want to kill. SC-080 and SC-090 are essentially the CA/Browser Forum (CABF) writing down, in balloted form, what practitioners had already known for years, that being able to be reached at a business address does not demonstrate domain control.
The thing that made the real fix possible
It is easy to look at ACME, the protocol that powers automated certificate issuance, and treat it as a purely technical improvement. It was. But the reason it became viable as a default assumption had as much to do with deployment reality as with protocol design.
In 2014, roughly 30% of web traffic was HTTPS. Mozilla telemetry puts it above 80% globally by late 2024, with North America around 97%. Chrome’s numbers show the same shape, climbing from the low 30s in 2015 to 95-99% by 2020 and plateauing since.
That matters because ACME’s endpoint-based methods depend on actually reaching the endpoint. HTTP-01 proves control by serving a token bound to your ACME account key over HTTP at a well-known path on port 80. TLS-ALPN-01 proves control by completing a TLS handshake on port 443 using a dedicated protocol extension and a special validation certificate, with no HTTP handling required. That distinction matters in practice; TLS-ALPN-01 exists specifically for hosting providers, CDNs, and TLS-terminating load balancers who want to validate at the TLS layer without routing validation traffic through to their backends. If port 80 is blocked or you are terminating TLS before HTTP ever reaches your application, TLS-ALPN-01 is the right tool. If you have a publicly reachable web server and port 80 is open, HTTP-01 is simpler.
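For readers who have not looked inside HTTP-01, the value served at the well-known path is the ACME key authorization: the challenge token, a dot, and the base64url SHA-256 thumbprint of the account's public key in JWK form. The sketch below follows that construction using only the standard library. The JWK in the usage lines contains placeholder values, not a real key, and the helper names are made up for the example.

```python
import base64
import hashlib
import json

# Members that define the thumbprint for each key type; everything else is ignored.
_REQUIRED_MEMBERS = {
    "EC": ("crv", "kty", "x", "y"),
    "RSA": ("e", "kty", "n"),
    "OKP": ("crv", "kty", "x"),
}

def b64url(data: bytes) -> str:
    """Base64url without padding, as ACME uses throughout."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")

def jwk_thumbprint(jwk: dict) -> str:
    members = {name: jwk[name] for name in _REQUIRED_MEMBERS[jwk["kty"]]}
    # Canonical form: required members only, sorted keys, no whitespace.
    canonical = json.dumps(members, sort_keys=True, separators=(",", ":"))
    return b64url(hashlib.sha256(canonical.encode("utf-8")).digest())

def key_authorization(token: str, account_jwk: dict) -> str:
    return f"{token}.{jwk_thumbprint(account_jwk)}"

def http01_path(token: str) -> str:
    return f"/.well-known/acme-challenge/{token}"

# Placeholder JWK values for illustration only.
jwk = {"kty": "EC", "crv": "P-256", "x": "example-x", "y": "example-y"}
print(http01_path("tok123"), key_authorization("tok123", jwk))
```

Nothing here is secret, which is why the method works over plain HTTP: the proof is that the CA finds the expected value at a path only the domain's operator can populate.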
Both are bootstrap proofs: you can establish domain control without DNS write automation, which matters for the long tail of deployments where DNS is locked down or outsourced in ways that make safe automation difficult. In 2014, assuming you could reach a public endpoint was optimistic. By 2024, the population of sites that cannot serve a response over HTTP or TLS is small enough to be the exception. The web converged on HTTPS fast enough that endpoint-based validation became the reasonable default.
HTTP-01 is also, almost certainly, the last insecure-by-design method that will survive long term, and it will survive for structural rather than technical reasons. There is a bootstrap problem – TLS-ALPN-01 requires that TLS already be deployed and configurable at the edge, but if you are getting a certificate because you do not yet have TLS, you cannot use TLS-ALPN-01 to get it. HTTP-01 is how you break out of that loop. More durable than the bootstrap problem, though, is the org chart problem. In large organizations, the web team controls the servers, the network team owns port policies, the DNS team owns the zone, and security owns the TLS infrastructure decisions. None of them individually has the full set of permissions to deploy any other method without coordination. But the web team can serve a token file over port 80 without asking anyone. HTTP-01 wins by default, not because it is the right answer, but because it is the answer that requires the fewest cross-team conversations. That dynamic is unlikely to change, which means HTTP-01 will probably remain the method of last resort indefinitely, insecure channel and all.
DNS-01, and why scale broke it
DNS-01 changed the question from “who answers this email” to “who can write to this DNS zone.” That is a meaningfully better question. DNS is not a signal that you control the domain. It is the domain.
The operational reality, though, is that DNS automation means DNS API credentials distributed across issuance pipelines, renewal workflows, and whatever tooling you are running at the edge. At modest scale that is manageable. At high volume, across large platforms, IoT deployments, and multi-tenant environments, the recurring DNS write per renewal starts to look like both a performance constraint and a credential sprawl problem.
The CNAME delegation pattern that became common was a partial answer: point _acme-challenge.<domain> at a zone you control more tightly, and do the proof there. It worked. It also created a new problem: multiple independent solvers fighting over a shared label.
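A small sketch makes the shared-label problem visible. In DNS-01 the TXT value is the base64url SHA-256 digest of the key authorization, published at _acme-challenge.<domain>. The delegation target naming in delegation_cname is a made-up convention for the example, as is the validation zone name.

```python
import base64
import hashlib

def dns01_record(domain: str, key_authorization: str):
    """Return (record name, TXT value) for a DNS-01 challenge."""
    # Wildcard orders are validated at the base name.
    if domain.startswith("*."):
        domain = domain[2:]
    digest = hashlib.sha256(key_authorization.encode("ascii")).digest()
    value = base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")
    return f"_acme-challenge.{domain}", value

def delegation_cname(domain: str, validation_zone: str):
    """Return (CNAME owner, CNAME target); the target layout is a hypothetical convention."""
    return f"_acme-challenge.{domain}", f"{domain}.{validation_zone}"

# Two independent solvers with different accounts still write to the same name.
name_a, _ = dns01_record("example.com", "tokenA.thumbprintA")
name_b, _ = dns01_record("example.com", "tokenB.thumbprintB")
print(name_a == name_b, delegation_cname("example.com", "acme-validation.example.net"))
```

The record name depends only on the domain, so every pipeline validating that domain contends for the same label, and a CNAME at that label can only point to one place.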
DNS-ACCOUNT-01, which solved the CNAME collision and nothing else
DNS-ACCOUNT-01 exists to solve that specific problem. By scoping the validation label to the ACME account rather than leaving it shared, multiple delegated pipelines can coexist without colliding. Two independent issuance systems, two different cloud providers, parallel solvers during a migration. They all get their own label and can run without coordinating.
It is intentionally narrow. It does not change the underlying rhythm of fresh proof per issuance. The label is persistent, the proof is still ephemeral. A new token per order, a new DNS write per renewal. The change is only where the proof lives, so delegation can scale cleanly. DNS churn remains, because that was not the problem DNS-ACCOUNT-01 was trying to solve.
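The scoping idea can be sketched in a few lines. The derivation below (a truncated SHA-256 of the account URL, base32-encoded, prepended as its own label) is an assumption for illustration; the draft specification defines the exact encoding and label layout, and it should be consulted before relying on any of this. The account URLs are hypothetical.

```python
import base64
import hashlib

def account_scoped_name(account_url: str, domain: str) -> str:
    """Illustrative account-scoped validation name; the encoding is an assumption."""
    digest = hashlib.sha256(account_url.encode("utf-8")).digest()
    # 10 bytes encode to exactly 16 base32 characters, so no padding appears.
    tag = base64.b32encode(digest[:10]).decode("ascii").lower()
    return f"_{tag}._acme-challenge.{domain}"

# Two pipelines with different ACME accounts get different, stable labels.
print(account_scoped_name("https://ca.example/acme/acct/1", "example.com"))
print(account_scoped_name("https://ca.example/acme/acct/2", "example.com"))
```

The label is stable for a given account, so it can be delegated once. The TXT value written under it still changes with every order, which is the part this method leaves alone.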
In hindsight, that narrowness reflects the world it was designed for. Certificate validity was still measured in years, then in 398 days. Renewals were infrequent enough that requiring fresh DNS proof per issuance was a manageable cost. The credential distribution problem existed, but it was not yet acute. If DNS-ACCOUNT-01 had been designed in a world where certificates expire every 47 days, which is where the CABF is now taking us, it almost certainly would have looked a lot more like DNS-PERSIST-01 from the start. That is not a criticism. You cannot see the 47-day problem from inside a 398-day world.
DNS-PERSIST-01, which the short-validity world actually requires
The CA/Browser Forum’s ongoing push to shorten maximum certificate validity, from years down to 398 days and now trending toward 47, makes the recurring-proof model increasingly painful for everyone, not just operators running at high volume. At 398-day validity, a DNS write per renewal is a minor operational cost. At 47 days, you are writing to DNS eight times a year per certificate, across every certificate in your fleet, with API credentials that have to live somewhere in that pipeline. That is not a scaling problem. That is a design problem.
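The arithmetic is worth writing down. This sketch computes a lower bound on DNS validation writes per year, assuming one write per issuance and renewal only at expiry; real deployments renew earlier, so actual numbers are higher. The fleet size is a made-up figure.

```python
import math

def dns_writes_per_year(validity_days: int, fleet_size: int = 1) -> int:
    """Lower bound: one validation write per issuance, renewing only at expiry."""
    return math.ceil(365 / validity_days) * fleet_size

for days in (398, 90, 47):
    print(days, dns_writes_per_year(days), dns_writes_per_year(days, fleet_size=1000))
```

At 398 days the bound is one write per certificate per year. At 47 days it is eight, and for a hypothetical fleet of 1,000 certificates that is 8,000 credentialed DNS writes a year before any early renewal or retry.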
The more important point is that DNS-PERSIST-01 is simply the better tool for anyone who has DNS access and a CA that supports it, regardless of volume. It subsumes what DNS-01 and DNS-ACCOUNT-01 each solve – the CNAME collision problem goes away because each account’s standing authorization is already scoped, and the credential churn problem goes away because there is no recurring write.
The useful analogy here is passwords versus passkeys. Passwords require you to re-prove the secret on every authentication. Passkeys establish a cryptographic binding once and derive proof from it. Every DNS-based ACME method before DNS-PERSIST-01 worked like a password: prove control again, on this order, right now. DNS-PERSIST-01 works like a passkey; the binding is established, scoped, and cryptographically tied to your ACME account key. You do not re-prove the same thing on every renewal. You prove you still hold the key.
Instead of proving control on every renewal cycle, you establish a standing authorization record bound to your ACME account and the CA. Set it once. Reuse it across renewals. The CABF formalized this direction in SC-088v3, which added the DNS TXT Record with Persistent Value method to the BRs.
This is not a shortcut. The standing authorization is scoped, can carry expiration, and is explicitly tied to an ACME account key. The attack surface moves from the repeated DNS transaction to the account key itself, which is the right place for it. That is why Let’s Encrypt is being deliberate: Pebble (the reference ACME test server) support is in place, client support is in progress, and the staged rollout is planned for 2026. The scope controls around wildcard policy and authorization lifetime are part of the design, not afterthoughts.
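To make the scoping concrete, here is a sketch of how a standing authorization record might be checked. The record layout (an issuer domain followed by semicolon-separated parameters named accounturi, policy, and persistUntil) is an assumption based on the draft's general shape, and the CA name and account URL are hypothetical. Check the current draft and your CA's documentation for the real syntax.

```python
def parse_persist_record(txt: str):
    """Split an assumed 'issuer; key=value; ...' record into (issuer, params)."""
    issuer, *params = [part.strip() for part in txt.split(";")]
    fields = {}
    for param in params:
        if param:
            key, _, value = param.partition("=")
            fields[key.strip().lower()] = value.strip()
    return issuer.lower(), fields

def authorizes(txt, issuer_domain, account_uri, now, wildcard=False):
    issuer, fields = parse_persist_record(txt)
    # Scope 1: the record names one CA and one ACME account.
    if issuer != issuer_domain or fields.get("accounturi") != account_uri:
        return False
    # Scope 2: an optional expiry (assumed to be a Unix timestamp).
    until = fields.get("persistuntil")
    if until is not None and now > int(until):
        return False
    # Scope 3: wildcard issuance has to be opted into explicitly.
    if wildcard and fields.get("policy", "").lower() != "wildcard":
        return False
    return True

record = "ca.example; accounturi=https://ca.example/acct/1; persistUntil=2000000000"
print(authorizes(record, "ca.example", "https://ca.example/acct/1", now=1900000000))
```

Every check is against something written in the record itself, which is what makes the standing assumption explicit rather than implied.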
What it eliminates is the recurring DNS write requirement that turned high-volume issuance into a credential distribution problem. In a world trending toward 47-day certificates, that is not a nice-to-have. It is the method that makes the new validity regime operationally survivable for anyone running at real scale.
What actually changed
The webmaster era died because the webmaster role died. The person who answered webmaster@ in 1995 was plausibly the person responsible for the domain. By 2010, that alias might go anywhere. By 2020, it was a cassette tape. Technically still a format, functionally forgotten.
This is the same pattern that gave us a decade of SIM-swapping attacks. SMS was a convenient channel, so the industry conscripted it into an authentication role it was never designed for, and held it there long after the threat model had outgrown the assumption. Nobody decided email-to-webmaster or SMS were the right security primitives for what they were being asked to do. They were just there, they mostly worked, and changing them had cost. The failures were predictable in retrospect and ignored in practice until the losses became undeniable.
The ACME methods work because they measure what they claim to measure. HTTP-01 proves you can respond at the endpoint. DNS-01 proves you can write to the zone. TLS-ALPN-01 proves you can complete a handshake. Technical controls, not institutional proxies.
DNS-PERSIST-01 is the mature form of that idea, a standing proof of control that does not require re-proving the same thing every 90 days at the cost of DNS churn and credential distribution. It is also the method that answers the question the old system was never actually asking. The old system broke because standing assumptions about institutional stability turned out not to hold. The new system makes the standing assumption explicit, scoped, bound to a cryptographic identity, and revocable.
That is not the same mistake. That is the lesson applied.
Start here when choosing a method. If you cannot touch DNS and port 80 is open, HTTP-01 is the simplest path. If port 80 is blocked or you are terminating TLS before HTTP reaches your application, TLS-ALPN-01 validates at the TLS layer without touching HTTP handling. If you need wildcard coverage or your edge is not publicly reachable at all, DNS-01 is the right tool. If you are running multiple independent pipelines against the same domain and CNAME delegation is creating label collisions, DNS-ACCOUNT-01 solves that without changing anything else. And if you are renewing at volume in a world trending toward 47-day validity, DNS-PERSIST-01 is the method that does not eventually break you, not because the others are wrong, but because repeated proof per renewal was designed for a renewal cadence that no longer exists.
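The same guidance can be written as a small function. The parameter names and the order in which the branches are checked are my assumptions for the sketch; the paragraph above does not prescribe a strict precedence, so treat this as one reasonable reading rather than a rule.

```python
def choose_acme_method(
    can_write_dns: bool,
    port_80_usable: bool,
    can_configure_tls_edge: bool,
    needs_wildcard: bool = False,
    publicly_reachable: bool = True,
    label_collisions: bool = False,
    persist_supported: bool = False,
) -> str:
    if can_write_dns:
        # With DNS access and a supporting CA, the standing authorization wins.
        if persist_supported:
            return "dns-persist-01"
        if label_collisions:
            return "dns-account-01"
        if needs_wildcard or not publicly_reachable:
            return "dns-01"
    elif needs_wildcard or not publicly_reachable:
        raise ValueError("wildcards and non-public hosts need a DNS-based method")
    if port_80_usable:
        return "http-01"
    if can_configure_tls_edge:
        return "tls-alpn-01"
    if can_write_dns:
        return "dns-01"
    raise ValueError("no validation method is available with these constraints")

print(choose_acme_method(can_write_dns=False, port_80_usable=True, can_configure_tls_edge=False))
```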
In practice, large organizations often find themselves in a catch-22 that makes the decision for them. TLS-ALPN-01 requires TLS to already be deployed and configurable at the edge, but you need the certificate to deploy TLS in the first place. DNS-01 requires writing to the zone, but DNS is owned by a different team, and the change process takes weeks. DNS-PERSIST-01 requires standing up ACME account management, but that is a security infrastructure decision that needs approval. Meanwhile, the web team controls the servers and can serve a token file over port 80 today. So HTTP-01 it is, not because anyone evaluated the options and chose it, but because it was the only method where a single team had all the permissions needed to complete validation without a cross-functional project. The decision tree above describes the technically correct path. The org chart usually picks a different one.
Like most security improvements, the arc from fax-based DCV to persistent cryptographic authorization took longer than it should have; the gap between knowing something is broken and replacing it is always larger than it looks from the outside. But the trajectory is now clear: domain control validation means proving control, not guessing at it.
This is a long one. But as a great man once said, forgive the length, I didn’t have time to write a short one.
The industry has been going back and forth on where agent identity belongs. Is it closer to workload identity (attestation, pre-enumerated trust graphs, role-bound authorization) or closer to human identity (delegation, consent, progressive trust, session scope)? The answer from my perspective is human identity. But the reason isn’t what most people think.
The usual argument goes like this. Agents exercise discretion. They interpret ambiguous input. They pick tools. They sequence actions. They surprise you. Workloads don’t do any of that. Therefore agents need human-style identity.
That argument is true but it’s not the load-bearing part. The real reason is simpler and more structural.
Think about it this way. A robot arm on an assembly line is bolted to the floor. It’s “Arm #42.” It picks up a bolt from Bin A and puts it in Hole B. If it tries to reach for Bin Z, the system shuts it down. It has no reason to ever touch Bin Z. That’s workload identity. It works because the environment is closed and architected.
Now think about a consultant hired to “fix efficiency.” They roam the entire building. They’re “Alice, acting on behalf of the CEO.” They don’t have a list of rooms they can enter. They have a badge that says “CEO’s Proxy.” When they realize the problem is in the basement, the security guard checks their badge and lets them in, even though the CEO didn’t write “Alice can go to the basement” on a list that morning. The badge isn’t unlimited access. It’s a delegation primitive combined with policy. That’s human identity. It works because the environment is open and emergent.
Agents are the consultant, not the robot arm. Workload identity is built for maps: you know the territory, you draw the routes, if a service goes off-route it’s an error. Agent identity is built for compasses: you know the destination, but the route is discovered at runtime. Our identity infrastructure needs to reflect that difference.
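One way to see the difference is to put the two checks side by side. This is a toy sketch, not a proposal: the edge list, the Delegation fields, and the policy table are all invented to mirror the robot arm and consultant examples.

```python
from dataclasses import dataclass

# Map model: every permitted (workload, resource) edge is enumerated ahead of time.
ALLOWED_EDGES = {("arm-42", "bin-a"), ("arm-42", "hole-b")}

def workload_allows(workload: str, resource: str) -> bool:
    return (workload, resource) in ALLOWED_EDGES

# Compass model: the credential says who delegated what purpose, and policy
# decides at request time whether a resource falls inside that purpose.
@dataclass(frozen=True)
class Delegation:
    agent: str
    on_behalf_of: str
    purpose: str

POLICY = {
    ("ceo", "fix-efficiency"): lambda resource: resource.startswith("building/"),
}

def delegation_allows(delegation: Delegation, resource: str, policy=POLICY) -> bool:
    rule = policy.get((delegation.on_behalf_of, delegation.purpose))
    return rule is not None and rule(resource)

badge = Delegation(agent="alice", on_behalf_of="ceo", purpose="fix-efficiency")
print(workload_allows("arm-42", "bin-z"), delegation_allows(badge, "building/basement"))
```

Nobody listed the basement in advance, yet the second check admits it, while a resource outside the delegated purpose is still refused. The badge is bounded by policy, not by an enumerated route.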
To be clear, I am not suggesting agents are human. This isn’t about moral equivalence, legal personhood, or anthropomorphism. It’s about principal modeling. Agents occupy a similar architectural role to humans in identity systems. Discretionary actors operating in open ecosystems under delegated authority. That’s a structural observation, not a philosophical claim.
A fair objection is that today’s agents mostly work on concrete, short-lived tasks. A coding agent fixes a bug. A support agent resolves a ticket. The autonomy they exercise is handling subtle variance within a well-defined scope, not roaming across open ecosystems making judgment calls. That’s true, and in those cases the workload identity model is a reasonable fit.
But the majority of the value everyone is chasing accrues when agents can act for longer periods of time on more open-ended problems. Investigate why this system is slow. Manage this compliance process. Coordinate across these teams to ship this feature. And the longer an agent runs, the more likely it is to need permissions beyond what anyone anticipated at the start. That’s the nature of open-ended work.
The longer the horizon and the more open the problem space, the more the identity challenges described here become real engineering constraints rather than theoretical concerns. What follows is increasingly true as agents move in that direction, and every serious investment in agent capability is pushing them there.
Workload Identity Was Built for Closed Ecosystems
Think about how workload identity actually works in practice. You know which services are in your infrastructure. You know which service talks to which service. You pre-provision the credentials or you set up attestation so that the right code running in the right environment gets the right identity at boot time. SPIFFE loosened some of the static parts with dynamic attestation, but the mental model is still the same: I know what’s in my infrastructure, and I’m issuing identity to things I control.
That model works because workloads operate in closed ecosystems. Your Kubernetes cluster. Your cloud account. Your service mesh. The set of actors is known. The trust relationships are pre-defined. The identity system’s job is to verify that the thing asking for access is the thing you already decided should have access.
Agents broke that assumption.
An MCP client can talk to any server. An agent operating on your behalf might need to interact with services it was never pre-registered with. Trust relationships may be dynamic, not pre-provisioned, and the more open-ended the task the more likely that is true. The authorization decisions are contextual. Sometimes a human needs to approve what’s happening in real time. An agent might need to negotiate access to a resource that neither you nor the agent anticipated when the mission started.
None of that fits the workload model. Not because agents think or exercise judgment, but because the ecosystem they operate in is open. Workload identity was built for closed ecosystems. The more capable and autonomous agents become, the less they stay inside them.
Discovery Is the Problem Nobody Wants to Talk About
The open ecosystem problem goes deeper than just “agents interact with arbitrary services.” The whole point of an agent is to find paths you didn’t anticipate. Tell an agent “go figure out why certificate issuance is broken” and it might follow a trail from CT logs to a CA status page to vendor Slack to a three-year-old wiki page to someone’s personal notes. That path isn’t architected. It emerges from the agent reasoning about the problem.
Every existing authorization model assumes someone already enumerated what exists.
| System | Resource Space | Discovery Model | Auth Timing | Trust Model |
| --- | --- | --- | --- | --- |
| SPIFFE | Closed, architected | None, interaction graph is designed | Deploy-time | Static, identity-bound |
| OAuth | Bounded by pre-registered integrations | None, API contracts exist | Integration-time + user consent | Static after consent |
| IAM | Closed, catalogued | None, administratively maintained | Admin-time | Static, role-bound |
| Zero Trust | Bounded by inventory and policy plane | None, known endpoints | Per-request | Session-scoped, contextual |
| Browser Security | Open, unbounded | Full, arbitrary traversal | Per-request, per-capability | None, no accumulation |
| Agentic Auth (needed) | Open, task-emergent | Reasoning-driven, discovered at runtime | Continuous, intra-task | Accumulative, task-scoped |
Every model except browser security assumes a closed resource space. Browser security is the only open-space model, but it doesn’t accumulate trust. Agents need open-space discovery with accumulative trust. Nothing in the current stack does both.
Structured authorization models assume you can enumerate the paths. But enumeration kills emergence. If you have to pre-authorize every possible resource an agent might touch, you’ve pre-solved the problem space. That defeats the purpose of having an agent explore it.
The security objection here is obvious. An agent “discovering paths you didn’t anticipate” sounds a lot like lateral movement. The difference is authorization. An attacker discovers paths to exploit vulnerabilities. An agent discovers paths to find capabilities, under a delegation, subject to policy, with every step logged. The distinction only holds if the governance layer is actually doing its job. Without it, agent discovery and attacker reconnaissance are indistinguishable. That’s not an argument against discovery. It’s an argument for getting the governance layer right.
The Authorization Direction Is Inverted
Workload identity is additive. You enumerate what’s permitted. Here’s the role, here’s the scope, here’s the list of services this workload can talk to. Everything outside that list is denied.
Agents need something different. Not pure positive enumeration, but mixed constraints: here’s the goal, here’s the scope you’re operating in, here’s what’s off limits, here’s when you escalate. Access outside the defined scope isn’t default-allowed. It’s negotiable through demonstrated relevance and appropriate oversight.
That’s goal-scoped authorization with negative constraints rather than positive enumeration. And before the security people start hyperventilating, this doesn’t mean “default allow with a blacklist.” That would be insane. Nobody is proposing that.
What it actually looks like is how we scope human delegation in practice. When a company hires a consultant and says “fix our efficiency problem,” they don’t hand them a list of every room they can enter, every file they can read, every person they can talk to. They give them a badge, a scope of work, a set of boundaries (don’t access HR records, don’t make personnel decisions), escalation requirements (get approval before committing to anything over $50k), and monitoring (weekly check-ins, expense reports, audit trail). That’s not default allow. It’s delegated authority with boundaries, escalation paths, and oversight.
The constraints are a mix of positive (here’s your scope), negative (here’s what’s off limits), and procedural (here’s when you need to ask). To be fair, no deployed identity protocol fully supports this mixed-constraint model today. OAuth scopes are basically positive enumeration. RBAC is positive enumeration. Policy grammars that can express mixed constraints exist (Cedar and its derivatives can express allow, deny, and escalation rules against the same resource), but nobody has deployed them for agent governance yet.
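To make the three kinds of constraint concrete, here is a minimal sketch in Python. It is not Cedar and not any deployed protocol. The `Mandate` type, the `evaluate` function, the resource naming scheme, and the spend threshold are all hypothetical names I am assuming for illustration.

```python
from dataclasses import dataclass
from enum import Enum
from fnmatch import fnmatchcase


class Decision(Enum):
    ALLOW = "allow"
    DENY = "deny"
    ESCALATE = "escalate"


@dataclass(frozen=True)
class Mandate:
    """Hypothetical delegation: a goal plus positive, negative, and procedural constraints."""
    goal: str
    scope: tuple        # positive: resource patterns inside the engagement
    off_limits: tuple   # negative: never reachable, even through escalation
    spend_limit: int    # procedural: anything costlier needs approval


def evaluate(mandate: Mandate, resource: str, cost: int = 0) -> Decision:
    # Negative constraints are checked first and always win.
    if any(fnmatchcase(resource, p) for p in mandate.off_limits):
        return Decision.DENY
    # Out-of-scope access is not default-allowed; it has to be negotiated.
    if not any(fnmatchcase(resource, p) for p in mandate.scope):
        return Decision.ESCALATE
    # In scope, but above the approval threshold.
    if cost > mandate.spend_limit:
        return Decision.ESCALATE
    return Decision.ALLOW


consultant = Mandate(
    goal="fix our efficiency problem",
    scope=("ops/*", "finance/reports/*"),
    off_limits=("hr/*",),
    spend_limit=50_000,
)
for resource, cost in [("ops/scheduler", 0), ("hr/records", 0),
                       ("facilities/basement", 0), ("ops/vendor-contract", 75_000)]:
    print(resource, evaluate(consultant, resource, cost).value)
```

The ordering is the design choice that matters: deny is evaluated before scope, and anything outside scope escalates rather than silently allowing or silently failing.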
The mixed-constraint approach is how we govern humans organizationally, with identity infrastructure providing one piece of it. But the human identity stack is at least oriented in this direction. It has the concepts of delegation, consent, and conditional access. The workload identity stack doesn’t even have the vocabulary for it, because it was never designed for actors that discover their own paths.
The workload model can’t support this because it was designed to enumerate. The human model is oriented toward it because humans were the first actors that needed to operate in open, unbounded problem spaces with delegated authority and loosely defined scope.
The Human Identity Stack Got Here First
The human identity stack evolved these properties because humans needed them. Delegation exists because users interact with arbitrary services and need to grant scoped authority. Federation exists because trust crosses organizational boundaries. Consent flows exist because sometimes a human needs to approve what’s happening. Progressive auth exists because different operations require different levels of assurance, though in practice it’s barely deployed because it’s hard to implement well.
That last point matters. Progressive auth has been a nice-to-have for human identity, something most organizations skip because the friction isn’t worth it for human users who can just re-authenticate. For agents, it becomes essential. The more emergent the expectations, the more you need the ability to step up trust dynamically. Agents make progressive auth a requirement, not an aspiration.
And unlike the human case, progressive auth for agents is more tractable to build. The agent proposes an action, a policy engine or human approves, the scope expands with full audit. The governance gates can be automated. The building blocks exist. The composition is the work.
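The propose, approve, expand loop is small enough to sketch. This is an illustration under assumptions, not a reference design: `Session`, `step_up`, `billing_policy`, and the permission strings are hypothetical, and I am assuming a policy function that returns `True`, `False`, or `None` when it has no opinion and a human should decide.

```python
from dataclasses import dataclass, field


@dataclass
class Session:
    task: str
    granted: set = field(default_factory=set)
    audit: list = field(default_factory=list)


def step_up(session, permission, reason, policy, ask_human) -> bool:
    """The agent proposes; policy decides if it can, otherwise a human does."""
    verdict = policy(session.task, permission)
    decided_by = "policy"
    if verdict is None:  # policy has no opinion, so escalate to a person
        verdict = ask_human(session.task, permission, reason)
        decided_by = "human"
    # Every proposal is logged, approved or not.
    session.audit.append({
        "task": session.task,
        "permission": permission,
        "reason": reason,
        "decided_by": decided_by,
        "approved": verdict,
    })
    if verdict:
        session.granted.add(permission)  # scope expands only after approval
    return verdict


def billing_policy(task, permission):
    # Hypothetical rules: predictable reads are automatic, HR is never granted.
    if permission.startswith("billing:read"):
        return True
    if permission.startswith("hr:"):
        return False
    return None


session = Session("fix the billing error")
step_up(session, "billing:read:invoices", "inspect the failing invoice",
        billing_policy, ask_human=lambda task, perm, why: False)
step_up(session, "ledger:write", "correct the duplicated entry",
        billing_policy, ask_human=lambda task, perm, why: True)
print(sorted(session.granted))
```

The human callback here is a stand-in lambda. In a real deployment it would be an out-of-band approval flow, which is exactly the part that sits outside the agent’s context.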
The human stack built these primitives because humans operate in open, dynamic ecosystems. Workloads historically didn’t. Now agents do. And agents are going to force the deployment of progressive auth patterns that the human stack defined but never fully delivered on.
And you can see this playing out in real time. Every serious attempt to solve agent identity reaches for human identity concepts, not workload identity concepts. Dick Hardt built AAuth around delegation, consent, progressive trust, and token exchange. Not because those are OAuth features, but because those are the properties agents need, and the human identity stack is where they were first defined. Microsoft’s Entra Agent ID uses On-Behalf-Of flows, confidential clients, and delegation patterns. Google’s A2A protocol uses OAuth, task-based delegation, and agent cards for discovery.
You can stretch SPIFFE or WIMSE to cover simple agent automation. But once agents operate across discovered systems rather than pre-enumerated ones, the model starts to strain. That’s not because those are bad technologies. It’s because they solve a different layer. Agent auth lives above attestation, in the governance layer, and the concepts that keep showing up there, delegation, consent, session scope, progressive trust, all originate on the human side.
That’s not a coincidence. The people building the protocols are voting with their architecture, and they’re voting for the human side. They’re doing it because that’s where the right primitives already exist.
“Why Not Just Extend Workload Identity?”
The obvious counterargument is that you could start from workload identity and extend it to cover agents. It’s worth taking seriously.
SPIFFE is good technology and it works well where it fits. Cloud-native environments, Kubernetes clusters, modern service meshes. In those environments, SPIFFE’s model of dynamic attestation and identity issuance is exactly right. The problem isn’t SPIFFE. The problem is that you don’t get to change all the systems.
That’s why WIMSE exists. Not because SPIFFE failed, but because the real world has more environments than SPIFFE was designed for. Legacy systems, hybrid deployments, multi-cloud sprawl, enterprise environments that aren’t going to rearchitect around SPIFFE’s model. WIMSE is defining the broader patterns and extending the schemes to fit those other environments. That work is important and it’s still in progress.
There’s also a growing push to treat agents as non-human identities and extend workload identity with agent-specific attributes. Ephemeral provisioning, delegation chains, behavioral monitoring. The idea is that agents are just advanced NHIs, so you start from the workload stack and bolt on what’s missing. I understand the appeal. It lets you build on existing infrastructure without rethinking the model.
But what you end up bolting on is delegation, consent, session scope, and progressive trust. Those aren’t workload identity concepts being extended. Those are human identity concepts being retrofitted onto a foundation that was never designed for them. You’re starting from attestation and trying to work your way up to governance. Every concept you need to add comes from the other stack. At some point you have to ask whether you’re extending workload identity or just rebuilding human identity with extra steps.
Agent Identity Is a Governance Problem
Now apply that same logic to agents more broadly. Agents don’t operate in a world where every system speaks SPIFFE, or WIMSE, or any single workload identity protocol. They interact with whatever is out there. SaaS APIs. Legacy enterprise systems. Third-party services they discover at runtime. The environments agents operate in are even more heterogeneous than the environments WIMSE is trying to address.
And many of those systems don’t support delegation at all. They authenticate users with passwords and passkeys, and that’s it. No OBO flows, no token exchange, no scoped delegation. In those cases agents will need to fully impersonate users, authenticating with the user’s credentials as if they were the user. That’s not the ideal architecture. It’s the practical reality of a world where agents need to interact with systems that were built for humans and haven’t been updated. The identity infrastructure has to treat impersonation as a governed, auditable, revocable act rather than pretending it won’t happen.
I want to be honest about the contradiction here. The moment an agent injects Alice’s password into a legacy SaaS app, all of the governance properties this post argues for vanish. Principal-level accountability, cryptographic provenance, session-scoped delegation: none of it survives that boundary. The legacy system sees Alice. The audit log says Alice. There’s no way to distinguish Alice from an agent acting on Alice’s behalf. You can’t revoke the agent’s access without changing Alice’s password. I don’t have a good answer for that. It’s a real gap, and it will exist for as long as legacy systems do. The faster the world moves toward agent-native endpoints, the smaller this governance black hole gets. But right now it’s large.
At the same time, the world is moving toward agent-native endpoints. I’ve written before about a future where DNS SRV records sit right next to A records, one pointing at the website for humans and one pointing at an MCP endpoint for agents. That’s the direction. But identity infrastructure has to handle the full spectrum, from legacy systems that only understand passwords to native agent endpoints that support delegation and attestation natively. The spectrum will exist for a long time.
More than with humans or workloads, agent identity turns into a governance problem. Human identity is mostly about authentication. Workload identity is mostly about attestation. Agent identity is mostly about governance. Who authorized this agent. What scope was it given. Is that scope still valid. Should a human approve the next step. Can the delegation be revoked right now. Those are all governance questions, and they matter more for agents than they ever did for humans or workloads because agents act autonomously under delegated authority across systems nobody fully controls.
And unlike humans, agents possess neither liability nor common sense. A human with overly broad access still has judgment that says “this is technically allowed but clearly a bad idea” and faces personal consequences for getting it wrong. Agents have neither brake. The governance infrastructure has to provide externally what humans provide partially on their own.
For humans and workloads, identity and authorization are cleanly separable layers. For agents, they converge. An agent’s identity without its delegation context is meaningless, and its delegation context is authorization. Governance is where those two layers collapse into one.
The reason is structural. Workloads act on behalf of the organization that deployed them. The operator and the principal are the same entity. Agents introduce a new actor in the chain. They act on behalf of a specific human who delegated specific authority for a specific task. That “on behalf of” is simultaneously an identity fact and an authorization fact, and it doesn’t exist in the workload model at all.
That’s why the human identity stack keeps winning this argument.
Meanwhile, human identity concepts are deployed at planetary scale. Delegation and consent are mature, well-understood patterns with decades of deployment experience. Progressive trust is defined but barely deployed. Multi-hop delegation provenance is still being figured out. It’s an incomplete picture, but here’s the thing: the properties that are missing from the human side don’t even have definitions on the workload side. That’s still a decisive advantage.
But I want to be clear. The argument here is about properties, not protocols. I don’t think OAuth is the answer, even with DPoP. OAuth was designed for a world of pre-registered clients and tightly scoped API access. DPoP bolts on proof-of-possession, but it doesn’t change the fundamental model.
When Hardt built AAuth, he didn’t extend OAuth. He started a new protocol. He kept the concepts that work (delegation, consent, token exchange, progressive trust) and rebuilt the mechanics around agent-native patterns. HTTPS-based identity without pre-registration, HTTP message signing on every request, ephemeral keys, and multi-hop token exchange. That’s telling. The human identity stack has the right concepts, but the actual protocols need to be rebuilt for agents. The direction is human-side. The destination is something new.
This isn’t about which stack is theoretically better. It’s about which stack has the right primitives deployed in the environments agents actually operate in. The answer to that question is the human identity stack.
Discretion Makes It Harder, But It’s Not the Main Event
The behavioral stuff still matters. It’s just downstream of the structural argument.
Workloads execute predefined logic. You attest that the right code is running in the right environment, and from there you can reason about what it will do. Agents don’t work that way. When you give an autonomous AI agent access to your infrastructure with the goal of “improve system performance,” you can’t predict whether it will optimize efficiency or find creative shortcuts that break other systems. We’ve already seen models break out of containers by exploiting vulnerabilities rather than completing tasks as intended. Agents optimize objectives in ways that can violate intent unless constrained. That’s not a bug. It’s the expected behavior of systems designed to find novel paths to goals.
That means you can’t rely on code measurement alone to govern what an agent does. You also need behavioral monitoring, anomaly detection, conditional privilege, and the ability to put a human in the loop. Those are all human IAM patterns. But you need them because the ecosystem is open and the behavior is unpredictable. The open ecosystem is the first-order problem. The unpredictable behavior makes it worse.
And this is where the distinction between guidance and enforcement matters. System instructions are suggestions. An agent can be told “don’t access production data” in its prompt and still do it if a tool call is available and the reasoning chain leads there. Prompt injections can override instructions entirely. Policy enforcement is infrastructure. Cryptographic controls, governance layers, and authorization gates that sit outside the agent’s context and can’t be talked around. Agents need infrastructure they can’t override through reasoning, not instructions they’re supposed to follow.
What Agents Actually Need From the Human Stack
Session-scoped authority. I’ve written about this with the Tron identity disc metaphor. Agent spawns, gets a fresh disc, performs a mission, disc expires. That’s session semantics. It exists because the trust relationship is bounded and temporary, the way a user’s interaction with a service is bounded and temporary, not the way a workload’s persistent role in a service mesh works.
Think about what happens without it. An agent gets database write access for a migration task. Task completes. The credentials are still live. The next task is unrelated, but the agent still has write access to that database. A poisoned input, a bad reasoning chain, or just an optimization shortcut the agent thought was clever, and it drops a table. Not because it was malicious. Because it had credentials it no longer needed for a task it was no longer doing. That’s the agent equivalent of Bobby Tables, and it’s entirely preventable.
The logical endpoint of session-scoped authority is zero standing permissions. Every agent session starts empty. No credentials carry over from the last task. The agent accumulates only what it needs for this specific mission, and everything resets when the mission ends.
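Here is roughly what that lifecycle looks like as code. It is a sketch under assumptions: `AgentSession`, `mission`, and the permission names are hypothetical, and a real system would revoke credentials at the issuer rather than clearing an in-memory dictionary.

```python
import time
from contextlib import contextmanager
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class AgentSession:
    mission: str
    grants: dict = field(default_factory=dict)  # permission -> expiry time

    def grant(self, permission: str, ttl_seconds: float,
              now: Optional[float] = None) -> None:
        now = time.monotonic() if now is None else now
        self.grants[permission] = now + ttl_seconds

    def allowed(self, permission: str, now: Optional[float] = None) -> bool:
        now = time.monotonic() if now is None else now
        expires_at = self.grants.get(permission)
        return expires_at is not None and now < expires_at

    def close(self) -> None:
        # Nothing carries over to the next task.
        self.grants.clear()


@contextmanager
def mission(name: str):
    session = AgentSession(name)  # every session starts empty
    try:
        yield session
    finally:
        session.close()           # reset even if the task fails


with mission("migrate orders table") as session:
    session.grant("db:write", ttl_seconds=900)
    print(session.allowed("db:write"))   # usable during the migration
print(session.allowed("db:write"))       # gone once the mission ends
```

The `finally` block is the point. The database write access from the migration example cannot outlive the migration, whether the task completes or crashes.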
For humans, zero standing permissions is aspirational but rarely practiced because the friction isn’t worth it. Humans don’t want to re-request access to the same systems every morning. Agents don’t have that problem. They can request, wait, and proceed programmatically. The friction that makes zero standing permissions impractical for humans disappears for agents.
The hard question is how permissions get granted at runtime. Predefined policy handles the predictable paths. Billing agent gets billing APIs. That works, but it’s enumeration, and enumeration breaks down for open-ended tasks. Human-gated expansion handles the unpredictable paths, but it kills autonomy.
The mechanism that would actually make zero standing permissions work for emergent behavior is goal-scoped evaluation. Does this request serve the stated goal within the stated boundaries. That’s the same unsolved problem the rest of this piece keeps circling. Zero standing permissions is the right ideal. It’s achievable today for the predictable portion of agent work. The gap is the same gap.
Delegation with provenance. Agents are user agents in the truest sense. They carry delegated user authority into digital systems. AAuth formalizes this with agent tokens that bind signing keys to identity. The question “who authorized this agent to do this?” is a delegation question. Delegation is a human identity primitive because humans were the first actors that operated across trust boundaries and needed to grant scoped authority to others.
Chaining that delegation cryptographically across multi-hop paths, from user to agent to tool to downstream service while maintaining proof of the original user’s intent, is genuinely hard. Standard OBO flows are often too brittle for this. This is where the industry needs to go, not where it is today.
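To show the shape of the problem, here is a structural sketch of a multi-hop chain. It is deliberately incomplete: links are hash-linked so a verifier can detect reordering or a swapped parent, but nothing is signed, so it proves nothing about who actually issued each link. `Link`, `delegate`, and `verify_chain` are hypothetical names, not part of AAuth or any OBO flow.

```python
import hashlib
import json
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Link:
    issuer: str                   # who is delegating
    subject: str                  # who receives the authority
    scope: frozenset              # what is being delegated
    parent_digest: Optional[str]  # ties this hop to the one above it

    def digest(self) -> str:
        body = json.dumps({
            "issuer": self.issuer,
            "subject": self.subject,
            "scope": sorted(self.scope),
            "parent": self.parent_digest,
        }, sort_keys=True)
        return hashlib.sha256(body.encode()).hexdigest()


def delegate(parent: Optional[Link], issuer: str, subject: str, scope) -> Link:
    return Link(issuer, subject, frozenset(scope),
                parent.digest() if parent else None)


def verify_chain(chain: list, root_principal: str) -> bool:
    # The chain must start with the human who holds the original authority.
    if not chain or chain[0].issuer != root_principal:
        return False
    if chain[0].parent_digest is not None:
        return False
    for prev, cur in zip(chain, chain[1:]):
        if cur.issuer != prev.subject:           # only the holder can re-delegate
            return False
        if cur.parent_digest != prev.digest():   # hop must point at its real parent
            return False
        if not cur.scope <= prev.scope:          # scope can narrow, never widen
            return False
    return True


user_to_agent = delegate(None, "alice", "agent-7", {"tickets:read", "tickets:write"})
agent_to_tool = delegate(user_to_agent, "agent-7", "search-tool", {"tickets:read"})
print(verify_chain([user_to_agent, agent_to_tool], root_principal="alice"))
```

The subset check is what keeps the original user’s intent intact down the chain. The hard part the sketch skips is everything around it: signatures, key distribution, and revocation across hops.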
Progressive trust. AAuth lets a resource demand anything from a signed request to verified agent identity to full user authorization. That gradient only makes sense when the trust relationship is negotiated dynamically. Workloads don’t negotiate trust. They either have a role or they don’t.
Accountability at the principal level. When an agent approves a transaction, files a regulatory report, or alters infrastructure state, the audit question is “who authorized this and was it within scope?” Today’s logs can’t answer that. The log says an API token performed a read on a customer record. That token is shared across dozens of agents. Which agent? Acting on whose delegation? For what task? The log can’t say.
And even if it could identify the agent, there’s nothing connecting that action to the human authorization that allowed it. Nobody asks “which Kubernetes pod approved this wire transfer.” Governance frameworks reason about actors. That’s why every protocol effort maps agent identity to principal identity.
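For contrast, here is a sketch of a log record that could answer those questions. The field names are my assumption, not a standard schema, and `AuditEvent` and `record` are hypothetical.

```python
import json
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class AuditEvent:
    action: str         # what happened
    resource: str       # to what
    agent_id: str       # which agent, not which shared token
    delegated_by: str   # the human principal behind the agent
    delegation_id: str  # the specific grant that allowed it
    task_id: str        # the task the action was part of
    in_scope: bool      # result of the scope check at the time


def record(event: AuditEvent) -> str:
    # One JSON line per action, keys sorted so records are easy to diff.
    return json.dumps(asdict(event), sort_keys=True)


print(record(AuditEvent(
    action="read",
    resource="customers/4411",
    agent_id="support-agent-12",
    delegated_by="alice@example.com",
    delegation_id="dlg-0193",
    task_id="ticket-8842",
    in_scope=True,
)))
```

None of these fields are exotic. What is missing today is the infrastructure that populates `delegated_by` and `delegation_id` with something verifiable instead of something the agent asserts about itself.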
Goal-scoped authorization. Agents need mixed constraints rather than pure positive enumeration. Define the scope, set the boundaries, establish the escalation paths, delegate the goal, let the agent figure out the path. That’s how we’ve governed human actors in organizations for centuries. The identity and authorization infrastructure to support it exists in the human stack because that’s where it was needed first.
But I’ll be direct. Goal-scoped authorization is the hardest unsolved engineering problem in this space. The fundamental tension is temporal. Authorization happens before execution, but agents discover what they need during execution. Current authorization systems operate on verbs and nouns (allow this action on this resource). They don’t understand goals. Translating “fix the billing error” into a set of allowed API calls at runtime, without the agent hallucinating its way into a catastrophe, requires a just-in-time policy layer that doesn’t exist yet.
Progressive trust gets us part of the way there. The agent proposes an action, and a policy engine or a human approves the specific derived action before it executes. But the full solution is ahead of us, not behind us.
I know how this sounds to security people. “Goal-based authorization” sounds like the agent decides what it needs based on its own interpretation of a goal. That’s terrifying. It sounds like self-authorizing AI. But the alternative is pretending we can enumerate every action an agent might need in advance, and that fails silently. Either the agent operates within the pre-authorized list and can’t do its job, or someone over-provisions “just in case” and the agent has access to things it shouldn’t. Both are security failures. One just looks tidy on paper. Goal-based auth at least makes the governance visible. The agent proposes, the policy evaluates, the decision is logged. The scary part isn’t that we need goal-based auth. The scary part is that we don’t have it yet, so people are shipping agents with over-provisioned static credentials instead.
And there’s a deeper problem I want to name honestly. The only thing capable of evaluating whether a specific API call serves a broader goal is another LLM. And that means putting a probabilistic, hallucination-prone, high-latency system into the critical path of every infrastructure request. You’re using the thing you’re trying to govern as the governance mechanism. That’s not just an engineering gap waiting to be filled. It’s a fundamental architectural tension that the industry hasn’t figured out how to resolve. Progressive trust with human-gated escalation is the best interim answer, but it’s a workaround, not a solution.
This Isn’t About Throwing Away Attestation
I want to be clear about something because readers will assume otherwise. This argument is not “throw away workload identity primitives.” I’ve spent years arguing that attestation is MFA for workloads. I’ve written about measured enclaves, runtime attestation, and hardware-rooted identity extensively. None of that goes away.
You absolutely need attestation to prove the agent is running the right code in the right environment. You need runtime measurement to detect tampering. You need hardware roots of trust. If a hacker injects malicious code into an agent that has broad delegated authority, you need to know. That’s the workload identity stack doing its job.
In fact, attestation isn’t just complementary to the governance layer. It’s prerequisite. You can’t safely delegate authority to something you can’t verify. All the governance, delegation, and consent primitives in the world are meaningless if the code executing them has been tampered with. Attestation is the foundation the governance layer stands on.
But attestation alone isn’t enough. Proving that the right code is running doesn’t tell you who authorized this agent to act, what scope it was delegated, whether it’s operating within that scope, or whether a human needs to approve the next action. Those are delegation, consent, and governance questions. Those live in the human identity stack.
What agents actually need is both. Workload-style attestation as the foundation, with human-style delegation, consent, and progressive trust built on top.
I’ve argued before that attestation is MFA for workloads. It proves code integrity, runtime environment, and platform state, the way MFA proves presence, possession, and freshness for humans. For agents, we need to extend that into principal-level attestation. Not just “is this the right code in the right environment?” but also “who delegated authority to this agent, under what policy, with what scope, and is that delegation still valid?”
That’s multi-factor attestation of an acting principal. Code integrity from the workload stack, delegation provenance from the human stack, policy snapshot and session scope binding the two together. Neither stack delivers that alone today.
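Checking those factors one after another is straightforward, and a sketch makes the ordering explicit. Everything here is assumed for illustration: `Attestation`, `Delegation`, `authorize`, and the digest strings are hypothetical, and the inputs are taken as already verified by their own trust roots. What the sketch does not do is bind the two proofs into one verifiable object.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Attestation:
    code_digest: str   # measurement of the running agent code
    environment: str   # where it is running


@dataclass(frozen=True)
class Delegation:
    delegator: str
    agent_code_digest: str  # the code this delegation was issued to
    scope: frozenset
    expires_at: float
    revoked: bool = False


def authorize(att: Attestation, dele: Delegation, action: str,
              trusted_digests: set, now: float):
    # Factor 1 (workload stack): is this the right code?
    if att.code_digest not in trusted_digests:
        return False, "untrusted code"
    # Binding: was the delegation issued to this code, not some other agent?
    if dele.agent_code_digest != att.code_digest:
        return False, "delegation bound to different code"
    # Factor 2 (human stack): is the delegation still valid?
    if dele.revoked or now >= dele.expires_at:
        return False, "delegation no longer valid"
    # Factor 3: is the action inside the delegated scope?
    if action not in dele.scope:
        return False, "outside delegated scope"
    return True, "ok"


att = Attestation(code_digest="sha256:abc", environment="enclave-eu-1")
dele = Delegation("alice", "sha256:abc", frozenset({"reports:file"}), expires_at=1000.0)
print(authorize(att, dele, "reports:file", {"sha256:abc"}, now=10.0))
```

Attestation is checked first on purpose: if the code is not what it claims to be, nothing it presents about its delegation should be evaluated at all.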
The argument is about where the center of gravity is, not about discarding one stack entirely. And the center of gravity is on the human side, because the hard problems for agents are delegation and governance, not runtime measurement.
Where the Properties Actually Align (And Where They Don’t)
I’ve been arguing agents are more like humans than workloads. That’s true as a center-of-gravity claim. But it’s not total alignment, and pretending otherwise invites the wrong criticisms. Here’s where the properties actually land.
What agents inherit from the human side:
Delegation with scoped authority. Session-bounded trust. Progressive auth and step-up. Cross-boundary trust negotiation. Principal-level accountability. Open ecosystem discovery. These are the properties that make agents look like humans and not like workloads. They’re also the properties that are hardest to solve and least mature.
What agents inherit from the workload side:
Code integrity attestation. Runtime measurement. Programmatic credential handling with no human in the authentication loop. Ephemeral identity that doesn’t persist across sessions. These are well-understood, and the workload identity stack handles them. Agents don’t authenticate the way humans do. They don’t type passwords or touch biometric sensors. They prove what code is running and in what environment. That’s attestation, and it stays on the workload side.
What neither stack gives them:
This is the part nobody is talking about enough. Agents have properties that don’t map cleanly to either the human or workload model.
Accumulative trust within a task that resets between tasks. Human trust accumulates over a career and persists. Workload trust is static and role-bound. Agent trust needs to build during a mission as the agent demonstrates relevance and competence, then reset completely when the mission ends. Nothing in either stack supports that lifecycle.
Goal-scoped authorization with emergent resource discovery. I’ve already called this the hardest unsolved problem. Current auth systems operate on verbs and nouns. Agents need auth systems that operate on goals and boundaries. Neither stack was designed for this.
Delegation where the delegate doesn’t share the delegator’s intent. Every existing delegation protocol assumes the delegate understands and shares the user’s intent. When a human delegates to another human through OAuth, both parties generally understand what “handle my calendar” means and what it doesn’t.
An agent doesn’t share intent. It shares instructions. It will pursue the letter of the delegation through whatever path optimizes the objective, even if the human would have stopped and said “that’s not what I meant.” This isn’t a philosophy problem. It’s a protocol-level assumption violation. No existing delegation framework accounts for delegates that optimize rather than interpret.
Simultaneous proof of code identity and delegation authority. Agents need to prove both what they are (attestation) and who authorized them to act (delegation) in a single transaction. Those proofs come from different stacks with different trust roots. A system can check both sequentially, verify the attestation, then verify the delegation, and that’s buildable today. But binding them together cryptographically into a single verifiable object so a relying party can verify both at once without trusting the binding layer is an unsolved composition problem.
Vulnerability to context poisoning that persists across sessions. I’ve written about the “Invitation Is All You Need” attack where a poisoned calendar entry injected instructions into an agent’s memory that executed days later. Humans can be socially engineered, but they don’t carry the payload across sessions the way agents do. Workloads don’t accumulate context at all. Agent session isolation is a new problem that needs new primitives.
The honest summary is this. Agents inherit their governance properties from the human side and their verification properties from the workload side, but neither stack addresses the properties that are unique to agents. The solution isn’t OAuth with attestation bolted on. It’s something new that inherits from both lineages and adds primitives for accumulative task-scoped trust, goal-based authorization, and session isolation. That thing doesn’t exist yet.
Where This Framing Breaks
Saying “agents are like humans” implies the workload stack fails because workloads lack something agents have. Discretion, autonomy, behavioral complexity. That’s the wrong diagnosis. The workload stack fails because it was built for a world of pre-registered clients, tightly bound server relationships, and closed trust ecosystems. The more capable agents become, the less they stay in that world.
The human identity stack fits better not because agents are human-like, but because it’s oriented toward the structural properties agents need. Open ecosystems. Dynamic trust negotiation. Delegation across boundaries. Session-scoped authority. Progressive assurance. Not all of these are fully deployed today. Some are defined but immature. Some don’t exist as protocols yet. But the concepts, the vocabulary, and the architectural direction all come from the human side. The workload side doesn’t even have the vocabulary for most of them.
Those properties exist in the human stack because humans needed them first. Now agents need them too.
The Convergence We’ve Already Seen
My blog has traced this progression for a while now. Machines were static, long-lived, pre-registered. Workloads broke that model with ephemeral, dynamic, attestation-based identity. Each step in that evolution adopted identity properties that were already standard in human identity systems. Dynamic issuance. Short credential lifetimes. Context-aware access. Attestation as MFA for workloads. Workload identity got better by becoming more like user identity.
Agents are the next step in that same convergence. They don’t just need dynamic credentials and attestation. They need delegation, consent, progressive trust, session scope, and goal-based authorization. The most complete and most deployed versions of those primitives live in the human stack. Some exist in other forms elsewhere (SPIFFE has trust domain federation, capability tokens like Macaroons exist independently), but the human stack is where the broadest set of these concepts has been defined, tested, and deployed at scale.
The Actual Claim
Agent identity is a governance problem. Not an authentication problem, not an attestation problem. The hard questions are all governance questions. Who delegated authority. What scope. Is it still valid. Should a human approve the next step. For humans and workloads, identity and authorization are separate layers. For agents, they collapse. The delegation is the identity.
The human identity stack is where principal identity primitives live. Not because agents are people, but because people were the first actors that needed identity in open ecosystems with delegated authority and unbounded problem spaces.
Every protocol designer who sits down to solve agent auth rediscovers this and reaches for human identity concepts, not workload identity concepts. The protocols they build aren’t OAuth. They’re something new. But they inherit from the human side every time. That convergence is the argument.
The delegation and governance layer is buildable today. Goal-scoped authorization and intent verification are ahead of us. The first generation of agent identity systems will solve governance. The second will solve intent.
There’s a pattern that plays out across every regulated industry. Requirements increase. Complexity compounds. The people responsible for compliance realize they can’t keep up with manual processes. So instead of building the capacity to meet the rising bar, they quietly lower the specificity of their commitments.
It’s rational behavior. A policy that says “we perform regular reviews” can’t be contradicted the way a policy that says “we perform reviews every 72 hours” can. The less you commit to on paper, the less exposure you carry.
The problem is that this rational behavior, repeated across enough organizations and enough audit cycles, hollows out the entire compliance system from the inside. Documents stop describing what organizations actually do. They start describing the minimum an auditor will accept. The gap between documentation and reality widens. Nobody notices until something breaks.
Amazon Trust Services disclosed that their Certificate Revocation Lists sometimes backdate a timestamp called “thisUpdate” by up to a few hours. The practice itself is defensible. It accommodates clock skew in client systems. But when they updated their policy document to disclose the behavior, the new language said only that CRLs “may be backdated by up to a few hours.”
A community member pointed out the obvious. “A few hours” is un-auditable. Without a defined upper bound, there’s no way for an auditor, a monitoring tool, or a relying party to evaluate whether any given CRL falls within the CA’s stated practice. Twelve hours? Still “a few.” Twenty-four? Who decides?
When pressed, Amazon’s response was telling. They don’t plan to add detailed certificate profiles back into their policy documents. They believe referencing external requirements satisfies their disclosure obligations. We’ll tell you we follow the rules, but we won’t tell you how.
Apple, Mozilla, and Google’s Chrome team then independently pushed back. Each stated that referencing external standards is necessary but not sufficient. Policy documents must describe actual implementation choices with enough precision to be verifiable.
Apple’s Dustin Hollenback was direct. “The Apple Root Program expects policy documents to describe the CA Owner’s specific implementation of applicable requirements and operational practices, not merely incorporate them by reference.”
Mozilla’s Ben Wilson went further, noting that “subjective descriptors without defined bounds or technical context make it difficult to evaluate compliance, support audit testing, or enable independent analysis.” Mozilla has since opened Issue #295 to strengthen the MRSP accordingly.
Chrome’s response summarized the situation most clearly:
We consider reducing a CP/CPS to a generic pointer where it becomes impossible to distinguish between CAs that maintain robust, risk-averse practices and those that merely operate at the edge of compliance as being harmful to the reliable security of Chrome’s users.
They also noted that prior versions of Amazon’s policy had considerably more profile detail, calling the trend of stripping operational commitments “a regression in ecosystem transparency.”
The Pattern Underneath
What makes PKI useful as a case study isn’t that certificate authorities are uniquely bad at this. It’s that their compliance process is uniquely visible. CP/CPS documents are public. Incident reports are filed in public Bugzilla threads. Root program responses are posted where anyone can read them. The entire negotiation between “what we do” and “what we’re willing to commit to on paper” plays out in the open.
In most regulated industries, you never see this. The equivalent conversations in finance, FedRAMP, healthcare, or energy happen behind closed doors between compliance staff and auditors. The dilution is invisible to everyone outside the room. A bank’s internal policies get vaguer over time and nobody outside the compliance team and their auditors knows it happened. A FedRAMP authorization package gets thinner and the only people who notice are the assessors reviewing it. The dynamic is the same. The transparency isn’t.
So when you watch a CA update its policy with “a few hours” and three oversight bodies publicly push back, you’re seeing something that happens constantly across every regulated domain. You’re just not usually allowed to watch.
Strip away the PKI details and the pattern is familiar to anyone who has worked in compliance. An organization starts with detailed documentation of its practices. Requirements grow. Maintaining alignment between what the documents say and what the systems actually do gets expensive. Someone realizes that vague language creates less exposure than specific language. Sometimes it’s the compliance team running out of capacity. Sometimes it’s legal counsel actively advising against specific commitments, believing that “reasonable efforts” is harder to litigate against than “24 hours.” Either way, they’re trading audit risk for liability risk and increasing both. The documents get trimmed. Profiles get removed. Temporal commitments become subjective. “Regularly.” “Promptly.” “Periodically.” Operational descriptions become references to external standards.
Each individual edit is defensible. Taken together, they produce a document that can’t be meaningfully audited because there’s nothing concrete to audit against. One community member in the Amazon thread called this “Compliance by Ambiguity,” the practice of using generic, non-technical language to avoid committing to specific operational parameters. It’s a perfect label for a pattern that shows up everywhere.
This is the compliance version of Goodhart’s Law. When organizations optimize their policy documents for audit survival rather than operational transparency, the documents stop serving any of their original functions. Auditors can’t verify practices against vague commitments. Internal teams can’t use the documents to understand what’s expected of them. Regulators can’t evaluate whether the stated approach actually manages risk. The document becomes theater. And audits are already structurally limited by point-in-time sampling, auditee-selected scope, and the inherent conflict of the auditor working for the entity being audited. Layering ambiguous commitments on top of those limitations removes whatever verification power the process had left.
And it’s accelerating. Financial services firms deal with overlapping requirements from dozens of jurisdictions. Healthcare organizations juggle HIPAA, state privacy laws, and emerging AI governance frameworks simultaneously. Even relatively narrow domains like certificate authority operations have seen requirement growth compound year over year as ballot measures, policy updates, and regional regulations stack on top of each other. The manual approach to compliance documentation was already strained a decade ago. Today it’s breaking.
In PKI alone, governance obligations have grown 52-fold since 2005. The pattern is similar in every regulated domain that has added frameworks faster than it has added capacity to manage them.
Most organizations choose dilution. Not because they’re negligent, but because the alternative barely exists yet. There is no tooling deployed at scale that continuously compares what a policy document says against what the infrastructure actually does. No system that flags when a regulatory update creates a gap between stated practice and new requirements. No automated way to verify that temporal commitments (“within 24 hours,” “no more than 72 hours”) match operational reality. So people do what people do when workload exceeds capacity. They cut corners on the parts that seem least likely to matter this quarter. Policy precision feels like a luxury when you’re scrambling to meet the requirements themselves.
What Vagueness Actually Costs
The short-term calculus makes sense. The long-term cost doesn’t.
I went back and looked at public incidents in the Mozilla CA Program going back to 2018. Across roughly 500 cases, about 70% fall into process and operational failures rather than code-level defects. A large portion trace back to gaps between what an organization actually does and what its documents say it does. The organizations that ultimately lost trust follow a consistent pattern. Documents vague enough to avoid direct contradiction, but too vague to demonstrate that operations stayed within defined parameters. The decay is always gradual. The loss of trust always looks sudden.
The breakdown is telling. Of the four major incident categories, Governance & Compliance failures account for roughly half of all incidents, more than certificate misissuance, revocation failures, and validation errors combined. The primary cause isn’t code bugs or cryptographic weaknesses. It’s administrative oversight. Late audit reports, incomplete analysis, delayed reporting. The stuff that lives in policy documents and process descriptions, not in code.
[Figure: distribution of incidents across the four major incident categories]
This holds outside PKI. The financial institutions that get into the worst trouble with regulators aren’t usually the ones doing something explicitly prohibited. They’re the ones whose internal documentation was too vague to prove they were doing what they claimed. Read the details behind SOX failures, GDPR enforcement actions, and FDA warning letters, and you’ll find the same structural problem. Stated practices didn’t match reality, and nobody caught it because the stated practices were too imprecise to evaluate.
Vagueness also creates operational risk that has nothing to do with regulators. When your own engineering, compliance, and legal teams can’t look at a policy document and know exactly what’s expected, they fill in the gaps with assumptions. Different teams make different assumptions. Practices diverge. The organization thinks it’s operating one way because that’s what the document sort of implies. The reality is something else. And the gap only surfaces when an auditor, a regulator, or an incident forces someone to look closely.
The deeper issue is that vagueness removes auditability as a control surface. When commitments are measurable, deviations surface automatically. A system can check whether a CRL was backdated by more than two hours the same way it checks whether a certificate was issued with the wrong key usage extension. The commitment is binary. It either holds or it doesn’t. When commitments are subjective, deviations become interpretive. “A few hours” can’t be checked by a machine. It can only be argued about by people. That shifts risk detection from systems to negotiation. Negotiation doesn’t scale, produces inconsistent outcomes, and worst of all, it only happens between the auditee and the auditor. The regulators and the public who actually bear the risk aren’t in the room.
That spectrum is the diagnostic. Everything to the right of “machine-checkable” is a gap waiting to be exploited by time pressure, turnover, or organizational drift.
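To make the machine-checkable end concrete, here is a minimal sketch of the CRL backdating check described above. The two-hour bound is the illustrative figure from the text, not any CA's actual commitment, and `produced_at` is an assumption: it stands for whatever trustworthy record of the real production time is available, such as signing logs or a monitor's first observation.

```python
from datetime import datetime, timedelta, timezone

# Illustrative bound only; a real bound would come from the CA's CP/CPS.
MAX_BACKDATE = timedelta(hours=2)

def backdating_within_bound(this_update: datetime,
                            produced_at: datetime,
                            max_backdate: timedelta = MAX_BACKDATE) -> bool:
    """Return True if thisUpdate precedes production by no more than the bound."""
    return produced_at - this_update <= max_backdate

produced = datetime(2025, 1, 1, 12, 0, tzinfo=timezone.utc)
print(backdating_within_bound(produced - timedelta(minutes=90), produced))  # True
print(backdating_within_bound(produced - timedelta(hours=3), produced))     # False
```

No equivalent function can be written for "a few hours." There is no value to put in `MAX_BACKDATE`.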
What Would Have to Change
Solving this means treating compliance documentation as infrastructure rather than paperwork. In the same way organizations moved from manual deployments to CI/CD pipelines, compliance needs to move from static documents reviewed annually to living systems verified continuously.
The instinct is to throw AI at it, and that instinct is half right. LLMs are good at ingesting unstructured policy documents. But compliance verification isn’t a search problem. It’s a systematic reasoning problem. You need to trace requirements through hierarchies, exceptions, and precedence rules, then compare them against operational evidence. Recent research shows that RAG-based approaches still hallucinate 17-33% of the time on legal and compliance questions, even with domain-specific retrieval. The failure mode isn’t bad prompting. It’s architectural. You cannot train a model to strictly verify “a few hours” any better than you can train an auditor.
The fix isn’t better retrieval. It’s decomposing complex compliance questions into bounded sub-queries against explicit structures that encode regulatory hierarchy and organizational context, keeping the LLM’s role narrow enough that its errors can be isolated and reviewed.
That means tooling that ingests policy documents and maps commitments to regulatory requirements. Systems that flag language failing basic auditability checks, like temporal bounds described with subjective terms instead of defined thresholds. Automated comparison of stated practices against actual system behavior, running continuously rather than at audit time.
In the Amazon case, a system like this would have caught “a few hours” before it was published. Not because backdating is prohibited, but because the description lacks the specificity needed for anyone to verify compliance with it. The system wouldn’t need to understand CRL semantics. It would just need to know that temporal bounds in operational descriptions require defined, measurable thresholds to be auditable.
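A check like that does not need to be sophisticated. The sketch below is an assumption about what such a lint could look like, not a description of any existing tool. The word list is a hypothetical starting point, and it flags any sentence that uses a subjective temporal or effort term without also stating a numeric bound.

```python
import re

# Hypothetical list of subjective descriptors; a real tool would need a
# curated, domain-specific vocabulary.
VAGUE = re.compile(
    r"\b(a few|several|regular(?:ly)?|promptly|periodically|reasonable)\b",
    re.IGNORECASE,
)
# A number followed by a time unit counts as a defined threshold.
BOUNDED = re.compile(
    r"\b\d+\s*(?:seconds?|minutes?|hours?|days?)\b",
    re.IGNORECASE,
)

def flag_unbounded(sentences: list[str]) -> list[str]:
    """Return sentences that use vague terms and state no numeric bound."""
    return [s for s in sentences if VAGUE.search(s) and not BOUNDED.search(s)]

policy = [
    "CRLs may be backdated by up to a few hours.",
    "CRLs may be backdated by up to 2 hours.",
    "We perform regular reviews.",
    "We perform reviews every 72 hours.",
]
for sentence in flag_unbounded(policy):
    print("UNAUDITABLE:", sentence)
```

Run against those four sentences, it flags the first and third. It knows nothing about CRLs, which is the point.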
Scale that across any compliance domain. Every vague commitment is a gap. Every gap is a place where practice can diverge from documentation without detection. Every undetected divergence is risk accumulating quietly until something forces it into the open.
The Amazon incident is useful because it forced the people who oversee trust decisions to say out loud what has been implicit for years. The bar for documentation specificity is rising, and organizations that optimize for minimal disclosure are optimizing for the wrong thing. That message goes well beyond certificate authorities. The ones that keep diluting their commitments will discover that vagueness isn’t a shield. It’s a slow-moving liability that compounds until it becomes an acute one.
The regulatory environment isn’t going to get simpler. The organizations that treat policy precision as optional will discover that ambiguity scales faster than governance, and that systems which cannot be automatically verified will eventually be manually challenged.
Attestation has become one of the most important yet misunderstood concepts in modern security. It now shows up in hardware tokens, mobile devices, cloud HSMs, TPMs, confidential computing platforms, and operating systems. Regulations and trust frameworks are beginning to depend on it. At the same time people talk about attestation as if it has a single, universally understood meaning. It does not.
Attestation is not a guarantee. It is a signed assertion that provides evidence about something. What that evidence means depends entirely on the system that produced it, the protection boundary of the key that signed it, the verifier’s understanding of what the attestation asserts, and the verifier’s confidence in the guarantees of the attestation mechanism itself.
To understand where security is heading, you need to understand what attestation can prove, what it cannot prove, and why it is becoming essential in a world where the machines running our code are no longer under our control.
Claims, Attestations, and the Strength of Belief
A claim is something a system says about itself. There is no protection behind it and no expectation of truth. A user agent string is a perfect example. It might say it is an iPhone, an Android device, or Windows. Anyone can forge it. It is just metadata. At best it lets you guess what security properties the device might have, but a guess is not evidence.
Here is a typical user agent string:
Mozilla/5.0 (iPhone; CPU iPhone OS 15_2 like Mac OS X)
AppleWebKit/605.1.15
Mobile/15E148
Safari/605.1.15
If you break it apart it claims to be an iPhone, running iOS, using Safari, and supporting specific web engines. None of this is verified. It is only a claim.
Attestation is different. Attestation is a signed statement produced by a system with a defined protection boundary. That boundary might be hardware, a secure element, a trusted execution environment, a Secure Enclave, a hypervisor-isolated domain, or even an operating system component rooted in hardware measurements but not itself an isolated security boundary. Attestation does not make a statement true, but it provides a basis to believe it because the signing key is protected in a way the verifier can reason about.
Attestation is evidence. The strength of that evidence depends on the strength of the protection boundary and on the verifier’s understanding of what the attestation actually asserts.
Why Attestation Became Necessary
When I worked at Microsoft we used to repeat a simple rule about computer security. If an attacker has access to your computer it is no longer your computer. That rule made sense when software ran on machines we owned and controlled. You knew who had access. You knew who set the policies. You could walk over and inspect the hardware yourself.
That world disappeared.
A classic illustration of this problem is the evil maid attack on laptops. If a device is left unattended an attacker with physical access can modify the boot process, install malicious firmware, or capture secrets without leaving obvious traces. Once that happens the laptop may look like your computer but it is no longer your computer.
This loss of control is not limited to physical attacks. It foreshadowed what came next in computing. First workloads moved into shared data centers. Virtualization blurred the idea of a single physical machine. Cloud computing erased it entirely. Today your software runs on globally distributed infrastructure owned by vendors you do not know, in data centers you will never see, under policies you cannot dictate.
The old trust model depended on physical and administrative control. Those assumptions no longer hold. The modern corollary is clear. If your code is running on someone else’s computer you need evidence that it is behaving the way you expect.
Vendor promises are claims. Documentation is a claim. Marketing is a claim. None of these are evidence. To make correct security decisions in this environment you need verifiable information produced by the platform itself. That is the role attestation plays. The standards community recognized this need and began defining shared models for describing and evaluating attestation evidence, most notably through the IETF RATS architecture.
The IETF RATS View of Attestation
The IETF formalized the attestation landscape through the RATS architecture. It defines three roles. The attester produces signed evidence about itself or about the keys it generates. The verifier checks the evidence and interprets its meaning. The relying party makes a decision based on the verifier’s result.
This separation matters because it reinforces that attestation is not the decision itself. It is the input to the decision, and different attesters produce different types of evidence.
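The three roles can be sketched as three small functions. This is a toy model under stated assumptions: the key, claim names, and function names are hypothetical, and HMAC stands in for a hardware-protected asymmetric signing key with an endorsement chain. It is not the RATS wire format.

```python
import hashlib
import hmac
import json
from dataclasses import dataclass
from typing import Optional

# Stand-in for a key held inside the attester's protection boundary.
ATTESTER_KEY = b"hypothetical-device-key"

@dataclass
class Evidence:
    claims: dict
    signature: str

def attest(claims: dict) -> Evidence:
    """Attester: produce signed evidence about itself."""
    body = json.dumps(claims, sort_keys=True).encode()
    return Evidence(claims, hmac.new(ATTESTER_KEY, body, hashlib.sha256).hexdigest())

def verify(evidence: Evidence) -> Optional[dict]:
    """Verifier: check the evidence and return a result, or None if invalid."""
    body = json.dumps(evidence.claims, sort_keys=True).encode()
    expected = hmac.new(ATTESTER_KEY, body, hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected, evidence.signature):
        return None
    return evidence.claims

def relying_party_allows(result: Optional[dict], required: dict) -> bool:
    """Relying party: apply its own policy to the verifier's result."""
    if result is None:
        return False
    return all(result.get(name) == value for name, value in required.items())
```

The verifier never says yes or no to the transaction. It only reports what the evidence supports, and the relying party's policy makes the decision.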
Two Families of Attestation
Attestation appears in many forms, but in practice it falls into two broad families.
One family answers where a key came from and whether it is protected by an appropriate security boundary. The other answers what code is running and whether it is running in an environment that matches expected security policies. They both produce signed evidence but they measure and assert different properties.
Key Management Attestation: Provenance and Protection
YubiKey PIV Attestation
YubiKeys provide a clear example of key management attestation. When you create a key in a PIV slot the device generates an attestation certificate describing that key. The trust structure behind this is simple. Each YubiKey contains a root attestation certificate that serves as the trust anchor. Beneath that root is a device specific issuing CA certificate whose private key lives inside the secure element and cannot be extracted. When a verifier asks the device to attest a slot the issuing CA signs a brand new attestation certificate for that session. The public key in the certificate is always the same if the underlying slot key has not changed, but the certificate itself is newly generated each time with a different serial number and signature. This design allows verifiers to confirm that the key was generated on the device while keeping the blast radius small. If one token is compromised only that device is affected.
Cloud HSMs and the Marvell Ecosystem
Cloud HSMs scale this idea to entire services. They produce signed statements asserting that keys were generated inside an HSM, protected under specific roots, bound to non-exportability rules, and conforming to certification regimes. Many cloud HSMs use Marvell hardware, and other commercial and open HSMs implement attestation as well, often with their own attestation formats and trust chains. The Marvell-based examples are used here simply because the inconsistencies are illustrative, not because they are the only devices that support attestation. AWS CloudHSM and Google Cloud HSM share the same Marvell silicon base, but their attestation formats differ because they use different firmware and integration layers.
This inconsistency creates a real challenge for anyone who needs to interpret attestation evidence reliably. Even when the underlying hardware is the same the attestation structures are not. To make this practical to work with we maintain an open source library that currently decodes, validates, and normalizes attestation evidence from YubiKeys and Marvell based HSMs, and is designed to support additional attestation mechanisms over time. Normalization matters because if we want attestation to be widely adopted we cannot expect every verifier or relying party to understand every attestation format. Real systems often encounter many different kinds of attestation evidence from many sources, and a common normalization layer is essential to make verification scalable.
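To show what normalization means in practice, here is a minimal sketch. It is not the API of the library mentioned above, and the two input formats are invented for illustration: `vendor_a` and `vendor_b`, along with every field name, are hypothetical and do not reflect any real device's attestation structure.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class NormalizedKeyAttestation:
    """One common shape a relying party can write policy against."""
    source: str
    generated_in_hardware: bool
    exportable: bool

def normalize(source: str, raw: dict) -> NormalizedKeyAttestation:
    # Each branch maps one (hypothetical) vendor format to the common shape.
    if source == "vendor_a":
        return NormalizedKeyAttestation(
            source=source,
            generated_in_hardware=raw["origin"] == "generated",
            exportable=False,  # assumed: this format only covers non-exportable keys
        )
    if source == "vendor_b":
        return NormalizedKeyAttestation(
            source=source,
            generated_in_hardware=bool(raw["local"]),
            exportable=bool(raw["extractable"]),
        )
    raise ValueError(f"unsupported attestation source: {source}")

def policy_accepts(att: NormalizedKeyAttestation) -> bool:
    # Policy is written once, against the normalized form.
    return att.generated_in_hardware and not att.exportable
```

The signature and chain validation that must happen before normalization is omitted here. The sketch only shows why a relying party benefits from seeing one shape instead of many.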
Hardware alone does not define the attestation model. The actual evidence produced by the device does.
Mobile Key Attestation: Android and iOS
Mobile devices are the largest deployment of secure hardware anywhere. Their attestation mechanisms reflect years of lessons about device identity, OS integrity, and tamper resistance.
Android Keymaster and StrongBox
Android attestation provides information about the secure element or TEE, OS version, patch level, verified boot state, device identity, downgrade protection, and key properties. It anchors keys to both hardware and system state. This attestation is used for payments, enterprise identity, FIDO authentication, and fraud reduction.
Apple Secure Enclave Attestation
Apple takes a similar approach using a different chain. Secure Enclave attestation asserts device identity, OS trust chain, enclave identity, and key provenance. It supports Apple Pay, iCloud Keychain, MDM enrollment, and per app cryptographic isolation.
Confidential computing attestation solves a different problem. Instead of proving where a key came from, it proves what code is running and whether it is running in an environment that meets expected security constraints.
Intel SGX provides enclave reports that describe enclave measurements. AMD SEV-SNP provides VM measurement reports. AWS Nitro Enclaves use signed Nitro documents. Google Confidential VMs combine SEV-SNP with Google’s verification policies.
This evidence asserts which measurements the hardware recorded, whether memory is isolated, and whether the platform is genuine.
Why the Distinction Matters
Key management attestation cannot answer questions about code execution. Confidential computing attestation cannot answer questions about where keys were created. The evidence is different, the claims are different, and the trust chains are different.
If you do not understand which form of attestation you are dealing with you cannot interpret its meaning correctly.
Regulatory and Policy Pressure
Attestation is becoming important because the bar for trust has been raised. The clearest example is the CA/Browser Forum Code Signing Baseline Requirements, which mandate hardware-protected private keys and increasingly rely on attestation as the evidence of compliance.
Secure development frameworks including the EU Cyber Resilience Act push vendors toward demonstrating that firmware and update signing keys were generated and protected in secure environments. Enterprise procurement policies frequently require the same assurances. These rules do not always use the word attestation, but the outcomes they demand can only be met with attestation evidence.
The Lesson
Attestation is evidence. It is not truth. It is stronger than a claim because it is anchored in a protection boundary, but the strength of that boundary varies across systems and architectures. The meaning of the evidence depends on the attester, the verifier, and the assumptions of the relying party.
There are two major forms of attestation. Key management attestation tells you where a key came from and how it is protected. Confidential computing attestation tells you what code is running and where it is running.
As computing continues to move onto systems we do not control and becomes more and more distributed, attestation will become the foundation of trust. Secure systems will rely on verifiable evidence instead of assumptions, and attestation will be the language used to express that evidence.
Code signing was supposed to tell you who published a piece of software and, ultimately, help you decide whether to trust and install it. For nearly three decades, cryptographic signatures have bound a binary to a publisher’s identity, guaranteeing it hasn’t been tampered with since signing. But on Windows, that system is now broken in ways that would make its original designers cringe.
Attackers have found ways to completely subvert this promise without breaking a single cryptographic primitive. They can create an unlimited number of different malicious binaries that all carry the exact same “trusted” signature, or exploit careless publishers whose signing oracles let others turn their software into a bootloader for malware. The result is a system where valid signatures from trusted companies can no longer tell you anything meaningful about what the software will actually do.
Attackers don’t need to steal keys or compromise Certificate Authorities. They use the legitimate vendor software and publicly trusted code signing certificates, perverting the entire purpose of publisher-identity-based code signing.
Microsoft’s Long-Standing Awareness
Microsoft has known about the issue of malleability for at least a decade. In 2013, they patched CVE-2013-3900, where attackers could modify signed Windows executables, adding malicious code in “unverified portions” without invalidating the Authenticode signature. WinVerifyTrust improperly validated these files, allowing one “trusted” signature to represent completely different, malicious behavior.
This revealed a deeper architectural flaw: signed binaries could be altered by unsigned data. Microsoft faced a classic platform dilemma – the kind that every major platform holder eventually confronts. Fixing this comprehensively risked breaking legacy software critical to their vast ecosystem, potentially disrupting thousands of applications that businesses depended on daily. The engineering tradeoffs were genuinely difficult: comprehensive security improvements versus maintaining compatibility for millions of users and enterprise customers who couldn’t easily update or replace critical software.
They made the fix optional, prioritizing ecosystem compatibility over security hardening. This choice might have been understandable from a platform perspective in 2013, when the threat landscape was simpler and the scale of potential abuse wasn’t yet clear. But it became increasingly indefensible as attacks evolved and the architectural weaknesses became a systematic attack vector rather than an isolated vulnerability.
In 2022, Microsoft republished the advisory, confirming they still won’t enforce stricter verification by default. While today’s issues differ, they belong to a similar class of vulnerabilities that attackers now exploit systematically. The “trusted-but-mutable” flaw is now starting to permeate the Windows code signing ecosystem. Attackers use legitimate, signed applications as rootkit-like trust proxies, inheriting vendors’ reputation and bypass capabilities to deliver arbitrary malicious payloads.
Two incidents show we’re not dealing with isolated bugs but with systematic assaults on the core assumptions of Microsoft’s code signing.
ConnectWise: When Legitimate Software Adopts Malware Design Patterns
ConnectWise didn’t stumble into a vulnerability. They deliberately engineered their software using design patterns from the malware playbook. Their “attribute stuffing” technique embeds unsigned configuration data in the unauthenticated_attributes field of the PKCS#7 (CMS) envelope, a tactic malware authors use to conceal payloads in signed binaries.
In PKCS#7, the SignedData structure includes a signed digest (covering the binary and metadata) and optional unauthenticated_attributes, which lie outside the digest and can be modified post-signing without invalidating the signature. ConnectWise’s ScreenConnect installer misuses the Microsoft-reserved OID for Individual Code Signing (1.3.6.1.4.1.311.4.1.1) in this field to store unsigned configuration data, such as the server endpoint that acts as the client’s command-and-control server. This OID, meant for specific code signing purposes, is exploited to embed attacker-controlled configs, allowing the same signed binary to point to different servers without altering the trusted signature.
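To make the mechanics concrete, here is a minimal toy model of the idea, not real Authenticode or CMS parsing. It assumes a made-up `ToySignedFile` layout and uses an HMAC with a hypothetical `PUBLISHER_KEY` as a stand-in for the publisher’s asymmetric signature. The only point it illustrates is that data outside the signed digest can change freely while verification still succeeds.

```python
import hashlib
import hmac
from dataclasses import dataclass

# Hypothetical stand-in for the publisher's private key. Real Authenticode
# uses an asymmetric signature carried in a CMS SignedData structure.
PUBLISHER_KEY = b"demo-publisher-key"


@dataclass
class ToySignedFile:
    code: bytes             # covered by the signed digest
    signature: bytes        # toy "signature" over the digest of `code`
    unauthenticated: bytes  # outside the digest, editable after signing

    def file_bytes(self) -> bytes:
        # What a hash-based blocklist sees: every byte on disk.
        return self.code + self.signature + self.unauthenticated


def sign(code: bytes, unauthenticated: bytes = b"") -> ToySignedFile:
    digest = hashlib.sha256(code).digest()
    sig = hmac.new(PUBLISHER_KEY, digest, hashlib.sha256).digest()
    return ToySignedFile(code, sig, unauthenticated)


def verify(f: ToySignedFile) -> bool:
    # Only `code` feeds the digest, mirroring how unauthenticated
    # attributes are excluded from what the signature covers.
    digest = hashlib.sha256(f.code).digest()
    expected = hmac.new(PUBLISHER_KEY, digest, hashlib.sha256).digest()
    return hmac.compare_digest(expected, f.signature)


original = sign(b"installer stub", b"server=vendor.example")
# Graft a different config onto the same code and signature.
grafted = ToySignedFile(original.code, original.signature,
                        b"server=attacker.example")

print("original verifies:", verify(original))
print("grafted verifies: ", verify(grafted))
print("file hashes differ:",
      hashlib.sha256(original.file_bytes()).digest()
      != hashlib.sha256(grafted.file_bytes()).digest())
```

Both files verify, yet their on-disk hashes differ, which is why the same signature can front for any number of configurations.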
The ConnectWise ScreenConnect incident emerged when River Financial’s security team found attackers creating a fake website, distributing malware as a “River desktop app.” It was a trust inheritance fraud, a legitimately signed ScreenConnect client auto-connecting to an attacker-controlled server.
Windows trusts this as legitimate ConnectWise software: no SmartScreen warnings, no UAC prompts, silent installation, and immediate remote control. Attackers generate a fresh installer via a ConnectWise trial account, or simply find an existing package and manually edit the unauthenticated_attributes: extracting a benign signature, grafting in a malicious configuration blob (e.g., an attacker C2 server), reinserting the modified signature, and producing a “trusted” binary. Each variant shares the certificate’s reputation, bypassing Windows security.
Why does Windows trust binaries with oversized, unusual unauthenticated_attributes? Legitimate signatures need minimal metadata, yet Windows ignores red flags like large attribute sections, treating them as fully trusted. ConnectWise’s choice to embed mutable configs mirrors malware techniques, creating an infinite malware factory where one signed object spawns unlimited trusted variants.
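As a rough illustration of the kind of red-flag check this implies, the sketch below reads the size of a PE file’s certificate table (data directory index 4 in the optional header) and compares it with a limit. The `MAX_SIGNATURE_BYTES` value is a hypothetical placeholder, not a figure from Microsoft or this post, and the certificate table covers the whole signature blob (certificate chain and timestamps included), so it is only a coarse proxy for oversized unauthenticated attributes.

```python
import struct

# Hypothetical threshold: the text does not name a number. Derive one from
# measurements of legitimate signatures in your own environment.
MAX_SIGNATURE_BYTES = 64 * 1024

SECURITY_DIRECTORY_INDEX = 4  # IMAGE_DIRECTORY_ENTRY_SECURITY


def certificate_table_size(pe: bytes) -> int:
    """Return the size in bytes of the PE certificate table (0 if absent)."""
    if pe[:2] != b"MZ":
        raise ValueError("not a PE file: missing MZ header")
    (pe_offset,) = struct.unpack_from("<I", pe, 0x3C)  # e_lfanew
    if pe[pe_offset:pe_offset + 4] != b"PE\0\0":
        raise ValueError("not a PE file: missing PE signature")
    optional_header = pe_offset + 4 + 20  # skip signature and COFF header
    (magic,) = struct.unpack_from("<H", pe, optional_header)
    if magic == 0x10B:    # PE32
        directories = optional_header + 96
    elif magic == 0x20B:  # PE32+
        directories = optional_header + 112
    else:
        raise ValueError(f"unknown optional header magic {magic:#x}")
    # NumberOfRvaAndSizes sits immediately before the directory array.
    (count,) = struct.unpack_from("<I", pe, directories - 4)
    if count <= SECURITY_DIRECTORY_INDEX:
        return 0
    entry = directories + 8 * SECURITY_DIRECTORY_INDEX
    _file_offset, size = struct.unpack_from("<II", pe, entry)
    return size


def has_oversized_signature(pe: bytes,
                            limit: int = MAX_SIGNATURE_BYTES) -> bool:
    return certificate_table_size(pe) > limit
```

A defender could run something like `has_oversized_signature(open(path, "rb").read())` over downloaded installers and treat a hit as a reason for closer inspection rather than as proof of abuse.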
ConnectWise’s deliberate use of PKCS#7 unauthenticated attributes for ScreenConnect configuration, such as server endpoints, undermines code signing’s security guarantees by allowing post-signing changes, the same tactic malware uses to hide payloads in signed binaries. The choice likely prioritized cost savings over security, and it externalizes the cost of abuse to users by enabling phishing campaigns. It is infuriating because it weaponizes signature flexibility that has been warned about for decades, and it normalizes flaws that demand urgent security responses. Solutions exist to fix this.
The Defense Dilemma
Trust inheritance attacks leave security teams in genuinely impossible positions – positions that highlight the fundamental flaws in our current trust model. Defenders face a no-win scenario where every countermeasure either fails technically or creates operational chaos.
Blocking file hashes fails because attackers generate infinite variants with different hashes but the same trusted signature – each new configuration changes the binary’s hash while preserving the signature’s validity. This isn’t a limitation of security tools; it’s the intended behavior of code signing, where the same certificate can sign multiple different binaries.
Blocking the certificate seems like the obvious solution until you realize it disrupts legitimate software, causing operational chaos for organizations relying on the vendor’s products. How are defenders to know what else was signed by that certificate? Blocking it is effectively a self-inflicted denial-of-service that can shut down critical business operations. Security teams face the impossible choice between allowing potential malware or breaking their own infrastructure.
Behavioral detection comes too late in the attack chain. By the time suspicious behavior triggers alerts, attackers have already gained remote access, potentially disabled monitoring, installed additional malware, or begun data exfiltration. The initial trust inheritance gives attackers a crucial window of legitimacy.
These attacks operate entirely within the bounds of “legitimate” signed software, invisible to signature-based controls that defenders have spent years tuning and deploying. Traditional security controls assume that valid signatures from trusted publishers indicate safe software – an assumption these attacks systematically exploit. Cem Paya’s detailed analysis, part of River Financial’s investigation, provides a proof-of-concept for attribute grafting, showing how trivial it is to create trusted malicious binaries.
ConnectWise and Atera resemble modern Back Orifice, which debuted at DEF CON in August 1998 to demonstrate security flaws in Windows 9x. The evolution is striking: Back Orifice emerged two years after Authenticode’s 1996 introduction, specifically to expose Windows security weaknesses, requiring stealth and evasion to avoid detection. Unlike Back Orifice, which had to hide from the code signing protections Microsoft had established, these modern tools don’t evade those protections – they weaponize them, inheriting trust from valid signatures while delivering the same remote control capabilities without warnings.
Atera: A Trusted Malware Factory
Atera provides a legitimate remote monitoring and management (RMM) platform similar to ConnectWise ScreenConnect, providing IT administrators with remote access capabilities for managing client systems. Like other RMM solutions, Atera distributes signed client installers that establish persistent connections to their management servers.
They also operate what effectively amounts to a public malware signing service. Anyone with an email can register for a free trial and receive customized, signed, timestamped installers. Atera’s infrastructure embeds attacker-supplied identifiers into the MSI’s Property table, then signs the package with their legitimate certificate.
This breaks code signing’s promise of publisher accountability. Windows sees “Atera Networks Ltd,” assigns the code the reputation of the authentic package, and can’t distinguish whether the binary came from Atera’s legitimate operations or from an anonymous attacker who signed up minutes ago. The signature’s identity becomes meaningless when it could represent anyone.
In a phishing campaign targeting River Financial’s customers, Atera’s software posed as a “River desktop app,” with attacker configs embedded in a signed binary.
The binary carried a valid signature identifying Atera Networks Ltd. as the signer.
Atera provides a cloud-based remote monitoring and management (RMM) platform, unlike ScreenConnect, which supports both on-premises and cloud deployments with custom server endpoints. Atera’s agents connect only to Atera’s servers, but attackers abuse its free trial to generate signed installers tied to their accounts via embedded identifiers (like email or account ID) in the MSI Property table. This allows remote control through Atera’s dashboard, turning it into a proxy for malicious payloads. Windows trusts the “Atera Networks Ltd.” signature but cannot distinguish legitimate from attacker-generated binaries. Atera’s lack of transparency, with no public list of signed binaries or auditable repository, hides abuse, leaving defenders fighting individual attacks while systemic issues persist.
A Personal Reckoning
I’ve been fighting this fight for over two decades. Around 2001, as a Product Manager at Microsoft overseeing a wide range of security and platform features, I inherited Authenticode among many responsibilities. Its flaws were glaring: malleable PE formats, weak ASN.1 parsing, and signature formats vulnerable to manipulation.
We fixed some issues – hardened parsing, patched PE malleability – but deeper architectural changes faced enormous resistance. Proposals for stricter signature validation or new formats to eliminate mutable fields were blocked by the engineering realities of platform management. The tension between security ideals and practical platform constraints was constant and genuinely difficult to navigate.
The mantra was “good enough,” but this wasn’t just engineering laziness. Authenticode worked for 2001’s simpler threat landscape, where attacks were primarily about bypassing security rather than subverting trust itself. The flexibility we preserved was seen as a necessary feature for ecosystem compatibility – allowing for signature formats that could accommodate different types of metadata and varying implementation approaches across the industry.
The engineering tradeoffs were real: every architectural improvement risked breaking existing software, disrupting the development tools and processes that thousands of ISVs depended on, and potentially fragmenting the ecosystem. The business pressures were equally real: maintaining compatibility was essential for Windows’ continued dominance and Microsoft’s relationships with enterprise customers who couldn’t easily migrate critical applications.
It was never good enough for the long term. We knew it then, and we certainly know it now. The flexibility we preserved, designed for a simpler era, became systematic vulnerabilities as threats evolved from individual attackers to sophisticated operations exploiting trust infrastructure itself. Every time we proposed fundamental fixes, legitimate compatibility concerns and resource constraints won out over theoretical future risks that seemed manageable at the time.
This is why I dove into Sigstore, Binary Transparency, and various other software supply chain security efforts. These projects embody what we couldn’t fund in 2001: transparent, verifiable signing infrastructure that doesn’t rely on fragile trust-based compromises. As I wrote in How to keep bad actors out in open ecosystems, our digital identity models fail to provide persistent, verifiable trust that can scale with modern threat landscapes.
The Common Thread
ConnectWise and Atera expose a core flaw: code signing relies on trust and promises, not verifiable proof. The CA/Browser Forum’s 2023 mandate requires FIPS 140-2 Level 2 hardware key storage, raising the bar against key theft and casual compromise. But it does nothing to address the fundamental problem: binaries designed for mutable, unsigned input, and vendors running public signing oracles.
Figure 1: Evolution of Code Signing Hardware Requirements (2016-2024)
The mandate addresses yesterday’s threat model – key compromise – while today’s attacks work entirely within the intended system design. Compliance often depends on weak procedural attestations where subscriber employees sign letters swearing keys are on HSMs, rather than cryptographic proof of hardware protection. The requirement doesn’t address software engineered to bypass code signing’s guarantees, leaving systematic trust subversion untouched.
True cryptographic attestation, where hardware mathematically proves key protection, is viable today. Our work on Peculiar Ventures’ attestation library supports multiple formats, enabling programmatic verification without relying on trust or procedural checks. The challenge isn’t technical – it’s accessing diverse hardware for testing and building industry adoption, but the foundational technology exists and works.
The Path Forward
We know how to address this. A supply chain security renaissance is underway, tackling decades of accumulated technical debt and architectural compromise. Cryptographic attestation, which I’ve spent years developing, provides mathematical proof of key protection that can be verified programmatically by any party. For immediate risk reduction, the industry should move toward dynamic, short-lived credentials that aren’t reused across projects, limiting the blast radius when compromise or abuse occurs.
The industry must implement these fundamental changes:
Hardware-rooted key protection with verifiable attestation. The CA/Browser Forum mandates hardware key storage, but enforcement relies heavily on subscriber self-attestation rather than cryptographic proof. Requirements should be strengthened to mandate cryptographic attestations proving keys reside in FIPS 140-2/3 or Common Criteria certified modules. When hardware attestation isn’t available, key generation should be observed and confirmed by trusted third parties (such as CA partners with fiduciary relationships) rather than relying on subscriber claims.
Explicit prohibition of mutable shells and misaligned publisher identity. Signing generic stubs whose runtime behavior is dictated by unsigned configuration already violates Baseline Requirements §9.6.3 and §1.6.1, but this isn’t consistently recognized as willful signing of malware because the stub itself appears benign. The BRs should explicitly forbid mutable-shell installers and signing oracles that allow subscribers to bypass code signing’s security guarantees. A signed binary must faithfully represent its actual runtime behavior. Customized or reseller-specific builds should be signed by the entity that controls that behavior, not by a vendor signing a generic stub.
Subscriber accountability and disclosure of abusive practices. When a CA becomes aware that a subscriber is distributing binaries where the trusted signature is decoupled from actual behavior, this should be treated as a BR violation requiring immediate action. CAs should publish incident disclosures, suspend or revoke certificates per §9.6.3, and share subscriber histories to prevent CA shopping after revocation. This transparency is essential for ecosystem-wide learning and deterrence.
Code Signing Certificate Transparency. All CAs issuing code signing certificates should be required to publish both newly issued and historical certificates to dedicated CT logs. Initially, these could be operated by the issuing CAs themselves, since ecosystem building takes time and coordination. Combined with the existing list of code signing CAs and log lookup systems (like CCADB.org), this would provide ecosystem-wide visibility into certificate issuance, enable faster incident response, and support independent monitoring for misissuance and abuse patterns.
Explicit Subscriber Agreement obligations and blast radius management. Subscriber Agreements should clearly prohibit operating public signing services or designing software that bypasses code signing security properties such as mutable shells or unsigned configuration. Certificate issuance flows should require subscribers to explicitly acknowledge these obligations at the time of certificate request. To reduce the blast radius of revocation, subscribers should be encouraged or required to use unique keys or certificates per product or product family, ensuring that a single compromised or misused certificate doesn’t invalidate unrelated software.
Controls for automated or cloud signing systems. Subscribers using automated or cloud-based signing services should implement comprehensive use-authorization controls, including policy checks on what enters the signing pipeline, approval workflows for signing requests, and auditable logs of all signing activity. Without these controls, automated signing pipelines become essentially malware factories with legitimate certificates. Implementation requires careful balance between automation efficiency and security oversight, but this is a solved problem in other high-security domains.
Audit logging and evidence retention. Subscribers using automated and cloud signing services should maintain detailed logs of approval records for each signing request, cryptographic hashes of submitted inputs and signed outputs, and approval decision trails. These logs must be retained for a defined period (such as two years or more) and made available to the CA or authorized auditors upon request. This ensures complete traceability and accountability, preventing opaque signing systems from being abused as anonymous malware distribution platforms.
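The logging described in the last two items could be sketched roughly as follows. Everything here is illustrative: `SigningRecord`, `AuditLog`, and the field names are hypothetical, not part of any CA/Browser Forum requirement or vendor API. Each entry stores hashes of the submitted input and signed output plus the approval decision, and chains to the previous entry so that edits to earlier history are detectable.

```python
import hashlib
import json
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class SigningRecord:
    request_id: str
    approver: str
    decision: str         # e.g. "approved" or "rejected"
    input_sha256: str     # hash of the artifact submitted for signing
    output_sha256: str    # hash of the signed result ("" if none)
    prev_entry_hash: str  # links this entry to the one before it

    def entry_hash(self) -> str:
        canonical = json.dumps(asdict(self), sort_keys=True).encode()
        return hashlib.sha256(canonical).hexdigest()


class AuditLog:
    GENESIS = "0" * 64  # placeholder "previous hash" for the first entry

    def __init__(self) -> None:
        self.entries: list[SigningRecord] = []

    def append(self, request_id: str, approver: str, decision: str,
               submitted: bytes, signed: bytes = b"") -> SigningRecord:
        prev = self.entries[-1].entry_hash() if self.entries else self.GENESIS
        record = SigningRecord(
            request_id=request_id,
            approver=approver,
            decision=decision,
            input_sha256=hashlib.sha256(submitted).hexdigest(),
            output_sha256=hashlib.sha256(signed).hexdigest() if signed else "",
            prev_entry_hash=prev,
        )
        self.entries.append(record)
        return record

    def verify_chain(self) -> bool:
        """True if every entry still links to its predecessor's hash."""
        prev = self.GENESIS
        for entry in self.entries:
            if entry.prev_entry_hash != prev:
                return False
            prev = entry.entry_hash()
        return True
```

A chain like this only shows that earlier entries were not rewritten. Detecting tampering with the newest entry, or wholesale replacement of the log, would additionally require anchoring the latest entry hash somewhere the subscriber cannot alter, such as with the CA or an auditor.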
Microsoft must take immediate action on multiple fronts. In addition to championing the above industry changes, they should automatically distrust executables if their Authenticode signature exceeds rational size thresholds, reducing the attack surface of oversized signature blocks as mutation vectors. They should also invest seriously in Binary Transparency adoption, publishing Authenticode signed binaries to tamper-evident transparency logs as is done in Sigstore, Golang module transparency, and Android Firmware Transparency. Their SCITT-based work for confidential computing would be a reasonable approach for them to extend to the rest of their code signing infrastructure. This would provide a tamper-evident ledger of every executable Windows trusts, enabling defenders to trace and block malicious payloads quickly and systematically.
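For readers unfamiliar with what “tamper-evident” means in practice, here is a minimal Merkle tree sketch. It borrows the leaf and node hash prefixes and the split rule from Certificate Transparency (RFC 6962), but the proof encoding, a list of `(side, hash)` pairs, is a simplification chosen for readability and is not the wire format of any real log, including Sigstore or SCITT.

```python
import hashlib


def leaf_hash(data: bytes) -> bytes:
    return hashlib.sha256(b"\x00" + data).digest()


def node_hash(left: bytes, right: bytes) -> bytes:
    return hashlib.sha256(b"\x01" + left + right).digest()


def _split(n: int) -> int:
    """Largest power of two strictly less than n (for n >= 2)."""
    k = 1
    while k * 2 < n:
        k *= 2
    return k


def merkle_root(leaves: list[bytes]) -> bytes:
    if not leaves:
        return hashlib.sha256(b"").digest()
    if len(leaves) == 1:
        return leaf_hash(leaves[0])
    k = _split(len(leaves))
    return node_hash(merkle_root(leaves[:k]), merkle_root(leaves[k:]))


def inclusion_proof(leaves: list[bytes], index: int) -> list[tuple[str, bytes]]:
    """Sibling hashes from the leaf up to the root.

    Each item records which side the sibling sits on ("left" or "right").
    """
    if len(leaves) == 1:
        return []
    k = _split(len(leaves))
    if index < k:
        return inclusion_proof(leaves[:k], index) + [("right", merkle_root(leaves[k:]))]
    return inclusion_proof(leaves[k:], index - k) + [("left", merkle_root(leaves[:k]))]


def verify_inclusion(data: bytes, proof: list[tuple[str, bytes]],
                     root: bytes) -> bool:
    h = leaf_hash(data)
    for side, sibling in proof:
        h = node_hash(sibling, h) if side == "left" else node_hash(h, sibling)
    return h == root
```

If each leaf were the hash of a published, signed binary, a defender holding only the log’s root could check that a given installer was publicly logged, and a binary that never appears in the log would stand out.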
Until these controls become standard practice, Authenticode cannot reliably distinguish benign signed software from weaponized installers designed for trust subversion.
Breaking the Trust Contamination Infrastructure
These code-signing attacks mirror traditional rootkits in their fundamental approach: both subvert trust mechanisms rather than bypassing them entirely. A kernel rootkit doesn’t break the OS security model – it convinces the OS that malicious code is legitimate system software. Similarly, these “trusted wrapper” and “signing oracle” attacks don’t break code signing cryptography – they convince Windows that malware is legitimate software from trusted publishers.
The crucial difference is that while rootkits require sophisticated exploitation techniques and deep system knowledge, these trust inheritance attacks exploit the system’s intended design patterns, making them accessible to a much broader range of attackers and much harder to defend against using traditional security controls.
ConnectWise normalized malware architecture in legitimate enterprise software. Atera built an industrial-scale malware factory that operates in plain sight. Microsoft’s platform dutifully executes the result with full system trust, treating sophisticated trust subversion attacks as routine software installations.
This isn’t about isolated vulnerabilities that can be patched with point fixes. We’re facing a systematic trust contamination infrastructure that transforms the code signing ecosystem into an adversarial platform where legitimate trust mechanisms become attack vectors. Until we address the architectural flaws that enable this pattern systematically, defenders will remain stuck playing an unwinnable game of certificate whack-a-mole against an endless assembly line of trusted malware.
The technology to fix this exists today. Modern supply chain security projects demonstrate that transparent, verifiable trust infrastructure is not only possible but practical and deployable.
The only missing ingredient is the industry-wide will to apply these solutions and the recognition that “good enough” security infrastructure never was – and in today’s threat landscape, the costs of inaction far exceed the disruption of fundamental architectural improvements.
P.S. Thanks to Cem Paya and Matt Ludwig from River Financial for the great research work they did on both of these incidents.
In the past, I’ve written about how to measure the WebPKI, and from time to time I post brief updates on how the market is evolving.
The other day, Matthew McPherrin posted a script showing how to use Mozilla telemetry data to analyze which Certificate Authorities are more critical to the web. Specifically, what percentage of browsing relies on each CA. Mozilla provides public data from Firefox’s telemetry on how many times a CA is used to successfully validate certificates. This is a pretty good measure for how “big” a CA actually is. The data is pretty hard to view in Mozilla’s public systems though, so he made a script to combine a few data sources and graph it.
I normally focus on total issuance numbers since they’re easier to obtain. That data comes from Certificate Transparency logs, which contain all publicly trusted certificates that you might encounter without seeing an interstitial warning about the certificate not being logged (like this example).
What the Data Reveals
Both datasets feature many of the same major players. But there are some striking differences that reveal important insights about the WebPKI ecosystem.
Let’s Encrypt dominates certificate issuance at 46.1% of all certificates. But it ranks third in Firefox’s actual usage telemetry. This suggests Let’s Encrypt serves many lower-traffic sites. Meanwhile, Google Trust Services leads in Firefox usage while ranking second in certificate issuance volume. This shows how high-traffic sites can amplify a CA’s real-world impact.
DigiCert ranks second in Firefox usage while placing fourth in certificate issuance volume at 8.3%. This reflects their focus on major enterprise customers. With clients like Meta (Facebook, Instagram, WhatsApp), they secure some of the world’s highest-traffic websites. This “fewer certificates, massive impact” approach drives them up the usage charts despite not competing on volume with Let’s Encrypt.
Google’s dominance reflects more than just their own properties like Google.com, YouTube, and Gmail. Google Cloud offers arguably the best load balancer solution in the market (full disclosure: I worked on this project). You get TLS by default for most configurations. Combined with their global network that delivers CDN-like benefits out of the gate, this attracts major platforms like Wix and many others to build on Google Cloud. When these platforms choose Google’s infrastructure, they automatically inherit Google Trust Services certificates.
Looking at the usage data reveals other interesting patterns. Deutsche Telekom Security, Government of Turkey, and SECOM Trust Systems all appear prominently in Firefox telemetry but barely register in issuance numbers. (UPDATE: it turns out the Turkey entry is a Firefox bug: they’re using bucket #1 for both locally installed roots and Kamu SM, apparently by accident.) In some respects, it’s no surprise that government-issued certificates see disproportionate usage. Government websites are often mandated for use. Citizens have to visit them for taxes, permits, benefits, and other essential services.
Microsoft Corporation appears significantly in issuance data (6.5%) but doesn’t register in the Firefox telemetry. This reflects their focus on enterprise and Windows-integrated scenarios rather than public web traffic.
GoDaddy shows strong issuance numbers (10.5%) but more modest representation in browsing telemetry. This reflects their massive domain parking operations. They issue certificates for countless parked domains that receive minimal actual user traffic.
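The rank comparison above can be written down as a small calculation. This sketch uses only the ranks stated explicitly in the text; CAs whose rank is not given in both datasets are left out rather than guessed, and the function name `rank_shift` is just an illustrative choice.

```python
# Ranks exactly as stated in the text. CAs without a stated rank in both
# datasets are omitted rather than guessed.
issuance_rank = {"Let's Encrypt": 1, "Google Trust Services": 2, "DigiCert": 4}
usage_rank = {"Google Trust Services": 1, "DigiCert": 2, "Let's Encrypt": 3}


def rank_shift(issuance: dict[str, int], usage: dict[str, int]) -> dict[str, int]:
    """Positive shift = the CA ranks higher by usage than by issuance."""
    common = issuance.keys() & usage.keys()
    shifts = {ca: issuance[ca] - usage[ca] for ca in common}
    return dict(sorted(shifts.items(), key=lambda kv: kv[1], reverse=True))


for ca, shift in rank_shift(issuance_rank, usage_rank).items():
    print(f"{ca}: {shift:+d}")
# DigiCert: +2
# Google Trust Services: +1
# Let's Encrypt: -2
```

A positive number marks a CA whose certificates sit in front of more traffic than its issuance volume alone would suggest.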
Why This Matters
Mozilla Firefox represents under 3% of global browser market share. This telemetry reflects a smaller segment of internet users. While this data provides valuable insights into actual CA usage patterns, it would be ideal if Chrome released similar telemetry data. Given Chrome’s dominant 66.85% market share, their usage data would dramatically improve our understanding of what real WebPKI usage actually looks like across the broader internet population.
The contrast between certificate issuance volume and actual browsing impact reveals important truths about internet infrastructure. CT logs currently show over 450,000 certificates being issued per hour across all CAs. Yet as this Firefox telemetry data shows, much of that volume serves lower-traffic sites while a smaller number of high-traffic certificates drive the actual user experience. Some CAs focus on high-volume, automated issuance for parked domains and smaller sites. Others prioritize fewer certificates for high-traffic, essential destinations. Understanding both metrics helps us better assess the real-world criticality of different CAs for internet security and availability.
Raw certificate counts don’t tell the whole story. The websites people actually visit, and sometimes must visit, matter just as much as the sheer number of certificates issued. Some certificates protect websites with “captive audiences” or essential services, while others protect optional destinations. A government tax portal or YouTube will always see more traffic than the average small business website, regardless of how many certificates each CA issues.
Regardless of how you count, I’ve had the pleasure of working closely with at least 7 of the CAs in the top 10 in their journeys to become publicly trusted CAs. Each of these CAs has had different goals for its business and operations, and that’s exactly why you see different manifestations in the outcomes. Let’s Encrypt focused on automation and volume. DigiCert targeted enterprise customers. Google leveraged their cloud infrastructure. GoDaddy built around domain services.
Either way, it’s valuable to compare and contrast these measurement approaches to see what the WebPKI really looks like beyond just raw certificate counts.
When we discuss the WebPKI, we naturally focus on Certificate Authorities (CAs), browser root programs, and the standards established by the CA/Browser Forum. Yet for these standards to carry real weight, they must be translated into formal, auditable compliance regimes. This is where assurance frameworks enter the picture, typically building upon the foundational work of the CA/Browser Forum.
The WebTrust framework, overseen by professional accounting bodies, is only one way to translate CA/Browser Forum requirements into auditable criteria. In Europe, a parallel scheme relies on the European Telecommunications Standards Institute (ETSI) for the technical rules, with audits carried out by each country’s ISO/IEC 17065-accredited Conformity Assessment Bodies. Both frameworks follow the same pattern: they take the CA/Browser Forum standards and repackage them into structured compliance audit programs.
Understanding the power dynamics here is crucial. While these audits scrutinize CAs, they exercise no direct control over browser root programs. The root programs at Google, Apple, Microsoft, and Mozilla remain the ultimate arbiters. They maintain their own policies, standards, and processes that extend beyond what these audit regimes cover. No one compels the browsers to require WebTrust or ETSI audits; they volunteer because obtaining clean reports from auditors who have seen things in person helps them understand if the CA is competent and living up to their promises.
How WebTrust Actually Works
With this context established, let’s examine the WebTrust model prevalent across North America and other international jurisdictions. In North America, administration operates as a partnership between the AICPA (for the U.S.) and CPA Canada. For most other countries, CPA Canada directly manages international enrollment, collaborating with local accounting bodies like the HKICPA for professional oversight.
These organizations function through a defined sequence of procedural steps: First, they participate in the CA/Browser Forum to provide auditability perspectives. Second, they fork the core technical requirements and rebundle them as the WebTrust Principles and Criteria. Third, they license accounting firms to conduct audits based on these principles and criteria. Fourth, they oversee licensed practitioners through inspection and disciplinary processes.
The audit process follows a mechanical flow. CA management produces an Assertion Letter claiming compliance. The auditor then tests that assertion and produces an Attestation Report, a key data point for browser root programs. Upon successful completion, the CA can display the WebTrust seal.
This process creates a critical misconception about what the WebTrust seal actually signifies. Some marketing approaches position successful audits as a “gold seal” of approval, suggesting they represent the pinnacle of security and best practices. They do not. A clean WebTrust report simply confirms that a CA has met the bare minimum requirements for WebPKI participation; it represents the floor, not the ceiling. The danger emerges when CAs treat this floor as their target; these are often the same CAs responsible for significant mis-issuances and ultimate distrust by browser root programs.
Where Incentives Break Down
Does this system guarantee consistent, high-quality CA operations? The reality is that the system’s incentives and structure actively work against that goal. This isn’t a matter of malicious auditors; we’re dealing with human nature interacting with a flawed system, compounded by a critical gap between general audit principles and deep technical expertise.
Security professionals approach assessments expecting auditors to actively seek problems. That incentive doesn’t exist here. CPA audits are fundamentally designed for financial compliance verification, ensuring documented procedures match stated policies. Security assessments, by contrast, actively hunt for vulnerabilities and weaknesses. These represent entirely different audit philosophies: one seeks to confirm documented compliance, the other seeks to discover hidden risks.
This philosophical gap becomes critical when deep technical expertise meets general accounting principles. Even with impeccably ethical and principled auditors, you can’t catch what you don’t understand. A financial auditor trained to verify that procedures are documented and followed may completely miss that a technically sound procedure creates serious security vulnerabilities.
This creates a two-layer problem. First, subtle but critical ambiguities or absent content in a CA’s Certification Practice Statement (CPS) and practices might not register as problems to non-specialists. Second, even when auditors do spot vague language, commercial pressures create an impossible dilemma: push the customer toward greater specificity (risking the engagement and future revenue), or let it slide due to the absence of explicit requirements.
This dynamic creates a classic moral hazard, an issue similar to the one we explored in a recent post. Auditors are paid by the very entities they’re supposed to scrutinize critically, creating incentives to overlook issues in order to maintain business relationships. Meanwhile, the consequences of missed problems (security failures, compromised trust, and operational disruptions) fall on the broader WebPKI ecosystem and billions of relying parties who had no voice in the audit process. This drives the inconsistencies we observe today and reflects a broader moral hazard problem plaguing the entire WebPKI ecosystem, where those making critical security decisions rarely bear the full consequences of poor choices.
This reality presents a prime opportunity for disruption through intelligent automation. The core problem lies in expertise “illiquidity”: deep compliance knowledge remains locked in specialists’ minds, trapped in manual processes, and is prohibitively expensive to scale.
Current compliance automation has only created “automation asymmetry,” empowering auditees to generate voluminous, polished artifacts that overwhelm manual auditors. This transforms audits from operational fact-finding into reviews of well-presented fiction.
The solution requires creating true “skill liquidity” through AI: not just another LLM, but an intelligent compliance platform embedding structured knowledge from seasoned experts. This system would feature an ontology of controls, evidence requirements, and policy interdependencies, and it would perform the brutally time-consuming rote work that consumes up to 30% of manual audits, such as policy mapping and change log scrutiny, with superior speed and consistency.
When auditors and program administrators gain access to this capability, the incentive model fundamentally transforms. AI can objectively flag ambiguities and baseline deviations that humans might feel pressured to overlook or lack the skill to notice, directly addressing the moral hazard inherent in the current system. When compliance findings become objective data points generated by intelligent systems rather than subjective judgments influenced by commercial relationships, they become much harder to ignore or rationalize away.
This transformation liquefies rote work, liberating human experts to focus on what truly matters: making high-stakes judgment calls, investigating system-flagged anomalies, and assessing control effectiveness rather than mere documented existence. This elevation transforms auditors from box-checkers into genuine strategic advisors, addressing the system’s core ethical challenges.
This new transparency and accountability shifts the entire dynamic. Audited entities can evolve from reactive fire drills to proactive, continuous self-assurance. Auditors, with amplified expertise and judgment focused on true anomalies rather than ambiguous documentation, can deliver exponentially greater value.
Moving Past the Performance
This brings us back to the fundamental issue: the biggest problem in communication is the illusion that it has occurred. Today’s use of the word “audit” creates a dangerous illusion of deep security assessment.
By leveraging AI to create skill liquidity, we can finally move past this illusion: automating the more mundane audit elements makes room for the security and correctness assessments that people already assume are happening. We can forge a future where compliance transcends audit performance theater, becoming instead a foundation of verifiable, continuous operational integrity, built on truly accessible expertise rather than scarce, locked-away knowledge.
The WebPKI ecosystem deserves better than the bare minimum. With the right tools and transformed incentives, we can finally deliver it.
TL;DR: Root programs, facing user loss, prioritize safety, while major CAs, with browsers, shape WebPKI rules. Most CAs, risking distrust or customers, seek leniency, shifting risks to billions of voiceless relying parties. Subscribers’ push for ease fuels CA resistance, demanding reform.
The recent Mozilla CA Program roundtable discussion draws attention to a fundamental flaw in how we govern the WebPKI, one that threatens the security of billions of internet users. It’s a classic case of moral hazard: those making critical security decisions face minimal personal or professional consequences for poor choices, while those most affected have virtually no say in how the system operates.
The Moral Hazard Matrix
The numbers reveal a dangerous imbalance between who controls WebPKI policy and who bears the consequences. Browsers, as root programs, face direct accountability: if security fails, users abandon them. CAs, on the other hand, are incentivized to reduce customer effort, boost margins, and externalize risks, leaving billions of relying parties to absorb the fallout.
This is a classic moral hazard structure, with a key distinction: browser vendors, as root programs, face direct consequences (lose security, lose users), which aligns their incentives with safety. CAs, while risking distrust or customer loss, often externalize greater risks to relying parties, betting that they won’t be held accountable for these decisions.
Mapping the Accountability Breakdown
The roundtable revealed a systematic divide in how stakeholders approach CPS compliance issues. CAs, driven by incentives to minimize customer effort for easy sales and reduce operational costs for higher margins, consistently seek to weaken accountability, while root programs and the security community demand reliable commitments:
| Position | Supported By | Core Argument | What It Really Reveals |
| --- | --- | --- | --- |
| “Revocation too harsh for minor CPS errors” | CA Owners | Policy mismatches shouldn’t trigger mass revocation | Want consequences-free policy violations |
| “Strict enforcement discourages transparency” | CA Owners | Fear of accountability leads to vague CPSs | Treating governance documents as optional “documentation” |
| “SLA-backed remedies for enhanced controls” | CA Owners | Credits instead of revocation for optional practices | Attempt to privatize trust governance |
| “Split CPS into binding/non-binding sections” | CA Owners | Reduce revocation triggers through document structure | Avoid accountability while claiming transparency |
| “Human error is inevitable” | CA Owners | Manual processes will always have mistakes | Excuse for not investing in automation |
| “Retroactive CPS fixes should be allowed” | CA Owners | Patch documents after problems surface | Gut the very purpose of binding commitments |
| “CPS must be enforceable promises” | Root Programs, Security Community | Documents should reflect actual CA behavior | Public trust requires verifiability |
| “Automation makes compliance violations preventable” | Technical Community | 65+% ACME adoption proves feasibility | Engineering solutions exist today |
The pattern is unmistakable: CAs consistently seek reduced accountability, while those bearing security consequences demand reliable commitments. The Microsoft incident perfectly illustrates this: rather than addressing the absence of systems that would automatically catch discrepancies before millions of certificates were issued incorrectly, industry discussion focused on making violations easier to excuse retroactively.
The Fundamental Mischaracterization
Much of the roundtable suffered from a critical misconception: that the CPS is mere “documentation” rather than what it actually is, the foundational governance document that defines how a CA operates.
A CPS looks like a contract because it is a contract, a contract with the world. It’s the binding agreement that governs CA operations, builds trust by showing relying parties how the CA actually works, guides subscribers through certification requirements, and enables oversight by giving auditors a baseline against real-world issuance. When we minimize it as “documentation,” we’re arguing that CAs should be able to violate their core operational commitments with minimal consequences.
CPS documents are the public guarantee that a CA knows what it’s doing and will stand behind it, in advance, in writing, in full view of the world. The moment we treat them as optional “documentation” subject to retroactive fixes, we’ve abandoned any pretense that trustworthiness can be verified rather than simply taken on blind faith.
Strategic Choices Masquerading as Constraints
Much CA pushback treats organizational and engineering design decisions as inevitable operational constraints. When CAs complain about “compliance staff being distant from engineering” or “inevitable human errors in 100+ page documents,” they’re presenting strategic choices as unchangeable facts.
CAs choose to separate compliance from operations rather than integrate them. They choose to treat CPS creation as documentation rather than operational specification. They choose to bolt compliance on after the fact rather than build it into core systems. When you choose to join root programs to be trusted by billions of people, you choose those responsibilities.
The CAs that consistently avoid compliance problems made different choices from the beginning: they integrated policy into operations, invested in automation, and designed systems where compliance violations are structurally difficult. These aren’t companies with magical resources; they’re companies that prioritized operational integrity.
The Technology-Governance Gap
The “automation is too hard” argument collapses against actual WebPKI achievements:
| Challenge | Current State | Feasibility Evidence | CA Resistance |
| --- | --- | --- | --- |
| Domain Validation | Fully automated via ACME | 65+% of web certificates | ✅ Widely adopted |
| Certificate Linting | Real-time validation at issuance | CT logs, zlint tooling | ✅ Industry standard |
| Transparency Logging | All certificates publicly logged | Certificate Transparency | ✅ Mandatory compliance |
| Renewal Management | Automated with ARI | Let’s Encrypt, others | ✅ Proven at scale |
| CPS-to-Issuance Alignment | Manual, error-prone | Machine-readable policies possible | ❌ “Too complex” |
| Policy Compliance Checking | After-the-fact incident reports | Automated validation possible | ❌ “Inevitable human error” |
The pattern is unmistakable: automation succeeds when mandated, fails when optional. With Certificate Transparency providing complete visibility, automated validation systems proven at scale, and AI poised to transform compliance verification across industries, operational CPSs represent evolution, not revolution.
The counterargument is that these “minor” incidents are not smoke, as in “where there is smoke, there is fire.” Yet past distrust events show that distrust always follows a pattern of mistakes, often snowballing, while the most mature CA programs only occasionally have issues and deal with them well when they do.
Trust Is Not an Entitlement
The question “why would CAs voluntarily adopt expensive automation?” reveals a fundamental misunderstanding. CAs are not entitled to being trusted by the world.
Trust store inclusion is a privilege that comes with responsibilities. If a CA cannot or will not invest in operational practices necessary to serve billions of relying parties reliably, they should not hold that privilege.
The economic argument is backwards:
Current framing: “Automation is expensive, so CAs shouldn’t be required to implement it”
Correct framing: “If you can’t afford to operate securely, accurately, and reliably, you can’t afford to be a public CA”
Consider the alternatives: public utilities must maintain infrastructure standards regardless of cost, financial institutions must invest in security regardless of expense, aviation companies must meet safety standards regardless of operational burden. The WebPKI serves more people than any of these industries, yet we’re supposed to accept that operational excellence is optional because it’s “expensive”?
CAs with consistent compliance problems impose costs on everyone else: subscribers face revocation disruption, relying parties face security risks, and root programs waste resources on incident management. The “expensive automation” saves the ecosystem far more than it costs individual CAs.
When Accountability Actually Works
The example of Let’s Encrypt changing their CPS from “90 days” to “less than 100 days” after a compliance issue is often cited as evidence that strict enforcement creates problems. This completely misses the point.
The “system” found a real compliance issue: inadequate testing between policy and implementation. That’s exactly what publishing specific commitments accomplishes: making gaps visible so they can be fixed. The accountability mechanism worked perfectly. Let’s Encrypt learned they needed better testing to ensure policy-implementation alignment.
This incident also revealed that we need infrastructure like ACME Renewal Information (ARI) so the ecosystem can manage obligations without fire drills. The right response isn’t vaguer CPSs to hide discrepancies, but better testing and ecosystem coordination, so that a CA can reliably commit to 90 days and carry out revocations when mistakes happen.
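One way a policy-implementation gap of this kind can arise is at the boundaries of the validity period. RFC 5280 defines validity as running from notBefore through notAfter, inclusive, so a certificate whose notAfter is exactly notBefore plus 90 days is valid for 90 days and one second. The sketch below illustrates that arithmetic only; it is not a reconstruction of the incident, and the function name and timestamps are hypothetical.

```python
from datetime import datetime, timedelta, timezone

def validity_seconds(not_before: datetime, not_after: datetime) -> int:
    """Length of the validity period in seconds, counting both endpoints
    (RFC 5280 treats notBefore through notAfter as inclusive)."""
    return int((not_after - not_before).total_seconds()) + 1

# Hypothetical timestamps chosen for illustration.
not_before = datetime(2025, 1, 1, 0, 0, 0, tzinfo=timezone.utc)
naive_not_after = not_before + timedelta(days=90)
strict_not_after = not_before + timedelta(days=90) - timedelta(seconds=1)

ninety_days = 90 * 24 * 60 * 60
print(validity_seconds(not_before, naive_not_after) - ninety_days)   # 1 (one second over)
print(validity_seconds(not_before, strict_not_after) - ninety_days)  # 0 (exactly 90 days)
```

A test like this, run against the issuance pipeline's real output, is the kind of check that turns a written "90 days" into something verifiable.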
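To make the ARI idea concrete: the CA publishes a suggested renewal window for each certificate, and the client picks a random moment inside it. If the CA needs to revoke, it can move the window earlier and clients renew ahead of the revocation. The following is a minimal client-side sketch, assuming a response body with a `suggestedWindow` object containing `start` and `end` timestamps; the helper names and the sample response are hypothetical.

```python
import random
from datetime import datetime, timedelta

def parse_rfc3339(value: str) -> datetime:
    # Older Python versions reject a trailing "Z" in fromisoformat(), so normalize it.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))

def pick_renewal_time(renewal_info: dict, rng: random.Random) -> datetime:
    """Pick a uniformly random instant inside the server's suggested window."""
    window = renewal_info["suggestedWindow"]
    start = parse_rfc3339(window["start"])
    end = parse_rfc3339(window["end"])
    offset = rng.uniform(0, (end - start).total_seconds())
    return start + timedelta(seconds=offset)

def renew_immediately(renewal_time: datetime, now: datetime) -> bool:
    """If the chosen time has already passed, the client should renew right away."""
    return renewal_time <= now

# Hypothetical response body for illustration.
renewal_info = {
    "suggestedWindow": {
        "start": "2025-01-02T04:00:00Z",
        "end": "2025-01-03T04:00:00Z",
    }
}
when = pick_renewal_time(renewal_info, random.Random(0))
print(when.isoformat())
```

Randomizing within the window spreads client load, which matters when a CA has to replace a large population of certificates on short notice.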
The Solution: Operational CPSs
Instead of weakening accountability, we need CPSs as the living center of CA operations: machine-readable on one side to directly govern issuance systems, human-readable on the other for auditors and relying parties. In the age of AI, tools like large language models and automated validation can make this dual-purpose CPS tractable, aligning policy with execution.
This means CPSs written by people who understand actual issuance flows, updated in lock-step with operational changes, tied directly to automated linting, maintained in public version control, and tested continuously to verify documentation matches reality.
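A minimal sketch of what “machine-readable on one side” could look like: the profile is expressed as data, and a pre-issuance check refuses anything outside it. The schema, field names, and values here are hypothetical and not drawn from any real CPS or linter; a production system would inspect the actual to-be-signed certificate with a tool such as zlint.

```python
from dataclasses import dataclass

# Hypothetical machine-readable excerpt of a CPS certificate profile.
CPS_PROFILE = {
    "max_validity_days": 90,
    "key_usage": {"digitalSignature"},
    "extended_key_usage": {"serverAuth"},
}

@dataclass(frozen=True)
class IssuanceRequest:
    validity_days: int
    key_usage: frozenset
    extended_key_usage: frozenset

def check_against_profile(request: IssuanceRequest, profile: dict) -> list:
    """Return human-readable findings; an empty list means the request conforms."""
    findings = []
    if request.validity_days > profile["max_validity_days"]:
        findings.append(
            f"validity {request.validity_days}d exceeds documented "
            f"maximum {profile['max_validity_days']}d"
        )
    for field in ("key_usage", "extended_key_usage"):
        requested = set(getattr(request, field))
        documented = set(profile[field])
        if requested != documented:
            findings.append(
                f"{field} {sorted(requested)} does not match "
                f"documented {sorted(documented)}"
            )
    return findings

# A request that adds clientAuth, which the documented profile does not allow.
request = IssuanceRequest(
    validity_days=90,
    key_usage=frozenset({"digitalSignature"}),
    extended_key_usage=frozenset({"serverAuth", "clientAuth"}),
)
findings = check_against_profile(request, CPS_PROFILE)
for finding in findings:
    print(finding)
```

Because the same data structure can be rendered into the human-readable CPS, the document and the issuance gate cannot silently diverge.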
Success criteria are straightforward:
Scope clarity: Which root certificates does this cover?
Profile fidelity: Could someone recreate certificates matching actual issuance?
Validation transparency: Can procedures be understood without insider knowledge?
Most CPSs fail these basic tests. The few that pass prove it’s entirely achievable when CAs prioritize operational integrity over administrative convenience.
Systemic Reform Requirements
Fixing moral hazard requires accountability mechanisms aligned with actual capabilities. Root programs typically operate with 1-2 people overseeing ~60 organizations issuing 450,000+ certificates per hour. These are structural challenges that automation must address.
Root Programs: Clearer requirements for CPS documents, automated evaluation tools, clear standards. Scalable infrastructure requiring scope clarity, profile fidelity, and validation transparency.
Standards Bodies: Voluntary guidelines, weak enforcement. Mandatory automation requirements. Updated requirements to ensure adoption of automation that helps ensure commitments are met.
Audit System: Annual snapshots, limited scope. Continuous monitoring, real-time validation. Integration with operational systems.
Root programs that tolerate retroactive CPS fixes inadvertently encourage corner-cutting on prevention systems. Given resource constraints, automated evaluation tools and clear standards become essential for consistent enforcement.
The Stakes Demand Action
Eight billion people depend on this system. We cannot allow fewer than 60 CA-owning organizations to keep treating public commitments as optional paperwork instead of operational specifications.
When certificate failures occur, people lose life savings, have private communications exposed, lose jobs when business systems fail, or face physical danger when critical infrastructure is compromised. DigiNotar’s 2011 collapse showed how single CA failures can compromise national digital infrastructure. CAs make decisions that enable these risks; relying parties bear the consequences.
The choice is stark:
Continue excuse-making and accountability avoidance while billions absorb security consequences
Or demand that CAs and root programs invest in systems making trust verifiable
The WebPKI’s moral hazard problem won’t solve itself. Those with power to fix it have too little incentive to act; those who suffer consequences have too little voice to demand change.
The WebPKI stands at a turning point. Root programs, the guardians of web privacy, are under strain from the EU’s eIDAS 2.0 pushing questionable CAs, tech layoffs thinning their teams, and the U.S. DOJ’s plan to break up Chrome, a cornerstone of web security. With eight billion people depending on this system, weak CAs could fuel phishing scams, data breaches, or outages that upend lives, as DigiNotar’s 2011 downfall showed. That failure taught us trust must be earned through action. Automation, agility, and transparency can deliver a WebPKI where accountability is built in. Let’s urge CAs, root programs, and the security community to adopt machine-readable CPSs by 2026, ensuring trust is ironclad. The time to act is now. Together, we can secure the web for our children and our grandchildren.
I’ve been in the PKI space for a long time, and I’ll be honest, digging through Certificate Policies (CPs) and Certification Practice Statements (CPSs) is far from my favorite task. But as tedious as they can be, these documents serve real, high-value purposes. When you approach them thoughtfully, the time you invest is anything but wasted.
What a CPS Is For
Beyond satisfying checkbox compliance, a solid CPS should:
Build trust by showing relying parties how the CA actually operates.
Guide subscribers by spelling out exactly what is required to obtain a certificate.
Clarify formats by describing certificate profiles, CRLs, and OCSP responses so relying parties know what to expect.
Enable oversight by giving auditors, root store programs, and researchers a baseline to compare against real-world issuance.
If a CPS fails at any of these, it fails in its primary mission.
Know Your Audience
A CPS is not just for auditors. It must serve subscribers who need to understand their obligations, relying parties weighing whether to trust a certificate, and developers, security researchers, and root store operators evaluating compliance and interoperability.
The best documents speak to all of these readers in clear, plain language without burying key points under mountains of boilerplate.
A useful parallel is privacy policies or terms of service documents. Some are written like dense legal contracts, full of cross-references and jargon. Others aim for informed consent and use plain language to help readers understand what they are agreeing to. CPs and CPSs should follow that second model.
Good Examples Do Exist
If you’re looking for CPS documents that get the basics right, Google Trust Services and Fastly are two strong models.
There are many ways to evaluate a CPS, but given the goals of these documents, fundamental tests of “good” would certainly include:
Scope clarity: Is it obvious which root certificates the CPS covers?
Profile fidelity: Could a reader recreate reference certificates that match what the CA actually issues?
Most CPSs fail even these basic checks. Google and Fastly pass, and their structure makes independent validation relatively straightforward. Their documentation is not just accurate; it is structured to support validation, monitoring, and trust.
Where Reality Falls Short
Unfortunately, most CPSs today don’t meet even baseline expectations. Many lack clear scope. Many don’t describe what the issued certificates will look like in a way that can be independently verified. Some fail to align with basics like RFC 3647, the framework they are supposed to follow.
Worse still, many CPS documents fail to discuss how or if they meet requirements they claim compliance with. That includes not just root program expectations, but also standards like:
Server Certificate Baseline Requirements
S/MIME Baseline Requirements
Network and Certificate System Security Requirements
These documents may not need to replicate every technical detail, but they should objectively demonstrate awareness of and alignment with these core expectations. Without that, it’s difficult to expect trust from relying parties, browsers, or anyone else depending on the CA’s integrity.
Even more concerning, many CPS documents don’t fully reflect the requirements of the root programs that grant them inclusion.
These failures are not theoretical. They have led to real-world consequences.
Take Bug 1962829, for example, a recent incident involving Microsoft PKI Services. “A typo” introduced during a CPS revision misstated the presence of the keyEncipherment bit in some certificates. The error made it through publication and multiple reviews, even as millions of certificates were issued under a document that contradicted actual practice.
The result? Distrust risks, revocation discussions, and a prolonged, public investigation.
The Microsoft incident reveals a deeper problem: CAs that lack proper automation between their documented policies and actual certificate issuance. This wasn’t just a documentation error; it exposed the absence of systems that would automatically catch such discrepancies before millions of certificates were issued under incorrect policies.
This isn’t an isolated case. CP and CPS “drift” from actual practices has played a role in many other compliance failures and trust decisions. As this post on CA distrust discusses, misissuance due to a CP or CPS not matching observable reality is a common factor.
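The kind of check that would catch such a discrepancy can be sketched in a few lines: take the key usage bits the CPS says a profile carries, sample what was actually issued, and report any difference. The data below is invented for illustration and does not describe the certificates in the Microsoft incident; in practice the observed values would come from parsing issued certificates or Certificate Transparency entries.

```python
from collections import Counter

def key_usage_drift(documented: frozenset, observed: list) -> dict:
    """Count observed certificates whose keyUsage differs from the CPS.

    Keys are (missing_bits, extra_bits) tuples; values are certificate counts.
    """
    mismatches = Counter()
    for bits in observed:
        if bits != documented:
            missing = tuple(sorted(documented - bits))
            extra = tuple(sorted(bits - documented))
            mismatches[(missing, extra)] += 1
    return dict(mismatches)

# Hypothetical CPS statement and a hypothetical sample of issued certificates.
documented = frozenset({"digitalSignature"})
observed = [
    frozenset({"digitalSignature", "keyEncipherment"}),
    frozenset({"digitalSignature", "keyEncipherment"}),
    frozenset({"digitalSignature", "keyEncipherment"}),
    frozenset({"digitalSignature"}),
]

report = key_usage_drift(documented, observed)
print(report)  # {((), ('keyEncipherment',)): 3}
```

Run continuously against a sample of issuance, a non-empty report becomes an alert on the first mismatched certificate rather than a finding after millions.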
Accuracy Is Non-Negotiable
Some voices in the ecosystem now suggest that when a CPS is discovered to be wrong, the answer is simply to patch the document retroactively and move on. This confirms what I have said for ages: too many CAs want the easy way out, patching documents after problems surface rather than investing in the automation and processes needed to prevent mismatches in the first place.
That approach guts the very purpose of a CPS. Making it easier for CAs to violate their commitments creates perverse incentives to avoid investing in proper compliance infrastructure.
Accountability disappears if a CA can quietly “fix” its promises after issuance. Audits lose meaning because the baseline keeps shifting. Relying-party trust erodes the moment documentation no longer reflects observable reality.
A CPS must be written by people who understand the CA’s actual issuance flow. It must be updated in lock-step with code and operational changes. And it must be amended before new types of certificates are issued. Anything less turns it into useless marketing fluff.
Make the Document Earn Its Keep
Treat the CPS as a living contract:
Write it in plain language that every audience can parse.
Tie it directly to automated linting so profile deviations are caught before issuance. Good automation makes policy violations nearly impossible; without it, even simple typos can lead to massive compliance failures.
Publish all historical versions so the version details in the document are obvious and auditable. Better yet, maintain CPS documents in a public git repository with markdown versions that make change history transparent and machine-readable.
Run every operational change through a policy-impact checklist before it reaches production.
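Keeping the CPS as markdown in version control makes changes reviewable the same way code is. As a rough sketch, Python's standard `difflib` can surface exactly which lines changed between two revisions; the CPS excerpts and file labels below are made up for illustration.

```python
import difflib

# Hypothetical excerpts from two revisions of a markdown CPS.
old_version = """## 7.1 Certificate Profile
keyUsage: digitalSignature
validity: 90 days
""".splitlines(keepends=True)

new_version = """## 7.1 Certificate Profile
keyUsage: digitalSignature, keyEncipherment
validity: 90 days
""".splitlines(keepends=True)

diff = list(
    difflib.unified_diff(
        old_version, new_version, fromfile="cps-v1.md", tofile="cps-v2.md"
    )
)

# Keep only added/removed content lines, skipping the "---"/"+++" file headers.
changed = [
    line
    for line in diff
    if line[0] in "+-" and not line.startswith(("+++", "---"))
]
print("".join(changed), end="")
```

A reviewer, or a CI job, can then require that any change touching a profile line is accompanied by the matching change in issuance configuration, which is the policy-impact checklist in automated form.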
If you expect others to trust your certificates, your public documentation must prove you deserve that trust. Done right, a CPS is one of the strongest signals of a CA’s competence and professionalism. Done wrong, or patched after the fact, it is worse than useless.
Root programs need to spend time documenting the minimum criteria that these documents must meet. Clear, measurable standards would give CAs concrete targets and make enforcement consistent across the ecosystem. Root programs that tolerate retroactive fixes inadvertently encourage CAs to cut corners on the systems and processes that would prevent these problems entirely.
CAs, meanwhile, need to ask themselves hard questions: Can someone unfamiliar with internal operations use your CPS to accomplish the goals outlined in this post? Can they understand your certificate profiles, validation procedures, and operational commitments without insider knowledge?
More importantly, CAs must design their processes around ensuring these documents are always accurate and up to date. This means implementing testing to verify that documentation actually matches reality, not just hoping it does.
The Bottom Line
CPS documents matter far more than most people think. They are not busywork. They are the public guarantee that a CA knows what it is doing and is willing to stand behind it, in advance, in writing, and in full view of the ecosystem.