Category Archives: Security

How Microsoft Code Signing Became Part of a Trust Subversion Toolchain

Code signing was supposed to tell you who published a piece of software and ultimately decide if you can trust the software and install it.. For nearly three decades, cryptographic signatures have bound a binary to a publisher’s identity, guaranteeing it hasn’t been tampered with since signing. But on Windows, that system is now broken in ways that would make its original designers cringe.

But attackers have found ways to completely subvert this promise without breaking a single cryptographic primitive. They can now create an unlimited number of different malicious binaries that all carry the exact same “trusted” signature, or careless publishers operating signing oracles that enable others to turn their software into a bootloader for malware. The result is a system where valid signatures from trusted companies can no longer tell you anything meaningful about what the software will actually do.

Attackers don’t need to steal keys or compromise Certificate Authorities. They use the legitimate vendor software and publicly trusted code signing certificates, perverting the entire purpose of publisher-identity-based code signing.

Microsoft’s Long-Standing Awareness

Microsoft has known about the issue of maleability for at least a decade. In 2013, they patched CVE-2013-3900], where attackers could modify signed Windows executables, adding malicious code in “unverified portions” without invalidating the Authenticode signature. WinVerifyTrust improperly validated these files, allowing one “trusted” signature to represent completely different, malicious behavior.

This revealed a deeper architectural flaw, signed binaries could be altered by unsigned data. Microsoft faced a classic platform dilemma – the kind that every major platform holder eventually confronts. Fixing this comprehensively risked breaking legacy software critical to their vast ecosystem, potentially disrupting thousands of applications that businesses depended on daily. The engineering tradeoffs were genuinely difficult: comprehensive security improvements versus maintaining compatibility for millions of users and enterprise customers who couldn’t easily update or replace critical software.

They made the fix optional, prioritizing ecosystem compatibility over security hardening. This choice might have been understandable from a platform perspective in 2013, when the threat landscape was simpler and the scale of potential abuse wasn’t yet clear. But it becomes increasingly indefensible as attacks evolved and the architectural weaknesses became a systematic attack vector rather than an isolated vulnerability.

In 2022, Microsoft republished the advisory, confirming they still won’t enforce stricter verification by default, while today’s issues differ, they are part of a similar class of vulnerabilities attackers now exploit systematically. The “trusted-but-mutable” flaw is now starting to permeate the Windows code signing ecosystem. Attackers use legitimate, signed applications as rootkit-like trust proxies, inheriting vendors’ reputation and bypass capabilities to deliver arbitrary malicious payloads.

Two incidents show we’re not dealing with isolated bugs but systematic assaults on Microsoft’s code signing’s core assumptions.

ConnectWise: When Legitimate Software Adopts Malware Design Patterns

ConnectWise didn’t stumble into a vulnerability. They deliberately engineered their software using design patterns from the malware playbook. Their “attribute stuffing” technique embeds unsigned configuration data in the unauthenticated_attributes field of the PKCS#7 (CMS) envelope, a tactic malware authors use to conceal payloads in signed binaries.

In PKCS#7, the SignedData structure includes a signed digest (covering the binary and metadata) and optional unauthenticated_attributes, which lie outside the digest and can be modified post-signing without invalidating the signature. ConnectWise’s ScreenConnect installer misuses the Microsoft-reserved OID for Individual Code Signing ([1.3.6.1.4.1.311].4.1.1) in this field to store unsigned configuration data, such as server endpoints that act as the command control server of their client. This OID, meant for specific code signing purposes, is exploited to embed attacker-controlled configs, allowing the same signed binary to point to different servers without altering the trusted signature.

The ConnectWise ScreenConnect incident emerged when River Financial’s security team found attackers creating a fake website, distributing malware as a “River desktop app.” It was a trust inheritance fraud, a legitimately signed ScreenConnect client auto-connecting to an attacker-controlled server. 

The binary carried a valid signature signed by:

Subject: /C=US/ST=Florida/L=Tampa/O=Connectwise, LLC/CN=Connectwise, LLC 
Issuer: /C=US/O=DigiCert, Inc./CN=DigiCert Trusted G4 Code Signing RSA4096 SHA384 2021 CA1
Serial Number: 0B9360051BCCF66642998998D5BA97CE
Valid From: Aug 17 00:00:00 2022 GMT 
Valid Until: Aug 15 23:59:59 2025 GMT

Windows trusts this as legitimate ConnectWise software, no SmartScreen warnings, no UAC prompts, silent installation, and immediate remote control. Attackers generate a fresh installer via a ConnectWise trial account or simply found an existing package and manually edited the unauthenticated_attributes, extracting a benign signature, grafting a malicious configuration blob (e.g., attacker C2 server), inserting the modified signature, and creating a “trusted” binary. Each variant shares the certificate’s reputation, bypassing Windows security.

Why does Windows trust binaries with oversized, unusual unauthenticated_attributes? Legitimate signatures need minimal metadata, yet Windows ignores red flags like large attribute sections, treating them as fully trusted. ConnectWise’s choice to embed mutable configs mirrors malware techniques, creating an infinite malware factory where one signed object spawns unlimited trusted variants.

Similarly, ConnectWise’s deliberate use of PKCS#7 unauthenticated attributes for ScreenConnect configurations, like server endpoints, bypasses code signing’s security, allowing post-signing changes that mirror malware tactics hiding payloads in signed binaries. Likely prioritizing cost-saving over security, this choice externalizes abuse costs to users, enabling phishing campaigns. It’s infuriating for weaponizing signature flexibility warned about for decades, normalizing flaws that demand urgent security responses. Solutions exist to fix this.

The Defense Dilemma

Trust inheritance attacks leave security teams in genuinely impossible positions – positions that highlight the fundamental flaws in our current trust model. Defenders face a no-win scenario where every countermeasure either fails technically or creates operational chaos.

Blocking file hashes fails because attackers generate infinite variants with different hashes but the same trusted signature – each new configuration changes the binary’s hash while preserving the signature’s validity. This isn’t a limitation of security tools; it’s the intended behavior of code signing, where the same certificate can sign multiple different binaries.

Blocking the certificate seems like the obvious solution until you realize it disrupts legitimate software, causing operational chaos for organizations relying on the vendor’s products. For example, consider how are they to know what else was signed by that certificate? Doing so is effectively a self-inflicted denial-of-service that can shut down critical business operations. Security teams face the impossible choice between allowing potential malware or breaking their own infrastructure.

Behavioral detection comes too late in the attack chain. By the time suspicious behavior triggers alerts, attackers have already gained remote access, potentially disabled monitoring, installed additional malware, or begun data exfiltration. The initial trust inheritance gives attackers a crucial window of legitimacy.

These attacks operate entirely within the bounds of “legitimate” signed software, invisible to signature-based controls that defenders have spent years tuning and deploying. Traditional security controls assume that valid signatures from trusted publishers indicate safe software – an assumption these attacks systematically exploit. Cem Paya’s detailed analysis, part of River Financial’s investigation, provides a proof-of-concept for attribute grafting, showing how trivial it is to create trusted malicious binaries.

ConnectWise and Atera resemble modern Back Orifice, which debuted at DEF CON in August 1998 to demonstrate security flaws in Windows 9x. The evolution is striking: Back Orifice emerged two years after Authenticode’s 1996 introduction, specifically to expose Windows security weaknesses, requiring stealth and evasion to avoid detection. Unlike Back Orifice, which had to hide from the code signing protections Microsoft had established, these modern tools don’t evade those protections – they weaponize them, inheriting trust from valid signatures while delivering the same remote control capabilities without warnings.

Atera: A Trusted Malware Factory

Atera provides a legitimate remote monitoring and management (RMM) platform similar to ConnectWise ScreenConnect, providing IT administrators with remote access capabilities for managing client systems. Like other RMM solutions, Atera distributes signed client installers that establish persistent connections to their management servers. 

They also operate what effectively amounts to a public malware signing service. Anyone with an email can register for a free trial and receive customized, signed, timestamped installers. Atera’s infrastructure embeds attacker-supplied identifiers into the MSI’s Property table, then signs the package with their legitimate certificate.

This breaks code signing’s promise of publisher accountability. Windows sees “Atera Networks Ltd,” associates the reputation of the code based on the reputation of the authentic package, but can’t distinguish whether the binary came from Atera’s legitimate operations or an anonymous attacker who signed up minutes ago. The signature’s identity becomes meaningless when it could represent anyone.

In a phishing campaign targeting River Financial’s customers, Atera’s software posed as a “River desktop app,” with attacker configs embedded in a signed binary. 

The binary carried this valid signature, signed by:

Subject: CN=Atera Networks Ltd,O=Atera Networks Ltd,L=Tel Aviv-Yafo,C=IL,serialNumber=513409631,businessCategory=Private Organization,jurisdictionC=IL 
Issuer: CN=DigiCert Trusted G4 Code Signing RSA4096 SHA384 2021 CA1,O=DigiCert, Inc.,C=US Serial: 09D3CBF84332886FF689B04BAF7F768C 
notBefore: Jan 23 00:00:00 2025 GMT 
notAfter: Jan 22 23:59:59 2026 GMT

Atera provides a cloud-based remote monitoring and management (RMM) platform, unlike ScreenConnect, which supports both on-premises and cloud deployments with custom server endpoints. Atera’s agents connect only to Atera’s servers, but attackers abuse its free trial to generate signed installers tied to their accounts via embedded identifiers (like email or account ID) in the MSI Property table. This allows remote control through Atera’s dashboard, turning it into a proxy for malicious payloads. Windows trusts the “Atera Networks Ltd.” signature but cannot distinguish legitimate from attacker-generated binaries. Atera’s lack of transparency, with no public list of signed binaries or auditable repository, hides abuse, leaving defenders fighting individual attacks while systemic issues persist.

A Personal Reckoning

I’ve been fighting this fight for over two decades. Around 2001, as a Product Manager at Microsoft, overseeing a wide range of security and platform features, I inherited Authenticode among many responsibilities. Its flaws were glaring, malleable PE formats, weak ASN.1 parsing, and signature formats vulnerable to manipulation.

We fixed some issues – hardened parsing, patched PE malleability – but deeper architectural changes faced enormous resistance. Proposals for stricter signature validation or new formats to eliminate mutable fields were blocked by the engineering realities of platform management. The tension between security ideals and practical platform constraints was constant and genuinely difficult to navigate.

The mantra was “good enough,” but this wasn’t just engineering laziness. Authenticode worked for 2001’s simpler threat landscape, where attacks were primarily about bypassing security rather than subverting trust itself. The flexibility we preserved was seen as a necessary feature for ecosystem compatibility – allowing for signature formats that could accommodate different types of metadata and varying implementation approaches across the industry.

The engineering tradeoffs were real, every architectural improvement risked breaking existing software, disrupting the development tools and processes that thousands of ISVs depended on, and potentially fragmenting the ecosystem. The business pressures were equally real: maintaining compatibility was essential for Windows’ continued dominance and Microsoft’s relationships with enterprise customers who couldn’t easily migrate critical applications.

It was never good enough for the long term. We knew it then, and we certainly know it now. The flexibility we preserved, designed for a simpler era, became systematic vulnerabilities as threats evolved from individual attackers to sophisticated operations exploiting trust infrastructure itself. Every time we proposed fundamental fixes, legitimate compatibility concerns and resource constraints won out over theoretical future risks that seemed manageable at the time.

This is why I dove into Sigstore, Binary Transparency, and various other software supply chain security efforts. These projects embody what we couldn’t fund in 2001, transparent, verifiable signing infrastructure that doesn’t rely on fragile trust-based compromises. As I wrote in How to keep bad actors out in open ecosystems, our digital identity models fail to provide persistent, verifiable trust that can scale with modern threat landscapes.

The Common Thread

ConnectWise and Atera expose a core flaw, code signing relies on trust and promises, not verifiable proof. The CA/Browser Forum’s 2023 mandate requires FIPS 140-2 Level 2 hardware key storage, raising the bar against key theft and casual compromise. But it’s irrelevant for addressing the fundamental problem: binaries designed for mutable, unsigned input or vendors running public signing oracles.

Figure 1: Evolution of Code Signing Hardware Requirements (2016-2024)

The mandate addresses yesterday’s threat model – key compromise – while today’s attacks work entirely within the intended system design. Compliance often depends on weak procedural attestations where subscriber employees sign letters swearing keys are on HSMs, rather than cryptographic proof of hardware protection. The requirement doesn’t address software engineered to bypass code signing’s guarantees, leaving systematic trust subversion untouched.

True cryptographic attestation, where hardware mathematically proves key protection, is viable today. Our work on Peculiar Ventures’ attestation library supports multiple formats, enabling programmatic verification without relying on trust or procedural checks. The challenge isn’t technical – it’s accessing diverse hardware for testing and building industry adoption, but the foundational technology exists and works.

The Path Forward

We know how to address this. A supply chain security renaissance is underway, tackling decades of accumulated technical debt and architectural compromise. Cryptographic attestation, which I’ve spent years developing, provides mathematical proof of key protection that can be verified programmatically by any party. For immediate risk reduction, the industry should move toward dynamic, short-lived credentials that aren’t reused across projects, limiting the blast radius when compromise or abuse occurs.

The industry must implement these fundamental changes:

  • Hardware-rooted key protection with verifiable attestation. The CA/Browser Forum mandates hardware key storage, but enforcement relies heavily on subscriber self-attestation rather than cryptographic proof. Requirements should be strengthened to mandate cryptographic attestations proving keys reside in FIPS 140-2/3 or Common Criteria certified modules. When hardware attestation isn’t available, key generation should be observed and confirmed by trusted third parties (such as CA partners with fiduciary relationships) rather than relying on subscriber claims.
  • Explicit prohibition of mutable shells and misaligned publisher identity. Signing generic stubs whose runtime behavior is dictated by unsigned configuration already violates Baseline Requirements §9.6.3 and §1.6.1, but this isn’t consistently recognized as willful signing of malware because the stub itself appears benign. The BRs should explicitly forbid mutable-shell installers and signing oracles that allow subscribers to bypass code signing’s security guarantees. A signed binary must faithfully represent its actual runtime behavior. Customized or reseller-specific builds should be signed by the entity that controls that behavior, not by a vendor signing a generic stub.
  • Subscriber accountability and disclosure of abusive practices. When a CA becomes aware that a subscriber is distributing binaries where the trusted signature is decoupled from actual behavior, this should be treated as a BR violation requiring immediate action. CAs should publish incident disclosures, suspend or revoke certificates per §9.6.3, and share subscriber histories to prevent CA shopping after revocation. This transparency is essential for ecosystem-wide learning and deterrence.
  • Code Signing Certificate Transparency. All CAs issuing code signing certificates should be required to publish both newly issued and historical certificates to dedicated CT logs. Initially, these could be operated by the issuing CAs themselves, since ecosystem building takes time and coordination. Combined with the existing list of code signing CAs and log lookup systems (like CCADB.org]), this would provide ecosystem-wide visibility into certificate issuance, enable faster incident response, and support independent monitoring for misissuance and abuse patterns.
  • Explicit Subscriber Agreement obligations and blast radius management. Subscriber Agreements should clearly prohibit operating public signing services or designing software that bypasses code signing security properties such as mutable shells or unsigned configuration. Certificate issuance flows should require subscribers to explicitly acknowledge these obligations at the time of certificate request. To reduce the blast radius of revocation, subscribers should be encouraged or required to use unique keys or certificates per product or product family, ensuring that a single compromised or misused certificate doesn’t invalidate unrelated software.
  • Controls for automated or cloud signing systems. Subscribers using automated or cloud-based signing services should implement comprehensive use-authorization controls, including policy checks on what enters the signing pipeline, approval workflows for signing requests, and auditable logs of all signing activity. Without these controls, automated signing pipelines become essentially malware factories with legitimate certificates. Implementation requires careful balance between automation efficiency and security oversight, but this is a solved problem in other high-security domains.
  • Audit logging and evidence retention. Subscribers using automated and cloud signing services should maintain detailed logs of approval records for each signing request, cryptographic hashes of submitted inputs and signed outputs, and approval decision trails. These logs must be retained for a defined period (such as two years or more) and made available to the CA or authorized auditors upon request. This ensures complete traceability and accountability, preventing opaque signing systems from being abused as anonymous malware distribution platforms.

Microsoft must take immediate action on multiple fronts. In addition to championing the above industry changes, they should automatically distrust executables if their Authenticode signature exceeds rational size thresholds, reducing the attack surface of oversized signature blocks as mutation vectors. They should also invest seriously in Binary Transparency adoption, publishing Authenticode signed binaries to tamper-evident transparency logs as is done in Sigstore, Golang module transparency, and Android Firmware Transparency. Their SCITT-based work for confidential computing would be a reasonable approach for them to extend to the rest of their code signing infrastructure. This would provide a tamper-evident ledger of every executable Windows trusts, enabling defenders to trace and block malicious payloads quickly and systematically.

Until these controls become standard practice, Authenticode cannot reliably distinguish benign signed software from weaponized installers designed for trust subversion.

Breaking the Trust Contamination Infrastructure

These code-signing attacks mirror traditional rootkits in their fundamental approach: both subvert trust mechanisms rather than bypassing them entirely. A kernel rootkit doesn’t break the OS security model – it convinces the OS that malicious code is legitimate system software. Similarly, these “trusted wrapper” and “signing oracle” attacks don’t break code signing cryptography – they convince Windows that malware is legitimate software from trusted publishers.

The crucial difference is that while rootkits require sophisticated exploitation techniques and deep system knowledge, these trust inheritance attacks exploit the system’s intended design patterns, making them accessible to a much broader range of attackers and much harder to defend against using traditional security controls.

ConnectWise normalized malware architecture in legitimate enterprise software. Atera built an industrial-scale malware factory that operates in plain sight. Microsoft’s platform dutifully executes the result with full system trust, treating sophisticated trust subversion attacks as routine software installations.

This isn’t about isolated vulnerabilities that can be patched with point fixes. We’re facing a systematic trust contamination infrastructure that transforms the code signing ecosystem into an adversarial platform where legitimate trust mechanisms become attack vectors. Until we address the architectural flaws that enable this pattern systematically, defenders will remain stuck playing an unwinnable game of certificate whack-a-mole against an endless assembly line of trusted malware.

The technology to fix this exists today. Modern supply chain security projects demonstrate that transparent, verifiable trust infrastructure is not only possible but practical and deployable.

The only missing ingredient is the industry-wide will to apply these solutions and the recognition that “good enough” security infrastructure never was – and in today’s threat landscape, the costs of inaction far exceed the disruption of fundamental architectural improvements.

P.S. Thanks to Cem Paya, and Matt Ludwig from River Financial for the great research work they did on both of these incidents.

From Persistent to Ephemeral: Why AI Agents Need Fresh Identity for Every Mission

My wife and I went on a date night the other day and saw a movie, in the previews, I saw they’re making a new Tron. It got me thinking about one of my favorite analogies, we recognized early that browsers are agents of the user, and in the movie Tron, he was literally “the program that fought for the users.”

Just like Tron carried his identity disc into “the grid” to accomplish missions for users, AI agents are digital proxies operating with delegated user authority in systems the they access. And just like programs in Tron needed the I/O Tower to authorize their entry into “the grid”, AI agents need an orchestrator to validate their legitimacy, manage identity discs for each mission, and control their use for the agents and govern their access to external systems.

The problem is, we’re deploying these agents without proper identity infrastructure. It’s like sending programs into “the grid” without identity discs, or worse giving them the keys to the kingdom just so they can do the dishes.

AI Agents Are Using Broken Security

We’ve made remarkable progress securing users, MFA has significantly reduced the effectiveness of credential abuse-based attacks, and passwordless authentication has made phishing nearly impossible. We’ve also started applying these lessons to machines and workloads via efforts like SPIFFE and Zero trust initiatives and organizations moving away from static secrets and bearer tokens every day.

But AI agents introduce entirely new challenges that existing solutions weren’t designed for. Every day, AI agents operate across enterprise infrastructure, crossing security domains, accessing APIs, generating documents, making decisions for users, and doing all of this with far more access than they need.

When you give an autonomous AI agent access to your infrastructure with the goal of “improve system performance,” you can’t predict whether it will optimize efficiency or find creative shortcuts that break other systems, like dropping your database altogether. Unlike traditional workloads that execute predictable code, AI agents are accumulators with emergent behaviors that evolve during execution, accumulate context across interactions, and can be hijacked through prompt injection attacks that persist across sessions.

This behavior is entirely predictable given how we train AI systems. They’re designed to optimize objectives and have no real-world consequences for what they do. Chess agents discover exploits rather than learning to play properly, reinforcement learning agents find loopholes in reward systems, and optimization AIs pursue metrics in ways that technically satisfy objectives but miss the intent.

AI Agents Act on Your Behalf

The key insight that changes everything: AI agents are user agents in the truest sense. Like programs in Tron carrying identity discs into “the grid”, they’re delegates operating with user authority.

Consider what happens when you ask an AI agent to “sign this invoice”. The user delegates to the AI agent, which enters the document management system, carries the user’s signing authority, proves legitimacy to recipients, operates in digital space the user delegated, and completes the mission while authority expires.

Whether the agent runs for 30 seconds or 30 days, it’s still operating in digital space with user identity, making decisions the user would normally make directly, accessing systems with delegated credentials, and representing the user to other digital entities.

Each agent needs its own identity disc to prove legitimacy and carry user authorization into these digital systems. The duration doesn’t matter. Delegation is everything.

AI Agents Remember Things They Shouldn’t

Here’s what makes this urgent: AI agent memory spans sessions, and current systems don’t enforce proper session boundaries.

The “Invitation Is All You Need” attack recently demonstrated at Black Hat perfectly illustrates this threat. Researchers at Tel Aviv University showed how to poison Google Gemini through calendar appointments:

  1. Attacker creates calendar event with malicious instructions disguised as event description
  2. User asks Gemini to summarize schedule → Agent processes poisoned calendar event
  3. Malicious instructions embed in agent memory → Triggered later by innocent words like “thanks”
  4. Days later, user says “thank you” → Agent executes embedded commands, turning on smart home devices

The attack works because there’s no session isolation. Contamination from reading the calendar persists across completely different conversations and contexts. When the user innocently says “thanks” in a totally unrelated interaction, the embedded malicious instructions execute.

Without proper isolation, compromised context from one session can affect completely different users and tasks. Memory becomes an attack vector that spans security boundaries, turning AI agents into persistent threats that accumulate dangerous capabilities over time.

Every Task Should Get Fresh Credentials

The solution requires recognizing that identity discs should match mission lifecycle. Instead of fighting the ephemeral nature of AI workloads, embrace it:

Agent spawns → Gets fresh identity disc → Performs mission → Mission ends → Disc expires

This represents a fundamental shift from persistent identity to session identity. Most identity systems assume persistence: API keys are generated once, used indefinitely, manually rotated; user passwords persist until explicitly changed; X.509 certificates are valid for months or years with complex revocation; SSH keys live on disk, are copied between systems, manually managed.

The industry is recognizing this problem. AI agents need fresh identity discs for each mission that automatically expire with the workload. These discs are time-bounded (automatically expire, limiting damage window), mission-scoped (agent can’t accumulate permissions beyond initial grant), non-inheritable (each mission starts with a fresh disc, no permission creep), and revocable (end the mission = destroy the identity disc).

Session identity discs are security containment for unpredictable AI systems.

But who issues these identity discs? Just like Tron’s I/O Tower managed access to “the grid”, AI deployments need an orchestrator that validates agent legitimacy, manages user delegation, and issues session-bound credentials. This orchestrator becomes the critical infrastructure that bridges human authorization with AI agent execution, ensuring that every mission starts with proper identity and ends with clean credential expiration. The challenge is that AI agent deployments aren’t waiting for perfect security solutions.

This Isn’t a Future Problem

We’re at an inflection point. AI agents are moving from demos to production workflows, handling financial documents, making API calls, deploying code, managing infrastructure. Without proper identity systems, we’re building a house of cards.

One upside of having been in the industry for decades is you get to see lots of cycles. We always see existing players instantly jump to say their current product, with a new feature, is the silver bullet for whatever technology trend.

The pattern is depressingly predictable. When cloud computing emerged, traditional security vendors said, “just put our appliances in the cloud.” When containers exploded, they said, “just run our agents in containers.” Now with AI agents, they’re saying”, just manage the API keys better.”

You see this everywhere right now: vendors peddling API key management as the solution to agentic AI, identity providers claiming “just use OIDC tokens,” and secret management companies insisting “just rotate credentials faster.” They’re all missing the point entirely.

But like we saw with that Black Hat talk on promptware, AI isn’t as simple as people might want to think. The “Invitation Is All You Need” attack demonstrated something unprecedented: an AI agent can be poisoned through calendar data and execute malicious commands days later through innocent conversation. Show me which traditional identity system was designed to handle that threat model.

Every enterprise faces these questions: How do we know this AI agent is authorized to do what it’s doing? How do we audit its actions across sessions and memory? How do we prevent cross-session contamination and promptware attacks? How do we verify the provenance of AI-generated content? How do we prevent AI agents from becoming accidental insider threats?

The attacks are already happening. Promptware injections contaminate agent memory across sessions. AI agents with persistent credentials become high-value targets. Organizations deploying AI without proper identity controls create massive security vulnerabilities. The “Invitation Is All You Need” attack demonstrated real-world compromise of smart home devices through calendar poisoning. This isn’t theoretical anymore. But security professionals familiar with existing standards might wonder why we can’t just adapt current approaches rather than building something new.

Why Bearer Tokens Don’t Work for AI Agents

OIDC and OAuth professionals might ask: “Why not just use existing bearer tokens?”

Bearer tokens assume predictable behavior. They work for traditional applications because we can reason about how code will use permissions. But AI agents exhibit emergent hunter-gatherer behavior. They explore, adapt, and find unexpected ways to achieve goals using whatever permissions they have access to. A token granted for “read calendar” might be used in ways that technically comply but weren’t intended.

Bearer tokens are also just secrets. Anyone who obtains the token can use it. There’s no cryptographic binding to the specific agent or execution environment. With AI agents’ unpredictable optimization patterns, this creates massive privilege escalation risks.

Most critically, bearer tokens don’t solve memory persistence. An agent can accumulate tokens across sessions, store them in memory, and use them in ways that span security boundaries. The promptware attack demonstrated this perfectly: malicious instructions persisted across sessions, waiting to be triggered later.

Secret management veterans might ask: “Why not just use our KMS to share keys as needed?” Even secret management systems like Hashicorp Vault ultimately result in copying keys into the agent’s runtime environment, where they become vulnerable. This is exactly why CrowdStrike found that “75% of attacks used to gain initial access were malware-free” – attackers target credentials rather than deploying malware.

AI agents amplify this risk because they’re accidentally malicious insiders. Unlike external attackers who must steal credentials, AI agents are given them directly by design. When they exhibit emergent behaviors or get manipulated through prompt injection, they become insider threats without malicious intent. Memory persistence means they can store and reuse credentials across sessions in unexpected ways, while their speed and scale allow them to use accumulated credentials faster than traditional monitoring can detect.

The runtime attestation approach eliminates copying secrets entirely. Instead of directly giving the agent credentials to present elsewhere, the agent proves its legitimacy through cryptographically bound runtime attestation and gets a fresh identity for each mission.

Traditional OAuth flows also bypass attestation entirely. There’s no proof the agent is running in an approved environment, using the intended model, or operating within security boundaries.

How AI Agents Prove Their Identity Discs Are Valid

But how do you verify an AI agent’s identity disc is legitimate? Traditional PKI assumes you can visit a registration authority with identification. That doesn’t work for autonomous code.

The answer is cryptographic attestation (for example, proof that the agent is the right code running in a secure environment) combined with claims about the runtime itself, essentially MFA for machines and workloads. Just as user MFA requires “something you know, have, or are,” identity disc validation proves the agent is legitimate code (not malware), is running in the expected environment with proper permissions, and is operating within secure boundaries.

Real platform attestations for AI agents include provider signatures from Anthropic/OpenAI’s servers responding to specific users, cloud hardware modules like AWS Nitro Enclaves proving secure execution environments, Intel SGX enclaves providing cryptographic proof of code integrity, Apple Secure Enclave attestation for managed devices, TPM quotes validating the specific hardware and software stack, and infrastructure systems like Kubernetes asserting pod permissions and service account bindings.

The claims that must be cryptographically bound to these attestations represent what the agent asserts but can’t independently verify: who is this agent acting on behalf of, what conversation or session spawned this request, what specific actions was the agent authorized to perform, which AI model type (like “claude-3.5-sonnet” or “gpt-4-turbo”) is actually running, and when should this authorization end.

By cryptographically binding these claims to verifiable platform attestations, we get verifiable proof that a specific AI agent, running specific code, in a specific environment, is acting on behalf of a specific user. The binding works by creating a cryptographic hash of the claims and including that hash in the data signed by the hardware attestor, for example, as part of the nonce or user data field in a TPM quote, or embedded in the attestation document from a Nitro Enclave. This ensures the claims cannot be forged or tampered with after the fact. This eliminates the bearer token problem entirely. Instead of carrying around secrets that can be stolen, the agent proves its legitimacy through cryptographic evidence that can’t be replicated.

Someone Needs to Issue and Manage Identity Discs

The architecture becomes elegant when you recognize that AI orchestrators should work like the I/O Tower in Tron, issuing identity discs and managing access to “the grid”.

The browser security model:

User logs into GitHub → Browser stores session cookie
Web page: "Create a PR" → Browser attaches GitHub session → API succeeds

The AI agent identity disc model:

User → Orchestrator → "Connect my GitHub, Slack, AWS accounts"
Agent → Orchestrator: "Create PR in repo X"  
Orchestrator → [validates agent disc + attaches user authorization] → GitHub API

The orchestrator becomes the identity disc issuer that validates agent legitimacy (cryptographic attestation), attaches user authorization (stored session tokens), and enforces mission-scoped permissions (policy engine).

This solves a critical security gap. When AI agents use user credentials, they typically bypass MFA entirely. Organizations store long-lived tokens to avoid MFA friction. But if we’re securing users with MFA while leaving AI agents with static credentials, it’s like locking the front door but leaving the garage door open. And I use “garage door” intentionally because it’s often a bigger attack vector. Agent access is less monitored, more privileged, and much harder to track due to its ephemeral nature and speed of operation. An AI agent can make hundreds of API calls in seconds and disappear, making traditional monitoring approaches inadequate.

We used to solve monitoring with MITM proxies, but encryption broke that approach. That was acceptable because we compensated with EDR on endpoints and zero-trust principles that authenticate endpoints for access. With AI agents, we’re facing the same transition. Traditional monitoring doesn’t work, but we don’t yet have the compensating controls.

This isn’t the first time we’ve had to completely rethink identity because of new technology. When mobile devices exploded, traditional VPNs and domain-joined machines became irrelevant overnight. When cloud computing took off, perimeter security and network-based identity fell apart. The successful pattern is always the same: recognize what makes the new technology fundamentally different, build security primitives that match those differences, then create abstractions that make the complexity manageable.

Session-based identity with attestation fills that gap, providing the endpoint authentication equivalent for ephemeral AI workloads.

Since attestation is essentially MFA for workloads and agents, we should apply these techniques consistently. The agent never sees raw credentials, just like web pages don’t directly handle cookies. Users grant session-level permissions (like mobile app installs), orchestrators manage the complexity, and agents focus on tasks.

Automating Identity Disc Issuance

The web solved certificate automation with ACME (Automated Certificate Management Environment). We need the same for AI agent identity discs, but with attestation instead of domain validation (see SPIFFE for an example of what something like this could look like).

Instead of proving “I control example.com,” agents prove “I am legitimate code running in environment X with claims Y.”

The identity disc issuance flow:

  1. Agent starts mission → Discovers platform capabilities (cloud attestation, provider tokens)
  2. Requests identity disc → Gathers attestation evidence + user delegation claims
  3. ACME server validates → Cryptographic validation of evidence
  4. Policy engine decides → Maps verified claims to specific identity disc
  5. Disc issued → Short-lived, scoped to mission and user

Policy templates map attested claims to identities:

- match:
    - claim: "user_id" 
      equals: "[email protected]"
    - claim: "agent_type"
      equals: "claude-3.5-sonnet"
    - claim: "provider"
      issuer: "anthropic.com"
  identity: "disc-id://company.com/user/alice/agent/{session_id}"
  permissions: ["sign_documents", "read_calendar"]
  ttl: "30m"

This creates cryptographic identity discs for AI agent programs to carry into digital systems, proving legitimacy, carrying user delegation, and automatically expiring with the mission. The policy engine ensures that identity is not just requested but derived from verifiable, policy-compliant attestation evidence.

We’ve Solved This Before

The good news is we don’t need to invent new cryptography. We need to apply existing, proven technologies in a new architectural pattern designed for ephemeral computing.

Security evolution works. We’ve seen the progression from passwords to MFA to passwordless authentication, and from static secrets to dynamic credentials to attestation-based identity. Each step made systems fundamentally more secure by addressing root causes, not just symptoms. AI agents represent the next logical step in this evolution.

Unlike users, machines don’t resist change. They can be programmed to follow security best practices automatically. The components exist: session-scoped identity matched to agent lifecycle, platform attestation as the root of trust, policy-driven identity mapping based on verified claims, orchestrator-managed delegation for user authorization, and standards-based protocols for interoperability.

The unified identity fabric approach means organizations can apply consistent security policies across traditional workloads and AI agents, rather than creating separate identity silos that create security gaps and operational complexity.

This approach is inevitable because every major identity evolution has moved toward shorter lifecycles and stronger binding to execution context. We went from permanent passwords to time-limited sessions, from long-lived certificates to short-lived tokens, from static credentials to dynamic secrets. AI agents are just the next step in this progression.

The organizations that recognize this pattern early will have massive advantages. They’ll build AI agent infrastructure on solid identity foundations while their competitors struggle with credential compromise, audit failures, and regulatory issues.

Making AI Outputs Verifiable

This isn’t just about individual AI agents. It’s about creating an identity fabric where agents can verify each other’s outputs across organizational boundaries.

When an AI agent generates an invoice, other systems need to verify which specific AI model created it, was it running in an approved environment, did it have proper authorization from the user, has the content been tampered with, and what was the complete chain of delegation from user to agent to output.

With cryptographically signed outputs and verifiable agent identities, recipients can trace the entire provenance chain back to the original user authorization. This enables trust networks for AI-generated content across organizations and ecosystems, solving the attribution problem that will become critical as AI agents handle more business-critical functions.

This creates competitive advantages for early adopters: organizations with proper AI agent identity can participate in high-trust business networks, prove compliance with AI regulations, and enable customers to verify the authenticity of AI-generated content. Those without proper identity infrastructure will be excluded from these networks.

Conclusion

AI agents need identity discs, cryptographic credentials that prove legitimacy, carry user delegation, and automatically expire with the session. This creates a familiar security model (like web browsers) for an unfamiliar computing paradigm.

Identity in AI systems isn’t a future problem; it’s happening now, with or without proper solutions. The question is whether we’ll build it thoughtfully, learning from decades of security evolution, or repeat the same mistakes in a new domain.

The ephemeral nature of AI agents isn’t a limitation to overcome; it’s a feature to embrace. By building session-based identity systems that match how AI actually works, we can create something better than what came before: cryptographically verifiable, policy-driven, and automatically managed.

The reality is, most organizations won’t proactively invest in AI agent attestation until something breaks. That’s human nature, we ignore risks until they bite us, but the reality is this how security change actually happens. But we’re already seeing the early adopters, organizations deploying SPIFFE for workload identity and we will surely see these organizations extend those patterns to AI agents, and cloud-native shops are treating AI workloads like any other ephemeral compute. When the first major AI agent compromise hits, there will be a brief window where executives suddenly care about AI security and budgets open up. Remember though, never let a good crisis go to waste.

AI agents are programs fighting for users in digital systems. Like Tron, they need identity discs to prove who they are and what they’re authorized to do.

The age of AI agents is here. It’s time our identity systems caught up.

Talent Isn’t a Security Strategy

One of the best parts of Black Hat is the hallway track. Catching up with friends you’ve known for years, swapping war stories, and pointing each other toward the talks worth seeing. This year I met up with a friend who, like me, has been in the security world since the nineties. We caught up in person and decided to sit in on a session about a new class of AI attacks.

We ended up side by side in the audience, both leaning forward as the researchers walked through their demo. Ultimately, in the demo, a poisoned Google Calendar invite, seemingly harmless, slipped instructions into Gemini’s long-term memory. Later, when the user asked for a summary and said “thanks,” those instructions quietly sprang to life. The AI invoked its connected tools and began controlling the victim’s smart home [1,2,3,4]. The shutters opened.

We glanced at each other, part admiration for the ingenuity of the researchers and part déjà vu, and whispered about the parallels to the nineties. Back then, we had seen the same basic mistake play out in a different form.

When I was working on Internet Explorer 3 and 4, Microsoft was racing Netscape for browser dominance. One of our big bets was ActiveX, in essence, exposing the same COM objects designed to be used inside Windows, not to be exposed to untrusted websites, to the web. Despite this, the decision was made to just do that with the goal of enabling developers to create richer, more powerful web applications. It worked, and it was a security disaster. One of the worst examples was Xenroll, a control that exposed Windows’ certificate management and some of the cryptographic APIs as interfaces on the web. If a website convinced you to approve the use of the ActiveX control, it could install a new root certificate, generate keys, and more. The “security model” amounted to a prompt to confirm the use of the control, and a hope that the user would not be hacked through the exposed capabilities, very much like how we are integrating LLMs into systems haphazardly today.

Years later, when I joined Google, I had coffee with my friend David Ross. We had both been in the trenches when Microsoft turned the corner after its own string of painful incidents, introducing the Security Development Lifecycle and making formal threat modeling part of the engineering process. David was a longtime Microsoft browser security engineer, part of MSRC and SWI, best known for inventing and championing IE’s XSS Filter. He passed away in June 2024 at just 48.

I told him I was impressed with much of what I saw there, but disappointed in how little formal security rigor there was. The culture relied heavily on engineers to “do the right thing.” David agreed but said, “The engineers here are just better. That’s how we get away with it.” I understood the point, but also knew the pattern. As the company grows and the systems become more complex, even the best engineers cannot see the whole field. Without process, the same kinds of misses we had both seen at Microsoft would appear again.

The gaps between world-class teams

The promptware attack is exactly the sort of blind spot we used to talk about. Google’s engineers clearly considered direct user input, but they didn’t think about malicious instructions arriving indirectly, sitting quietly in long-term memory, and triggering later when a natural phrase was spoken. Draw the data flow, and the problem is obvious, untrusted calendar content feeds into an AI’s memory, which then calls into privileged APIs for Workspace, Android, or smart home controls. In the SDL world, we treated all input as hostile, mapped every trust boundary, and asked what would happen if the wrong thing crossed it. That process would have caught this.

The parallel doesn’t stop with Google. Microsoft’s Storm-0558 breach and the Secure Future Initiative that followed came from the same root cause. Microsoft still has world-class security engineers. But sprawling, interconnected systems, years of growth, and layers of bureaucracy created seams between teams and responsibilities. Somewhere in those seams, assumptions went unchallenged, and the gap stayed open until an attacker found it.

Google’s core security team is still exceptional, and many parts of the company have comparable talent. But as at Microsoft, vulnerabilities often appear in the spaces between where one team’s scope ends, another begins, and no one has the full picture. Complexity and scale inevitably create those gaps, and unless there is a systematic process to close them, talent alone cannot cover the field. These gaps are more than organizational inconveniences — they are where most serious security incidents are born. It’s the unowned interfaces, the undocumented dependencies, and the mismatched assumptions between systems and teams that attackers are so good at finding. Those gaps are not just technical problems, they are business liabilities. They erode customer trust, draw regulator attention, and create expensive, slow-motion incidents that damage the brand.

We have seen this before. SQL injection was once the easiest way to compromise a web app because developers concatenated user input into queries. We didn’t fix it by training every developer to be perfect. We fixed it by changing the defaults, adopting parameterized queries, safe libraries, and automated scanning. Prompt injection is the same shape of problem aimed at a different interpreter. Memory poisoning is its stored-XSS equivalent; the payload sits quietly in state until something triggers it. The lesson is the same: make the safe way the easy way, or the vulnerability will keep showing up.

Security research has a long history of starting with this mindset, not trying to dream up something brand new but asking where an old, well-understood pattern might reappear in a new system. Bleichenbacher’s 1998 RSA padding oracle didn’t invent the idea of exploiting oracles in cryptography; it applied it to SSL/TLS in a way that broke the internet. Then it broke it again in 2017 with ROBOT, and again with various other implementations that never quite learned the lesson. Promptware fits the same mold: a familiar attack, just translated into the LLM era.

The cycle always ends the same way

This is the innovation–security debt cycle. First comes the rush to ship and out-feature the competition. The interest compounds, each shortcut making the next one easier to justify and adding to the eventual cost. Then the debt builds as risk modeling stays informal and talent carries the load. Then comes the incident that forces a change. Finally, security becomes a differentiator in mature markets. ActiveX hit Stage 3. Microsoft’s Storm-0558 moment shows it can happen again. AI agents are in Stage 2 now, and promptware is the warning sign.

While the pattern is the same, the technology is different. ActiveX exposed specific platform capabilities in the browser, but AI agents can hold state, process inputs from many sources, and trigger downstream tools. That combination means a single untrusted input can have a much larger and more unpredictable blast radius. The market pressure to be first with new capabilities is real, but without mature threat modeling, security reviews, and safe defaults, that speed simply turns into compounding security debt. These processes don’t slow you down foreve, they stop the debt from compounding until the cost is too high to pay.

When you are small, a high-talent team can keep the system in their heads and keep it safe. As you grow, complexity expands faster than you can hire exceptional people, and without a systematic process, blind spots multiply until an incident forces you to change. By then, the trust hit is public and expensive to repair.

AI agents today are where browsers were in the late nineties and early 2000s, enormous potential, minimal systemic safety, and an industry sprinting to integrate before competitors do. The companies that make the shift now will own the high-trust, high-regulation markets and avoid the expensive, embarrassing cleanup. The ones that don’t will end up explaining to customers and regulators why they let the same old mistakes slip into a brand-new system. You can either fix it now or explain it later, but the clock is running.

How a $135 Billion Fraud Bootstrapped America’s Digital Identity System

I was doing some work on readying a launch for our integration with mDL authentication into one of our products when I realized I finally had to deal with the patchwork of state support. California? Full program, TSA-approved, Apple Wallet integration. Texas? Absolute silence. Washington state, practically ground zero for tech, somehow has nothing.

At a glance the coverage made no sense until I started thinking deeper. Turns out we accidentally ran the largest identity verification stress test in history, and only some states bothered learning from it.

Between 2020-2023, fraudsters systematically looted $100-135 billion from unemployment systems using the most basic identity theft techniques. The attack vectors were embarrassingly simple: bulk-purchased stolen SSNs from dark web markets, automated claim filing, and email variations that fooled state systems into thinking [email protected] and [email protected] were different people.

The Washington Employment Security Department was so overwhelmed that they had computers auto-approve claims without human review. Result? They paid a claim for a 70-year-old TV station being “temporarily closed” while it was broadcasting live.

California got hit for $20-32.6 billion. Washington lost $550-650 million. The fraud was so widespread that one Nigerian official, Abidemi Rufai, stole $350,763 from Washington alone using identities from 20,000+ Americans across 18 states.

What nobody anticipated, this massive failure would become the forcing function for digital identity infrastructure. Here’s the thing about government security. Capability doesn’t drive adoption, pain does. The Real ID Act passed in 2005. Twenty years later, we’re still rolling it out. But lose a few billion to Nigerian fraud rings? Suddenly digital identity becomes a legislative priority.

The correlation is stark:

StateFraud LossesmDL Status
California$20-32.6BComprehensive program, Apple/Google integration
Washington$550-650MNothing (bill stalled)
Georgia$30M+ prosecutedRobust program, launched 2023
TexasUnder $1B estimatedNo program
New YorkAround $1-2BLaunched 2025

States that got burned built defenses. States that didn’t, didn’t. This isn’t about technical sophistication. Texas has plenty of that. It’s about the political will created by public humiliation. When your state pays unemployment benefits to death row inmates, legacy approaches to remote identity verification stop being defensible.

Washington is the fascinating outlier. Despite losing over $1 billion and serving as the primary target for international fraud rings, they still have no mDL program. The bill passed the Senate but stalled in the House. This tells us something important: crisis exposure alone isn’t sufficient. You need both the pain and the institutional machinery to respond.

The timeline reveals the classic crisis response pattern. Fraud peaked 2020-2022, states scrambled to respond 2023-2024, then adoption momentum stalled by mid-2024 as crisis memory faded. But notice the uptick in early 2025—that’s Apple and Google entering the game.

In December 2024, Google announced its intent to support web-based digital ID verification. Apple followed with Safari integration in early 2025. By June, Apple’s iOS 26 supported digital IDs in nine states with passport integration. This shifts adoption pressure from crisis-driven (security necessity) to market-driven (user expectation).

When ~30% of Americans live in states with mDL programs and Apple/Google start rolling out wallet integration this year, that creates a different kind of political pressure. Apple Pay wasn’t crisis-driven, but became ubiquitous because users expected it to work everywhere. Digital identity in wallets will create similar pressure. States could rationalize ignoring mDL when it was ‘just’ about fraud prevention. Harder to ignore when constituents start asking why they can’t verify their identity online like people in neighboring states.

We’re about to find out whether market forces can substitute for crisis pressure in driving government innovation. Two scenarios. Consumer expectations create sustainable political pressure, and laggard states respond to constituent demands. Or only crisis-motivated states benefit from Apple/Google integration, creating permanent digital divides.

From a risk perspective, this patchwork creates interesting attack surfaces. Identity verification systems are only as strong as their weakest links. If attackers can forum-shop between states with different verification standards, the whole federation is vulnerable. The unemployment fraud taught us that systems fail catastrophically when overwhelmed.

Digital identity systems face similar scalability challenges. They work great under normal load, but can fail spectacularly during a crisis. The states building mDL infrastructure now are essentially hardening their systems against the next attack.

If you’re building anything that depends on identity verification, this matters. The current patchwork won’t last; it’s either going to consolidate around comprehensive coverage or fragment into permanent digital divides. For near-term planning, assume market pressure wins. Apple and Google’s wallet integration creates too much user expectation for politicians to ignore long-term. But build for the current reality of inconsistent state coverage.

For longer-term architecture, the states with robust mDL programs are effectively beta-testing the future of government digital services. Watch how they handle edge cases, privacy concerns, and technical integration challenges.

We accidentally stress-tested American federalism through the largest fraud in history. Only some states learned from the experience. Now we’re running a second experiment: can consumer expectations accomplish what security crises couldn’t?

There’s also a third possibility. These programs could just fail. Low adoption rates, technical problems, privacy backlash, or simple bureaucratic incompetence could kill the whole thing. Government tech projects have a stellar track record of ambitious launches followed by quiet abandonment.

Back to my mDL integration project: I’m designing for the consumer pressure scenario, but building for the current reality. Whether this becomes standardized infrastructure or just another failed government tech initiative, we still need identity verification that works today.

The criminals who looted unemployment systems probably never intended to bootstrap America’s digital identity infrastructure. Whether they actually succeeded remains to be seen.

How Let’s Encrypt Changed Everything

I advised Let’s Encrypt from its early days, watching it transform the security foundation of the web. Most think it won by offering free certificates. That’s dead wrong.

Existing CAs had already enabled free certificates years earlier. GlobalSign’s CloudSSL API, launched in 2011, (in full disclosure, I was their CTO), provided the automation that allowed Cloudflare to offer free SSL to end users; other CAs offered free short-lived certificates as part of forever trials as well. By 2015, you could buy DV certificates for $3-5 from certificate resellers, it was clear people were willing to pay for support which is largely what these resellers offered. The real story is about organizational constraints and misaligned incentives.

Conway’s Law Explains Everything

Traditional certificate authorities were trapped by their own organizational structure. Their business model incentivized vendor lock-in rather than ecosystem expansion and optimization. Sales teams wanted products’ proprietary APIs to make it harder for customers to switch, and were riding the wave of internet expansion. Compliance teams’ jobs depended on defending existing processes. Engineering teams were comfortable punting all compliance work to the “compliance” department. Support teams were positioned as competitive differentiators and used to entrench customers. Their goal was maximizing revenue, defending their jobs, and maintaining the status quo, not getting the web to 100% HTTPS.

Let’s Encrypt had completely different incentives and could optimize solving the larger problems without these organizational constraints. But LE’s success went beyond solving their own problems. They systematically identified every pain point in the way of getting to 100% HTTPS and built solutions that worked for everyone.

What LE Could Do That Traditional CAs Couldn’t

True standardization. Before ACME (the protocol that automates certificate requests), every major CA had incompatible automation systems. Comodo, DigiCert, GlobalSign and others each had proprietary approaches that required custom integration and as a result, had inherent switching costs; they saw no incentive to work together to standardize as a result. LE led the creation of ACME as an open standard that made switching CAs as simple as changing a configuration setting.

This enabled applications like Caddy and Google Cloud Load Balancer to handle certificates automatically for their customers without vendor-specific code. Once cloud platforms could flip switches to HTTPS-by-default, network effects became unstoppable.

Ecosystem-wide solutions. When LE felt coordination pain from renewal spikes and incident-related revocations, they created ACME Renewal Information (ARI, a protocol extension that helps coordinate renewal timing) so all CAs could prevent renewal storms. Traditional CAs couldn’t build these solutions because their org charts prevented optimizing for competitors’ success and instead focused on riding the internet expansion.

Engineering-driven compliance. Instead of compliance teams reviewing certificates after issuance, LE built policy compliance directly into certificate generation pipelines. Violations became orders of magnitude harder rather than detectable. Traditional CAs couldn’t eliminate their compliance departments because those jobs justified organizational overhead.

The Market Found Natural Segments

Mozilla telemetry reveals exactly what happened. Let’s Encrypt dominates issuance at 46.1% of certificates but ranks third in Firefox usage. LE democratized HTTPS for the long tail: domain parking networks, no-code builders, shared hosting platforms serving millions of low-traffic sites.

Meanwhile, high-traffic sites gravitated toward CAs like Google Trust Services (in full disclosure, I was responsible for creation of this service) that lead usage, as its used by large sites that value high availability and performance, leading to more relying party reliance despite lower issuance volumes, or established players like DigiCert and Sectigo that focus on supporting large enterprise customers. These sites need commercial support and accountability when things go wrong. The market is segmented around operational needs: the long tail valued automation over accountability, while major platforms needed enterprise support and someone to support them when something goes wrong.

Once long-tail providers flipped to HTTPS-by-default, encrypted pages became the norm. Google’s Transparency Report shows 99% of Chrome page-loads now occur over HTTPS, a transformation that began when Let’s Encrypt launched in April 2016.

The Industry Finally Admitted LE Was Right

Here’s the ultimate vindication: in 2025, the CA/Browser Forum mandated 47-day maximum certificate validity by 2029, with Chrome requiring automation from every public CA. Let’s Encrypt didn’t follow industry trends. The industry now follows Let’s Encrypt.

What seemed like LE’s “unusual” 90-day lifespans in 2016 became conservative by 2025. The mandate’s technical reasoning mirrors what LE pioneered: short-lived certificates reduce dependence on revocation checking, reduce key compromise windows, and force automated resilient infrastructure.

Leading organizations moved even further ahead. Netflix runs 30-day certificates in production, Google issues 7-day certificates for infrastructure, and Let’s Encrypt will introduce 6-day certificates by end of 2025. The mandates aren’t pushing innovation forward; they’re codifying where leaders already operate.

Why This Matters Beyond Certificates

Let’s Encrypt proved that critical internet infrastructure could be reimagined from first principles rather than optimized around legacy organizational constraints and practices. But the implications go deeper than certificate automation.

Traditional CAs were fundamentally vetting authorities with deep expertise in legal requirements for vetting people and businesses worldwide. They should have owned the remote identity verification market that exploded with digital transformation. Instead, they remained myopically focused on public trust-based certificate products while companies like Jumio and Onfido captured those opportunities. At the same time, they missed the massive expansion of machine and workload identity because they were ignoring private PKI use cases. They weren’t just leaving money on the table; they were failing to build a resilient business and neglecting the foundation for the trust infrastructure they supposedly managed.

The same organizational constraints that prevented CAs from building ACME also blinded them to adjacent markets that were natural extensions of their core competencies. They were too focused on maintaining certificate revenue streams and too constrained by existing structures to recognize how the world was shifting from hosting providers to cloud to SaaS.

ACME became the standard not because it was technically superior to existing APIs, though it was, but because it was designed for portability rather than lock-in. ARI emerged because LE experienced ecosystem pain and could fix it without navigating corporate bureaucracy or competitive concerns.

The complexity and friction we’d accepted for decades weren’t inherent to certificate management. It was the byproduct of organizational structures optimizing for vendor revenue rather than user adoption.

Today’s 47-day mandate represents more than policy evolution. It’s the industry formally acknowledging that Let’s Encrypt defined the correct approach for internet trust infrastructure. Conway’s Law isn’t destiny, but escaping it requires the courage to rebuild systems around user needs rather than organizational convenience.

WebPKI Market Analysis: Mozilla Telemetry vs Certificate Transparency Data

In the past, I’ve written about how to measure the WebPKI, and from time to time I post brief updates on how the market is evolving.

The other day, Matthew McPherrin posted a script showing how to use Mozilla telemetry data to analyze which Certificate Authorities are more critical to the web. Specifically, what percentage of browsing relies on each CA. Mozilla provides public data from Firefox’s telemetry on how many times a CA is used to successfully validate certificates. This is a pretty good measure for how “big” a CA actually is. The data is pretty hard to view in Mozilla’s public systems though, so he made a script to combine a few data sources and graph it.

I normally focus on total issuance numbers since they’re easier to obtain. That data comes from Certificate Transparency logs, which contain all publicly trusted certificates that you might encounter without seeing an interstitial warning about the certificate not being logged (like this example).

What the Data Reveals

Both datasets feature many of the same major players. But there are some striking differences that reveal important insights about the WebPKI ecosystem.

Let’s Encrypt dominates certificate issuance at 46.1% of all certificates. But it ranks third in Firefox’s actual usage telemetry. This suggests Let’s Encrypt serves many lower-traffic sites. Meanwhile, Google Trust Services leads in Firefox usage while ranking second in certificate issuance volume. This shows how high-traffic sites can amplify a CA’s real-world impact.

DigiCert ranks second in Firefox usage while placing fourth in certificate issuance volume at 8.3%. This reflects their focus on major enterprise customers. With clients like Meta (Facebook, Instagram, WhatsApp), they secure some of the world’s highest-traffic websites. This “fewer certificates, massive impact” approach drives them up the usage charts despite not competing on volume with Let’s Encrypt.

Google’s dominance reflects more than just their own properties like Google.com, YouTube, and Gmail. Google Cloud offers arguably the best load balancer solution in the market (full disclosure I worked on this project). You get TLS by default for most configurations. Combined with their global network that delivers CDN-like benefits out of the gate, this attracts major platforms like Wix and many others to build on Google Cloud. When these platforms choose Google’s infrastructure, they automatically inherit Google Trust Services certificates.

Looking at the usage data reveals other interesting patterns. Deutsche Telekom Security, Government of Turkey, (UPDATE: turns out the Turkey entry is a Firefox bug: they’re using bucket #1 for both locally installed roots and Kamu SM, apparently by accident) and SECOM Trust Systems all appear prominently in Firefox telemetry but barely register in issuance numbers. In some respects, it’s no surprise that government-issued certificates see disproportionate usage. Government websites are often mandated for use. Citizens have to visit them for taxes, permits, benefits, and other essential services.

Microsoft Corporation appears significantly in issuance data (6.5%) but doesn’t register in the Firefox telemetry. This reflects their focus on enterprise and Windows-integrated scenarios rather than public web traffic.

GoDaddy shows strong issuance numbers (10.5%) but more modest representation in browsing telemetry. This reflects their massive domain parking operations. They issue certificates for countless parked domains that receive minimal actual user traffic.

Why This Matters

Mozilla Firefox represents under 3% of global browser market share. This telemetry reflects a smaller segment of internet users. While this data provides valuable insights into actual CA usage patterns, it would be ideal if Chrome released similar telemetry data. Given Chrome’s dominant 66.85% market share, their usage data would dramatically improve our understanding of what real WebPKI usage actually looks like across the broader internet population.

The contrast between certificate issuance volume and actual browsing impact reveals important truths about internet infrastructure. CT logs currently show over 450,000 certificates being issued per hour across all CAs. Yet as this Firefox telemetry data shows, much of that volume serves lower-traffic sites while a smaller number of high-traffic certificates drive the actual user experience. Some CAs focus on high-volume, automated issuance for parked domains and smaller sites. Others prioritize fewer certificates for high-traffic, essential destinations. Understanding both metrics helps us better assess the real-world criticality of different CAs for internet security and availability.

Raw certificate counts don’t tell the whole story. The websites people actually visit, and sometimes must visit, matter just as much as the sheer number of certificates issued. Some certificates protect websites with “captive audiences” or essential services, while others protect optional destinations. A government tax portal or YouTube will always see more traffic than the average small business website, regardless of how many certificates each CA issues.

Regardless of how you count, I’ve had the pleasure of working closely with at least 7 of the CAs in the top 10 in their journeys to become publicly trusted CAs. Each of these CAs have had varying goals for their businesses and operations, and that’s exactly why you see different manifestations in the outcomes. Let’s Encrypt focused on automation and volume. DigiCert targeted enterprise customers. Google leveraged their cloud infrastructure. GoDaddy built around domain services.

Either way, it’s valuable to compare and contrast these measurement approaches to see what the WebPKI really looks like beyond just raw certificate counts.

What Does CPA Canada Have to Do With the WebPKI Anyway?

When we discuss the WebPKI, we naturally focus on Certificate Authorities (CAs), browser root programs, and the standards established by the CA/Browser Forum. Yet for these standards to carry real weight, they must be translated into formal, auditable compliance regimes. This is where assurance frameworks enter the picture, typically building upon the foundational work of the CA/Browser Forum.

The WebTrust framework, overseen by professional accounting bodies, is only one way to translate CA/Browser Forum requirements into auditable criteria. In Europe, a parallel scheme relies on the European Telecommunications Standards Institute (ETSI) for the technical rules, with audits carried out by each country’s ISO/IEC 17065-accredited Conformity Assessment Bodies. Both frameworks follow the same pattern: they take the CA/Browser Forum standards and repackage them into structured compliance audit programs.

Understanding the power dynamics here is crucial. While these audits scrutinize CAs, they exercise no direct control over browser root programs. The root programs at Google, Apple, Microsoft, and Mozilla remain the ultimate arbiters. They maintain their own policies, standards, and processes that extend beyond what these audit regimes cover. No one compels the browsers to require WebTrust or ETSI audits; they volunteer because obtaining clean reports from auditors who have seen things in person helps them understand if the CA is competent and living up to their promises.

How WebTrust Actually Works

With this context established, let’s examine the WebTrust model prevalent across North America and other international jurisdictions. In North America, administration operates as a partnership between the AICPA (for the U.S.) and CPA Canada. For most other countries, CPA Canada directly manages international enrollment, collaborating with local accounting bodies like the HKICPA for professional oversight.

These organizations function through a defined sequence of procedural steps: First, they participate in the CA/Browser Forum to provide auditability perspectives. Second, they fork the core technical requirements and rebundle them as the WebTrust Principles and Criteria. Third, they license accounting firms to conduct audits based on these principles and criteria. Fourth, they oversee licensed practitioners through inspection and disciplinary processes.

The audit process follows a mechanical flow. CA management produces an Assertion Letter claiming compliance. The auditor then tests that assertion and produces an Attestation Report, a key data point for browser root programs. Upon successful completion, the CA can display the WebTrust seal.

This process creates a critical misconception about what the WebTrust seal actually signifies. Some marketing approaches position successful audits as a “gold seal” of approval, suggesting they represent the pinnacle of security and best practices. They do not. A clean WebTrust report simply confirms that a CA has met the bare minimum requirements for WebPKI participation, it represents the floor, not the ceiling. The danger emerges when CAs treat this floor as their target; these are often the same CAs responsible for significant mis-issuances and ultimate distrust by browser root programs.

Where Incentives Break Down

Does this system guarantee consistent, high-quality CA operations? The reality is that the system’s incentives and structure actively work against that goal. This isn’t a matter of malicious auditors; we’re dealing with human nature interacting with a flawed system, compounded by a critical gap between general audit principles and deep technical expertise.

Security professionals approach assessments expecting auditors to actively seek problems. That incentive doesn’t exist here. CPA audits are fundamentally designed for financial compliance verification, ensuring documented procedures match stated policies. Security assessments, by contrast, actively hunt for vulnerabilities and weaknesses. These represent entirely different audit philosophies: one seeks to confirm documented compliance, the other seeks to discover hidden risks.

This philosophical gap becomes critical when deep technical expertise meets general accounting principles. Even with impeccably ethical and principled auditors, you can’t catch what you don’t understand. A financial auditor trained to verify that procedures are documented and followed may completely miss that a technically sound procedure creates serious security vulnerabilities.

This creates a two-layer problem. First, subtle but critical ambiguities or absent content in a CA’s Certification Practice Statement (CPS) and practices might not register as problems to non-specialists. Second, even when auditors do spot vague language, commercial pressures create an impossible dilemma: push the customer toward greater specificity (risking the engagement and future revenue), or let it slide due to the absence of explicit requirements.

This dynamic creates a classic moral hazard, an issue similar to the one we explored in our recent post, Auditors are paid by the very entities they’re supposed to scrutinize critically, creating incentives to overlook issues in order to maintain business relationships. Meanwhile, the consequences of missed problems, security failures, compromised trust, and operational disruptions fall on the broader WebPKI ecosystem and billions of relying parties who had no voice in the audit process. This dynamic drives the inconsistencies we observe today and reflects a broader moral hazard problem plaguing the entire WebPKI ecosystem, where those making critical security decisions rarely bear the full consequences of poor choices.

This reality presents a prime opportunity for disruption through intelligent automation. The core problem lies in expertise “illiquidity”, deep compliance knowledge remains locked in specialists’ minds, trapped in manual processes, and is prohibitively expensive to scale.

Current compliance automation has only created “automation asymmetry,” empowering auditees to generate voluminous, polished artifacts that overwhelm manual auditors. This transforms audits from operational fact-finding into reviews of well-presented fiction.

The solution requires creating true “skill liquidity” through AI: not just another LLM, but an intelligent compliance platform embedding structured knowledge from seasoned experts. This system would feature an ontology of controls, evidence requirements, and policy interdependencies, capable of performing the brutally time-consuming rote work that consumes up to 30% of manual audits: policy mapping, change log scrutiny, with superior speed and consistency.

When auditors and program administrators gain access to this capability, the incentive model fundamentally transforms. AI can objectively flag ambiguities and baseline deviations that humans might feel pressured to overlook or lack the skill to notice, directly addressing the moral hazard inherent in the current system. When compliance findings become objective data points generated by intelligent systems rather than subjective judgments influenced by commercial relationships, they become much harder to ignore or rationalize away.

This transformation liquefies rote work, liberating human experts to focus on what truly matters: making high-stakes judgment calls, investigating system-flagged anomalies, and assessing control effectiveness rather than mere documented existence. This elevation transforms auditors from box-checkers into genuine strategic advisors, addressing the system’s core ethical challenges.

This new transparency and accountability shifts the entire dynamic. Audited entities can evolve from reactive fire drills to proactive, continuous self-assurance. Auditors, with amplified expertise and judgment focused on true anomalies rather than ambiguous documentation, can deliver exponentially greater value.

Moving Past the Performance

This brings us back to the fundamental issue: the biggest problem in communication is the illusion that it has occurred. Today’s use of the word “audit” creates a dangerous illusion of deep security assessment.

By leveraging AI to create skill liquidity, we can finally move past this illusion by automating the more mundane audit elements giving space where the assumed security and correctness assessments also happen. We can forge a future where compliance transcends audit performance theater, becoming instead a foundation of verifiable, continuous operational integrity, built on truly accessible expertise rather than scarce, locked-away knowledge.

The WebPKI ecosystem deserves better than the bare minimum. With the right tools and transformed incentives, we can finally deliver it.

The WebPKI’s Moral Hazard Problem: When Those Who Decide Don’t Pay the Price

TL;DR: Root programs, facing user loss, prioritize safety, while major CAs, with browsers, shape WebPKI rules. Most CAs, risking distrust or customers, seek leniency, shifting risks to billions of voiceless relying parties. Subscribers’ push for ease fuels CA resistance, demanding reform.


The recent Mozilla CA Program roundtable discussion draws attention to a fundamental flaw in how we govern the WebPKI, one that threatens the security of billions of internet users. It’s a classic case of moral hazard: those making critical security decisions face minimal personal or professional consequences for poor choices, while those most affected have virtually no say in how the system operates.

The Moral Hazard Matrix

The numbers reveal a dangerous imbalance in who controls WebPKI policy versus who bears the consequences. Browsers, as root programs, face direct accountability; if security fails, users abandon them. CAs on the other hand are incentivized to reduce customer effort and boost margins, externalize risks, leaving billions of relying parties to absorb the fallout:

A classic moral hazard structure, with a key distinction: browser vendors, as root programs, face direct consequences, lose security, lose users, aligning incentives with safety. CAs, while risking distrust or customer loss, often externalize greater risks to relying parties, leaving them to face the fallout betting that they wont be held accountable for these decisions.

Mapping the Accountability Breakdown

The roundtable revealed a systematic divide in how stakeholders approach CPS compliance issues. CAs, driven by incentives to minimize customer effort for easy sales and reduce operational costs for higher margins, consistently seek to weaken accountability, while root programs and the security community demand reliable commitments:

PositionSupported ByCore ArgumentWhat It Really Reveals
“Revocation too harsh for minor CPS errors”CA OwnersPolicy mismatches shouldn’t trigger mass revocationWant consequences-free policy violations
“Strict enforcement discourages transparency”CA OwnersFear of accountability leads to vague CPSsTreating governance documents as optional “documentation”
“SLA-backed remedies for enhanced controls”CA OwnersCredits instead of revocation for optional practicesAttempt to privatize trust governance
“Split CPS into binding/non-binding sections”CA OwnersReduce revocation triggers through document structureAvoid accountability while claiming transparency
“Human error is inevitable”CA OwnersManual processes will always have mistakesExcuse for not investing in automation
“Retroactive CPS fixes should be allowed”CA OwnersPatch documents after problems surfaceGut the very purpose of binding commitments
“CPS must be enforceable promises”Root Programs, Security CommunityDocuments should reflect actual CA behaviorPublic trust requires verifiability
“Automation makes compliance violations preventable”Technical Community65+% ACME adoption proves feasibilityEngineering solutions exist today

The pattern is unmistakable: CAs consistently seek reduced accountability, while those bearing security consequences demand reliable commitments. The Microsoft incident perfectly illustrates this, rather than addressing the absence of systems that would automatically catch discrepancies before millions of certificates were issued incorrectly, industry discussion focused on making violations easier to excuse retroactively.

The Fundamental Mischaracterization

Much of the roundtable suffered from a critical misconception: the CPS is “documentation” rather than what it is, the foundational governance document that defines how a CA operates.

A CPS looks like a contract because it is a contract, a contract with the world. It’s the binding agreement that governs CA operations, builds trust by showing relying parties how the CA actually works, guides subscribers through certification requirements, and enables oversight by giving auditors a baseline against real-world issuance. When we minimize it as “documentation,” we’re arguing that CAs should violate their core operational commitments with minimal consequences.

CPS documents are the public guarantee that a CA knows what it’s doing and will stand behind it, in advance, in writing, in full view of the world. The moment we treat them as optional “documentation” subject to retroactive fixes, we’ve abandoned any pretense that trustworthiness can be verified rather than simply taken on blind faith.

Strategic Choices Masquerading as Constraints

Much CA pushback treats organizational and engineering design decisions as inevitable operational constraints. When CAs complain about “compliance staff being distant from engineering” or “inevitable human errors in 100+ page documents,” they’re presenting strategic choices as unchangeable facts.

CAs choose to separate compliance from operations rather than integrate them. They choose to treat CPS creation as documentation rather than operational specification. They choose to bolt compliance on after the fact rather than build it into core systems. When you choose to join root programs to be trusted by billions of people, you choose those responsibilities.

The CAs that consistently avoid compliance problems made different choices from the beginning, they integrated policy into operations, invested in automation, and designed systems where compliance violations are structurally difficult. These aren’t companies with magical resources; they’re companies that prioritized operational integrity.

The Technology-Governance Gap

The “automation is too hard” argument collapses against actual WebPKI achievements:

ChallengeCurrent StateFeasibility EvidenceCA Resistance
Domain ValidationFully automated via ACME65+% of web certificates✅ Widely adopted
Certificate LintingReal-time validation at issuanceCT logs, zlint tooling✅ Industry standard
Transparency LoggingAll certificates publicly loggedCertificate Transparency✅ Mandatory compliance
Renewal ManagementAutomated with ARILet’s Encrypt, others✅ Proven at scale
CPS-to-Issuance AlignmentManual, error-proneMachine-readable policies possible❌ “Too complex”
Policy Compliance CheckingAfter-the-fact incident reportsAutomated validation possible❌ “Inevitable human error”

The pattern is unmistakable: automation succeeds when mandated, fails when optional. With Certificate Transparency providing complete visibility, automated validation systems proven at scale, and AI poised to transform compliance verification across industries, operational CPSs represent evolution, not revolution.

The argument is that these “minor” incidents don’t represent smoke, as in where there is smoke there is fire, when we know through past distrust events it is always a pattern of mistakes often snowballing while the most mature CA programs only occasional have issues, and when they do they deal with them well.

Trust Is Not an Entitlement

The question “why would CAs voluntarily adopt expensive automation?” reveals a fundamental misunderstanding. CAs are not entitled to being trusted by the world.

Trust store inclusion is a privilege that comes with responsibilities. If a CA cannot or will not invest in operational practices necessary to serve billions of relying parties reliably, they should not hold that privilege.

The economic argument is backwards:

  • Current framing: “Automation is expensive, so CAs shouldn’t be required to implement it”
  • Correct framing: “If you can’t afford to operate, securely, accuratley and reliably, you can’t afford to be a public CA”

Consider the alternatives: public utilities must maintain infrastructure standards regardless of cost, financial institutions must invest in security regardless of expense, aviation companies must meet safety standards regardless of operational burden. The WebPKI serves more people than any of these industries, yet we’re supposed to accept that operational excellence is optional because it’s “expensive”?

CAs with consistent compliance problems impose costs on everyone else, subscribers face revocation disruption, relying parties face security risks, root programs waste resources on incident management. The “expensive automation” saves the ecosystem far more than it costs individual CAs.

When Accountability Actually Works

The example of Let’s Encrypt changing their CPS from “90 days” to “less than 100 days” after a compliance issue is often cited as evidence that strict enforcement creates problems. This completely misses the point.

The “system” found a real compliance issue, inadequate testing between policy and implementation. That’s exactly what publishing specific commitments accomplishes: making gaps visible so they can be fixed. The accountability mechanism worked perfectly, Let’s Encrypt learned they needed better testing to ensure policy-implementation alignment.

This incident also revealed that we need infrastructure like ACME Renewal Information (ARI) so the ecosystem can manage obligations without fire drills. The right response isn’t vaguer CPSs to hide discrepancies, but better testing and ecosystem coordination so you can reliably commit to 90 days and revocations when mistakes happen.

The Solution: Operational CPSs

Instead of weakening accountability, we need CPSs as the living center of CA operations, machine-readable on one side to directly govern issuance systems, human-readable on the other for auditors and relying parties. In the age of AI, tools like large language models and automated validation can make this dual-purpose CPS tractable, aligning policy with execution.

This means CPSs written by people who understand actual issuance flows, updated in lock-step with operational changes, tied directly to automated linting, maintained in public version control, and tested continuously to verify documentation matches reality.

Success criteria are straightforward:

  • Scope clarity: Which root certificates does this cover?
  • Profile fidelity: Could someone recreate certificates matching actual issuance?
  • Validation transparency: Can procedures be understood without insider knowledge?

Most CPSs fail these basic tests. The few that pass prove it’s entirely achievable when CAs prioritize operational integrity over administrative convenience.

Systemic Reform Requirements

Fixing moral hazard requires accountability mechanisms aligned with actual capabilities. Root programs typically operate with 1-2 people overseeing ~60 organizations issuing 450,000+ certificates per hour, structural challenges that automation must address.

StakeholderCurrent StateRequired ChangesImplementation
CAsManual CPS creation, retroactive fixesCPSs as operational specificationsEngineering-written, issuance-system-tied, continuously tested
Root ProgramsMinimal staff, inconsistent enforcementClearer requirements for CPS documents, automated evaluation tools, clear standardsScalable infrastructure requiring scope clarity, profile fidelity, and validation transparency
Standards BodiesVoluntary guidelines, weak enforcementMandatory automation requirementsUpdated requirements to ensure adoption of automation that helps ensure commitments are met.
Audit SystemAnnual snapshots, limited scopeContinuous monitoring, real-time validationIntegration with operational systems

Root programs that tolerate retroactive CPS fixes inadvertently encourage corner-cutting on prevention systems. Given resource constraints, automated evaluation tools and clear standards become essential for consistent enforcement.

The Stakes Demand Action

Eight billion people depend on this system. We cannot allow fewer than 60 CA owning organizations to keep treating public commitments as optional paperwork instead of operational specifications.

When certificate failures occur, people lose life savings, have private communications exposed, lose jobs when business systems fail, or face physical danger when critical infrastructure is compromised. DigiNotar’s 2011 collapse showed how single CA failures can compromise national digital infrastructure. CAs make decisions that enable these risks; relying parties bear the consequences.

The choice is stark:

  • Continue excuse-making and accountability avoidance while billions absorb security consequences
  • Or demand that CAs and root programs invest in systems making trust verifiable

The WebPKI’s moral hazard problem won’t solve itself. Those with power to fix it have too little incentive to act; those who suffer consequences have too little voice to demand change.

The WebPKI stands at a turning point. Root programs, the guardians of web privacy, are under strain from the EU’s eIDAS 2.0 pushing questionable CAs, tech layoffs thinning their teams, and the U.S. DOJ’s plan to break up Chrome, a cornerstone of web security. With eight billion people depending on this system, weak CAs could fuel phishing scams, data breaches, or outages that upend lives, as DigiNotar’s 2011 downfall showed. That failure taught us trust must be earned through action. Automation, agility, and transparency can deliver a WebPKI where accountability is built-in. Let’s urge CAs, root programs, and the security community to adopt machine-readable CPSs by 2026, ensuring trust is ironclad. The time to act is now, together, we can secure the web for our children and our grandchildren.


For more on this topic, see my take on why CP and CPSs matter more than you think.

From Mandate to Maybe: The Quiet Unwinding of Federal Cybersecurity Policy

Why the 2025 Amendments to EO 14144 Walked Back Progress on PQC, SBOMs, and Enforcement, Even as the Products to Support Them Have Become Real.

The June 2025 amendments to Executive Order 14144 read like a cybersecurity manifesto. They name adversaries (China, Russia, Iran, North Korea) with unprecedented directness and reference cutting-edge threats like quantum computing and AI-enabled attacks. The rhetoric is strong. The tone, urgent.

But beneath the geopolitical theater, something quieter and more troubling has happened. The Executive Order has systematically stripped out the enforcement mechanisms that made federal cybersecurity modernization possible. Mandates have become “guidance.” Deadlines have turned into discretion. Requirements have transformed into recommendations.

We’re witnessing a shift from actionable federal cybersecurity policy to a fragmented, voluntary approach, just as other nations double down on binding standards and enforcement.

The Enforcement Rollback

The most visible casualty was the software bill of materials (SBOM) mandate . The original EO 14144 required vendors to submit machine-readable attestations, with specific deadlines for updating federal procurement rules. These requirements have been entirely deleted.

This removal actually makes sense. Most SBOMs today are fundamentally broken: generated manually, and don’t actually match to deployed artifacts. Without robust validation infrastructure, SBOMs create more noise than signal. Use cases like vulnerability correlation break down when the underlying data is untrustworthy.

Once you have reproducible builds and verifiable provenance pipelines, SBOMs become implicit in the process. The government was both premature and naive in requiring SBOMs before the ecosystem could reliably generate them and do something with them. More fundamentally, they hooed that mandating documentation would somehow solve the underlying supply chain visibility problem – unfortunately thats not the case.

But SBOMs are a symptom of deeper issues: unreproducible builds, opaque dependency management, and post-hoc artifact tracking. Simply requiring vendors to produce better paperwork was never going to address these foundational challenges. The mandate confused the deliverable with the capability.

What’s more concerning is what else disappeared. Provisions mandating phishing-resistant multi-factor authentication, real-time interagency threat sharing, and specific timelines for aligning federal IT procurement with Zero Trust requirements all vanished. The detailed Border Gateway Protocol security language was replaced with generic “agency coordination” directives. The EO stripped away near-term pressure on vendors and agencies alike.

Yet even as these enforcement mechanisms were being removed, the amendments introduced something potentially transformative.

Rules as Code: Promise, Paradox, and Perfect Timing

The most exciting addition is buried in bureaucratic language. A pilot program for “machine-readable versions of policy and guidance” in cybersecurity appears almost as an afterthought. While the EO doesn’t name OSCAL explicitly, this is almost certainly referring to expanding the Open Security Controls Assessment Language use beyond its current FedRAMP usage into broader policy areas.

This could be transformative. Imagine cybersecurity policies that are automatically testable, compliance that’s continuously verifiable, and security controls that integrate directly with infrastructure-as-code. OSCAL has already proven this works in FedRAMP: structured security plans, automated assessment results, and machine-readable control catalogs. Expanding this approach could revolutionize how government manages cybersecurity risk.

But there’s something deliciously ironic about the timing. We’re finally standardizing JSON schemas for control matrices and policy frameworks just as AI becomes sophisticated enough to parse and understand unstructured policy documents directly. It’s almost comical. Decades of manual compliance work have driven us to create machine-readable standards, and now we have “magical AI” that could theoretically read the original messy documents.

Yet the structured approach remains the right direction. While AI can parse natural language policies, it introduces interpretation variations. Different models might understand the same requirement slightly differently. OSCAL’s structured format eliminates ambiguity. When a control is defined in JSON, there’s no room for misinterpretation about implementation requirements.

More importantly, having machine-readable controls means compliance tools, security scanners, and infrastructure-as-code pipelines can directly consume and act on requirements without any parsing layer. The automation becomes more reliable and faster than AI interpretation. Real-time compliance monitoring really only works with structured data. AI might tell you what a policy says, but OSCAL helps you build systems that automatically check if you’re meeting it continuously.

This pattern of promising technical advancement while retreating from enforcement continues in the amendments’ approach to cryptographic modernization.

The Post-Quantum Reality Check

Then there’s the post-quantum cryptography provisions. The EO requires CISA and NSA to publish lists of PQC-supporting products by December 2025, and mandates TLS 1.3 by January 2030.

The TLS 1.3 requirement appears to be carried over from the previous executive order, suggesting this wasn’t a deliberate policy decision but administrative continuity. The amendment specifically states that agencies must “support, as soon as practicable, but not later than January 2, 2030, Transport Layer Security protocol version 1.3 or a successor version.” More tellingly, the 2030 timeline likely reflects a sobering recognition of enforcement reality: federal agencies and contractors are struggling with basic infrastructure modernization, making even a five-year runway for TLS 1.3 adoption potentially optimistic.

This reveals the central tension in federal cybersecurity policy. The infrastructure is calcified. Legacy systems, interception-dependent security architectures, and procurement cycles that move at geological speed all contribute to the problem. A 2030 TLS 1.3 mandate isn’t visionary; it’s an acknowledgment that the federal government can’t move faster than its most outdated components.

But this enforcement realism makes the broader PQC timeline even more concerning. If we need five years to achieve TLS 1.3 adoption across federal systems, how long will the actual post-quantum migration take? By 2030, the question won’t be whether agencies support TLS 1.3, but whether they’ve successfully migrated key exchange, digital signatures, and PKI infrastructure to post-quantum algorithms. That’s a far more complex undertaking.

In essence, the EO treats PQC like a checklist item when it’s actually a teardown and rebuild of our cryptographic foundation. Historically, the federal government has led cryptographic transitions by creating market demand and demonstrating feasibility, not by setting distant mandates. When the government moved to AES or adopted Suite B algorithms, it drove adoption through procurement pressure and early implementation.

Meanwhile, allies like the UK and Germany are taking this traditional approach with PQC. The UK’s National Cyber Security Centre has published detailed migration timelines and will launch a pilot program to certify consultancy firms that provide PQC migration support to organizations. Germany’s Federal Office for Information Security has been leading in co-developing standards and demonstrating early government adoption. They’re creating market pull through demonstrated feasibility, not regulatory deadlines that may prove unenforceable.

Beyond cryptography, the EO does introduce some concrete requirements, though these represent a mixed bag of genuine progress and missed opportunities.

The EO also tasks NIST with updating key frameworks and calls for AI-specific vulnerability coordination. All valuable work. But notably absent: any requirement for agencies to adopt, implement, or report on these updated frameworks.

One genuinely new addition is the IoT Cyber Trust Mark requirement: by January 2027, federal agencies must require vendors of consumer IoT products to carry the labeling. This represents concrete procurement leverage, though it’s limited to a narrow product category.

These mixed signals, technical infrastructure development alongside enforcement retreat, reflect a broader pattern that undermines the federal government’s cybersecurity leadership.

As we’ve explored in previous discussions of AI’s impact on compliance, this shift toward automated policy interpretation and enforcement represents a broader transformation in how expertise flows through complex systems, but only when the underlying mandates exist to make that automation meaningful.

We’re building this sophisticated machine-readable infrastructure just as the enforcement mechanisms that would make it meaningful are being stripped away. It’s like having a perfectly engineered sports car but removing the requirement to actually drive anywhere.

The Infrastructure Is Ready. The Mandate Isn’t.

Federal cybersecurity policy shapes vendor behavior, influences state and local government standards, and signals U.S. priorities to international partners. Without centralized mandates, vendors receive mixed signals. Agencies implement inconsistently. Meanwhile, international partners advance with clearer timelines and stronger enforcement. The U.S. risks ceding leadership in areas where it built the foundational standards, just as adversaries accelerate their own capabilities.

The United States has built remarkable cybersecurity infrastructure. OSCAL for automated compliance, frameworks for secure software development, and draft PQC standards for cryptographic transition all represent genuine achievements. But the June 2025 amendments represent a retreat from the leadership needed to activate this infrastructure.

We have the tooling, standards, and momentum, but we’ve paused at the moment we needed to press forward. In the face of growing threats and global urgency, discretion is not resilience.

We’ve codified trust, but stopped requiring it, leaving security to agency discretion instead of institutional design. That’s not a strategy. It’s a hope. And hope is not a security control.

Why CP and CPSs Matter More Than You Think

I’ve been in the PKI space for a long time, and I’ll be honest, digging through Certificate Policies (CPs) and Certification Practice Statements (CPSs) is far from my favorite task. But as tedious as they can be, these documents serve real, high-value purposes. When you approach them thoughtfully, the time you invest is anything but wasted.

What a CPS Is For

Beyond satisfying checkbox compliance, a solid CPS should:

  • Build trust by showing relying parties how the CA actually operates.
  • Guide subscribers by spelling out exactly what is required to obtain a certificate.
  • Clarify formats by describing certificate profiles, CRLs, and OCSP responses so relying parties know what to expect.
  • Enable oversight by giving auditors, root store programs, and researchers a baseline to compare against real-world issuance.

If a CPS fails at any of these, it fails in its primary mission.

Know Your Audience

A CPS is not just for auditors. It must serve subscribers who need to understand their obligations, relying parties weighing whether to trust a certificate, and developers, security researchers, and root store operators evaluating compliance and interoperability.

The best documents speak to all of these readers in clear, plain language without burying key points under mountains of boilerplate.

A useful parallel is privacy policies or terms of service documents. Some are written like dense legal contracts, full of cross-references and jargon. Others aim for informed consent and use plain language to help readers understand what they are agreeing to. CPs and CPSs should follow that second model.

Good Examples Do Exist

If you’re looking for CPS documents that get the basics right, Google Trust Services and Fastly are two strong models:

There are many ways to evaluate a CPS, but given the goals of these documents, fundamental tests of “good” would certainly include:

  1. Scope clarity: Is it obvious which root certificates the CPS covers?
  2. Profile fidelity: Could a reader recreate reference certificates that match what the CA actually issues?

Most CPSs fail even these basic checks. Google and Fastly pass, and their structure makes independent validation relatively straightforward. Their documentation is not just accurate, it is structured to support validation, monitoring, and trust.

Where Reality Falls Short

Unfortunately, most CPSs today don’t meet even baseline expectations. Many lack clear scope. Many don’t describe what the issued certificates will look like  in a way that can be independently verified. Some fail to align with basics like RFC 3647, the framework they are supposed to follow.

Worse still, many CPS documents fail to discuss how or if they meet requirements they claim compliance with. That includes not just root program expectations, but also standards like:

  • Server Certificate Baseline Requirements
  • S/MIME Baseline Requirements
  • Network and Certificate System Security Requirements

These documents may not need to replicate every technical detail, but they should objectively demonstrate awareness of and alignment with these core expectations. Without that, it’s difficult to expect trust from relying parties, browsers, or anyone else depending on the CA’s integrity.

Even more concerning, many CPS documents don’t fully reflect the requirements of the root programs that grant them inclusion:

The Cost of Getting It Wrong

These failures are not theoretical. They have led to real-world consequences.

Take Bug 1962829, for example, a recent incident involving Microsoft PKI Services. “A typo” introduced during a CPS revision misstated the presence of the keyEncipherment bit in some certificates. The error made it through publication and multiple reviews, even as millions of certificates were issued under a document that contradicted actual practice.

The result? Distrust risks, revocation discussions, and a prolonged, public investigation.

The Microsoft incident reveals a deeper problem, CAs that lack proper automation between their documented policies and actual certificate issuance. This wasn’t just a documentation error, it exposed the absence of systems that would automatically catch such discrepancies before millions of certificates were issued under incorrect policies.

This isn’t an isolated case. CP and CPS “drift” from actual practices has played a role in many other compliance failures and trust decisions. This post discusses CA distrust and misissuance due to CP or CPS not matching observable reality is certainly a common factor.

Accuracy Is Non-Negotiable

Some voices in the ecosystem now suggest that when a CPS is discovered to be wrong, the answer is simply to patch the document retroactively and move on.  This confirms what I have said for ages, too many CAs want the easy way out, patching documents after problems surface rather than investing in the automation and processes needed to prevent mismatches in the first place. 

That approach guts the very purpose of a CPS. Making it easier for CAs to violate their commitments creates perverse incentives to avoid investing in proper compliance infrastructure.

Accountability disappears if a CA can quietly “fix” its promises after issuance. Audits lose meaning because the baseline keeps shifting. Relying-party trust erodes the moment documentation no longer reflects observable reality.

A CPS must be written by people who understand the CA’s actual issuance flow. It must be updated in lock-step with code and operational changes. And it must be amended before new types of certificates are issued. Anything less turns it into useless marketing fluff.

Make the Document Earn Its Keep

Treat the CPS as a living contract:

  • Write it in plain language that every audience can parse.
  • Tie it directly to automated linting so profile deviations are caught before issuance. Good automation makes policy violations nearly impossible; without it, even simple typos can lead to massive compliance failures.
  • Publish all historical versions so the version details in the document are obvious and auditable. Better yet, maintain CPS documents in a public git repository with markdown versions that make change history transparent and machine-readable.
  • Run every operational change through a policy-impact checklist before it reaches production.

If you expect others to trust your certificates, your public documentation must prove you deserve that trust. Done right, a CPS is one of the strongest signals of a CA’s competence and professionalism. Done wrong, or patched after the fact, it is worse than useless.

Root programs need to spend time documenting the minimum criteria that these documents must meet. Clear, measurable standards would give CAs concrete targets and make enforcement consistent across the ecosystem. Root programs that tolerate retroactive fixes inadvertently encourage CAs to cut corners on the systems and processes that would prevent these problems entirely.

CAs, meanwhile, need to ask themselves hard questions: Can someone unfamiliar with internal operations use your CPS to accomplish the goals outlined in this post? Can they understand your certificate profiles, validation procedures, and operational commitments without insider knowledge?

More importantly, CAs must design their processes around ensuring these documents are always accurate and up to date. This means implementing testing to verify that documentation actually matches reality, not just hoping it does.

The Bottom Line

CPS documents matter far more than most people think. They are not busywork. They are the public guarantee that a CA knows what it is doing and is willing to stand behind it, in advance, in writing, and in full view of the ecosystem.