Category Archives: AI

The Illusion of Constant Acceleration

Spend enough time around AI right now and you start to get the feeling that everything is speeding up, all the time.

Every week there is a new model, a new capability, a new claim that some industry is about to be remade. It starts to feel like rapid change is just the new baseline. Like history has bent into a permanently steeper slope.

I do not think that is right.

What I think is closer to the truth is that we have gotten used to confusing motion with progress, and delay with inevitability. Some things are moving very quickly. Others are barely moving at all. We treat the former as inevitable and the latter as unavoidable.

Neither is true.

My father was born in 1942. That is not ancient history. When he was born, there was still a lot of basic infrastructure left to build.

Within a little more than a decade, nonstop transcontinental passenger air service became viable. Less than eight years after that, a human entered space. Eight years later, people were walking on the Moon.

That is a staggering amount of change in a very short period of time.

In one person’s early life, we went from making coast-to-coast air travel practical to landing human beings on another celestial body. Not as a thought experiment. Not as a roadmap. We just did it.

And it was not only aerospace. The Golden Gate Bridge was built in about four years. The first transcontinental railroad was completed in about six. These were massive physical undertakings that reshaped how people moved and how economies functioned, delivered on timelines that would feel almost implausible now.

The easy way to dismiss this is to say that software is fast and physical infrastructure is slow. That if AI looks fast and transit looks slow, that is just how the world works.

But that does not really hold up.

Ukraine did not build its drone ecosystem on leisurely timelines. Tesla compressed what many assumed would be a slow industrial transition into something the rest of the auto industry had to react to. When something actually matters, physical systems move. Supply chains get reorganized. Tradeoffs get made. Bureaucracies get bent. Talent concentrates. People stop explaining why something is hard and start figuring out how to get it done.

That is part of what makes Artemis interesting.

This is not a criticism of Artemis. It is an ambitious and serious effort. But it is also a reminder that progress is not self-sustaining. Apollo is often remembered as a triumph of technology, but it was just as much a triumph of focus, alignment, and urgency. Artemis reminds us that those things matter just as much as the rockets do.

There is another force that shows up in systems like this.

At Google, there was a name for it: slime mold.

It is what happens when layers of process, approvals, coordination costs, and local incentives build up over time until forward motion gets harder even when nobody involved is being unreasonable. Everything makes sense on its own. The system just moves more slowly.

Technology policy has its own versions of slime mold.

We saw it in the crypto wars, when policymakers convinced themselves that math could be slowed down with policy, as if cryptographic reality were open to negotiation. It was not. What that produced was not real control. It produced friction, workarounds, and the illusion of governance.

You can see the same instinct showing up again in parts of the conversation around AI. When institutions feel outpaced, they respond with process. That instinct is understandable, but it rarely solves the problem. You do not make systems safer by pretending inevitabilities are optional. You make them safer by building the infrastructure, incentives, and accountability needed to deal with what is actually happening.

But that is not how we tend to think about progress.

We talk about technological achievement as if it were mostly about invention, as if once something has been demonstrated it remains latent in society, ready to be called back into service whenever we need it.

That is not how any of this works.

The ability to do ambitious things quickly depends on organizational memory, industrial capacity, political alignment, tolerance for risk, and a culture that still expects big things to happen on human timescales.

Lose enough of that, and even getting back to where you once were becomes hard.

You can see it in infrastructure. Projects that once would have been treated as urgent now take decades, often in fragments so small that earlier generations would have treated them as preliminary milestones. Over time, that changes expectations. Slowness starts to look like responsibility. Ambition starts to sound naive.

That is the trap.

The problem is not just that progress slows. It is that people get used to it. What would once have looked like drift starts to look like process. What would once have sounded like an excuse starts to sound like maturity.

Meanwhile, in domains where urgency and incentives line up, things still move very quickly. ChatGPT was released publicly in late 2022. In a few years, AI went from something most people associated with research labs to something embedded in everyday workflows, products, and policy debates.

AI did not prove that everything is accelerating.

It proved that when enough capability, capital, and attention line up, rapid change is still possible.

That is the point.

The world is not uniformly speeding up. Some parts of it are. Others are not. And the difference has less to do with atoms versus bits than with whether we have decided something actually matters.

That ought to make us a little less complacent.

People like to tell themselves that once a technology is important enough, the rest somehow sorts itself out. The problems get solved. The risks get managed. The surrounding systems catch up.

History does not really support that.

Things were only all right in the past because people worked very hard to make them all right. The systems that made aviation safe, that made infrastructure dependable, that made computing usable in high-trust environments, none of that appeared on its own.

The same will be true here.

If we want AI to be safe, trustworthy, and broadly useful, that will not happen as a side effect of capability gains. Security will not emerge on its own. Governance will not emerge on its own. The infrastructure needed to make these systems worthy of dependence will not emerge on its own.

Those things only happen when people decide they matter.

That is the real problem with the idea that everything is accelerating. It makes it easy to believe that progress takes care of itself.

It does not.

Progress happens when people decide it needs to, and then do the work.

Confidential Computing’s Inconvenient Truth

This is part of a series on confidential computing. See also: Confidential Computing: What It Is, What It Isn’t, and How to Think About It for practical deployment guidance, and Why Nobody Can Verify What Booted Your Server for the attestation infrastructure gap. Two companion reference documents provide the evidence base: the TEE Vulnerability Taxonomy and TPM Attestation and PCR Verification: The Infrastructure Gap.

Confidential computing has a vulnerability record that grows every year, an attestation infrastructure that does not work at scale, and a hardware root of trust with a demonstrated shelf life. This piece explains why.

I want to be clear about where I stand before cataloging problems. I believe in this technology. What Signal has done with Private Contact Discovery and Sealed Sender using SGX enclaves, building systems where even Signal’s own servers cannot see who is contacting whom, is exactly the kind of architecture that confidential computing makes possible. Apple’s Private Cloud Compute takes the model further. Every production build is published to a transparency log, user devices will only communicate with nodes whose attested measurements match the log, and Apple released a virtual research environment so anyone can verify the claims independently. Moxie Marlinspike’s Confer applies the same idea to AI inference, with all processing inside a TEE and remote attestation so the service provider never has access to your conversations. These are real systems delivering real privacy guarantees that would be hard to achieve any other way.

More broadly, TEEs make systems more verifiable. Instead of asking users to take on faith that a service handles their data correctly, the service can prove it through attestation. I wrote earlier about attestation as the MFA for machines and workloads, and I explored the same idea in 2022 in the context of certificate authorities. If the CA runs open-source software on attesting hardware with reproducible builds, you can verify its behavior rather than trusting an annual audit. That shift, from asserted trust to verifiable trust, is genuinely important, and confidential computing is what makes it possible.

But “the direction is right” is not the same as “the current state is adequate.” We should not let perfection become the enemy of the good. This technology delivers real value today. But we also cannot afford to mistake the current state for the desired end state. Getting to where this technology needs to be requires seeing clearly where it actually is. That is what this piece is about.

So why does the record look the way it does? The answer is not “the implementations are buggy.” The answer is structural. These technologies were designed for threat models that do not match how they are being deployed. Smart cards and HSMs were physically discrete devices with clear trust boundaries. TPMs were designed for boot integrity on enterprise desktops. Intel SGX was designed for desktop DRM. Each was repurposed for the cloud because the technology existed and the market needed something now. The repurposing created systematic security gaps that the research community has spent a decade documenting and the market has spent a decade deploying through.

In March 2025, I published a technical reference on security hardware and an in-depth companion document that categorized how these technologies fail. One of those failure categories was “Misuse Issues”: vulnerabilities that occur when security technology is adopted beyond its original design. A year later, with TDXRay reconstructing LLM prompts from inside encrypted VMs, TEE.Fail extracting attestation keys with a $1,000 device, and the SGX Global Wrapping Key extracted from hardware fuses, that observation warrants a much fuller treatment.

Timeline

Year | Event | Category
1968 | Smart card patents (Dethloff, Moreno). Special-purpose computers in tamper-resistant packages. The original TEE. | Hardware TEE
1980s | IBM secure coprocessors for banking. US government funds kernelized secure OS research. | Hardware TEE
1996 | nCipher founded. nShield HSMs with CodeSafe: custom application code inside tamper-resistant hardware. | Hardware TEE
1998 | IBM 4758 commercially available. Arbitrary code execution inside tamper-responding enclosure. FIPS 140-1 Level 4. | Hardware TEE
2003 | TCG founded, TPM standardized. Designed for boot integrity from ring -x. Hardware root of trust, measurement chains, attestation concepts established. | Institutional
2006 | AWS launches EC2. Public cloud computing begins. Workloads move to shared infrastructure owned by someone else. | Cloud
2006 | BitLocker ships with TPM support. TPMs reach millions of enterprise devices. Reference value infrastructure never materializes. | Hardware TEE
2008-2010 | Cloud goes mainstream. Azure (2010), GCP (2008), OpenStack (2010). Multi-tenant shared infrastructure becomes the default enterprise compute model. | Cloud
2012 | AlexNet wins ImageNet. Deep learning proven at scale on GPUs. AI workloads begin moving to cloud GPU infrastructure. | AI
2013 | Apple Secure Enclave Processor (iPhone 5s). Physically separate processor on SoC. First mass-market TEE. Invisible to users. | Hardware TEE
2015 | Intel SGX (Skylake). Enclaves inside the CPU. Designed for desktop DRM: single-tenant threat model. Cloud providers begin evaluating for multi-tenant use. | CPU TEE
2016 | AMD SEV. VM-level memory encryption. First CPU TEE designed with virtualization in mind. | CPU TEE
2017 | Transformer architecture published (“Attention Is All You Need”). Foundation for the model scale that will drive confidential computing demand. | AI
2017 | First SGX side-channel attacks. Cache-timing, Spectre adaptation. Desktop design meets multi-tenant reality. | Vulnerability
2018 | Foreshadow (L1TF) reads arbitrary SGX memory. SEVered remaps SEV guest pages. Desktop-to-cloud threat model gap exploited. | Vulnerability
2019 | Confidential Computing Consortium founded (Google, Microsoft, IBM, Intel, Linux Foundation). Repurposing becomes official strategy. | Institutional
2019 | Plundervolt, ZombieLoad, RIDL. Three distinct attack classes against SGX in one year. | Vulnerability
2020 | GPT-3 (175B parameters). Model weights become billion-dollar assets. Protecting weights on shared infrastructure becomes a business requirement. | AI
2020 | AWS Nitro Enclaves. Purpose-built for cloud, not repurposed from desktop. The exception to the pattern. | Cloud
2020 | AMD SEV-SNP, Intel TDX announced. VM-level TEEs designed for cloud but still sharing microarchitectural resources. Azure/GCP ship confidential VMs with vTPMs. | Cloud
2021 | Intel deprecates SGX on consumer CPUs (11th/12th gen Core). Desktop DRM cannot sustain the technology alone. | CPU TEE
2022 | ChatGPT launches (Nov). AI goes mainstream. Every enterprise begins evaluating LLM deployment on cloud infrastructure. | AI
2022 | ÆPIC Leak, SGX.Fail. Vulnerable platforms remain in TRUSTED attestation state months after disclosure. | Vulnerability
2023 | GPT-4, Llama 2, Claude 2. Foundation model race accelerates. EU AI Act passed. | AI
2023 | Downfall (SGX), CacheWarp (SEV-SNP). CacheWarp is first software-based attack defeating SEV-SNP integrity. NVIDIA H100 confidential GPU ships. | Vulnerability
2024 | Confidential AI goes mainstream. Azure, GCP, AWS all position confidential computing for AI. TDXdown and Heckler attacks hit TDX. HyperTheft extracts model weights via ciphertext side channels. | AI / Vulnerability
2025 Feb | Google finds insecure hash in AMD microcode signature validation (CVE-2024-56161). Malicious microcode loadable under SEV-SNP. | Vulnerability
2025 May | Google announces confidential GKE nodes with NVIDIA H100 GPUs. Confidential AI training and inference on GPU clusters. | AI
2025 Oct | TEE.Fail. $1K DDR5 bus interposer extracts attestation keys from Intel TDX and AMD SEV-SNP. Attestation forgery demonstrated. | Vulnerability
2025 Dec | IDC survey: 75% of organizations adopting confidential computing, 84% cite attestation validation as top challenge. Gartner predicts 75% of untrusted-infra processing uses CC by 2029. | Institutional
2025 Dec | IETF RATS CoRIM reaches draft-09. Reference value format standards mature. Vendor adoption of publishing measurements remains minimal. | Institutional
2026 Jan | StackWarp (CVE-2025-29943). Stack Engine synchronization bug enables deterministic stack pointer manipulation inside SEV-SNP guest via MSR toggling. Affects AMD Zen 1 through Zen 5. USENIX Security 2026. | Vulnerability
2026 | TDXRay (IEEE S&P 2026). Reconstructs LLM user prompts word-for-word from encrypted TDX VMs by monitoring tokenizer cache access patterns. No crypto broken. UC San Diego, CISPA, Google. | AI / Vulnerability
2026 Mar | NVIDIA publishes zero-trust AI factory reference architecture. CPU TEE + confidential GPU + CoCo + KBS. Model weights encrypted until attestation passes. | AI
2026 Mar 31 | Ermolov extracts SGX Global Wrapping Key from Intel Gemini Lake. Root key extraction via arbitrary microcode. Unpatchable (hardware fuses). | Vulnerability

Trusted Platform Modules: Boot Integrity and System State

The idea that hardware should measure and attest to software integrity goes back to the late 1990s. The Trusted Computing Group, formed in 2003, standardized the Trusted Platform Module, a discrete chip that stores cryptographic keys and maintains Platform Configuration Registers recording the boot chain as a sequence of hash measurements.

The TPM was designed to solve a specific problem: bootloader-level attacks. Rootkits and bootkits that compromised the system before the OS loaded were invisible to any software-based security tool. The TPM sat below the OS, measuring each boot stage before execution. It could answer a question that no operating system could answer about itself: did this machine boot the software it was supposed to boot?

Each boot stage measures the next before handing off execution. The measurements are extended into PCRs using a one-way hash chain: PCR_new = Hash(PCR_old || measurement). The TPM can produce a signed quote of its PCR values, and a remote verifier can check whether the system booted the expected software stack.
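
To make the mechanics concrete, here is a minimal Python sketch of that extend-and-replay model. The boot stage names are invented for illustration, and a real verifier works from the TPM event log and a signed quote rather than an in-memory variable.

```python
import hashlib

def extend(pcr: bytes, measurement: bytes) -> bytes:
    # TPM PCR extend: PCR_new = Hash(PCR_old || measurement)
    return hashlib.sha256(pcr + measurement).digest()

# Hypothetical boot chain; each stage is measured before it executes.
boot_stages = [b"firmware-v1.2", b"bootloader-v2.0", b"kernel-6.8"]

pcr = bytes(32)  # PCRs reset to all zeros at power-on
for stage in boot_stages:
    pcr = extend(pcr, hashlib.sha256(stage).digest())

# The verifier replays the measurements it expects and compares the result
# against the PCR value reported in the signed quote.
expected = bytes(32)
for stage in boot_stages:
    expected = extend(expected, hashlib.sha256(stage).digest())

print("Booted the expected stack:", pcr == expected)
```

The comparison itself is trivial once you know the expected measurements. The hard part, as the next paragraphs explain, is knowing what they should be.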

TPMs shipped in millions of enterprise laptops and servers. BitLocker used TPM-sealed keys for disk encryption. Linux distributions added measured boot support. But TPMs never achieved the broad security impact their designers envisioned. The problem was practical: to verify a TPM quote, you need to know what the correct PCR values should be, and nobody built the infrastructure to distribute and maintain those reference values at scale.

The TPM could tell you what booted. It could not tell you whether what booted was good.

What TPMs did accomplish was laying the conceptual groundwork for everything that followed. Hardware root of trust, measurement chains, remote attestation, platform state quotes. All of this vocabulary originated in the TPM ecosystem. Modern CPU TEEs inherited these concepts even as their architectures diverged significantly from the TPM model.

Hardware-Isolated Execution: Older Than You Think

Running code inside a tamper-resistant hardware boundary did not start with Intel or Apple. It started with smart cards.

Smart cards emerged in the late 1960s as special-purpose computers embedded in plastic cards. By the 1980s, they were executing cryptographic operations in banking, telecommunications, and government ID. A smart card is a tiny computer with its own processor, memory, and operating system, running inside a tamper-resistant package. That is a trusted execution environment by any reasonable definition, even if nobody called it that at the time.

HSMs extended the same concept to server-class computing. IBM’s 4758, commercially available in the late 1990s, provided a tamper-responding enclosure with its own processor, battery-backed memory, and secure boot chain. If someone tried to open the case, drill through it, or expose it to extreme temperatures, the device would zeroize its keys. The 4758 ran arbitrary code inside the boundary.

nCipher (founded 1996, later acquired by Thales) took this further with CodeSafe on the nShield HSM line, a development framework for deploying custom applications inside the HSM. This was general-purpose computation inside a hardware trust boundary, exactly the model that SGX would later attempt to replicate in silicon without a separate physical device. I spent years working with these HSMs. They ran custom signing logic, policy engines, tokenization routines, and key derivation functions, all inside the tamper-resistant module where the host OS could not observe or interfere.

The difference between these earlier systems and modern confidential computing is not the concept. It is the integration point. Smart cards and HSMs are discrete devices with well-defined physical boundaries. You can see the trust boundary. You can hold it in your hand. SGX, TDX, and SEV moved the trust boundary inside the CPU itself, eliminating the separate device but also eliminating the physical clarity. When the trust boundary is a set of microarchitectural state bits inside a processor with billions of transistors and a microcode layer updated quarterly, the attack surface becomes much larger.

Apple’s Secure Enclave Processor, introduced with the iPhone 5s in 2013, sat between these two models. It was a physically separate processor on the SoC with its own encrypted memory, dedicated to protecting biometric data and cryptographic keys. Even a fully compromised application processor with root privileges could not reach the Secure Enclave’s memory.

The SEP succeeded with consumers, where HSMs had stayed confined to data centers, for two reasons. It was invisible to users. Nobody configured it or provisioned it. And it protected something users cared about: their fingerprints and their money. The security was a means to a consumer feature, not a product in itself.

Intel SGX: Designed for the Desktop

Intel SGX, introduced with Skylake processors in 2015, brought the enclave concept to general-purpose computing. Instead of a separate processor, SGX created isolated memory regions within the main CPU. Code and data inside an enclave are encrypted in memory and protected from all other software on the system. The enclave’s measurement (MRENCLAVE) is a hash of exactly what was loaded, making attestation straightforward. One binary, one deterministic hash.

SGX was designed for the desktop. Its primary use cases were single-tenant scenarios like content protection, DRM key management, and Ultra HD Blu-ray playback. The threat model is clear. One machine, one user, and the enclave protects the content owner’s code from that user.

This is a single-tenant threat model. The attacker is the machine owner. There is no hypervisor. There are no co-tenant workloads competing for shared microarchitectural resources. The side-channel attack surface exists, but the economic incentive is limited. The attacker gains access to one DRM key or one media stream.

Enterprise adoption beyond DRM was limited. SGX enclaves had severe memory constraints (initially 128MB). Programming for SGX required partitioning applications into trusted and untrusted components. Intel deprecated SGX from consumer processors in 2021. The desktop DRM use case was not enough to sustain the technology.

Cloud Adoption and the Threat Model Mismatch

The cloud introduced a fundamentally different threat model, and this is where the problems began.

In the desktop DRM model, you protect your code from one user on one machine. In the cloud, you protect your code and data from the infrastructure provider, co-tenant workloads, the hypervisor, firmware, and anyone with physical access to a shared data center. The provider controls the hardware, the hypervisor, the firmware, the physical facility, and the scheduling of workloads across shared CPU cores.

The industry took technologies designed for the desktop single-tenant model and applied them to this multi-tenant cloud model. The architectural mismatch opened attack surfaces that the original designs did not anticipate.

SGX on a desktop shares caches, branch predictors, execution ports, and power delivery with the enclave owner’s own code. On a cloud server, those same resources are shared with co-tenant workloads controlled by different parties, each potentially adversarial. Cache-timing attacks that were theoretical on a desktop became practical in the cloud because the attacker could run arbitrary code on the same physical core. The side-channel catalog that accumulated against SGX from 2017 onward was not a series of implementation bugs. It was a consequence of deploying a single-tenant design in a multi-tenant environment.

AMD SEV and Intel TDX were designed with the cloud threat model more explicitly in mind, protecting entire virtual machines rather than individual enclaves. But they still share fundamental hardware resources with the hypervisor and co-tenants. CPU caches, memory buses, power delivery, and microarchitectural scheduling state. CacheWarp, StackWarp, WeSee, and Heckler all exploit the interfaces between the confidential VM and the hypervisor that manages it.

Virtual TPMs are another instance of the same pattern. Physical TPMs provide hardware-rooted trust because they are discrete chips with their own silicon. A vTPM is software running inside the hypervisor or a confidential VM. Cloud providers adopted vTPMs because provisioning hardware TPMs per VM is impractical at scale. The vTPM’s trust root is the software stack that hosts it. If the hypervisor is compromised, the vTPM is compromised.

The Repurposing Pattern

This is a recurring pattern in security technology, and it is one I have watched play out multiple times in my career. Build X for threat model Y, then repurpose X for threat model Z because X already exists and deploying it is cheaper than building something new.

SMS was designed for person-to-person messaging. It was repurposed for two-factor authentication because every phone could receive an SMS. The threat model assumed the cellular network was trusted. SIM swapping, SS7 interception, and malware-based SMS capture exploited the gap between “messaging channel” and “authentication channel.” NIST deprecated SMS-based 2FA. SMS OTP is still everywhere because deployment inertia exceeds the security community’s ability to move the market.

SSL was designed for securing web browsing sessions. It was repurposed for API authentication, IoT device communication, email encryption, and VPN tunneling. Each repurposing exposed assumptions in the original design that did not hold in the new context. The ecosystem spent two decades fixing the gaps through Certificate Transparency, HSTS, and progressively stricter CA/Browser Forum requirements. I was part of that ecosystem. The fixes were not inevitable. They required sustained institutional effort.

TPMs were designed for boot integrity on enterprise desktops. They were repurposed as vTPMs for cloud VM attestation, trading hardware isolation for scalability. SGX was designed for desktop DRM. It was repurposed for cloud confidential computing, trading single-tenant simplicity for multi-tenant attack surface. Each repurposing followed the same logic. The technology existed, the market needed something, and “available now with known limitations” beat “purpose-built but years away.”

The repurposed technology works well enough to create adoption. The adoption creates dependency. The dependency makes it difficult to replace even after the threat model gap is well understood. And the security research community spends years documenting the consequences while the market continues deploying.

AWS took a different path with Nitro Enclaves. Rather than building on CPU instruction extensions designed for desktops, Nitro Enclaves are isolated virtual machines on a purpose-built hypervisor with no persistent storage, no network access, and no access from the host. The Nitro model sidestepped many of the shared-resource problems because the hypervisor is minimal and the enclave has dedicated resources. The measurement model is clean. One image, one deterministic measurement.

Azure and GCP followed with confidential VM offerings on AMD SEV-SNP and Intel TDX. Google has positioned confidential computing as foundational to AI, expanding support across Confidential VMs, Confidential GKE Nodes, and Confidential Space with Intel TDX and NVIDIA H100 GPUs.

NVIDIA entered with confidential GPU support on H100 and Blackwell architectures. Their reference architecture for “zero-trust AI factories” combines CPU TEEs with confidential GPUs, Confidential Containers via Kata, and a Key Broker Service that releases model decryption keys only after remote attestation succeeds. Model weights remain encrypted until the hardware proves the enclave is genuine. This positions confidential computing as IP protection for model owners deploying on infrastructure they do not control.
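
A minimal sketch of that key-release flow, with invented names (EXPECTED_MEASUREMENT, release_key) rather than any vendor’s actual API:

```python
import hashlib
import hmac
import os

# Measurement of the approved inference image; in practice this comes from a
# reproducible build pipeline, not a hard-coded string.
EXPECTED_MEASUREMENT = hashlib.sha256(b"approved-inference-image-v3").hexdigest()
MODEL_KEY = os.urandom(32)  # key that wraps the encrypted model weights

def release_key(evidence: dict) -> bytes | None:
    # Real evidence is a signed attestation report whose signature chain must
    # be validated first; this sketch only checks the reported measurement.
    reported = evidence.get("measurement", "")
    if hmac.compare_digest(reported, EXPECTED_MEASUREMENT):
        return MODEL_KEY
    return None

print(release_key({"measurement": EXPECTED_MEASUREMENT}) is not None)  # True
print(release_key({"measurement": "anything-else"}) is None)           # True
```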

Intel launched Trust Authority as a SaaS attestation service independent of the cloud provider. If the cloud provider both runs your TEE and verifies its attestation, you are still trusting the provider. An independent verifier breaks that circularity.

By 2025, every major hardware vendor and every major cloud provider had a confidential computing offering. The question was no longer whether the technology existed. It was whether anyone could make it work at scale.

Why It Never Hit Mass Adoption

Despite the investment, confidential computing did not achieve mass adoption through the SGX era or the first wave of confidential VMs. Several problems compounded.

Attestation is hard to operationalize. The verification step requires infrastructure that most organizations do not have and that the ecosystem has not built. I wrote about this problem in detail in Why Nobody Can Verify What Booted Your Server. The short version: 84% of IT leaders cite attestation validation as their top adoption challenge.

The performance overhead was non-trivial in early implementations. SGX had significant costs from enclave transitions and limited memory. Confidential VMs with SEV-SNP and TDX reduced this to single-digit percentage overhead for most workloads, but the perception of “secure means slow” persisted.

The developer experience was poor. SGX required application partitioning and a specialized SDK. Confidential VMs improved this by running unmodified applications, but attestation integration, key management, and secret provisioning still required specialized knowledge. As of early 2026, deploying a confidential workload still requires expertise that most teams do not have.

The vulnerability narrative undermined confidence. The side-channel attacks against SGX were not random bugs. They were a predictable consequence of deploying a single-tenant design in a multi-tenant environment. Each new attack generated press coverage and reinforced the perception that the technology could not deliver. Security teams found a long list of CVEs, academic attacks, and “known limitations” that made the risk-benefit calculus uncertain.

And without AI, the use cases were niche. DRM, financial services MPC, healthcare analytics, sovereign cloud compliance. Real markets, but not mass markets. Not enough volume to drive the ecosystem maturity needed for broad adoption.

The Vulnerability Record

The side-channel attacks did not stop with SGX’s partial deprecation. They followed the technology into the cloud.

Intel TDX still shares microarchitectural resources with the hypervisor. TDXdown demonstrated single-stepping and instruction counting against TDX trust domains. PortPrint showed that CPU port contention reveals distinctive execution signatures across SGX, TDX, and SEV alike, and because it exploits instruction-level parallelism rather than thread-level parallelism, disabling SMT does not help.

The attack that most directly undermines the “Private AI” narrative is TDXRay (IEEE S&P 2026, UC San Diego, CISPA, Google). TDXRay produces cache-line-granular memory access traces of unmodified, encrypted TDX VMs. The researchers reconstructed user prompts word-for-word from a confidential LLM inference session. No cryptography was broken. The attack works because standard LLM tokenizers traverse a hash map to find token IDs, and that traversal creates a memory access pattern observable at 64-byte cache-line resolution. The host watches which hash map nodes the tokenizer visits and stitches the prompt back together. The encryption protects the data in memory. The computation pattern leaks it through the cache.
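
A toy sketch, nowhere near the real attack in sophistication, shows why the access pattern alone is enough. The vocabulary, bucket count, and “observed lines” below are all invented; TDXRay works from actual cache-line traces of an actual tokenizer.

```python
# Each token lookup touches a bucket; pretend each bucket is one cache line.
vocab = ["hello", "confidential", "computing", "world"]
buckets = {tok: hash(tok) % 16 for tok in vocab}

prompt = ["hello", "confidential", "computing"]
observed_lines = [buckets[tok] for tok in prompt]  # what the host can "see"

# The observer knows the vocabulary layout, so it inverts the trace.
line_to_tokens: dict[int, list[str]] = {}
for tok, line in buckets.items():
    line_to_tokens.setdefault(line, []).append(tok)

recovered = [line_to_tokens[line] for line in observed_lines]
print(recovered)  # candidate tokens per position, with no decryption involved
```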

TEE.Fail (ACM CCS 2025) is the most dramatic recent finding. Researchers built a $1,000 physical interposer that monitors the DDR5 memory bus and extracted ECDSA attestation keys from Intel’s Provisioning Certification Enclave, the keys that underpin the entire SGX and TDX attestation chain. Attestation can be forged. The attack requires physical access, which limits applicability. But cloud providers have physical access to every server they operate.

On March 31, 2026, Mark Ermolov announced the extraction of the SGX Global Wrapping Key from Intel Gemini Lake. This is not a side-channel leak. It is extraction of the root cryptographic key that protects SGX sealing operations. The key wraps Fuse Key 0, which means the entire key hierarchy rooted in hardware fuses is compromised for that platform generation. No microcode update can change fuses. Ermolov’s assessment: “its fundamental break means that the HW Root of Trust approach is not unshakable.”

Gemini Lake is a low-power consumer chip, not a Xeon server processor. The same attack has not been demonstrated on current server-class implementations. But the research trajectory is clear. Each generation of hardware trust primitives has been broken by the next generation of hardware security research.

Why the Pattern Persists: Five Broken Design Assumptions

The vulnerability record is not a collection of unrelated bugs. It is the predictable result of specific design assumptions that held in the original use cases but fail in the cloud and AI contexts where the technology is now deployed.

The attacker does not share physical hardware with the victim. SGX was designed for a desktop where one user runs one workload. In the cloud, co-tenants share CPU cores, caches, branch predictors, TLBs, execution ports, memory controllers, and power delivery. CacheWarp, StackWarp, and TDXRay all exploit resources that remain shared because complete resource partitioning would make the hardware unusable for general-purpose computing.

The platform owner is not the adversary. TPMs and early SGX assumed the platform owner was the user or a trusted IT department. In the cloud, the provider controls the hypervisor, firmware, BMC, physical facility, and scheduling. The interfaces between the TEE and the provider-controlled environment become the attack surface. WeSee, Heckler, and SEVered exploit these interfaces. TEE.Fail exploits the provider’s physical access to the memory bus.

The hardware root of trust is immutable. The attestation model depends on root keys being beyond the reach of software attacks. This assumption has been violated repeatedly. Ermolov reached fuse-based keys through microcode. Google’s CVE-2024-56161 found an insecure hash in AMD’s microcode signature validation. Sinkclose provided universal Ring -2 (SMM) escalation on AMD CPUs dating back to 2006.

Attestation verification is someone else’s problem. The specifications define how to produce attestation evidence but not how to verify it at scale. In the desktop DRM case, one binary produced one hash. In the cloud, PCR values are combinatorial across firmware, bootloader, kernel, and boot configuration.

Performance and security tradeoffs are invisible. On a desktop running DRM playback, a 5% performance hit is imperceptible. On a cloud server running AI inference at scale, every percentage point is cost. Disabling SMT, applying Downfall mitigations, and enabling inline encryption all have measurable overhead. Organizations are pressured to disable countermeasures for performance, reopening the attack surface.

These assumptions compound. The attacker shares hardware with a platform owner who is the adversary, exploiting a hardware root of trust that has a shelf life, verified through attestation infrastructure that does not exist at scale, with mitigations that carry performance costs the deployment context cannot absorb. No single patch addresses the compound effect. The assumptions are architectural, not implementational, which is why the vulnerability catalog grows despite continuous investment in mitigations.

The full root cause analysis with specific attack mappings for each assumption is in the companion TEE Vulnerability Taxonomy.

AI Changes the Calculus

All of the problems described above are real and unresolved. None of them are stopping adoption, because AI changed the calculus.

Model weights represent billions of dollars in training investment. A leaked foundation model is a competitive catastrophe. Running inference on shared cloud infrastructure means trusting the cloud provider not to inspect memory, which is the exact problem TEEs solve.

Training data includes regulated information across healthcare, financial services, and government. The EU AI Act, DORA, CCPA, and evolving federal privacy frameworks create compliance pressure that confidential computing directly addresses.

Multi-party AI scenarios (federated learning, collaborative training, secure inference on third-party data) require environments where no single party sees the complete dataset. TEEs provide the isolation boundary. This is why every major hyperscaler is building on confidential computing despite its known limitations.

But AI workloads amplify every weakness. GPU TEEs are new and their attestation models are immature. The attestation chain now spans CPU TEE, GPU TEE, and potentially TPM, each with different measurement schemes. AI workloads run on heterogeneous infrastructure across multiple cloud providers. And AI workloads are the most valuable targets for the attacks TEEs are vulnerable to. An attacker who extracts model weights via a side channel gets a multi-billion-dollar asset.

The market treats the different TEE designs (SGX, SEV, TDX, Nitro, NVIDIA confidential GPU) as interchangeable. They are not. Each has different properties and different security guarantees. Pretending otherwise is how organizations end up deploying against a threat model their chosen TEE was not designed to address.

The Trust Model Gap

The deeper issue is the gap between what is marketed and what is engineered.

Confidential computing marketing says “even the infrastructure provider cannot access your data.”

The engineering reality is different. The infrastructure provider cannot access your data through the software stack, but the hardware has known side-channel leakages that a sufficiently motivated attacker with privileged access can exploit. The attestation infrastructure that proves the TEE is genuine has structural limitations that make verification at scale dependent on each organization building its own reference value databases. And the hardware root of trust that anchors the entire system has a demonstrated shelf life.

This is a reasonable tradeoff for many threat models. Most organizations are defending against curious administrators, software-level compromise, and regulatory compliance requirements. Side-channel attacks require significant expertise and often physical access. But the market does not present it as a tradeoff.

What Needs to Happen

Closing the gap between the market narrative and the engineering reality requires work that is less exciting than launching new AI services.

Firmware and OS vendors need to publish reference measurements. The standards exist. CoRIM provides the format. RFC 9683 provides the framework. What is missing is the operational commitment to publish signed measurement values for every release. I wrote about the infrastructure that would need to exist and why none of it does yet.
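
For illustration, here is roughly what verification looks like once published reference values exist to compare against. The dictionary stands in for a signed, vendor-published manifest; the component names and digests are invented, and this is not the actual CoRIM encoding.

```python
import hashlib

# Stand-in for reference values a firmware vendor would publish and sign.
published_reference_values = {
    ("example-firmware", "1.2.3"): hashlib.sha256(b"example-firmware-1.2.3").hexdigest(),
    ("example-shim", "15.8"): hashlib.sha256(b"example-shim-15.8").hexdigest(),
}

def verify_component(name: str, version: str, reported_digest: str) -> bool:
    # If the vendor never published a value, there is nothing to compare
    # against; that missing row is the operational gap described above.
    expected = published_reference_values.get((name, version))
    return expected is not None and expected == reported_digest

print(verify_component("example-firmware", "1.2.3",
                       hashlib.sha256(b"example-firmware-1.2.3").hexdigest()))
```

The code is trivial. The commitment to publish and maintain the table for every release is the part that has not happened.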

The industry needs honest threat modeling that acknowledges what TEEs protect against and what they do not. TEE.Fail requires physical access, but cloud providers have physical access to every server. TDXdown requires a malicious hypervisor, which is precisely the threat TDX is designed to defend against. These are not edge cases. They are the threat model.

Attestation verification needs to become a commodity. Organizations should not need to build their own reference value databases, write their own event log parsers, and maintain their own golden image registries. This infrastructure should be as standardized and available as Certificate Transparency logs are for the web PKI.

And the security research community’s findings need to be incorporated into the market narrative rather than treated as exceptions. The pattern of continuous vulnerability discovery and mitigation is the normal state of the technology, not an aberration.

Confidential computing is directionally correct. The ability to verify what code is running on hardware you do not control, rather than simply trusting the operator, is a fundamental improvement in how we build systems. Signal proved the model works. The challenge is closing the gap between that promise and the current engineering reality.

The organizations deploying confidential computing for AI workloads today should understand what they are buying. Against the threats they are most likely to face (curious administrators, software-level compromise, regulatory compliance gaps, and unauthorized data access by the infrastructure operator), confidential computing is a significant improvement. Against a well-resourced attacker with physical access to the hardware, side-channel expertise, or the ability to exploit a hardware root-of-trust vulnerability, it is a partial mitigation, not an absolute guarantee.

That is a defensible position. It is just not the one being marketed.


For practical guidance on deployment, see Confidential Computing: What It Is, What It Isn’t, and How to Think About It.

For the full vulnerability catalog and root cause framework, see the TEE Vulnerability Taxonomy and TPM Attestation and PCR Verification.

Previously: TPMs, TEEs, and Everything In Between (March 2025). See also: Why Nobody Can Verify What Booted Your Server.

We Built It With Slide Rules. Then We Forgot How.

My father grew up on a subsistence farm, the kind that raised chickens and grew just enough to get by. Farmers were the original hackers. You couldn’t wait for the right tool or the right expert. You fixed what was broken with what you had, because the alternative was worse.

As a kid he taught himself rocket chemistry. Not from a kit. From whatever he could source locally. He was trying to make things burn hotter and fly farther, adjusting mixtures through trial and error long before he had words like specific impulse or oxidizer ratio for what he was doing.

The materials weren’t exotic. Potassium nitrate sold as stump remover. Sulfur and charcoal. Mix them correctly and you have black powder, the same oxidizer-fuel logic underlying every solid rocket motor ever built. More ambitious builders used potassium perchlorate from chemical suppliers, mixed with aluminum powder or sugar to control burn rate and energy density. All of it over the counter. All of it accessible to someone willing to read carefully and try things until they worked.

He wasn’t following a plan. He was just that kind of person.

Most people have forgotten that the Air Force had its own space program before NASA existed. NASA was carved out of NACA in 1958, but the Air Force had been running parallel efforts since the mid-1950s. That generation had grown up on science fiction and wanted to see it happen. When Sputnik launched in October 1957 the country went into a low-grade panic about whether it understood physics well enough to survive, and suddenly the kids who had been dreaming about space since they could read had somewhere to go with it. What followed was one of the rare moments in American history when technical aptitude was a genuine class elevator. The government needed people who understood this stuff badly enough to find them wherever they were.

He enlisted in his early twenties, aerospace degree in hand. The Air Force space program was what he was aiming at. He ended up working on attitude control thrusters for reconnaissance satellites, the kind that could resolve fine surface detail on Earth from hundreds of miles up. For that mission attitude control wasn’t a secondary problem. It was the central one. A camera that can’t hold still is useless. The thrusters are what made the intelligence possible. The underlying engineering was the same problem he had been teaching himself: oxidizer, fuel, combustion geometry, now controlled to tolerances that left no margin.

I remember him watching a satellite reenter on the cable news when I was young. I don’t know which one or exactly what year. What I remember is that he cried. He told me later there was a plate on that satellite with his name engraved on it. Work he had done, hardware he had touched, in orbit for years and now gone. Grief with no adequate audience, because the context was secret and the people who would have understood were scattered across programs that didn’t officially exist.

Years later my father was excited watching the first Iridium launches in 1997, Motorola’s commercial satellite constellation. The same fundamental technology, now accessible to anyone with a phone. His generation had figured out how to do this, quietly, under classification, and here it finally was in the open. The knowledge had propagated. Just not through the channels that were supposed to carry it.

He kept a green chalkboard in the garage. He would pull out his slide rule and work through things with me. Orbital decay, thrust, specific impulse, delta-v, the rocket equation and why it makes everything harder than it looks. He had a worry he came back to often – society had forgotten how to go to the moon. The knowledge existed in aging engineers and partially classified documents and it was not being transmitted. The chalkboard was what he could do about that.
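
The chalkboard arithmetic is easy to reproduce. A rough sketch with round numbers, an exhaust velocity in the range of a good hydrogen-oxygen engine and approximate delta-v budgets, shows why the mass ratios get brutal fast and why staging exists at all.

```python
import math

def mass_ratio(delta_v: float, exhaust_velocity: float) -> float:
    # Tsiolkovsky rocket equation: delta_v = v_e * ln(m_initial / m_final)
    return math.exp(delta_v / exhaust_velocity)

v_e = 4400.0  # m/s, roughly a high-performance hydrolox engine

print(mass_ratio(9_400.0, v_e))   # ~8.5: initial-to-final mass ratio just to reach low Earth orbit
print(mass_ratio(15_000.0, v_e))  # ~30: LEO plus trans-lunar injection and a landing, very roughly
```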

Last year Destin Sandlin, an aerospace engineer who describes himself as a redneck from Alabama, walked into a room full of the most senior people in American space policy and did something worth an hour of your time to watch. He asked questions that people inside the institutional food chain had stopped asking. Starting with the most basic one: how many rockets does it take to fuel the Artemis lunar lander?

The room went quiet. Nervous laughter. Public estimates vary depending on assumptions about boil-off and reuse, but all point to a strikingly high number of launches and on-orbit refueling operations before a landing attempt, and nobody in the room had a confident answer.

These are not uninformed people. A core operational parameter of their own mission architecture was not common knowledge among the people running it.

Then Destin asked the room a simpler question.

“Is this the simplest solution?”

Silence.

Destin pointed them at NASA SP-287, a document the Apollo engineers wrote and left behind specifically so the next generation wouldn’t have to rediscover everything from scratch. The title is “What Made Apollo a Success.” It has been sitting there, public, for decades. Most of the people in that room had not read it.

The principle at the center of that document is blunt:

“Build it simple and then double up on as many components or systems so that if one fails, the other will take over.”

Simple first. Then redundant. Not complex and hoping.

Simple isn’t just aesthetic preference. Simple is how you keep the system inside your head. Simple is how you build procedures all the way down to bolt cutters and still know what comes next. When a system gets complex enough that a room full of its leaders can’t answer a basic operational question about it, it has exceeded the boundary of what they actually understand. They are renting the complexity along with the capability.

The Apollo engineers meant it literally. When designing the ascent stage separation, the mechanism that gets astronauts off the lunar surface, they didn’t stop at one solution or two. They built redundancy on top of redundancy. Flip the switch. If that fails, go outside and trip the manual release. If that fails, depressurize, suit up, go to the bottom of the spacecraft with bolt cutters, and cut the straps holding the stages together. Harrison Schmitt said there was one more procedure after the bolt cutters. Nobody would say what it was.

That’s not genius. That’s a chicken farmer’s epistemology applied to the hardest engineering problem humans had ever attempted. You don’t wait for perfect conditions or perfect knowledge. You start simple, you build every fallback you can think of, and then you think of one more.

Destin argues that Artemis didn’t follow that logic. The NRHO/Gateway architecture was publicly justified in part on communications, surface access, stability, and operational grounds, but Destin’s read, and he makes a detailed case for it, is that it is an architectural constraint dressed up as a design choice, complexity that accumulated because the real constraints couldn’t be named publicly. A room full of program leaders who couldn’t tell you the basic parameters of the system they were running.

That’s what happens when you lose the thread.

Destin also interviewed an engineer who had worked on the lunar landing training vehicle, the machine that taught Apollo astronauts to land in one-sixth gravity by actually putting them in a vehicle where their life depended on getting it right. Destin asked whether the Apollo engineers were smarter than engineers today. The answer was no. What they had wasn’t superior intelligence. It was a bias toward doing, toward simplicity, toward keeping the system inside human heads rather than delegating it to complexity they couldn’t fully reason about.

NASA SP-287 exists because those engineers understood something important. Capability doesn’t survive on its own. Knowledge doesn’t transmit automatically. You have to codify it deliberately or it dies with the people who held it. It is ownership made explicit. Here is what we understood. Here is why it worked. Here is the playbook so the next generation doesn’t have to rediscover it at the cost of lives.

The space race created a machine for turning hands-on knowledge into national capability. It found people like my father wherever they were because it needed what they had already taught themselves. It was the on-ramp, the forcing function that pulled curiosity into programs that mattered and gave it somewhere to go. That same forcing function generated SP-287, the discipline to write it down, the institutional pressure to transmit it. When the race ended the machine stopped. The on-ramp closed. The knowledge didn’t vanish immediately. It aged out, program by program, engineer by engineer, panel by panel. What remained was credentials and institutional memory of having once known how, which is a different thing entirely from knowing how.

We took that gift and built a lunar return architecture that, at least in its public form, often looks more operationally intricate than the Apollo playbook would have preferred. More complex architecture. Estimates ranging from eight to fifteen or more rockets just to fuel the lander. A room full of its leaders who hadn’t read the playbook.

“Is this the simplest solution?”

Silence.

That’s not an aerospace problem. That’s the pattern. The knowledge transmission problem is older than aerospace. I’ve been writing about it in other contexts for a while, starting here.

My father spent my childhood pointing at this from a chalkboard in a garage. I didn’t become an astronaut. That was his hope, not my path. The chalkboard worked anyway. The knowledge moved. The Iridium launches proved it. The knowledge his generation developed under classification eventually became infrastructure anyone could hold in their pocket. You can’t fully control where it lands. You can only decide whether to try.

Now AI is doing to software what the end of the space race did to aerospace. It is consuming the early career tasks that used to serve as scaffolding for building judgment. The debugging, the boilerplate, the routine iteration that taught tradeoffs and edge cases before anyone trusted you with the hard problems. The visible work disappears first. The tacit knowledge becomes unreachable just as it becomes most important. The on-ramp closes. And at some point a room full of senior people goes quiet when someone asks a basic operational question, not because they’re uninformed, but because the complexity was delegated before the understanding had time to form.

That is the cautionary tale. Not that AI is bad. That capability outsourced before it is understood leaves you renting decisions you don’t control while keeping consequences you can’t transfer. The room goes quiet. And eventually nobody even thinks to ask whether this is the simplest solution.

My father saw it coming. That’s what the chalkboard was for.

The question isn’t whether you work in aerospace or software. It’s whether you’ve stopped asking basic questions about the system you’re running. Whether it has exceeded the boundary of what you actually understand. Whether you’re renting complexity along with capability and calling it progress.

You don’t wait for perfect knowledge. You read every playbook you can find. You build redundancy all the way down to bolt cutters. And then you think of one more thing.

The chemicals are still on the shelves. SP-287 is still public. The Destin talk is an hour of your time and worth every minute.

Read the playbook.

The WebPKI and Client Authentication Are at a Crossroads

The CA/Browser Forum is having its first serious conversation about whether publicly trusted client authentication certificates deserve their own Baseline Requirements. Nick France kicked off the discussion on the public list last week, asking for concrete use cases, and the responses so far have been a useful window into how the industry thinks about this problem. Or rather, how it doesn’t.

The timing isn’t accidental. Chrome Root Program Policy v1.6 is forcing a structural realignment of the WebPKI, and client authentication is caught in the middle. All PKI hierarchies in the Chrome Root Store must now be dedicated solely to TLS server authentication. Chrome stopped accepting new intermediate CA applications with mixed EKUs in June 2025, and by June 15, 2026, Chrome will distrust any newly issued leaf certificate containing clientAuth EKU from a Chrome Root Store hierarchy. Multi-purpose roots get phased out entirely. Mozilla, Apple, and Microsoft are all aligning with this direction. Every major public CA has published a sunset schedule. Sectigo stopped including clientAuth by default in September 2025, DigiCert followed in October, and Let’s Encrypt is phasing it out through ACME profiles. By mid-2026, you will not be able to get a publicly trusted TLS certificate that also works for client authentication.
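
If you want to know whether a certificate you depend on is caught in this split, checking for a mixed EKU takes a minute. A sketch using the Python cryptography package; the file path is a placeholder, and get_extension_for_class raises if the certificate has no EKU extension at all.

```python
from cryptography import x509
from cryptography.x509.oid import ExtendedKeyUsageOID

with open("server.pem", "rb") as f:
    cert = x509.load_pem_x509_certificate(f.read())

ekus = cert.extensions.get_extension_for_class(x509.ExtendedKeyUsage).value
has_server_auth = ExtendedKeyUsageOID.SERVER_AUTH in ekus
has_client_auth = ExtendedKeyUsageOID.CLIENT_AUTH in ekus

if has_server_auth and has_client_auth:
    print("Mixed-EKU certificate: plan to separate client auth before mid-2026.")
```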

This is the right call. The historical practice of stuffing both serverAuth and clientAuth into the same certificate, from the same hierarchy, created exactly the kind of entanglement that makes the WebPKI brittle. The SHA-1 migration is the canonical example. Payment terminals that relied on client auth from the same roots as server certs couldn’t upgrade, holding back the entire transition for years. Today, Cisco Expressway is the poster child for the same problem, using a single certificate for both server and client auth in SIP mTLS connections and scrambling to decouple them before the deadline. Dedicated hierarchies for dedicated purposes. It’s a principle the WebPKI should have enforced from the start.

What to do about it

What’s emerging is a clearer, more honest WebPKI, but one with a gap that nobody is cleanly addressing. If you’re currently relying on publicly trusted certificates for client authentication, the path forward depends on your use case.

If the client auth is internal to your organization (VPN access, Wi-Fi onboarding, device authentication, mTLS between your own services), you should be moving to private PKI. This was always the right answer for internal use cases, and modern private CA solutions have made it far more practical than it used to be. You get full control over certificate profiles, lifetimes, and revocation without being subject to external root program policy changes. The blast radius of a private CA is contained to your organization, which is exactly what you want for internal trust.

If the client auth is between your organization and a small number of known partners, like B2B API integrations or supply chain connections, private PKI still works well. You exchange trust anchors with your partners and configure your systems to trust their specific CA. This is how most of these integrations should have been built in the first place. The “convenience” of using publicly trusted certs for this was always a false economy, because you were accidentally opening your trust boundary to every entity that could buy a cert from the same CA.
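
Concretely, the partner-trust-anchor model is just an mTLS configuration that trusts one CA and nothing else. A minimal sketch with Python’s ssl module, with placeholder file paths:

```python
import ssl

ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
ctx.load_cert_chain(certfile="server.pem", keyfile="server.key")
ctx.load_verify_locations(cafile="partner-ca.pem")  # only the partner's CA, not the public roots
ctx.verify_mode = ssl.CERT_REQUIRED  # connections without a valid client certificate are rejected
```

The trust boundary is exactly as wide as that one CA file, which is the point.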

But if the client auth needs to work across organizational boundaries at scale, meaning you can’t reasonably pre-configure trust anchors for every potential counterparty, this is where it gets interesting and where the current alternatives fall short. Private PKI doesn’t solve this. You need some form of shared trust anchor, which is what public PKI provides for server authentication today. The question is whether a similar model can work for client authentication with properly scoped identifiers and validation methods.

The human identity case is the relatively easy part

On the CA/B Forum list, Sebastian Nielsen argued that public CAs shouldn’t issue client auth certificates at all, pointing to the name collision problem. He makes a fair point, but the conclusion is too broad. I’m Ryan Hurst the security practitioner, and there’s also Ryan Hurst the actor (Remember the Titans, Sons of Anarchy). A public CA asserting “Ryan Hurst” in a DN doesn’t help a relying party figure out which one of us is authenticating. The DN is a vestige of the X.500 global directory that never materialized. There is no global directory. Even local directories that correspond to DN structures don’t exist in any meaningful density. Identity in the WebPKI belongs in the SAN, where we have identifiers that are both globally unique and reachable.

S/MIME already handles the human case correctly. The rfc822Name in the SAN is at least unique at the time of issuance. More importantly, it’s reachable. You can send a challenge to an email address and get a response. You can’t send a challenge to a social security number. You can’t send a challenge to “Ryan Hurst, US.” The broad intent of the WebPKI is to make things reachable in an authenticated way. DNS names and email addresses fit that model. DNs do not.

Even with email, there’s a temporal problem. Addresses get reassigned, domains lapse, providers recycle accounts, and throwaway addresses exist by design. CAs can’t monitor for reassignment, so these are inherently short-lived assertions. The certificate lifetime is the outer bound of your trust in that binding. Broader questions around PII and auditability are really about how Key Transparency can be bolted into the ecosystem. I wrote about that previously.

There is valuable work happening in this space. Ballot SMC015v2 enabling mDLs and EU digital identity wallets for S/MIME identity proofing shows this evolving in a meaningful direction. Client authentication and signed email under S/MIME belong together. Apple has argued that emailProtection EKU should mean mandatory S/MIME BR compliance, closing the loophole where CAs omit email addresses from emailProtection certificates to avoid the BRs. I think that’s the right direction. One nuance worth calling out, though. S/MIME bundles signing, authentication, and encryption, and I think that’s right for the first two but not the third. Signing and authentication are real-time assertions that work well as short-lived credentials. Encryption is different. The key is bound to an identifier that may not be durable, and without frequent rotation you risk BygoneSSL-style attacks where a new holder of an email address could access messages intended for the previous one. The encryption case deserves its own careful treatment around key lifecycle and rotation.

Browsers are actively looking to remove client auth from TLS certificates, and I don’t disagree given how poorly specified and unconstrained it has been. That signals whatever comes next needs to be much more tightly defined. The human client auth case is covered by S/MIME, browser-based client auth is on its way out for good reason, and a new working group doesn’t need to revisit the human case.

The machine identity gap

Where it gets interesting is cross-organizational service-to-service authentication on the public internet. Today this is mostly handled with API keys, OAuth client credentials, or IP allowlisting, all with well-known limitations. mTLS with publicly trusted client certs could fill a real gap, but only if the identity model is built correctly.

Many current uses of mTLS with publicly trusted client certs are misplaced. Organizations are often assuming a level of assurance they don’t actually get when they accidentally cross security domains by relying on the public WebPKI for what is fundamentally a private trust relationship. A publicly trusted cert for payments.example.com tells you that the entity controlling that domain authenticated, nothing more. It does not mean they are your trusted partner, your approved vendor, or anyone you intended to grant access to. Public trust gives you authenticated identity, not authorization. Organizations that conflate the two will accidentally open up access based solely on someone having obtained a client cert. The examples collected on the list so far, Cisco Expressway and EPP, are mostly legacy compatibility problems being fixed. A working group built on those foundations would produce weak Baseline Requirements.
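To make the authentication versus authorization distinction concrete, here is a minimal sketch, assuming a Go server that already requires and verifies client certificates (as in the earlier private PKI sketch); the allowlist and names are invented. The TLS handshake establishes who is connecting, and a separate policy check decides whether that identity gets access.

```go
// Minimal sketch: mTLS answers "who is this?"; an explicit allowlist answers
// "should they get access?". Names and the allowlist are illustrative.
package main

import (
	"log"
	"net/http"
)

var allowedPeers = map[string]bool{
	"payments-api.example.com": true, // hypothetical approved partner
}

func handler(w http.ResponseWriter, r *http.Request) {
	if r.TLS == nil || len(r.TLS.PeerCertificates) == 0 {
		http.Error(w, "client certificate required", http.StatusUnauthorized)
		return
	}
	// Authentication happened at the TLS layer: the peer proved control of a name.
	// Authorization is a separate, explicit policy decision made here.
	for _, name := range r.TLS.PeerCertificates[0].DNSNames {
		if allowedPeers[name] {
			w.Write([]byte("ok"))
			return
		}
	}
	http.Error(w, "authenticated but not authorized", http.StatusForbidden)
}

func main() {
	http.HandleFunc("/", handler)
	// In a real deployment this server would set tls.Config.ClientAuth to
	// RequireAndVerifyClientCert with the appropriate ClientCAs, as in the
	// earlier private PKI sketch.
	log.Fatal(http.ListenAndServeTLS(":8444", "server.crt", "server.key", nil))
}
```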

The better foundation is the emerging need for authenticated service-to-service communication across organizational boundaries. Consider SMTP. Mail servers already encrypt their connections to each other over the public internet with TLS, and MTA-STS is pushing that toward authenticated connections. The logical next step is mutual authentication, where the receiving mail server can cryptographically verify the sending server’s identity, not just the other direction. SMTP and mTLS go together like peanut butter and jelly, but there’s no clean way to do it with publicly trusted client certs today. Or consider vendor supply chains. If a manufacturer’s procurement system needs to query a supplier’s inventory API, or a logistics provider needs to authenticate to a retailer’s fulfillment service, the options today are API keys, OAuth flows, or standing up an industry-specific trust framework just so machines can talk to each other. mTLS with publicly trusted client certs would let these systems authenticate directly, without building bespoke trust infrastructure for every partnership.

And this need is accelerating beyond any single industry. As AI agents increasingly act as user agents on the open internet, calling APIs, negotiating with services, and transacting across organizational boundaries on behalf of users, mutual authentication between machines that have no pre-established trust relationship is becoming a practical necessity, not a theoretical concern. You can’t pre-configure trust anchors for every service an agent might need to interact with any more than you can pre-configure them for every website a browser might visit. I wrote about this dynamic previously, and the trajectory is clear. The machine-to-machine authentication problem on the open internet is starting to look a lot like the server authentication problem that the WebPKI was built to solve, just in both directions.

For machines, the name collision problem largely disappears. DNS names are globally unique by design. A client cert with a dNSName SAN of payments-api.example.com or registry-client.registrar.example.net doesn’t have an ambiguity problem. The relying party knows exactly what organization controls that name. Nick’s original question on the list asked about what parts of the DN the relying party verifies. I’d argue that’s almost the wrong framing. There is no global X.500 directory. The question should be, what SAN types are needed, and what validation methods can we define for them?

For straightforward service identification, dNSName works today with no new validation methods needed.

  • payments-api.example.com
  • erp-connector.supplier.example.net
  • registry-client.registrar.example.com

For more expressive service identification, uniformResourceIdentifier SANs encode not just the organization but the specific service.

  • https://example.com/services/payments
  • urn:example:service:billing:v2

This URI-based approach isn’t speculative. SPIFFE already uses URI SANs (spiffe://cluster.local/ns/production/sa/checkout) to represent service identities in Kubernetes mTLS contexts. The pattern is proven and widely deployed within private PKI. Extending it to public trust for cross-organizational federation is a natural evolution of an approach the industry has already validated. URI SANs can be validated through .well-known challenge methods (like ACME HTTP-01 scoped to a URI path) and ALPN-based methods, extending battle-tested ACME-era infrastructure rather than building from X.500-era assumptions.
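As a sketch of what "identity in the SAN" looks like from the relying party's side, here is a minimal Go example, assuming a PEM-encoded client certificate on disk (the path and expected identifier are illustrative), that reads the dNSName and URI SANs and ignores the DN entirely.

```go
// Minimal sketch: extract dNSName and URI SANs from a client certificate and
// match the URI against an expected service identity. Path and identifier
// are illustrative.
package main

import (
	"crypto/x509"
	"encoding/pem"
	"fmt"
	"log"
	"os"
)

func main() {
	pemBytes, err := os.ReadFile("client.pem") // hypothetical path
	if err != nil {
		log.Fatal(err)
	}
	block, _ := pem.Decode(pemBytes)
	if block == nil {
		log.Fatal("no PEM block found")
	}
	cert, err := x509.ParseCertificate(block.Bytes)
	if err != nil {
		log.Fatal(err)
	}

	fmt.Println("dNSName SANs:", cert.DNSNames)
	for _, u := range cert.URIs {
		fmt.Println("URI SAN:", u.String())
		// Match against an expected service identity (illustrative value).
		if u.String() == "https://example.com/services/payments" {
			fmt.Println("matched expected service identity")
		}
	}
}
```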

What the industry is doing instead

Almost all the CA and vendor messaging right now says “move to private PKI.” That’s the right answer for internal use cases, but it doesn’t address cross-organizational trust. The most interesting alternative emerging is the DigiCert X9 PKI, launched in partnership with ASC X9, the financial standards body. X9 PKI is a completely independent trust framework, governed by X9’s policy committee rather than the CA/Browser Forum or browser root programs. It supports both clientAuth and serverAuth EKUs, uses a common root of trust for cross-organizational interoperability, and is WebTrust audited. It’s specifically designed for the financial sector’s mTLS needs, though they’re expanding to other sectors.

X9 PKI is essentially a “public PKI that isn’t the WebPKI” for service-to-service auth. It validates the premise that there’s a real need for cross-organizational client authentication with a shared trust anchor. But it’s sector-specific and governed outside the CA/Browser Forum, which means it doesn’t solve the general case. The EU’s eIDAS QWAC framework is another sector-specific approach. These are workarounds for the absence of a general-purpose, properly scoped public client auth certificate type.

If this moves forward

I’m not advocating for or against a working group at the CA/Browser Forum. But if the Forum does decide to take this on, the scope needs to be narrow in my view. Machine and service client auth only, with identity in the SAN using dNSName and uniformResourceIdentifier. DN fields should not be relied upon for authentication decisions. Validation methods should build on existing domain control mechanisms. Human client auth stays in S/MIME where it belongs. The BRs should address the authentication versus authorization distinction explicitly, so relying parties understand that a publicly trusted client cert tells them who is connecting, not whether that entity should be granted access. This is already how server certificates work, and client auth should follow the same model. And the issuing CAs need to be dedicated, separate from server auth hierarchies. The SHA-1 payment terminal debacle and the Cisco Expressway mess both show what happens when client and server auth are entangled in the same hierarchy: one use case holds back progress on the other. Don’t repeat that.

The bigger picture

What we’re watching is a structural realignment of the WebPKI’s purpose. The WebPKI is being narrowed to mean “TLS server authentication for web browsers,” full stop. Everything else, client auth, S/MIME, code signing, is being pushed to dedicated hierarchies, private PKI, or alternative trust frameworks. That’s mostly the right direction. But the service-to-service authentication gap is real, growing, and not well served by any of the current alternatives. Private PKI doesn’t solve cross-organizational trust. X9 PKI is sector-specific. The CA/Browser Forum has the institutional knowledge, the validation infrastructure, and the trust framework to define something that works here. Whether they choose to is another question.

The conversation is happening now on the public list. If you have concrete use cases for cross-organizational service authentication with publicly trusted client certificates, this is the time to share them. The shape of what comes next depends on whether the use cases justify the effort, and right now the list is thin.

Introducing the WebPKI Observatory

For as long as I have been in this industry, the WebPKI compliance conversation has run on impressions. People with long memories and regular conference attendance have built up a picture of which CAs are well-run, which are struggling, and where the oversight gaps are. That picture has generally been accurate. It has also been almost entirely unmeasured.

The WebPKI Observatory at webpki.systematicreasoning.com, a project from Systematic Reasoning, is an attempt to change that. It’s a public dashboard covering 1,690 compliance incidents drawn from Mozilla Bugzilla between 2014 and 2025, cross-referenced with CCADB membership data, certificate issuance volumes from CT logs, root program trust store compositions, and the complete history of CA distrust events. The goal was simple: replace the shared intuition with actual data, and see what the data shows that intuition missed.

Some of it confirmed what most people in this space already suspected. Some of it was genuinely surprising.

The finding that reframes everything else is detection. When a compliance incident occurs, who finds it? Root programs find 52% of incidents. Automated external tools — CT log monitors, certificate linters, community scanning infrastructure — find 14%. CAs find their own problems in 9% of cases.

That number deserves more attention than it typically gets. One in eleven. CAs have full access to their own issuance systems, their own audits, their own CPSs, their own disclosure obligations, and they are the least effective detection mechanism in the ecosystem. External parties without any privileged access outperform internal CA monitoring by a factor of six or more. The compliance monitoring function has been effectively outsourced to external parties by default, and mostly without anyone deciding that was the right architecture.

Everything else in the data follows from that.

The failure classes that have grown are instructive. Technical misissuance has declined as a share of incidents over the past decade. What has grown is the process layer. In 2019, governance failures represented 21% of all incidents. By 2025 that figure was 60%. Policy violations, CPS failures, disclosure deadline misses. These are by definition things internal compliance programs should be catching. The 260 incidents tagged policy-failure or disclosure-failure in the dataset are a direct indictment of internal compliance operations. A CA that violates its own documented policy is not being surprised by an external attacker.

The oversight picture is also worth examining. In 2017, Mozilla engaged with 79% of Bugzilla compliance bugs. Chrome had no formal root program yet and was near zero. By 2025 the picture had reversed and degraded simultaneously. Chrome now contributes the dominant share of oversight engagement but covers only 18% of incidents. Mozilla covers 8%. The total corpus has roughly doubled since 2017 while combined meaningful oversight coverage has fallen by two-thirds. The Chrome Root Program launched in 2021, and its effect on the governance landscape is visible in the data — Chrome has made 239 substantive oversight comments in recent years versus Mozilla’s 158 over the same period. The center of gravity in CA compliance governance has shifted to the browser with 78% market share. That is structurally significant. Microsoft, which operates the largest trust store by root count at 346 trusted roots, has made zero recorded governance comments across all 1,690 incidents spanning 11 years.

The distrust history is also clarifying. The common mental model is that CAs get removed for catastrophic technical failures. The data does not support that model. 14 of 16 distrust events involve compliance operations failures. The behavioral taxonomy matters: negligent noncompliance, willful circumvention, demonstrated incompetence, and argumentative noncompliance. In 10 of the 16 cases, the distrust event was preceded by a documented pattern of prior incidents. The median runway from the first incident to distrust is 3.2 years. The failures were not hidden. They were in Bugzilla the whole time. The CA just was not resolving them systematically.

That means distrust is largely predictable given sufficient data. The indicators show up well before the outcome. That is a sobering observation about past oversight and a useful one for anyone thinking about what the compliance monitoring function should actually do.

The Observatory is a measurement tool, not a verdict. The dataset has limits — Bugzilla under-represents incidents that never reach public disclosure, CT-derived issuance volumes reflect only unexpired certificates at the time of measurement, and the behavioral taxonomy applied to distrust events involves judgment calls. But the patterns are robust enough to be useful.

For CA operators, the detection data alone should prompt hard questions about internal monitoring coverage. For root programs, the oversight gap data quantifies a scaling problem that is currently being absorbed by Chrome without anyone having explicitly decided that is the right architecture. For the policy community, the shift from technical to governance failures as the dominant incident class has direct implications for what audit frameworks should actually measure.

The dashboard is live at webpki.systematicreasoning.com, updated daily. The methodology is documented. Pull requests are welcome.

When Compliance Records Become the Only Honest Signal

I’ve been spending a lot of time lately building Systematic Reasoning with my long-time friend Vishal. The core premise is straightforward. Organizations reveal their true operational character through how they design to prevent failure, how they plan to handle it when it happens, and how they actually do. That signal deserves to be tracked, structured, and acted on. We’re building an agentic compliance platform to do exactly that.

Systematic Reasoning won’t be limited to any single domain, but we decided to start with the Web PKI. The reasoning was simple. It’s high impact in a way that’s hard to overstate. Every internet user depends, whether they know it or not, on a relatively small number of Certificate Authorities getting things right. The margin for error is zero. If that trust layer breaks, it breaks for everyone.

DigiNotar is the canonical example. A small Dutch CA, compromised so thoroughly that attackers could impersonate any website on the web, and did. That capability was used to spy on Iranian dissidents, intercepting communications that people believed were private and secure. The trust infrastructure that was supposed to protect them was turned into a weapon against them. DigiNotar isn’t an edge case or a cautionary tale from a more naive era; it’s a demonstration of the actual ceiling of what can go wrong. And it isn’t the only one. State-affiliated certificate authorities have been caught performing man-in-the-middle attacks on their own citizens’ traffic, something the Baseline Requirements explicitly prohibit, but prohibition only matters if it’s enforced. The web’s trust model works right up until the moment someone decides it’s more useful as surveillance infrastructure.

At the core of Systematic Reasoning is a belief I’ve held for a while. Compliance can be a vital sign of organizational security, but only if it’s continuous. The reality today is that it isn’t. Code ships daily. Audits happen annually. The gap between those two rhythms is where things go quietly wrong.

I’ve written before about why I have limited faith in the current audit regime. Auditors are engaged by the organizations they assess. Their product is a clean seal; their incentive is to keep the client. They operate on point-in-time sampling with auditee-selected scope, and they’re often compliance professionals rather than engineers, which means they’re checking whether a policy exists more than whether the system actually behaves correctly. That’s if you’re lucky. Sometimes the audit is scoped against a version of the Baseline Requirements that was superseded over a year ago.

The same incentive shapes how certificate authorities write their governance documents. A CP/CPS that relies heavily on incorporation by reference, that omits specifics about what the organization actually does and what constraints it operates under, is easier to audit against than one that makes precise, testable commitments. Vagueness isn’t always carelessness. Sometimes it’s a design choice. The same thing happens in incident reports. A report that attributes a failure to “organic process evolution” or “human error” without describing the actual control gap is easier to close than one that names the broken system and commits to a specific fix. In both cases the document gets the box checked without creating accountability. References establish authority. Commitments establish accountability.

The audit gap isn’t compensated for by strong internal monitoring either. The majority of significant compliance failures are not caught internally. They are caught by external researchers, root program staff, or community tooling. A broken validation endpoint runs for five years and the organization finds out because someone posted a 404 error in a public issue tracker. A validation race condition exists undetected for seven and a half years not because it was well hidden but because nobody was looking. The absence of an internal alarm is not evidence that the system is healthy. It is often evidence that the monitoring itself is missing.

So public incident reports and governance documents become some of the most signal-rich material available. Policy documents tell you what an organization claims it will do. Incident reports tell you what happened when reality diverged from that claim. Together they create a longitudinal picture that neither document produces alone.

Building a system to reason over that data surfaced a problem I didn’t fully anticipate. When you’re working from the outside, with no access to internal systems and no way to verify what actually changed, the public record is almost all you have. The question isn’t whether to treat it with skepticism. It’s how much skepticism to build in by default.

The temptation is to give the benefit of the doubt. Organizations are required to describe the blast radius of an incident. Not every localized bug is a symptom of something systemic. But accepting minimizing language at face value is its own failure.

“Only” is doing a lot of work when the bug it’s describing went undetected for seven and a half years. “No compromise of end-entities” is doing a lot of work when what it really means is that nobody found the gap before you did. Framing survival as security isn’t reporting, it’s PR. And if an organization believes an incident is no big deal, you can predict with reasonable confidence that the root cause analysis will be shallow and the remediation will be a band-aid.

ForgeIQX, our first offering, tracks those signals longitudinally across both policy documents and incident reports. Not to prosecute organizations for their language choices, but to notice when a commitment made in a CP/CPS quietly disappears in the next version, or when a promised fix is nowhere to be found when the same failure mode surfaces years later. That’s commitment decay, the slow evaporation of a promise made under pressure, and it’s only visible if you’re tracking across multiple documents and incidents over time rather than treating each one in isolation.
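As a toy illustration of the idea, not the ForgeIQX implementation, the core of commitment-decay tracking can be sketched as: extract sentences with normative language from two versions of a governance document and flag the ones that disappear. The documents below are invented.

```go
// Toy sketch (not a real product implementation): find normative sentences
// present in one version of a CP/CPS but missing from the next. Inputs are
// invented for illustration.
package main

import (
	"fmt"
	"strings"
)

// extractCommitments pulls out sentences containing normative language.
func extractCommitments(doc string) map[string]bool {
	commitments := make(map[string]bool)
	for _, sentence := range strings.Split(doc, ".") {
		s := strings.TrimSpace(sentence)
		lower := strings.ToLower(s)
		if strings.Contains(lower, "shall") || strings.Contains(lower, "will") || strings.Contains(lower, "must") {
			commitments[s] = true
		}
	}
	return commitments
}

func main() {
	v1 := "The CA shall revoke within 24 hours. The CA will monitor CT logs daily."
	v2 := "The CA shall revoke within 24 hours."

	old := extractCommitments(v1)
	current := extractCommitments(v2)
	for c := range old {
		if !current[c] {
			fmt.Println("commitment present in v1 but missing in v2:", c)
		}
	}
}
```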

The calibration problem is real and doesn’t have a clean answer. Get it wrong in one direction and you build a system that cries wolf. Get it wrong in the other and you build a system that launders PR-speak into clean signals, which is just automating the thing we already do too much of.

There’s a third failure mode that took me longer to see. A system like this can be gamed. Swap “we got lucky” for “our monitoring detected no active exploitation.” Replace “only thirty certificates” with a more clinical impact scoping statement that says the same thing in language that sounds like engineering rigor. The words change; the institutional posture doesn’t. A system that can be satisfied by better prose isn’t measuring operational maturity, it’s measuring communications sophistication.

That means the system has to be built with structural pessimism. Not cynicism for its own sake, but a deliberate prior that clean language is not the same as clean operations, and that the absence of red flags is not the same as the presence of green ones. We can’t verify that an organization fixed what it said it would fix. What we can do is watch whether the same failure mode surfaces again and whether the pattern of shallow root cause analyses continues or breaks. The historical record doesn’t tell us what’s true inside these organizations. It tells us what they were willing to say in public, under pressure, over time. Given the alternatives, that may be the most honest signal available.

A certificate authority with genuine operational maturity should want this kind of scrutiny applied to itself. Not because it will always produce a clean result, but because it surfaces the gaps before an external party does. ForgeIQX gives organizations a way to continuously monitor their own compliance posture, so their practices and code keep pace with their commitments. The same is true for auditors who want their findings to mean something beyond a checkbox. The problem with the current regime isn’t that the people in it are careless. It’s that the incentive structures don’t reward rigor, and the tooling to demonstrate it continuously doesn’t exist. That’s what we’re building.

The Web PKI is where we started because the stakes are concrete and the public record is unusually rich. But any regulated industry where compliance is measured annually, where governance documents are written to satisfy auditors rather than inform relying parties, and where incident reports are drafted with one eye on legal exposure, has the same gap between what the paper says and what the organization actually does. We started here. We don’t intend to stop here.

When Building Gets Cheap, Distribution Becomes Destiny

“Distribution is the new moat.” You can find some version of that sentence in almost any startup discussion from the last year. It circulates as a take, gets liked, gets reshared, and then gets reproduced by someone else who arrived at the same conclusion independently. The observation has become cheap to make precisely because it is true. What is harder, and what most of those takes skip, is understanding why the structural mechanics behind it matter and what they actually require you to do differently.

For decades, venture capital rewarded the ability to build. In the AI era, building is no longer scarce. Distribution is.

There was a time when building complex software required deep teams, long timelines, and substantial capital. Engineering was the constraint. Infrastructure was the constraint. Expertise was the constraint. That constraint justified venture scale returns.

AI is dissolving that constraint, not all at once, and not uniformly across every domain, but steadily and in ways that are already measurable.

This is not a cliff. It is a slope.

The companies founded today still face real execution challenges. The ones founded three years from now will face fewer. The ones founded ten years from now will operate in an environment where the cost of building sophisticated systems is a fraction of what it is today. We are in the early middle of this shift, not at the end of it. That matters because the temptation is to look at current valuations, current outcomes, and current M&A multiples and conclude that nothing has changed. Something has changed. It is just moving at the pace of markets and human institutions, not at the pace of model releases.

The Repricing of Expertise

We are watching a repricing of expertise, a slow one, with uneven edges.

Not at the foundational layer. Paradigm-shifting breakthroughs still matter. The rare intellectual leap that unlocks a new architecture or a new computational primitive remains valuable and durable. But most companies are not those breakthroughs. Most companies sit on top of them.

I have written before about how AI is repricing skill at the individual level, injecting liquidity into what was once a slow-moving market for technical expertise. What is happening at the venture level is the same dynamic playing out across entire product categories. When fifty startups can build near-equivalent products in twelve months, product differentiation compresses. Expertise becomes assisted. Execution becomes accelerated. Barriers to entry fall.

It is worth being direct about what that means. AI does not just flatten products. It flattens people. The scarcity that once justified premium human expertise, the advisor with the rare insight, the consultant who had seen this problem before, is narrowing. That edge does not disappear, but it compresses fast unless the expertise is embedded in distribution, in relationships and customer context that cannot be replicated from a prompt.

There is an important exception. In data-rich verticals, proprietary datasets create compounding advantages that AI amplifies rather than erodes. Healthcare, finance, legal, infrastructure – in these markets the data is not just an asset, it is a moat that gets stronger as it grows. AI makes that data more useful, not less defensible. The dynamic in these verticals is different. The scarcity is not building capability or even distribution in the generic sense. It is the data itself, and the domain-specific judgment required to use it correctly. This connects to a broader point worth sitting with: when you rent the capability layer, you rent the moat. In AI-native verticals, whoever owns the model behavior owns the product – and that is a different kind of lock-in than anything cloud computing created.

The result is predictable. A wave of companies will launch in every attractive AI-adjacent category. Many will grow quickly. Many will look venture-scale in their first 24 to 36 months. Most will not become venture-scale businesses.

They will explode and then flatten.

Not because they were poorly run. Not because the founders lacked talent. But because it became too inexpensive to create what they created. The winner-take-most dynamic compresses margins and growth for everyone except the few that secure durable control.

Cheap building creates crowded categories. Crowded categories destroy the middle of the return distribution.

The venture math here deserves to be stated plainly. Cheap building means more competitors. More competitors cap market power. Capped market power caps exit multiples. In a crowded AI category where any competent team can replicate the core product, the venture model itself compresses. Not because the market is small, but because structural dominance becomes harder to achieve and sustain. Many of these companies are structurally unlikely to become venture-scale businesses. The category economics will not support multiple large players once replication costs collapse, and most founders do not have the distribution infrastructure to be the one that survives. Asymmetric outcomes remain possible. They are just harder to achieve and harder to sustain in categories where the product itself can be reproduced quickly.

What This Does to Venture Capital

This has structural consequences for venture capital, though they will play out over years, not quarters.

If building is cheap and competition is abundant, returns concentrate harder and faster. You get more rockets. Fewer reach orbit.

Investors will demand signal sooner. Growth becomes the proxy for distribution dominance. Capital is deployed to test whether the company can win quickly, not whether it can build elegantly. The tolerance for long, patient build cycles without distribution proof shrinks. Capital releases in stages tied to evidence of emerging control.

This is reshaping round structure too. When building is cheap, large upfront rounds are harder to justify – you no longer need $20M to construct the product. Seed rounds compress because the build cost does not warrant more. But growth rounds are becoming larger and more heavily tranched, with capital tied to distribution milestones rather than product ones. Channel proof. Embedded customer cohorts. Pipeline velocity. The structure of the round starts to reflect the new scarcity. Capital flows in proportion to what is actually hard, and what is actually hard is no longer building the thing.

The traditional power-law model assumed a long tail of moderate outcomes. In a world of rapid replication, the moderate outcome becomes harder to sustain.

Meanwhile, IPO pathways have narrowed. The regulatory intent was investor protection. The outcome was exclusion. By making it harder for companies to go public early, regulators locked retail investors out of the steepest part of the value curve, the years when a company moves from promising to dominant. Secondary markets expanded to fill the gap, but access to those markets is not democratic. Private capital captures what public markets used to offer to a broader population. Venture starts to look less like broad-based growth capital and more like concentrated private allocation, closer to family offices, less like 1990s expansion funds. AI will likely accelerate that dynamic. The companies creating the most value will stay private longer, and the people with access to them will be a narrower group than before.

Selectivity increases. Portfolio sizes shrink or become more strategically concentrated. The “grow at all costs, you’ll get more later” model becomes harder to justify when many fast-growing companies are structurally incapable of sustaining dominance. Capital no longer buys uniqueness. It buys speed – the time and resources to build a distribution funnel, execute against it, and reach durable entrenchment before a competitor replicates the product and races to the same buyers.

Built for Acquisition, But It Is Not a Spreadsheet Decision

There is another dynamic that becomes more visible in this environment. Some startups are designed not to become category winners, but to slot perfectly into one specific incumbent. Not strategic fit in the abstract sense. Deliberate adjacency to a single buyer. The product is built to complete a portfolio gap. The roadmap mirrors a specific weakness in a specific acquirer’s product line. Some founders are not optimizing for market dominance. They are optimizing for perfect adjacency to one buyer, and shaping every decision around what makes that buyer say yes.

This is not new. But the calculus around it is shifting.

When technology is easier to replicate, the premium for strategic fit increases relative to the premium for raw IP. At the same time, the value of acquiring technology alone diminishes. If a product can be rebuilt internally in 12 to 18 months, the acquisition multiple compresses. The technology becomes a starting point for an internal conversation, not a reason to write a check.

What remains valuable in M&A is harder to replicate. Embedded distribution. Contractual entrenchment. Regulatory positioning. Customer relationships. Data gravity.

In regulated verticals, this goes further. A company that has already navigated the compliance requirements to operate in a market – secured the certifications, built the audit trails, established the regulatory relationships – has compressed years of a buyer’s time to market into something acquirable. Compliance readiness is not a cost center. It is a distribution accelerator. Vertical access and compliance readiness are part of the distribution story, not separate from it. For an acquirer trying to enter a regulated market, the fastest path is often not to build the product. It is to buy the company that already has permission to operate. That shifts what gets priced into an acquisition and why some targets command premiums that pure technology analysis cannot explain.

Technology without distribution is just an expensive prototype.

But what gets lost in that clean analysis is that acquisition decisions are not made by spreadsheets. They are made by people, in rooms, often under time pressure, with incomplete information and competing organizational interests.

A founder who has built real relationships inside a strategic buyer has a fundamentally different acquisition outcome than one who has not, even if the products are comparable. The internal champion who has watched you execute, who trusts your judgment, who has gone to bat for you in internal budget conversations, is not a nice-to-have. They are often the reason a deal happens at all.

Perception compounds this. Acquirers pay for confidence as much as capability. A company perceived as the category leader, even in a crowded category, commands a premium that may not be fully justified by its metrics. Market positioning, analyst coverage, conference presence, and the quality of your reference customers, these shape the narrative in an acquirer’s boardroom. The story they can tell internally about why they did this deal matters enormously. Acquisitions have to survive internal politics.

Timing is almost never purely rational either. Companies get acquired when a buyer is scared, or ambitious, or has capital to deploy, or is about to lose a competitive advantage they can feel slipping. Being visible and credible at that moment, not just when you need a buyer, is what closes deals.

None of this means product and metrics do not matter. They do. But they matter as the floor. Above the floor, acquisition outcomes are determined by relationships, reputation, and the story someone is willing to tell on your behalf inside an organization that does not know you.

The Irony of Automating Your Own Moat

Customer management is one of the domains AI is aggressively trying to automate. AI SDRs. AI account managers. Synthetic personalization. Automated follow-up. Generated relationship intelligence.

In a world where distribution is the scarce resource and relationships drive acquisition outcomes, the industry is racing to replace human relationship infrastructure with synthetic substitutes.

This is not irrational. Automation increases efficiency. Most sales and account management processes have enormous amounts of low-value activity that could and should be automated.

But in high-value markets, buyers are not just purchasing functionality. They are purchasing risk reduction. They are purchasing accountability. They are purchasing confidence. And confidence is built through consistent human judgment over time, through the accumulation of trust that comes from someone showing up, delivering, and being present when things go wrong.

There is a related dynamic at the talent level. I have written about how AI is eliminating the on-ramp for early-career engineers, absorbing the low-context work that once let junior developers accumulate the judgment and institutional knowledge that makes senior engineers valuable. The same problem applies to the people who build enterprise relationships. The craft of reading a room, navigating a stalled deal, and managing a difficult renewal, these compound over years of real exposure. Automating the entry-level work in sales and customer success is not just an efficiency play. It shapes who gets the chance to develop the judgment the role ultimately requires.

Assistive automation increases efficiency. Primary automation risks eroding the very thing that becomes the last defensible moat.

The counterargument is that AI can also accelerate distribution itself. Faster outreach. Better targeting. Smarter personalization at scale. That is true as far as it goes. But it confuses distribution tactics with distribution durability. AI can help you reach more people faster. It cannot manufacture the trust that makes them stay, the embeddedness that makes switching costly, or the relationship capital that makes an acquirer’s internal champion go to bat for you. Speed without stickiness is just faster noise.

In a world saturated with synthetic output, authentic relationships are appreciated. The companies that understand this distinction, between automating the low-value repetitive work and preserving the high-value human judgment, will have a structural advantage over those that optimize purely for efficiency.

Forward-deployed engineers become strategic assets. Customer success becomes competitive infrastructure. Enterprise sales become durable leverage.

This will not be obvious in year one. It will be obvious in year five.

Overgrowth Risk

Cheap building combined with abundant capital creates another problem. When capital is deployed to chase an early signal, companies scale headcount and burn before structural dominance is secured. If they are not the winner in their category, they are left with a cost structure built for orbit and a trajectory that never left the atmosphere.

They grew too fast for a market that would not support multiple large players.

This risk increases when categories are crowded and replication is easy. AI does not eliminate business fundamentals. It amplifies their consequences.

The Structural Shift

The AI era does not eliminate venture capital, entrepreneurship, or breakthrough innovation.

It shifts the locus of scarcity, gradually, unevenly, and irreversibly.

Foundational intellectual leaps remain rare and valuable. But most startups are not foundational leaps. When building was expensive, builders won. When building becomes cheap, distribution becomes destiny.

This transition is already underway. It is not complete. The companies founded in the next few years will discover its contours the hard way, either because they adapted early or because they did not.

The founders who understand what is happening will optimize differently. They will invest in buyer access before polishing perfection. They will treat relationships as infrastructure. They will see funnel design as a core product, not a marketing afterthought. They will build the internal champions inside their strategic targets before they need them.

And they will move fast on all of it. When building is cheap, the window to establish distribution before a competitor replicates the product is shorter than it has ever been. Timing has always mattered in startups. In this environment, it compounds differently – being six months earlier into a key account, a channel partnership, or a strategic relationship can be the difference between owning the category and being one of the many that flattened. Speed used to be about shipping. Now it is about embedding.

The VCs who understand it will underwrite differently. They have always asked whether the product is impressive and whether the founders are domain experts worth betting on. Those questions do not go away. But distribution used to be a problem you could punt on, something a strong team would figure out in year two or three. That tolerance is shrinking. Investors will put more weight on whether the company already has a credible path to controlling the channel, and be less willing to assume it will materialize later.

Because in a world where fifty companies can build the same thing, the only one that matters is the one that owns the channel and has convinced someone on the inside that betting on them was the right call.

Technology used to be the moat.

Now the moat is access. And access is built by people, over time, in ways that are harder to automate than we would like to admit.

You’re Not Outsourcing Infrastructure. You’re Outsourcing Capability.

Chamath posted this week: “Is on-premise the new cloud? I’m beginning to think yes. It’s the only way for companies to not blow themselves up and have some semblance of capability in an AI world.” Jason Fried dropped a link to Basecamp’s cloud exit and five words: “Saving us $10M, at least.”

Most people read this as a cost conversation. It’s not. Cost is the part that’s easy to measure. The structural problem underneath is harder to see and harder to fix. The cloud lets you rent compute and keep control. AI doesn’t offer that deal.

The cloud deal changed

Cloud worked because compute was deterministic. Both sides ran code. AWS ran millions of lines of service code. You ran your application. When something broke, you could trace it. Their bug or your bug, but someone’s bug, and the behavior was reproducible. The shared responsibility model worked because the boundary was clear. Provider secures the infrastructure; you secure what runs on it. Both sides knew which side of the line they were on.

AI breaks that. Not because there’s suddenly code you don’t control. That was always true in cloud. What’s new is behavior that isn’t traceable to anyone’s code in the traditional sense. A provider updates the model, and your system behavior changes. The model isn’t buggy. It’s probabilistic. Nobody wrote a line of code that says “produce this different output.” New failure modes show up without any deployment on your end. Pricing shifts once you’re locked in. Your data may be training their next competitive advantage. The model’s behavior isn’t infrastructure, and it isn’t your code. It’s a third thing, and it doesn’t fit on either side of the old responsibility boundary.

This isn’t renting infrastructure anymore. It’s renting capability. And the difference matters, because when AI becomes core to the product, whoever owns the capability layer owns the product. Everything else is a wrapper.

Liability doesn’t outsource

When your upstream model changes behavior and you violate a regulation, misprice risk, or produce unlawful output, that’s your problem. Not the API provider’s. Control and responsibility don’t decouple just because you didn’t train the weights.

Courts are already working through this, and the early results are clarifying.

In January 2026, the consolidated NYT v. OpenAI copyright litigation produced a discovery order compelling OpenAI to hand over 20 million anonymized ChatGPT logs. OpenAI had proposed the sample size itself, then tried to walk it back to keyword-filtered subsets. The court said no. Users who voluntarily submit conversations to a third-party platform have limited privacy protections over those interactions. Twenty million logs, 0.5% of the tens of billions OpenAI retains, and the court found that proportional.

Every conversation your team has with a hosted model is a record on someone else’s infrastructure, subject to someone else’s legal disputes.

Then on February 10, Judge Rakoff ruled in United States v. Heppner that 31 documents a defendant created using a commercial AI tool and shared with his defense attorneys aren’t privileged. Not attorney-client privilege, not work product. The court found “not remotely any basis” for protection. The AI platform isn’t an attorney; its terms disclaim any such relationship, and sending pre-existing unprivileged documents to a lawyer doesn’t retroactively create privilege. The government compared it to Google searches. Running a search and sharing results with your attorney doesn’t make the search history privileged.

Same direction, both cases. When you run your thinking through a third-party AI platform, you create discoverable records on infrastructure you don’t control, under terms you probably haven’t read carefully, with no privilege protection even if you later involve counsel.

Externalize capability. Retain liability.

Competing on rented capability

There’s a reason major retailers avoid AWS. Amazon is their competitor. Running your recommendation engine, pricing logic, or supply chain optimization on a competitor’s infrastructure isn’t philosophical. It’s operational. They see your usage patterns, your scale, your growth trajectories.

The same dynamic is showing up with AI providers. Build differentiated capabilities on a hosted model, and the provider has visibility into what you’re building and how. Your usage patterns become their product intelligence, whether or not they train on your data directly. You’re renting AI capabilities from the same companies you’re trying to compete with. Hard to build moats on someone else’s foundation.

Confidential compute solves one dimension

The obvious technical response to the privacy problem is confidential computing. Run the model inside a hardware enclave so even the infrastructure operator can’t see your data.

Moxie Marlinspike launched Confer in December. The Signal playbook applied to AI. End-to-end encrypted inference inside a Trusted Execution Environment. The host never sees your conversations. Architecturally private, not policy-private. As Marlinspike put it, AI chat logs reveal how you think, and once advertising arrives (it already has at OpenAI), “it will be as if a third party pays your therapist to convince you of something.”

Tinfoil takes a more general approach, building a confidential computing platform on NVIDIA’s Hopper and Blackwell GPUs with open-source verification and cryptographic attestation. They’re collaborating with Red Hat on open-source confidential AI infrastructure and recently joined the Confidential Computing Consortium. Privacy of on-prem, convenience of cloud, backed by hardware rather than promises.

Apple’s Private Cloud Compute is the big-company version. Extend the device security model to cloud inference with attestable guarantees about what code handles your request.

All serious work. All a long road.

The hardware foundations keep getting hit. Intel SGX has been battered by years of side-channel attacks. AMD SEV has had its own issues. Intel TDX, the newer play, just went through a joint security review with Google’s bug hunters that surfaced real problems. Each generation improves. None are yet where you’d stake regulatory compliance on the enclave boundaries holding against a motivated attacker with physical access.

But even if confidential compute fully matures, even if you can cryptographically guarantee nobody sees your data during inference, you’ve only solved one dimension of the problem.

Data privacy doesn’t fix model behavior. A provider pushes an update, your outputs change, and confidential compute didn’t help. Your data was private the whole time. Your system still broke.

Privacy is necessary. Ownership is the harder problem.

The infrastructure is catching up

The historical objection to “just run it yourself” was operational. Cloud won because it made infrastructure someone else’s problem. APIs, elastic scaling, managed services, no procurement cycles. Going on-prem meant going backward on developer experience and velocity.

That gap is closing. Oxide builds rack-scale systems that bring cloud architecture to hardware you own. API-driven infrastructure, elastic storage, integrated networking. Not commodity servers you’re left to assemble, but a single integrated system purpose-built from hardware through operating system. They’ve raised roughly $300 million to date and their customers include Lawrence Livermore National Laboratory and CoreWeave.

Bryan Cantrill, Oxide’s CTO, resists the term “private cloud.” He calls it “on-premises elastic infrastructure” because private cloud historically meant duct-taping multi-vendor stacks together and hoping. Oxide was built from scratch, so the operational model actually works.

37signals proved the economics. Moving seven applications off AWS onto their own hardware saved $10 million over five years on a hardware investment that paid for itself in six months. But cost was always the easy argument. The harder one, the one Chamath is circling, is about control over what actually makes your product work. Not just the servers. The model versioning, the update cadence, the safety filters, the logging policy, and the alignment decisions. Capability evolution on your timeline, not someone else’s. Enterprise contracts can promise some of this. Version pinning, indemnification, non-training guarantees. But contractual assurances are not the same as technical control over capability evolution. A contract says they won’t change your model without notice. Ownership means they can’t.
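One small, concrete expression of that difference at the artifact level is pinning model weights by content hash, so that an unannounced change fails loudly instead of silently shifting behavior. A minimal sketch, with an illustrative path and a hypothetical digest:

```go
// Minimal sketch: refuse to load a local model artifact unless it matches a
// pinned content hash. The path and digest are illustrative placeholders.
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"io"
	"log"
	"os"
)

const expectedSHA256 = "0f3c..." // hypothetical pinned digest of the weights you validated

func main() {
	f, err := os.Open("models/weights.bin") // hypothetical local artifact
	if err != nil {
		log.Fatal(err)
	}
	defer f.Close()

	h := sha256.New()
	if _, err := io.Copy(h, f); err != nil {
		log.Fatal(err)
	}
	got := hex.EncodeToString(h.Sum(nil))
	if got != expectedSHA256 {
		log.Fatalf("model artifact changed: got %s, expected %s", got, expectedSHA256)
	}
	fmt.Println("model artifact matches pinned digest; safe to load")
}
```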

The common middle ground is hybrid. Train in the cloud, run inference on-prem. That works for latency and cost. It doesn’t solve the ownership problem. If you’re still pulling model updates from an upstream provider, you’ve moved the compute but not the dependency. The failure mode is the same. It just happens on your hardware.

There’s a harder version of this objection. Model capabilities are still compounding. If you pin an open-weights model on your own rack for stability and control, but your competitor rides the frontier API curve, they’re accepting volatility in exchange for raw intelligence. Stability is the right metric for infrastructure. For capability, sometimes you need the smartest model available, even if it’s unpredictable. The on-prem bet only works long-term if open-weights models keep pace with closed-source APIs. If they don’t, ownership becomes a stability play at the cost of falling behind the intelligence frontier.

And for most companies, training or fine-tuning a frontier model isn’t realistic. They don’t have the data, the talent, or the compute budget. The API dependency isn’t a bad decision. It’s the only one available. Which means this isn’t a trade-off most organizations can avoid. It’s one they need to understand clearly, because the costs of not understanding it are compounding in courtrooms and competitive markets right now.

The access problem

If the answer to AI privacy and control is “own the infrastructure,” we already know who can afford that and who can’t.

Enterprises with budget and technical depth will run their own inference on their own hardware. They’ll pin model versions, control their data, keep their logs out of other people’s lawsuits. The well-resourced get privacy, control, and capability independence.

Everyone else gets the free tier. Their conversations live on someone else’s servers, train someone else’s models, show up in someone else’s discovery obligations, and get monetized through advertising that knows exactly how they think. This is the most intimate technology ever built, and access to the private version of it tracks directly to the ability to pay.

This pattern isn’t new. Same split as healthcare, education, and legal representation. But AI sharpens it because the privacy gap isn’t about what you can afford to buy. It’s about what you’re forced to reveal by using the product at all.

The consumer version plays out in personal AI. Local models on personal hardware will happen. They’re already happening. But the timeline to frontier parity is longer than the optimists claim, and the cost of the hardware isn’t trivial. The people who can afford local inference or premium privacy tiers will opt out of the surveillance model. Everyone else won’t have the choice.

This is where confidential compute matters most. Not for enterprises, who solve the problem with hardware and headcount, but for the everyone-else case. If Confer, Tinfoil, or Apple PCC can make private inference the default rather than the premium option, if the cryptographic guarantees get strong enough that you don’t need to own the rack to own your data, that changes the access equation.

It doesn’t solve the capability ownership problem. Companies building products on AI will still need to control their model stack. But it could mean that using AI doesn’t require surrendering the record of how you think to whoever runs the server.

That’s one leg of the problem. A meaningful one. The other legs, model behavior stability, capability independence, and liability alignment, still require ownership for anyone building on top of these systems.

Where this goes

The cloud era trained everyone to think of infrastructure as a commodity you rent. For deterministic compute, that was right. The cycles did what you told them. Responsibility was clear.

AI couples capability to liability in a way cloud computing never did. The compute isn’t just running your logic. It’s making decisions, generating records, and creating obligations that follow you regardless of where the model runs or who trained it.

Ownership is becoming the default for anything that touches the capability layer. The infrastructure to make that viable is catching up. The open-weights ecosystem has to keep pace for it to work. And the question of who gets access to the private, controlled version of AI versus who’s stuck with the surveilled version will define the next decade of policy fights.

Renting capability means renting decisions you don’t control while keeping consequences you can’t outsource.

Agents Are More Like Humans Than Workloads. Here’s Why That Matters for Identity.

This is a long one. But as a great man once said, forgive the length, I didn’t have time to write a short one.

The industry has been going back and forth on where agent identity belongs. Is it closer to workload identity (attestation, pre-enumerated trust graphs, role-bound authorization) or closer to human identity (delegation, consent, progressive trust, session scope)? The answer from my perspective is human identity. But the reason isn’t what most people think.

The usual argument goes like this. Agents exercise discretion. They interpret ambiguous input. They pick tools. They sequence actions. They surprise you. Workloads don’t do any of that. Therefore agents need human-style identity.

That argument is true but it’s not the load-bearing part. The real reason is simpler and more structural.

Think about it this way. A robot arm on an assembly line is bolted to the floor. It’s “Arm #42.” It picks up a bolt from Bin A and puts it in Hole B. If it tries to reach for Bin Z, the system shuts it down. It has no reason to ever touch Bin Z. That’s workload identity. It works because the environment is closed and architected.

Now think about a consultant hired to “fix efficiency.” They roam the entire building. They’re “Alice, acting on behalf of the CEO.” They don’t have a list of rooms they can enter. They have a badge that says “CEO’s Proxy.” When they realize the problem is in the basement, the security guard checks their badge and lets them in, even though the CEO didn’t write “Alice can go to the basement” on a list that morning. The badge isn’t unlimited access. It’s a delegation primitive combined with policy. That’s human identity. It works because the environment is open and emergent.

Agents are the consultant, not the robot arm. Workload identity is built for maps: you know the territory, you draw the routes, if a service goes off-route it’s an error. Agent identity is built for compasses: you know the destination, but the route is discovered at runtime. Our identity infrastructure needs to reflect that difference.

To be clear, I am not suggesting agents are human. This isn’t about moral equivalence, legal personhood, or anthropomorphism. It’s about principal modeling. Agents occupy a similar architectural role to humans in identity systems. Discretionary actors operating in open ecosystems under delegated authority. That’s a structural observation, not a philosophical claim.

A fair objection is that today’s agents mostly work on concrete, short-lived tasks. A coding agent fixes a bug. A support agent resolves a ticket. The autonomy they exercise is handling subtle variance within a well-defined scope, not roaming across open ecosystems making judgment calls. That’s true, and in those cases the workload identity model is a reasonable fit.

But the majority of the value everyone is chasing accrues when agents can act for longer periods of time on more open-ended problems. Investigate why this system is slow. Manage this compliance process. Coordinate across these teams to ship this feature. And the longer an agent runs, the more likely it is to need permissions beyond what anyone anticipated at the start. That’s the nature of open-ended work.

The longer the horizon and the more open the problem space, the more the identity challenges described here become real engineering constraints rather than theoretical concerns. What follows is increasingly true as agents move in that direction, and every serious investment in agent capability is pushing them there.

Workload Identity Was Built for Closed Ecosystems

Think about how workload identity actually works in practice. You know which services are in your infrastructure. You know which service talks to which service. You pre-provision the credentials or you set up attestation so that the right code running in the right environment gets the right identity at boot time. SPIFFE loosened some of the static parts with dynamic attestation, but the mental model is still the same: I know what’s in my infrastructure, and I’m issuing identity to things I control.

That model works because workloads operate in closed ecosystems. Your Kubernetes cluster. Your cloud account. Your service mesh. The set of actors is known. The trust relationships are pre-defined. The identity system’s job is to verify that the thing asking for access is the thing you already decided should have access.

Agents broke that assumption.

An MCP client can talk to any server. An agent operating on your behalf might need to interact with services it was never pre-registered with. Trust relationships may be dynamic, not pre-provisioned, and the more open-ended the task the more likely that is true. The authorization decisions are contextual. Sometimes a human needs to approve what’s happening in real time. An agent might need to negotiate access to a resource that neither you nor the agent anticipated when the mission started.

None of that fits the workload model. Not because agents think or exercise judgment, but because the ecosystem they operate in is open. Workload identity was built for closed ecosystems. The more capable and autonomous agents become, the less they stay inside them.

Discovery Is the Problem Nobody Wants to Talk About

The open ecosystem problem goes deeper than just “agents interact with arbitrary services.” The whole point of an agent is to find paths you didn’t anticipate. Tell an agent “go figure out why certificate issuance is broken” and it might follow a trail from CT logs to a CA status page to vendor Slack to a three-year-old wiki page to someone’s personal notes. That path isn’t architected. It emerges from the agent reasoning about the problem.

Every existing authorization model assumes someone already enumerated what exists.

| System | Resource Space | Discovery Model | Auth Timing | Trust Model |
| --- | --- | --- | --- | --- |
| SPIFFE | Closed, architected | None, interaction graph is designed | Deploy-time | Static, identity-bound |
| OAuth | Bounded by pre-registered integrations | None, API contracts exist | Integration-time + user consent | Static after consent |
| IAM | Closed, catalogued | None, administratively maintained | Admin-time | Static, role-bound |
| Zero Trust | Bounded by inventory and policy plane | None, known endpoints | Per-request | Session-scoped, contextual |
| Browser Security | Open, unbounded | Full, arbitrary traversal | Per-request, per-capability | None, no accumulation |
| Agentic Auth (needed) | Open, task-emergent | Reasoning-driven, discovered at runtime | Continuous, intra-task | Accumulative, task-scoped |

Every model except browser security assumes a closed resource space. Browser security is the only open-space model, but it doesn’t accumulate trust. Agents need open-space discovery with accumulative trust. Nothing in the current stack does both.

Structured authorization models assume you can enumerate the paths. But enumeration kills emergence. If you have to pre-authorize every possible resource an agent might touch, you’ve pre-solved the problem space. That defeats the purpose of having an agent explore it.

The security objection here is obvious. An agent “discovering paths you didn’t anticipate” sounds a lot like lateral movement. The difference is authorization. An attacker discovers paths to exploit vulnerabilities. An agent discovers paths to find capabilities, under a delegation, subject to policy, with every step logged. The distinction only holds if the governance layer is actually doing its job. Without it, agent discovery and attacker reconnaissance are indistinguishable. That’s not an argument against discovery. It’s an argument for getting the governance layer right.

The Authorization Direction Is Inverted

Workload identity is additive. You enumerate what’s permitted. Here’s the role, here’s the scope, here’s the list of services this workload can talk to. Everything outside that list is denied.

Agents need something different. Not pure positive enumeration, but mixed constraints: here’s the goal, here’s the scope you’re operating in, here’s what’s off limits, here’s when you escalate. Access outside the defined scope isn’t default-allowed. It’s negotiable through demonstrated relevance and appropriate oversight.

That’s goal-scoped authorization with negative constraints rather than positive enumeration. And before the security people start hyperventilating, this doesn’t mean “default allow with a blacklist.” That would be insane. Nobody is proposing that.

What it actually looks like is how we scope human delegation in practice. When a company hires a consultant and says “fix our efficiency problem,” they don’t hand them a list of every room they can enter, every file they can read, every person they can talk to. They give them a badge, a scope of work, a set of boundaries (don’t access HR records, don’t make personnel decisions), escalation requirements (get approval before committing to anything over $50k), and monitoring (weekly check-ins, expense reports, audit trail). That’s not default allow. It’s delegated authority with boundaries, escalation paths, and oversight.

The constraints are a mix of positive (here’s your scope), negative (here’s what’s off limits), and procedural (here’s when you need to ask). To be fair, no deployed identity protocol fully supports this mixed-constraint model today. OAuth scopes are basically positive enumeration. RBAC is positive enumeration. Policy grammars that can express mixed constraints exist (Cedar and its derivatives can express allow, deny, and escalation rules against the same resource), but nobody has deployed them for agent governance yet.
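
To make that concrete, here’s a rough sketch in Python of what a mixed-constraint delegation could look like. This is not Cedar or any deployed policy engine; the resource names, scope prefixes, and the $50k threshold are borrowed from the consultant analogy purely for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class Decision(Enum):
    ALLOW = "allow"
    DENY = "deny"
    ESCALATE = "escalate"   # needs a human or policy-engine approval first


@dataclass
class Delegation:
    """A goal-scoped delegation with mixed constraints (illustrative only)."""
    goal: str
    scope_prefixes: list[str]         # positive: where the agent may operate freely
    forbidden_prefixes: list[str]     # negative: hard boundaries
    escalation_threshold_usd: float   # procedural: ask before exceeding this


def evaluate(d: Delegation, resource: str, cost_usd: float = 0.0) -> Decision:
    # Negative constraints beat everything else.
    if any(resource.startswith(p) for p in d.forbidden_prefixes):
        return Decision.DENY
    # Procedural constraints turn into escalation, not silent denial.
    if cost_usd > d.escalation_threshold_usd:
        return Decision.ESCALATE
    # Inside the positive scope: allowed without further review.
    if any(resource.startswith(p) for p in d.scope_prefixes):
        return Decision.ALLOW
    # Outside the scope but not forbidden: negotiable, so escalate.
    return Decision.ESCALATE


# The consultant from the analogy, expressed as a delegation.
consultant = Delegation(
    goal="fix the efficiency problem",
    scope_prefixes=["ops/", "facilities/"],
    forbidden_prefixes=["hr/records/"],
    escalation_threshold_usd=50_000,
)

print(evaluate(consultant, "facilities/basement/hvac"))        # Decision.ALLOW
print(evaluate(consultant, "hr/records/salaries"))             # Decision.DENY
print(evaluate(consultant, "ops/vendors/contract", 80_000))    # Decision.ESCALATE
```

The grammar itself isn’t the point. The point is that deny and escalate sit on equal footing with allow, which is exactly what positive-only enumeration can’t express.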

The mixed-constraint approach is how we govern humans organizationally, with identity infrastructure providing one piece of it. But the human identity stack is at least oriented in this direction. It has the concepts of delegation, consent, and conditional access. The workload identity stack doesn’t even have the vocabulary for it, because it was never designed for actors that discover their own paths.

The workload model can’t support this because it was designed to enumerate. The human model is oriented toward it because humans were the first actors that needed to operate in open, unbounded problem spaces with delegated authority and loosely defined scope.

The Human Identity Stack Got Here First

The human identity stack evolved these properties because humans needed them. Delegation exists because users interact with arbitrary services and need to grant scoped authority. Federation exists because trust crosses organizational boundaries. Consent flows exist because sometimes a human needs to approve what’s happening. Progressive auth exists because different operations require different levels of assurance, though in practice it’s barely deployed because it’s hard to implement well.

That last point matters. Progressive auth has been a nice-to-have for human identity, something most organizations skip because the friction isn’t worth it for human users who can just re-authenticate. For agents, it becomes essential. The more emergent the expectations, the more you need the ability to step up trust dynamically. Agents make progressive auth a requirement, not an aspiration.

And unlike the human case, progressive auth for agents is more tractable to build. The agent proposes an action, a policy engine or human approves, the scope expands with full audit. The governance gates can be automated. The building blocks exist. The composition is the work.

The human stack built these primitives because humans operate in open, dynamic ecosystems. Workloads historically didn’t. Now agents do. And agents are going to force the deployment of progressive auth patterns that the human stack defined but never fully delivered on.

And you can see this playing out in real time. Every serious attempt to solve agent identity reaches for human identity concepts, not workload identity concepts. Dick Hardt built AAuth around delegation, consent, progressive trust, and token exchange. Not because those are OAuth features, but because those are the properties agents need, and the human identity stack is where they were first defined. Microsoft’s Entra Agent ID uses On-Behalf-Of flows, confidential clients, and delegation patterns. Google’s A2A protocol uses OAuth, task-based delegation, and agent cards for discovery.

You can stretch SPIFFE or WIMSE to cover simple agent automation. But once agents operate across discovered systems rather than pre-enumerated ones, the model starts to strain. That’s not because those are bad technologies. It’s because they solve a different layer. Agent auth lives above attestation, in the governance layer, and the concepts that keep showing up there (delegation, consent, session scope, progressive trust) all originate on the human side.

That’s not a coincidence. The people building the protocols are voting with their architecture, and they’re voting for the human side. They’re doing it because that’s where the right primitives already exist.

“Why Not Just Extend Workload Identity?”

The obvious counterargument is that you could start from workload identity and extend it to cover agents. It’s worth taking seriously.

SPIFFE is good technology and it works well where it fits. Cloud-native environments, Kubernetes clusters, modern service meshes. In those environments, SPIFFE’s model of dynamic attestation and identity issuance is exactly right. The problem isn’t SPIFFE. The problem is that you don’t get to change all the systems.

That’s why WIMSE exists. Not because SPIFFE failed, but because the real world has more environments than SPIFFE was designed for. Legacy systems, hybrid deployments, multi-cloud sprawl, enterprise environments that aren’t going to rearchitect around SPIFFE’s model. WIMSE is defining the broader patterns and extending the schemes to fit those other environments. That work is important and it’s still in progress.

There’s also a growing push to treat agents as non-human identities and extend workload identity with agent-specific attributes. Ephemeral provisioning, delegation chains, behavioral monitoring. The idea is that agents are just advanced NHIs, so you start from the workload stack and bolt on what’s missing. I understand the appeal. It lets you build on existing infrastructure without rethinking the model.

But what you end up bolting on is delegation, consent, session scope, and progressive trust. Those aren’t workload identity concepts being extended. Those are human identity concepts being retrofitted onto a foundation that was never designed for them. You’re starting from attestation and trying to work your way up to governance. Every concept you need to add comes from the other stack. At some point you have to ask whether you’re extending workload identity or just rebuilding human identity with extra steps.

Agent Identity Is a Governance Problem

Now apply that same logic to agents more broadly. Agents don’t operate in a world where every system speaks SPIFFE, or WIMSE, or any single workload identity protocol. They interact with whatever is out there. SaaS APIs. Legacy enterprise systems. Third-party services they discover at runtime. The environments agents operate in are even more heterogeneous than the environments WIMSE is trying to address.

And many of those systems don’t support delegation at all. They authenticate users with passwords and passkeys, and that’s it. No OBO flows, no token exchange, no scoped delegation. In those cases agents will need to fully impersonate users, authenticating with the user’s credentials as if they were the user. That’s not the ideal architecture. It’s the practical reality of a world where agents need to interact with systems that were built for humans and haven’t been updated. The identity infrastructure has to treat impersonation as a governed, auditable, revocable act rather than pretending it won’t happen.

I want to be honest about the contradiction here. The moment an agent injects Alice’s password into a legacy SaaS app, all of the governance properties this post argues for vanish. Principal-level accountability, cryptographic provenance, session-scoped delegation — none of it survives that boundary. The legacy system sees Alice. The audit log says Alice. There’s no way to distinguish Alice from an agent acting on Alice’s behalf. You can’t revoke the agent’s access without changing Alice’s password. I don’t have a good answer for that. It’s a real gap, and it will exist for as long as legacy systems do. The faster the world moves toward agent-native endpoints, the smaller this governance black hole gets. But right now it’s large.

At the same time, the world is moving toward agent-native endpoints. I’ve written before about a future where DNS SRV records sit right next to A records, one pointing at the website for humans and one pointing at an MCP endpoint for agents. That’s the direction. But identity infrastructure has to handle the full spectrum, from legacy systems that only understand passwords to native agent endpoints that support delegation and attestation natively. The spectrum will exist for a long time.

More than with humans or workloads, agent identity turns into a governance problem. Human identity is mostly about authentication. Workload identity is mostly about attestation. Agent identity is mostly about governance. Who authorized this agent. What scope was it given. Is that scope still valid. Should a human approve the next step. Can the delegation be revoked right now. Those are all governance questions, and they matter more for agents than they ever did for humans or workloads because agents act autonomously under delegated authority across systems nobody fully controls.

And unlike humans, agents possess neither liability nor common sense. A human with overly broad access still has judgment that says “this is technically allowed but clearly a bad idea” and faces personal consequences for getting it wrong. Agents have neither brake. The governance infrastructure has to provide externally what humans provide partially on their own.

For humans and workloads, identity and authorization are cleanly separable layers. For agents, they converge. An agent’s identity without its delegation context is meaningless, and its delegation context is authorization. Governance is where those two layers collapse into one.

The reason is structural. Workloads act on behalf of the organization that deployed them. The operator and the principal are the same entity. Agents introduce a new actor in the chain. They act on behalf of a specific human who delegated specific authority for a specific task. That “on behalf of” is simultaneously an identity fact and an authorization fact, and it doesn’t exist in the workload model at all.

That’s why the human identity stack keeps winning this argument.

Meanwhile, human identity concepts are deployed at planetary scale. Delegation and consent are mature, well-understood patterns with decades of deployment experience. Progressive trust is defined but barely deployed. Multi-hop delegation provenance is still being figured out. It’s an incomplete picture, but here’s the thing: the properties that are missing from the human side don’t even have definitions on the workload side. That’s still a decisive advantage.

But I want to be clear. The argument here is about properties, not protocols. I don’t think OAuth is the answer, even with DPoP. OAuth was designed for a world of pre-registered clients and tightly scoped API access. DPoP bolts on proof-of-possession, but it doesn’t change the fundamental model.

When Hardt built AAuth, he didn’t extend OAuth. He started a new protocol. He kept the concepts that work (delegation, consent, token exchange, progressive trust) and rebuilt the mechanics around agent-native patterns. HTTPS-based identity without pre-registration, HTTP message signing on every request, ephemeral keys, and multi-hop token exchange. That’s telling. The human identity stack has the right concepts, but the actual protocols need to be rebuilt for agents. The direction is human-side. The destination is something new.

This isn’t about which stack is theoretically better. It’s about which stack has the right primitives deployed in the environments agents actually operate in. The answer to that question is the human identity stack.

Discretion Makes It Harder, But It’s Not the Main Event

The behavioral stuff still matters. It’s just downstream of the structural argument.

Workloads execute predefined logic. You attest that the right code is running in the right environment, and from there you can reason about what it will do. Agents don’t work that way. When you give an autonomous AI agent access to your infrastructure with the goal of “improve system performance,” you can’t predict whether it will optimize efficiency or find creative shortcuts that break other systems. We’ve already seen models break out of containers by exploiting vulnerabilities rather than completing tasks as intended. Agents optimize objectives in ways that can violate intent unless constrained. That’s not a bug. It’s the expected behavior of systems designed to find novel paths to goals.

That means you can’t rely on code measurement alone to govern what an agent does. You also need behavioral monitoring, anomaly detection, conditional privilege, and the ability to put a human in the loop. Those are all human IAM patterns. But you need them because the ecosystem is open and the behavior is unpredictable. The open ecosystem is the first-order problem. The unpredictable behavior makes it worse.

And this is where the distinction between guidance and enforcement matters. System instructions are suggestions. An agent can be told “don’t access production data” in its prompt and still do it if a tool call is available and the reasoning chain leads there. Prompt injections can override instructions entirely. Policy enforcement is infrastructure. Cryptographic controls, governance layers, and authorization gates that sit outside the agent’s context and can’t be talked around. Agents need infrastructure they can’t override through reasoning, not instructions they’re supposed to follow.
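
A sketch of the difference, assuming a hypothetical tool executor and a toy policy table: the gate lives in infrastructure, outside the agent’s context, so no prompt can talk it out of enforcing the policy.

```python
# A toy enforcement gate. run_tool and the policy table are stand-ins; a real
# deployment would back this with a policy engine and signed delegations.

FORBIDDEN = {("read", "prod-customer-db"), ("delete", "prod-customer-db")}


class PolicyViolation(Exception):
    pass


def run_tool(tool: str, target: str, payload: dict) -> dict:
    # Pretend executor; in reality this would invoke the actual tool.
    return {"tool": tool, "target": target, "status": "ok"}


def governed_tool_call(tool: str, target: str, payload: dict) -> dict:
    """Every tool call the agent makes is routed through this gate.

    The check runs in infrastructure the agent can't reason its way around:
    whatever the prompt says, forbidden calls never execute, and the audit
    record is written by the gate, not by the agent.
    """
    if (tool, target) in FORBIDDEN:
        raise PolicyViolation(f"{tool} on {target} is outside the delegation")
    result = run_tool(tool, target, payload)
    print({"audit": {"tool": tool, "target": target, "allowed": True}})
    return result


governed_tool_call("read", "staging-metrics", {"query": "p99 latency"})   # allowed, logged
# governed_tool_call("read", "prod-customer-db", {})  # raises PolicyViolation
```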

What Agents Actually Need From the Human Stack

Session-scoped authority. I’ve written about this with the Tron identity disc metaphor. Agent spawns, gets a fresh disc, performs a mission, disc expires. That’s session semantics. It exists because the trust relationship is bounded and temporary, the way a user’s interaction with a service is bounded and temporary, not the way a workload’s persistent role in a service mesh works.

Think about what happens without it. An agent gets database write access for a migration task. Task completes. The credentials are still live. The next task is unrelated, but the agent still has write access to that database. A poisoned input, a bad reasoning chain, or just an optimization shortcut the agent thought was clever, and it drops a table. Not because it was malicious. Because it had credentials it no longer needed for a task it was no longer doing. That’s the agent equivalent of Bobby Tables, and it’s entirely preventable.

The logical endpoint of session-scoped authority is zero standing permissions. Every agent session starts empty. No credentials carry over from the last task. The agent accumulates only what it needs for this specific mission, and everything resets when the mission ends.

For humans, zero standing permissions is aspirational but rarely practiced because the friction isn’t worth it. Humans don’t want to re-request access to the same systems every morning. Agents don’t have that problem. They can request, wait, and proceed programmatically. The friction that makes zero standing permissions impractical for humans disappears for agents.
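
Here’s a minimal sketch of that lifecycle. The resource names and TTLs are invented; the point is that the session starts empty and nothing survives the end of the mission.

```python
import datetime as dt
from dataclasses import dataclass, field


@dataclass
class Grant:
    resource: str
    action: str
    expires: dt.datetime


@dataclass
class AgentSession:
    """Session-scoped authority: starts with nothing, ends with nothing."""
    mission: str
    grants: list[Grant] = field(default_factory=list)

    def request(self, resource: str, action: str, ttl_minutes: int = 30) -> Grant:
        # In a real system this would go through the policy/escalation gate;
        # here we just mint a short-lived grant tied to this session.
        expires = dt.datetime.now(dt.timezone.utc) + dt.timedelta(minutes=ttl_minutes)
        grant = Grant(resource, action, expires)
        self.grants.append(grant)
        return grant

    def allowed(self, resource: str, action: str) -> bool:
        now = dt.datetime.now(dt.timezone.utc)
        return any(g.resource == resource and g.action == action and g.expires > now
                   for g in self.grants)

    def end(self) -> None:
        # Mission over: nothing carries into the next task.
        self.grants.clear()


session = AgentSession(mission="migrate billing schema")
session.request("billing-db", "write")
assert session.allowed("billing-db", "write")
session.end()
assert not session.allowed("billing-db", "write")   # no standing permissions
```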

The hard question is how permissions get granted at runtime. Predefined policy handles the predictable paths. Billing agent gets billing APIs. That works, but it’s enumeration, and enumeration breaks down for open-ended tasks. Human-gated expansion handles the unpredictable paths, but it kills autonomy.

The mechanism that would actually make zero standing permissions work for emergent behavior is goal-scoped evaluation. Does this request serve the stated goal within the stated boundaries. That’s the same unsolved problem the rest of this piece keeps circling. Zero standing permissions is the right ideal. It’s achievable today for the predictable portion of agent work. The gap is the same gap.

Delegation with provenance. Agents are user agents in the truest sense. They carry delegated user authority into digital systems. AAuth formalizes this with agent tokens that bind signing keys to identity. The question “who authorized this agent to do this?” is a delegation question. Delegation is a human identity primitive because humans were the first actors that operated across trust boundaries and needed to grant scoped authority to others.

Chaining that delegation cryptographically across multi-hop paths, from user to agent to tool to downstream service while maintaining proof of the original user’s intent, is genuinely hard. Standard OBO flows are often too brittle for this. This is where the industry needs to go, not where it is today.
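
To show what chaining delegation mechanically involves, here’s a toy sketch using Ed25519 signatures from Python’s cryptography package. Real protocols like AAuth add token formats, audience restrictions, and key binding that this deliberately leaves out.

```python
import json
from cryptography.exceptions import InvalidSignature
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey
from cryptography.hazmat.primitives.serialization import Encoding, PublicFormat

# Toy delegation chain: user -> agent -> tool. Each hop signs a statement
# naming the next principal's key and the scope granted, so a verifier can
# walk the chain back to the original user's intent.

def raw_pub(key) -> str:
    return key.public_key().public_bytes(Encoding.Raw, PublicFormat.Raw).hex()

def sign_delegation(signer_key, delegate: str, scope: str) -> dict:
    statement = {"delegate": delegate, "scope": scope}
    payload = json.dumps(statement, sort_keys=True).encode()
    return {"statement": statement, "sig": signer_key.sign(payload)}

def verify_hop(issuer_key, hop: dict) -> dict:
    payload = json.dumps(hop["statement"], sort_keys=True).encode()
    issuer_key.public_key().verify(hop["sig"], payload)  # raises InvalidSignature if forged
    return hop["statement"]

user_key, agent_key = Ed25519PrivateKey.generate(), Ed25519PrivateKey.generate()

# The user delegates a scoped task to the agent; the agent sub-delegates to a tool.
hop1 = sign_delegation(user_key, raw_pub(agent_key), "investigate cert issuance failures")
hop2 = sign_delegation(agent_key, "tool-key-placeholder", "read CT logs")

try:
    s1 = verify_hop(user_key, hop1)
    s2 = verify_hop(agent_key, hop2)
    print("provenance ok:", s1["scope"], "->", s2["scope"])
except InvalidSignature:
    print("delegation chain broken")
```

The signing is the easy part. The hard part is agreeing on statement formats, key binding, and revocation across organizations that don’t share a trust root, which is exactly where the standards work still is.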

Progressive trust. AAuth lets a resource demand anything from a signed request to verified agent identity to full user authorization. That gradient only makes sense when the trust relationship is negotiated dynamically. Workloads don’t negotiate trust. They either have a role or they don’t.

Accountability at the principal level. When an agent approves a transaction, files a regulatory report, or alters infrastructure state, the audit question is “who authorized this and was it within scope?” Today’s logs can’t answer that. The log says an API token performed a read on a customer record. That token is shared across dozens of agents. Which agent? Acting on whose delegation? For what task? The log can’t say.

And even if it could identify the agent, there’s nothing connecting that action to the human authorization that allowed it. Nobody asks “which Kubernetes pod approved this wire transfer.” Governance frameworks reason about actors. That’s why every protocol effort maps agent identity to principal identity.
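
The gap is easiest to see side by side. The field names below are invented, but the shape of the problem is exactly this:

```python
# What most logs can say today: an opaque shared token did something.
today = {"actor": "api-token-7f3c", "action": "read", "resource": "customer/4821"}

# What principal-level accountability needs: the action bound to the agent,
# the delegation it acted under, the task, and the human who authorized it.
principal_level = {
    "actor": {"agent_id": "agent-billing-17", "attestation": "sha256:<measured-code-hash>"},
    "on_behalf_of": "alice@example.com",
    "delegation_id": "dlg-2024-0917-42",
    "task": "investigate billing discrepancy",
    "scope_check": "within delegated scope",
    "action": "read",
    "resource": "customer/4821",
}
```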

Goal-scoped authorization. Agents need mixed constraints rather than pure positive enumeration. Define the scope, set the boundaries, establish the escalation paths, delegate the goal, let the agent figure out the path. That’s how we’ve governed human actors in organizations for centuries. The identity and authorization infrastructure to support it exists in the human stack because that’s where it was needed first.

But I’ll be direct. Goal-scoped authorization is the hardest unsolved engineering problem in this space. The fundamental tension is temporal. Authorization happens before execution, but agents discover what they need during execution. Current authorization systems operate on verbs and nouns (allow this action on this resource). They don’t understand goals. Translating “fix the billing error” into a set of allowed API calls at runtime, without the agent hallucinating its way into a catastrophe, requires a just-in-time policy layer that doesn’t exist yet.

Progressive trust gets us part of the way there. The agent proposes an action, and a policy engine or a human approves the specific derived action before it executes. But the full solution is ahead of us, not behind us.

I know how this sounds to security people. “Goal-based authorization” sounds like the agent decides what it needs based on its own interpretation of a goal. That’s terrifying. It sounds like self-authorizing AI. But the alternative is pretending we can enumerate every action an agent might need in advance, and that fails silently. Either the agent operates within the pre-authorized list and can’t do its job, or someone over-provisions “just in case” and the agent has access to things it shouldn’t. Both are security failures. One just looks tidy on paper. Goal-based auth at least makes the governance visible. The agent proposes, the policy evaluates, the decision is logged. The scary part isn’t that we need goal-based auth. The scary part is that we don’t have it yet, so people are shipping agents with over-provisioned static credentials instead.
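
A minimal sketch of that loop, with invented names: the proposal, the decision, and the audit record all live outside the agent.

```python
from dataclasses import dataclass, field


@dataclass
class Proposal:
    agent_id: str
    action: str
    resource: str
    justification: str
    status: str = "pending"


@dataclass
class EscalationQueue:
    """Human- or policy-gated step-up: nothing expands scope until approved."""
    log: list = field(default_factory=list)
    pending: list = field(default_factory=list)

    def propose(self, p: Proposal) -> Proposal:
        self.pending.append(p)
        self.log.append(("proposed", p.agent_id, p.action, p.resource))
        return p

    def decide(self, p: Proposal, approver: str, approved: bool) -> bool:
        p.status = "approved" if approved else "denied"
        self.pending.remove(p)
        # The decision and who made it are recorded outside the agent.
        self.log.append((p.status, approver, p.action, p.resource))
        return approved


queue = EscalationQueue()
ask = queue.propose(Proposal("agent-perf-3", "read", "billing-db/slow-query-log",
                             "latency spike correlates with billing batch jobs"))
if queue.decide(ask, approver="oncall-human", approved=True):
    print("scope expanded, with an audit trail:", queue.log)
```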

And there’s a deeper problem I want to name honestly. The only thing capable of evaluating whether a specific API call serves a broader goal is another LLM. And that means putting a probabilistic, hallucination-prone, high-latency system into the critical path of every infrastructure request. You’re using the thing you’re trying to govern as the governance mechanism. That’s not just an engineering gap waiting to be filled. It’s a fundamental architectural tension that the industry hasn’t figured out how to resolve. Progressive trust with human-gated escalation is the best interim answer, but it’s a workaround, not a solution.

This Isn’t About Throwing Away Attestation

I want to be clear about something because readers will assume otherwise. This argument is not “throw away workload identity primitives.” I’ve spent years arguing that attestation is MFA for workloads. I’ve written about measured enclaves, runtime attestation, and hardware-rooted identity extensively. None of that goes away.

You absolutely need attestation to prove the agent is running the right code in the right environment. You need runtime measurement to detect tampering. You need hardware roots of trust. If a hacker injects malicious code into an agent that has broad delegated authority, you need to know. That’s the workload identity stack doing its job.

In fact, attestation isn’t just complementary to the governance layer. It’s prerequisite. You can’t safely delegate authority to something you can’t verify. All the governance, delegation, and consent primitives in the world are meaningless if the code executing them has been tampered with. Attestation is the foundation the governance layer stands on.

But attestation alone isn’t enough. Proving that the right code is running doesn’t tell you who authorized this agent to act, what scope it was delegated, whether it’s operating within that scope, or whether a human needs to approve the next action. Those are delegation, consent, and governance questions. Those live in the human identity stack.

What agents actually need is both. Workload-style attestation as the foundation, with human-style delegation, consent, and progressive trust built on top.

I’ve argued before that attestation is MFA for workloads. It proves code integrity, runtime environment, and platform state, the way MFA proves presence, possession, and freshness for humans. For agents, we need to extend that into principal-level attestation. Not just “is this the right code in the right environment?” but also “who delegated authority to this agent, under what policy, with what scope, and is that delegation still valid?”

That’s multi-factor attestation of an acting principal. Code integrity from the workload stack, delegation provenance from the human stack, policy snapshot and session scope binding the two together. Neither stack delivers that alone today.
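
As a sketch, with placeholder checks standing in for real attestation and token verification, the composition looks like three independent factors that all have to hold before the agent acts:

```python
import datetime as dt

# Placeholder checks; a real system would verify a TPM/TEE quote, a signed
# delegation token, and a policy snapshot rather than these stand-ins.

def code_integrity_ok(attestation: dict) -> bool:
    return attestation.get("measurement") == attestation.get("expected_measurement")

def delegation_ok(delegation: dict) -> bool:
    return (delegation.get("delegator") is not None
            and dt.datetime.fromisoformat(delegation["expires"]) > dt.datetime.now(dt.timezone.utc))

def session_ok(session: dict, requested_scope: str) -> bool:
    return requested_scope in session.get("scopes", [])

def acting_principal_ok(attestation: dict, delegation: dict, session: dict, scope: str) -> bool:
    """Workload factor + human factor + session binding, all required."""
    return (code_integrity_ok(attestation)
            and delegation_ok(delegation)
            and session_ok(session, scope))

print(acting_principal_ok(
    {"measurement": "abc", "expected_measurement": "abc"},
    {"delegator": "alice@example.com", "expires": "2031-01-01T00:00:00+00:00"},
    {"scopes": ["read:ct-logs"]},
    "read:ct-logs",
))  # True only when all three factors hold
```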

The argument is about where the center of gravity is, not about discarding one stack entirely. And the center of gravity is on the human side, because the hard problems for agents are delegation and governance, not runtime measurement.

Where the Properties Actually Align (And Where They Don’t)

I’ve been arguing agents are more like humans than workloads. That’s true as a center-of-gravity claim. But it’s not total alignment, and pretending otherwise invites the wrong criticisms. Here’s where the properties actually land.

What agents inherit from the human side:

Delegation with scoped authority. Session-bounded trust. Progressive auth and step-up. Cross-boundary trust negotiation. Principal-level accountability. Open ecosystem discovery. These are the properties that make agents look like humans and not like workloads. They’re also the properties that are hardest to solve and least mature.

What agents inherit from the workload side:

Code integrity attestation. Runtime measurement. Programmatic credential handling with no human in the authentication loop. Ephemeral identity that doesn’t persist across sessions. These are well-understood, and the workload identity stack handles them. Agents don’t authenticate the way humans do. They don’t type passwords or touch biometric sensors. They prove what code is running and in what environment. That’s attestation, and it stays on the workload side.

What neither stack gives them:

This is the part nobody is talking about enough. Agents have properties that don’t map cleanly to either the human or workload model.

Accumulative trust within a task that resets between tasks. Human trust accumulates over a career and persists. Workload trust is static and role-bound. Agent trust needs to build during a mission as the agent demonstrates relevance and competence, then reset completely when the mission ends. Nothing in either stack supports that lifecycle.

Goal-scoped authorization with emergent resource discovery. I’ve already called this the hardest unsolved problem. Current auth systems operate on verbs and nouns. Agents need auth systems that operate on goals and boundaries. Neither stack was designed for this.

Delegation where the delegate doesn’t share the delegator’s intent. Every existing delegation protocol assumes the delegate understands and shares the user’s intent. When a human delegates to another human through OAuth, both parties generally understand what “handle my calendar” means and what it doesn’t.

An agent doesn’t share intent. It shares instructions. It will pursue the letter of the delegation through whatever path optimizes the objective, even if the human would have stopped and said “that’s not what I meant.” This isn’t a philosophy problem. It’s a protocol-level assumption violation. No existing delegation framework accounts for delegates that optimize rather than interpret.

Simultaneous proof of code identity and delegation authority. Agents need to prove both what they are (attestation) and who authorized them to act (delegation) in a single transaction. Those proofs come from different stacks with different trust roots. A system can check both sequentially, verify the attestation, then verify the delegation, and that’s buildable today. But binding them together cryptographically into a single verifiable object so a relying party can verify both at once without trusting the binding layer is an unsolved composition problem.

Vulnerability to context poisoning that persists across sessions. I’ve written about the “Invitation Is All You Need” attack where a poisoned calendar entry injected instructions into an agent’s memory that executed days later. Humans can be socially engineered, but they don’t carry the payload across sessions the way agents do. Workloads don’t accumulate context at all. Agent session isolation is a new problem that needs new primitives.

The honest summary is this. Agents inherit their governance properties from the human side and their verification properties from the workload side, but neither stack addresses the properties that are unique to agents. The solution isn’t OAuth with attestation bolted on. It’s something new that inherits from both lineages and adds primitives for accumulative task-scoped trust, goal-based authorization, and session isolation. That thing doesn’t exist yet.

Where This Framing Breaks

Saying “agents are like humans” implies the workload stack fails because workloads lack something agents have. Discretion, autonomy, behavioral complexity. That’s the wrong diagnosis. The workload stack fails because it was built for a world of pre-registered clients, tightly bound server relationships, and closed trust ecosystems. The more capable agents become, the less they stay in that world.

The human identity stack fits better not because agents are human-like, but because it’s oriented toward the structural properties agents need. Open ecosystems. Dynamic trust negotiation. Delegation across boundaries. Session-scoped authority. Progressive assurance. Not all of these are fully deployed today. Some are defined but immature. Some don’t exist as protocols yet. But the concepts, the vocabulary, and the architectural direction all come from the human side. The workload side doesn’t even have the vocabulary for most of them.

Those properties exist in the human stack because humans needed them first. Now agents need them too.

The Convergence We’ve Already Seen

My blog has traced this progression for a while now. Machines were static, long-lived, pre-registered. Workloads broke that model with ephemeral, dynamic, attestation-based identity. Each step in that evolution adopted identity properties that were already standard in human identity systems. Dynamic issuance. Short credential lifetimes. Context-aware access. Attestation as MFA for workloads. Workload identity got better by becoming more like user identity.

Agents are the next step in that same convergence. They don’t just need dynamic credentials and attestation. They need delegation, consent, progressive trust, session scope, and goal-based authorization. The most complete and most deployed versions of those primitives live in the human stack. Some exist in other forms elsewhere (SPIFFE has trust domain federation, capability tokens like Macaroons exist independently), but the human stack is where the broadest set of these concepts has been defined, tested, and deployed at scale.

The Actual Claim

Agent identity is a governance problem. Not an authentication problem, not an attestation problem. The hard questions are all governance questions. Who delegated authority. What scope. Is it still valid. Should a human approve the next step. For humans and workloads, identity and authorization are separate layers. For agents, they collapse. The delegation is the identity.

The human identity stack is where principal identity primitives live. Not because agents are people, but because people were the first actors that needed identity in open ecosystems with delegated authority and unbounded problem spaces.

Every protocol designer who sits down to solve agent auth rediscovers this and reaches for human identity concepts, not workload identity concepts. The protocols they build aren’t OAuth. They’re something new. But they inherit from the human side every time. That convergence is the argument.

The delegation and governance layer is buildable today. Goal-scoped authorization and intent verification are ahead of us. The first generation of agent identity systems will solve governance. The second will solve intent.

“A Few Hours” and the Slow Erosion of Auditable Commitments

There’s a pattern that plays out across every regulated industry. Requirements increase. Complexity compounds. The people responsible for compliance realize they can’t keep up with manual processes. So instead of building the capacity to meet the rising bar, they quietly lower the specificity of their commitments.

It’s rational behavior. A policy that says “we perform regular reviews” can’t be contradicted the way a policy that says “we perform reviews every 72 hours” can. The less you commit to on paper, the less exposure you carry.

The problem is that this rational behavior, repeated across enough organizations and enough audit cycles, hollows out the entire compliance system from the inside. Documents stop describing what organizations actually do. They start describing the minimum an auditor will accept. The gap between documentation and reality widens. Nobody notices until something breaks.

A Real-Time Example

A recent incident in the Mozilla CA Program put this dynamic on public display in a way worth studying regardless of whether you work in PKI.

Amazon Trust Services disclosed that their Certificate Revocation Lists sometimes backdate a timestamp called “thisUpdate” by up to a few hours. The practice itself is defensible. It accommodates clock skew in client systems. When they updated their policy document to disclose the behavior, they described it as CRLs “may be backdated by up to a few hours.”

A community member pointed out the obvious. “A few hours” is un-auditable. Without a defined upper bound, there’s no way for an auditor, a monitoring tool, or a relying party to evaluate whether any given CRL falls within the CA’s stated practice. Twelve hours? Still “a few.” Twenty-four? Who decides?

When pressed, Amazon’s response was telling. They don’t plan to add detailed certificate profiles back into their policy documents. They believe referencing external requirements satisfies their disclosure obligations. We’ll tell you we follow the rules, but we won’t tell you how.

Apple, Mozilla, and Google’s Chrome team then independently pushed back. Each stated that referencing external standards is necessary but not sufficient. Policy documents must describe actual implementation choices with enough precision to be verifiable.

Apple’s Dustin Hollenback was direct. “The Apple Root Program expects policy documents to describe the CA Owner’s specific implementation of applicable requirements and operational practices, not merely incorporate them by reference.”

Mozilla’s Ben Wilson went further, noting that “subjective descriptors without defined bounds or technical context make it difficult to evaluate compliance, support audit testing, or enable independent analysis.” Mozilla has since opened Issue #295 to strengthen the MRSP accordingly.

Chrome’s response summarized the situation most clearly:

We consider reducing a CP/CPS to a generic pointer where it becomes impossible to distinguish between CAs that maintain robust, risk-averse practices and those that merely operate at the edge of compliance as being harmful to the reliable security of Chrome’s users.

They also noted that prior versions of Amazon’s policy had considerably more profile detail, calling the trend of stripping operational commitments “a regression in ecosystem transparency.”

The Pattern Underneath

What makes PKI useful as a case study isn’t that certificate authorities are uniquely bad at this. It’s that their compliance process is uniquely visible. CP/CPS documents are public. Incident reports are filed in public Bugzilla threads. Root program responses are posted where anyone can read them. The entire negotiation between “what we do” and “what we’re willing to commit to on paper” plays out in the open.

In most regulated industries, you never see this. The equivalent conversations in finance, FedRAMP, healthcare, or energy happen behind closed doors between compliance staff and auditors. The dilution is invisible to everyone outside the room. A bank’s internal policies get vaguer over time and nobody outside the compliance team and their auditors knows it happened. A FedRAMP authorization package gets thinner and the only people who notice are the assessors reviewing it. The dynamic is the same. The transparency isn’t.

So when you watch a CA update its policy with “a few hours” and three oversight bodies publicly push back, you’re seeing something that happens constantly across every regulated domain. You’re just not usually allowed to watch.

Strip away the PKI details and the pattern is familiar to anyone who has worked in compliance. An organization starts with detailed documentation of its practices. Requirements grow. Maintaining alignment between what the documents say and what the systems actually do gets expensive. Someone realizes that vague language creates less exposure than specific language. Sometimes it’s the compliance team running out of capacity. Sometimes it’s legal counsel actively advising against specific commitments, believing that “reasonable efforts” is harder to litigate against than “24 hours.” Either way, they’re trading audit risk for liability risk and increasing both. The documents get trimmed. Profiles get removed. Temporal commitments become subjective. “Regularly.” “Promptly.” “Periodically.” Operational descriptions become references to external standards.

Each individual edit is defensible. Taken together, they produce a document that can’t be meaningfully audited because there’s nothing concrete to audit against. One community member in the Amazon thread called this “Compliance by Ambiguity,” the practice of using generic, non-technical language to avoid committing to specific operational parameters. It’s a perfect label for a pattern that shows up everywhere.

This is the compliance version of Goodhart’s Law. When organizations optimize their policy documents for audit survival rather than operational transparency, the documents stop serving any of their original functions. Auditors can’t verify practices against vague commitments. Internal teams can’t use the documents to understand what’s expected of them. Regulators can’t evaluate whether the stated approach actually manages risk. The document becomes theater. And audits are already structurally limited by point-in-time sampling, auditee-selected scope, and the inherent conflict of the auditor working for the entity being audited. Layering ambiguous commitments on top of those limitations removes whatever verification power the process had left.

And it’s accelerating. Financial services firms deal with overlapping requirements from dozens of jurisdictions. Healthcare organizations juggle HIPAA, state privacy laws, and emerging AI governance frameworks simultaneously. Even relatively narrow domains like certificate authority operations have seen requirement growth compound year over year as ballot measures, policy updates, and regional regulations stack on top of each other. The manual approach to compliance documentation was already strained a decade ago. Today it’s breaking.

In PKI alone, governance obligations have grown 52-fold since 2005. The pattern is similar in every regulated domain that has added frameworks faster than it has added capacity to manage them.

Most organizations choose dilution. Not because they’re negligent, but because the alternative barely exists yet. There is no tooling deployed at scale that continuously compares what a policy document says against what the infrastructure actually does. No system that flags when a regulatory update creates a gap between stated practice and new requirements. No automated way to verify that temporal commitments (“within 24 hours,” “no more than 72 hours”) match operational reality. So people do what people do when workload exceeds capacity. They cut corners on the parts that seem least likely to matter this quarter. Policy precision feels like a luxury when you’re scrambling to meet the requirements themselves.

What Vagueness Actually Costs

The short-term calculus makes sense. The long-term cost doesn’t.

I went back and looked at public incidents in the Mozilla CA Program going back to 2018. Across roughly 500 cases, about 70% fall into process and operational failures rather than code-level defects. A large portion trace back to gaps between what an organization actually does and what its documents say it does. The organizations that ultimately lost trust follow a consistent pattern. Documents vague enough to avoid direct contradiction, but too vague to demonstrate that operations stayed within defined parameters. The decay is always gradual. The loss of trust always looks sudden.

The breakdown is telling. Of the four major incident categories, Governance & Compliance failures account for roughly half of all incidents, more than certificate misissuance, revocation failures, and validation errors combined. The primary cause isn’t code bugs or cryptographic weaknesses. It’s administrative oversight. Late audit reports, incomplete analysis, delayed reporting. The stuff that lives in policy documents and process descriptions, not in code.

(Chart omitted: the distribution of incidents across the four categories, with Governance & Compliance accounting for roughly half.)

This holds outside PKI. The financial institutions that get into the worst trouble with regulators aren’t usually the ones doing something explicitly prohibited. They’re the ones whose internal documentation was too vague to prove they were doing what they claimed. Read the details behind SOX failures, GDPR enforcement actions, and FDA warning letters, and you’ll find the same structural problem. Stated practices didn’t match reality, and nobody caught it because the stated practices were too imprecise to evaluate.

Vagueness also creates operational risk that has nothing to do with regulators. When your own engineering, compliance, and legal teams can’t look at a policy document and know exactly what’s expected, they fill in the gaps with assumptions. Different teams make different assumptions. Practices diverge. The organization thinks it’s operating one way because that’s what the document sort of implies. The reality is something else. And the gap only surfaces when an auditor, a regulator, or an incident forces someone to look closely.

The deeper issue is that vagueness removes auditability as a control surface. When commitments are measurable, deviations surface automatically. A system can check whether a CRL was backdated by more than two hours the same way it checks whether a certificate was issued with the wrong key usage extension. The commitment is binary. It either holds or it doesn’t. When commitments are subjective, deviations become interpretive. “A few hours” can’t be checked by a machine. It can only be argued about by people. That shifts risk detection from systems to negotiation. Negotiation doesn’t scale, produces inconsistent outcomes, and worst of all, it only happens between the auditee and the auditor. The regulators and the public who actually bear the risk aren’t in the room.
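
The contrast is easy to make concrete. A defined bound is a one-line machine check; “a few hours” has no equivalent. The two-hour threshold below is an example, not anything Amazon committed to:

```python
from datetime import datetime, timedelta, timezone

MAX_BACKDATE = timedelta(hours=2)   # example bound; the point is that it's defined


def backdate_within_bound(this_update: datetime, observed_at: datetime) -> bool:
    """Machine-checkable commitment: thisUpdate may trail the observed
    generation time by at most MAX_BACKDATE. Either it holds or it doesn't."""
    return observed_at - this_update <= MAX_BACKDATE


now = datetime.now(timezone.utc)
print(backdate_within_bound(now - timedelta(hours=1), now))   # True
print(backdate_within_bound(now - timedelta(hours=5), now))   # False

# There is no equivalent function for "a few hours"; that check can only be
# argued about by people.
```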

Measurable commitments create automatic drift detection. Subjective commitments create negotiated drift.

That spectrum, running from machine-checkable commitments at one end to purely subjective language at the other, is the diagnostic. Everything short of machine-checkable is a gap waiting to be exploited by time pressure, turnover, or organizational drift.

What Would Have to Change

Solving this means treating compliance documentation as infrastructure rather than paperwork. In the same way organizations moved from manual deployments to CI/CD pipelines, compliance needs to move from static documents reviewed annually to living systems verified continuously.

The instinct is to throw AI at it, and that instinct is half right. LLMs are good at ingesting unstructured policy documents. But compliance verification isn’t a search problem. It’s a systematic reasoning problem. You need to trace requirements through hierarchies, exceptions, and precedence rules, then compare them against operational evidence. Recent research shows that RAG-based approaches still hallucinate 17-33% of the time on legal and compliance questions, even with domain-specific retrieval. The failure mode isn’t bad prompting. It’s architectural. You cannot train a model to strictly verify “a few hours” any better than you can train an auditor.

The fix isn’t better retrieval. It’s decomposing complex compliance questions into bounded sub-queries against explicit structures that encode regulatory hierarchy and organizational context, keeping the LLM’s role narrow enough that its errors can be isolated and reviewed.

That means tooling that ingests policy documents and maps commitments to regulatory requirements. Systems that flag language failing basic auditability checks, like temporal bounds described with subjective terms instead of defined thresholds. Automated comparison of stated practices against actual system behavior, running continuously rather than at audit time.

In the Amazon case, a system like this would have caught “a few hours” before it was published. Not because backdating is prohibited, but because the description lacks the specificity needed for anyone to verify compliance with it. The system wouldn’t need to understand CRL semantics. It would just need to know that temporal bounds in operational descriptions require defined, measurable thresholds to be auditable.
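
Here’s a toy version of that check, scanning policy text for subjective temporal language that isn’t backed by a defined bound. The phrase lists are illustrative, nowhere near a complete lint rule:

```python
import re

# Subjective temporal phrases that defeat automated verification (illustrative list).
SUBJECTIVE = re.compile(
    r"\b(a few (hours|days)|promptly|regularly|periodically|"
    r"as soon as (possible|practicable)|reasonable efforts)\b", re.I)

# Defined, measurable bounds an auditor or a monitor can actually test against.
DEFINED = re.compile(
    r"\b(within|no more than|at most|every)\s+\d+\s+(minutes?|hours?|days?)\b", re.I)


def audit_commitments(policy_text: str) -> list[str]:
    findings = []
    for i, line in enumerate(policy_text.splitlines(), start=1):
        if SUBJECTIVE.search(line) and not DEFINED.search(line):
            findings.append(f"line {i}: subjective temporal commitment: {line.strip()!r}")
    return findings


sample = """CRLs may be backdated by up to a few hours to accommodate clock skew.
Revocation information is updated within 24 hours of a confirmed key compromise."""

for finding in audit_commitments(sample):
    print(finding)
# Flags line 1; line 2 passes because the bound is defined and checkable.
```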

Scale that across any compliance domain. Every vague commitment is a gap. Every gap is a place where practice can diverge from documentation without detection. Every undetected divergence is risk accumulating quietly until something forces it into the open.

The Amazon incident is useful because it forced the people who oversee trust decisions to say out loud what has been implicit for years. The bar for documentation specificity is rising, and organizations that optimize for minimal disclosure are optimizing for the wrong thing. That message goes well beyond certificate authorities. The ones that keep diluting their commitments will discover that vagueness isn’t a shield. It’s a slow-moving liability that compounds until it becomes an acute one.

The regulatory environment isn’t going to get simpler. The organizations that treat policy precision as optional will discover that ambiguity scales faster than governance, and that systems which cannot be automatically verified will eventually be manually challenged.