Monthly Archives: May 2026

A CA That Produces Evidence, Not Promises

In my last post I argued that high-assurance systems should stop asking to be trusted on the basis of institutional promises and start producing verifiable runtime evidence about what actually happened. This post is the worked example: a certificate authority built that way, the choices it forced, and what is and is not done yet.

When I was at Google I got to work a bit with the BeyondCorp folks. What most didn’t understand is that the BeyondCorp Google used internally was substantially different from the BeyondCorp they launched to customers. On Windows and Linux, the path was TPM-backed device credentials associated with each machine, turning possession of the laptop and an authenticated credential into another factor. The deployment timelines varied by platform, but the important point was the architectural shape: hardware-bound device identity became part of the access decision.

You will notice I didn’t mention Macs. That’s because Apple, although it had a similar secure processor on its devices, did not give customers attestations over keys stored inside it. We could put keys in the Secure Enclave. We could not prove to a third party that we had. For years we tried to get Apple to change that. Eventually they did, and what they shipped was not what we asked for.

We didn’t get the ability to put arbitrary keys in the enclave under our control with attestation. We got a device-bound credential signed by Apple that let us verify, at enrollment time, that a request was coming from one of our devices. We used that as a bootstrap to enroll a short-lived credential that the OS stored and used for day-to-day authentication. Apple’s attestation answered the which-device question. Our short-lived credential answered everything that came after.

That worked. The standardization piece is draft-ietf-acme-device-attest, an IETF working group document I co-author, which lets the same ACME flow carry an Apple Managed Device Attestation statement, a TPM key attestation, or a YubiKey assertion without the CA needing to special-case each platform. Apple’s adoption was what unlocked normalizing on a single way across the fleet to authenticate these devices.

That normalization is the relying-party side of the credential story. The hard-edge property, this key lives on this specific device, signed by a chain you can verify back to the manufacturer, is now what we expect from a workload, a laptop, a phone, a passkey, a SPIFFE workload identity. MFA was the first compensating control for the weaknesses in passwords and API keys and bearer tokens. Passkeys, SPIFFE, and certificate-based zero-trust segmentation are the structural answer. We replaced the secret-you-know with a key-you-hold and bound it to hardware where we could.

Google’s internal production systems run the same shape, with Titan chips as the foundational substrate for the devices that get on the network for internal and cloud workloads.

That shift is well underway on the side of the wire that uses credentials.

The side that issues them is still mostly software on a server with an API key and a SOC 2 report.

If issuance hasn’t kept up with what we now expect from credential holders, what would catching up actually look like? The previous post made the general argument. The CA should be asked to produce evidence, not to be trusted. This post is what that actually looks like when you build it. A certificate authority where the issuance path itself is the evidence, where the policy that fired is part of what was measured, where the key release is gated on the measurement of the binary asking for it, and where every issued certificate is accompanied by a portable bundle a relying party can verify against trust anchors they already hold.

The architecture is designed to run on both AWS and Google Cloud. On AWS, enclave-based deployments use Nitro Enclave attestations, while VM-level deployments can use NitroTPM-backed evidence for measured boot, instance identity, and workload state. On Google Cloud, Confidential VM deployments use AMD SEV-SNP or Intel TDX attestation for the protected execution environment, with Cloud HSM as the key custodian. Where the system needs VM identity, boot-state evidence, or platform posture outside the confidential-computing attestation itself, it can also use vTPM-based evidence.

That breadth is intentional, and I’ll come back to why. But the real load-bearing architectural choice is not which cloud, HSM, enclave, VM, or attestation primitive we use. It is the shape of the issuance system inside each trust boundary.

Two components, one evidence chain

The split is structural, not procedural.

In a single-binary CA, policy enforcement and signing authority are separated by code paths inside one process. A vulnerability in the policy evaluator is a vulnerability in the signing authority, because they share an address space. The compartments exist in the source code. They do not exist at runtime.

Here, the compartments are real. The architecture splits issuance into two attested components.

The registration authority receives the certificate request, resolves identity from authoritative sources, evaluates the issuance policy against the requester’s posture and attestation, and produces a signed authorization context. The signing oracle holds the path to the CA’s signing key and produces the signature. Each runs in a separate attested environment. Each is measured separately. Each has its own keys. Both are written in Go, which is memory-safe in normal code, builds reproducibly without ceremony, and has a standard library that already covers most of what a CA needs.

The RA and the oracle are different binaries, with different measurements, with different keys, on different network endpoints, joined by mutually-attested TLS where each side has independently verified the other’s measurement before either will talk. A compromise of the RA does not give the attacker the signing key, because the signing key is not in the RA’s address space and is not reachable from the RA’s network position. A compromise of the oracle does not give the attacker the policy evaluator’s identity providers, because the oracle has no identity providers wired into it. The interface between them is narrow, signed, and replay-protected.
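To make "narrow, signed, and replay-protected" concrete, here is a minimal sketch of the oracle-side acceptance check. The AuthzContext fields, the Ed25519 signature, the JSON encoding, and the in-memory nonce set are all assumptions made for illustration. They are not the wire format described in this post.

```go
package main

import (
	"crypto/ed25519"
	"crypto/rand"
	"encoding/json"
	"errors"
	"fmt"
	"time"
)

// AuthzContext is a hypothetical, minimal authorization context.
type AuthzContext struct {
	Profile   string    `json:"profile"`
	CSRDigest string    `json:"csr_digest"`
	Nonce     string    `json:"nonce"`
	IssuedAt  time.Time `json:"issued_at"`
}

// Oracle keeps only what the narrow interface needs.
type Oracle struct {
	raKey  ed25519.PublicKey // RA key, assumed already bound to the RA's attestation
	maxAge time.Duration     // freshness bound
	seen   map[string]bool   // nonces already accepted
}

// Accept checks the signature first, then freshness, then replay.
func (o *Oracle) Accept(payload, sig []byte, now time.Time) (*AuthzContext, error) {
	if !ed25519.Verify(o.raKey, payload, sig) {
		return nil, errors.New("bad RA signature")
	}
	var ctx AuthzContext
	if err := json.Unmarshal(payload, &ctx); err != nil {
		return nil, err
	}
	if age := now.Sub(ctx.IssuedAt); age < 0 || age > o.maxAge {
		return nil, errors.New("stale or future-dated context")
	}
	if o.seen[ctx.Nonce] {
		return nil, errors.New("replayed context")
	}
	o.seen[ctx.Nonce] = true
	return &ctx, nil
}

// demo signs one context and submits it twice; the second must be rejected.
func demo() (first, second error) {
	pub, priv, err := ed25519.GenerateKey(rand.Reader)
	if err != nil {
		return err, err
	}
	now := time.Now()
	payload, err := json.Marshal(AuthzContext{
		Profile: "smartcard-piv", CSRDigest: "sha256:ab12", Nonce: "n-0001", IssuedAt: now,
	})
	if err != nil {
		return err, err
	}
	sig := ed25519.Sign(priv, payload)
	o := &Oracle{raKey: pub, maxAge: time.Minute, seen: map[string]bool{}}
	_, first = o.Accept(payload, sig, now)
	_, second = o.Accept(payload, sig, now)
	return first, second
}

func main() {
	first, second := demo()
	fmt.Println("first submission error:", first)
	fmt.Println("second submission error:", second)
}
```

The signature check comes first so that nothing in the payload is parsed or remembered until the RA's key has vouched for it. A real oracle would also need to bound and persist the nonce set, which this sketch ignores.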

This is the cross-machine version of the kernel-userland boundary. The kernel does not trust userland’s claim that a syscall is authorized. It does the check itself, every time. The oracle does the same thing for the parts of issuance it can verify independently.

What this buys is a bounded blast radius on a compromised RA.

If an attacker takes over the RA, including the RA’s signing key, the damage is bounded only to the extent that the oracle requires independently verifiable evidence for the facts that matter. For profile authorization, key binding, replay protection, RA authorization, and certificate structure, the oracle can check those facts locally.

For domain control validation, the same is true only when the evidence is cryptographic or independently corroborated, such as DNSSEC-validatable DNS evidence or signed observations from independent multi-perspective validators. The current implementation does not yet do either of these things, but adding support for them is a straightforward extension of the model. Without that, however, an RA compromise can still become a validation compromise.

That distinction matters. The oracle does not make an RA trustworthy. It makes the RA’s assertions conditional. Where the RA presents verifiable evidence, the oracle can check it. Where the RA presents only its own statement about what it observed, the oracle can enforce structure, freshness, profile policy, replay protection, and RA authorization, but it cannot turn that statement into ground truth.

The architectural property is not that the oracle magically knows every fact the RA observed. It is structural separation plus independent verification wherever the fact is independently verifiable. Every other property the rest of this post will describe, measured policy, gated key release, per-operation attestation, portable evidence, depends on that boundary.

What I mean by evidence

Before walking through the attestations, it is worth being precise about the word evidence.

A raw measurement is a narrow statement: this binary had this digest, this key was generated by this HSM, this quote was signed by this platform, this public key matches the attested key. Those statements matter because they are the hard edge of the system.

But they are not enough to explain issuance. A certificate is issued because policy evaluated a set of facts and authorized a specific profile for a specific requester at a specific time.

So evidence here means the decision record, the raw attestations where they can be disclosed, the verifier results, the policy digest, the profile binding, the facts the policy evaluated, the signed authorization context, the oracle’s per-operation attestation, the custodian’s key-release evidence, and the transparency proof.

draft-ietf-acme-device-attest helps with the requester-side device and key attestation. The CA-side runtime evidence chain is still built from platform-specific TEE, TPM, HSM, KMS, and transparency-log evidence. The point is not that one standard covers all of it. The point is that the CA should preserve the evidence chain instead of collapsing it into a one-bit pass/fail result or a promise.

What each attestation lets you verify

Cross-checking only matters if the things being checked are concrete. Four attestations cross the issuance flow, each one produced by a different party, each one letting the next party check something specific. It is worth walking through what each one actually lets you verify before going further.

The client’s attestation

What the RA verifies: that the requestor controls the private key whose public half is in the certificate request, that the key lives on a specific piece of hardware, that the hardware has a manufacturer-vouched identity, and that the key was generated under conditions the policy can reason about.

The shape of the attestation changes between platforms. A TPM-bound key on a Windows or Linux machine, a Secure Enclave key on an Apple device, and a key in a YubiKey’s PIV slot each arrive in a different format with a different signing chain and different platform-specific fields. The function is the same. Each one carries a binding between the key in the CSR and a piece of hardware, an identifier for that hardware, the conditions under which the key can be used, and a certificate chain back to a manufacturer the policy can decide whether to trust.

draft-ietf-acme-device-attest is the wire format that carries any of these in the same ACME flow, so the CA doesn’t need to special-case each platform. What the policy sees in the end is the same five things regardless of which platform produced them. Does the requestor control the private key. Is the key on a piece of hardware. Whose hardware. Under what conditions can it be used. Does the manufacturer chain trace back to a root the policy will accept.
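One way to picture the "same five things" is a normalized claims struct that every platform-specific verifier fills in, so policy never sees the platform format. This is a sketch under assumptions. The KeyClaims and Policy types, their field names, and the single user-presence condition are hypothetical, and the platform parsers that would populate them are omitted.

```go
package main

import "fmt"

// KeyClaims is a hypothetical normalized view of a verified device attestation.
type KeyClaims struct {
	ProofOfPossession bool   // CSR signature verified with the CSR's own public key
	HardwareBound     bool   // attestation binds that key to hardware
	DeviceID          string // identifier the attestation gives for the hardware
	UserPresence      bool   // one example usage condition (PIN or touch required)
	RootFingerprint   string // root the manufacturer chain terminated in
}

// Policy is a hypothetical stand-in for the issuance policy's view of those claims.
type Policy struct {
	TrustedRoots        map[string]bool
	RequireUserPresence bool
}

// Refusals returns every reason the claims fail policy; an empty result means pass.
func (p Policy) Refusals(c KeyClaims) []string {
	var why []string
	if !c.ProofOfPossession {
		why = append(why, "requestor did not prove control of the private key")
	}
	if !c.HardwareBound {
		why = append(why, "key is not bound to hardware")
	}
	if c.DeviceID == "" {
		why = append(why, "attestation names no device")
	}
	if p.RequireUserPresence && !c.UserPresence {
		why = append(why, "usage conditions do not require user presence")
	}
	if !p.TrustedRoots[c.RootFingerprint] {
		why = append(why, "manufacturer chain ends in an untrusted root")
	}
	return why
}

func main() {
	p := Policy{TrustedRoots: map[string]bool{"root-A": true}, RequireUserPresence: true}
	card := KeyClaims{
		ProofOfPossession: true, HardwareBound: true,
		DeviceID: "token-0042", UserPresence: true, RootFingerprint: "root-A",
	}
	fmt.Println("refusals:", p.Refusals(card))
}
```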

The RA’s environment attestation

What anyone holding the document can verify: that the RA ran a specific measured image inside an attested execution environment, that the cloud provider signed off on the measurement, that the workload identity or role is the one the deployment expects, and that the public key bound to that measured environment for the rest of its lifetime is the one in the document.

AWS Nitro, AMD SEV-SNP, and Intel TDX each produce a different document, but the load-bearing contents are the same: a measurement of what was loaded, an identifier for the platform that produced it, the operating context, and a vendor chain back to a root the relying party can verify.

Worth being explicit about what the document does not let you verify. It does not tell you who deployed the measured environment, who runs the cloud account, or what the operator intended to run. It tells you what was measured. The operator’s intent is a separate question, answered by the published image list and the policy that names which measurements are acceptable. A relying party who trusts the hardware vendor’s chain can verify the measurement. They still have to verify, separately, that the measurement is one they should accept.
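The two separate questions, "did the vendor chain verify" and "is this a measurement I should accept", can be kept apart in code. The sketch below assumes the chain check has already been done elsewhere and is passed in as a boolean. The Verdict type and the published map of hex measurement to role are hypothetical names.

```go
package main

import (
	"encoding/hex"
	"fmt"
)

// Verdict separates "verified" from "acceptable".
type Verdict int

const (
	Unverified Verdict = iota // vendor chain did not verify; the measurement means nothing
	Measured                  // chain verified, but the measurement is not on the published list
	Accepted                  // chain verified and the measurement is published for a role
)

// Judge looks the measurement up only after the vendor chain has verified.
// published maps a hex-encoded measurement to the role it is published for.
func Judge(chainOK bool, measurement []byte, published map[string]string) (Verdict, string) {
	if !chainOK {
		return Unverified, ""
	}
	role, ok := published[hex.EncodeToString(measurement)]
	if !ok {
		return Measured, ""
	}
	return Accepted, role
}

func main() {
	published := map[string]string{"aabb": "registration-authority"}
	v, role := Judge(true, []byte{0xaa, 0xbb}, published)
	fmt.Println(v == Accepted, role)
}
```

Keeping Measured distinct from Accepted is the point: a valid quote over an unpublished image is a true statement about the wrong binary.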

The oracle’s environment attestation

What it lets the RA verify, during handshake: that the oracle the RA is about to send an authorization context to is running the published image, and that the signing key the oracle will use for its half of the mutual TLS is the one bound to that image at boot.

What it lets a relying party verify, after issuance: the same thing, plus one additional binding. The oracle produces a fresh attestation for every signing operation, and the attestation is bound to the certificate that operation just produced and to the RA on the other end of the conversation that authorized it. Where the RA’s boot-time attestation carries the long-lived public key the measured environment will sign with, the oracle’s per-operation attestation pins this specific certificate to this specific oracle measurement with this specific RA.

The difference between boot-time and per-operation attestation is the difference between “the box looked right when it started” and “the box looked right when it did the thing you actually care about.” Boot-time attestation is what most confidential-computing deployments do today. It tells a relying party the deployment was valid at startup. It tells them nothing about whether the deployment was still valid five hours later when an actual issuance happened. Per-operation attestation closes that gap.
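A common way to get a per-operation binding is to hash the operation's inputs into the caller-supplied data field of a fresh attestation report. The sketch below assumes a 64-byte field (SEV-SNP and TDX reports carry one of that size), SHA-512, and length-prefixed inputs. The function names and the choice of inputs are illustrative, not the encoding this system uses.

```go
package main

import (
	"crypto/sha512"
	"crypto/subtle"
	"encoding/binary"
	"fmt"
)

// OperationBinding derives the 64-byte value to place in the report's
// caller-supplied data field. Each input is length-prefixed so that moving
// bytes between fields cannot produce the same digest.
func OperationBinding(certDER, raPublicKey, nonce []byte) [64]byte {
	h := sha512.New()
	for _, field := range [][]byte{certDER, raPublicKey, nonce} {
		var n [8]byte
		binary.BigEndian.PutUint64(n[:], uint64(len(field)))
		h.Write(n[:])
		h.Write(field)
	}
	var out [64]byte
	copy(out[:], h.Sum(nil))
	return out
}

// Matches is the relying-party side: recompute the binding from the bundle
// contents and compare it with the field taken from the verified report.
func Matches(reportData [64]byte, certDER, raPublicKey, nonce []byte) bool {
	want := OperationBinding(certDER, raPublicKey, nonce)
	return subtle.ConstantTimeCompare(reportData[:], want[:]) == 1
}

func main() {
	cert, ra, nonce := []byte("cert-der-bytes"), []byte("ra-public-key"), []byte("n-0001")
	rd := OperationBinding(cert, ra, nonce)
	fmt.Println("same certificate:", Matches(rd, cert, ra, nonce))
	fmt.Println("different certificate:", Matches(rd, []byte("other-cert"), ra, nonce))
}
```

The measurement itself does not need to be hashed in, because it is already part of the report the platform signs. The binding only has to tie that report to this certificate and this RA.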

The custodian’s attestation

The CA private key is not loaded by the operator. It is held by a custodian, an HSM with discrete-silicon attestation, a cloud KMS that gates use on attestation, or another measured execution environment. The custody model matters. In the HSM case, the CA key is non-exportable. The signing oracle never receives the private key. It sends a signing request to the HSM, and the HSM signs only when the relevant policy, authorization, and attestation conditions are satisfied. In the software-protected or KMS-protected case, the key, or the key-encryption material needed to use it, may be wrapped so it is usable only inside a signing oracle whose attestation matches a published image.

What the custodian’s evidence lets a relying party verify depends on that model. For an HSM-held key, the evidence is not that the key was released. It is that the non-exportable key was used by a particular custodian, under particular firmware, configuration, policy, and authorization conditions. For a wrapped software or KMS-backed key, the evidence can show that key material was made usable only because the oracle’s measurement matched a measurement on the published list. Either way, if an operator runs a different binary, however benign the reason, the signing path fails. The dragon’s teeth around the data center are still there. They no longer have to do the whole job.

For a concrete look at what one of these documents actually contains, rather than just what it proves, the Peculiar Ventures attestation library parses examples from each of these platforms, including the Marvell HSM attestation produced by Google Cloud HSM. An attestation without a verifier is a claim. With a verifier, it is something a relying party can act on.

Policy as mechanism, not promise

Reading all of those attestations means the policy that evaluated them has to be something concrete enough to read.

Today’s CAs publish a CP/CPS. The Certificate Policy and Certification Practice Statement is a document describing what the CA will and will not do. An auditor samples evidence once a year against the document. The document and the system that produces certificates are not cryptographically linked. The relying party trusts that the document describes the system. The auditor’s annual report is the closing of the loop.

Cedar policies are different. They are a domain-specific language with declarative semantics, written under version control, statically analyzable, and small enough to read in a sitting. The policy that fires inside the signing oracle is compiled into the measured binary. The digest of the policy travels in the evidence bundle that accompanies the certificate. A relying party can re-fetch the source at the named digest, read the rules, and decide for themselves whether the policy that authorized their certificate is a policy they accept.
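The relying-party check described here is small: fetch the policy source, hash it, and compare against the digest in the bundle. A minimal sketch, assuming the digest is SHA-256 over the raw source bytes and hex-encoded. The post does not specify the actual digest construction, and the Cedar policy text is made up for the example.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// PolicyDigest hashes the policy source exactly as fetched, byte for byte.
func PolicyDigest(source []byte) string {
	sum := sha256.Sum256(source)
	return hex.EncodeToString(sum[:])
}

// SameDigest reports whether the fetched source is the policy named in the bundle.
func SameDigest(source []byte, bundleDigest string) bool {
	return PolicyDigest(source) == bundleDigest
}

func main() {
	// Hypothetical policy text, for illustration only.
	policy := []byte(`permit(principal, action == Action::"issue", resource)
when { context.profile == "smartcard-piv" };`)

	digest := PolicyDigest(policy) // what the bundle would carry
	fmt.Println("fetched source matches bundle:", SameDigest(policy, digest))

	tampered := append([]byte{}, policy...)
	tampered = append(tampered, ' ')
	fmt.Println("edited source matches bundle:", SameDigest(tampered, digest))
}
```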

The contrast that matters operationally is the one between policy-as-promise and policy-as-mechanism. A CP/CPS is a promise. The auditor verifies, by sample, that the practice resembles the promise. A Cedar policy compiled into a measured binary, with its digest in the bundle, is a mechanism. The relying party verifies, per certificate, that the rules that fired are the rules the CA published.

There is a sharp footgun specific to Cedar that is worth naming because the answer to it is part of the architecture. Cedar’s evaluator skips a policy that throws while accessing an attribute that was not present. The convenient result is that policies stay readable in the absence of optional context. The inconvenient result is that a forbid policy with an unguarded attribute access can silently drop, which is a fail-open. The lint at build time requires a has guard before any optional attribute access. A policy that would have failed open is now a build error. The defense-in-depth move is structural. The policy author cannot ship a fail-open by accident.
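To show the shape of such a lint, here is a deliberately naive, text-based approximation. It is not the lint described above. It assumes every context attribute is optional, looks only at context.attr accesses, and accepts any earlier "context has attr" in the same policy text as the guard. A real lint would work on Cedar's parsed policy and the schema.

```go
package main

import (
	"fmt"
	"regexp"
)

// access matches attribute reads of the form context.attr.
var access = regexp.MustCompile(`\bcontext\.([A-Za-z_][A-Za-z0-9_]*)`)

// UnguardedAccesses returns each context attribute that is read without an
// earlier "context has <attr>" in the same policy text.
func UnguardedAccesses(policy string) []string {
	var out []string
	reported := map[string]bool{}
	for _, m := range access.FindAllStringSubmatchIndex(policy, -1) {
		attr := policy[m[2]:m[3]]
		guard := regexp.MustCompile(`\bcontext\s+has\s+` + regexp.QuoteMeta(attr) + `\b`)
		// Only text before the access counts as a guard.
		if !guard.MatchString(policy[:m[0]]) && !reported[attr] {
			reported[attr] = true
			out = append(out, attr)
		}
	}
	return out
}

func main() {
	// If posture_ok is absent, this forbid errors and is skipped: fail-open.
	unguarded := `forbid(principal, action, resource) unless { context.posture_ok };`
	// If posture_ok is absent, the condition is false and the forbid applies.
	guarded := `forbid(principal, action, resource) unless { context has posture_ok && context.posture_ok };`

	fmt.Println("unguarded:", UnguardedAccesses(unguarded))
	fmt.Println("guarded:", UnguardedAccesses(guarded))
}
```

In a build, a non-empty result would fail the step, which is what turns the footgun into a build error instead of a runtime surprise.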

What this looks like in practice

An employee gets a new smart card from corporate IT in a sealed blister pack. They plug it in.

The enrollment client on the laptop sees a card it doesn’t know. It reads the card’s GlobalPlatform Card Production Lifecycle data and the card recognition data, learns this is a factory-fresh retail token from a manufacturer it has a trust root for, and verifies the card identity attestation back to that root. The card is genuine. It has never been provisioned. The enrollment client knows what kind of token it is looking at and what the policy says to do with one.

The client provisions the token. It generates a new keypair on the card under a policy that requires user PIN for use and marks the key non-exportable. The card produces a key attestation: a statement signed by the card’s manufacturer-installed attestation key asserting that this specific public key was generated on this specific token, has these specific usage constraints, and will never leave the hardware. The enrollment client builds a CSR for the new key and has the card sign it, which is the standard proof that the requestor controls the corresponding private key. It then packages the CSR, the user’s identity claim, and the card’s attestation into an ACME request and sends it to the RA.

That is the first link. The chain is going to have several.

The RA receives the request inside its measured execution environment. It verifies the CSR signature against the public key in the CSR, which proves the requestor controls the corresponding private key. It verifies the card’s attestation against the manufacturer chain. Yes, this is a genuine retail token from a manufacturer we trust. Yes, the key in the CSR is on the card. Yes, the usage constraints match policy. It resolves the identity claim against the corporate identity provider. Yes, this user exists. Yes, they are entitled to this credential type. Yes, their device posture matches. It evaluates the Cedar policy against the cross-product of the device, the identity, and the requested profile. Yes, all of it permits issuance. It builds an authorization context, signs it with the key bound to its measured environment, and sends it to the signing oracle.

The signing oracle receives the context inside its own measured execution environment, over mutually-attested TLS where both sides have verified the other’s measurement before they would talk. It does not simply believe what the RA told it. For facts backed by independently verifiable evidence, the oracle re-verifies that evidence against its own configured verifier. In this example, it re-verifies the card’s attestation, cross-checks the claims the RA made about the card, the requested profile, and the requester, validates the profile binding, the certificate type, the validity window, the structural invariants the profile requires, and confirms that this RA is authorized to ask for this kind of issuance. It rejects replays. Only then does it ask the custodian for the signing key. The custodian releases it because the oracle’s attestation matches the published image. The oracle signs. It produces a fresh per-operation attestation binding this certificate to this measured execution environment and to the RA that authorized it. The issuance is written to a transparency log that independent witnesses cosign.

The bundle that comes back to the enrollment client contains the card’s attestation that the key is on the token, the RA’s signed authorization context naming the identity, the profile, the policy digest, and the verifiers it ran, the oracle’s per-operation attestation binding the signature to a measured binary on a measured platform, the custodian’s evidence that the key was released only because the oracle’s measurement matched, the transparency log inclusion proof and witness cosignatures showing the issuance was published before the certificate was returned, and the certificate itself.

Every party in the flow did the logical equivalent of what every other party did. It verified upstream evidence against a manufacturer, platform, custodian, or witness root it already trusted, did its work, and produced its own evidence for the downstream party to verify. The bundle packages those attestations and proofs so the subscriber, or anyone the subscriber shares it with, can re-walk the chain end to end against the same trust roots, without having to take any single party’s word for it.

The card manufacturer’s root says the key is on the hardware. The chip vendor’s TEE root says the RA ran the measured image. The chip vendor’s TEE root says the oracle ran the measured image. The HSM vendor’s root says the key was released to a measurement that matched. The witness network’s cosignatures say the log is what the operator published, not a fork served to one relying party. A relying party who trusts each of those roots can verify the certificate’s basis of issuance from the evidence bundle, instead of relying only on the CA’s institutional promise.
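Re-walking the chain amounts to running one check per link, each against the root that vouches for it, and stopping at the first link that fails. The sketch below shows only that control flow. The Step type is hypothetical and the checks are placeholders where real signature and proof verification would go.

```go
package main

import (
	"errors"
	"fmt"
)

// Step is one link: a claim, the root that vouches for it, and its check.
type Step struct {
	Claim string
	Root  string
	Check func() error
}

// Walk runs the links in order and reports the first one that fails,
// naming the claim and the root so the failure is attributable.
func Walk(steps []Step) error {
	for _, s := range steps {
		if err := s.Check(); err != nil {
			return fmt.Errorf("%s (root: %s): %w", s.Claim, s.Root, err)
		}
	}
	return nil
}

// demoWalk builds the five links from the text with placeholder checks.
// failAt makes one link fail; -1 means every link passes.
func demoWalk(failAt int) error {
	links := []struct{ claim, root string }{
		{"key is on the card", "card manufacturer root"},
		{"RA ran the measured image", "TEE vendor root"},
		{"oracle ran the measured image", "TEE vendor root"},
		{"key was released to a matching measurement", "HSM vendor root"},
		{"log entry is the published one", "witness keys"},
	}
	var steps []Step
	for i, l := range links {
		fail := i == failAt
		steps = append(steps, Step{Claim: l.claim, Root: l.root, Check: func() error {
			if fail {
				return errors.New("evidence did not verify")
			}
			return nil
		}})
	}
	return Walk(steps)
}

func main() {
	fmt.Println("all links verify:", demoWalk(-1))
	fmt.Println("custodian link fails:", demoWalk(3))
}
```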

The CA is not asked only to be trusted. The CA produced evidence.

What is built

The architecture above runs in preproduction today on AWS and Google Cloud.

On AWS, that means Nitro Enclaves for enclave-based issuance components and NitroTPM-backed evidence for VM-level identity, measured boot, and workload posture. On Google Cloud, it means AMD SEV-SNP and Intel TDX Confidential VMs for protected execution, Cloud HSM as custodian, and vTPM-based evidence where VM boot state or workload identity needs to be represented.

Classical ECDSA and post-quantum ML-DSA-65 (FIPS 204) hierarchies operate in parallel. ML-KEM-768 (FIPS 203) is the subject key for TLS key-exchange certificates. Cedar policy with the fail-open lint is enforced in the oracle. Per-operation attestation, evidence bundles, custodian-gated key release on measurement, and mutually attested TLS between RA and oracle all run end-to-end on both clouds.

Each trust domain signs from its own sub-CA, and classical and post-quantum issuance never share a key. Machine, machine-with-EAP, user, group, workload, smart card, TPM AK, and SSH all sit under separate sub-CAs; classical and PQ are separated within each family. A compromise of any single signing key bounds the damage to one family-and-algorithm slice. The architecture treats hierarchy multiplicity as security-domain separation, not as an algorithm-bridging side effect.
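The "never share a key" property is easy to check mechanically if the hierarchy is expressed as a map from family-and-algorithm slice to signing key. A small sketch, where the Slice type, the family names, and the key identifiers are all hypothetical:

```go
package main

import (
	"fmt"
	"sort"
)

// Slice is one family-and-algorithm cell of the hierarchy.
type Slice struct{ Family, Algorithm string }

// SharedKeys returns, sorted, every key ID that signs for more than one slice.
// An empty result means a single key compromise is bounded to one slice.
func SharedKeys(issuers map[Slice]string) []string {
	count := map[string]int{}
	for _, keyID := range issuers {
		count[keyID]++
	}
	var shared []string
	for keyID, n := range count {
		if n > 1 {
			shared = append(shared, keyID)
		}
	}
	sort.Strings(shared)
	return shared
}

func main() {
	issuers := map[Slice]string{
		{"workload", "ecdsa"}:     "key-workload-ec",
		{"workload", "ml-dsa-65"}: "key-workload-pq",
		{"ssh", "ecdsa"}:          "key-ssh-ec",
		{"ssh", "ml-dsa-65"}:      "key-ssh-pq",
	}
	fmt.Println("shared keys:", SharedKeys(issuers))
}
```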

Profiles wired up today cover machine authentication including EAP-TLS, DNS-validated TLS server certificates, workload identity including SPIFFE-style URI identifiers, user and group signing and encryption, smart card and PIV logon, TPM AK bootstrap, and SSH user, workload, and host certificates. Each family that supports both has a classical and a post-quantum variant.

The platform breadth is there because no single TEE family fits every customer environment, and the architecture should not be hostage to one chip vendor or one cloud.

The 2029 problem

None of what I’ve described above is algorithm-bound. That matters because the algorithms are about to change.

In March 2026, Google’s Heather Adkins and Sophie Schmieg set 2029 as the target for completing Google’s migration to post-quantum cryptography. Google’s timeline matters beyond Google. They run Chrome and Android, and when they move, the WebPKI moves with them. CNSA 2.0 puts 2027 on software and firmware signing in National Security Systems and 2030 on general use. The CABF is working through its own timeline. Federal procurement requirements are already moving.

The CA infrastructure that exists today was designed for the snapshot of math problems that the 2029 transition invalidates. Every CA in production is going to be re-architected before it lands. The algorithms expire. The migration is not optional.

The transition itself is going to be heterogeneous. The classical-PQC X.509 path is going to run for a long time alongside what eventually replaces it. Merkle Tree Certificates — batched, transparency-native issuance with much smaller per-certificate overhead — are a likely part of the answer to ML-DSA’s signature size on the wire. The architecture above does not care which container format the certificate is in. The attested issuance pipeline, the custodian-gated key release, the evidence bundle, the transparency log — all of it operates on the issuance side. MTC issuance benefits from runtime evidence the same way X.509 issuance does, and the patterns in this post carry over.

A CA built on the runtime-evidence pattern does not cost more to deploy at the moment you are already rebuilding. It costs more only if you skip the rebuild, and skipping is not on the table. The hardware-anchored credential side of the wire has been arriving in production for a decade. The issuance side is the part still running on the old shape. The PQ deadline is the forcing function that makes the issuance side move. The choice is between rebuilding the old shape with new algorithms, and rebuilding it with the same discipline that the relying-party side has spent the last decade adopting.

Short lifetimes with ARI make the operational side tractable. Seven-day certificates with ARI-driven renewal turn the PQ migration from a flag day into a moving window. The fleet rotates without an emergency, without anyone touching a machine, because the CA can shorten the renewal window for specific machines or profiles whenever it wants to.
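On the client side, ARI-driven renewal reduces to picking a time inside the window the CA suggests and renewing once that time has passed. A minimal sketch, assuming the window arrives as a start and an end time. The function names are mine, and the ACME fetch of the renewal information is omitted.

```go
package main

import (
	"fmt"
	"math/rand"
	"time"
)

// PickRenewal chooses a uniformly random instant in [start, end), which
// spreads a fleet's renewals across the window instead of stacking them.
func PickRenewal(start, end time.Time, r *rand.Rand) time.Time {
	span := end.Sub(start)
	if span <= 0 {
		return start // degenerate window: renew at its start
	}
	return start.Add(time.Duration(r.Int63n(int64(span))))
}

// ShouldRenew reports whether the picked time has arrived.
func ShouldRenew(now, picked time.Time) bool {
	return !now.Before(picked)
}

// inWindow checks that a pick for the given seed lands inside a sample window.
func inWindow(seed int64) bool {
	start := time.Date(2026, 5, 4, 0, 0, 0, 0, time.UTC)
	end := start.Add(48 * time.Hour)
	t := PickRenewal(start, end, rand.New(rand.NewSource(seed)))
	return !t.Before(start) && t.Before(end)
}

func main() {
	start := time.Date(2026, 5, 4, 0, 0, 0, 0, time.UTC)
	end := start.Add(48 * time.Hour)
	picked := PickRenewal(start, end, rand.New(rand.NewSource(1)))
	fmt.Println("renew before window opens:", ShouldRenew(start.Add(-time.Hour), picked))
	fmt.Println("renew after window closes:", ShouldRenew(end, picked))
}
```

If the CA moves the window earlier for a machine or profile, the client's next poll yields an earlier pick, and that is the whole mechanism behind the moving window.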

That is what the next CA looks like. It is not a different CA than the one I have been describing. It is the same one.

A CA Built for the Threat Model We Actually Have

This builds on earlier posts on what attestation actually proves, what confidential computing is and isn’t, and an honest accounting of the problems with the current generation of TEEs. None of those problems go away here. The argument is that despite those limitations, attestation is an important tool. Certificate issuance is overdue to use it.

Back in the 1990s, I was doing some consulting for DigiNotar, yes, that DigiNotar. They had CA facilities in a data center whose perimeter still had WWII-era anti-tank obstacles, large concrete barriers sometimes called “dragon’s teeth.” Of course, this was an artifact of the facility’s history, but data centers are designed from a security perspective with layers of physical protection, including barriers, mantraps, biometrics, individual vaults with cages, individual racks with their own locks and biometrics, cameras, and more. The threat of physical theft, destruction, or manipulation is exactly what these facilities are designed to mitigate.

When building a CA inside one of these facilities, we design yet another layer of protection. Administration networks are segmented from transaction networks, interconnects from supporting infrastructure, the issuance environment from the systems holding root keys. We add our own physical segmentation on top of that so we can build controls around multiple parties being necessary for the more sensitive operations, while still letting routine hardware maintenance happen on the schedule the SLA needs.

These are all useful and important things, but the reality is that CA key material is not likely to be physically stolen. It is more likely to be compromised from the outside. We solve these problems through design, not by writing more code.

Meaningfully measured code forces design upstream of the code itself.

A measurement of a monolithic blob proves almost nothing useful. A measurement that names a specific role, a specific security domain, and a specific assertion the verifier is supposed to act on, proves something. The roles, the domain boundaries, and the questions the verification has to answer have to exist before any code is written, or the attestation is a signature on nothing in particular.

In operating system design, we have similar problems. In naive systems, we load cryptographic keys into memory on a running, network-connected system, accidentally exposing ourselves to memory-disclosure bugs where a network attacker can steal keys. Heartbleed is the canonical example, but the class is what matters. We do this because it is simpler and faster, but it is also less secure. As systems designers, we address this by moving those keys out of the process of the network-connected application and into a different user context. That way, an attacker cannot simply get the network-connected service to dump memory. They have to get persistence and cross a kernel-enforced user boundary.

This is old wisdom. Least privilege and privilege separation exist because network-facing code should not also be the thing that controls the keys.

A parallel showed up in early cryptocurrency exchanges. Hot wallets were used as signing oracles because the design and deployment work needed to prevent that had not been done, and many of the high-profile compromises of that era trace back to that gap. The exchanges that survived learned to put boundaries between the wallet and the network. That boundary did most of the work. The dragon’s teeth around the building did the rest.

When third parties need to rely on external services operating in these environments, they often rely on auditors to attest that management assertions about operational practices are actually being followed. These assessments are usually performed by CPAs, not security specialists, which can limit their value. They also often rely on sampling a small portion of transactions to confirm that the controls being evaluated are being followed. That sample is drawn from evidence provided by the entity being audited, which is also the party paying for the audit.

All of the things discussed above help bring some minimal level of transparency and verifiability, but it is turtles all the way down, layer on layer, none of them reaching the runtime where the actual compromise happens. This is where confidential computing, and solutions like Private Cloud Compute, start to matter.

Policy changes meaning in this model. In the traditional assurance world, policy is a written promise. The CA publishes a CP or CPS, the operator commits to following it, and the auditor samples evidence to decide whether that promise was kept. In a runtime-evidence model, policy becomes part of the mechanism. A measured binary evaluates a specific policy, produces a decision, and the digest of that policy travels with the evidence. The shift is from policy as promise to policy as enforcement, from “trust us, this is what we do” to “this is the policy the measured system actually applied.”

Apple’s Private Cloud Compute is the worked example. PCC nodes attest to the binary they are running, refuse to do work for clients that cannot verify that attestation, and publish every production build for public inspection. The user’s device, not Apple, decides whether a given node is acceptable. That inversion, the relying party verifying the service rather than the service asserting to the relying party, is the part of the pattern that matters. The pieces are not new individually. The combination, at the scale Apple shipped it, proves the pattern is real. The third-party security reviews prove the architecture is serious. Attacks on confidential computing do not refute that point. They prove there is now a boundary worth attacking, measuring, and improving.

Apple is not the only proof point. Signal used SGX remote attestation for private contact discovery in 2017, with clients verifying that the enclave was running the expected open-source code. WhatsApp’s end-to-end encrypted backups use an HSM-based Backup Key Vault to keep recovery keys out of the ordinary service path, and that design was publicly reviewed by NCC Group. Microsoft’s Confidential Consortium Framework powers Azure Confidential Ledger. Different systems, different threat models, same direction of travel. High-assurance services are moving from institutional assurances toward runtime evidence.

What it looks like

Concretely, a CA built on the Private Cloud Compute pattern looks like this.

Issuance is split into two attested components. The first, the registration authority, takes the certificate request, resolves identity from authoritative sources, evaluates the issuance policy, and produces a signed authorization context. The second, the signing oracle, holds the CA private key and produces the signature. Each runs in a separate attested enclave, and each is measured separately. This means policy can evolve without re-measuring the key-custody component, and keys can rotate without re-measuring the policy component.

The policy layer matters here too. Each component is not just running code, it is making a verifiable policy decision before it acts. The RA decides whether the request is authorized and the identity evidence is sufficient. The signing oracle decides whether the RA, the request, and the authorization context are acceptable before it signs. The evidence does not just say which binary ran. It also says which policy that binary evaluated.

The two components do not trust each other because they are on the same network. They trust each other through attestation, mutually verified at every connection, and the signing oracle does not merely accept the RA’s conclusion. Before it signs, it independently verifies the RA’s attestation, checks that the authorization context is fresh, confirms that the request is bound to an allowed profile, and verifies that the policy facts asserted by the RA match the evidence presented to the oracle. A compromised RA, even one with its own signing key, does not get to mint an out-of-profile certificate, bypass attestation, or turn the CA key into a general-purpose signing oracle.
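The oracle's independent checks can be sketched roughly as follows. This is an illustration under stated assumptions: the field names, the allow-lists, and the 60-second freshness window are hypothetical, and the cryptographic verification of the RA's attestation is assumed to have already happened and yielded ra_measurement.

```python
import time
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical published commitments; real values would be image measurements
# and policy digests, not readable strings.
ALLOWED_RA_MEASUREMENTS = frozenset({"ra-image-measurement-v1"})
ALLOWED_POLICY_DIGESTS = frozenset({"policy-digest-v1"})
ALLOWED_PROFILES = frozenset({"device-client-auth"})
MAX_CONTEXT_AGE_SECONDS = 60.0  # assumed freshness window


@dataclass(frozen=True)
class AuthorizationContext:
    ra_measurement: str  # taken from the RA's already-verified attestation
    policy_digest: str   # policy the RA says it evaluated
    profile: str         # certificate profile the request is bound to
    request_hash: str    # hash of the request the RA authorized
    issued_at: float     # seconds since the epoch


def refusal_reasons(ctx: AuthorizationContext, request_hash: str,
                    now: Optional[float] = None) -> List[str]:
    """Return every reason the oracle should refuse; an empty list means sign."""
    now = time.time() if now is None else now
    reasons = []
    if ctx.ra_measurement not in ALLOWED_RA_MEASUREMENTS:
        reasons.append("RA measurement is not on the published list")
    age = now - ctx.issued_at
    if age < 0 or age > MAX_CONTEXT_AGE_SECONDS:
        reasons.append("authorization context is not fresh")
    if ctx.profile not in ALLOWED_PROFILES:
        reasons.append("profile is not allowed")
    if ctx.policy_digest not in ALLOWED_POLICY_DIGESTS:
        reasons.append("policy digest is not on the published list")
    if ctx.request_hash != request_hash:
        reasons.append("context is bound to a different request")
    return reasons
```

The oracle collects every failed check rather than stopping at the first, so a refusal can itself be recorded as evidence of what was wrong with the request.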

The CA private key is not loaded by the operator. It is held by a custodian, a hardware security module, a cloud KMS, or another enclave, and it is wrapped so that the custodian will release it only to a signing oracle whose attestation matches a published image. The list of acceptable images is small, public, and updated through a documented process. An operator who runs a different binary, however benign the reason, does not get the key. The dragon’s teeth around the data center are still there. They no longer have to do the whole job.

Both the RA binary and the oracle binary are built from public source and are reproducibly buildable. Anyone can rebuild from the published sources, compare their measurement to the one in the attestation, and confirm that the two match. This is the part of the model that makes trust mean something specific. Not the operator’s word, not the auditor’s snapshot, not the CA’s policy statement, but the build process and the published source. To verify which code actually performed a particular issuance, you would not need to be admitted to the data center. You would need a compiler.
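The rebuild-and-compare step is small enough to sketch. One loud assumption: real platforms compute measurements in platform-specific ways over their own image formats, so the plain SHA-384 over the image bytes used here is only a stand-in for whatever the attestation actually reports.

```python
import hashlib
import hmac


def measure(image_bytes: bytes) -> str:
    """Stand-in measurement: SHA-384 over the raw image bytes (an assumption)."""
    return hashlib.sha384(image_bytes).hexdigest()


def matches_attestation(rebuilt_image: bytes, attested_measurement: str) -> bool:
    """True if the locally rebuilt image measures to the attested value."""
    local = measure(rebuilt_image)
    # compare_digest avoids leaking how many leading characters matched.
    return hmac.compare_digest(local, attested_measurement.lower())
```

A verifier would read the image they built themselves, take the measurement out of the attestation attached to an issuance, and call matches_attestation on the pair. A single flipped byte anywhere in the image yields a different measurement.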

Each issued certificate is accompanied by a portable evidence bundle, signed by the attested issuance system. The bundle names the binary that produced the signature, the attestation root that vouched for the binary, the RA policy decision, the oracle policy decision, the identity assertion the RA accepted, and the inputs the oracle independently verified before signing. A relying party who trusts the chip vendor’s attestation root can determine for themselves whether the issuance was performed by code on the published list, against the policy on the published list, by an RA that accepted the identity claim it claimed to accept. The CA is not asked to be trusted. The CA is asked to produce evidence.
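What a relying party does with such a bundle might look like the sketch below. The field names mirror the list above but are hypothetical, as are the shape of the published commitment and the trusted-root set. Verifying the signature over the bundle and the attestation chain is assumed to happen before this step.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Tuple


@dataclass(frozen=True)
class EvidenceBundle:
    signer_measurement: str        # binary that produced the signature
    attestation_root: str          # root that vouched for that binary
    ra_policy_digest: str          # policy the RA evaluated
    oracle_policy_digest: str      # policy the oracle evaluated
    identity_assertion: str        # identity claim the RA accepted
    verified_inputs: Tuple[str, ...]  # inputs the oracle re-checked


@dataclass(frozen=True)
class PublishedCommitment:
    measurements: FrozenSet[str]    # acceptable binaries
    policy_digests: FrozenSet[str]  # acceptable policies


def check_bundle(bundle: EvidenceBundle, commitment: PublishedCommitment,
                 trusted_roots: FrozenSet[str]) -> Dict[str, bool]:
    """Compare each claim in the bundle against what the CA publicly committed to."""
    return {
        "root_trusted": bundle.attestation_root in trusted_roots,
        "binary_published": bundle.signer_measurement in commitment.measurements,
        "ra_policy_published": bundle.ra_policy_digest in commitment.policy_digests,
        "oracle_policy_published": bundle.oracle_policy_digest in commitment.policy_digests,
    }


def acceptable(bundle: EvidenceBundle, commitment: PublishedCommitment,
               trusted_roots: FrozenSet[str]) -> bool:
    """The relying party, not the CA, decides whether the evidence is good enough."""
    return all(check_bundle(bundle, commitment, trusted_roots).values())
```

Returning the per-claim results rather than a single boolean lets a relying party log exactly which commitment an issuance failed to meet.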

None of this removes the HSM, the auditor, or the operator. The HSM is still excellent at the threat it was built for, and a custodian holding a key wrapped to an attestation policy is still doing HSM work under the hood. The auditor is still needed to attest that the published policy is sensible, that the source matches the binary, that the threat model is honest, and that the runbook is followed in the moments where attestation cannot help. The operator is still needed to run the infrastructure and respond when things break.

What changes is what they are asked to prove.

Today, a relying party mostly gets institutional assurances. The CA says it followed its policy. The auditor samples evidence and says the controls were operating. The operator says the production system was the one described. Those are useful assurances, but they are indirect. They do not let the relying party inspect the actual path between a request, a policy decision, and a signature.

A Private Cloud Compute style CA changes that. It turns the issuance path itself into evidence. The question is no longer only whether the CA says it followed the rules. The question becomes which measured binary evaluated this request, which measured binary signed it, which policy digest was used, which identity evidence was accepted, what validation methods were used during issuance, and whether all of that matches the public commitment the CA made.

When the source is open and reproducibly buildable, that evidence includes a hash of the code that made the decision and signed attestations about the runtime elements that went into that decision. When the code is not open source, third parties can come in and validate the source, the build process, and the correctness of the claims, as Apple did with Private Cloud Compute. The public hashes then let others verify that the code claiming to provide these guarantees is, in fact, the code that ran.

Open source is not magic, and the point is not faith in “many eyes.” The point is that this shifts the emphasis from betting on physical security and operational practice audits to secure system design and cryptographic evidence about what code actually ran and what it actually did.

That is the threat model mismatch, and it is not only a CA problem. We built the WebPKI around buildings, cages, ceremonies, HSMs, and audits because those were the tools we had. We did the same thing cryptographically. We built systems around the assumption that factoring large composites and solving discrete logs on elliptic curves were out of reach. Q-day changes that assumption. Runtime compromise changes the operational assumption just as fundamentally.

We apply the same instincts in any environment we want to call high-assurance. They still matter, but most of the failures we care about are not physical failures. They are logical, remote, operational failures in the runtime path. The rate of change makes that gap wider every year. Annual audits are retrospective, and between them systems change thousands of times, so what the auditor described is rarely what is actually running when a relying party sees a certificate.

Cryptography turns security problems into key-management problems. AI turns assurance problems into runtime-evidence problems. Once agents are making decisions, calling tools, and changing state, the question is no longer what policy you wrote or what control an auditor sampled. The question is what actually ran, what it saw, what boundary contained it, what policy constrained it, and what evidence survived execution.

A Private Cloud Compute style CA gives us a way to make that path visible, attestable, and independently verifiable. The same pattern applies wherever the gap between what we say a system does and what it actually does at runtime matters.

The First AI-Built Zero-Day Is Not the Interesting Part

In the mid-90s I worked at a company called Cybersafe. Today it would get labeled an IAM/SSO vendor. What we actually built was a first-generation security platform: Kerberos, password management, PKI-based MFA, key management, host intrusion detection, and what would now be called zero trust access. The company failed for the usual startup reasons. People. Corporate politics. Timing. The technology was a decade ahead of its market.

One debate from that period has stayed with me. As we expanded into host intrusion detection, the question of automated response kept surfacing. Could a system safely act on its own to contain an intrusion in progress? Drop a connection. Kill a process. Isolate a host. Nobody on the team could imagine a credible answer. The false positive risk was unbounded. The response itself could be weaponized. The rule sets were not trustworthy enough to delegate authority. We shipped detection and let humans make the call.

That debate has an answer now, and it is not the one we expected. Automation on the offensive side is not new. Worms, exploit kits, credential stuffing, and phishing infrastructure have been automated for decades. What is new is broad delegated judgment at machine speed, in the hands of people who do not have to worry about false positives because the blast radius is somebody else’s network.

What the report actually shows

The interesting question is not whether AI helped produce a zero-day. That was inevitable. The interesting questions are operational. What kinds of systems make bad machine judgment cheap enough to deploy at scale? What kinds of defensive systems are still pretending human review is the control boundary?

Google Threat Intelligence Group’s latest AI Threat Tracker report documents the first zero-day exploit that GTIG says it has high confidence was developed with AI assistance. The headline framing is technically correct. The specifics tell a more interesting story.

The exploit was a Python script that bypassed 2FA on an open-source web-based system administration tool. It required valid user credentials in the first place. The criminal group planned a mass exploitation campaign, and Google disrupted it through responsible disclosure to the vendor. GTIG identified the artifact as AI-developed because the code carried obvious tells. A hallucinated CVSS score. Textbook Python formatting. Detailed help menus. Educational docstrings characteristic of training data. The artifact still carried the seams of its production.

This is not the LLM failing at the hard part. The vulnerability itself is a real find. GTIG specifically notes that the 2FA flaw stems from a hardcoded trust assumption, a high-level semantic logic flaw of the kind that fuzzers and static analyzers tend to miss but that frontier LLMs can reason about by reading developer intent. The model did discovery work that previously required a competent human auditor. Where the operation broke down was in weaponization. The attacker shipped an artifact that still looked like a tutorial.

This is a familiar failure pattern showing up on the offensive side for the first time. Fluency reads as competence. The attacker trusted an artifact with hallucinated metadata and educational comments still attached because it looked like a real exploit, in the same way over-eager engineering teams hand agents production credentials because the agent sounded like it knew what it was doing. The criminals here got bitten by the same dynamic that has been producing outages and data loss in vibe-coded production systems for the last eighteen months. The substrate is doing some of the work of inviting the misconfiguration.

Hultquist’s thread on the report is hedged correctly. The importance is the trajectory, not this specific specimen. Pull the camera back and the rest of the report is more interesting than the lede.

Three things worth surfacing

APT45 sending thousands of repetitive prompts. The North Korean group has been observed using recursive prompting to analyze CVEs and validate proof-of-concept exploits at scale. That is the industrial-scale answer to LLM variance. Solve the quality problem by amortizing across volume, then have humans cherry-pick the outputs that survived validation. The same statistical strategy that makes modern fuzzing work, applied one layer up the stack. The model does not have to be reliable. The pipeline has to be cheap enough that unreliability does not matter.
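The arithmetic behind "the model does not have to be reliable" is the same arithmetic defenders use to reason about fuzzing budgets. A small sketch, assuming each attempt succeeds independently with the same probability, which is a simplification that real pipelines will not satisfy exactly:

```python
import math


def p_at_least_one(p_success: float, attempts: int) -> float:
    """Probability that at least one of n independent attempts succeeds."""
    return 1.0 - (1.0 - p_success) ** attempts


def attempts_needed(p_success: float, target: float) -> int:
    """Smallest n for which p_at_least_one(p_success, n) reaches the target."""
    if not (0.0 < p_success < 1.0 and 0.0 < target < 1.0):
        raise ValueError("probabilities must be strictly between 0 and 1")
    return math.ceil(math.log(1.0 - target) / math.log(1.0 - p_success))
```

Under these assumptions, a step that works only one time in a hundred reaches a 95 percent chance of at least one success after roughly 300 attempts. That is why per-attempt cost, not per-attempt reliability, is the number that matters.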

CANFAIL and LONGSTREAM using LLM-generated decoy code. A Russia-nexus intrusion cluster has been deploying malware that uses LLM-generated code to conceal malicious functionality. GTIG documented LONGSTREAM containing 32 instances of code querying the system’s daylight saving status, repetitive benign-looking activity used to camouflage the malicious core. CANFAIL carries similar filler logic with LLM-generated comments self-describing the decoy blocks. The stylistic noise of LLM output is becoming the obfuscation layer. The verbose docstrings. The textbook structure. The over-explained variable names. These used to be tells. They are now camouflage. Any heuristic built on the AI-tell will start producing false negatives.

The wooyun-legacy skill plugin. A specialized GitHub repository is being distributed as a Claude Code skill plugin that integrates a distilled knowledge base of over 85,000 real-world vulnerability cases from the Chinese bug bounty platform WooYun (2010 to 2016). This is the supply side of the same market. Skill packs are tooling. Tooling gets distributed. The economic logic for adversarial skill packs is identical to the economic logic for legitimate ones. Any platform hosting them inherits a familiar problem, one that app stores and package registries have been working through for two decades: making trust decisions at distribution scale about code from parties you cannot directly inspect.

Both sides are running on the same substrate

On the defensive side, Google is using Big Sleep to find vulnerabilities and CodeMender (Gemini-driven) to fix them automatically. The criminals are pulling from a model class indistinguishable from the one Google is running its defensive tooling on. Both sides have access to the same substrate. The differential collapses to data quality, harness sophistication, and discipline around permissions.

That last one is the part the 90s HIDS conversation did not anticipate. It is also the part that should be the least surprising. The controls discipline did not get easier because the platform got more capable. If anything the gradient got worse. A confused regex IDS in 1999 had a bounded action space. The rule set was enumerable. You could write down what it would do wrong. A confused agent in 2026 has whatever action space its credentials grant it, which in most deployments is more than it should. The fluency that made it easy to give the agent broad permissions in the first place is exactly the property that makes its failures look reasonable in the moment.

The race Hultquist refers to is real, and it has started. The race is not about model capability. Both sides are running models from the same vendors, often the same model. The race is about who has better-curated data feeding their harnesses. Who has stricter discipline around what their automation can touch. Who has the institutional memory of what happens when you delegate authority to a system whose judgment you cannot audit in advance.

The HIDS debate from the mid-90s got an answer. It came from the other side of the wire. Not because defenders learned how to trust autonomous judgment, but because attackers learned they did not need to. They could delegate broadly, externalize the blast radius, and let volume compensate for judgment. The defensive answer cannot be more vibes, broader credentials, and better prompts. It has to be the inverse. Narrower authority. Better harnesses. Replayable decisions. And institutional memory about what happens when fluent systems get mistaken for trustworthy ones.

AI Is Not Why They Are Cutting (Yet)

Back in 2000, the rule of thumb at Microsoft was that each employee needed to average roughly $600K in top-line revenue. Inflation adjusted, that is about $1.1M to $1.2M today. Microsoft was a high-margin software monopoly at peak, so it is not a universal benchmark, but it gives a sense of what disciplined operating leverage looked like even at a company printing money.

Over the last decade, and especially during the COVID-era zero-rate and QE environment, many companies responded to dysfunction by hiring around it instead of fixing it. Cheap capital reduced the pressure to make hard operating decisions. Necessity is the mother of invention, but cheap money suppressed that necessity for a long time.

Then two things changed at roughly the same time. Rates went from roughly zero to five percent, and Section 174 of the tax code stopped letting companies expense software developer salaries in the year incurred. The R&D amortization rule from TCJA kicked in for the 2022 tax year, forcing five-year amortization domestically and fifteen years for work done offshore. At the exact moment capital got expensive, a major software-company cost center became less friendly from a cash-tax and after-tax economics perspective.

Now AI has added a new pressure. Companies are adopting AI quickly, but we are still early. Much of what is happening inside enterprises is still R&D, experimentation, platform buildout, workflow redesign, and internal tooling. That work is not free. It comes with token costs, infrastructure commitments, GPU capacity, vendor contracts, and a lot of expensive trial and error.

Jensen Huang has made the point, in characteristically aggressive form, that if he pays someone $500K, he expects them to use a meaningful amount of compute to become more productive. Whether or not you take the specific numbers literally, and you probably should not since Nvidia sells the machines that consume those tokens, the economic point matters. AI spend has to come from somewhere.

That is the part many layoff narratives miss. Companies are not simply replacing workers with AI. They are also reallocating budget toward AI. Token budgets, model access, inference costs, internal AI platforms, data infrastructure, and R&D commitments are becoming real line items. To fund them, companies are looking at the headcount they accumulated under different interest-rate assumptions, different tax assumptions, and a different view of software demand.

There is also a demand-side story. COVID pulled years of enterprise software adoption into eighteen months, and a lot of what gets reported as growth now is ARR rotating through M&A rather than new logos landing. In parts of the market, revenue is moving around as much as it is expanding.

That is the real backdrop for the wave of layoffs. AI is the story being told on earnings calls. The reality is accumulated management debt finally meeting a cost of capital that punishes it. Layers of process. Unclear ownership. Duplicated work. Headcount that grew faster than execution improved. And now, on top of that, companies need to make room for a new class of AI-related spend.

The pressure also lands hard on old farts like me. We are expensive. And to be honest, some of us (not all) do not want to change how we work or keep up with how the technology is evolving. That makes us easy targets when finance needs to hit a cost number. AI gives the story a forward-looking sheen, but the underlying move is simpler: reduce expensive headcount, flatten layers, correct years of operational laziness, and redirect budget toward the new thing everyone believes they must fund.

AI is real. The layoff narrative around it usually is not. When you read a layoff announcement blaming AI, you are mostly reading a press release about cost of capital, tax policy, demand pull-forward, AI infrastructure spend, and an org chart that finally got too expensive to defend.

Read the 10-Qs, not the blog posts.

Smaller, Provable, and on Hardware You Own and Operate

Dino Dai Zovi made an argument recently that I want to build on.

“If you agree that AI will help attackers discover and exploit vulnerabilities 10-100x more easily, then your excess attack surface has also just become 10-100x more of a liability. The right defensive strategy is to prioritize reducing attack surface and trusted computing bases.”

The argument is right. It is also not new.

We have been working on this problem for fifty years

Operating system designers gave this set of principles a name in 1975. Saltzer and Schroeder published The Protection of Information in Computer Systems and laid out economy of mechanism, least privilege, separation of privilege, complete mediation, fail-safe defaults, and open design. The Orange Book formalized “trusted computing base” in 1983, with the central observation that the security of a system depends on what is inside the TCB, and that smaller TCBs are easier to make trustworthy than larger ones. The microkernel debate that ran from Mach through L4 was an argument about how aggressively to apply these principles to commodity systems. seL4 went further and produced a formally verified microkernel in 2009, demonstrating that the principles could be pushed all the way to mathematical proof.

The same ideas show up everywhere once you look. Chrome’s site isolation is privilege separation applied to the browser. OpenBSD pledge and unveil are least privilege applied to userland. Linux namespaces, capabilities, and seccomp are mediation primitives. CHERI takes the same intuitions down into the instruction set. GlobalPlatform Security Domains are the smart-card-world version of compartmentalized trust, with separate keysets, separate trust roots, and isolation between issuers, verifiers, and applications on the same chip.

None of this is new vocabulary. Security domains. Privilege separation. Attack surface reduction. Trusted computing bases. We have known the names of these things for decades, and we have known what to do about them.

What AI changes is the math, not the principles. Excess privilege has always been a liability. The probability of it mattering on any given day was low enough, and the timescale on which it mattered was long enough, that organizations could carry oversized TCBs and broad blast radii in the backlog as “things we should clean up someday.” AI compresses the timescale and raises the probability. The slack that was tolerable on a five-year cleanup horizon is not tolerable on a six-month one. Dai Zovi’s 10-100x is a multiplier on the cost of carrying slack, not a discovery about whether slack should be carried.

The OS tradition assumed you owned the layer below the boundary

There is one place where the classical OS framework needs an extension before it covers the world we are actually deploying into.

The kernel could enforce process isolation because the kernel was below the processes. The hypervisor could enforce VM isolation because the hypervisor was below the VMs. The trust property was “I control the layer below the boundary, so the boundary is meaningful to me.” Every classical OS-level guarantee depends on that.

Cloud broke that assumption. AI workloads, which run on cloud GPUs and orchestration infrastructure that almost nobody owns, intensify the break. The layer below your workload is operated by someone else. Their hypervisor, their firmware, their physical facility, their scheduling. The classical principles still apply, but their enforcement mechanism is gone.

Reduction is necessary. Reduction is not sufficient. Once you have shrunk the attack surface and the TCB to something defensible, you still have to prove that the small thing you reduced to is the small thing actually running, and that what it just did is what you said it would do. Without that proof, the small thing is functionally indistinguishable from the large thing. An attacker who replaces your tiny attested signing service with a tiny lookalike has bought themselves all the same access at a lower cost.

The defensive posture in an AI-leverage world is not just smaller. It is smaller and provable.

Law #3 did not go away

There is also one law older than the OS-design principles that the cloud security pitch of the last decade has spent a lot of energy pretending to repeal.

Microsoft’s Ten Immutable Laws of Security were published by Scott Culp in 2000. Law #3 is the relevant one here. If a bad actor has unrestricted physical access to your computer, it’s not your computer anymore. The marketing for confidential computing has, in effect, been an extended argument that hardware-encrypted memory and remote attestation make Law #3 obsolete on cloud infrastructure. They do not, and the research record is clear that they will not.

Cloud TEEs share microarchitectural resources with the hypervisor and with co-tenants. That is what produces the side-channel catalog. Cloud providers have physical access to every server they operate. That is what produced TEE.Fail. Hardware roots of trust have a shelf life because they live on the same silicon as everything else, and that silicon is in the operator’s possession. None of these properties are bugs. They are what “running on hardware somebody else owns” means.

Server-side cloud TEEs are useful for narrow, bounded properties. They are not useful for repealing Law #3 against a determined operator, and they will not meaningfully defeat multi-tenant side channels at the scale at which they are deployed. Selling them as if they would is what produces the gap between marketing and engineering that I have been writing about for the last year in Confidential Computing’s Inconvenient Truth, What Is Confidential Computing, What It Isn’t, and How to Think About It, and TPMs, TEEs, and Everything In Between.

The criticism in those pieces is specific. It is about the gap between what cloud TEEs are sold as doing (defeating the operator) and what they actually do (making narrow verifiable claims to relying parties about specific operations). The criticism is not that the underlying assurance technology is useless. The technology delivers exactly what it was originally designed to deliver, in the contexts where the original threat model holds. The marketing has run well past those contexts.

Where the assurance property actually delivers

The assurance property does deliver, where the model fits. The model fits when the hardware is in the user’s possession, when the device is discrete and tamper-resistant, and when attestation is used to prove “the key in this request lives on this specific device and has never left it” rather than to prove “the operator of the rack cannot read your memory.” That is the threat model the technology was designed for, and it has been working in production for a long time.

A few examples of the pattern done honestly.

YubiKey PIV attestation. The YubiKey can produce an attestation certificate, signed by Yubico’s manufacturer key, asserting that a private key was generated on this YubiKey, has the slot and policy attributes you expect, and is non-exportable. Yubico documents the protocol clearly. The trust property is sharp because the device is sharp. Discrete silicon, tamper-resistant package, manufacturer chain you can pin against. Law #3 still applies, and it cuts the right way: the user has unrestricted physical access to the YubiKey, and the YubiKey is the user’s computer.

Apple Secure Enclave for SSH agents. Paprika and Secretive are SSH agents that store the private key in the Mac’s Secure Enclave Processor. The application processor never sees the key, and even root on the Mac cannot extract the key material. Root can still cause the key to be used through the legitimate signing API, modulo whatever consent prompts apply, but extraction itself is what the SEP boundary is built to defeat. The user owns the laptop, the key is on a physically separated processor on the same SoC, and the threat model (other applications on the same device, or malware that compromises the application processor) matches what the SEP was built for.

Smart cards and HSMs. GlobalPlatform Security Domains, the Yubico PIV applet, hardware-backed PKCS#11 tokens, FIPS 140-3 Level 3 modules. Discrete silicon, tamper-resistant packaging, attestation chains rooted in manufacturer keys. The model that worked in the late 1990s and that still works today, because the threat model has not drifted.

PeculiarVentures/attestation is the verification side of all of this. Parsing, validating, and reasoning about attestation evidence from these various sources. Attestation without a verifier is a claim. Attestation with a verifier is something the relying party can act on.

The common shape across all of these is that the user owns the hardware, the boundary is physical, and the attestation chain anchors in a manufacturer key whose threat model the user can actually evaluate. Law #3 is honored rather than denied.

Transparency is the other cross-machine extension

There is a second extension of the classical OS-design tradition that matters for the AI-leverage world, and that composes with attestation in important ways.

Saltzer and Schroeder’s open design principle says the security of a system should not depend on the secrecy of its mechanism. The cryptography community has applied this rule to algorithms for decades. The systems community has been slower to apply it to operations. “What is the rack actually doing right now?” and “What has it done in the past?” are operational questions, and historically the answer was “trust the operator’s audit logs.”

Transparency logs are the operational extension of open design. The idea is to publish what a system is doing to an append-only public log, with cryptographic proofs that the log cannot be retroactively modified, and to design the relying party to require evidence from the log before trusting any operation. Multiple independent witnesses cosign the log so that no single party can serve different views of reality to different relying parties.
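The "cryptographic proofs" here are Merkle tree proofs. The sketch below follows the hashing and inclusion-proof verification rules from the Certificate Transparency RFCs (leaf hashes prefixed with 0x00, interior nodes with 0x01). The helper names are mine, and the in-memory tree builder exists only so the example is self-contained. A real log serves proofs over a network API.

```python
import hashlib
import hmac
from typing import List


def leaf_hash(data: bytes) -> bytes:
    return hashlib.sha256(b"\x00" + data).digest()


def node_hash(left: bytes, right: bytes) -> bytes:
    return hashlib.sha256(b"\x01" + left + right).digest()


def _split(n: int) -> int:
    """Largest power of two strictly less than n (n >= 2)."""
    k = 1
    while k * 2 < n:
        k *= 2
    return k


def merkle_root(leaves: List[bytes]) -> bytes:
    """Root hash of a non-empty list of leaves."""
    if len(leaves) == 1:
        return leaf_hash(leaves[0])
    k = _split(len(leaves))
    return node_hash(merkle_root(leaves[:k]), merkle_root(leaves[k:]))


def inclusion_proof(leaves: List[bytes], index: int) -> List[bytes]:
    """Sibling hashes from the leaf at `index` up to the root."""
    if len(leaves) == 1:
        return []
    k = _split(len(leaves))
    if index < k:
        return inclusion_proof(leaves[:k], index) + [merkle_root(leaves[k:])]
    return inclusion_proof(leaves[k:], index - k) + [merkle_root(leaves[:k])]


def verify_inclusion(leaf: bytes, index: int, tree_size: int,
                     proof: List[bytes], root: bytes) -> bool:
    """Recompute the root from the leaf and proof; the log is never trusted."""
    if index >= tree_size:
        return False
    fn, sn = index, tree_size - 1
    r = leaf_hash(leaf)
    for p in proof:
        if sn == 0:
            return False  # proof is longer than the tree is deep
        if fn & 1 or fn == sn:
            r = node_hash(p, r)
            # Skip levels where this node has no sibling.
            while fn and not fn & 1:
                fn >>= 1
                sn >>= 1
        else:
            r = node_hash(r, p)
        fn >>= 1
        sn >>= 1
    return sn == 0 and hmac.compare_digest(r, root)
```

The relying party needs only the leaf, its index, the tree size, the proof, and a root it already trusts. The proof grows with the logarithm of the log size, which is what makes requiring log evidence on every operation practical.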

The pattern is in production at scale. Certificate Transparency requires every WebPKI certificate to be logged publicly before browsers will trust it, which converts CA misissuance from “discovered by accident, sometimes” into “discovered by anyone watching the log.” Sigstore applies the same model to software signing, with every signature published to Rekor and consumers able to require log inclusion before accepting a binary. Google DeepMind’s Verifiable Data Audit was an early attempt to apply the same model to data access in healthcare. The infrastructure is consolidating at transparency.dev, and C2SP standardizes the interoperability primitives: tlog-tiles, the witness and cosignature protocols, signed-note, and static-ct-api.

Attestation tells a relying party “this code is running right now.” Transparency tells a relying party “this code has been published, reproduced, and witnessed by parties whose collusion would be visible.” The two compose. Apple’s Private Cloud Compute is the most prominent recent example. Every production build is published to a transparency log, user devices will only communicate with nodes whose attested measurement matches the log, and Apple released a virtual research environment so anyone can verify the build claims independently. Google’s Project Oak was an earlier expression of the same combination, building remote attestation against publicly published binaries as the foundation of trust. The Merkle Tree Certificates draft, now a working group document in the IETF’s new PLANTS working group, extends the same logic to TLS at scale, replacing traditional X.509 issuance with batched, transparency-native cert formats designed for the shorter lifetimes the WebPKI is moving toward.

The relevant property for the AI conversation is that transparency changes what you have to trust rather than just adding to it. With attestation alone, you trust the manufacturer of the silicon, and you take the operator’s word for everything the system did over time. With transparency, you still trust the manufacturer of the silicon, but the history holds as long as any one of the witnesses is honest. That asymmetry, where a single honest witness is enough to expose a dishonest operator, is what makes transparency the right tool for environments where the operator might be the adversary.

What this leaves for server-side TEEs

Bounded usefulness, designed honestly.

Server-side cloud TEEs do not defeat the operator. They produce narrow verifiable claims that a relying party can check against their own trust anchors. This signing service ran this image at this measurement. This certificate was produced by this enclave for this RA. This policy was applied. This key was attested as non-exportable by the HSM that signed. Each of those is a useful property. None of them is “the operator cannot see your data.” Building an architecture that pretends otherwise is how organizations end up with a single point of failure they did not know they had.

I have been building GoodKey CA as a worked example of the bounded-usefulness pattern. A certificate authority is a useful test case for this kind of architecture, because the trust property is sharp and the threat model is well understood. The shape of the answer is mostly classical OS design pulled across machine boundaries, with hardware-anchored trust at the endpoints and a deliberately bounded intermediary in the middle.

Each enclave is a security domain. RA, CA, and HSM are independent compartments. Each has its own measured image, its own keys, and its own attested boundary. Compromising one does not compromise the others. Privilege is separated by design rather than by policy.

The TCB inside each domain is small enough to characterize. Each enclave runs a single-purpose deterministic image. The measurement is one number. The image is reproducible from source. There is no general-purpose runtime to subvert and no orchestration sidecar to gain a foothold from. AWS Nitro Enclaves were the deliberate choice over SGX or TDX. The architecture uses VM-level isolation with dedicated CPU and memory rather than carving enclaves out of shared-cache, shared-core silicon, which reduces exposure to a large class of the microarchitectural side channels that the SGX and TDX families have to grapple with. Dedicated resources, minimal hypervisor, deterministic measurement.

Mediation is complete and inside the boundary. Every signing operation goes through the policy evaluator (Cedar) inside the enclave. Authorization is part of what is attested, not external to it. A compromised RA cannot lie about what policy was applied, because the policy evaluation was inside the measurement.

Trust is not transitive. When the RA tells the CA that a client attestation passed, the CA does not believe it. The CA re-runs the verification itself, against its own registered verifier, before signing anything. This is the cross-machine version of “the kernel does not trust userland’s claim that a syscall is authorized.” The CA does the check itself, every time.
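The shape of that rule is easy to show in a few lines. This is an illustrative sketch, not GoodKey CA's code: SigningRequest, ra_says_verified, verify_evidence, and sign are all hypothetical names, and the verifier and signer are passed in as plain callables so the example stays self-contained.

```python
from dataclasses import dataclass
from typing import Callable


class VerificationError(Exception):
    """Raised when the CA's own check of the client evidence fails."""


@dataclass(frozen=True)
class SigningRequest:
    csr: bytes
    client_evidence: bytes
    # The RA's opinion travels with the request but is never consulted.
    ra_says_verified: bool


def ca_sign(req: SigningRequest,
            verify_evidence: Callable[[bytes, bytes], bool],
            sign: Callable[[bytes], bytes]) -> bytes:
    # Re-run verification inside the CA boundary on every request.
    # req.ra_says_verified is deliberately ignored.
    if not verify_evidence(req.client_evidence, req.csr):
        raise VerificationError("client evidence failed CA-side verification")
    return sign(req.csr)
```

The point of the sketch is what is absent: there is no branch that reads the RA's claim, so a compromised RA has nothing to flip.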

Per-operation attestation, not per-boot attestation. The CA produces a fresh Nitro attestation for every certificate it signs, with user_data set to SHA-256(certDER || raKeyFingerprint). That binds this specific certificate to this specific enclave with this specific RA on the other end of the conversation. A boot-time attestation tells you the box looked right when it started. A per-operation attestation tells you the box looked right when it did the thing you actually care about.
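A relying party's side of that binding can be sketched as follows. This covers only the user_data comparison. It assumes the attestation document's signature and certificate chain have already been verified elsewhere, and it assumes the RA key fingerprint is a fixed 32-byte value (for example a SHA-256 digest), which keeps the plain concatenation unambiguous. The function names are illustrative.

```python
import hashlib
import hmac

FINGERPRINT_LEN = 32  # assumption: the RA key fingerprint is a SHA-256 digest


def binding_digest(cert_der: bytes, ra_key_fingerprint: bytes) -> bytes:
    # SHA-256(certDER || raKeyFingerprint), as described above.
    if len(ra_key_fingerprint) != FINGERPRINT_LEN:
        raise ValueError("unexpected fingerprint length")
    return hashlib.sha256(cert_der + ra_key_fingerprint).digest()


def check_binding(user_data: bytes, cert_der: bytes,
                  ra_key_fingerprint: bytes) -> bool:
    # Compare the user_data from an already-verified attestation document
    # against the value recomputed from the certificate in hand.
    expected = binding_digest(cert_der, ra_key_fingerprint)
    return hmac.compare_digest(user_data, expected)
```

If the check passes, the attestation can only have been produced for this certificate and this RA. Replaying it alongside a different certificate fails the comparison.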

Hardware-anchored trust at the endpoints. The signing keys themselves live in a hardware HSM with discrete-silicon attestation rooted in the Marvell manufacturer chain. The clients prove they hold hardware-protected keys via TPM or device attestation. The Nitro layer in the middle does not have to defeat AWS to be useful, because the actual key material is protected by a different boundary that AWS does not own, and the evidence on the wire is anchored in trust roots the relying party already trusts.

Operations published to a transparency log. The CA’s attested measurements, policy versions, and issuance records get logged to an append-only structure with multi-witness cosigning. The operator still chooses what to submit. What the operator does not get is the ability to retract entries after the fact, modify history, or serve a different version of the log to a different relying party without those parties detecting the divergence. A relying party’s confidence that the system has been running honestly over time stops being a function of trust in the operator’s audit logs and starts being a function of properties that hold against the operator. This is the same shape Certificate Transparency gives the WebPKI, applied to the CA’s own operational claims about itself.
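Detecting a split view comes down to comparing tree heads. The sketch below uses a hypothetical Checkpoint type holding only the fields needed for the comparison. Real checkpoints are signed and cosigned notes, and those signatures would be verified before this step.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Checkpoint:
    origin: str   # which log this tree head claims to describe
    size: int     # number of entries covered
    root: bytes   # Merkle root over those entries


def compare_views(a: Checkpoint, b: Checkpoint) -> str:
    # Two relying parties (or a relying party and a witness) swap checkpoints.
    if a.origin != b.origin:
        raise ValueError("checkpoints are for different logs")
    if a.size == b.size:
        # Same size, different root: the log showed two different histories.
        return "consistent" if a.root == b.root else "split-view"
    # Different sizes cannot be judged by comparison alone. The smaller tree
    # must be proven to be a prefix of the larger one.
    return "needs-consistency-proof"
```

Two signed checkpoints with the same size and different roots are self-contained, portable evidence of misbehavior, which is why an operator cannot quietly serve different histories to different parties.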

Failure modes are bounded by design. Certificates are valid for seven days. ACME Renewal Information lets the CA shorten renewal windows for specific machines or specific profiles, and goodenroll polls for those signals on its own schedule. The fleet rotates without an emergency window and without anyone touching a machine. The exposure window becomes a configuration choice rather than a function of certificate lifetime, and revocation infrastructure stays out of the critical path of the threat model.
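On the client side, the ARI logic is small. The sketch below follows my reading of the client guidance in the ARI specification (RFC 9773): pick a uniformly random time inside the suggested window, and renew right away if that time has passed or would pass before the next poll. It is not goodenroll's implementation, and the function names are illustrative.

```python
import random
from datetime import datetime, timedelta, timezone
from typing import Optional


def pick_renewal_time(window_start: datetime, window_end: datetime,
                      rng: Optional[random.Random] = None) -> datetime:
    # Uniform random point in the suggested window, so a fleet spreads its
    # renewals out instead of hitting the CA at the same instant.
    if window_end <= window_start:
        raise ValueError("suggested window must have positive length")
    rng = rng or random.Random()
    span = (window_end - window_start).total_seconds()
    return window_start + timedelta(seconds=rng.uniform(0, span))


def should_renew_now(selected: datetime, now: datetime,
                     next_poll: datetime) -> bool:
    # Renew immediately if the selected time is already in the past, or if
    # it would arrive before this client wakes up again.
    return selected <= now or selected < next_poll


# Example: the CA has moved the window so that it starts in the past.
now = datetime(2026, 5, 10, 12, 0, tzinfo=timezone.utc)
selected = pick_renewal_time(now - timedelta(hours=6), now - timedelta(hours=1))
renew = should_renew_now(selected, now, now + timedelta(hours=4))  # True
```

When the CA moves the window into the past for a given certificate, every client holding it renews on its next poll. That is the mechanism that makes the exposure window a configuration choice.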

Post-quantum where it counts. ML-DSA-65 (FIPS 204) for certificate signing, ML-KEM-768 (FIPS 203) as the subject key for TLS key-exchange certificates. ARI is what makes the migration tractable on the deployed fleet, because you do not have to wait for natural expiry to do the work.

Nitro is a bounded-trust intermediary. AWS still owns the silicon it runs on. What the architecture buys you is that the property the relying party has to verify is narrow and concrete, and that the actual long-lived secrets are protected by hardware that AWS does not own. Against an AWS-internal threat with full physical access and unbounded effort, Law #3 still applies. Against the attacks the architecture is actually defending against (software compromise of the CA pipeline, a rogue admin pulling secrets through the management plane, a tampered build reaching production), the bounded property is exactly the property you need.

The substrate

An architecture like this only works if the underlying primitives are right. Three pieces of infrastructure I have been spending time on are upstream of GoodKey CA.

PeculiarVentures/scp is GlobalPlatform Security Domain key management in Go. The name is not a coincidence. Smart cards and HSMs have been doing security domains in hardware for two decades, with separate keysets, separate trust roots, and isolation between issuer, verifier, and application code on the same chip. The library implements SCP03 and SCP11 and a typed Security Domain management layer for key lifecycle, certificate provisioning, and trust validation, against verified profiles with byte-exact validation against independent reference implementations. This is the unglamorous work of “make sure the keys you are putting on hardware are actually being put on hardware in the way you think they are.” If the key on the device is not where you think it is, every downstream signature is asserting something false.

draft-ietf-acme-device-attest, which I co-author, is the cross-machine extension on the client side. It standardizes how a device proves to an ACME server that the key in a certificate request lives in attested hardware on a specific device. The recent revisions resolved several interoperability gaps that had blocked broad implementation, including the Apple-specific attToBeSigned semantics around sha256(token) versus sha256(keyAuth), an explicit identifier-verification step, the badAttestationStatement error type, and a hardware-module identifier type. The point of the work is to make the client side of the trust chain as verifiable as the CA side. An attested signing service that issues credentials to anyone who asks is not solving the problem, it is moving it.
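The sha256(token) versus sha256(keyAuth) distinction is easier to see with the two values side by side. The sketch below computes both candidates using the key authorization construction from RFC 8555 and the JWK thumbprint from RFC 7638. It makes no claim about which value a given attestation format binds, which is for the draft to say. The helper names are illustrative.

```python
import base64
import hashlib
import json

# Required JWK members per key type, used for the RFC 7638 thumbprint.
_REQUIRED = {
    "EC": ("crv", "kty", "x", "y"),
    "RSA": ("e", "kty", "n"),
    "OKP": ("crv", "kty", "x"),
}


def b64url(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def jwk_thumbprint(jwk: dict) -> bytes:
    # Only required members, sorted lexicographically, with no whitespace.
    members = {name: jwk[name] for name in _REQUIRED[jwk["kty"]]}
    canonical = json.dumps(members, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).digest()


def key_authorization(token: str, account_jwk: dict) -> str:
    # RFC 8555: token || '.' || base64url(thumbprint of the account key)
    return token + "." + b64url(jwk_thumbprint(account_jwk))


def candidate_digests(token: str, account_jwk: dict) -> dict:
    key_auth = key_authorization(token, account_jwk)
    return {
        "sha256(token)": hashlib.sha256(token.encode("ascii")).digest(),
        "sha256(keyAuth)": hashlib.sha256(key_auth.encode("ascii")).digest(),
    }
```

The two digests differ for every account key, so a client and a server that disagree about which one goes into the attestation will never interoperate. That is the kind of gap the draft has to close in text.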

PeculiarVentures/attestation closes the loop. It is the verifier side that consumes attestation evidence from these various sources (TPMs, YubiKeys, Apple devices, Nitro Enclaves) and reduces it to claims a relying party can act on. Without a verifier, attestation is marketing. With a verifier, it is engineering.

These are not separate efforts. They are what makes hardware-anchored cross-machine trust mean anything in the wild. The transparency-log side of the same problem is being standardized in parallel through transparency.dev, C2SP, and the Merkle Tree Certificates draft, which together extend the same model to issuance auditability at WebPKI scale.

What this asks builders to do

The Dai Zovi prescription is operating-systems hygiene applied to the whole stack. The verifiability corollary is the same hygiene extended across machines you do not own. Both are old. AI is what is making them mandatory.

Pick small. Compartmentalize. Strip privilege to what each component genuinely needs. Make each component’s TCB small enough that one person can characterize it in a sitting. Single-purpose services, deterministic builds, dedicated resources rather than shared microarchitectural state, single-image enclaves rather than orchestrated runtimes.

Make it provable across machines. Per-operation attestation rather than per-boot. Independent re-verification at every hop, not transitive trust. Authorization decisions inside the attested boundary. Evidence bundles the relying party can run a verifier against, with their own trust anchors. Short lifetimes with active rotation rather than long-lived credentials backstopped by revocation. And publish the operations themselves to a transparency log with independent witnesses, so the proofs survive disagreement about who saw what when, and so a single dishonest operator cannot serve different versions of reality to different relying parties.

Anchor trust in hardware whose threat model you can actually evaluate. Where you can put the long-lived secret on hardware the user owns, do that. YubiKey, Apple Secure Enclave, TPM in the laptop on the engineer’s desk, smart card in the operator’s pocket. Where you cannot, use a cloud TEE as a bounded-trust intermediary that produces narrow verifiable claims, and design the architecture so the long-lived material lives in a different boundary that the cloud operator does not own.

And know what your assurance is buying you. Cloud TEEs are not how you defeat the operator. They are how you make narrow operations verifiable to relying parties while accepting that absolute properties against the operator are not on offer. The places where attestation delivers what it advertises are the places where the user owns the silicon. Law #3 has not been repealed, and AI has only raised the cost of pretending otherwise.

Smaller is the easy half. Provable is most of the engineering. On hardware you own is where the property actually holds.