A CA That Produces Evidence, Not Promises

In my last post I argued that high-assurance systems should stop asking to be trusted on the basis of institutional promises and start producing verifiable runtime evidence about what actually happened. This post is the worked example: a certificate authority built that way, the choices it forced, and what is and is not done yet.

When I was at Google I got to work a bit with the BeyondCorp folks. What most people didn’t understand is that the BeyondCorp Google used internally was substantially different from the BeyondCorp they launched to customers. Internally, TPMs on Windows and Linux machines were used to create device credentials associated with each machine that were hardware bound, turning possession of the laptop and an authenticated credential into another factor.

You will notice I didn’t mention Macs. That’s because Apple, although it had a similar secure processor on its devices, did not give customers attestations over keys stored inside it. We could put keys in the Secure Enclave. We could not prove to a third party that we had. For years we tried to get Apple to change that. Eventually they did, and what they shipped was not what we asked for.

We didn’t get the ability to put arbitrary keys in the enclave under our control with attestation. We got a device-bound credential signed by Apple that let us verify, at enrollment time, that a request was coming from one of our devices. We used that as a bootstrap to enroll a short-lived credential that the OS stored and used for day-to-day authentication. Apple’s attestation answered the which-device question. Our short-lived credential answered everything that came after.

That worked. The standardization piece is draft-ietf-acme-device-attest, an IETF working group document I co-author, which lets the same ACME flow carry an Apple Managed Device Attestation statement, a TPM key attestation, or a YubiKey assertion without the CA needing to special-case each platform. Apple’s adoption was what unlocked standardizing on a single way to authenticate these devices across the fleet.

That normalization is the relying-party side of the credential story. The hard-edge property, this key lives on this specific device, signed by a chain you can verify back to the manufacturer, is now what we expect from a workload, a laptop, a phone, a passkey, a SPIFFE workload identity. MFA was the first compensating control for the weaknesses in passwords and API keys and bearer tokens. Passkeys, SPIFFE, and certificate-based zero-trust segmentation are the structural answer. We replaced the secret-you-know with a key-you-hold and bound it to hardware where we could.

Google’s internal production systems run the same shape, with Titan chips as the foundational substrate for the devices that get on the network for internal and cloud workloads.

That shift is well underway on the side of the wire that uses credentials.

The side that issues them is still mostly software on a server with an API key and a SOC 2 report.

If issuance hasn’t kept up with what we now expect from credential holders, what would catching up actually look like? The previous post made the general argument. The CA should be asked to produce evidence, not to be trusted. This post is what that actually looks like when you build it. A certificate authority where the issuance path itself is the evidence, where the policy that fired is part of what was measured, where the key release is gated on the measurement of the binary asking for it, and where every issued certificate is accompanied by a portable bundle a relying party can verify against trust anchors they already hold.

The architecture runs on both AWS and Google Cloud. On AWS that means Nitro Enclaves with NitroTPM-backed workload attestation. On Google Cloud it means AMD SEV-SNP and Intel TDX Confidential VMs with Cloud HSM as custodian and TPM-based workload attestation. There’s a reason for that breadth, and I’ll get to it. But the deeper architectural choice, the one that does the load-bearing work, is the shape of the system inside any one of those platforms.

Two components, one trust boundary

The split is structural, not procedural.

In a single-binary CA, policy enforcement and signing authority are separated by code paths inside one process. A vulnerability in the policy evaluator is a vulnerability in the signing authority, because they share an address space. The compartments exist in the source code. They do not exist at runtime.

Here, the compartments are real. The architecture splits issuance into two attested components.

The registration authority receives the certificate request, resolves identity from authoritative sources, evaluates the issuance policy against the requester’s posture and attestation, and produces a signed authorization context. The signing oracle holds the path to the CA’s signing key and produces the signature. Each runs in a separate attested environment. Each is measured separately. Each has its own keys. Both are written in Go, which is memory-safe in normal code, builds reproducibly without ceremony, and has a standard library that already covers most of what a CA needs.

The RA and the oracle are different binaries, with different measurements, with different keys, on different network endpoints, joined by mutually-attested TLS where each side has independently verified the other’s measurement before either will talk. A compromise of the RA does not give the attacker the signing key, because the signing key is not in the RA’s address space and is not reachable from the RA’s network position. A compromise of the oracle does not give the attacker the policy evaluator’s identity providers, because the oracle has no identity providers wired into it. The interface between them is narrow, signed, and replay-protected.

This is the cross-machine version of the kernel-userland boundary. The kernel does not trust userland’s claim that a syscall is authorized. It does the check itself, every time. The oracle does the same thing for issuance.

When the RA hands the oracle a signed authorization context, the oracle does not believe it. The oracle re-runs the verification independently. It re-evaluates the client attestation against its own configured verifier. It cross-checks every policy fact the RA claimed against what its own verifier actually observed. It validates the certificate profile binding, the certificate type, the validity window, the structural invariants the profile requires, and that the RA on the other end of the conversation is one of the RAs authorized to request this profile. It rejects replays. None of this trusts what the RA said.

What this buys is a bounded blast radius on a compromised RA.

If an attacker takes over the RA, including the RA’s signing key, the worst they can do is issue a certificate that the oracle would have signed anyway. They cannot get the oracle to sign a CA certificate, because the oracle’s profile registry doesn’t contain one for them. They cannot get a leaf outside the measured profile registry, because the oracle won’t recognize the profile. They cannot get the oracle to issue under a profile this RA was not authorized for, because the oracle scopes RAs to profiles in its own configuration and does not take the RA’s word for what it is allowed to request. They cannot get a certificate for a subject the RA’s identity providers wouldn’t have resolved, because the oracle re-evaluates the identity claim against its own verifier. They cannot replay an old authorization, because the oracle tracks the freshness of every authorization context it accepts.

The architectural property that makes that statement true is structural separation with independent re-verification. Every other property the rest of this post will describe, measured policy, gated key release, per-operation attestation, portable evidence, depends on this one.

What each attestation lets you verify

Cross-checking only matters if the things being checked are concrete. Four attestations cross the issuance flow, each one produced by a different party, each one letting the next party check something specific. It is worth walking through what each one actually lets you verify before going further.

The client’s attestation

What the RA verifies: that the requestor controls the private key whose public half is in the certificate request, that the key lives on a specific piece of hardware, that the hardware has a manufacturer-vouched identity, and that the key was generated under conditions the policy can reason about.

The shape of the attestation changes between platforms. A TPM-bound key on a Windows or Linux machine, a Secure Enclave key on an Apple device, and a key in a YubiKey’s PIV slot each arrive in a different format with a different signing chain and different platform-specific fields. The function is the same. Each one carries a binding between the key in the CSR and a piece of hardware, an identifier for that hardware, the conditions under which the key can be used, and a certificate chain back to a manufacturer the policy can decide whether to trust.

draft-ietf-acme-device-attest is the wire format that carries any of these in the same ACME flow, so the CA doesn’t need to special-case each platform. What the policy sees in the end is the same five things regardless of which platform produced them. Does the requestor control the private key. Is the key on a piece of hardware. Whose hardware. Under what conditions can it be used. Does the manufacturer chain trace back to a root the policy will accept.
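One way to picture the five normalized facts is as a single struct that every platform-specific verifier fills in. The sketch below is a Go stand-in with hypothetical field names. In the system described here the decision itself is made by Cedar policy, so `Check` only illustrates the shape of the inputs.

```go
package main

import "errors"

// KeyFacts is a hypothetical normalized result of verifying a TPM, Secure
// Enclave, or PIV attestation. Each field answers one of the five questions.
type KeyFacts struct {
	ProofOfPossession bool   // does the requestor control the private key
	HardwareBound     bool   // is the key on a piece of hardware
	DeviceID          string // whose hardware
	UserAuthRequired  bool   // under what conditions can it be used
	ManufacturerRoot  string // which root the manufacturer chain traced back to
}

// Requirements is an illustrative stand-in for what policy would demand.
type Requirements struct {
	TrustedRoots    map[string]bool
	RequireUserAuth bool
}

// Check rejects on the first fact that does not satisfy the requirements.
func Check(f KeyFacts, r Requirements) error {
	switch {
	case !f.ProofOfPossession:
		return errors.New("requestor did not prove control of the private key")
	case !f.HardwareBound:
		return errors.New("key is not bound to hardware")
	case f.DeviceID == "":
		return errors.New("attestation carries no device identifier")
	case r.RequireUserAuth && !f.UserAuthRequired:
		return errors.New("key usage conditions do not meet policy")
	case !r.TrustedRoots[f.ManufacturerRoot]:
		return errors.New("manufacturer chain does not end at a trusted root")
	}
	return nil
}
```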

The RA’s environment attestation

What anyone holding the document can verify: that the RA’s enclave ran a specific measured image, that the cloud provider signed off on the measurement, that the role the enclave runs under is the one the deployment expects, and that the public key the enclave will sign with for the rest of its lifetime is the one in the document.

The shape changes between TEE platforms. AWS Nitro, AMD SEV-SNP, and Intel TDX each produce a document with different fields signed by a different chain. The function is the same. Each one carries a measurement of what was loaded into the enclave, an identifier for the platform that produced the measurement, the operating context the enclave runs in, and a certificate chain back to a hardware vendor the relying party can decide whether to trust.

Worth being explicit about what the document does not let you verify. It does not tell you who deployed the enclave, who runs the cloud account, or what the operator intended to run. It tells you what was measured. The operator’s intent is a separate question, answered by the published image list and the policy that names which measurements are acceptable. A relying party who trusts the hardware vendor’s chain can verify the measurement. They still have to verify, separately, that the measurement is one they should accept.

The oracle’s environment attestation

What it lets the RA verify, during handshake: that the oracle the RA is about to send an authorization context to is running the published image, and that the signing key the oracle will use for its half of the mutual TLS is the one bound to that image at boot.

What it lets a relying party verify, after issuance: the same thing, plus one additional binding. The oracle produces a fresh attestation for every signing operation, and the attestation is bound to the certificate that operation just produced and to the RA on the other end of the conversation that authorized it. Where the RA’s boot-time attestation carries the long-lived public key the enclave will sign with, the oracle’s per-operation attestation pins this specific certificate to this specific oracle measurement with this specific RA.

The difference between boot-time and per-operation attestation is the difference between “the box looked right when it started” and “the box looked right when it did the thing you actually care about.” Boot-time attestation is what most confidential-computing deployments do today. It tells a relying party the deployment was valid at startup. It tells them nothing about whether the deployment was still valid five hours later when an actual issuance happened. Per-operation attestation closes that gap.
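To make the per-operation binding concrete, here is a sketch of one way it could be done, assuming the TEE lets the workload place caller-chosen bytes into the signed attestation (most platforms expose a user-data or report-data field for this). The encoding below is an assumption for illustration and not the format this CA uses: the oracle hashes the certificate it just signed together with the RA's public key, and a relying party recomputes the same digest.

```go
package main

import (
	"crypto/sha256"
	"crypto/subtle"
	"encoding/binary"
)

// BindingDigest hashes the issued certificate and the authorizing RA's public
// key. Each part is length-prefixed so the two inputs cannot be confused by
// shifting bytes from one into the other.
func BindingDigest(certDER, raPublicKey []byte) [32]byte {
	h := sha256.New()
	for _, part := range [][]byte{certDER, raPublicKey} {
		var n [8]byte
		binary.BigEndian.PutUint64(n[:], uint64(len(part)))
		h.Write(n[:])
		h.Write(part)
	}
	var out [32]byte
	copy(out[:], h.Sum(nil))
	return out
}

// VerifyBinding checks that the user data carried inside an already verified
// attestation document matches this certificate and this RA.
func VerifyBinding(userData, certDER, raPublicKey []byte) bool {
	want := BindingDigest(certDER, raPublicKey)
	return subtle.ConstantTimeCompare(userData, want[:]) == 1
}
```

Because the digest is inside the signed attestation, an attestation produced for one certificate cannot be reattached to another.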

The custodian’s attestation

The CA private key is not loaded by the operator. It is held by a custodian, an HSM with discrete-silicon attestation, a cloud KMS that gates on attestation, or another enclave. The key is wrapped so the custodian will release it only to a signing oracle whose attestation matches a published image. The list of acceptable images is small and public.

What the custodian’s evidence lets a relying party verify: that the wrapped key was released only because the oracle’s measurement matched a measurement on the published list, signed by the custodian’s hardware root. If an operator runs a different binary, however benign the reason, the key does not get released. The dragon’s teeth around the data center are still there. They no longer have to do the whole job.

For a concrete look at what one of these documents actually contains, rather than just what it proves, the PeculiarVentures attestation library parses examples from each of these platforms, including the Marvell HSM attestation produced by Google Cloud HSM. An attestation without a verifier is a claim. With a verifier, it is something a relying party can act on.

Policy as mechanism, not promise

Reading all of those attestations means the policy that evaluated them has to be something concrete enough to read.

Today’s CAs publish a CP/CPS. The Certificate Policy and Certification Practice Statement is a document describing what the CA will and will not do. An auditor samples evidence once a year against the document. The document and the system that produces certificates are not cryptographically linked. The relying party trusts that the document describes the system. The auditor’s annual report is the closing of the loop.

Cedar policies are different. They are a domain-specific language with declarative semantics, written under version control, statically analyzable, and small enough to read in a sitting. The policy that fires inside the signing oracle is compiled into the measured binary. The digest of the policy travels in the evidence bundle that accompanies the certificate. A relying party can re-fetch the source at the named digest, read the rules, and decide for themselves whether the policy that authorized their certificate is a policy they accept.

The contrast that matters operationally is the one between policy-as-promise and policy-as-mechanism. A CP/CPS is a promise. The auditor verifies, by sample, that the practice resembles the promise. A Cedar policy compiled into a measured binary, with its digest in the bundle, is a mechanism. The relying party verifies, per certificate, that the rules that fired are the rules the CA published.

There is a sharp footgun specific to Cedar that is worth naming because the answer to it is part of the architecture. Cedar’s evaluator skips a policy whose evaluation errors, for example when it reads an attribute that is not present. The convenient result is that policies stay readable in the absence of optional context. The inconvenient result is that a forbid policy with an unguarded attribute access can be silently skipped, which is a fail-open. A lint at build time requires a “has” guard before any optional attribute access. A policy that would have failed open is now a build error. The defense-in-depth move is structural. The policy author cannot ship a fail-open by accident.
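A deliberately naive sketch of what such a lint could look like, in Go. It works on policy text with regular expressions and only considers attributes read from `context`. A real lint would presumably walk the parsed Cedar AST and check that the guard actually dominates the access. The function name and the set of optional attributes are hypothetical.

```go
package main

import (
	"fmt"
	"regexp"
)

// attrAccess matches reads such as context.posture_score.
var attrAccess = regexp.MustCompile(`\bcontext\.([A-Za-z_][A-Za-z0-9_]*)`)

// LintHasGuards reports optional context attributes that a policy reads
// without a `context has <attr>` guard appearing anywhere in the policy text.
func LintHasGuards(policy string, optional map[string]bool) []string {
	var problems []string
	reported := map[string]bool{}
	for _, m := range attrAccess.FindAllStringSubmatch(policy, -1) {
		attr := m[1]
		if !optional[attr] || reported[attr] {
			continue
		}
		guard := regexp.MustCompile(`\bcontext\s+has\s+` + regexp.QuoteMeta(attr) + `\b`)
		if !guard.MatchString(policy) {
			problems = append(problems,
				fmt.Sprintf("optional attribute %q is read without a has guard", attr))
			reported[attr] = true
		}
	}
	return problems
}
```

Run against `forbid (principal, action, resource) when { context.posture_score < 50 };` with `posture_score` marked optional, this reports one problem. Adding `context has posture_score &&` before the comparison clears it.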

What this looks like in practice

An employee gets a new smart card from corporate IT in a sealed blister pack. They plug it in.

The enrollment client on the laptop sees a card it doesn’t know. It reads the card’s GlobalPlatform Card Production Lifecycle data and the card recognition data, learns this is a factory-fresh retail token from a manufacturer it has a trust root for, and verifies the card identity attestation back to that root. The card is genuine. It has never been provisioned. The enrollment client knows what kind of token it is looking at and what the policy says to do with one.

The client provisions the token. It generates a new keypair on the card under a policy that requires user PIN for use and marks the key non-exportable. The card produces a key attestation: a statement signed by the card’s manufacturer-installed attestation key asserting that this specific public key was generated on this specific token, has these specific usage constraints, and will never leave the hardware. The enrollment client builds a CSR for the new key and has the card sign it, which is the standard proof that the requestor controls the corresponding private key. It then packages the CSR, the user’s identity claim, and the card’s attestation into an ACME request and sends it to the RA.

That is the first link. The chain is going to have several.

The RA receives the request inside its measured enclave. It verifies the CSR signature against the public key in the CSR, which proves the requestor controls the corresponding private key. It verifies the card’s attestation against the manufacturer chain. Yes, this is a genuine retail token from a manufacturer we trust. Yes, the key in the CSR is on the card. Yes, the usage constraints match policy. It resolves the identity claim against the corporate identity provider. Yes, this user exists. Yes, they are entitled to this credential type. Yes, their device posture matches. It evaluates the Cedar policy against the cross-product of the device, the identity, and the requested profile. Yes, all of it permits issuance. It builds an authorization context, signs it with its enclave key, and sends it to the signing oracle.
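The proof-of-possession step is the one part of this exchange the Go standard library covers directly. The sketch below shows both sides. `BuildCSR` uses a software ECDSA key as a stand-in for the key that, in the flow above, would be generated on and never leave the card. The function names are illustrative.

```go
package main

import (
	"crypto/ecdsa"
	"crypto/elliptic"
	"crypto/rand"
	"crypto/x509"
	"crypto/x509/pkix"
	"fmt"
)

// BuildCSR is the client side: create a key and a CSR signed by that key.
// A software key stands in here for the card-resident key.
func BuildCSR(commonName string) ([]byte, error) {
	key, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	if err != nil {
		return nil, err
	}
	tmpl := &x509.CertificateRequest{Subject: pkix.Name{CommonName: commonName}}
	return x509.CreateCertificateRequest(rand.Reader, tmpl, key)
}

// VerifyProofOfPossession is the RA side: the CSR's signature must verify
// against the public key embedded in the same CSR.
func VerifyProofOfPossession(csrDER []byte) (*x509.CertificateRequest, error) {
	csr, err := x509.ParseCertificateRequest(csrDER)
	if err != nil {
		return nil, fmt.Errorf("parse csr: %w", err)
	}
	if err := csr.CheckSignature(); err != nil {
		return nil, fmt.Errorf("proof of possession: %w", err)
	}
	return csr, nil
}
```

This only proves control of the private key. That the key lives on the card is a separate fact, established by the card's attestation.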

The signing oracle receives the context inside its own measured enclave, over mutually-attested TLS where both sides have verified the other’s measurement before they would talk. It does not believe what the RA told it. It re-verifies the card’s attestation against its own configured verifier. It cross-checks every fact the RA claimed against what its own verification observed. It validates the profile binding, the certificate type, the validity window, the structural invariants the profile requires, and that this RA is authorized to ask for this kind of issuance. It rejects replays. Only then does it ask the custodian for the signing key. The custodian releases it because the oracle’s attestation matches the published image. The oracle signs. It produces a fresh per-operation attestation binding this certificate to this enclave talking to this RA. The issuance is written to a transparency log that independent witnesses cosign.
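For readers who have not verified a transparency log entry before, this is roughly what checking an inclusion proof involves. The sketch assumes the log uses the RFC 6962 / RFC 9162 Merkle tree construction with SHA-256. The post does not say which log format this CA uses, so treat that as an assumption. Checking the witness cosignatures over the tree head is a separate step and is not shown.

```go
package main

import (
	"bytes"
	"crypto/sha256"
)

// LeafHash and NodeHash use the 0x00 / 0x01 domain-separation prefixes
// from the Certificate Transparency Merkle tree definition.
func LeafHash(entry []byte) []byte {
	h := sha256.New()
	h.Write([]byte{0x00})
	h.Write(entry)
	return h.Sum(nil)
}

func NodeHash(left, right []byte) []byte {
	h := sha256.New()
	h.Write([]byte{0x01})
	h.Write(left)
	h.Write(right)
	return h.Sum(nil)
}

// VerifyInclusion recomputes the root from a leaf hash and its audit path,
// following the verification procedure in RFC 9162 section 2.1.3.2.
func VerifyInclusion(leafHash []byte, index, treeSize uint64, proof [][]byte, root []byte) bool {
	if index >= treeSize {
		return false
	}
	fn, sn := index, treeSize-1
	r := leafHash
	for _, p := range proof {
		if sn == 0 {
			return false // proof is longer than the tree is deep
		}
		if fn&1 == 1 || fn == sn {
			r = NodeHash(p, r)
			// Skip levels where this node has no right sibling.
			for fn&1 == 0 && fn != 0 {
				fn >>= 1
				sn >>= 1
			}
		} else {
			r = NodeHash(r, p)
		}
		fn >>= 1
		sn >>= 1
	}
	return sn == 0 && bytes.Equal(r, root)
}
```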

The bundle that comes back to the enrollment client contains the card’s attestation that the key is on the token, the RA’s signed authorization context naming the identity and the profile and the policy digest and what verifiers it ran, the oracle’s per-operation attestation binding the signature to a measured binary on a measured platform, the custodian’s evidence that the key was released only because the oracle’s measurement matched, the transparency log inclusion proof and witness cosignatures showing the issuance was published before the certificate was returned, and the certificate itself.

Every party in the flow did the logical equivalent of what every other party did: it verified the upstream attestation against a manufacturer or vendor root it already trusted, did its work, and produced its own attestation for the downstream party to verify. The bundle is every one of those attestations, packaged so the subscriber, or anyone the subscriber shares it with, can re-walk the chain end to end against the same trust roots, without having to take any single party’s word for it.

The card manufacturer’s root says the key is on the hardware. The chip vendor’s TEE root says the RA ran the measured image. The chip vendor’s TEE root says the oracle ran the measured image. The HSM vendor’s root says the key was released to a measurement that matched. The witness network’s cosignatures say the log is what the operator published, not a fork served to one relying party. A relying party who trusts each of those roots, which are the same roots it already trusts for every credential its devices hold today, can verify the certificate’s basis of issuance without trusting the CA.

The CA is not asked to be trusted. The CA produced evidence.

What is built

The architecture above runs in preproduction today on AWS and Google Cloud. On AWS that means Nitro Enclaves and NitroTPM. On Google Cloud it means AMD SEV-SNP and Intel TDX Confidential VMs, Cloud HSM as custodian, and TPM-based workload attestation. Classical ECDSA and post-quantum ML-DSA-65 (FIPS 204) hierarchies operate in parallel. ML-KEM-768 (FIPS 203) is the subject key for TLS key-exchange certificates. Cedar policy with the fail-open lint is enforced in the oracle. Per-operation attestation, evidence bundles, custodian-gated key release on measurement, and mutually attested TLS between RA and oracle all run end-to-end on both clouds.

The breadth is there because no single TEE family fits every customer environment, and the architecture should not be hostage to one chip vendor or one cloud.

The 2029 problem

The math snapshot has a date.

In March 2026, Google’s Heather Adkins and Sophie Schmieg set 2029 as the target for completing Google’s migration to post-quantum cryptography. Google’s timeline matters beyond Google. They run Chrome and Android, and when they move, the WebPKI moves with them. CNSA 2.0 puts 2027 on software and firmware signing in National Security Systems and 2030 on general use. The CA/Browser Forum (CABF) is working through its own timeline. Federal procurement requirements are already moving.

The CA infrastructure that exists today was designed for the snapshot of math problems that the 2029 transition invalidates. Every CA in production is going to be re-architected before that date lands. The algorithms expire. The migration is not optional.

The transition itself is going to be heterogeneous. The classical-PQC X.509 path is going to run for a long time alongside what eventually replaces it. Merkle Tree Certificates (batched, transparency-native issuance with much smaller per-certificate overhead) are a likely part of the answer to ML-DSA’s signature size on the wire. The architecture above does not care which container format the certificate is in. The attested issuance pipeline, the custodian-gated key release, the evidence bundle, and the transparency log all operate on the issuance side. MTC issuance benefits from runtime evidence the same way X.509 issuance does, and the patterns in this post carry over.

The architectural question is what the replacement looks like.

A CA built on the runtime-evidence pattern does not cost more to deploy at the moment you are already rebuilding. It costs more only if you skip the rebuild, and skipping is not on the table. The hardware-anchored credential side of the wire has been arriving in production for a decade. The issuance side is the part still running on the old shape. The PQ deadline is the forcing function that makes the issuance side move. The choice is between rebuilding the old shape with new algorithms, and rebuilding it with the same discipline that the relying-party side has spent the last decade adopting.

Short lifetimes with ACME Renewal Information (ARI) make the operational side tractable. Seven-day certificates with ARI-driven renewal turn the PQ migration from a flag day into a moving window. The fleet rotates without an emergency, without anyone touching a machine, because the CA can shorten the renewal window for specific machines or profiles whenever it wants to.

That is what the next CA looks like. It is not a different CA than the one I have been describing. It is the same one.
