A CA Built for the Threat Model We Actually Have

This builds on earlier posts on what attestation actually proves, what confidential computing is and isn’t, and an honest accounting of the problems with the current generation of TEEs. None of those problems go away here. The argument is that despite those limitations, attestation is an important tool. Certificate issuance is overdue to use it.

Back in the 1990s, I was doing some consulting for DigiNotar, yes, that DigiNotar. They had CA facilities in a data center whose perimeter still had WWII-era anti-tank obstacles, large concrete barriers sometimes called “dragon’s teeth.” Of course, this was an artifact of the facility’s history, but data centers are designed from a security perspective with layers of physical protection, including barriers, mantraps, biometrics, individual vaults with cages, individual racks with their own locks and biometrics, cameras, and more. The threat of physical theft, destruction, or manipulation is exactly what these facilities are designed to mitigate.

When building a CA inside one of these facilities, we design yet another layer of protection. Administration networks are segmented from transaction networks, interconnects from supporting infrastructure, the issuance environment from the systems holding root keys. We add our own physical segmentation on top of that so we can build controls around multiple parties being necessary for the more sensitive operations, while still letting routine hardware maintenance happen on the schedule the SLA needs.

These are all useful and important things, but the reality is that CA key material is not likely to be physically stolen. It is more likely to be compromised from the outside. We solve these problems through design, not by writing more code.

Meaningfully measured code forces design upstream of the code itself.

A measurement of a monolithic blob proves almost nothing useful. A measurement that names a specific role, a specific security domain, and a specific assertion the verifier is supposed to act on, proves something. The roles, the domain boundaries, and the questions the verification has to answer have to exist before any code is written, or the attestation is a signature on nothing in particular.

In operating system design, we have similar problems. In naive systems, we load cryptographic keys into memory on a running, network-connected system, accidentally exposing ourselves to memory-disclosure bugs where a network attacker can steal keys. Heartbleed is the canonical example, but the class is what matters. We do this because it is simpler and faster, but it is also less secure. As systems designers, we address this by moving those keys out of the process of the network-connected application and into a different user context. That way, an attacker cannot simply get the network-connected service to dump memory. They have to get persistence and cross a kernel-enforced user boundary.

This is old wisdom. Least privilege and privilege separation exist because network-facing code should not also be the thing that controls the keys.

A parallel showed up in early cryptocurrency exchanges. Hot wallets were used as signing oracles because the design and deployment work needed to prevent that had not been done, and many of the high-profile compromises of that era trace back to that gap. The exchanges that survived learned to put boundaries between the wallet and the network. That boundary did most of the work. The dragon’s teeth around the building did the rest.

When third parties need to rely on external services operating in these environments, they often rely on auditors to attest that management assertions about operational practices are actually being followed. These assessments are usually performed by CPAs, not security specialists, which can limit their value. They also often rely on sampling a small portion of transactions to confirm that the controls being evaluated are being followed. That sample is drawn from evidence provided by the entity being audited, which is also the party paying for the audit.

All of the things discussed above help bring some minimal level of transparency and verifiability, but it is turtles all the way down, layer on layer, none of them reaching the runtime where the actual compromise happens. This is where confidential computing, and solutions like Private Cloud Compute, start to matter.

Policy changes meaning in this model. In the traditional assurance world, policy is a written promise. The CA publishes a CP or CPS, the operator commits to following it, and the auditor samples evidence to decide whether that promise was kept. In a runtime-evidence model, policy becomes part of the mechanism. A measured binary evaluates a specific policy, produces a decision, and the digest of that policy travels with the evidence. The shift is from policy as promise to policy as enforcement, from “trust us, this is what we do” to “this is the policy the measured system actually applied.”
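A minimal sketch of what "the digest of that policy travels with the evidence" could look like. The policy fields, the `evaluate` function, and the choice of canonical JSON plus SHA-256 are illustrative assumptions, not a format this post prescribes.

```python
import hashlib
import json


def policy_digest(policy: dict) -> str:
    """SHA-256 over a canonical JSON encoding (sorted keys, no extra whitespace)."""
    canonical = json.dumps(policy, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def evaluate(policy: dict, request: dict) -> dict:
    """Apply the policy and return a decision that carries the policy digest."""
    allowed = (
        request["profile"] in policy["allowed_profiles"]
        and request["validity_days"] <= policy["max_validity_days"]
    )
    return {
        "decision": "allow" if allowed else "deny",
        "policy_sha256": policy_digest(policy),
    }


# Hypothetical policy and request, for illustration only.
POLICY = {"allowed_profiles": ["tls-server"], "max_validity_days": 90}
decision = evaluate(POLICY, {"profile": "tls-server", "validity_days": 47})
print(decision["decision"], decision["policy_sha256"][:16])
```

Because the digest is computed over a canonical encoding, anyone holding the published policy can recompute it and confirm that the decision was made against that policy and not a quietly edited copy.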

Apple’s Private Cloud Compute is the worked example. PCC nodes attest to the binary they are running, refuse to do work for clients that cannot verify that attestation, and publish every production build for public inspection. The user’s device, not Apple, decides whether a given node is acceptable. That inversion, the relying party verifying the service rather than the service asserting to the relying party, is the part of the pattern that matters. The pieces are not new individually. The combination, at the scale Apple shipped it, proves the pattern is real. The third-party security reviews prove the architecture is serious. Attacks on confidential computing do not refute that point. They prove there is now a boundary worth attacking, measuring, and improving.

Apple is not the only proof point. Signal used SGX remote attestation for private contact discovery in 2017, with clients verifying that the enclave was running the expected open-source code. WhatsApp’s end-to-end encrypted backups use an HSM-based Backup Key Vault to keep recovery keys out of the ordinary service path, and that design was publicly reviewed by NCC Group. Microsoft’s Confidential Consortium Framework powers Azure Confidential Ledger. Different systems, different threat models, same direction of travel. High-assurance services are moving from institutional assurances toward runtime evidence.

What it looks like

Concretely, a CA built on the Private Cloud Compute pattern looks like this.

Issuance is split into two attested components. The first, the registration authority, takes the certificate request, resolves identity from authoritative sources, evaluates the issuance policy, and produces a signed authorization context. The second, the signing oracle, holds the CA private key and produces the signature. Each runs in a separate attested enclave, and each is measured separately. This means policy can evolve without re-measuring the key-custody component, and keys can rotate without re-measuring the policy component.

The policy layer matters here too. Each component is not just running code, it is making a verifiable policy decision before it acts. The RA decides whether the request is authorized and the identity evidence is sufficient. The signing oracle decides whether the RA, the request, and the authorization context are acceptable before it signs. The evidence does not just say which binary ran. It also says which policy that binary evaluated.

The two components do not trust each other because they are on the same network. They trust each other through attestation, mutually verified at every connection, and the signing oracle does not merely accept the RA’s conclusion. Before it signs, it independently verifies the RA’s attestation, checks that the authorization context is fresh, confirms that the request is bound to an allowed profile, and verifies that the policy facts asserted by the RA match the evidence presented to the oracle. A compromised RA, even one with its own signing key, does not get to mint an out-of-profile certificate, bypass attestation, or turn the CA key into a general-purpose signing oracle.
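The oracle's independent checks can be sketched roughly as follows. This is a toy under stated assumptions: an HMAC with a shared demo key stands in for the RA's attestation-bound signature, a string allowlist stands in for verifying a real hardware attestation report, and every name and value is hypothetical.

```python
import hashlib
import hmac
import json

# Everything below is a hypothetical placeholder, not a real deployment value.
ALLOWED_RA_MEASUREMENTS = {"ra-image-v1"}  # published list of acceptable RA images
ALLOWED_PROFILES = {"tls-server"}
EXPECTED_POLICY_SHA256 = hashlib.sha256(b"example issuance policy").hexdigest()
MAX_AGE_SECONDS = 60
RA_KEY = b"demo-only-secret"  # stand-in for a key bound to the RA's attestation


def canonical(obj: dict) -> bytes:
    """Deterministic JSON encoding so both sides authenticate the same bytes."""
    return json.dumps(obj, sort_keys=True, separators=(",", ":")).encode("utf-8")


def ra_authorize(csr: bytes, profile: str, now: float) -> dict:
    """RA side: produce an authenticated authorization context for one request."""
    context = {
        "ra_measurement": "ra-image-v1",
        "policy_sha256": EXPECTED_POLICY_SHA256,
        "profile": profile,
        "request_sha256": hashlib.sha256(csr).hexdigest(),
        "issued_at": now,
    }
    tag = hmac.new(RA_KEY, canonical(context), hashlib.sha256).hexdigest()
    return {"context": context, "tag": tag}


def oracle_checks(auth: dict, csr: bytes, now: float) -> list:
    """Oracle side: return the failed checks; an empty list means it may sign."""
    context = auth["context"]
    failures = []
    expected = hmac.new(RA_KEY, canonical(context), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected, auth["tag"]):
        failures.append("context not authenticated by RA")
    if context["ra_measurement"] not in ALLOWED_RA_MEASUREMENTS:
        failures.append("RA image not on published list")
    if not 0 <= now - context["issued_at"] <= MAX_AGE_SECONDS:
        failures.append("authorization context not fresh")
    if context["profile"] not in ALLOWED_PROFILES:
        failures.append("profile not allowed")
    if context["request_sha256"] != hashlib.sha256(csr).hexdigest():
        failures.append("context not bound to this request")
    if context["policy_sha256"] != EXPECTED_POLICY_SHA256:
        failures.append("policy digest mismatch")
    return failures


auth = ra_authorize(b"example CSR bytes", "tls-server", now=1000.0)
print(oracle_checks(auth, b"example CSR bytes", now=1010.0))  # fresh, in profile
print(oracle_checks(auth, b"a different CSR", now=5000.0))    # stale and unbound
```

Note that a context for a profile outside `ALLOWED_PROFILES` fails even when its tag is valid. That is the property that matters: holding the RA's key is not enough to get an out-of-profile signature.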

The CA private key is not loaded by the operator. It is held by a custodian, a hardware security module, a cloud KMS, or another enclave, and it is wrapped so that the custodian will release it only to a signing oracle whose attestation matches a published image. The list of acceptable images is small, public, and updated through a documented process. An operator who runs a different binary, however benign the reason, does not get the key. The dragon’s teeth around the data center are still there. They no longer have to do the whole job.
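As a rough sketch of that release rule, assuming the custodian has already verified the attestation report and is handed a trustworthy measurement string: the `Custodian` class, the image names, and the random demo key are all hypothetical, and a real custodian would unwrap the key inside an HSM or KMS rather than return raw bytes.

```python
import secrets

# Hypothetical published list of acceptable signing-oracle image measurements.
PUBLISHED_ORACLE_IMAGES = frozenset({"oracle-image-v1", "oracle-image-v2"})


class Custodian:
    """Holds the CA key and releases it only to an acceptable measured image."""

    def __init__(self, ca_key: bytes, allowed_images: frozenset):
        self._ca_key = ca_key
        self._allowed_images = allowed_images

    def release_key(self, verified_measurement: str) -> bytes:
        # Assumption: the caller has already verified the attestation report's
        # signature chain, so `verified_measurement` is trustworthy input.
        if verified_measurement not in self._allowed_images:
            raise PermissionError("image is not on the published list")
        return self._ca_key


custodian = Custodian(secrets.token_bytes(32), PUBLISHED_ORACLE_IMAGES)
key = custodian.release_key("oracle-image-v1")  # on the list: released

try:
    custodian.release_key("operator-debug-build")  # benign or not: refused
except PermissionError as err:
    print("refused:", err)
```

The decision depends only on the measurement and the published list. Who is asking, and why, never enters into it.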

Both the RA binary and the oracle binary are built from public source and are reproducibly buildable. Anyone can rebuild from the published sources, compare their measurement to the one in the attestation, and confirm that the two match. This is the part of the model that makes trust mean something specific. Not the operator’s word, not the auditor’s snapshot, not the CA’s policy statement, but the build process and the published source. To verify which code actually performed a particular issuance, you would not need to be admitted to the data center. You would need a compiler.
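The comparison itself is small. In the sketch below, a plain SHA-384 over the image bytes stands in for the measurement. That is a simplifying assumption: real TEE measurements are computed by the platform over the loaded image and its launch configuration, so an actual verifier would use the vendor's measurement procedure instead of a file hash. The function names and sample bytes are hypothetical.

```python
import hashlib
import hmac


def measure(image: bytes) -> str:
    """Stand-in measurement: SHA-384 over the image bytes."""
    return hashlib.sha384(image).hexdigest()


def rebuild_matches(rebuilt_image: bytes, attested_measurement: str) -> bool:
    """True if an independent rebuild yields the measurement in the attestation."""
    return hmac.compare_digest(measure(rebuilt_image), attested_measurement)


# Hypothetical bytes standing in for the published build and a local rebuild.
published_image = b"signing-oracle build output"
attested = measure(published_image)

print(rebuild_matches(b"signing-oracle build output", attested))           # True
print(rebuild_matches(b"signing-oracle build output, patched", attested))  # False
```

The check only means something if the build is reproducible. If two honest builds of the same source produce different bytes, a mismatch tells you nothing.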

Each issued certificate is accompanied by a portable evidence bundle, signed by the attested issuance system. The bundle names the binary that produced the signature, the attestation root that vouched for the binary, the RA policy decision, the oracle policy decision, the identity assertion the RA accepted, and the inputs the oracle independently verified before signing. A relying party who trusts the chip vendor’s attestation root can determine for themselves whether the issuance was performed by code on the published list, against the policy on the published list, by an RA that accepted the identity claim it claimed to accept. The CA is not asked to be trusted. The CA is asked to produce evidence.
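One possible shape for such a bundle, and for the relying party's check against the published lists, is sketched below. The field names, list contents, and `relying_party_accepts` function are hypothetical, and the sketch assumes the bundle's signature and the attestation chain have already been verified cryptographically.

```python
from dataclasses import dataclass, replace

# Hypothetical published commitments a relying party would fetch from the CA.
PUBLISHED_IMAGES = {"ra-image-v1", "oracle-image-v1"}
PUBLISHED_POLICIES = {"ra-policy-digest-v3", "oracle-policy-digest-v2"}
TRUSTED_ATTESTATION_ROOTS = {"example-chip-vendor-root"}


@dataclass(frozen=True)
class EvidenceBundle:
    ra_image: str                  # measurement of the RA binary
    oracle_image: str              # measurement of the binary that signed
    attestation_root: str          # root that vouched for those measurements
    ra_policy: str                 # digest of the policy the RA evaluated
    oracle_policy: str             # digest of the policy the oracle evaluated
    identity_assertion: str        # identity claim the RA accepted
    oracle_verified_inputs: tuple  # inputs the oracle checked before signing


def relying_party_accepts(bundle: EvidenceBundle) -> bool:
    """Check the bundle's claims against the published lists.

    Assumption: signature and attestation-chain verification already
    succeeded; that cryptographic step is out of scope for this sketch.
    """
    return (
        bundle.attestation_root in TRUSTED_ATTESTATION_ROOTS
        and {bundle.ra_image, bundle.oracle_image} <= PUBLISHED_IMAGES
        and {bundle.ra_policy, bundle.oracle_policy} <= PUBLISHED_POLICIES
    )


bundle = EvidenceBundle(
    ra_image="ra-image-v1",
    oracle_image="oracle-image-v1",
    attestation_root="example-chip-vendor-root",
    ra_policy="ra-policy-digest-v3",
    oracle_policy="oracle-policy-digest-v2",
    identity_assertion="dns:example.test",
    oracle_verified_inputs=("ra attestation", "freshness", "profile"),
)
print(relying_party_accepts(bundle))                                    # True
print(relying_party_accepts(replace(bundle, oracle_image="unlisted")))  # False
```

Nothing in the check consults the CA at verification time. The relying party needs only the bundle, the published lists, and an attestation root it already trusts.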

None of this removes the HSM, the auditor, or the operator. The HSM is still excellent at the threat it was built for, and a custodian holding a key wrapped to an attestation policy is still doing HSM work under the hood. The auditor is still needed to attest that the published policy is sensible, that the source matches the binary, that the threat model is honest, and that the runbook is followed in the moments where attestation cannot help. The operator is still needed to run the infrastructure and respond when things break.

What changes is what they are asked to prove.

Today, a relying party mostly gets institutional assurances. The CA says it followed its policy. The auditor samples evidence and says the controls were operating. The operator says the production system was the one described. Those are useful assurances, but they are indirect. They do not let the relying party inspect the actual path between a request, a policy decision, and a signature.

A Private Cloud Compute style CA changes that. It turns the issuance path itself into evidence. The question is no longer only whether the CA says it followed the rules. The question becomes which measured binary evaluated this request, which measured binary signed it, which policy digest was used, which identity evidence was accepted, what validation methods were used during issuance, and whether all of that matches the public commitment the CA made.

When the source is open and reproducibly buildable, that evidence includes a hash of the code that made the decision and signed attestations about the runtime elements that went into that decision. When the code is not open source, third parties can come in and validate the source, the build process, and the correctness of the claims, as independent reviewers did for Apple’s Private Cloud Compute. The public hashes then let others verify that the code claiming to provide these guarantees is, in fact, the code that ran.

Open source is not magic, and the point is not faith in “many eyes.” The point is that this shifts the emphasis from betting on physical security and operational practice audits to secure system design and cryptographic evidence about what code actually ran and what it actually did.

That is the threat model mismatch, and it is not only a CA problem. We built the WebPKI around buildings, cages, ceremonies, HSMs, and audits because those were the tools we had. We did the same thing cryptographically. We built systems around the assumption that factoring large composites and solving discrete logs on elliptic curves were out of reach. Q-day changes that assumption. Runtime compromise changes the operational assumption just as fundamentally.

We apply the same instincts in any environment we want to call high-assurance. They still matter, but most of the failures we care about are not physical failures. They are logical, remote, operational failures in the runtime path. The rate of change makes that gap wider every year. Annual audits are retrospective, and between them systems change thousands of times, so what the auditor described is rarely what is actually running when a relying party sees a certificate.

Cryptography turns security problems into key-management problems. AI turns assurance problems into runtime-evidence problems. Once agents are making decisions, calling tools, and changing state, the question is no longer what policy you wrote or what control an auditor sampled. The question is what actually ran, what it saw, what boundary contained it, what policy constrained it, and what evidence survived execution.

A Private Cloud Compute style CA gives us a way to make that path visible, attestable, and independently verifiable. The same pattern applies wherever the gap between what we say a system does and what it actually does at runtime matters.

UNMITIGATED RISK

un.mit.i.gat.ed: Adj. Not diminished or moderated in intensity or severity; unrelieved. risk: N. The possibility of suffering harm or loss; danger.
