I’ve been spending a lot of time lately building Systematic Reasoning with my long-time friend Vishal. The core premise is straightforward. Organizations reveal their true operational character through how they design to prevent failure, how they plan to handle it when it happens, and how they actually do. That signal deserves to be tracked, structured, and acted on. We’re building an agentic compliance platform to do exactly that.
Systematic Reasoning won’t be limited to any single domain, but we decided to start with the Web PKI. The reasoning was simple. It’s high impact in a way that’s hard to overstate. Every internet user depends, whether they know it or not, on a relatively small number of Certificate Authorities getting things right. The margin for error is zero. If that trust layer breaks, it breaks for everyone.
DigiNotar is the canonical example. A small Dutch CA, compromised so thoroughly that attackers could impersonate any website on the web, and did. That capability was used to spy on Iranian dissidents, intercepting communications that people believed were private and secure. The trust infrastructure that was supposed to protect them was turned into a weapon against them. DigiNotar isn’t an edge case or a cautionary tale from a more naive era; it’s a demonstration of the actual ceiling of what can go wrong. And it isn’t the only one. State-affiliated certificate authorities have been caught performing man-in-the-middle attacks on their own citizens’ traffic, something the Baseline Requirements explicitly prohibit, but prohibition only matters if it’s enforced. The web’s trust model works right up until the moment someone decides it’s more useful as surveillance infrastructure.
At the core of Systematic Reasoning is a belief I’ve held for a while. Compliance can be a vital sign of organizational security, but only if it’s continuous. The reality today is that it isn’t. Code ships daily. Audits happen annually. The gap between those two rhythms is where things go quietly wrong.
I’ve written before about why I have limited faith in the current audit regime. Auditors are engaged by the organizations they assess. Their product is a clean seal; their incentive is to keep the client. They operate on point-in-time sampling with auditee-selected scope, and they’re often compliance professionals rather than engineers, which means they’re checking whether a policy exists more than whether the system actually behaves correctly. That’s if you’re lucky. Sometimes the audit is scoped against a version of the Baseline Requirements that was superseded over a year ago.
The same incentive shapes how certificate authorities write their governance documents. A CP/CPS that relies heavily on incorporation by reference, that omits specifics about what the organization actually does and what constraints it operates under, is easier to audit against than one that makes precise, testable commitments. Vagueness isn’t always carelessness. Sometimes it’s a design choice. The same thing happens in incident reports. A report that attributes a failure to “organic process evolution” or “human error” without describing the actual control gap is easier to close than one that names the broken system and commits to a specific fix. In both cases the document gets the box checked without creating accountability. References establish authority. Commitments establish accountability.
The audit gap isn’t compensated for by strong internal monitoring either. The majority of significant compliance failures are not caught internally. They are caught by external researchers, root program staff, or community tooling. A broken validation endpoint runs for five years and the organization finds out because someone posted a 404 error in a public issue tracker. A validation race condition exists undetected for seven and a half years not because it was well hidden but because nobody was looking. The absence of an internal alarm is not evidence that the system is healthy. It is often evidence that the monitoring itself is missing.
So public incident reports and governance documents become some of the most signal-rich material available. Policy documents tell you what an organization claims it will do. Incident reports tell you what happened when reality diverged from that claim. Together they create a longitudinal picture that neither document produces alone.
Building a system to reason over that data surfaced a problem I didn’t fully anticipate. When you’re working from the outside, with no access to internal systems and no way to verify what actually changed, the public record is almost all you have. The question isn’t whether to treat it with skepticism. It’s how much skepticism to build in by default.
The temptation is to give the benefit of the doubt. Organizations are required to describe the blast radius of an incident. Not every localized bug is a symptom of something systemic. But accepting minimizing language at face value is its own failure.
“Only” is doing a lot of work when the bug it’s describing went undetected for seven and a half years. “No compromise of end-entities” is doing a lot of work when what it really means is that nobody found the gap before you did. Framing survival as security isn’t reporting, it’s PR. And if an organization believes an incident is no big deal, you can predict with reasonable confidence that the root cause analysis will be shallow and the remediation will be a band-aid.
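The kind of flag described above can be sketched mechanically. This is a deliberately crude illustration, not how ForgeIQX actually works: the phrase list is hypothetical, and a keyword match is a prompt for human review, never a verdict on its own.

```python
import re

# Illustrative, hypothetical phrase list -- not an actual production ruleset.
MINIMIZING_PHRASES = [
    r"\bonly\b",
    r"\bno (?:evidence of )?compromise\b",
    r"\bno (?:active )?exploitation\b",
    r"\blimited (?:impact|scope)\b",
]

def flag_minimizing_language(report_text: str) -> list[dict]:
    """Return each minimizing phrase with a snippet of surrounding context.

    "Only" is sometimes an accurate scoping word and sometimes doing a lot
    of work; the surrounding context is what a reviewer needs to decide.
    """
    flags = []
    for pattern in MINIMIZING_PHRASES:
        for match in re.finditer(pattern, report_text, re.IGNORECASE):
            start = max(0, match.start() - 40)
            end = min(len(report_text), match.end() + 40)
            flags.append({
                "phrase": match.group(0),
                "context": report_text[start:end].strip(),
            })
    return flags

report = ("The bug affected only thirty certificates and there was "
          "no compromise of end-entities.")
for f in flag_minimizing_language(report):
    print(f["phrase"], "->", f["context"])
```

Note what this sketch cannot do: a report that swaps “only” for a clinical scoping statement sails past it untouched, which is exactly why surface matching alone is insufficient.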
ForgeIQX, our first offering, tracks those signals longitudinally across both policy documents and incident reports. Not to prosecute organizations for their language choices, but to notice when a commitment made in a CP/CPS quietly disappears in the next version, or when a promised fix is nowhere to be found when the same failure mode surfaces years later. That’s commitment decay, the slow evaporation of a promise made under pressure, and it’s only visible if you’re tracking across multiple documents and incidents over time rather than treating each one in isolation.
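The core diff at the heart of commitment tracking can be sketched in a few lines. Everything here is a simplification under stated assumptions: real extraction needs section-aware parsing rather than modal-verb keywords, and exact-string comparison flags any rewording as decay. The example documents are invented.

```python
import re

def extract_commitments(doc_text: str) -> set[str]:
    """Pull out sentences containing binding language ("shall", "will", "must").

    Keyword matching on modal verbs is the crudest possible stand-in for
    real commitment extraction.
    """
    sentences = re.split(r"(?<=[.!?])\s+", doc_text)
    return {
        s.strip().lower()
        for s in sentences
        if re.search(r"\b(shall|will|must)\b", s, re.IGNORECASE)
    }

def commitment_decay(old_version: str, new_version: str) -> set[str]:
    """Commitments present in the old document but absent from the new one."""
    return extract_commitments(old_version) - extract_commitments(new_version)

# Hypothetical CP/CPS fragments across two versions.
v1 = ("The CA shall revoke affected certificates within 24 hours. "
      "Domain validation will use two independent network perspectives.")
v2 = "The CA shall revoke affected certificates within 24 hours."

for lost in commitment_decay(v1, v2):
    print("decayed:", lost)
```

The design point survives the simplification: decay is only visible as a set difference across versions, which is why each document has to be compared against its predecessors rather than read in isolation.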
The calibration problem is real and doesn’t have a clean answer. Get it wrong in one direction and you build a system that cries wolf. Get it wrong in the other and you build a system that launders PR-speak into clean signals, which is just automating the thing we already do too much of.
There’s a third failure mode that took me longer to see. A system like this can be gamed. Swap “we got lucky” for “our monitoring detected no active exploitation.” Replace “only thirty certificates” with a more clinical impact scoping statement that says the same thing in language that sounds like engineering rigor. The words change; the institutional posture doesn’t. A system that can be satisfied by better prose isn’t measuring operational maturity, it’s measuring communications sophistication.
That means the system has to be built with structural pessimism. Not cynicism for its own sake, but a deliberate prior that clean language is not the same as clean operations, and that the absence of red flags is not the same as the presence of green ones. We can’t verify that an organization fixed what it said it would fix. What we can do is watch whether the same failure mode surfaces again and whether the pattern of shallow root cause analyses continues or breaks. The historical record doesn’t tell us what’s true inside these organizations. It tells us what they were willing to say in public, under pressure, over time. Given the alternatives, that may be the most honest signal available.
A certificate authority with genuine operational maturity should want this kind of scrutiny applied to itself. Not because it will always produce a clean result, but because it surfaces the gaps before an external party does. ForgeIQX gives organizations a way to continuously monitor their own compliance posture, so their practices and code keep pace with their commitments. The same is true for auditors who want their findings to mean something beyond a checkbox. The problem with the current regime isn’t that the people in it are careless. It’s that the incentive structures don’t reward rigor, and the tooling to demonstrate it continuously doesn’t exist. That’s what we’re building.
The Web PKI is where we started because the stakes are concrete and the public record is unusually rich. But any regulated industry where compliance is measured annually, where governance documents are written to satisfy auditors rather than inform relying parties, and where incident reports are drafted with one eye on legal exposure, has the same gap between what the paper says and what the organization actually does. We started here. We don’t intend to stop here.