
Incident Response Done Right: A CA’s Guide to Resilience

Imagine yourself as a pilot at 30,000 feet when an engine begins to sputter. You don’t panic: your training activates, you follow your checklist, and you take control. For Certificate Authorities (CAs), incidents like misissued certificates or security breaches create similar high-stakes scenarios. They’re unexpected, critical, and unforgiving. Preparation isn’t just advisable; it’s essential. In the Web PKI world, where trust is paramount, improvisation isn’t an option.

These high-stakes scenarios aren’t rare exceptions—browser distrust events occur approximately every 1.23 years. Since 2011, over a dozen CAs have been distrusted, with poor incident response handling featuring prominently among the causes. These aren’t just statistics; they represent existential threats to CAs and the trust system underpinning secure internet communication.

Mozilla’s new CA Incident Response Requirements policy addresses a history of delayed responses, insufficient analyses, and unclear communication that has plagued the ecosystem. By incorporating Site Reliability Engineering (SRE) concepts, CAs can transform incidents into opportunities to strengthen resilience. Let’s examine the new policy, take a quick look at SRE concepts and how they enhance it, and analyze real-world examples from Let’s Encrypt and DigiCert to illustrate best practices and pitfalls to avoid.

Why the Mozilla Policy Matters: Trust at Stake

Incidents are inevitable. Whether a certificate misissuance, system failure, or security exploit, these events represent critical moments for CAs. Losing browser trust, as DigiNotar did in 2011 or Symantec by 2017, is catastrophic. One moment, you’re essential to Web PKI; the next, you’re a cautionary tale.

The evidence is clear: since 2011, CAs have experienced over 10 major incidents—averaging one every 14 months. More than half—over 57%—of these distrusts stem at least in part from delayed or mishandled responses, not just the incidents themselves. Each costs trust, revenue, or both (as DigiNotar’s bankruptcy demonstrated). The pattern reveals that your response defines you more than the incident itself. A prepared CA can recover and even strengthen its reputation. An unprepared one faces severe consequences.

Mozilla’s policy addresses the cycle of late notifications and superficial fixes that have damaged CAs previously. Structured timelines ensure transparency and accountability—essential elements for maintaining trust.

2025 Policy: Your Incident Response Framework

The new Common Incident Reporting Guidelines (effective March 2025) establish the following framework for incident handling:

72-Hour Initial Disclosure: Three days to publicly acknowledge the issue, outline initial actions, and assess scope of impact.
14-Day Full Report: Two weeks to deliver a standardized, comprehensive Root Cause Analysis (RCA), detailed timeline, and prevention plan.

These aren’t just arbitrary deadlines—they’re designed to break the pattern of delays and ambiguity that has undermined trust in the WebPKI ecosystem. The policy establishes specific templates, report formats, and update requirements that formalize the approaches already taken by the most resilient CAs.

The requirements emphasize “candid, timely, and transparent” reporting—values that separate successful incident responses from catastrophic ones. What’s more, reports must demonstrate “a detailed understanding of root causes” and “clear, measurable explanations” of remediation actions.

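The incident lifecycle therefore runs from detection, through the 72-hour initial disclosure, to the 14-day full report, with updates provided as the policy requires.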
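A CA can make these deadlines concrete in its own tooling. The sketch below is a minimal, hypothetical helper (the `IncidentClock` name and its methods are illustrative, not part of any CCADB tooling) that computes both due dates. It assumes both clocks start when the CA becomes aware of the incident; the guidelines themselves define the exact starting point of each deadline.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

# Deadlines from the framework above. Assumption: both are measured from the
# moment the CA becomes aware of the incident.
INITIAL_DISCLOSURE_WINDOW = timedelta(hours=72)
FULL_REPORT_WINDOW = timedelta(days=14)


@dataclass(frozen=True)
class IncidentClock:
    """Hypothetical helper that tracks reporting deadlines for one incident."""

    aware_at: datetime  # timezone-aware timestamp of when the CA became aware

    def initial_disclosure_due(self) -> datetime:
        return self.aware_at + INITIAL_DISCLOSURE_WINDOW

    def full_report_due(self) -> datetime:
        return self.aware_at + FULL_REPORT_WINDOW

    def overdue(self, now: datetime) -> list[str]:
        """Names of the deadlines that have already passed at `now`."""
        missed = []
        if now > self.initial_disclosure_due():
            missed.append("initial_disclosure")
        if now > self.full_report_due():
            missed.append("full_report")
        return missed


clock = IncidentClock(aware_at=datetime(2025, 3, 3, 9, 0, tzinfo=timezone.utc))
print(clock.initial_disclosure_due().isoformat())  # 2025-03-06T09:00:00+00:00
print(clock.full_report_due().isoformat())  # 2025-03-17T09:00:00+00:00
print(clock.overdue(datetime(2025, 3, 7, tzinfo=timezone.utc)))  # ['initial_disclosure']
```

Feeding `overdue` into paging or a dashboard turns the policy’s timelines into alerts rather than calendar reminders.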

SRE: The Enhancement for Resilience

Mozilla provides structure, but Site Reliability Engineering (SRE)—pioneered by Google—offers tools that elevate your response. Two SRE concepts align perfectly with Mozilla’s requirements:

Automation: SRE emphasizes automating repetitive tasks. For the 72-hour disclosure, automated monitoring can identify issues immediately, while scripts—such as certificate revocation tools—activate without delay. Speed becomes your advantage.
Blameless Postmortems: The 14-day RCA isn’t about assigning blame—it’s about learning. SRE’s blameless approach investigates what failed and how to improve, converting every incident into a growth opportunity.

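In practice, that means monitoring output flows straight into remediation tooling, so a detected problem can trigger revocation without waiting on a manual hand-off.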
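As an illustration, here is a minimal sketch of that wiring. Everything in it is hypothetical: `IssuedCert`, the single `max_validity_check` rule, and the `revoke` callback stand in for a CA’s real certificate linting and revocation systems. Each certificate is run through the checks, and the first failure is recorded and handed straight to the revocation hook.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import Callable, Optional


@dataclass(frozen=True)
class IssuedCert:
    serial: str
    not_before: datetime
    not_after: datetime


# A check returns a description of the problem, or None if the cert passes.
Check = Callable[[IssuedCert], Optional[str]]


def max_validity_check(cert: IssuedCert) -> Optional[str]:
    """Example rule: flag certificates valid for more than 398 days."""
    if cert.not_after - cert.not_before > timedelta(days=398):
        return "validity exceeds 398 days"
    return None


@dataclass
class Responder:
    """Hypothetical glue between post-issuance monitoring and remediation."""

    checks: list[Check]
    revoke: Callable[[str, str], None]  # stand-in for the CA's revocation tool
    findings: list[tuple[str, str]] = field(default_factory=list)

    def inspect(self, cert: IssuedCert) -> None:
        for check in self.checks:
            problem = check(cert)
            if problem is not None:
                # Record the finding for the incident report, then trigger
                # remediation without waiting for a manual hand-off.
                self.findings.append((cert.serial, problem))
                self.revoke(cert.serial, problem)
                return  # one finding is enough to act on this certificate


revoked: list[str] = []
responder = Responder(checks=[max_validity_check],
                      revoke=lambda serial, reason: revoked.append(serial))
responder.inspect(IssuedCert("01", datetime(2025, 1, 1), datetime(2025, 12, 1)))
responder.inspect(IssuedCert("02", datetime(2025, 1, 1), datetime(2026, 6, 1)))
print(revoked)  # ['02']
```

A real revocation hook would also have to respect the applicable revocation timelines and notify subscribers; the point here is only that detection and remediation share one code path.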

Together, Mozilla’s timelines and SRE’s methodologies establish a framework that’s proactive rather than reactive.

Case Studies: Preparation Demonstrated

Let’s Encrypt: Prepared When It Mattered

In 2020, Let’s Encrypt encountered a bug in their domain validation logic. Their response exemplified best practices:

Early Detection: Proactive monitoring and periodic reviews identified the issue quickly, before external parties did.
Automation in Action: They revoked 1.7 million certificates within hours due to their readiness.
Data-Driven Decisions: They were able to immediately identify which certificates had been replaced versus which were still in active use.
Transparent Communication: Regular updates and a thorough postmortem kept stakeholders informed.
Strategic Delayed Revocation: For certificates that couldn’t be immediately revoked without major disruption, they filed a separate delayed revocation incident with clear timelines.
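The data-driven split between replaced and still-active certificates is worth sketching, because it only works if the inventory exists before the incident. The following is a hypothetical illustration, not Let’s Encrypt’s actual implementation: it assumes a certificate counts as replaced when a later certificate outside the affected set covers at least the same names.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Cert:
    serial: str
    names: frozenset[str]  # assumed non-empty
    issued: datetime


def split_replaced(affected: list[Cert], inventory: list[Cert]) -> tuple[list[Cert], list[Cert]]:
    """Partition affected certificates into (replaced, still_active).

    Assumption: a certificate is replaced when an unaffected certificate
    issued later covers every name it covers.
    """
    affected_serials = {c.serial for c in affected}
    # Index unaffected certificates by name so each lookup stays small.
    by_name: dict[str, list[Cert]] = defaultdict(list)
    for cert in inventory:
        if cert.serial not in affected_serials:
            for name in cert.names:
                by_name[name].append(cert)

    replaced, active = [], []
    for cert in affected:
        # A successor must cover every name, so one name's index is enough to search.
        some_name = next(iter(cert.names))
        has_successor = any(
            other.issued > cert.issued and cert.names <= other.names
            for other in by_name[some_name]
        )
        (replaced if has_successor else active).append(cert)
    return replaced, active


old = Cert("a1", frozenset({"example.com"}), datetime(2020, 1, 1))
other = Cert("b1", frozenset({"example.net", "shop.example.net"}), datetime(2020, 1, 5))
renewed = Cert("a2", frozenset({"example.com", "www.example.com"}), datetime(2020, 2, 20))
replaced, active = split_replaced([old, other], [old, other, renewed])
print([c.serial for c in replaced], [c.serial for c in active])  # ['a1'] ['b1']
```

Certificates in the first list can be revoked with little subscriber impact; the second list is where outreach, and possibly a delayed revocation incident, is needed.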

They met CCADB’s deadlines with margin to spare and emerged stronger. Their preparation proved invaluable—and it was no coincidence. Their systems were designed from day one to handle such scenarios, with automation built into their core operations.

DigiCert: Caught Unprepared

DigiCert’s misissuance incident tells a contrasting story. An external party discovered the problem, and their response faltered:

Manual Processes: Without automation, revocations progressed slowly and required customer intervention.
Insufficient Planning: They struggled, facing subscriber resistance and legal complications, including a Temporary Restraining Order (TRO) from a customer.
Reactive Decision-Making: They initially announced a 24-hour revocation window, then extended it to 5 days as complications mounted.
Customer Impact: They did not know how many of their customers were ready for certificate rotation, so they had to treat everyone the same, amplifying disruption.
Design Issues: The initial fix appeared to be applied at the user interface level rather than addressing the core validation system—suggesting insufficient engineering practices.

Commercial CAs might argue their enterprise model makes automation harder than Let’s Encrypt’s, but complex customer relationships actually make preparation more critical, not less. The TRO demonstrates how business constraints amplify—rather than excuse—the need for rigorous incident readiness.

The contrast is instructive. Let’s Encrypt’s readiness maintained stability; DigiCert’s lack of preparation created vulnerability and legal complications that set a concerning precedent for the industry.

Implementing the New CCADB Requirements

To meet the new CCADB incident reporting requirements effectively, CAs should implement these nine critical capabilities:

Create Templated Response Plans: Develop standardized report templates aligned with CCADB’s new formats, with designated owners for each section.
Establish Monitoring Triggers: Implement automated monitoring that can identify potential incidents early and trigger response workflows.
Build Certificate Inventory Systems: Maintain comprehensive real-time data about certificate status, usage, and replacement to enable rapid impact assessment.
Create Tiered Revocation Capabilities: Implement automation for certificates with lifecycle management while maintaining processes for manual customers.
Prepare Customers and Technology: Implement back-end changes, and work with customers to adopt systems designed to meet these requirements.
Develop Blameless Postmortem Processes: Create structured processes for conducting Root Cause Analysis using methodologies like “5 Whys” and Fishbone Diagrams.
Create Revocation Automation: Implement systems to quickly revoke certificates in bulk with minimal manual intervention.
Align Legal Agreements: Ensure contracts include provisions for certificate revocations and incident response cooperation.
Test Incident Response Regularly: Conduct simulations of different incident types to ensure teams can meet the required reporting deadlines.
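Several of these capabilities (the certificate inventory, tiered revocation, and revocation automation) meet in one planning step: deciding when each affected certificate will be revoked. The sketch below is hypothetical. The `automated` flag and the per-tier windows are placeholder assumptions, and real deadlines must come from whichever Baseline Requirements revocation timeline applies to the incident. Any certificate planned past the required date is surfaced so a separate delayed revocation incident can be filed, as Let’s Encrypt did.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class AffectedCert:
    serial: str
    subscriber: str
    automated: bool  # subscriber renews via ACME or a lifecycle-management platform


# Placeholder windows per tier (assumptions, not policy values).
TIER_WINDOWS = {
    "automated": timedelta(hours=24),
    "manual": timedelta(days=5),
}


def plan_revocations(
    certs: list[AffectedCert], start: datetime, required_by: datetime
) -> tuple[dict[str, datetime], list[str]]:
    """Return (schedule, late).

    `schedule` maps each serial to its planned revocation time; `late` lists
    serials planned after `required_by`, which would need a separate delayed
    revocation incident.
    """
    schedule: dict[str, datetime] = {}
    late: list[str] = []
    for cert in certs:
        tier = "automated" if cert.automated else "manual"
        when = start + TIER_WINDOWS[tier]
        schedule[cert.serial] = when
        if when > required_by:
            late.append(cert.serial)
    return schedule, late


start = datetime(2025, 6, 2, 12, 0, tzinfo=timezone.utc)
certs = [
    AffectedCert("a1", "acme-user.example", automated=True),
    AffectedCert("m1", "manual-user.example", automated=False),
]
schedule, late = plan_revocations(certs, start, required_by=start + timedelta(days=1))
print(late)  # ['m1']
```

The design choice is to compute the schedule before revoking anything, so the gap between what the tiers can deliver and what the rules require is visible on day one rather than discovered at the deadline.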

These systems shouldn’t be created during a crisis—they must be built, tested, and refined during normal operations to ensure they function when needed most.

Your Next Steps: Building Resilience

Ready to implement these principles? Follow this straightforward approach:

Create a Response Plan: Define roles, responsibilities, and timelines—your organization’s crisis protocol.
Automate Critical Functions: Implement detection and remediation tools—prioritize speed.
Develop Learning Processes: Conduct blameless postmortems to strengthen systems.
Prioritize Clear Communication: Share updates with stakeholders consistently, even during difficulties.

This isn’t complex—it’s disciplined. And for CAs, discipline is fundamental.

Preparation Is Essential

Incidents occur without warning. With a robust plan, automation, a learning orientation, and transparent communication, you can manage them effectively. Mozilla’s policy and Google’s SRE practices provide both structure and methodology to succeed. Let’s Encrypt demonstrated its effectiveness; DigiCert illustrated its necessity.

Don’t wait for an incident to expose weaknesses in your process. Preparation isn’t optional—it’s how you transform a crisis into an opportunity to demonstrate excellence. Plan systematically, automate intelligently, learn continuously, and you’ll build a CA that doesn’t merely survive but thrives.

Want to learn more? The Mastering Incident Reporting in the WebPKI class covers common mistakes and misconceptions; the slides can be found here.

How Organizational Inertia Externalizes Risk in the WebPKI


I’ve been involved in the Web PKI since the mid-‘90s, when SSL certificates carried five- or ten-year lifetimes: long-lasting credentials for an internet that was still the Wild West. Issuance was manual, threats were sparse, and long validity fit that quieter era. Thirty years later, we’ve fought our way to a 398-day maximum lifetime, today’s standard as of 2025, thanks in part to Apple’s bold 2020 move to enforce 398-day certificates in Safari, dragging resistant CAs into a shared ballot after years of clinging to the status quo. Yet some certificate authorities, certificate consumers, and industry holdouts still resist shorter lifetimes and tighter data reuse policies, offloading breaches, increased risk, and eroded trust onto users, businesses, and the web’s backbone. This 15-year struggle got us to 398; now it’s time to push past it.

Core Argument

The journey to shorter lifetimes spans decades. The TLS Baseline Requirements set a 60-month cap in 2010, but by 2014 internal debates among browsers and CAs had ignited over whether such spans were safe as threats ballooned. Progress stalled and pushback was fierce, until Apple threw a wrench in the works. In 2020, Apple announced that, effective September 2020, Safari would reject certificates issued after August 31, 2020, with lifetimes exceeding 398 days, blindsiding CAs who’d dug in their heels. Only after that jolt did the CA/Browser Forum pass Ballot SC-42 in 2021, codifying 398 days as a shared requirement: proof that CAs wouldn’t budge without external force. Earlier, Ballot 185 in 2017 had proposed cutting lifetimes to 27 months, Ballot SC-22 in 2019 explored short-lived certificates, and Ballot SC-081 in 2025 is expected to reaffirm 398 days as the maximum, with a long-term target of 45–47 days by 2029 (SC-081v2). That’s 15 years of incremental progress built on 30 years of evolution. Last time, Apple’s push broke CA inertia enough to land us at 398, and I am confident that without that action we would not be where we are today. Yet risks like “Bygone SSL” linger: valid certificates staying with old domain owners after a sale, opening doors to impersonation or chaos.

Automation made this possible, and Apple’s 2020 edict accelerated it. Let’s Encrypt was announced in November 2014, revolutionizing issuance with free, automated certificates; the ACME protocol, drafted then and standardized as RFC 8555 in 2019, turned renewal into a background hum. Today, CAs split into camps: fully automated players like Let’s Encrypt, Google Trust Services, and Amazon, versus mixed providers like DigiCert, Sectigo, and GlobalSign, who blend proprietary and ACME-based automation with manual issuance for some customers. Data from crt.sh suggests over 90% of certificates now use automated protocols like ACME. Apple’s push forced CAs to adapt or lose relevance, yet many clung to old ways, agreeing to 398 only post-ballot. That lag, resisting automation and shorter spans, doesn’t just slow progress; it externalizes risk, burdening the WebPKI with overstretched certificates and outdated practices.

What Problem Are We Solving Anyway?

Well, for one, certificates are snapshots of a domain’s status at issuance. A 13-month span lets changes, like ownership shifts or domain compromises, linger unreflected, while 45 days would keep them current, shrinking an attacker’s window from over a year to mere weeks. “Bygone SSL” proves the point: when domains change hands, old owners can hang onto valid certificates, sometimes for years, letting them spoof the new owner or, with multi-domain certs, trigger revocations that disrupt others. History teaches us that reusing stale validation data, sometimes months old, leads to misissuance, where certificates get issued on outdated or hijacked grounds. Tighter allowed reuse periods force regular revalidation, but when CAs or companies slack, the ecosystem bears the cost: spoofed domains impersonating legit sites, breaches exposing sensitive data, and a trust system strained by systemic hits.
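To put numbers on the “Bygone SSL” window, the sketch below measures how many days certificates issued before a domain transfer stay valid afterward. It is a simplified illustration under stated assumptions: names are matched exactly (no wildcard handling), lifetimes are counted in whole days, and the `CertRecord` type and `issued` helper are hypothetical.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class CertRecord:
    serial: str
    names: frozenset[str]
    not_before: date
    not_after: date


def bygone_exposure(certs: list[CertRecord], domain: str, transfer: date) -> dict[str, int]:
    """Days that each pre-transfer certificate for `domain` remains valid after the transfer."""
    exposure: dict[str, int] = {}
    for cert in certs:
        # Exact-name match only; wildcard coverage is ignored in this sketch.
        if domain in cert.names and cert.not_before < transfer < cert.not_after:
            exposure[cert.serial] = (cert.not_after - transfer).days
    return exposure


def issued(serial: str, name: str, start: date, lifetime_days: int) -> CertRecord:
    return CertRecord(serial, frozenset({name}), start, start + timedelta(days=lifetime_days))


# Both certificates are issued one month before the domain is sold.
certs = [
    issued("long", "example.com", date(2025, 2, 1), 398),
    issued("short", "example.com", date(2025, 2, 1), 45),
]
print(bygone_exposure(certs, "example.com", transfer=date(2025, 3, 1)))
# {'long': 370, 'short': 17}
```

Under a 398-day maximum, the former owner in this scenario keeps a usable certificate for about a year after the sale; at 45 days, the window closes in a little over two weeks.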

Browsers show us the way—back in the ‘90s, updates came on floppy disks on magazine covers, a manual slog that left users exposed until the next trip to the store; today, automatic updates roll out silently, patching holes and keeping security tight without a fuss. Certificates should mirror that: automated renewal via ACME or proprietary tools manages 398 days now and could handle 45 effortlessly, shedding the old manual grind—an incremental evolution already underway. Yet some cling to slower cycles, offloading risk—leaving the WebPKI vulnerable to their refusal to fully embrace automation’s promise. The proof’s in the pudding—Kerberos rotates 10-hour tickets daily in enterprise networks without a hitch; ACME brings that scalability to the web. Legacy systems? Centralized solutions like reverse proxies, certificate management platforms, or off-device automation bridge the gap—technical excuses don’t hold.
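The claim that automation handles 45 days as easily as 398 comes down to a scheduling rule. The sketch below assumes a common client heuristic, renewing once two-thirds of a certificate’s lifetime has elapsed (the threshold is an assumption, not something ACME itself mandates), and shows that the logic is identical whatever the lifetime; only the frequency changes.

```python
from datetime import datetime, timedelta, timezone


def renewal_interval(lifetime: timedelta) -> timedelta:
    """Assumed policy: renew after two-thirds of the lifetime, keeping a third as buffer."""
    return lifetime * 2 // 3


def due_for_renewal(now: datetime, not_before: datetime, not_after: datetime) -> bool:
    """What a daily scheduled job would check for each certificate."""
    return now >= not_before + renewal_interval(not_after - not_before)


for days in (398, 90, 45):
    interval = renewal_interval(timedelta(days=days))
    per_year = timedelta(days=365) / interval
    print(f"{days}-day certificate: renew every {interval.days} days, about {per_year:.1f} times a year")

issued_at = datetime(2025, 1, 1, tzinfo=timezone.utc)
expires_at = issued_at + timedelta(days=45)
print(due_for_renewal(issued_at + timedelta(days=31), issued_at, expires_at))  # True
```

For 398-, 90-, and 45-day certificates the loop prints renewal intervals of 265, 60, and 30 days, or roughly 1.4, 6.1, and 12.2 renewals a year: more runs of the same job, not a different job.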

We’ve hit 398 days, but Zeno’s Dichotomy still grips us: advocates push for shortening, hit “not ready,” and stall at the current max—halving the gap to robust security without ever closing it. Each delay lets inertia shift risk onto the system.

Critics’ Refrain

Critics cling to familiar objections. “Legacy systems can’t handle frequent renewals”? Centralized automation (proxies, management tools, off-device solutions) proves otherwise; their inertia spills risk onto the ecosystem. “Smaller players face a competitive burden,” implying the web should shoulder that risk? Shared tools and phased transitions even the odds, yet their lag, like SHA-1’s slow death, threatens everyone. “Why not focus on revocation, DNSSEC, or key management instead”? Revocation’s a pipe dream: three decades of flops, from CRLs to OCSP, show it crumbling at scale, with privacy holes, performance drags, and spotty enforcement, as DigiNotar’s failure left unpatched clients exposed. DNSSEC and key management complement, not replace: shorter lifetimes cut exposure fast while those build out. “It’s too rapid”? Two decades of automation, from proprietary solutions to ACME, and 15 years of debate say no; 398 days took effort, and 45–47 is next. “We’re not ready”? That’s an impossible hurdle. Security leaps, from SHA-2 to TLS 1.3, came by diving in, not waiting, just as parents figure out diapers post-birth. Stalling at 398 doesn’t shield risk; it dumps it on the rest.

Pushing Beyond 398 Delivers Concrete Gains When Inertia’s Beaten:

Benefit	Description
Enhanced Trustworthiness	Frequent renewals keep data current, cutting misissuance; laggards can’t dump stale risks on the WebPKI.
Shorter Exploitation Window	45 days caps attacks at weeks, not 398 days—orgs can’t offload longer threats.
Lower Misissuance Risk	Tight reuse forces fresh checks, slashing errors CAs push onto the system.
Rapid Policy Transition	Quick shifts to new standards dodge inertia’s drag, keeping the PKI sharp.
Stronger Baselines	90%+ automated renewals set a secure norm—holdouts can’t undermine it.
Collective Accountability	Deadlines force modernization, ending the free pass that lets a few organizations’ inaction put everyone at risk.

Conclusion

Shorter lifetimes and tighter reuse periods break the cycle: fresh data, capped risk, no more offloading. A phased, deadline-driven approach, like SC-081’s framework (targeting shorter spans by 2029 in SC-081v2), forces the industry to adapt, hones automation where needed, and drives security forward; waiting five more years just fattens the risks we’ve already outgrown.

How does inertia externalize risk in the WebPKI? When CAs lean on stale data, companies settle for 398 days, and stragglers resist progress, they turn trust into a punching bag—ripe for abuse. Thirty years in, with 398 days locked and over 90% automated, the tools sit ready—only will falters.

Zeno’s half-steps got us here, but “not ready” is a fantasy—no one masters security before the plunge, just as parents don’t ace diapers pre-birth; we’ve evolved through every shift this way. Browsers don’t wait for floppy disks anymore—certificates can’t linger on yesterday’s pace either. I’ve watched the WebPKI battle from the Wild West to now—let’s rip inertia’s grip off with deadlines that stick and lock in 45 days to forge a trust that outlasts the past’s failures.

Educating the Champion, the Buyer, and the Market


Security used to be something we tried to bolt on to inherently insecure systems. In the 1990s, many believed that if we simply patched enough holes and set up enough firewalls, we could protect almost anything. Today, hard-won experience has shown that secure-by-design is the only sustainable path forward. Rather than treating security as an afterthought, we need to bake it into a system’s very foundation—from its initial design to its day-to-day operation.

Yet even the best security technology can fail to catch on if no one understands its value. In my time in the field I’ve seen a recurring theme: great solutions often falter because they aren’t communicated effectively to the right audiences. Whether you’re a security entrepreneur, an in-house security architect, or part of a larger development team, you’ll likely need to equip three distinct groups with the right messaging: the Technical Champion, the Economic Buyer, and the Broader Market. If any of them fail to see why—and how—your solution matters, momentum stalls.

From Bolt-On to Secure-by-Design

The security industry has undergone a massive shift, moving away from the idea that you can simply bolt on protection to an already flawed system. Instead, we now realize that security must be designed in from the start. This demands a lifecycle approach—it’s not enough to fix bugs after deployment or put a facade in front of a service. We have to consider how software is built, tested, deployed, and maintained over time.

This evolution requires cultural change: security can’t just live in a silo; it has to be woven into product development, operations, and even business strategy. Perhaps most importantly, we’ve learned that people, processes, and communication strategies are just as important as technology choices.

This shift has raised the bar. It’s no longer sufficient to show that your solution works; you must show how it integrates seamlessly into existing workflows, accounts for the entire lifecycle of use, supports future needs, and earns buy-in across multiple levels of an organization.

The Three Audiences You Need to Win Over

The Technical Champion (80% Tech / 20% Business)

Your security solution will often catch the eye of a deeply technical person first. This might be a security engineer who’s tired of patching the same vulnerabilities or a software architect who sees design flaws that keep repeating. They’re your first and most crucial ally.

Technical champions need more than promises—they need proof. They want detailed demos showing real-world scenarios, sample configurations they can experiment with, and pilot environments where they can test thoroughly. Give them architecture diagrams that satisfy their technical depth, comprehensive documentation that anticipates their questions, and a clear roadmap showing how you’ll address emerging threats and scale for future needs.

Integration concerns keep champions awake at night. They need to understand exactly how your solution will mesh with existing systems, what the deployment strategy looks like, and who owns responsibility for updates and patches. Address their concerns about learning curves head-on with clear documentation and practical migration paths.

While technology drives their interest, champions eventually have to justify their choices to management. Give them a concise one-pager that frames the returns in business terms: reduced incident response time, prevented security gaps, and automated fixes that save precious engineer hours.

Why This Matters:
When you equip your champion with the right resources, they become heroes inside their organizations. They’re the one who discovered that crucial solution before a major breach, who saved the team countless hours of manual work, who saw the strategic threat before anyone else. That kind of impact directly translates to recognition, promotions, and career advancement. The champion who successfully implements a game-changing security solution often becomes the go-to expert, earning both peer respect and management attention. When you help a champion shine like this, they’ll pull your solution along with them as they climb the organizational ladder.

The Economic Buyer (20% Tech / 80% Business)

A passionate champion isn’t always the one holding the purse strings. Often, budget is controlled by directors, VPs, or executives who juggle competing priorities and are measured by overall business outcomes, not technical elegance.

Your buyer needs a concise, compelling story about how this investment reduces risk, saves costs, or positions the company advantageously. Frame everything in terms of bottom-line impact: quantifiable labor hours saved, reduced compliance burdens, and concrete return on investment timelines.

Even without extensive case studies, you can build confidence through hypothetical or pilot data. Paint a clear picture: “Similar environments have seen a 30% reduction in incident response time” or “Based on initial testing, we project 40% fewer false positives.” Consider proposing a small pilot or staged rollout; once they see quick wins, scaling up becomes an easier sell.

Why This Matters:
When buyers successfully champion a security solution, they transform from budget gatekeepers into strategic leaders in the eyes of executive management. They become known as the one who not only protected the company but showed real business vision. This reputation for combining security insight with business acumen often fast-tracks their career progression. A buyer who can consistently tell compelling business stories—especially about transformative security investments—quickly gets noticed by the C-suite. By helping them achieve these wins, you’re not just securing a deal; you’re empowering their journey to higher organizational levels. And as they advance, they’ll bring your solution with them to every new role and company they touch.

The Broader Market: Present, Teach, and Farm

While winning over individual champions and buyers is crucial, certain security approaches need industry-wide acceptance to truly succeed. Think of encryption standards, identity protocols, and AI-based security research tools: these changed the world only after enough people, in multiple communities, embraced them.

Build visibility through consistent conference presentations, industry webinars, and local security meetups. Even with novel technologies, walking people through hypothetical deployments or pilot results builds confidence. Panels and Q&A sessions demonstrate your openness to tough questions and deep understanding of the problems you’re solving.

Make your message easy to spread and digest. While detailed whitepapers have their place, supplement them with short video demonstrations, clear infographics, and focused blog posts that capture your solution’s essence quickly. Sometimes a two-minute video demonstration or one-page technical overview sparks more interest than an extensive document.

Think of education as planting seeds—not every seed sprouts immediately, but consistent knowledge sharing shapes how an entire field thinks about security over time. Engage thoughtfully on social media, address skepticism head-on, and highlight relevant use cases that resonate with industry trends. Consider aligning with open-source projects, industry consortiums, or standards bodies to amplify your reach.

Why This Matters:
By consistently educating and contributing to the community dialogue, you create opportunities for everyone involved to shine. Your champions become recognized thought leaders, speaking at major conferences about their successful implementations. Your buyers get profiled in industry publications for their strategic vision. Your early adopters become the experts everyone else consults. This creates a powerful feedback loop where community advocacy not only drives adoption but establishes reputations and advances careers. The security professionals who help establish new industry norms often find themselves leading the next wave of innovation—and they remember who helped them get there.

Overcoming Common Challenges

The “Not Invented Here” Mindset

Security professionals excel at finding flaws, tearing down systems, and building their own solutions. While this breaker mindset is valuable for discovering vulnerabilities, it can lead to the “Not Invented Here” syndrome: a belief that external solutions can’t possibly be as good as something built in-house.

The key is acknowledging and respecting this culture. Offer ways for teams to test, audit, or customize your solution so it doesn’t feel like an opaque black box. Show them how your dedicated support, updates, and roadmap maintenance can actually free their talent to focus on unique, high-value problems instead of maintaining yet another in-house tool.

Position yourself as a partner rather than a replacement. Your goal isn’t to diminish their expertise—it’s to provide specialized capabilities that complement their strengths. When teams see how your solution lets them focus on strategic priorities instead of routine maintenance, resistance often transforms into enthusiasm.

The Platform vs. Product Dilemma

A common pitfall in security (and tech in general) is trying to build a comprehensive platform before solving a single, specific problem. While platforms can be powerful, they require critical mass and broad ecosystem support to succeed. Many promising solutions have faltered by trying to do too much too soon.

Instead, focus on addressing one pressing need exceptionally well. This approach lets you deliver value quickly and build credibility through concrete wins. Once you’ve proven your worth in a specific area, you can naturally expand into adjacent problems. You might have a grand vision for a security platform, but keep your initial messaging focused on immediate, tangible benefits.

Navigating Cross-Organizational Dependencies

Cross-team dynamics can derail implementations in two common ways: operational questions like “Who will manage the database?” and adoption misalignment where one team (like Compliance) holds the budget while another (like Engineering) must use the solution. Either can stall deals for months.

Design your proof of value (POV) deployments to minimize cross-team dependencies. The faster a champion can demonstrate value without requiring multiple department sign-offs, the better. Start small within a single team’s control, then scale across organizational boundaries as value is proven.

Understand ownership boundaries early: Who handles infrastructure? Deployment? Access control? Incident response? What security and operational checklists must be met for production? Help your champion map these responsibilities to speed implementation and navigate political waters.

The Timing and Budget Challenge

Success often depends on engaging at the right time in the organization’s budgeting cycle. Either align with existing budget line items or engage early enough to help secure new ones through education. Otherwise, your champion may be stuck trying to spend someone else’s budget—a path that rarely succeeds. Remember that budget processes in large organizations can take 6-12 months, so timing your engagement is crucial.

The Production Readiness Gap

A signed deal isn’t the finish line—it’s where the real work begins. Without successful production deployment, you won’t get renewals and often can’t recognize revenue. Know your readiness for the scale requirements of target customers before engaging deeply in sales.

Be honest about your production readiness. Can you handle their volume? Meet their SLAs? Support their compliance requirements? Have you tested at similar scale? If not, you risk burning valuable market trust and champion relationships. Sometimes the best strategy is declining opportunities until you’re truly ready for that tier of customer.

Having a clear path from POV to production is critical. Document your readiness criteria, reference architectures, and scaling capabilities. Help champions understand and navigate the journey from pilot to full deployment. Remember: a successful small customer in production is often more valuable than a large customer that stays stuck in pilot, never deploys into production, and does not renew.

Overcoming Entrenched Solutions

One of the toughest challenges isn’t technical—it’s navigating around those whose roles are built on maintaining the status quo. Even when existing solutions have clear gaps (like secrets being unprotected 99% of their lifecycle), the facts often don’t matter because someone’s job security depends on not acknowledging them.

This requires a careful balance. Rather than directly challenging the current approach, focus on complementing and expanding their security coverage. Position your solution as helping them achieve their broader mission of protecting the organization, not replacing their existing responsibilities. Show how they can evolve their role alongside your solution, becoming the champion of a more comprehensive security strategy rather than just maintaining the current tools.

Putting It All Together

After three decades in security, one insight stands out: success depends as much on communication as on code. You might have the most innovative approach, the sleekest dashboard, or a bulletproof protocol, but if nobody can articulate its value to decision-makers and colleagues, it might stay stuck at the proof-of-concept stage or end up sitting on a shelf.

Your technical champion needs robust materials and sufficient business context to advocate internally. Your economic buyer needs clear, ROI-focused narratives supported by concrete outcomes. And the broader market needs consistent education through various channels to understand and embrace new approaches.

Stay mindful of cultural barriers like “Not Invented Here” and resist the urge to solve everything at once. Focus on practical use cases, maintain consistent messaging across audiences, and show how each stakeholder personally benefits from your solution. This transforms curiosity into momentum, driving not just adoption but industry evolution.

Take a moment to assess your approach: Have you given your champion everything needed to succeed—technical depth, migration guidance, and business context? Does your buyer have a compelling, ROI-focused pitch built on solid data? Are you effectively sharing your story with the broader market through multiple channels?

If you’re missing any of these elements, now is the time to refine your strategy. By engaging these three audiences effectively, addressing cultural barriers directly, and maintaining focus on tangible problems, you’ll help advance security one success story at a time.

The Account Recovery Problem and How Government Standards Might Actually Fix It


Account recovery is where authentication systems go to die. We build sophisticated authentication using FIDO2, WebAuthn, and passkeys, then use “click this email link to reset” when something goes wrong. Or if we are an enterprise, we spend millions staffing help desks to verify identity through caller ID and security questions that barely worked in 2005.

This contradiction runs deep in digital identity. Organizations that require hardware tokens and biometrics for login will happily reset accounts based on a hope and a prayer. Companies that spend fortunes on authentication still rely on “mother’s maiden name” or a text message with a “magic number” for recovery. Increasingly, we’ve got bank-vault front doors with screen-door back entrances.

The Government Solution

But there’s an interesting solution emerging from an unexpected place: government identity standards. Not because governments are suddenly great at technology, but because they’ve been quietly solving something harder than technology – how to agree on how to verify identity across borders and jurisdictions.

The European Union is pushing ahead with cross-border digital identity wallets based on its own standards. At the same time, a growing number of U.S. states, with early adopters like California, Arizona, Colorado, and Utah, are piloting and implementing mobile driver’s licenses (mDLs). These mDLs aren’t just apps showing a photo ID; they’re essentially virtual smart cards, containing a “certificate” of sorts that attests to certain information about you, similar to what happens with the electronic reading of passports and federal CAC cards. Each of these mDL “certificates” is cryptographically traceable back to the issuing authority’s root of trust, creating a verifiable chain showing who is attesting to these attributes.

One of the companies helping make this happen is SpruceID, a company I advise. They have been doing the heavy lifting to enable governments and commercial agencies to accomplish these scenarios, paving the way for a more robust and secure digital identity ecosystem.

Modern Threats and Solutions

What makes this particularly relevant in 2024 is how it addresses emerging threats. Traditional remote identity verification relies heavily on liveness detection: systems that look at blink patterns and reflections, or ask users to turn their heads or perform some other directed motion. But with generative AI advancing rapidly, these methods are becoming increasingly unreliable. Bad actors can now use AI to generate convincing video responses that fool traditional liveness checks. We’re seeing sophisticated attacks that mimic the patterns existing systems look for, even the subtle facial expressions that once served as reliable markers of human presence.

mDL verification takes a fundamentally different approach. Instead of just checking if a face moves correctly, it verifies cryptographic proofs that link back to government identity infrastructure. Even if an attacker can generate a perfect deepfake video, they can’t forge the cryptographic attestations that come with a legitimate mDL. It’s the difference between checking if someone looks real and verifying they possess cryptographic proof of their identity.

Applications and Implementation

This matters for authentication because it gives us something we’ve never had: a way to reliably verify legal identity during account authentication or recovery that’s backed by the same processes used for official documents. This means that in the future when someone needs to recover account access, they can prove their identity using government-issued credentials that can be cryptographically verified, even in a world where deepfakes are becoming indistinguishable from reality.

The financial sector is already moving on this. Banks are starting to look at how they can integrate mDL verification into their KYC and AML compliance processes. Instead of manual document checks or easily spoofed video verification, they will be able to verify customer identity against government infrastructure. The same approaches that let customs agents verify passports electronically can also let banks verify customers.

For high-value transactions, this creates new possibilities. When someone signs a major contract, their mDL can be used to create a derived credential based on the mDL’s attestations about their name, age, and other attributes. This derived credential could be an X.509 certificate binding their legal identity to the signature, creating a provable link between the signer’s government-verified identity and the document, something that’s been remarkably hard to achieve digitally.

Technical Framework

The exciting thing isn’t the digital ID itself (those have been around for a while); it’s the support for an online presentment protocol. ISO/IEC TS 18013-7 builds on the mDL format defined in ISO/IEC 18013-5 and specifies how these credentials can be reliably presented and verified online. This is crucial because remote verification has always been the Achilles’ heel of identity systems. How do you know someone isn’t just showing you a video or a photo of a fake ID? The standard addresses these challenges through a combination of cryptographic proofs and real-time challenge-response protocols that resist replay attacks and deepfakes.
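The replay resistance mentioned above comes from the verifier issuing a fresh, single-use challenge that the holder must answer with a device-bound key. The Python sketch below shows that idea generically; it does not reproduce the ISO/IEC TS 18013-7 message formats. The `Holder` and `Verifier` classes are hypothetical, and an HMAC over a shared key stands in for the device signature a real wallet would produce, which means that, unlike a real deployment, this verifier could also forge responses.

```python
import hashlib
import hmac
import secrets


class Holder:
    """Hypothetical wallet holding a device-bound key (stand-in for a secure-element key)."""

    def __init__(self, device_key: bytes):
        self._device_key = device_key

    def respond(self, nonce: bytes) -> bytes:
        # Prove possession of the device key over the verifier's fresh nonce.
        # Assumption: HMAC stands in for a real device signature.
        return hmac.new(self._device_key, nonce, hashlib.sha256).digest()


class Verifier:
    """Hypothetical relying party that issues single-use challenges."""

    def __init__(self, device_key: bytes):
        # Assumption: the verifier learned this key from an issuer-signed credential.
        self._device_key = device_key
        self._outstanding = set()

    def challenge(self) -> bytes:
        nonce = secrets.token_bytes(16)
        self._outstanding.add(nonce)
        return nonce

    def verify(self, nonce: bytes, response: bytes) -> bool:
        # Reject nonces never issued or already consumed, so a recorded
        # response cannot be replayed later.
        if nonce not in self._outstanding:
            return False
        self._outstanding.discard(nonce)
        expected = hmac.new(self._device_key, nonce, hashlib.sha256).digest()
        return hmac.compare_digest(expected, response)
```

A recorded video, or a recorded response, fails here because each answer is only valid for the one nonce the verifier just issued.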

Government benefits show another critical use case. Benefits systems face a dual challenge: preventing fraud while ensuring legitimate access. mDL verification lets agencies validate both identity and residency through cryptographically signed government credentials. The same approach that proves your identity electronically at a TSA checkpoint can prove your eligibility for benefits online. But unlike physical ID checks or basic document uploads, these verifications resist the kind of sophisticated fraud we’re seeing with AI-generated documents and deepfake videos.

What’s more, major browsers are beginning to implement these standards as first-class citizens. This means verification of these digital equivalents of our physical identities will be natively supported by the web, making online interactions, from logging in to account recovery, easier and more secure than ever before.

Privacy and Future Applications

These mDLs have interesting privacy properties too. The standards support selective disclosure – proving you’re over 21 without showing your birth date, or verifying residency without exposing your address. You can’t do that with a physical ID card. More importantly, these privacy features work remotely – you can prove specific attributes about yourself online without exposing unnecessary personal information or risking your entire identity being captured and replayed by attackers.
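Selective disclosure of this kind is commonly built by having the issuer sign salted digests of each attribute rather than the attributes themselves; ISO/IEC 18013-5 takes a broadly similar approach with its issuer-signed value digests. The Python sketch below is a simplified illustration under stated assumptions: JSON replaces CBOR, an HMAC with a hypothetical `ISSUER_KEY` replaces the issuer’s public-key signature, and `issue`, `present`, and `verify` are illustrative names, not any standard’s API.

```python
import hashlib
import hmac
import json
import secrets

# Hypothetical issuer key; HMAC is a stand-in for the issuer's real
# public-key signature over the list of digests.
ISSUER_KEY = secrets.token_bytes(32)


def _digest(salt: str, name: str, value) -> str:
    # Salted hash of one attribute; JSON stands in for CBOR encoding.
    encoded = json.dumps([salt, name, value], sort_keys=True).encode()
    return hashlib.sha256(encoded).hexdigest()


def issue(attributes: dict):
    """Issuer: salt and hash every attribute, then sign only the digests."""
    salts = {name: secrets.token_hex(16) for name in attributes}
    digests = {n: _digest(salts[n], n, v) for n, v in attributes.items()}
    payload = json.dumps(digests, sort_keys=True).encode()
    signature = hmac.new(ISSUER_KEY, payload, hashlib.sha256).hexdigest()
    # The holder keeps the attribute values and salts privately.
    return {"digests": digests, "signature": signature}, salts


def present(signed: dict, salts: dict, attributes: dict, reveal: list) -> dict:
    """Holder: disclose only the chosen attributes together with their salts."""
    disclosed = {n: (salts[n], attributes[n]) for n in reveal}
    return {"signed": signed, "disclosed": disclosed}


def verify(presentation: dict) -> dict:
    """Verifier: check the issuer signature, then each disclosed attribute."""
    signed = presentation["signed"]
    payload = json.dumps(signed["digests"], sort_keys=True).encode()
    expected = hmac.new(ISSUER_KEY, payload, hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected, signed["signature"]):
        raise ValueError("issuer signature invalid")
    result = {}
    for name, (salt, value) in presentation["disclosed"].items():
        if signed["digests"].get(name) != _digest(salt, name, value):
            raise ValueError(f"attribute {name!r} does not match signed digest")
        result[name] = value
    return result
```

If the issuer includes an `age_over_21` attribute, presenting only that item proves the holder is over 21 while `birth_date` never leaves the wallet. The per-attribute salt keeps a verifier from recovering hidden values by hashing likely candidates.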

We’re going to see this play out in sensitive scenarios like estate access. Imagine someone who needs to access a deceased partner’s accounts. They can prove their identity and, combined with documents like marriage and death certificates, prove their entitlement to that bank account without the overhead and complexity required today. Someday those supporting documents may live in these wallets too, making it even easier.

The Path Forward

The road from here to there is long and there are many hurdles to clear, but we are clearly headed toward a future where this happens. We will have standardized, government-backed identity verification that works across borders and jurisdictions. Not by replacing existing authentication systems, but by giving them a stronger foundation for identity verification, account recovery, and remote identity proofing, one that works even as AI makes traditional verification methods increasingly unreliable.

We’re moving from a world of identity islands to one with standardized and federated identity infrastructure, built on the same trust frameworks that back our most important physical credentials. And ironically, at least in the US, it started with making driver’s licenses digital.

Key Management: A Meme Retrospective


We all need a little laugh from time to time, especially when things get unexpectedly crazy. Well, yesterday was one of those days for me, so I decided to do a retrospective on what we call key management. I hope you enjoy it!

We fixed secret management! By dumping everything into Vault and pretending it’s not a problem anymore…

Has anyone seen our cryptographic keys? They were right here… like, five years ago.

We need to improve our cryptographic security!
Discovers unprotected private keys lying around
Wait… if we have to discover our cryptographic keys, that means we aren’t actually managing them?

We secure video game DRM keys better than the keys protecting your bank account.

You get a shared secret! You get a shared secret! EVERYONE gets a shared secret! Shared secrets are not secret!

Why spend millions on cryptography if your keys spend 99% of their life unprotected? We need to fix key management first.

We don’t suck at cryptography—we suck at managing it. Everyone’s obsessing over PQC algorithms, but the real problem is deployment, key management, and lifecycle. PQC is just another spice—without proper management, it’s just seasoning on bad security.

The Identity Paradox: If It’s an Identity, Why Is It in a Secret Manager?


Enterprises love to talk about identity-first security—until it comes to machines. Human users have IAM systems, SSO, MFA, and governance. But workloads? Their so-called identities are often just API keys and certificates stuffed into a secret manager.

And that’s the paradox. If we really believe workloads have identities, why do we manage them like passwords instead of enforcing real authentication, authorization, and lifecycle management?

The Real Problem: Secret Managers Aren’t Enough

Secret managers do what they’re designed for—secure storage, rotation, and access control. But that’s not identity. A vault doesn’t verify anything—it just hands out secrets to whoever asks. That’s like calling a password manager an MFA solution.

And the real problem? Modern workloads are starting to do identity correctly; legacy ones aren’t. Meanwhile, machine identities, specifically TLS certificates, are getting more and more like workloads every day.

Machines Are Becoming More Like Workloads, But Legacy Workloads Are Still Stuck in Machine-Era Thinking

Attackers usually don’t need to compromise the machine—they don’t even try. Instead, they target the workload, because that’s what’s:

Exposed to the outside world—APIs, services, and user-facing applications.
Running business logic—the real target.
Holding credentials needed for further compromise.

Modern workloads are starting to move past legacy machine identity models.

They use short-lived credentials tied to runtime environments.
They authenticate dynamically, not based on pre-registered certificates.
Their identity is policy-driven and contextual, not static.

Meanwhile, legacy workloads are still trying to manage identity like machines, relying on:

Long-lived secrets.
Pre-assigned credentials.
Vault-based access control instead of dynamic attestation.

And at the same time, machines themselves are evolving to act more like workloads.

Certificate lifetimes used to be measured in years—now they’re weeks, days, or even hours.
Infrastructure itself is ephemeral—cloud VMs come and go like workloads.
The entire model of pre-registering machines is looking more and more outdated.

If this sounds familiar, it should. We’ve seen this mistake before.

Your Machine Identity Model is Just /etc/passwd in the Cloud—Backed by a Database Your Vendor Called a Secret Manager

This is like taking every system’s /etc/passwd file, stuffing it into a database, and distributing copies to every machine.

And that’s exactly what many secret managers are doing today:

Storing long-lived credentials that should never exist in the first place.
Managing pre-issued secrets instead of issuing identity dynamically.
Giving access based on who has the key, not what the workload actually is.

That’s not an identity system. That’s a password manager, with all the same problems.

Secret managers still have their place. But if your workload identity strategy depends entirely on a vault, you’re just applying machine-era identity to cloud workloads, propped up by manual pre-registration and processes.

Modern workloads aren’t doing this anymore. They request identity dynamically when they start, and it disappears when they stop. Machines are starting to do the same.

The Four Big Problems with Workload Identity Today

1. No Real Authentication – Possession ≠ Identity

Most workload “identities” boil down to possessing an API key or certificate, which is like saying:

“If you have the password, you must be the right user.”

That’s not authentication. Workload identity should be based on what the workload is, not just what it holds. This is where attestation comes in—like MFA for workloads. Without proof that a workload is valid, a secret is just a reusable token waiting to be stolen.
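One way to express “identity based on what the workload is” in code: the issuer only mints an identity when evidence signed by a trusted attestor matches a policy. This is a simplified Python sketch, loosely in the spirit of attestation-based systems such as SPIRE; the attestor key, the claim names (`namespace`, `service_account`, `image_digest`), the policy, and the SPIFFE ID are all hypothetical, and HMAC stands in for the attestor’s real signature.

```python
import hashlib
import hmac
import json
import secrets
import time

# Hypothetical key of a trusted attestor (e.g. a platform or node agent).
# HMAC stands in for the attestor's real signature.
ATTESTOR_KEY = secrets.token_bytes(32)

# Hypothetical policy: which attested properties map to which identity.
POLICY = {
    "spiffe://example.org/payments": {
        "namespace": "payments",
        "service_account": "payments-api",
        "image_digest": "sha256:abc123",
    }
}


def attest(claims: dict) -> dict:
    """Attestor: sign what it observed about the running workload."""
    body = json.dumps(claims, sort_keys=True).encode()
    sig = hmac.new(ATTESTOR_KEY, body, hashlib.sha256).hexdigest()
    return {"claims": claims, "sig": sig}


def issue_identity(evidence: dict, ttl_seconds: int = 300) -> dict:
    """Issuer: grant identity based on what the workload is, not what it holds."""
    body = json.dumps(evidence["claims"], sort_keys=True).encode()
    expected = hmac.new(ATTESTOR_KEY, body, hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected, evidence["sig"]):
        raise PermissionError("evidence not signed by a trusted attestor")
    for identity, required in POLICY.items():
        if all(evidence["claims"].get(k) == v for k, v in required.items()):
            # Short-lived by default: the workload must re-attest to renew.
            return {"id": identity, "expires_at": time.time() + ttl_seconds}
    raise PermissionError("attested properties match no policy")
```

A real system would also bind the evidence to a fresh nonce so that captured evidence cannot be replayed, much like the challenge-response used for human credentials.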

2. No Dynamic Identification – Workloads Aren’t Pre-Registered

Unlike humans, workloads don’t have pre-verified identities. They don’t exist until they do. That means:

Credentials can’t be issued ahead of time—because the workload isn’t there yet.
Static identifiers (like pre-registered certs) don’t work well for ephemeral, auto-scaling workloads.
The only way to know if a workload should exist is to verify it in real-time.

We’ve moved from static servers to workloads that scale and move dynamically. Machine identity needs to follow.

3. Shorter Credential Lifetimes Aren’t the Problem—They’re Exposing the Real One

Shorter credential lifetimes are making security better, not worse. The more often something happens, the better you get at doing it right. But they’re also highlighting the weaknesses in legacy identity management models:

Workloads that relied on static, pre-provisioned credentials are now failing because they weren’t designed for rotation.
Teams that never had to deal with automated credential issuance are now struggling because their processes were, in effect or literally, manual.
The more often a system has to handle identity dynamically, the more obvious its weak points become.

Short-lived credentials aren’t breaking security—they’re exposing the fact that we were never doing it right to begin with.
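Automated rotation mostly comes down to deciding when to renew. A common heuristic is to renew roughly two-thirds of the way through a credential’s lifetime, with some random jitter so a fleet does not renew in lockstep. The Python sketch below assumes those two parameters (`fraction` and `jitter`); they are illustrative defaults, not values taken from any particular tool.

```python
import random
from datetime import datetime, timedelta, timezone
from typing import Optional


def renewal_time(not_before: datetime, not_after: datetime,
                 fraction: float = 2 / 3, jitter: float = 0.05,
                 rng: Optional[random.Random] = None) -> datetime:
    """Pick when to renew: a fixed fraction into the lifetime, plus jitter.

    Renewing well before expiry leaves room for retries, and jitter keeps
    many workloads from hitting the issuer at the same instant.
    """
    if not_after <= not_before:
        raise ValueError("credential lifetime must be positive")
    rng = rng or random.Random()
    lifetime = not_after - not_before
    # Jitter is expressed as a fraction of the lifetime, applied symmetrically.
    offset = fraction + rng.uniform(-jitter, jitter)
    return not_before + lifetime * offset
```

Because the renewal point scales with the lifetime, the same logic serves a 90-day certificate or a one-hour workload credential; shorter lifetimes simply exercise it more often.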

4. Workloads Are Ephemeral, but Secrets Stick Around

A workload can vanish in seconds, but its credentials often outlive it. If a container is compromised, its secret can be exfiltrated and reused indefinitely unless extra steps are taken.

“Three people can keep a secret—if two are dead.”

The same applies here. A workload might be long gone, but if its secrets are still floating around in a vault, they’re just waiting to be misused. And even if the key is stored securely, nothing stops an attacker who compromises an application from taking its secret and using it elsewhere in the network, or often outside it.
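To keep a credential from outliving its workload, the issuer can tie each credential to a specific instance and revoke it when that instance stops. The Python sketch below assumes a hypothetical `CredentialRegistry` that receives start and stop events from the orchestrator, and it assumes `caller_instance_id` is established by attestation rather than asserted by the caller; without that, binding to an instance ID would be trivially spoofable.

```python
import secrets
import time


class CredentialRegistry:
    """Hypothetical issuer that ties each credential to one live workload instance."""

    def __init__(self, ttl_seconds: float = 300.0):
        self.ttl = ttl_seconds
        self._live = {}  # token -> (instance_id, expires_at)

    def on_start(self, instance_id: str) -> str:
        token = secrets.token_urlsafe(32)
        self._live[token] = (instance_id, time.monotonic() + self.ttl)
        return token

    def on_stop(self, instance_id: str) -> None:
        # Revoke everything issued to the instance the moment it terminates.
        self._live = {t: v for t, v in self._live.items() if v[0] != instance_id}

    def check(self, token: str, caller_instance_id: str) -> bool:
        # Assumption: caller_instance_id comes from attestation, not the caller.
        entry = self._live.get(token)
        if entry is None:
            return False  # unknown or revoked
        instance_id, expires_at = entry
        if time.monotonic() >= expires_at:
            del self._live[token]
            return False  # expired even if the instance is still running
        # A stolen token presented from a different instance is rejected.
        return instance_id == caller_instance_id
```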

What This Fixes

By breaking these problems out separately, we make it clear:

Attackers go after workload credentials, not the machine itself—because workloads are exposed, hold secrets, and run business logic.
Machines need authentication, but workloads need dynamic, verifiable identities.
Pre-registration is failing because workloads are dynamic and short-lived.
Short-lived certs aren’t the issue—they’re exposing that static credential models were never scalable.
Secrets should disappear with the workload, not persist beyond its lifecycle.
The divide between machine and workload identity is closing—legacy models just haven’t caught up.

This Shift Is Already Happening

Workload identity is becoming dynamic, attested, and ephemeral. Some teams are solving this with emerging approaches like SPIFFE for workloads and ACME for machines. The key is recognizing that identity isn’t a stored artifact—it’s a real-time state.

Machines used to be static, predictable entities. You’d assign an identity and expect it to stick around for years. But today, cloud infrastructure is ephemeral—VMs come and go, certificates rotate in hours, and pre-registering machines is looking more and more like an outdated relic of on-prem identity thinking.

Modern workloads are starting to do identity correctly; legacy ones aren’t. Machine identities, specifically TLS certificates, are getting more and more like workloads every day.

Attackers usually care less about your machine’s identity. They care about the API keys and credentials inside your running applications.

If an identity is just a credential in a vault, it’s not identity at all—it’s just a password with a fancier name.

AI Agent Security: A Framework for Accountability and Control


This weekend, I came across a LinkedIn article by Priscilla Russo about OpenAI agents and digital wallets that touched on something I’ve been thinking about: liability for AI agents and how agents change system design. As autonomous AI systems become more prevalent, we face a critical challenge: how do we secure systems that actively optimize for success in ways that can break traditional security models? The article’s discussion of Knight Capital’s $440M trading glitch perfectly illustrates what’s at stake. When automated systems make catastrophic decisions, there’s no undo button, and with AI agents, the potential for unintended consequences scales dramatically with their capability to find novel paths to their objectives.

What we’re seeing isn’t just new—it’s a fundamental shift in how organizations approach security. Traditional software might accidentally misuse resources or escalate privileges, but AI agents actively seek out new ways to achieve their goals, often in ways developers never anticipated. This isn’t just about preventing external attacks; it’s about containing AI itself—ensuring it can’t accumulate unintended capabilities, bypass safeguards, or operate beyond its intended scope. Without containment, AI-driven optimization doesn’t just break security models—it reshapes them in ways that make traditional defenses obsolete.

“First, in 2024, o1 broke out of its container by exploiting a vuln. Then, in 2025, it hacked a chess game to win. Relying on AI alignment for security is like abstinence-only sex ed: you think it’s working, right up until it isn’t,” said the former 19-year-old father.

The Accountability Gap

Most security discussions around AI focus on protecting models from adversarial attacks or preventing prompt injection. These are important challenges, but they don’t get to the core problem of accountability. As Russo suggests, AI developers are inevitably going to be held responsible for the actions of their agents, just as financial firms, car manufacturers, and payment processors have been held accountable for unintended consequences in their respective industries.

The parallel to Knight Capital is particularly telling. When their software malfunction led to catastrophic trades, there was no ambiguity about liability. That same principle will apply to AI-driven decision-making – whether in finance, healthcare, or legal automation. If an AI agent executes an action, who bears responsibility? The user? The AI developer? The organization that allowed the AI to interact with its systems? These aren’t hypothetical questions anymore – regulators, courts, and companies need clear answers sooner rather than later.

Building Secure AI Architecture

Fail to plan, and you plan to fail. When legal liability is assigned, the difference between a company that anticipated risks, built mitigations, implemented controls, and ensured auditability and one that did not will likely be significant. Organizations that ignore these challenges will find themselves scrambling after a crisis, while those that proactively integrate identity controls, permissioning models, and AI-specific security frameworks will be in a far better position to defend their decisions.

While security vulnerabilities are a major concern, they are just one part of a broader set of AI risks. AI systems can introduce alignment challenges, emergent behaviors, and deployment risks that reshape system design. But at the core of these challenges is the need for robust identity models, dynamic security controls, and real-time monitoring to prevent AI from optimizing in ways that bypass traditional safeguards.

Containment and isolation are just as critical as resilience. It’s one thing to make an AI model more robust – it’s another to ensure that if it misbehaves, it doesn’t take down everything around it. A properly designed system should ensure that an AI agent can’t escalate its access, operate outside of predefined scopes, or create secondary effects that developers never intended. AI isn’t just another software component – it’s an active participant in decision-making processes, and that means limiting what it can influence, what it can modify, and how far its reach extends.

I’m seeing organizations take radically different approaches to this challenge. As Russo points out in her analysis, some organizations like Uber and Instacart are partnering directly with AI providers, integrating AI-driven interactions into their platforms. Others are taking a defensive stance, implementing stricter authentication and liveness tests to block AI agents outright. The most forward-thinking organizations are charting a middle path: treating AI agents as distinct entities with their own credentials and explicitly managed access. They recognize that pretending AI agents don’t exist or trying to force them into traditional security models is a recipe for disaster.

Identity and Authentication for AI Agents

One of the most immediate problems I’m grappling with is how AI agents authenticate and operate in online environments. Most AI agents today rely on borrowed user credentials, screen scraping, and brittle authentication models that were never meant to support autonomous systems. Worse, when organizations try to solve this through traditional secret sharing or credential delegation, they end up spraying secrets across their infrastructure – creating exactly the kind of standing permissions and expanded attack surface we need to avoid. This might work in the short term, but it’s completely unsustainable.

The future needs to look more like SPIFFE for AI agents – where each agent has its own verifiable identity, scoped permissions, and limited access that can be revoked or monitored. But identity alone isn’t enough. Having spent years building secure systems, I’ve learned that identity must be coupled with attenuated permissions, just-in-time authorization, and zero-standing privileges. The challenge is enabling delegation without compromising containment – we need AI agents to be able to delegate specific, limited capabilities to other agents without sharing their full credentials or creating long-lived access tokens that could be compromised.

Systems like Biscuits and Macaroons show us how this could work: they allow for fine-grained scoping and automatic expiration of permissions in a way that aligns perfectly with how AI agents operate. Instead of sharing secrets, agents can create capability tokens that are cryptographically bound to specific actions, contexts, and time windows. This would mean an agent can delegate exactly what’s needed for a specific task without expanding the blast radius if something goes wrong.
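The core macaroon construction is an HMAC chain: each caveat added to a token re-keys the signature, so any holder can narrow a token without the root key, but no one can strip a caveat off. The Python sketch below follows that construction for first-party caveats only; the `field op value` caveat syntax and the function names are hypothetical. Biscuit works differently (public-key signatures and a Datalog-based policy language), so this sketch does not represent it.

```python
import hashlib
import hmac


def _chain(key: bytes, data: str) -> bytes:
    return hmac.new(key, data.encode(), hashlib.sha256).digest()


def mint(root_key: bytes, identifier: str) -> dict:
    """Issuer creates a token: sig = HMAC(root_key, identifier)."""
    return {"id": identifier, "caveats": [], "sig": _chain(root_key, identifier)}


def attenuate(token: dict, caveat: str) -> dict:
    """Any holder can add a restriction without the root key: sig' = HMAC(sig, caveat)."""
    return {
        "id": token["id"],
        "caveats": token["caveats"] + [caveat],
        "sig": _chain(token["sig"], caveat),
    }


def _satisfied(caveat: str, context: dict) -> bool:
    # Hypothetical caveat language: "field = value" or "field < number".
    parts = caveat.split(" ", 2)
    if len(parts) != 3:
        return False
    field, op, value = parts
    if op == "=":
        return str(context.get(field)) == value
    if op == "<":
        return float(context.get(field, float("inf"))) < float(value)
    return False  # unknown caveat types fail closed


def verify(root_key: bytes, token: dict, context: dict) -> bool:
    """Issuer recomputes the HMAC chain and checks every caveat against the request."""
    sig = _chain(root_key, token["id"])
    for caveat in token["caveats"]:
        if not _satisfied(caveat, context):
            return False
        sig = _chain(sig, caveat)
    return hmac.compare_digest(sig, token["sig"])
```

An orchestrating agent could hand a sub-agent a token attenuated with, say, `action = read` and `time < …` instead of its own credential; if the sub-agent leaks it, the exposure is limited to reads before that deadline.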

Agent Interactions and Chain of Responsibility

What keeps me up at night isn’t just individual AI agents – it’s the interaction between them. When a single AI agent calls another to complete a task, and that agent calls yet another, you end up with a chain of decision-making where no one knows who (or what) actually made the call. Without full pipeline auditing and attenuated permissions, this becomes a black-box decision-making system with no clear accountability or verifiability. That’s a major liability problem – one that organizations will have to solve before AI-driven processes become deeply embedded in financial services, healthcare, and other regulated industries.

This is particularly critical as AI systems begin to interact with each other autonomously. Each step in an AI agent’s decision-making chain must be traced and logged, with clear accountability at each transition point. We’re not just building technical systems—we’re building forensic evidence chains that will need to stand up in court.
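A hash chain is one simple way to make such a delegation log tamper-evident: each entry commits to the hash of the one before it, so editing any earlier step invalidates everything after it. The Python sketch below uses a hypothetical `DelegationLog` class and entry fields. It only detects tampering if the latest hash is also anchored somewhere the log’s operator cannot rewrite, since whoever controls the whole log could otherwise recompute the chain.

```python
import hashlib
import json


class DelegationLog:
    """Hypothetical append-only log of agent-to-agent calls, hash-chained for tamper evidence."""

    GENESIS = "0" * 64

    def __init__(self):
        self.entries = []

    @staticmethod
    def _hash(body: dict) -> str:
        return hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()

    def record(self, caller: str, callee: str, action: str, scope: str) -> str:
        # Each entry commits to the previous entry's hash.
        prev = self.entries[-1]["hash"] if self.entries else self.GENESIS
        body = {"caller": caller, "callee": callee, "action": action,
                "scope": scope, "prev": prev}
        digest = self._hash(body)
        self.entries.append({**body, "hash": digest})
        return digest

    def verify(self) -> bool:
        # Walk the chain, recomputing every hash and link.
        prev = self.GENESIS
        for entry in self.entries:
            body = {k: v for k, v in entry.items() if k != "hash"}
            if body["prev"] != prev or self._hash(body) != entry["hash"]:
                return False
            prev = entry["hash"]
        return True
```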

Runtime Security and Adaptive Controls

Traditional role-based access control models fundamentally break down with AI systems because they assume permissions can be neatly assigned based on predefined roles. But AI doesn’t work that way. Through reinforcement learning, AI agents optimize for success rather than security, finding novel ways to achieve their goals – sometimes exploiting system flaws in ways developers never anticipated. We have already seen cases where AI models learned to game reward systems in completely unexpected ways.

This requires a fundamental shift in our security architecture. We need adaptive access controls that respond to behavior patterns, runtime security monitoring for unexpected decisions, and real-time intervention capabilities. Most importantly, we need continuous behavioral analysis and anomaly detection that can identify when an AI system is making decisions that fall outside its intended patterns. The monitoring systems themselves must evolve as AI agents find new ways to achieve their objectives.
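As a deliberately simple illustration of a runtime check, the Python sketch below flags an agent action when it falls outside a baseline set of expected actions or when the action rate exceeds a ceiling within a sliding window. The `BehaviorMonitor` class, its baseline, and its thresholds are hypothetical and static; the adaptive, learned baselines described above would replace these fixed values.

```python
import time
from collections import deque
from typing import Iterable, List, Optional


class BehaviorMonitor:
    """Hypothetical runtime check flagging actions outside an agent's expected pattern."""

    def __init__(self, baseline_actions: Iterable[str], max_per_window: int,
                 window_seconds: float):
        self.baseline = set(baseline_actions)
        self.max_per_window = max_per_window
        self.window = window_seconds
        self._recent = deque()  # timestamps of actions inside the window

    def observe(self, action: str, now: Optional[float] = None) -> List[str]:
        """Record an action and return the reasons it looks anomalous (empty if none)."""
        now = time.monotonic() if now is None else now
        # Drop timestamps that have slid out of the window.
        while self._recent and now - self._recent[0] > self.window:
            self._recent.popleft()
        self._recent.append(now)
        reasons = []
        if action not in self.baseline:
            reasons.append(f"unexpected action: {action}")
        if len(self._recent) > self.max_per_window:
            reasons.append("rate above baseline")
        return reasons
```

A non-empty result is where real-time intervention would hook in, for example pausing the agent or requiring human approval before the action proceeds.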

Compliance by Design

Drawing from my years building CAs, I’ve learned that continual compliance can’t just be a procedural afterthought – it has to be designed into the system itself. The most effective compliance models don’t just meet regulatory requirements at deployment; they generate the artifacts needed to prove compliance as natural byproducts of how they function.

The ephemeral nature of AI agents actually presents an opportunity here. Their transient access patterns align well with modern encryption strategies: access should be temporary, data should be encrypted at rest and in motion, and only the AI agent authorized for a specific action should be able to decrypt the information that action needs.

The Path Forward

If we don’t rethink these systems now, we’ll end up in a situation where AI-driven decision-making operates in a gray area where no one is quite sure who’s responsible for what. And if history tells us anything, regulators, courts, and companies will eventually demand a clear chain of responsibility – likely after a catastrophic incident forces the issue.

The solution isn’t just about securing AI – it’s about building an ecosystem where AI roles are well-defined and constrained, where actions are traceable and attributable, and where liability is clear and manageable. Security controls must be adaptive and dynamic, while compliance remains continuous and verifiable.

Organizations that ignore these challenges will find themselves scrambling after a crisis. Those that proactively integrate identity controls, permissioning models, and AI-specific security frameworks will be far better positioned to defend their decisions and maintain control over their AI systems. The future of AI security lies not in building impenetrable walls, but in creating transparent, accountable systems that can adapt to the unique challenges posed by autonomous agents.

This post lays out the challenges, but securing AI systems requires a structured, scalable approach. In Containing the Optimizer: A Practical Framework for Securing AI Agent Systems I outline a five-pillar framework that integrates containment, identity, adaptive monitoring, and real-time compliance to mitigate these risks.

Why It’s Time to Rethink Machine and Workload Identity: Lessons from User Security


MFA slashed credential-based attacks. Passwordless authentication made phishing harder than ever. These breakthroughs transformed user security—so why are machines and workloads still stuck with static secrets and long-lived credentials?

While we’ve made remarkable progress in securing user identity, the same cannot always be said for machine and workload identity—servers, workloads, APIs, and applications. Machines often rely on static secrets stored in configuration files, environment variables, or files that are copied across systems. Over time, these secrets become fragmented, overly shared, and difficult to track, creating significant vulnerabilities. The good news? Machines and workloads are arguably easier to secure than humans, and applying the same principles that worked for users—like short-lived credentials, multi-factor verification, and dynamic access—can yield even greater results.

Let’s take the lessons learned from securing users and reimagine how we secure machines and workloads.

From Static Secrets to Dynamic Credentials

Machine and workload identity have long been built on the shaky foundation of static secrets—API keys, passwords, or certificates stored in configuration files, environment variables, or local files. These secrets are often copied across systems, passed between teams, and reused in multiple environments, making them not only overly shared but also hard to track. This lack of visibility means that a single forgotten or mismanaged secret can become a point of entry for attackers.

The lesson from user security is clear: static secrets must be replaced with dynamic, ephemeral credentials that are:

Short-lived: Credentials should expire quickly to minimize exposure.
Context-aware: Access should be tied to specific tasks or environments.
Automatically rotated: Machines and workloads should issue, validate, and retire credentials in real-time without human intervention.

This shift is about evolving from secret management to credential management, emphasizing real-time issuance and validation over static storage. Just as password managers gave way to passwordless authentication, dynamic credentialing represents the next step in securing machines and workloads.
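The first two properties listed above, short lifetimes and context binding, can be sketched as a small self-contained token that carries its own expiry and the context it is valid for. This Python example is illustrative only: the HMAC-signed JSON format, `ISSUER_KEY`, and the `issue`/`validate` names are assumptions (it is not JWT, though the idea is similar), and a production issuer would use asymmetric signatures with managed keys and would gate issuance on attestation.

```python
import base64
import hashlib
import hmac
import json
import time

# Hypothetical signing key held by the credential issuer.
ISSUER_KEY = b"replace-with-a-managed-key-000000"


def _sign(body: bytes) -> bytes:
    return base64.urlsafe_b64encode(hmac.new(ISSUER_KEY, body, hashlib.sha256).digest())


def issue(subject: str, task: str, environment: str, ttl: int = 300, now=None) -> str:
    """Mint a credential bound to one task and environment, expiring after ttl seconds."""
    now = int(time.time() if now is None else now)
    claims = {"sub": subject, "task": task, "env": environment, "exp": now + ttl}
    body = base64.urlsafe_b64encode(json.dumps(claims, sort_keys=True).encode())
    return (body + b"." + _sign(body)).decode()


def validate(token: str, task: str, environment: str, now=None) -> dict:
    """Accept the credential only if it is authentic, unexpired, and used in context."""
    now = int(time.time() if now is None else now)
    body, _, sig = token.encode().partition(b".")
    if not hmac.compare_digest(_sign(body), sig):
        raise PermissionError("bad signature")
    claims = json.loads(base64.urlsafe_b64decode(body))
    if now >= claims["exp"]:
        raise PermissionError("expired")
    if claims["task"] != task or claims["env"] != environment:
        raise PermissionError("credential used outside its context")
    return claims
```

A credential minted for an `export` task in `prod` is useless for a different task, in staging, or after its short lifetime ends, which limits what a leaked copy is worth.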

Attestation: The MFA for Machines and Workloads

For users, MFA became critical in verifying identity by requiring multiple factors: something you know, have, or are. Machines and workloads need an equivalent, and attestation fills that role.

Attestation acts as the MFA for machines and workloads by providing:

Proof of identity: Verifying that a machine or workload is legitimate.
Proof of context: Ensuring the workload’s environment and posture align with security policies.
Proof of trustworthiness: Validating the workload operates within secure boundaries, such as hardware-backed enclaves or trusted runtimes.

Just as MFA blunted the impact of compromised passwords, attestation helps prevent compromised machines or workloads from gaining unauthorized access. It’s a dynamic, context-aware layer of security that aligns closely with Zero Trust principles.
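A hedged sketch of the appraisal side, assuming the attestation evidence (for example a TPM quote or a TEE report) has already been cryptographically verified and reduced to a few claims. The Evidence and Policy types, their field names, and the digest values are illustrative, not any specific attestation framework’s schema; the point is that each of the three proofs listed above becomes an explicit, checkable condition.

```python
from dataclasses import dataclass
from typing import FrozenSet, List

@dataclass(frozen=True)
class Evidence:
    """Claims from an attestation report, assumed to be signature-verified already."""
    workload_id: str    # proof of identity
    environment: str    # proof of context, e.g. "prod"
    image_digest: str   # measurement of the code actually running
    secure_boot: bool   # proof of trustworthiness...
    in_tee: bool        # ...including a hardware-backed enclave

@dataclass(frozen=True)
class Policy:
    allowed_workloads: FrozenSet[str]
    allowed_environments: FrozenSet[str]
    allowed_digests: FrozenSet[str]
    require_tee: bool = True

def appraise(ev: Evidence, policy: Policy) -> List[str]:
    """Return the names of failed checks; an empty list means access may proceed."""
    failures = []
    if ev.workload_id not in policy.allowed_workloads:
        failures.append("identity")
    if ev.environment not in policy.allowed_environments:
        failures.append("context")
    if ev.image_digest not in policy.allowed_digests:
        failures.append("measurement")
    if not ev.secure_boot or (policy.require_tee and not ev.in_tee):
        failures.append("trustworthiness")
    return failures

# Hypothetical policy; "sha256:1111" is a placeholder, not a real image digest.
policy = Policy(frozenset({"billing-worker"}), frozenset({"prod"}),
                frozenset({"sha256:1111"}))
print(appraise(Evidence("billing-worker", "prod", "sha256:1111", True, True), policy))
# []
print(appraise(Evidence("billing-worker", "staging", "sha256:1111", True, True), policy))
# ['context']
```

Returning the list of failed checks, rather than a bare boolean, makes denials explainable in logs without weakening the decision.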

Zero Trust: Reclaiming the Original Vision

When Zero Trust was introduced, it was a design principle: “Never trust, always verify.” It challenged the idea of implicit trust and called for dynamic, contextual verification for every access request.

But somewhere along the way, marketers reduced Zero Trust to a buzzword, often pushing solutions like VPN replacements or network segmentation tools.

To reclaim Zero Trust, we need to:

Treat all access as privileged access: Every request—whether from a user, machine, or workload—should be verified and granted the least privilege necessary.
Apply dynamic credentialing: Replace static secrets with short-lived credentials tied to real-time context.
Extend MFA principles to machines and workloads: Use attestation to continuously verify identity, context, and trustworthiness.
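These principles can be sketched as a deny-by-default policy decision point. In this hypothetical Python example, credential validation and attestation appraisal are assumed to have happened upstream; the function then grants only actions explicitly listed for the caller, and the request’s network origin is recorded but never consulted, so being “inside” buys nothing. The caller names and action strings are made up.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Request:
    caller: str
    action: str                       # e.g. "read:invoices"
    credential_valid: bool            # outcome of credential validation, done upstream
    attestation_ok: bool              # outcome of attestation appraisal, done upstream
    source_network: str = "internal"  # recorded, but deliberately not trusted

# Hypothetical least-privilege grants: each caller gets only what its job needs.
GRANTS = {
    "billing-worker": {"read:invoices", "write:invoices"},
    "report-job": {"read:invoices"},
}

def authorize(req: Request) -> bool:
    """Evaluate every request on its own merits; deny by default."""
    if not (req.credential_valid and req.attestation_ok):
        return False
    return req.action in GRANTS.get(req.caller, set())

print(authorize(Request("report-job", "read:invoices", True, True)))       # True
print(authorize(Request("report-job", "write:invoices", True, True)))      # False: not granted
print(authorize(Request("billing-worker", "read:invoices", False, True)))  # False: bad credential
```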

Preparing for the Future: Agentic AI and the Need for Robust Machine and Workload Identity

As organizations increasingly adopt agentic AI systems—autonomous systems that execute tasks and make decisions on behalf of users—the need for robust machine and workload identity management becomes even more pressing. These systems often require delegated access to resources, APIs, and other identities. Without proper safeguards, they introduce new attack surfaces, including:

Over-permissioned access: Delegated tasks may unintentionally expose sensitive resources.
Static secrets misuse: Secrets stored in configuration files or environment variables can become high-value targets for attackers, especially when copied across systems.
Fragmented visibility: Secrets that are spread across teams or environments are nearly impossible to track, making it hard to detect misuse.

To securely deploy agentic AI, organizations must:

Implement dynamic credentials: Ensure AI systems use short-lived, context-aware credentials that expire after each task, reducing the risk of abuse.
Require attestation: Validate the AI’s environment, behavior, and identity before granting access, just as you would verify a trusted workload.
Continuously monitor and revoke access: Apply zero standing privileges, ensuring access is granted only for specific tasks and revoked immediately afterward.

Building strong foundations in machine and workload identity management today ensures you’re prepared for the growing complexity of AI-driven systems tomorrow.
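One way to apply zero standing privileges to an agent is to scope each credential to a single task and tie its lifetime to that task. The sketch below uses a Python context manager so revocation happens even if the task raises; the Broker class is a hypothetical in-memory stand-in for a real credential service, and the scope names are invented.

```python
import secrets
import time
from contextlib import contextmanager

class Broker:
    """Hypothetical credential broker: no standing grants, only per-task ones."""

    def __init__(self):
        self._active = {}  # token -> (scopes, monotonic expiry time)

    def grant(self, scopes, ttl_seconds):
        token = secrets.token_urlsafe(16)
        self._active[token] = (frozenset(scopes), time.monotonic() + ttl_seconds)
        return token

    def revoke(self, token):
        self._active.pop(token, None)

    def check(self, token, scope):
        entry = self._active.get(token)
        return entry is not None and scope in entry[0] and time.monotonic() < entry[1]

@contextmanager
def task_credential(broker, scopes, ttl_seconds=60):
    """Grant a credential for one task and revoke it when the task ends, even on error."""
    token = broker.grant(scopes, ttl_seconds)
    try:
        yield token
    finally:
        broker.revoke(token)

broker = Broker()
with task_credential(broker, {"calendar:read"}, ttl_seconds=60) as token:
    print(broker.check(token, "calendar:read"))  # True during the task
    print(broker.check(token, "email:send"))     # False: never granted
print(broker.check(token, "calendar:read"))      # False: revoked when the task ended
```

The TTL acts as a backstop: if revocation were ever skipped, the credential would still expire on its own.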

A Call to Action for Security Practitioners

For years, we’ve made meaningful progress in securing users, from deploying MFA to replacing passwords with strong authenticators. These changes worked because they addressed fundamental flaws in how identity and access were managed.

Now, it’s time to ask: Where else can we apply these lessons?

Look for parallels:

If replacing passwords reduced breaches for users, then replacing static secrets with dynamic credentials for machines and workloads can deliver similar results.
If MFA improved user authentication, then attestation for machines and workloads can add the same level of assurance to machine identity.
If end-to-end encryption drastically improved the privacy of personal communications by securing messages from sender to recipient, then robust authentication and encryption between processes, ensuring that only trusted workloads communicate, can bring the same level of assurance to machine-to-machine communications and protect sensitive data and operations.

By identifying these parallels, we can break down silos, extend the impact of past successes, and create a truly secure-by-default environment.

Final Thought

Security practitioners should always ask: Where have we already made meaningful progress, and where can we replicate that success?

If replacing passwords and adding MFA helped reduce user-related breaches, then replacing static secrets and adopting attestation for machines and workloads is a natural next step—one that is arguably quicker and easier to implement, given that machines and workloads don’t resist change.

Zero Trust was never meant to be a buzzword. It’s a call to rethink security from the ground up, applying proven principles to every layer of identity, human or machine. By embracing this approach, we can build systems that are not only resilient but truly secure by design.

What Makes a QR Code Verifiable?


QR codes are everywhere—tickets, ID cards, product packaging, menus, and even Wi-Fi setups. They’ve become a cornerstone of convenience, and most of us scan them without hesitation. But here’s the thing: most QR codes aren’t cryptographically signed. In practice, this means we’re trusting their contents without any way to confirm they’re authentic or haven’t been tampered with.

One reason QR codes are so useful is their data density. They can store much more information than traditional one-dimensional barcodes, making them ideal for embedding cryptographic metadata, references, or signatures while remaining scannable. However, QR codes have size limits, which means the cryptographic overhead for signing must be carefully managed to maintain usability.

While unauthenticated QR codes are fine for low-stakes uses like menus, relying on them for sensitive applications introduces risk. Verifiable QR codes use cryptographic signatures to add trust and security, ensuring authenticity and integrity—even in a post-quantum future.

How Are Verifiable QR Codes Different?

The key difference lies in cryptographic signatures. Verifiable QR codes use them to achieve two things:

Authentication: They prove the QR code was generated by a specific, identifiable source.
Integrity: They ensure the data in the QR code hasn’t been altered after its creation.

This makes verifiable QR codes especially useful in scenarios where trust is critical. For instance, an ID card might contain a QR code with a cryptographic signature over its MRZ (Machine Readable Zone). If someone tampers with the MRZ, the signature becomes invalid, making forgery far more difficult.
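To make the tamper-evidence argument concrete, the sketch below signs an MRZ line with Ed25519 and shows that changing a single name invalidates the signature. It is written as dependency-free Python modeled on the reference code in RFC 8032, purely so the example runs anywhere; that style of code is variable-time and not hardened, so a real issuer would use a vetted cryptographic library instead. Ed25519 is a classical (not post-quantum) scheme, chosen here only because it is compact and easy to show. The issuer key is random and the MRZ line is made up.

```python
import hashlib
import os

# Minimal Ed25519 modeled on the RFC 8032 reference code. Illustration only:
# pure Python, variable-time, and not hardened against side channels.
p = 2**255 - 19
q = 2**252 + 27742317777372353535851937790883648493  # order of the base point
d = -121665 * pow(121666, p - 2, p) % p
SQRT_M1 = pow(2, (p - 1) // 4, p)

def _add(P, Q):
    # Twisted Edwards point addition in extended coordinates (X, Y, Z, T).
    A = (P[1] - P[0]) * (Q[1] - Q[0]) % p
    B = (P[1] + P[0]) * (Q[1] + Q[0]) % p
    C = 2 * P[3] * Q[3] * d % p
    D = 2 * P[2] * Q[2] % p
    E, F, G, H = B - A, D - C, D + C, B + A
    return (E * F, G * H, F * G, E * H)

def _mul(s, P):
    R = (0, 1, 1, 0)  # neutral element
    while s > 0:  # double-and-add
        if s & 1:
            R = _add(R, P)
        P = _add(P, P)
        s >>= 1
    return R

def _equal(P, Q):
    return ((P[0] * Q[2] - Q[0] * P[2]) % p == 0
            and (P[1] * Q[2] - Q[1] * P[2]) % p == 0)

def _recover_x(y, sign):
    if y >= p:
        return None
    x2 = (y * y - 1) * pow(d * y * y + 1, p - 2, p) % p
    if x2 == 0:
        return None if sign else 0
    x = pow(x2, (p + 3) // 8, p)
    if (x * x - x2) % p != 0:
        x = x * SQRT_M1 % p
    if (x * x - x2) % p != 0:
        return None
    return p - x if (x & 1) != sign else x

_GY = 4 * pow(5, p - 2, p) % p
_GX = _recover_x(_GY, 0)
BASE = (_GX, _GY, 1, _GX * _GY % p)

def _compress(P):
    zinv = pow(P[2], p - 2, p)
    x, y = P[0] * zinv % p, P[1] * zinv % p
    return (y | ((x & 1) << 255)).to_bytes(32, "little")

def _decompress(s):
    y = int.from_bytes(s, "little")
    sign, y = y >> 255, y & ((1 << 255) - 1)
    x = _recover_x(y, sign)
    return None if x is None else (x, y, 1, x * y % p)

def _hash_to_scalar(data):
    return int.from_bytes(hashlib.sha512(data).digest(), "little") % q

def _expand(secret):
    h = hashlib.sha512(secret).digest()
    a = int.from_bytes(h[:32], "little")
    return (a & ((1 << 254) - 8)) | (1 << 254), h[32:]  # clamped scalar, nonce prefix

def public_key(secret):
    return _compress(_mul(_expand(secret)[0], BASE))

def sign(secret, msg):
    a, prefix = _expand(secret)
    r = _hash_to_scalar(prefix + msg)  # deterministic per-message nonce
    R = _compress(_mul(r, BASE))
    s = (r + _hash_to_scalar(R + public_key(secret) + msg) * a) % q
    return R + s.to_bytes(32, "little")

def verify(public, msg, sig):
    if len(public) != 32 or len(sig) != 64:
        return False
    A, R = _decompress(public), _decompress(sig[:32])
    s = int.from_bytes(sig[32:], "little")
    if A is None or R is None or s >= q:
        return False
    k = _hash_to_scalar(sig[:32] + public + msg)
    return _equal(_mul(s, BASE), _add(R, _mul(k, A)))  # check [s]B == R + [k]A

# Hypothetical issuer key and a made-up passport MRZ line (44 characters).
issuer_secret = os.urandom(32)
issuer_public = public_key(issuer_secret)
mrz = b"P<UTODOE<<JANE" + b"<" * 30
sig = sign(issuer_secret, mrz)
print(verify(issuer_public, mrz, sig))                            # True
print(verify(issuer_public, mrz.replace(b"JANE", b"JOHN"), sig))  # False: MRZ altered
```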

Why Think About Post-Quantum Security Now?

Many systems already use signed QR codes for ticketing, identity verification, or supply chain tracking. However, these systems often rely on classical algorithms like RSA or ECDSA, which a sufficiently large quantum computer running Shor’s algorithm could break. Once such quantum computers become practical, attackers could forge these signatures, leaving QR codes open to counterfeiting.

That’s where post-quantum cryptography (PQC) comes in. PQC algorithms are designed to resist quantum attacks, ensuring the systems we rely on today remain secure in the future. For QR codes, where size constraints matter, algorithms like UOV and SQIsign are especially promising. The signature schemes NIST has already standardized or selected, such as ML-DSA (formerly CRYSTALS-Dilithium) and Falcon, produce relatively large signatures. UOV and SQIsign, both candidates in NIST’s additional signature process, aim to reduce signature sizes significantly, which makes them better suited for QR codes, where space for cryptographic overhead is limited. UOV’s public keys are large, but a verifier can obtain the issuer’s key out of band; only the signature has to fit in the code.

By adopting post-quantum signatures, verifiable QR codes can address today’s security needs while ensuring long-term resilience in a post-quantum world.
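A back-of-the-envelope check shows why signature size matters. The sketch below adds a signature to a payload and compares the total against the byte-mode capacity of the largest QR code at each error-correction level. The Ed25519, ECDSA P-256, and ML-DSA-44 sizes come from their specifications, the Falcon-512 figure is an approximate average because its signatures vary in length, and UOV and SQIsign are left out because their sizes depend on the parameter set and submission round. The 100-byte payload is an arbitrary assumption.

```python
# Signature sizes in bytes. Ed25519, raw ECDSA P-256, and ML-DSA-44 are fixed by
# their specs; Falcon-512 signatures vary in length, so 666 is an approximate average.
SIGNATURE_BYTES = {
    "Ed25519": 64,
    "ECDSA P-256 (raw r||s)": 64,
    "Falcon-512 (approx.)": 666,
    "ML-DSA-44": 2420,
}

# Byte-mode capacity of a version 40 QR code (the largest, 177x177 modules)
# at each error-correction level.
QR_V40_BYTE_CAPACITY = {"L": 2953, "M": 2331, "Q": 1663, "H": 1273}

def fits(payload_bytes: int, scheme: str, ec_level: str = "M") -> bool:
    """True if payload plus signature fits in a version 40 code at ec_level."""
    return payload_bytes + SIGNATURE_BYTES[scheme] <= QR_V40_BYTE_CAPACITY[ec_level]

for scheme in SIGNATURE_BYTES:
    ok = [level for level in QR_V40_BYTE_CAPACITY if fits(100, scheme, level)]
    print(f"{scheme}: fits at levels {ok}")
```

With a 100-byte payload, ML-DSA-44 fits only at level L, the least damage-tolerant setting, and only in a dense version 40 code that is harder to scan reliably than smaller versions; Falcon-512 fits at every level. If the bytes must also travel through a text-only channel, an encoding such as Base64 adds roughly a third on top.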

What’s Practical in Implementation?

For verifiable QR codes to work at scale, standard formats and easy-to-use verifiers are essential. Ideally, your smartphone’s default camera should handle verification without requiring extra apps, potentially deep-linking into installed applications. This kind of seamless integration is crucial for widespread adoption.

Verifiable QR codes don’t need to include all the data they validate. Instead, they can store a reference, an identifier, and a cryptographic signature. This approach stays within QR code size limits, accommodating cryptographic overhead while keeping the codes lightweight and usable.
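A minimal sketch of such a layout, assuming a made-up binary format rather than any published standard: a version byte, a length-prefixed issuer identifier, a length-prefixed reference, and then a signature over everything before it. The pack and unpack helpers are hypothetical, and the signer is a placeholder that returns 64 zero bytes so the example stays focused on the layout; a real deployment would plug in an actual signature function.

```python
# Hypothetical compact layout (not a standard):
#   1 byte version | 1 byte issuer length | issuer | 1 byte ref length | reference | signature
def pack(version, issuer, reference, sign):
    """Build the signed portion, then append sign(signed_portion)."""
    issuer_b, ref_b = issuer.encode("utf-8"), reference.encode("utf-8")
    # bytes([...]) raises ValueError if any field is longer than 255 bytes.
    body = bytes([version, len(issuer_b)]) + issuer_b + bytes([len(ref_b)]) + ref_b
    return body + sign(body)

def unpack(blob):
    """Return (version, issuer, reference, signed_bytes, signature)."""
    version, issuer_len = blob[0], blob[1]
    pos = 2 + issuer_len
    issuer = blob[2:pos].decode("utf-8")
    ref_len = blob[pos]
    reference = blob[pos + 1:pos + 1 + ref_len].decode("utf-8")
    pos += 1 + ref_len
    if pos >= len(blob):
        raise ValueError("payload has no signature")
    return version, issuer, reference, blob[:pos], blob[pos:]

def fake_sign(body):
    # Placeholder standing in for a real signature function (e.g. Ed25519).
    return b"\x00" * 64

payload = pack(1, "tickets.example", "EVT-42", fake_sign)
print(len(payload))         # 88
print(unpack(payload)[:3])  # (1, 'tickets.example', 'EVT-42')
```

Keeping the signed bytes as an explicit prefix means a verifier never has to re-serialize fields before checking the signature, which sidesteps a classic source of canonicalization bugs.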

Think of verifiable QR codes as digital certificates. They tie the QR code’s contents back to an issuer within a specific ecosystem, whether it’s a ticketing platform, a supply chain, or an identity system. To build transparency and trust, these signatures could even be logged in a transparency log (tlog), much like Certificate Transparency for web certificates. This would make QR code issuance auditable: anyone could check not only that a signature is valid but also when and by whom the code was issued.
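To illustrate how a log commits to everything it has recorded, here is the Merkle Tree Hash from RFC 6962, the Certificate Transparency specification, applied to hypothetical QR issuance records. Changing or dropping any past entry changes the root, which is what lets auditors detect silent issuance. This sketch omits the inclusion proofs, consistency proofs, and signed tree heads a real log would also need.

```python
import hashlib

def leaf_hash(entry: bytes) -> bytes:
    return hashlib.sha256(b"\x00" + entry).digest()  # 0x00 prefix marks a leaf

def node_hash(left: bytes, right: bytes) -> bytes:
    return hashlib.sha256(b"\x01" + left + right).digest()  # 0x01 marks an interior node

def merkle_root(entries):
    """Merkle Tree Hash over a list of log entries, following RFC 6962 section 2.1."""
    n = len(entries)
    if n == 0:
        return hashlib.sha256(b"").digest()
    if n == 1:
        return leaf_hash(entries[0])
    k = 1
    while k * 2 < n:
        k *= 2  # k ends as the largest power of two smaller than n
    return node_hash(merkle_root(entries[:k]), merkle_root(entries[k:]))

# Hypothetical issuance records for signed QR codes.
log = [
    b"issuer=tickets.example;ref=EVT-41",
    b"issuer=tickets.example;ref=EVT-42",
    b"issuer=tickets.example;ref=EVT-43",
]
print(merkle_root(log).hex())
```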

What About Purely Digital Use Cases?

Even without a physical object like a driver’s license, verifiable QR codes offer significant value. For instance, an online ticket or access pass can prove its issuer and verify its contents with contactless reading. Key benefits include:

Confirming the QR code came from a legitimate issuer (e.g., a trusted ticketing platform).
Ensuring the content hasn’t been altered, reducing phishing or tampering risks.

This assurance is especially critical in digital-only contexts where physical cross-checking isn’t an option, or additional information is needed to verify the object.

Where Verifiable QR Codes Shine

URL-Based QR Codes: Phishing is a growing problem, and QR codes are often used as bait. A verifiable QR code could cryptographically confirm that a URL matches its intended domain, letting users check that a link really comes from the expected issuer before they open it. That would be a game-changer for consumers and enterprises.
Identity and Credentials: Driver’s licenses or passports could include QR codes cryptographically tied to their data. Any tampering, digital or physical, would break the signature, making counterfeits easier to detect.
Event Tickets: Ticket fraud costs billions annually. Verifiable QR codes could tie tickets to their issuing authority, allowing limited offline validation while confirming authenticity.
Supply Chain Security: Counterfeiting plagues industries like pharmaceuticals and luxury goods. Signed QR codes on packaging could instantly verify product authenticity without needing centralized databases.
Digital Proof of Vaccination: During the COVID-19 pandemic, QR codes became a common way to share vaccination records. A verifiable QR code would tie the data to an official source, simplifying verification while reducing counterfeit risks at borders, workplaces, or events.
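For the URL case, a verifier would first check the issuer’s signature over the payload and then check that the link actually points at the domain the issuer vouched for. A minimal sketch of that second step in Python, with url_matches_domain as a hypothetical helper; it requires HTTPS and an exact host or subdomain match, and it does not attempt to handle lookalike internationalized domain names.

```python
from urllib.parse import urlsplit

def url_matches_domain(url: str, expected_domain: str) -> bool:
    """True if url uses https and its host is expected_domain or one of its subdomains."""
    parts = urlsplit(url)
    if parts.scheme != "https":
        return False
    host = (parts.hostname or "").rstrip(".")  # .hostname is already lowercased
    expected = expected_domain.lower().rstrip(".")
    return host == expected or host.endswith("." + expected)

print(url_matches_domain("https://tickets.example.com/e/42", "example.com"))  # True
print(url_matches_domain("https://example.com.evil.test/", "example.com"))    # False
print(url_matches_domain("https://example.com@evil.test/", "example.com"))    # False
```

Comparing the parsed hostname, rather than searching the raw string, is what defeats the userinfo trick in the last example.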

Enhancing Trust in Everyday Interactions

Verifiable QR codes bridge the gap between convenience and trust. By incorporating cryptographic signatures—especially post-quantum ones—they add a necessary layer of security in an increasingly digital world.

While they won’t solve every problem, verifiable QR codes offer a practical way to improve the reliability of systems we already depend on. From verifying tickets and vaccination records to securing supply chains, they provide a scalable and effective solution for building trust into everyday interactions. As verification tools integrate further into devices and platforms, verifiable QR codes could become a cornerstone of authenticity in both physical and digital spaces.

The Laws of Stupidity and the Gaps in Your Security Posture


Carlo M. Cipolla, in his essay The Basic Laws of Human Stupidity, laid out a set of principles that are both hilarious and uncomfortably accurate when applied to everyday life. If you’ve ever watched a perfectly preventable security breach unfold and thought, “How did no one see this coming?” Cipolla has an explanation: stupidity—the kind that causes harm without benefiting anyone.

In security, stupidity isn’t just a human problem. It’s systemic. Your security posture is the sum of every decision you make—large or small, deliberate or “temporary.” Vulnerabilities don’t just happen; they’re created at the intersections of components and processes where decisions are made in isolation. And as Cipolla’s laws remind us, these decisions often externalize harm without yielding any real benefit to the decision-makers.

Cipolla’s Third Law states: “A stupid person is one who causes losses to another person or group of persons while deriving no gain and even possibly incurring losses themselves.” Unfortunately, this describes many decisions in security architecture. Consider a product team that ships a feature with hard-coded credentials because “it’s just for testing,” or an infrastructure team that approves open SSH access from anywhere because “we’ll lock it down later.” These decisions aren’t malicious, but they create cascading vulnerabilities that attackers are happy to exploit.

As Cipolla reminds us, the most dangerous kind of stupidity comes from ignoring the bigger picture. A classic example is teams measuring “success” by the number of CVEs closed or bugs fixed while ignoring metrics that actually reflect resilience, like lateral movement resistance or detection speed. It’s like polishing the hood of your car while leaving the gas tank open.

For a fun analogy, let’s turn to Star Wars. When the droids took over a ship’s trash system to gain access to more critical systems, they exploited what seemed like an insignificant component. As Adam Shostack highlights in his book Threats: What Every Engineer Should Learn from Star Wars, the trash system is a classic example of how attackers exploit overlooked parts of a system to achieve much bigger objectives. Security isn’t about protecting what seems important—it’s about understanding that any overlooked vulnerability can become critical. Whether it’s an unpatched library in your supply chain or a misconfigured process, attackers are happy to exploit your blind spots. If your trash system can sink your flagship, you’ve got bigger problems.

How do you avoid these mistakes? It starts by measuring the right things. Vanity metrics like “bugs closed” or “CVE counts” are security theater. They make you feel good but don’t tell you whether your system is truly secure. Engineers love optimizing for metrics—it’s in their blood. But optimizing for the wrong ones creates a false sense of security.

Instead, focus on metrics that reflect real resilience:

Lateral movement resistance: How hard is it for an attacker to move from one compromised system to another?
Detection speed: How quickly can you identify a breach? (And no, “when the customer calls” doesn’t count.)
Response effectiveness: Once detected, how quickly can you contain and neutralize the threat?
Minimized attack surfaces: How lean are your deployment images? Are you running unnecessary packages or services?
Key management hygiene: Are credentials rotated frequently? Are static secrets eliminated in favor of short-lived credentials?

These metrics focus on outcomes, not activity. While no single metric is sufficient, together they provide a clearer picture of how well security is embedded into the fabric of your organization.
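Detection speed and response effectiveness reduce to simple arithmetic once incidents are timestamped consistently. The sketch below computes median time to detect and median time to contain from a few hypothetical incident records; the field names and timestamps are invented for illustration, and in practice the compromise start time is often only established after investigation.

```python
from datetime import datetime
from statistics import median

# Hypothetical incident records: when compromise began, was detected, and was contained.
incidents = [
    {"start": "2024-03-01T08:00", "detected": "2024-03-01T09:30", "contained": "2024-03-01T12:00"},
    {"start": "2024-05-10T22:00", "detected": "2024-05-12T10:00", "contained": "2024-05-12T11:00"},
    {"start": "2024-07-04T14:00", "detected": "2024-07-04T14:20", "contained": "2024-07-04T16:20"},
]

def hours(earlier: str, later: str) -> float:
    delta = datetime.fromisoformat(later) - datetime.fromisoformat(earlier)
    return delta.total_seconds() / 3600

time_to_detect = [hours(i["start"], i["detected"]) for i in incidents]
time_to_contain = [hours(i["detected"], i["contained"]) for i in incidents]

print("median hours to detect:", median(time_to_detect))    # 1.5
print("median hours to contain:", median(time_to_contain))  # 2.0
```

The median is used instead of the mean so that one slow detection (the 36-hour case here) does not dominate the number; tracking the worst case alongside it is also reasonable.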

Microsoft’s recent push to create division-wide Chief Security Officers is a good step toward addressing security silos. By embedding security leadership at the division level, they’re recognizing that vulnerabilities often arise between components, not just within them. But this alone isn’t enough. Security needs to be designed into the architecture itself, not just layered on as a management structure. It’s about ensuring every decision—from how APIs handle garbage inputs to how your CI/CD pipelines handle third-party code—is made with security in mind.

This is where proactive humility comes in: acknowledging that mistakes will happen, blind spots will exist, and systems must be designed to fail gracefully. Defense in depth isn’t just a buzzword—it’s an acknowledgment that your trash system will be attacked, and you’d better be ready for it.

Cipolla’s framework highlights a critical distinction:

Intelligent decisions benefit everyone—users, developers, and security teams—without externalizing harm. Think of secure defaults, automated safeguards, and least-privilege architectures.
Stupid decisions, on the other hand, create risk for everyone while providing no real gain. Hard-coded credentials, unnecessary privileges, or ignoring supply chain risks fall squarely into this category.

The challenge is to make intelligent decisions easier than stupid ones. This requires strong governance, effective tooling, and metrics that reward resilience over vanity. It’s not about avoiding mistakes altogether—that’s impossible—it’s about making it harder to make the big ones.
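Automated safeguards are one way to make the intelligent path the easy one. As an illustration only, the sketch below flags a few well-known shapes of hard-coded credentials (AWS access key IDs, PEM private key headers, and quoted values assigned to names like password) so a CI job could fail the build before they ship. The rule set is deliberately tiny and hypothetical; dedicated secret scanners use far larger rule sets and often entropy heuristics as well.

```python
import re

# Illustrative rules only; real scanners ship far larger, tuned rule sets.
PATTERNS = {
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "private_key_block": re.compile(r"-----BEGIN (?:RSA |EC |OPENSSH )?PRIVATE KEY-----"),
    "assigned_secret": re.compile(
        r"""(?i)\b(password|passwd|secret|api_key)\s*[:=]\s*["'][^"']{4,}["']"""),
}

def find_hardcoded_secrets(text: str):
    """Yield (line_number, rule_name) for every line that matches a rule."""
    for lineno, line in enumerate(text.splitlines(), start=1):
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                yield lineno, name

sample = 'db_host = "localhost"\npassword = "hunter22"\naws_key = "AKIAABCDEFGHIJKLMNOP"\n'
for lineno, rule in find_hardcoded_secrets(sample):
    print(f"line {lineno}: {rule}")
```

In CI, a non-empty result would simply exit with a failure status, turning “we’ll lock it down later” into a decision that has to be made explicitly.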

Cipolla’s laws might seem like a humorous take on human behavior, but they offer a sobering reminder of the gaps in security posture. Whether it’s overlooking the trash system in Star Wars or counting CVEs while ignoring systemic risks, stupidity in security is often the result of narrow thinking and poor measurement. The solution? Embed security into the fabric of your organization, focus on meaningful metrics, and foster a culture of proactive humility. By designing systems that make intelligent decisions easier than stupid ones, you can stop polishing the hood and start closing the gas tank.


We fixed secret management! By dumping everything into Vault and pretending it’s not a problem anymore…

Has anyone seen our cryptographic keys? They were right here… like, five years ago.

We need to improve our cryptographic security!
Discovers unprotected private keys lying around
Wait… if we have to discover our cryptographic keys, that means we aren’t actually managing them?

We secure video game DRM keys better than the keys protecting your bank account.

You get a shared secret! You get a shared secret! EVERYONE gets a shared secret! Shared secrets are not secret!

Why spend millions on cryptography if your keys spend 99% of their life unprotected? We need to fix key management first.

We don’t suck at cryptography—we suck at managing it. Everyone’s obsessing over PQC algorithms, but the real problem is deployment, key management, and lifecycle. PQC is just another spice—without proper management, it’s just seasoning on bad security.
