We all have technology bias, and it often gets in the way.

An old saying goes, “When you’re a plumber, you fix everything with a wrench.” It highlights a truth: we naturally gravitate toward the tools, people, and methods we know and trust most. This tendency stems from cognitive biases like anchoring—our reliance on initial information—and confirmation bias, which pushes us to favor ideas that align with our existing beliefs. While these biases help us make quick decisions, they can also blind us to better alternatives.

Another saying, “To know thyself is to be true,” resonates here. Even with my deep experience in PKI, I consciously revisit first principles whenever I consider applying it to a new problem. Is this really the best solution? PKI, like many technologies, carries hidden baggage that isn’t always visible, and over-reliance on familiarity can obscure better approaches.

The danger of sticking to the familiar becomes evident in the adoption of Infrastructure as Code (IaC). When tools like Terraform and CloudFormation emerged, many teams resisted, clinging to manual infrastructure management because it felt familiar and the new tooling seemed unnecessary. Yet manual approaches introduced inconsistency, inefficiency, and even security risks. Teams that embraced IaC unlocked scalable, repeatable workflows that transformed operations. IaC not only streamlined processes but also embedded elements of compliance and best practices directly into code. What outdated practices might we be holding onto today that prevent us from unlocking similar benefits?
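To make that "compliance in code" idea concrete, here's a minimal, hypothetical sketch of a policy-as-code gate a pipeline could run before deployment. The resource kinds, settings, and rules are illustrative assumptions, not any particular tool's schema.

```python
# Hypothetical policy-as-code gate: validate declared infrastructure
# against a few baseline security rules before it is deployed.
from dataclasses import dataclass, field

@dataclass
class Resource:
    name: str
    kind: str                      # e.g. "storage_bucket", "vm"
    settings: dict = field(default_factory=dict)

def violations(resource: Resource) -> list[str]:
    """Return human-readable policy violations for one declared resource."""
    problems = []
    if resource.kind == "storage_bucket":
        if not resource.settings.get("encryption_at_rest", False):
            problems.append(f"{resource.name}: encryption at rest is disabled")
        if resource.settings.get("public_read", False):
            problems.append(f"{resource.name}: bucket is publicly readable")
    if resource.kind == "vm" and resource.settings.get("allow_all_inbound", False):
        problems.append(f"{resource.name}: security group allows all inbound traffic")
    return problems

if __name__ == "__main__":
    declared = [
        Resource("logs", "storage_bucket", {"encryption_at_rest": True}),
        Resource("web", "vm", {"allow_all_inbound": True}),
    ]
    issues = [v for r in declared for v in violations(r)]
    for issue in issues:
        print("POLICY VIOLATION:", issue)
    raise SystemExit(1 if issues else 0)   # fail the pipeline if anything is out of policy
```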

I recently encountered a similar situation during a meeting with the leader of a large IT organization. They were eager to adopt a technology developed by someone they trusted personally. However, when I asked fundamental questions like, “How much time do you have to deliver this project?” and “What other systems need to interoperate for this to be considered a success?” it became clear that the technology wasn’t the right fit—at least not yet. By breaking the problem down to its fundamentals, we uncovered insights that their initial bias had obscured.

Practicing first-principles thinking can help sidestep these pitfalls. Start by identifying the core problem: what is the actual goal? What constraints are truly fixed, and which are merely assumptions? From there, challenge each assumption. Is there an alternative approach that better addresses the need? This process not only reduces the influence of bias but also fosters creativity and more effective solutions.

Biases aren’t inherently bad—they help us move quickly—but as the example of IaC demonstrates, unchecked bias can limit us. By anchoring decisions in first principles, we can do more than solve problems; we open the door to better solutions. Asking, “Is this truly the best approach?” ensures we don’t just repeat old patterns but discover new opportunities to improve and thrive.

Government CAs and the WebPKI: Trust is Often the Opposite of Security

Following my recent post about another CA failing the “Turing test” with a likely MITM certificate issuance, let’s examine a troubling pattern: the role of government-run and government-affiliated CAs in the WebPKI ecosystem. This incident brings renewed attention to Microsoft’s root program, and it underscores a fundamental contradiction: we’re trusting entities whose institutional incentives often directly conflict with the security goals of the WebPKI.

The Value Proposition

Let me be clear—CAs and root programs serve critical functions in the WebPKI. As I discussed in my article about Trust On First Use, attempting to build trust without them leads to even worse security outcomes. The issue isn’t whether we need CAs—we absolutely do. The question is whether our current trust model, which treats all CAs as equally trustworthy regardless of their incentives and constraints, actually serves our security goals.

The Core Contradiction

History has repeatedly shown that the temptation to abuse a CA’s issuance capability is simply too great. Whether it’s decision-makers acting in their perceived national interest or CAs that fail to understand—or choose to ignore—the consequences of their actions, we keep seeing the same patterns play out.


Consider that a CA under government oversight faces fundamentally different pressures than one operating purely as a business. While both might fail, the failure modes and their implications for users differ dramatically. Yet our root programs largely pretend these differences don’t exist.

The DarkMatter Paradox

The removal of DarkMatter as a CA due to its affiliation with the UAE government, despite its clean record in this context, starkly contrasts with the continued trust granted to other government-affiliated CAs with documented failures. This inconsistency highlights a deeper flaw in root programs: Rules are often applied reactively, addressing incidents after they occur, rather than through proactive, continuous, and consistent enforcement.

A History of Predictable Failures

If you read yesterday’s post, you may recall my 2011 post on the number of government-run or affiliated CAs. The intervening years have given us a clear pattern of failures. Whether through compromise, willful action, or “accidents” (take that as you will), here are just the incidents I can recall off the top of my head—I’m sure there are more:

The Economics of (In)Security

The fundamental problem isn’t just technical—it’s economic. While some root programs genuinely prioritize security, inconsistencies across the ecosystem remain a critical challenge. The broader issue is not simply about convenience but about conflicting incentives—balancing compatibility, regulatory pressures, and market demands often at the expense of doing what is best for end users.


CAs face strong incentives to maintain their trusted status but relatively weak incentives to uphold the rigorous security practices users expect. The cost of their security failure is largely borne by users, while the benefits of looser practices accrue directly to the CA. Audits, much like those in financial scandals such as Wirecard or Enron, often serve as window dressing. With CAs selecting and paying their auditors, incentives rarely align with rigorous enforcement.


The long tail of rarely-discussed CAs is particularly concerning. Many root certificates in browser trust stores belong to CAs that issue only dozens to hundreds of certificates annually, not the thousands or millions that major CAs produce. Some haven’t issued a certificate in ages but retain the capability to do so—and with it, the ability to compromise security for months or longer. It wouldn’t be unreasonable to say these low-volume CAs pose risks far outweighing their utility.

Certificate Transparency: Necessary but Not Sufficient

While Certificate Transparency has been invaluable in helping identify incidents (including the latest ICP-Brasil case), it’s not a complete solution. Its limitations include:

  • Reactive nature: Violations are identified only after they occur.
  • Monitoring challenges: Effective oversight is resource-intensive and depends on a small community of volunteers.
  • Incomplete coverage: Not all certificates are logged, leaving gaps in visibility.
  • Poorly funded: We have too few logs and monitors to have confidence about the long-term survivability of the ecosystem.
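For a sense of what that volunteer-driven monitoring looks like in practice, here's a small Python sketch that polls crt.sh (a public CT search front end) for certificates matching a domain and flags unexpected issuers. The query parameters, JSON field names, and the issuer allowlist are my assumptions about crt.sh's informal JSON output, so treat this as a starting point rather than a supported API.

```python
# Sketch: watch CT-logged certificates for a domain and flag unexpected issuers.
# Assumes crt.sh's informal JSON output (query params and field names may change).
import json
import urllib.parse
import urllib.request

EXPECTED_ISSUER_SUBSTRINGS = ("Let's Encrypt", "DigiCert")   # illustrative allowlist

def fetch_ct_entries(domain: str) -> list[dict]:
    query = urllib.parse.urlencode({"q": domain, "output": "json"})
    with urllib.request.urlopen(f"https://crt.sh/?{query}", timeout=30) as resp:
        return json.loads(resp.read().decode())

def unexpected_issuers(entries: list[dict]) -> list[dict]:
    return [
        e for e in entries
        if not any(s in e.get("issuer_name", "") for s in EXPECTED_ISSUER_SUBSTRINGS)
    ]

if __name__ == "__main__":
    surprises = unexpected_issuers(fetch_ct_entries("example.com"))
    for entry in surprises:
        print("Unexpected issuer:", entry.get("issuer_name"),
              "for", entry.get("name_value"))
```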

The Limits of Technical Controls

Some browsers have implemented technical guardrails in their certificate validation logic, such as basic linting rules that reject certificates failing baseline checks, but nothing more granular. There have been discussions about imposing additional restrictions on CAs based on their relationship to government oversight or regulatory jurisdictions. However, these proposals face significant pushback, partly due to the political consequences for browser vendors and partly due to concerns about basing trust decisions on “future crime” scenarios. As a result, the WebPKI remains stuck with a one-size-fits-all approach to CA trust.

The Monitoring Gap

The challenges extend beyond malicious behavior to include operational oversight. For instance, in August 2024, ICP-Brasil formally announced they would cease issuing publicly trusted SSL/TLS certificates. Yet by November, they issued a rogue certificate for google.com. This outcome was predictable—public CT logs in 2020 revealed their consistent inability to handle basic operational and issuance requirements, including issuing certificates with invalid DNS names and malformed URLs. Despite these red flags, they remained trusted.


How many other CAs operate outside their stated parameters without detection? Patterns of technical incompetence frequently precede security incidents, but warnings are often ignored.

Required Reforms

To address these systemic issues, root programs must adopt the following measures:

  1. Consistent Standards: Apply appropriate scrutiny to CAs based on their operational and institutional context.
  2. Swift Response Times: Minimize delays between discovery and action.
  3. Proactive Enforcement: Treat red flags as early warnings, not just post-incident justifications.
  4. Technical Controls: Implement meaningful restrictions to limit the scope of certificate issuance.
  5. Automated Compliance: Require CAs to report security incidents and operational status, and continuously monitor their ongoing compliance through automated checks (see the sketch after this list).
  6. Value Assessment: Regularly evaluate whether each CA’s utility justifies its risks and remove those that do not.
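To illustrate the kind of automated check item 5 points at, here's a minimal Python sketch (using the third-party cryptography package) that lints an issued certificate against two well-known baseline requirements: the 398-day lifetime cap for TLS certificates and the presence of a subjectAltName. Real compliance monitoring relies on far more comprehensive tooling; this only shows the shape of the idea.

```python
# Minimal certificate lint: flag obvious baseline-requirement problems.
# Requires the third-party "cryptography" package.
from datetime import timedelta
from cryptography import x509

MAX_LIFETIME = timedelta(days=398)   # current browser limit for TLS certificate lifetimes

def lint_certificate(pem_bytes: bytes) -> list[str]:
    cert = x509.load_pem_x509_certificate(pem_bytes)
    findings = []

    lifetime = cert.not_valid_after - cert.not_valid_before
    if lifetime > MAX_LIFETIME:
        findings.append(f"validity period of {lifetime.days} days exceeds the 398-day limit")

    try:
        cert.extensions.get_extension_for_class(x509.SubjectAlternativeName)
    except x509.ExtensionNotFound:
        findings.append("missing subjectAltName extension")

    return findings

if __name__ == "__main__":
    with open("cert.pem", "rb") as f:          # placeholder path
        for finding in lint_certificate(f.read()):
            print("LINT:", finding)
```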

Protecting Yourself

Until the ecosystem adopts consistent and enforceable security measures:

  • Windows users should monitor Microsoft’s root program decisions.
  • Enterprises should use the Microsoft distrust store and group policies.
  • Everyone should stay informed about CA incidents and their handling.

When Will We Learn?

The “Turing Test” reference in my previous post was somewhat tongue-in-cheek, but it points to serious questions: How many more failures will it take before we fundamentally reform the WebPKI? Even if we know what’s needed, can we realistically create a system that treats government-affiliated CAs differently – or even reliably identify such affiliations – given the complex web of international relations, corporate structures and potential diplomatic fallout?

With regulatory frameworks like eIDAS 2.0 potentially constraining security measures browsers can take, vigilance from the security community is more critical than ever. Stay vigilant, and keep watching those CT logs. Someone has to.

Another CA Fails the Turing Test?

In a concerning development, yet another Certificate Authority (CA) has issued what is likely a man-in-the-middle (MITM) certificate—something strictly prohibited by all root programs. This particular case is unique because the CA is trusted only by Microsoft, making the situation both frustratingly familiar and uniquely problematic. Details are emerging in this Bugzilla thread.

A Familiar Pattern

Back in 2011, I wrote about Microsoft’s trust in government-run CAs and the inherent risks (read here). More than a decade later, it’s clear little has changed. Browser distrust events happen with disappointing regularity—roughly every 1.25 years, according to my analysis (source). While MITM certificate issuance is far rarer, it’s far more serious, and a disturbing trend is evident: Many of the CAs responsible are government-run or affiliated.

Why This Matters to You

For Windows users, this is particularly relevant. Windows browsers like Edge (and others) rely on the Microsoft Root Program, which has unfortunately historically been overly permissive and slow to respond to incidents. You can learn more about the program and its requirements here. In the recent past, I can’t recall a CA that willfully issued an MITM certificate surviving, but the timeline for Microsoft’s response is unclear. That said, when Microsoft does act, their AutoRoot Update feature—which I was the product manager for in the early 2000s—allows them to respond swiftly.

In the meantime, you can protect yourself by identifying and distrusting the offending certificate. Enterprises, in particular, can take a proactive stance by using the Microsoft distrust store. Through group policy, IT administrators can preemptively distrust the problematic CA across their organization, mitigating the risk before Microsoft formally acts.

The Lack of Technical Controls

It’s worth noting there are no technical controls that inherently prevent CAs from issuing MITM certificates (though some browsers do have technical controls for some classes of misissuance). Instead, the WebPKI ecosystem relies on Certificate Transparency (CT) logs and a dedicated community of people closely monitoring CA issuance for violations of requirements. In a way, this incident serves as a smoke test for the system, but when it comes to MITM certificates it’s an awfully expensive test, carrying significant risks for users of the web, reputational risks for the root programs, and broader questions about the trustworthiness of the WebPKI in general.

Predictable Chaos

If you’re following this story, keep an eye on the Bugzilla thread. Based on past experience, I’d wager the CA in question will bungle its incident response. MITM certificate issuance often reflects systemic issues, and such organizations typically don’t have the maturity to handle these crises well.

If this topic interests you, here’s some further reading:

For a deeper dive, here’s a class I run on the topic of WebPKI incident response and how incidents are (mis)handled.

Lessons Unlearned

While it’s comforting to know mechanisms like Certificate Transparency exist to catch these incidents, the recurring nature of these failures raises the question: Are we doing enough to hold CAs accountable?

Trust in the web depends on the reliability of its foundational systems. It’s time we demand higher standards from the organizations entrusted with securing our online world. Until then, stay informed, protect yourself, and let’s hope the next CA at least manages to pass the “Turing Test.”

Proactive Security: Engineering Resilience from the Ground Up

Picture discovering your house has been robbed. Like many homeowners in this situation, your first instinct might be to invest in the latest security system with cameras and motion sensors. But what if the thief simply walked through an unlocked door, exploiting the most basic failure of security? No amount of surveillance would have prevented such a fundamental oversight.

This scenario mirrors how many organizations approach security today. Companies invest heavily in sophisticated detection and response tools, along with a patchwork of workarounds for basic design flaws, while neglecting fundamental security practices, creating a false sense of security built on a shaky foundation. According to Gartner, global cybersecurity spending reached $188.3 billion in 2023, yet breaches continue to rise because we’re treating symptoms while ignoring their root causes.

The Real Cost of Reactive Security

Detection and monitoring tools can provide valuable insights but cannot compensate for fundamental security weaknesses. Many organizations invest heavily in sophisticated detection capabilities while leaving basic architectural vulnerabilities unaddressed—much like a house with state-of-the-art cameras but unlocked doors.

The U.S. Government Accountability Office recently highlighted this problem in stark terms: ten critical federal IT systems, ranging from 8 to 51 years old, cost taxpayers $337 million annually to maintain. Many of them rely on obsolete technologies like COBOL, where maintenance costs continue to rise due to scarce expertise. The thing is that we’ve learned a lot about building secure systems in the last 51 years — as a result, these systems have no chance when faced with a moderately skilled attacker. While government systems make headlines, similar issues affect private enterprises, where legacy systems persist due to the perceived cost and risk of modernization.

The persistence of basic security flaws isn’t just a technical failure; it often represents a systemic underinvestment in foundational security architecture. Consider weaknesses such as:

  • Outdated Architectures
    Decades-old systems that cannot meet modern security demands.
  • Minimal Security Hygiene
    Poor patching practices, weak service-to-service authentication, and a lack of hardened or unikernel images.
  • Weak Design Principles
    Core concepts like zero trust and least privilege cannot be bolted on later, leaving systems exposed.
  • Lack of Lifecycle Planning
    Without clear modernization plans, organizations face costly and disruptive migrations when problems inevitably arise.

These issues are not just hypothetical. For example, the Salt Typhoon espionage campaign exploited foundational weaknesses to compromise major U.S. telecom firms, including Verizon, AT&T, and T-Mobile. Such systemic flaws make even the most advanced detection systems insufficient.

Building Security from the Ground Up

For years, the cybersecurity industry has embraced the mantra, “security is everyone’s problem.” While this has broadened awareness, it often leads to unintended consequences. When responsibility is shared by everyone, it can end up being truly owned by no one. This diffusion of accountability results in underinvestment in specialized security expertise, leaving critical vulnerabilities unaddressed. The Microsoft Storm-0558 incident serves as a prime example of the risks posed by this approach.

True security requires a fundamental shift from reactive to proactive approaches. Organizations must design systems assuming they will eventually be compromised. This means embedding zero trust principles, implementing proper system segmentation, and treating least privilege as foundational.

In practice, proactive measures include short-lived credentials, mutual TLS authentication, and granular access controls from the outset. For example, while a reactive approach might detect suspicious service-to-service communication, a proactive approach prevents such movement entirely through robust authentication.
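As one small example of building the control in rather than detecting failures later, the sketch below uses Python's standard ssl module to configure a service that refuses any client without a certificate chained to an internal CA, so unauthorized service-to-service calls fail at the handshake. The file paths and port are placeholders.

```python
# Proactive control: require client certificates (mutual TLS) so that
# unauthenticated service-to-service traffic is rejected at the handshake.
import socket
import ssl

def make_mtls_server_context(cert_file: str, key_file: str, client_ca_file: str) -> ssl.SSLContext:
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    ctx.load_cert_chain(certfile=cert_file, keyfile=key_file)   # this service's own identity
    ctx.load_verify_locations(cafile=client_ca_file)            # internal CA that issues peer certs
    ctx.verify_mode = ssl.CERT_REQUIRED                          # no client certificate, no connection
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    return ctx

if __name__ == "__main__":
    context = make_mtls_server_context("server.pem", "server.key", "internal-ca.pem")
    with socket.create_server(("0.0.0.0", 8443)) as srv:
        with context.wrap_socket(srv, server_side=True) as tls_srv:
            conn, addr = tls_srv.accept()    # handshake enforces client authentication
            print("authenticated peer:", conn.getpeercert().get("subject"))
```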

Security in the Development Process

The development process itself should prioritize security through specific, measurable practices. Best-in-class organizations typically implement:

  • Infrastructure as code with built-in security policies.
  • Hardened containers or unikernel images to reduce attack surfaces.
  • Automated patch management integrated into deployment pipelines.
  • Continuous compliance monitoring and reporting for real-time security assurance.

These aren’t just best practices—they’re competitive advantages. Organizations that adopt them often see reduced incident costs and faster recovery times, transforming security from a cost center into an enabler of resilience and growth.

Regulatory Progress and Its Limitations

The U.S. Cybersecurity and Infrastructure Security Agency (CISA) introduced its Secure by Design pledge to encourage security-first practices. While this initiative represents progress, it lacks critical components:

  • No Accountability
    There are no enforcement mechanisms to ensure organizations uphold their commitments.
  • No Tracking
    Without benchmarks or reporting requirements, evaluating progress is impossible.
  • No Timeline
    The absence of deadlines allows organizations to deprioritize these efforts indefinitely.

Without these elements, the pledge risks becoming aspirational rather than transformative. As seen with other voluntary efforts, real change often depends on market pressure. For instance, if cloud providers demanded stronger security controls from vendors, or if enterprises baked security requirements into procurement, the market would likely respond more effectively than through regulation alone.

A Balanced Security Strategy

Organizations must balance strong foundations with effective monitoring through clear, measurable steps:

  1. Thoroughly Evaluate Legacy Systems
    Identify critical systems, document dependencies, and create modernization plans with timelines.
  2. Embed Security Into Development
    Use security champions programs, establish clear ownership for each system, and incentivize proactive measures.
  3. Leverage Proactive Security Measures
    Implement short-lived credentials, granular privileges, and zero trust principles during design and operation.
  4. Strategically Deploy Reactive Tools
    Use detection and response systems to validate security assumptions and provide early warning of issues, not to compensate for poor design.

Proactive and reactive measures are complementary, not competing priorities. Installing advanced monitoring on a fundamentally weak system offers organizations only a false sense of security. By contrast, strong proactive foundations reduce the need for reactive interventions, cutting costs and lowering risks.

Conclusion: The Cost of Inaction

The choice between proactive and reactive security isn’t theoretical—it’s an urgent and practical decision. Systems designed with security in mind experience fewer breaches and cost less to maintain. The CISA Secure by Design pledge is a step in the right direction, but without accountability and market-driven enforcement, its impact will remain limited.

Organizations face a clear path forward: invest in proactive security measures to reduce systemic risks while leveraging reactive tools as a safety net. As cyber threats continue evolving, the question is not whether proactive security is necessary, but how soon organizations will act to implement it. Don’t wait until it’s too late—fix the house before adding stronger deadbolts.

From Years to Seconds: Rethinking Public Key Infrastructure

Public Key Infrastructure was designed for a world where identities persisted for years—employees joining a company, servers running in data centers, devices connecting to networks. In this world, the deliberate pace of certificate issuance and revocation aligned perfectly with the natural lifecycle of these long-lived identities. But today’s cloud-native workloads—containers, serverless functions, and microservices—live and die in seconds, challenging these fundamental assumptions.

Though these ephemeral workloads still rely on public key cryptography for authentication, their deployment and management patterns break the traditional model. A container that exists for mere seconds to process a single request can’t wait minutes for certificate issuance. A serverless function that scales from zero to thousands of instances in moments can’t depend on manual certificate management. The fundamental mismatch isn’t about the cryptography—it’s about the infrastructure and processes built around it.

This isn’t a problem of public key infrastructure being inadequate but rather of applying it in a way that doesn’t align with modern workload realities. These new patterns challenge us to rethink how authentication and identity management systems should work—not just to ensure security, but to support the flexibility, performance, and speed that cloud-native infrastructure demands.

Why Workloads Are Different

Unlike human or machine identities, workloads are ephemeral by design. While a human identity might persist for years with occasional role changes, and a machine identity might remain tied to a server or device for months, workloads are created and destroyed on-demand. In many cases, they live just long enough to process a task before disappearing.

Unlike human and machine identities where identifiers are pre-assigned, workload identifiers must be dynamically assigned at runtime based on what is running and where. This transient nature makes revocation—a cornerstone of traditional PKI—irrelevant. There’s no need to revoke a workload’s credentials because they’ve already expired. In fact, much like Kerberos tickets, workload credentials are short-lived by design, issued for just long enough to meet deployment SLAs.

The identity lifecycle dynamics summarized below illustrate these differences clearly:

  • Human identities are persistent, often spanning years, with sequential changes governed by compliance and auditing processes.
  • Machine identities are semi-persistent, lasting weeks or months, with planned updates and automated renewals sometimes tied to devices or hardware lifetimes.
  • Workload identities, by contrast, are ephemeral. They join and leave almost instantly, with lifespans measured in minutes and operations occurring at massive scale.

Compounding this difference is the scale and speed at which workloads operate. It’s not unusual for thousands of workloads to be created simultaneously, each requiring immediate authentication. Traditional PKI processes, designed for slower-moving environments, simply can’t keep up. And workloads don’t just operate in isolation—they’re often distributed across multiple regions to minimize latency and avoid unnecessary points of failure. This means the supporting credentialing infrastructure must also be distributed, capable of issuing and verifying credentials locally without introducing bottlenecks or dependency risks.

Governance adds another layer of complexity. While human and machine identities are often subject to compliance-driven processes focused on auditability and security, workloads are governed by operational priorities:

  • Zero downtime: Workloads must scale rapidly and without disruption.
  • Regional performance: Authentication systems must match the workloads’ regional deployments to avoid latency.
  • Developer flexibility: Identity systems must integrate with whatever technology stacks developers are already using.

The lifecycle of a workload identity reflects the immediacy of software deployment cycles, rather than the structured schedules of hardware or personnel management.

Rethinking Identity Infrastructure for Workloads

The traditional PKI model isn’t going away—it remains essential for the stable, predictable environments it was designed to support. But workloads require a shift in approach. They demand systems capable of:

  • Dynamic credential issuance: Credentials must be created on-demand to support rapid scaling, with automated identifier assignment based on runtime context and workload characteristics.
  • Ephemeral lifecycles: Workload credentials should expire automatically, eliminating the need for revocation, with lifecycle durations matched to actual workload runtime requirements.
  • Multi-factor workload authentication: something the workload has (hardware roots of trust, cryptographic keys), something the workload knows (runtime configuration), and something the workload is (attestation data, container hashes, process metadata).
  • Distributed infrastructure: Regional authentication systems ensure low latency and high availability, with local credential issuance capabilities.
  • Massive scalability: Systems must support thousands of identity events per minute, operating across clouds or hybrid environments, with automated identifier management at scale.
  • Runtime identifier assignment: based on what is running (container hashes, process information), where it’s running (environment context, runtime attestation), and how it’s running (execution environment verification); see the sketch that follows this list.
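The sketch below, with attestation fields and a path structure of my own choosing, shows one way runtime identifier assignment might look: the identifier is derived from attested properties of what is running and where, rather than being assigned ahead of time.

```python
# Sketch: derive a workload identifier from attested runtime properties
# instead of assigning it in advance.
import hashlib
from dataclasses import dataclass

@dataclass
class Attestation:
    cloud: str            # where: e.g. "aws"
    region: str           # where: e.g. "us-west-2"
    namespace: str        # where: orchestrator namespace
    service_account: str  # what: identity the workload runs as
    image_digest: str     # what: container image digest reported by the runtime

def workload_identifier(trust_domain: str, a: Attestation) -> str:
    # A short, stable suffix derived from the image digest ties the ID to what is running.
    image_tag = hashlib.sha256(a.image_digest.encode()).hexdigest()[:12]
    return (f"spiffe://{trust_domain}"
            f"/{a.cloud}/{a.region}/ns/{a.namespace}"
            f"/sa/{a.service_account}/img/{image_tag}")

if __name__ == "__main__":
    att = Attestation("aws", "us-west-2", "payments", "checkout",
                      "sha256:9f86d081884c7d659a2feaa0c55ad015")
    print(workload_identifier("example.org", att))
```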

As highlighted in the lifecycle comparison, workload identities aren’t simply a smaller, faster version of machine identities. Their governance models reflect their role in delivering business-critical objectives like seamless scaling and developer empowerment.

Perhaps the most significant difference is the role of developers. Workload identity systems can’t impose rigid, one-size-fits-all requirements. Developers need the freedom to:

  • Work with existing technology stacks.
  • Seamlessly integrate identity management into their workflows.
  • Build and deploy at the speed demanded by modern infrastructure.

In this way, workload identity management becomes not just a security task but a foundational enabler of innovation and efficiency.

Taking the First Step with SPIFFE

SPIFFE (Secure Production Identity Framework For Everyone) is an open standard designed to enable workloads to automatically acquire identities, certificates, and OIDC tokens for secure zero-trust communication between services. Rather than retrofitting existing systems, look for upcoming greenfield deployments where you can engage early in the design phase. This allows you to build SPIFFE’s workload identity patterns in from the start—solving different problems than traditional PKI, not competing with it. Use this greenfield project to demonstrate how PKI, delivered through SPIFFE, can solve additional problems in production environments.

Final Thoughts

Workloads have redefined how we think about identity. They operate at a speed and scale that traditional PKI never anticipated, governed by different priorities and lifecycles that reflect the realities of modern software. While PKI will continue to serve as a critical component of authentication, it must evolve to meet the unique demands of ephemeral, distributed workloads.

This isn’t about abandoning the tools we rely on but about adapting them for a world where zero downtime, developer flexibility, and seamless scalability are non-negotiable. The future of identity isn’t static or centralized—it’s dynamic, distributed, and built to support workloads that define modern infrastructure.

For organizations looking to implement these patterns quickly and efficiently, SPIRL (a company I advise) provides tools to make workload identity management straightforward and accessible.

   

Bundling and Unbundling in the NHI Market: Opportunities in Identity, Governance, and Cryptography

Jim Barksdale famously said “All money is made through bundling and unbundling,” and this dynamic is evident in the Non-Human Identity (NHI) market. Cryptography management, privileged access management, and certificate lifecycle solutions are being redefined under a higher-level taxonomy. These functions, once viewed as isolated, are increasingly integrated into broader frameworks addressing identity, governance, and security holistically, reflecting the market’s shift toward unified and specialized solutions.

Cloud providers dominate in offering integrated solutions across categories, but these are often limited and focus on cost-recovery pricing to encourage adoption of their real money-makers like compute, storage, network, databases, and these days AI. They frequently provide just enough to facilitate a single project’s adoption, leaving opportunities for other vendors. For instance, Microsoft’s push to migrate enterprises from on-premises Active Directory to its cloud offering presents an opportunity to unbundle within the NHI IAM space. By focusing narrowly on migrating existing infrastructures rather than reimagining solutions from first principles to meet modern usage patterns, Microsoft has created gaps that smaller, more agile providers can exploit. Similarly, regulatory pressures and the rise of AI-driven, agentic workloads are driving demand for advanced workload authentication, creating further opportunities for specialized providers to deliver tailored solutions. Meanwhile, established players like CyberArk and Keyfactor have pursued acquisitions, such as Keyfactor’s merger with PrimeKey, to bundle new capabilities and remain competitive. However, the integration complexity of these acquisitions often leaves room for focused providers to address modern, cloud-native demands more effectively.

At the same time, traditional cryptography management companies have been so focused on their existing Key Management System (KMS) and Hardware Security Module (HSM) offerings that they have largely ignored broader unmet needs in the market, prioritizing feature expansion and acquisitions aimed at chasing smaller competitors. This narrow focus has left significant gaps in visibility, particularly around cryptographic assets and risks, creating fertile ground for new solutions focused on cryptography discovery, automated inventory management, and preparation for post-quantum cryptography.

Capital allocation, on the other hand, highlights category focus and growth potential. Seed and Series A investments underscore the dynamic opportunities created by unbundling, as well as the constraints faced by larger vendors burdened with legacy products that make it harder to truly innovate due to existing commercial obligations in the same space. In contrast, private equity activity targets larger bundling opportunities, enabling less agile and more mature market leaders to remain relevant by scaling established solutions or consolidating fragmented players. These stages illustrate the market’s balance between early-stage innovation and late-stage consolidation, driven by the growing demand for unified, cloud-native identity and governance solutions.

These patterns of bundling and unbundling are organic and continual, offering just one lens on the evolving dynamics of this market. While the NHI market appears new, it is, in fact, a natural evolution of existing identity governance patterns, driven by the growing demand for unified, cloud-native identity and governance solutions. This evolution underscores the balance between early-stage innovation and late-stage consolidation, as new entrants and established players alike navigate the opportunities created by shifting market dynamics.

Rethinking Authentication: “Something You Have,” “Something You Know,” and “Something You Are” for Workloads and Machines

Passwords have existed for millennia, and their weaknesses have persisted just as long. Their simplicity led to widespread adoption, but as their use expanded, so did the frequency of their abuse. To address this, we implemented stricter password policies—longer lengths, special characters, regular changes—much like hiding vulnerable software behind firewalls. When these efforts fell short, we evolved to multi-factor authentication (MFA), introducing the principles of “Something You Have,” “Something You Know,” and “Something You Are.”

MFA brought its own challenges—deployment complexity and user friction. Password managers helped bridge some gaps by generating and storing random passwords for us. While each of these steps enhanced security, none addressed the core problem: passwords and shared secrets are fundamentally flawed authenticators. This realization spurred efforts like WebAuthn, FIDO, and Passkeys, which replaced passwords with cryptographic keys and secure protocols, eliminating shared secrets entirely.

However, while user authentication evolved, workload and machine authentication lagged behind. Instead of passwords, workloads relied on API keys—essentially shared passwords—managed through “password managers” rebranded as secret vaults. These shared secrets are just as fragile and inadequate for today’s complex, scaled environments as passwords were for users.

The path forward is clear: workloads and machines need their own authentication revolution. We must replace shared secrets with cryptographic keys and implement MFA for workloads. But what does machine-focused MFA look like? Let’s explore how the three fundamental authentication factors apply to workloads and machines.


Applying Authentication Factors to Workloads and Machines

1. Something the Workload Has

This encompasses physical or cryptographic elements unique to the workload:

  • Hardware Roots of Trust: Security processors like TPM, Microsoft Pluton, Google Titan, and Apple’s Secure Enclave provide tamper-resistant foundations for device identity and posture assessment.
  • Cryptographic Keys: Private keys secured within hardware security processors serve as a robust “something you have,” enabling strong authentication.
  • Credentials: Artifacts such as OIDC tokens and X.509 certificates that uniquely identify machines and workloads within trusted environments.

These mechanisms form the backbone of secure workload authentication, much like physical security tokens do for human users.


2. Something the Workload Knows

This parallels knowledge-based authentication but focuses on workload-specific secrets:

  • Shared Secrets: API keys, symmetric keys, and asymmetric credentials used for authentication.
  • Configuration Data: Runtime-specific information like environment configuration.

Although often necessary for a service’s functionality, these weak attributes are highly susceptible to exposure, reuse, and theft. Implementing credentialing systems like SPIFFE can significantly mitigate these risks by replacing shared secrets with cryptographically secure, short-lived credentials uniquely tailored to each workload.


3. Something the Workload Is

This represents inherent characteristics of the workload, similar to human biometrics:

  • Trusted Execution Environments (TEEs): Secure enclaves like Intel SGX or AWS Nitro verify and attest to the integrity of the workload’s execution environment.
  • Immutable Code or Container Hashes: Binary or container image hashes verify workload integrity.
  • Runtime Attestation: Environmental and configuration validation ensures compliance with security policy.
  • POSIX Process Names and Metadata: Process information and runtime metadata provide operational context.

By combining these attributes, workloads can demonstrate their role and environment to enable more contextual identification and authorization.
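A minimal sketch of the "something the workload is" factor, under the assumption that an expected-measurement allowlist already exists: measure the artifact that is actually running and compare it to the allowlist before granting a credential. Real attestation involves signed evidence from a TPM or TEE and an independent verifier; this only shows the shape of the check.

```python
# Sketch: integrity check for "something the workload is" -
# measure the running artifact and compare it to an expected allowlist.
import hashlib
import hmac

EXPECTED_MEASUREMENTS = {
    # illustrative: artifact name -> expected SHA-256 digest
    "checkout-service": "e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855",
}

def measure(path: str) -> str:
    """Hash the binary or image layer on disk in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()

def is_expected(name: str, path: str) -> bool:
    expected = EXPECTED_MEASUREMENTS.get(name)
    return expected is not None and hmac.compare_digest(measure(path), expected)

if __name__ == "__main__":
    print(is_expected("checkout-service", "/usr/local/bin/checkout-service"))
```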


The Future of Workload Identity

Authentication factors vary in strength—cryptographic keys and runtime attestations can provide strong confidence in what you are talking to while process names and secrets offer weaker assurance. Combining these elements creates a more comprehensive picture of workload authenticity. Standards like SPIFFE leverage this combination, creating strong workload identities by incorporating hardware roots of trust, runtime attestations, and other security properties. Over time, these attestations can be enriched with supply chain provenance, vulnerability assessments, and other compliance data.

As we look further ahead to agentic workloads and AI systems, we will need to develop protocols and policies that enable us to consider both workload identity and the entities they represent. For example, an AI model handling financial transactions needs both verified workload identity and specific policies for each user it serves so that the agent does not become a way to escalate privileges and access data that would otherwise have been unreachable.
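A toy illustration of that combined check, with made-up identifiers and policy entries: authorization succeeds only when both the workload's attested identity and the represented user's own policy allow the action, so a trusted agent can never exceed the rights of the user it acts for.

```python
# Toy combined authorization: both the workload identity and the
# represented user's policy must allow the requested action.
WORKLOAD_POLICY = {
    # which attested workload identities may perform which actions
    "spiffe://example.org/agents/finance-assistant": {"read_statement", "initiate_transfer"},
}
USER_POLICY = {
    # what each represented user may do, independent of the agent acting for them
    "alice": {"read_statement", "initiate_transfer"},
    "bob":   {"read_statement"},
}

def authorize(workload_id: str, user: str, action: str) -> bool:
    workload_ok = action in WORKLOAD_POLICY.get(workload_id, set())
    user_ok = action in USER_POLICY.get(user, set())
    return workload_ok and user_ok   # the agent never exceeds the user's own rights

if __name__ == "__main__":
    agent = "spiffe://example.org/agents/finance-assistant"
    print(authorize(agent, "alice", "initiate_transfer"))  # True
    print(authorize(agent, "bob", "initiate_transfer"))    # False: the user lacks the right
```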

This evolving, layered approach ensures that workload authentication scales with increasingly complex machine ecosystems while maintaining security and accountability. By unifying identity, actions, and policies, we prepare for a future of autonomous yet accountable workloads.

The Myth of Non-Technical Product Management

A common theme in conversations about product managers is the notion that they don’t need to be technical; they just need to bridge the gap between technical and non-technical teams. In my experience, particularly with enterprise and security products, this is a complete fallacy. Part of why this argument persists is the misconception that all product management is the same.

If you’re working on a 10-year-old product based on 20-year-old deployment patterns—and this might be hard to hear—chances are you’re not innovating. Instead, you’re managing customer requests and operating within the constraints of the bureaucracy you’re part of. Your roadmap likely consists of a mix of customer demands and features cloned from smaller competitors.

Another reason this perspective persists is that many organizations divide product managers into two categories: inbound and outbound. Outbound product managers are this decade’s version of product MBAs. They often have a limited understanding of their customers and their needs, instead focusing on systematizing a go-to-market strategy based on abstractions.

In the problem domain of enterprise and security—especially in small to medium-sized companies, where innovation tends to happen—there is no substitute for being an expert in what you’re building and selling. One of the most important things to understand is your customer: their pains, their constraints, and the schedules they operate within. The thing is, your customer isn’t just one person in an enterprise sale. As I’ve written before, at a minimum, you’re dealing with an economic buyer and a champion in any sale. If you’re lucky, you have many champions. And if you think strategically, you can even identify your champions’ champions within the sale.

This requires you to understand everyone’s job and perspective. If you don’t understand the technology or problem domain natively, you will always struggle—and likely fail—especially in smaller, early-stage companies.

Don’t get me wrong: once a company finds product-market fit and has a reproducible recipe for selling into organizations—or as the market evolves and expectations for a product in a given segment become standardized—it becomes less necessary. But even then, bringing that expertise to the table remains a powerful force multiplier that enables organizations lucky enough to have these resources to vastly outperform much larger and better-funded competitors.

Since I spend most of my time these days with smaller companies or very large companies looking to become more competitive again, all I can say is this: without the right product leaders, the best you can hope for is growing at the pace of your overall market and maintaining the status quo.

Winning Over the Organization: The Subtle Art of Getting Your Product Deployed

As someone who has spent over 30 years building security and infrastructure products both in large companies and small, I’ve seen one pattern repeat itself far too often: a great product gets sold to an enterprise, only to end up sitting on a shelf, untouched and unloved. For every company that successfully deploys their product and becomes a cornerstone of their customer’s operations, there are countless others that fall victim to this fate.

The difference? It’s rarely about the technology itself. Instead, it’s about understanding how to navigate the human dynamics of enterprise sales and deployment—helping your champions and economic buyers not just buy your product, but deploy it, show value, and win over the organization. The startups that succeed here often share a surprising trait: they get people promoted.

Here’s how I think about this challenge and the advice I give to the companies I advise.

Know Your Allies: The Champion and the Economic Buyer

In any enterprise sale, there are two critical players: the champion and the economic buyer. Your champion is the person who feels the pain your product solves most acutely. They’re your advocate inside the organization, the one who wants you to succeed because it solves their problem.

The economic buyer, on the other hand, is the one with the budget and the organizational perspective. They’re not as close to the day-to-day pain, but they’re thinking about alignment with company priorities, ROI, and risk.

If you want your product to avoid becoming shelfware, you need to understand what it takes for both of these people to succeed—not just in deploying your product, but in navigating the bureaucracy of their organization.

Empowering Your Champion: The Keys to Advocacy

Your champion is on your side, but they’re likely not equipped to sell your product internally without help. They need:

  • Clear, tangible wins: How can they deploy your product incrementally, showing immediate value without asking the organization to take on too much risk or disruption upfront?
  • Compelling talking points: You know your product better than anyone. Equip your champion with narratives that resonate with their stakeholders. For example:
    • “This solution aligns with our zero-trust initiative and reduces risks highlighted in the Verizon DBIR report.”
  • Materials for buy-in: Provide them with decks, case studies, and ROI calculators tailored to their audience—whether it’s IT, security, or the C-suite.

Startups that succeed make it easy for champions to tell a compelling story, removing the burden of figuring it out themselves.

Winning Over the Economic Buyer: Speak Their Language

The economic buyer is focused on the bigger picture: strategic alignment, ROI, and risk management. They’ll ask questions like:

  • How does this product support our organizational goals?
  • What’s the ROI? How does this reduce costs or avoid risks?
  • Will this disrupt our existing systems or processes?

To win them over:

  • Frame the product as part of their strategy: Don’t sell a feature—sell a solution to their larger problems.
  • Provide financial justification: Show them how your product saves money, reduces risks, or increases efficiency.
  • Mitigate risk: Give them confidence that deploying your product won’t be a gamble.

This isn’t just about convincing them to buy—it’s about giving them the confidence to champion your product as it rolls out.

Navigating Bureaucracy: Guiding the Path Forward

Here’s the uncomfortable truth: in large organizations, success often depends more on navigating bureaucracy than on the quality of the technology. Startups that win deployment understand this and partner with their buyers to:

  • Break down deployments into milestones: Start small, deliver quick wins, and build momentum over time.
  • Anticipate bottlenecks: Security reviews, procurement delays, and committee approvals are inevitable. Help your buyer prepare for and address these hurdles.
  • Guide advocacy efforts: Provide step-by-step playbooks to help champions and economic buyers build internal support and overcome resistance.

Think of it as being not just a vendor, but a partner in internal change management.

Selling More Than Software: The Roadmap as a Vision

One of the most overlooked strategies in enterprise sales is this: sell your roadmap.

A roadmap isn’t just a future wish list; it’s a way to help champions and buyers plot their own narratives for how your product will grow with their organization. By aligning your roadmap with their goals, you’re not just selling what your product does today—you’re selling the promise of what it can enable tomorrow.

Successful startups make buyers feel like they’re investing in something bigger than a single tool. They’re investing in a vision.

Helping Customers Win—and Get Promoted

Here’s the heart of it: successful startups help their customers succeed personally.

  • For champions, this might mean solving a thorny problem and becoming the hero of their team.
  • For economic buyers, it might mean delivering measurable results that align with company priorities, demonstrating strategic leadership.

Startups that win understand that their product is a means to an end. The real goal is to make the people buying and deploying your product look good—and in some cases, get promoted. This is a mindset shift, but it’s critical. If your customers succeed, your product succeeds.

Building Partnerships, Not Just Products

The startups I see succeed don’t try to bulldoze their way into organizations. They’re humble, practical, and focused on helping their customers navigate the messy, human reality of enterprise deployment.

They make it easy for champions to win arguments. They help economic buyers frame deployments as strategic wins. And they sell not just their product, but a roadmap that makes their customers look like visionaries.

In the end, that’s the secret: make your customer’s success the core of your strategy. If you do that, you’re not just selling a product—you’re building a partnership that drives real results. And that’s how you avoid the shelfware graveyard.

From Fairways to the Cloud: Estimating Golf Balls in Flight to Tackling Cloud Workload Scale

Early in my career, I worked in quality assurance at Microsoft, where analytical skills were a core trait we tried to hire for. At the time, “brain teasers” were often used in interviews to assess these skills. One memorable interview question went like this:

“How would you figure out how many golf balls are in flight at any given moment?”

This question wasn’t about pinpointing the exact number; it was a window into the candidate’s analytical thinking, problem-solving approach, and ability to break down complex problems into manageable parts. It was always interesting to see how different minds approached this seemingly simple yet deceptively complex problem. If a candidate wasn’t sure how to begin, we would encourage them to ask questions or to simply document their assumptions, stressing that it was the deconstruction of the problem—not the answer—that we were looking for.

In engineering, we often need to take big, abstract problems and break them down. For those who aren’t engineers, this golf ball question makes that process a bit more approachable. Let me walk you through how we might tackle the golf ball question:

  1. Number of Golf Courses Worldwide
    • There are approximately 38,000 golf courses globally.
  2. Players and Tee Times
    • On average, each course hosts about 50 groups per day.
    • With an average group size of 4 players, that’s 200 players per course daily.
  3. Shots Per Player
    • An average golfer takes around 90 shots in a full round.
  4. Total Shots Per Course
    • 200 players × 90 shots = 18,000 shots per course per day.
  5. Time a Golf Ball Is in the Air
    • Let’s estimate each shot keeps the ball airborne for about 5 seconds.
  6. Calculating Balls in Flight
    • Over 12 hours of playtime, there are 43,200 seconds in a golfing day.
    • Total airborne time per course: 18,000 shots × 5 seconds = 90,000 seconds.
    • Average balls in flight per course: 90,000 seconds ÷ 43,200 seconds ≈ 2 golf balls.
  7. Global Estimate
    • 2 balls per course × 38,000 courses = 76,000 golf balls in flight at any given moment worldwide.
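The same walk-through fits in a few lines of Python, which makes it easy to see how sensitive the estimate is to each assumption:

```python
# Back-of-the-envelope estimate of golf balls in flight worldwide.
courses = 38_000
groups_per_course_per_day = 50
players_per_group = 4
shots_per_round = 90
airborne_seconds_per_shot = 5
golfing_day_seconds = 12 * 60 * 60          # 43,200 seconds

shots_per_course = groups_per_course_per_day * players_per_group * shots_per_round   # 18,000
airborne_seconds_per_course = shots_per_course * airborne_seconds_per_shot           # 90,000
balls_per_course = airborne_seconds_per_course / golfing_day_seconds                 # ~2.08

print(round(balls_per_course))             # ~2 balls in flight per course
print(round(balls_per_course) * courses)   # 2 x 38,000 = 76,000 worldwide
```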

This exercise isn’t about precision; it’s about methodically breaking down a complex question into digestible parts to arrive at a reasonable estimate. As the saying goes, all models are wrong, but some are useful. Our goal here is to find a stable place to stand as we tackle the problem, and this question does a decent job of that; if nothing else, it lets us see how a candidate approaches unfamiliar topics.

Transitioning from the Green to the Cloud

Today, the biggest challenges in cloud workload identity management remind me of these kinds of problems—except far more complex. Unlike in a round of golf, most workloads aren’t individually authenticated today; instead, they rely on shared credentials, essentially passwords, stored and distributed by secret managers, and anything needing access to a resource must have access to that secret. 

But with the push for zero trust, rising cloud adoption, infrastructure as code, and the reality that credential breaches represent one of the largest attack vectors, it’s clear we need a shift. The future should focus on a model where every workload is independently authenticated and authorized.

So, let’s put the “golf balls soaring through the air” approach to work here, using the same framework to break down the cloud workload scale:

  1. Global Cloud Infrastructure
    • Major cloud providers operate data centers with an estimated 10 million servers worldwide.
  2. Workloads Per Server
    • Each server might run an average of 100 workloads (virtual machines or containers).
    • 10 million servers × 100 workloads = 1,000 million (1 billion) workloads running at any given time.
  3. Ephemeral Nature of Workloads
    • Let’s assume 50% of these are ephemeral, spinning up and down as needed.
    • 1 billion workloads × 50% = 500 million ephemeral workloads.
  4. Workload Lifespan and Credential Lifecycle
    • If each ephemeral workload runs for about 1 hour, there are 24 cycles in a 24-hour day.
    • 500 million workloads × 24 cycles = 12 billion ephemeral workloads initiated daily.
  5. Credentials Issued
    • Each new workload requires secure credentials or identities to access resources.
    • This results in 12 billion credentials needing issuance and management every day.
  6. Credentials Issued Per Second
    • With 86,400 seconds in a day:
    • 12 billion credentials ÷ 86,400 seconds ≈ 138,889 credentials issued per second globally.
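And the workload version of the same arithmetic, using the assumptions above:

```python
# Back-of-the-envelope estimate of workload credential issuance rates.
servers = 10_000_000
workloads_per_server = 100
ephemeral_fraction = 0.5
cycles_per_day = 24                    # assumes ~1-hour average workload lifetime
seconds_per_day = 86_400

workloads = servers * workloads_per_server                 # 1 billion running at any time
ephemeral = int(workloads * ephemeral_fraction)            # 500 million ephemeral workloads
credentials_per_day = ephemeral * cycles_per_day           # 12 billion credentials per day

print(round(credentials_per_day / seconds_per_day))        # ~138,889 credentials per second
```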

In this updated example, just as with the golf balls in flight question, we deconstruct a complex system to better understand its core challenges:

  • Scale: The number of workloads and credentials needed to achieve this zero-trust ideal is far higher than what we would need if we simply passed shared secrets around.
  • Dynamics: These credentialing systems must have much higher availability than static systems to support the dynamism involved.
  • Complexity: Managing identities and credentials at this scale is a monumental task, emphasizing the need for scalable and automated solutions.

Note: These calculations are estimates meant to illustrate the concept. In real-world cloud environments, actual numbers can vary widely depending on factors like workload type distribution, number of replicas, ephemerality of workloads, and, of course, individual workload needs.

Conclusion

This exercise demonstrates a fundamental point: analytical thinking and problem-solving are timeless skills, applicable across domains.

You don’t need to be an expert in any given system to get started; you simply need to understand how to break down a problem and apply some basic algebra.

It also serves as a way to understand the scope and scale of authenticating every workload to enable zero trust end-to-end. Simply put, this is a vastly different problem than user and machine authentication, presenting a unique challenge in managing identities at scale.