Category Archives: Certificates

Understanding Patterns in WebPKI CA Issues

There’s a saying, “where there’s smoke, there’s fire.” This adage holds especially true in the context of WebPKI Certificate Authorities (CAs). Patterns of issues are one of the key tools that root programs use to understand what’s happening inside organizations. While audits are essential, they are often insufficient. Historical cases like Wirecard and Enron illustrate how audits can provide a partial and sometimes incorrect picture. Just as in most interactions in life, understanding who you are dealing with is crucial for successful navigation, especially when a power dynamic differential exists.

The Limitations of Audits

Currently, there are 86 organizations in the Microsoft root program. Most root programs have at most two people involved in monitoring and policing these 86 CAs. Technologies like Certificate Transparency make this possible, and open-source tools like Zlint and others use this data to find technically observable issues. However, these tools, combined with audits, only provide a small slice of the picture. Audits are backward-looking, not forward-looking. To understand where an organization is going, you need to understand how they operate and how focused they are on meeting their obligations.

This is where the nuanced side of root program management, the standards, and norms of the ecosystem, come into play. If we look at signals in isolation, they often appear trivial. However, when we examine them over a long enough period in the context of their neighboring signals, a more complete picture becomes apparent.

For example, consider a CA with minor compliance issues that seem trivial in isolation. A single misissued certificate might not seem alarming. But when you see a pattern of such incidents over time, combined with other issues like poor incident response or associations with controversial entities, the picture becomes clearer. These patterns reveal deeper issues within the organization, indicating potential systemic problems.

Root Program Challenges

Root programs face significant challenges in managing and monitoring numerous CAs. With limited personnel and resources, they rely heavily on technology and community vigilance. Certificate Transparency logs and tools like Zlint help identify and flag issues, but they are only part of the solution. Understanding the intentions and operational integrity of CAs requires a deeper dive into their practices and behaviors.

In the WebPKI ecosystem, context is everything. Root programs must consider the broader picture, evaluating CAs not just on isolated incidents but on their overall track record. This involves looking at how CAs handle their responsibilities, their commitment to security standards, and their transparency with the community. A CA that consistently falls short in these areas, even in seemingly minor ways, can pose a significant risk to the ecosystem.

Conclusion

Understanding the nuances of CA operations and focusing on their adherence to obligations is critical. By examining patterns over time and considering the broader context, root programs can better identify and address potential risks. The combination of audits, technological tools, and a keen understanding of organizational behavior forms a more comprehensive approach to maintaining trust in the WebPKI system.

It’s always important to remember that CAs need to be careful to keep this in mind. After all, it’s not just what you do, but what you think you do. Having your house in order is essential. By learning from past mistakes and focusing on continuous improvement, organizations can navigate public reporting obligations more effectively, ensuring they emerge stronger and more resilient.

Exploring Browser Distrust

Browser distrust events of WebPKI Certificate Authorities occur on average approximately every 1.23 years. These events highlight the critical role the WebPKI plays in maintaining secure communications on the internet and how failures within this system can have far-reaching implications. By examining these incidents, we can identify common patterns and underlying causes that lead to distrust, so as implementors and operators, we don’t end up repeating the same mistakes.

Identifying Common Patterns

As they say, those who don’t know history are destined to repeat it, so it is worthwhile to take a look at the history of CA distrust events to understand what, if any common patterns exist:

  • Security Breaches: Involves unauthorized access to the CA’s infrastructure, leading to potential misuse of certificate issuance capabilities.
  • Compromise of CA Infrastructure: Refers to breaches where the core infrastructure of the CA is compromised, resulting in unauthorized certificate issuance.
  • Fraudulent Certificates: Occurs when certificates are issued without proper authorization, often leading to the impersonation of legitimate websites.
  • Poor Incident Response Handling: Indicates that the CA failed to adequately respond to security incidents, exacerbating the impact of the initial problem.
  • Misissuance of Certificates: Happens when CAs issue certificates incorrectly, either to the wrong entities or without proper validation, undermining trust.
  • Facilitating Man-In-The-Middle Attacks: Refers to situations where misissued or improperly handled certificates enable attackers to intercept and alter communications.
  • Improper Practices: Includes actions by CAs that deviate from accepted standards and best practices, leading to security risks.
  • Deceptive Actions: Involves deliberate misleading actions by CAs, such as backdating certificates or other forms of dishonesty.
  • Insecure Practices: Encompasses practices by CAs that fail to maintain adequate security controls, leading to vulnerabilities.
  • Non-Compliance with Industry Standards: Indicates that the CA has repeatedly failed to adhere to industry standards and guidelines, leading to a loss of trust.
  • Ties to Controversial Entities: Involves associations with entities that raise ethical or security concerns, leading to distrust.
  • Limited Value to Ecosystem: Indicates that the CA does not provide significant value to the security ecosystem, often due to questionable practices or minimal compliance.
  • Operational Vulnerabilities: Refers to weaknesses in the CA’s operational security, such as using default passwords or having exposed administrative tools, making them susceptible to attacks.

Browser Distrust Events

  1. DigiNotar (2011):
    • Event: DigiNotar was hacked, leading to the issuance of fraudulent certificates. This prompted Mozilla, Google, and Microsoft to revoke trust in DigiNotar certificates.
    • Labels: Security Breaches, Compromise of CA Infrastructure, Fraudulent Certificates, Poor Incident Response Handling
    • Details: Mozilla Security Blog, Threatpost
  2. TurkTrust (2013):
    • Event: It was discovered that TurkTrust mistakenly issued two intermediate CA certificates, one of which was used to issue a fraudulent certificate for *.google.com. This led to the distrust of the TurkTrust CA by major browsers.
    • Labels: Misissuance of Certificates, Facilitating Man-In-The-Middle Attacks
    • Details: Krebs on Security
  3. ANSSI (French CA) (2013):
    • Event: It was discovered that ANSSI had issued a certificate to a network appliance company, which used it to inspect encrypted traffic. This led Google to revoke trust in the intermediate certificate.
    • Labels: Misissuance of Certificates, Facilitating Man-In-The-Middle Attacks
    • Details: Google Security Blog, Mozilla Blog
  4. CNNIC (China Internet Network Information Center) (2015):
    • Event: CNNIC was distrusted after it issued an intermediate certificate to MCS Holdings, which misused it to issue unauthorized certificates.
    • Labels: Misissuance of Certificates, Facilitating Man-In-The-Middle Attacks, Improper Practices
    • Details: Tom’s Hardware, Mozilla Security Blog, Mozilla Security Blog
  5. WoSign and StartCom (2016):
    • Event: WoSign (and StartCom) were distrusted after discovering multiple security issues, including backdating certificates, lying, and improper issuance.
    • Labels: Misissuance of Certificates, Deceptive Actions, Insecure Practices
    • Details: Tom’s Hardware, Google Security Blog
  6. Symantec (2017):
    • Event: Google announced a gradual distrust of Symantec certificates due to numerous instances of certificate misissuance, impacting millions of websites.
    • Labels: Misissuance of Certificates, Non-Compliance with Industry Standards, Poor Incident Response Handling
    • Details: Bleeping Computer, Google Security Blog
  7. Certinomis (2019):
    • Event: Mozilla distrusted Certinomis due to numerous incidents of misissuance and poor handling of security concerns.
    • Labels: Misissuance of Certificates, Facilitating Man-In-The-Middle Attacks
    • Details: Venafi Blog
  8. PROCERT (2020):
    • Event: Mozilla distrusted Procert due to numerous incidents of misissuance and poor handling of security concerns.
    • Labels: Non-Compliance with Industry Standards, Poor Incident Response Handling
    • Details: Venafi Blog
  9. TrustCor (2022):
    • Event: TrustCor was distrusted due to concerns about its ties to companies linked to the US intelligence community and its failure to provide satisfactory responses to these concerns.
    • Labels: Ties to Controversial Entities, Limited Value to Ecosystem
    • Details: gHacks, SSLs.com, SSL Shopper
  10. Camerfirma (2021):
    • Event: Mozilla and Google removed trust in Camerfirma due to a series of compliance issues and failure to maintain industry standards.
    • Labels: Non-Compliance with Industry Standards, Poor Incident Response Handling
    • Details: The Register
  11. Visa (2022):
    • Event: Issues include incomplete security audits that are required according to the
    • Labels: Non-Compliance with Industry Standards, Poor Incident Response Handling
    • Details: Feisty Duck
  12. e-Tugra (2023):
    • Event: e-Tugra was distrusted due to security concerns. A researcher found numerous vulnerabilities in e-Tugra’s systems, including default passwords and accessible administrative tools, leading to a loss of trust.
    • Labels: Operational Vulnerabilities, Insecure Practices, Poor Incident Response Handling
    • Details: Sectigo, GitHub, Ian Carroll’s Blog
  13. Ecommerce CA (EU) (2024):
    • Event: The Ecommerce CA in the EU faced browser distrust due to various security and compliance issues, leading to its removal from trusted lists.
    • Labels: Non-Compliance with Industry Standards, Operational Vulnerabilities, Poor Incident Response Handling
    • Details: EuroCommerce, Ecommerce Europe
  14. Entrust (2024):
    • Event: The Chrome Security Team announced the distrust of several Entrust roots due to a pattern of compliance failures and unmet improvement commitments.
    • Labels: Non-Compliance with Industry Standards, Poor Incident Response Handling
    • Details: Google Security Blog

Conclusion

The frequency and patterns of browser distrust events underscore the critical importance of preventive measures, transparency, and effective incident response.

Implementing robust security practices, conducting regular audits, and maintaining compliance with industry standards can significantly reduce the risk of such incidents. Transparency in operations and public disclosure of security issues foster an environment of accountability and trust.

An ounce of prevention is indeed worth more than a pound of cure. By focusing on proactive measures and cultivating a culture of continuous improvement, Certificate Authorities can better navigate the complexities of WebPKI. Effective crisis communication and incident response plans are essential for managing the fallout from security breaches and maintaining the trust of users and the broader web community.

By learning from past incidents and addressing their root causes, we can work towards a more secure and resilient internet, where trust in the WebPKI system is consistently upheld. The collective effort of CAs, browser vendors, and security researchers will play a pivotal role in achieving this goal, ensuring the integrity and reliability of our online ecosystem.

Why We Trust WebPKI Root Certificate Authorities

I’ve always likened the WebPKI governance system to our legal system, where congress sets the laws and the judiciary ensures compliance. Justice Breyer’s recent explanation on “rules” and “standards” in law, as discussed on the Advisory Opinions podcast, resonates well with how WebPKI operates in practice. In WebPKI, “rules” are explicitly defined through audits derived from CA/Browser Forum standards, incorporated into programs like WebTrust for CAs, and enforced through contractual obligations. These rules ensure aspire to consistent security and reliability across the web.

In contrast, “standards” in WebPKI encompass community norms, best practices, and recommendations specific to each root program. These standards are adaptable, evolving with technological advancements, security threats, and collective learning among CAs. They provide a framework that upholds the integrity of the Internet, ensuring that CAs remain transparent and live up to their promises while adhering to ecosystem norms, requirements, and best practices.

Similar to the Supreme Court, the WebPKI governance system consists of multiple ‘justices,’ with each root program acting akin to a Supreme Court justice. Their decisions on adherence or abstention from standards shape the outcomes that dictate the security and reliability of the Internet. Thus, the trust users place in WebPKI and its stewards is earned through a consistent, transparent, and accountable framework that ensures integrity across devices and browsers.

The Dual Role of Root Programs

1. As Trusted Stewards

While there’s no explicit voting process for root program management, users effectively select them through their choice of browsers or operating systems that incorporate these programs. This implicit trust in their ability to deliver on their security promises to users grants root programs the authority to establish and enforce rigorous standards for Root CAs. These standards determine inclusion in trust stores based on compliance assessments and judgments on value a CA would bring the web’s users, ensuring Root CAs uphold a consistent and transparent standard of integrity that users and web operators can rely on.

2. As Judicial Authorities

Root programs also serve a critical judicial function within the WebPKI landscape, akin to a Supreme Court. They interpret rules and standards, resolve ambiguities, settle community disputes, and establish precedents that guide CA operational practices. This role ensures equitable and consistent application of standards across all Root CAs.

Enforcing Compliance and Transparency

1. Maintaining Checks and Balances

Root programs enforce checks and balances through rigorous audits and monitoring, similar to judicial reviews. These processes assess Root CAs’ adherence to these
“rules” and “standards” and ensure accountability,with the goal of  preventing misuse of their authority on the web.

2. Promoting Transparency and Accountability

Root programs need to operate with a high degree of transparency, akin to open judicial proceedings. Decisions on trust or distrust of Root CAs need to be communicated clearly, accompanied by reasons for these decisions. This transparency ensures that all stakeholders, from end-users to website operators, understand and trust the framework protecting their privacy.

Case Study: The Ongoing Entrust Incident

A current discussion in the mozilla.dev.security.policy forum provides a compelling example of how the WebPKI governance framework operates in practice. This incident underscores the nuanced interaction between rules and standards, as well as the critical importance of transparency and accountability in maintaining trust.

The issue at hand involves Entrust’s performance concerns related to certificate misissuance. Such incidents are pivotal in demonstrating how root programs must navigate complex challenges and uphold rigorous standards of integrity within the web. The ongoing dialogue in the forum highlights the deliberative process undertaken by root programs to address such issues, ensuring that decisions are transparently communicated to stakeholders.

Cases like this illustrate the intricate balance that root programs must maintain between enforcing strict rules for security and adapting standards to accommodate technological advancements. The resolution of these incidents ultimately defines why users can trust root CAs, as it showcases the procedural approach, the transparency the process was designed for, and the goal of achieving accountability through this governance framework.

Why This All Matters

Understanding the dual role of root programs as regulatory bodies and judicial authorities underscores their essential role in maintaining trust. It emphasizes the significance of their decisions in shaping how privacy is delivered on the web, focusing on delivering a robust, evolving transparent, and accountable governance framework to guide these decisions.

The trust placed in WebPKI and its stewards are earned through a system that respects user choice and adheres to principles of fairness, ensuring that end-users can trust they are communicating with the correct website.

From Static to Dynamic: Adapting PKI for Cloud-Native Architectures

When it comes to workload and service credential management, a common misconception is that you can simply reuse your existing Certificate Authority (CA) and Certificate Lifecycle Management (CLM) infrastructure to achieve your desired end-state more quickly. Given that nearly every organization has client and server TLS certificates for devices and web servers and many have made significant investments in managing and issuing these certificates, this notion seems logical. However, the design constraints of these use cases differ significantly. The transition from static to dynamic environments, along with the associated credential governance problems, makes it clear that a purpose-built approach is necessary for managing workload and service credentials.

The Static Nature of Traditional CA and CLM Infrastructure

Traditional CA and CLM infrastructure primarily deals with TLS server certificates, where the identities involved are usually domain names or IP addresses. These identifiers are largely static, pre-defined, and managed out-of-band. As a result, the lifecycle of these certificates, which includes issuance, renewal, and revocation, follows a relatively predictable pattern. These certificates are usually issued for validity periods ranging from 90 days to a year, and the processes and infrastructure are designed to handle these constraints. However, use cases surrounding workload and service credentials have a totally different set of constraints, given the highly dynamic environment and various regulatory frameworks they must adhere to.

The Dynamic Nature of Workloads and Services

Workloads and services operate in a significantly different environment. These identities often come and go as services scale up or down or undergo updates. The sheer scale at which workload and service credentials are issued and managed can be hundreds of times greater than that of traditional client and server TLS certificate use cases.

Unlike the static domain names or IP addresses used in these traditional TLS certificates, the identifiers for workloads and services are dynamically assigned as workloads spin up and down or are updated. Workload and service credentials often have a much shorter lifespan compared to server certificates. They might need to be reissued every few minutes or hours, depending on the nature of the workload and the policies of the environment. This changes the expectations of the availability and scalability of the issuing infrastructure, leading to a need for credentials to be issued as close to the workload as possible to ensure issuance doesn’t become a source of downtime.

These workloads and services are also often deployed in clusters across various geographic locations to ensure fault tolerance and scalability. Unlike server certificates that can rely on a centralized CA, workload and service credentials often need a more distributed approach to minimize latency and meet performance requirements.

Another critical difference lies in the scope of trust. For TLS client and server certificates, the security domain is essentially defined by a root certificate or the corresponding root store. However, in the context of workloads and services, the goal is to achieve least privilege. This often necessitates a much more complicated PKI to deliver on this least-privileged design.

Additionally, federation is often needed in these use cases, and it is usually not possible to standardize on just X.509. For example, you might need to interoperate with a service that works based on OAuth. Even in such cases, you want to manage all of this within one common framework to have unified control and visibility regardless of the technology used for the deployment.

Purpose-Built Solutions for Workload and Service Credential Management

Given the unique challenges of managing workload and service credentials in dynamic, cloud-native environments, existing CA and CLM investments often fall short. Purpose-built solutions, such as those based on SPIFFE (Secure Production Identity Framework for Everyone), tend to offer a more effective approach by providing dynamic, attested identities tailored for these environments.

Dynamic Identifier Assignment

Unlike static identifiers managed by traditional CLM solutions, SPIFFE dynamically assigns identifiers as workloads spin up and down, aligning with the nature of modern cloud environments.

Decentralized Issuance

By issuing credentials as close to the workload as possible, SPIFFE-based solutions reduce latency and align issuance with the availability goals of these deployments, ensuring that credentials are issued and managed efficiently without becoming a source of downtime.

Granular Policy Enforcement

SPIFFE-based solutions enable enforcing fine-grained policy through its namespacing and security domain concepts. This enables organizations to define and enforce policies at a more granular level, ensuring that workloads only access the necessary resources.

Identity Governance and Federation

SPIFFE-based solutions tend to support extending existing identity governance programs to workload and service credentials while also facilitating seamless and secure access to external services across different trust domains.

Multifactor Authentication

These SPIFFE-based solutions also provide support for attestation, which can be thought of as multi-factor authentication for workloads. This attestation verifies the workload and services’ running state and environment, tying the credentials to those environments and helping minimize the risks of credential theft.

Integration with Other Systems and Protocols

These environments can seldom rely exclusively on just X.509, which is why SPIFFE supports both X.509 and JWT formats of credentials. This flexibility allows seamless integration with various systems and protocols within cloud deployments.

Conclusion

While it might be tempting to reuse existing CA and CLM infrastructure for workload and service credential management, this approach fails to address the unique challenges posed by dynamic, cloud-native environments. The ephemeral nature of workload identities, the high frequency of credential issuance, and the need for granular trust domains necessitate purpose-built solutions like those based on SPIFFE.

These purpose-built systems are designed to handle the dynamic nature of modern environments, providing strong, attested identities and ensuring that credential management aligns with the evolving needs of cloud-native workloads. Understanding these distinctions is crucial for developing an effective identity management strategy for modern workloads and services without having to go through a reset event down the line.

To learn more about effective machine identity management, check out SPIRL and see how they are leading the way.

Credential Management vs. Secret Management: Choosing the Right Approach

If we examine the contents of most secret management solutions, like HashiCorp Vault, we will find that we primarily store the logical equivalent of user IDs and passwords for services, workloads, and machines. Much like the old-school practice of writing passwords on post-it notes and sharing them, these secrets are then distributed to whatever entity needs access to the protected resource. As a result, it is no surprise that we encounter the same problems as password management: weak secrets, shared secrets, and stolen secrets. This is why we see machine credentials and keys being stolen in events like STORM 0558.

Secret Management: A Machine-Scale Password Management

Secret management can be seen as password management at a machine scale. These systems centralize secrets  (e.g., API keys, tokens, and passwords) to manage their sprawl and then make it easy to pass these shared secrets around various services and applications. However, just as simple passwords alone are seen as insufficient for user authentication—necessitating the adoption of multi-factor authentication (MFA) and migration to non-password authenticators like Passkeys—the same shortcomings apply to this legacy approach to managing these credentials.

The reality is that these secret managers were designed to address the symptoms rather than the root cause of secret sprawl. By storing and distributing these static secrets without treating them as credentials to be managed, you centralize the chaos but do not solve the fundamental problem: the need for robust, scalable machine identity management. Machines, services, and workloads, much like users credentials, require purpose-built approaches. Just as we have applied multi-factor authentication, identity governance and administration (IGA) and solutions like OKTA to user management, we need analogous systems for machines.

Credential Management for Machines

At machine and workload scale, credential management solutions need to be architected differently to handle the dynamic nature of modern cloud environments. This is where purpose-built solutions come into play. These solutions must address the unique challenges of credential management in dynamic, cloud-native environments. Unlike secret management, which often relies on static secrets, credential management provides dynamic, attested identities for workloads. For example, SPIFFE (Secure Production Identity Framework for Everyone) offers a strategy for managing dynamic identities, providing the foundation for standards-based, scalable, and robust workload identity management.

Higher-Level Concepts in Workload Credential Management:

  1. Provisioning and Deprovisioning:
    • Efficiently creating, updating, and removing machine identities as services start, stop, and scale.
  2. Role-Based Access Control (RBAC):
    • Assigning access rights based on predefined roles, ensuring that machines and services have appropriate permissions.
  3. Periodic Reviews:
    • Conducting regular reviews of machine access rights to ensure compliance and appropriateness.
  4. Policy Enforcement:
    • Defining and enforcing access control policies, ensuring that machines and services adhere to security guidelines.
  5. Audit and Reporting:
    • Generating comprehensive reports to demonstrate compliance with regulatory requirements and internal policies.
  6. Risk Analysis:
    • Identifying and mitigating risks associated with machine identities and their entitlements.
  7. Behavioral Analysis:
    • Monitoring machine behavior to detect anomalies and potential security threats.

Just as legacy approaches to secret management fail for workloads, machines, and services, these legacy IGA approaches will not work either. We need built-for-purpose solutions to adapt and not repeat the mistakes of the past. Those solutions will look like credential management systems that enable these functions as a byproduct of how they work, rather than bolting them onto an existing secret management or IGA process. 

This is why I have been an advisor to SPIRL since its founding. The founders, Evan and Eli, are why I knew I had to be involved. As the authors of the standard and having had real production-at-scale experience with SPIFFE, I knew they had what it took, which is why I have been working with them in a more formal capacity as of this last month. They are building the infrastructure that makes, as I like to say, the right thing the easy thing, ensuring their customers don’t repeat the mistakes of the past when it comes to workload and service identity.

Conclusion

Secret management tools were a necessary step in addressing the sprawl of machine secrets, and they will continue to be needed. However, as the complexity and scale of cloud environments grow, so does the need for more sophisticated solutions for workload and service credentials. Credential management systems provide the dynamic, robust framework required for modern machine identity management.

By adopting credential management as the primary approach and using secret management as the exception, organizations can achieve greater security, scalability, and operational efficiency in their cloud-native architectures. To learn more about effective machine identity management, check out SPIRL and see how they are leading the way.

ACME vs. SPIFFE: Choosing the Right One

In the world of certificate lifecycle management for workloads, two approaches often come into focus: ACME (Automated Certificate Management Environment) and SPIFFE (Secure Production Identity Framework for Everyone). While both can deliver certificates to a device or workload, they cater to different use cases. Let’s explore their core differences and why these matter, especially for dynamic workload deployments.

ACME: Proving Control Over Identifiers

ACME is primarily designed to prove control over an identifier, typically a domain name, and to deliver a certificate that demonstrates successful control verification.

  • Control Verification: ACME verifies control through various methods such as HTTP-01, DNS-01, TLS-ALPN, and External Account Binding.
  • Attestation (Optional): Attestation in ACME is secondary and optional, primarily enabling verifying if the key is well protected.
  • Pre-assigned Identifiers: ACME assumes that the identifier (like a domain name) is pre-assigned and focuses on validating control over this identifier.

This approach is particularly useful for scenarios where identifiers are static and pre-assigned, making it ideal for server authenticated TLS and applications that rely on domain names.

SPIFFE: Dynamic Assignment of Identifier Based on Attestation

SPIFFE, conversely, is designed for dynamic workloads, which frequently change as services scale or update. SPIFFE assigns identifiers to workloads dynamically, based on attestation.

  • Identifier Assignment: SPIFFE assigns an identifier to the subject (such as a workload) using attestation about the subject to construct this identifier.
  • Attestation: Unlike ACME, attestation is a core component in SPIFFE, enabling robust multi-factor authentication (MFA) for workloads based on what is running and where it is running.
  • Namespacing and Least Privilege: SPIFFE facilitates the namespacing of identifiers, building a foundation that enables authorization frameworks that promote least privilege, ensuring workloads only access necessary resources.
  • Minimal Security Domains: At the core of SPIFFE is the concept of security domains, which serve as trust boundaries between services, helping to minimize the attack surface.
  • JWT and X.509: SPIFFE supports SVIDs in both X.509 and JWT formats, enabling seamless integration with various systems and protocols within cloud deployments.

Importance in Dynamic Workload Deployments

The differences between ACME and SPIFFE are particularly significant in dynamic workload deployments:

  • Flexibility and Scalability: SPIFFE’s dynamic identifier assignment is highly suitable for environments where workloads are frequently spun up and down, such as in microservices architectures and Kubernetes clusters.
  • Role-Based Authentication: By making attestation a core component and promoting least privilege, SPIFFE ensures that each workload is authenticated and authorized precisely for its intended role.
  • Granular Policy Enforcement: SPIFFE’s namespacing and minimal security domain features enable fine-grained policy enforcement, enabling organizations to define and enforce policies at a more granular level, such as per-workload or per-service.
  • Identity Federation: SPIFFE supports identity federation enabling different trust domains to interoperate securely. This is particularly beneficial in multi-cloud and hybrid environments where different parts of an organization or third-parties.

Conclusion

While both ACME and SPIFFE are used for certificate management, they cater to different scenarios and can complement each other effectively. ACME is ideal for static environments where identifiers are pre-allocated, focusing on certificate lifecycle management for issuing, renewing, and revoking certificates. It excels in managing the lifecycle of certificates for web servers and other relatively static resources. 

On the other hand, SPIFFE excels in dynamic, high-scale environments, emphasizing credential lifecycle management with flexible and robust authentication and authorization through dynamic identifiers and attestation. SPIFFE is designed for modern, cloud-native architectures where workloads are ephemeral, and identifiers need to be dynamically issued and managed. 

By understanding these differences, you can leverage both ACME and SPIFFE to enhance your identity management strategy. Use ACME for managing certificates in static environments to ensure efficient lifecycle management. Deploy SPIFFE for dynamic environments to provide strong, attested identities for your workloads.

Effortless Certificate Lifecycle Management for S/MIME

In September 2023, the SMIME Baseline Requirements (BRs) officially became a requirement for Certificate Authorities (CAs) issuing S/MIME certificates (for more details, visit CA/Browser Forum S/MIME BRs).

The definition of these BRs served two main purposes. Firstly, they established a standard profile for CAs to follow when issuing S/MIME certificates. Secondly, they detailed the necessary steps for validating each certificate, ensuring a consistent level of validation was performed by each CA.

One of the new validation methods introduced permits mail server operators to verify a user’s control over their mailbox. Considering that these services have ownership and control over the email addresses, it seems only logical for them to be able to do domain control verification on behalf of their users since they could bypass any individual domain control challenge anyway. This approach resembles the HTTP-01 validation used in ACME (RFC 8555), where the server effectively ‘stands in’ for the user, just as a website does for its domain.

Another new validation method involves delegating the verification of email addresses through domain control, using any approved TLS domain control methods. Though all domain control methods are allowed for in TLS certificates as supported its easiest to think of the DNS-01 method in ACME here. Again the idea here is straightforward: if someone can modify a domain’s TXT record, they can also change MX records or other DNS settings. So, giving them this authority suggests they should reasonably be able to handle certificate issuance.

Note: If you have concerns about these two realities, it’s a good idea to take a few steps. First, ensure that you trust everyone who administers your DNS and make sure it is securely locked down. 

To control the issuance of S/MIME certificates and prevent unauthorized issuance, the Certification Authority Authorization (CAA) record can be used. Originally developed for TLS, its recently been enhanced to include S/MIME (Read more about CAA and S/MIME).

Here’s how you can create a CAA record for S/MIME: Suppose an organization, let’s call it ‘ExampleCo’, decides to permit only a specific CA, ‘ExampleCA’, to issue S/MIME certificates for its domain ‘example.com’. The CAA record in their DNS would look like this:

example.com. IN CAA 0 smimeemail "ExampleCA.com"

This configuration ensures that only ‘ExampleCA.com’ can issue S/MIME certificates for ‘example.com’, significantly bolstering the organization’s digital security.

If you wanted to stop any CA from issuing a S/MIME certificate you would create a record that looks like this: 

example.com. IN CAA 0 issuemail ";"

Another new concept introduced in this round of changes is a new concept called an account identifier in the latest CAA specification. This feature allows a CA to link the authorization to issue certificates to a specific account within their system. For instance:

example.com. IN CAA 0 issue "ca-example.com; account=12345"

This means that ‘ca-example.com’ can issue certificates for ‘example.com’, but only under the account number 12345.

This opens up interesting possibilities, such as delegating certificate management for S/MIME or CDNs to third parties. Imagine a scenario where a browser plugin, is produced and managed by a SaaS on behalf of the organization deploying S/MIME. This plug-in takes care of the initial enrollment, certificate lifecycle management, and S/MIME implementation acting as a sort of S/MIME CDN.

This new capability, merging third-party delegation with specific account control, was not feasible until now. It represents a new way for organizations to outsource the acquisition and management of S/MIME certificates, simplifying processes for both end-users and the organizations themselves.

To the best of my knowledge, no one is using this approach yet, and although there is no requirement yet to enforce CAA for SMIME it is in the works. Regardless the RFC has been standardized for a few months now but despite that, I bet that CAs that were issuing S/MIME certificates before this new CAA RFC was released are not respecting the CAA record yet even though they should be. If you are a security researcher and have spare time that’s probably a worthwhile area to poke around 😉

The Rise of Key Transparency and Its Potential Future in Email Security

Key Transparency has slowly become a crucial part of providing truly secure end-to-end encrypted messaging. Don’t believe me? The two largest providers of messaging services, Apple and Facebook (along with their WhatsApp service), have openly adopted it, and I am hopeful that Google, one of its early advocates, will follow suit.

At the same time, we are on the precipice of interoperable group messaging as Messaging Layer Security (MLS) was recently standardized. Its contributors included representatives from employees of the mentioned services and more, which suggests they may adopt it eventually. What does this have to do with Key Transparency? It acknowledges the need for secure, privacy-preserving key discovery through its inclusion of Key Transparency in its architecture.

It’s also noteworthy to see that Apple has agreed to support RCS, Android’s messaging protocol. While there is no public hint of this yet, it’s possible that since they have positioned themselves as privacy champions over the last decade frequently toting their end-to-end encryption, we may see them push for MLS to be adopted within RCS, which could net the world its first interoperable cross-network messaging with end-to-end encryption, and that would need a key discovery mechanism.

In that spirit, recently the Internet Engineering Task Force (IETF) has established a Working Group on Key Transparency, and based on the participation in that group, it seems likely we will see some standardization around how to do Key Transparency in the future.

What’s Next for Key Transparency Adoption Then?

I suspect the focus now shifts to S/MIME, a standard for public key encryption and signing of emails. Why? Well, over the last several years, the CA/Browser Forum adopted Baseline Requirements (BRs) for S/MIME to help facilitate uniform and interoperable S/MIME, and those became effective on September 1, 2023 – this means CAs that issue these certificates will need to conform to those new standards.

At the same time, both Google and Microsoft have made strides in their implementations of S/MIME for their webmail offerings.

In short, despite my reservations about S/MIME due to its inability to address certain security challenges (such as metadata confidentiality, etc), it looks like it’s witnessing a resurgence, particularly fueled by government contracts. But does it deliver on the promise today? In some narrow use cases like mail signing or closed ecosystem deployments of encrypted mail where all participants are part of the same deployment, it is probably fair to say yes.

With that said, mail is largely about interoperable communications, and for that to work with encrypted S/MIME, we will need to establish a standard way for organizations and end-users to discover the right keys to use with a recipient outside of their organization. This is where Key Transparency would fit in.

Key Transparency and S/MIME

Today, it is technically possible for two users to exchange certificates via S/MIME, enabling them to communicate through encrypted emails. However, the process is quite awkward and non-intuitive. How does it work? You either provide the certificate out of band to those in the mail exchange, and they add it to their contact, or some user agents automatically use the keys associated with S/MIME signatures from your contacts to populate the recipient’s keys.

This approach is not ideal for several reasons beyond usability. For instance, I regularly read emails across three devices, and the private keys used for signing may not be the same on each device. Since consistent signing across devices isn’t required, if I send you an email from my phone and then you send me an encrypted message that I try to open on my desktop, it won’t open.

Similarly, if I roll over my key to a new one because it was compromised or lost, we would need to go through this certificate distribution workflow again. While Key Transparency doesn’t solve all the S/MIME-related problems, it does provide a way to discover keys without the cumbersome process, and at the same time, it allows recipients to know all of my active and published certificates, not just the last one they saw.

One of the common naive solutions to this problem is to have a public directory of keys like what was used for PGP. However, such an approach often becomes a directory for spammers. Beyond that, you have the problem of discovering which directory to use with which certificate. The above Key Transparency implementations are all inspired by the CONIKS work, which has an answer to this through the use of a Verifiable Random Function (VRF). The use of the VRF in CONIKS keeps users’ email addresses (or other identifiers) private. When a user queries the directory for a key, the VRF is used to generate a unique, deterministic output for each input (i.e., the user’s email). This output is known only to the directory and the user, preserving privacy.

The generic identifier-based approach in Key Transparency means it can neatly address the issue of S/MIME certificate discovery. The question then becomes, how does the sender discover the Key Transparency server?

Key Transparency Service Discovery

The answer to that question probably involves DNS resource records (RRs). We use DNS every day to connect domain names with IP addresses. It also helps us find services linked to a domain. For instance, this is how your email server is located. DNS has a general tool, known as an SRV record, which is designed to find other services. This tool would work well for discovering the services we’re discussing.

_sm._keytransparency._https.example.com. 3600 IN SRV 10 5 443 sm-kt.example.com.

In this example, _sm the identifier is placed before _keytransparency. and _https shows that this SRV record is specifically for a Key Transparency service for Secure Messaging. This allows someone to ask DNS for a S/MIME-specific Key Transparency service. It also means we can add other types of identifiers later, allowing for the discovery of various KT services for the same domain.

Conclusion

While S/MIME faces many challenges, such as key roaming, message re-encrypting on key rollover, and cipher suite discoverability, before it becomes easy to use and widely adopted—not to mention whether major mail services will invest enough in this technology to make it work—there’s potential for a directory based on Key Transparency if they do.

Hopefully, the adoption of Key Transparency will happen if this investment in S/MIME continues, as it’s the only large-scale discovery service for user keys we’ve seen in practice. Unlike other alternatives, it’s both privacy-respecting and transparently verifiable, which are important qualities in today’s world. Only time will tell, though.

To err is human, to forgive is divine

Every service should strive to provide perfect availability, the reality is that it’s not possible to be perfect. Mistakes happen, it’s how you deal with them that is important.

Successfully dealing with availability issues requires planning, and when dealing with a client-server solution, it requires both parties to make improvements.

In the context of ACME today the large majority of clients have no failover or logic. This means if the enrollment fails due to connectivity issues to the specified CA, the CA has an outage, or the CA suspends operations for one reason or another the enrollment will fail.

This is a problem for any service protected with TLS, which is basically every service, that wants to have a highly available service. One of the ways services deal with this is to proactively acquire two certificates for every endpoint, this is viable, but I would argue this is not the right solution.

The right solution is to not rely on a single CA and instead failover between many CAs that are capable of servicing your needs. This way if any single CA fails you are fine, you can just keep chugging along.

That is not sufficient to address this risk though. Not all certificates have the same level of device trust. Sometimes clients make bad assumptions and trust specific CA hierarchies assuming these configurations are static, even when they are not.

To help mitigate this behavior clients should implement a round-robin or random CA selection logic so that subsequent renewals will hit different CAs. This will force clients to make sure they work with any of your chosen CAs. This way you won’t find yourself breaking apps that make those bad assumptions when your CA fails.

Caddy Server already implements both of these strategies as I understand it but every ACME client should be doing the same thing.

A Boy Scout is always prepared

My father always says it’s not a problem to make a mistake, what is important is how you deal with it. 

The same thing is true when it comes to WebPKI CAs, broadly the incident response process used in this ecosystem could be categorized as Blameless Post Mortem. The focus is on what happened, what contributed to it, and what was done to address the issue and not on fault.

A few years ago a number of large CAs had to do millions of revocations, in all of the cases I am thinking the required deadline for those revocations was 5 days. Revoking a large number of certificates that are not directly obvious but if you’re a CA who has done any moderate level of planning it’s something you should be up for.

The thing is that doing so can cause harm, for example, the issue that necessitates the revocation might be incredibly subtle and not security-impacting. Nonetheless, the requirement is what the requirement is — the certificate needs to be revoked.

The question then becomes how can you meet that timeline objective without creating an unnecessary outage for customers? If you defy the rules you risk being distrusted, if you act blindly you could take down your customer’s services.

The question then becomes how do you contact millions of customers and give them enough time to replace their certificates without an outage with these constraints?

Like most scale problems the answer is automation, in the context of certificate lifecycle management that means an extension to the ACME protocol. To that end, there is now a draft for something called “ACME Renewal Information” which when implemented by CAs and ACME clients will enable a CA to signal that there may be a need to replace their certificate earlier than expected.

The basic idea with this proposal is that the CA will make available hints on when it would like certificates to be updated and the client will periodically check this information and use it to guide its renewal behavior.

To be clear, this is just a hint, a CA might be providing this hint just to smooth out the load, but there is no mandate to rely on the hint. With that said I do hope that all major ACME clients implement this standard and respect it by default because it will make the WebPKI a lot less fragile.