
How is ACME different than XCEP/WSTEP anyway?

If you read my blog, there is a reasonable chance that you are familiar with RFC 8555, the standard for the Automatic Certificate Management Environment (ACME). Even though ACME is a relatively young protocol, it is already used by the majority of websites on the internet for certificate lifecycle management.

While I won't go into a lot of detail, for this post to make sense you have to understand a couple of things about the ACME protocol.

The first is that it works on the concept of dynamic "account" registration. By that I mean requestors can, in real time, request that an "account" be created for them. This account is represented by a public key pair that the ACME service will use to persist metadata about the requestor. The ACME service can deny this request for any reason it likes, but commonly in the Web PKI, as long as the request is well-formed, it is accepted.
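To make the account concept a little more concrete, here is a minimal sketch (not from the post itself; the directory URL and contact address are just placeholders) of registering an ACME account using Go's golang.org/x/crypto/acme package. The account is nothing more than a key pair the client generates plus the metadata the service associates with it.

```go
package main

import (
	"context"
	"crypto/ecdsa"
	"crypto/elliptic"
	"crypto/rand"
	"log"

	"golang.org/x/crypto/acme"
)

func main() {
	// The "account" is represented by a key pair generated by the requestor.
	key, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	if err != nil {
		log.Fatal(err)
	}

	client := &acme.Client{
		Key: key,
		// Placeholder directory; Let's Encrypt's staging endpoint is a common test target.
		DirectoryURL: "https://acme-staging-v02.api.letsencrypt.org/directory",
	}

	// Dynamically register the account. The service persists metadata
	// (e.g. contact addresses) keyed to the account's public key and may
	// accept or reject the registration as it sees fit.
	acct := &acme.Account{Contact: []string{"mailto:admin@example.com"}}
	acct, err = client.Register(context.Background(), acct, acme.AcceptTOS)
	if err != nil {
		log.Fatal(err)
	}
	log.Printf("account URL: %s", acct.URI)
}
```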

The next thing you need to understand is that ACME has the concept of "challenges," which can be used to communicate conditions that must be met before a certificate is issued. For example, an ACME service may request that the account holder demonstrate they are authorized to get a certificate for a given domain name by placing a specific value in DNS at a well-known location. Since only a DNS administrator could perform that action, the ACME service can have confidence that the requestor controls the name it has requested a certificate for.
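As an illustration of that flow, here is a hedged sketch of answering a dns-01 challenge with the same Go package. It assumes the client and ctx from the sketch above; the authorization URL and the DNS publishing step are placeholders for whatever your CA and DNS provider give you.

```go
// Fetch the authorization the ACME server created for the requested name.
authz, err := client.GetAuthorization(ctx, authzURL)
if err != nil {
	log.Fatal(err)
}
for _, chal := range authz.Challenges {
	if chal.Type != "dns-01" {
		continue
	}
	// TXT record value: base64url(SHA-256(token + "." + account key thumbprint)).
	txt, err := client.DNS01ChallengeRecord(chal.Token)
	if err != nil {
		log.Fatal(err)
	}
	// Publish the value via your DNS provider's API (not shown), then tell
	// the server the condition has been met so it can go verify it.
	log.Printf("publish TXT %q at _acme-challenge.%s", txt, authz.Identifier.Value)
	if _, err := client.Accept(ctx, chal); err != nil {
		log.Fatal(err)
	}
}
```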

And finally, there is nothing in the ACME protocol that limits its use to just web server certificates. It is a general framework that can be used to acquire and manage certificates of any type. For example, Cisco is using ACME in their WebEx offering to facilitate the acquisition of what are essentially email certificates via OIDC authentication, as a way to authenticate chat members.

But there is another very popular protocol, well, a pair of protocols, that fewer people know about: XCEP and WSTEP. These protocols are used by Windows machines both to determine what kind of certificates a machine or user should enroll for and to enable the enrollment for those certificates.

Similar to ACME, these protocols also support, although in a more rigid way, the ability for the issuer to challenge the client for additional information necessary to get a certificate of a particular type. For example, you can configure a certificate type (known as a template) to require that the requestor provide a TPM-backed cryptographic attestation used to prove the machine belongs to the organization operating the certificate authority.

There are differences, though. The first is the concept of a template, which enables XCEP/WSTEP to have one URL endpoint issue many types of certificates. This is very important within an enterprise, where certificates are used for many different scenarios.

Another difference is that XCEP/WSTEP presume the authorization of the client happened out of band, before the client requested the certificate. The dynamic approach to challenges adopted by ACME allows it to tackle this problem in-band or to rely on out-of-band authorization. It supports the out-of-band case through External Account Binding, which allows the requestor to use an API key gathered out of band to prove, at account creation, that the account key is associated with some pre-enrolled user.
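A rough sketch of what that looks like with the same Go package, assuming the client and ctx from the earlier sketch; recent versions of the package expose an ExternalAccountBinding field on the account, and the key ID and HMAC key below are placeholders for whatever the CA issued you out of band.

```go
// The HMAC key is typically handed out base64url-encoded alongside a key ID.
eabKey, err := base64.RawURLEncoding.DecodeString("hmac-key-from-the-ca")
if err != nil {
	log.Fatal(err)
}
acct := &acme.Account{
	Contact: []string{"mailto:admin@example.com"},
	ExternalAccountBinding: &acme.ExternalAccountBinding{
		KID: "key-id-from-the-ca", // identifies the pre-enrolled user
		Key: eabKey,               // used to sign the binding at newAccount time
	},
}
// The binding proves the new account key belongs to that pre-enrolled user.
if _, err := client.Register(ctx, acct, acme.AcceptTOS); err != nil {
	log.Fatal(err)
}
```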

And finally, ACME has a clear model for extensibility built into it. What this means is that one can easily extend it with additional capabilities. The most fundamental part of this is the Directory resource which lists all of the APIs supported by this ACME instance. One could use this, for example, to add a “Templates” API that would allow an ACME client to request specific types of certificates from the ACME endpoint.
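Concretely, the directory is just a JSON object mapping capability names to endpoint URLs, so a client can discover what a given server supports, and a hypothetical extension such as a "templates" endpoint would simply appear as another key. A small sketch using only Go's standard library (the URL is a placeholder):

```go
// Fetch and dump an ACME directory.
resp, err := http.Get("https://ca.example.test/acme/directory")
if err != nil {
	log.Fatal(err)
}
defer resp.Body.Close()

var dir map[string]interface{}
if err := json.NewDecoder(resp.Body).Decode(&dir); err != nil {
	log.Fatal(err)
}
for name, endpoint := range dir {
	// Typical keys: newNonce, newAccount, newOrder, revokeCert, keyChange, meta.
	log.Printf("%s -> %v", name, endpoint)
}
```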

Similarly, the concept of the challenge allows the server to demand the client do any number of things before the certificate is issued, so the idea of adding a TPM challenge, for example, is trivial within this framework.

In short, ACME, contrary to popular belief, is not a protocol for getting and managing website certificates; it is a framework for getting and managing any certificate. More importantly, it is extensible in such a way that, with just a few minor additions, it would be a proper superset of all the capabilities within the Windows enrollment protocol suite.

Why is this important? That's easy! When I talk to anyone who is using certificates at any reasonable scale, their concerns almost instantly turn to the complexity of managing the lifecycle of those certificates across the various products and services that use them.

When we look at this complexity, most of it arises from the use of a mish-mash of lifecycle management solutions that, when viewed in isolation, seemed sufficient but, when looked at holistically, were woefully insufficient.

If, as an industry, we move these legacy systems to a single protocol, so that all certificates, whether for public or private PKI and whether they represent users, machines, or workloads, are managed the same way, we will have a reliable substrate that we can use to authenticate and authorize with agility.

The next decade of Public Key Infrastructure…

Background

Before we talk about the future, we need to make sure we have a decent understanding of the past. X.509-based Public Key Infrastructure was originally created in the late 80s with a focus on enterprise and government use cases. These use cases were largely for private systems; it was not until a decade later that this technology was applied to the internet at large.

Since the standards for enrollment and lifecycle management at the time were building blocks rather than solutions and were designed for government and enterprise use cases rather than the internet, the Web PKI, as it became known, relied largely on manual certificate lifecycle management and a mix of proprietary automation solutions.

While the use of PKI in the enterprise continued, primarily thanks to Microsoft AD/CS and its automatic certificate lifecycle management (I worked on this project), the Web PKI grew in a far more visible way. This was largely a result of the fact that these certificates had to be acquired manually, which led to the creation of an industry focused on the sales and marketing of individual certificates.

The actors in this system had no incentive to push automation, as it would accelerate the commoditization of their products. The reality was that these organizations had also lost much of their technical chops as they became sales and marketing organizations, and could no longer deliver the technology needed to build this automation anyway.

This changed in 2016 when the Internet Security Research Group, an organization I am involved in, launched Let's Encrypt. This was an organization of technologists looking to accelerate the adoption of TLS on the web, and as such it started with a focus on automation, as it was clear that without automation the growth of HTTPS adoption would continue to be anemic. What many don't know is that when Let's Encrypt launched, HTTPS adoption was at about 40% and year-over-year growth was hovering around 2-3%, about the rate of growth of the internet itself, and it was not accelerating.

Beyond that, TLS-related outages were appearing in the press more frequently, even for large organizations. Post-mortems would continuously identify the same root cause: a manual process did not get executed, or was executed incorrectly.

The launch of Let's Encrypt gave the internet its first CA with a standards-based certificate enrollment protocol (ACME). This, combined with the short-lived nature of the certificates it issued, meant that those who adopted it had to use automation for their services to reliably offer TLS. This enabled products to make TLS work reliably and by default; a great example of this is the Caddy web server. This quickly took the TLS adoption growth rate to around 10% year over year, and now we are hovering around 90%+ HTTPS on the internet.

While this was going on, the concept of microservices merged with containers, which led to container orchestration, which later adopted the concept of mesh networking. This mesh networking was often based on mutual TLS (mTLS), the most visible manifestation of that being SPIFFE, the solution used by Kubernetes.

At the same time, we saw networks becoming more composable, pushing authentication and authorization decisions out to the edge of the network. While this pattern has had several names over the years, we now call it Zero Trust, and a visible example of it today is BeyondCorp from Google. These solutions again are commonly implemented on top of mutual TLS (mTLS).

We now also see the concept of Secure Access Service Edge (SASE), or Zero-Trust Edge, gaining speed, which extends this same pattern to lower-level network definition. It too is commonly implemented on top of mTLS.

The reality is that the Web PKI CAs were so focused on sales and marketing that they missed almost all of these trends. You can see them now paying lip service to this by talking about DevOps in their sales and marketing, but the reality is that the solutions they offer in this area are both too late and too little. This is why cloud technology providers like HashiCorp and cloud providers like Amazon and Google (I am involved in this also) had to step in and provide their own offerings.

We now see that Web PKI CAs are starting to more seriously embrace automation for the public PKI use cases; for example, most of the major CAs now offer ACME support to some degree, and they have generally begun to invest more seriously in certificate lifecycle management for other use cases.

That being said, many of these CAs are making the same mistakes they have made in the past. Instead of working together to ensure standards and software exist to make lifecycle management work seamlessly across vendors, most are investing in proprietary solutions that only solve portions of the problems at hand.

What’s next?

The usage of certificates and TLS has expanded massively in the last decade and there is no clear alternative to replace it, so I do not expect the adoption of TLS to wane anytime soon.

What I do think is going to happen is a unification of certificate lifecycle management for private PKI use cases and public PKI use cases. Mesh networking, Zero Trust, and Zero-Trust Edge are going to drive this unification.

This will manifest as the use of ACME for these private PKI use cases. In fact, this has already started; just take a look at cert-manager and Smallstep Certificates as small examples of this trend.
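As a small, hedged sketch of what this looks like in practice, here is Go's autocert (golang.org/x/crypto/acme/autocert) pointed at a private ACME directory; the directory URL and hostname are placeholders, and step-ca is one example of a private CA that exposes such an endpoint.

```go
m := &autocert.Manager{
	Prompt:     autocert.AcceptTOS,
	Cache:      autocert.DirCache("certs"),
	HostPolicy: autocert.HostWhitelist("svc.internal.example.test"),
	// Point the ACME client at the private CA instead of a public one.
	Client: &acme.Client{
		DirectoryURL: "https://ca.internal.example.test/acme/acme/directory",
	},
}
srv := &http.Server{
	Addr:      ":8443",
	TLSConfig: m.TLSConfig(), // certificates are obtained and renewed automatically
}
log.Fatal(srv.ListenAndServeTLS("", ""))
```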

This, combined with the ease of deploying and managing private CAs via the new generation of Cloud CA offerings, will result in more private PKIs being deployed, and the availability problems stemming from issues like certificate expiration and scalability will largely disappear.

We will also see extensions to the ACME protocol that make it easier to leverage existing trust relationships, which will simplify the issuance process for private use cases, as well as ways to leverage hardware-backed device identity and key protection to make the use of these certificate-based credentials even more secure.

As is always the case, unification onto common protocols will enable interoperability across solutions, improve reliability, and as a result accelerate the adoption of these patterns across many products and problems.

It will also mean that over time the legacy certificate enrollment protocols such as SCEP, WSTEP/XCEP, CMC, EST, and others will become less common.

Once this transition happens, it will lead us to a world where we can apply policy based on subjects, resources, claims, and context across L3 to L7, which will transform the way we think about access control and security segmentation. It will give us both more control over, and visibility into, who has access to what.

What does this mean for the Web PKI?

First I should say that the Web PKI is not going anywhere; with that said, it is evolving.

Beyond the increase in automation and shorter certificate validities, over the next decade we will see several changes. One of the more visible will be the move to using dedicated PKI hierarchies for different use cases. For example, we will ultimately see server authentication, client authentication, and document signing move to their own hierarchies. This move will better reflect the intent of the Web PKI and prevent these use cases from holding the Web PKI's evolution back.

This change will also minimize the browsers' influence on those other scenarios. It will do this at the expense of greater ecosystem complexity around root distribution, but the net positive will be felt regardless. I do think this shift will give the European CAs an advantage in that they can rely on the EUTL for distribution, and since many non-web user agents simply do not want to manage a root program of their own, the EUTL has the potential to be adopted more. I will add that it is my hope these user agents instead adopt solution-specific root programs rather than relying on a generic one not built for purpose.

The Web PKI CAs that have not re-built their engineering chops are going to fall further behind the innovation curve. Their shift from engineering companies to sales and marketing companies resulted in them missing the move to the cloud and those companies that are going through digital transformation via the adoption of SaaS, PaaS, and modern cloud infrastructures are unlikely to start that journey by engaging with a traditional Web PKI CA.

To address this reality the Web PKI CAs will need to re-invent themselves into product companies focusing on solving business problems rather than selling certificates that can be used to solve business problems. This will mean, for example, directly offering identity verification services (not selling certificates that contain assertions of identity), providing complete solutions for document signing rather than certificates one can use to sign a document or turnkey solutions for certificate and key lifecycle management for enterprise wireless and other related use cases.

This will all lead to workloads that were once on the Web PKI by happenstance being moved to dedicated workload/ecosystem-specific private PKIs. The upside of this is that the certificates used by these infrastructures will have the opportunity to aggressively profile X.509 vs being forced to carry the two decades of cruft surrounding it like they are today.

The Web PKI CAs will have an opportunity to outsource the root certificate and key management for these use cases and possibly subcontract out CA management for the issuing CAs but many of these “issuing CA” use cases are likely to go to the cloud providers since that is where the workloads will be anyway.

Due to the ongoing balkanization of the internet that is happening through increased regional regulation, we will see smaller CAs get acquired, mainly for their market presence to let the larger providers play more effectively in those markets.

At the same time, new PKI ecosystems like those used for STIR/SHAKEN and various PKIs to support IoT deployments will pop up and as the patterns used by them are found to be inexpensive, effective, and easily deployable they will become more common.

We will also see the lifecycle management for both public and private PKI unify on top of the ACME enrollment protocol, and through that a new generation of device management platforms will be built around certificate-based device identity anchored in keys bound to hardware, where the corresponding certificates contain metadata about the device they are bound to.

This will lay the groundwork for improved network authentication within the enterprise using protocols like EAP-TTLS and EAP-TLS, and enable Zero-Trust and Zero-Trust Edge architectures to be deployed more easily, which will, in turn, blur the lines further between what is on-premises and what is in the cloud.

This normalization of the device identity concepts we use across solutions, and the use of common protocols for credential lifecycle management, will result in better key hygiene and simpler deployment across all of these use cases.

Accountability and Transparency in Modern Systems

Over the last several decades we have seen the rate of technological innovation greatly accelerate. A key enabler of that acceleration has been the move to cloud computing which has made it possible for hardware, software, and services to be shared. This significantly reduced both the capital and time necessary to adopt and operate the infrastructure and services built on these platforms.

This migration started by enabling existing software to run using dedicated computers and networks owned and operated by someone else. As these computers got faster and the tools to share the physical hardware and networks were built, the cost of technological innovation reduced significantly. This is what democratized modern startup entrepreneurship as it made it cost-effective for individuals and small businesses to gain access to the resources once only available to the largest companies.

This flipped technological innovation on its head. It used to be that government and big businesses were the exclusive sources of technological innovation because they were the ones who could afford to buy technology. The lowering of the cost of innovation is what gave us the consumer startups we have today. This drew the attention of large companies to this emerging market and led to the creation of the modern smartphone which was fundamental to creating the market opportunity we see in consumer startups. This was a scale opportunity that was fundamentally different than the prior government and enterprise models of innovation.

As enterprises saw the rate of innovation and agility this new model provided, it became clear that they too needed to embrace this model in their businesses. It is this reality that led to the creation of Salesforce, the first Software as a Service, and AWS, the first modern Cloud Service Provider to market. It was these offerings that gave us Software as a Service (SaaS), Platform as a Service (PaaS), and what we think of as modern cloud infrastructure.

At first, these enterprises only moved greenfield or very isolated projects to the cloud, but as the benefits of the new model became irrefutable and the capabilities of these offerings were enriched in ways that were impractical to replicate in their own environments, they started moving more business-critical offerings. We can see this trend continuing today; a recent survey found that 55% of IT organizations are now looking at ways to reduce their on-premises spending. This will lead to many legacy systems being replaced with more modern, scalable, agile, and secure solutions.

That same survey found that digital transformation and security are the two biggest reasons for this shift. This is no surprise when we look at how capital efficient modern businesses are relative to those based on legacy IT and manual processes, or how vulnerable legacy IT systems are to modern attacks. 

This does beg the question: what is next? I believe that two trends are emerging. The first is the democratization of compliance for modern systems, and the second is the shift in expectations of what it means to be "secure".

If we look at the first trend, the democratization of compliance, we see the internet becoming balkanized through regulation and governments seeking to get more control over what people do on the internet. Increased regulation makes it significantly harder for new entrants to compete, which in turn helps entrench the incumbents who can often eat the engineering and compliance costs associated with the regulations. When you think about this in the context of the global economy in which the internet exists, an economy made up of 195 independent sovereign countries, the compliance burden becomes untenable.

Modern Cloud Service Providers can make a significant dent in this by making it possible for those who build on them to meet many of these compliance obligations as a byproduct of adopting their platforms.

In the near term, this will likely be focused on the production of the artifacts and audit reports that are needed to meet an organization’s current compliance requirements but if we project out, it will surely evolve to include services for legal identity verification, content moderation, and other areas of regulatory oversight. A decade from now I believe we will see systems being built on these platforms in such a way that they will be continually compliant producing the artifacts necessary to pass audit as a natural byproduct of the way they work. 

This will in turn make it easier to demonstrate compliance and create new opportunities such as auditors continually monitoring an organization for its compliance with guidelines rather than just doing annual point-in-time assessments as is done today.

This has also led to companies like Coalition building offerings that let customers augment existing systems with the artifacts needed to demonstrate that security best practices are being met, so that insurance companies can offer more affordable risk-based insurance policies.

As we look at the second trend, the redefinition of what it means to be secure, we can see consumers becoming more aware of security risks and as a result, their expectations around the sovereignty of their data and the confidentiality of their information evolving. 

One response to this realization is the idea of decentralization. The thesis here, arguably, is that there can be no sovereignty as long as there is centralization. In practice, most of these decentralized systems are in fact quite centralized. While there are many examples of this, some of the more visible are the DAO hard fork, which was done to recover stolen funds, and the simple fact that 65% of Bitcoin mining happens in China. Additionally, for the most part, the properties that enable sovereignty typically come from the use of verifiable data structures and cryptography, not from decentralization. That is not to say these systems do not have a place; I would argue that their success and durability so far at least suggest there is "a there, there," but I would also say that, at least currently, they do not yet live up to their full promise.

Another response is the consumer adoption of end-to-end encryption in messaging applications (even iMessage is end-to-end encrypted!) and, by extension, an interest in the verifiability of the systems that implement these schemes.

The best example here is probably Signal; they spent time designing security and privacy into their messaging protocol and systems from the beginning, modeling the design on modern threats and decades of learning about what does, and does not, work. This approach led to the protocol they defined being adopted by many of their competitors, including WhatsApp, Facebook Messenger, Skype, and Google Allo.

Signal is also a great example of the verifiability property; in particular, the work they have done with Contact Discovery is exciting. What they have done with this feature is, first, to minimize what information they need to deliver the capability, in the hope of limiting future abuse. Second, they leveraged technologies like SGX, an example of Confidential Computing, to demonstrate what they are doing with the information they do collect. This introduces transparency and accountability, both of which are important ingredients in earning trust.

The use of hardware security as a key component of the security boundary has already found its way from consumer phones, laptops, and tablets to the cloud. For example, Google Cloud's Shielded VMs and Azure Trusted Launch use hardware to provide verifiable integrity for VM instances, making it possible to detect VMs compromised by boot- or kernel-level malware or rootkits, similar to how Apple does with the iPhone. We also now see AMD SEV and SGX getting broader deployment in the larger Cloud Service Providers (I will be the first to admit these technologies have room to grow if they are to live up to their promises, but they are promising nonetheless).

With this foundation, the industry is starting to look at how they can bring similar levels of transparency and accountability into applications and ecosystems too. One of the projects that have demonstrated that doing this can have a big impact is Certificate Transparency. As a result of the investments in deploying Certificate Transparency, the internet is now materially more secure than it was before and this is a direct result of introducing accountability into an opaque ecosystem based on blind trust.

Another example in this space is the Golang Checksum Database where verifiable data-structures like Merkle Trees are being used to introduce accountability into the software supply chain as a means to mitigate risks for those who rely on the Golang ecosystem. 
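To give a feel for why these structures help, here is a deliberately simplified sketch of the core idea: a single Merkle root commits to an ordered set of records, so any tampering with a record changes the root that clients and auditors compare. Real logs such as Certificate Transparency and the Go checksum database use a more careful RFC 6962-style construction with inclusion and consistency proofs; this is only an illustration.

```go
package main

import (
	"crypto/sha256"
	"fmt"
)

// merkleRoot commits to an ordered list of records with a single hash.
// Simplified: odd nodes are promoted as-is, unlike the RFC 6962 tree.
func merkleRoot(leaves [][]byte) [32]byte {
	if len(leaves) == 0 {
		return sha256.Sum256(nil)
	}
	level := make([][32]byte, len(leaves))
	for i, leaf := range leaves {
		level[i] = sha256.Sum256(leaf)
	}
	for len(level) > 1 {
		var next [][32]byte
		for i := 0; i < len(level); i += 2 {
			if i+1 == len(level) {
				next = append(next, level[i])
				continue
			}
			pair := append(append([]byte{}, level[i][:]...), level[i+1][:]...)
			next = append(next, sha256.Sum256(pair))
		}
		level = next
	}
	return level[0]
}

func main() {
	records := [][]byte{[]byte("module@v1.0.0 h1:abc"), []byte("module@v1.0.1 h1:def")}
	fmt.Printf("root: %x\n", merkleRoot(records))
}
```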

For many problems in the security space, you can solve from one of two philosophical bases. You can either create privileged systems only visible to a few that you hope aren’t corruptible or you can build democratizing transparency into the system as a check on corruption.

Dino A. Dai Zovi

While the earlier examples are using combinations of hardware, cryptography, and verifiable data-structures to deliver on these properties, other examples take a more humble approach. For example, Google Cloud’s Access Transparency uses privilege separation, audit logs, and workflows to provide the fundamental ability to track business justifications for access to systems and data. The existence of these systems is further validation that the trend of verifiability is emerging.

So what should you take away from this post? I suppose there are four key messages:

  1. The definition of security in modern Cloud services is continuing to be influenced by the consumer space which is leading to the concepts of verifiability, accountability, data sovereignty, and confidentiality becoming table stakes.
  2. Globalization and regulation are going to accelerate the adoption of these technologies and patterns as they will ultimately become necessary to meet regulatory expectations.
  3. Increasingly verifiable data structures, cryptography, and hardware security capabilities are being used to make all of this possible.
  4. These trends will lead to the democratization of compliance to the many regulatory schemes that exist in the world.

I believe when we look back, these trends will have significantly changed the way we build systems and a new generation of businesses will emerge enabling these shifts to take place.