Tag Archives: WebPKI

WebPKI, TLS, cross-signs, device compatibility, and TLS record size.

Both the Chrome and Mozilla root program have signaled the intent to substantially shorten the time we rely on roots in the WebPKI. I believe this to be a good objective but I am struggling to get my head around the viability and implications of the change.

In this post, I wanted to capture my inner dialog as I try to do just that 🙂

I figure a good place to start when exploring this direction is why this change is a good idea. The simplest answer to this question is probably that root certificates are really just “key bags” — by that I mean they are just a convenient way to distribute asymmetric keys to clients. 

Asymmetric keys have an effective lifetime, this means the security properties they offer start to decay from the moment the key is created. This is because the longer the key exists the more time an attacker has had to guess the corresponding private key from its public key, or worse compromise the private key in some way. For this reason, there are numerous sources of guidance on usage periods for asymmetric keys, for example, NIST guidance from 2020 says that a 2048 key is usable between 2019 and 2030.

There is also the question of ecosystem agility to consider. Alexander Pope once said To Err is Human, to forgive divine. This may actually be the most important quote when looking at computer security. After all, security practitioners know that all systems will have security breaches, this is why mature organizations focus so much on detection and response. I believe the same thing is needed for ecosystem management, the corollary, in this case, is how quickly you can respond to unexpected changes in the ecosystem.

Basically having short-lived root keys is good for security and helps ensure the ecosystem does not become calcified and overly dependent on very old keys continuing to exist by forcing them to change regularly.

With that, all said, if it were easy we would already be doing it! So what are the challenges that make it hard to get to this ideal? One of the hairiest is the topic of the Internet of Things (IOT). Though I would be the first to admit there is no one size fits all answer here, broadly these devices should not use the WebPKI.

In 2022, the market for the Internet of Things is expected to grow 18% to 14.4 billion active connections. It is expected that by 2025, as supply constraints ease and growth further accelerates, there will be approximately 27 billion connected IoT devices. 

IOT-Analytics.com

A big reason for this is that these devices last a very long time compared to their mobile phone and desktop counterparts. They also, unfortunately, tend not to have managed root stores and if they do, they do it as part of firmware updates and have limited attach rates of these updates. To make things worse these devices take crude simplifying assumptions around what cryptographic algorithms are supported and what root CAs will be used by servers in the future.

But for most IoT applications, like those enabling the smart city, the device life cycle is 10 to 20 years or more. For instance, it doesn’t make sense to replace the wireless module in smart streetlights every few years.

Ingenu

A good topical example of the consequences here might be the payment industry making a decision in the 90s to adopt a single WebPKI CA  (a VeriSign-owned and operated root certificate) in payment terminals without having a strategy for updating the device’s root stores.  These terminals communicated to web servers that were processing payments. In short, the servers had to use certificates from this one WebPKI Root CA or no payment terminals could reach them.

This design decision was made during a time when SHA-1 was considered “secure”. The thing is cryptographic attacks are always evolving, and since 2005 SHA-1 has not been considered secure. As a result, browsers moved to prevent the use of this algorithm. 

This meant that one of two things would happen, either browsers would break payment terminals by preventing WebPKI CAs from using this algorithm, or payment terminals’ lack of an update strategy would put the internet at risk by slowing the migration from SHA-1. Unfortunately, the answer was the latter.

In short, when you have no update story for roots and you rely on the WebPKI you are essentially putting both your product and the internet at risk. In this case, while a root and firmware update story should have existed regardless, the payment industry should have had a dedicated private PKI for these use cases.

OK, so what does this have to do with moving to a shorter root validity period in Browser root programs? Well, when a TLS server chooses what certificate to present to a client it doesn’t know much more than the IP address of the client. This means it can’t selectively choose a TLS certificate based on it being an IOT device, a phone running an old version of a browser, or a desktop browser.

This is where the concept of “root ubiquity” and what those in the industry sometimes called “root baking”. Some root programs process requests for new root inclusion very quickly, maybe within a year your root can become a member of their program but that isn’t enough.  You need the very large majority of the devices that rely on that root program to pick up the new root before the devices will trust it.

In the case of Microsoft Windows, the distribution of a root certificate happens via a feature called AutoRoot Update. The uptake for consumer devices of a new root is very high as this is seldom turned off and the URL that is used to serve this update is seldom blocked. Enterprises and data centers are another matter altogether though. While I do not have any data on what the split is, anecdotally I can say it’s still hit or miss if a root has been distributed in this system as a result of this.

Chrome on the other hand uses a configuration management system for all Chrome that also happens to control what roots are trusted. Again while I have no data I can point out anecdotally it appears this works very well and since chrome updates all browsers automatically they tend to have the latest binaries and settings.

Safari on the other hand manages the roots in the firmware/operating system update. This means if a user does not update their devices’ software the roots will never be present even if you are a member of the root program.

These are just a few examples of some of the root programs CAs have to worry about, and the problems they have to consider.  

What this means is the slowest root program in the WebPKI to accept and distribute roots holds back the adoption of new CAs.  

As an aside, It is worth noting that around 59% of all browser traffic (not including IoT devices) is mobile. This includes a lot of developing country traffic where phones are kept longer and replacement models are often still old models with older software and root stores that are not updated. This is before you get to TVs, printers, medical devices, etc that seldom do forced firmware updates and never manage roots independently from firmware.

As a result of the above, these days I generally tell people that it takes 5-7 years post inclusion to get sufficient ubiquity in a root to be able to rely on it for a popular service as a result.

A CA will typically try to address this problem by acquiring what is known as a cross-sign (see Unobtainium), there are two recent examples of this I can point to. One is the cross-sign for Let’s Encrypt and the other is the cross-sign for Google Trust Services. The problem is most CAs don’t want to help other CAs with cross-signs for obvious reasons. Beyond that increasingly there are fewer and fewer of the “old keys” still in use on the web that is baked into the oldest devices meaning there are not a lot of choices for those cross signs even if you can get one.

Let’s assume for the purpose of this (now long) post that you are able to get a cross sign either from a competitor or because you happen to have custody of older key material that was grandfathered into the root program that has the needed ubiquity.

In the simplest case, you get to a situation where you are asking those that use your certificates to include the two CA certificates in their TLS bundle.  At a minimum, we start to pay a performance penalty for doing this.

As an example consider that each certificate is about 1.5KB in size, by adding this new certificate to the bundle every fresh TLS negotiation will carry this tax.  If we assume a typical chain is normally 3 in length that makes our new total 4.5kb in overhead. It’s not a large figure but if you consider a high-traffic site like Amazon it adds up quickly.

How many purchases are made on Amazon daily? On a daily basis, Amazon ships more than 66,000 orders per hour in the US. About 1.6 million packages are shipped on a daily basis.

capitalcounselor.com

If we assume each of Amazon’s orders equates to a single fresh TLS negotiation (it should be many more than that since many servers power the site) that is a daily increase of 3.6GB of traffic. Now Amazon can deal with that with no problem but it would slow down the experience for users, and potentially cost those users without unlimited internet plans some of their allotted bandwidth. There is also the potential of fragmentation as the TLS packets get larger which increases the latency for the user.

Don’t get me wrong, these are not deal breakers but they are taxes that this cross sign approach represents. 

My fear, and to be clear I’ve not completed the thought process yet, the above story creates a situation where CAs need to provide multiple cross signs to support all the various device combinations that are out there in this new world.

If so this scares me for a few reasons, off the top of my head these include:

  1. It is already hard enough to get a single cross sign doing so multiple times will be that much harder,
  2. It will put new entrants at a disadvantage because they will not have legacy key material to rely on nor the high capital requirements to secure cross signs if commercial terms can even be reached allowing them to get one,
  3. We have spent the last decade making TLS the default and we are close to being able to declare victory on this journey if TLS becomes less reliable we may lose ground on this journey,
  4. It is a sort of regressive taxation on people with older and slower devices that are likely already slow and paying for data on a usage basis.
  5. The additional data required will have a negative impact on Time to First Paint (TTFP) in that this extra data has to be exchanged regardless.

Mom always said don’t complain if you don’t have a proposal to make things better, so I figure I should try to propose some alternatives. Before I do though I want to reiterate that this post isn’t a complaint, instead it’s just a representation of my inner dialog on this topic. 

OK, so what might be a better path for us getting to this world of shorter-lived roots? I guess I see a few problems that lead to the above constraints

  1. Even in the browser ecosystem, we need to see root lists updated dynamically,
  2. Inclusion into a root store should take months for your initial inclusion and later updates should happen in weeks,
  3. Browsers should publish data to help CAs understand how ubiquitous their root distributions are so they can assess if cross signs are even necessary,
  4. There needs to be some kind of public guidance to IoT devices on what they should be doing for the usage of certificates in their devices and this needs to include root management strategies.
  5. IOT devices need to stop using the WebPKI, it is increasingly the Desktop and Browser PKI and anything else will get squashed like a bug in the future if they are not careful 🙂
  6. There should be a Capability and Maturity Model for root programs and associated update mechanisms that can be used to drive change in the existing programs 
  7. CAs should stop using one issuing hierarchy for all cases and divide up the hierarchies into as narrow of slices as possible to reduce the need for one root to be trusted in all scenarios.
  8. Root programs should allow and go so far as to encourage CAs to have more than the 3 (typical) roots they allow now to support CAs in doing that segmentation 
  9. WebPKI root programs should operate two root programs, the legacy one with all of its challenges and the new one focused on agility, automation, and very narrow use cases. Then they should use the existence of that program to drive others to the adoption of those narrower use cases.
  10. And I guess finally we have a chance with the Post Quantium TLS discussions to look at creating an entirely new WebPKI (possibly without X.509!) that is more agile and narrow from the get-go.

Why crawling is not an adequate measurement methodology for the WebPKI

The answer is simple — It’s an incomplete view of the use of the WebPKI.

There are a number of different methodologies a web crawler-based approach might take in measuring the size of the WebPKI. The most naive approach would be to simply scan all IPv4 address space and log all of the certificates you see during this scan.

The problem is that this only shows a small fraction of the certificates that are out there. When you connect to an IP address and the associated web server doesn’t know what host you are trying to connect to it will return its “default” website and use the associated certificate.

That same IP address may literally be responsible for serving millions of sites based on the client’s indicated hostname. With this IP-based enumeration approach at best you would get one certificate from that host, at worse you wouldn’t even get that because some servers are not configured with a default site. This is just one problem with this approach there are many more.

Though most WebPKI market share reports do not document their methodology anecdotally it appears most work on this crawler approach and at least historically some have taken periodic drops from CAs to make their view “more complete”.

Today though the only way to measure CA market share that should be used is by relying on the pre-certificate counts in Certificate Transparency logs.

How to measure the WebPKI ecosystem

The web is dependent on there being a robust, secure, and scalable set of CAs being able to provide TLS certificates. It is unhealthy for there to be a single provider because if for any reason they have an operational or security issue they could become unavailable leaving the web in a world of hurt.

Beyond that in the name of TLS reliability TLS certificate consumers should be relying on multiple CAs for their certificates. For example, to reduce exposure to outages your certificate lifecycle management solution should support failover from one CA to the next. 

Another example of why you should use multiple CAs is to help ensure relying on party agility to changes in CAs, for example, if a CA changes which root key material they use you may lose (or gain) device compatibility, or if an issuing CA changes and someone is pinning you might break them. By to ensure device compatibility long term one should use multiple CAs to help ensure the relying party ecosystem you support is agile to these changes.

For this to work though you need to have an ecosystem of CAs you can use interchangeably, ACME (RFC 8555) helps here substantially because it provides a normalized way to interact with CAs to get these certificates. That is only helpful if there are multiple CAs that implement the protocol and if those CAs are able to scale to meet the needs of those who rely on them.

This is particularly important when you look at SaaS-like offerings the larger ones will often demand millions of certificates that need to be able to be revoked and re-issued in less than 24 hours in some cases so the scalability of the CA becomes particularly important.

Assessing the scalability of a CA is hard but one of the closest proxies you have is their overall market share.

In the US, according to the Google Transparency Report, 97% of all web traffic is protected with TLS. To put that in context there were 366.8 million registered domain names as of 2022.

Certificates can represent more than one domain name so depending on what you are measuring certificate count may not be the best metric to asses CA market share. With that said in the context of scalability, it’s probably a good metric.

What are some ways to evaluate the CA impact and market share?

  • How many certificates are issued by the CA and are unexpired.
  • How many domains are contained within the unexpired certificates issued by a CA.
  • What percentage of web traffic would be covered by the certificates issued by a CA.
  • What percentage of certificates issued by the CA are unexpired and actively in use.

Each of these answers different questions, and they progressively get harder to measure as you go down the list. The easiest by far is how many certificates are issued and still unexpired. This is because all CAs log what is called a pre-certificate to the Certificate Transparency ecosystem before issuance.

NOTE: Publication of a pre-certificate is not required by the rules of the ecosystem however not doing so would mean that users relying on that certificate would get an error.

While the existence of a pre-certificate doesn’t promise the certificate is in use it does signal that someone who controlled that domain wanted to use a certificate for that domain. They wouldn’t have bothered going to the trouble of doing that if there was not an intent to use the certificate in some way.

The easiest way to look at this data is to use the excellent https://crt.sh/cert-populations report. While it does go down from time to time it also provides very fresh views into the un-expired pre-certificate count.

NOTE: Since not all CAs publish what is referred to as the “final certificate” you can safely ignore the Certificate count data on this report.

So what does this data look like (As of July 29th, 2022)?

CertificatesPrecertificates
ALLUnexpiredALLUnexpired% of Unexpired Population
Internet Security Research Group2,834,892,521264,685,3352,553,476,280228,023,48050.18%
Sectigo109,399,9847,245,014373,669,758106,119,71323.35%
DigiCert560,740,35744,640,273497,448,38945,475,97210.01%
GoDaddy6,371,9601,874,81252,669,26031,293,3046.89%
Google Trust Services LLC17,28417828,112,66215,443,3063.40%
Amazon13,540,55698,980104,887,85914,757,6503.25%
GlobalSign nv-sa16,729,66393723,636,7786,893,7281.52%
Actalis55233,236,4931,691,7440.37%
Asseco Data Systems S.A. (previously Unizeto Certum)6,298,4726209,375,7421,571,8520.35%
Start Commercial (StartCom) Ltd.1,495,580982,866,004883,0220.19%
?1,241,4632143,924,285567,4800.12%
Entrust739,9015242,304,521554,4310.12%
SECOM Trust Systems CO., LTD.156,234-112,217,668242,8150.05%
WoSign CA Limited88,6607250,823110,1010.02%
Certainly LLC31,361205240,103101,5330.02%
Buypass186,2002702,127,22898,8350.02%
QuoVadis53,636432236,06397,4540.02%
SecureTrust311,226227301,19778,4890.02%
Microsoft Corporation Core Services Engineering & Operations ( “Microsoft CSEO”)216,44873,560212,90574,6970.02%
Deutsche Telekom Security GmbH57,57032147,94949,5560.01%
JPRS15,7383482,64236,5120.01%
SwissSign AG237,88668,27283,50426,9750.01%
Government of Spain, Fábrica Nacional de Moneda y Timbre (FNMT)86,87223,86657,20623,7720.01%

What you will see is the top 5 CAs out of 233 issue 98.59% of all TLS certificates. While I would like to see this distribution be more normalized to ensure that the ecosystem is not overly dependent on any one entity as far as health goes it does show there are several large providers out there that support the Web who have demonstrated they can scale to meet large certificate consumption needs.

One thing you will notice in this data is that the variability in the pre-certificate “ALL” and “Unexpired” count is quite large in some cases. This is because some CAs like Let’s Encrypt and Google Trust Services either predominantly, or exclusively issue shorter-lived certificates. This results in the certificate count in “All” being much higher than the “Unexpired” case.

So what can we take away from this data? I think there are four key takeaways:

  1. Support of certificate issuance via ACME has made shorter-lived certificates viable and they now represent the large majority of certificates on the web.
  2. Support of ACME has helped grow the percentage of the web that is encrypted from about half of the web to nearly 100% of the web.
  3. 2.15% of CAs issue 98.59% of all TLS certificates on the web.