Tag Archives: OCSP

Revocation reasons don’t make sense for the WebPKI of today

As the old saying goes, to err is human but to forgive is divine. In the WebPKI, you deal with errors through the concept of revocation.

In the event a CA determines it made a mistake in the issuance of a certificate, or has been notified by the subscriber that they would like to see a certificate marked invalid, it revokes the corresponding certificate.

The two mechanisms they have available to them to do this are called Certificate Revocation Lists (CRLs) and OCSP responses. The first is like a phonebook of all known “revoked” certificates, while the second is more like a lookup service that enables user agents to ask for the status of a particular set of certificates.
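To make the difference concrete, here is roughly what each check looks like from the command line with OpenSSL and curl; the URLs and file names below are placeholders:

# CRL: download the whole "phonebook" and list its contents
curl -s http://crl.example.com/ca.crl -o ca.crl
openssl crl -inform DER -in ca.crl -noout -text

# OCSP: ask the responder about one specific certificate
openssl ocsp -issuer issuer.pem -cert server.pem \
    -url http://ocsp.example.com -noverify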

In both cases they have the option to indicate why the certificate was revoked; the available options include:

Key Compromise: We believe the key to have been stolen or can no longer be sure it has not been.
CA Compromise: The issuing CA itself has been compromised (think DigiNotar).
Affiliation Changed: The certificate contained affiliation information, for example, it may have been an EV certificate and the associated business is no longer owned by the same entity.
Privilege Withdrawn: The right to represent the given host was revoked for some reason.

There are a few other reasons, but they are not materially relevant to WebPKI use cases, so I left them out.
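To give a sense of how a reason gets attached in practice, here is a rough sketch using OpenSSL’s “ca” tooling; the file names are placeholders and a commercial CA would use far more robust systems:

# Record the revocation with a reasonCode of keyCompromise, then
# regenerate the CRL that relying parties will download
openssl ca -config ca.cnf -revoke server.pem -crl_reason keyCompromise
openssl ca -config ca.cnf -gencrl -out ca.crl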

CA Compromise is clearly a special and exceptional case, so much so that browsers have developed their own mechanisms for dealing with it (see OneCRL and CRLSets), so it’s not particularly relevant to the options available to CAs anymore, at least in the WebPKI.

Another thing to consider when looking at revocation reasons is that about 90% of all certificates in use in the WebPKI (for TLS) are considered to be Domain Validated (DV) certificates; this limits the applicability of the remaining reasons substantially.

                      DV     OV    EV
Key Compromise        Yes    Yes   Yes
Affiliation Changed   Maybe  Yes   Yes
Privilege Withdrawn   Maybe  Yes   Yes


As you can see, the available “reasons” for DV certificates are largely reduced to just “Key Compromise”. There is an argument that both “Privilege Withdrawn” and “Affiliation Changed” could be applicable to DV but, in my opinion, they are a bit of a stretch. If nothing else, the difference between these two reasons in the case of DV is so subtle you will almost never see either used on a DV certificate.

So what about Key Compromise? How useful is it? Clearly, it is an important use case; your trust in an SSL certificate is frankly only as good as your confidence in how well the key is protected!

The reality is that most key compromise cases cannot be detected, and worse yet, it’s far too easy to compromise an SSL key. Take, for example, the Heartbleed (http://heartbleed.com/) bug from last year: SSL keys are often kept in process memory, so relatively minor coding mistakes can expose that key, and users won’t necessarily know about it.

Some platforms go out of their way to prevent such attacks at the expense of performance, as we did at Microsoft with SCHANNEL, where we kept the key out of process, but this is the exception and not the rule.

Also, popping a web server via some third-party code vulnerability and getting a shell in the context of the server is unfortunately far too easy. As a result, when an attacker does get a shell on a web server, one of the first things they will do is take that key so they can use it later. But again, how is this detected?

That is not to say “Key Compromise” as a revocation reason is not important when there is a large-scale vulnerability like Heartbleed, but this is not exactly a common case.

The other cases for the “Key Compromise” revocation reason are notably dependent on the subscriber:

  • Knowing the compromise happened,
  • Being mindful enough to notify the CA,
  • Being willing to let the world know there was a compromise.

This last point, the public signaling of “compromise”, combined with the fact that user agents treat all revocations the same (if they check at all), means that we seldom see subscribers even specify a reason when they ask for a certificate to be revoked.

As I look at why a certificate gets revoked today I think there are a few cases; these include:

  • CA has decided the certificate should be revoked,
  • The subscriber has decided the certificate should be revoked,
  • A root program has decided a certificate should be revoked.

These are far more meaningful signals to relying parties than the ones listed above, and they are not so specific that there is any negative signaling in their use.

Today it is not possible to specify these things, but maybe we should take a second look at revocation reasons and, if we continue to use them, make them more relevant to the way the WebPKI works today.

Ryan

CAs and SSL and Phishing Oh My!

NOTE: This post reflects my personal beliefs and is not necessarily those of my employer Google, or Let’s Encrypt where I am a member of their Technical Advisory Board.

Introduction

Recently Vincent from The SSL Store published a blog post calling out Let’s Encrypt for issuing certificates to domains that contain the word PayPal.

The TL;DR for his post is he believes that Let’s Encrypt is enabling phishers by issuing them SSL certificates that contain the word “PayPal” and then refusing to revoke them when arbitrary third-parties ask them to.

As a result of his post, several news sources have decided to write articles about how “Let’s Encrypt” is acting as an enabler of these Phishers [1] [2].

Unfortunately, Vincent’s post and the associated articles don’t cover this in the most complete and balanced way, so over my morning coffee today I decided to write this post to discuss the other side of the argument.

If this is a topic that interests you please also check out the Let’s Encrypt blog post where they talk about why they have taken this position.

Exploration

Let’s explore the opportunities CAs have to check for phishing, the tools they have available to them, the effectiveness of those tools, the consequences of this approach, how complete a solution based on the tools available to them would be and what the resulting experience would be for users.

Opportunities

The WebPKI CA’s role, historically, has been that of a passport office: you present proof you control a domain, and possibly that you are an authorized member of an organization, and you get a digital certificate that attests to that.

This certificate could be valid for up to 1095 days. Once the certificate is issued the CA, largely speaking, has no natural opportunity to verify this information again. It is worth noting that this month the CABForum voted to shorten this period to 825 days.

Tools

In the event a CA determines it made a mistake in the issuance of a certificate, or has been notified by the subscriber that they would like to see a certificate marked invalid, the tool they have available to them is called “revocation”.

The two types of revocation under the control of a certificate authority are Certificate Revocation Lists (CRLs) and OCSP responses. The first is like a phonebook of all known “revoked” certificates, while the second is more like a lookup service that enables user agents to ask for the status of a particular set of certificates.

Effectiveness

Earlier we discussed the lifetime of certificates. This is important to understand because the large majority of phishing sites do not start out as phishing sites; as such, issuance-time checks seldom net positive results.

After issuance, this leaves you with periodic checks of the site, third-party reports of phishing, and relying on revocation checking as an enforcement mechanism. This is a recipe for failure; there are a few reasons for this, but one of the more significant is the general ineffectiveness of revocation checking.

Revocation checking is the most taxing thing a CA does. This is because the revocation mechanisms available to them result in every relying party contacting them to download an OCSP response or CRL covering that certificate.

As a result, OCSP has a tendency to be both slow and unreliable. This forced browsers to implement this check as a “soft fail”; in other words, if the connection times out or fails for some reason they assume the certificate is good.

To give that some context, about 8% of all revocation checks done by Firefox fail, and the median response time is over 200 ms.
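If you want a rough feel for this yourself, you can time a single OCSP request by hand; the responder URL and file names below are placeholders:

# Time one OCSP request roughly the way a browser would experience it
time openssl ocsp -issuer issuer.pem -cert server.pem \
    -url http://ocsp.example.com -noverify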

As a result, in 2012 Chrome, which is used by about 50% of all users, more-or-less disabled revocation checking except in exceptional circumstances.

What this means is that revocation checking, even for its intended purpose, is far from an effective tool. Expanding its use to include protecting users from phishers would not improve its effectiveness and arguably it would (due to the infrastructure implications) make it even less reliable.

It is also important to note that every wildcard certificate can be used for a hostname containing “PayPal” without the CA ever being made aware; a good example is https://paypal.github.io/, which is protected by a wildcard certificate issued to GitHub.

Consequences

To understand the consequences of expanding the CA’s role to include protecting us from phishing, we first need to understand what a certificate represents, or more importantly, what it does not represent. It does not represent the content; it represents the host that is serving content, and it is the content that “phishes”.

Today, in the age of cloud services, there is a good chance the host that is serving the content is a service operated by WordPress, or maybe Amazon’s S3. These services allow users to sign up and post arbitrary content for free or very little money.

If we decide that revocation checking is the right tool to get phishing content off the web, we would be saying a CA should revoke WordPress’s certificate if one of its users posted something someone reported as phishing content. That would, for the situations where revocation checking takes place and happens to work, take WordPress off the Internet. Is that what we want to happen?

If so, who is it we are asking to perform this task? There are well over 400 CAs in the Microsoft Root Program; do we believe these are the right organizations to be policing the internet for the appropriateness of content?

If so, what criteria should they use to do so, and what do we do if they abuse this censorship role?

Completeness

It is easy to say that a CA should not issue a certificate if it contains the word “PayPal”. I could even see an argument that those who would be hurt by such a rule, for example http://www.PayPalSucks.com and (a theoretical) PayPalantir.com, are an acceptable loss.

This would, however, not catch homoglyphs, like when a Cyrillic “a” is used instead of the Latin “a”, which would very likely require a manual review of the name and content to determine the intent of the domain owner, something that is near impossible to do with any level of accuracy or fairness.

Even with that, what about ING? As one of the world’s largest banks, they too are commonly phished. Should a CA be able to issue a certificate to https://www.fishing.com? And if it does, and the issuing CA receives a complaint that the site is phishing ING, what should it do?

And what about global markets and languages? In Romania there is a cleaning company called Amazon; should anyone be able to request that their certificate be revoked because the name contains the word Amazon?

If we promote the CA to content police, how do we do so in a complete way?

User Experience

With CAs acting as the content police, what would a user see when they encounter a revoked site? While it varies from browser to browser, the experience is almost always a blocking “interstitial”, for example:

[Screenshots: revoked-certificate interstitials in Chrome and Firefox]


If you look closely you will see these are not screens that you can bypass; revoked sites are effectively removed from the internet.

This is in contrast to Safe Browsing and SmartScreen, which were designed for this particular problem set and therefore give the user a chance to visit the site after a contextually relevant warning:

[Screenshots: Safe Browsing and SmartScreen warnings]

Conclusion

I hope you see from the above that relying on certificate authorities as content police as a means to protect users from phishers is a bad idea. At a minimum, it would be:

  • Ineffective,
  • Incomplete,
  • Unmanageable,
  • and Duplicative.

But more importantly, it would establish a large, loosely managed group as the de facto content censors on the internet, and as Steven Spielberg said, there is a fine line between censorship, good taste, and moral responsibility.

So what should CAs do about phishing then? It is my position they should check the Google Safe Browsing API prior to issuance (which, by the way, Let’s Encrypt does), and they should report phishers to the Safe Browsing service if they encounter any.

It is also important to answer the question of what users should do to protect themselves from phishing. I understand the desire to say there is only one indicator they need to be worried about; it’s just not realistic.

When I talk to regular users I tell them to do three things. The first is to use an up-to-date and modern browser that uses SmartScreen or Safe Browsing. Second, only provide data to sites you know, and only over SSL. And finally, try to only provide sites information when you are the one who initiated the exchange.

P.S.

Thanks To Vincent Lynch and the others who were kind enough to proof this post before publishing.

How did I get involved in PKI?

In the mid-90s I was a security consultant; I principally worked on authentication systems (smart cards, one-time passwords, Kerberos, PKI, etc.).

Back then the only people who cared about these things were organizations concerned with protecting lives or money. This meant most of our contracts were with governments, banks, and Fortune 50s. This was an amazing experience that I would not trade for the world — it gave me the chance to work with some amazing people in some of the most paranoid and security-conscious environments in the world.

While not my first exposure to PKI, the first time “it was all I did” was when I worked for a company called ValiCert. The founders saw a problem:

Who was watching the certificate authorities, and who would make sure that the revocation infrastructure would scale to meaningfully work in the event mis-issuances or key compromises happened?

We had developed technologies that were intended to address these problems. This technology looked very similar to Certificate Transparency, OCSP stapling, and certificate pinning, which are again all the rage these days.

Unfortunately the certificate authorities did not like the idea of being “watched” by a third party; the largest CA went so far as to threaten lawsuits and modified their Relying Party Agreements to state that third parties could not re-distribute any information about what certificates they had revoked or issued.

Another entity had patents they claimed covered some of our optimizations, and given the browsers were minimally investing in this area we did not get adequate traction, so we pivoted into other areas.

For personal reasons I ultimately ended up at Microsoft, where I was responsible for a number of security technologies; one of the “little things” I ran was the Microsoft Root Program.

When this was assigned to me I was told it was the least important thing on my plate and that I could measure my success through the number of escalations we got relating to it — basically I was told to invest as little as possible to keep things quiet. The root program was a necessity but shipping software was what we were all about.

The first thing I did for the root program was review its requirements and try to understand who its participants were and what agreements we had with them. I was surprised to see there were, in essence, no requirements, no authoritative list of contacts at each of the organizations, and no contracts with any of its members. I felt marginally better when I found that Netscape had only one requirement: that your check for $250,000 USD cleared. The upside of this was that they probably had contracts with each CA, but there were no technical or audit requirements in their program either.

To remedy this I began to work with my AWESOME paralegal and lawyer on defining the first “root program” with both technical and audit requirements. We did not want to approach this as a profit center like Netscape did, but instead to establish a set of requirements that were technically sound and that encouraged CAs to spend on improving their infrastructure and having it reviewed by others.

To this end I picked up a project that had been begun by my predecessor to work with the American Institute of Certified Public Accountants (AICPA) to help define and adopt what is WebTrust for CAs today.

We were the first root program to adopt this new audit. I remember being interviewed by the AICPA for a video on their website about how excellent it was to work with them – they must have taken 50 cuts during that session because of my bumbling.

With these new requirements in hand we set out to get contractual agreements with each of the CAs in which they would commit to meet these new requirements, with clear conditions under which we could kick them out for not complying. Given this required them to make operational changes to their practices, as well as budget for and manage a third-party audit, it took a complete product release cycle to get all of this in place.

At the end of the operating system release we had an audited set of CAs and contractual agreements with each one of them. Now our goal was to get these CAs into one room so we could encourage them to adopt common issuance practices.

This was important for a number of reasons, one of the most obvious being that each of the CAs used a different taxonomy to describe what they did. The simplest example of this was that one CA’s in-person verified certificate would be called a Class 1 while another’s was a Class 3.

To top things off, almost all of the CAs wanted to see the browser “chrome” differentiate between their weakly authenticated certificates and those that were strongly authenticated. This, of course, was not possible without common practices and a means of marking certificates to make it clear what practices were used in the vetting of the subscriber.

The internal consensus was that there would be value to users in being able to tell the difference, so we decided to try to make this happen. To do that we arranged to get these CAs in one room so we could talk about standardizing practices and certificate formats. To make this happen I reached out to my contact at the AICPA and asked him to work with me to arrange what was the very first gathering of publicly trusted CAs and trust store providers. We met in Washington, DC because I felt we could leverage the work done by the US Government to accelerate the standardization of these things.

Unfortunately one of the newest CAs, who only issued low-assurance certificates, saw adopting common standards for vetting and labeling as a risk to their business, and as a result they threw a wrench in my plan. They filed a claim with the FTC alleging that the event was an attempt to create an anti-competitive marketplace, and as a result I was deposed by the DOJ. Ultimately the issue was closed, and I understand the disposition was that the claim was baseless.

At this point I was instructed by management and our legal counsel to stop pushing for this standardization, as it represented too much legal risk for the company.

As an aside, a few months later the largest CA acquired the troublemaker.

About a year and a half later the CAs self-organized and attempted to agree on a smaller piece of standardization: the definition of what is called Extended Validation today. This was effectively a new label for what most CAs were offering in their “high assurance” certificates. The CABForum was now born.

At this point I had moved on to another team at Microsoft. During my time at Microsoft I worked on a number of very cool projects with some great people. Several of the projects I worked on used PKI, but my involvement was much more peripheral to the industry at that point.

Years later I decided to leave Microsoft; the DigiNotar incident was a big contributor to this decision. I felt that the industry was a mess: CAs were under-investing in their infrastructure, not supporting the open source community they were dependent on, and not actively working to improve adoption of SSL. I wanted to change this, so I decided I would start my own certificate authority and set an example for the industry on how a CA should approach these things.

This is when GlobalSign approached me and asked me to join as their CTO. I really liked the team; they were principled, hard working, and looking to change the way things were done. I spent nearly three years in this role and we accomplished a great deal. I also still work with them on technical research and direction, but I have since moved on to a startup doing work on Bitcoin-related technologies.

I did not accomplish all of the things I wanted to, but I still have hope that these systemic issues will be resolved, as I do believe trusted third parties are needed on the internet.

Anyway this is how I got into PKI.

Ryan

MUST STAPLE and PKI.js

The other day I did a post on how to create a self-signed certificate using PKI.js. In that sample we included a Basic Constraints extension, but we could just as easily have defined a custom or new certificate extension. For example, thanks to #heartbleed, folks are talking about MUST STAPLE again; this is an extension that was proposed several years ago that, when present, would indicate that clients should hard-fail instead of soft-fail with OCSP.

This proposal is based on a generic concept of expressing a security policy within the certificate. While the OIDs for this extension and the associated policy have not been defined yet, one can easily construct a certificate using this extension with PKI.js:

cert_simpl.extensions.push(new org.pkijs.simpl.EXTENSION({
    extnID: "1.2.3", // No OIDs assigned yet
    critical: false,
    extnValue: (new org.pkijs.asn1.SEQUENCE({
        value: [
            new org.pkijs.asn1.INTEGER({ value: 4 }),
            new org.pkijs.asn1.INTEGER({ value: 5 }),
            new org.pkijs.asn1.INTEGER({ value: 6 })
        ]
    })).toBER(false)
}));

NOTE: In the above snippet we just made up the OID value; hopefully IANA will assign OIDs soon so it is possible for browsers and CAs to implement this extension formally.
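Once you have saved the resulting certificate out as PEM, you can sanity-check it by dumping it and looking for the unrecognized extension in the output; the file name here is a placeholder:

openssl x509 -in cert.pem -text -noout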

A look at revocation repository uptime

It is no secret that in the last two months GlobalSign was affected by outages relating to our use of CloudFlare. I won’t go into the specifics behind those outages because the CloudFlare team does a great job of documenting their outages, as well as working to make sure the mistakes of the past do not reoccur. With that said, we have been working closely with CloudFlare to ensure that our services are better isolated from their other customers and to optimize their network for the traffic our services generate.

I should add that I have a ton of faith in the CloudFlare team; these guys are knowledgeable, incredibly hard working, and very self-critical — I consider them great partners.

When looking at these events it is important to look at them holistically; for example, one of the outages was a result of mitigating what has been called the largest publicly announced DDoS in the history of the Internet.

While no downtime is acceptable, and I am embarrassed we have had any, it’s also important to look at the positives that come from these events; for one, we have had an opportunity to test our mitigations for such events and improve them so that in the future we can withstand even larger attacks.

Additionally, it’s useful to look at the actual uptime these services have had and, to give those numbers some context, to look at them next to one of our peers. Thankfully I have this data as a result of the revocation report, which tracks performance and uptime every minute from 21 different network perspectives worldwide.

For 05/2012-12/2012 we see:

Service                                             Uptime (%)   Avg (ms)
GlobalSign/AlphaSSL OCSP                            100.00       101.29
VeriSign/Symantec/Thawte/GeoTrust/Trustcenter OCSP  99.92        319.40
GlobalSign/AlphaSSL CRL                             100.00       96.86
VeriSign/Symantec/Thawte/GeoTrust/Trustcenter CRL   99.97        311.42


For 01/2013 to 04/2013 we see:

Service                                             Uptime (%)   Avg (ms)
GlobalSign/AlphaSSL OCSP                            99.98        76.44
VeriSign/Symantec/Thawte/GeoTrust/Trustcenter OCSP  99.85        302.88
GlobalSign/AlphaSSL CRL                             99.98        76.44
VeriSign/Symantec/Thawte/GeoTrust/Trustcenter CRL   99.22        296.97

NOTE: Symantec operates several different infrastructures – which one you hit depends on which brand you buy from and, in some cases, which product you buy. We operate only two brands, which share the same infrastructure. I averaged the results for each of their brands together to create these two tables. If you want to see the independent numbers, see the Excel document linked to this post.


As you can see, no one is perfect. I don’t share this to say our downtime is acceptable, because it is not; instead I want to make it clear this is data we track and use to improve our services, and to make clear what the impact really was.

By the way if you want to see the data I used in the above computation you can download these spreadsheets.

Abstract: Revocation reality and the path to becoming effective

Just submitted my first abstract for the NIST “Workshop on Improving Trust in the Online Marketplace” in April. The title of the talk is “Revocation reality and the path to becoming effective”, the abstract of which is:


The concept of certificate revocation is core to the X.509 trust model; however, 18 years after its introduction, the reality is that, as implemented and deployed, it falls short of its promise to enable certificate issuers to protect relying parties from malicious actors and mis-issuance.

This talk will discuss the findings of a project in which I observed the behavior, uptime, and performance of the revocation repositories of a number of commercial certificate authorities (https://revocation-report.x509labs.com) for a period of over six months.

Additionally I will overview the revocation behavior of the most common browsers, identifying the gaps as they exist in those implementations.

And finally, I will provide a set of recommendations that I believe, if followed, can address the current gaps and move us to a world where revocation checking is an effective means of protecting relying parties from known bad actors and mis-issuance.

How Facebook can avoid losing $100M in revenue when they switch to always-on SSL

Recently Facebook announced that they will be moving to Always-On-SSL; I for one am thrilled to see this happen – especially given how much personal data can be gleaned from observing a Facebook session.

When they announced this change they mentioned that users may experience a small performance tax due to the addition of SSL. This is unfortunately true, but when a server is well configured that tax should be minimal.

This performance tax is particularly interesting when you put it in the context of revenue, especially when you consider that Amazon found that every 100 ms of latency cost them 1% of sales. What if the same holds true for Facebook? Their last quarter’s revenue was $1.23 billion, so I wanted to take a few minutes and look at their SSL configuration to see what this tax might cost them.

First I started with WebPageTest; this is a great resource for the server administrator to see where time is spent when viewing a web page.

The way this site works is it downloads the content twice, using real instances of browsers; the first time should always be slower than the second, since the second gets to take advantage of caching and session re-use.

The Details tab on this site gives us a breakdown of where the time is spent (for the first-use experience); there’s lots of good information here, but for this exercise we are interested only in the “SSL Negotiation” time.

Since Facebook requires authentication to see the “full experience” I just tested the log-on page; it should accurately reflect the SSL performance “tax” for the whole site.

I ran the test four times, each time summing the total number of milliseconds spent in “SSL Negotiation”; the average across these runs was 4.111 seconds (4,111 milliseconds).

That’s quite a bit, but can we reduce it? To find out we need to look at their SSL configuration; when we do, we see a few things they could do to improve, one of which relates to how browsers check whether their certificate has been revoked.

Let’s explore this point more. The status check the browser does is called an OCSP request. For the last 24 hours their current CA had an average worldwide OCSP response time of 287 ms; if they used OCSP stapling, the browser would need to do only one OCSP request, and even with that optimization that request could be up to 7% of the SSL performance tax.

GlobalSign’s average worldwide OCSP response time for the same period was 68 milliseconds, which in this case could have saved 219 ms per request. To put that in context, Facebook gets 1.6 billion visits each week. If you do the math (219 × 1.6 billion / 1000 / 60 / 60 / 24), that’s roughly 4,000 days’ worth of waiting saved every week, or over 200,000 days every year. Put another way, a lifetime’s worth of time people would otherwise have spent waiting for Facebook pages to load is saved every seven weeks or so!
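If you want to check that arithmetic yourself, the whole thing fits in one line of shell:

# 219 ms saved per visit x 1.6 billion visits/week, converted to days
echo $(( 219 * 1600000000 / 1000 / 60 / 60 / 24 ))   # prints 4055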

If we consider that in the context of the Amazon figure, simply changing their CA could be worth nearly one hundred million a year.

Before you start to pick apart these numbers, let me say this is intended to be illustrative of how performance can affect revenue and not a scientific exercise, so to save you the trouble, some issues with these assumptions include:

  • Facebook’s business is different than Amazon’s, and the impact on their business will be different.
  • I only did four samples of the SSL negotiation and a scientific measurement would need more.
  • The performance measurement I used for OCSP was an average and not what was actually experienced in the sessions I tested – it would be awesome if WebPageTest could include a more granular breakdown of the SSL negotiation.

With that said, clearly even without switching there are a few things Facebook can still do to improve how they are deploying SSL.

Regardless, I am still thrilled Facebook has decided to go down this route; the change to deploy Always-On-SSL will go a long way to help the visitors to their sites.

Ryan

Priming the OCSP cache in Nginx

So recently GlobalSign, DigiCert, and Comodo worked together with Nginx to get OCSP stapling supported in Nginx 1.3.7. Unfortunately, architectural restrictions made it impractical to pre-fetch the OCSP response on server start-up, so instead the first connection to the server primes the cache that is used for later connections.

This is a fine compromise, but what if you really want the first connection to have the benefit too? Well, there are two approaches you can take:

  1. Right after you start the server, you do an SSL request to prime the cache.
  2. You manually get the OCSP response and plumb it where Nginx is looking for it.

The first model is easy: right after you start your server, use the OpenSSL s_client to connect to the server with OCSP stapling enabled, just like I documented in this post; the first request will trigger the retrieval of the OCSP response by Nginx.
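That first request can be as simple as the following; the hostname is a placeholder:

# One throwaway TLS handshake with the status_request extension set;
# as a side effect Nginx fetches and caches the OCSP response
echo QUIT | openssl s_client -connect www.example.com:443 \
    -servername www.example.com -status > /dev/null 2>&1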

The second model can be done before you start the server: you need to find the URI for the OCSP responder, do an OCSP request, and populate the Nginx cache manually. This would look something like:

#!/bin/sh
ISSUER_CER=$1
SERVER_CER=$2

# Pull the OCSP responder URL out of the certificate
URL=$(openssl x509 -in $SERVER_CER -text | grep "OCSP - URI:" | cut -d: -f2,3)

# Fetch the OCSP response and write it where Nginx has been told to look
openssl ocsp -noverify -no_nonce -respout ocsp.resp -issuer \
$ISSUER_CER -cert $SERVER_CER -url $URL

Where “ocsp.resp” is whatever file you have configured in Nginx via the “ssl_stapling_file” directive.
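For reference, the relevant directives in the Nginx configuration would look something like this; the path is a placeholder:

ssl_stapling on;
ssl_stapling_file /etc/nginx/ocsp.resp;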

Each approach has its pros and cons. For example, with the first approach your execution of the s_client call may not be the first request the server sees; with the second approach, if you are using a certificate that doesn’t contain an OCSP pointer and have manually told Nginx where to fetch certificate status from, then it won’t work.

It is worth noting you can run this same script in a cron job to ensure your server never needs to hit the wire (and potentially block when doing so) when it tries to keep its OCSP cache up to date.
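For example, a crontab entry along these lines (the script path and arguments are hypothetical) would refresh the cached response hourly:

0 * * * * /usr/local/sbin/prime-ocsp.sh /etc/ssl/issuer.cer /etc/ssl/server.cer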


What is the status of revocation checking in browsers?

Today we did an announcement of some work we have been doing with CloudFlare to speed up SSL for all of our customers through some improvements to our revocation infrastructure.

One of the things that comes up when talking about this is how each of the browsers handles revocation checking, so I thought it might be useful to put together a quick post that talks about this to clear up some confusion.

The first thing that’s important to understand is that all major browsers do some form of revocation checking, that includes Opera, Safari, Chrome, Firefox and Internet Explorer.

Let’s talk about what that means. The IETF standards for X.509 certificates define three ways for revocation checking to be done: the first is Certificate Revocation Lists (CRLs), next there is the Online Certificate Status Protocol (OCSP), and finally there is something called the Server-Based Certificate Validation Protocol (SCVP).

In the context of browsers we can ignore SCVP as no browser implements it; this leaves us with CRLs and OCSP as the standards-compliant ways of doing revocation checking.

All of the above browsers support these mechanisms; in addition to these standard mechanisms, Google has defined a proprietary certificate revocation checking scheme called CRLsets.

If we look at StatCounter for browser market share, that means today at least 64.84% (it’s likely more) of the browsers out there are doing revocation checking based on either OCSP or CRLs by default.

This means that when a user visits a website protected with SSL, the browser has to do at least one DNS look-up, open one TCP socket, and complete one HTTP transaction to validate the certificate the web server presents, and more likely several of these.
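You can see exactly where the browser will have to go by pulling the responder URL out of a certificate; the file name below is a placeholder:

# The host named in the output must be resolved, connected to, and
# queried before the page is rendered
openssl x509 -in server.pem -noout -ocsp_uri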

This is important because of the way revocation checking needs to be done: you need to know if the server you are talking to really is who they say they are before you start to trust them – that’s why when browsers do OCSP and CRL checks they do this validation before they download the content from the web page.

This means that your content won’t be displayed to the user until this check happens and this can take quite a while.

For example, in the case of IE and Chrome (when it does standards-based revocation checking on Windows), CryptoAPI is used, which will time out after 15 seconds of attempting to check the status of a certificate.

The scary part is that calls to this API actually do time out, and when they do, the delay is experienced by the users of your website!

So what can you do about it? It’s simple really: you have to be mindful of the operational capacity and performance of the certificate authority you get your certificate from.

Check out this monitoring portal I maintain for OCSP and this one I maintain for CRLs; you will see GlobalSign consistently outperforms every other CA for the performance of their revocation infrastructure; in most cases it’s nearly 6x as fast, and in others much more than that.

The other thing to understand is that today the default behavior of these browsers when checking the status of a certificate via OCSP or CRLs is to do what is often referred to as a “soft-revocation failure”.

This basically means that if they fail for any reason to check the status of a certificate (usually due to performance or reliability issues) they will treat the certificate as good anyway. This is an artifact of CAs not operating sufficiently performant and reliable infrastructure to allow the browsers to treat network-related failures critically.

Each of these browsers has options you can use to enable “hard” or “strict” revocation checking, but until the top CAs operate infrastructure that meets the performance and reliability requirements of the modern web, no browser will make these the default.
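For example, in Firefox the switch is a preference you can set in user.js or via about:config; to my knowledge this is the relevant setting, while other browsers bury similar options in their advanced settings:

// Require a valid OCSP response; failures become fatal (hard-fail)
user_pref("security.OCSP.require", true);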

Finally, it’s also important to understand that even with this “soft-failure” your website experiences the performance cost of doing these checks.

It’s my belief that the changes we have put into place in our own infrastructure meet that bar, and I hope the other CAs follow our lead, as it is in the best interest of the Internet.

Ryan

Revocation checking, Chrome and CRLsets

One of the things I often hear is that Chrome no longer does revocation checking; this isn’t actually true.

All major browsers do some form of revocation checking, that includes Opera, Safari, Chrome, Firefox and Internet Explorer.

Google still does revocation checking it just does so through a proprietary mechanism called CRLsets.

As its name implies, CRLsets are basically a combination of CRLs: Google crawls the web, gathers CRLs, and merges them together into a “mega-CRL”. This mega-CRL is formatted differently than other CRLs but is essentially the same thing, with some important differences, the most significant being that, due to size concerns, Google selectively chooses which CAs it includes in the CRLset and, within those CRLs, which revoked certificates to include.

With this understanding you have to wonder: why would Google introduce this new mechanism if it is not as comprehensive as the standards-based ways of dealing with revocation checking? The answer is simple: performance and reliability.

With CRLsets Google is distributing the revocation list, and as such they can make sure that it is delivered quickly; they do this in part by taking a bet that they can intelligently pick which revoked certificates are important (IMHO they cannot – revoked = revoked) and by being the one that distributes the list.

This has implications for users: Chrome trusts certificate authorities for which it has no revocation information, and it also intentionally treats some revoked certificates as good, which exposes you to some risk.

This is especially problematic for enterprises that use Chrome and leverage PKI; there is essentially no chance Google will decide to include your CRL. This is also problematic for those who encounter certificates from those CAs.

That’s not to say CRLsets do not have value; they do, but those values have been discussed elsewhere in detail.

But what do you do if you want a more holistic solution to revocation checking? It’s simple: you can turn on the standards-based revocation checking mechanisms and Chrome will use them in addition to the CRLset. To do that, go to Settings and expand Advanced Settings, where you will see:

[Screenshot: Chrome advanced settings showing the option to check for server certificate revocation]

Here you can re-enable the standards-based revocation checking mechanisms so Chrome can do a more holistic job of protecting you from the known bad actors on the internet.

Ryan