Priming the OCSP cache in Nginx

So recently GlobalSign, DigiCert, and Comodo worked together with Nginx to get OCSP stapling supoported in Nginx 1.3.7, unfortunately architectural restrictions made it impractical to make it so that pre-fetching the OCSP response on server start-up so instead the first connection to the server primes the cache that is used for later connections.

This is a fine compromise but what if you really want the first connection to have the benefit too? Well there are two approaches you can take:

  1. Right after you start the server you do a SSL request to prime the cache.
  2. You manually get the ocsp response and plumb it where Nginx is looking for it.

The first model is easy, right after you start your server use the OpenSSL s_client to connect to the server with OCSP stapling enabled  just like I documented in this post, the first request will trigger the retrieval of the OCSP response by Nginx.

The second model can be done before you start the server, you need to find the URI for the OCSP responder, do a OCSP request and populate the Nginx cache manually, this would look something like:

#!/bin/sh
ISSUER_CER=$1
SERVER_CER=$2

URL=$(openssl x509 -in $SERVER_CER -text | grep “OCSP – URI:” | cut -d: -f2,3)

openssl ocsp -noverify -no_nonce -respout ocsp.resp -issuer \
$ISSUER_CER -cert $SERVER_CER -url $URL

Where “ocsp.resp” is whatever file you have configured in Nginx for the “ssl_stapling_file“.

Each approach has its pros and cons, for example with the first approach your execution of the s_client call may not be the first request the server sees, with the second approach if you are using a certificate that doesn’t contain a OCSP pointer and have manually told Nginx where to fetch certificate status from then it won’t work.

It is worth noting you can run this same script in a cron script to ensure your server never needs to hit the wire (and potentially block when doing so) when it tries to keep its OCSP cache up to date.

 

 

What is the status of revocation checking in browsers?

Today we did an announcement of some work we have been doing with CloudFlare to speed up SSL for all of our customers through some improvements to our revocation infrastructure.

One of the things that come up when talking about this is how each of the browsers handles revocation checking, I thought it might be useful to put together a quick post that talks about this to clear up some confusion.

The first thing that’s important to understand is that all major browsers do some form of revocation checking, that includes Opera, Safari, Chrome, Firefox and Internet Explorer.

Let’s talk about what that means, the IETF standards for X.509 certificates define three ways for revocation checking to be done, the first is Certificate Revocation Lists (CRLs), next there is the Online Certificate Status Protocol (OCSP) and finally there is something called Simple Certificate Validation Protocol (SCVP).

In the context of browsers we can ignore SCVP as no browser implements them; this leaves us with CRLs and OCSP as the standards compliant ways of doing revocation checking.

All of the above browsers support these mechanisms, in addition to these standard mechanisms Google has defined a proprietary certificate revocation checking scheme called CRLsets.

If we look at StatCounter for browser market share that means today at least 64.84% (its likely more) of the browsers out there are doing revocation checking based on either OCSP or CRLs by default.

This means that when a user visits a website protected with SSL it has to do at least one DNS look-up, one TCP socket and one HTTP transaction to validate the certificate the web server presents and more likely several of these.

This is important because of the way revocation checking needs to be done, you need to know if the server you are talking to really is who they say they are before you start to trust them – that’s why when browsers do OCSP and CRLs they do this validation before they download the content from the web page.

This means that your content won’t be displayed to the user until this check happens and this can take quite a while.

For example in the case of IE and Chrome (when it does standards based revocation checking on Windows) it uses CryptoAPI which will time-out after 15 seconds of attempting to check the status of a certificate.

The scary part is that calls to this API do actually time out and when they do this delay is experienced by the users of your website!

So what can you do about it? It’s simple really you have to be mindful of the operational capacity and performance of the certificate authority you get your certificate from.

Check out this monitoring portal I maintain for OCSP and this one I maintain for CRLs, you will see GlobalSign consistently outperforms every other CA for the performance of their revocation infrastructure in most cases it’s nearly 6x as fast and in others is much more than that.

The other thing to understand is that today the default behavior of these browsers when checking the status of a certificate via OCSP or CRLs is to do what is often referred to as a “soft-revocation failure”.

This basically means that if they fail for any reason to check the status of a certificate (usually due to performance or reliability issues) they will treat the certificate as good anyways. This is an artifact of CAs not operating sufficiently performant and reliable infrastructure to allow the browsers to treat network related failures critically.

Each of these browsers all have options you can use to enable “hard” or “strict” revocation checking but until the top CAs operate infrastructure that meets the performance and reliability requirements of the modern web no browser will make these the default.

Finally its also important to understand that even with this “soft-failure” your website experiences the performance cost of doing these checks.

It’s my belief that the changes we have put into place in our own infrastructure meet that bar and I hope the other CAs follow in our lead as it is in the best interest of the Internet.

Ryan

Revocation checking, Chrome and CRLsets

One of the things I often hear is that Chrome no longer does revocation checking, this isn’t actually true.

All major browsers do some form of revocation checking, that includes Opera, Safari, Chrome, Firefox and Internet Explorer.

Google still does revocation checking it just does so through a proprietary mechanism called CRLsets.

As its name implies CRLsets are basically a combination of CRLs, Google crawls the web gathers CRLs and merges them together into a “mega-crl”. This mega-crl is formatted differently than other CRLs but it’s essentially the same thing but there are some important differences, the most important being that due to size concerns Google selectively chooses which CAs it includes in the CRL set and within those CRLs which revoked certificates to include.

With this understanding you have to wonder why would Google introduce this new mechanism if it not as comprehensive as the standard based ways to deal with revocation checking? The answer is simple performance and reliability.

With CRLsets Google is distributing the revocation list, and as such they can make sure that its delivered quickly they do this in-part by taking a bet that they can intelligently pick which revoked certificates are important (IMHO they cannot – revoked = revoked) and by being the one that distributes the list.

This has implications for users, Chrome trusts certificate authorities for which it has no revocation information for it also intentionally treats some revoked certificates as good which exposes you to some risk.

This is especially problematic for enterprises that use Chrome and leverage PKI, there is essentially no chance Google will decide to include your CRL. This is also problematic for those who encounter certificates from those CAs.

That’s not to say CRLsets do not have value they do, but those values have been discussed elsewhere in detail.

But what do you do if you want a more holistic solution to revocation checking? Its simple you can turn on the standards based revocation checking mechanisms and Chrome will use them in addition to the CRLset, to do that you go to Settings and expand choose Advanced Settings where you will see:

 

 

 

Here you can re-enable the standards based revocation checking mechanisms so chrome can do a more holistic job protecting you from the known bad actors on the internet.

Ryan

 

A quick look at SSL performance

When people think about SSL performance they are normally concerned with the performance impact on the server, specifically they talk about the computational and memory costs of negotiating the SSL session and maintaining the encrypted link.  Today though it’s rare for a web server to be CPU or memory bound so this really shouldn’t be a concern, with that said you should still be concerned with SSL performance.

Did you know that at Google SSL accounts for less than 1% of the CPU load, less than 10KB of memory per connection and less than 2% of network overhead?

Why? Because studies have shown that the slower your site is the less people want to use it. I know it’s a little strange that they needed to do studies to figure that out but the upside is we now have some hard figures we can use to put this problem in perspective. One such study was done by Amazon in 2008, in this study they found that every 100ms of latency cost them 1% in sales.

That should be enough to get anyone to pay attention so let’s see what we can do to better understand what can slow SSL down.

Before we go much further on this topic we have to start with what happens when a user visits a page, the process looks something like this:

  1. Lookup the web servers IP address with DNS
  2. Create a TCP socket to the web server
  3. Initiate the SSL session
  4. Validate the certificates provided by the server
  5. Establish the SSL session
  6. Send the request content

What’s important to understand is that to a great extent the steps described above tasks happen serially, one right after another – so if they are not optimized they result in a delay to first render.

To make things worse this set of tasks can happen literally dozens if not a hundred times for a given web page, just imagine that processes being repeated for every resource (images, JavaScript, etc.) listed in the initial document.

Web developers have made an art out of optimizing content so that it can be served quickly but often forget about impact of the above, there are lots of things that can be done to reduce the time users wait to get to your content and I want to spend a few minutes discussing them here.

First (and often forgotten) is that you are dependent on the infrastructure of your CA partner, as such you can make your DNS as fast as possible but your still dependent on theirs, you can minify your web content but the browser still needs to validate the certificate you use with the CA you get your certificate from.

These taxes can be quite significant and add up to 1000ms or more.

Second a mis(or lazily)-configured web server is going to result in a slower user experience, there are lots of options that can be configured in TLS that will have a material impact on TLS performance. These can range from the simple certificate related to more advanced SSL options and configuration tweaks.

Finally simple networking concepts and configuration can have a big impact on your SSL performance, from the basic like using a CDN to get the SSL session to terminate as close as possible to the user of your site to the more advanced like tuning TLS record sizes to be more optimum.

Over the next week or so I will be writing posts on each of these topics but in the meantime here are some good resources available to you to learn about some of these problem areas:

Reading ocspreport and crlreport at x509labs.com

As you may know I have been hosting some performance and up-time monitors at: http://ocspreport.x509labs.com and http://crlreport.x509labs.com.

I started this project about six months ago when I walked the CAB Forum membership list, visited the sites of the larger CAs on that list, looked at their certificates and extracted both OCSP and CRL urls and added them into custom monitor running on AWS nodes.

Later I tried Pingdom and finally settled on using Monitis because Pingdom doesn’t let you control which monitoring points are used and doesn’t give you the ability to do comparison views. That said as a product I liked Pingdom much better.

As for how I configured Monitis, I did not do much — I set the Service Level Agreement (SLA) for uptime to 10 seconds which is the time required to be met by the CABFORUM for revocation responses. I also selected all of the monitoring locations (30 of them) and set it loose.

I put this up for my own purposes, so I could work on improving our own service but I have also shared it publicly and know several of the other CAs that are being monitored are also using it which I am happy to see.

OK, so today I found myself explaining a few things about these reports to someone so I thought it would be worthwhile to summarize those points for others, they are:

  1. Why is it so slow to render? – Unfortunately despite numerous requests to Monitis there is nothing I can do about this – Monitis is just slow.
  2. Why does it show downtime so often? – I do not believe the downtime figures, most of the time the failures show up on all of the urls. The times I have looked into theses it turned out the failures were at Monitis or due to regional network congestion / failures. Unfortunately this means we cannot rely on these figures for up-time assessment, at best they are indicators when looked at over long periods of time.
  3. Why do some tests show at 0-1 ms? – This is likely because the Monitis testing servers are located in the same data center as the OCSP servers in question. This skews the performance numbers a little bit but the inclusion of many perspectives should off-set this.

At this point I suspect you’re wondering, with these shortcomings what is this thing good for anyways? That’s a good question; OCSP (and CRLs) are a hidden tax that you and your users pay when they visit your site.

This is important because studies have found a direct correlation between latency and user abandonment and seriously who doesn’t just want their site to be fast as possible.

My hope is these resources help you understand what that tax is; if you’re a CA operator it can also help you tweak your performance as well as get an idea of what the global user experience is for the relying parties of your certificates.

On a related note I do think someone could make a pretty penny if they made an easy to use, yet powerful monitoring site 🙂

RESTful X509, CRL and OCSP to JSON web-service

So the other day I got a bee in my bonnet and decided I wanted a simple web service I could pass common day X509 objects to and get a JSON representation of that same object. We had recently done a project in Go at work and we found it quick, robust and easy to build, additionally it looks it’s certificate support decent enough so I thought it was the way to go.

In comes Freelancer, I threw my rough (and that’s kind) goals in a paragraph or two and a few days later I had a bid proposal from an engineer in Chicago — Eli Frey.

Based on a quick review of the Go documentation for cryptography it looked like this was going to be pretty straight forward, and for the most part it was – we did find that there were a few cases that just were not possible without more work than we wanted to put in, I will summarize those a little later.

As things progressed we also decided to add the ability to get an X509 certificate from the interface. Normally one would do this by generating a PKCS #10 request (CSR) and sending it to a CA for processing, unfortunately one of those cases that required more work than we wanted to put in was parsing PKCS #10s since go does not as of yet support it. With that said a CSR is really just a self-signed certificate we just did the same thing with a self-signed X509 certificate request.

So how do these interfaces work? Here are a few examples of how you would call them:

 

Decode a PEM encoded X509 certificate
curl  -F “[email protected]” “api.x509labs.com/v1/x509/certificate?action=decode&inputEncoding=PEM”
 
Decode a DER encoded X509 certificate
curl –fail -F “[email protected]” “api.x509labs.com/v1/x509/certificate?action=decode&inputEncoding=DER”
 
Request and issue an X509 certificate based on a DER encoded self-signed certificate with one hostname
openssl genrsa -out request.key 2048
openssl req -config openssl.cfg -subj “/CN=www.example.com” -new -x509 -set_serial 01 -days 1 -key request.key -out request.cer
curl –fail -F “[email protected]” “api.x509labs.com/v1/x509/certificate?action=issue&hostnames=bob.com&inputEncoding=DER”
 
Request and issue an X509 certificate based on a PEM encoded self-signed certificate with one hostname
openssl genrsa -out request.key 2048
openssl req -config openssl.cfg -subj “/CN=www.example.com” -new -x509 -set_serial 01 -days 1 -key request.key -out request.cer
curl –fail -F “[email protected]” “api.x509labs.com/v1/x509/certificate?action=issue&hostnames=bob.com&inputEncoding=PEM”
 
Request and issue an X509 certificate based on a PEM encoded self-signed certificate with several hostnames
openssl genrsa -out request.key 2048
openssl req -config openssl.cfg -subj “/CN=www.example.com” -new -x509 -set_serial 01 -days 1 -key request.key -out request.cer
curl –fail -F “[email protected]” “api.x509labs.com/v1/x509/certificate?action=issue&hostnames=bob.com,fred.com&inputEncoding=PEM”
 
Decode a set of PEM encoded X509 certificates
curl –fail -F “[email protected]” “api.x509labs.com/v1/x509/certificates?action=decode&inputEncoding=PEM”
 
Decode a PEM encoded X509 crl
curl –fail -F “[email protected]” “api.x509labs.com/v1/x509/crl?action=decode&inputEncoding=PEM”
 
Decode a DER encoded X509 crl
curl –fail -F “[email protected]” “api.x509labs.com/v1/x509/crl?action=decode&inputEncoding=DER”
 
Decode an OCSP response
openssl ocsp -noverify -no_nonce -respout ocsp.resp -reqout ocsp.req -issuer ca.cer -cert www.cer -url “http://ocsp2.globalsign.com/gsextendvalg2” -header “HOST” “ocsp2.globalsign.com” -text
curl –fail -F “[email protected]” “api.x509labs.com/v1/x509/ocsp?action=decode&type=response”

 

 

So even though this started out as a pet project I actually think these interfaces are pretty useful, the largest limitations of these interfaces are:

X509Certificate

  1. Not every element of the structures is included in the JSON serialization, for example AIA, CDP, Name Constraints and Certificate Policy are not present (most extensions actually); this is because there is not a decoder for them in GO.
  2. ECC based certificates are not supported, this is because at this time the released version of GO doesn’t include support for these.
  3. Only issuing certificates based on self-signed X509 certificates are supported, this is as I mentioned a result of the lack of support for the PKCS #10 object in GO.
  4. No OID is specified for the Signature algorithm, this is because it’s not exposed in GO.
  5. Only one certificate profile is supported when using the issue action, this is mostly due to limitations in go (time was also a factor) for example the lack of AIA and OCSP support mean these regardless of CA key material these certs are just good for playing around.
  6. No user supplied information is included in the generated certificate, this was really just a function of time and building a proper workflow that would not be valuable without addressing other go limitations.
  7. Requested certificates that contain RSA keys must have a bit length of at least 2048 bits in length, just a best practice.
  8. Requested certificates will only be issued if the submitted certificate contains a self-signed certificate with a valid signature, this is to ensure the requestor actually holds the private key.
  9. Not all SAN types are supported, only DNSnames really again a limitation of GO.
  10. Certificates with name constraints are not supported, again a limitation of GO.
  11. Not possible to put EKU in certificates, again a limitation of GO.

 

X509OCSP

  1. ResponderID is not specified, this is because it’s not exposed in GO.
  2. Only responses with a single response are supported, this is because more that response is not exposed in GO.
  3. No OCSP extensions are supported, this is because this is not exposed in GO.
  4. Only responses are supported, this is because the request is not supported in GO.

 

Here are some things you might want to know about these interfaces:

  1. Both X509crl and X509ocsp default to DER but you can specify PEM in the encode query string parameter.
  2. X509Certificate defaults to the PEM encoding but DER is supported via the encode query string parameter.
  3. X509Certificates defaults to PEM encoding but DER is not supported.
  4. X509Certificates takes the file you might use in Apache or Nginx to configure which certificates to send — a concatenation of PEM encoded certificates.
  5. All interfaces use HTTP error codes to report issues.
  6. I can’t propose they will always be up and available, be reliable, performant or accurate 🙂

All in-all I think this was a fun project and I really enjoyed working with Eli and Freelancer (though its mail client is awful and the site needs some UI work).

Ryan

A look at revoked certificates

So today I have done posts on the browser user experience for expired and untrusted certificates but we wouldn’t have proper coverage on the topic of bad certificate user experience if we did not cover revoked certificates.

VeriSign is kind enough to host a test site that uses a revoked certificate (I know we do too I just can’t find it right now) so we will use that (https://test-sspev.verisign.com:2443/test-SSPEV-revoked-verisign.html)

Again what we want to see here is:

  1. Users are warned or prohibited from going to the site in question.
  2. The warning language used is easy to understand and explains the risks.
  3. The warning language used is related to the fact that the certificate is expired.
  4. The trust indicator does not show or is marked to indicate that there is a problem.

In this case I think again Internet Explorer and Chrome do the best; The worse experience is in Opera as it leads the user to believe there is a connectivity problem unless they expand the error message.

Chrome

Internet Explorer

Mozilla

Opera

Safari

A look at untrusted certificates

Today I did a blog post on how browsers show expired certificates. I figured I would take the opportunity to capture a few of the other failure cases for certificates.

 

The most severe example is that of an untrusted root certificate, for this scenario I figured the use of https://cacert.org was the most direct example.

 

There are a few cases where this error condition will come up, for example another one is if a server doesn’t include all of the intermediate certificates the clients cannot determine which Certificate Authority issued the certificate.

According to the current SSL Pulse data about 7.4% of the servers in the Alexa top one million may fall into this case.

 

Chrome

Internet Explorer

Mozilla

Opera

 

Safari

A look at expired certificates

Today I was on a mail thread where the topic of how browsers handle expired certificates; this is particularly relevant for a few reasons.

The first of which is that there is a large number of sites operating with expired certificates out on the Internet today, the other is that the adoption of short lived certificates (which I am a fan of) is at least in part dependent on how browsers deal with certificates that are expired.

In any event I was not sure how the most recent versions of browsers were handling these cases so I dug up an example site where an expired certificate was in use (https://www.appliancetherapy.com – it uses a certificate that expired a few weeks ago and has not as of yet been replaced).

So what did I want to find? In a perfect world I believe that the following should be true:

  1. Users are warned or prohibited from going to the site in question.
  2. The warning language used is easy to understand and explains the risks.
  3. The warning language used is related to the fact that the certificate is expired.
  4. The trust indicator does not show or is marked to indicate that there is a problem.

The good news is that for the most part browsers behaved fairly close to this, they all could have improved language but I believe Internet Explorers was the best.

The worst behaving client was Mozilla, as it doesn’t report the certificate as expired but instead indicates that it tried to make an OCSP request but got a response it was not expecting. This has two problems – the first of which being it should not have made an OCSP request for the status of an expired request.

RFC 5280 Section 5 states that:

 

   A complete CRL lists all unexpired certificates, within its scope,

   that have been revoked for one of the revocation reasons covered by

   the CRL scope.  A full and complete CRL lists all unexpired

   certificates issued by a CA that have been revoked for any reason.

 

And RFC 2560 is written largely based on OCSP responses being fed from CRLs. What this means is that it is not appropriate to ask the revocation status of a certificate that is expired.

The next problem is that Mozilla also doesn’t handle the unauthorized response in a usable way. RFC 5019 Section 2.2.3 states:

 

   The response “unauthorized” is returned in cases where the client

   is not authorized to make this query to this server or the server

   is not capable of responding authoritatively.

 

A user who receives this message would believe the issue is related to their permissions but based on the true reason for the error the failure as really that the responder in question doesn’t have the information that’s needed.

This lack of information on the server is likely due to the fact that it isn’t required to maintain information for expired certificates and the message Mozilla delivered should have been about the certificate being expired.

In any event the browsers behaved much better than I expected, IE and Chrome did the best (I really like Chromes red / over the https as a visual queue there is a problem).

 

 

Chrome

Internet Explorer

Mozilla

Opera

Safari