Category Archives: Thoughts

Is SSL Broken?

How Facebook can avoid losing $100M in revenue when they switch to always-on SSL

Recently Facebook announced that they will be moving to Always-On-SSL. I for one am thrilled to see this happen, especially given how much personal data can be gleaned from observing a Facebook session.

When they announced this change they mentioned that users may experience a small performance tax due to the addition of SSL. This is unfortunately true, but when a server is well configured that tax should be minimal.

This performance tax is particularly interesting when you put it in the context of revenue, especially when you consider that Amazon found that every 100ms of latency cost them 1% of sales. What if the same holds true for Facebook? Their last quarter's revenue was $1.23 billion, so I wanted to take a few minutes and look at their SSL configuration to see what this tax might cost them.

First I started with WebPageTest; this is a great resource for the server administrator to see where time is spent when viewing a web page.

The way this site works is it downloads the content twice, using real instances of browsers. The first view should always be slower than the second, since the second gets to take advantage of caching and session re-use.

The Details tab on this site gives us a breakdown of where the time is spent (for the first use experience). There’s lots of good information here but for this exercise we are interested in only the “SSL Negotiation” time.

Since Facebook requires authentication to see the “full experience” I just tested the log-on page; it should accurately reflect the SSL performance “tax” for the whole site.

I ran the test four times, each time summing the total number of milliseconds spent in “SSL Negotiation”; the average of these four runs was 4.111 seconds (4111 milliseconds).

That’s quite a bit but can we reduce it? To find out we need to look at their SSL configuration; when we do we see a few things they could do to improve things, these include:

Enabling SPDY – SPDY could help with performance on mobile where latency is a real problem.
Enabling OCSP Stapling – Enabling OCSP stapling would remove one of the certificate status checks clients need to do before downloading the content.
Switching to a faster CA – For a browser to validate Facebook’s certificate it has to contact the CA who issued it to check if it’s still good; this can introduce delays in the user getting to the site.

Let’s explore this last point more. The status check the browser does is called an OCSP request. For the last 24 hours their current CA had an average world-wide OCSP response time of 287 ms. If they used OCSP Stapling the browser would need to do only one OCSP request, but even with that optimization that request could be up to 7% of the SSL performance tax (287 ms out of the 4111 ms measured above).

GlobalSign’s average world-wide OCSP response time for the same period was 68 milliseconds, which in this case could have saved 219 ms. To put that in context Facebook gets 1.6 billion visits each week. If you do the math (219 * 1.6 billion / 1000 / 60 / 60 / 24), that’s roughly 4,000 days’ worth of time saved every week, or about 211,000 days (close to 580 years) every year. Put another way, assuming a lifetime of about 80 years, it’s a lifetime’s worth of time people would otherwise have spent waiting for Facebook pages to load saved roughly every seven weeks!

If we consider that in the context of the Amazon figure, simply changing their CA could be worth nearly one hundred million dollars a year.
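The revenue estimate above can be reproduced with a few lines of arithmetic. This is only a sketch of the back-of-the-envelope reasoning: it assumes the Amazon figure (1% of sales per 100 ms) scales linearly and applies unchanged to Facebook, and it annualizes revenue by simply multiplying the quoted quarter by four. The function and constant names are made up for illustration.

```python
# Figures quoted in the post.
QUARTERLY_REVENUE_USD = 1.23e9   # Facebook's last quarter
CURRENT_CA_OCSP_MS = 287         # current CA, average OCSP response time
FASTER_CA_OCSP_MS = 68           # GlobalSign, same period
SSL_NEGOTIATION_MS = 4111        # average measured SSL negotiation time
SALES_LOSS_PER_100MS = 0.01      # Amazon: 1% of sales per 100 ms


def ocsp_share_of_ssl_tax(ocsp_ms=CURRENT_CA_OCSP_MS, ssl_ms=SSL_NEGOTIATION_MS):
    """Fraction of the SSL negotiation time spent on one OCSP request."""
    return ocsp_ms / ssl_ms


def annual_revenue_at_stake(saved_ms, quarterly_revenue=QUARTERLY_REVENUE_USD):
    """Revenue tied to `saved_ms` of latency per year.

    Assumption: annual revenue is four times the quoted quarter, and the
    1%-per-100ms figure scales linearly.
    """
    annual_revenue = quarterly_revenue * 4
    return annual_revenue * (saved_ms / 100) * SALES_LOSS_PER_100MS


saved_ms = CURRENT_CA_OCSP_MS - FASTER_CA_OCSP_MS
print(f"OCSP share of SSL tax: {ocsp_share_of_ssl_tax():.1%}")
print(f"Latency saved: {saved_ms} ms")
print(f"Revenue at stake: ${annual_revenue_at_stake(saved_ms) / 1e6:.0f}M per year")
```

Under these assumptions the result lands in the neighborhood of one hundred million dollars a year, which is where the headline number comes from.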

Before you start to pick apart these numbers let me say this is intended to be illustrative of how performance can affect revenue and not be a scientific exercise, so to save you the trouble some issues with these assumptions include:

Facebook’s business is different than Amazon’s and the impact on their business will be different.
I only did four samples of the SSL negotiation and a scientific measurement would need more.
The performance measurement I used for OCSP was an average and not what was actually experienced in the sessions I tested – It would be awesome if WebPageTest could include a more granular breakdown of the SSL negotiation.

With that said clearly even without switching there are a few things Facebook still can do to improve how they are deploying SSL.

Regardless I am still thrilled Facebook has decided to go down this route; the change to deploy Always-On-SSL will go a long way to help the visitors to their sites.

Ryan

A quick look at SSL performance

When people think about SSL performance they are normally concerned with the performance impact on the server; specifically they talk about the computational and memory costs of negotiating the SSL session and maintaining the encrypted link. Today though it’s rare for a web server to be CPU or memory bound so this really shouldn’t be a concern. With that said you should still be concerned with SSL performance.

Did you know that at Google SSL accounts for less than 1% of the CPU load, less than 10KB of memory per connection and less than 2% of network overhead?

Why? Because studies have shown that the slower your site is the less people want to use it. I know it’s a little strange that they needed to do studies to figure that out but the upside is we now have some hard figures we can use to put this problem in perspective. One such study was done by Amazon in 2008; in this study they found that every 100ms of latency cost them 1% in sales.

That should be enough to get anyone to pay attention so let’s see what we can do to better understand what can slow SSL down.

Before we go much further on this topic we have to start with what happens when a user visits a page, the process looks something like this:

Look up the web server’s IP address with DNS
Create a TCP socket to the web server
Initiate the SSL session
Validate the certificates provided by the server
Establish the SSL session
Send the request content

What’s important to understand is that to a great extent the steps described above happen serially, one right after another, so if they are not optimized they result in a delay to first render.
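To get a feel for how these serial steps add up, here is a minimal sketch that times each phase using only Python’s standard library. The helper names (`timed`, `measure`) are made up for this example. One caveat: Python’s `ssl` module validates the certificate chain during the handshake but does not perform the OCSP or CRL checks a browser would, so the numbers leave out that part of the tax.

```python
import socket
import ssl
import time


def timed(label, fn):
    """Run fn() and return (label, elapsed milliseconds, fn's result)."""
    start = time.perf_counter()
    result = fn()
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    return label, elapsed_ms, result


def measure(host, port=443, timeout=10.0):
    """Time DNS lookup, TCP connect, TLS handshake and first response byte."""
    phases = []

    # 1. Look up the web server's IP address with DNS.
    label, ms, addrinfo = timed(
        "dns", lambda: socket.getaddrinfo(host, port, type=socket.SOCK_STREAM))
    phases.append((label, ms))
    ip = addrinfo[0][4][0]

    # 2. Create a TCP socket to the web server.
    label, ms, sock = timed(
        "tcp", lambda: socket.create_connection((ip, port), timeout=timeout))
    phases.append((label, ms))

    # 3-5. Initiate the session, validate the chain, establish the session.
    context = ssl.create_default_context()
    label, ms, tls = timed(
        "tls", lambda: context.wrap_socket(sock, server_hostname=host))
    phases.append((label, ms))

    # 6. Send the request and wait for the first byte of the response.
    request = f"HEAD / HTTP/1.1\r\nHost: {host}\r\nConnection: close\r\n\r\n"

    def send_and_wait():
        tls.sendall(request.encode("ascii"))
        return tls.recv(1)

    label, ms, _ = timed("request", send_and_wait)
    phases.append((label, ms))
    tls.close()
    return phases


# Example (needs network access):
#   for name, ms in measure("www.example.com"):
#       print(f"{name:8s} {ms:8.1f} ms")
```

Each phase only starts once the previous one has finished, which is exactly why a slow step anywhere in the chain shows up directly as a delay to first render.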

To make things worse this set of tasks can happen literally dozens if not a hundred times for a given web page; just imagine that process being repeated for every resource (images, JavaScript, etc.) listed in the initial document.

Web developers have made an art out of optimizing content so that it can be served quickly but often forget about the impact of the above. There are lots of things that can be done to reduce the time users wait to get to your content and I want to spend a few minutes discussing them here.

First (and often forgotten) is that you are dependent on the infrastructure of your CA partner. You can make your DNS as fast as possible but you’re still dependent on theirs; you can minify your web content but the browser still needs to validate the certificate you use with the CA you get your certificate from.

These taxes can be quite significant and add up to 1000ms or more.

Second, a mis(or lazily)-configured web server is going to result in a slower user experience. There are lots of options that can be configured in TLS that will have a material impact on TLS performance. These range from simple certificate-related choices to more advanced SSL options and configuration tweaks.

Finally, simple networking concepts and configuration can have a big impact on your SSL performance, from the basic, like using a CDN to get the SSL session to terminate as close as possible to the user of your site, to the more advanced, like tuning TLS record sizes to be closer to optimal.

Over the next week or so I will be writing posts on each of these topics but in the meantime here are some good resources available to you to learn about some of these problem areas:

Reading ocspreport and crlreport at x509labs.com

As you may know I have been hosting some performance and up-time monitors at: http://ocspreport.x509labs.com and http://crlreport.x509labs.com.

I started this project about six months ago when I walked the CAB Forum membership list, visited the sites of the larger CAs on that list, looked at their certificates, extracted both OCSP and CRL urls and added them into a custom monitor running on AWS nodes.

Later I tried Pingdom and finally settled on using Monitis because Pingdom doesn’t let you control which monitoring points are used and doesn’t give you the ability to do comparison views. That said as a product I liked Pingdom much better.

As for how I configured Monitis, I did not do much. I set the Service Level Agreement (SLA) for uptime to 10 seconds, which is the limit the CA/Browser Forum sets for revocation responses. I also selected all of the monitoring locations (30 of them) and set it loose.

I put this up for my own purposes, so I could work on improving our own service but I have also shared it publicly and know several of the other CAs that are being monitored are also using it which I am happy to see.

OK, so today I found myself explaining a few things about these reports to someone so I thought it would be worthwhile to summarize those points for others, they are:

Why is it so slow to render? – Unfortunately despite numerous requests to Monitis there is nothing I can do about this – Monitis is just slow.
Why does it show downtime so often? – I do not believe the downtime figures; most of the time the failures show up on all of the urls. The times I have looked into these it turned out the failures were at Monitis or due to regional network congestion / failures. Unfortunately this means we cannot rely on these figures for up-time assessment; at best they are indicators when looked at over long periods of time.
Why do some tests show at 0-1 ms? – This is likely because the Monitis testing servers are located in the same data center as the OCSP servers in question. This skews the performance numbers a little bit but the inclusion of many perspectives should off-set this.
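As a rough illustration of why many perspectives help, the sketch below summarizes response times per monitoring location and then across locations. The location names and sample values are invented for the example (they are not taken from the reports), and `summarize` is a hypothetical helper, not something Monitis provides.

```python
from statistics import mean, median

SLA_MS = 10_000  # the 10-second limit mentioned above


def summarize(samples_by_location):
    """Reduce each location to its median, then summarize across locations."""
    per_location = {
        location: median(times)
        for location, times in samples_by_location.items()
        if times
    }
    medians = list(per_location.values())
    return {
        "mean_of_medians_ms": mean(medians),
        "median_of_medians_ms": median(medians),
        "locations_over_sla": sorted(
            loc for loc, ms in per_location.items() if ms > SLA_MS),
    }


# Hypothetical samples in milliseconds: one probe sits in the same data
# center as the responder, one location has a single failed-looking spike.
samples = {
    "us-east": [0.8, 1.0, 0.9],
    "eu-west": [95, 110, 102],
    "ap-south": [240, 260, 12000],
}
summary = summarize(samples)
print(summary)
```

Taking the median per location keeps a single spike from dominating, and looking across locations keeps one same-data-center probe from making the responder look faster than it is for most users.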

At this point I suspect you’re wondering, with these shortcomings what is this thing good for anyways? That’s a good question; OCSP (and CRLs) are a hidden tax that you and your users pay when they visit your site.

This is important because studies have found a direct correlation between latency and user abandonment, and seriously, who doesn’t just want their site to be as fast as possible?

My hope is these resources help you understand what that tax is; if you’re a CA operator it can also help you tweak your performance as well as get an idea of what the global user experience is for the relying parties of your certificates.

On a related note I do think someone could make a pretty penny if they made an easy to use, yet powerful monitoring site 🙂

A look at revoked certificates

Chrome

Internet Explorer

Mozilla

Opera

Safari

A look at untrusted certificates

Today I did a blog post on how browsers show expired certificates. I figured I would take the opportunity to capture a few of the other failure cases for certificates.

The most severe example is that of an untrusted root certificate; for this scenario I figured the use of https://cacert.org was the most direct example.

There are a few cases where this error condition will come up. Another one is when a server doesn’t include all of the intermediate certificates, in which case clients cannot determine which Certificate Authority issued the certificate.

According to the current SSL Pulse data about 7.4% of the servers in the Alexa top one million may fall into this case.

Chrome

Internet Explorer

Mozilla

Opera

Safari

A look at expired certificates

Today I was on a mail thread where the topic of how browsers handle expired certificates came up; this is particularly relevant for a few reasons.

The first of which is that there is a large number of sites operating with expired certificates out on the Internet today, the other is that the adoption of short lived certificates (which I am a fan of) is at least in part dependent on how browsers deal with certificates that are expired.

In any event I was not sure how the most recent versions of browsers were handling these cases so I dug up an example site where an expired certificate was in use (https://www.appliancetherapy.com – it uses a certificate that expired a few weeks ago and has not as of yet been replaced).

So what did I want to find? In a perfect world I believe that the following should be true:

Users are warned or prohibited from going to the site in question.
The warning language used is easy to understand and explains the risks.
The warning language used is related to the fact that the certificate is expired.
The trust indicator does not show or is marked to indicate that there is a problem.

The good news is that for the most part browsers behaved fairly close to this. They all could have improved language but I believe Internet Explorer’s was the best.

The worst behaving client was Mozilla, as it doesn’t report the certificate as expired but instead indicates that it tried to make an OCSP request but got a response it was not expecting. This has two problems – the first of which being it should not have made an OCSP request for the status of an expired certificate.

RFC 5280 Section 5 states that:

A complete CRL lists all unexpired certificates, within its scope, that have been revoked for one of the revocation reasons covered by the CRL scope. A full and complete CRL lists all unexpired certificates issued by a CA that have been revoked for any reason.

And RFC 2560 is written largely based on OCSP responses being fed from CRLs. What this means is that it is not appropriate to ask the revocation status of a certificate that is expired.
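In code, the point above boils down to checking the validity period before asking about revocation. The following is a minimal sketch, not how any particular browser implements it; `should_check_revocation` is a hypothetical helper and `not_after` stands for the certificate’s notAfter time.

```python
from datetime import datetime, timezone


def should_check_revocation(not_after, now=None):
    """Return True only if the certificate has not yet expired.

    CRLs list unexpired certificates, so a responder fed from CRLs may
    simply have no information about a certificate past its notAfter date.
    """
    if now is None:
        now = datetime.now(timezone.utc)
    return now <= not_after


# Example with fixed, made-up dates.
expired_on = datetime(2012, 6, 1, tzinfo=timezone.utc)
checked_at = datetime(2012, 7, 1, tzinfo=timezone.utc)
print(should_check_revocation(expired_on, now=checked_at))
```

A client following this order would report the certificate as expired and never reach the point of interpreting an unexpected OCSP response.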

The next problem is that Mozilla also doesn’t handle the unauthorized response in a usable way. RFC 5019 Section 2.2.3 states:

The response “unauthorized” is returned in cases where the client is not authorized to make this query to this server or the server is not capable of responding authoritatively.

A user who receives this message would believe the issue is related to their permissions, but the true reason for the failure was really that the responder in question doesn’t have the information that’s needed.

This lack of information on the server is likely due to the fact that it isn’t required to maintain information for expired certificates and the message Mozilla delivered should have been about the certificate being expired.

In any event the browsers behaved much better than I expected. IE and Chrome did the best (I really like Chrome’s red slash over the https as a visual cue that there is a problem).

Chrome

Internet Explorer

Mozilla

Opera

Safari

How to tell DV and OV certificates apart

Introduction

There are in essence three kinds of SSL certificates: Domain Validated, Organization Validated and Extended Validation. I am not going to write about the differences here, since it seems that there are hundreds of articles on this topic on the Internet.

What I think has not been given sufficient coverage is how one is able to look at a certificate and determine what type it is.

One would think that this would be easy; in theory if nothing was explicitly stated it would be a Domain Validated certificate (since it is the weakest validation), otherwise someone would put something in the certificate making it clear that the certificate was either Organization Validated or Extended Validation.

Unfortunately it’s not this simple, the main issue being the historic lack of coordination within the CA industry.

Each Certificate Authority (CA) has its own unique practices relating to how they mark their certificates, so with the existing deployed certificates there is no singular rule or approach that can be used to definitively know what level of validation was done for a given certificate.

Thankfully it looks like this problem is getting better thanks to the adoption of the Baseline Requirements but in the meantime we have to make do with heuristics.

Deterministic Approach

Today the only way to know with confidence that a certificate is of a specific type is to know the practices of each CA.

In X.509 the way an issuer is supposed to express something like this is via the Certificate Policies extension which is defined in RFC 5280.

This allows a CA to express a unique identifier (an OID) in their certificates that maps to a document that describes its practices associated with this certificate. This identifier can be used programmatically to make trust decisions about a certificate or to differentiate the user interface in an application based on what type of certificate is being used.

This is exactly how browsers today can tell if a certificate is an Extended Validation (EV) certificate. In essence they have some configuration that says “I trust GlobalSign to issue EV certificates, when a certificate is presented to me from them that has this policy OID show the EV user experience”.

The Baseline Requirements use the same approach defining identifiers for Domain Validated and Organization Validated certificates, these are:

Type	Policy Identifier
Domain Validated	2.23.140.1.2.1
Organization Validated	2.23.140.1.2.2

Having these identifiers takes us a long way towards our goal of deterministic evaluation of certificate issuance policy. That said, not all CAs have adopted them, which is technically alright since the Baseline Requirements do allow them to use their own Policy Identifiers.
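When a certificate does carry one of these identifiers, the check itself is a simple lookup. The sketch below assumes the policy OIDs have already been pulled out of the Certificate Policies extension as dotted strings (with whatever X.509 library you use); the function name is made up for this example.

```python
# Policy identifiers from the Baseline Requirements table above.
BASELINE_POLICY_OIDS = {
    "2.23.140.1.2.1": "DV",  # Domain Validated
    "2.23.140.1.2.2": "OV",  # Organization Validated
}


def validation_type_from_policies(policy_oids):
    """Return "DV" or "OV" if a Baseline Requirements OID is present.

    Returns None when no such OID is found, for example because the CA
    uses its own policy identifiers; heuristics are needed in that case.
    """
    for oid in policy_oids:
        if oid in BASELINE_POLICY_OIDS:
            return BASELINE_POLICY_OIDS[oid]
    return None


# "1.2.3.4.5" stands in for a made-up CA-specific policy OID.
print(validation_type_from_policies(["1.2.3.4.5", "2.23.140.1.2.2"]))
print(validation_type_from_policies(["1.2.3.4.5"]))
```

The `None` result is the interesting one: it is the case where the deterministic approach gives no answer.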

Heuristic Approach

Since the Baseline Requirements were only established this year it will take some time for the existing install base of certificates to be re-issued to use the Policy Identifiers called out above. This doesn’t mean you can’t tell the certificates apart today; it does mean it is quite a bit messier though.

Here is some pseudo-code provided to me as an example from a friend that they used in one of their projects:

type = null;
if (cert is self-signed) then
    type = SS; /* SS = Self-signed */
else if (cert was issued by a known “CA”) then
    type = DV; /* DV = Domain Validation */
else if (cert contains a known EV Policy OID) then
    type = EV; /* EV = Extended Validation */
else if (cert “Subject O” and “Subject CN” are the same or “Subject OU” contains “Domain Control Validated”) then {
    if (cert contains no Subject L, St or PostalCode) then
        type = DV;
}
else if (cert “Subject O” is “Persona Not Validated” and the cert’s issuer was StartCom) then
    type = DV;
if (type is null)
    type = OV; /* OV = Organization Validation */

This logic is not comprehensive but should work well enough for most uses.
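For anyone who wants to experiment, here is one way the pseudo-code could be turned into runnable Python. It is a sketch under a few assumptions of mine, not the original project’s code: the `CertInfo` fields are hypothetical and must be filled in by your own certificate parser, the second branch is read as “issued by a CA known to issue only DV certificates” (otherwise the EV branch could never be reached), and a missing Subject O is not treated as matching the Subject CN.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CertInfo:
    """Hypothetical, pre-parsed view of the fields the heuristic looks at."""
    self_signed: bool = False
    issuer_is_dv_only_ca: bool = False      # assumption, see lead-in
    has_known_ev_policy_oid: bool = False
    issuer_is_startcom: bool = False
    subject_o: Optional[str] = None
    subject_cn: Optional[str] = None
    subject_ou: Optional[str] = None
    subject_l: Optional[str] = None
    subject_st: Optional[str] = None
    subject_postal_code: Optional[str] = None


def classify(cert: CertInfo) -> str:
    """Return "SS", "DV", "EV" or "OV", following the pseudo-code's order."""
    cert_type = None
    o_equals_cn = cert.subject_o is not None and cert.subject_o == cert.subject_cn
    ou_says_dcv = "Domain Control Validated" in (cert.subject_ou or "")

    if cert.self_signed:
        cert_type = "SS"
    elif cert.issuer_is_dv_only_ca:
        cert_type = "DV"
    elif cert.has_known_ev_policy_oid:
        cert_type = "EV"
    elif o_equals_cn or ou_says_dcv:
        # Only DV if no locality, state or postal code was validated.
        if not (cert.subject_l or cert.subject_st or cert.subject_postal_code):
            cert_type = "DV"
    elif cert.subject_o == "Persona Not Validated" and cert.issuer_is_startcom:
        cert_type = "DV"

    # Anything the rules above did not classify is assumed to be OV.
    return cert_type or "OV"


print(classify(CertInfo(subject_cn="www.example.com",
                        subject_ou="Domain Control Validated")))
print(classify(CertInfo(subject_o="Example Inc", subject_cn="www.example.com",
                        subject_l="Seattle", subject_st="WA")))
```

Keeping the parsing separate from the classification makes it easy to adjust individual rules as CAs change how they mark their certificates.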

Summary

Unfortunately today there is not a deterministic way to tell if a certificate was Domain or Organization Validated. That said, things are changing and within a few years hopefully it will be possible.

In the meantime there are heuristics you can use that help tell these types of certificates apart.

Updated my script for Qualified Subordination testing

I did some testing with ECC and SHA2 today and as such decided to update my script for testing Qualified Subordination to make it easy to get certificates that use these algorithms.

There are now several configurable variables in makepki.bat:

key = possible values include RSA or ECC
rsasize = possible values include 1024,2048,4096
eccsize = possible values include secp256r1,secp384r1,secp521r1
hash = possible values include sha1,sha256,sha384,sha512

With these it’s very easy to get chains that include these algorithms to do testing with. Have fun.

Ryan

Using SHA2 based signatures in X509 certificates

It’s been an exciting decade for cryptography; as a result we see smaller key sizes and weaker algorithms getting deprecated.

One driver of such things is the U.S. Federal Government, specifically NIST.

One example of this would be NIST Special Publication 800-131A which disallows the use of SHA1 after December 2013. What this means is if you are in the U.S. Federal Government or you work with them you may have to revise your technology strategy to use SHA2 in its place.

But what if you don’t have any policy mandate forcing you to do this switch? Well it’s a good idea but it has consequences too, namely compatibility.

You see SHA2 was published in 2001 so anything produced before then will not support it. The most notable example is Windows XP which as of July 2012 has about 29% presence on the Internet.

This is important for more than just Internet Explorer users since even Chrome and Safari use CryptoAPI for certificate validation when on Windows.

The good news is that XP SP3, which was released in 2008, added support for this new suite of hash algorithms. That raises the question: how many of those XP machines have XP SP3?

Unfortunately I don’t have any public references that can answer this question, but let’s say that 85% of all XP machines on the Internet have gotten this update (I have good confidence in this number). That means that 15% of those 29%, or a little over 4% of all clients, would not be able to connect to your server over SSL if you used SHA2.

This would mean these users would see something like this:

That is pretty scary, so how long until we can use this more broadly? It’s hard to say. There is a good article titled “The developers guide to browser adoption rates” that sheds some light, as do the historic gs.statcounter.com results. Based on these, unless there is a sudden change (which is possible, since these machines are getting pretty old), I would assume that we have around 4-5 years of XP out there yet.

Hope this helps,

Ryan

UNMITIGATED RISK

un.mit.i.gat.ed: Adj. Not diminished or moderated in intensity or severity; unrelieved. risk: N. The possibility of suffering harm or loss; danger.
