Is SSL Broken?

[ This is a re-post of a article I wrote for the GlobalSign corporate blog, you can find it here]

It seems every month a new flaw is identified in SSL, and while that’s a slight exaggeration, after a while one starts to ask the question – is SSL broken? My answer would to that question would be no, but the protocol is nearly twenty years old and even though it now carries a new name (TLS) it also carries much of the baggage of the past in its design.

Despite this fact, my faith in TLS is stronger today than it ever was. My reasoning is simple – today we understand the strengths and weaknesses of this protocol better than we ever have. It is continuously reviewed by the world’s best engineers and cryptographers, trying to find the bad assumptions their predecessors made, strengthening it in response to identified weaknesses, and modernizing it to use the strongest forms of cryptography available.

This continuous investment in this foundational technology gives me faith.

Today another attack on TLS was made public.  “Lucky Thirteen” is a derivative of the work of French cryptographer Serge Vaudenay (Padding Oracles against CBC based ciphers – 2010), though unlike Vaudenay’s attack, Lucky Thirteen uses a known Timing Attack previously believed to be impractical. A successful application of this attack enables an attacker to decrypt your SSL communications.

Unlike other recent attacks, such as BEAST,  Lucky Thirteen requires a server-side fix. This means that complete and effective protection against this attack will require all webservers to be updated or patched.

That said, it is possible to mitigate the attack by removing CBC cipher suites, since the attack is against SSL/TLS’s use of CBC. But what to use in its place? The consensus of security researchers is to adopt suites based on AES-GCM, and while I agree, this has one problem – the large population of clients that do not yet support it.

This recommendation is complicated slightly by the BEAST attack from last year, the resolution of which required a client side fix which has, in all likelihood, not yet been deployed ubiquitously. As such, I still recommend prioritizing the older and less secure RC4 based suites above AES-GCM since it addresses both issues.

But should you be worried? It depends. If you are using TLS (and not its little brother DTLS) I would say your best bet is to walk calmly to the nearest exit, and use this as an excuse to ensure you are following industry Best Practices when deploying SSL – if  you’re not, this attack is the least of your worries.  Specifically I would recommend visiting the SSL Configuration Checker and make the critical (red) and important (yellow) configuration changes it suggests.

I would also encourage you to deploy HTTP Strict Transport Security  on your site since the attack this mitigates (SSL stripping) is much easier for an attacker to execute.

The good news is that if you were already following the advice of the SSL Configuration Checker you were prioritizing RC4 over other ciphers and most sessions to your server were resistant to this attack. This doesn’t mean you should not be deploying the patch to this issue, you just don’t need to do so in a crazed rush.

So are there any lessons we can take away from this? Of course there are. As a server operator, I would say this finding underscores the importance of regularly reviewing your server configuration to ensure that it follows industry best, and that you are always operating the most recent and stable release of your web server.

If you want a more technical walk through of this attack, I highly recommend this post by Mathew Green on TLS Timing Oracles or this one by Adam Langly.

SSL 3.0 Usage in the Wild

Recently I had an opportunity to look at some logs that showed the cipher suites and protocol versions being negotiated for a large cross-section of websites.

I have always wanted to look at data like this and as such have instrumented my own sites to look at it but let’s face it some uber geek blog or security product company website just isn’t going to have representative traffic for the internet at large.

One of the easiest and most useful things to gleam from this data is that the impact of disabling SSL 3.0 is actually quite small.

So of the sampling 2.48% of all SSL/TLS sessions were done with SSL 3.0, if we look at (and believe) the User Agents that negotiated these sessions we see 74.98% of these were Windows clients, the next biggest chunk was Gecko at 16.39%.

Browser %
Internet Explorer

74.98%

Gecko

16.39%

Apple

4.12%

Playstation

2.85%

Chrome

1.36%

Other

0.30%

100.00%

 

Of these Windows clients 45.45% of them were Windows 2000 or XP but only 6.67% of them were running versions of Internet Explorer that did not support TLS 1.0; this basically boils down the the IE versions before version 7 as this was the first to enable TLS by default. So why did we see the remaining 68.31% of the 2.48% negotiating SSL when they support TLS?

There are a few possible explanations:

  1. Some TLS implementations will fall back to SSL in the event of a failure, one common example of a failure would be an intermittent TCP connection problem. Basically if this is the case the client had a problem reaching the server and thought it might be related to TLS and so it tried again. In this case its likely that if it had tried with TLS it probably would have succeeded also.  It also seems that its likley in this case the user did not get a working experience — the assumption here is that the TCP problems they are experiencing were not a one time thing.
  2. Some old TLS implementations had problems with TLS extensions as such some TLS implementations added logic to fall back to SSL when they encountered a this extension intolerance, again falling back to TLS (without extensions) would have likely also worked.
  3. Some enterprises may have used group policy to disable the use of TLS due to the TLS extension intolerance problems (see #2).
  4. Some clients are lying; they may be crawlers, bots and other such automated agents looking to profile these websites.

So what can we do with this data?

Well for one we can understand what interoperability implications we may encounter by disabling SSL 3.0 on our servers – on the surface the answer is up to 2.48% of clients will not be able to get to our servers.

The real answer is that it’s likely that figure is much smaller, probably half that if not even less than.

OK, so we understand the interoperability impact but why should I care? Well there are a few reasons:

  1. NIST 140-2 compliance requires disabling SSL 3 ciphers and by disabling SSL 3 you do just that.
  2. The browsers that only support this decade old protocol are nearly as old and a have a litany of issues of their own.
  3. TLS has a number of security, performance and deploy-ability enhancing  features such such as stronger cipher suites, Session Tickets and SNI that you will benefit from.

Another thing you should ask yourself is did you design your site for these old browsers? If not by leaving SSL 3 enabled you really are not getting much if any benefit since those users who require it would likely not be able to use your site effectively anyways.

When we consider this data I believe the natural conclusion is that disabling SSL 3.0 it is the right thing to do.

Ryan

Understanding Windows Automatic Root Update

Windows has a feature called Automatic Root Update, when CryptoAPI does a chain build, exhausts the locally installed root certificates it downloads (if it has not already done so) a list of certificates it should trust.

This list contains attributes about those certificates (hashes of their subject name and keys, what Microsoft believes it should be trusted for, etc.).

If it finds the certificate it needs in that list it downloads it and installs it.

So how does this technically work?

There is a pre-configured location where Windows looks for this Certificate Trust List (CTL), it is http://www.download.windowsupdate.com/msdownload/update/v3/static/trustedr/en/authrootstl.cab.

NOTE: Technically they do use different language codes in the URLs based on the client local but to the best of my knowledge they all map to the same files today.

This list is downloaded and de-compressed; you can do the same thing on Windows (with curl) like this:

mkdir authroot

cd authroot

curl http://www.download.windowsupdate.com/msdownload/update/v3/static/trustedr/en/authrootstl.cab>authrootstl.cab

expand authrootstl.cab .\authroot.stl

You can open authroot.stl with explorer and look at some of its contents but for the purpose of our exercise that won’t be necessary.

This file is a signed PKCS #7 file, Windows expects this to be signed with key material they own (and protect dearly).

It contains a Microsoft specific ContentInfo structure for CTLs and a few special attributes. WinCrypt.h has the OIDs for these special attributes defined in it but for brevity sake here are the ones you will need for this file:

1.3.6.1.4.1.311.10.1 OID_CTL

1.3.6.1.4.1.311.10.3.9 OID_ROOT_LIST_SIGNER

1.3.6.1.4.1.311.10.11.9 OID_CERT_PROP_ID_METAEKUS

1.3.6.1.4.1.311.10.11.11 CERT_FRIENDLY_NAME_PROP_ID

1.3.6.1.4.1.311.10.11.20 OID_CERT_KEY_IDENTIFIER_PROP_ID

1.3.6.1.4.1.311.10.11.29 OID_CERT_SUBJECT_NAME_MD5_HASH_PROP_ID

1.3.6.1.4.1.311.10.11.83 CERT_ROOT_PROGRAM_CERT_POLICIES_PROP_ID

1.3.6.1.4.1.311.10.11.98 OID_CERT_PROP_ID_PREFIX_98

1.3.6.1.4.1.311.10.11.105 OID_CERT_PROP_ID_PREFIX_105

1.3.6.1.4.1.311.20.2 szOID_ENROLL_CERTTYPE_EXTENSION

1.3.6.1.4.1.311.21.1 szOID_CERTSRV_CA_VERSION

1.3.6.1.4.1.311.60.1.1 OID_ROOT_PROGRAM_FLAGS_BITSTRING

Copy this list of OIDs and constants into a file called authroot.oids and put it where we extracted the STL file.

We can now use the following OpenSSL command to inspect the contents in more detail:

openssl asn1parse -oid authrootstl.oids -in authroot.stl -inform DER

There is lots of stuff in here, but for this exercise we will just download the certificates referenced in this list.

To do that we need to understand which of the attributes are used to construct the URL we will use to download the actual certificate.

This list values we want are just after the OID_ROOT_LIST_SIGNER, the first one being “CDD4EEAE6000AC7F40C3802C171E30148030C072”, its entry will look like this:

SEQUENCE

  128:d=8  hl=2 l=  20 prim:         OCTET STRING      [HEX DUMP]:CDD4EEAE6000AC7F40C3802C171E30148030C072

  150:d=8  hl=3 l= 246 cons:         SET

  153:d=9  hl=2 l=  30 cons:          SEQUENCE

  155:d=10 hl=2 l=  10 prim:           OBJECT            :OID_CERT_PROP_ID_PREFIX_105

  167:d=10 hl=2 l=  16 cons:           SET

  169:d=11 hl=2 l=  14 prim:            OCTET STRING      [HEX DUMP]:300C060A2B0601040182373C0302

  185:d=9  hl=2 l=  32 cons:          SEQUENCE

  187:d=10 hl=2 l=  10 prim:           OBJECT            :OID_CERT_SUBJECT_NAME_MD5_HASH_PROP_ID

  199:d=10 hl=2 l=  18 cons:           SET

  201:d=11 hl=2 l=  16 prim:            OCTET STRING      [HEX DUMP]:F0C402F0404EA9ADBF25A03DDF2CA6FA

  219:d=9  hl=2 l=  36 cons:          SEQUENCE

  221:d=10 hl=2 l=  10 prim:           OBJECT            :OID_CERT_KEY_IDENTIFIER_PROP_ID

  233:d=10 hl=2 l=  22 cons:           SET

  235:d=11 hl=2 l=  20 prim:            OCTET STRING      [HEX DUMP]:0EAC826040562797E52513FC2AE10A539559E4A4

  257:d=9  hl=2 l=  48 cons:          SEQUENCE

  259:d=10 hl=2 l=  10 prim:           OBJECT            :OID_CERT_PROP_ID_PREFIX_98

  271:d=10 hl=2 l=  34 cons:           SET

  273:d=11 hl=2 l=  32 prim:            OCTET STRING      [HEX DUMP]:885DE64C340E3EA70658F01E1145F957FCDA27AABEEA1AB9FAA9FDB0102D4077

  307:d=9  hl=2 l=  90 cons:          SEQUENCE

  309:d=10 hl=2 l=  10 prim:           OBJECT            :CERT_FRIENDLY_NAME_PROP_ID

  321:d=10 hl=2 l=  76 cons:           SET

  323:d=11 hl=2 l=  74 prim:            OCTET STRING      [HEX DUMP]:4D006900630072006F0073006F0066007400200052006F006F007400200043006500720074006900660069006300610074006500200041007500740068006F0072006900740079000000

We can download this certificate with the following command:

curl http://www.download.windowsupdate.com/msdownload/update/v3/static/trustedr/en/CDD4EEAE6000AC7F40C3802C171E30148030C072.crt> CDD4EEAE6000AC7F40C3802C171E30148030C072.crt

If we look at the contents of the certificate we can see this is the 4096 bit “Microsoft Root Certificate Authority”.

The next in the list is 245C97DF7514E7CF2DF8BE72AE957B9E04741E85, and then the next is 18F7C1FCC3090203FD5BAA2F861A754976C8DD25 and so forth.

We could go through and parse the ASN.1 to get these values, iterate on them and download them all if we wanted. Of course if we wanted to get all of the root certificates we might as well just download the most recent root update here: http://www.microsoft.com/en-us/download/details.aspx?id=29434

It contains the same information just in a self-contained format (this is whats used by down-level clients that do not have Automatic Root Update).

Anyhow hopefully this will be useful for someone,

Ryan

How Facebook can avoid losing $100M in revenue when they switch to always-on SSL

Recently Facebook announced that they will be moving to Always-On-SSL, I for one am thrilled to see this happen – especially given how much personal data can be gleamed from observing a Facebook session.

When they announced this change they mentioned that users may experience a small performance tax due to the addition of SSL. This is unfortunately true, but when a server is well configured that tax should be minimal.

This performance tax is particularly interesting when you put it in the context of revenue, especially when you consider that Amazon found that every 100ms of latency cost them 1% of sales. What if the same holds true for Facebook? Their last quarter revenue was 1.23 billion, I wanted to take a few minutes and look at their SSL configuration to see what this tax might cost them.

First I started with WebPageTest; this is a great resource for the server administrator to see where time is spent when viewing a web page.

The way this site works is it downloads the content twice, using real instances of browsers, the first time should always be slower than the second since you get to take advantage of caching and session re-use.

The Details tab on this site gives us a break down of where the time is spent (for the first use experience), there’s lots of good information here but for this exercise we are interested in only the “SSL Negotiation” time.

Since Facebook requires authentication to see the “full experience” I just tested the log-on page, it should accurately reflect the SSL performance “tax” for the whole site.

I ran the test four times, each time summing the total number of milliseconds spent in “SSL Negotiation”, the average of these three runs was 4.111 seconds (4111 milliseconds).

That’s quite a bit but can we reduce it? To find out we need to look at their SSL configuration; when we do we see a few things they could do to improve things, these include:

Let’s explore this last point more, the status check the browser does is called an OCSP request. For the last 24 hours their current CA had an average world-wide OCSP response time of 287 ms, if they used OCSP Stapling the browser would need to do only one OCSP request, even with that optimization that request could be up to 7% of the SSL performance tax.

Globalsign’s average world-wide OCSP response time for the same period was 68 milliseconds, which in this case could have saved 219 ms. To put that in context Facebook gets 1.6 billion visits each week. If you do the math (219 * 1.6 billion / 1000 / 60 / 24), that’s 12.7 million days’ worth of time saved every year. Or put another way, it’s a lifetime worth of time people would have otherwise spent waiting for Facebook pages to load saved every two and a half hours!

If we consider that in the context of the Amazon figure simply changing their CA could be worth nearly one hundred million a year.

Before you start to pick apart these numbers let me say this is intended to be illustrative of how performance can effect revenue and not be a scientific exercise, so to save you the trouble some issues with these assumptions include:

  • Facebook’s business is different than Amazons and the impact on their business will be different.
  • I only did four samples of the SSL negotiation and a scientific measurement would need more.
  • The performance measurement I used for OCSP was an average and not what was actually experienced in the sessions I tested – It would be awesome if WebPageTest could include a more granular breakdown of the SSL negotiation.

With that said clearly even without switching there are a few things Facebook still can do to improve how they are deploying SSL.

Regardless I am still thrilled Facebook has decided to go down this route, the change to deploy Always-On-SSL will go a long way to help the visitors to their sites.

Ryan

Making a Windows smartcard login certificate with OpenSSL.

I use OpenSSL for testing certificate related stuff all the time, while using its test clients as a administrative tool can require contortions sometimes it’s very useful thing to have in my toolbox.

Today I needed to throw together a certificate for Windows smartcard login, a valid Windows Smart Card Login certificate has the following attributes:

  1. Is issued by an CA that is trusted as an Enterprise CA
  2. Is issued by a CA that has the “Smartcard Logon” EKU (1.3.6.1.4.1.311.20.2.2)
  3. Has the “Smartcard Logon” EKU
  4. Has the “Digital Signature” “Key Usage”
  5. Has the principal name of the subscriber in the SubjectAltName extension as a UPN (1.3.6.1.4.1.311.20.2.3)

With that background how does one do this in OpenSSL? Well lets focus on the last 3 (3,4,5) as they are about the subscriber certificate.

To create this certificate you would create an OpenSSL section that looks something like this:

[ v3_logon_cert ]

# Typical end-user certificate profile

 

keyUsage = critical, nonRepudiation, digitalSignature, keyEncipherment

extendedKeyUsage = critical, clientAuth, emailProtection, msSmartcardLogin

basicConstraints = critical, CA:FALSE

 

subjectKeyIdentifier = hash

authorityKeyIdentifier = keyid,issuer

 

authorityInfoAccess = @customerca_aia

 

subjectAltName = otherName:msUPN;UTF8:[email protected], email:[email protected]

 

certificatePolicies=ia5org,@rootca_polsect

There are a few other “reference” sections you can find the INF file I used these additions with in my script for testing Qualified Subordination.

Hope this helps you too,

Ryan

Using CAPICOM on Windows x64

So CAPICOM was one of the project I was responsible for while at Microsoft, its been discontinued but I always find it useful – it is kind of a Swiss Army knife for CryptoAPI certificate stores when paired with its VBS samples.

One of it’s problems is we never shipped with x64 bit version, you can do similar things with PowerShell and the .NET classes (this is why it was discontinued) but I still find this the quickest way to do stuff sometimes so I keep it in my toolbelt.

Here is what you need to know to make it work:

  1. Windows can run 32bit things in 64bit environments.
  2. You cannot have a 64bit thing call a 32bit thing.
  3. Windows ships a 32bit cmd prompt.
  4. Windows ships a 32bit regsrv32.

To use CAPICOM you need to:

  1. Download CAPICOM – http://www.microsoft.com/en-us/download/details.aspx?id=25281
  2. Install CAPICOM
  3. Register CAPICOM
  • Open an administrative command prompt
  • cd to “C:\Program Files (x86)\Microsoft CAPICOM 2.1.0.2 SDK\Lib\X86”
  • copy CAPICOM.DLL %windir%\syswow64
  • %windir%\syswow64\regsvr32.exe %windir%\syswow64\capicom.dll
  • “exit” the command prompt

So what can you do? There are lots of things, tonight I used it to enumerate the extensions included in a PFX file, you can do this with OpenSSL too by looking at the ASN.1 but this way you get some of the Microsoft specific stuff expanded out to human readable things.

I should note that its old, its unsupported and it may have vulnerabilities in it — as such I unregister it when its not in use and I recomend you do the same.

Hope this helps someone,

Ryan

Priming the OCSP cache in Nginx

So recently GlobalSign, DigiCert, and Comodo worked together with Nginx to get OCSP stapling supoported in Nginx 1.3.7, unfortunately architectural restrictions made it impractical to make it so that pre-fetching the OCSP response on server start-up so instead the first connection to the server primes the cache that is used for later connections.

This is a fine compromise but what if you really want the first connection to have the benefit too? Well there are two approaches you can take:

  1. Right after you start the server you do a SSL request to prime the cache.
  2. You manually get the ocsp response and plumb it where Nginx is looking for it.

The first model is easy, right after you start your server use the OpenSSL s_client to connect to the server with OCSP stapling enabled  just like I documented in this post, the first request will trigger the retrieval of the OCSP response by Nginx.

The second model can be done before you start the server, you need to find the URI for the OCSP responder, do a OCSP request and populate the Nginx cache manually, this would look something like:

#!/bin/sh
ISSUER_CER=$1
SERVER_CER=$2

URL=$(openssl x509 -in $SERVER_CER -text | grep “OCSP – URI:” | cut -d: -f2,3)

openssl ocsp -noverify -no_nonce -respout ocsp.resp -issuer \
$ISSUER_CER -cert $SERVER_CER -url $URL

Where “ocsp.resp” is whatever file you have configured in Nginx for the “ssl_stapling_file“.

Each approach has its pros and cons, for example with the first approach your execution of the s_client call may not be the first request the server sees, with the second approach if you are using a certificate that doesn’t contain a OCSP pointer and have manually told Nginx where to fetch certificate status from then it won’t work.

It is worth noting you can run this same script in a cron script to ensure your server never needs to hit the wire (and potentially block when doing so) when it tries to keep its OCSP cache up to date.

 

 

What is the status of revocation checking in browsers?

Today we did an announcement of some work we have been doing with CloudFlare to speed up SSL for all of our customers through some improvements to our revocation infrastructure.

One of the things that come up when talking about this is how each of the browsers handles revocation checking, I thought it might be useful to put together a quick post that talks about this to clear up some confusion.

The first thing that’s important to understand is that all major browsers do some form of revocation checking, that includes Opera, Safari, Chrome, Firefox and Internet Explorer.

Let’s talk about what that means, the IETF standards for X.509 certificates define three ways for revocation checking to be done, the first is Certificate Revocation Lists (CRLs), next there is the Online Certificate Status Protocol (OCSP) and finally there is something called Simple Certificate Validation Protocol (SCVP).

In the context of browsers we can ignore SCVP as no browser implements them; this leaves us with CRLs and OCSP as the standards compliant ways of doing revocation checking.

All of the above browsers support these mechanisms, in addition to these standard mechanisms Google has defined a proprietary certificate revocation checking scheme called CRLsets.

If we look at StatCounter for browser market share that means today at least 64.84% (its likely more) of the browsers out there are doing revocation checking based on either OCSP or CRLs by default.

This means that when a user visits a website protected with SSL it has to do at least one DNS look-up, one TCP socket and one HTTP transaction to validate the certificate the web server presents and more likely several of these.

This is important because of the way revocation checking needs to be done, you need to know if the server you are talking to really is who they say they are before you start to trust them – that’s why when browsers do OCSP and CRLs they do this validation before they download the content from the web page.

This means that your content won’t be displayed to the user until this check happens and this can take quite a while.

For example in the case of IE and Chrome (when it does standards based revocation checking on Windows) it uses CryptoAPI which will time-out after 15 seconds of attempting to check the status of a certificate.

The scary part is that calls to this API do actually time out and when they do this delay is experienced by the users of your website!

So what can you do about it? It’s simple really you have to be mindful of the operational capacity and performance of the certificate authority you get your certificate from.

Check out this monitoring portal I maintain for OCSP and this one I maintain for CRLs, you will see GlobalSign consistently outperforms every other CA for the performance of their revocation infrastructure in most cases it’s nearly 6x as fast and in others is much more than that.

The other thing to understand is that today the default behavior of these browsers when checking the status of a certificate via OCSP or CRLs is to do what is often referred to as a “soft-revocation failure”.

This basically means that if they fail for any reason to check the status of a certificate (usually due to performance or reliability issues) they will treat the certificate as good anyways. This is an artifact of CAs not operating sufficiently performant and reliable infrastructure to allow the browsers to treat network related failures critically.

Each of these browsers all have options you can use to enable “hard” or “strict” revocation checking but until the top CAs operate infrastructure that meets the performance and reliability requirements of the modern web no browser will make these the default.

Finally its also important to understand that even with this “soft-failure” your website experiences the performance cost of doing these checks.

It’s my belief that the changes we have put into place in our own infrastructure meet that bar and I hope the other CAs follow in our lead as it is in the best interest of the Internet.

Ryan

Revocation checking, Chrome and CRLsets

One of the things I often hear is that Chrome no longer does revocation checking, this isn’t actually true.

All major browsers do some form of revocation checking, that includes Opera, Safari, Chrome, Firefox and Internet Explorer.

Google still does revocation checking it just does so through a proprietary mechanism called CRLsets.

As its name implies CRLsets are basically a combination of CRLs, Google crawls the web gathers CRLs and merges them together into a “mega-crl”. This mega-crl is formatted differently than other CRLs but it’s essentially the same thing but there are some important differences, the most important being that due to size concerns Google selectively chooses which CAs it includes in the CRL set and within those CRLs which revoked certificates to include.

With this understanding you have to wonder why would Google introduce this new mechanism if it not as comprehensive as the standard based ways to deal with revocation checking? The answer is simple performance and reliability.

With CRLsets Google is distributing the revocation list, and as such they can make sure that its delivered quickly they do this in-part by taking a bet that they can intelligently pick which revoked certificates are important (IMHO they cannot – revoked = revoked) and by being the one that distributes the list.

This has implications for users, Chrome trusts certificate authorities for which it has no revocation information for it also intentionally treats some revoked certificates as good which exposes you to some risk.

This is especially problematic for enterprises that use Chrome and leverage PKI, there is essentially no chance Google will decide to include your CRL. This is also problematic for those who encounter certificates from those CAs.

That’s not to say CRLsets do not have value they do, but those values have been discussed elsewhere in detail.

But what do you do if you want a more holistic solution to revocation checking? Its simple you can turn on the standards based revocation checking mechanisms and Chrome will use them in addition to the CRLset, to do that you go to Settings and expand choose Advanced Settings where you will see:

 

 

 

Here you can re-enable the standards based revocation checking mechanisms so chrome can do a more holistic job protecting you from the known bad actors on the internet.

Ryan

 

A quick look at SSL performance

When people think about SSL performance they are normally concerned with the performance impact on the server, specifically they talk about the computational and memory costs of negotiating the SSL session and maintaining the encrypted link.  Today though it’s rare for a web server to be CPU or memory bound so this really shouldn’t be a concern, with that said you should still be concerned with SSL performance.

Did you know that at Google SSL accounts for less than 1% of the CPU load, less than 10KB of memory per connection and less than 2% of network overhead?

Why? Because studies have shown that the slower your site is the less people want to use it. I know it’s a little strange that they needed to do studies to figure that out but the upside is we now have some hard figures we can use to put this problem in perspective. One such study was done by Amazon in 2008, in this study they found that every 100ms of latency cost them 1% in sales.

That should be enough to get anyone to pay attention so let’s see what we can do to better understand what can slow SSL down.

Before we go much further on this topic we have to start with what happens when a user visits a page, the process looks something like this:

  1. Lookup the web servers IP address with DNS
  2. Create a TCP socket to the web server
  3. Initiate the SSL session
  4. Validate the certificates provided by the server
  5. Establish the SSL session
  6. Send the request content

What’s important to understand is that to a great extent the steps described above tasks happen serially, one right after another – so if they are not optimized they result in a delay to first render.

To make things worse this set of tasks can happen literally dozens if not a hundred times for a given web page, just imagine that processes being repeated for every resource (images, JavaScript, etc.) listed in the initial document.

Web developers have made an art out of optimizing content so that it can be served quickly but often forget about impact of the above, there are lots of things that can be done to reduce the time users wait to get to your content and I want to spend a few minutes discussing them here.

First (and often forgotten) is that you are dependent on the infrastructure of your CA partner, as such you can make your DNS as fast as possible but your still dependent on theirs, you can minify your web content but the browser still needs to validate the certificate you use with the CA you get your certificate from.

These taxes can be quite significant and add up to 1000ms or more.

Second a mis(or lazily)-configured web server is going to result in a slower user experience, there are lots of options that can be configured in TLS that will have a material impact on TLS performance. These can range from the simple certificate related to more advanced SSL options and configuration tweaks.

Finally simple networking concepts and configuration can have a big impact on your SSL performance, from the basic like using a CDN to get the SSL session to terminate as close as possible to the user of your site to the more advanced like tuning TLS record sizes to be more optimum.

Over the next week or so I will be writing posts on each of these topics but in the meantime here are some good resources available to you to learn about some of these problem areas: