Tag Archives: X509

Why bother with short-lived certificates and keys in TLS?

There seems to be a lot of confusion and misinformation about the idea of short-lived certificates and keys so I thought I would pen some thoughts about the topic in the hope of providing some clarification.

I have seen some argue the rationale behind short-lived certificates is to address the shortcoming in the CA and browser revocation infrastructure, I would argue this is not the case at all.

In my mind, the main reason for them is to address the issue of key compromise. Long-lived keys have a long period in which they are exposed to theft (see Heartbleed) and therefor are a higher value to an attacker since a stolen key enables the attacker to impersonate the associated website for that period.

The most important thing to keep in mind is that the nature of key compromise is such that you almost never know it has happened until it is too late (consider Diginotar as an extreme example).

The importance of protecting the SSL private key is why in the 90s when SSL deployment started in earnest, large companies tried to deploy SSL in their environment using SSL “accelerators” and “security modules” that protected the keys. This was the “right technical” thing to do not not the right “practical thing” because it significantly reduced the scalability of SSL protected services and at the same time massively increased the cost to deploy SSL.

By the mid 00’s we recognized this was not workable and the use of accelerators basically stopped outside a few edge cases and software keys were used instead. Some implementations, like the Microsoft SCHANNEL implementation, tried to protect the keys by moving them to a separate process mitigating the risk of theft to some degree. Others simply loaded the keys into the web servers process and as a result exposed them to Heartbleed like attacks.

What’s important here is that when this shift happened no one pushed meaningful changes to the maximum validity period of certificates and by association their private keys. This meant that we had software keys exposed to theft for five years (max validity period at the time) with no reliable way to detect they were stolen and in the unlikely event we did find out of the theft we would rely on the unreliable CA revocation infrastructure to communicate that issue to User Agents — clearly this was not an ideal plan.

In short, by significantly reducing the validity of the certificate and keys, we also significantly reduce the value to the attacker.

Another issue that short-lived certificates help mitigate is the evolution of the WebPKI, with a long-lived certificate you get virtually no security benefits of policy and technology improvements until the old certificates and keys are expunged from the ecosystem. Today this is rollout of new policy largely accomplished via “natural expiration” which means you have to wait until the last certificate that was issued under the old policies expires before absolute enforcement is possible.

So what is the ideal certificate validity period then? I don’t think there is a one-size-fits-all answer to that question. The best I can offer is:

As short as possible but no shorter.

With some systems it is not possible to deploy automation and until it is one needs to pick a validity period that is short enough to mitigate the key compromise and policy risks I discussed while long enough to make management practical.

It is probably easier to answer the question what is the shortest validity period we can reliably use. The answer to that question is buried in the clock skew of the relying parties. Chrome recently released a new clock synchronization feature that significantly reduces errors related certificate validity periods. But until that is fully deployed and other UAs adopt similar solutions you are probably best to keep certificate validity periods at 30 days to accommodate skew and potential renewal failures.

In-short, long-lived keys for SSL exist because we never re-visited the threat model of key compromise when we stopped using hardware-protected keys in our SSL deployments and short-lived keys help deal with the modern reality of SSL deployments.

Ryan

– I have added a few pictures for fun and made some minor text changes for clarity.

CAs and SSL and Phishing Oh My!

 

 

 

 

 

 

NOTE: This post reflects my personal beliefs and is not necessarily those of my employer Google, or Let’s Encrypt where I am a member of their Technical Advisory Board.

Introduction

Recently Vincent from The SSL Store published a blog post calling out Let’s Encrypt for issuing certificates to domains that contain the world PayPal.

The TL;DR for his post is he believes that Let’s Encrypt is enabling phishers by issuing them SSL certificates that contain the word “PayPal” and then refusing to revoke them when arbitrary third-parties ask them to.

As a result of his post, several news sources have decided to write articles about how “Let’s Encrypt” is acting as an enabler of these Phishers [1] [2].

Unfortunately, Vincent’s post and the associated articles don’t cover this in the most complete and balanced way so over my morning coffee today I decided to write this post to discuss the other side of the argument.

If this is a topic that interests you please also check out the Let’s Encrypt blog post where they talk about why they have taken this position.

Exploration

Let’s explore the opportunities CAs have to check for phishing, the tools they have available to them, the effectiveness of those tools, the consequences of this approach, how complete a solution based on the tools available to them would be and what the resulting experience would be for users.

Opportunities

The WebPKI’s CAs role, historically, has been that of a Passport office, you present proof you control a domain, and possibly that you are an authorized member of an organization and you get a digital certificate that attests to that.

This certificate could be valid for up to 1095 days. Once the certificate is issued the CA, largely speaking, has no natural opportunity to verify this information again. It is worth noting that this month the CABForum voted to shorten this period to 825 days.

Tools

In the event a CA determines it made a mistake in the issuance of a certificate or has been notified by the subscriber they would like to see a certificate marked invalid, the tool they have available to them is called “revocation”.

The two types of revocation that are under the control of a Certificate Authority are called Certificate Revocation Lists and OCSP responses. The first is a like a phonebook of all known “revoked” certificates while the last is more like a lookup that it enables User Agents to ask the status of a particular set of certificates.

Effectiveness

Earlier we discussed the lifetime of certificates, this is important to understand because the large majority of phishing sites do not start out as Phishing sites, as such issuance time checks seldom net positive results.

After issuance, this leaves you with periodic checks of the site,  third-party reports of phishing and relying on revocation checking as an enforcement mechanism. This is a recipie for failure, there are a few reasons for this, but one of the more significant is the general ineffectiveness of revocation checking.

Revocation checking is the most taxing thing a CA does. This is because the revocation mechanisms available to them will result in every relying party contacting them to download a OCSP response or CRL covering that certificate.

As a result, OCSP has a tendency to be both slow and unreliable. This forced browsers to implement this check as a “soft fail”, in other words, if the connection times out or fails for some reason they assume the certificate as good.

To give that some context about 8% of all revocation checks done by Firefox fail and the median response time is over 200ms.

As a result of this in 2012 Chrome, which is used by about 50% of all users, more-or-less disabled revocation checking except for exceptional circumstances.

What this means is that revocation checking, even for its intended purpose, is far from an effective tool. Expanding its use to include protecting users from phishers would not improve its effectiveness and arguably it would (due to the infrastructure implications) make it even less reliable.

It is also important to note that every wildcard certificate can be used for a hostname containing “PayPal” without the CA ever being made aware, a good example is https://paypal.github.io/ which is protected by a wildcard certificate issued to Github.

Consequences

To understand the consequences of expanding the CAs role include protecting us from phishing we first need to understand what a certificate represents, or more importantly what it does not represent. It does not represent the content, it represents the host that is serving content and it is the content that “phishes”.

Today, in the age of cloud services, there is a good chance the host that is serving the content is a service operated by WordPress, or maybe Amazon’s S3. These services allow users to sign up and post arbitrary content for free or very little money.

If we decide that revocation checking is the right tool to get phishing content off the web we would be saying a CA should revoke WordPress’s certificate if one of it’s users posted something someone reported as phishing content. That would, for the situations where revocation checking takes place and happens to work, take WordPress off the Internet. Is that what we want to happen?

If so, who is it we are asking to perform this task? There are well over 400 CAs in the Microsoft Root Program do we believe these are the right organizations to be policing the internet for the appropriateness of content?

If so what criteria should they use to do so and what do we do if they abuse this censorship role?

Completeness

It is easy to say that a CA should not issue a certificate if it contains the word “PayPal”. I could even see an argument that those that would be hurt by such a rule, for example, http://www.PayPalSucks.com and (a theoretical) PayPalantir.com are an acceptable loss.

This would, however not catch homoglyphs like when a Cyrillic “a” is used instead of the latin “a” which would very likley require a manual review of the name and content to determine the intent of the domain owner which is near impossible to do with any level of accuracy or fairness.

Even with that, what about ING, as one of the world’s largest banks, they too are commonly phished, should a CA be able to issue a certificate to https://www.fishing.com. And if they do and the issuing CA receives a complaint that it is Phishing ING what should they do?

And what about global markets and languages? In Romania there is a company called Amazon that is a cleaning company, should anyone be able to request their website be revoked because it contains the word Amazon?

If we promote the CA to content police, how do we do so in a complete way?

User Experience

With CAs acting as the content police, what would a user see when they encounter a revoked site? While it varies browser to browser the experience is almost always a blocking “interstitial”, for example:

 chrome revoked firefox revoked


If you look closely you will see these are not screens that you can bypass, revoked sites are effectively removed from the internet.

This is in contrast to Safe Browsing and Smartscreen which were designed for this particular problem set and therefore provide the user a chance to visit the site after a contextually relevant warning:

 SafeBrowsing smartscreen

Conclusion

I hope you see from the above that relying on Certificate Authorities as content police as a means to protect users from phishers a bad idea, at a minimum, it would be:

  • Ineffective,
  • Incomplete,
  • Unmanageable,
  • and Duplicative.

But more importantly it would be establishing a large loosely managed group as the de-facto content censors on the internet and as Steven Spielberg said, there is a fine line between censorship, good taste, and moral responsibility.

So what should CAs do about phishing then? It is my position they should check the Google Safe Browsing API prior to issuance (which by the way, Let’s Encrypt does), and they should report Phishers to the Safe Browsing service if they encounter any.

It is also important to answer the question about what users should do to protect themselves from phishing. I understand the desire to say there is only one indicator they need to be worried about, it’s just not realistic.

When I talk to regular users I tell them to do three things, the first of which is to use an up-to-date and modern browser that uses Smart Screen or Safe Browsing. Second, you should only provide data to sites you know and only over SSL. And finally, try to only provide sites information when it was you initiated the exchange of information.

P.S.

Thanks To Vincent Lynch and the others who were kind enough to proof this post before publishing.

WebCrypto and PKI

Like it or not here it comes — within the next few months WebCrypto will be supported in various degrees across all mainstream browsers. There are plenty of posts out there talking about the security concerns of performing cryptography in the browser so I wont go into those here.

What I wanted to talk about was now that it’s here (mostly) what can we do with it? There will be those who say you do harm by making any claims about the assurances a web application makes relating to identity and confidentiality when WebCrypto is involved. The reality though is its happening and we should think about how we enable applications to use it responsibly.

This is why I started work on PKI.JS and ASN1.JS with Yuri Strozhevsky. Now that browsers have these basic crypto primitives available to them it is possible to build web applications that are interoperable with the security services used off the web, it is also possible to build new services on the web that simply were not possible before.

Now there have been libraries that that offered ASN.1 and PKI related capabilities (for example jsRSAsign, Forge and Lapo-asn1js) but none of these were complete and none built around WebCrypto as the source of crypto.

What Yuri and I set out to do is create a set of libraries that addressed these gaps and used public test suites (when available) to ensure conformance with the associated standards, including:

  1. X.509 and CRLs– RFC 5280
  2. CMS / PKCS 7 (Signed & EnvelopedData) – RFC 5652
  3. PKCS10 – RFC 2986
  4. PKCS8 – RFC 5208
  5. OCSP – RFC 6960
  6. Time-stamping – RFC 3161

For example for ASN.1 Yuri used his freely available test suite and for path building he tested against the PKITS test suite.

This of course does not mean the libraries are 100% compliant or defect free, in-fact I can promise you they are not but where test suites were clearly available we tried to utilize them so we would end up with a highly stable and standards compliant library.

At this point the libraries work in all modern browsers but only support signing, verifying, encrypting and decrypting in the Chrome dev-channel but in theory should work on Firefox nightlies as well. Unfortunately the profile and version of WebCrypto supported by Internet Explorer is outdated enough at this point these features do not work there at all yet.

These libraries have not yet been published to their public repositories but I expect them to be within the week under an BSD style license, to give some perspective on the size of this project I expect it to be just under 20,000 lines of code when released. It’s my hope that other people take this and build upon them so that the Internet has a browser friendly way to interact with these technologies.

NOTE: While I hate disclaimers like this but these libraries have not undergone any significant review please do not consider them production ready more work is needed before that’s the case.

NOTE: It’s also worth noting that until at least two browsers release their WebCrypto implementations as final products that these libraries may stop working or not work uniformly across browsers, for example at this time the nightly Chrome builds do nor support key exports which prevents implementation of the key storage structures.

P.S I actually miss spoke earlier, we did not end up include PKCS #12 in this version but most of the base structures are supported.

What’s in a certificate chain and why?

Have you ever wondered why your web server certificate has a “chain” of other certificates associated with it?

The main reason is so that browsers can tell if your certificate was issued by a CA that has been verified to meet the security, policy and operational practices that all CAs are mandated to meet. That certificate at the top of the chain is commonly called the “root”. Its signature on a certificate below it indicates that the CA operating the root believes that practices of the CA below it meets that same high bar.

But why not issue directly off of the “root” certificates? There are a few reasons; the main one is to prevent key compromise. To get a better understanding, it’s useful to know that the private keys associated with the “root” are kept in an offline cryptographic appliance located in a safe, which is located in a vault in a physically secured facility.
These keys are only periodically brought out to ensure the associated cryptographic appliance is still functioning, to issue any associated operational certificates (for example an OCSP responder certificate) that may be needed, and to sign fresh Certificate Revocation Lists (CRLs). This means that for an attacker to gain access to these keys, they would need to gain physical access to this cryptographic appliance as well as the cryptographic tokens and corresponding secrets that are used to authenticate the device.

CAs do this because keeping keys offline is a great way to reduce the risk of a compromised key, but it’s a poor way to offer a highly available and performant service, so the concept of an Issuing CA (ICA) was introduced. This concept also enabled the “root” to respond to CA key compromise events by revoking a CA certificate that should no longer be trusted. This also enables delegation of control, limiting those who can influence a given ICA to sign something.

Another way CAs solve the “online CA” problem is to use what is commonly referred to as a Policy Certificate Authority (PCA). This model allows a CA to segment operational practices more granularly. For example, maybe the CA is audited to be in compliance with a specific set of government standards so the ICAs associated with those practices would be signed by the corresponding PCA. This not only allows segmentation of policy and procedures, but it also enables separation of usage scenarios. For example, one PCA may only allow issuance of certificates for secure mail while the other PCA may allow issuance of SSL certificates. These PCAs are also very commonly operated as offline entities and have ICAs right underneath them.

While the above two models represent the most common ways a PKI might be segmented, they are not the only two. For example, the operational practices required to be a publicly trusted CA are far stricter than what a typical data center might employ. For this reason, it’s very common for CAs to manage PKIs for other organizations within their facilities.

CAs may also “roll” ICAs as a means to manage CRL size. For example, if a given CA has had to revoke many certificates during its lifespan, it may decide to manage the size of CRLs – it would be appropriate to create a new ICA and take the previous one out of service so that future CRLs can still be downloaded quickly by clients. When this happens both CA certificates may be valid for an overlapping time, but only the more recent one is actively in use.

Long story short, some counts on the number of Certificate Authorities that exist on the internet can be deceiving. One of the easiest ways to see this is to look at a CA called DFN-Verein. They are an educational PKI that manages all of the CAs in their PKI in the same facilities, using the same practices, but for security reasons they create separate ICAs for each organization in their network.

Simply put, the count of CAs in a PKI is not a good way to assess the number of entities issuing certificates in the PKI ecosystem. What you really want to count is how many facilities manage publicly trusted certificates. The problem is that it is too difficult to count – what you can do, however, is count the number of organizations associated with ownership of each “root”. Thankfully Microsoft makes this fairly easy. In March, I did a post on my blog showing a breakdown of the ownership. Unfortunately, this approach does not give you a count of operational facilities that are used for the subordinate CAs, but it’s quite likely that given the operational requirements and costs associated with maintaining them that these two numbers are relatively close.

So what would I like for you to take away from this post? I suppose there are two key points:

  • A public CA using several Certificate Authorities under their direct control is actually a good thing as it indicates they are managing the risk of operating their services and planning for migrations to new algorithms and keys as appropriate.
  • Counting the number of “roots” and “subordinate CAs” found by crawling the web does not actually represent the number of organizations that can act as publicly trusted certificate authorities.

That is not to say the efforts to crawl the web to understand how PKI is deployed and used is not valuable, it is – quite valuable. These projects are an important way to keep an eye on the practices that are actually used in the management of Public PKI.

Additionally, efforts to support Least Privilege designs in PKI and adopt means to actively monitor certificate issuance, such as Certificate Transparency, all represent positive moves to help us better understand what is actually out there.

The (soon to be) not-so Common Name

If you are reading this post you are probably familiar with the use of digital certificates in SSL even if you are you may not be familiar with their history. Before we go there though we should start with what, at its core a digital certificate actually is.

Fundamentally a digital certificate is a binding of entitlements and constraints to a key, in other words they say things like “The holder of the private key associated with this certificate can rightfully use the name Ryan Hurst when signing emails”.

When originally conceived they were to be used to help bind subjects (people and resources) to their representations in directories. This is why the Subject Name in a certificate is structured as a Distinguished Name (DN) as this is how a directory uniquely identify a subject.

This makes sense when looking up an encryption key for a user in an enterprise directory but not so well on the Internet where there is no global directory of users.

This brings us to SSL, it was introduced in the mid 1990s and at the time nearly every large enterprise was already deploying Directories and Certificate Authorities as part of their identity management frameworks. The technology of X.509 was tested, broadly accepted and fit the bill for the problem the designers of the protocol had so they included it as is.

At the time there was only one way to represent concept of a subject of a certificate and that was the Common Name (CN) so they chose to put the DNS name of the SSL server there. This was technically acceptable but was a re-purposing of a field that was really intended for a users actual name.

After SSL was finalized the IETF released their profile of X.509 for use on the Internet this standard introduced the concept of a Subject Alternative Name (SAN) where you can put names that are not associated with a directory. The problem is that ship had sailed, by the late 90s when this was standardized everyone had already settled on using the Common Name.

This led us down a bad path, first of all many servers (especially today) have multiple DNS names and application that supported only the Common Name field couldn’t work with a single certificate with more than one DNS name in it. This was addressed in the short term by using a single certificate for each DNS name but this came at a high cost, we also needed to use a single IP address for each domain name.

Another problem with this approach is applications never really knew what to expect in the Common Name field. Is the value in that field a person’s name or is it a DNS name? This is a problem because often times there are rules you need to use to validate a piece of data before using it and this is especially true for DNS names.

For these reasons (and more) since at least 1999 (when RFC 2459 was standardized) we have been on a slow path to moving away from the use of Common Names for domain names to using Subject Alternative Names.

Fast forward to 2012 some Stanford researchers publish a paper titled “The most dangerous code in the world: validating SSL certificates in non-browser software” which identifies a bunch of applications who fail to do the most basic certificate validation tasks correctly and as a result are the source of a bunch of security vulnerabilities.

These applications gave their users a false sense of security not out of malice but as a result of a lack of understanding of the technology they used to deliver on that promise. A big part of that is the complexity 18 years of technological evolution carries with it.

To address this a number of things need to change but one of the most immediate changes is what that the definition of what constitute a “valid” SSL certificate is changing to make the rule-set a little simpler for the application developer and to rule out options that are no longer considered good practice.

We see this happening in a few ways. First the CA/Browser Forum has worked with Browsers to define a set of Baseline Practices that all Certificates must meet, we are also seeing Browsers doing sanity checks to ensure these practices are in-fact followed.

These baseline requirements mandate that certificate authorities always include at least one Subject Alternative Name in the SSL certificates they issue, this means that today an application doesn’t need to look in both the Common Name and the Subject Alternative Name they only need to check the latter.

Currently most Certificate Authorities will include the first DNS Name from the Subject Alternative Name in the Common Name field also but this is done primarily for legacy reasons and at some point in the not so distant future will stop.

When it does certificates will be a little smaller and developers lives will be a little easier.

Ryan

Resources

· Baseline Requirements

· Internet X.509 Public Key Infrastructure Certificate and Certificate Revocation List (CRL) Profile

· Microsoft Security Advisory: Update for minimum certificate key length

Average CRL size and download time

The other day I had a great conversation with Robert Duncan over at Netcraft, he showed me some reports they have made public about CRL and OCSP performance and uptime.

One thing that I have been meaning to do is to look at average CRL size across the various CAs in a more formal way I just never got around to doing it; conveniently one of the Netcraft reports though included a column for CRL size. So while I was waiting for a meeting to start I decided to figure out what the average sizes were; I focused my efforts on the same CAs I include in the revocation report, this is what I came up with:

 

CA Average CRL Size(K) CRL Download Time @ 56k (s)
Entrust 512.33 74.95
Verisign 200.04 29.26
GoDaddy 173.79 25.42
Comodo 120.75 17.66
Cybertrust/Verizon 75.00 10.97
DigiCert 21.66 3.17
GlobalSign 21.25 3.11
Certum 20.00 2.93
StartSSL 9.40 1.38
TrendMicro 1.00 0.15

 

From this we can derive two charts one for size and another for download time at 56k (about 6% of internet users as of 2010):

clip_image002 clip_image004

 

I overlaid the red line at 10s because that is the timeout that most clients use to indicate when they will give up trying to download, some clients will continue trying in the background so that the next request would have the CRL already cached for the next call.

This threshold is very generous, after all what user is going to hang around for 10 seconds while a CRL is downloaded? This gets worse though the average chain is greater than 3 certificates per chain, two that need to have their status checked :/.

This is one of the reasons we have soft-fail revocation checking, until the Baseline Requirements were published inclusion of OCSP references was not mandatory and not every CA was managing their CRLs to be downloadable within that 10 second threshold.

There are a few ways CAs can manage their CRL sizes, one of the most common is simply roll new intermediate CAs when the CRL size gets unmanageable.

There is something you should understand about the data in the above charts; just because a CRL is published doesn’t mean it represents active certificates – this is one of the reasons I had put of doing this exercise because I wanted to exclude that case by cross-referencing the signing CA with crawler data to see if active certificates were associated with each CRL.

This would exclude the cases where a CA was taken out of operation and all of the associated certificates were revoked as a precautionary exercise – this can happen.

So why did I bother posting this then? It’s just a nice illustration as to why we cannot generally rely on CRLs as a form of revocation checking. In-fact this is very likely why some browsers do not bother trying to download CRLs.

All posts like this should end with a call to action (I need to do better about doing that), in this case I would say it is for CAs to review their revocation practices and how they make certificate status available to ensure it’s available in a fast and reliable manner.

A look at revocation repository uptime

It is no secret that in the last two months GlobalSign was affected by outages at relating to our use of CloudFlare. I won’t go into the specifics behind those outages because the CloudFlare team does a great job of documenting their outages as well as working to make sure the mistakes of the past do not reoccur. With that said we have been working closely with CloudFlare to ensure that our services are better isolated from their other customers and to optimize their network for the traffic our services generate.

I should add that I have a ton of faith in the CloudFlare team, these guys are knowledgeable, incredibly hard working and very self critical — I consider them great partners.

When looking at these events it is important to look at them holistically; for example one of the outages was a result of mitigating what has been called the largest publically announced DDOS in the history of the Internet.

While no downtime is acceptable and I am embarrassed we have had any downtime it’s also important to look at the positives that come from these events, for one we have had an opportunity to test our mitigations for such events and improve them so that in the future we can withstand even larger such attacks.

Additionally it’s also useful to look the actual uptime these services have had and to give those numbers some context look at them next to one of our peers. Thankfully I have this data as a result of the revocation report which tracks performance and uptime from 21 different network worldwide perspectives every minute.

For 05/2012-12/2012 we see:

Service Uptime(%) Avg(ms)
GlobalSign/AlphaSSL OCSP 100.00 101.29
VeriSign/Symantec/Thawte/GeoTrust/Trustcenter OCSP 99.92 319.40
GlobalSign/AlphaSSL CRL 100 96.86
VeriSign/Symantec/Thawte/GeoTrust/Trustcenter CRL 99.97 311.42

 

For 01/2013 to 04/2013 we see:

Service Uptime(%) Avg(ms)
GlobalSign/AlphaSSL OCSP 99.98 76.44
VeriSign/Symantec/Thawte/GeoTrust/Trustcenter OCSP 99.85 302.88
GlobalSign/AlphaSSL CRL 99.98 76.44
VeriSign/Symantec/Thawte/GeoTrust/Trustcenter CRL 99.22 296.97

NOTE:  Symantec operates several different infrastructures – which one you hit is dependent on which brand you buy from and some cases which product you buy. We operate only two brands which share the same infrastructure. I averaged the results for each of their brands together to create these two tables. If you want to see the independent numbers see the Excel document linked to this post.

 

As you can see no one is perfect; I don’t share this to say our downtime is acceptable because it is not, but instead I want to make it clear this is data we track and use to improve our services and to make it clear what the impact really was.

By the way if you want to see the data I used in the above computation you can download these spreadsheets.

Why we built the Revocation Report

For over a year I have been monitoring the industry’s largest OCSP and CRL repositories for performance and uptime. I started this project for a few reasons but to understand them I think it’s appropriate to start with why I joined GlobalSign.

If you’re reading this post you are likely aware of the last few years of attacks against public Certificate Authorities (CA). Though I am no stranger to this space, like you I was watching it all unfold from the outside as I was working at Microsoft in the Advertising division where I was responsible for Security Engineering for their platform.

I recall looking at the industry and feeling frustrated about how little has changed in the last decade, feeling like the Internet was evolving around the CA ecosystem – at least technologically. The focus seemed almost exclusively on policies, procedures and auditing which are of course extremely important when you’re in this business but by themselves they are not a solution.

When I looked at the CA ecosystem there were a few players who I thought understood this; the one I felt got it the most was GlobalSign. Instead of joining the race to the bottom they were developing solutions to help with key management, certificate lifecycle management, and publishing guides to help customers deploy certificates cost effectively.

As a result when they approached me with the opportunity to join them as their CTO and set the technology direction for the company I was intrigued. Those of you who know me know I love data, I believe above all things successful businesses (if they recognize it or not) leverage the Define, Measure, Analyze, Improve and Control cycle to ensure they are solving the right problems and doing so effectively.

To that end when I joined GlobalSign as their CTO and I wanted market intelligence on what the status quo was for technology, operating practices and standards compliance so that I could use to adjust my own priorities as I planned where GlobalSign was going to focus its investments.

It was more than that though, as many of you know I am not new to PKI and especially not to revocation technologies having developed several products / features in this area as well as contributing to the associated standards over the years. I was always frustrated by many public certificate authorities’ inability or unwillingness to acknowledge the inadequacy of their revocation infrastructure and its contribution to slow TLS adoption and bad user agent behavior when it comes to revocation checking.

More directly the reliability and performance of major CA operational infrastructure was why browsers had to implement what is now called “soft-fail” revocation behaviors; the treating of failures to check the status of a certificate the same as a successful check. Yet it is these same people who point fingers at the browsers when the security implications of this behavior are discussed.

I wanted to see that change.

This is why from the very beginning of this project I shared all the data I had with other CAs, my hope was they would use it to improve their infrastructure but unfortunately short of one or two smaller players no one seemed concerned – I was shouting at the wind.

With the limited feedback I had received for the data I had been collecting I decided to put together what is now the revocation report. As part of this project I switched to a different monitoring provider (Monitis) because it gave me more control of what was being monitored and had a more complete API I could use to get at the data.

In parallel I began to work with CloudFlare to address what I felt was one barrier to optimally using a CDN to distribute OCSP responses (inability to cache POSTs). The whole time chronicling my experiences, thoughts and decisions on my blog so that others could learn from my experience and the industry as a whole could benefit.

When I set up the Monitis account I licensed the ability to monitor the top responders from 21 locations worldwide every minute. At first I just published the graphical reports that Monitis had but they had a few problems:

  1. They did not perform very well (at the time).
  2. It was not laid out in such a way you could see all the data at once (also now fixed).
  3. It did not exclude issues associated with their monitoring sensors.
  4. It gave no context to the data that was being presented.

This is what took me to working with Eli to build the revocation report we have today, the internet now has a public view into approximately eleven months (and growing) of performance data for revocation repositories. Eli and I are also working on mining and quantizing the data so we can do something similar for responder uptime but this has taken longer than expected due to other priorities — we will finish it though.

So the question at this point is “was the effort worth it?” — I think so, both of us put a lot of time into this project but I believe it’s been a success for a few reasons:

  1. It allowed me to figure out how to improve our own revocation infrastructure; we now perform at about the same speed as gstatic.google.com for a similarly sized object which is what the bar should be.
  2. Both StartSSL and Entrust have now followed suit and made similar changes to their infrastructure improving their performance by about 3x (besting our performance by a few ms!).
  3. Symantec has improved their primary revocation repository performance by almost 40% and I understand more improvements are on the way.
  4. We are a closer to having data based argument we can present to browsers about why they can and should re-enable hardfail revocation checking by default.
  5. It gives customers visibility into the invisible performance hit associated with the decision of who you choose as your certificate provider.

What do you think? Do you find this valuable? Are there any other elements you think we should be tracking?

Certificate-based Mozilla Persona IdP

My name is David Margrave, I am a guest author on unmitigatedrisk.com.  I have worked in the security sphere for 20 years at various U.S. federal agencies, financial institutions, and retailers.  My interests include improving the state of client authentication on the Internet, which is an area that saw robust developments in the 1990s, then languished for a number of years as the Internet at large seemed content with reusable passwords and form-based authentication over SSL/TLS, but has received renewed scrutiny because of recent large scale data breaches and the veiled promise from the Federal government to ‘fix this mess or we will fix it for you’.

 

The Mozilla Persona project is a recent initiative to improve and standardize browser-based authentication.  For a long time (over 10 years) the most widely-used form of browser-based authentication has been based on HTML forms.  At its most basic level, a user will enter an identifier and reusable password into an HTML form, and submit the form in an HTTPS request to access a protected resource.  The server will receive these values, validate them, and typically return state information in an encrypted and encoded HTTP cookie.  Subsequent visits to the protected resource will send the cookie in the HTTP request, and the server will decrypt and validate the cookie before returning the protected content.   This entire exchange usually takes place over HTTPS, but in many instances the authentication cookie is used over an HTTP connection after initial authentication has completed successfully.  There are other forms of HTTP authentication and other previous attempts at standardization, but a quick survey of the largest retailers and financial institutions will show that HTML form-based authentication is still the most common by far.

 

Assuming that the implementers of these cookie schemes are competent amateur cryptographers and avoided the most glaring mistakes (see this paper by MIT researchers), all of these authentication schemes which rely on HTTP cookies suffer from the same critical flaw:  An attacker who obtains the cookie value can impersonate the user.  The crucial problem is that HTML form-based authentication schemes have not been capable of managing cryptographic keying material on the client side.  More secure schemes such as Kerberos V5 use a ticket in conjunction with an accompanying session key, both of which are stored in a credentials cache.  In contrast to flawed cookie-based schemes, in the Kerberos V5 protocol, a service ticket is useless to an adversary without the accompanying service ticket session key.  An authentication exchange in Kerberos V5 includes the service ticket, and a value encrypted with the service ticket session key, to prove possession.There are some proprietary or enterprise-level solutions to this situation.  For example, Microsoft Internet Explorer and IIS have long had (for over 10 years) the capability to use HTTP Negotiate authentication and to use GSS-API with Kerberos V5 as the underlying mechanism.  The Apache web server has had the capability to accept HTTP Negotiate authentication for several years as well, but the adoption of these solutions on the Internet at large has not been widespread.  At a high level, the Mozilla Persona project improves this situation by bringing the credentials cache and cryptographic capabilities into the browser, and doing so in a standardized manner.  Although the underlying cryptographic algorithms may differ from the Kerberos V5 example, the importance of this project can’t be understated.

 

Persona introduces the concept of the Identity Data Provider (IdP).  The basic idea is that a domain owner is responsible for vouching for the identity of email addresses in that domain.  This could involve whatever scheme the domain owner wishes to implement.  If a domain does not implement an IdP, the Persona system will use its own default IdP which uses the email verification scheme that all Internet users are familiar with:  you prove your ability to receive email at a particular address.  When signing-in to a website which uses Persona authentication, the user will be presented with a dialog window asking for the email address to use.

Screenshot from 2013-04-10 13:14:26

Behind the scenes, the Persona system determines which IdP to use to verify the address.  A domain implementing an IdP must publish some metadata (the public key, and provisioning and verification URLs), in JSON format, at the URL https://domain/.well-known/browserid.  The server at the URL must have a certificate from a trusted certificate authority, and the returned value must be properly-formatted JSON with certain required metadata information (described here).

 

The author implemented an IdP at the domain margrave.com as a research exercise, borrowing from the NodeJS browserid-certifier project.  This particular IdP was written to accept X.509 client certificates issued by a commercial certificate authority, to extract the email address from the X.509 certificate, and issue a persona certificate with that email address. The .well-known/browserid file for node.margrave.com is shown here:

{
    "public-key": {"algorithm":"DS",
        "y":"aab45377fa024964a6b3339d107b91887adf85b96649b5b447a7ac7390866c92d88ed101f6525e717c0d703d5fd8727e0d1d8adb60bb80c7123730616c197326f1eed326fdfc136d7594ffce39a05005a433add8d3344813ea89f6e426d8f5b0bc0d3fdb59c8ec7c19583ba7f14d3636713b84c1ebe62a6866e9c2091def5c25aba967670eabc4591ee3f536006ce5c550265d4b2264c5a989abf908763b41014f35eb2949a0b027a1a1054203a3e13eeb1f16ffb171d6942405546a8407c3fb7e73227e432d150834054edc379de8f8988a8e3b102b70fe5b1164a28a4a453310313e00de1aa177f5ac2b73ef31670e16914607ba4196c06e57f7e5209bc7e4",
        "p":"d6c4e5045697756c7a312d02c2289c25d40f9954261f7b5876214b6df109c738b76226b199bb7e33f8fc7ac1dcc316e1e7c78973951bfc6ff2e00cc987cd76fcfb0b8c0096b0b460fffac960ca4136c28f4bfb580de47cf7e7934c3985e3b3d943b77f06ef2af3ac3494fc3c6fc49810a63853862a02bb1c824a01b7fc688e4028527a58ad58c9d512922660db5d505bc263af293bc93bcd6d885a157579d7f52952236dd9d06a4fc3bc2247d21f1a70f5848eb0176513537c983f5a36737f01f82b44546e8e7f0fabc457e3de1d9c5dba96965b10a2a0580b0ad0f88179e10066107fb74314a07e6745863bc797b7002ebec0b000a98eb697414709ac17b401",
        "q":"b1e370f6472c8754ccd75e99666ec8ef1fd748b748bbbc08503d82ce8055ab3b",
        "g":"9a8269ab2e3b733a5242179d8f8ddb17ff93297d9eab00376db211a22b19c854dfa80166df2132cbc51fb224b0904abb22da2c7b7850f782124cb575b116f41ea7c4fc75b1d77525204cd7c23a15999004c23cdeb72359ee74e886a1dde7855ae05fe847447d0a68059002c3819a75dc7dcbb30e39efac36e07e2c404b7ca98b263b25fa314ba93c0625718bd489cea6d04ba4b0b7f156eeb4c56c44b50e4fb5bce9d7ae0d55b379225feb0214a04bed72f33e0664d290e7c840df3e2abb5e48189fa4e90646f1867db289c6560476799f7be8420a6dc01d078de437f280fff2d7ddf1248d56e1a54b933a41629d6c252983c58795105802d30d7bcd819cf6ef"
    },
    "authentication": "/persona/sign_in.html",
    "provisioning": "/persona/provision.html"
}

 

The public key from the browserid file is the public portion of the key pair used by the IdP to certify users in the domain.  The fact that it must be served over a URL protected with a certificate issued from a trusted CA, is how the Persona system builds on the existing trust infrastructure of the Internet, instead of attempting to re-implement their own from scratch, or requiring operators of websites relying on Persona authentication to establish shared secrets out-of-band.  The authentication and provisioning URLs are how browsers interact with the IdP.

 

In the Certificate-based IdP implemented at margrave.com, the page located at /persona/provision.html includes some javascript which does the following things:  calls an AJAX API to get the email address from the certificate, receives the email address that the user entered in the Persona login dialog via a javascript callback, validates that they match, and calls another AJAX API to issue the certificate.  Note that the email address comparison performed in client-side javascript is purely for UI and troubleshooting purposes, the actual issuance of the Persona certificate uses the email address from the X.509 certificate (if the provisioning process progresses to that point), irrespective of what username was entered in the Persona login dialog.  The client-side validation of the email address is to provide the ability to troubleshoot scenarios where a user may choose the wrong certificate from the browser certificate dialog box, etc.  The client-side provisioning source code is shown below (ancillary AJAX error handling code is omitted).

 

function provision() {

  // Get the email from the cert by visiting a URL that requires client cert auth and returns our cert's email in a json response.
  // This is not strictly necessary, since the server will only issue persona certificates for the email address from the X.509 certificate,
  // but it is useful for troubleshooting, helping the user avoid choosing the wrong certificate from the browser dialog, etc.
  getEmailFromCert(function(emailFromCert) {
      if (emailFromCert) {
          navigator.id.beginProvisioning(function(emailFromPersona, certDuration) {
              if (emailFromPersona===emailFromCert) {
                  navigator.id.genKeyPair(function(publicKey) {
                      // generateServerSide makes an AJAX call to a URL that also requires client cert auth
                      generateServerSide(publicKey, certDuration, function (certificate) {
                          if (navigator.id && navigator.id.registerCertificate) {
                              //alert('registering certificate: ' + certificate);
                              navigator.id.registerCertificate(certificate);
                          }
                      });
                  });
              } else {
                  navigator.id.raiseProvisioningFailure('user is not authenticated as target user');
              }
          });
      } else {
          navigator.id.raiseProvisioningFailure('user does not have a valid X.509 certificate');
      }
  });
}

function generateServerSide(pubkey, duration, cb) {
    $.ajax({
        // Note that this URL requires SSL client certificate authentication,
        // and performs its own authorization on the email address from the certificate,
        // (for example, based on issuing CA or email address domain),
        // and so does not need the email address as an explicit input parameter
        url: "https://node.margrave.com/cert_key",
        type: "POST",
        global: false,
        data: {pubkey: pubkey,
               duration: duration},
               dataType: "json",
        success: function(response) {
                if (response.success && response.certificate) {
                    cb(response.certificate);
                }
            }
    });
    return false;
}

function getEmailFromCert(cb) {
        $.ajax({
            // Note that this URL requires SSL client certificate authentication,
            // and performs its own authorization on the email address from the certificate.
            url: "https://node.margrave.com/email",
            type: "POST",
            global: false,
            dataType: "json",
            success: function(response) {
                cb(response.email);
            }
        });}

 

The other portion of a Persona IdP, the authentication URL, turned out not to be necessary in this case, because the authentication is implicit in the use of X.509 client certificate-authenticated AJAX calls.  Once the Persona certificate has been provisioned, the user is able to access the protected resource.  If things don’t work as expected, the error messages do not seem to bubble up to the UI dialog, and I had to resort to tracing XHR calls with Firebug to determine what went wrong.  In one case, it was a clock skew error that was corrected by installing ntpd on my IdP server.   In another case, one of my IdP AJAX calls may return an error but this error gets masked by a vague UI message.  It may be helpful to standardize HTTP return code and JSON field names to return descriptive error text to the Persona UI.

 

Screenshot from 2013-04-10 13:15:32

 

 

In its current form, this concept could be useful for enterprises, but not really for the Internet at large, since it requires a) that you have a client cert and b) that the IDP for your email domain is certificate-aware.  However, If the persona-default IDP were certificate-aware, or CAs were persona-aware, then there are some interesting possibilities.

  1. The persona default IDP could skip the verification email if a trusted X.509 client certificate is provided.   Possession of a certificate from a trusted CA implies the email address has already been verified, at a minimum.  The Persona system already accepts CA’s trust when retrieving .well-known/browserid, this idea extends CA trust to clients.
  2. Going the other direction: If a CA were to accept a persona certificate from either a domain’s IDP or from the persona-default IDP, and using that to issue X.509 client certificates, or as one part of the client certificate enrollment process (higher assurance certificates may verify more information than email, such as state-issued identification).  This idea seems promising because the email verification scheme is the wheel that everyone on the Internet has reimplemented, in many cases with security flaws.

 

Abstract: Using least privileged design principals to improve trust in the online marketplace

Weekends, they are overrated 🙂

Tomorrow is the cut-off for submissions for the NIST workshop on “Workshop on Improving Trust in the Online Marketplace” to be held April 2013 and I have spent part of the day thinking about what talks might be interesting. I have already submitted one on “Revocation reality and the path to becoming effective” but I also wrote this one up and I might submit it also, posting here so I can get a little feedback before the submission deadline.

In 2010 security researchers with the EFF collected the certificates of all of the publicly-visible SSL certificates on the IPv4 internet and published their analysis and data-sets from their research. This work made it clear to the world how extensively PKI is used to facilitate commerce on the web but it also raised he concern that there were as many as 650 organizations capable of issuing publicly trusted certificates on the internet.

While this conclusion is exaggerated as many of those certificates and keys are in-fact operated by the same organizations that their certificates are ultimately issued by, the conclusion that there has been an un-needed expansion of the number of keys that are technically trusted to issue certificates for SSL for the entire Internet is sound.

To address this problem one of the steps that is needed is the application of least privilege principals to how one designs and manages publicly trusted keys and certificates. Thankfully in the late 90’s the foundation for addressing much of this problem was developed as a means to enable the Federated PKI in use by the U.S. Federal Government.

For the last year we have been working to broadly deploy X.509 Name Constraints’ along with other least privilege design principals to our customers PKIs both internally managed by our own staff as well as those on premise. This talk will explore these concepts, the client support for them, the challenges we have experienced in their deployment and identify the remaining issues that must be addressed to obtain the full benefits of this approach.