MSRC 2718704 and Nested EKU enforcement

There are a number of technical constraints a Certificate Authority can put into place on a subordinate Certificate Authority; the general concept is referred to as Qualified Subordination.

One of the most important ways to constrain a certificate is by restricting what it can be used for.

The foundation for such a constraint is provided by PKIX in the Extended Key Usage (EKU) extension (RFC 5280). This extension can be put into a certificate to restrict what it is trusted for – for example, a certificate might be good for SSL Server Authentication but not for S/MIME.

The problem is the RFC provides no practical guidance on how to act when this extension is encountered in a CA certificate; all it says is:

In general, this extension will appear only in end entity certificates.

One could interpret this to mean that its semantics are the same in issuer and subscriber certificates. That makes sense, but it isn’t very useful: a CA is not very likely to ever perform “application tasks” like S/MIME or SSL server authentication with its signing key, so why would you put the extension in a CA certificate at all?

Also, if you look back at the history, this extension was one of the first to be introduced; it came into existence at a time when PKIs were only one level deep, so the absence of guidance on how to handle it in CA certificates could easily be seen as an omission.

Microsoft saw it this way and decided to have their implementation treat the extension as a constraint: if no EKU is present in the chain then the chain is considered good for all usages, but once a single EKU is added into the path nothing below it can be considered good for a usage that is not listed.

On Windows, applications validate certificates using the CertGetCertificateChain API, which takes a number of control parameters via the CERT_CHAIN_PARA structure; an application specifies which EKUs it wants to make sure a certificate is good for via the RequestedUsage member.

This logic (frankly almost all of the certificate validation) is all wrapped into this one call.
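
To make that concrete, here is a minimal sketch (not production code – error handling is omitted and pCertContext is assumed to be the end-entity certificate your application already has in hand) of asking CryptoAPI whether a chain is good for the Code Signing EKU; real code would typically follow this with a call to CertVerifyCertificateChainPolicy as well:

    #include <windows.h>
    #include <wincrypt.h>

    // Ask CryptoAPI whether pCertContext chains to a trusted root AND is good
    // for the Code Signing EKU; the nested EKU logic described above is applied
    // inside CertGetCertificateChain itself.
    BOOL IsGoodForCodeSigning(PCCERT_CONTEXT pCertContext)
    {
        LPSTR rgUsages[] = { (LPSTR)szOID_PKIX_KP_CODE_SIGNING };  // "1.3.6.1.5.5.7.3.3"

        CERT_CHAIN_PARA chainPara = { 0 };
        chainPara.cbSize = sizeof(chainPara);
        chainPara.RequestedUsage.dwType = USAGE_MATCH_TYPE_AND;
        chainPara.RequestedUsage.Usage.cUsageIdentifier = 1;
        chainPara.RequestedUsage.Usage.rgpszUsageIdentifier = rgUsages;

        PCCERT_CHAIN_CONTEXT pChainContext = NULL;
        if (!CertGetCertificateChain(NULL,           // default chain engine
                                     pCertContext,
                                     NULL,           // validate as of "now"
                                     NULL,           // no additional store
                                     &chainPara,
                                     0,              // no revocation checking in this sketch
                                     NULL,
                                     &pChainContext))
            return FALSE;

        // dwErrorStatus is CERT_TRUST_NO_ERROR only if the chain built to a
        // trusted root, is time-valid, and the requested usage was satisfied
        // at every level of the chain.
        BOOL fOk = (pChainContext->TrustStatus.dwErrorStatus == CERT_TRUST_NO_ERROR);
        CertFreeCertificateChain(pChainContext);
        return fOk;
    }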

So what does this have to do with MSRC 2718704? Well, this behavior has reduced the risk of the mess-up in a meaningful way, and I thought I would explain how. Before I do, let me be clear that I am not trying to downplay the significance of this issue; I am just trying to clarify where the risks are.

As we now know, the “licensing solution” deployed for Terminal Services put a signing CA that is trusted for Code Signing into every enterprise that uses the product. But how is it restricted to just Code Signing? That is really what this post is about.

Let’s look at the EKUs included in the offending “MS” certificate chain; in that chain we see:

  • Microsoft Root Authority
    • No EKUs
  • Microsoft Enforced Licensing Intermediate PCA
    • EKUs = Code Signing, Key Pack Licenses, License Server Verification
    • Effective EKUs = Code Signing, Key Pack Licenses, License Server Verification
  • Microsoft Enforced Licensing Certificate Authority CA
    • EKUs = Code Signing, License Server Verification
    • Effective EKUs = Code Signing, License Server Verification
  • Microsoft LSRA PA
    • EKUs = None
    • Effective EKUs = Code Signing, License Server Verification
  • MS
    • EKUs = None
    • Effective EKUs = Code Signing, License Server Verification

You will notice that the “Microsoft LSRA PA” certificate lists no EKUs, yet its effective EKUs are “Code Signing” and “License Server Verification”; this is because of the nested EKU behavior I described above.

The same thing happens with the end-entity “MS” certificate; even though it has no EKUs listed, it can only be used to validate licenses and sign code, because that is all its issuers are entitled to bestow on their subordinates.
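
If it helps to see the rule written down, here is a small illustrative sketch – not CryptoAPI itself, just the effective-EKU logic it applies while walking from the root down to the end-entity certificate: a certificate with no EKU extension inherits whatever its issuer is entitled to, and a certificate that does carry the extension can only narrow that set.

    #include <stdbool.h>
    #include <stddef.h>
    #include <string.h>

    #define MAX_EKUS 16

    /* Illustrative only: an EKU "set" is a list of OID strings; a certificate
       that does not carry the extension places no restriction of its own. */
    typedef struct {
        bool        present;              /* does the certificate carry the extension? */
        const char *oids[MAX_EKUS];
        size_t      count;
    } EkuSet;

    /* effective(child) = inherited, if the child has no EKU extension
                        = child,     if nothing above it was restricted
                        = intersection(inherited, child) otherwise          */
    static EkuSet effective_ekus(EkuSet inherited, EkuSet cert)
    {
        if (!cert.present)
            return inherited;             /* no extension: inherit the issuer's entitlement */
        if (!inherited.present)
            return cert;                  /* first EKU in the path starts the restriction */

        EkuSet out = { .present = true, .count = 0 };
        for (size_t i = 0; i < cert.count; i++)
            for (size_t j = 0; j < inherited.count; j++)
                if (strcmp(cert.oids[i], inherited.oids[j]) == 0)
                    out.oids[out.count++] = cert.oids[i];
        return out;                       /* only usages every issuer above allows survive */
    }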

OK, so what does all of this mean to you and me? It basically means that as long as an application uses CryptoAPI in the intended way (and all the applications I am aware of in this context do), those CAs out there cannot be used to issue SSL certificates (or certificates for any other usage not listed) that would be treated as “valid”. They can, of course, sign code as Microsoft, which is the larger issue in my book.

Anyway, over the years I have proposed in the IETF that this same behavior be adopted; it was always rejected as an evil Microsoft conspiracy (I was at Microsoft at the time). It was of course nothing of the sort, but in the end I gave up. Recently I have started trying to convince the browsers directly to implement this same behavior, as I feel it is beneficial; for example, here is an NSS bug tracking the same request. If implemented, that would take care of Chrome and Firefox; that still leaves Safari and Opera, but it’s a step in the right direction.

 

Additional Resources

http://lists.randombit.net/pipermail/cryptography/2012-June/002966.html

MSRC 2718704 and the Terminal Services Licensing Protocol

There has been a ton of chatter on the internet the last few days about this MSRC incident; it is an example of so many things gone wrong it’s just not funny anymore.

In this post I wanted to document the aspects of this issue as it relates to the Terminal Services Licensing protocol.

Let’s start with the background story; Microsoft introduced a licensing mechanism for Terminal Services that they used to enforce their Client Access Licenses (CAL) for the product. The protocol used in this mechanism is defined in [MS-RDPELE].

A quick review of this document shows the model goes something like this:

  1. Microsoft operates a X.509 Certificate Authority (CA).
  2. Enterprise customers who use Terminal Services also operate a CA; the terminal services team however calls it a “License server”.
  3. Every Enterprise “License server” is made a subordinate of the Microsoft Issuing CA.
  4. The license server issues “Client Access Licenses (CALs)”, which are in fact X.509 certificates.

By itself this isn’t actually a bad design: short(ish)-lived certificates are used so the enterprise has to stay current on its licenses, and it builds on a bunch of existing technology so it could be developed quickly and its performance characteristics were known.

The devil however is in the details; let’s list a few of those details:

  1. The Issuing CA is made a subordinate of one of the “Microsoft Root CAs”.
  2. The Policy CA (“Microsoft Enforced Licensing Intermediate PCA”) on the Microsoft side was issued “Dec 10 01:55:35 2009 GMT”.
  3. All of the certificates in the chain were signed using RSAwithMD5.
  4. The CAs were likely standard Microsoft Certificate Authorities (the certificate was issued with a Template Name of “SubCA”).
  5. The CAs were not using random serial numbers (the serial number of one of them was 7038).
  6. The certificates the license server issued were good for:
    • License Server Verification (1.3.6.1.4.1.311.10.6.2)
    • Key Pack Licenses (1.3.6.1.4.1.311.10.6.1)
    • Code Signing (1.3.6.1.5.5.7.3.3)
  7. There were no name constraints in the “License server” certificate restricting the names it could make assertions about.
  8. The URLs referred to in the certificates (URLs to CRLs and issuer certificates) were invalid, and may have been for a long time (it looks like they may not have been published since 2009).

OK, let’s explore #1 for a moment; they needed a Root CA they controlled to ensure they could prevent people from minting their own licenses. They probably decided to re-use the product Root Certificate Authority so that they could re-use CertVerifyCertificateChainPolicy and its CERT_CHAIN_POLICY_MICROSOFT_ROOT policy. No one likes maintaining an array of hashes that you have to keep up to date on all the machines doing the verification. This decision also meant that they could have those who operated the existing CAs manage this new one for them.

This was also the largest mistake they made: the decision to operationally chain this CA under the production product roots made the system an attack vector for the bad guys. It made it more valuable than any third-party CA.

They made this problem worse by having every license server be a subordinate of this CA; these license servers were storing their keys in software. The only thing protecting them is DPAPI and that’s not much protection.

They made the problem even worse by making each of those CAs capable of issuing code signing certificates.

That’s right: every single enterprise user of Microsoft Terminal Services on the planet had a CA key and certificate that could issue as many code signing certificates as they wanted, for any name they wanted – yes, even the Microsoft name!

It doesn’t end there though: the certificates were signed using RSAwithMD5. The problem is that MD5 has been known to be susceptible to collision attacks for a long time, and those attacks were shown to be practical by December 2008 – before the CA was created.

Also, these certificates were likely issued using the Windows Server 2003 Certificate Authority (remember, this was 2009), as I believe the Server 2008 CA used random serial numbers by default (I am not positive though – it’s been a while).

It also seems, based on the file names in the certificates’ AIAs, that the CRLs had not been updated since the CA went live and maybe were never even published; this signals that the CA was not being actively managed.

Then there is the fact that even though Name Constraints were supported by the Windows platform at the time this solution was created, they chose not to include them in these certificates – doing so would have further reduced the value of the certificates to an attacker.

And finally there is the fact that MD5 has been prohibited from use in root programs for some time; the Microsoft Root Program required third-party CAs not to have valid certificates that used MD5 after January 15, 2009. The CA/Browser Forum has been a bit more lenient in the Baseline Requirements by allowing their use until December 2010. Now, it’s fair to say the Microsoft roots are not “members” of the root program, but one would think they would have met the same requirements.

So how did all of this happen? Honestly, I don’t know.

The Security Development Lifecycle (SDL) used at Microsoft was in full force in 2008 when this appears to have been put together; it should have been caught by that process, and it should have been caught by a number of other checks and balances as well.

It wasn’t though and here we are.

P.S. Thanks to Brad Hill, Marsh Ray and Nico Williams for chatting about this today 🙂

Additional Resources

Microsoft Update and The Nightmare Scenario

Microsoft Certificate Was Used to Sign “Flame” Malware

How old is Flame

Security Advisory 2718704: Update to Phased Mitigation Strategy

Getting beyond 1% — How do we increase the use of SSL?

Today about 1% of the traffic on the Internet is protected with SSL (according to Sandvine). There are a few key issues keeping this number so small, and I thought I would put together a quick post on what I think those issues are.

 

Interoperability

For over a decade we have been working towards migrating to IPv6; despite that, we have made little progress – in fact, they say that by the end of 2012 we will run out of IPv4 addresses.

As far as I know, not one of the top 10 CAs supports IPv6 yet (yes, not even GlobalSign, though we’re working on it). This means it is impossible to host a pure IPv6 SSL solution today (because of the need to reach revocation data).

This is also interesting because today many sites are hosted on virtual hosting solutions that share the same IP address – this is primarily because IP addresses are a scarce resource, and it has the side effect of making it hard (sometimes impossible) to deploy SSL on these hosts.

In 2003 an extension to TLS was proposed to address this problem; it’s called Server Name Indication (SNI – now defined in RFC 6066).

Today the server support for this extension is quite good, but the same cannot be said for client support (due to the lingering XP population and the influx of mobile devices).
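
For what it’s worth, the client side of SNI is tiny; here is a minimal OpenSSL sketch (error handling omitted, and fd is assumed to be an already-connected TCP socket) showing the extension being set before the handshake so the server knows which of its co-hosted certificates to present:

    #include <openssl/ssl.h>

    /* Minimal sketch: start a TLS session to one of several names hosted on the
       same IP. Without SNI the server cannot tell which certificate to present. */
    SSL *connect_with_sni(SSL_CTX *ctx, int fd, const char *hostname)
    {
        SSL *ssl = SSL_new(ctx);
        SSL_set_fd(ssl, fd);

        /* The Server Name Indication extension (RFC 6066) rides in the ClientHello. */
        SSL_set_tlsext_host_name(ssl, hostname);

        if (SSL_connect(ssl) != 1) {
            SSL_free(ssl);
            return NULL;
        }
        return ssl;  /* the caller still has to verify the certificate matches hostname */
    }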

In my opinion this is the #1 issue holding back the adoption of SSL everywhere.

 

Complexity

It is amazing to me, but very little has changed in the CA industry since its birth in the mid 90s; certificates are still requested and managed in essentially the same way – it’s a shame, and it’s wasteful.

One of the reasons I joined GlobalSign is that they have been trying to address this issue by investing in both clients and APIs (check out OneClick SSL and CloudSSL) – with that said, there is still a lot more that can be done in this area.

Then there is the problem of managing and deploying SSL; the SSL Pulse data shows us it’s hard to get SSL configured right. We are getting better tools for this, but again there is still a ton of room for improvement.

 

Performance

There has been a bunch of work done in this area over the years; the “solutions” relating to the performance of SSL seem to break down into:

  1. Protocol improvements (SPDY, False Start, OCSP Stapling, etc.)
  2. Using different cryptography to make it faster (smaller keys, DSA, ECDSA, etc.)
  3. Using accelerator products (F5 BigIP, NetScaler, SSL accelerators, etc.)

I won’t spend much time on protocol improvements, as I think they get a ton of coverage from the likes of Google, who have made several proposals in this area over the last few years. I do have concerns about these protocol changes introducing interoperability issues, but I can’t argue with the performance benefits they offer.

You will notice I also included OCSP Stapling in this group. I think it is a great way to improve revocation checking, but it’s not about security – it’s about performance and reliability. You should just use it today; it’s safe and very likely supported by your servers already.
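
If your server software does not expose stapling as a simple switch, the underlying OpenSSL hook is small; here is a rough sketch where get_cached_ocsp_response is a hypothetical helper that returns a DER-encoded response some out-of-band job fetches from the CA and refreshes before it expires:

    #include <string.h>
    #include <openssl/ssl.h>

    /* Hypothetical helper: hands back a DER-encoded OCSP response for the server
       certificate, refreshed out of band before it expires. */
    extern int get_cached_ocsp_response(const unsigned char **der, long *der_len);

    /* Stapling callback: hand OpenSSL the cached response so it is delivered
       inside the handshake and the client never has to contact the CA itself. */
    static int stapling_cb(SSL *ssl, void *arg)
    {
        (void)arg;
        const unsigned char *der;
        long der_len;
        if (!get_cached_ocsp_response(&der, &der_len))
            return SSL_TLSEXT_ERR_NOACK;          /* nothing cached: just don't staple */

        /* OpenSSL takes ownership of the buffer, so give it its own copy. */
        unsigned char *copy = OPENSSL_malloc(der_len);
        memcpy(copy, der, der_len);
        SSL_set_tlsext_status_ocsp_resp(ssl, copy, der_len);
        return SSL_TLSEXT_ERR_OK;
    }

    void enable_stapling(SSL_CTX *ctx)
    {
        SSL_CTX_set_tlsext_status_cb(ctx, stapling_cb);
    }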

The use of different cryptography is an interesting one; however, the issue of compatibility again rears its ugly head. Though every implementation of an algorithm will perform differently, the Crypto++ benchmarks are a nice way to get a high-level understanding of an algorithm’s performance characteristics.

There is a lot of data in there, not all of it related to SSL, but one thing that definitely is relevant is the relative performance of RSA vs. DSA:

Operation               Milliseconds/Operation   Megacycles/Operation
RSA 1024 Signature      1.48                     2.71
RSA 1024 Verification   0.07                     0.13
DSA 1024 Signature      0.45                     0.83
DSA 1024 Verification   0.52                     0.94

 

You will notice that with RSA it is more expensive to sign than to verify; with DSA the opposite is true (and its signing is also faster in this sampling).

Since in the case of SSL it is the server doing the signing and the client doing the verifying, this is an important fact; it means a server using a DSA certificate will spend less time doing crypto and more time doing other things, like serving content. Using the numbers above, a single core could produce roughly 675 RSA 1024 signatures per second (1/1.48 ms) versus roughly 2,200 DSA 1024 signatures per second (1/0.45 ms).

On the surface this sounds great, but there are of course problems. For one, because of the work researchers have done to “break” RSA over the last few years, the browsers are requiring CAs to stop issuing certificates for 1024-bit RSA keys (by 2013), an effort which has also been applied to DSA.

Another not-so-trivial factor is that Microsoft only supports DSA keys up to 1024 bits in length, which means the larger DSA keys are not viable on those platforms.

So what of the newer ciphers like AES and ECDH-ECDSA? These represent a very large performance boon for web server operators, but like SNI they are not supported by legacy browsers.

What this means for you is that for the next few years we will have to make do with the “legacy cipher suites” as a means to facilitate TLS sessions.
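
To illustrate what “making do” looks like in practice, here is a sketch (assuming a server built on OpenSSL 1.0.1 or later for the ECDHE/GCM suites) of a preference list that offers the newer suites to clients that can negotiate them while keeping legacy suites available for the rest; the exact string is illustrative, not a recommendation:

    #include <openssl/ssl.h>

    /* Prefer ECDHE/AES-GCM for modern clients, then fall back to the plain RSA
       and RC4 suites the legacy browser population still needs. */
    int configure_ciphers(SSL_CTX *ctx)
    {
        return SSL_CTX_set_cipher_list(ctx,
            "ECDHE-ECDSA-AES128-GCM-SHA256:"
            "ECDHE-RSA-AES128-GCM-SHA256:"
            "ECDHE-RSA-AES128-SHA:"
            "AES128-SHA:"
            "RC4-SHA:"
            "!aNULL:!eNULL:!EXPORT:!DES");   /* returns 1 if any cipher matched */
    }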

Miscellaneous

Not everything fits neatly into the above taxonomy; here are a few common topics that don’t:

  1. Increased cost of operation
  2. Inability to do “legitimate” packet inspection

Increased cost of operation can be summarized as needing more servers for the same load due to the increased SSL computational costs.

Inability to do “legitimate” packet inspection can be summarized as limiting the practical value of existing security investments in technologies like Intrusion Detection and Network Optimization, since once the traffic is encrypted they become totally ineffective. To work around this issue, networks need to be designed with encryption and these technologies in mind.

 

Summary

I personally think the biggest barrier is interoperability, the biggest part of which is the lingering XP installations. The silver lining is that over the last few years XP has been losing market share at about 10% per year; at the current rate we are about three years away from these issues being “resolved”.

In the meantime there is a lot the industry can do on the topic of complexity; I will write more on that another time.

Always On SSL (AOSSL) Whitepaper Published

Did you know about 1% of the traffic on the internet is protected with SSL (see the Sandvine report)?

Or that many of the sites responsible for this traffic do not require SSL, and instead just make it available as an option?

Unfortunately, this means users are exposed to risks that would otherwise not be present.

There is a trend towards protecting all site content with SSL; this effort has been dubbed Always On SSL, and the OTA has just recently published a whitepaper on the topic that I had a chance to contribute to.

My top PKI/TLS related issues in Firefox

I have been asked a few times recently what my largest issues are with Firefox and its PKI/TLS implementation; here is the short list:

725351 – Support enforcing nested EKU constraints, do so by default.

579606 – Multiple OCSP requests should be performed in parallel

565047 – Implement TLS 1.1 (RFC 4346)

436414 – OCSP client should be able to use HTTP GET as well as POST

360420 – Implement OCSP Stapling in libSSL

399324 – Fetch missing intermediate certs (use AIA extension for incomplete cert chains)

378098 – Do not expire OCSP responses that say “revoked”

48597 – OCSP needs offline cache (persistent on-disk)

 

Kathleen at Mozilla has recently set up a page to track revocation related issues here.

Attending workshop on “Improving the Availability of Revocation Information”

At the most recent CA/Browser Forum folks from DigiCert and I both made presentations on what’s needed to improve the current state of revocation in X.509.

There were really two different themes in these presentations:

  1. We can better use the technologies we have today.
  2. We can make “tweaks” to the technologies we have today to improve the situation.

It was not really possible to go into any detail about these proposals, given the time slots allocated were more presentation-oriented; since the DigiCert guys had already engaged with Maximiliano Palla of NYU Polytechnic University (the founder of the OpenCA project), they agreed to work with him to arrange this workshop.

The session is April 16th (I leave tomorrow) and I am looking forward to the chance to talk about this topic; my goals for the session are that we reach agreement on:

  1. Authoring a whitepaper on OCSP responder best practices.
  2. Authoring a whitepaper on revocation client best practices.
  3. Agreeing on an approach to “opt-in” hard revocation checking.
  4. Agreeing on a path forward to resolve the many outstanding Firefox revocation issues.
  5. Funding Nginx to add support for OCSP stapling this year.

There are lots of other potentially interesting topics I am sure will come up:

  1. Getting Apache’s OCSP stapling enabled by default.
  2. Short-lived certificates, their potential and challenges.
  3. Defining a new transport for OCSP via DNS.
  4. Defining a new query-less OCSP like protocol.
  5. CRLsets and their place in the browser ecosystem.

Should be an interesting day for sure.

Browser Revocation Behavior Needs Improvement

Today the best-behaved client when it comes to revocation checking is Windows; in the case of browsers that means IE and Chrome.

With that said, it has a very fundamental problem: if it reaches a CA’s OCSP responder and the responder provides an authoritative “that’s not mine” (aka Unknown), clients built on this platform treat the certificate as good.

You got that right; it treats a certificate that is clearly invalid as good! This unfortunately is a common behavior that all the browsers implement today.

The other browsers are even worse; Firefox, for example:

  1. Does not maintain a cache across sessions – this is akin to your browser downloading the same image every time you open a new browser session instead of relying on a cached copy.
  2. Does OCSP requests over POST instead of GET – this prevents OCSP responders from practically utilizing CDN technology or cost-effectively doing geographic distribution of responders.
  3. Does not support OCSP stapling – IE has supported this since 2008, and Firefox even paid OpenSSL to add support around the same time, but they have yet to add support themselves (a minimal sketch of the client side follows this list).
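
To put item 3 in perspective, consuming a staple on the client side is not a lot of code. Here is a minimal OpenSSL sketch; note it only checks that a well-formed, successful response was stapled – a real client must still verify the response signature, match it to the certificate, and check freshness with the OCSP_basic_* APIs:

    #include <openssl/ssl.h>
    #include <openssl/ocsp.h>

    /* Client-side sketch: look at whatever the server stapled. Returning 1
       continues the handshake, returning 0 aborts it. */
    static int ocsp_status_cb(SSL *ssl, void *arg)
    {
        (void)arg;
        const unsigned char *der = NULL;
        long der_len = SSL_get_tlsext_status_ocsp_resp(ssl, &der);
        if (der_len <= 0)
            return 1;    /* nothing stapled: fall back to a normal OCSP/CRL fetch */

        OCSP_RESPONSE *resp = d2i_OCSP_RESPONSE(NULL, &der, der_len);
        if (resp == NULL)
            return 0;    /* a malformed staple should fail the handshake */

        int status = OCSP_response_status(resp);
        OCSP_RESPONSE_free(resp);
        /* Real code: also OCSP_response_get1_basic + OCSP_basic_verify + freshness. */
        return status == OCSP_RESPONSE_STATUS_SUCCESSFUL ? 1 : 0;
    }

    void request_stapled_response(SSL_CTX *ctx, SSL *ssl)
    {
        SSL_set_tlsext_status_type(ssl, TLSEXT_STATUSTYPE_ocsp);  /* ask for a staple */
        SSL_CTX_set_tlsext_status_cb(ctx, ocsp_status_cb);        /* inspect what arrives */
    }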

Each of these seems like a fairly small item, but when you look at the issues as a whole they contribute significantly to the reality we face today – revocation checking isn’t working.

There are other problems as well, for example:

In some cases browsers do support GET as a means to make an OCSP request, but if they receive a “stale” or “expired” response from an intermediary cache (such as a corporate proxy server) they do not retry the request in a way that bypasses the proxy.
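
For reference, the GET form is simply the base64 encoding of the DER-encoded OCSPRequest appended to the responder URL (RFC 5019); here is a rough OpenSSL sketch, where make_ocsp_get_url is a hypothetical helper and a real client must also percent-encode the ‘+’, ‘/’ and ‘=’ characters:

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <openssl/evp.h>
    #include <openssl/ocsp.h>
    #include <openssl/x509.h>

    /* Sketch: build an RFC 5019-style OCSP GET URL for one certificate. No nonce
       is added – that is exactly what makes the response cacheable by CDNs and
       proxies. Error handling is left out. */
    char *make_ocsp_get_url(const char *responder_base, X509 *subject, X509 *issuer)
    {
        OCSP_REQUEST *req = OCSP_REQUEST_new();
        OCSP_request_add0_id(req, OCSP_cert_to_id(EVP_sha1(), subject, issuer));

        unsigned char *der = NULL;
        int der_len = i2d_OCSP_REQUEST(req, &der);   /* DER-encode the request */

        unsigned char b64[512];                      /* a single-cert request easily fits */
        EVP_EncodeBlock(b64, der, der_len);          /* base64 of the DER bytes */

        char *url = malloc(strlen(responder_base) + strlen((char *)b64) + 2);
        sprintf(url, "%s/%s", responder_base, (char *)b64);  /* GET {responder}/{base64(request)} */

        OPENSSL_free(der);
        OCSP_REQUEST_free(req);
        return url;
    }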

All browsers today do synchronous revocation checking; imagine if your browser only downloaded one image at a time, in series – that’s in essence what the browsers are doing today.

These and other client behaviors contribute to reliability and performance problems that are preventing Hard Revocation Checking from being deployed. These issues need to be addressed and the browser vendors need to start publishing metrics on what the failure rates are as well as under what conditions they fail so that any remaining issues on the responder side can be resolved.

 

OCSP Responder Performance Needs Improvement

Recently I set up a PingDom monitor to track the overall performance of the various OCSP responders out there. PingDom is limited to doing GETs and cannot parse the responses from the responders, but it’s a fair mechanism for looking at response time.

These tests run from a number of different global locations and are averaged together; the locations change, but the same locations are used for each set of tests, so again this seems fair.

I decided to use the Google logo as my control, as it is about the same size as a larger OCSP response; after about a month of monitoring this is what I saw:

Test                        Avg. Response time
Google Logo (3972 bytes)    44 ms
GoDaddy OCSP                186 ms
GlobalSign OCSP             228 ms
Digicert OCSP               266 ms
Comodo OCSP                 268 ms
TrustCenter OCSP            273 ms
TrustWave OCSP              315 ms
Startcom OCSP               364 ms
Entrust OCSP                371 ms
Geotrust OCSP               432 ms
VeriSign OCSP               510 ms
CyberTrust OCSP             604 ms
Certum OCSP                 776 ms

As you can see the fastest responder is over four times slower than the Google logo, far from acceptable.

When looking at the individual responders and their responses, this is what I saw:

  • Very few responders are using CDNs, Anycast or other techniques to globally distribute responses.
  • Only a handful of responders have multiple DNS entries for failover scenarios.
  • Quite a few responders are not following the HTTP caching header requirements in RFC 5019.
  • Most responders are not sending CA-signed responses, which would reduce the response size significantly (down to 471 bytes); in my opinion an OCSP responder should do this for all pre-produced responses.
  • Some responders are returning Unknown for out-of-scope requests; this really isn’t safe for unauthenticated requests, as it exposes the responder to a resource-consumption denial of service against the signing keys.
  • Response freshness ranges from 6 hours to 14 days; I am quite sure the six-hour responses are failing for a very large percentage of the internet community due to time skew. Four days appears to be the optimum.

These are all fairly easy things to address and I believe it’s reasonable for responders to get down to response times that are consistent with the control test above.