It is my turn, or short-lived certificates part 2

My response to a recent post suggesting short-lived certificates were intended to remove the need to do revocation seems to have spurred a response from its author.

The thesis of the latest response is that:

Revocation can easily handle “key compromise” situations and do so by offering more security than short-lived certs.

The largest problem with this thesis is that it is based on an incorrect understanding: either the author believes short-lived certificates do not get checked for revocation, or they believe it is a choice between short-lived certificates and revocation.

Neither statement is true. User Agents do not say “oh, this certificate is short-lived so we won’t do revocation checking”. If User Agents do revocation checking of end-entity certificates, and not all do, the short-lived certificate will get checked just like the long-lived certificate.

Now there is a reasonable argument to be made that if CAs are allowed to produce revocation lists and OCSP responses every 7 days, and User Agents will trust them for that time, skipping the check would not be an unreasonable performance optimization, at least if the behavior were limited to certificates with validity periods shorter than 7 days. With that said, User Agents don’t do this today and I didn’t suggest they should in my post.

[Update 10:43PM May 4th 2017: It seems a change happened while I was asleep at the wheel. Firefox has implemented the below optimization as of Firefox 41. A corrected statement follows in the next two paragraphs.]

Neither statement is true, well, except for one performance optimization implemented by Firefox that I will explain shortly. Basically, if User Agents do revocation checking of end-entity certificates, and not all do, the short-lived certificate will get checked just like the long-lived certificate, unless its validity period is shorter than the maximum age of the corresponding CRL or OCSP response (Firefox only).

This performance optimization Firefox implemented is based on the fact that CAs are allowed to rely on 7-day-old OCSP responses and CRLs. As a result, 7 days becomes the precision of revocation checking. It is not clear what value Firefox chose, but it is a fraction of that figure, not 90 days. Either way, no major CA that I am aware of issues such short-lived certs today due to time skew issues.
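To make the shape of that optimization concrete, here is a minimal Python sketch of the decision a User Agent could make. It is not Firefox's code: the function name, the parameter names, and the 7-day threshold are all assumptions for illustration, since the actual value Firefox uses is not stated here.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical threshold. The real value a User Agent uses may differ;
# 7 days is simply the maximum age allowed for CRLs and OCSP responses.
SHORT_LIVED_THRESHOLD = timedelta(days=7)


def needs_revocation_check(not_before, not_after, threshold=SHORT_LIVED_THRESHOLD):
    """Return True when the certificate should still be checked for revocation.

    A certificate whose whole validity period is no longer than the threshold
    expires before a cached revocation response would have gone stale, so the
    check adds little and can be skipped.
    """
    validity_period = not_after - not_before
    return validity_period > threshold


issued = datetime(2017, 5, 1, tzinfo=timezone.utc)

# A 90-day certificate is still checked like any long-lived certificate.
print(needs_revocation_check(issued, issued + timedelta(days=90)))

# A 3-day certificate falls under the (assumed) threshold and is skipped.
print(needs_revocation_check(issued, issued + timedelta(days=3)))
```

The point of the sketch is that the exemption only applies below the threshold; a 90-day certificate is nowhere near it.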

The second problem with this thesis is that it presumes the user who does know of a compromise wants to announce it to the world and can. As I mentioned in an earlier post on revocation reasons, subscribers are not keen to announce to the world that there was “a compromise”. I mention “can” because in some cases the one who knows about the compromise does not even have sufficient permissions to the associated CA consoles to request the revocation.

And finally, the last issue with this thesis is that it presumes revocation checking is effective. Today over 9% of OCSP responses fail due to issues with the CA’s revocation infrastructure (the connections time out).

That’s right, 9 out of 100 revocation checks fail because CAs fail to operate infrastructure capable enough to meet the needs of the clients that rely on them. It is actually worse than that, though. The largest websites use a technique called Domain Sharding to make their sites load faster, which means that the failure rate you would experience as a user if hard-fail were implemented could be 2-4x higher than that.
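The arithmetic behind that multiplier can be sketched in a few lines of Python. This is a back-of-the-envelope model, not measured data: it assumes each sharded hostname presents its own certificate and needs its own revocation check, and that checks fail independently at the 9% rate. The function name is hypothetical.

```python
def page_failure_rate(per_check_failure, hosts):
    """Chance that at least one of `hosts` independent revocation checks fails."""
    return 1 - (1 - per_check_failure) ** hosts


PER_CHECK_FAILURE = 0.09  # the 9% figure quoted above

for hosts in (1, 2, 4):
    rate = page_failure_rate(PER_CHECK_FAILURE, hosts)
    # Show the page-level failure rate and how it compares to a single check.
    print(f"{hosts} host(s): {rate:.1%} ({rate / PER_CHECK_FAILURE:.1f}x)")
```

Under those assumptions a page spread across two hostnames fails roughly 17% of the time and one spread across four fails roughly 31% of the time, which is in line with the 2-4x estimate.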

This is before we consider the fact that, due to the poor performance and failures in CA revocation infrastructure, revocation checking has been largely turned off in Chrome and other browsers.

I say this because for “revocation checking” to work for a key compromise case you need two things:

  • The CA to know the compromise occurred,
  • Revocation checking to actually work.

Now I want to be clear: I think revocation checking is a good thing and I would like to see the situation improved. As proof of that statement, here are some examples of my work in this area:

  • While at ValiCert (the lead creators of OCSP) I worked on the standardization of RFC 2560 (OCSP) including running the interoperability testing that led to its ultimate standardization,
  • I also am the author of RFC 5019 which is the profile of OCSP in use by CAs and clients,
  • I have led the development of numerous PKI SDKs and servers which have implemented these standards (including leading the team that added support for OCSP to Windows),
  • I led the first, and most reliable, implementation of OCSP stapling (in SCHANNEL),
  • I led the CASC project to get OCSP stapling added to Nginx,
  • I have helped create and/or recreate numerous CAs along with their revocation infrastructure,
  • And I led the first wide-scale measurement efforts of the CA revocation infrastructure via my X509LABS Revocation Report project, which led to all CAs adopting the same design philosophies to begin to address abhorrent response times and uptime.

Basically, I see the value in revocation checking and think the investments need to be made to make it work and be relevant for the WebPKI of today. That said, this topic has zero to do with short-lived certificates.

Ryan

5 thoughts on “It is my turn, or short-lived certificates part 2”

  1. Alice Wonder

    This is why it is important for web servers to use OCSP stapling. OCSP stapling allows the client to know the certificate has not been revoked without needing to query the OCSP server itself, reducing the load the OCSP server has to handle.

    What I do not know is how well (if at all) OCSP stapling works with certificates used for S/MIME. Obviously, for self-signed S/MIME certs there is no OCSP available, but when it is available, do clients staple when signing an e-mail and do clients properly support the stapling when receiving?

    With respect to short-lived certs, different topic than OCSP, I personally do not like them. I like to change my private key once a year and use DANE to give the client confidence it is a valid certificate.

    Unfortunately, most web browsers do not validate DNSSEC or DANE. It’s really too bad, it is an excellent mechanism for client validation of the server’s public key.

    1. rmhrisk Post author

      I agree. OCSP Stapling (and MUST STAPLE) will be an important tool in making revocation checking work. Unfortunately, both the Apache and Nginx implementations leave a lot to be desired; some of the issues are listed at https://gist.github.com/sleevi/5efe9ef98961ecfb4da8

      As for OCSP Stapling and S/MIME, I do not think any clients support it. It is technically possible, but S/MIME implementations are generally pretty weak.

      I think there are cases for longer lived certificates and in my personal view, a yearly switch is fine in those cases.

      That said I do think that the greater good is served through encrypting the web and that automation is necessary to achieve that goal. I also feel the secondary effects of that change that I can see are acceptable.

      As for DNSSEC, I was responsible for enterprise networking servers at MSFT for a while, and that included their DNS server. When we added support for DNSSEC to Windows Server I got deep into it and came to the conclusion that it is no better than, and in a number of ways worse than, our current state. While I don’t agree with everything Thomas says on this topic, I do agree with most of it, and as an implementor I also think AGL’s post is spot on.

  2. Pingback: Short lived certificates cannot solve the “un-aware victim” problem – 2

  3. Pingback: SSL Review: May 2017 - Entrust, Inc.

  4. Pingback: Short Lived Certificates Cannot Solve Un Aware Victim Problem 2
