Browser Revocation Behavior Needs Improvement

Today the best-behaved client when it comes to revocation checking is that of Windows; in the case of browsers, that means IE and Chrome.

With that said, it has a very fundamental problem: if it reaches a CA's OCSP responder and the responder provides an authoritative "that's not mine" (aka Unknown), clients built on this platform treat the certificate as good.

You read that right; it treats a certificate that is clearly invalid as good! This, unfortunately, is a common behavior that all of the browsers implement today.
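For illustration, a minimal sketch of what hard-failing on an authoritative Unknown could look like, using Python's "cryptography" package; the response file is a placeholder, and the hard-fail policy shown is what this post argues for, not any browser's actual code:

    import sys

    from cryptography.x509 import ocsp

    # Placeholder: a DER-encoded OCSP response fetched elsewhere.
    der_bytes = open("response.der", "rb").read()
    resp = ocsp.load_der_ocsp_response(der_bytes)

    if resp.response_status == ocsp.OCSPResponseStatus.SUCCESSFUL:
        if resp.certificate_status == ocsp.OCSPCertStatus.UNKNOWN:
            # The responder authoritatively said "that's not mine";
            # a hard-fail client would reject the certificate here.
            sys.exit("certificate status Unknown: do not trust")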

The other browsers are even worse; Firefox, for example:

  1. Does not maintain a cache across sessions – this is akin to your browser downloading the same image every time you open a new browser session instead of relying on a cached copy.
  2. Performs OCSP requests over POST instead of GET – this prevents OCSP responders from practically utilizing CDN technology or cost-effectively doing geographic distribution of responders (see the sketch after this list).
  3. Does not support OCSP stapling – IE has supported this since 2008; Firefox even paid OpenSSL to add support around the same time, but they have yet to ship support themselves.
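
To make the GET point in item 2 concrete, here is a minimal sketch of an OCSP request over HTTP GET as RFC 2560 Appendix A and RFC 5019 describe it, using Python's "cryptography" and "requests" packages; the certificate file names and responder URL are placeholders:

    import base64
    import urllib.parse

    import requests
    from cryptography import x509
    from cryptography.hazmat.primitives import hashes, serialization
    from cryptography.x509 import ocsp

    # Placeholder inputs: the certificate being checked and its issuer.
    cert = x509.load_pem_x509_certificate(open("subject.pem", "rb").read())
    issuer = x509.load_pem_x509_certificate(open("issuer.pem", "rb").read())

    # Build a nonceless OCSP request -- the cache-friendly kind.
    req = ocsp.OCSPRequestBuilder().add_certificate(cert, issuer, hashes.SHA1()).build()
    der = req.public_bytes(serialization.Encoding.DER)

    # GET {responder}/{url-encoding of the base64 encoding of the DER request}.
    # Because the whole request is in the URL, proxies and CDNs can cache the answer.
    b64 = base64.b64encode(der).decode("ascii")
    url = "http://ocsp.example.com/" + urllib.parse.quote(b64, safe="")
    answer = requests.get(url)

Because the response to a given nonceless request is byte-for-byte identical until it expires, a CDN can serve it from edge caches; a POST body, by contrast, is invisible to those caches.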

These each seem like fairly small items, but when you look at all of these issues as a whole, they significantly contribute to the reality we face today – Revocation Checking isn't working.

There are other problems as well, for example:

In some cases browsers do support GET as a means of making an OCSP request, but if they receive a "stale" or "expired" response from an intermediary cache (such as a corporate proxy server), they do not retry the request in a way that bypasses the proxy.
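A minimal sketch of the retry behavior this post argues for, assuming a pre-built RFC 5019 GET URL (a placeholder here): check the response's nextUpdate, and if it is already in the past, re-request with no-cache directives so intermediaries revalidate against the origin responder:

    import datetime
    import urllib.request

    from cryptography.x509 import ocsp

    url = "http://ocsp.example.com/leaf-request"  # placeholder RFC 5019 GET URL

    def fetch(headers=None):
        req = urllib.request.Request(url, headers=headers or {})
        with urllib.request.urlopen(req, timeout=5) as r:
            return ocsp.load_der_ocsp_response(r.read())

    resp = fetch()
    if resp.next_update is not None and resp.next_update < datetime.datetime.utcnow():
        # Stale answer, likely served by an intermediary cache; force revalidation.
        resp = fetch({"Cache-Control": "no-cache", "Pragma": "no-cache"})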

All browsers today do synchronous revocation checking. Imagine if your browser downloaded only one image at a time, in series; that is, in essence, what the browsers are doing today.
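To make the comparison concrete, here is a sketch using Python's standard library; the OCSP GET URLs are hypothetical placeholders, one per certificate in the chain being validated:

    from concurrent.futures import ThreadPoolExecutor
    import urllib.request

    # Hypothetical RFC 5019 GET URLs for the leaf and intermediate certificates.
    ocsp_urls = [
        "http://ocsp.example.com/leaf-request",
        "http://ocsp.example.com/intermediate-request",
    ]

    def fetch(url):
        with urllib.request.urlopen(url, timeout=5) as r:
            return r.read()

    # Serial, like browsers today: total latency is the sum of the round trips.
    responses = [fetch(u) for u in ocsp_urls]

    # Concurrent: total latency is roughly the slowest single round trip.
    with ThreadPoolExecutor() as pool:
        responses = list(pool.map(fetch, ocsp_urls))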

These and other client behaviors contribute to reliability and performance problems that are preventing Hard Revocation Checking from being deployed. These issues need to be addressed, and the browser vendors need to start publishing metrics on what their failure rates are, as well as under what conditions they fail, so that any remaining issues on the responder side can be resolved.

6 thoughts on "Browser Revocation Behavior Needs Improvement"

  1. Dave

    Since GETs are non-idempotent and therefore break caching, the correct implementation is to use a POST and not a GET. In other words Firefox's behaviour is the right thing to do, not the wrong thing.

    1. rmhrisk (post author)

      Nope. POST has no guarantee of producing the same result on a subsequent call; it doesn't even have the semantics to communicate that it will, unlike GET, which does. For various reasons, including both performance and resilience to DDoS, OCSP responders serve cached responses, and when they do so they indicate how long those responses can be cached. Intermediary proxies do not honor this for POST, since they presume POST is dynamic and GET is (mostly) static. Firefox's behavior is wrong.
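
      For illustration, a quick sketch of how to see those caching directives on a nonceless GET request (the URL is a placeholder, and the exact header values vary by responder):

          import requests

          # Placeholder RFC 5019 GET URL for a pre-produced, nonceless response.
          r = requests.get("http://ocsp.example.com/leaf-request")
          # RFC 5019 responders set these so HTTP caches know how long the answer may be served.
          for header in ("Cache-Control", "Last-Modified", "Expires", "ETag"):
              print(header, "=", r.headers.get(header))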

  2. Dave

    That's only valid in the case of broken (replay-attack-enabled) OCSP, a.k.a. "high-performance" OCSP. If you're running proper OCSP with nonces, then you can't have the data cached, because there'll be a different nonce every time. Since the client is getting a fresh response every time, the correct way to handle it is POST, not GET.

    1. rmhrisk (post author)

      Dave, while it is true that caching is not possible when nonces are used, I know for a fact my responders receive nearly zero requests with nonces. From talking to other CAs, I know they see the same thing, and I have also verified in the Firefox code that it does not send them by default.

      If a client were to send nonces, I (and the RFCs) would agree that POST is the right method to use, but the fact is they do not.
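
      For what it's worth, here is a minimal sketch of what a nonced request would look like using Python's "cryptography" package; the certificate file names are placeholders, and the 16-byte nonce length is just an example:

          import os

          from cryptography import x509
          from cryptography.hazmat.primitives import hashes
          from cryptography.x509 import ocsp

          # Placeholder inputs: the certificate being checked and its issuer.
          cert = x509.load_pem_x509_certificate(open("subject.pem", "rb").read())
          issuer = x509.load_pem_x509_certificate(open("issuer.pem", "rb").read())

          builder = ocsp.OCSPRequestBuilder().add_certificate(cert, issuer, hashes.SHA1())
          # A random nonce makes every request unique, defeating caching by design --
          # which is why POST would be the right method for a client that sent one.
          builder = builder.add_extension(x509.OCSPNonce(os.urandom(16)), critical=False)
          request = builder.build()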

      Also, it's worth noting that Verisign/Symantec issues 3 billion OCSP responses a day (pre-produced, nonceless…). Dynamically signing those responses – roughly 35,000 signatures per second, on average – would mean they (and other CAs like myself) would be exposed to the risk of a trivial resource-exhaustion DoS.

      The decision to rely on time as the means to protect against replay is a practical one for many reasons.

      While it is natural (human nature) to focus on small items one at a time when evaluating a design, given the way things are inter-related one needs to step back and look at all the facts before coming to a conclusion.

      If not, we end up with a system that is 'correct' but doesn't work.

      Don't get me wrong, we have a ton of issues that need to be fixed with revocation, but getting improved replay protection is the least of our problems.

      Today, given the mandatory 7-day maximum validity period for OCSP responses and the approximately 7-day average time skew, we have a 14-day revocation window. We need to bring that in for sure, but we have larger problems preventing revocation from working. One of the largest is that we have browsers getting "this is a bad certificate, don't trust it" messages and ignoring them; that is clearly a much more urgent problem, no?

  3. Dave

    I agree with your comments, but you're taking a somewhat browser-PKI-specific view of things. OCSP was originally designed for use with things like the Identrus financial PKI (you can see various traces of this still in the OCSP spec, in particular the delegation mechanism and the ability to query the status of multiple certs, which was to be done through access concentrators); for that you did actually need realtime responses, because you were dealing with financial transactions. (This is why the original OCSP design includes no provisions for scalability; it was created to facilitate the business models of the authors' employers, which is why it has a bunch of schizophrenic requirements in it, one for each RFC author, rather than a single clear way of doing things.) I agree that this thing was never designed to scale in any way, and the current behaviour is a compromise approach.

    1. rmhrisk (post author)

      It was intended to be specific to browsers (the title of the post is: “Browser Revocation Behavior Needs Improvement”) :p

      I am very familiar with Identrus, as I was with ValiCert during this time; our responder represented most of the responders in Identrus.

      As for the RFC having no provisions for scalability, I don't agree with that. RFC 2560 had support for nonceless responses, time-based replay protection, caching, and other scalability concepts.

      With that said, I would be the first to admit that it was, shall we say, light on the details; that's why I did RFC 5019, to clarify those implementation details.

      Hindsight being 20/20, I think everyone back then would have done OCSP quite differently and we would have a better protocol to work with, but IMHO it's sufficient for its purpose today; we just need to use it appropriately.

