As you may know I have been hosting some performance and up-time monitors at: http://ocspreport.x509labs.com and http://crlreport.x509labs.com.
I started this project about six months ago when I walked the CAB Forum membership list, visited the sites of the larger CAs on that list, looked at their certificates and extracted both OCSP and CRL urls and added them into custom monitor running on AWS nodes.
Later I tried Pingdom and finally settled on using Monitis because Pingdom doesn’t let you control which monitoring points are used and doesn’t give you the ability to do comparison views. That said as a product I liked Pingdom much better.
As for how I configured Monitis, I did not do much — I set the Service Level Agreement (SLA) for uptime to 10 seconds which is the time required to be met by the CABFORUM for revocation responses. I also selected all of the monitoring locations (30 of them) and set it loose.
I put this up for my own purposes, so I could work on improving our own service but I have also shared it publicly and know several of the other CAs that are being monitored are also using it which I am happy to see.
OK, so today I found myself explaining a few things about these reports to someone so I thought it would be worthwhile to summarize those points for others, they are:
- Why is it so slow to render? – Unfortunately despite numerous requests to Monitis there is nothing I can do about this – Monitis is just slow.
- Why does it show downtime so often? – I do not believe the downtime figures, most of the time the failures show up on all of the urls. The times I have looked into theses it turned out the failures were at Monitis or due to regional network congestion / failures. Unfortunately this means we cannot rely on these figures for up-time assessment, at best they are indicators when looked at over long periods of time.
- Why do some tests show at 0-1 ms? – This is likely because the Monitis testing servers are located in the same data center as the OCSP servers in question. This skews the performance numbers a little bit but the inclusion of many perspectives should off-set this.
At this point I suspect you’re wondering, with these shortcomings what is this thing good for anyways? That’s a good question; OCSP (and CRLs) are a hidden tax that you and your users pay when they visit your site.
This is important because studies have found a direct correlation between latency and user abandonment and seriously who doesn’t just want their site to be fast as possible.
My hope is these resources help you understand what that tax is; if you’re a CA operator it can also help you tweak your performance as well as get an idea of what the global user experience is for the relying parties of your certificates.
On a related note I do think someone could make a pretty penny if they made an easy to use, yet powerful monitoring site 🙂