
Secure, Privacy Preserving Key Discovery for End-To-End Encryption

A lot of products today claim to offer End-To-End Encryption, but not all of them offer the same level of protection. Some of the differences are rooted in the protocols and cryptography they use, some in the way they are implemented, and others in the way they handle discovery of the cryptographic keys of the peers involved in the session.

Key discovery itself is a complicated topic. On the surface, all a messaging application needs to do is go to a directory and request the public keys associated with the users (or their devices) it will communicate with. Where things get tricky is how you, as a relying party, can tell whether the key discovery mechanism is lying to you.

This is important because a key discovery server that lies to you can facilitate an impersonation of a user, add a hidden third party to the encrypted session, or potentially trigger a re-encryption to a device not under your control, all without your knowledge.

To understand the implications here, you just need to look at iMessage. Although many people do not know it, iMessage is actually End-to-End Encrypted! Matthew Green has done several great write-ups on its protocol [1] [2] and on how the lack of verifiability in its key discovery mechanism weakens the overall solution.

The most used End-to-End Encrypted messaging application is probably Facebook’s WhatsApp. Several years ago a security researcher [3] reached out to The Guardian to discuss what they described as a “backdoor” in WhatsApp; this “backdoor” was related to how it handled key discovery in device recovery use cases.

As a product person, you often need to make trade-offs to achieve your goals, and that is what happened in this case. This “backdoor” was a design decision made to ensure billions of users could get some of the End-to-End Encryption protections without compromising usability.

A number of security researchers, including myself, spoke up [4], which resulted in the article being updated to correctly reflect this reality [5].

Later, WhatsApp and how it handles key discovery came up in the news again, this time in an article from Wired [6]. Alex Stamos, the former Chief Security Officer of Facebook, responded to this article [7], affirming some of the article’s points and explaining that a conscious decision was made to enable the associated use case:

“Read the Wired article today about WhatsApp – scary headline! But there is no secret way into WhatsApp groups chats. The article makes a few key points.”

While his response may be true, it is not verifiably true, as it relies on the behavior of the client rather than on cryptographic verifiability.

This is where systems like CONIKS [8], Keybase [9], and Google’s Key Transparency [10] come into play.

These solutions aim to enable automated trust establishment with untrusted communication providers through the use of an auditable directory of all of their users’ keys, both past and present.

Because these solutions provide an auditable history of keys, both the relying party and the subscriber involved in the communication can reliably be made aware when new keys have been associated with a user’s account and, importantly, which entity added the key to the account.

With this information, the applications the users rely on can either prevent messages from being sent (via policy) or notify the user when keys have changed unexpectedly.
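To make the auditable-history idea concrete, here is a minimal Python sketch. It assumes a simple hash chain rather than the Merkle-tree-based structures that CONIKS and Key Transparency actually use, and the names `KeyUpdate`, `KeyLog`, `verify_log` and `check_contact` are hypothetical. Each entry records which entity added a key and commits to every earlier entry, and the client applies a block-or-notify policy when a contact’s latest key differs from the one it last trusted. On its own this does not stop a server from showing different histories to different users; real systems add auditing or gossip for that.

```python
import hashlib
import json
from dataclasses import dataclass, field

GENESIS = "0" * 64  # placeholder head for an empty log (an assumption of this sketch)


@dataclass(frozen=True)
class KeyUpdate:
    user: str        # account the key is bound to
    public_key: str  # encoded public key; the encoding is left unspecified here
    added_by: str    # entity that added the key, e.g. a device or the server


def entry_hash(prev_hash: str, update: KeyUpdate) -> str:
    """Hash an entry together with the previous head, chaining the history."""
    payload = json.dumps(
        {"prev": prev_hash, "user": update.user,
         "key": update.public_key, "by": update.added_by},
        sort_keys=True,
    ).encode()
    return hashlib.sha256(payload).hexdigest()


@dataclass
class KeyLog:
    """Append-only list of (update, hash) pairs; past entries are never edited."""
    entries: list = field(default_factory=list)

    def head(self) -> str:
        return self.entries[-1][1] if self.entries else GENESIS

    def append(self, update: KeyUpdate) -> str:
        h = entry_hash(self.head(), update)
        self.entries.append((update, h))
        return h

    def history(self, user: str) -> list:
        return [u for u, _ in self.entries if u.user == user]


def verify_log(entries: list) -> bool:
    """Recompute the chain; any rewritten or reordered entry breaks it."""
    prev = GENESIS
    for update, h in entries:
        if entry_hash(prev, update) != h:
            return False
        prev = h
    return True


def check_contact(log: KeyLog, user: str, pinned_key: str, policy: str = "notify") -> str:
    """Compare the latest logged key with the key the client last trusted."""
    history = log.history(user)
    if not history:
        return "unknown"
    latest = history[-1]
    if latest.public_key == pinned_key:
        return "ok"
    if policy == "block":
        return "block"
    return f"notify: new key added by {latest.added_by}"


if __name__ == "__main__":
    log = KeyLog()
    log.append(KeyUpdate("alice", "key-1", "alice-phone"))
    log.append(KeyUpdate("alice", "key-2", "server"))
    print(verify_log(log.entries))               # True
    print(check_contact(log, "alice", "key-1"))  # notify: new key added by server
```

Because the log names the entity that added each key, a client can tell a key the user added from one the server inserted, which is exactly the signal the policy needs.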

This allows messaging clients to verify the identity of users automatically and prevents malicious/compromised servers from hijacking secure communications without getting caught.

On the surface, this sounds much easier than it is to accomplish, at least at scale. WhatsApp serves over a billion users, so any solution needs to handle key updates and reads at the rates necessary to support such a large user base.

It needs to do this without leaking metadata associated with who the users are communicating with.

And do this without significantly increasing the amount of data a user must download or the time it takes to change keys.

While these are all tractable problems, they are not problems that are solved today in this context.

For this reason, applications that implement End-To-End Encryption typically either provide a mechanism that users who care about these risks can use to verify cryptographic keys out of band, in person [11], or simply implicitly trust the key discovery service to be an honest actor.

At Google, I have the pleasure of working on Google’s answer to this problem [12]. Our hope is that, when it is complete, applications that need to discover keys securely and verifiably can simply download our solution and focus on their application, instead of spending years of energy solving this problem themselves.

I firmly believe the best way to ensure the right thing happens is to make the right way the easy way, and fundamentally that is the goal of the Google Key Transparency effort.


  • [1] Attack of the Week: Apple iMessage
  • [2] Let’s talk about iMessage (again)
  • [3] The Guardian is backtracking on a controversial story about WhatsApp
  • [4] Security researchers call for Guardian to retract false WhatsApp “backdoor” story
  • [5] Flawed reporting about WhatsApp
  • [7] Read the Wired article today about WhatsApp – scary headline!
  • [8] CONIKS Project
  • [9] OkCupid’s Founders Want to Bring Encrypted Email to the Masses
  • [10] Google’s Key Transparency project aims to ease a tough task in cryptography
  • [11] Safety number updates

Why I chose UniFi vs AmpliFi HD


I just had a brief exchange with a friend on Twitter who suggested that AmpliFi HD, not UniFi, was the product Ubiquiti was building for users like me.

I thought folks might be interested in why I didn’t go that route so here is another post 🙂

I did look at AmpliFi HD, in fact, my eldest son tried to convince me to ditch the Google WiFi for AmpliFi HD shortly after it came out.

When I looked into the AmpliFi HD, my conclusion was that it was essentially a less fully featured Google WiFi (e.g. it didn’t seem to have the home automation or parental control features) with better radios. I was largely satisfied with the radio coverage I had with Google WiFi, so I was not compelled to make the change.

One of the pain points I did have with my Google WiFi solution was that I had to find places to stash four Google WiFi access points to get sufficient coverage for all the devices in my network. The devices themselves look OK but we really do try to hide all the tech in the house and live by the motto “less is more” so this is a pain we did feel.

The AmpliFi didn’t really have a solution to this problem either; in fact, I would probably have needed more, smaller units for proper coverage. The upside is that those smaller units would have been less visible, which would have been nice. On the other hand, the kids often unplug things in the house to free up outlets or simply to mess with me, and the design of the AmpliFi mesh units made me fear that would happen a lot.

When I looked at the UI of the AmpliFi products, my conclusion was that it was a stripped-down UniFi rather than a product designed as a high-end WiFi product. This is in contrast to Google WiFi, which felt like a sincere attempt at rethinking the whole user experience.

This combined with the lack of integration with a larger ecosystem (home automation, etc) made it really hard to justify migrating off of Google WiFi.

My conclusion (right or otherwise) from my research was at best I would end up with marginally better coverage and a new set of limitations as a trade-off. It just did not justify the change.

When I revisited the decision to replace my wireless deployment, I was more or less fed up. I did not want to mess with this again anytime soon, so I decided to go big or go home. This led me to switch to UniFi, which in turn also led me to switch to Protect.

If I was the target user the AmpliFi team was looking for, I think they missed a few things:

  • I want less clutter, not more. The square design of the AmpliFi presumes public display of a piece of electronics, and I don’t want that.
  • The mesh does not support wired backhaul, and the natural locations for the units in my house would be quite far apart. Wireless backhaul had caused me some pain with Google WiFi, so I was not sure it would work well for me.
  • I also didn’t want 4-6 outlets occupied around the house. Even though the mesh adapters are smaller than the Google WiFi units, more of them is still a pain, especially given the kids are not likely to leave them alone.
  • I have some basic home automation and the AmpliFi product doesn’t offer any story here.
  • I liked the parental controls I had with Google WiFi and it seemed I could approximate that but not in an easy way.
  • I liked how I can manage my parents’ and cousins’ WiFi networks in Google WiFi; it gives me a one-stop shop for dealing with issues when people call me. I recall concluding this was missing, and if nothing else, the friction of replacing their WiFi to keep everything uniform would have been a barrier.
  • I have fiber, and I understood you had to run the device in bridge mode in that case, which takes away a lot of the features of the AmpliFi HD system.
  • The CloudKey Gen2 Plus having the built-in NVR meant I could consolidate how I dealt with cameras at the same time; one less thing to deal with and after a year the cost savings would allow me to break even and later save.

I basically concluded that my home was “too big” for the AmpliFi HD and that the incremental benefit of switching to it from Google WiFi was not worth the effort.

This could be marketing, this could also be poor product planning, or maybe I was just not the target customer. It is hard to say without knowing a bit more about how the product planning was done here.

In any event as the earlier post states, I’ve gone all UniFi now and I look forward to seeing how that works for us over the next year.

Google Wifi + NEST Cameras vs Ubiquiti for Home Use

I recently made the switch from Google WiFi and NEST Cameras to Ubiquiti UniFi and Protect. A few things motivated these changes, and I wanted to talk about them in this blog post.

Background

Google WiFi

The most significant motivator was some network reliability issues I was experiencing with the Google WiFi. In the end, the problem was not related to the Google WiFi, but I could not diagnose it without logs, which Google WiFi encrypts. Though I was able to walk through the issue with Google support and ultimately localize it, it took several days of back and forth and required me to walk them through exactly what to look for.

Google WiFi actually performed great overall, but we have an above-average number of devices in our house, and sometimes we would experience what I believed to be congestion. This is likely because Google WiFi only supports SU-MIMO, while the UniFi solution supports MU-MIMO. MU-MIMO allows a Wi-Fi router to communicate with multiple devices simultaneously, which reduces the time each device has to wait for a signal and can substantially speed up a busy network.

I also experienced some cases where the Google WiFi fell back to the wireless mesh even though I had a wired backhaul. I never figured out why this was happening, but it was not a huge issue.

Finally, we have an outbuilding that currently uses our guest network, but since it is on a guest network it cannot do any IoT-style networking where one device talks to another. To address this, I needed to either set it up with a physically isolated WiFi network of its own or configure a VLAN, which I could not do with Google WiFi.

On the plus side, since it is a product designed for the home, it has features like parental controls; though they could use some work on usability, they were actually quite useful.

To be honest, I cannot say enough positive things about Google WiFi. It is a great product that is probably perfect for 99% of people, but the sad reality is that we started to outgrow it.

Google NEST Cameras

We had five Google NEST Outdoor Cameras and a Hello doorbell at our house. They worked great and were pretty reliable. We really only had four complaints about these devices.

The first was that they did not support POE. This meant that when we set them up we had to buy POE-to-USB adapters and find ways to hide the long, bulky USB power cables they came with.

The second issue is that some of the cameras were on the absolute edge of our wireless network and we would, in rough weather, lose the wireless connection as a result. We did buy another Google WiFi to help with this issue but again it would have been ideal if the cameras were POE based and then this wouldn’t have been an issue.

The third issue is that the motion notifications tended to be a bit annoying. We did configure zones to help manage this, but it was still more obnoxious than I would have liked. To configure zones we also had to pay the monthly per-camera fee, which felt a little bit like extortion, i.e. pay us not to annoy you with notifications.

The fourth and final issue was the cost and nature of cloud storage. With a total of six cameras, the yearly cost of the NEST solution was significant. It was also dependent on cloud storage, which meant my data was stored exclusively in the cloud. As a Google employee, I have faith in the company’s practices for managing this data, but the recent issues with Amazon’s Ring and Alexa poorly managing the data they store did give me pause.

The reality is that if it were not for the Google WiFi change I discuss above I would have likely kept the NEST Cameras. This is because, despite the above, I was pretty happy with the solution but since I was buying into the Ubiquiti ecosystem it felt like unifying on their solution would not only address the above concerns but overall make things simpler to manage in the long run.

UniFi Wireless

Despite being a very advanced product capability-wise, it has a pretty easy-to-use management interface. I wouldn’t recommend putting the concepts it exposes in front of the type of users I end up supporting in my personal life, but the reality is that once it is set up you never really have to deal with that stuff.

Since it is really designed as a business solution and not a home solution it is missing some features that a modern home user might expect. For example, it has no way to share IOT devices as Google WiFi does. It is not integrated with home automation systems either, for example, you can’t use presence and activity of devices to infer if people are home as part of the way you configure your home automation. And it has no “parental controls” concept, though you can manually configure something roughly equivalent.

With that said, since UniFi was designed for businesses, many of its access points are physically attached to the house. This means you need to run wires in walls but it also means you do not have a pile of devices sitting around on horizontal surfaces.

It also does smart channel and power management so you don’t need to worry about such things; similar to Google WiFi, it is largely a set-it-and-forget-it solution.

What you end up with when you go with a UniFi based solution is a professional, flexible, moderately easy to use, high-performance solution that is physically installed and as a result non-intrusive to the overall environment.

UniFi Protect

Ubiquiti has two video solutions: UniFi Video, which is slowly being replaced, and UniFi Protect. I am using UniFi Protect as it is integrated with the CloudKey Gen 2 Plus, which I am using to manage my wireless.

The Ubiquiti cameras I chose are the G3, mainly because they were the cheapest of the set and seemed approximately comparable to the NEST Cameras they were replacing. This was important as I intended to sell my NEST cameras to cover the cost of the change.

The G3s do not have as nice an industrial design as the NEST cameras; they look more commercial and have essentially no market of third-party accessories (for example, skins to obscure the cameras), but they look reasonable enough.

The G3 also does not have a speaker (some other models do, for example, the G3 Micro, though it is an indoor camera) so there is no chance of two-way communication, though they do have a microphone so you can record what’s going on with the video.

I think the biggest gap in the G3 cameras relative to the NEST is they have no zoom, you have to step up to the G3 PRO which is 3x the cost of the G3 to get this.

The upside of this solution over the NEST can be summarized as:

  • No monthly fee per camera,
  • Cheaper cost per camera,
  • Data is stored locally vs on a public cloud.

There are some things that I will miss from the NEST solution, in particular:

  • Using computer vision to analyze the video, for example, do not send notifications when it is a family member, send notifications when a familiar face is seen, or ignore movement unless you see a person (some of these capabilities are only available with the new NEST Cam IQ camera).
  • It is not integrated with home automation systems, Alexa, Google Home or Siri. For example with Google Home, you can ask Google what is happening on a given camera and it will display it on your TV.
  • Having an integrated doorbell solution. I will be keeping NEST Hello, for now, to fill this gap, though having one camera there and the rest in another system is far from ideal.
  • There are no applications to integrate the cameras with Apple TV or Chromecast, so getting the cameras displayed on these devices will involve casting a browser session, which is lame.

With all that said, the TCO for a multi-camera NEST system is pretty high if you want to retain video and the Ubiquiti solution addresses this effectively.

Wishlist For Ubiquiti

I am installing this system in a home, and that’s not squarely where Ubiquiti is aiming this product. With that said, many new homes get Ubiquiti installs now, and if I were on the product team at Ubiquiti I would seriously be looking at what I could do to better serve that market.

Based on my current experience with the product here are some things I think would be nice to have from Ubiquiti.

  • A doorbell camera; it is a shame I have to keep the NEST Hello to have a complete solution.
  • There should be better camera choices; not having a zoom or speaker in a security camera in 2019 is lame.
  • It is disappointing there is no affordable 4k camera option when consumer products do offer them.
  • I would love to see a less obvious industrial design for the cameras that would work well with skins so you can hide the cameras more easily.
  • Produce a rack kit that allows placing both the security gateway and the CloudKey in a single 1U rack location.
  • I would like to be able to put the Protect server and cameras on one VLAN leaving the network controller on another; they are two different security domains and shouldn’t have to be co-mingled like they are currently [added this to the list after the article was posted].
  • There should be better integration between SDN and Protect, for example, I should not have to set aliases in both manually [added this to the list after the article was posted].
  • If I am going to have to have a Nest Hello and the Protect software it would be ideal if the Nest Hello was integrated into Protect [added this to the list after the article was posted].
  • Integration with Alexa, Siri and Google Home should be in the box.
  • Basic computer vision capabilities in-box, or at least able to opt in to use a cloud CV solution such as Google Vision API or Amazon Rekognition to do intelligent filtering of movement signals in the video.
  • Register the UbiquitiHome.com domain, do dynamic domain registration for subdomains/hosts as part of the on-boarding experience in setup, use Let’s Encrypt to get a certificate for that domain and do away with the self-signed certificate that is currently used.
  • Since the product line is geared towards small businesses and I suspect a good chunk of the home user market is enthusiasts it would be great to have a robust REST API with Webhooks available so custom solutions could easily be added without going into the database to extend capabilities.
  • With a robust set of REST APIs, they could offer a marketplace of applications that users could use to integrate with other systems (IFTTT, Google Home, Alexa, etc).
  • Alarm.com integration of NEST Protect would probably be a real winner for the enthusiast community and I would explore a partnership there if I were Ubiquiti.

In Summary

Though I am technically not even done with my Ubiquiti journey, it is clear that so far the Ubiquiti networking solution is technically superior, but their camera offering still leaves a bit to be desired.

It does seem, with the introduction of UniFi Protect (which currently has a 20-camera limit), that they are looking at how they can better serve users like me. That said, only time will tell how far they go towards providing solutions that are competitive with the consumer-focused offerings.

What value can a third-party provide users when browsing the web?

While at the CA/Browser Forum, I was asked by a friend: if we wanted to replace EV with a new class of certificate, what would that certificate look like?

My response was that I would frame the question differently. The “real” question is what problems does a typical user have that a third-party with the strengths of a CA could help with?

With this in mind, you first need to understand who this stereotypical user is; a software engineer may have different needs than a grocery store clerk. They may also have common needs, but you won’t know that until you do the research.

The only way to do reliable research on this topic is to actually work with those users to understand their needs. While this is much harder than it sounds, due to the biases introduced in such processes, a real needs analysis requires that you start here.

With that said, I suspect this exercise would show a broad swath of the target users is concerned with these questions:

  • Will I have a good experience working with the people behind the website?
  • Do the people behind this website have a good reputation?
  • Are the people behind this website experts in their craft?
  • How do I figure out how to reach a real human when and if I need to?

I would put those concerns into the context of the interaction they will have with the website (buying a product, downloading software, etc).

With that understanding, I would then try to identify the strengths of a CA. Having been a CA for a long time, I would say:

CAs are good at verifying claims relating to the subject of a certificate.

I would then try to map the identified problems and strengths together to see what potential value the CA could provide that user.

Again, the right thing to do is to formally carry out the explorations above, but for the purpose of this post I suspect these exercises would find that:

  • When a user visits a website they may struggle to find out how to contact the sales/support for that business,
  • When a user visits a site for the first time it may be hard for them to determine what the company’s true line of business is,
  • After a user previously visited a website and completed a transaction with it they sometimes need to contact that business after the fact and could be assisted in finding the right contact information,
  • Before deciding to do a high-value transaction with a business, customers may want to find out the experience others have had with that business.

Now, just because a user may have these problems and a CA may be able to help solve them does not mean the SSL indicator is the right place to answer these questions. It just means that there is an intersection of problems and skills.

When and how to solve this problem is another exercise altogether. Let’s explore EV for a second to give that some context.

Today, if we assume the information in an EV certificate is correct (and not confusing; see this and this for context), we can say it answers the question “if I need to sue these people, where do I tell my lawyer they are?”.

The problem is that you may not have that information when you need it. You typically need to sue someone after you have completed a transaction with them, not before, and after the fact you have no assurance that the information in the certificate will still be available at the site you transacted with. The website may have gone away, its certificate may have changed, or some other change may have taken place that makes that information not readily available to you when you need it.

In any event, the point of this post is that CAs should not be asking what they can put into certificates but what problems users have that CAs are well suited to solve. Unless they start there, they will not be solving a real problem; they will just be bolting more things onto a certificate and asking why browsers and users don’t see value in it.

Reality vs Fantasy – The DV vs EV argument

This morning I woke up to a blog post from Melih, the founder of Comodo titled “Problem vs Solution Value mapping”.

This is a follow-up to an ongoing discussion Melih and I have been having about the value of EV and positive trust indicators. On my blog, the conversation started in July 2017 if you’re interested.

Melih focuses his most recent post on the assessment of “value”, correctly attempting to define it as the basis for the rest of the post. He chooses to define it as “the direct result of a resolution to a problem.” This definition is the first part of his argument I have an issue with. Namely, the Oxford Dictionary defines “value” as “the regard that something is held to deserve; the importance, worth, or usefulness of something.”

When considering “value” with this definition, I believe an analysis of “value” would start by building a case on what is “deserved”. To do that, we have to also define a context in which that value is assessed. I think this is probably the hardest part, and probably where most of the disagreement on “value” of EV stems from.

If we say the context of this assessment is “the security and privacy guarantees that user agents can provide to users”, EV’s value is no better than that of DV. It is not a hard case to make either.

The security model of the browser is based on the concept of an “origin”, which is essentially the hostname the content was retrieved from. As a result of this model, any external website or resource embedded in the site (with rare exceptions) has the same permissions as the original website. This is how web analytics, advertising, and many other products and services that make up the web work.
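More precisely, browsers compare origins as a (scheme, host, port) triple. Below is a minimal Python sketch of that comparison using only `urllib.parse`; the helper names and example hostnames are hypothetical, and it ignores edge cases such as internationalized domain names and non-HTTP schemes. The relevant point for this argument is that a script fetched from a different origin, once included in a page, runs with the including page’s origin, so the EV badge for the page says nothing about the third party whose code is running.

```python
from urllib.parse import urlsplit

# Default ports for the schemes this sketch handles; other schemes get None.
DEFAULT_PORTS = {"http": 80, "https": 443}


def origin(url: str) -> tuple:
    """Return the (scheme, host, port) triple that the same-origin policy compares."""
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    host = parts.hostname or ""  # urlsplit already lowercases the hostname
    port = parts.port or DEFAULT_PORTS.get(scheme)
    return (scheme, host, port)


def same_origin(a: str, b: str) -> bool:
    return origin(a) == origin(b)


if __name__ == "__main__":
    page = "https://shop.example/checkout"
    print(same_origin(page, "https://shop.example:443/cart"))     # True
    print(same_origin(page, "https://analytics.example/tag.js"))  # False
```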

Until user agents require all of the entities that make up a given site to use EV, and the legal entity in all of the associated certificates to match, EV is a false flag. It says “you are talking to this legal entity” when in fact you’re talking to many legal entities, any one of which could equally harm you.

The reality is that if this change were made, you would almost never see EV badges, because virtually every site is made up of content and services from across the web and this condition would almost never be met. This is why we do not see CAs arguing that UAs should enforce this rule.
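The hypothetical rule is easy to state as a predicate, which also shows why it would almost never hold. In this Python sketch the input maps each resource a page loads to the organization named in its certificate, with `None` standing for a DV certificate; the function name, hostnames, and organizations are all made up for illustration. A single third-party tag served under a different certificate is enough to make the check fail.

```python
from typing import Mapping, Optional


def strict_ev_holds(resource_orgs: Mapping[str, Optional[str]]) -> bool:
    """True only if every resource on the page is served under an EV certificate
    (a non-None organization) and all of them name the same legal entity."""
    orgs = set(resource_orgs.values())
    return len(orgs) == 1 and None not in orgs


if __name__ == "__main__":
    page = {
        "https://shop.example/": "Example Shop, Inc.",
        "https://cdn.example/lib.js": "CDN Corp",
        "https://analytics.example/tag.js": None,  # DV certificate, no organization
    }
    print(strict_ev_holds(page))  # False
```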

If we say the context of this assessment is “the average user’s practical ability to protect themselves from phishing”, EV again does not fare well. Many user studies have shown that users do not understand positive trust indicators and, in most cases, do not even notice them.

Furthermore, even if we disregard these well-run studies (and the associated common sense), Ian Carroll showed with his Stripe, Inc business in Kentucky that the values displayed in these indicators can trivially be made to say whatever an attacker wants, at very low cost and with no traceability. This again frames EV as a false flag, because it can so easily be used to lend credence to a phisher’s site by giving it an EV badge that says the same thing as its target’s.

If this was not enough, and if we again disregard these studies and say that people need to take responsibility for looking at the EV badge to gain confidence they are dealing with a trustworthy entity, we need look no further than the work James Burton did when he got a certificate for his business “Identity Verified”. In this case, if a user has been taught to look at the EV indicator for an abstract concept of “trustworthiness”, we are back to the user being misled.

All of this ignores another very real problem: most phishing sites are not bespoke sites; instead, they are sites that have been hacked and repurposed. A good example is this one from a few weeks ago. What we appear to have here is a company called Northern Computer Services, LLC hosting a website for a business with the domain name “stampsbyjudith.com”, which is in turn hosting a Bank of America phishing site.

Now, EV proponents surely see this as an example of EV working, but if you look at it critically you will see it is exactly the opposite. First, could a customer believe that this “Northern Computer Services” is somehow a service provider to Bank of America? It seems reasonable to assume that the average user does not know anything about the way Bank of America operates its services. In fact, even if you do have some level of understanding, it’s incredibly common for banks to use service providers for different capabilities; maybe Northern Computer Services hosts the BofA website or provides bill pay or mortgage services. How is the average user to know?

But what about the URL? There is no plausible way Bank of America is hosting its site on the domain stampsbyjudith.com! You’re absolutely right! It’s a fair expectation that if a user happens to look at the address bar, they should be able to figure that out. This is, of course, something you get with DV, no EV necessary. Then there is the issue that studies also show users do not look at the address bar either.

This is why Microsoft has created SmartScreen and Google has created Safe Browsing. These solutions use the massive scale and technology depth of these organizations, along with machine learning and other advanced techniques, to find phishing sites. As a result, when a user navigates to a site like this one, they get an interstitial warning them about proceeding.
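The check the paragraph describes only needs the hostname, which a DV certificate already authenticates. Here is a minimal Python sketch of that comparison, assuming the user (or a tool acting for them) already knows the legitimate domain; the function name is hypothetical and the URL paths are illustrative. Comparing the full host against the expected domain, rather than looking for the brand name anywhere in the URL, is what rejects look-alikes such as `bankofamerica.com.evil.example`.

```python
from urllib.parse import urlsplit


def host_belongs_to(url: str, domain: str) -> bool:
    """True if the URL's host is `domain` itself or one of its subdomains."""
    host = (urlsplit(url).hostname or "").rstrip(".")  # hostname is already lowercased
    domain = domain.lower().rstrip(".")
    return host == domain or host.endswith("." + domain)


if __name__ == "__main__":
    print(host_belongs_to("https://www.bankofamerica.com/", "bankofamerica.com"))           # True
    print(host_belongs_to("https://stampsbyjudith.com/login", "bankofamerica.com"))         # False
    print(host_belongs_to("https://bankofamerica.com.evil.example/", "bankofamerica.com"))  # False
```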

In summary, in this context, I would argue that as EV exists today it actually makes things harder on the user and easier on the attacker.

With that context in mind let’s explore each of the arguments that Melih makes.

Users want protection from Transit Providers. Sure they do, but I would say that if a user framed the topic this way it would demonstrate how little they actually understand of the problem in question. It is not just “transit providers” they need protection from; it is every entity other than those necessary to serve the application hosted at a domain.

Networking is so complex that it is not possible to expect even the most technical users to understand all of the nuances involved here.

I would like to point out that Melih again attempts to redefine terms, this time in a disingenuous way. Specifically, in this part of his post he suggests there is some common understanding that “encipherment” and “encryption” differ.

Let’s again take a look at what the Oxford Dictionary says:

Encryption – The process of converting information or data into a code, especially to prevent unauthorized access.

Encipherment – Convert (a message or piece of text) into a coded form.

As you can see, these words mean the same thing; the only difference is the example use case in one of the definitions. But maybe this inconsistency in usage is because the Oxford Dictionary does not address a cryptographer’s view of these words? Unfortunately, that is not the case either: if you look at books like Serious Cryptography, Cryptography and Network Security, or even the very dated Applied Cryptography, you will find no usage of these terms in this way.

What Melih has suggested in the past, and continues to suggest in this section, is that if you authenticate only the domain and use that authentication as the basis for the session protection, this somehow is not “encryption”.

He goes so far as to suggest that it is only encryption if you authenticate the legal entity. This is frankly ludicrous, and I cannot even respond to this more than I just have here.

I will say that redefining a term, especially in such a specious way, devalues any other valid points he may have.

But what about the users! The users want to know who they are dealing with! I actually agree with this, but I also think it is far more complicated than users actually understand, so much so that I would argue it is not possible to do in most cases. As a father, when I run into situations where my kids want things that are not possible, I sometimes joke with them and say “Well, I want a pony!”.

It feels to me that this is probably a case where that response is appropriate. The reality is that business names are not globally unique, and neither are logos. Probably the best mainstream examples of this are the fake Starbucks stores and the notorious “Apple Stores” of Asia.

Fake Apple Store Highlights Counterfeit China

This is the nature of brand names; in fact, there is an entire discipline of law (trademark law) dedicated to this topic, as well as multilateral international agreements on how such disputes are to be handled.

So, in the context of the URL, does EV as it stands today add or remove value? From my perspective, at a minimum it provides no value, and I could also make a reasonable argument that it makes things worse by introducing more surface area for confusion.

Users want to know if it’s “safe” to interact with the website! Again, I can agree with this; the problem is that names do no harm. We even teach our kids rhymes to remind them of this fact:

Sticks and stones may break my bones, but names can never hurt me.

To keep users safe, we have to look at far more than the name a website is hosted under; there are literally thousands of features that a solution intended to protect users’ safety needs to consider, and I would not be surprised to find that the name is one of the least important.

This is, again, why we have solutions like SmartScreen and Safe Browsing; these solutions constantly watch feeds of data to determine whether a website is safe. It is not possible to solve the “safety” problem in any meaningful way without similar techniques.

But users want to be able to trust the content they see! Again, I also think this is something users want; I just don’t think they can have everything they want.

But before I talk about this, I want to point out how Melih is redefining a term again: he suggests that “trust” means “having the ability to validate VISA, Paypal logo etc”. The Oxford Dictionary defines trust as “Firm belief in the reliability, truth, or ability of someone or something.”

With that, I would think that it would be more correct to say that they want to believe what they see. This is of course a very natural thing, something scammers have taken advantage of since the dawn of time.

When considering this desire, I think we have to ask ourselves what the best way to serve it is. We also have to acknowledge that malicious content is everywhere in the world (don’t forget our fake Starbucks and Apple Stores from above) and that the best we can do is provide a speed bump.

This is, again, why we have solutions like SmartScreen and Safe Browsing as they were designed, engineered and continually evolve to address these risks.

In closing, I believe EV as it stands today is a round peg in a square hole. This is not because there is no value in knowing the legal identity of the organization that operates a website, nor because these third parties can’t do more to help users manage the risks they are exposed to.

It is because EV is being sold as something it is not: an anti-phishing tool. Simply put, it is not well suited to that problem, and I would go so far as to say that when we teach users to see it as one, it even helps phishers.

Risk variance and managing risk

One of my favorite security sayings is “My threat model is not your threat model”. We broadly accept there are different perspectives for every problem — the same is true with security.

Consider an enterprise IT organization, where you are chartered to support and secure a business. You need to meet this charter with a fairly fixed set of resources, but the business requirements you must support are always changing. To deliver a reasonable level of service, enterprises standardize on a core set of ways that certain capabilities (user and access management, compute, etc.) will be provided and force business units down the path of adopting them. But that standardization often results in non-ideal user experiences, disjointed business workflows, and slow innovation, and, importantly in this case, it also commonly results in either over- or under-mitigating security risks.

Startups, on the other end of the spectrum, are in a race to demonstrate market traction before their funding runs out. As a result, startups either virtually ignore security and privacy altogether or reuse a component they do not understand, one that is not well suited to the business problem they are solving or that simply solves the wrong problems for their business risks.

Both enterprises and startups show the hard reality that we are often so close to our problems, and so set in our ways of thinking, that we simply miss the big picture. In the context of security trade-offs, this natural bias can hurt our businesses and result in incidents like the recent Equifax breach or the ever-growing list of Bitcoin exchange compromises.

The first step in preventing this “missizing” of risk is to make sure you understand what your risks actually are. The right way to do that is to think adversarially: step outside of your business process or solution, think about the structure of the system you’re protecting, and define a threat model that captures those risks. This is not a one-time exercise; it is something you need to constantly revisit and bring new perspectives into.

Consider the typical Bitcoin exchange compromise. The exchanges usually start with a basic system limited to “online hot wallets” with weak architectural protections. They probably know better (for humanity’s sake I hope they do) but decide that, in the beginning, the risk is low enough, because they have so little to lose, to go forward. Later they find success, become focused on other parts of their business, and never get back to fixing that early trade-off.

As an aside, this Bitcoin example showcases lots of problems, the largest being the asymmetric risk distribution. Specifically, the risk here is borne by the depositor, but the decision to take the risk is made by the exchange. I digress, but this class of problem is a real problem in most startups, and its impact is multiplied 1000x in Bitcoin startups.

In any event, we can see in hindsight how these sorts of things happen, so how can we limit how often they happen in the first place? As a technologist it hurts me to say this, but the answer is process.

The good news is that process does not need to be heavyweight. You need to make sure you approach the problem systematically and regularly, for example:

  • Instead of a full threat model, you can do a simple threat tree,
  • You can use your bug tracking system to track the security issues you have found,
    • You can make sure those bugs capture the security decisions you have made,
    • They can capture the consequences of the identified risks for the actors in the system,
    • They can capture the mitigations you have put in place for those issues, along with how effective you think they will be,
  • And, importantly, have a plan for how you will respond when things go wrong, because they will, regardless of how well you plan and invest in making the right security decisions.
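To make the threat tree idea concrete, here is a minimal Python sketch. The `ThreatNode` class, its AND/OR gates, and the example tree are hypothetical illustrations, not a standard format; the tree loosely follows the hot-wallet exchange example above. It shows the property that makes a tree useful for regular review: you can see whether the top-level goal is still reachable once mitigations are recorded, and which leaf steps still have none.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ThreatNode:
    """One node in a simple, hypothetical threat tree.

    Leaves are concrete attacker steps. Interior nodes combine children with
    an "AND" gate (every child is required) or an "OR" gate (any one child
    is enough).
    """
    goal: str
    gate: str = "OR"
    children: List["ThreatNode"] = field(default_factory=list)
    mitigated: bool = False  # True once a mitigation blocks this node

    def achievable(self) -> bool:
        """Return True if an attacker can still reach this goal."""
        if self.mitigated:
            return False
        if not self.children:
            return True
        results = [child.achievable() for child in self.children]
        return all(results) if self.gate == "AND" else any(results)

    def unmitigated_leaves(self) -> List[str]:
        """List leaf steps that have no mitigation of their own, for review."""
        if not self.children:
            return [] if self.mitigated else [self.goal]
        return [g for child in self.children for g in child.unmitigated_leaves()]


# Hypothetical tree for the hot-wallet exchange example.
phish = ThreatNode("Phish an operator's credentials")
plain_key = ThreatNode("Read hot-wallet key stored unencrypted on a server")
steal_key = ThreatNode("Steal hot-wallet key", gate="AND", children=[phish, plain_key])
api_bug = ThreatNode("Abuse a withdrawal API bug")
root = ThreatNode("Drain customer funds", children=[steal_key, api_bug])

print(root.achievable())          # True: nothing is mitigated yet
plain_key.mitigated = True        # e.g. the key moves into an HSM
api_bug.mitigated = True          # e.g. withdrawal limits plus manual review
print(root.achievable())          # False
print(root.unmitigated_leaves())  # ["Phish an operator's credentials"]
```

Each `mitigated` flag maps naturally onto a closed bug in your tracker, so reviewing the tree on a schedule is the same as reviewing those bugs.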

You can then make sure you are reviewing these and acting on them on a regular schedule. This ensures your organization at least has an inventory of issues that outlasts the individuals on a team, and that there is a point of conscious risk acceptance by the organization.

This does not replace a full-on security program, but it can at least make sure you are looking at the problem in the context of your business and your users, rather than assuming every system has the same security needs.

On a related note for startups, especially those operating in murky regulatory waters: the current regulations belong in your threat tree, so you can make sure you understand them and how they might apply to you, even if you have to squint a bit to do it.

The Evolution of Security Thinking

In design, we sometimes refer to the strategies used during the design process as Design Thinking. Applying these strategies helps ensure you are solving the right problems and doing so in a repeatable way. You can attribute much of the massive improvement in the usability of software and devices over the past two decades to these strategies.

If we look at how our thinking around building secure systems has evolved over the last two decades, we can see that we have developed similar strategies to help ensure positive security outcomes.

If we go back to the late 80s, we see systems that were largely designed for a world of honest actors. There was little real business happening on the Internet at the time, and the hard problems to be solved were all related to how to enable a global network of interconnected systems, so that’s where efforts were put. These efforts led us to the Internet of today, but they also gave us systems vulnerable to trivial attacks such as the Morris Worm.

By the 90s, the modern “security industry” was born, and products designed to protect these insecure systems from the Internet started to come to market. One of the most impactful examples of this was the TIS Firewall Toolkit; other examples of this way of thinking include antivirus products and other agents that promised to keep our applications and operating systems safe from “attackers”.

By the late 90s and early 2000s, it was clear that these agents were never going to be effective at keeping the bad guys out, and that we needed to build systems that were Secure by Default, Secure by Design, and Private by Design. This shift in thinking meant that solution developers needed to develop their own strategies and tooling to ensure systems could be built to be inherently resilient to the risks they were exposed to. The concept of Threat Modeling is probably the most concrete example of this; believe it or not, this basic concept was essentially absent from software development up until this point.

By this time, the technical debt in deployed systems was so great that we spent most of a decade just trying to rectify the mistakes of the past. Windows XP SP2 and the Microsoft Security Stand Down are probably the most visible examples of the industry making this shift; they also led to the Security Development Lifecycle, which largely informs how we, as an industry, approach building secure systems today.

During this period, cryptography was treated as something you sprinkled on top of existing systems in the hope of making them more confidential and secure. As an industry, we largely relied on the US Government to define the algorithms we used and to tell us how to use them securely. As a general rule, only products designed for government use or for the small group of “cypherpunks” even considered including cryptography, due to the complexity of “getting it right”.

Things are changing again. We see the IETF, via the CFRG, working to standardize internationally and independently created cryptographic algorithms instead of relying exclusively on governments to do this standardization. We also see the concept of Formal Verification being applied to cryptographic systems (Galois is doing great work here with Cryptol, as are other great projects in the verifiable computing space), which is giving us frameworks we can apply to build these concepts into other products securely (check out the Noise Protocol Framework as an example).

I think the Signal Protocol, Roughtime, Certificate Transparency, and even Blockchain Technologies are examples of the next phase of evolution in our thinking about how we build secure systems. Not because of “decentralization” or some anti-government bent in technologists; instead, these systems were designed with a more complete understanding of the security risks associated with their use.

Trust is a necessary component of human existence. It can give us peace of mind, but it can also give us broken hearts. The same is true in the context of system design. Trust cautiously.

These systems, by design, go to great lengths to limit the need for “trust” for a system to work as intended. They do this by minimizing the dependencies a system takes in its design, because each of those dependencies represents an attack vector, and as we advance technology, our attackers become more advanced as well. They also make extensive use of cryptography to make that possible.

This focus on dependency reduction is why we see Blockchain enthusiasts taking the maximalist position of “Decentralize all the Things”. In my opinion, centralization is not always a bad thing (over-centralization, maybe); centralization can provide value to users, and that value is what we should be focused on as solution developers.

My personal take is that when we look back on the next decade, we will say the trend was not “blockchain”; instead, this is when we evolved our security thinking and tooling to better utilize cryptography. Specifically, this is when we started to use cryptography to make transparency, confidentiality and verifiability part of the core of the solutions we build, instead of thinking of it as a layer we apply once we are done.

Let’s talk about revocation checking, let’s talk about you and me.

I have been having a conversation with Melih at Comodo; this is the most recent post in that series.

First of all, the author is unaware that my company has built the most sophisticated Certificate Management system that can automatically request, issue, renew and manage the whole lifecycle of the certificate. Many Fortune 500 companies rely on this amazing technology to manage their PKI infrastructure.

I am aware.

So it is with that expertise and insight I must insist that the author does not appreciate the nuance between “high frequency” renewal vs “low frequency” renewals. Short lived certs will require “high frequency” renewal system. To an IT admin this is a scary prospect! They tell us that!

Having been responsible for, or worked heavily on:

  • The Valicert OCSP responder and clients,
  • The Windows CA and the enrollment client at Microsoft, which is the most used CA software ever,
  • The Network Access Protection solution, a health-based network isolation solution deployed into many of the Fortune 500, which used IPsec and ~12-hour certificates to segment hosts at the IP layer based on their health,
  • All of GlobalSign’s technology offerings, in particular, the technology enabling their expansion into the Enterprise,
  • Helping the ISRG build out Let’s Encrypt,
  • Designing, building and operating numerous other high volume products and services in finance, healthcare, and government.

Above and beyond that I helped secure Bing Ads and Live ID and now work at Google on other very high scale systems.

Needless to say, I too am familiar with “high frequency”, “low frequency”, “disaster recovery” and “availability” problems.

As for fear, it is a natural part of life, I believe it was Nelson Mandela who famously said: Courage is not the absence of fear but the triumph over it.

The reality is that it is manual certificate lifecycle management that is the thing to be afraid of. That is why COMODO and other CAs have been building out cloud-based management and automation solutions over the last few years. For customers this reduces failure, reduces costs, gives more control, and nets higher customer satisfaction.

Importantly, one needs to remember it is manual processes that lead to the majority of outages [see 1,2,3, and 4]. The scale of this problem has even led to an entire market segment dedicated to managing the lifecycle of WebPKI certificates.
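Part of why automation removes the fear of “high frequency” renewal is that the renewal decision itself is trivial to encode. The sketch below is a hypothetical policy that renews once two thirds of a certificate’s lifetime has elapsed; the function name and the 2/3 fraction are assumptions for illustration, not any CA’s or client’s actual behavior. The same rule works unchanged whether the certificate lives for 90 days or a few days.

```python
from datetime import datetime, timedelta, timezone


def should_renew(not_before: datetime, not_after: datetime, now: datetime,
                 renew_fraction: float = 2 / 3) -> bool:
    """Hypothetical policy: renew once `renew_fraction` of the lifetime has elapsed."""
    lifetime = not_after - not_before
    return now >= not_before + lifetime * renew_fraction


issued = datetime(2024, 1, 1, tzinfo=timezone.utc)

# A 90-day certificate becomes due for renewal around day 60.
ninety_day = issued + timedelta(days=90)
print(should_renew(issued, ninety_day, issued + timedelta(days=59)))  # False
print(should_renew(issued, ninety_day, issued + timedelta(days=61)))  # True

# The same rule handles a 3-day certificate; it just fires sooner.
three_day = issued + timedelta(days=3)
print(should_renew(issued, three_day, issued + timedelta(hours=40)))  # False
print(should_renew(issued, three_day, issued + timedelta(hours=50)))  # True
```

Renewing well before expiry leaves room to retry if the CA is briefly unavailable, which is exactly the margin manual processes tend to lose.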

So to the question of fear, I would say the same thing my father told me, though change can be scary and uncomfortable in the short term, it usually turns out okay in the end, and often better.

I believe there can be a scalable revocation infrastructure that can serve status for all certs from all CAs that is backward compatible with existing issued certificates that can be called from a browser. As I said before I do believe in the ingenuity of our scientist and engineers to bring us this solution……soon….

You misunderstand me. I did not say it was impossible to have a scalable revocation infrastructure that is backward compatible. I said the creation of a new system that could deliver on the small message size promise we were discussing would take 10 years to design, build and deploy.

To understand my rationale, take a look at this post. It is a little old now, and update rates are a bit faster now, but not massively. When you review it, you will see that if you exclude IE/Edge, it takes just under a year for a new version of the most common browsers to reach 90% market share.

To put that in a more concrete way, if Chrome, Firefox, and Safari decided today to simultaneously release a new version with a new revocation scheme, it would take a bit less than a year for us to see 90% deployment. That is, of course, an unrealistic goal. Apple, for example, only releases new versions a few times a year and does not even support WebRTC yet. Additionally, browsers have a pretty deep-seated position on revocation checking at this point, given all the problems of the past, so convincing them will take time.

A more realistic, but still optimistic, period, assuming pre-existing consensus and a will to solve the problem, is 5 to 6 years. If you question this figure, just look at how long it took for TLS 1.2 to get deployed: TLS 1.2 was published in 2008 and was not enabled by default in Windows until Windows 8.1 in 2014.

Web servers are even worse. As an example, consider that Apache 2 was released in 2002, and yet it and later versions still have less than 50% deployment.

Lets call it DCSP 😉

DCSP isn’t a bad idea, but it has its own challenges. Some that come to mind are:

  • Most browsers do not ship their own DNS clients; this is one of the problems DNSSEC had in deployment. If the operating system DNS APIs they use do not provide the information they need, then they cannot adopt any technology that depends on it.
  • Middleboxes and captive networks make DNS-based distribution problematic. Again DNSSEC suffered from this.
  • If DCSP requires a custom record type, for example, DNS servers and tooling will need to be updated. DNS servers also do not get updated regularly; this is another thing that has held back DNSSEC, as well as CAA. It is fair to expect WebPKI CAs to update their own DNS, but based on the glacial pace at which CAs adopt technology, that is 3-4 years before you could see deployment even there (see CT as an example).

To be clear, I think something like this is worth exploring but I don’t think it will see meaningful deployment in the near term.

If we were in Vegas, I would say the most expedient path is to build on what is there, while in parallel building the new thing. This can shave as much as 5 years off the time it takes to solve the problem. If nothing else this moves the CA ecosystem closer to an operational maturity capable of supporting the new system.

Ryan

My response, to his response, to my response? or short-lived certificates part 3

The conversation on short-lived certificates and their value continues. In the most recent exchange, we have started to shift from an either/or position to one where we explore what is needed to make revocation checking a viable technology, a topic I am passionate about.

That said, there still seems to be some confusion on short-lived certificates, specifically the author states:

Of course, the cost of short-lived certs is very high as change the whole computing infrastructure so that certificates are renewed on a daily basis (daily for it to be secure enough vs 90 day certificates in my view) and introduce this new moving part that might cause vulnerability and operational issues.

Let’s explore this “cost” argument for a moment. First, when I issue an end-entity certificate I minimally have to perform two cryptographic signatures, one on the certificate and one on the OCSP response (it’s actually more than this in some cases but to keep the conversation simple I have omitted the others).

If we look at the performance optimization Firefox has implemented, where revocation checking is not required when the certificate’s validity period is shorter than a subset of the resolution of the revocation period, short-lived certificates can, in fact, reduce the cost for a CA. This is because you no longer need to sign two things during the first few days/hours of the life of the certificate, and you do not need to distribute that response.

The only way I think short-lived certificates are more expensive for the CA is if you compare a model of certificates issued for a period better measured in hours to a model with certificate validity measured in years providing weekly revocation updates. This, however, is a bad model, and something in-between is needed.

The author also believes the use of automation represents a security vulnerability, so this deserves a response. It is true that complexity is the enemy of security, and it is even true that automation, if poorly implemented, can hurt security. The inverse is also true, however. It is generally accepted that availability is a component of security, and one of the more common problems in the WebPKI is poor manual management practices, resulting in a lack of understanding of what is deployed and in deployed certificates expiring [see 1,2,3, and 4] and taking down services.

It is also important to look at the big picture when evaluating the net security benefits of automation. For example, does anyone honestly believe we would ever get to a world where the majority of the web is encrypted if organizations had to staff people to manually generate certificate requests and hand-carry them to CAs? Are the net benefits of automation, in reliability and scope of deployment, worth its secondary effects?

The author also suggests that a certificate must have a validity period of only a day to be “secure enough”. This seems both arbitrary and wrong; as stated previously, the User Agents and WebTrust allow an OCSP response or CRL to be a week old and still be trusted.

One of the largest reasons for this is that clock skew is a big problem in the real world; as a result, you need to keep the validity periods of certificates and revocation messages well beyond this skew period to prevent skew-related failures.
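A minimal sketch of how a client might judge whether a revocation response is fresh while tolerating clock skew. The 7-day maximum age mirrors the “week old” figure above; the 10-minute skew tolerance and the function name are hypothetical. It shows why validity windows must be much larger than the skew: a client whose clock runs a few minutes slow rejects a freshly issued response unless some tolerance is allowed.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

MAX_AGE = timedelta(days=7)         # "a week old and still trusted"
CLOCK_SKEW = timedelta(minutes=10)  # hypothetical tolerance; real clients differ


def response_is_fresh(this_update: datetime, next_update: Optional[datetime],
                      now: datetime, max_age: timedelta = MAX_AGE,
                      skew: timedelta = CLOCK_SKEW) -> bool:
    """Accept a CRL/OCSP-style response only if `now`, give or take `skew`,
    falls inside its validity window and it is no older than `max_age`."""
    if now + skew < this_update:
        return False  # appears to be issued in the future
    if next_update is not None and now - skew > next_update:
        return False  # past its stated expiry
    return now - this_update <= max_age + skew


issued = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
expires = issued + MAX_AGE
slow_clock = issued - timedelta(minutes=5)  # client clock runs 5 minutes behind

print(response_is_fresh(issued, expires, slow_clock))                     # True
print(response_is_fresh(issued, expires, slow_clock, skew=timedelta(0)))  # False
print(response_is_fresh(issued, expires, issued + timedelta(days=8)))     # False
```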

The decision to define “secure enough” as a day both defines the problem in an intractable way and ignores the fact that it establishes a double standard that does nothing to address the issue of stale revocation information.

If we were to bring this conversation back to how we improve certificate revocation I would say there should be one standard for how recent the client’s understanding of the certificate’s validity would need to be.

On that topic, the author goes on to discuss how 32 bytes is better than 470 (the size of the smallest OCSP response). I could not agree more; in fact, in the 90s when I was at Valicert, we implemented a proposal from Paul Kocher called Certificate Revocation Trees. This approach uses Merkle Trees (the heart of the Bitcoin ledger) to provide a very space-efficient solution to this problem. Unfortunately, we were unable to popularize it at the time.
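A minimal Python sketch of the idea behind a revocation tree, assuming a range-based design in the spirit of Certificate Revocation Trees: the revoked serial numbers are sorted, each leaf asserts that no serial strictly between two neighbors is revoked, and the CA would sign only the Merkle root. The serial bounds, leaf encoding, SHA-256, and the 0x00/0x01 domain-separation prefixes are all assumptions; this is not Valicert’s or Kocher’s exact format. A proof carries one 32-byte hash per tree level, so it grows with log2 of the number of revoked serials, and the same proof shape answers both “revoked” and “good”.

```python
import hashlib
from typing import List, Tuple

# Hypothetical serial-number bounds used as sentinels; real serials are assumed
# to lie strictly between them.
MIN_SERIAL, MAX_SERIAL = 0, 2**64

Proof = List[Tuple[bool, bytes]]  # (sibling_is_left, sibling_hash) pairs


def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def leaf_hash(lo: int, hi: int) -> bytes:
    # Leaf statement: "lo is revoked (unless it is the sentinel), and no serial
    # strictly between lo and hi is revoked."
    return h(b"\x00" + lo.to_bytes(9, "big") + hi.to_bytes(9, "big"))


def build_tree(revoked: List[int]):
    """Return the leaf ranges and every level of the Merkle tree above them."""
    bounds = [MIN_SERIAL] + sorted(revoked) + [MAX_SERIAL]
    ranges = list(zip(bounds, bounds[1:]))
    level = [leaf_hash(lo, hi) for lo, hi in ranges]
    levels = [level]
    while len(level) > 1:
        nxt = [h(b"\x01" + level[i] + level[i + 1]) for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:  # odd-sized level: carry the last node up unchanged
            nxt.append(level[-1])
        level = nxt
        levels.append(level)
    return ranges, levels


def prove(ranges, levels, serial: int) -> Tuple[Tuple[int, int], Proof]:
    """Find the range covering `serial` and collect sibling hashes up to the root."""
    index = next(i for i, (lo, hi) in enumerate(ranges) if lo <= serial < hi)
    rng, proof = ranges[index], []
    for level in levels[:-1]:
        sibling = index ^ 1
        if sibling < len(level):  # otherwise this node was carried up unpaired
            proof.append((sibling < index, level[sibling]))
        index //= 2
    return rng, proof


def verify(root: bytes, serial: int, rng: Tuple[int, int], proof: Proof) -> str:
    """Recompute the root from one leaf; a client would also check the CA's
    signature on `root`, which this sketch omits."""
    lo, hi = rng
    node = leaf_hash(lo, hi)
    for sibling_is_left, sibling in proof:
        node = h(b"\x01" + sibling + node) if sibling_is_left else h(b"\x01" + node + sibling)
    if node != root:
        raise ValueError("proof does not match the root")
    if serial == lo:
        return "revoked"
    if lo < serial < hi:
        return "good"
    raise ValueError("range does not cover this serial")


ranges, levels = build_tree([1001, 1005, 2042])
root = levels[-1][0]
rng, proof = prove(ranges, levels, 1003)
print(verify(root, 1003, rng, proof), len(proof))  # good 2
rng, proof = prove(ranges, levels, 2042)
print(verify(root, 2042, rng, proof))              # revoked
```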

Ben Laurie began work on a variation of this approach that leveraged sparse Merkle Trees, which he called Revocation Transparency. I personally like this approach because it leverages the work done to make Certificate Transparency scalable. For example, Trillian, the foundational server for Google’s next-generation log server, is designed to scale to trillions of certificates.
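Sparse Merkle trees take a different route: every possible key has a fixed leaf position, and empty subtrees hash to precomputed default values, so proving a serial is not revoked is the same operation as proving it is. The sketch below is a generic sparse Merkle tree, not Ben Laurie’s Revocation Transparency design; the 16-bit key space, the `REVOKED` marker value, and all names are hypothetical (real designs typically use much larger key spaces and compress default siblings out of proofs).

```python
import hashlib
from typing import Dict, List, Tuple

DEPTH = 16                # hypothetical 16-bit key space; keys must be < 2**DEPTH
EMPTY = b"\x00" * 32      # value of a leaf that has never been set
REVOKED = hashlib.sha256(b"revoked").digest()  # hypothetical "revoked" marker


def h(left: bytes, right: bytes) -> bytes:
    return hashlib.sha256(left + right).digest()


# DEFAULTS[d] is the hash of a completely empty subtree of height d.
DEFAULTS = [EMPTY]
for _ in range(DEPTH):
    DEFAULTS.append(h(DEFAULTS[-1], DEFAULTS[-1]))


class SparseMerkleTree:
    def __init__(self) -> None:
        # (height, index) -> hash; anything absent is an empty default subtree.
        self.nodes: Dict[Tuple[int, int], bytes] = {}

    def _get(self, height: int, index: int) -> bytes:
        return self.nodes.get((height, index), DEFAULTS[height])

    def set(self, key: int, value: bytes) -> None:
        """Write a leaf and recompute the hashes on its path to the root."""
        self.nodes[(0, key)] = value
        index = key
        for height in range(1, DEPTH + 1):
            index //= 2
            self.nodes[(height, index)] = h(self._get(height - 1, 2 * index),
                                            self._get(height - 1, 2 * index + 1))

    def root(self) -> bytes:
        return self._get(DEPTH, 0)

    def proof(self, key: int) -> List[bytes]:
        """Sibling hashes from the leaf up to, but not including, the root."""
        siblings, index = [], key
        for height in range(DEPTH):
            siblings.append(self._get(height, index ^ 1))
            index //= 2
        return siblings


def verify(root: bytes, key: int, value: bytes, siblings: List[bytes]) -> bool:
    """Recompute the root from a claimed leaf value and its sibling path."""
    node, index = value, key
    for sibling in siblings:
        node = h(sibling, node) if index & 1 else h(node, sibling)
        index //= 2
    return node == root


tree = SparseMerkleTree()
tree.set(1001, REVOKED)
tree.set(2042, REVOKED)
root = tree.root()  # a log or CA would publish and sign this

print(verify(root, 2042, REVOKED, tree.proof(2042)))  # True: 2042 is revoked
print(verify(root, 1003, EMPTY, tree.proof(1003)))    # True: 1003 is not revoked
print(verify(root, 1003, REVOKED, tree.proof(1003)))  # False
```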

That said, there are a number of similar approaches that could be equally scalable.

While I do think that an approach similar to the above could be made to work today, I also think it is more of a long-term solution in that even with the significantly increased rate of technological adoption it would take close to ten years given the state of things for such a solution to be fully deployed if we started right now.

As such I would start with the problem definition, which would need to involve a more formal analysis of the role of revocation checking today so that the right solution was built.

In parallel I would want to see the industry adopt a more strategic plan to address the more practical and immediately solvable problems, including:

  • Measuring and improving the revocation infrastructure operated by CAs,
  • Establishing global performance and reliability metrics and reporting that all CAs must meet,
  • Funding improvements to Nginx and Apache’s OCSP Stapling implementations,
  • Working with browsers to adopt the performance optimization Firefox has implemented for revocation checking,
  • Working with TLS stacks, User Agents, Servers and Service Providers to adopt OCSP Must-Staple,
  • Defining an OCSP transport based on DNS that would reduce dependency on CA infrastructure reliability,
  • Evangelizing the adoption of OCSP stapling with administrators.

Ryan

P.S.

The author has also added in someone else who has asked some questions, or, more correctly, seems to question my version of the historical narrative. To provide some context, my narrative comes from my practical experience working with Microsoft, eBay, Amazon, and other large companies in the mid to late 90s and through the mid-2000s.

I too have worked with the BBN Safekeeper; I have a fun story about how we hired some people to extract the keys from one of these boxes that I would be happy to share over a beer sometime.

It is a cool device; however, the first one I remember working with was the KOV-8 in the 1993/4 timeframe.

Anyway, it is true that SSL started its life in 1994/5, at which point only software implementations of crypto were used (they were all BSAFE), but it is also true that mass deployment of SSL (relatively speaking) did not start until the late 90s and early 00s, and that is the period my narrative was based on.

He has also questioned the narrative of what Windows supported in the context of key protection. Since the author knows me personally, he must have simply forgotten that I was the PM for these technologies and worked at Microsoft in this area for about 15 years.

Again I think there is some confusion here, the author states:

The software based Cryptographic Service Provider for RSA allowed keys to be marked ‘not for export’ from a very early release if not the first.

and:

The CAPI features used to protect private keys were expanded and exposed as a separate API in Windows 2000 as the Data Protection API.

As someone who worked at Microsoft on these technologies for a long time I can say with absolute confidence they were not built to provide key isolation, do not provide key isolation properties and were actually not used by the SSL implementation (SCHANNEL) for the server keys. If you’re interested in learning more about the capabilities of Windows in this area check out this post I did recently.

He has also questioned the role of ValiCert in the definition of the RFC. Thankfully, the IETF PKIX archives are still there, and if you care to look, you can see Mike was basically checked out, Warwick was not publicly participating, and the work to finalize the protocol was largely done by Ambarish Malpani, the founder of Valicert.

It is my turn, or short-lived certificates part 2

My response to a recent post suggesting short-lived certificates were intended to remove the need to do revocation seems to have spurred a response from its author.

The thesis of the latest response is that:

Revocation can easily handle “key compromise” situation and do so by offering more security than short-lived certs.

The largest problem with this thesis is that it is based on an incorrect understanding: the author believes either that short-lived certificates do not get checked for revocation, or that it is a choice between short-lived certificates and revocation.

Neither statement is true. User Agents do not say “oh, this certificate is short-lived, so we won’t do revocation checking”. If User Agents do revocation checking of end-entity certificates, and not all do, the short-lived certificate will get checked just like the long-lived certificate.

Now, there is a reasonable argument to be made that if CAs are allowed to produce revocation lists and OCSP responses every 7 days, and User Agents will trust them for that time, then skipping revocation checks for short-lived certificates wouldn’t be an unreasonable performance optimization, at least if the behavior were limited to certificates with validity periods shorter than 7 days. With that said, User Agents don’t do this today, and I didn’t suggest they should in my post.

[Update 10:43PM May 4th 2017: It seems a change happened while I was asleep at the wheel. Firefox has implemented the optimization below as of Firefox 41. A corrected statement follows in the next two paragraphs.]

Neither statement is true, except for one performance optimization implemented by Firefox that I will explain shortly. Basically, if User Agents do revocation checking of end-entity certificates, and not all do, the short-lived certificate will get checked just like the long-lived certificate, unless (in Firefox only) its validity period is shorter than the age a corresponding CRL or OCSP response is allowed to reach.

This performance optimization is based on the fact that CAs are allowed to rely on 7-day-old OCSP responses and CRLs. As a result, 7 days becomes the precision of revocation checking. It is not clear what value Firefox chose, but it is a subset of that figure, not 90 days. Either way, no major CA that I am aware of issues such short-lived certs today, due to time skew issues.
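The shape of this optimization can be sketched as a single comparison. This is a hypothetical illustration, not Firefox’s code: the function name is invented, and because the exact threshold Firefox chose is not established here, the 4-day value is only a placeholder below the 7-day window. The reasoning is that if a CA may serve a week-old response, fetching one for a certificate that lives only a few days adds little assurance.

```python
from datetime import datetime, timedelta, timezone

# Placeholder threshold: only known to be below the 7-day window CAs may rely on.
SHORT_LIFETIME = timedelta(days=4)


def needs_revocation_check(not_before: datetime, not_after: datetime,
                           short_lifetime: timedelta = SHORT_LIFETIME) -> bool:
    """Hypothetical rule: skip the online revocation check when the whole
    validity period is no longer than `short_lifetime`."""
    return (not_after - not_before) > short_lifetime


issued = datetime(2024, 1, 1, tzinfo=timezone.utc)
print(needs_revocation_check(issued, issued + timedelta(days=3)))   # False
print(needs_revocation_check(issued, issued + timedelta(days=90)))  # True
```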

The second problem with this thesis is that it presumes the user who does know of a compromise wants to announce it to the world, and can. As I mentioned in an earlier post on revocation reasons, subscribers are not keen to announce to the world that there was “a compromise”. I mention “can” because in some cases those who know about the compromise do not even have sufficient permissions on the associated CA consoles to request the revocation.

Finally, the last issue with this thesis is that it presumes revocation checking is effective. Today, over 9% of OCSP requests fail due to problems with the CA’s revocation infrastructure (the connections time out).

That’s right: 9 out of 100 revocation checks fail because CAs do not operate infrastructure capable of meeting the needs of the clients that rely on them. It is actually worse than that, though. The largest websites use a technique called Domain Sharding to make their sites load faster, which means a single page load can trigger several revocation checks, so the failure rate you would experience as a user if hard-fail were implemented could be 2-4x higher than that.
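The compounding effect can be sketched with a little arithmetic. This assumes, for illustration only, that each shard needs its own revocation check and that failures are independent; in practice checks against the same responder may fail together, which would reduce the compounding. The 9% per-check figure is the one quoted above, and `page_failure_rate` is a hypothetical helper.

```python
def page_failure_rate(per_check_failure: float, checks_per_page: int) -> float:
    """Probability that at least one of several independent revocation checks fails."""
    # Each check succeeds with probability (1 - p); all n succeed with (1 - p) ** n.
    return 1 - (1 - per_check_failure) ** checks_per_page


for checks in (1, 2, 3, 4):
    rate = page_failure_rate(0.09, checks)
    print(f"{checks} check(s): {rate:.1%} of page loads would hard-fail")
```

Under these assumptions, 2 to 4 checks per page push the failure rate from 9% to roughly 17% to 31%, about 1.9x to 3.5x the single-check rate.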

And this is before we consider that, due to the poor performance and unreliability of CA revocation infrastructure, revocation checking has been largely turned off in Chrome and other browsers.

I say this because for “revocation checking” to work in a key-compromise case you need two things:

  • The CA to know the compromise occurred,
  • Revocation checking to actually work.
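The two conditions can be modeled as a small truth table. This sketch assumes a soft-fail client, one that treats an unreachable or timed-out responder as "not revoked", which is how clients that do check typically behave given the failure rates above; `compromise_caught` is a hypothetical name used only for illustration.

```python
def compromise_caught(ca_knows_of_compromise: bool, check_completed: bool) -> bool:
    """Model a soft-fail client: would it reject a certificate whose key was compromised?"""
    if not check_completed:
        # Soft-fail: a timed-out or unreachable responder is treated as "good",
        # so the compromised certificate is accepted.
        return False
    # The check completed, but it can only report a revocation the CA recorded.
    return ca_knows_of_compromise


for knows in (True, False):
    for completed in (True, False):
        print(knows, completed, compromise_caught(knows, completed))
```

Only the first row, where the CA knows and the check completes, results in the certificate being rejected; every failed check silently becomes an accepted certificate.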

Now, I want to be clear: I think revocation checking is a good thing and I would like to see the situation improved. As evidence of that, here are some examples of my work in this area:

  • While at ValiCert (the lead creators of OCSP), I worked on the standardization of RFC 2560 (OCSP), including running the interoperability testing that led to its ultimate standardization,
  • I am also the author of RFC 5019, which is the profile of OCSP in use by CAs and clients,
  • I have led the development of numerous PKI SDKs and servers which have implemented these standards (including leading the team that added support for OCSP to Windows),
  • I led the implementation of the first, and most reliable, implementation of OCSP stapling (in SCHANNEL),
  • I led the CASC project to get OCSP stapling added to Nginx,
  • I have helped create and/or recreate numerous CAs along with their revocation infrastructure,
  • And I led the first wide-scale measurement of CA revocation infrastructure, and the efforts to improve it, via my X509LABS Revocation Report project, which led to all CAs adopting the same design philosophies to begin addressing abhorrent response times and uptime.

Basically, I see the value in revocation checking and think the investments need to be made to make it work and be relevant for the WebPKI of today. That said, this topic has zero to do with short-lived certificates.

Ryan