How is ACME different from XCEP/WSTEP anyway?

If you read my blog there is a reasonable chance that you are familiar with RFC 8555, the standard for Automatic Certificate Management Environment (ACME). Even though ACME is a relatively young protocol it is already used by the majority of websites on the internet for certificate lifecycle management.

While I won’t go into a lot of detail, for this post to make sense you have to understand a couple of things about the ACME protocol.

The first is that it works on the concept of dynamic “account” registration. By that I mean requestors can, in real time, request that an “account” be created for them. This account is represented by a key pair that the ACME service uses to persist metadata about the requestor. The ACME service can deny this request for any reason it likes, but in the Web PKI, as long as the request is well-formed, it is commonly accepted.

The next thing you need to understand is that ACME has the concept of “challenges” that can be used to communicate conditions that must be met before a certificate is issued. For example, an ACME service may request that the account holder demonstrate that they are authorized to get a certificate for a given domain name by placing a specific value in DNS at a well-known location. Since only a DNS administrator could perform that action, the ACME service can have confidence that the requestor controls the name it has requested a certificate for.
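
To make the DNS example concrete, here is a minimal sketch (in Go) of how an ACME client derives the value it places in DNS for the dns-01 challenge per RFC 8555: the challenge token is combined with the account key’s RFC 7638 thumbprint, and the TXT record holds the base64url-encoded SHA-256 of that key authorization. The thumbprint computation is assumed to happen elsewhere, and the values in main are placeholders.

```go
// Sketch: computing the TXT record value for an ACME dns-01 challenge
// (RFC 8555 §8.4). The account key thumbprint (RFC 7638) is assumed to
// be computed elsewhere; the token comes from the challenge object.
package main

import (
	"crypto/sha256"
	"encoding/base64"
	"fmt"
)

func dns01TXTValue(token, accountKeyThumbprint string) string {
	// The key authorization binds the challenge token to the account key.
	keyAuthorization := token + "." + accountKeyThumbprint
	// The TXT record placed at _acme-challenge.<domain> is the
	// base64url-encoded SHA-256 digest of the key authorization.
	digest := sha256.Sum256([]byte(keyAuthorization))
	return base64.RawURLEncoding.EncodeToString(digest[:])
}

func main() {
	// Hypothetical token and thumbprint, for illustration only.
	fmt.Println(dns01TXTValue(
		"evaGxfADs6pSRb2LAv9IZf17Dt3juxGJ-PCt92wr-oA",
		"LPJNul-wow4m6DsqxbninhsWHlwfp0JecwQzYpOLmCQ"))
}
```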

And finally, there is nothing in the ACME protocol that limits its use to just web server certificates. It is a general framework that can be used to acquire and manage certificates of any type. For example, Cisco is using ACME in their WebEx offering to facilitate the acquisition of what are essentially email certificates via OIDC authentication as a way to authenticate chat members.

But there is another very popular protocol, well, a pair of protocols, that fewer people know about: XCEP and WSTEP. These protocols are used by Windows machines both to determine what kind of certificates a machine or user should enroll for and to enable enrollment for those certificates.

Similar to ACME, these protocols also support, although in a more rigid way, the ability for the issuer to challenge the client for additional information necessary to get a certificate of a particular type. For example, you can configure a certificate type (known as a template) to require that the requestor provide a TPM-backed cryptographic attestation used to prove the machine belongs to the organization operating the certificate authority.

There are differences though. The first is the concept of a template, which enables XCEP/WSTEP to issue many types of certificates from a single URL endpoint. This matters a great deal within an enterprise, where certificates are used for many different scenarios.

Another difference is that XCEP/WSTEP presume the authorization of the client happened out of band before the client requested the certificate. The dynamic approach to challenges adopted by ACME lets it tackle this problem in-band or rely on out-of-band authorization. It supports the out-of-band case through External Account Binding, which allows the requestor to use an API key obtained out of band to prove, at account creation, that the account key is associated with some pre-enrolled user.
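
For illustration, here is a rough sketch (in Go) of how the External Account Binding object described in RFC 8555 §7.3.4 can be constructed: a JWS over the account public key, MACed with a key the CA handed out ahead of time. The key identifier, MAC key, and JWK shown are placeholders, and a real client would embed the result in its newAccount request; this is a sketch of the structure, not a drop-in implementation.

```go
// Sketch: building an externalAccountBinding JWS (RFC 8555 §7.3.4),
// assuming an HS256 MAC key and key identifier issued by the CA out of
// band. accountKeyJWK is assumed to be the JSON encoding of the ACME
// account's public key.
package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"encoding/base64"
	"encoding/json"
	"fmt"
)

func externalAccountBinding(kid, newAccountURL string, macKey, accountKeyJWK []byte) (string, error) {
	protected, err := json.Marshal(map[string]string{
		"alg": "HS256",
		"kid": kid,           // identifier for the pre-enrolled user, issued out of band
		"url": newAccountURL, // the ACME newAccount URL
	})
	if err != nil {
		return "", err
	}
	b64 := base64.RawURLEncoding
	signingInput := b64.EncodeToString(protected) + "." + b64.EncodeToString(accountKeyJWK)

	mac := hmac.New(sha256.New, macKey)
	mac.Write([]byte(signingInput))
	sig := b64.EncodeToString(mac.Sum(nil))

	// Flattened JWS JSON serialization, embedded in the newAccount payload.
	out, err := json.Marshal(map[string]string{
		"protected": b64.EncodeToString(protected),
		"payload":   b64.EncodeToString(accountKeyJWK),
		"signature": sig,
	})
	return string(out), err
}

func main() {
	// Placeholder values for illustration only.
	jwk := []byte(`{"kty":"EC","crv":"P-256","x":"...","y":"..."}`)
	eab, _ := externalAccountBinding("example-kid", "https://acme.example/acme/new-account", []byte("secret-mac-key"), jwk)
	fmt.Println(eab)
}
```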

And finally, ACME has a clear model for extensibility built into it, which means one can easily extend it with additional capabilities. The most fundamental part of this is the Directory resource, which lists all of the APIs supported by a given ACME instance. One could use this, for example, to add a “Templates” API that would allow an ACME client to request specific types of certificates from the ACME endpoint.
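
As a small illustration of that extensibility point, the sketch below (Go) fetches and dumps a directory. The Let’s Encrypt production URL is used only as a convenient public example; any non-standard extension a server offered would simply show up as another field in this JSON object.

```go
// Sketch: fetching and printing an ACME directory resource (RFC 8555 §7.1.1).
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
)

func main() {
	resp, err := http.Get("https://acme-v02.api.letsencrypt.org/directory")
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	// Decode into a generic map so any fields beyond the standard
	// newNonce/newAccount/newOrder/revokeCert/keyChange/meta show up too.
	var directory map[string]interface{}
	if err := json.NewDecoder(resp.Body).Decode(&directory); err != nil {
		panic(err)
	}
	for name, value := range directory {
		fmt.Printf("%s: %v\n", name, value)
	}
}
```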

Similarly, the concept of the challenge allows the server to demand the client do any number of things before the certificate is issued, so adding a TPM challenge, for example, is trivial within this framework.

In short, ACME, contrary to popular belief, is not just a protocol for getting and managing website certificates; it is a framework for getting and managing any certificate. More importantly, it is extensible in such a way that with just a few minor additions it would be a proper superset of the capabilities of the Windows enrollment protocol suite.

Why is this important? That’s easy! When I talk to anyone who is using certificates at any reasonable scale, their concerns almost instantly turn to the complexity of managing the lifecycle of those certificates across the various products and services that use them.

When we look at this complexity, most of it arises from the use of a mish-mash of lifecycle management solutions that seemed sufficient when viewed in isolation but, when looked at holistically, were woefully insufficient.

If, as an industry, we move these legacy systems to a single protocol, so that certificates, whether for public or private PKI and whether they represent users, machines, or workloads, are all managed the same way, we will have a reliable substrate that we can use to authenticate and authorize with agility.

The next decade of Public Key Infrastructure…

Background

Before we talk about the future we need to make sure we have a decent understanding of the past. X.509-based Public Key Infrastructure was originally created in the late 80s with a focus on enterprise and government use cases. These use cases were largely for private systems; it was not until a decade later that this technology was applied to the internet at large.

Since the standards for enrollment and lifecycle management at the time were building blocks rather than solutions and were designed for government and enterprise use cases rather than the internet, the Web PKI, as it became known, relied largely on manual certificate lifecycle management and a mix of proprietary automation solutions.

While the use of PKI in the enterprise continued, primarily thanks to Microsoft AD/CS and its automatic certificate lifecycle management (I worked on this project), the Web PKI grew in a far more visible way. This was primarily a result of the fact that these certificates had to be acquired manually which led to the creation of an industry focused on sales and marketing of individual certificates.

The actors in this system had no incentive to push automation, as it would accelerate the commoditization of their products. The reality was that these organizations had also lost much of their technical chops as they became sales and marketing organizations and could no longer deliver the technology needed to bring this automation about anyway.

This changed in 2016 when the Internet Security Research Group, an organization I am involved in, launched Let’s Encrypt. This was an organization of technologists looking to accelerate the adoption of TLS on the web, and as such it started with a focus on automation, as it was clear that without automation the growth of HTTPS adoption would continue to be anemic. What many don’t know is that when Let’s Encrypt launched, HTTPS adoption was at about 40% and year-over-year growth was hovering around 2-3%, about the rate of growth of the internet, and it was not accelerating.

Beyond that, TLS-related outages were becoming more frequent in the press, even for large organizations. Post-mortems would continually identify the same root cause: a manual process did not get executed or was executed incorrectly.

The launch of Let’s Encrypt gave the internet its first CA with a standards-based certificate enrollment protocol (ACME). This, combined with the short-lived nature of the certificates it issued, meant those that adopted it had to use automation for their services to reliably offer TLS. That enabled products to make TLS work reliably and by default; a great example of this is the Caddy web server. This quickly took the TLS adoption growth rate to around 10% year over year, and now we are hovering around 90%+ HTTPS on the internet.

While this was going on, the concept of microservices merged with containers, which led to container orchestration, which later adopted the concept of mesh networking. This mesh networking is often based on mutual TLS (mTLS), the most visible manifestation of that being SPIFFE, which is commonly used in Kubernetes environments.

At the same time, we saw networks becoming more composable, pushing authentication and authorization decisions out to the edge of the network. While this pattern has had several names over the years, we now call it Zero Trust, and a visible example of it today is BeyondCorp from Google. These solutions, again, are commonly implemented on top of mutual TLS (mTLS).

We now also see the concept of Secure Access Service Edge (SASE), or Zero-Trust Edge, gaining speed, which extends this same pattern to lower-level network definition. Again, this is commonly implemented on top of mTLS.

The reality is that the Web PKI CAs were so focused on sales and marketing that they missed almost all of these trends. You can see them now paying lip service to this by talking about DevOps in their sales and marketing, but the solutions they offer in this area are both too little and too late. This is why cloud technology providers like HashiCorp and cloud providers like Amazon and Google (I am involved in this also) had to step in and provide their own offerings.

We now see that Web PKI CAs are starting to more seriously embrace automation for the public PKI use cases, for example, most of the major CAs now offer ACME support to some degree and generally have begun to more seriously invest in the certificate lifecycle management for other use cases.

That being said many of these CAs are making the same mistakes they have made in the past. Instead of working together and ensuring standards and software exist to make lifecycle management work seamlessly across vendors, most are investing in proprietary solutions that only solve portions of the problems at hand.

What’s next?

The usage of certificates and TLS has expanded massively in the last decade and there is no clear alternative to replace them, so I do not expect the adoption of TLS to wane anytime soon.

What I do think is going to happen is a unification of certificate lifecycle management for private PKI use cases and public PKI use cases. Mesh networking, Zero-Trust, and Zero-Trust Edge are going to drive this unification.

This will manifest in the use of ACME for these private PKI use cases; in fact, this has already started, just take a look at Cert Manager and Smallstep Certificates as small examples of this trend.

This, combined with the ease of deploying and managing private CAs via the new generation of Cloud CA offerings, will result in more private PKIs being deployed, and availability problems stemming from issues like certificate expiration and scalability will become far less common.

We will also see extensions to the ACME protocol that make it easier to leverage existing trust relationships which will simplify the issuance process for private use cases as well as ways to leverage hardware-backed device identity and key protection to make the use of these certificate-based credentials even more secure.

As is always the case the unification of common protocols will enable interoperability across solutions, improve reliability and as a result accelerate the adoption of these patterns across many products and problems.

It will also mean that over time the legacy certificate enrollment protocols such as SCEP, WSTEP/XCEP, CMC, EST, and others will become less common.

Once this transition happens, it will lead us to a world where we can apply policy based on subjects, resources, claims, and context across L3 to L7, which will transform the way we think about access control and security segmentation. It will give us both more control over, and more visibility into, who has access to what.

What does this mean for the Web PKI?

First I should say that the Web PKI is not going anywhere; with that said, it is evolving.

Beyond the increase in automation and shorter certificate validities, over the next decade we will see several changes. One of the more visible will be the move to dedicated PKI hierarchies for different use cases. For example, we will ultimately see server authentication, client authentication, and document signing move to their own hierarchies. This move will better reflect the intent of the Web PKI and prevent these use cases from holding the Web PKI’s evolution back.

This change will also minimize the browsers’ influence on those other scenarios. It will do this at the expense of greater ecosystem complexity around root distribution, but the net positive will be felt regardless. I do think this shift will give the European CAs an advantage in that they can rely on the EUTL for distribution, and since many non-web user agents simply do not want to manage a root program of their own, the EUTL has the potential to see broader adoption. I will add that it is my hope these user agents instead adopt solution-specific root programs rather than relying on a generic one not built for the purpose.

The Web PKI CAs that have not re-built their engineering chops are going to fall further behind the innovation curve. Their shift from engineering companies to sales and marketing companies resulted in them missing the move to the cloud, and companies going through digital transformation via the adoption of SaaS, PaaS, and modern cloud infrastructure are unlikely to start that journey by engaging with a traditional Web PKI CA.

To address this reality, the Web PKI CAs will need to re-invent themselves into product companies focused on solving business problems rather than selling certificates that can be used to solve business problems. This will mean, for example, directly offering identity verification services (not selling certificates that contain assertions of identity), providing complete solutions for document signing rather than certificates one can use to sign a document, or offering turnkey solutions for certificate and key lifecycle management for enterprise wireless and other related use cases.

This will all lead to workloads that were once on the Web PKI by happenstance being moved to dedicated workload- or ecosystem-specific private PKIs. The upside of this is that the certificates used by these infrastructures will have the opportunity to aggressively profile X.509 rather than being forced to carry the two decades of cruft surrounding it as they are today.

The Web PKI CAs will have an opportunity to outsource the root certificate and key management for these use cases and possibly subcontract out CA management for the issuing CAs but many of these “issuing CA” use cases are likely to go to the cloud providers since that is where the workloads will be anyway.

Due to the ongoing balkanization of the internet that is happening through increased regional regulation, we will see smaller CAs get acquired, mainly for their market presence to let the larger providers play more effectively in those markets.

At the same time, new PKI ecosystems like those used for STIR/SHAKEN and various PKIs to support IoT deployments will pop up and as the patterns used by them are found to be inexpensive, effective, and easily deployable they will become more common.

We will also see the lifecycle management for both public and private PKI unify on top of the ACME enrollment protocol, and through that, a new generation of device management platforms will be built around certificate-based device identity anchored in keys bound to hardware, where the corresponding certificates contain metadata about the device they are bound to.

This will lay the groundwork for improved network authentication within the enterprise using protocols like EAP-TTLS and EAP-TLS, and will make Zero-Trust and Zero-Trust Edge architectures easier to deploy, which will, in turn, further blur the lines between what is on-premises and what is in the cloud.

This normalization of the device identity concepts we use across solutions and the use of common protocols for credential lifecycle will result in better key hygiene for all use cases, and simplify deployment for those use cases.

Accountability and Transparency in Modern Systems

Over the last several decades we have seen the rate of technological innovation greatly accelerate. A key enabler of that acceleration has been the move to cloud computing which has made it possible for hardware, software, and services to be shared. This significantly reduced both the capital and time necessary to adopt and operate the infrastructure and services built on these platforms.

This migration started by enabling existing software to run using dedicated computers and networks owned and operated by someone else. As these computers got faster and the tools to share the physical hardware and networks were built, the cost of technological innovation reduced significantly. This is what democratized modern startup entrepreneurship as it made it cost-effective for individuals and small businesses to gain access to the resources once only available to the largest companies.

This flipped technological innovation on its head. It used to be that government and big businesses were the exclusive sources of technological innovation because they were the ones who could afford to buy technology. The lowering of the cost of innovation is what gave us the consumer startups we have today. This drew the attention of large companies to this emerging market and led to the creation of the modern smartphone which was fundamental to creating the market opportunity we see in consumer startups. This was a scale opportunity that was fundamentally different than the prior government and enterprise models of innovation.

As enterprises saw the rate of innovation and agility this new model provided, it became clear that they too needed to embrace it in their businesses. It is this reality that led to the creation of Salesforce, the first Software as a Service, and AWS, the first modern Cloud Service Provider to market. It was these offerings that gave us Software as a Service (SaaS), Platform as a Service (PaaS), and what we think of as modern cloud infrastructure.

At first, these enterprises only moved greenfield or very isolated projects to the cloud, but as the benefits of the new model became irrefutable and the capabilities of these offerings were enriched in ways that were impractical to replicate in their own environments, they started moving more business-critical offerings. We can see this trend continuing today: a recent survey found that 55% of IT organizations are now looking at ways to reduce their on-premises spending. This will lead to many legacy systems being replaced with more modern, scalable, agile, and secure solutions.

That same survey found that digital transformation and security are the two biggest reasons for this shift. This is no surprise when we look at how capital efficient modern businesses are relative to those based on legacy IT and manual processes, or how vulnerable legacy IT systems are to modern attacks. 

This does beg the question: what is next? I believe two trends are emerging. The first is the democratization of compliance for modern systems, and the second is a shift in expectations of what it means to be “secure”.

If we look at the first trend, the democratization of compliance, we see the internet becoming balkanized through regulation and governments seeking to get more control over what people do on the internet. Increased regulation makes it significantly harder for new entrants to compete, which in turn helps entrench the incumbents who can often eat the engineering and compliance costs associated with the regulations. When you think about this in the context of the global economy in which the internet exists, an economy made up of 195 independent sovereign countries, the compliance burden becomes untenable.

Modern Cloud Service Providers can make a significant dent in this by making it possible for those who build on them to meet many of these compliance obligations as a byproduct of adopting their platforms.

In the near term, this will likely be focused on the production of the artifacts and audit reports that are needed to meet an organization’s current compliance requirements but if we project out, it will surely evolve to include services for legal identity verification, content moderation, and other areas of regulatory oversight. A decade from now I believe we will see systems being built on these platforms in such a way that they will be continually compliant producing the artifacts necessary to pass audit as a natural byproduct of the way they work. 

This will in turn make it easier to demonstrate compliance and create new opportunities such as auditors continually monitoring an organization for its compliance with guidelines rather than just doing annual point-in-time assessments as is done today.

This has also led to companies like Coalition building offerings that let customers augment existing systems with artifacts that demonstrate conformance with security best practices, so that insurance companies can offer more affordable risk-based insurance policies.

As we look at the second trend, the redefinition of what it means to be secure, we can see consumers becoming more aware of security risks and, as a result, their expectations around the sovereignty of their data and the confidentiality of their information are evolving.

One response to this realization is the idea of decentralization. The thesis here is arguably that there can be no sovereignty as long as there is centralization. In practice, most of these decentralized systems are in fact quite centralized. While there are many examples of this, one of the more visible has been the DAO hard fork, which was done to recover stolen funds, or the simple fact that roughly 65% of Bitcoin mining happens in China. Additionally, for the most part, the properties that enable sovereignty typically come from the use of verifiable data structures and cryptography, not decentralization. That is not to say these systems do not have a place; I would argue that their success and durability so far at least suggest there is “a there, there”, but I would also say that, at least currently, they do not yet live up to their full promise.

Another response is the consumer adoption of end-to-end encryption in messaging applications (even iMessage is end-to-end encrypted!) and, by extension, the verifiability of the systems that implement these schemes.

The best example here is probably Signal. They spent time designing security and privacy into their messaging protocol and implementation from the beginning, modeling the design on modern threats and decades of learning about what does, and does not, work. This approach led to the protocol they defined being adopted by many of their competitors, including WhatsApp, Facebook Messenger, Skype, and Google Allo.

Signal is also a great example of the verifiability property; in particular, the work they have done with Contact Discovery is exciting. With this feature they first minimize the information they need to deliver the capability, in the hope of limiting future abuse. Second, they leverage technologies like SGX, an example of Confidential Computing, which enables them to demonstrate what they are doing with the information they do collect. This introduces transparency and accountability, both of which are important ingredients in earning trust.

The use of hardware security as a key component of the security boundary has already found its way from consumer phones, laptops, and tablets to the cloud. For example, Google Cloud’s Shielded VMs and Azure Trusted Launch use hardware to provide verifiable integrity to VM instances, making it possible to detect VMs compromised by boot- or kernel-level malware or rootkits, similar to what Apple does with the iPhone. We also now see AMD SEV and SGX getting broader deployment in the larger Cloud Service Providers (I will be the first to admit these technologies have room to grow if they are to live up to their promises, but they are promising nonetheless).

With this foundation, the industry is starting to look at how it can bring similar levels of transparency and accountability into applications and ecosystems too. One of the projects that has demonstrated that doing this can have a big impact is Certificate Transparency. As a result of the investments in deploying Certificate Transparency, the internet is now materially more secure than it was before, and this is a direct result of introducing accountability into an opaque ecosystem based on blind trust.

Another example in this space is the Golang Checksum Database where verifiable data-structures like Merkle Trees are being used to introduce accountability into the software supply chain as a means to mitigate risks for those who rely on the Golang ecosystem. 

For many problems in the security space, you can solve from one of two philosophical bases. You can either create privileged systems only visible to a few that you hope aren’t corruptible or you can build democratizing transparency into the system as a check on corruption.

— Dino A. Dai Zovi

While the earlier examples are using combinations of hardware, cryptography, and verifiable data-structures to deliver on these properties, other examples take a more humble approach. For example, Google Cloud’s Access Transparency uses privilege separation, audit logs, and workflows to provide the fundamental ability to track business justifications for access to systems and data. The existence of these systems is further validation that the trend of verifiability is emerging.

So what should you take away from this post? I suppose there are four key messages:

  1. The definition of security in modern Cloud services is continuing to be influenced by the consumer space which is leading to the concepts of verifiability, accountability, data sovereignty, and confidentiality becoming table stakes.
  2. Globalization and regulation are going to accelerate the adoption of these technologies and patterns as they will ultimately become necessary to meet regulatory expectations.
  3. Increasingly verifiable data structures, cryptography, and hardware security capabilities are being used to make all of this possible.
  4. These trends will lead to the democratization of compliance to the many regulatory schemes that exist in the world.

I believe when we look back, these trends will have significantly changed the way we build systems and a new generation of businesses will emerge enabling these shifts to take place.

Information system security and how little things have changed

When I was a boy my father had me read Plato’s Republic; he wanted me to give an oral report on what the key points of the book were and what my personal takeaways were after reading it.

The first question was easy to answer from the dust jacket, or maybe the Cliff Notes (for those of you who have not read the book, it is an exploration of the ideas of justice and the ideal government).

With that said, I knew from experience that those personal takeaways are buried in the nuance and no shortcut would satisfy him so off to read I went. What were those takeaways? According to him what I said was: 

  1. The nature of people has not changed much,
  2. The problems we have in government have not changed much.

Why do I bring this up in the context of security? Unfortunately, it is because I do not think things have changed much in security either! I’ll give two examples that stand out to me:

Every program and every privileged user of the system should operate using the least amount of privilege necessary to complete the job.

— Jerome Saltzer, 1974, Communications of the ACM

The moral is obvious. You can’t trust code that you did not totally create yourself. (Especially code from companies that employ people like me.) No amount of source-level verification or scrutiny will protect you from using untrusted code.

— Ken Thompson, 1984, Reflections on Trusting Trust

The first quote is the seminal statement of “least privilege”, a concept we still struggle to see deployed nearly 50 years later. The term is old enough now that marketers have latched onto it, so when you speak to many enterprises they talk about it in the scope of group management and not the more fundamental design paradigm it actually represents.

To put this concept in the context of the network: in the 90s we talked about how firewalls, however necessary, were a bit of an antipattern since they represented “the hard candy shell” containing the “soft gooey sweet stuff” the attacker wants to get at, and that as a result it was better to design security into each endpoint.

A decade later we were talking about using network-level enforcement via “Network Admission Control” at the switch, later yet via DirectAccess and Network Access Protection we were pushing those same decisions down as close to the end device as we could, and in some cases making each of those endpoints capable of enforcing these access requests.

Today we call this pattern Zero Trust networking, and a leading example of it is BeyondCorp, but again marketers have latched onto Zero Trust, and as a result it seems almost every enterprise product I hear about these days claims to offer some sort of Zero Trust story, though few objectively meet the criteria I would define for such a lofty term.

Similarly, if we look at the second quote, all we have to do is take a look at the recent SolarWinds debacle to realize that almost nothing has changed since Ken Thompson wrote that paper. We also have dozens of examples of compromised keys being used to attack the software supply chain, or of package repositories and open source dependencies being used as attack vectors. Despite knowing for nearly 40 years how significant these issues can be, we have made very little progress in mitigating them.

As they say, there is nothing new under the sun, and this appears to be especially true with security. If so why is this the case? How is it we have made so little progress on these fundamental problems as an industry?

Unfortunately, I think it boils down to the fact that customers don’t care until it is too late, and this makes it hard for the industry to justify the kinds of fundamental investments necessary to protect the next generation from these decades-old problems.

How do we improve the state of affairs here? That’s really the question, and one I don’t have a good answer to.

Safes and Transparency

Lately, I have been thinking about the history of defensive security technology. One of the purest examples here can be found in safes and vaults. The core purpose of a safe is obvious, to make it cost-prohibitive for an attacker to gain access to whatever is inside without being detected.

With that said, the topic is a lot more nuanced than it seems on the surface. If we look at a safe used by a typical community bank in the 1800s, one of the things you will notice is that they often have ornate decorations on their exteriors and beautifully designed locking mechanisms covered by specific patents. These traits were clearly designed to signal something to the visitors of the bank, namely that it uses the latest technology to keep your valuables safe.

Beyond the messaging buried in the design, these safes were also designed to mitigate specific threats. For example, in the mid-1800s it was common for attackers to steal safes, use explosives to open them, and kidnap those who had access to the secrets necessary to open a safe, or those near and dear to them.

In response to this reality, safe manufacturers started to use materials like manganese to manufacture safes, making the walls very thick and as a result very heavy (often 3 tons or more!), rounding corners, and using locking cylinder-shaped doors in combination to make theft or the use of explosives no longer interesting vectors for an attack.

These changes, combined with artful customizations also provided a way for banks to ensure that sophisticated thieves could not replace a safe in order to delay the detection time and have a safer getaway.

They also started incorporating time locks, to make it so if someone was kidnapped, they would still not be able to open the safe outside core business hours, essentially enabling the creation of a fully disclosed ledger of all goods stored in or withdrawn from the safe.

A famous example here is from 1876: the robbery of the Northfield, Minnesota bank by Jesse James and the Cole Younger gang was foiled in part by a safe with these design characteristics.

As I think about the parallels in modern technology, I cannot help but come back to a post I did last year titled “An Evolution of Security Thinking”, in particular how we have gone from security as something added after the fact to something built into a system from the get-go. Moreover, it seems these safes may also represent one of the first examples of transparency being applied as a technique to dissuade an attacker.

If a safe has no tumbler on the outside, what good would it do to kidnap the bank manager? The attacker is forced to attempt their theft during business hours, when the bank is busy and they have a larger chance of getting caught.

If it is obvious a safe has 12” thick walls and weighs in at over 3 tons, then stealing the safe at night, or using explosives to open it, is no longer a viable path of compromise either, given the skills and resources of the attacker. This again forces the assailant to attack the bank during the day, when the vault may already be open.

The safe manufacturers, by making their designs and mitigations clear, were attempting to dissuade attackers from even trying. This is not materially different from how today we apply the concepts of cryptographic transparency as a tool to mitigate other attacks.

In short, transparent systems are essentially the antithesis of security by obscurity. While designing a system to be cryptographically verifiable does not necessarily require the contents of that system to be known, just as the safe design doesn’t require the contents of the safe itself to be known, the use of these patterns makes it possible to intelligently reason about the security and integrity of the system.

Just a thought…..

P.S. Thanks to Fotis Loukos and Yael Grauer for providing feedback on this post. 

Software Supply Chain Risk Mitigation

Increasingly we are seeing attacks against what is now commonly referred to as the software supply chain.

One of the more notable examples in the last few months came from the Node.js package management ecosystem [1]. In this case, an attacker convinced the owner of a popular but unmaintained Node package to transfer ownership to them. The attacker then crafted a version of the package that unsuccessfully attacked Copay, a bitcoin wallet platform.

This is just one example of this class of attack; insider attacks on the software supply chain are also becoming more prevalent. When looking at this risk holistically, it is also important to realize that as deployments move to the Cloud, the lines between software and services also blur.

Though not specifically an example of a Cloud deployment issue, in 2015 there was a public story of how some Facebook employees had the ability to log into users’ accounts without the target user’s knowledge [2]. This insider-risk variant of the supply chain problem exists in the Cloud in a number of different areas.

Probably the most notable is in the container images provided by the Cloud provider. It is conceivable that a Cloud provider could be compelled by a government to build images that would attack a specific customer or set of customers as part of an investigation, or that an employee would do so under compulsion or in service of personal interests.

This is not a new risk; in fact, management of internal and external dependencies has always been core to building secure systems. What has changed is that in the rush to the Cloud and Open Source, users have adopted the tools and resources these cloud providers have built to make the migration easier without fully understanding and managing the risk they have assumed in doing so.

In response to this reality, Cloud providers are starting to provide tools to help mitigate this risk, some such examples include:

  • Providing audit records of employee access to customer data and services,
  • Building solutions to provide hardware-based trusted execution environments that offer some level of protection from the cloud provider,
  • Offering hardware key management solutions provided by third parties to protect sensitive key material,
  • Cryptographically signing the binaries and images that are published so that their distribution is controlled and tampering post-production can be detected.

Despite these advancements, there is still a long way to go to mitigate these risks in a holistic fashion.

One effort in this area I am actively involved in is in the adoption of the concept of Binary Transparency. This can be thought of as an evolution of legacy code signing models. In these solutions, a publisher places a cryptographic signature using a private key associated with a public certificate of some sort that is either directly trusted based on package origin and signature (such as with GPG signatures) or is authenticated based on the legal identity of the publisher of the package (as is the case with Authenticode).

These solutions, while valuable, help you authenticate a package, but they do not give you the tools to understand the history of that package. As a result, publishers can, either accidentally or on purpose, produce malicious packages signed with their “trusted keys”, and this is not detectable until it is too late.

As an example of this risk, you only need to look at Realtek: over the years their code signing keys have been compromised numerous times and used to sign malware, some of it targeted, as in the case of Stuxnet [3].

Binary Transparency addresses this risk in a few ways. At its core, Binary Transparency can be thought of as an append-only ledger listing all versions of a given binary, with each version having a pointer to a content-addressable store where that binary is available.

This design enables the runtime that will execute the binary to do a few things that were not previously possible. It can, for example, ensure it is running the most recent version of a binary and only run the binary when it, and some number of previous revisions, are publicly discoverable. It also enables relying parties to inspect all published versions of a binary or image and potentially diff those versions to understand how they differ.
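
To make the append-only ledger idea concrete, here is a minimal sketch (in Go) of the RFC 6962-style Merkle tree hashing that transparency logs such as Certificate Transparency and the Go checksum database build on. It is illustrative only and not tied to any particular binary transparency implementation; the entries in main are placeholders standing in for the digests of published binary versions.

```go
// Sketch: computing the root of an append-only Merkle log over an ordered
// list of entries, using the RFC 6962 hashing conventions.
package main

import (
	"crypto/sha256"
	"fmt"
)

// leafHash and nodeHash use a 0x00 prefix for leaves and 0x01 for interior
// nodes, which prevents a leaf from being confused with an internal node.
func leafHash(entry []byte) []byte {
	h := sha256.New()
	h.Write([]byte{0x00})
	h.Write(entry)
	return h.Sum(nil)
}

func nodeHash(left, right []byte) []byte {
	h := sha256.New()
	h.Write([]byte{0x01})
	h.Write(left)
	h.Write(right)
	return h.Sum(nil)
}

// merkleTreeHash computes the root over an ordered list of log entries,
// such as the digests of each published version of a binary.
func merkleTreeHash(entries [][]byte) []byte {
	switch len(entries) {
	case 0:
		return sha256.New().Sum(nil) // hash of the empty tree
	case 1:
		return leafHash(entries[0])
	default:
		// Split at the largest power of two strictly less than len(entries).
		k := 1
		for k*2 < len(entries) {
			k *= 2
		}
		return nodeHash(merkleTreeHash(entries[:k]), merkleTreeHash(entries[k:]))
	}
}

func main() {
	// Hypothetical "versions" of a binary; in practice these would be the
	// hashes of artifacts held in the content-addressable store.
	versions := [][]byte{[]byte("v1.0.0"), []byte("v1.0.1"), []byte("v1.1.0")}
	fmt.Printf("log root: %x\n", merkleTreeHash(versions))

	// Appending a version changes the root; rewriting history is detectable
	// by anyone who remembered an earlier root.
	versions = append(versions, []byte("v1.2.0"))
	fmt.Printf("new root: %x\n", merkleTreeHash(versions))
}
```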

When this technique is combined with the concept of reproducible builds, as provided by Go [4], along with a community of these append-only logs and auditors of those logs, you can get strong assurances that:

  • You are running the same version as everyone else,
  • That the binary you are running is reproducible from the source you can review,
  • The binary you are running has not been modified since it was published,
  • That you, and others, will not run binaries or images that have not been made publicly available for inspection.

A system with these properties disincentivizes the attacker from executing these attacks as it significantly increases the probability of being caught and helps bound the impact of any compromise.

Importantly, by doing these things, it makes it possible to increase the trust in the Cloud offering because it minimizes the amount of trust the user must put into the Cloud provider to remain honest.

A recent project that implements these concepts is the Go Module Transparency project [5] [6].

Over time we will see these same techniques applied to other areas [7] [8] of the software supply chain, and with that trend, users of open source packages, automatic update systems, and the Cloud will be able to have increased peace of mind that their external dependencies are truly delivering on their promises.


  • [1] Node.js Event-Stream Hack Exposes Supply Chain Security Risks
  • [2] Facebook Engineers Can Access Your Account Without A Password
  • [3] STUXNET Malware Targets SCADA Systems
  • [4] Reproducing Go Binaries Byte-by-Byte
  • [5] Proposal: Secure the Public Go Module Ecosystem
  • [6] Transparent Logs for Skeptical Clients
  • [7] Firefox Security/Binary Transparency
  • [8] Contour: A Practical System for Binary Transparency

Secure, Privacy Preserving Key Discovery for End-To-End Encryption

A lot of products today claim to offer end-to-end encryption, but not all of them offer the same level of protection. Some of the differences between these solutions are rooted in the protocols and cryptography they use; in others, it is the way they are implemented; and in still others, it is the way they handle discovery of the cryptographic keys of the peers involved in the session.

The topic of key discovery is itself a complicated one. On its surface, for a messaging application, all you need to do is go to a directory and request the public keys associated with the user, or their devices, that you will communicate with. Where things get tricky is how, as a relying party, you can tell whether the key discovery mechanism is lying to you.

This is important because if the key discovery server is lying to you it can facilitate an impersonation of that user, add a hidden third-party to the encrypted session without your knowledge, or potentially trigger a re-encryption to a device not under your control without your knowledge.

To understand the implications here you just need to look at iMessage. Although many do not know this, iMessage is actually end-to-end encrypted! Matthew Green has done several great write-ups on its protocol [1] [2] and on how the lack of verifiability in the key discovery mechanism it uses weakens the overall solution.

The most used end-to-end encrypted messaging application is probably Facebook’s WhatsApp. Several years ago a security researcher [3] reached out to The Guardian to discuss what they described as a “backdoor” in WhatsApp; this “backdoor” was related to how it handled key discovery in device recovery use cases.

As a product person, you often need to make trade-offs to achieve your goals, and that is what happened in this case. This “backdoor” was a design decision made to ensure billions of users could get some of the protections of end-to-end encryption without compromising usability.

A number of security researchers, including myself, spoke up [4], which resulted in the article being updated to correctly reflect this reality [5].

Later, WhatsApp and how its key discovery happens came up in the news again, this time in an article from Wired [6]. Alex Stamos, the former Chief Security Officer of Facebook, responded to this article [7], affirming some of the article’s points and talking about how a conscious decision was made to enable the associated use case:

“Read the Wired article today about WhatsApp – scary headline! But there is no secret way into WhatsApp groups chats. The article makes a few key points.”

While his response may be true, it is not verifiably true, as it relies on the behavior of the client and not on cryptographic verifiability.

This is where systems like CONIKS [8], Keybase [9], and Google’s Key Transparency [10] come into play.

These solutions aim to enable automated trust establishment with untrusted communication providers through the use of an auditable directory of all of their users’ keys, both past and present.

The fact that these solutions provide an auditable history of keys means that both the relying party and the subscriber involved in the communication can reliably be made aware when new keys have been associated with a user’s account and, importantly, what entity added the key to the account.

With this information, the applications the users are using can either prevent messages from being sent (via policy) or notify the user when keys have changed unexpectedly.

This allows messaging clients to verify the identity of users automatically and prevents malicious/compromised servers from hijacking secure communications without getting caught.
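
As a toy illustration of that policy, the sketch below (Go) pins the key fingerprints a client has seen for a contact and flags anything new the directory returns. A real key transparency client would additionally verify the directory’s cryptographic proofs; all identifiers and fingerprints here are hypothetical.

```go
// Sketch: client-side "notify on unexpected key change" policy. This only
// illustrates the behavior described above; it does not verify any proofs.
package main

import "fmt"

// pinnedKeys maps a user identifier to the set of key fingerprints the
// client has previously accepted for that user.
var pinnedKeys = map[string]map[string]bool{}

// checkKeys returns the fingerprints that are new since we last looked.
func checkKeys(user string, current []string) (newKeys []string) {
	seen := pinnedKeys[user]
	if seen == nil {
		seen = map[string]bool{}
		pinnedKeys[user] = seen
	}
	for _, fp := range current {
		if !seen[fp] {
			newKeys = append(newKeys, fp)
			seen[fp] = true
		}
	}
	return newKeys
}

func main() {
	// First contact: remember the keys the directory reports.
	checkKeys("alice@example.com", []string{"fp-phone"})

	// Later lookup returns an extra key: surface it to the user or block sends.
	changed := checkKeys("alice@example.com", []string{"fp-phone", "fp-unknown-laptop"})
	if len(changed) > 0 {
		fmt.Println("warning: new keys for alice@example.com:", changed)
		// Policy choice: block sending, or show a safety-number-style prompt.
	}
}
```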

On the surface, this sounds much easier than it is to accomplish, at least at scale. WhatsApp serves over a billion users; any solution needs to be able to handle key updates and reads at rates necessary to support such a large user base.

It needs to do this without leaking metadata associated with who the users are communicating with.

And do this without significantly increasing the amount of data a user must download or the time it takes to change keys.

While these are all tractable problems, they are not problems that are solved today in this context.

For this reason, applications that implement end-to-end encryption typically either provide a mechanism that users who care about these risks can use to verify cryptographic keys out of band, in person [11], or simply implicitly trust the key discovery service as an honest actor.

At Google, I have the pleasure of working on Google’s answer to this problem [12]. It is our hope that, when complete, applications that need to securely discover keys in a verifiable way can simply adopt our solution and focus on their application rather than spending years of energy solving this problem themselves.

I firmly believe the best way to ensure the right thing happens is to make sure that the right way is the easy way and fundamentally that is the goal for the Google Key Transparency effort.


  • [1] Attack of the Week: Apple iMessage
  • [2] Let’s talk about iMessage (again)
  • [3] The Guardian is backtracking on a controversial story about WhatsApp
  • [4] Security researchers call for Guardian to retract false WhatsApp “backdoor” story
  • [5] Flawed reporting about WhatsApp
  • [7] Read the Wired article today about WhatsApp – scary headline!
  • [8] CONIKS Project
  • [9] OkCupid’s Founders Want to Bring Encrypted Email to the Masses
  • [10] Google’s Key Transparency project aims to ease a tough task in cryptography
  • [11] Safety number updates

Why I chose UniFi vs AmpliFi HD


I just had a brief exchange with a friend on Twitter who suggested that AmpliFi HD, not UniFi was the product Ubiquiti was building for users like me.

I thought folks might be interested in why I didn’t go that route so here is another post 🙂

I did look at AmpliFi HD, in fact, my eldest son tried to convince me to ditch the Google WiFi for AmpliFi HD shortly after it came out.

When I looked into the AmpliFi HD, my conclusion was that it was a less-featured Google WiFi with better radios (it didn’t seem to have the home automation, parental controls, and similar features), and I was largely satisfied with the radio coverage I had with Google WiFi, so I was not compelled to make the change.

One of the pain points I did have with my Google WiFi solution was that I had to find places to stash four Google WiFi access points to get sufficient coverage for all the devices in my network. The devices themselves look OK but we really do try to hide all the tech in the house and live by the motto “less is more” so this is a pain we did feel.

The AmpliFi didn’t really have a solution to this problem either; in fact, I would probably have needed more, smaller units for proper coverage. The upside is that those smaller units would have been less visible, which would have been nice. On the other hand, the kids are often unplugging things in the house to free up outlets or simply to mess with me, and the design of the AmpliFi mesh units is such that I feared that would happen a lot.

When I looked at the UI on the AmpliFi products, my conclusion was that it was a stripped-down UniFi rather than a product designed as a high-end WiFi product. This is in contrast to the Google WiFi, which felt like a sincere attempt at rethinking the whole user experience.

This combined with the lack of integration with a larger ecosystem (home automation, etc) made it really hard to justify migrating off of Google WiFi.

My conclusion (right or otherwise) from my research was that at best I would end up with marginally better coverage and a new set of limitations as a trade-off. It just did not justify the change.

When I revisited the decision to replace my wireless deployment I was more or less fed up. I did not want to mess with this again anytime soon, so I decided to go big or go home. This led me to switch to UniFi, which in turn also led me to switch to Protect.

If I was the target user the AmpliFi team was looking for I think they missed a few things:

  • I want less clutter, not more; the square design of the AmpliFi presumes public display of a piece of electronics, and I don’t want that.
  • The mesh does not support wired backhaul, and the distance between where it would be natural to use them would be quite far. Wireless backhaul had caused me some pain with Google WiFi so I was not sure this would work well for me.
  • I also didn’t want 4-6 outlets being occupied in the house; even though the mesh adapters are smaller than the Google WiFi units, more is still a pain, especially given that the kids are not likely to leave them alone.
  • I have some basic home automation and the AmpliFi product doesn’t offer any story here.
  • I liked the parental controls I had with Google WiFi and it seemed I could approximate that but not in an easy way.
  • I liked how I can manage my parents’ and cousins’ WiFi networks in Google WiFi; it gives me a one-stop shop for dealing with issues when people call me. I recall concluding this was missing, and if nothing else the friction of replacing their WiFi gear to be uniform would have been a barrier.
  • I have fiber service, and I understood you had to run the device in bridge mode in this case, which takes away a lot of the features of the AmpliFi HD system.
  • The CloudKey Gen2 Plus having the built-in NVR meant I could consolidate how I dealt with cameras at the same time; one less thing to deal with and after a year the cost savings would allow me to break even and later save.

I basically concluded that my home was “too big” for the AmpliFi HD and that the incremental benefit of switching to it from Google WiFi was not worth the effort.

This could be marketing, this could also be poor product planning, or maybe I was just not the target customer. It is hard to say without knowing a bit more about how the product planning was done here.

In any event as the earlier post states, I’ve gone all UniFi now and I look forward to seeing how that works for us over the next year.

Google Wifi + NEST Cameras vs Ubiquiti for Home Use

I recently made the switch from Google WiFi and NEST Cameras to Ubiquiti UniFi and Protect. A few things motivated these changes and I wanted to talk about them in this blog post.

Background

Google WiFi

The most significant motivator was some network reliability issues that I was experiencing with the Google WiFi. In the end, the problem was not related to the Google WiFi, but I could not diagnose it without logs, which the Google WiFi encrypts. Though I was able to walk through the issue with Google support and ultimately localize it, it took several days of back and forth and required me to walk them through exactly what to look for.

The Google WiFi actually performed great overall, but we do have an above-average number of devices in our house and sometimes we would experience what I believed to be congestion. This is likely because Google WiFi only supports SU-MIMO; the UniFi solution, on the other hand, supports MU-MIMO. MU-MIMO allows a Wi-Fi router to communicate with multiple devices simultaneously, which decreases the time each device has to wait for a signal and dramatically speeds up the network as a result.

I also experienced some cases where the Google WiFi was falling back to the Mesh wireless solution even though I had a wired backhaul. I never figured out why this was happening but it was not a huge issue.

Finally, we have an outbuilding that is currently using our guest network, but since it is on a guest network it cannot do any IoT-style networking where one device talks to another. To address this I needed to either set it up with a physically isolated WiFi of its own or configure a VLAN, which I could not do with the Google WiFi.

As a plus, since it is a product designed for the home, it has features like parental controls which, though they could use some work on usability, were actually quite useful.

To be honest, I cannot say enough positive things about the Google WiFi; it is a great product that is probably perfect for 99% of people, but the sad reality is that we started to outgrow it.

Google NEST Cameras

We had five Google NEST Outdoor Cameras and a Hello doorbell at our house. They worked great and were pretty reliable. We really only had four complaints about these devices.

The first is that they did not support PoE, which meant that when we set them up we had to buy USB-to-PoE adapters and find ways to hide the long and bulky USB power cable they came with.

The second issue is that some of the cameras were on the absolute edge of our wireless network and, in rough weather, we would lose the wireless connection as a result. We did buy another Google WiFi unit to help with this, but again, if the cameras had been PoE-based this wouldn’t have been an issue.

The third issue is that the motion notifications tended to be a bit annoying. We did configure zones to help manage this, but it was still more obnoxious than I would have liked. To configure zones we also had to pay the per-camera monthly fee, which did feel a little bit like extortion: pay us not to annoy you with notifications.

The fourth and final issue was the cost and nature of cloud storage. With a total of six cameras, the yearly cost of the NEST solution was significant. It also depended on cloud storage, which meant my data was being stored exclusively in the cloud. As a Google employee, I have faith in the company’s practices for managing this data, but the recent issues with Ring and Alexa, Amazon’s competing offerings, poorly managing the data they store did give me pause.

The reality is that if it were not for the Google WiFi change I discuss above I would have likely kept the NEST Cameras. This is because, despite the above, I was pretty happy with the solution but since I was buying into the Ubiquiti ecosystem it felt like unifying on their solution would not only address the above concerns but overall make things simpler to manage in the long run.

UniFi Wireless

Despite being a very advanced product capability-wise, UniFi has a pretty easy-to-use management interface. I wouldn’t recommend putting the concepts it exposes in front of the type of users I end up supporting in my personal life, but the reality is that once it is set up you never really have to deal with that stuff.

Since it is really designed as a business solution and not a home solution, it is missing some features that a modern home user might expect. For example, it has no way to share IoT devices as Google WiFi does. It is not integrated with home automation systems either; for example, you can’t use the presence and activity of devices to infer whether people are home as part of your home automation. And it has no “parental controls” concept, though you can manually configure something roughly equivalent.

With that said, since UniFi was designed for businesses, many of its access points are physically attached to the house. This means you need to run wires in walls but it also means you do not have a pile of devices sitting around on horizontal surfaces.

It also does smart channel and power management so you don’t need to worry about such things, so similar to Google WiFi it is largely a set it and forget it solution.

What you end up with when you go with a UniFi based solution is a professional, flexible, moderately easy to use, high-performance solution that is physically installed and as a result non-intrusive to the overall environment.

UniFi Protect

Ubiquiti has two video solutions: UniFi Video, which is slowly being replaced, and UniFi Protect. I am using the UniFi Protect offering as it is integrated with the CloudKey Gen 2 Plus, which I am also using to manage my wireless.

The Ubiquiti cameras I chose are the G3, mainly because they were the cheapest of the set and seemed approximately comparable to the NEST Cameras they were replacing. This was important as I intended to sell my NEST cameras to cover the cost of the change.

The G3s do not have as nice an industrial design as the NEST cameras; they also look more commercial and have essentially no market of third-party accessories (for example, skins to obscure the cameras), but they look reasonable enough.

The G3 also does not have a speaker (some other models do, for example the G3 Micro, though that is an indoor camera), so there is no chance of two-way communication. It does have a microphone, though, so you can record audio along with the video.

I think the biggest gap in the G3 cameras relative to the NEST is that they have no zoom; you have to step up to the G3 PRO, which is three times the cost of the G3, to get this.

The upside of this solution over the NEST can be summarized as:

  • No monthly fee per camera,
  • Cheaper cost per camera,
  • Data is stored locally vs on a public cloud.

There are some things that I will miss from the NEST solution, in particular:

  • Using computer vision to analyze the video, for example: not sending notifications when it is a family member, sending a notification when a familiar face is seen, or ignoring movement unless a person is visible (some of these capabilities are only available with the newer NEST Cam IQ camera).
  • Integration with home automation systems such as Alexa, Google Home, or Siri. For example, with Google Home you can ask what is happening on a given camera and it will display the feed on your TV.
  • Having an integrated doorbell solution. I will be keeping NEST Hello, for now, to fill this gap, though having one camera there and the rest in another system is far from ideal.
  • There are no applications to integrate the cameras with AppleTV or ChromeCast, so getting the cameras displayed on these devices will involve casting a browser session, which is lame.

With all that said, the TCO for a multi-camera NEST system is pretty high if you want to retain video and the Ubiquiti solution addresses this effectively.

Wishlist For Ubiquiti

I am installing this system in a home, which is not squarely where Ubiquiti is aiming this product. With that said, many new homes get Ubiquiti installs now, and if I were on the product team at Ubiquiti I would be looking seriously at what I could do to better serve that market.

Based on my current experience with the product here are some things I think would be nice to have from Ubiquiti.

  • A doorbell camera; it is a shame I have to keep the NEST Hello to have a complete solution.
  • There should be better camera choices; not having a zoom or speaker in a security camera in 2019 is lame.
  • It is disappointing there is no affordable 4k camera option when consumer products do offer them.
  • I would love to see a less obvious industrial design for the cameras that would work well with skins so you can hide the cameras more easily.
  • Produce a rack kit that allows placing both the security gateway and the CloudKey in a single 1U rack location.
  • I would like to be able to put the Protect server and cameras on one VLAN leaving the network controller on another; they are two different security domains and shouldn’t have to be co-mingled like they are currently [added this to the list after the article was posted].
  • There should be better integration between SDN and Protect, for example, I should not have to set aliases in both manually [added this to the list after the article was posted].
  • If I am going to have to have a Nest Hello and the Protect software it would be ideal if the Nest Hello was integrated into Protect [added this to the list after the article was posted].
  • Integration with Alexa, Siri and Google Home should be in the box.
  • Basic computer vision capabilities in the box, or at least the ability to opt in to a cloud CV solution such as the Google Vision API or Amazon Rekognition to do intelligent filtering of motion events in the video (a minimal sketch of this idea follows this list).
  • Register the UbiquitiHome.com domain, do dynamic domain registration for subdomains/hosts as part of the on-boarding experience in setup, use Let’s Encrypt to get a certificate for that domain and do away with the self-signed certificate that is currently used.
  • Since the product line is geared toward small businesses, and I suspect a good chunk of the home market is enthusiasts, it would be great to have a robust REST API with webhooks available so custom solutions could easily be added without going into the database to extend capabilities (a toy webhook receiver is sketched after this list).
  • With a robust set of REST APIs, they could offer a marketplace of applications that users could use to integrate with other systems (IFTT, Google Home, Alexa, etc).
  • Alarm.com integration with UniFi Protect would probably be a real winner for the enthusiast community, and I would explore a partnership there if I were Ubiquiti.
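
To make the computer vision wish above concrete, here is a minimal sketch of the kind of opt-in cloud filtering I have in mind, using Amazon Rekognition through boto3. To be clear, this is purely illustrative: UniFi Protect exposes no such hook today, and the snapshot path and notify() helper are placeholders I made up for the example.

    # Hypothetical sketch: run a motion-event snapshot through Amazon Rekognition
    # and only alert when a person is detected in the frame.
    import boto3

    rekognition = boto3.client("rekognition", region_name="us-east-1")

    def person_in_snapshot(image_path: str, min_confidence: float = 80.0) -> bool:
        """Return True if Rekognition labels the snapshot as containing a person."""
        with open(image_path, "rb") as f:
            response = rekognition.detect_labels(
                Image={"Bytes": f.read()},
                MinConfidence=min_confidence,
            )
        return any(label["Name"] == "Person" for label in response["Labels"])

    def notify(message: str) -> None:
        # Placeholder for a real push notification, email, or automation trigger.
        print(message)

    if __name__ == "__main__":
        # "motion.jpg" stands in for a snapshot exported by the camera system.
        if person_in_snapshot("motion.jpg"):
            notify("Person detected on camera, sending alert.")
        else:
            print("Motion detected but no person seen; ignoring.")

The point is not the specific service; any of the cloud vision APIs could do this kind of filtering, and even a simple "is there a person in this frame" check would cut the notification noise dramatically.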
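
Similarly, here is a toy example of the webhook story I am wishing for: a tiny Flask receiver that a future Protect REST API could POST events to, which would then forward them to whatever automation you like. The endpoint path and event fields ("type", "camera") are invented for illustration; nothing like this exists in the product today.

    # Hypothetical webhook receiver for camera events (pip install flask).
    # The payload shape is made up; it is only meant to show how little code
    # an enthusiast would need if Ubiquiti shipped webhooks.
    from flask import Flask, request, jsonify

    app = Flask(__name__)

    @app.route("/protect-webhook", methods=["POST"])
    def protect_webhook():
        event = request.get_json(force=True)
        # Forward motion events to a home-automation system, a chat channel, etc.
        if event.get("type") == "motion":
            print(f"Motion on camera {event.get('camera')}, forwarding to automation...")
        return jsonify({"status": "ok"})

    if __name__ == "__main__":
        app.run(port=8080)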

In Summary

Though I am not even done with my Ubiquiti journey, it is already clear that the Ubiquiti networking solution is technically superior, but their camera offering still leaves a bit to be desired.

It does seem that with the introduction of UniFi Protect, which currently has a 20-camera limit, they are looking at how to better serve users like me. That said, only time will tell how far they go toward providing solutions competitive with the consumer-focused offerings.

ResortQuest, Wyndham, HomeAway, 2 inches of water, vomit stench, and bad management.

We recently had a family vacation in Miramar Beach, Florida, where we stayed at the SurfSide in unit #502. The experience was, shall we say, less than we expected. The management company was incompetent, did not meet their legal requirements when responding to an emergency, and left us in a bad situation despite a ton of attempts on our side to work with them.

To top things off HomeAway refused to provide even the most minimal levels of assistance when the management company failed to live up to their obligations.

Below is my unedited review of that experience; hopefully it will help someone in the future.

This is our second stop while in the Destin area. Our first was an as-described and great experience provided by Southern Vacation Rentals; this stay, however, literally left us with a bad taste in our mouths.

Pulling up to the building, it was obvious it had seen better days, but on other vacations we have had similar first impressions and were pleasantly surprised when we got to the room.

This time, however, when we opened the door a sour smell reminiscent of dried vomit 🤮 engulfed us. A quick inspection revealed that it originated from the couches and the rugs. They, like everything in this unit, are well used and poorly taken care of. It is clear this is a rental unit where only tenants or service people visit, as I have to believe an owner would take care of the things we had to deal with.

We contacted the management firm and they said they would send housekeeping to take care of the smell. Housekeeping did not show up, so to manage the smell we had to put the couch cushions and slipcovers on the deck, as it would not have been possible to stay inside with that odor. We called again that night and were told someone would be there in the morning. They did not show.

The furniture is ready to be replaced, the refrigerator was dirty inside and out, and to top it off the wires that power the light inside it are hanging out of a cracked housing.

The unit appears to have been remodeled over a decade ago but only minimal maintenance has been done since then.

Though it is clear the room was lightly cleaned prior to our visit, I doubt it has had a real deep cleaning in a long time; there was splattered food on the wall and the drinking glasses were sticky, so we washed them all on arrival.

To top off the above we tried to do a load of laundry and the washer flooded the apartment bathroom, hallway, master bedroom and the hallway leading to the unit.

We called the management firm and they said they could not get ahold of maintenance, and they asked us to spend our evening cleaning up the water as best we could, which we did. They did offer to have someone come in the morning (sound familiar?) to take care of what was left.

It was clear the management firm was concerned about the potential damage, but they did not seem to care about our situation at all. Since they could not reach maintenance, we also could not get replacement towels, so there were no showers in the morning.

I should note that it’s clear this flooding has happened before because the trim in the hallway of the unit shows clear signs of past water damage.

We asked the management firm to move us to a different unit; after all, if vomit stench and a flood were not enough to justify that, what would be? Unfortunately, the best they could offer was one night at another unit 30 minutes away, with us having to return the next day. Since it was already midnight and it would have only been for that night, we passed.

If you recall, they said someone would come in the morning to take care of the flooding. You guessed it: they never showed up.

We called again in the evening and spoke to the manager for the site. He apologized for the lack of response to our prior calls and promised someone would be there that night to clean up and bring us towels, since we had six people and no towels. Of course, no one showed up with towels.

We also tried to warm milk in the microwave today, but it too does not work; yet another work order has been filed.

On our last day, the manager contacted us asking if we had gotten the towels he had sent; we had not. A few minutes later towels did show up, so we now had enough towels for a week, but we were leaving in the morning.

The manager did finally offer a concession for this ridiculousness: $150 for the inconvenience. It took three of us three hours to clean up the water alone, so they are valuing our time at roughly $16 an hour for that work, and they could not care less about the rest of the inconvenience (no towels, no clean clothes, no microwave, disgusting odor, time lost doing basic housekeeping, time wasted trying to get them to do their jobs, etc.), and then there is the intangible damage they did to our vacation, which they place no value on.

The reality is that beyond the wasted time and inconvenience, we were able to use less than half of this unit. The living room was largely unusable due to the lack of a place to sit, the deck was at least a quarter unusable as it held the stinky couch cushions, and there was no laundry, no cooking, and no towels.

I guess it is only fair to share the good stuff too. While dealing with the vomit stench, cleaning up a massive amount of water, and failing to use the appliances, we had an opportunity to appreciate the location's fantastic view. If you choose to stay here despite what I have shared, rest assured you will have an amazing view of the Gulf and a nice deck to enjoy it from while you're not cleaning up a mess.