Monthly Archives: March 2023

Towards Greater Accountability: A Proposal for CA Issuance Decision Logs

It took us a long time, but objectively, Certificate Transparency is a success. We had to make numerous technology tradeoffs to make it something that the CAs would adopt, some of which introduced problems that took even longer to tackle. However, we managed to create an ecosystem that makes the issuance practices of the web transparent and verifiable.

We need to go further, though. One of the next bits of transparency I would like to see is CAs producing logs of what went into their issuance decisions and making this information public. This would be useful in several scenarios. For example, imagine a domain owner who discovers a certificate was issued for their domain that they didn’t directly request. They could look at this log and see an audit history of all the inputs that went into the decision to issue, such as:

  • When was the request for the certificate received?
  • What is the hash of the associated ACME key that made the request?
  • When CAA was checked, what did it say? Was it checked via multiple perspectives? Did all perspectives agree with the contents?
  • When Domain Control was checked, did it pass? Which methods were used? Were multiple perspectives used? Did all perspectives agree with the contents?
  • What time was the pre-certificate published? What CT logs was it published to?
  • What time was the certificate issued?
  • What time was the certificate picked up?

This is just an example list, but hopefully it gives the idea enough shape for the purpose of this post. The CA could publish this information as files in cheap block storage. I imagine a directory structure something like: “/<CA CERTHASH>/<SUBJECT CERTHASH>/log”
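To make the shape of such a log a little more concrete, here is a minimal sketch of one log entry and the path it would be published under. The field names, the use of SHA-256 over the DER-encoded certificates for the two hashes, and the JSON encoding are all assumptions made for illustration; nothing here is a proposed standard.

```python
import hashlib
import json
from dataclasses import dataclass, asdict, field


@dataclass
class IssuanceDecisionLog:
    # Hypothetical field names that mirror the example questions in the list.
    request_received_at: str
    acme_key_hash: str
    caa_result: str
    caa_perspectives_agreed: bool
    dcv_method: str
    dcv_passed: bool
    dcv_perspectives_agreed: bool
    precert_published_at: str
    ct_logs: list = field(default_factory=list)
    issued_at: str = ""
    picked_up_at: str = ""

    def to_json(self) -> str:
        # Sorted keys keep the serialized form stable, which matters if it is hashed later.
        return json.dumps(asdict(self), sort_keys=True)


def cert_hash(der_bytes: bytes) -> str:
    # Assumption: a certificate is identified by the SHA-256 of its DER encoding.
    return hashlib.sha256(der_bytes).hexdigest()


def log_path(ca_cert_der: bytes, subject_cert_der: bytes) -> str:
    # Builds "/<CA CERTHASH>/<SUBJECT CERTHASH>/log" as described in the text.
    return f"/{cert_hash(ca_cert_der)}/{cert_hash(subject_cert_der)}/log"
```

A domain owner holding the certificate and its issuer could compute the same path locally and fetch the log without asking the CA for anything else.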

The log itself could be a merkle tree of these values, and at the root of the directory structure, there could be a merkle tree of all the associated logs. Though the verifiability would not necessarily be relied upon initially, doing the logging in this fashion would make it possible for these logs to be tamper-evident over time with the addition of monitors.
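As a rough illustration of the tamper-evidence property, the sketch below computes a Merkle root over a list of serialized log values, in the style of the tree hash used by Certificate Transparency (RFC 6962): leaves and interior nodes are hashed with different prefixes, and the list is split at the largest power of two smaller than its length. Choosing this particular construction is my assumption; any well-defined Merkle tree would serve the same purpose.

```python
import hashlib


def merkle_root(leaves: list) -> bytes:
    """Return the root hash over a list of byte strings."""
    if not leaves:
        # Hash of the empty tree.
        return hashlib.sha256(b"").digest()
    if len(leaves) == 1:
        # Leaf hashes are prefixed with 0x00.
        return hashlib.sha256(b"\x00" + leaves[0]).digest()
    # Split at the largest power of two strictly smaller than len(leaves).
    k = 1
    while k * 2 < len(leaves):
        k *= 2
    left = merkle_root(leaves[:k])
    right = merkle_root(leaves[k:])
    # Interior node hashes are prefixed with 0x01.
    return hashlib.sha256(b"\x01" + left + right).digest()
```

Changing any single value changes the root, so a monitor that has recorded an earlier root can detect a log that was rewritten after the fact.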

The idea is that these logs could be issued asynchronously, signed with software-backed keys, and produced in batches, which would make them very inexpensive to produce. Not only would these logs help the domain owner, but they would also help researchers who try to understand the WebPKI, and ultimately, it could help the root programs better manage the CA ecosystem.

This would go a long way to improving the transparency into CA operations and I hope we see this pattern or something similar to it adopted sooner rather than later.

Exploring the Potential of Domain Control Notaries for MPDV in WebPKI

In an earlier post on the Role of Multiple Perspective Domain Control Validation (MPDV) in the WebPKI, I discussed how there was an opportunity for CAs to work together to reduce the cost of meeting the upcoming requirements while also improving the security of the ultimate solution.

In this post, I want to talk a little about an idea I have been discussing for a while: specifically, Domain Control Notaries.

Before I explain the idea, let’s first look at how domain control verification happens. In short, the CA generates a random number and asks the certificate requestor to place that number in a location that only an administrator can access, for example, in a DNS record, in part of the TLS exchange, or in a well-known location.

If the CA is able to fetch the number, and the underlying network is living up to its promises, it can have confidence that the requestor is likely authorized for the given domain.
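The check described above is small enough to sketch in a few lines. In this sketch, `fetch` is a hypothetical callable standing in for whatever retrieval the chosen method needs (a DNS lookup, a TLS handshake, or an HTTP request to a well-known location); it is not part of any real library.

```python
import hmac
import secrets


def new_challenge() -> str:
    # The "random number" the CA hands to the requestor.
    return secrets.token_urlsafe(32)


def check_challenge(expected: str, fetch) -> bool:
    """Return True if the value observed at the agreed location matches."""
    try:
        observed = fetch()
    except Exception:
        # Any retrieval failure counts as a failed check.
        return False
    if observed is None:
        return False
    # Constant-time comparison of the expected and observed values.
    return hmac.compare_digest(expected.encode(), observed.strip().encode())
```

For example, `check_challenge(token, lambda: token)` succeeds, while a fetcher that returns a different value or raises an error fails the check.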

To understand Domain Control Notaries you will also need to have a basic understanding of what MPDV means. Thankfully, that’s easy: do that from multiple network locations and require a quorum policy to be met. This basically means that an attacker would have to trick enough of the quorum participants to bypass the quorum policy.
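A quorum policy can be as simple as "at least N perspectives must agree". The sketch below assumes that simplest form; real policies may be richer (for example, tolerating only a fixed number of failures), and the perspective names are made up.

```python
def quorum_met(results: dict, required: int) -> bool:
    """results maps a perspective name to whether its check passed."""
    passed = sum(1 for ok in results.values() if ok)
    return passed >= required


# Hypothetical perspectives; one of them saw a different answer.
observations = {"perspective-a": True, "perspective-b": True, "perspective-c": False}
decision = quorum_met(observations, required=2)
```

With two of three perspectives required, the single disagreeing perspective does not block issuance, but an attacker would have to fool at least two of them to get a certificate.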

So that takes us back to the idea of Domain Control Notaries. As you can see, Domain Control Verification is super simple. That means you could put it on a very small computer, and it would be able to perform this simple task. For example, imagine a USB Armory that was fused at manufacturing time with firmware that only did these domain control checks. Imagine that this hardware also had a cryptographically unique key, derived from the hardware, fused to the device at manufacturing time.

Now imagine an aggregator handling admissions control for a network of these Domain Control Notaries. This aggregator would only allow devices that were manufactured to meet these basic requirements. It would enforce this because the manufacturer would publish a list of the public keys of each of the devices to the aggregator, which would use it for admission control.

This aggregator would expose a simple REST API that would let the requestor specify some basic policy on how many of these Domain Control Notaries to broadcast their request to, what domain control methods to use, and a callback URL to be used by the aggregator when the verification is complete.
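To give a feel for how small such an API could be, here is a sketch of the request body a CA might send. Every field name is hypothetical, and the method names are only example values borrowed from ACME challenge types; no such API exists today.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class VerificationRequest:
    # All field names below are assumptions made for illustration.
    domain: str
    methods: list        # which domain control methods to use, e.g. ["dns-01"]
    notary_count: int    # how many notaries to broadcast the request to
    quorum: int          # how many of them must agree
    callback_url: str    # where the aggregator reports the result

    def to_json(self) -> str:
        if not 1 <= self.quorum <= self.notary_count:
            raise ValueError("quorum must be between 1 and notary_count")
        return json.dumps(asdict(self), sort_keys=True)


request_body = VerificationRequest(
    domain="example.com",
    methods=["dns-01"],
    notary_count=5,
    quorum=4,
    callback_url="https://ca.example/callback",
).to_json()
```

The callback keeps the API asynchronous, so the aggregator does not need to hold a connection open while slow or distant notaries respond.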

These aggregators would only accept responses from the Domain Control Notaries that signed their responses and whose keys were on this authorized list and were not added to their deny lists.
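The admission rule above amounts to three checks: the key is on the manufacturer's list, the key is not on a deny list, and the signature verifies. The sketch below leaves the signature scheme open by taking a hypothetical `verify_signature` callable, since the post does not specify one; the response field names are likewise assumptions.

```python
def accept_response(response: dict, allowed_keys: set, denied_keys: set, verify_signature) -> bool:
    """Decide whether the aggregator should count a notary's response.

    response is assumed to carry "public_key", "payload", and "signature".
    verify_signature(public_key, payload, signature) is a placeholder for
    whatever signature scheme the devices actually use.
    """
    key = response["public_key"]
    # The key must have been published by the manufacturer.
    if key not in allowed_keys:
        return False
    # The key must not have been revoked by the aggregator.
    if key in denied_keys:
        return False
    # Only then is the signature over the payload checked.
    return bool(verify_signature(key, response["payload"], response["signature"]))
```

Checking the lists before the signature keeps the cheap rejections first, which matters little at this scale but makes the policy easy to read.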

This basic framework sets you up with a network of very cheap network endpoints that can be used to perform domain control verification. You could even have a few of these aggregators each with its own Domain Control Notaries. CAs could use multiple of these aggregator networks to reduce centralization risk.

You might be asking yourself how these tiny computers could deal with the scale and performance of this task! The reality is that in the grand scheme of things, the WebPKI is relatively small! It is responsible for only 257,035 certificates every hour. The real number is actually smaller than this too, because that figure includes some pre-certificates and, in the context of domain control verification, CAs are also allowed to re-use past validations if they are recent enough. This means we should be able to use this as a worst-case number safely. Simply put, that is only about 71.4 certificates every second. That is tiny. If you spread that out over a few hundred Domain Control Notaries, the number of transactions gets that much smaller.
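The back-of-the-envelope arithmetic is easy to reproduce. The hourly figure comes from the text and works out to roughly 71.4 certificates per second; the notary count and the number of notaries each request is sent to are assumptions I picked only to make the per-device load concrete.

```python
CERTS_PER_HOUR = 257_035   # figure quoted in the text
SECONDS_PER_HOUR = 3600
ASSUMED_NOTARIES = 300     # assumption: "a few hundred" notaries
ASSUMED_FANOUT = 5         # assumption: each request goes to 5 notaries


def per_second(per_hour: float) -> float:
    return per_hour / SECONDS_PER_HOUR


# Worst-case issuance rate across the whole WebPKI.
network_rate = per_second(CERTS_PER_HOUR)

# Average checks per second that a single notary would see.
per_notary_rate = network_rate * ASSUMED_FANOUT / ASSUMED_NOTARIES

print(f"network: {network_rate:.1f} certificates per second")
print(f"per notary: {per_notary_rate:.2f} checks per second")
```

Even with every request fanned out to several notaries, each device sees on the order of one check per second under these assumptions.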

Then there is the question of who would run these Domain Control Notaries? A lesson learned from Certificate Transparency is that if you make it easy and cheap to participate and make it easy to both come and go, most organizations are willing to help. You’re basically asking an operator to provide a few lightbulbs of electricity and a tiny amount of network connectivity. This is easy for most organizations to sign up for since there is no tax in turning it down, and no impact if there is an outage.

Remember how I said there could be multiple aggregators? These aggregators could also expose more expensive heavy-weight nodes that were not reliant on Domain Control Notaries to provide a more reliable substrate to power such a network.

That’s Domain Control Notaries. Maybe they can be a tool to help us get to this world of MPDV everywhere.

Strengthening Domain Control Verification: The Role of Multiple Perspectives and Collaboration

The security and stability of encryption on the web rely on robust domain control verification. Certificate Authorities in the WebPKI are increasingly facing attacks that exploit weaknesses in the Border Gateway Protocol (BGP) ecosystem to acquire certificates for domains they do not legitimately control.

While Resource Public Key Infrastructure (RPKI) has the potential to mitigate these threats, several structural barriers have slowed its adoption.

In response, larger, more security-minded CAs have started embracing the concept of Multiple Perspective Domain Control Verification (MPDV) to enhance their defenses. The fundamental idea of MPDV is that before issuing a certificate, the CA will require numerous network perspectives to agree that the domain control verification criteria have been met.

Researchers at Princeton University have played a significant role in this journey in various ways, including raising awareness about the issue, evaluating the effectiveness of different MPDV implementations, and helping determine efficient quorum policies.

This combination has led to Google Chrome signaling an intention to require MPDV from all CAs. This indicates that there is enough data to demonstrate this is both valuable and doable and I agree with this conclusion.

This new requirement will have several consequences, because implementing a competent MPDV solution is more difficult than it appears on the surface. For instance, these network perspectives need to be located in different networks for this to be an effective tool to mitigate these risks. One of the most expensive aspects of operating a transactional service is managing the environment in which the service runs. This means that if CAs distribute the entire MPDV checking process to alternative network perspectives, they will need to manage multiple such environments. The cost and complexity of this go up as the number of perspectives grows.

This should not be a problem for the largest CAs, and since the top 10 CAs by issuance volume account for 99.58% of all WebPKI certificates, achieving broad coverage of the web only requires a few companies to make these investments and they should be able to assume those costs. But what about the smaller CAs?

These smaller, regional CAs are often focused on language-specific support in the markets they operate in, assisting with local certificate-related product offerings such as document signing or identity certificates, and adhering to regional regulations. These are much smaller markets and leave them with far fewer resources and skills to tackle problems like this. The larger CAs, on the other hand, will also end up duplicating much of the same infrastructure as they work toward meeting these requirements.

This suggests there is an opportunity for CAs to collaborate in building a shared network of perspectives. By working together, CAs can pool resources to create a more diverse network of perspectives. This can help them meet the new requirements more efficiently and effectively, while also strengthening the internet’s overall security.

Key Management and preparing for the Crypto Apocalypse

Today, keeping sensitive information secure is more critical than ever. Although I’m not overly concerned about the looming threat of quantum computers breaking cryptography, I do worry about our approach to key management, which significantly impacts how we will ultimately migrate to new algorithms if necessary.

Traditional key management systems are often just simple encrypted key-value stores with access controls that release keys to applications. Although this approach has served us well in the past, moving away from bearer tokens to asymmetric key-based authentication and, ultimately, the era of post-quantum cryptography (PQC) demands a different approach.

Why am I concerned about bearer tokens? Well, the idea of a long-lived value that is passed around, allowing anyone who sees the token to impersonate its subject, is inherently risky. While there are ways to mitigate this risk to some extent, most of the time, these tokens are poorly managed or, worse, never changed at all. It’s much easier to protect a key that no one sees than one that many entities see.

The old key-value approach was designed around this paradigm, and although some systems have crude capabilities that work with asymmetric keys, they leave much to be desired. If we want seamless, downtime-free rollover of cryptographic keys across complex systems, we need a model that keeps keys isolated even from the entities that use them. This change will significantly improve key rollover.

This shift will require protocol-level integration into the key management layer, but once it’s done in a reusable way, keys can be changed regularly. A nice side effect of this transition is that these components will provide the integration points allowing us to move to new algorithms in the future.
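As a toy illustration of what "keys isolated even from the entities that use them" could look like, the sketch below exposes only sign, verify, rotate, and retire operations; callers never see key material, so a rollover needs no change on their side. To stay within the standard library it uses HMAC as a stand-in, which is a symmetric construction; a real system of the kind described here would hold asymmetric private keys, ideally in hardware. The class and method names are hypothetical.

```python
import hashlib
import hmac
import secrets


class KeyService:
    """Holds keys privately and performs operations on behalf of callers."""

    def __init__(self):
        self._keys = {}        # key_id -> secret bytes, never returned to callers
        self._current = None
        self.rotate()

    def rotate(self) -> str:
        # Create a new key and make it current; older keys remain for verification.
        key_id = secrets.token_hex(8)
        self._keys[key_id] = secrets.token_bytes(32)
        self._current = key_id
        return key_id

    def sign(self, message: bytes):
        # Callers receive the key id and the tag, not the key.
        key = self._keys[self._current]
        return self._current, hmac.new(key, message, hashlib.sha256).digest()

    def verify(self, key_id: str, message: bytes, tag: bytes) -> bool:
        key = self._keys.get(key_id)
        if key is None:
            return False
        expected = hmac.new(key, message, hashlib.sha256).digest()
        return hmac.compare_digest(expected, tag)

    def retire(self, key_id: str) -> None:
        if key_id == self._current:
            raise ValueError("cannot retire the current key; rotate first")
        self._keys.pop(key_id, None)
```

Because the key id travels with each signature, the service can rotate as often as it likes: existing signatures keep verifying until the old key is retired, and swapping in a different algorithm later only touches this one component.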

What does this have to do with PQC? Unlike the shift from DES to AES or RSA to ECC, the post-quantum algorithms available now are substantially larger and slower than their predecessors, meaning the gradual migration from the once state-of-the-art to the new state-of-the-art won’t start until it absolutely has to. Instead, the migration to PQC starts by changing the way we build systems, specifically in how we architect key rollover and the lifecycle of keys. I personally believe the near-term impetus for this change will be the deprecation of bearer tokens.

Seamless and automated rollover of keys is crucial for making systems secure, even if the post-quantum apocalypse never happens.

I also think we will see PQC readiness in credentialing systems. For example, we may see ACME clients support enrolling for PQC certificates at the same time as they enroll for their ECC certificates, or perhaps support the (more bloated) hybrid certificates.

In conclusion, rethinking our key management approach is increasingly important. So far, I have not seen anyone come to market with what I would call a different approach to key management, and we need one.

The Growing Security Concerns of Modern Firmware and the Need for Change

Today’s firmware is larger and more complex than ever before. In 1981, the IBM PC BIOS was a mere 8 KB, but now UEFI, even without considering machines with BMCs, can be 32 MB or even larger! To illustrate the magnitude of the problem, Intel will soon release its next-generation SoCs with support for 128 MB of firmware!

Essentially, UEFI has become a real-time OS with over 6 million lines of code and is still growing. This is larger than most modern operating systems. Furthermore, the various boot phases and hardware layers create significant complexity for defenders. Increased surface area leads to more vulnerabilities.

The most impactful and difficult-to-patch vulnerabilities reside in the underbelly of technology. Firmware, file systems, BGP, and other foundational aspects of technology that we often take for granted are becoming more vulnerable to attacks. It’s time to prioritize security for the very foundation of our tech. Benjamin Franklin once said, “A failure to plan is a plan to fail.” This adage often applies to long-term vulnerabilities in technology. Insufficient planning can lead to an inability to detect issues, inadequate data to assess their true severity, and a lack of responsiveness.

Firmware serves as a prime example. Many firmware-level issues remain unpatched because firmware often lacks the measurement and patching middleware we expect from software. Moreover, hardware vendors frequently behave as if their job is complete once they release a patch. Imagine if, in 2023, software vendors merely dropped a patched piece of software into a barely discoverable HTTP-accessible folder and proclaimed, “Thank goodness we’ve done our part.” This scenario largely reflects the current state of firmware.

One reason for this situation is that the problem on the surface appears intractable. A typical PC may house dozens of firmware components, with no inventory of what exists. This firmware often originates from multiple vendors and may include outdated chips that have not been updated.

Another fitting saying is, “You can’t manage what you can’t measure.” Combine this with the exponential growth of firmware surface area and the increasing number of internet-connected devices containing firmware, and you have a massive security issue arising from decades of neglect.
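Measurement does not have to start out sophisticated. The sketch below hashes each firmware image found on a device and compares the result with an expected inventory, reporting anything unknown, changed, or missing. How the images are read off the hardware is the hard part and is not shown; the component names and the idea of a vendor-published expected inventory are assumptions for illustration.

```python
import hashlib


def measure(components: dict) -> dict:
    """Map each component name to the SHA-256 of its firmware image bytes."""
    return {name: hashlib.sha256(blob).hexdigest() for name, blob in components.items()}


def diff_inventory(measured: dict, expected: dict) -> dict:
    """Compare measured hashes against an expected inventory."""
    return {
        # Present on the device but not in the inventory.
        "unknown": sorted(n for n in measured if n not in expected),
        # Present in both, but the image differs from what was expected.
        "changed": sorted(n for n in measured if n in expected and measured[n] != expected[n]),
        # Listed in the inventory but not found on the device.
        "missing": sorted(n for n in expected if n not in measured),
    }
```

Even a report this crude turns "no inventory of what exists" into a list that can be tracked and monitored over time.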

There is no silver bullet here. One aspect to address is the way firmware is built. USB Armory aims to solve this by making firmware memory safe, easy to read, and with minimal dependencies. While this is a positive step, it is not sufficient on its own. Binarly.io has created the best automation available for detecting firmware issues, which is invaluable considering that old approaches will persist for decades.

To drive change, we need better measurement and widespread adoption of automatic update mechanisms for firmware of all sizes. These mechanisms must be safe, reliable, and robust. Misaligned incentives contribute to the problem, often resulting from a lack of accountability and transparency. This is why I dedicated as much time as I could to binary.transparency.dev while at Google.

The momentum around software supply chain security is essential, as it sheds some light on the problem, but alone it is not enough to bring about the necessary change. If you create a chip with firmware that has a vulnerability, your responsibility does not end with shipping a patch. If you ship devices without providing a way to seamlessly patch all firmware, you are failing.

Relying on the next hardware refresh cycle to update firmware on your devices in the field is insufficient. With cloud adoption, refresh cycles might even lengthen. A long-term strategy to support your devices is necessary; not doing so externalizes the consequences of your inaction on society.

If you have devices in the field that are in use, and you don’t have a confident inventory of the dependencies that exist in them, and you’re not monitoring those dependencies and the firmware itself for issues, you are part of the problem, externalizing consequences on society.

We can do better.

To improve firmware security, the industry must collaborate and adopt best practices. This includes embracing transparency, robust patch management systems, and long-term support strategies for devices. By working together to address these challenges, we can build a more secure foundation for the technology that underpins our modern world.

In conclusion, it’s crucial that we prioritize firmware security, as it plays a fundamental role in the safety and reliability of our devices and systems. By implementing more effective measurement, automatic update mechanisms, and long-term support strategies, we can reduce the risks associated with outdated and vulnerable firmware. This will help create a safer digital environment for everyone.

P.S. Thanks to @matrosov and @zaolin for their insights on the problem on Twitter.