Despite today’s widespread use of open-source software, most software is still delivered in binary form. This includes everything from the foundational firmware of our computers to the applications we use for work, extending all the way to the containers running our server software in the cloud.
A significant challenge arises when even if the source code of the software is available, reproducing the exact binary from it is often impossible. Consequently, companies and users are essentially operating on blind faith regarding any qualitative or quantitative assurances received from software suppliers. This stark reality played a critical role in the rapid and broad spread of the SolarWinds incident across the industry.
The SolarWinds Wake-Up Call
The SolarWinds attack underscored the risks inherent in placing our trust in software systems. In this incident, attackers infiltrated build systems, embedding malware into the legitimate SolarWinds software. Customers updating to the latest software version unwittingly became victims in this attack chain. It’s crucial to acknowledge that targeting a software supply chain for widespread distribution is not a new tactic. Ken Thompson, in his 1984 Turing Award Lecture, famously stated, “No amount of source-level verification or scrutiny will protect you from using untrusted code.” Regrettably, our approaches to this challenge haven’t significantly evolved since then.
Progress in the domain of supply chain security was initially slow. In 1996, Microsoft began promoting the concept of code signing with its Authenticode support, allowing customers to verify that their software hadn’t been altered post-distribution. Subsequently, the open-source movement gained traction, particularly following the release of Netscape Navigator’s source code. Over the next two decades, the adoption of open source, and to a lesser extent, code signing increased. The use of interpreted languages aided in understanding software operations, but as software grew in size and complexity, the demand for software engineers began to outstrip the supply. The adage “Given enough eyeballs, all bugs are shallow” suggests that greater openness can enhance security, yet the industry has struggled to develop a talent pool and incentive models robust enough to leverage source code availability effectively.
Before the SolarWinds incident, the industry, apart from some security engineers advocating for practices like reproducible builds, memory-safe languages, and interpreted languages, largely overlooked the topic of supply chain security. Notable initiatives like Google’s work on Binary Transparency, which predates SolarWinds, began to create an environment for broader adoption of code signing-like technologies with efforts like Go SumDB, SigStore, and Android’s Binary Transparency (each of which I had the opportunity to contribute to). However, even these solutions don’t fully address the challenge of understanding the issues within a binary, a problem that remains at the forefront of security.
The industry’s response to SolarWinds also included embracing the concept of Software Bill of Materials (SBOM). These artifacts, envisioned to be produced by the build system, document the, often third-party, components used in software. However, this approach faces challenges, such as the possibility of attackers manipulating SBOMs if they compromise the build system.
The complexity of compiled software adds another layer of difficulty. Each compiled dependency has its own dependencies, not all of which are publicly declared, as is the case with static dependencies. When software is compiled, only portions of the dependencies that are used get included, potentially incorporating multiple versions of a single dependency into the final binary. This complexity makes simple statements about software components, like “I use OpenSSL 1.0,” inaccurate for even moderately complex code. Moreover, the information derived from SBOMs is often not actionable. Without access to all sources or the ability to build binaries independently, users are left with CVE lists that provide more noise than actionable insight.
To make things worse compilers, through the optimization of builds can even remove security fixes that developers carefully put in to mitigate known issues, for example, freeing memory to keep keys cryptographic keys and passwords from getting paged to disk.
The Critical Role of Binary Analysis
If all we have is a binary, the only way to understand the risks it represents is to analyze it in the same way an attacker would. However, doing this at scale and making the analysis actionable is challenging. Recent advancements in machine learning and language development are key to addressing this challenge.
Currently, tools that operate on binaries alone fall into two categories. The first are solutions akin to 1990s antivirus programs – matching binaries to known issues. The second category helps skilled professionals reverse engineer the binary’s contents more quickly.
Both categories have struggled to keep pace with the rapid changes in software over the past few decades. A new category of tools is emerging, led by companies like Binarly, which I advise. Binarly’s approach to automated binary analysis began with key goals such as achieving processor architecture independence and language independence. This enables the analysis of binaries across different architectures without duplicating threat intelligence and identifying insecure patterns stemming from ported code or common insecure Stack Overflow examples. Identifying static dependencies and which parts of them are used in a binary is both challenging and crucial for understanding the security issues that lie beneath the surface.
Their approach is remarkable in its ability to detect “known unknowns,” enabling the identification of classes of security vulnerabilities within a binary alone. Furthermore, through symbolic execution, they can perform reachability analysis, ensuring that flagged issues are not just theoretical but can potentially be exploited by attackers.
Though their approaches are not firmware-specific, Firmware is a great example of the problems that come from binary-only distributions and customers’ reliance on blind faith that their vendors are making the right security investments. It is their unique approach to binary analysis that has enabled them to file and report more CVEs in the last two years than have ever been reported before.
Binary analysis of this kind is crucial as it scrutinizes software in its final, executable form—the form in which attackers interact with it.
Conclusion
The lesson from the SolarWinds attack is clear: no build system-based approach to articulate dependencies is entirely secure. Ken Thompson’s 1984 assertion about the limitations of trusting any code you didn’t produce yourself remains relevant. In a world where software vulnerabilities have extensive and far-reaching impacts, binary analysis is indispensable. Binarly’s approach represents a paradigm shift in how we secure software, offering a more robust and comprehensive solution in our increasingly connected world.