The True Essence of Secure by Design

When we discuss “secure by design,” we often focus on capabilities, features, and defaults—such as logging and monitoring, default-deny, regular updates, authentication, and minimizing by default privileges—rather than focus on the word “design.” Don’t get me wrong; those things are important. But if we have learned anything from the last several decades of evolving security practices, it is that security features and settings alone do not make for secure products.

For instance, the recent CrowdStrike outages and the Microsoft Storm 0558 incident illustrate that design choices have a significant impact on the security and reliability of a system. So what does it mean to have a secure “design” then?

The whole concept of interconnected systems was bolted onto the way computers work. We didn’t even see a TCP/IP stack included by default in an operating system until BSD did this in 1983. Microsoft Windows didn’t get a TCP/IP stack until 1994, around the same time TISFWTK was released, which was a framework to build a firewall, not a firewall itself. For some, these developments may seem like ancient history; however, the methods we use to build and secure software and systems have evolved from this bolt-on approach to security.

“The beginning is the most important part of the work.” — Plato

In my view, a secure design incorporates an understanding of the entire lifecycle of a system. It recognizes that the question is not if it will be hacked, but when. It then employs questions such as “How do you reduce the chances of that happening?” and “How can you make it easy to recover when it does?” to inform the system’s shape.

The Storm case is a great example of insecure design. Loading extremely valuable keys into memory on a front-end machine, in the user context of code parsing user-supplied data, enabled an unauthenticated user on the internet to trigger a memory dump that exposed the key to theft via another attack vector. At a minimum, this key should have been kept in a separate user process, one that operated at a different permission level.

Similarly, the update mechanism for CrowdStrike’s ‘signatures’ functioned like a command and control network for a large array of bots installed on mission-critical systems. The fact that they could, in essence, push code at runtime, at will into the kernel privilege space of their customers means that a malicious actor with access to these systems could push malicious code onto all of those endpoints simultaneously. This insecure design choice put customers in a precarious position, making CrowdStrike’s update system a direct and uncontrollable method for compromising their systems.

How Did We Get Here?

Today, much of the software we depend on still carries the legacy of this bolt-on approach to security. This led the industry to adopt compliance and audit standards similar to the programs used in financial services, such as the Sarbanes-Oxley Act, which paved the way for standards like ISO 27001 and PCI DSS which aspired to manage this risk. However, these measures, like the financial audits that preceded them and missed major frauds such as those at Enron and Wirecard, unsurprisingly also often fail to prevent significant security risks.

This in turn led to a shift towards posture management, where products aimed at risk discovery have proliferated. However, these solutions predominantly focus on identifying potential problems and leave the filtering and remediating of them to IT security, thereby turning into a flurry of risk mitigation tasks rather than identifying the root causes and manifesting into design changes that would fundamentally reduce the risk. These approaches are symptomatic of a broader issue in security—treating the symptoms of poor foundational security rather than addressing the root causes.

Alongside these developments, the last few decades have seen massive growth in the adoption of distributed computing technologies, which first led to MSPs, then SaaS, and now cloud computing. This brought about the concept of shared responsibility, where security is managed as an incremental part of “solving the problem” for the buyer, allowing them to focus on their core business. For many, if not most business problems, this is the right way to go but it also often complicates accountability and obscures the visibility of security controls, sometimes exacerbating the very issues it seeks to mitigate. 

Is It Really That Bleak?

At first glance, this all might seem pretty dire. After all, we do not get to replace all the technology we depend on at once, and even if we could, these new solutions will always come with their own problems. However, we must accept that you can’t effectively retrofit “secure by design” onto an existing system—or if you do, at best it will take decades to replace poorly designed bits and pieces of those existing systems. Even when improvements do occur, as we saw with Microsoft’s Trustworthy Computing initiative, eventually the organization may lose focus and start to regress, leaving us on a rollercoaster of empty promises.

We have seen some organizations take this to heart; a good example is Google. They built their infrastructure and processes on platforms that address foundational issues, allowing the vast majority of their developers to remain largely unaware of the broad spectrum of security challenges that operating services like theirs are subject to. For instance, most Google developers couldn’t tell you much about the infrastructure that runs their code. They build on top of mandated frameworks that “make the right thing the easy thing,” enabling the central security organization to perform core security functions.

A telling example of the design of these systems is how the entire concepts of authentication, authorization, and encryption are abstracted away from the developer, so they largely just happen automatically. To be clear, this approach is not without its faults, and Google’s systems are neither perfect nor suitable for all types of workloads. However, the base principles applied here are fairly universal: Make investments in infrastructure design upfront to ensure that the systems built on them are secure, well, by design.

We are starting to see the industry come around to this reality with the advent of platforms like Kubernetes and infrastructure such as SPIFFE (the Secure Production Identity Framework For Everyone), which are designed to foster environments where security is integrated at the core rather than being an add-on but this is just the beginning of this focus on incorporating security infrastructure into the way we build, not the end.

Conclusion

What is necessary is a cultural shift in how we approach design. For this to happen, customers need to start demanding transparency from vendors about how their systems are designed, assessed, operated, and serviced. Part of this responsibility involves conducting their own assessments and not relying solely on blind faith. The recent recognition on supply chain risks and global scale outages is a direct result of such blind trust.

By demanding vendors to provide a comprehensive narrative on how they’ve built their systems to be secure, rather than merely listing security features, and by engaging in initial and continuous technological due diligence to hold them accountable, we can hasten the transition to a world where security is woven into the fabric of every system, not merely added as an afterthought.

In conclusion, while we are just at the beginning of this journey and will have to continue managing systems that were not designed with transparent “secure by design” principles for decades, the direction is clear. We are moving towards a world where code, as well as users, are authenticated, and automated intelligent systems that utilize expert systems and AI evaluate whether systems are regressing or living up to their promises. If we continue on this path, our children and grandchildren may have the opportunity to live in a world that is inherently more secure, or at the very least, they will be better equipped to manage the security risks associated with the massive expansion of technology that we rely on daily.

Leave a Reply

Your email address will not be published. Required fields are marked *