Category Archives: Security

Speeding Up Development and Navigating Security Risks

In software development, time is often of the essence. Developers are constantly pushed to deliver faster and more efficiently. Tools like GitHub Copilot have emerged, promising to accelerate coding tasks significantly. A study by GitHub found that developers using Copilot completed tasks up to 55% faster compared to those who didn’t use it [1]. However, as with speeding on the road, increased velocity in coding can lead to more significant risks and potential accidents.

The Need for Speed and Its Consequences

When driving at high speeds, the time available to react to unexpected events diminishes. Similarly, when coding tasks are completed rapidly with the help of AI copilots, developers have less time to thoroughly review and understand the code. This accelerated pace can lead to over-reliance on AI suggestions and decreased familiarity with the codebase. This lack of deep understanding might obscure potential issues that could be caught with a slower, more deliberate approach.

Narrowing the Focus

At higher speeds, drivers experience “tunnel vision,” where their field of view narrows, making it harder to perceive hazards. In coding, this translates to a reduced ability to catch subtle issues or security vulnerabilities introduced by AI-generated suggestions. A study by New York University highlighted this risk, finding that nearly 40% of code suggestions by AI tools like Copilot contained security vulnerabilities [2].

For a very basic example of how this could play out, consider an authentication module built from Copilot’s suggestions that skips input sanitization, opening the door to vulnerabilities like SQL injection or cross-site scripting (XSS). This omission might go unnoticed until the feature is in production, where it could lead to a security breach.
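
To make the SQL injection case concrete, here is a minimal Go sketch; the table, column names, and query are illustrative assumptions rather than anything from a real codebase. It contrasts the string-concatenated query an AI assistant can easily suggest with the parameterized version a careful review should insist on.

```go
package main

import (
	"database/sql"
	"fmt"
)

// lookupUserUnsafe builds the query by string concatenation, a pattern an AI
// suggestion can easily produce; any quote character in username becomes SQL,
// which is exactly how SQL injection happens.
func lookupUserUnsafe(db *sql.DB, username string) (*sql.Rows, error) {
	query := "SELECT id, password_hash FROM users WHERE username = '" + username + "'"
	return db.Query(query) // vulnerable: data and SQL are mixed together
}

// lookupUser passes the username as a bound parameter, so the driver keeps
// data separate from the SQL statement and injection is not possible here.
func lookupUser(db *sql.DB, username string) (*sql.Rows, error) {
	return db.Query("SELECT id, password_hash FROM users WHERE username = ?", username)
}

func main() {
	// In a real program db would come from sql.Open with a registered driver;
	// the point of the sketch is the difference between the two query styles.
	fmt.Println("compare lookupUserUnsafe (concatenation) with lookupUser (parameterized)")
}
```

A linter or reviewer catches the first pattern easily; the risk is that at Copilot speed it never gets a close look.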

This reduction in situational awareness can be attributed to the lack of contextual knowledge AI copilots have. While they can provide syntactically correct code, they lack a nuanced understanding of the specific application or environment, leading to contextually insecure or inappropriate suggestions.

The Impact of High-Speed Errors

Higher driving speeds result in more severe accidents due to the increased force of impact. In coding, the severity of errors introduced by AI suggestions can be significant, especially if they compromise security. The GitHub study noted improvements in general code quality, as Copilot often follows best practices and common coding patterns [1]. However, the rapid pace and reliance on AI can mean fewer opportunities for developers to learn from their work and catch potential issues, leading to severe consequences in the long run.

Balancing Speed and Safety

To harness the benefits of tools like Copilot while mitigating the risks, developers and teams can adopt several strategies:

  1. Enhanced Code Reviews: As the volume of code increases with the use of AI copilots, rigorous code reviews become even more crucial. Teams should ensure that every piece of code is thoroughly reviewed by experienced developers who can catch issues that the AI might miss.
  2. Integrating Security Tools: Using AI-based code reviewers and linters can help identify common issues and security vulnerabilities. These tools can act as a first line of defense, catching problems early in the development process.
  3. Continuous Learning: Establishing a culture of continuous learning ensures that developers stay updated with the latest security practices and coding standards. This ongoing education helps them adapt to new challenges and integrate best practices into their work.
  4. Practical Update Mechanisms: Implementing reliable and practical update mechanisms ensures that when issues are found, they can be addressed quickly and effectively. This proactive approach minimizes the impact of deployed vulnerabilities.
  5. Balanced Use of AI: Developers should use AI copilots as assistants rather than crutches. By balancing the use of AI with hands-on coding and problem-solving, developers can maintain a high level of familiarity with their codebases.

Conclusion

The advent of AI coding tools like GitHub Copilot offers a significant boost in productivity, enabling developers to code faster than ever before. However, this speed comes with inherent risks, particularly concerning security vulnerabilities and reduced familiarity with the code.

At the same time, we must recognize that attackers can also leverage these sorts of tools to accelerate their malicious activities, making it crucial to integrate these tools ourselves. By implementing robust code review processes, integrating security analysis tools into the build pipeline, and fostering a culture of continuous learning, teams can effectively manage these risks and use AI tools responsibly.

The game is getting real, and now is the time to put the scaffolding in place to manage both the benefits and the risks of this new reality.

Groundhog Day: Learning from Past Key and Credential Compromises

As the saying goes, “Those who cannot remember the past are condemned to repeat it.” Looking back at the last decade, it seems we are caught in our own little Groundhog Day, re-experiencing the consequences of weak authentication and poor key management over and over.

It is not that we don’t know how to mitigate these issues; it’s more that, as organizations, we tend to avoid making uncomfortable changes. For example, the recent spate of incidents involving Snowflake customers being compromised appears to be related to Snowflake not mandating multi-factor authentication or a strong hardware-backed authenticator like Passkeys for its customers.

At the same time, not all of these key and credential thefts are related to users; many involve API keys, signing keys used by services, or code signing keys. These are keys we have been failing to manage in a way appropriate to the risks they represent to our systems and customers.

Timeline of Notable Incidents

Table of Incidents

Incident | Type of Compromise | Description
Stuxnet (2010) | Code Signing Keys | Utilized stolen digital certificates from Realtek and JMicron to authenticate its code, making it appear legitimate.
Adobe (2012) | Code Signing Keys | Attackers compromised a build server and signed malware with valid Adobe certificates.
Target (2013) | Developer Credentials | Network credentials stolen from a third-party HVAC contractor via malware on an employee’s home computer.
Neiman Marcus (2013) | Developer Credentials | Malware installed on systems, possibly through compromised credentials or devices from third-party vendors.
Home Depot (2014) | Developer Credentials | Malware infected point-of-sale systems, initiated by stolen vendor credentials, potentially compromised via phishing.
Equifax (2017) | API Keys | Exploitation of a vulnerability in Equifax’s website application exposed API keys used for authentication.
CCleaner (2017) | Code Signing Keys | Attackers inserted malware into CCleaner’s build process, distributing malicious versions of the software.
Ticketmaster (2018) | Developer Credentials | Malware from a third-party customer support product led to the compromise of payment information.
ASUS Live Update (2018) | Code Signing Keys | Attackers gained access to ASUS’s code signing keys and distributed malware through the update mechanism.
Google Cloud Key Leak (2019) | API Keys | Internal tool inadvertently exposed customer API keys on the internet, leading to potential data exposure.
Facebook Instagram API Breach (2019) | API Keys | Plaintext passwords were accessible to employees due to misuse of an internal API.
DoorDash Data Breach (2019) | API Keys | Unauthorized access to systems included the compromise of API keys, exposing sensitive customer data.
Mission Lane (2020) | Developer Credentials | Malware introduced into the development environment from a developer’s compromised personal device.
BigBasket (2020) | Developer Credentials | Data breach with over 20 million customer records exposed, suspected initial access through compromised developer credentials.
Twitter API Key Compromise (2020) | API Keys | Attackers gained access to internal systems, possibly compromising API keys or administrative credentials.
Amazon AWS Key Exposure (2021) | API Keys | Misconfigured AWS S3 bucket led to the exposure of API keys and sensitive customer data.
Nvidia (2022) | Code Signing Keys | Stolen code signing certificates were used to sign malware, making it appear legitimate.
Microsoft Storm 0558 (2023) | Signing Keys | Attackers gained access to email accounts of government agencies and other organizations by forging authentication tokens using a stolen Microsoft signing key.

When we look at these incidents, we see a few common themes, including:

  1. Repetitive Failures in Security Practices:
    • Despite awareness of the issues, organizations continue to face the same security breaches repeatedly. This suggests a reluctance or failure to implement effective long-term solutions.
  2. Resistance to Necessary Changes:
    • Organizations often avoid making uncomfortable but necessary changes, such as enforcing multi-factor authentication (MFA) or adopting stronger authentication methods like Passkeys. This resistance contributes to ongoing vulnerabilities and compromises.
  3. Diverse Sources of Compromise:
    • Key and credential thefts are not solely due to user actions but also involve API keys, service signing keys, and code signing keys. This highlights the need for comprehensive key management and protection strategies that cover all types of keys.
  4. Broader Key Management Practices:
    • Moving from shared secrets to asymmetric credentials for workloads and machines can significantly enhance security. Asymmetric credentials, especially those backed by hardware, are harder to steal and misuse compared to shared secrets (see the sketch after this list). Techniques like attestation, which is the device equivalent of MFA, can provide additional security by verifying the authenticity of devices and systems.
  5. Third-Party Risks:
    • Even if we deploy the right technologies in our own environment, we often ignore the security practices of our upstream providers. Several incidents involved third-party vendors or services (e.g., Target, Home Depot), highlighting the critical need for comprehensive third-party risk management. Organizations must ensure that their vendors adhere to strong security practices and protocols to mitigate potential risks.
  6. Mismanagement of API and Signing Keys:
    • Incidents such as the Google Cloud Key Leak and Amazon AWS Key Exposure point to misconfigurations and accidental exposures of API keys. Proper configuration management, continuous monitoring, and strict access controls are essential to prevent these exposures.
  7. Importance of Multi-Factor Authentication (MFA):
    • The absence of MFA has been a contributing factor in several breaches. Organizations need to mandate MFA to enhance security and reduce the risk of credential theft.
  8. Need for Secure Code Signing Practices:
    • The use of stolen code-signing certificates in attacks like Stuxnet and Nvidia underscores the importance of securing code-signing keys. Implementing hardware security modules (HSMs) for key storage and signing can help mitigate these risks.
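
To illustrate the shared-secret versus asymmetric-credential point above, here is a minimal Go sketch; the message and secret values are hypothetical. With an HMAC-style shared secret, every verifier holds exactly the material an attacker needs to forge messages, while with an Ed25519 key pair the verifier only ever holds the public key and the private key can stay in an HSM or other hardware.

```go
package main

import (
	"crypto/ed25519"
	"crypto/hmac"
	"crypto/sha256"
	"fmt"
)

// sharedSecretTag authenticates a message with a symmetric secret. Anyone who
// can verify the tag necessarily holds the secret and can mint valid tags too,
// which is why stolen copies of these "API keys" are so damaging.
func sharedSecretTag(secret, msg []byte) []byte {
	m := hmac.New(sha256.New, secret)
	m.Write(msg)
	return m.Sum(nil)
}

func main() {
	msg := []byte("deploy build 1234")

	// Shared-secret model: the same value lives on both sides.
	secret := []byte("copied-into-many-config-files")
	tag := sharedSecretTag(secret, msg)
	fmt.Println("hmac ok:", hmac.Equal(tag, sharedSecretTag(secret, msg)))

	// Asymmetric model: only the signer holds priv (ideally inside an HSM);
	// verifiers need just pub, so theft on the verifier side reveals nothing
	// that lets an attacker forge new signatures.
	pub, priv, _ := ed25519.GenerateKey(nil)
	sig := ed25519.Sign(priv, msg)
	fmt.Println("signature ok:", ed25519.Verify(pub, msg, sig))
}
```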

Conclusion

Looking back at the past decade of key and credential compromises, it’s clear that we’re stuck in a loop, facing the same issues repeatedly. It’s not about knowing what to do—we know the solutions—but about taking action and making sometimes uncomfortable changes.

Organizations need to step up and embrace multi-factor authentication, adopt strong hardware-backed authenticators like Passkeys, and move workloads from shared secrets to hardware-backed asymmetric credentials for better security. We also can’t overlook the importance of managing third-party risks and ensuring proper configuration and monitoring to protect our API and signing keys.

Breaking free from this cycle means committing to these changes. By learning from past mistakes and taking a proactive approach to key and credential management, we can better protect our systems and customers. It’s time to move forward.

Integrating Security: Making Safe Software Development Seamless and Productive

As software progresses from the developer’s machine to staging and finally to production, it undergoes significant changes. Each environment presents unique challenges, and transitions between these stages often introduce security weaknesses. By integrating security practices early in the development process, we bridge these gaps and help ensure we deliver a secure product. If done right, we can also improve developer productivity.

The Journey of Software Through Different Stages

In the development stage, the focus is primarily on writing code and quickly iterating on features. Security is often an afterthought, and practices like deployments, hardening, key management, auditing, and authentication between components are usually minimal or nonexistent. As the software moves to staging, it encounters a more production-like environment, revealing a new set of problems. Dependencies, network configurations, and integrations that worked on the developer’s machine may fail or expose vulnerabilities in staging.

When the software finally reaches production, the stakes are higher. It must handle real-world traffic, maintain uptime, and protect sensitive data. Here, the absence of strong security measures overlooked in earlier stages can lead to significant issues. Weak authentication, lack of auditing, and insecure configurations become glaring problems that can result in breaches and compromises.

The Role of Security in the Software Lifecycle

Security compromises often occur in the gaps between development, staging, and production. These gaps exist because each stage is treated as an isolated entity, leading to inconsistencies in security practices. To secure the software supply chain, we must address these gaps from the very beginning.

Integrating security practices early in the development process ensures that security is not an afterthought but a core component of software development. Here’s how we can achieve this, in no particular order:

  1. Start from the beginning with a Threat Model
    The right place to start addressing this problem is with a threat model, involving everyone in its development so the lessons from its creation are factored into the product’s manifestation. This model also needs to continuously evolve and adapt as the project progresses. A well-defined threat model helps identify potential security risks and guides the implementation of appropriate security controls and design elements throughout software development.
  2. Early Implementation of Authentication and Authorization
    From the first line of code, understand how you will implement authentication and authorization mechanisms for both users and workloads. A good example of how to do this for workloads is SPIFFE, which provides a robust framework for securely identifying and authenticating workloads. SPIRL makes deploying and adopting SPIFFE easy, laying the groundwork for unifying authentication and authorization across various stages of development. Ensure that every component, service, and user interaction is authenticated and authorized appropriately (a minimal sketch of what this looks like for workloads follows this list). This practice prevents unauthorized access and reduces the risk of security breaches.
  3. Continuous Auditing and Monitoring
    Integrate auditing and monitoring tools into the development pipeline and runtime environments. By continuously monitoring code changes, dependencies, configurations, and runtime behavior, we can detect and address vulnerabilities early. Additionally, automated analysis tools that integrate into CI/CD pipelines, like those from Binarly, can perform security checks to ensure that vulnerabilities and back doors didn’t sneak their way in during the build. Synthetic load testing modeled after real use cases can help detect outages and abnormal behavior early, ensuring that issues are identified before reaching production, and can serve as useful production monitoring automation. 
  4. Automate Security Infrastructure
    Incorporate automated security infrastructure to offload security tasks from developers. Middleware frameworks, such as service meshes and security orchestration platforms, provide standard interfaces for security operations. For example, using a service mesh like Istio can automatically handle authentication, authorization, and encryption between microservices without requiring developers to implement these security features manually. This allows developers to focus on shipping code while ensuring consistent security practices are maintained. These frameworks enable faster deployment into production, ensuring that security checks are embedded and consistent across all stages.
  5. Implement Infrastructure as Code (IaC)
    Implement Infrastructure as Code (IaC) to ensure consistent configuration and reduce human error. By automating infrastructure management, IaC not only enhances the reliability and security of your systems but also significantly improves your ability to recover from incidents and outages when they occur. With IaC, you can quickly replicate environments, roll back changes, and restore services, minimizing downtime and reducing the impact of any disruptions.
  6. Build for Continuous Compliance
    Pair IaC with continuous compliance automation to create artifacts that demonstrate adherence to security policies, reducing the compliance burden over time. Do this by using verifiable data structures, similar to how Google and others use Trillian, to log all server actions verifiably. These immutable logs create a transparent and auditable trail of activities, improving traceability and compliance.
  7. Minimize Human Access to Production
    Minimize human access to production environments to reduce the risk of errors and unauthorized actions. For example, use a privileged access management system to implement access on demand, where temporary access is granted to production systems or data only when needed and with proper approvals. This ensures that developers can debug and maintain high availability without permanent access, enhancing security while allowing necessary interventions.
  8. Implement Code Signing and Supply Chain Monitoring
    Implement code signing to enable the verification of the authenticity and integrity of the software at every stage. Regularly audit your dependencies to identify and mitigate the risks that they come with. As Ken Thompson famously said, “You can’t trust code that you did not totally create yourself.” This highlights the importance of not relying on blind faith when it comes to the security of external code and components. 
  9. Proper Secret Management from Day One
    It’s common for organizations to overlook proper secret management practices from the onset and have to graft them on later. Ensure developers have the tools to adopt proper secret management from day one, emphasizing the use of fully managed credentials rather than treating credentials as secrets. Solutions like SandboxAQ’s Aqtive Guard can help you understand where proper secret management isn’t happening and solve the problem of last-mile key management. Tools like this turn secret management into a tool for risk management, not just sprawl management.
  10. Effective Vulnerability Management
    Vulnerability management is a critical aspect of maintaining a secure software environment. A key component of this is monitoring third-party dependencies, as vulnerabilities in external libraries and frameworks can pose significant risks. Implement tools that continuously scan for vulnerabilities in these dependencies and provide actionable insights without overwhelming your team with noise. By leveraging automated vulnerability management solutions, you can prioritize critical issues and reduce alert fatigue. These tools should integrate seamlessly into your development pipeline, allowing for real-time vulnerability assessments and ensuring that only relevant, high-priority alerts are raised. This approach not only enhances your security posture but also allows your team to focus on meaningful security tasks, rather than being bogged down by an excessive number of alerts.
  11. Shift Left with Developer-Friendly Security Tools
    Equip developers with tools and platforms that seamlessly integrate into their workflows, offering security features without adding friction. User-friendly and non-intrusive security tools increase the likelihood of early and correct adoption by developers. For instance, tools like GitHub’s Dependabot and Snyk help identify and fix vulnerabilities in dependencies, while CodeQL allows for deep code analysis to uncover security issues before they reach production. These tools make security a natural part of the development process, enabling developers to focus on writing code while ensuring robust security practices are maintained.
  12. Consider the Entire Lifecycle of The Offering
    Security is an ongoing process that extends beyond development and deployment. Regularly assess and enhance the security posture of your production environment by implementing frequent security reviews, patch management, and incident response plans. Ensure that security practices evolve alongside the software by utilizing comprehensive security playbooks. These playbooks should offer clear, repeatable steps for handling common security tasks and incidents and be updated regularly to address new threats and best practices. Crucially, feedback loops from incident response and post-mortems are essential for continuous improvement. These loops provide valuable insights into past issues, helping to prevent similar problems in the future and fostering a culture of ongoing enhancement.
  13. Profile Your System and Understand Bottlenecks
    Profile your system and understand where the bottlenecks are. Have plans to both prevent them and to respond when they are encountered. This proactive approach ensures that performance issues are addressed before they impact the user experience and that you have a strategy in place for quick remediation when problems do arise.
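
As a concrete illustration of item 2 above, here is a minimal sketch using the go-spiffe v2 library to stand up an mTLS HTTP server whose clients are authenticated by their SPIFFE IDs via the local Workload API. It assumes a SPIRE agent (or another Workload API implementation) is running on the host; the trust domain, SPIFFE ID, and port are hypothetical placeholders, and a real deployment would add timeouts and error handling.

```go
package main

import (
	"context"
	"log"
	"net/http"

	"github.com/spiffe/go-spiffe/v2/spiffeid"
	"github.com/spiffe/go-spiffe/v2/spiffetls/tlsconfig"
	"github.com/spiffe/go-spiffe/v2/workloadapi"
)

func main() {
	ctx := context.Background()

	// Obtain this workload's X.509 SVID and trust bundle from the local
	// Workload API; certificates are rotated automatically in the background.
	source, err := workloadapi.NewX509Source(ctx)
	if err != nil {
		log.Fatalf("unable to create X509Source: %v", err)
	}
	defer source.Close()

	// Only the billing client (a hypothetical SPIFFE ID) may connect.
	clientID := spiffeid.RequireFromString("spiffe://example.org/billing/client")
	tlsCfg := tlsconfig.MTLSServerConfig(source, source, tlsconfig.AuthorizeID(clientID))

	server := &http.Server{
		Addr:      ":8443",
		TLSConfig: tlsCfg,
		Handler: http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
			w.Write([]byte("hello, authenticated workload\n"))
		}),
	}
	// Certificates come from the SVID source, so no files are passed here.
	log.Fatal(server.ListenAndServeTLS("", ""))
}
```

The notable design point is that no certificate files or shared secrets appear anywhere: the SVID and trust bundle come from the Workload API and rotate without developer involvement.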

Conclusion

Early on, normalizing security practices across all stages of software development may be challenging, but it’s crucial. By consistently applying security measures from development to production, we can bridge the gaps that lead to vulnerabilities. This not only improves security but also enhances developer productivity.

Platforms and middleware that automate security checks, enforce policies, and provide clear visibility into security issues help developers stay productive while maintaining a robust security posture, reducing outages and security incidents. By removing these security concerns from the developer’s workflow, we allow them to concentrate on what they do best: writing code.

Making security the easy and natural choice for developers enhances both security and productivity, leading to more secure and reliable software. While it may be difficult to fully implement these practices initially, developing a model that includes continuous threat analysis and plans to address any gaps will ensure long-term success. In doing so, we make the right thing the easy thing.

Thanks to Amir Omidi for his feedback on this post

From Static to Dynamic: Adapting PKI for Cloud-Native Architectures

When it comes to workload and service credential management, a common misconception is that you can simply reuse your existing Certificate Authority (CA) and Certificate Lifecycle Management (CLM) infrastructure to achieve your desired end-state more quickly. Given that nearly every organization has client and server TLS certificates for devices and web servers and many have made significant investments in managing and issuing these certificates, this notion seems logical. However, the design constraints of these use cases differ significantly. The transition from static to dynamic environments, along with the associated credential governance problems, makes it clear that a purpose-built approach is necessary for managing workload and service credentials.

The Static Nature of Traditional CA and CLM Infrastructure

Traditional CA and CLM infrastructure primarily deals with TLS server certificates, where the identities involved are usually domain names or IP addresses. These identifiers are largely static, pre-defined, and managed out-of-band. As a result, the lifecycle of these certificates, which includes issuance, renewal, and revocation, follows a relatively predictable pattern. These certificates are usually issued for validity periods ranging from 90 days to a year, and the processes and infrastructure are designed to handle these constraints. However, use cases surrounding workload and service credentials have a totally different set of constraints, given the highly dynamic environment and various regulatory frameworks they must adhere to.

The Dynamic Nature of Workloads and Services

Workloads and services operate in a significantly different environment. These identities often come and go as services scale up or down or undergo updates. The sheer scale at which workload and service credentials are issued and managed can be hundreds of times greater than that of traditional client and server TLS certificate use cases.

Unlike the static domain names or IP addresses used in these traditional TLS certificates, the identifiers for workloads and services are dynamically assigned as workloads spin up and down or are updated. Workload and service credentials often have a much shorter lifespan compared to server certificates. They might need to be reissued every few minutes or hours, depending on the nature of the workload and the policies of the environment. This changes the expectations of the availability and scalability of the issuing infrastructure, leading to a need for credentials to be issued as close to the workload as possible to ensure issuance doesn’t become a source of downtime.

These workloads and services are also often deployed in clusters across various geographic locations to ensure fault tolerance and scalability. Unlike server certificates that can rely on a centralized CA, workload and service credentials often need a more distributed approach to minimize latency and meet performance requirements.

Another critical difference lies in the scope of trust. For TLS client and server certificates, the security domain is essentially defined by a root certificate or the corresponding root store. However, in the context of workloads and services, the goal is to achieve least privilege. This often necessitates a much more complicated PKI to deliver on this least-privileged design.

Additionally, federation is often needed in these use cases, and it is usually not possible to standardize on just X.509. For example, you might need to interoperate with a service that works based on OAuth. Even in such cases, you want to manage all of this within one common framework to have unified control and visibility regardless of the technology used for the deployment.

Purpose-Built Solutions for Workload and Service Credential Management

Given the unique challenges of managing workload and service credentials in dynamic, cloud-native environments, existing CA and CLM investments often fall short. Purpose-built solutions, such as those based on SPIFFE (Secure Production Identity Framework for Everyone), tend to offer a more effective approach by providing dynamic, attested identities tailored for these environments.

Dynamic Identifier Assignment

Unlike static identifiers managed by traditional CLM solutions, SPIFFE dynamically assigns identifiers as workloads spin up and down, aligning with the nature of modern cloud environments.

Decentralized Issuance

By issuing credentials as close to the workload as possible, SPIFFE-based solutions reduce latency and align issuance with the availability goals of these deployments, ensuring that credentials are issued and managed efficiently without becoming a source of downtime.

Granular Policy Enforcement

SPIFFE-based solutions enable fine-grained policy enforcement through SPIFFE’s namespacing and security domain concepts. This allows organizations to define and enforce policies at a more granular level, ensuring that workloads only access the necessary resources.

Identity Governance and Federation

SPIFFE-based solutions tend to support extending existing identity governance programs to workload and service credentials while also facilitating seamless and secure access to external services across different trust domains.

Multifactor Authentication

These SPIFFE-based solutions also provide support for attestation, which can be thought of as multi-factor authentication for workloads. This attestation verifies the workload and services’ running state and environment, tying the credentials to those environments and helping minimize the risks of credential theft.

Integration with Other Systems and Protocols

These environments can seldom rely exclusively on just X.509, which is why SPIFFE supports both X.509 and JWT formats of credentials. This flexibility allows seamless integration with various systems and protocols within cloud deployments.
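
Because the X.509 form of these credentials carries the SPIFFE ID as a standard URI SAN, even relying parties without SPIFFE libraries can consume them. Below is a minimal sketch of that extraction using only Go’s standard library; the TLS connection handling around it is assumed rather than shown.

```go
package main

import (
	"crypto/x509"
	"errors"
	"fmt"
	"strings"
)

// workloadIDFromCert pulls the SPIFFE ID out of an X.509-SVID. Per the SPIFFE
// X.509-SVID specification, the ID is carried as a URI SAN with the spiffe://
// scheme, so systems that only speak plain X.509 can still recover it.
func workloadIDFromCert(cert *x509.Certificate) (string, error) {
	for _, u := range cert.URIs {
		if strings.EqualFold(u.Scheme, "spiffe") {
			return u.String(), nil
		}
	}
	return "", errors.New("no SPIFFE ID found in certificate URI SANs")
}

func main() {
	// In a real server this would be the verified peer leaf certificate from
	// tls.ConnectionState; here we only show the intended call shape.
	var peerLeaf *x509.Certificate // placeholder for the peer's leaf certificate
	if peerLeaf == nil {
		fmt.Println("no peer certificate to inspect in this sketch")
		return
	}
	id, err := workloadIDFromCert(peerLeaf)
	fmt.Println(id, err)
}
```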

Conclusion

While it might be tempting to reuse existing CA and CLM infrastructure for workload and service credential management, this approach fails to address the unique challenges posed by dynamic, cloud-native environments. The ephemeral nature of workload identities, the high frequency of credential issuance, and the need for granular trust domains necessitate purpose-built solutions like those based on SPIFFE.

These purpose-built systems are designed to handle the dynamic nature of modern environments, providing strong, attested identities and ensuring that credential management aligns with the evolving needs of cloud-native workloads. Understanding these distinctions is crucial for developing an effective identity management strategy for modern workloads and services without having to go through a reset event down the line.

To learn more about effective machine identity management, check out SPIRL and see how they are leading the way.

Credential Management vs. Secret Management: Choosing the Right Approach

If we examine the contents of most secret management solutions, like HashiCorp Vault, we will find that we primarily store the logical equivalent of user IDs and passwords for services, workloads, and machines. Much like the old-school practice of writing passwords on post-it notes and sharing them, these secrets are then distributed to whatever entity needs access to the protected resource. As a result, it is no surprise that we encounter the same problems as password management: weak secrets, shared secrets, and stolen secrets. This is why we see machine credentials and keys being stolen in events like STORM 0558.

Secret Management: Password Management at Machine Scale

Secret management can be seen as password management at a machine scale. These systems centralize secrets  (e.g., API keys, tokens, and passwords) to manage their sprawl and then make it easy to pass these shared secrets around various services and applications. However, just as simple passwords alone are seen as insufficient for user authentication—necessitating the adoption of multi-factor authentication (MFA) and migration to non-password authenticators like Passkeys—the same shortcomings apply to this legacy approach to managing these credentials.

The reality is that these secret managers were designed to address the symptoms rather than the root cause of secret sprawl. By storing and distributing these static secrets without treating them as credentials to be managed, you centralize the chaos but do not solve the fundamental problem: the need for robust, scalable machine identity management. Machine, service, and workload credentials, much like user credentials, require purpose-built approaches. Just as we have applied multi-factor authentication, identity governance and administration (IGA), and solutions like Okta to user management, we need analogous systems for machines.

Credential Management for Machines

At machine and workload scale, credential management solutions need to be architected differently to handle the dynamic nature of modern cloud environments. This is where purpose-built solutions come into play. These solutions must address the unique challenges of credential management in dynamic, cloud-native environments. Unlike secret management, which often relies on static secrets, credential management provides dynamic, attested identities for workloads. For example, SPIFFE (Secure Production Identity Framework for Everyone) offers a strategy for managing dynamic identities, providing the foundation for standards-based, scalable, and robust workload identity management.
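
To make the distinction concrete, here is a minimal Go sketch contrasting the two models. It assumes a SPIRE agent is running locally and uses the go-spiffe v2 library; the environment variable name is purely illustrative. The secret-management path hands the workload a long-lived string it must store and protect, while the credential-management path hands it a short-lived, attested X.509 SVID that rotates on its own.

```go
package main

import (
	"context"
	"fmt"
	"log"
	"os"

	"github.com/spiffe/go-spiffe/v2/workloadapi"
)

func main() {
	// Secret-management model: a static, long-lived value pulled from the
	// environment (or a vault) and copied wherever it is needed.
	apiKey := os.Getenv("PAYMENTS_API_KEY") // hypothetical variable name
	fmt.Println("static secret present:", apiKey != "")

	// Credential-management model: a short-lived X.509 SVID fetched from the
	// local Workload API after the platform attests this workload.
	ctx := context.Background()
	source, err := workloadapi.NewX509Source(ctx)
	if err != nil {
		log.Fatalf("unable to reach the Workload API: %v", err)
	}
	defer source.Close()

	svid, err := source.GetX509SVID()
	if err != nil {
		log.Fatalf("unable to fetch SVID: %v", err)
	}
	// The identity and its expiry are visible; rotation happens automatically,
	// so there is no long-lived value to leak, share, or rotate by hand.
	fmt.Println("workload identity:", svid.ID)
	fmt.Println("credential expires:", svid.Certificates[0].NotAfter)
}
```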

Higher-Level Concepts in Workload Credential Management:

  1. Provisioning and Deprovisioning:
    • Efficiently creating, updating, and removing machine identities as services start, stop, and scale.
  2. Role-Based Access Control (RBAC):
    • Assigning access rights based on predefined roles, ensuring that machines and services have appropriate permissions.
  3. Periodic Reviews:
    • Conducting regular reviews of machine access rights to ensure compliance and appropriateness.
  4. Policy Enforcement:
    • Defining and enforcing access control policies, ensuring that machines and services adhere to security guidelines.
  5. Audit and Reporting:
    • Generating comprehensive reports to demonstrate compliance with regulatory requirements and internal policies.
  6. Risk Analysis:
    • Identifying and mitigating risks associated with machine identities and their entitlements.
  7. Behavioral Analysis:
    • Monitoring machine behavior to detect anomalies and potential security threats.

Just as legacy approaches to secret management fail for workloads, machines, and services, these legacy IGA approaches will not work either. We need built-for-purpose solutions to adapt and not repeat the mistakes of the past. Those solutions will look like credential management systems that enable these functions as a byproduct of how they work, rather than bolting them onto an existing secret management or IGA process. 

This is why I have been an advisor to SPIRL since its founding. The founders, Evan and Eli, are why I knew I had to be involved. As the authors of the standard and having had real production-at-scale experience with SPIFFE, I knew they had what it took, which is why I have been working with them in a more formal capacity as of this last month. They are building the infrastructure that makes, as I like to say, the right thing the easy thing, ensuring their customers don’t repeat the mistakes of the past when it comes to workload and service identity.

Conclusion

Secret management tools were a necessary step in addressing the sprawl of machine secrets, and they will continue to be needed. However, as the complexity and scale of cloud environments grow, so does the need for more sophisticated solutions for workload and service credentials. Credential management systems provide the dynamic, robust framework required for modern machine identity management.

By adopting credential management as the primary approach and using secret management as the exception, organizations can achieve greater security, scalability, and operational efficiency in their cloud-native architectures. To learn more about effective machine identity management, check out SPIRL and see how they are leading the way.

ACME vs. SPIFFE: Choosing the Right One

In the world of certificate lifecycle management for workloads, two approaches often come into focus: ACME (Automated Certificate Management Environment) and SPIFFE (Secure Production Identity Framework for Everyone). While both can deliver certificates to a device or workload, they cater to different use cases. Let’s explore their core differences and why these matter, especially for dynamic workload deployments.

ACME: Proving Control Over Identifiers

ACME is primarily designed to prove control over an identifier, typically a domain name, and to deliver a certificate that demonstrates successful control verification.

  • Control Verification: ACME verifies control through various methods such as HTTP-01, DNS-01, TLS-ALPN, and External Account Binding.
  • Attestation (Optional): Attestation in ACME is secondary and optional, primarily serving to verify that the key being certified is well protected.
  • Pre-assigned Identifiers: ACME assumes that the identifier (like a domain name) is pre-assigned and focuses on validating control over this identifier.

This approach is particularly useful for scenarios where identifiers are static and pre-assigned, making it ideal for server-authenticated TLS and applications that rely on domain names.
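
To make the control-verification idea tangible, here is a minimal Go sketch of the applicant’s side of the HTTP-01 method mentioned above: the CA fetches a token from a well-known path over port 80 and expects it to be answered with the token plus the account key’s thumbprint. The token and thumbprint values here are placeholders; a real ACME client computes them from the pending order.

```go
package main

import (
	"fmt"
	"log"
	"net/http"
	"strings"
)

// keyAuthorizations maps challenge tokens to their key-authorization strings
// (token + "." + base64url JWK thumbprint of the ACME account key). The values
// below are hypothetical placeholders.
var keyAuthorizations = map[string]string{
	"exampleToken123": "exampleToken123.hypothetical-account-key-thumbprint",
}

func acmeChallengeHandler(w http.ResponseWriter, r *http.Request) {
	token := strings.TrimPrefix(r.URL.Path, "/.well-known/acme-challenge/")
	keyAuth, ok := keyAuthorizations[token]
	if !ok {
		http.NotFound(w, r)
		return
	}
	// The CA fetches this value over plain HTTP on port 80 and compares it
	// against what it expects for the pending authorization.
	fmt.Fprint(w, keyAuth)
}

func main() {
	http.HandleFunc("/.well-known/acme-challenge/", acmeChallengeHandler)
	log.Fatal(http.ListenAndServe(":80", nil))
}
```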

SPIFFE: Dynamic Assignment of Identifier Based on Attestation

SPIFFE, conversely, is designed for dynamic workloads, which frequently change as services scale or update. SPIFFE assigns identifiers to workloads dynamically, based on attestation.

  • Identifier Assignment: SPIFFE assigns an identifier to the subject (such as a workload) using attestation about the subject to construct this identifier.
  • Attestation: Unlike ACME, attestation is a core component in SPIFFE, enabling robust multi-factor authentication (MFA) for workloads based on what is running and where it is running.
  • Namespacing and Least Privilege: SPIFFE facilitates the namespacing of identifiers, building a foundation that enables authorization frameworks that promote least privilege, ensuring workloads only access necessary resources.
  • Minimal Security Domains: At the core of SPIFFE is the concept of security domains, which serve as trust boundaries between services, helping to minimize the attack surface.
  • JWT and X.509: SPIFFE supports SVIDs in both X.509 and JWT formats, enabling seamless integration with various systems and protocols within cloud deployments.

Importance in Dynamic Workload Deployments

The differences between ACME and SPIFFE are particularly significant in dynamic workload deployments:

  • Flexibility and Scalability: SPIFFE’s dynamic identifier assignment is highly suitable for environments where workloads are frequently spun up and down, such as in microservices architectures and Kubernetes clusters.
  • Role-Based Authentication: By making attestation a core component and promoting least privilege, SPIFFE ensures that each workload is authenticated and authorized precisely for its intended role.
  • Granular Policy Enforcement: SPIFFE’s namespacing and minimal security domain features enable fine-grained policy enforcement, enabling organizations to define and enforce policies at a more granular level, such as per-workload or per-service.
  • Identity Federation: SPIFFE supports identity federation, enabling different trust domains to interoperate securely. This is particularly beneficial in multi-cloud and hybrid environments where different parts of an organization, or third parties, need to work together.

Conclusion

While both ACME and SPIFFE are used for certificate management, they cater to different scenarios and can complement each other effectively. ACME is ideal for static environments where identifiers are pre-allocated, focusing on certificate lifecycle management for issuing, renewing, and revoking certificates. It excels in managing the lifecycle of certificates for web servers and other relatively static resources. 

On the other hand, SPIFFE excels in dynamic, high-scale environments, emphasizing credential lifecycle management with flexible and robust authentication and authorization through dynamic identifiers and attestation. SPIFFE is designed for modern, cloud-native architectures where workloads are ephemeral, and identifiers need to be dynamically issued and managed. 

By understanding these differences, you can leverage both ACME and SPIFFE to enhance your identity management strategy. Use ACME for managing certificates in static environments to ensure efficient lifecycle management. Deploy SPIFFE for dynamic environments to provide strong, attested identities for your workloads.

Navigating Content Authentication In the Age of Generative AI

In 1995, SSL was introduced, and it took 21 years for 40% of web traffic to become encrypted. This rate changed dramatically in 2016 with Let’s Encrypt and the adoption of ACME, leading to an exponential increase in TLS usage. In the next 8 years, adoption nearly reached 100% of web traffic. Two main factors contributed to this shift: first, a heightened awareness of security risks due to high-profile data breaches and government surveillance, creating a demand for better security. Second, ACME made obtaining and maintaining TLS certificates much easier.

Similarly, around 2020, the SolarWinds incident highlighted the issue of software supply chain security. This, among other factors, led to an increase in the adoption of code signing technologies, an approach that has been in use at least since 1995, when Microsoft applied it to the problem of authenticity as we shifted away from CDs and floppy disks to network-based distribution of software. However, the complexity and cost of code signing severely limited its widespread use, and where it was used, poor tooling meant key compromises often kept deployments from achieving the promised security properties. Decades later, projects like Binary Transparency started popping up and, thanks to the SolarWinds incident, efforts that grew out of that work, such as the Go checksum database, Sigstore, and Sigsum, led to more usage of code signing.

Though the EU’s digital signature laws in 1999 specified a strong preference for cryptographic document signing technologies, adoption was very limited, in part due to the difficulty of using the associated solutions. In the US, the lack of any mandate for cryptographic signatures resulted in even more limited adoption of this more secure approach to signing documents; the market relied instead on font-based signatures. However, during the COVID-19 pandemic, things started changing; in particular, most states adopted remote online notary laws mandating the use of cryptographic signatures, which quickly accelerated the adoption of this capability.

The next shift in this story started around 2022 when generative AI began to take off like no other technology in my lifetime. This resulted in a rush to create tools to detect this generated content but, as I mentioned in previous posts [1,2], this is at best an arms race and more practically intractable on a moderate to long-term timeline.

So, where does this take us? If we take a step back, what we see is that societally we are now seeing an increased awareness of the need to authenticate digital artifacts’ integrity and origin, just like we saw with the need for encryption a decade ago. In part, this is why we already see content authentication initiatives and discussions, geared for different artifact types like documents, pictures, videos, code, web applications, and others. What is not talked about much is that each of these use cases often involves solving the same core problems, such as:

  • Verifying entitlement to acquire the keys and credentials to be used to prove integrity and origin.
  • Managing the logical and physical security of the keys and associated credentials.
  • Managing the lifecycle of the keys and credentials.
  • Enabling the sharing of credentials and keys across the teams that are responsible for the objects in question.
  • Making these keys and credentials usable by machines and integrating them naturally into existing workflows.

This problem domain is particularly timely in that the rapid growth of generative AI has raised a question for the common technology user: how can I tell if this is real or not? The answer, unfortunately, will not lie in detecting the fakes, because generative AI can create content that is indistinguishable from human-generated work. Rather, it will become evident that organizations need to adopt practices, across all modalities of content, to not only sign these objects but also make verifying them easy, so these questions can be answered by everyday users.

This is likely to be accelerated once the ongoing shifts take place in the context of software and service liability for meeting security basics. All of this seems to suggest we will see broader adoption of these content authentication techniques over the next decade if the right tools and services are developed to make adoption, usage, and management easy.

While no crystal ball can tell us for sure what the progression will look like, in an increasingly digital world where the lines between real and synthetic content continue to blur, this shift seems not only plausible but necessary.

Update: Just saw this while checking out my feed on X and it seems quite timely 🙂

Evolving Challenges in Software Security

In 2023, we observed an average month-to-month increase in CVEs of approximately 1.64%, with this rate accelerating as the year progressed. At the same time, several trends emerged that are associated with this increase. These include a heightened focus on supply chain security by governments and commercial entities, intensified regulatory discussions around how to roll out concepts of software liability, and the expanded application of machine learning technologies in software security analysis.

Despite the broad use of open source, the large majority of software is still delivered and consumed in binary form. There are a few reasons for this, but the most obvious is that the sheer size and complexity of code bases, combined with the limited availability of expertise and time within consuming organizations, makes using source code to manage risk impractical.

At the same time, it’s clear this issue is not new. For example, in 1984, Ken Thompson noted in his Turing Award Lecture, “No amount of source-level verification or scrutiny will protect you from using untrusted code.” This statement has been partially vindicated recently, as intelligent coding agents, although faster ways to produce code, have been found to exert downward pressure on code quality while also reducing the developer’s understanding of the code they produce — a bad combination.

To the extent these problems are resolved, we can expect attackers to use the same tools to more rapidly identify new and more complex attack chains. In essence, it has become an arms race to build and apply these technologies to both offensive and defensive use cases.

It is this reality that led to the creation of DARPA’s Artificial Intelligence Cyber Challenge and its various projects on using AI to both identify and fix security defects at scale.

The saying “In the middle of difficulty lies opportunity” aptly describes the current situation, where numerous security-focused startups claim to offer solutions to our problems. However, the truth is often quite different.

Some of those racing to take advantage of this opportunity are focusing on software supply chain security, particularly software composition analysis. This is largely driven by regulatory pressure to adopt the Software Bill of Materials concept. Yet most tools that generate these documents only examine interpreted code and declared dependencies. As previously mentioned, the majority of code is delivered and consumed in compiled form, leaving customers without enough data to assess its correctness and completeness. As a result, although these tools may help with compliance, they can inadvertently cause harm by giving a false sense of security.

Other vendors are essentially scaling up traditional source code reviews using large language models (LLMs). But as we’ve discussed, these tools are currently showing signs of reducing code quality and developers’ understanding of their own code. At the same time, given the limited context available to the analysis, these tools produce such a high volume of false positives that triaging the outputs can become a full-time job. This suggests that negative outcomes could ensue over time if we don’t adjust how we apply this technology or see significant improvements in the underlying technology itself.

These efforts are all concentrated on the creators of software. If we expand the problem domain to include the consumers of software, we see that outside of cloud environments, where companies like Wiz and Aqua Security provide vulnerability assessments at scale, there are hardly any resources helping software consumers make informed decisions about the risks posed by the software they use. A big part of this is the sheer amount of noise even these products produce, combined with the lack of actionability in such data for the consumer of the software. With that said, these are tractable problems if we choose to invest in new solutions rather than applying the same old approaches we have in the past.

As we look toward the next decade, it is clear that software security is at a pivotal point, and navigating it goes beyond just technology; it requires a change in mindset towards more holistic security strategies that consider both technical and human factors. The next few years will be critical as we see whether the industry can adapt to these challenges.

Echoes of the Past and Their Impact on Security Today

When I was a boy, my parents often made me read books they thought were important. One of these was “The Republic” by Plato, written around 380 BC. After reading each book, they’d ask me to talk about what I learned. Reading this one, I realized that politics haven’t changed much over time and that people always seem to believe their group should be the ones making the big decisions. This was the first time I truly understood the saying “History doesn’t repeat itself, but it often rhymes.” As someone who works in security, I think it’s important we all remember this. For example, these days, there’s a huge focus on Supply Chain Security in software, almost like it’s a brand-new idea. But if we look back to 1984, Ken Thompson talked about this very concept in his Turing Award lecture where he said, “No amount of source-level verification or scrutiny will protect you from using untrusted code.”

This is a common thread in information security in general. Take, for example, the original forged message attack on RSA, known as Bleichenbacher’s Oracle attack: it was published at the CRYPTO ’98 conference, and nearly two decades later we saw the Return Of Bleichenbacher’s Oracle Threat (ROBOT). Or consider the recent key recovery attack on SIDH, one of the NIST PQC candidates, which was found to be vulnerable to an attack built on a “glue-and-split” theorem developed in 1997!

While there is certainly an element of human nature involved here, there are also extenuating factors, like the sheer amount of knowledge that we as a society have amassed. One of the exciting things about Large Language Models and AI more generally is that these techniques have the potential to harness the entire body of knowledge society has amassed, and to do so with far fewer mistakes, enabling us to advance even faster.

With that said, there is a larger problem: as security practitioners, we often frame our problem wrong. Back in 1998, when Dan Geer was at CertCo (I worked at a competitor called Valicert back then), he wrote an excellent post on how “Risk Management is Where the Money Is.” In it, he argued that the security industry as it was would be transformed into a risk management industry — something that has certainly happened. He also eloquently framed how customers look at risk-reward trade-offs, how the internet would evolve into a data center (e.g., the Cloud as we know it today), and more.

The reality is there is a lot to learn from our predecessors. By understanding historical patterns and better applying the lessons of the past, we can better prepare for and address the security issues we face today.

Challenges in Digital Content Authentication and the Persistent Battle Against Fakes

Efforts have been made for years to detect modified content by enabling content-creation devices, such as cameras, to digitally sign or watermark the content they produce. Significant efforts in this area include the Content Authenticity Initiative and the Coalition for Content Provenance and Authenticity. However, these initiatives face numerous issues, including privacy concerns and fundamental flaws in their operation, as discussed here.

It is important to understand that detecting fakes differs from authenticating originals. This distinction may not be immediately apparent, but it is essential to realize that without 100% adoption of content authentication technology—an unachievable goal—the absence of a signature or a watermark does not mean that content is fake. To give that some color, consider that photographers to this day love antique Leica cameras; despite modern alternatives, these are often still their go-to cameras.

Moreover, even the presence of a legitimate signature on content does not guarantee its authenticity. If the stakes are high enough, it is certainly possible to extract signing key material from an authentic device and use it to sign AI-generated content. For instance, a foreign actor attempting to influence an election may find the investment of time and money to extract the key from a legitimate device worthwhile. The history of DVD CSS demonstrates how easy it is for these keys to be extracted from devices and how even just being able to watch a movie on your favorite device can provide enough motivation for an attacker to extract keys. Once extracted, you cannot unring this bell.

This has not stopped researchers from developing alternative authenticity schemes. For example, Google recently published a new scheme called SynthID. That said, this approach faces the same fundamental problem: authenticating content produced by trustworthy generative AI tools is not the same thing as detecting fakes.

It may also be interesting to note that the problem of authenticating digital content isn’t limited to generative AI content. For example, the Costco virtual member card uses a server-generated QR code that rotates periodically to limit the risk of that QR code being shared via screenshots.
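
Costco hasn’t published how its implementation works; as a rough sketch of the general pattern, assuming a per-member secret held by the server, a short-lived code can be derived from that secret and the current time window so a screenshot goes stale within minutes.

```go
package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"encoding/base64"
	"fmt"
	"time"
)

// rotatingCode derives a short-lived value from a server-side secret, the
// member ID, and the current time window. Encoded into a QR code, it expires
// when the window rolls over, so shared screenshots quickly become useless.
func rotatingCode(serverSecret []byte, memberID string, now time.Time, window time.Duration) string {
	slot := now.Unix() / int64(window.Seconds())
	mac := hmac.New(sha256.New, serverSecret)
	fmt.Fprintf(mac, "%s|%d", memberID, slot)
	return base64.RawURLEncoding.EncodeToString(mac.Sum(nil)[:12])
}

func main() {
	secret := []byte("hypothetical-server-secret")
	code := rotatingCode(secret, "member-42", time.Now(), 5*time.Minute)
	fmt.Println("embed in QR:", code)
}
```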

This does not mean that the approach of signing or watermarking content to make it authenticatable lacks value; rather, it underscores the need to recognize that detecting fakes is not the same as authenticating genuine content — and even then we must temper the faith we put in those claims.

Another use case for digital signatures and watermarking techniques involves their utility in combating the use of generative AI to create realistic-looking fake driver’s licenses and generative AI videos capable of bypassing liveness tests. There have also been instances of generative AI being used in real-time to impersonate executives in video conferences, leading to significant financial losses.

Mobile phones, such as iPhones and Android devices, offer features that help remote servers authenticate the applications they communicate with. While not foolproof, assuming a hardened and unmodified mobile device, these features provide a reasonable level of protection against specific attacks. However, if a device is rooted at the kernel level, or physically altered, these protective measures become ineffective. For instance, attaching an external, virtual camera could allow an attacker to input their AI-generated content without the application detecting the anomaly.

There have also been efforts to extend similar capabilities to browsers, enabling modern web applications to benefit from them. Putting aside the risks of abuse of these capabilities to make a more closed web, the challenge here, at least in these use cases, is that browsers are used on a wider range of devices than just mobile phones, including desktops, which vary greatly in configuration. A single driver update by an attacker could enable AI-generated content sources to be transmitted to the application undetected.

This does not bode well for the future of remote identification on the web, as these problems are largely intractable. In the near term, the best option that exists is to force users from the web to mobile applications where the server captures and authenticates the application, but even this should be limited to lower-value use cases because it too is bypassable by a motivated attacker.

In the longer term, it seems this will fuel the fire for governments to become de facto authentication service providers, a role they have demonstrated themselves to be ineffective at. Beyond that, if these solutions do become common, we can certainly expect their use to be mandated in cases that create long-lasting privacy problems for our children and grandchildren.

UPDATE: A SecurityWeek article came out today that has some interesting figures on this topic.

UPDATE: Another SecurityWeek article on this came out today.