We often hear that customers do not want to deploy agents, usually on the rationale that while each one may consume only a bit of memory and CPU, the sum of them slowly but surely grinds systems to a halt. The real impact, however, is management overhead, operational risk, and attack surface. Attack surface in particular requires nuance to understand. For example, software that runs in a privileged context (such as the kernel), parses data, communicates on a network, or is loaded at boot represents a lot more risk than software that runs with least privilege in its own user session or in the runtime context of the user.
As they say, you can’t have your cake and eat it too. There are always trade-offs, and when it comes to software, those trade-offs usually involve choosing between performance, security, and delivering on a value proposition.
The full impact of the CrowdStrike outage won’t be understood for some time, but one thing is for sure: organizations that sell solutions reliant on agents will need to provide much more justification and explanation about how their software works and how they manage updates.
It also means that organizations, at least the mature ones, will be re-evaluating what software they run on their endpoints, the value it provides, and the risks it represents from both security and operational perspectives. These organizations will also be revisiting the controls they use to manage the ingestion of the software they rely on, and how they manage that risk over time, since software is a living entity and not something static.
In short, as with most catastrophes, there is a silver lining. This is a great opportunity to improve existing processes to help prevent entire classes of failures like this one. Hopefully, that will include a more robust investment in holding vendors accountable and thoroughly checking their work.