On the CrowdStrike Incident


Last Friday, a bad update from CrowdStrike caused the blue screen of death (BSOD) on (mostly) enterprise Windows machines around the world. It crippled airlines, hospitals, and many other industries. Somewhat embarrassingly, the only thing I knew about CrowdStrike before this was that they sponsor the Mercedes-AMG F1 team. Since I am not an expert in this area, I will not comment much on the incident itself. Instead, I want to talk about kindness.

Kindness to your colleagues

As software engineers, we should be so confident in our code that we can deploy changes to production at any time (e.g., first thing in the morning on a Tuesday or 10 PM on Christmas Eve). This usually means considering the edge cases in our changes and testing them thoroughly before raising a Pull Request.

However, this is not always immediately possible. For example, when working with a poorly understood legacy system, we might never have full confidence in what we are doing. In such a case, we should (ideally) take things slow to mitigate the risks. We can perhaps first add additional logging to validate our hypotheses or gate the new changes behind a feature flag. Even if the overall change is risky, the deployment should, in theory, not be.
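To make the logging and feature-flag idea concrete, here is a minimal sketch in Python. Everything in it is hypothetical and only for illustration: the `FEATURE_NEW_TOTAL` environment variable, the `feature_enabled` helper, and the two `*_total` functions stand in for whatever flag system and legacy code you actually have.

```python
import logging
import os

logger = logging.getLogger("legacy_migration")


def feature_enabled(name: str) -> bool:
    # Hypothetical flag lookup: reads FEATURE_<NAME> from the environment.
    # A real system would more likely query a flag service or config store.
    return os.environ.get(f"FEATURE_{name.upper()}", "off").lower() == "on"


def legacy_total(prices):
    # Existing behaviour that we do not fully understand and must not break.
    return sum(prices)


def new_total(prices):
    # The risky change: round each price to cents before summing.
    return round(sum(round(p, 2) for p in prices), 2)


def compute_total(prices):
    old = legacy_total(prices)
    try:
        new = new_total(prices)
    except Exception:
        # The new path must never take production down with it.
        logger.exception("new_total failed; falling back to legacy_total")
        return old

    # Extra logging to validate our hypothesis about how the paths differ.
    if new != old:
        logger.info("total mismatch: legacy=%r new=%r", old, new)

    # Only serve the new result once the flag has been switched on.
    return new if feature_enabled("new_total") else old
```

With the flag off, deploying this changes nothing for users, yet the logs start telling us how often the two paths disagree. Turning the flag on (and off again, if needed) then becomes a configuration change rather than a deployment.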

Having said that, just because you can doesn’t mean you should, which is why I also avoid deploying changes on Friday afternoons. A colleague once asked how I reconcile these two sentiments (“confident enough to deploy any time” but “don’t deploy on Fridays”). The key observation is that while we try our best to prevent disasters, we should also always assume that disasters are bound to happen. In other words, we should be proud of what we build while having the humility to recognize that it can cause a SEV1.

Because software engineering is a team effort, there is a good chance that a bad deployment will cause not only you but also your colleagues to stay up late. Not deploying on Fridays is kindness to your colleagues. (And in the case of the CrowdStrike incident, kindness to the IT departments all over the world.)

Kindness to your competitors (and your future self)

While the whole debacle unfolded, I saw a cheeky social media post by one of CrowdStrike’s competitors saying something to the effect of “We don’t cause BSOD with our products ;)”. Amusing as it was, I would not have posted it. Instead, I think we should, especially in times of crisis, show some kindness even to our competitors.

It is easy to see companies, especially big companies, as faceless and soulless monoliths. However, if we stop to think for a minute, these companies are made up (mostly) of regular people. People with feelings. People who feel like crap when they push a bad update that brings down airlines and hospitals all over the world. They are just like you and me.

Just as we don’t laugh when someone falls face-first on the sidewalk, we should not take cheap shots at our competitors when they make a blunder. In engineering, despite our best efforts, bad things will always happen. Last week it was CrowdStrike. Next week it could be us.