Microsoft Pushes for Windows Changes After CrowdStrike Incident
Over the past 10 days, CrowdStrike and Microsoft have been working around the clock to help customers affected by the massive Windows BSOD issue caused by a faulty CrowdStrike update. Along with providing ways to fix the issue, CrowdStrike has already published its Preliminary Post Incident Review for the outage. According to that report, the BSOD was caused by a memory safety issue: an out-of-bounds read in CrowdStrike's CSAgent driver.
In the wake of an incident that affected millions of Windows PCs, Microsoft is calling for significant changes to make its operating system more resilient. In a blog post, John Cable, Microsoft's vice president of program management for Windows servicing and delivery, said there is a need for "end-to-end resilience," signaling a potential shift in Microsoft's approach to third-party access to the Windows kernel.
Microsoft has also published its own detailed technical analysis of the outage. It confirmed CrowdStrike's finding that the crash was caused by an out-of-bounds read in CrowdStrike's CSAgent.sys driver. CSAgent.sys is registered on Windows as a file system filter driver, which receives notifications about file operations such as the creation or modification of a file. This is what allows security products, including CrowdStrike's, to scan every new file saved to disk.
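To make the failure mode concrete, here is a minimal, hypothetical sketch of the kind of bounds check that prevents an out-of-bounds read when code indexes into data loaded from a content file. It is written in Python for readability; the helper name `read_field` and the list-of-strings record format are illustrative assumptions, not CrowdStrike's actual code. In C, the language kernel drivers are typically written in, reading past the end of an array is undefined behavior, and an invalid memory access in kernel mode can halt the whole system with a bug check (the BSOD).

```python
from typing import Optional, Sequence


def read_field(fields: Sequence[str], index: int) -> Optional[str]:
    """Return fields[index], or None if the index is out of range.

    Hypothetical helper: the explicit range check is the point. Python would
    raise IndexError on its own (and silently wrap negative indices), but
    kernel-mode C has no such safety net, so the check must be written out.
    """
    if 0 <= index < len(fields):
        return fields[index]
    return None  # reject the access instead of reading past the end


record = ["field0", "field1", "field2"]
print(read_field(record, 2))  # field2
print(read_field(record, 3))  # None: index 3 is past the end of a 3-item record
```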
Microsoft recommends that security solution providers balance needs such as visibility and tamper resistance against the risk of operating in kernel mode. For example, they can use minimal sensors that run in kernel mode only for data collection and enforcement, limiting their exposure to availability issues. The remaining functionality, such as managing updates and parsing content, can then run in isolation in user mode.
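The split Microsoft describes can be modeled in a few lines. The sketch below is a hypothetical Python model, not Windows driver code: `sensor_on_file_operation` stands in for a minimal kernel-mode sensor that only records file events into a bounded queue, while `load_content_update` stands in for user-mode logic that parses untrusted content updates and rejects malformed input instead of crashing. All names, and the JSON update format with a `rules` list, are assumptions made for illustration.

```python
import json
import queue
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass(frozen=True)
class FileEvent:
    path: str
    operation: str  # e.g. "create" or "modify"


# Hypothetical stand-in for the minimal kernel-mode sensor: no parsing, just a
# bounded hand-off to user mode so the sensor itself stays small and simple.
events: "queue.Queue[FileEvent]" = queue.Queue(maxsize=1024)


def sensor_on_file_operation(path: str, operation: str) -> bool:
    """Record the event; drop it (return False) rather than block if the queue is full."""
    try:
        events.put_nowait(FileEvent(path, operation))
        return True
    except queue.Full:
        return False


# Hypothetical stand-in for the user-mode service: parsing untrusted content
# happens here, where a bad update is rejected instead of crashing the OS.
def load_content_update(raw: str) -> Optional[Dict[str, Any]]:
    """Return the parsed update, or None if it is malformed."""
    try:
        update = json.loads(raw)
    except json.JSONDecodeError:
        return None
    if not isinstance(update, dict) or not isinstance(update.get("rules"), list):
        return None
    return update
```

In this model, a malformed update like `{"rules": 5}` is rejected with `None` while the sensor keeps queuing events. The design choice is to keep the code most likely to meet unexpected input out of kernel mode, where a crash takes down the whole machine.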
In the blog post, Microsoft also outlined the security features built into Windows, which provide layered protection against malware and exploitation attempts. Through the Microsoft Virus Initiative (MVI), Microsoft will work with the anti-malware ecosystem to take advantage of these built-in features and improve both security and reliability.
For now, Microsoft plans to focus on the following:
- Providing safe rollout guidance, best practices, and technologies to make it safer to perform updates to security products.
- Reducing the need for kernel drivers to access important security data.
- Providing enhanced isolation and anti-tampering capabilities with technologies like the recently announced VBS enclaves.
- Enabling zero-trust approaches such as high-integrity attestation, which determines a machine's security state based on the health of Windows' native security features.
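Safe rollout practices for security updates commonly rely on staged, ring-based deployment: ship to a small canary group first, check its health, and only then widen the rollout. The Python sketch below illustrates that general technique under stated assumptions; the `staged_rollout` function, its `deploy` and `healthy` callbacks, and the 99% health threshold are hypothetical and are not taken from Microsoft's published guidance.

```python
from typing import Callable, List, Sequence


def staged_rollout(
    rings: Sequence[Sequence[str]],
    deploy: Callable[[str], None],
    healthy: Callable[[str], bool],
    min_healthy_ratio: float = 0.99,  # hypothetical threshold
) -> List[int]:
    """Deploy ring by ring and stop once a ring's health drops below the threshold.

    Returns the indices of the rings that received the update.
    """
    deployed: List[int] = []
    for i, ring in enumerate(rings):
        for host in ring:
            deploy(host)
        ok = sum(1 for host in ring if healthy(host))
        deployed.append(i)
        if ring and ok / len(ring) < min_healthy_ratio:
            break  # halt before the update reaches the next, larger ring
    return deployed


rings = [["canary-1"], ["host-a", "host-b"], ["host-c", "host-d", "host-e"]]
crashed = {"host-b"}
print(staged_rollout(rings, deploy=lambda host: None, healthy=lambda host: host not in crashed))
# [0, 1]: the rollout halts after ring 1, so ring 2 never receives the update
```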
As of July 25, over 97% of the Windows PCs affected by the issue were back online, and Microsoft is now looking ahead to prevent similar incidents in the future.