Background
On Friday, March 29th, 2024, a historic and sophisticated security vulnerability (CVE-2024-3094) was discovered in the XZ Utils package and the liblzma API in versions 5.6.0 and 5.6.1. While this vulnerability mostly affects Debian and Red Hat distributions, there was some interesting discuss...
The main issue is the handling of security updates within the Nixpkgs ecosystem, which relies on Nix's CI system, Hydra, to test and build packages. Due to the extensive number of packages in the Nixpkgs repository, the process can be slow, causing delays in the release of updates. As an example, the updated xz 5.4.6 package took nearly 5 days to become available in the unstable branch!
Fundamentally, there needs to be a change in how security fixes are handled in Hydra. As stated in the article, Nix was lucky to be unaffected, but taking multiple days to push out a security patch of this severity is concerning, even if there turned out to be no real danger this time.
Kinda tired of the constant flow of endless "analysis" of xz at this point.
There's no real good solution to "upstream gets owned by evil nation state maintainer" - especially when they run it as a multi-year op.
It simply doesn't matter what downstream does if the upstream build systems get owned without anyone noticing. We're fucked.
Debian's build chroots were running Sid - so they stopped it all. They analyzed it, and there was some work done with reproducible builds (which is a good idea for distro maintainers). Pushing out security updates when you don't trust your build system is silly. Yeah, fast security updates are nice, but it took multiple days to reverse-engineer the exploit. This wasn't easy.
Bottom line, don't run bleeding edge distros in prod.
We got very lucky with xz. We might not be as lucky with the next one (or the ones in the past).
It was not vulnerable to this particular attack because the attack didn't specifically target Nixpkgs. It could have very well done so if they had wanted to.
This blog post misses entirely that this has nothing to do with the unstable channel. It just happened to only affect unstable this time because it gets updates first. If we had found out about the xz backdoor two months later (totally possible; we were really lucky this time), this would have affected a stable channel in exactly the same way. (It'd be slightly worse actually because that'd be a potentially breaking change too but I digress.)
I see two ways to "fix" this:
Throw a shitton of money at builders. I could see this getting staging-next rebuild times down to just 1-2 days which I'd say is almost acceptable. This could even be a temporary thing to reduce cost; quickly renting an extremely large on-demand fleet from some cloud provider for a day whenever a critical world rebuild needs to be done which shouldn't be too often.
Implement pure grafting for important security patches through a second overlay-like mechanism.
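To make the grafting option a bit more concrete, here is a rough Python sketch of the general idea as I understand it: instead of rebuilding every dependent, you rewrite the references to the old store path inside the already-built outputs so they point at the patched one. The store paths, the `graft` helper and the fake "sshd" blob are all made up for illustration; this is not how Nix or Nixpkgs actually implements it, just the core trick under the assumption that both paths have the same length.

```python
# Minimal sketch of "grafting": rewrite references in already-built outputs
# instead of rebuilding them. All paths and names here are hypothetical.

OLD_XZ = b"/nix/store/" + b"a" * 32 + b"-xz-5.6.1"
NEW_XZ = b"/nix/store/" + b"b" * 32 + b"-xz-5.4.6"


def graft(outputs, old_path, new_path):
    """Return copies of `outputs` with every reference to old_path replaced.

    The replacement is only safe in binary files if both paths have the same
    length, otherwise offsets inside the files would shift.
    """
    if len(old_path) != len(new_path):
        raise ValueError("store paths must have the same length to graft")
    return {
        name: blob.replace(old_path, new_path)
        for name, blob in outputs.items()
    }


# A fake "binary" that embeds the old xz path, e.g. in its library search path.
built = {"sshd": b"\x7fELF...RUNPATH=" + OLD_XZ + b"/lib\x00..."}
grafted = graft(built, OLD_XZ, NEW_XZ)

print(NEW_XZ in grafted["sshd"], OLD_XZ in grafted["sshd"])
```

The point is that only the patched package itself gets built; everything downstream is rewritten in place, which is why it could ship in hours rather than days. The trade-off is that the grafted outputs no longer match what a clean rebuild would produce, so it only makes sense as a stopgap until the real rebuild lands.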
First of all, I'm not the author of the article, so you're barking up the wrong tree.
You're using the unstable channel.
That doesn't matter in the big scheme of things - it doesn't solve the fundamental issue of slow security updates.
You could literally build it on your own, or patch your own change without having to wait - all you have to do is update the SHA256 hash and the tag/commit hash.
Do you seriously expect people to do that every time there's a security update? Especially considering how large the ecosystem is? And what if someone wasn't aware of the issue, do you really expect people to be across every single vulnerability across the hundreds or thousands of OSS projects that may be tied to the packages you've got on your machine?
The rest of your points also assume that the older packages don't have a vulnerability. The point of this post isn't really about the xz backdoor, but to highlight the issue of slow security updates.
If you're not using Nix the way it is intended to be, it is on you. Your over-reliance on Hydra is not the fault of Nix in any way.
Citation needed. I've never seen the Nix developers state that in any official capacity.
This means users such as myself who use the unstable branch for all of their packages will still be pulling the (potentially) infected xz tarballs onto their machines!
Yeah don't do that. On any OS that's asking for problems.
Nix & Hydra’s scheduling is super basic. There is room to optimize the builds in many ways. In this case, the fact that xz is a dependency of libarchive as well as an input for Nix itself makes the rebuilds particularly bad.
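For anyone wondering why one small package causes a near-world rebuild, here is a toy Python sketch. The dependency graph below is invented and tiny (the real Nixpkgs graph is vastly larger), and `rebuild_set` is just a hypothetical helper name, but it shows the mechanism: anything that transitively depends on the changed package has to be rebuilt.

```python
from collections import defaultdict, deque

# Hypothetical, heavily simplified graph: package -> its direct build inputs.
DEPS = {
    "xz": [],
    "libarchive": ["xz"],
    "nix": ["xz", "libarchive"],
    "cmake": ["libarchive"],
    "curl": [],
    "some-app": ["cmake", "curl"],
}


def rebuild_set(deps, changed):
    """Return every package that must be rebuilt when `changed` changes."""
    # Invert the edges so we can walk from a package to its dependents.
    dependents = defaultdict(set)
    for pkg, inputs in deps.items():
        for dep in inputs:
            dependents[dep].add(pkg)

    # Breadth-first walk over the reverse dependency graph.
    seen = {changed}
    queue = deque([changed])
    while queue:
        current = queue.popleft()
        for parent in dependents[current]:
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen


print(sorted(rebuild_set(DEPS, "xz")))
print(sorted(rebuild_set(DEPS, "curl")))
```

In this toy graph, touching xz drags in everything except curl, while touching curl only rebuilds curl and some-app. A package that sits near the bottom of the graph, like xz, is the worst case for any scheduler.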
As of today, NixOS (like most distros) has reverted to a version slightly prior to the release with the Debian-or-Redhat-specific sshd backdoor which was inserted into xz just two months ago. However, the saboteur had hundreds of commits prior to the insertion of that backdoor, and it is very likely that some of those contain subtle intentional vulnerabilities (aka "bugdoors") which have not yet been discovered.
As (retired) Debian developer Joey Hess explains here, the safest course is probably to switch to something based on the last version (5.3.1) released prior to Jia Tan getting push access.
Unfortunately, as explained in this Debian issue, that is not entirely trivial. Dependents of many recent pre-backdoor, potentially-sabotaged versions require symbol(s) which are not present in older versions, and those older versions contain at least two known vulnerabilities which were fixed during the multi-year period where the saboteur was contributing.
After reading "Xz format inadequate for long-term archiving" (first published eight years ago...) I'm convinced that migrating the many projects which use XZ today (including DPKG, RPM, and Linux itself) to an entirely different compression format is probably the best long-term plan. (Though we'll always still need tools to read XZ archives for historical purposes...)