We need to talk about packaging, signatures, checksums and reproducible builds:
On your system you have a keyring of packagers' GPG keys that you inherently trust.
Releases get signed with a key, which verifies the packager as the author, and supposedly lets you and your system trust their contents.
But do you really trust your packagers? How could you? Do you know them personally and monitor their packaging work?
Would you even know if they release a package with malicious content?
We need a system that lets us reproduce a packager's work and confirm that whatever they release was indeed built from a specific source tree without any unintended or even malicious changes.
To achieve that we require reproducible builds, so we can correlate a build with its source tree(s) and dependencies.
Whenever a new package gets released, this would allow independent systems and entities to verify that its contents really match the expectations.
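To make that concrete, here is a minimal sketch of such a check, assuming a few independent rebuilders each publish the bytes they built. The names `verify_release`, `rebuilds` and `quorum` are made up for illustration and do not refer to any existing tool.

```python
import hashlib


def digest(artifact: bytes) -> str:
    """Return the SHA-256 hex digest of a built artifact."""
    return hashlib.sha256(artifact).hexdigest()


def verify_release(published: bytes, rebuilds: dict[str, bytes], quorum: int) -> bool:
    """Accept the published artifact only if at least `quorum` independent
    rebuilders produced a bit-for-bit identical result."""
    expected = digest(published)
    matching = [name for name, blob in rebuilds.items() if digest(blob) == expected]
    return len(matching) >= quorum


# Hypothetical data: two rebuilders agree with the release, one does not.
published = b"binary built from tag v1.0"
rebuilds = {
    "rebuilder-a": published,
    "rebuilder-b": published,
    "rebuilder-c": b"binary with something extra",
}
print(verify_release(published, rebuilds, quorum=2))  # True
```

This only works if the build is reproducible in the first place. Otherwise honest rebuilders would disagree with each other as well.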
@fribbledom How do you know you can trust your compiler?
@gudenau @fribbledom Bootstrapping your compiler *is* possible, even if difficult. And even without a bootstrapped compiler, reproducible builds mean that every compiler needs to be backdoored _the exact same way_ or the results don't match, which is already orders of magnitude better than the status quo.
Also, without reproducible builds a bootstrapped compiler only improves your own security, not the entire ecosystem's, so the prioritization of which issue to fix first seems clear to me.
@fribbledom Sounds like Guix!
The hash uniquely identifies inputs (dependencies), recipe and source, and if the package is reproducible, content.
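Roughly, the idea can be sketched like this. It is a simplified illustration only, not Guix's actual derivation format, and `package_hash` and its parameters are hypothetical names.

```python
import hashlib


def package_hash(source_sha256: str, recipe: str, dependency_hashes: list[str]) -> str:
    """Derive one identifier from the source, the build recipe and all inputs.

    Changing any of the three yields a different identifier.
    """
    h = hashlib.sha256()
    h.update(b"source:" + source_sha256.encode() + b"\n")
    # Hash the recipe text first so its content cannot blur field boundaries.
    recipe_digest = hashlib.sha256(recipe.encode()).hexdigest()
    h.update(b"recipe:" + recipe_digest.encode() + b"\n")
    # Sort the dependencies so the listing order does not change the result.
    for dep in sorted(dependency_hashes):
        h.update(b"dep:" + dep.encode() + b"\n")
    return h.hexdigest()


src = hashlib.sha256(b"source tarball bytes").hexdigest()
deps = [hashlib.sha256(b"libfoo").hexdigest(), hashlib.sha256(b"libbar").hexdigest()]
print(package_hash(src, "./configure && make", deps))
```

If the build is reproducible, the same identifier always maps to the same output bytes, so the identifier effectively names the content too.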
There is tooling to check reproducibility and find non-determinism sources. I use diffoscope when I need to compare two different builds.
From my experience, nondeterminism is often caused by timestamps, and less often by ordering issues or temporary file names. Debian has a more complete list of potential issues.
There is a list of unreproducible packages, grouped by cause of nondeterminism: https://tests.reproducible-builds.org/debian/index_issues.html
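A small sketch of the timestamp problem, using gzip as a stand-in for any archive format that embeds a modification time. The `pack` and `reproducible_pack` helpers are hypothetical. The `SOURCE_DATE_EPOCH` environment variable is the convention documented by the reproducible-builds project, and falling back to 0 when it is unset is an assumption made here.

```python
import gzip
import os


def pack(data: bytes, mtime: int) -> bytes:
    """Compress data, embedding `mtime` in the gzip header."""
    return gzip.compress(data, mtime=mtime)


def reproducible_pack(data: bytes) -> bytes:
    """Compress data with a timestamp fixed by the environment, not the clock."""
    # Assumption: default to 0 when SOURCE_DATE_EPOCH is not set.
    fixed = int(os.environ.get("SOURCE_DATE_EPOCH", "0"))
    return pack(data, fixed)


payload = b"hello, reproducible world\n"

# Same input, built one day apart: the embedded timestamp changes the bytes.
build_one = pack(payload, mtime=1_600_000_000)
build_two = pack(payload, mtime=1_600_086_400)
print(build_one == build_two)  # False

# Same input with a pinned timestamp: the output is identical.
print(reproducible_pack(payload) == reproducible_pack(payload))  # True
```

Both variants decompress to the same payload. Only the metadata differs, which is exactly the kind of difference diffoscope helps to track down.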
@fribbledom And yet you chose Go, one of the most hostile languages when it comes to packaging and reproducible builds?
But is it really? I think you underestimate how young the language and ecosystem are and ignore what they achieved with "dep" and Go modules in recent years.
It gives you all the tools you need to achieve reproducible builds now.
@fribbledom There can be no peace as long as vendoring and importing random shit from Git repositories is a requirement. This cannot ever work. I’m willing to die on this hill.
There is no importing _random_ shit though. You pin _and_ checksum a dependency with Go modules.
It refuses to build if there's a checksum mismatch.
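The principle, sketched in Python rather than Go to keep it short. This is a simplified stand-in for what `go.sum` does, not its real format or hash scheme, and the module path, `verify_dependency` and `ChecksumMismatch` are all hypothetical.

```python
import hashlib


class ChecksumMismatch(Exception):
    """Raised when a downloaded dependency does not match its pinned checksum."""


def verify_dependency(
    name: str,
    version: str,
    content: bytes,
    lockfile: dict[tuple[str, str], str],
) -> None:
    """Refuse to continue unless (name, version) is pinned and the bytes match."""
    pinned = lockfile.get((name, version))
    if pinned is None:
        raise KeyError(f"{name}@{version} is not pinned in the lockfile")
    actual = hashlib.sha256(content).hexdigest()
    if actual != pinned:
        raise ChecksumMismatch(f"{name}@{version}: expected {pinned}, got {actual}")


# Hypothetical lockfile entry recorded when the dependency was first added.
good = b"contents of the dependency at v1.2.3"
lockfile = {("example.com/lib", "v1.2.3"): hashlib.sha256(good).hexdigest()}

verify_dependency("example.com/lib", "v1.2.3", good, lockfile)  # passes silently

try:
    verify_dependency("example.com/lib", "v1.2.3", good + b" plus a backdoor", lockfile)
except ChecksumMismatch as err:
    print("refusing to build:", err)
```

The pin fixes which version you get, and the checksum ensures the bytes behind that version never silently change.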
@fribbledom Yes. That way you get software inheriting dependencies on multiple versions of the same library. And it’s going to be a very painful experience if there’s an issue with any of these versions that needs fixing.
@fribbledom Look, this shit has been tried and proven to be absolutely horrible with not just Go, but also Ruby, Node, Lua, Rust, …
It’s just not going to work.
...and C, C++, Java and every other language in existence. I'm really not sure what's the point you're trying to make?
Well, yes, but what's the alternative you're suggesting?
This is why it's important to depend on well-maintained libraries.
If they have good reasons to depend on different versions, they can technically do that.
If that means they're inheriting a security risk from one of their dependencies, though, then it becomes their issue (and yours by inheritance) to deal with.
@fribbledom You know, C developers tend to be more disciplined in practice so you don’t need the vendoring in the first place and multiple programs on your system can indeed rely on the exact same version of a library. C developers usually rely on distributions to package their libraries, and thus it is in their best interest to make it easy. And if you’re looking for something less bad than C, give Zig a try. It conveniently also solves the problem of build systems(!)
That's a bold claim that doesn't quite live up to my personal experience with C development.
However, you can try to enforce using the same upstream dependency & version for your project and all of _its_ dependencies in Go, as well. It certainly makes reviewing your deps a lot easier, I'll give you that.
Thanks for the video, Zig sounds intriguing!
@fribbledom It was really heartening to see Debian making an effort towards this.
@fribbledom soooo, debian then.
@fribbledom Alright, let's talk about this. I'm a Python developer. I see the value of reproducible builds, but I'll have to choose:
Either the build only includes my package, but not dependencies, then it's worthless.
Or it includes deps, but then my users will not get security updates from dependencies.
(To say nothing of the fact that I can GPG sign my python package, but pip will not even check signature integrity.)
Without deps reproducible builds are pretty much worthless indeed.
You will have to include them, and you will have to keep in sync with their releases and changes. As a developer that's one of the jobs you sign up for when picking a dependency. You inherit their issues, and it's now your responsibility to act on them.
So far packagers are doing the job for you, but should they really? Clearly nobody should know your own dependencies better than you, right?
@fribbledom Ah, you're seeing this issue only in terms of systems packages. Those are of no interest to my question, so forget I asked.
@fribbledom we do the same. We build everything from source, including the dependencies, and mirror most things so we know what we are building. That includes building the kernel with buildroot.
@fribbledom We developers "inherently trust" all the time, not only in packaging but adding a lot of dependencies from other people to our projects just hoping that they don't do anything malicious.
Truth is that when you install something, one person (usually) creates the package, but a lot more have contributed to the code in ways neither the packager nor the author can anticipate.
A sourcemap will verify that dependency A is effectively the same as on GitHub, but it won't check whether the code is malicious.
Verifying the sources is a separate (yet connected) issue indeed, but it's useless if you don't know whether the sources you're verifying are the ones that have been packaged.
As a developer you should never blindly trust any dependencies. If you depend on some code, you will inherit all its flaws and issues, as well. That's your responsibility.
Luckily you're not alone and it's a collective process. The same needs to be achieved for verifying build integrity.
@fribbledom Just curious, do you say this because you've found suspicious things?
As an Arch user I've gotten used to checking where my AUR packages come from and what the install files do, but one user can't do that for everything installed on a given system.
@fribbledom nix has reproducible builds iirc, I used it as my primary package manager for a while
@fribbledom Did somebody call Bazel?
@clacke @fribbledom Bazel is (as far as I'm aware) the only build system that uses content-based addressing of build artefacts based on hashes of the inputs. This, among other shenanigans like sandboxing and stripping of timestamps or other side effects, makes binaries bit-by-bit identical if built from the exact same sources. (Can give a few other nice things, too, like centralised artifact caching, distributed building…)
>But do you really trust your packagers?
>How could you?
I trust the automated QA, marginally.
>Do you know them personally and monitor their packaging work?
yeah. that's why I don't trust them. also one is me
>But do you really trust your packagers?
Yes, I do.
>How could you?
How could I not? I'd go crazy if I didn't trust them.
@fribbledom Sure it would be nice to have, but even then you are only marginally safer from malicious intent. Can you trust all the code in your dependencies? Are you able to check it all yourself? Probably not. At some point it comes down to trusting someone else, no matter how you turn it.
Is it good if you can reduce the number of people you need to trust? Yes, but don't expect to be able to get that number down to zero.
Of course I'm not suggesting that it'll ever be zero.
Clearly those are connected, but separate issues. However, without reproducible and verified builds, you don't even know if the source code you're reading is actually the source code used to generate the build you're running.
It's an important step in the right direction.
@murks No, that's a collective effort. You don't have to do this alone. Trust the public consensus, not a single entity.
@fribbledom My $0.02 USD on the topic is that, unless you are fluent in multiple programming languages and have taken the time to comb through every line of code in a package, you can never be 100% certain that a package is built as intended and will work without any maliciousness. I would say very few people have the expertise to review all the code in the various languages, and even fewer have the time to review hundreds of thousands of lines of code.
@fribbledom At some point, you just have to decide who to trust and hope they're not trying to screw you over.
You're not supposed to do that yourself and alone. The point is that we need to set up systems that let us verify integrity collectively. Reproducible builds are just one step towards that. Only once we achieve that can we actually rest assured that the source code we're verifying is the code we're running. And again, that's a collective process with open source.
@fribbledom I guess I don't fully understand your assertion. The question is whether or not we can trust those who package the software in our repos. Setting up a system that verifies integrity is in itself another piece of software we would need to trust written by others. I'm not sure how this is any different than trusting the current packagers in our respective distros.
If just one entity did that, it would indeed be useless. The point is that this can be done collectively. Anyone can run such a system and verify the packages themselves.