Thread:

We need to talk about packaging, signatures, checksums and reproducible builds:

On your system you have a keyring of packagers' GPG keys that you inherently trust.

Releases get signed with one of those keys, which identifies the packager as the author and supposedly lets you and your system trust the release's contents.
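
To make that concrete: verification with GnuPG roughly looks like this (file names made up for illustration):

    # import the packager's public key into your keyring
    gpg --import packager-key.asc
    # check the detached signature against the release tarball
    gpg --verify release.tar.gz.sig release.tar.gz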

But do you really trust your packagers? How could you? Do you know them personally and monitor their packaging work?

Would you even know if they release a package with malicious content?

We need a system that lets us reproduce a packager's work and confirm that whatever they release was indeed built from a specific source tree without any unintended or even malicious changes.

To achieve that we require reproducible builds, so we can correlate a build with its source tree(s) and dependencies.

Whenever a new package gets released, this would allow independent systems and entities to verify that its contents really match the expectations.
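
As an example, two independent parties could rebuild a release from the same pinned source tree and compare checksums (paths hypothetical):

    # built independently on two machines from identical sources
    sha256sum builder-a/package-1.0.tar.gz
    sha256sum builder-b/package-1.0.tar.gz
    # matching hashes: the release really corresponds to its sources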

@gudenau @fribbledom Bootstrapping your compiler *is* possible, even if difficult. And even without a bootstrapped compiler, repr. builds mean that every compiler needs to be backdoored _the exact same way_ or the results don't match, which is already orders of magnitude better than the status quo.

Also, a bootstrapped compiler does not help improve the entire ecosystem's security unless you have repr. builds, only your own, so the prioritization of which issue to fix first seems clear to me.

@gudenau @fribbledom Funny thing about reproducible builds is that you only need 1 trusted machine in the entire cluster. So you don't even need widespread adoption of trustworthy hardware - only one trusted machine in the cluster works.

@kekcoin @fribbledom But the fun thing is, software can't detect hardware issues if they are masked correctly.

Just a giant rabbit hole, eh?

I wonder if there's dual port DDR.

@gudenau @fribbledom I can verify on trusted, slow machine A whether the output of untrusted, cheap, fast machine B matches the output of trusted, fast, expensive machine C.
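
A never has to rebuild anything itself; the comparison is trivial (file names invented):

    # B and C each publish the hash of the artifact they built;
    # A only compares the two
    [ "$(cat hash-from-b.txt)" = "$(cat hash-from-c.txt)" ] \
      && echo "B's output matches the trusted build"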

@gudenau @fribbledom you compiled it with tcc and then made it compile itself.

@alcinnz @fribbledom it registers stuff under the hash but the compilation process isn't necessarily reproducible?

Do wonder if it might be useful to gather data on how compilations differ. Different people compiling it could also vouch for different outcomes.

@jasper @alcinnz @fribbledom

The hash uniquely identifies inputs (dependencies), recipe and source, and if the package is reproducible, content.

There is tooling to check reproducibility and find non-determinism sources. I use diffoscope when I need to compare two different builds.

In my experience, non-determinism is most often caused by timestamps, and less often by ordering issues or temporary file names. Debian maintains a more complete list of potential issues.
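
A typical session looks roughly like this (package names invented; the timestamp fix assumes the build tooling honors the SOURCE_DATE_EPOCH convention):

    # human-readable diff of two builds, recursing into archives
    diffoscope first-build/foo_1.0_amd64.deb second-build/foo_1.0_amd64.deb
    # clamp embedded timestamps to a fixed date during the build
    SOURCE_DATE_EPOCH=1577836800 make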

@fribbledom And yet you chose Go, one of the most hostile languages when it comes to packaging and reproducible builds? :thinking_cirno:

@lachs0r

But is it really? I think you underestimate how young the language and its ecosystem are, and ignore what they've achieved with "dep" and Go modules in recent years.

It gives you all the tools you need to achieve reproducible builds now.
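
For instance, a largely reproducible Go binary can be built by stripping host-specific details (a sketch, assuming a recent Go toolchain):

    # -trimpath drops local filesystem paths from the binary,
    # -buildid= strips the build ID, and CGO_ENABLED=0 keeps the
    # host C toolchain out of the result
    CGO_ENABLED=0 go build -trimpath -ldflags=-buildid=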

@fribbledom There can be no peace as long as vendoring and importing random shit from Git repositories is a requirement. This cannot ever work. I’m willing to die on this hill.

@lachs0r

There is no importing _random_ shit though. You pin _and_ checksum a dependency with Go modules.

It refuses to build if there's a checksum mismatch.
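
That pinning lives in go.mod and go.sum, roughly like this (module path hypothetical, hash values elided):

    // go.mod: the exact version is pinned
    require github.com/example/dep v1.2.3

    // go.sum: checksums for the module tree and for its go.mod
    github.com/example/dep v1.2.3 h1:<module hash>
    github.com/example/dep v1.2.3/go.mod h1:<go.mod hash>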

@fribbledom Yes. That way you get software inheriting dependencies on multiple versions of the same library. And it’s going to be a very painful experience if there’s an issue with any of these versions that needs fixing.

@fribbledom Look, this shit has been tried and proven to be absolutely horrible with not just Go, but also Ruby, Node, Lua, Rust, …
It’s just not going to work.

@lachs0r

...and C, C++, Java and every other language in existence. I'm really not sure what's the point you're trying to make?

@lachs0r

Well, yes, but what's the alternative you're suggesting?

This is why it's important to depend on well-maintained libraries.

If they have good reasons to depend on different versions, they can technically do that.

If that means they're inheriting a security risk from one of their dependencies, though, then it becomes their issue (and yours by inheritance) to deal with.

@fribbledom You know, C developers tend to be more disciplined in practice so you don’t need the vendoring in the first place and multiple programs on your system can indeed rely on the exact same version of a library. C developers usually rely on distributions to package their libraries, and thus it is in their best interest to make it easy. And if you’re looking for something less bad than C, give Zig a try. It conveniently also solves the problem of build systems(!)

youtube.com/watch?v=Gv2I7qTux7

@lachs0r

That's a bold claim that doesn't quite live up to my personal experience with C development.

However, you can try to enforce the same upstream dependency & version for your project and all of _its_ dependencies in Go, as well. It certainly makes reviewing your deps a lot easier, I'll give you that.
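
In go.mod terms that's a replace directive, which forces every transitive consumer onto one pinned version (module path hypothetical):

    // all importers of the dependency now resolve to this version
    replace github.com/example/dep => github.com/example/dep v1.2.3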

Thanks for the video, Zig sounds intriguing!

@fribbledom It was really heartening to see Debian making an effort towards this.

@fribbledom Alright, let's talk about this. I'm a Python developer. I see the value of reproducible builds, but I'll have to choose:

Either the build includes only my package but not its dependencies, in which case it's worthless.

Or it includes deps, but then my users will not get security updates from dependencies.

(To say nothing of the fact that I can GPG sign my python package, but pip will not even check signature integrity.)
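
Signing itself is the easy part; it's verification the tooling skips (wheel name invented):

    # produce a detached, ASCII-armored signature next to the wheel
    gpg --detach-sign --armor dist/mypkg-1.0-py3-none-any.whl
    # pip ignores the .asc entirely; users must verify manually:
    gpg --verify dist/mypkg-1.0-py3-none-any.whl.asc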

@rixx

Without deps reproducible builds are pretty much worthless indeed.

You will have to include them, and you will have to keep in sync with their releases and changes. As a developer, that's one of the jobs you sign up for when picking a dependency: you inherit their issues, and it's your responsibility to act on them.

So far packagers have been doing that job for you, but should they really? Surely nobody should know your dependencies better than you do, right?

@fribbledom Ah, you're seeing this issue only in terms of systems packages. Those are of no interest to my question, so forget I asked.

@fribbledom we do the same. We build everything from source, including the dependencies, and mirror most things so we know what we are building. That includes building the kernel with buildroot.

@fribbledom looks like there's an ongoing discussion on the reproducible builds topic here if you hadn't already seen it: github.com/golang/go/issues/16

Apologies if I've misunderstood :).

@fribbledom We developers "inherently trust" all the time, not only in packaging but adding a lot of dependencies from other people to our projects just hoping that they don't do anything malicious.

The truth is that when you install something, one person (usually) creates the package, but many more have contributed to the code, in ways neither the packager nor the author can anticipate.

A source map will verify that dependency A is effectively the same as on GitHub, but it won't check whether the code is malicious.

@fmartingr

Verifying the sources is a separate (yet connected) issue indeed, but it's useless if you don't know whether the sources you're verifying are the ones that have been packaged.

As a developer you should never blindly trust any dependencies. If you depend on some code, you will inherit all its flaws and issues, as well. That's your responsibility.

Luckily you're not alone; it's a collective process. The same needs to be achieved for verifying build integrity.

@fribbledom Just curious, do you say this because you've found suspicious things?

As an Arch user I've gotten used to checking where my AUR packages come from and what the install files actually do, but one user can't do that for everything installed on a given system.

@fribbledom nix has reproducible builds iirc, I used it as my primary package manager for a while

@fribbledom Yes!

reproducible-builds.org/ is a must for systems a user can trust. "Debian's dirty secret", developer-supplied binaries, is no longer a thing. Over 90% of Debian and NixOS (and, I assume, GuixSD) are now bit-for-bit reproducible!

@Xjs @fribbledom Bazel/Pants/Buck are part Make, part lightweight/partial Nix/Guix, but they don't do anything about bit-for-bit reproducibility as far as I'm aware.

@clacke @fribbledom We use Bazel at my company to guarantee bit-by-bit identical builds. I have already made use of this for some regulatory checks we had to undergo. (Finance sector, many think the scrutiny here is high, I have little comparison to make)

@clacke @fribbledom Bazel is (as far as I'm aware) the only build system that uses content-based addressing of build artefacts based on hashes of the inputs. This, among other shenanigans like sandboxing and stripping of timestamps or other side effects, makes binaries bit-by-bit identical if built from the exact same sources. (Can give a few other nice things, too, like centralised artifact caching, distributed building…)
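
The core idea fits in a line of shell (a toy sketch, not Bazel's actual scheme): derive the cache key from the recipe plus the hashes of every input.

    # same command + same input hashes => same action key,
    # so a cached artifact can be reused (or cross-checked)
    { echo "cc -c foo.c"; sha256sum foo.c foo.h; } | sha256sum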

@Xjs @fribbledom Bazel, Pants, Buck, Nix/Guix. 😀

But great to hear that there is work on physical reproducibility there too, not just logical reproducibility. Is that your work, or does Bazel have knowledge about stripping and normalizing data?

@clacke @fribbledom Bazel does, and the contributed rules do, too. I think it's still possible to construct examples that circumvent it though. – Interesting, last time I checked (some years ago) there wasn't this much choice ;)

@Xjs @fribbledom If I understand and remember correctly, Bazel, Buck and Pants are all spiritual forks of Google's internal build system. Bazel by Googlers who stayed in Google, and the other two by the Google diaspora at LinkedIn/Facebook/whatever.

Nix is a decade older than all of them, but has a stricter turtles-all-the-way-down attitude, and no make-like functionality -- it's a meta build system more than a build system.

But the greatest innovation the new 3 have is probably that they track changes via notifications rather than polling megabytes upon megabytes of directories on disk every time you ask if building is still a noop.

@Xjs @fribbledom I was first made aware of these new build systems when I joined my previous-previous assignment back in 2016.

I heard about Guix just before 0.9 came out in 2015 and that's how I learned about Nix.

@fribbledom

>But do you really trust your packagers?
nope

>How could you?
I trust the automated QA, marginally.

>Do you know them personally and monitor their packaging work?
yeah. that's why I don't trust them. also one is me

@fribbledom
>But do you really trust your packagers?

Yes, I do.

>How could you?

How could I not? I'd go crazy if I didn't trust them.

@fribbledom Sure it would be nice to have, but even then you are only marginally safer from malicious intent. Can you trust all the code in your dependencies? Are you able to check it all yourself? Probably not. At some point it comes down to trusting someone else, no matter how you turn it.

Is it good if you can reduce the number of people you need to trust? Yes, but don't expect to be able to get that number down to zero.

@murks

Of course I'm not suggesting that it'll ever be zero.

Clearly those are connected but separate issues. However, without reproducible and verified builds, you don't even know if the source code you're reading is actually the source code used to generate the build you're running.

It's an important step in the right direction.

@murks @fribbledom Without reproducible builds, reading the source code isn't even entirely useful.

Reproducible builds is step zero in being able to verify your system from source.

@clacke @fribbledom Provided you are not compiling it yourself, right? If you read all the source you depend on, including the compiler etc., and assuming you can trust the hardware and build the software yourself, then you could trust it.

@murks No, that's a collective effort. You don't have to do this alone. Trust the public consensus, not a single entity.

@fribbledom My $0.02 USD on the topic is that, unless you are fluent in multiple programming languages and have taken the time to comb through every line of code in a package, you can never be 100% certain that a package is built as intended and will work without any maliciousness. I would say very few people have the expertise to review all the code in the various languages, and even fewer have the time to review hundreds of thousands of lines of code.

@fribbledom At some point, you just have to decide who to trust and hope they're not trying to screw you over.

@jazzyeagle

You're not supposed to do that yourself, alone. The point is that we need to set up systems that let us verify integrity collectively. Reproducible builds are just one step towards that. Only once we achieve that can we rest assured that the source code we're verifying is actually the code we're running. And again, with open source that's a collective process.

@fribbledom I guess I don't fully understand your assertion. The question is whether or not we can trust those who package the software in our repos. Setting up a system that verifies integrity is in itself another piece of software we would need to trust written by others. I'm not sure how this is any different than trusting the current packagers in our respective distros.

@jazzyeagle

If just one entity did that, it would indeed be useless. The point is that this can be done collectively. Anyone can run such a system and verify the packages themselves.
