I accidentally found a security issue while benchmarking postgres changes.

If you run debian testing, unstable or some other more "bleeding edge" distribution, I strongly recommend upgrading ASAP.

openwall.com/lists/oss-securit

www.openwall.com: oss-security - backdoor in upstream xz/liblzma leading to ssh server compromise

I was doing some micro-benchmarking at the time, needed to quiesce the system to reduce noise. Saw sshd processes were using a surprising amount of CPU, despite immediately failing because of wrong usernames etc. Profiled sshd, showing lots of cpu time in liblzma, with perf unable to attribute it to a symbol. Got suspicious. Recalled that I had seen an odd valgrind complaint in automated testing of postgres, a few weeks earlier, after package updates.

Really required a lot of coincidences.

AndresFreundTec

One more aspect that I think emphasizes the number of coincidences that had to come together to find this:

I run a number of "buildfarm" instances for automatic testing of postgres. Some of them run under valgrind. For some other test instance I had used -fno-omit-frame-pointer for some reason I do not remember. A year or so ago I moved all the test instances to a common base configuration, instead of duplicate configurations. I chose to make all of them use -fno-omit-frame-pointer.

Afaict valgrind would not have complained about the payload without -fno-omit-frame-pointer. It was because _get_cpuid() expected the stack frame to look a certain way.

Additionally, I chose to use debian unstable to find possible portability problems earlier. Without that valgrind would have had nothing to complain about.

Without having seen the odd complaints in valgrind, I don't think I would have looked deeply enough when seeing the high cpu in sshd below _get_cpuid().

There are more coincidences that are even less interesting. But even the above should make it clear how unlikely it was that I found this thing.

Just to be clear: I didn't mean that I didn't do good - I did. I mean that we got unreasonably lucky here, and that we can't just bank on that going forward.

@PJFDF @AndresFreundTec From now on, we can sleep soundly, knowing that Andres will find any future backdoors. Lesson learned. 🙏

@darabos @PJFDF @AndresFreundTec yep, he's in charge now. Do we still have to show up on Nov 4 and write his name in on the ballot?

@AndresFreundTec It must feel like checking below the car because of it making a strange sound and finding a nuke 😏

@AndresFreundTec Huge thank you for being so curious and determined. I myself often feel bad for looking deeper than seems necessary into issues (on company time). But this story showed that we all should be really alert from now on.

@AndresFreundTec

Nonetheless, amazing work. Hope you get a Gold Star at the office...

@AndresFreundTec outstanding... luck, knowledge and experience have met the right person at the right point in time.

@AndresFreundTec open source has nothing to lose essentially so incidents are just incidents - they happen everywhere all the time in every industry. open source has less interest in monetary gain so the impacts are comparatively less

@AndresFreundTec don't for a minute think that you are ever having a holiday again! We need you monitoring those timings! The world needs you! 🏅

@AndresFreundTec but there are thousands of devs running all sorts of tests. The question is not "what were the odds that you missed it", but "what were the odds that everybody missed it"...

@HydrePrever @AndresFreundTec The odds that everyone missed it are extremely high. It actually triggered a bunch of test failures, which the attacker explained away as ifunc side effects, and a lot of people simply accepted that explanation.

@gordonmessmer @HydrePrever @AndresFreundTec To be fair, test failures being considered "Flaky" isn't unheard of - as soon as I read that, I immediately thought of this StackExchange question: [ softwareengineering.stackexcha ].

What was the over/under on someone saying "Yeah, that test just fails sometimes. We don't know why though. Run it 8 times, and if it fails more than 3, then it's probably *actually* an issue.", and *that* helping to get it past the radar of scrutiny?

Software Engineering Stack Exchange: How to get flaky tests fixed after having mitigated their flakiness. "Recently, I was charged with making about 9000 Selenium tests start running in CI/CD nightly. These tests had built up over about 8 years and had up until then been run in an ad hoc way. It was imp..."
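To make the "run it 8 times, and if it fails more than 3" rule quoted above concrete, here is a minimal Python sketch. The function names and the 25% failure rate are assumptions chosen for illustration, not anything taken from the xz test suite or the linked question. It shows how a retry policy like this would usually wave through a test that genuinely fails a quarter of the time.

```python
from math import comb


def run_with_retries(test, runs=8, max_failures=3):
    """Apply the hypothetical retry rule: run `test` `runs` times and
    treat it as passing unless it fails more than `max_failures` times."""
    failures = sum(0 if test() else 1 for _ in range(runs))
    return failures <= max_failures


def pass_probability(failure_rate, runs=8, max_failures=3):
    """Exact probability that a test failing independently with
    `failure_rate` per run is still accepted by the retry rule
    (binomial probability of at most `max_failures` failures)."""
    return sum(
        comb(runs, k) * failure_rate**k * (1 - failure_rate) ** (runs - k)
        for k in range(max_failures + 1)
    )


if __name__ == "__main__":
    # Illustrative assumption: the test fails 25% of the time.
    print(f"accepted with probability {pass_probability(0.25):.3f}")
```

Under these assumptions the rule accepts the test in roughly 89% of evaluations, so a real but intermittent failure would rarely be escalated.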

@AndresFreundTec That was more than just good - you probably stopped a devastating attack on the whole industry. The open source community owes you a huge debt of gratitude. Had that exploit gotten into the wild it would have been awful. Catching it early was an immense win.

I hope Microsoft recognizes how much of a contribution you made to the entire industry.

@AndresFreundTec Hey dude, you really saved millions, thanks a lot.

(Luck or not, you did a fantastic job).

@AndresFreundTec Yep! Important to remember that one of the coincidences in the chain was that it landed in your lap, and you had built a set of skills over your career that were necessary to dig deeper. 🍻

@AndresFreundTec i mean, with this much luck, I wouldn't be surprised to have found out you bought a lottery ticket - although I guess it would be better to wait until the exploit was fully patched out.

Unlikely they attacked only one out of every possible package that could provide access to a system. How do we find all the other dormant hacks?

@AndresFreundTec lmfaooooooo the fact that this was found because of -fno-omit-frame-pointer confirms this is the final FOSS Discourse Event, literally every single thing that people have ever talked about in FOSS for the past year makes an appearance here

@praseodym Hah. I've apparently been doing this stuff for a while.

@AndresFreundTec I see you’ll be at Oxide and Friends, super cool! Unfortunately a bit too late for me (2am) so I’ll listen to it as a podcast.

I’m trying to understand the context a bit better: how did you get Debian with -fno-omit-frame-pointer, did you compile it yourself? Or did the valgrind errors come from PostgreSQL builds with liblzma linked to it?

@praseodym I do not have all of Debian built with -fno-omit-frame-pointer (although I do have a ~10 year old bug report about wanting a glibc package with frame pointers), just postgres. The errors came from postgres being built with libsystemd support, which in turn linked to liblzma.
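The indirect dependency described above (postgres built with libsystemd support, which in turn pulls in liblzma) is the kind of thing `ldd` makes visible, since it lists transitively loaded shared objects. The following is a minimal Python sketch, not the method used in the thread: the helper names are hypothetical, and `SAMPLE` is made-up illustrative text in the usual `ldd` layout, not output captured from a real system.

```python
import subprocess

# Made-up, illustrative ldd-style text (an assumption, not real output).
SAMPLE = """\
\tlinux-vdso.so.1 (0x0000000000000000)
\tlibsystemd.so.0 => /lib/x86_64-linux-gnu/libsystemd.so.0 (0x0000000000000000)
\tliblzma.so.5 => /lib/x86_64-linux-gnu/liblzma.so.5 (0x0000000000000000)
\tlibc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x0000000000000000)
\t/lib64/ld-linux-x86-64.so.2 (0x0000000000000000)
"""


def shared_libs(ldd_output):
    """Return the library names (first column, basename only) from ldd-style text."""
    names = []
    for line in ldd_output.splitlines():
        fields = line.split()
        if fields:
            names.append(fields[0].rsplit("/", 1)[-1])
    return names


def loads_library(ldd_output, prefix):
    """True if any listed shared object name starts with `prefix`."""
    return any(name.startswith(prefix) for name in shared_libs(ldd_output))


def ldd(path):
    """Run the real `ldd` tool on `path` and return its text output."""
    result = subprocess.run(["ldd", path], capture_output=True, text=True, check=True)
    return result.stdout


if __name__ == "__main__":
    print(loads_library(SAMPLE, "liblzma"))
```

On a real system one would call something like `loads_library(ldd(path_to_binary), "liblzma")`, where `path_to_binary` is whichever executable is being inspected. Only run `ldd` on binaries you trust.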

@AndresFreundTec now that @fedora, @ubuntu and @archlinux have frame pointers, maybe this #xzorcist issue will encourage #Debian to follow suit

@AndresFreundTec I recognize -fno-omit-frame-pointer as being important for ASAN github.com/google/sanitizers/w which has some overlap in functionality with valgrind (but obviously a fundamentally different approach).

GitHub: AddressSanitizerFlags (AddressSanitizer, ThreadSanitizer, MemorySanitizer - google/sanitizers)