Follow

"Writing one byte to a pipe from a single-threaded process executes about 3900 instructions on Linux.." blog.regehr.org/archives/1676

@akkartik
I honestly would've expected it to be more than that.

@akkartik
> runtime code generation

The proposal sounds like an excellent operating system for attackers. I'd rather use a slow and boring system with high syscall overhead in exchange for full W^X in kernel code.

@niconiconi @akkartik Linux already implements dynamic code generation (BPF filters). It also supports loadable modules, which requires the kernel code to copy data from disk buffers (data space) into executable space. Dynamic code generation from a Turing-incomplete source description (e.g., implementing the precise semantics of open()) does not increase the attack surface any more than kernel modules do.

@niconiconi @akkartik That said, after reading the article, the whole thing is an outstanding and shining example for image-based VMs, such as that found on OS/400 or Smalltalk. The whole kernel and its associated runtime can be privilege DE-escalated if the backing VM is the only trusted code base in the system.

Looks like Dr. Kay and IBM were right. Again.

@vertigo @akkartik

> implements dynamic code generation (BPF filters).

Yes. But I can/have disabled BPF and BPF-JIT unconditionally on all my boxes, and practically almost no applications is affected, except for decreased performance for low-level network things. I recommend everyone to do it if they don't need it.

> It also supports loadable modules

Yes, But this can be disabled as well with minimum functionality loss. I did it on my server. Also, digital signature can be enforced for kernel modules.

@vertigo @akkartik On the other hand, if dynamic code generation is designed as an integral part of the system, the same cannot be said...

> from a Turing-incomplete source description does not increase the attack surface

This is interesting, have never thought about it, so it's not as dangerous as it seems.

BTW, I believe BPF is Turing-complete, is it?

@niconiconi @akkartik I believe it is, which is partly what motivated me to mention it.

I don't mind dynamic code generation, but it really bothers me when it's drawn from a Turing-complete source. As long as it's not Turing complete, you can statically prove useful properties of the generated code (see, e.g., how MIT's exokernel project implements disk management using dynamically synthesized code safely). As soon as Turing completeness is realized, all those guarantees go out the window.

@niconiconi @akkartik TTF font hinting bytecode is another attack vector which has been exploited. I don't think it's a coincidence that it, too, happens to be Turing complete. :/

@vertigo @akkartik I've never thought about secure code generation based on non-Turing complete construction, and falsely believed all runtime code generation is dangerous. Thanks for the comments!

@niconiconi @vertigo @akkartik What exactly is meant here by the "source" being turing-incomplete?

@neko @akkartik @niconiconi There is no way for the semantics of open() to crash the kernel or violate another process' memory space. That is guaranteed behavior. The code generated from it, which implements a file-descriptor-specific implementation of read()/write()/seek()/close()/..etc. unique to the filesystem resource you specify in the call to open(), can be completely statically proven never to fail or violate security guarantees. It can do this because it's not Turing complete.

@neko @akkartik @niconiconi Another example is bit-blit operations, prior to the widespread use of GPUs. Frequently, these would be synthesized into code which consists of several tight inner loops. Nonetheless, because the semantics of the blit is finite, you can statically compute both how many instructions will execute to complete the function, and because of that, which memory locations will be touched. In other words, the halting problem is solved for non-TC programs.

@vertigo @akkartik @niconiconi So its like saying that if a wildcard code used dynamic code generation to compile expresions, even though the generator could be TC, since wildcards are not, it's no danger?

@neko @akkartik @niconiconi As long as the *generator* was also trusted, yes. If you don't trust the generator, then a valid and correct wildcard spec can potentially cause the wrong code to be compiled.

The reason why non-TC is important is because proving non-TC semantics is much easier than proving TC semantics. (Interestingly, proving that a language is non-TC turns out to be pretty hard, though. See beza1e1.tuxen.de/articles/acci )

@vertigo @neko @akkartik The dynamic code generation in bitblit is an absolute classic. If it's unfamiliar to you, there is an complete description in Chapter 8, page 128 of the book Beautiful Code: Leading Programmers Explain How They Think. You can grab a copy from LibGen. booksdescr.org/item/index.php?

@niconiconi
I will definitely check it out. Sounds very interesting. :) dynamic code is super cool
@vertigo @akkartik I

@neko @akkartik @niconiconi It is indeed. But be careful. It's also super dangerous too. Play and have fun, but make sure you have a policy of "trust but verify" in place at every step. :)

@vertigo
I've only ever written offline dynamic code generators, so nothing that scary. Tbh i dont think I feel comfortable giving a user that much power outside of a sandbox, and even then...
@akkartik @niconiconi

@akkartik @vertigo @niconiconi I think that is still true, but some people are working on adding support for "bounded loops": lwn.net/Articles/773605/

@cstanhope @akkartik @vertigo Okay, so the current non-Turing complete BFP already enables numerous JIT-spraying attacks, so I cannot have any confidence on dynamic code generation...

@niconiconi @cstanhope @akkartik This is interesting; this is a class of attack I wasn't aware of. After reading up on it, it appears that this exploits a buggy JIT compiler, not a bug in the language it compiles.

When I google'd for BFP, eBFP, and JIT spraying, though, nothing comes up. Do you have specific links I can read more on?

@vertigo @niconiconi @cstanhope @akkartik I dimly remember a talk on eBPF from 35C3, but I think they concluded that at that point it was very hard to attack the kernel itself from eBPF:

media.ccc.de/v/35c3-9532-kerne

@galaxis @vertigo @cstanhope @akkartik While it's hard to attack the kernel directly, historically BPF-JIT has shown that it could aid the attacker in some ways.

SMEP Bypass: mainisusuallyafunction.blogspo

BPF-JIT also enables the easiest Spectre exploits: googleprojectzero.blogspot.com

Also, if the JIT compiler is buggy...

Privilege escalation via eBPF in Linux 4.9 and beyond: lwn.net/Articles/742170/

Sign in to participate in the conversation
Mastodon

Server run by the main developers of the project 🐘 It is not focused on any particular niche interest - everyone is welcome as long as you follow our code of conduct!