*Update on the Mu computer's memory-safe language*

I wrote about it last week at akkartik.name/post/mu-2019-2, but already that post is growing out of date. Here's an initial sketch: akkartik.github.io/mu/html/app

So far all that works is function definitions with no arguments or body. They emit just the prologue and epilogue code sequences.

But I have a sketch of the next few steps of the design in there.

github.com/akkartik/mu#readme

Follow

The design in akkartik.name/post/mu-2019-2 has changed in 2 ways.

#1: no more uninitialized variables. There's just no safe way to avoid initialization when structs can contain pointers.

This adds overheads to local variables in two ways: performance, and translator complexity for length-prefixed arrays. We can't just zero out everything, we also have to write array lengths in. Since structs may contain arrays, initializing a variable may involve writing lengths to multiple locations.

#2: unwinding the stack when exiting a scope. I can't just increment the stack pointer, I need to also restore any necessary registers along the way.

Another complication is that I need Java's labeled `break`s: docs.oracle.com/javase/tutoria. Since `break` does double-duty for both conditionals and loops, it's too onerous to only be able to `break` out of one scope at a time. But labeled `break`s increase the frequency with which I need to unwind the stack. More overhead.

The common theme here seems to be that the translator needs to maintain 'symbolic' representations of contiguous memory regions, tracking either values or register bindings for offsets.

I'm not sure how I feel about these choices. Perhaps I should give up on this whole design and switch to something more traditional. A memory-safe C-like language. So I'd love to hear feedback and suggestions, either here or over email: ak@akkartik.com

Project link: github.com/akkartik/mu#readme

cc @vertigo @rain

*Update on the Mu computer's memory-safe language*

First statements now translating!

akkartik.github.io/mu/html/app

`fn foo {}` => function prologue and epilogue

`increment x` when x is on the stack.

`foo x` when x is on the stack and foo is a user-defined function.

Seems like small progress, but.. rampantgames.com/blog/?p=7745

I feel like I'm finally starting to get closure on news.ycombinator.com/item?id=1

(Project page: github.com/akkartik/mu#readme)

@vertigo @rain

*Update on the Mu computer's memory-safe language*

Not much to report this week. Last week I implemented the instruction `increment x` when `x` is on the stack. This week I did `x <- increment` when x is a register.

(In Mu, registers written to are return values, while memory locations written to are just passed in by reference.)

The good news: I now have a core data structure in place for code-generating different instructions, and this includes static dispatch.

akkartik.name/post/mu-2019-2

*Update on the Mu computer's memory-safe language*

In the last 2 weeks I went from being able to translate:

```
fn foo {
}
```

to:

```
fn foo n: int {
increment n
}
```

That required 2k LoC. So it seems fair to say that things are still in the "black triangles" phase. And there are still gaping holes. Variable declarations don't really exist yet. (I can just parse them.)

(Project page: github.com/akkartik/mu#readme)

@akkartik @rain @vertigo Please also make it so pointers can't unexpectedly be null. Catching segfaults to avoid the runtime overhead of checking for null, as Java and Go do, doesn't compensate for the fact that now every pointer dereference can cause an exception.

If you're not that concerned about performance, I think you should consider only allowing one reference at a time to any given heap object and accepting the extra copying that will require. You can add borrowed references later.

@vertigo @rain @akkartik The nice thing about unique references from a simplicity standpoint is that it's always easy to tell just from looking at the code when the object will be deallocated. This also creates more uniformity between dealing with literal structs and pointers to structs. There is a dialect of C ("Safe C" IIRC) that hides the user of pointers entirely, instead talking about "growable structs", but with that semantic the growable fields are the equivalent of nullable pointers.

@freakazoid I'm super sympathetic to null pointer issues, but this level of the stack is unfortunately not designed to statically guarantee much. Lots of cases currently lead to a hard abort at run time.

Since it's being built in unsafe machine code I'm choosing to spend my tiny complexity budget on guaranteeing memory safety. Other program bugs are going to need thorough automated tests (supported from the ground up)

This level will support static checking on the one above it.

@vertigo @rain

@akkartik @rain @vertigo Isn't it as easy to guarantee that a given variable can't be null as it is to guarantee that it points at a given type? Or are there situations where you need to use null where you can't statically guarantee it won't end up getting dereferenced?

@freakazoid I'm actually not sure how I'd perform this particular static check.

One example that comes to mind is a linked list. It has to terminate in null, right? How does the no-null crowd deal with it? Option types somehow?

@akkartik The way I'd do this is with some basic type inference. A nullable pointer is a union type between null and a non-nullable pointer. You know the result of certain operations on null, for example that it's false if evaluated as a boolean and that == null returns true. For each conditional branch, you can try to exclude subtypes from union types whenever you can narrow down the condition to a specific value. So for example `if (x)` will always execute its else block for null.

@akkartik Then in the branch you replace the type of the variable with a subtype for which the branch can execute. In the case of union<null, ptr<foo>> the subtype for one branch would just be null and for the other branch it would be ptr<foo>. The cool thing about this approach is that it's fully general and supports all use of a type as a subtype.

@akkartik Another cool thing is that you can incrementally expand the capabilities of the typechecker. You can start out only supporting == null and != null and only for pointers. You can add things like user-defined types, other kinds of subtypes, and additional type inference later.

@akkartik You can even do weird stuff like have your types propagate backwards, so that if you ever dereference a pointer you know that it can't be null there and it must be of a type that satisfies the operation. That leads to very concise code, but it can produce very confusing error messages because the typechecker has no idea what the user intended the type to be.

@akkartik I've been making heavy use of Python type annotations and the "mypy" typechecker for a parsing language I've been developing. It has a type called "Option[OtherType]" which is just syntactic sugar for "Union[NoneType, OtherType]". It works exactly the way I described, replacing types with subtypes in branches that are known to only execute with the given subtype, for example truth tests, comparisons with None, and "isinstance". And as a bonus it uses the same type inference machinery.

@freakazoid Yeah, option types are a familiar idea, but don't they require language support for pattern-matching, including ensuring that all cases are handled? Mu is a statement-oriented language, and even if-then-else gets cobbled together out of if-then. Let me think about whether I can make option types work in this setting. Thanks!

@akkartik You don't need pattern matching, but the type checker does need to be able to strip "option" off the type (or equivalently the null from the union type) in any branch where the pointer gets dereferenced. Which in practice just means some really basic type inference that only has to support the operations you expect people to use to guard against use of nulls.

Statement-oriented works fine as long as the typechecker can "see" everywhere code flow can jump from into those blocks.

@akkartik So, for example, you could use "goto" for everything as long as it's static or the set of labels a dynamic goto can jump to is enumerable, because the datastructure the typechecker should be working on is a flow graph. To produce a flow graph from spaghetti code, you chop it into basic blocks each starting with a label and ending with a goto or a label, add edges between adjacent blocks that aren't separated by gotos, and then add edges from all the gotos.

@freakazoid Yeah I have some experience with recreating structured control flow graphs out of basic blocks. Luckily I don't need it in Mu :)

@akkartik Ideally you can make a flow graph with a simple tree rewrite against the AST. Then your type checker/compiler looks a lot like an AST-walking interpreter except that it follows all branches except ones that can never execute. Your AST can even be bytecode, which is really just a pointerless representation of a tree after all.

Sign in to participate in the conversation
Mastodon

Server run by the main developers of the project 🐘 It is not focused on any particular niche interest - everyone is welcome as long as you follow our code of conduct!