Nate Cull is a user on mastodon.social.

I forget who linked this wonderful article about the stunned horror and disbelief of the academic authors of the Coverity static analysis package when they encountered Real World Production Code in all its 'glory', but it's awesome.

courses.cs.washington.edu/cour

The thousand-year stare in every paragraph.

They've seen things, man.

<< Checking code deeply requires understanding the code’s semantics. The most basic requirement is that you parse it. Parsing is considered a solved problem. Unfortunately, this view is naïve, rooted in the widely believed myth that programming languages exist.

The C language does not exist; neither does Java, C++, and C#. >>

The sweet innocent children. They were shocked by this.

<<While a language may exist as an abstract idea, and even have a pile of paper (a standard) purporting to define it, a standard is not a compiler. What language do people write code in? The character strings accepted by their compiler. Further, they equate compilation with certification. A file their compiler does not reject has been certified as “C code” no matter how blatantly illegal its contents may be to a language scholar.>>
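That "the language is whatever your compiler accepts" point can be made concrete with a toy. This is a hedged sketch, not real compilers: two "implementations" of the same made-up language of integer literals, where one quietly accepts an extension the other rejects. Everything here (the grammar, the suffix extension) is invented for illustration.

```python
import re

# Toy illustration: the "standard" for our made-up language says an integer
# literal is decimal digits only. One implementation also accepts a trailing
# suffix, the way real compilers accept vendor extensions.
def strict_accepts(tok: str) -> bool:
    # Only what the paper standard defines.
    return re.fullmatch(r"[0-9]+", tok) is not None

def lenient_accepts(tok: str) -> bool:
    # Extension: optional trailing suffix letters. Users of this tool will
    # happily write them, and their files are "certified" as valid.
    return re.fullmatch(r"[0-9]+[uUlL]*", tok) is not None

for program in ["42", "42UL", "42q"]:
    print(program, strict_accepts(program), lenient_accepts(program))
# "42UL" is "in the language" only for users of the lenient implementation:
# the set of strings your tool accepts IS the language you write in.
```

A static analyzer that implements only the strict grammar will choke on code the lenient tool has been certifying for years, which is exactly the situation the article describes.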

This is a bit scary, though:

<<By default, companies refuse to let an external force modify anything; you cannot modify their compiler path, their broken makefiles (if they have any), or in any way write or reconfigure anything other than your own temporary files.>>

Well, YEAH. Your tool is there to READ, not WRITE. Do you have ANY idea how long it took to get ANYTHING working in that toolchain at all?

"It's really hard observing this species. By default, humans don't let you chop off limbs."

<<Unfortunately, the creativity of compiler writers means that despite two decades of work EDG still regularly meets defeat when trying to parse real-world large code bases.>>

This tale of woe supports my knee-jerk, bone-deep mistrust not only of compilers but the entire CONCEPT of compilation.

Compilers seem to be unfathomably complex hives of bugs.

Give us a language with a tiny, defined parse tree and then let's just do everything over the parse trees, please?

At the very least it'll PARSE?
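The "tiny, defined parse tree" wish above can be sketched in a few lines. This is a toy s-expression reader, not any real Lisp reader (no reader macros, no strings, no error recovery), just to show how small a fully specified surface syntax can be:

```python
# A minimal s-expression reader: atoms are whitespace-delimited tokens,
# everything else is nested lists. The entire "parser" fits on a page.
def tokenize(src: str):
    # Pad parens with spaces so split() separates them from atoms.
    return src.replace("(", " ( ").replace(")", " ) ").split()

def read(tokens):
    tok = tokens.pop(0)
    if tok == "(":
        node = []
        while tokens[0] != ")":
            node.append(read(tokens))
        tokens.pop(0)  # consume the closing ")"
        return node
    return tok  # an atom

tree = read(tokenize("(define (square x) (* x x))"))
print(tree)  # ['define', ['square', 'x'], ['*', 'x', 'x']]
```

Tools would then operate on the nested lists directly, so "parsing real-world code" stops being the hard part, which is the appeal the post is gesturing at.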


<<Unfortunately, we sometimes need a deeper view of semantics so are forced to hack EDG directly. This method is a last resort. Still, at last count (as of early 2009) there were more than 406(!) places in the frontend where we had an #ifdef COVERITY to handle a specific, unanticipated construct.>>

What if, hypothetically, we used a language like Lisp instead of C

and I don't actually MEAN any actually-existing Lisp because good lord have you SEEN what's in the reader macros? (O)__(O)


Engraved on the tombstone of the Internet.

<<The following event has occurred during numerous trials. The tool finds a clear, ugly error (memory corruption or use-after-free) in important code, and the interaction with the customer goes like this:

“So?”

“Isn’t that bad? What happens if you hit it?”

“Oh, it’ll crash. We’ll get a call.”

[Shrug.]>>

<<Here is an open secret known to bug finders: The set of bugs found by tool A is rarely a superset of another tool B, even if A is much better than B.>>

That... is a problem, though, isn't it?

If the tools are genuinely improving, shouldn't they be converging on finding the same bugs?

Why are our bug-finding tools NOT converging?
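The non-convergence is easy to reproduce in miniature. A hedged toy, with invented snippets and deliberately dumb pattern-matching "checkers" (nothing like real analyzers): checker A knows more patterns than checker B, yet neither's findings contain the other's.

```python
# Toy corpus: three files, three different kinds of bug.
snippets = {
    "f.c": "free(p); use(p);",      # use-after-free
    "g.c": "int x; return x;",      # uninitialized read
    "h.c": "if (p = NULL) go(p);",  # assignment in condition
}

def checker_a(code: str) -> bool:
    # The "better" tool: knows two bug patterns.
    return ("free(" in code and "use(" in code) or ("int x; return x" in code)

def checker_b(code: str) -> bool:
    # The "worse" tool: knows one pattern checker_a doesn't.
    return "= NULL)" in code

found_a = {f for f, code in snippets.items() if checker_a(code)}
found_b = {f for f, code in snippets.items() if checker_b(code)}
print(found_a, found_b)
# found_a = {'f.c', 'g.c'}, found_b = {'h.c'}: A finds more bugs,
# but A's set is still not a superset of B's.
```

If each tool's findings are driven by which heuristics its authors happened to write, improvement grows each tool's set without forcing the sets to coincide, which is one plausible reading of why they don't converge.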

<<While users know in theory that the tool is “not a verifier,” it’s very different when the tool demonstrates this limitation, good and hard, by losing a few hundred known errors after an upgrade.>>

Yeah, because that's called a regression, and you're in the bug-finding business but you didn't regression test YOUR OWN PRODUCT.
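The regression guard being demanded here is simple to sketch. Everything below is hypothetical (the corpus paths, the "analyzer", the lost checker); it just shows the shape of the test: keep a corpus of known, confirmed errors, and fail loudly if an upgrade stops reporting any of them.

```python
# Known, previously confirmed findings: (file, bug-kind) pairs.
KNOWN_ERRORS = {
    ("corpus/uaf.c", "use-after-free"),
    ("corpus/leak.c", "memory-leak"),
}

def run_analyzer(version: int) -> set:
    # Stand-in for running the real tool over the corpus. Version 2
    # simulates an upgrade that silently loses the leak checker.
    findings = set(KNOWN_ERRORS)
    if version >= 2:
        findings.discard(("corpus/leak.c", "memory-leak"))
    return findings

def regression_check(version: int) -> set:
    # Non-empty result means the upgrade regressed: known errors vanished.
    return KNOWN_ERRORS - run_analyzer(version)

print(regression_check(1))  # set(): all known errors still found
print(regression_check(2))  # the lost finding surfaces immediately
```

With this in CI, "losing a few hundred known errors after an upgrade" becomes a failed build instead of a surprise the customer discovers.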

<<but uncovering the effect of subtle bugs is still difficult because customer source code is complex and not available.>>

That's a pretty big problem, yeah.