#BabelOfCode 2024
Week 2
Language: Forth
Confidence level: Low
PREV WEEK: https://mastodon.social/@mcc/113743302074837530
NEXT WEEK: https://mastodon.social/@mcc/113867584791780280
RULES: https://mastodon.social/@mcc/113676228091546556
So today's challenge looks *absurdly* easy, to the point I'm mostly just suspicious that part 2 will get hard. I figure this is an okay time to burn Forth.
I'm wanting to save Fortran for a week I can use the matrix ops. This puzzle looks suspiciously like part 2 will turn into a 2-dimensional array problem.
I *think* I'm doing this in pforth, for the simple reason that gforth, uh, isn't maintained anymore it seems, and so got dropped out of Debian Testing (which I have now)? I *think* I'd be *happier* using RetroForth, which is a "modern" Forth, but I guess it's better to learn the standardized ANS Forth first. Even though everyone hates ANS Forth? Including the inventor of Forth…?
My biggest fear is there appears to be no way to read numbers written in ASCII from a file. We're predating ASCII.
First problem I hit: comments don't work. The documentation specifically says text in parentheses is a comment, but it isn't accepted.
After some staring at the docs, I realize that in all the examples, there are spaces. It turns out (Comment) is not a comment, but ( Comment ) is. That's because ( isn't an operator built into the language; rather, there's a FORTH word ( that eats everything up to the next ). Words are split on spaces, so (Comment) gets read as one big unknown word. Holy crap. I never thought I'd say this, but maybe it IS possible to self-host too hard.
So this documentation is kinda very bad!
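A small sketch of the comment rules described above, assuming a standard ANS Forth such as gforth or pforth. `answer` is a made-up word. The point is that `(` is itself a word, so it needs a space after it, while the closing `)` is just the character it reads up to.

```forth
( This is a comment. The open paren is a word, so a space must follow it. )
( No space is needed before the closing paren)
\ A backslash comments out the rest of the line.
\ (Comment) would fail: it is read as one unknown word named "(Comment)".

: answer ( -- n )  42 ;   \ stack-effect comments use the same ( word
answer .                  \ prints 42
```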
Lacunae I have noticed:
- They define a `KEY` operator for taking a character from STDIN, but don't explain what happens if `KEY` receives an EOF (experimentally: I seem to get a -1?)
- They explain a special syntax `CHAR n` for inserting the ASCII value of n directly into the code, but don't explain how the fuck you're supposed to represent the ASCII value for a non-character symbol such as a space or newline
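For the space-and-newline question, a hedged sketch: ANS Forth does provide `BL` for the space character, and `CHAR` cannot read a space anyway because it skips leading spaces. The constant names below are invented for this example, not standard words.

```forth
\ BL ("blank") is the standard word for the space character, 32.
\ CHAR cannot produce it: CHAR skips leading spaces before reading.
\ Tab, line feed and carriage return have no portable names, so this
\ sketch defines its own (hypothetical names).
9  CONSTANT TAB-CHAR
10 CONSTANT LF-CHAR
13 CONSTANT CR-CHAR

BL EMIT          \ emits a space
LF-CHAR EMIT     \ emits a raw line feed
CR               \ the word CR emits a line terminator; it pushes nothing
```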
More pforth documentation horrors
- The pforth tutorial is not a tutorial for pforth but rather a general forth tutorial, and therefore hedges itself frequently. For example, notice this section where it explains that "many forths" have a CASE statement. "Many forths"? What about THIS forth I'm reading the documentation to RIGHT NOW?
- ABORT is not documented. The documentation lists it as a reserved word, but not what it does.
Wrote a version 1.0 of my program, testing it. All I've got so far is the character parser, reading the numbers in and decoding ASCII and that's it.
This is… bad. I would describe this as bad behavior for a programming language interpreter
The Forth interpreter isn't completely busted: my test.f screenshot above worked. A simple "echo ASCII values" program I wrote (`BEGIN KEY DUP . CR 0< UNTIL`) worked. But my 32-line, mildly more sophisticated program just… signal 11s. I do not know how to proceed. I am using "Debian Testing", which is TECHNICALLY a beta OS, so maybe the pforth is broken *subtly*. gforth isn't in dpkg. I don't… I don't know what to do next if the compiler crashes.
Maybe I ssh into a VPS and run gforth there? :(
So here's my *current* code, which crashes in pforth:
I run it in gforth:
```
cat "data/sample-2.txt" | gforth src/puzzle.f
```
I get:
```
in file included from *OS command line*:-1
src/puzzle.f:13: Interpreting a compile-only word
>>>BEGIN<<< ( Line )
Backtrace:
$7F5C6D55EB30 throw
```
Line 13 is indeed the word "BEGIN". According to the tutorial, that is how you open a BEGIN..UNTIL loop.
fuck the in What?
Okay. So some updates.
It turns out loops (BEGIN..UNTIL) are a "premium" Forth feature and are only available inside functions (colon definitions). So I need to wrap the whole program in a `: function ;`. Well, not the whole program, not the VARIABLEs, and I don't think you can nest the functions, and… never mind. I do the nest. New code:
https://github.com/mcclure/aoc2024/tree/85890d80d89ebe82df779e20607f78cd4275f8db
gforth fails with:
```
src/puzzle.f:38: Invalid memory address
```
I wrapped my program in `: run [code here] ; run`. Line 38 is "run".
I'm still lost.
A thing worth noting here: if you read my posts carefully above, you'll find I successfully executed a BEGIN .. UNTIL program in pforth. So pforth just relaxes requirements that gforth enforces. I don't think I've hit what is causing pforth to crash yet; I'm just trying to satisfy gforth's requirements for running the software at all. Maybe I should have read gforth's manual instead of assuming pforth's is adequate :(
This raises an interesting problem. I could have used a "nice" Forth like RetroForth or Factor(?), but I wanted to learn ANS Forth before I moved on to specializations. However, now I realize there are *only* specializations. Pforth is apparently giving me all kinds of niceties; the premium DLC is included at the top level. And I know for a fact gforth (of course, because that stands for GNU Forth) contains GNU extensions. So there are two standard Forths in Linux, neither actually standard.
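A minimal sketch of the BEGIN..UNTIL restriction, assuming gforth's behavior (the thread found pforth more lenient at top level): the loop words are compile-only, so the loop has to live inside a colon definition, which is then called from top level. `sum-down` and `total` are illustrative names.

```forth
VARIABLE total

: sum-down ( n -- )   \ adds n, n-1, ..., 1 into total; assumes n >= 1
  0 total !
  BEGIN
    DUP total +!      \ total += n
    1- DUP 0=         \ n := n-1, stop once it reaches 0
  UNTIL
  DROP ;

5 sum-down  total @ .   \ prints 15
```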
Okay. So this explains my segfault! I can now run in pforth.
https://xoxo.zone/@clarity/113783794022449133
I accidentally wrote on one line `partial TRUE !`; correct would be `TRUE partial !`. All variable assignments in Forth are done through pointers: `partial` pushes the variable's address, and `!` stores the value beneath it at that address. So the wrong order was like trying to write to memory address TRUE (-1). It is as if C++ allowed you to write "true = x;" by accident instead of "x = true;", with the effect of writing the address of x to 0x1.
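The argument order is easy to get backwards, so here is a small sketch of the store/fetch pattern with the same variable name, assumed to run on any ANS Forth that provides `TRUE`.

```forth
VARIABLE partial     \ executing "partial" pushes the variable's address

TRUE partial !       \ value first, then address: store TRUE (-1) in partial
partial @ .          \ prints -1

\ partial TRUE !     \ backwards: would store partial's address at address -1
```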
I am unblocked, but still can't do jack shit in gforth
Sorry, I kinda disappeared in the middle of asking a question. Me and @spookysquid were doing something extremely normal
*kicking and throwing things*
The tutorial for pforth https://www.softsynth.com/pforth/pf_tut.php#:~:text=if%20you%20want%20to%20find%20the%20ascii%20value%20for%20any%20character says if you want an ASCII constant you write: CHAR A
I want to print an A. I write `CHAR A EMIT`. I get:
```
A ? - unrecognized word!
INCLUDE error on line #18 , level = 1
CHAR A EMIT
^^^^^^
```
I write `CHAR 'A' EMIT`. This works. The documentation lies! The documentation lies!!
I'm getting unexpected results for .S. Wait, what results do I expect? Like ABORT, PFORTH DOCUMENTS .S EXISTS, BUT NOT WHAT IT DOES.
I cannot work under these conditions. I now see why everyone writes their own Forth interpreter instead of using a prepackaged one. BECAUSE IT'S THE ONLY WAY TO KNOW WHAT ANYTHING DOES.
I asked a while back how Forth is for text parsing. I was thinking like… structured text parsing. I did not think that "input a sequence of space-separated ASCII numbers" was going to turn out to be basically a workday and then I wouldn't have it working at the end
The number 101 is somehow poisoning my stack. I don't know why, but the longer the program runs, the more instances of the number 101 show up on my stack. Once it begins, the logic spirals out of control, because 101 shouldn't be there. That's ASCII 'e', but I can't imagine why ASCII 'e' would wind up on my stack. This would be so much easier if I felt COMPLETELY SURE I knew what all the builtin words do, but it's lying to me about CHAR, so what else could it be lying about?
I think what I really need here is some kind of live debugger that steps ONE WORD AT A TIME and prints the stack out after EVERY word executes. I'm trying to put in debug prints, but SOMETHING IS PUTTING JUNK ON MY STACK and I don't know how to tell if it's my debug prints that are creating the stack clutter
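One way to keep debug prints from adding stack clutter, sketched under the assumption that your Forth has the standard `.S` (both pforth and gforth list it): `.S` displays the stack without popping anything, whereas `.` consumes the value it prints. `dbg` is a hypothetical helper name, and the exact `.S` output format differs between Forths.

```forth
: dbg ( -- )  CR ." stack: " .S ;   \ hypothetical helper; leaves the stack alone

1 2 dbg      \ shows both items; the stack still holds 1 2
+ .          \ prints 3
```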
UPDATE: I have found a Forth debugger of the type I was looking for, but it only runs in DOS
https://holonforth.com/debugforth.html
EDIT: Before anyone asks, no, the Compaq Portable III does not help here, it boots but we have no way to get data on or off of it
Alright! Progress.
While I was on the bus, @unlambda looked it up, and pforth *has* a symbol-by-symbol trace (it's called… TRACE). Unlike most things in pforth, *this* feature is well documented. Now, it also requires me to use INCLUDE, which *isn't* documented and which I couldn't get to work, but n/m, I just pasted the whole program into the terminal each time.
What I found: Remember me not getting `CHAR A` to work, and experimentally finding `CHAR 'A'` as substitute?
…It's not a substitute.
I jokingly referred to a Forth "premium" mode before. Non-jokingly, this is called "compiled" mode. Inside a function is "compiled"; outside is "interpreted". Apparently running CHAR in compiled mode puts the number 82 on the stack. Why? I don't know. That's ASCII "R". I don't get it. Then 'A' *separately* puts 65 (ASCII A) on the stack.
@unlambda also found the solution here: I need to say `[CHAR] A`. The brackets mean, roughly, "run this now, while compiling, and bake the result into the function as a constant", rather than compiling CHAR in to run later. (1/2)
Should the tutorial have mentioned this? Should the reference manual have? Should the "Starting Forth" book I read a big chunk of, with the friendly cartoons, have mentioned it? Should, for that matter, the spec— which defines separate interpreted and compiled semantics for *many* words, but not CHAR ( https://forth-standard.org/standard/core/CHAR )— have mentioned it? None of them did. I found out by word of mouth on Mastodon that CHAR (and possibly some other stuff?) needs square brackets to act normal. (2/2).
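A sketch of the difference as the replies further down explain it: at top level, `CHAR` parses the next word immediately, while inside a definition you need `[CHAR]`, which parses at compile time and compiles the character code in as a literal. Writing `CHAR A` inside a definition instead compiles a call to `CHAR` and then tries to look up `A` as a word, which is consistent with the "unrecognized word" error above. `print-a` is an illustrative name.

```forth
CHAR A .             \ interpreting: CHAR reads "A" right away; prints 65

: print-a ( -- )
  [CHAR] A EMIT ;    \ [CHAR] reads "A" at compile time and compiles 65 in

print-a              \ prints A

\ : broken  CHAR A EMIT ;
\ Inside a definition, CHAR is compiled to run later, and the compiler then
\ tries to look up A as a word, which fails with an undefined-word error.
```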
Once I got inside of TRACE and was watching the code execute word by word, suddenly Forth was the "elegant", clear, mechanically-precise machine I had expected Forth to be coming in. I understood what each word did, and if something went wrong I understood what. This is interesting, as when working with full programs, Forth has felt like sentient jello. (1/2)
My takeaway here is that Forth is a chaotic system. Small changes in initial conditions lead to large differences in outcomes. The stack means the meaning of any one statement is dependent on the entire history of the program to that point! That means Forth is precise if you understand exactly what you're doing, but if your understanding is even a *little* off— say, because the documentation for just one keyword in the whole program is unclear— you get chaotic behavior and all is lost. (2/2)
Anyway here's my 45-line code for a harness that reads in lines of ASCII numbers and prints and dumps the stack after each.
It's… awkward in places. It would have helped to have else-if, character literals for whitespace (like '\t' and '\r'), and a better story for complex boolean expressions
(`DUP DUP DUP 9 = SWAP 13 = OR SWAP 32 = OR`. Ouch.) Parts feel nice and parts feel real bad.
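A hypothetical `whitespace?` helper that might tame the boolean chain above. It assumes plain ASCII codes and, unlike the `DUP DUP DUP` version, consumes the character; write `DUP whitespace?` to keep it on the stack.

```forth
: whitespace? ( c -- flag )
  DUP BL =            \ c flag   (space)
  OVER 9 = OR         \ tab
  OVER 10 = OR        \ line feed
  SWAP 13 = OR ;      \ carriage return; this last test consumes c

CHAR x whitespace? .  \ prints 0
BL whitespace? .      \ prints -1
```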
I expect the ACTUAL PROGRAM will be a few lines long, and possibly only take mere minutes.
So functions in ANS forth, as far as I can tell, can't have locally-scoped variables…
…but there are no restrictions on what characters you can put in a variable name, so I can just store the variables in paths *cackles evilly*
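A sketch of the "variables in paths" trick: Forth word names may contain any non-space characters, so dotted names act as a poor man's namespace. The `parse.` names are hypothetical. (Forth 2012 also specifies an optional locals word set, `{: ... :}`, if your Forth implements it.)

```forth
VARIABLE parse.value
VARIABLE parse.digits

: parse.reset ( -- )  0 parse.value !  0 parse.digits ! ;

: parse.digit ( c -- )     \ fold one ASCII digit into parse.value
  [CHAR] 0 -               \ ASCII code -> digit value
  parse.value @ 10 * +  parse.value !
  1 parse.digits +! ;

parse.reset  CHAR 4 parse.digit  CHAR 2 parse.digit
parse.value @ .            \ prints 42
```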
Examples of valid comments in ANS Forth:
```
( You )
( You can (not) redo )
( You (can not) advance )
```
Examples of invalid comments in ANS Forth:
```
(You)
( You can ( not ) redo )
( You ( can not ) advance )
```
No, no, I'm sorry. I was fooled by my syntax highlighter. In testing with the actual pforth executable it appears literally all of the above examples are invalid uses of Forth comments except ( You ) .
( You can ( not ) nest comments. )
Bluh
Well, it didn't take minutes. And it wasn't "a few" lines, it was 48, about the same length as the input-parsing harness. But I will say, once I was doing "forthy things" (pure computation) instead of stuff Forth's bad at (regular programming), it was way quicker and easier. It did occasionally feel like flying. Some of these lines have a real clarity to them.
It's now very important I don't try to do part 2 tonight. In fact, I shouldn't even read it, so I don't *reads it* dammit
I have to get up SO early tomorrow, y'all
*Head in hands after unwisely reading part 2* Part 2 is basically one big chunk of Regular Programming. Hey uh. Does anyone know if there's a way in Forth to copy THE ENTIRE STACK somewhere and then recall it later? I can think of a way to *destructively* back up the stack (move it to the return stack temporarily) but *copying* the stack… I got nothing.
UGGGH no i think i can do this with DEPTH CELLS ALLOT ( https://www.forth.com/starting-forth/8-variables-constants-arrays/ ) and then… and then some other nonsense. I think. Okay I absolutely have to stop thinking about this now. This may not have been the wisest of all possible uses of my Monday
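A sketch of the non-destructive copy, assuming `PICK` is available (it is a CORE EXT word). Rather than `DEPTH CELLS ALLOT` at run time, it uses a fixed 64-cell buffer, which is an arbitrary assumption; `save-stack` and `restore-stack` are hypothetical names.

```forth
64 CONSTANT MAX-SAVED
CREATE saved-stack  MAX-SAVED CELLS ALLOT
VARIABLE saved-depth

: save-stack ( i*x -- i*x )        \ copy the whole stack, consuming nothing
  DEPTH DUP MAX-SAVED > ABORT" stack too deep to save"
  DUP saved-depth !
  0 ?DO                            \ I = 0 is the top of the stack
    I PICK  saved-stack I CELLS + !
  LOOP ;

: restore-stack ( -- i*x )         \ push the saved copy back
  saved-depth @ 0 ?DO              \ bottom item first
    saved-stack  saved-depth @ 1- I -  CELLS + @
  LOOP ;
```

Usage: `1 2 3 save-stack DROP DROP DROP restore-stack` leaves `1 2 3` on the stack again. Note that `DEPTH` counts everything on the data stack, including items that belong to callers.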
@mcc
The shortcut key is "alt+F4". The screenshot shows it running under Windows 98. That's cursed!
@mcc pforth has TRACE (this may not display well on Mastodon):
```
: ABCD CHAR 0 DUP . EMIT ;
ok
Stack<10>
TRACE ABCD
<< ABCD +0 <10:0> || CHAR >> ok
s
<< ABCD +4 <10:1> 0 || (LITERAL) 0 >> ok
s
<< ABCD +12 <10:2> 0 0 || DUP >> ok
s
<< ABCD +16 <10:3> 0 0 0 || . >> ok
s
0
<< ABCD +20 <10:2> 0 0 || EMIT >> ok
s
<< ABCD +24 <10:1> 0 || EXIT >> ok
s
Finished.
ok
```
@mcc And with the correct syntax, you see that the literal for the ASCII value of 0 is compiled in:
```
: ABCD [CHAR] 0 DUP . EMIT ;
ok
Stack<10>
TRACE ABCD
<< ABCD +0 <10:0> || (LITERAL) 48 >> ok
s
<< ABCD +8 <10:1> 48 || DUP >> ok
s
<< ABCD +12 <10:2> 48 48 || . >> ok
s
48
<< ABCD +16 <10:1> 48 || EMIT >> ok
s
0
<< ABCD +20 <10:0> || EXIT >> ok
s
Finished.
ok
```
@ddlyh @mcc Yes, I think that's right.
And the CHAR function just reads the next symbol straight from input, so it gets it rather than the interpreter.
So at the top level, in interpreter mode, CHAR will read the next symbol, give you the ASCII value of the symbol, and the interpreter never sees it.
But in compilation mode, the compile command is what's consuming these symbols. So it just looks up CHAR in the dict, and puts that in the entry for the current word. Then 0 is a valid literal so it puts in the literal 0. Then when you run the new command, CHAR has no input to read since it's not being run from the interpreter, and it returns 0, and the literal 0 also returns 0.
But if you use [CHAR], that will execute immediately in compilation mode; so it will now go and read the next symbol, and return a literal representing the ASCII value of it to the compilation command.
Yeah, it's a bit of a weird way of thinking about it.
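To make the immediate-word explanation concrete, here is a hypothetical re-creation of `[CHAR]` in standard ANS Forth. It uses `POSTPONE LITERAL` where the pforth source quoted below uses the older `[compile] literal`; the names `my-[char]` and `zero-code` are made up for this sketch.

```forth
: my-[char] ( "name" -- )  \ runs at compile time
  CHAR                     \ parse the next word in the source right now
  POSTPONE LITERAL ;       \ compile its character code into the current definition
IMMEDIATE                  \ execute my-[char] while compiling instead of compiling it

: zero-code ( -- n )  my-[char] 0 ;
zero-code .                \ prints 48
```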
@mcc Oh, this helps with understanding some FORTH idioms:
https://rickcarlino.com/2017/common-forth-symbols-and-idioms.html
The square brackets are "Bring compiler into and out of immediate mode". That's kind of what I expected; you use them to run this command now, which is what's needed from CHAR because it's a function that will just grab the next character from input (after any spaces) and give you its ASCII value. So you need it run before compilation.
Parentheses are a valid part of a word name, and they are conventionally used for low-level implementation details.
@mcc OK, yeah, if I try in gforth, I get the behavior you're seeing; 'A' (or just 'A; the closing quote is unnecessary) gives the character value.
This doesn't work in my version of pforth, where I need to use CHAR or [CHAR] if I'm in compilation context.
@mcc Oh, lol, the square brackets thing is just a naming convention as well.
Decided to look at the source, and look, there are definitions for CHAR and [CHAR] with the latter having an "immediate" annotation.
```
: CHAR ( <char> -- char , interpret mode )
    bl parse drop c@
;
: [CHAR] ( <char> -- char , for compile mode )
    char [compile] literal
; immediate
```
https://github.com/philburk/pforth/blob/master/fth/system.fth#L610-L616
@mcc Hmmm ... zForth is very simple, is written in C and has a "trace" option:
@mcc i agree, this is why i like a little type system in my forth : )
@typeswitch I'd be very curious about that; are there any extant forths you have in mind that work this way?
I've talked to Ramsey Nasser who was trying to describe something about each word in a concatenative language being like a mapping from one stack state to another, and this means we can impose types on that stack transformation.
for full static types, there is kitten: https://kittenlang.org/
and mirth (my project): https://github.com/mirth-lang/mirth
factor does do some stack depth checking but it doesn't have static type checking.
i don't know of anything more minimal & more forth-like, but then again type system is kind of a non-minimal addition and it always implies a redesign of the APIs because forth was not designed with type checking in mind.
@typeswitch *thinks* there's a terminal emulator named kitty now. i could run kitten in kitty
@typeswitch also i think if i'm looking for a strongly typed forth i'm no longer looking for minimal. anyway thanks for the refs!
@mcc @typeswitch in a parallel direction, there's also postscript, which doesn't have types but is "safe" in the sense of having a GC and dynamically-typed objects, without being quite as extravagant as Factor
@joe @typeswitch but are there IP-unencumbered implementations
@mcc @typeswitch hmm, hadn't thought about that angle honestly. is even ghostscript IP-encumbered?
@joe @typeswitch ¯\_(ツ)_/¯
@mcc @joe @typeswitch What IP issues would there be? Ghostscript is fully open source (AGPL, not really my favorite license, but eh).
Any Adobe patents would have long since expired.
@typeswitch @mcc i'd been hanging back as to not muddy the water, but i can recommend trying mirth. it's a far better experience than ieee1275 and maybe other forth like things.
@mcc Quit badmouthing forth. It does exactly what you tell it to do. That just might not align with what you thought you were asking for or what you might have expected
@mcc I don't like this
@onelson Now, there's one important thing to note here, which is that although the builtin comment operator does not support nested parentheses, the builtin comment operator is actually just a function that throws away everything up to the next ). Which means, in principle, I could write my *own* implementation of the ( comment operator, a better one supporting nesting, and replace the default ( with it. Whether this is very cool or only makes things worse you will have to judge yourself.
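A rough sketch of what such a nesting comment word could look like, assuming `COMPARE` from the ANS String word set is available (gforth has it). It is named `((` rather than replacing `(`, only nests `((`/`))` tokens separated by spaces, and gives up at the end of the current line; all of that is a simplification for illustration, not how any builtin works.

```forth
: ((  ( "ccc" -- )
  1                                   \ nesting depth
  BEGIN
    BL WORD COUNT                     \ addr len of the next token
    DUP 0= IF 2DROP DROP EXIT THEN    \ end of line: stop
    2DUP S" ((" COMPARE 0= IF
      2DROP 1+                        \ another opener: one level deeper
    ELSE
      S" ))" COMPARE 0= IF 1- THEN    \ a closer: one level up
    THEN
    DUP 0=
  UNTIL
  DROP ; IMMEDIATE

(( You can (( now )) nest comments. ))
```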
@onelson A corollary of my conclusion above— which I'm not completely sure about, but mostly sure about— is that if someone decided they just didn't *dig* comments, comments are slowing them down, I think they could simply FORGET the comment operator and then there would be no comment support in their program
@mcc oh dear
@mcc @onelson this touches on the core takeaway I had from interacting with some dedicated lispers, which was that at least some lispers are Completely OK with the idea that every program would evolve into a different sub-dialect of lisp. This idea terrifies me.
Maximum expressiveness, interpreted as you being able to make the language conform to the problem domain.
Minimum expressiveness, interpreted as anyone being able to actually understand your code.
@mcc What a cruel and angelic thesis!
@mcc it’s extremely on brand that there’s a listing with comments in Chapter 1 or 2, but he doesn’t bother explaining it until Chapter 3:
https://www.forth.com/starting-forth/3-forth-editor-blocks-buffer/
@mcc I don't understand
@mcc ah. I understand.
@mcc Which installment of the Evangelion movie series is *that* the title to?
@mcc ... dammit why didn't I think of that.
@mcc my suggestion with forth is always: consider linked lists as an option