A FORTHISH VM

wherein i eventually describe a framework for a potentially shitty virtual machine inspired by forth's dictionary • Sun#33 of 2025

it is a little-known fact that i am a borderline hysteric potentially dangerous to others forth enthusiast. the list of reasons for this numbers "many", but that is likely a topic for another, likely five billion years in the future, blog post.

irregardless, the point is this: whenever i'm thinking of a solution to just about anything, my mind unavoidably at least once ends up pondering the all-unimportant question: "can i somehow make this forth". the unfortunate thing is that, while one'd expect this question to be a generally useless one most of the time, this turns out to not be true, because the design of an indirectly-threaded forth happens to cover a lot of different types of Thing.

of course, this is a simple byproduct of the core objective of a forth system. by nature, the complete design of a forth encompasses a considerable range of Representations. how does one represent Code, Data, Links Between The Aforementioned, and perhaps the Likely Insane Character of the System's Author. this leads to an efficient structuring of just about anything a computer must somehow store, in a way that is optimised to the hells and beyond by a long lineage of programmers united in the pursuit of making the Forth as immorally efficient as possible without sacrificing usability.

none of the last paragraph is that relevant but perhaps you now better understand the retrospectively fragile reasons i am so obsessed with forths. Point Is, for mjm to be designing something is for mjm to consider about a gazillion forth-inspired design paths. a truly healthy habit.

another little-known fact is that i have an arguably unhealthy penchant for designing programming languages. this, much like my forth obsession, is likely influenced in no small part by the fact that i spent a considerable portion of my early programming journey in the esoteric programming community, which is a plaza characterised by its ability to attract people who, soon, would find themselves to have an arguably unhealthy penchant for designing programming languages.

to design a programming language as someone who has encountered every type of programming language is an endeavour that will inevitably involve some degree of ponderance over a back-end architecture for said language. in other words, Are We Compiling Or Using A Vm Or Doing A Jit Type Deal Or Attaining Nirvana Through Buddhism In Order To Build The Degree Of Patience Required To Watch A Non-Vm Interpreted Language Print The Words "Hello World" To Our Teletype, Boss.

you may at this point begin to see the intersection of vectors that lead to the formation of this article. let's fill in the blanks.

MJM was creating a PROGRAMMING LANGUAGE, when he thought of FORTH, because HE'S A MANIAC.

no, wrong, bad. try again.

because HE'S A MANIAC

the real answer is because the only practical way for me, someone who isn't gcc or llvm, to write a non-glacial language implementation is a virtual machine. and as a virtual machine falls under the category of "Anything", i naturally ended up trying somehow to relate this to forth.

i say "somehow", but in this particular instance it's actually completely straightforward. one of the many things that forth is, is a computational model that kind of specialises in compromising between generally alright speed and Pretty Fuckin' Good space savings via aggressive factorisation.

the bytecode-storage aspect of a vm thing also happens to somewhat intersect with some of my prior forth-aligned projects, namely Heph which will promptly biodegrade me if i begin talking about it. overall, this seemed a fairly good place to put something forth-scented. in retrospect this is about a good idea as any other idea ever conceived, but i find it interesting regardless

because HIS TAKING TOO LONG

okay, asshole, i just got here.
we finally arrive at the actual thing the summary promises.

the thing the summary promises

the basic idea with this particular concept is to abuse the living hell out of Relative Tokens. see, a forth dictionary essentially operates by keeping a pointer to the final item in a reversely-linked list of Words, i.e. routines. the specific way that a forth implementation stores its words within this structure is up to the implementor, but the method that tends to offer the best balance between speed, space, and implementation complexity is something called Indirect Threading. this fancy concurrency-unrelated term boils down to something that actually ends up not really being that relevant to this article, so if you're interested go Here.

the premise always ends up being "we have a linked list, and each item contains information that you can use to determine how that particular word is executed". every practical forth implementation also stores a name associated with a given word so that you can actually do live programming. but we don't need that, so we'll cut it.

another thing that forths tend to store with these word entries is a flag named the Immediate Flag. basically, when compiling forth code, the compilation routine will look up each word it comes across, and if the immediate flag is set, it'll just execute the word on the spot instead of compiling it into the final result. if the flag isn't set then it'll emit a token which points to the execution info in the word's dictionary entry.

this addition allows some pretty damn powerful metaprogramming, and while we don't strictly require full on metaprogramming to Store A Program, such a thing will come in handy later.

how on earth is a word defined you dimwit

in terms of other words, genius. duh. you should know this! c'mon.
forth basically guarantees that every successive word in the dictionary is defined either with machine code, which doesn't concern us, or with references to other already existing words. in memory, these references are just pointers to dictionary entries, as described earlier. however, these pointers are obviously dependent on the exact memory layout of wherever we're running our program, so we can't exactly just store them.

this is where the "only prior words allowed" restriction comes in handy, since every reference within a word definition is guaranteed to be to a word some n steps behind it. this means that, in our format, we can just store that number n instead of a full, potentially 64-bit reference to a word that's located god knows where.

wait hold up hold on. how does forth even do computation

what? what's wrong with you? what?
forth is FAMOUSLY a stack-based language. i.e. instead of telling it to 5 + 3, you tell it to push 5, push 3, and add.

DUDE HOW ON EARTH AM I MEANT TO PUSH NUMBERS. EVERYTHING IS A RELATIVE OFFSET THUS FAR, NO?

well, okay, smartass.

this addition allows some pretty damn powerful metaprogramming, and while we don't strictly require full on metaprogramming to Store A Program, such a thing will come in handy later.

it is now later.
a cool and fun part of just about any words is the fact that they can sort of Eat data ahead of them in the bitstream. i.e., you can have a word lit which consumes some integer N ahead of it and pushes it to the stack.

likewise, we can have our immediate words in our token stream have this ability too. i.e., we can have a similar word which eats an integer ahead of it and puts a lit token and that integer consecutively into memory. so that's how, dickwad.

how this actually looks

okay let's define this concretely.

first of all, we want a way of marking the end of a word. let's just say that a zero byte will do this, as well as appending an exit word to the completed definition so that it has the ability to return.

next, it'd be nice to have a single-byte form for our dictionary offsets. let's say that any byte with its top bit cleared is a relative offset into the dictionary from the current word.

sick! cool. however, if we only have this single-byte relative token, then we can't have any more than like 128 words in our little bytecode package, which is enough for Some cases but i suspect may end up being a bit ass with bigger projects.

so, let's put that top bit to use! let's say that, if a byte has its top bit set, we'll also eat the byte ahead of it and merge them to form a 15-bit dictionary offset. for the sake of optimisation, let's also say that this longer offset applies forwards, to the start of the dictionary, instead of backwards to the current word. this minimises the amount of arithmetic required to find a word at a known index, which i'd argue is Good.

and just to reclarify, when we look for a word using our relative offset, we do execute it on the spot if it has its immediate flag set. this is also how we define our own immediate words.

you moron. you fool. how do i define an addition operation in terms of zero extant words?

when on EARTH did i say that we start with an empty dictionary.
remember how the summary says we'll be describing a framework for a vm? this is where that comes in. the idea is, for any given application, the developer (me (or you)) defines a set of words that will already exist in the initial dictionary. these words run actual native machine code, and as such can handle doing your measly little arithmetic, like Summation. dumbass.

this ability to put just about any functionality into the initial word set i feel is what makes this concept actually appealing. it allows you to compose fairly complex domain-specific functionality, and since most words are generally defined in regards to words pretty close to them in the dictionary, the final representation of this composite functionality ends up being pretty damn compact.

conclusion

i don't really have one. this post is my first and as it goes along is kind of an uncontrolled gas leak emitting my almost unprocessed thoughts on this topic. i'll likely end up writing an abridged version of this, since by this point the post is nigh incomprehensible and my explanations are kind of dogshit.

fin.