wherein i describe a minimal framework for a forth-inspired virtual machine • Mon#34 of 2025
(this is the abridged version of this post. for the original, fully incomprehensible version, go here.)
a fun thing about myself is that i am unreasonably obsessed with forth. there are many reasons for this, but one is its sheer Applicability.
by nature, the complete design of a forth system encompasses a reasonably wide array of representations. the unique design objectives of forth necessitate a somewhat bespoke way of storing code, data, and the relationships between them.
due to the Autism at hand, i am intimately familiar with All of that. (my humility knows no bounds.) also due to the autism at hand, i find myself unavoidably thinking of ways to apply forth concepts to just about anything i make nowadays.
another fun thing about me is that i spent a prolonged portion of my early programming journey in the esoteric programming language community, where every other person had designed like ten different programming languages. while about half of these were simple aliased brainfucks, the other half contained often very syntactically... unique languages.
via absurd levels of exposure to the above mentioned, i developed an unhealthy addiction to designing programming languages. my many first attempts at this were subpar to say the least, but i kept doing it anyways, because i was a child and it was fun.
the point is, this persists to this day.
when you've truly encountered just about every type of programming language you can possibly expect to, designing a programming language is a process that results in your brain pondering about a gazillion different possibilities in every single aspect of said design. one of these is the back-end architecture of the language, i.e., whether you're compiling, or walking an ast, or implementing some sort of vm, etc.
because i am a real human, and not, say, gcc or mike pall, the only feasible way for me to implement a language that doesn't run glacially is a virtual machine.
now, the intersection of vectors that formed this particular concept makes itself visible. let's fill in the blanks.
MJMwas creating aSHITTY ABRIDGED VERSION OF HIS ARTICLE, when he thought ofIMMEDIATE DEATH, becauseIT'S ASS.
not quite what i was thinking of
MJMwas creating aPROGRAMMING LANGUAGE, when he thought ofFORTH
the basic idea with this kind of vm is to thrift a forth dictionary and then tear out the bits we don't need.
as a reminder, a forth's dictionary is the structure in which it stores its words, or routines. the dictionary exists because forth systems are inherently interactive, and all routines must be discoverable so that a user may write code using them.
generally, this dictionary is implemented as a linked list, with each element containing:
the first two are self-explanatory, but the last isn't.
the advertised power of a good forth comes from its metaprogramming. namely, the fact that every single compiled word can be executed at compile time, and modify the compilation environment freely. this is, in fact, how every single compile time construct is implemented, including the very concept of defining a new word.
the way this works is by giving every single entry in the dictionary a flag which tells the compiler if it must be executed right now, or be compiled into the current word definition. here's an example of this:
\ clarifications
\ full-line comments look like what you're reading now
\ inline comments are ( parenthesised )
\ : name ... ; defines a word with code ...
\ the word `.` pops and outputs a number on top of the stack
\ the word `immediate` makes the latest word an immediate one
: push-3 3 ; immediate
: push-5 5 push-3 ;
. \ will output 3, since push-3 was triggered just now
push-5 . \ will output 5, since push-3 wasn't actually
\ compiled into the word
for our purposes, this will turn out to be quite useful. if we have a method for compactly referencing a word, immediate or not, we can create some pretty complex structure in our code in a fairly small space.
another fun trait of forth is that all words are capable of reading data ahead of their respective token. for example, you may have seen that in the above forth snippet we can push integers by simply writing them in base 10.
when compiled into a word definition, the actual resulting data is a token for a word called lit, followed by the integer being pushed. as you may guess, when executed, lit reads an integer ahead of it, pushes it to the stack, and bumps the instruction pointer to skip to the next actual token.
now that we've got an understanding of forth's toolset, we can gracefully swipe it from its earthy pedestal and be on our way.
we define words in terms of other words. this is True. another generally true thing is that, due to the aggressive factorisation that this method of code organisation encourages, a lot of words will tend to use words defined not too many dictionary entries behind them.
therefore, it'll be beneficial for us to offer some sort of short-form token for words referenced via a relative offset from the current word.
what's shorter than a byte? (well, a few things, but they're not as convenient to process.) let's say that a byte, with its top bit cleared, represents a relative offset into the dictionary from the current word being defined.
of course, this limits us to only indexing the last 128 defined words. this may be okay for some uses, but it's probably best to offer a method of accessing more words in case our bytecode package is particularly large.
let's say that any byte with its top bit set is the higher byte of a 16-bit absolute offset into the dictionary. i.e., if you have a token 0x8000, then the vm will reference the first defined word in the dictionary.
the final thing we really need to define the structure of our program is a delimiter for words. for simplicity's sake, let's just make that a zero byte.
you may now be wondering: given a completely empty dictionary, how am i meant to implement anything at all?
valid point! our system is built on the concept of referencing existing words, and we can't do that if there aren't existing words.
the solution to this, of course, is simply to pre-populate the dictionary with some standard set of words we can use to build out the rest of our program. this is where the fun part comes in, and where the "framework" part of this post's summary becomes relevant! these pre-existing words can be just about anything, and can be chosen based on what the virtual machine's being used for.
that is all. considerably less nuclear explosion-y than the original version, i'd say.
fin
2025 © mjm
this page licensed under
CC BY-SA 4.0
return