i happen to be a bit of a forthist, and one of the
things that is continuously repeated in forthy circles
is a fairly vocal opposition to local variables. when
i was first toying around with forth i thought that
was stupid, and littered my code with wonky variables
and endless PICKs and ROLLs, wrote 5-7 line words, and
used BEGIN WHILE REPEAT for every single loop in the
program. very bad code.
given that, i just sort of assumed that forth tends to
create write-only code every time, and that it was a
doomed language to begin with.
then i read this article by james hague, and thought "huh.
that's a very neat and tidy solution. that's weird.
it's suspiciously comprehensible."
then i watched this video by hans bezemer, and thought "huh.
that's a slightly less tidy solution, but now i see
that R@ is a pretty useful word, and i
never thought that writing forth code took that much
planning. it's surprisingly comprehensible, though."
then i challenged myself by writing a mandelbrot set
program. because i was bored. and i wasn't near my
computer at the time, so i did it on a piece of paper,
and all of the weird PICKs and ROLLs seemed to just disappear into the
abyss. and then i realized i had written the whole
program, without having to use a single variable.
i got back to my computer and carefully typed it in.
it worked first try.
the essence of forth is in highly factored code,
taking large procedures and splitting them into many
smaller procedures. while the same is true of other
languages, it isn't true to nearly the same extent.
forth is a hard language. it's objectively
difficult to keep the entire stack in one's head when
reading through a word's definition. so, forth prefers
that the programmer keep definitions as short as
possible. one line is a good target to aim for.
yes. one line. in terms of grokkability,
that's roughly the equivalent of a 15-20 line C
function. two lines is like 50-60 lines. three lines
is like 150 or so lines. it rapidly grows out
of control. so, why would anyone ever decide to use
such a masochistic language?
but first, we must take a brief detour into:
toki pona is a constructed language with 120 i mean
122 i mean 134 words. constructed in this case does
not mean fake but engineered. it's
sort of like trying to argue a car is a fake mode of
transportation, because horses exist.
it has fairly simple grammar, especially when compared
to other languages. words never change form, there are
no prefixes or suffixes, all words exist in all parts
of speech at once. for instance, moku refers
simultaneously to the act of eating and also the most
common thing to perform that action
upon[0].
it also lacks recursion, in that there's no
way to take a phrase and turn it into an adjective.
in english, it's possible to say "the man with the red
hat wants coffee." in toki pona, there's no way to
take the phrase "with the red hat" ("li jo e len lawa
loje") and put it into the middle of "the man wants
coffee." ("jan li wile e telo wawa.") it'd have to
be split into two sentences. "this man wants coffee:
they have a red hat." ("jan ni li wile e telo wawa:
ona li jo e len lawa loje.") it's theoretically
possible to put it in one sentence; however,
the phrase has to be reduced to the point where it
becomes slightly more ambiguous. "the red-hat man
wants coffee." ("jan pi len lawa loje li wile e telo
wawa.") in the 'Biz©, we call this a relative
clause.
it's nearly impossible to put two relative clauses in
the same sentence, because then it becomes
significantly less clear what each "this" refers to.
take: "the guy who was teaching the class of
biologists was talking to one student that was solving
a rubik's cube." it would become "this guy was talking
to this student: they were teaching a class of
biologists, they were solving a rubik's cube." ("jan
ni li toki tawa jan ni pi kama sona: ona li pana e
sona tawa jan mute pi sona soweli, ona li pona e musi
leko.") in this case, the meaning becomes way more
muddied; it's hard to tell which "this" refers to
which sentence at a glance.
in this case, there are several ways to make it
easier to parse. the main option i'm going to mention
is the fact that it's possible to move the first
relative clause to its own sentence, before everything
else, to set up context. "someone was teaching the
class of biologists. they were talking to this
student: they were solving a rubik's cube." ("jan li
pana e sona tawa jan mute pi sona soweli. ona li
toki tawa jan ni: ona li pona e musi leko.")
this process of moving clauses out into other
sentences and referencing them with pronouns is called
anaphora by people much smarter than me and
"speaking" by people as smart as me, also known as me.
anaphora here is a specific type of
endophora, which is a fancy way of saying
"creating context for a statement, and using a pronoun
to refer back to it," and in a demonstration of irony
i have used endophora in my definition of endophora.
another weird thing about that definition is the fact
that "it" is technically ambiguous. it could
be referring to the statement, instead of the context.
this is even more of a problem in toki pona, where there are exactly two distinct third-person pronouns, and one of them is a demonstrative pronoun. forming a long sequence of sentences without accidentally introducing some ambiguity in the middle is very, very difficult. it's impossible to refer to three things at once, so when that happens, simpler non-pronoun phrases have to be used as substitutes, such as "the chair," "the person," and things like that.
the tangent about toki pona isn't as pointless as it seems, because pretty much everything i said maps eerily well to forth programming. check it out:
forth is a programming language with
somewhere between 75 and 130 core words.
a word here being forth jargon for a
subroutine. it is stack oriented, meaning that most
data lives on a single "data stack."
it has hardly any syntax, especially when compared to
other programming languages.
the most notable thing about it is that it lacks
local variables, in that names can't be
assigned to particular elements on the stack.
or. well. they can, but it's very annoying,
and generally not good practice. still, that doesn't
mean it's just a toy language. in fact, i'd argue
locals are never really needed, and that their
presence in forth code is a sign of ineffective
planning. that said, it's very easy to plan
ineffectively, even while being aware of it.
take this piece of code:
    #include <stdio.h>

    /* terminal input buffer */
    char tib[1024];

    int refill(void) {
        int c, i = 0;
        while (c = getchar(), c != '\n' && c != EOF)
            if (c == '\b') --i;
            else tib[i++] = c;
        return i; /* the length */
    }

translating this directly to forth runs into some problems. at least there are only two local variables, so we won't need any super fancy stack management.
    1024 chars buffer: tib
    : refill 0 begin key dup emit dup 13 <> while dup 8 = if drop 1 chars - else over tib + c! char+ then repeat drop ;

but now we have a new problem: although we don't use PICK or ROLL, the code is still indecipherable.
that's because the conventions and complexity that
work in C don't even begin to work in forth.
    1024 chars buffer: tib

this is fine as-is. next, we need to tackle... all of REFILL. let's break it down:
    : refill
      ( set the write head to the beginning )
      ( read character )
      ( is it a newline? )
      ( if so, then stop )
      ( is it a backspace? )
      ( if so, move the write head back )
      ( if not, write it and move the write head forward. )
      ( go back to reading a character )
    ;

this is still pseudocode, but it gives us an idea of the sort of words we should define beforehand.
    : setup      ( set the write head to the beginning ) ;
    : advance<   ( move the write head back ) ;
    : advance>   ( move the write head forward ) ;
    : write      ( write it ) ;
    : -newline?  ( is it not a newline? ) ;
    : backspace? ( is it a backspace? ) ;
    : read       ( read character ) ;

here i've isolated each individual operation and given it a somewhat reasonable name. it's fairly easy to implement these.
    : setup      0 ;
    : advance<   1 chars - ;
    : advance>   char+ ;
    : write      over tib + c! ;
    : -newline?  dup 13 <> ;
    : backspace? dup 8 = ;
    : read       key dup emit ;

additionally, the logic for checking for a backspace should really be part of WRITE.
conveniently, forth allows us to redefine words in
terms of their past definitions, so we can just extend
WRITE.
    : write backspace? if drop advance< else write advance> then ;

in fact, it seems like it might be more useful to put DROP and WRITE into the definitions of ADVANCE< and ADVANCE>, but then the meanings of those words would change, so let's define two new ones instead.
    : write< drop advance< ;
    : write> write advance> ;
    : write  backspace? if write< else write> then ;

and now REFILL just seems to fall out for free.
    : refill setup begin read -newline? while write repeat drop ;

the DROP at the end there drops the last character read, because it's always a newline.
    1024 chars buffer: tib

    : setup      0 ;
    : backspace? dup 8 = ;
    : -newline?  dup 13 <> ;
    : advance<   1 chars - ;
    : advance>   char+ ;
    : write<     drop advance< ;
    : write>     over tib + c! advance> ;
    : write      backspace? if write< else write> then ;
    : read       key dup emit ;
    : refill     setup begin read -newline? while write repeat drop ;

if you have a forth interpreter on hand, you can test this code and see that it does in fact work.
    pona sona pi (wawa pi ma Lipija) poka toki lawa mute pi jan lawa nanpa tu pi ma Atilanisi.

this is one noun phrase.[1] it's still incredibly hard to understand, due to its sheer length.
let me go back to the pseudocode REFILL.
    : refill
      ( set the write head to the beginning )
      ( read character )
      ( is it a newline? )
      ( if so, then stop )
      ( is it a backspace? )
      ( if so, move the write head back )
      ( if not, write it and move the write head forward. )
      ( go back to reading a character )
    ;

here's one fun thing to point out: we never give the character we read a name. in C, we call it c (int c;), but in the pseudocode, we just refer to it as "it."
To stretch our analogy to the limit, perhaps three elements on the stack corresponds to the three English pronouns "this," "that," and "t’other."[2]

giving the last one a way more obscure name emphasizes the fact that it's just barely still in reach. it's uncomfortable to use, because the only word that reaches that far down is ROT.
A Forth word should not have more than one or two arguments. This stack which people have so much trouble manipulating should never be more than three or four deep.[3]
... unambiguous
& locals
so, what is the purpose of a local variable?
the purpose of a local variable is to assign a name to
a temporary value, because it makes it easier to keep
track of larger, more complicated operations.
given everything i said in the past 10 minutes, you
should be able to recognize what the argument against
this is fairly quickly.
I remain adamant that local variables are not only useless, they are harmful.

the reason why forth programmers abstain from using local variables is the same reason why toki pona speakers tend not to use words like "ki": they make longer, more complicated sentences easier to create, on top of complicating the language overall in terms of grammatical rules and total word count.
If you are writing code that needs them you are writing, [sic] non-optimal code? Don't use local variables. Don't come up with new syntaxes for describing them and new schemes for implementing them. You can make local variables very efficient especially if you have local registers to store them in, but don't. It's bad. It's wrong.[4]
PICK and ROLL are even worse, because with a local
variable the value at least has a consistent name,
while with PICK and ROLL its index changes throughout
the program.
as for BEGIN WHILE REPEAT: the problem is always
using it, even when BEGIN UNTIL or DO LOOP would work
just as well, if not better.