why locals are bad


[image: a drawing of the word 'let' with a prohibition sign over it]

what

i happen to be a bit of a forthist, and one of the things continuously repeated in forthy circles is a fairly vocal opposition to local variables. when i was first toying around with forth, i thought that was stupid. i littered my code with wonky variables and endless PICKs and ROLLs, wrote 5-7 line words, and used BEGIN WHILE REPEAT for every single loop in the program.[5] very bad code.
given that, i just sort of assumed that forth always produces write-only code, and that it was a doomed language to begin with.

then i read this article by james hague, and thought "huh. that's a very neat and tidy solution. that's weird. it's suspiciously comprehensible."
then i watched this video by hans bezemer, and thought "huh. that's a slightly less tidy solution, but now i see that R@ is a pretty useful word, and i never thought that writing forth code took that much planning. it's surprisingly comprehensible, though."
then i challenged myself by writing a mandelbrot set program. because i was bored. and i wasn't near my computer at the time, so i did it on a piece of paper, and all of the weird PICKs and ROLLs seemed to just disappear into the abyss. and then i realized i had written the whole program, without having to use a single variable.

i got back to my computer and carefully typed it in.

it worked first try.

factorization

the essence of forth is highly factored code: taking large procedures and splitting them into many smaller procedures. the same advice applies to other languages, but not to nearly the same extent.
forth is a hard language. it's objectively difficult to keep the entire stack in one's head when reading through a word's definition. so, forth prefers that the programmer keep definitions as short as possible. one line is a good target to aim for.
yes. one line. in terms of grokkability, that's roughly the equivalent of a 15-20 line C function. two lines is like 50-60 lines. three lines is like 150 or so lines. it rapidly grows out of control. so, why would anyone ever decide to use such a masochistic language?
but first, we must take a brief detour into:

toki pona

toki pona is a constructed language with 120 i mean 122 i mean 134 words. constructed in this case does not mean fake but engineered. it's sort of like trying to argue a car is a fake mode of transportation, because horses exist.
it has fairly simple grammar, especially when compared to other languages. words never change form, there are no prefixes or suffixes, and every word exists in every part of speech at once. for instance, moku refers simultaneously to the act of eating and to the most common thing that action is performed on.[0]
it also lacks recursion, in that there's no way to take a phrase and turn it into an adjective. in english, it's possible to say "the man who has a red hat wants coffee." in toki pona, there's no way to take the phrase "who has a red hat" ("li jo e len lawa loje") and put it into the middle of "the man wants coffee." ("jan li wile e telo wawa.") it'd have to be split into two sentences: "this man wants coffee: they have a red hat." ("jan ni li wile e telo wawa: ona li jo e len lawa loje.") it's theoretically possible to put it in one sentence, but the phrase has to be reduced to the point where it becomes slightly more ambiguous: "the red-hat man wants coffee." ("jan pi len lawa loje li wile e telo wawa.") in the 'Biz©, we call this a relative clause.
it's nearly impossible to put two relative clauses in the same sentence, because then it becomes significantly less clear what each "this" refers to.

anaphora

take: "the guy who was teaching the class of biologists was talking to one student who was solving a rubik's cube." in toki pona, it would become "this guy was talking to this student: they were teaching a class of biologists, they were solving a rubik's cube." ("jan ni li toki tawa jan ni pi kama sona: ona li pana e sona tawa jan mute pi sona soweli, ona li pona e musi leko.") the meaning gets way more muddied; at a glance, it's hard to tell which clause each "this" refers to.
there are several ways to make it easier to parse. the one i'm going to mention is moving the first relative clause into its own sentence, before everything else, to set up context: "someone was teaching the class of biologists. they were talking to this student: they were solving a rubik's cube." ("jan li pana e sona tawa jan mute pi sona soweli. ona li toki tawa jan ni: ona li pona e musi leko.")
this process of moving clauses out into other sentences and referencing them with pronouns is called anaphora by people much smarter than me and "speaking" by people as smart as me, also known as me. anaphora here is a specific type of endophora, which is a fancy way of saying "creating context for a statement, and using a pronoun to refer back to it," and in a demonstration of irony i have used endophora in my definition of endophora.
another weird thing about that definition is the fact that "it" is technically ambiguous. it could be referring to the statement, instead of the context.

pronoun disambiguation

this is even more of a problem in toki pona, where there are exactly two distinct third person pronouns, and one of them is a demonstrative pronoun. forming a long sequence of sentences without accidentally introducing some ambiguity in the middle is very, very difficult. it's impossible to refer to three things at once, so when that comes up, simpler non-pronoun phrases such as "the chair" or "the person" have to be used as substitutes.

the twist everyone expected

the tangent about toki pona isn't as pointless as it seems, because pretty much everything i said maps eerily well to forth programming. check it out:

toki pona & forth

forth is a programming language with somewhere between 75 and 130 core words. a word here being forth jargon for a subroutine. it is stack oriented, meaning that most data lives on a single "data stack."
it has hardly any syntax, especially when compared to other programming languages.
the most notable thing about it is that it lacks local variables, in that names can't be assigned to particular elements on the stack. or. well. they can, but it's very annoying, and generally not good practice. still, that doesn't mean it's just a toy language. in fact, i'd argue locals are never really needed, and that their presence in forth code is a sign of ineffective planning. that said, it's very easy to plan ineffectively, even while being aware of it.

anaphora & factorization

take this piece of code:

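#include <stdio.h>

/* terminal input buffer */
char tib[1024];

/* read one line into tib, treating '\b' as "back up one character" */
int refill(void) {
  int c, i = 0;
  while (c = getchar(), c != '\n' && c != EOF)  /* stop at end of line or end of input */
    if (c == '\b')
      --i;
    else
      tib[i++] = c;
  return i; /* the length */
}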
translating this directly to forth causes some problems. at least there are only two local variables, so we won't need any super fancy stack management.
1024 chars buffer: tib

: refill   0 begin key dup emit dup 13 <> while
               dup 8 = if
                 drop 1 chars -
               else
                 over tib + c! char+
               then
             repeat drop ;
but now we have a new problem: although we don't use PICK or ROLL, the code is still indecipherable. that's because the conventions and level of complexity that work in C don't come anywhere close to working in forth.
let's do this from the bottom up, starting with the buffer.
1024 chars buffer: tib
this is fine as-is. next, we need to tackle... all of REFILL. let's break it down:
: refill   ( set the write head to the beginning )
           ( read character )
           ( is it a newline? )
             ( if so, then stop )
           ( is it a backspace? )
             ( if so, move the write head back )
             ( if not, write it and move the write head forward. )
           ( go back to reading a character ) ;
this is still pseudocode, but it gives us an idea of the sort of words we should define beforehand.
: setup   ( set the write head to the beginning ) ;
: advance<   ( move the write head back ) ;
: advance>   ( move the write head forward ) ;
: write   ( write it ) ;
: -newline?   ( is it not a newline? ) ;
: backspace?   ( is it a backspace? ) ;
: read   ( read character ) ;
here i've isolated each individual operation and given it a somewhat reasonable name. it's fairly easy to implement these.
: setup   0 ;
: advance<   1 chars - ;
: advance>   char+ ;
: write   over tib + c! ;
: -newline?   dup 13 <> ;
: backspace?   dup 8 = ;
: read   key dup emit ;
additionally, the backspace check should really be part of WRITE. conveniently, forth lets us redefine a word in terms of its previous definition, so we can just extend WRITE.
: write   backspace? if drop advance< else write advance> then ;
in fact, it seems like it might be more useful to put DROP and WRITE into the definitions of ADVANCE< and ADVANCE>, but that would change what those words mean, so let's define two new ones instead.
: write<   drop advance< ;
: write>   over tib + c! advance> ;
: write   backspace? if write< else write> then ;
and now REFILL just seems to fall out for free.
: refill   setup begin read -newline? while write repeat drop ;
the DROP at the end drops the last character read, which is always the newline that ended the loop.
so, let's recap:
1024 chars buffer: tib

: setup   0 ;
: backspace?   dup 8 = ;
: -newline?   dup 13 <> ;
: advance<   1 chars - ;
: advance>   char+ ;
: write<   drop advance< ;
: write>   over tib + c! advance> ;
: write   backspace? if write< else write> then ;
: read   key dup emit ;
: refill   setup begin read -newline? while write repeat drop ;
if you have a forth interpreter on hand, you can test this code and see that it does in fact work.
this is what forth encourages the programmer to do: split large definitions into many smaller, more manageable ones. larger definitions are harder to read and reason about, in the same way that longer sentences in toki pona become less and less parsable.
pona sona pi (wawa pi ma Lipija) poka toki lawa mute pi jan lawa nanpa tu pi ma Atilanisi.
this is one noun phrase.[1] it is still incredibly hard to understand, due to its sheer length.

pronoun disambiguation & stack management

let me go back to the pseudocode REFILL.

: refill   ( set the write head to the beginning )
           ( read character )
           ( is it a newline? )
             ( if so, then stop )
           ( is it a backspace? )
             ( if so, move the write head back )
             ( if not, write it and move the write head forward. )
           ( go back to reading a character ) ;
here's one fun thing to point out: we never give the character we read a name. in C, we declare it as int c, but in the pseudocode, we just refer to it as "it."
leo brodie puts it like this in thinking forth:
To stretch our analogy to the limit, perhaps three elements on the stack corresponds to the three English pronouns "this," "that," and "t’other."[2]
giving the last one a much more obscure name emphasizes the fact that it's just barely still in reach. it's uncomfortable to use, because, short of PICK and ROLL, ROT is pretty much the only word that reaches that far down.
this very closely resembles how, in toki pona, the anaphors (the "ni:"s) have to be kept to a minimum to stay comprehensible. in forth, the number of elements on the stack needs to be kept to a minimum in the same way. chuck moore remarks:
A Forth word should not have more than one or two arguments. This stack which people have so much trouble manipulating should never be more than three or four deep.[3]

... unambiguous & locals

so, what is the purpose of a local variable?
the purpose of a local variable is to assign a name to a temporary value, because it makes it easier to keep track of larger, more complicated operations.
given everything i said in the past 10 minutes, you should be able to recognize the argument against this fairly quickly. here's chuck moore again:

I remain adamant that local variables are not only useless, they are harmful.
If you are writing code that needs them you are writing, [sic] non-optimal code? Don't use local variables. Don't come up with new syntaxes for describing them and new schemes for implementing them. You can make local variables very efficient especially if you have local registers to store them in, but don't. It's bad. It's wrong.[4]
the reason forth programmers abstain from local variables is the same reason toki pona speakers tend not to use words like "ki": they make longer, more complicated sentences easier to create, on top of complicating the language overall in terms of grammatical rules and total word count.
PICK and ROLL are even worse: a local variable at least has a consistent name, while the index PICK or ROLL needs to reach a given value changes every time something is pushed or popped above it.
local variables in forth make bad code easy to write instead of keeping it hard, and they complicate the language on top of that. that is the problem with locals.

footnotes & references

  [0] this is more commonly known as "food."
  [1] it means "The influence of Libyan power on the election of Atlantis' vice president," and is explained here.
  [2] thinking forth, page 199
  [3] 1x forth, on argument count
  [4] 1x forth, on local variables
  [5] there isn't anything inherently wrong with BEGIN WHILE REPEAT. the problem is with always using it, even when BEGIN UNTIL or DO LOOP would work just as well, if not better.