misa

so, for the past few days as of writing, i've been attempting to write a forth system in javascript. one fairly big issue with that is that, ironically, forth gets harder to implement the higher level the language you implement it in. this is because a typical forth system relies on things like access to the return stack, throwing pointers around left and right, and, worst of all, being able to write certain procedures in machine code. to do this portably (in, for instance, javascript), one needs to use some sort of virtual machine.
initially, i attempted to use the pdp 11 as my base architecture to implement my forth, but i very quickly realized that actually implementing a pdp 11 virtual machine has a ton of complications. (for instance, 8 bit moves sign extend when the destination is a register, but overwrite the lowest bits when the destination is memory.)
so, i took what i liked about the pdp 11, got rid of most of what i didn't like, and that turned into misa, a simple little endian 16 bit orthogonal instruction set.
all numbers are written in octal, because. octal better.
there is an assembler i wrote in forth that may be helpful for this.

registers

there are eight* registers, appropriately named r0 through r7. there is no dedicated stack pointer or accumulator; the only special register is r7, which is the instruction pointer.
encoding  register name
000       R0
001       R1
010       R2
011       R3
100       R4
101       R5
110       R6
111       R7
while an instruction is being executed, R7 points to the next instruction, not the current one.

* - this is not including the flags register. there are two flags, a zero flag and a carry flag. every instruction affects the flags according to its result.
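as a rough sketch of how this state might be held in a javascript virtual machine: the names `makeMachine` and `fetchByte` are made up for illustration, and byte-addressed 64 KiB memory is an assumption (the text only says little endian and 16 bit).

```javascript
// hypothetical machine state; none of these names come from misa itself.
function makeMachine() {
  return {
    r: new Uint16Array(8),         // r0..r7, 16 bits each; r[7] is the instruction pointer
    zero: false,                   // zero flag
    carry: false,                  // carry flag
    mem: new Uint8Array(0o200000), // assumed: 64 KiB of byte-addressed memory
  };
}

// fetch one byte of code through r7, leaving r7 pointing at whatever comes next.
function fetchByte(m) {
  const b = m.mem[m.r[7]];
  m.r[7] += 1; // a Uint16Array element wraps modulo 2^16 on its own
  return b;
}
```

using a `Uint16Array` means register arithmetic wraps to 16 bits without explicit masking, which is one convenient way to get the behaviour, not necessarily the only one.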

addressing modes

encoding  notation (on r0)  behaviour
00        r0                direct addressing; stores/loads directly into/from a register.
01        @r0               indirect addressing; stores/loads into/from an address contained in a register.
10        @r0+              postincrement addressing; stores/loads into/from an address contained in a register, then adds 2 to that register.
11        @-r0              predecrement addressing; subtracts 2 from a register, then stores/loads into/from the address in that register.
due to the fact that the instruction pointer is a register, an additional addressing mode falls out for free: postincrement addressing on r7.
encoding  notation  behaviour
10        #1234     immediate addressing; stores/loads into/from the next machine word in the code.

note that if both operands of an instruction are ones that affect the same register, the source gets evaluated first, then the destination.
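a minimal sketch of how the four modes could be resolved, assuming registers live in a `Uint16Array`. the function name `resolve` and the shape of the returned object are made up for illustration. immediate addressing needs no special case: it is just mode 10 applied to r7.

```javascript
// resolve an operand to a location: either a register or a memory address.
// `r` is a Uint16Array(8) of registers; `mode` and `reg` come from the instruction.
function resolve(r, mode, reg) {
  switch (mode) {
    case 0b00: // r0
      return { kind: "reg", index: reg };
    case 0b01: // @r0
      return { kind: "mem", addr: r[reg] };
    case 0b10: { // @r0+
      const addr = r[reg];
      r[reg] += 2;
      return { kind: "mem", addr };
    }
    case 0b11: // @-r0
      r[reg] -= 2;
      return { kind: "mem", addr: r[reg] };
    default:
      throw new RangeError("mode must be 0..3");
  }
}

// both operands touch r0: the source is resolved first, then the destination.
const r = new Uint16Array(8);
r[0] = 0o1000;
const src = resolve(r, 0b10, 0);
const dst = resolve(r, 0b10, 0);
console.log(src.addr.toString(8), dst.addr.toString(8), r[0].toString(8)); // 1000 1002 1004
```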
all instructions are divided into two broad categories: single operand, and double operand.

single operand instructions

opcode  mode  register
0 0 0   0 0   0 0 0
opcode mnemonic
000 INC
001 DEC
010 NEG
011 NOT
100 LSH
101 RSH
110 [double operand]
111 [double operand]
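a small sketch of pulling the three fields out of a single operand instruction byte. it assumes the bits are written most significant first, as in the listings in this document; `decodeSingle` is a made-up name.

```javascript
const SINGLE_MNEMONICS = ["INC", "DEC", "NEG", "NOT", "LSH", "RSH"];

// split one instruction byte into opcode (3 bits), mode (2 bits), register (3 bits).
function decodeSingle(byte) {
  const opcode = (byte >> 5) & 0b111;
  const mode = (byte >> 3) & 0b11;
  const reg = byte & 0b111;
  // 110 and 111 are not single operand opcodes; they mark a double operand instruction.
  if (opcode >= 0b110) return null;
  return { mnemonic: SINGLE_MNEMONICS[opcode], mode, reg };
}

console.log(decodeSingle(0b00010111)); // { mnemonic: 'INC', mode: 2, reg: 7 }
```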

000 INC

000 00 000                    INC R0
000 01 000                    INC @R0
000 10 000                    INC @R0+
000 11 000                    INC @-R0
000 10 111 01011101 00100111  INC 123456
    
adds one to its operand.
in the case of a literal like this, it effectively self-modifies. as in, the constant, as it is stored in the machine code, will get incremented.
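to make the self-modification concrete, here is a sketch under a few assumptions: memory is a byte array, words are stored low byte first, and the helper names (`readWord`, `writeWord`, `incImmediate`) are made up for illustration.

```javascript
// little endian 16 bit word access over a byte array.
function readWord(mem, addr) {
  return mem[addr & 0xffff] | (mem[(addr + 1) & 0xffff] << 8);
}
function writeWord(mem, addr, value) {
  mem[addr & 0xffff] = value & 0xff;
  mem[(addr + 1) & 0xffff] = (value >> 8) & 0xff;
}

// INC with the immediate mode (@r7+): r7 already points at the literal.
function incImmediate(r, mem) {
  const addr = r[7]; // the literal's own address in the code
  r[7] += 2;         // postincrement steps over the literal
  writeWord(mem, addr, readWord(mem, addr) + 1);
}

const r = new Uint16Array(8);
const mem = new Uint8Array(0o200000);
writeWord(mem, 0o101, 0o123456); // literal following a one byte opcode at 0o100
r[7] = 0o101;
incImmediate(r, mem);
console.log(readWord(mem, 0o101).toString(8)); // 123457
console.log(r[7].toString(8));                 // 103
```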

001 DEC

001 00 000                    DEC R0
001 01 000                    DEC @R0
001 10 000                    DEC @R0+
001 11 000                    DEC @-R0
001 10 111 01011101 00100111  DEC 123456
    
subtracts one from its operand.

010 NEG

010 00 000                    NEG R0
010 01 000                    NEG @R0
010 10 000                    NEG @R0+
010 11 000                    NEG @-R0
010 10 111 01011101 00100111  NEG 123456
    
negates its operand, using two's complement. this is equivalent to NOT followed by INC, except that it evaluates its operand only once.
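the equivalence can be checked exhaustively over all 16 bit values. this is just an arithmetic sketch on plain numbers, with made-up helper names, not a model of the operand handling.

```javascript
const MASK = 0xffff;
const neg = (x) => -x & MASK;      // two's complement negation, 16 bits
const not = (x) => ~x & MASK;      // binary NOT, 16 bits
const inc = (x) => (x + 1) & MASK; // add one, 16 bits

// count how many 16 bit values disagree between NEG and NOT-then-INC.
let mismatches = 0;
for (let x = 0; x <= MASK; x++) {
  if (neg(x) !== inc(not(x))) mismatches++;
}
console.log(mismatches); // 0
```

the "only once" part matters with a mode like @R0+: a separate NOT @R0+ followed by INC @R0+ would touch two different words, while NEG @R0+ touches one.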

011 NOT

011 00 000                    NOT R0
011 01 000                    NOT @R0
011 10 000                    NOT @R0+
011 11 000                    NOT @-R0
011 10 111 01011101 00100111  NOT 123456
    
binary NOTs its operand.

100 LSH

100 00 000                    LSH R0
100 01 000                    LSH @R0
100 10 000                    LSH @R0+
100 11 000                    LSH @-R0
100 10 111 01011101 00100111  LSH 123456
    
multiplies its operand by 2, a binary shift left. note that this is a logical, i.e. unsigned, shift left.

101 RSH

101 00 000                    RSH R0
101 01 000                    RSH @R0
101 10 000                    RSH @R0+
101 11 000                    RSH @-R0
101 10 111 01011101 00100111  RSH 123456
    
divides its operand by 2, a binary shift right. note that this is a logical, i.e. unsigned, shift right.
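a sketch of the two shifts on 16 bit values, with an arithmetic right shift included only for contrast. the helper names are made up, and how the flags are affected is left out because the text does not spell it out.

```javascript
const lsh = (x) => (x << 1) & 0xffff; // logical shift left: the top bit falls off
const rsh = (x) => (x & 0xffff) >>> 1; // logical shift right: a zero comes in at the top

// for contrast only: an arithmetic shift right would copy the sign bit instead.
const arithmeticRsh = (x) => ((x & 0x8000) | (x >>> 1)) & 0xffff;

console.log(lsh(0o100000).toString(8));           // 0
console.log(rsh(0o100000).toString(8));           // 40000
console.log(arithmeticRsh(0o100000).toString(8)); // 140000
```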

double operand instructions

marker  opcode   dst. mode  dst. register  src. mode  src. register
0 0     0 0 0 0  0 0        0 0 0          0 0        0 0 0
opcode mnemonic
0000 MOV
0001 CMOVEQ
0010 CMOVLT
0011 CMOVGT
0100 CALL
0101 CMP
0110 ADD
0111 SUB
1000 AND
1001 OR
1010 XOR
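a sketch of splitting a double operand instruction into its fields. it treats the instruction as a single 16 bit value with the bits in the order shown in the listings; how those 16 bits are laid out as two bytes in memory is not covered here. `decodeDouble` is a made-up name.

```javascript
const DOUBLE_MNEMONICS = [
  "MOV", "CMOVEQ", "CMOVLT", "CMOVGT", "CALL", "CMP",
  "ADD", "SUB", "AND", "OR", "XOR",
];

// fields, from the top bit down: marker (2), opcode (4),
// dst mode (2), dst register (3), src mode (2), src register (3).
function decodeDouble(word) {
  if (((word >> 14) & 0b11) !== 0b11) return null; // not a double operand instruction
  const opcode = (word >> 10) & 0b1111;
  if (opcode >= DOUBLE_MNEMONICS.length) return null; // 1011 and up are not in the table
  return {
    mnemonic: DOUBLE_MNEMONICS[opcode],
    dstMode: (word >> 8) & 0b11,
    dstReg: (word >> 5) & 0b111,
    srcMode: (word >> 3) & 0b11,
    srcReg: word & 0b111,
  };
}

console.log(decodeDouble(0b1100000000100000));
// { mnemonic: 'MOV', dstMode: 0, dstReg: 1, srcMode: 0, srcReg: 0 }
```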

0000 MOV

11 0000 00 001 00 000                    MOV R1, R0
11 0000 00 001 01 000                    MOV R1, @R0
11 0000 00 001 10 000                    MOV R1, @R0+
11 0000 00 001 11 000                    MOV R1, @-R0
11 0000 00 000 10 111 01011101 00100111  MOV R0, 123456
    
moves the value at the source into the destination.

0001 CMOVEQ

11 0001 00 001 00 000                    CMOVEQ R1, R0
11 0001 00 001 01 000                    CMOVEQ R1, @R0
11 0001 00 001 10 000                    CMOVEQ R1, @R0+
11 0001 00 001 11 000                    CMOVEQ R1, @-R0
11 0001 00 000 10 111 01011101 00100111  CMOVEQ R0, 123456
    
moves the value at the source into the destination if the zero flag is set. if the move does not occur, register modifications in the destination's addressing mode are skipped.

0010 CMOVLT

11 0010 00 001 00 000                    CMOVLT R1, R0
11 0010 00 001 01 000                    CMOVLT R1, @R0
11 0010 00 001 10 000                    CMOVLT R1, @R0+
11 0010 00 001 11 000                    CMOVLT R1, @-R0
11 0010 00 000 10 111 01011101 00100111  CMOVLT R0, 123456
    
moves the value at the source into the destination if the carry flag is set. if the move does not occur, register modifications in the destination's addressing mode are skipped.

0011 CMOVGT

11 0011 00 001 00 000                    CMOVGT R1, R0
11 0011 00 001 01 000                    CMOVGT R1, @R0
11 0011 00 001 10 000                    CMOVGT R1, @R0+
11 0011 00 001 11 000                    CMOVGT R1, @-R0
11 0011 00 000 10 111 01011101 00100111  CMOVGT R0, 123456
    
moves the value at the source into the destination if both the zero and carry flags are clear. if the move does not occur, register modifications in the destination's addressing mode are skipped.
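the conditions for the four move instructions can be summed up in one predicate. this is a sketch with a made-up function name and a plain `{ zero, carry }` object standing in for the flags register.

```javascript
// decide whether a (conditional) move should happen, given the current flags.
function shouldMove(mnemonic, flags) {
  switch (mnemonic) {
    case "MOV":    return true;
    case "CMOVEQ": return flags.zero;
    case "CMOVLT": return flags.carry;
    case "CMOVGT": return !flags.zero && !flags.carry;
    default: throw new Error("not a move instruction: " + mnemonic);
  }
}

console.log(shouldMove("CMOVGT", { zero: false, carry: false })); // true
console.log(shouldMove("CMOVGT", { zero: true, carry: false }));  // false
```

since only the destination's register modifications are described as skipped, the source is presumably still evaluated either way, which is what lets an immediate source be stepped over when the move does not happen.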

0100 CALL

11 0100 00 001 00 000                    CALL R1, R0
11 0100 00 001 01 000                    CALL R1, @R0
11 0100 00 001 10 000                    CALL R1, @R0+
11 0100 00 001 11 000                    CALL R1, @-R0
11 0100 00 000 10 111 01011101 00100111  CALL R0, 123456
    
CALL is a bit of a strange instruction. it sets the destination to R7, and sets R7 to the source. note that the source is still evaluated first. this is a more generalized call instruction.
the canonical CALL instruction is CALL @R6+, @R7+.
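here is one reading of the canonical form, traced step by step. it is a sketch under assumptions: byte-addressed little endian memory, r6 used as a return stack pointer, and made-up helper names. the addresses are arbitrary.

```javascript
// little endian 16 bit word access over a byte array.
function readWord(mem, addr) {
  return mem[addr & 0xffff] | (mem[(addr + 1) & 0xffff] << 8);
}
function writeWord(mem, addr, value) {
  mem[addr & 0xffff] = value & 0xff;
  mem[(addr + 1) & 0xffff] = (value >> 8) & 0xff;
}

// CALL @R6+, @R7+ (hypothetical helper, hard-wired to these two operands).
function callCanonical(r, mem) {
  // source first: @R7+ reads the literal after the instruction and steps over it.
  const target = readWord(mem, r[7]);
  r[7] += 2;
  // destination second: the word at @R6 receives R7, then R6 moves up by 2.
  writeWord(mem, r[6], r[7]);
  r[6] += 2;
  // finally R7 takes the source value.
  r[7] = target;
}

const r = new Uint16Array(8);
const mem = new Uint8Array(0o200000);
writeWord(mem, 0o102, 0o001000); // call target, right after the 2 byte instruction at 0o100
r[7] = 0o102;                    // R7 already points past the instruction word
r[6] = 0o004000;                 // arbitrary return stack location
callCanonical(r, mem);
console.log(r[7].toString(8));                  // 1000
console.log(r[6].toString(8));                  // 4002
console.log(readWord(mem, 0o4000).toString(8)); // 104
```

because the source is evaluated first, the saved R7 is the address just past the literal, which is where execution would want to resume.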

0101 CMP

11 0101 00 001 00 000                    CMP R1, R0
11 0101 00 001 01 000                    CMP R1, @R0
11 0101 00 001 10 000                    CMP R1, @R0+
11 0101 00 001 11 000                    CMP R1, @-R0
11 0101 00 000 10 111 01011101 00100111  CMP R0, 123456
11 0101 10 111 00 000 01011101 00100111  CMP 123456, R0
    
subtracts the source from the destination, with carry, setting flags, then discards the result.
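a sketch of the flag computation, under one loudly stated assumption: that "with carry" means the carry flag records a borrow out of the 16 bit unsigned subtraction. that reading is chosen because it lines up with the conditional moves (CMOVLT on carry set, CMOVGT on both flags clear), but the text does not confirm it. `compare` is a made-up name.

```javascript
// compute the flags for CMP dst, src on unsigned 16 bit values.
// assumption: carry means "a borrow happened", i.e. dst < src.
function compare(dst, src) {
  const diff = (dst & 0xffff) - (src & 0xffff);
  return {
    zero: (diff & 0xffff) === 0, // operands were equal
    carry: diff < 0,             // subtraction borrowed
  };
}

console.log(compare(5, 5)); // { zero: true, carry: false }
console.log(compare(3, 5)); // { zero: false, carry: true }
console.log(compare(5, 3)); // { zero: false, carry: false }
```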

0110 ADD

11 0110 00 001 00 000                    ADD R1, R0
11 0110 00 001 01 000                    ADD R1, @R0
11 0110 00 001 10 000                    ADD R1, @R0+
11 0110 00 001 11 000                    ADD R1, @-R0
11 0110 00 000 10 111 01011101 00100111  ADD R0, 123456
    
adds the source to the destination, with carry.

0111 SUB

11 0111 00 001 00 000                    SUB R1, R0
11 0111 00 001 01 000                    SUB R1, @R0
11 0111 00 001 10 000                    SUB R1, @R0+
11 0111 00 001 11 000                    SUB R1, @-R0
11 0111 00 000 10 111 01011101 00100111  SUB R0, 123456
    
subtracts the source from the destination, with carry.

1000 AND

11 1000 00 001 00 000                    AND R1, R0
11 1000 00 001 01 000                    AND R1, @R0
11 1000 00 001 10 000                    AND R1, @R0+
11 1000 00 001 11 000                    AND R1, @-R0
11 1000 00 000 10 111 01011101 00100111  AND R0, 123456
    
binary ANDs the source with the destination.

1001 OR

11 1001 00 001 00 000                    OR R1, R0
11 1001 00 001 01 000                    OR R1, @R0
11 1001 00 001 10 000                    OR R1, @R0+
11 1001 00 001 11 000                    OR R1, @-R0
11 1001 00 000 10 111 01011101 00100111  OR R0, 123456
    
binary ORs the source with the destination.

1010 XOR

11 1010 00 001 00 000                    XOR R1, R0
11 1010 00 001 01 000                    XOR R1, @R0
11 1010 00 001 10 000                    XOR R1, @R0+
11 1010 00 001 11 000                    XOR R1, @-R0
11 1010 00 000 10 111 01011101 00100111  XOR R0, 123456
    
binary XORs the source with the destination.