misa
so, for the past few days as of writing, i've been
attempting to write a forth system in javascript.
one fairly big issue with that is that, ironically,
forth gets harder to implement the higher-level the
language you implement it in. this is because a typical
forth system relies on things like access to the return
stack, throwing pointers around left and right, and,
worst of all, being able to write certain procedures in
machine code. to do this portably (in, for instance,
javascript), one needs some sort of virtual machine.
initially, i attempted to use the pdp-11 as the base
architecture for my forth, but i very quickly realized
that actually implementing a pdp-11 virtual machine
comes with a ton of complications. (for instance, 8-bit
moves sign-extend when the destination is a register,
but only write a single byte when the destination is
memory.)
so, i took what i liked about the pdp-11, got rid of
most of what i didn't like, and that turned into misa,
a simple, little-endian, 16-bit, orthogonal instruction
set.
all literal values are written in octal, because. octal better.
there is an assembler i wrote
in forth that may be helpful for this.
registers
there are eight* registers, appropriately named r0
through r7. there is no dedicated stack pointer or
accumulator: all of them are general purpose, with the
one exception that r7 is the instruction pointer.
encoding | register name
000      | R0
001      | R1
010      | R2
011      | R3
100      | R4
101      | R5
110      | R6
111      | R7
while an instruction is being executed, R7
points to the next instruction, not the current
one.
* this does not include the flags register, which holds
two flags: a zero flag and a carry flag. every
instruction sets the flags according to its result.
addressing modes
encoding | notation (on r0) | behaviour
00       | r0               | direct addressing; stores/loads directly into/from a register.
01       | @r0              | indirect addressing; stores/loads into/from an address contained in a register.
10       | @r0+             | postincrement addressing; stores/loads into/from an address contained in a register, then adds 2 to that register.
11       | @-r0             | predecrement addressing; subtracts 2 from a register, then stores/loads into/from the address in that register.
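to make the four modes concrete, here is a minimal python sketch of how an emulator might resolve an operand before loading from or storing to it. it assumes 64 KiB of byte-addressed memory holding little-endian 16-bit words; the Machine class and its method names are hypothetical, not part of any existing misa implementation.

```python
# sketch of misa operand resolution. assumptions (not from the spec):
# 64 KiB of byte-addressed memory, 16-bit words stored little-endian.
MASK = 0xFFFF


class Machine:
    def __init__(self):
        self.regs = [0] * 8              # r0..r7; r7 is the instruction pointer
        self.mem = bytearray(0x10000)

    def read_word(self, addr):
        return self.mem[addr & MASK] | (self.mem[(addr + 1) & MASK] << 8)

    def write_word(self, addr, value):
        self.mem[addr & MASK] = value & 0xFF
        self.mem[(addr + 1) & MASK] = (value >> 8) & 0xFF

    def resolve(self, mode, reg):
        """apply the mode's side effect, return ("reg", n) or ("mem", addr)."""
        if mode == 0b00:                 # r0: the register itself
            return ("reg", reg)
        if mode == 0b01:                 # @r0: the address held in the register
            return ("mem", self.regs[reg])
        if mode == 0b10:                 # @r0+: use the address, then add 2
            addr = self.regs[reg]
            self.regs[reg] = (addr + 2) & MASK
            return ("mem", addr)
        # mode 0b11, @-r0: subtract 2 first, then use the address
        self.regs[reg] = (self.regs[reg] - 2) & MASK
        return ("mem", self.regs[reg])

    def load(self, loc):
        kind, where = loc
        return self.regs[where] if kind == "reg" else self.read_word(where)

    def store(self, loc, value):
        kind, where = loc
        if kind == "reg":
            self.regs[where] = value & MASK
        else:
            self.write_word(where, value)


m = Machine()
m.regs[0] = 0o1000
m.write_word(0o1000, 0o123456)
loc = m.resolve(0b10, 0)                 # @r0+
print(oct(m.load(loc)), oct(m.regs[0]))  # 0o123456 0o1002
```

keeping the side effect inside resolve means an instruction like INC @r0+ can load from and store back to the same location while bumping r0 only once.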
because the instruction pointer is a register, an
additional addressing mode falls out for free.
encoding   | notation | behaviour
10 (on r7) | #1234    | immediate addressing; stores/loads into/from the next machine word in the code.
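in other words, #1234 is just mode 10 applied to r7. assuming r7 has already been advanced past the opcode so that it points at the literal, reading through @r7+ returns that word and steps r7 over it, so execution carries on after the constant. a minimal python sketch, assuming byte-addressed memory with little-endian 16-bit words; the helper names and addresses are hypothetical.

```python
# sketch: "#1234" is mode 10 (@rN+) applied to r7.
# assumes byte-addressed memory with little-endian 16-bit words.
MASK = 0xFFFF


def read_word(mem, addr):
    return mem[addr & MASK] | (mem[(addr + 1) & MASK] << 8)


def postincrement_load(regs, mem, reg):
    """mode 10: read the word at the address in reg, then add 2 to reg."""
    addr = regs[reg]
    regs[reg] = (addr + 2) & MASK
    return read_word(mem, addr)


mem = bytearray(0x10000)
regs = [0] * 8
mem[0o2000], mem[0o2001] = 0o123456 & 0xFF, 0o123456 >> 8  # the literal
regs[7] = 0o2000             # assumed: r7 already points at the literal
value = postincrement_load(regs, mem, 7)
print(oct(value), oct(regs[7]))  # 0o123456 0o2002
```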
note that if both operands of an instruction involve the
same register, the source gets evaluated first, then the
destination.
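for example, MOV @R0+, @R0+ (destination written first, as in the encodings below) reads through r0 and bumps it, then writes through the bumped r0 and bumps it again, copying a word one slot forward. a python sketch of that, assuming byte-addressed memory with little-endian 16-bit words; the helper names are hypothetical.

```python
# sketch: MOV @R0+, @R0+ with the source evaluated before the destination.
# memory layout and helper names are assumptions for illustration.
MASK = 0xFFFF
mem = bytearray(0x10000)
regs = [0] * 8


def read_word(addr):
    return mem[addr & MASK] | (mem[(addr + 1) & MASK] << 8)


def write_word(addr, value):
    mem[addr & MASK] = value & 0xFF
    mem[(addr + 1) & MASK] = (value >> 8) & 0xFF


def postinc(reg):
    """mode 10: return the address in reg, then add 2 to reg."""
    addr = regs[reg]
    regs[reg] = (addr + 2) & MASK
    return addr


regs[0] = 0o1000
write_word(0o1000, 0o7777)
value = read_word(postinc(0))    # source: reads 0o1000, r0 becomes 0o1002
write_word(postinc(0), value)    # destination: writes 0o1002, r0 becomes 0o1004
print(oct(read_word(0o1002)), oct(regs[0]))  # 0o7777 0o1004
```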
all instructions are divided into two broad categories:
single operand and double operand.
single operand instructions
opcode | mode | register
000    | 00   | 000
opcode | mnemonic
000    | INC
001    | DEC
010    | NEG
011    | NOT
100    | LSH
101    | RSH
110    | [double operand]
111    | [double operand]
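as a sketch, here is how a disassembler might split an 8-bit single operand instruction into its fields in python. the function and table names are hypothetical; treating mode 10 on r7 as an immediate follows from the addressing modes above, and the literal itself would come from the next word, which this sketch doesn't fetch.

```python
# sketch: decoding an 8-bit single operand instruction laid out as
# opcode(3) | mode(2) | register(3). names are hypothetical.
SINGLE_OPS = {0b000: "INC", 0b001: "DEC", 0b010: "NEG",
              0b011: "NOT", 0b100: "LSH", 0b101: "RSH"}
MODES = ["R{}", "@R{}", "@R{}+", "@-R{}"]


def decode_single(byte):
    """split an opcode | mode | register byte into a mnemonic string."""
    opcode = (byte >> 5) & 0b111
    mode = (byte >> 3) & 0b11
    reg = byte & 0b111
    if opcode not in SINGLE_OPS:     # 110 and 111 begin a double operand instruction
        raise ValueError("not a single operand instruction")
    if mode == 0b10 and reg == 7:    # @R7+ is the immediate mode
        return SINGLE_OPS[opcode] + " #<next word>"
    return SINGLE_OPS[opcode] + " " + MODES[mode].format(reg)


print(decode_single(0b000_10_000))   # INC @R0+
print(decode_single(0b010_11_011))   # NEG @-R3
print(decode_single(0b000_10_111))   # INC #<next word>
```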
000 INC
000 00 000 INC R0
000 01 000 INC @R0
000 10 000 INC @R0+
000 11 000 INC @-R0
000 10 111 01011101 00100111 INC #123456
adds one to its operand.
with an immediate operand like this, the instruction
effectively modifies itself: the constant, as it is
stored in the machine code, gets incremented.
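a minimal python sketch of that self-modification, assuming byte-addressed memory with little-endian 16-bit words and that r7 points at the literal when the operand is evaluated; the helper names and the address 0o2000 are made up, and flags are ignored. running the same INC three times leaves the constant in the code three higher.

```python
# sketch: INC @R7+ (INC #n) writes its result back into the code stream.
# memory layout, helper names and addresses are assumptions.
MASK = 0xFFFF
mem = bytearray(0x10000)
regs = [0] * 8


def read_word(addr):
    return mem[addr & MASK] | (mem[(addr + 1) & MASK] << 8)


def write_word(addr, value):
    mem[addr & MASK] = value & 0xFF
    mem[(addr + 1) & MASK] = (value >> 8) & 0xFF


def inc_immediate():
    """INC @R7+: add one to the word at r7, then step r7 past it (flags ignored)."""
    addr = regs[7]
    regs[7] = (addr + 2) & MASK
    write_word(addr, read_word(addr) + 1)


LITERAL_AT = 0o2000                  # hypothetical address of the literal
write_word(LITERAL_AT, 0o123456)
for _ in range(3):                   # execute the same INC three times
    regs[7] = LITERAL_AT             # r7 points at the literal each time
    inc_immediate()
print(oct(read_word(LITERAL_AT)))    # 0o123461
```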
001 DEC
001 00 000 DEC R0
001 01 000 DEC @R0
001 10 000 DEC @R0+
001 11 000 DEC @-R0
001 10 111 01011101 00100111 DEC #123456
subtracts one from its operand.
010 NEG
010 00 000 NEG R0
010 01 000 NEG @R0
010 10 000 NEG @R0+
010 11 000 NEG @-R0
010 10 111 01011101 00100111 NEG #123456
negates its operand using two's complement. this is
equivalent to a NOT followed by an INC on the same
operand, except that NEG evaluates its operand only once.
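in 16 bits, the identity behind that equivalence can be checked directly with a small python sketch (flags are not modelled, and the function names are hypothetical).

```python
# sketch: 16-bit two's complement negation equals NOT followed by INC.
MASK = 0xFFFF


def neg(x):
    return -x & MASK


def not_(x):
    return ~x & MASK


def inc(x):
    return (x + 1) & MASK


for x in (0, 1, 0o123456, 0o100000, 0o177777):
    assert neg(x) == inc(not_(x))
print(oct(neg(1)), oct(neg(0o100000)))  # 0o177777 0o100000
```

evaluating the operand once is what makes NEG @R0+ meaningful: a NOT @R0+ followed by an INC @R0+ would act on two different words.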
011 NOT
011 00 000 NOT R0
011 01 000 NOT @R0
011 10 000 NOT @R0+
011 11 000 NOT @-R0
011 10 111 01011101 00100111 NOT #123456
binary NOTs its operand.
100 LSH
100 00 000 LSH R0
100 01 000 LSH @R0
100 10 000 LSH @R0+
100 11 000 LSH @-R0
100 10 111 01011101 00100111 LSH #123456
multiplies its operand by 2 with a one-bit shift left.
note that this is a logical, i.e. unsigned, shift left.
101 RSH
101 00 000 RSH R0
101 01 000 RSH @R0
101 10 000 RSH @R0+
101 11 000 RSH @-R0
101 10 111 01011101 00100111 RSH #123456
divides its operand by 2 with a one-bit shift right.
note that this is a logical, i.e. unsigned, shift right.
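a python sketch of LSH and RSH on 16-bit values. what happens to the bit shifted out (for instance, whether it lands in the carry flag) isn't specified above, so this sketch simply drops it; the arithmetic shift is included only to show what RSH does not do, and the function names are hypothetical.

```python
# sketch of misa's logical shifts on 16-bit values; flags are not modelled.
MASK = 0xFFFF


def lsh(x):
    return (x << 1) & MASK           # this sketch drops the bit shifted out


def rsh(x):
    return (x & MASK) >> 1           # logical: bit 15 becomes 0


def arithmetic_rsh(x):
    """for contrast only, not a misa instruction: copies bit 15."""
    return ((x & MASK) >> 1) | (x & 0o100000)


print(oct(lsh(0o123456)))              # 0o47134
print(oct(rsh(0o100000)))              # 0o40000
print(oct(arithmetic_rsh(0o100000)))   # 0o140000
```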
double operand instructions
marker | opcode | dst. mode | dst. register | src. mode | src. register
11     | 0000   | 00        | 000           | 00        | 000
opcode | mnemonic
0000   | MOV
0001   | CMOVEQ
0010   | CMOVLT
0011   | CMOVGT
0100   | CALL
0101   | CMP
0110   | ADD
0111   | SUB
1000   | AND
1001   | OR
1010   | XOR
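since the field widths add up to exactly 16 bits, packing and unpacking a double operand instruction comes down to a handful of shifts and masks. a python sketch with hypothetical function names; the opcode list mirrors the table above, and the check on the leading 11 follows from the single operand table, where 110 and 111 are reserved for double operand instructions.

```python
# sketch: a 16-bit double operand word laid out as
# marker(2) | opcode(4) | dst mode(2) | dst reg(3) | src mode(2) | src reg(3).
DOUBLE_OPS = ["MOV", "CMOVEQ", "CMOVLT", "CMOVGT", "CALL", "CMP",
              "ADD", "SUB", "AND", "OR", "XOR"]


def encode_double(op, dst_mode, dst_reg, src_mode, src_reg):
    """pack the six fields into one 16-bit word, marker bits first."""
    opcode = DOUBLE_OPS.index(op)
    return ((0b11 << 14) | (opcode << 10) | (dst_mode << 8) | (dst_reg << 5)
            | (src_mode << 3) | src_reg)


def decode_double(word):
    """return (mnemonic, dst mode, dst reg, src mode, src reg)."""
    if word >> 14 != 0b11:
        raise ValueError("not a double operand instruction (no 11 marker)")
    opcode = (word >> 10) & 0b1111
    if opcode >= len(DOUBLE_OPS):
        raise ValueError("undefined opcode")
    return (DOUBLE_OPS[opcode], (word >> 8) & 0b11, (word >> 5) & 0b111,
            (word >> 3) & 0b11, word & 0b111)


w = encode_double("MOV", 0b00, 1, 0b10, 0)   # MOV R1, @R0+
print(format(w, "016b"))                      # 1100000000110000
print(decode_double(w))                       # ('MOV', 0, 1, 2, 0)
```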
0000 MOV
11 0000 00 001 00 000 MOV R1, R0
11 0000 00 001 01 000 MOV R1, @R0
11 0000 00 001 10 000 MOV R1, @R0+
11 0000 00 001 11 000 MOV R1, @-R0
11 0000 00 000 10 111 01011101 00100111 MOV R0, #123456
moves the value at the source into the destination.
0001 CMOVEQ
11 0001 00 001 00 000 CMOVEQ R1, R0
11 0001 00 001 01 000 CMOVEQ R1, @R0
11 0001 00 001 10 000 CMOVEQ R1, @R0+
11 0001 00 001 11 000 CMOVEQ R1, @-R0
11 0001 00 000 10 111 01011101 00100111 CMOVEQ R0, #123456
moves the value at the source into the destination if
the zero flag is set.
if the move does not occur, register modifications in
the destination's addressing mode are skipped.
0010 CMOVLT
11 0010 00 001 00 000 CMOVLT R1, R0
11 0010 00 001 01 000 CMOVLT R1, @R0
11 0010 00 001 10 000 CMOVLT R1, @R0+
11 0010 00 001 11 000 CMOVLT R1, @-R0
11 0010 00 000 10 111 01011101 00100111 CMOVLT R0, #123456
moves the value at the source into the destination if
the carry flag is set.
if the move does not occur, register modifications in
the destination's addressing mode are skipped.
0011 CMOVGT
11 0011 00 001 00 000 CMOVGT R1, R0
11 0011 00 001 01 000 CMOVGT R1, @R0
11 0011 00 001 10 000 CMOVGT R1, @R0+
11 0011 00 001 11 000 CMOVGT R1, @-R0
11 0011 00 000 10 111 01011101 00100111 CMOVGT R0, #123456
moves the value at the source into the destination if
both the zero and carry flags are clear.
if the move does not occur, register modifications in
the destination's addressing mode are skipped.
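a python sketch of the three conditions, plus a conditional store showing the destination's side effects being skipped. it assumes byte-addressed memory with little-endian 16-bit words, and that the source has already been evaluated (side effects included) whether or not the move happens, which the text above doesn't say either way; the function names are hypothetical.

```python
# sketch of CMOVEQ / CMOVLT / CMOVGT. memory layout and names are assumptions.
MASK = 0xFFFF
mem = bytearray(0x10000)
regs = [0] * 8


def cmov_taken(op, zero, carry):
    if op == "CMOVEQ":
        return zero                      # zero flag set
    if op == "CMOVLT":
        return carry                     # carry flag set
    if op == "CMOVGT":
        return not zero and not carry    # both flags clear
    raise ValueError(op)


def cmov(op, zero, carry, dst_mode, dst_reg, value):
    """store an already-evaluated source value, if the condition holds."""
    if not cmov_taken(op, zero, carry):
        return                           # no store and no @r+ / @-r adjustment
    if dst_mode == 0b00:                 # register destination
        regs[dst_reg] = value & MASK
        return
    if dst_mode == 0b11:                 # @-r: predecrement
        regs[dst_reg] = (regs[dst_reg] - 2) & MASK
    addr = regs[dst_reg]
    if dst_mode == 0b10:                 # @r+: postincrement
        regs[dst_reg] = (addr + 2) & MASK
    mem[addr] = value & 0xFF             # little-endian word store
    mem[(addr + 1) & MASK] = (value >> 8) & 0xFF


regs[1] = 0o1000
cmov("CMOVEQ", False, False, 0b10, 1, 0o777)   # CMOVEQ @R1+, ... with zero clear
print(oct(regs[1]))                            # 0o1000
cmov("CMOVEQ", True, False, 0b10, 1, 0o777)    # same instruction, zero set
print(oct(regs[1]), oct(mem[0o1000] | mem[0o1001] << 8))  # 0o1002 0o777
```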
0100 CALL
11 0100 00 001 00 000 CALL R1, R0
11 0100 00 001 01 000 CALL R1, @R0
11 0100 00 001 10 000 CALL R1, @R0+
11 0100 00 001 11 000 CALL R1, @-R0
11 0100 00 000 10 111 01011101 00100111 CALL R0, #123456
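CALL is a bit of a strange instruction. it sets the
destination to R7, and sets R7 to the source. note that
the source is still evaluated first. this makes it a
more generalized call instruction.
the canonical CALL instruction is CALL @R6+, @R7+: the
source reads the call target from the word following
the instruction (leaving R7 pointing past it), the
destination stores that updated R7, i.e. the return
address, at the address in R6 and advances R6 by 2, and
execution then continues at the target.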
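a python sketch of the canonical call, assuming byte-addressed memory with little-endian 16-bit words, a CALL instruction whose target literal sits at 0o2002, and a return stack starting at 0o4000 (all made up). the return sequence, MOV R7, @-R6, isn't described above; it's an assumption that follows from r6 growing upward.

```python
# sketch of CALL @R6+, @R7+ with r6 as an upward-growing return stack.
# memory layout, addresses and the MOV R7, @-R6 return are assumptions.
MASK = 0xFFFF
mem = bytearray(0x10000)
regs = [0] * 8


def read_word(addr):
    return mem[addr & MASK] | (mem[(addr + 1) & MASK] << 8)


def write_word(addr, value):
    mem[addr & MASK] = value & 0xFF
    mem[(addr + 1) & MASK] = (value >> 8) & 0xFF


def postinc(reg):
    addr = regs[reg]
    regs[reg] = (addr + 2) & MASK
    return addr


def call_canonical():
    target = read_word(postinc(7))   # source @R7+: the literal after the opcode
    write_word(postinc(6), regs[7])  # destination @R6+: push the return address
    regs[7] = target                 # jump


def ret():
    """MOV R7, @-R6 (assumed): pop the return address into r7."""
    regs[6] = (regs[6] - 2) & MASK
    regs[7] = read_word(regs[6])


regs[6] = 0o4000                     # hypothetical base of the return stack
regs[7] = 0o2002                     # r7 points at the literal after the opcode
write_word(0o2002, 0o3000)           # call target
call_canonical()
print(oct(regs[7]), oct(regs[6]), oct(read_word(0o4000)))  # 0o3000 0o4002 0o2004
ret()
print(oct(regs[7]), oct(regs[6]))    # 0o2004 0o4000
```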
0101 CMP
11 0101 00 001 00 000 CMP R1, R0
11 0101 00 001 01 000 CMP R1, @R0
11 0101 00 001 10 000 CMP R1, @R0+
11 0101 00 001 11 000 CMP R1, @-R0
11 0101 00 000 10 111 01011101 00100111 CMP R0, #123456
11 0101 10 111 00 000 01011101 00100111 CMP #123456, R0
subtracts the source from the destination, with carry,
setting flags, then discards the result.
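one plausible reading of the flags, sketched in python: CMP computes destination minus source on 16-bit values with no incoming carry, sets zero when the difference is 0, and sets carry when the subtraction borrows. "with carry" could also mean that the carry flag feeds into the subtraction, so treat this as an assumption, along with the function names. under this reading exactly one of the conditional moves fires after a CMP, and the comparison is unsigned.

```python
# sketch of CMP's flags under an assumed reading (no carry-in, carry = borrow).
MASK = 0xFFFF


def cmp_flags(dst, src):
    """return (zero, carry) for dst - src; CMP itself discards the difference."""
    diff = (dst - src) & MASK
    zero = diff == 0
    carry = dst < src            # unsigned borrow (assumed meaning of carry)
    return zero, carry


def which_cmov_fires(zero, carry):
    if zero:
        return "CMOVEQ"          # zero flag set
    if carry:
        return "CMOVLT"          # carry flag set
    return "CMOVGT"              # both flags clear


for dst, src in ((5, 5), (3, 0o177777), (0o177777, 3)):
    print(which_cmov_fires(*cmp_flags(dst, src)))
# prints CMOVEQ, then CMOVLT, then CMOVGT
```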
0110 ADD
11 0110 00 001 00 000 ADD R1, R0
11 0110 00 001 01 000 ADD R1, @R0
11 0110 00 001 10 000 ADD R1, @R0+
11 0110 00 001 11 000 ADD R1, @-R0
11 0110 00 000 10 111 01011101 00100111 ADD R0, #123456
adds the source to the destination, with carry.
0111 SUB
11 0111 00 001 00 000 SUB R1, R0
11 0111 00 001 01 000 SUB R1, @R0
11 0111 00 001 10 000 SUB R1, @R0+
11 0111 00 001 11 000 SUB R1, @-R0
11 0111 00 000 10 111 01011101 00100111 SUB R0, #123456
subtracts the source from the destination, with carry.
1000 AND
11 1000 00 001 00 000 AND R1, R0
11 1000 00 001 01 000 AND R1, @R0
11 1000 00 001 10 000 AND R1, @R0+
11 1000 00 001 11 000 AND R1, @-R0
11 1000 00 000 10 111 01011101 00100111 AND R0, #123456
binary ANDs the source with the destination.
1001 OR
11 1001 00 001 00 000 OR R1, R0
11 1001 00 001 01 000 OR R1, @R0
11 1001 00 001 10 000 OR R1, @R0+
11 1001 00 001 11 000 OR R1, @-R0
11 1001 00 000 10 111 01011101 00100111 OR R0, #123456
binary ORs the source with the destination.
1010 XOR
11 1010 00 001 00 000 XOR R1, R0
11 1010 00 001 01 000 XOR R1, @R0
11 1010 00 001 10 000 XOR R1, @R0+
11 1010 00 001 11 000 XOR R1, @-R0
11 1010 00 000 10 111 01011101 00100111 XOR R0, #123456
binary XORs the source with the destination.