This document will go over the most basic concepts that you will need to know. This serves as a companion to the official spec at dcpu.com/dcpu-16. If you already understand the official spec or want to skip around, everything in this chapter is supposed to be review.
It is not necessary to understand binary completely to start out. All that you need to know is that binary is the only way computers think, and that they don't "know" that any data should be interpreted in a specific way other than how the programmer tells them to.
Binary is merely a way to represent numerical values, so essentially everything that the computer knows boils down to one value or another.
Digits in binary are called bits, and the smallest piece of data the DCPU deals with is 16 bits, which means that a piece of information can be any one of 65,536 values, and nothing else. This piece of data is called a word.
Writing sixteen 1's and 0's every time you want to talk about a value would get tedious fast, so programmers often use hexadecimal to represent binary values.
Hexadecimal, or simply hex, is a way to write numbers just like binary and decimal. It is used because it is compatible with binary: each hex digit is exactly 4 bits of information. Decimal has no such direct correspondence.
Hexadecimal digits can be any one of 16 values, represented by the numbers 0-9 and the letters a-f. The hex digit "a" is equivalent to the decimal number 10, b=11, etc, up to f=15. In order to denote that a given number is written in hex, the prefix 0x is used. For example, 0x12f4 is equal to 4852 in decimal or 0001 0010 1111 0100 in binary.
In decimal, when you get up to 9 the next digit is 0 and you increment the next column to the left. In hex, after 9 is just a, then b, etc. When you get to f, that's when you go back to 0 and increment the next column.
Decimal: Hex:
8 8
9 9
10 a <---
11 b
12 c
13 d
14 e
15 f
16 10 <---
17 11
18 12
Just like the highest number you can have in 4 digits of decimal is 9999, the highest integer you can represent in 4 digits of hex is 0xffff, which is 65,535. Since we're counting 0 as a value, that's 65,536 values, just like I mentioned above.
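If you want to check conversions like these yourself, any language with hex support will do. Here is a small Python sketch; Python is only being used as a calculator here, it has nothing to do with the DCPU itself, and the helper name to_binary_word is made up for this example.

```python
def to_binary_word(value):
    """Format a 16-bit value as four groups of four bits."""
    bits = format(value & 0xFFFF, "016b")  # 16 binary digits, zero-padded
    return " ".join(bits[i:i + 4] for i in range(0, 16, 4))

value = int("12f4", 16)        # parse a string of hex digits
print(value)                   # 4852
print(to_binary_word(value))   # 0001 0010 1111 0100
print(hex(value))              # 0x12f4
print(0xFFFF, 0xFFFF + 1)      # 65535 65536
```

Each group of four bits in the second line of output lines up with one hex digit, which is the whole reason hex is convenient.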
Assembly consists of "simple" computations which are executed one at a time, in order. These are called instructions. Each instruction contains an operation and one or two values.
set a, 12
add a, 4
set b, 2
sub a, b
In the example above there are several different instructions shown, using the set, add, and sub operations. As you can probably guess, add and sub perform arithmetic on integers. Set allows you to copy a value to another place. In this example we store two values in two registers, called a and b. Registers are like little pockets of memory on the processor which can be accessed quickly.
When you write assembly, like many programming languages, you need to compile it into machine code before the processor can run it. The translation from assembly to machine code is more direct than in other programming languages because each assembly instruction represents an equivalent machine code instruction, and vice-versa. It is a one-to-one translation.
b401 9402 8c21 0403
The above block of machine code, represented in hexadecimal numbers, is equivalent to the assembly code shown before. I've divided the machine code into blocks of 4 hexadecimal digits, for human readability. Each block of 4 digits represents one word (16 bits).
Instructions in DCPU-Assembly take up between 1 and 3 words each, and in this example each instruction is one word. b401 is equivalent to set a, 12.
I have shown you the above block of machine code to illustrate a point, but it is not important for you to know specifically what machine code something translates into. If you would like to see how your code is being interpreted, the online emulator dcpu.ru is good because it shows line-by-line what machine code is generated by your assembly.
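If you are curious anyway, here is a rough Python sketch of how those four words can be picked apart. It assumes the basic instruction layout from version 1.7 of the official spec (the low 5 bits are the opcode, the next 5 bits are the value that gets written, and the top 6 bits are the value that gets read), and it only knows the few opcodes and values used in this example. The names OPCODES, REGISTERS, describe and decode are invented for this sketch.

```python
# Only the opcodes and registers that appear in the example above.
OPCODES = {0x01: "set", 0x02: "add", 0x03: "sub"}
REGISTERS = {0x00: "a", 0x01: "b"}

def describe(value):
    """Turn a value field into text (registers and small literals only)."""
    if value in REGISTERS:
        return REGISTERS[value]
    if 0x20 <= value <= 0x3F:
        return str(value - 0x21)   # small literals -1..30 are stored inline
    raise ValueError("value type not handled in this sketch")

def decode(word):
    opcode = word & 0x1F           # low 5 bits
    first = (word >> 5) & 0x1F     # next 5 bits: the value that is written
    second = (word >> 10) & 0x3F   # top 6 bits: the value that is read
    return f"{OPCODES[opcode]} {describe(first)}, {describe(second)}"

for word in (0xB401, 0x9402, 0x8C21, 0x0403):
    print(f"{word:04x}  {decode(word)}")
```

Running it prints each word next to the instruction it stands for, starting with b401 and set a, 12.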
DCPU assembly differs from an actual assembly language in that the operations are all designed to be easy to understand by humans, whereas actual modern processors have instruction sets that are designed to be written by compilers and are not very human-friendly.
set a, 12
The first part of an instruction is the opcode, which represents which operation the processor should be performing. Each operation expects either 1 or 2 values, which represent the data that will be manipulated.
add a, 4
This instruction is like the mathematical equation a = a + 4. The first value is overwritten with the result of the operation. The second value is never altered. Here, 4 is added to the value in the a register and the result is stored back in the a register.
set b, 2
sub a, b
Here, b is loaded with a value and then subtracted from a. As before, the result is stored in a and b is left untouched.
I will not list every possible opcode here, because it would get tedious. Instead, check out the official documentation at dcpu.com/dcpu-16.
Registers are little pieces of memory on the processor which are used to store values for immediate use. The a and b from the previous example are registers. The DCPU has 8 general purpose registers and 4 special purpose registers.
### General Purpose Registers:
Because the DCPU is made to be fun and simplified for humans, every register is one word and can be manipulated directly, except IA, which I'll talk about later. Actual CPU architectures can have differently sized registers which are for specific things, and which are full of little rules and tricks.
A, B, C
X, Y, Z
I, J ; these two are sort of unique
Each of the general purpose registers can be used for whatever the programmer wants, but the I and J registers are unique in that there are two operations which affect them directly.
sti is an opcode which means "set then increment". It is like set, but after it sets the value it will increment the i and j registers, even if they weren't used in the operation. Likewise, std means "set then decrement".
sti a, 2 ; sets a to 2 and then increments I and J
std b, 1 ; sets b to 1 and then decrements I and J
There are also 4 other registers which have special meanings.
The special registers have specific uses. The most important to understand at first is PC. This is the program counter, or instruction pointer, and it points to the location in memory where the current instruction is. After each instruction, pc is automatically advanced past that instruction (by 1 to 3 words, depending on its length) and then the instruction at the new location is executed.
Like pc, the stack pointer sp is used to keep track of a location in memory that the processor uses for other operations. In this case it is part of the stack, which is essentially an area of memory that values can be quickly stored in and retrieved from, but only in sequential order. We'll talk about the stack more later.
The interrupt address register, ia, is another pointer which we'll talk about later. Essentially, a programmer can define a subroutine which gets run when hardware wants to talk to the DCPU or when another subroutine triggers an interrupt. ia points to the location in memory where that subroutine starts.
The excess register, ex, is used when an operation overflows the 16 bits available in a word. This is a very luxurious feature of the DCPU, because "real" architectures often have only 1 bit for overflow, and it is shared between different uses. ex provides 16 whole bits of extra information when an operation overflows, making the DCPU-16 almost a 32-bit processor.
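To make the overflow idea concrete, here is a hedged Python sketch of how add and mul could fill in the extra 16 bits. It assumes the behaviour described in version 1.7 of the official spec (add leaves 1 in the overflow register when the sum does not fit and 0 otherwise, and mul leaves the upper 16 bits of the product there). The function names and the MASK constant are made up for this illustration.

```python
MASK = 0xFFFF  # a word is 16 bits

def add(b, a):
    """Return (result, ex) for 'add b, a' on 16-bit words."""
    total = b + a
    return total & MASK, 1 if total > MASK else 0

def mul(b, a):
    """Return (result, ex) for 'mul b, a' on unsigned 16-bit words."""
    product = b * a
    return product & MASK, (product >> 16) & MASK

print(add(0xFFFF, 1))              # (0, 1): wrapped around, one carried out
result, ex = mul(0x1234, 0x100)
print(hex(result), hex(ex))        # 0x3400 0x12
print(hex((ex << 16) | result))    # 0x123400, the full 32-bit product
```

Gluing the overflow word onto the front of the result gives back the full product, which is what makes multi-word arithmetic practical.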
We can use registers to store information temporarily, but when we want something to be more durable we put it in memory so that we can refer to it later. We refer to it by remembering the address in memory that it is stored in.
The DCPU has 2^16 words of memory. Conveniently, a word is also 16 bits. This means that there are enough values in 1 word to refer to each word of memory distinctly. This value is called its address, and basically is the index of that specific word. The first word's address in memory is 0, the next one 1, etc, all the way up to 0xffff.
We can store data in arbitrary addresses by using the square brackets [ and ].
set a, 4
sub a, 12
set [0x6656], a
;... later
add [0x6656], 36
The number inside the brackets is interpreted as an address, and the value of memory at that address is used for the operation.
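One way to picture this is to treat memory as a big list with 2^16 slots, where the address is just the index. The Python sketch below walks through the same steps as the example above. It is only a model, and it assumes that arithmetic wraps around at 16 bits, so subtracting 12 from 4 leaves 0xfff8 rather than a negative number.

```python
MASK = 0xFFFF
memory = [0] * 0x10000      # 2^16 words, addresses 0 through 0xffff

a = 4                       # set a, 4
a = (a - 12) & MASK         # sub a, 12  (wraps around below zero)
memory[0x6656] = a          # set [0x6656], a
# ... later
memory[0x6656] = (memory[0x6656] + 36) & MASK   # add [0x6656], 36

print(hex(a))               # 0xfff8
print(memory[0x6656])       # 28
```

The final value is 28, the same as 4 - 12 + 36, because the two wrap-arounds cancel out.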
Above, the address 0x6656 is chosen arbitrarily, but you can refer to specific parts of your program by using labels.
While you are writing instructions, you can define a label to refer to a specific place in the code. A label is translated by the compiler into the address of the next instruction after the label definition.
set a, 1
:label
add a, 1
set pc, label
Labels are defined by prefixing the label with a colon. Some compilers support putting the colon after the label, which is how real assembly works, but not all DCPU compilers will recognize those labels at the moment.
In the example above, label would translate to the number 1. The first instruction, set a, 1, is encoded into the 0th word of memory, and add a, 1 is encoded into the 1st word. When set pc, label is compiled, it will be translated to set pc, 1.
By accessing pc directly, you change the program flow. This example will run in a continuous loop because the instruction pointer keeps getting set to the same value.
Label definitions themselves are not translated into machine code, so the above example would compile into only three words.
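The bookkeeping behind labels is simple enough to sketch in a few lines of Python. This is not how any particular DCPU compiler is written. It is a toy that assumes every instruction takes exactly one word, which is true for the example above but not in general, and resolve_labels is a name invented here.

```python
def resolve_labels(lines):
    """Map each label to the address of the instruction that follows it.

    Assumes every instruction assembles to exactly one word.
    """
    labels = {}
    address = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(":"):
            labels[line[1:]] = address   # a label takes up no space itself
        else:
            address += 1                 # one word per instruction (assumed)
    return labels, address

source = ["set a, 1", ":label", "add a, 1", "set pc, label"]
labels, words = resolve_labels(source)
print(labels, words)   # {'label': 1} 3
```

The label ends up as the number 1 and the program is three words long, because the label line itself never advances the address.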
One of the special registers that we saw earlier was the stack pointer. The stack pointer stores an address in memory.
Basically, the stack is a group of values that starts at the end of memory and "grows" backwards. When you add an item onto the stack the stack pointer is decreased by 1 and the value is stored in [sp].
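Reading and writing to the stack is faster than reading and writing to an arbitrary address such as [0x6656]: according to the official spec, stack accesses cost no extra cycles, while an address written out as a number costs one extra cycle to fetch.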
set push, 8
set push, b
set a, pop
set b, pop
The keywords push and pop act like registers in that you can set them and read values from them, but they aren't. They automatically adjust sp and read/write [sp].
You put values on the stack by setting push, and you get them back using pop. In this example, a is set to whatever b was, and b is set to 8. Values are retrieved in the opposite order that they were put in.
There are two other keywords related to the stack, peek and pick.
You can access the value at the stack pointer without modifying the stack pointer using peek. It is exactly like [sp]. You can access values near the stack pointer by using pick and a number: pick 1 is the value one word above the stack pointer, like [sp + 1].
set push, 8
set push, 2
set a, peek ; set a to 2
set a, pick 1 ; set a to 8
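The four stack keywords can be modelled in a few lines of Python. This is a sketch under a couple of assumptions: the stack pointer starts at 0, so the first push wraps around to the last address, 0xffff, and the class and method names are my own invention rather than anything from the spec.

```python
MASK = 0xFFFF

class Stack:
    """Toy model of the DCPU stack living at the end of memory."""

    def __init__(self):
        self.memory = [0] * 0x10000
        self.sp = 0                        # assumed starting value

    def push(self, value):
        self.sp = (self.sp - 1) & MASK     # move sp down first...
        self.memory[self.sp] = value       # ...then store at [sp]

    def pop(self):
        value = self.memory[self.sp]       # read [sp] first...
        self.sp = (self.sp + 1) & MASK     # ...then move sp back up
        return value

    def peek(self):
        return self.memory[self.sp]        # like [sp]; sp is unchanged

    def pick(self, n):
        return self.memory[(self.sp + n) & MASK]   # like [sp + n]

stack = Stack()
stack.push(8)                        # set push, 8
stack.push(2)                        # set push, 2
print(hex(stack.sp))                 # 0xfffe
print(stack.peek(), stack.pick(1))   # 2 8
print(stack.pop(), stack.pop())      # 2 8
```

Notice that peek and pick leave sp alone, while each pop moves it back toward where it started.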
When writing any sort of program, it is good practice to leave comments alongside the code in order to describe your intent. This will make your code easier to understand later, if it is not immediately obvious what you are trying to do.
Comments are added to DCPU-16 assembly with the semi-colon ; character.
set a, 16
sub a, 2
; shl means shift-left,
; and it can be used to
; multiply integers by
; a power of 2.
shl a, 5
; in this case, 2^5 is 32
; so that was like:
; mul a, 32
You don't need to add comments when your code is obvious, but as you can see the intent of the above example might not have been clear without the comment.