How to document the code

Vextrove edited this page May 14, 2020 · 19 revisions

Disassembling the code is easy, and computers are very good at it.

What is difficult is to label and comment the original code, finding the meaning behind the function calls and memory manipulations.

Of course, we can just read the disassembled sources in a text editor, understand what the assembly does, and label it accordingly. However, this is a slow and difficult process. Fortunately, there are a few techniques to help us.

See also: Tooling for reverse-engineering

Labelling functions

In the raw disassembly, functions are only referenced by their address. It looks like:

func_003_4563::
    ; This is the function originally located in bank 003, at address 4563

There are a few useful things to do with a raw function address like this:

  • Cross-reference it (if it is called from bank 0)
  • Label the function according to its purpose
  • Document the inner workings of the function

Cross-reference a function

Functions called from bank 0 are usually called by memory address. Once a new bank is disassembled, by reading the code, we can actually infer that an address stands for a function in a specific bank.

For instance, considering the following code:

    ; Switch to bank $03
    ld   a, $03
    ld   [MBC3SelectBank], a
    ; Call function at address $4563
    call $4563

This obviously calls the address $4563 in bank $03. So we can cross-reference the function, and rewrite it as:

    ld   a, BANK(func_003_4563)
    ld   [MBC3SelectBank], a
    call func_003_4563

This is good: now, even if the address of func_003_4563 changes (or the function is moved to another bank entirely), the code will still be assembled correctly.

We can do even better: as calling a function in another bank (a "farcall") is done quite often, macros were written to shorten this sequence. So it can be rewritten as:

    callsb func_003_4563

callsb is a macro that will expand to exactly the code above (see macros.asm for more details).
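
When many call sites follow this exact three-instruction shape, the rewrite can be partly automated. The following Python script is only a sketch, not part of the project's tooling: the function name `rewrite_farcalls` is made up for illustration, it assumes the three instructions appear on consecutive lines with nothing in between, and it assumes labels follow the `func_BBB_AAAA` naming shown above (bank as three hexadecimal digits). Any result should be reviewed by hand.

```python
import re

# Matches the bank-switch-then-call sequence, assuming the three
# instructions sit on consecutive lines with no comments in between.
FARCALL = re.compile(
    r"^(?P<indent>[ \t]*)ld[ \t]+a,[ \t]*\$(?P<bank>[0-9A-Fa-f]{2})[ \t]*\n"
    r"[ \t]*ld[ \t]+\[MBC3SelectBank\],[ \t]*a[ \t]*\n"
    r"[ \t]*call[ \t]+\$(?P<addr>[0-9A-Fa-f]{4})[ \t]*$",
    re.MULTILINE,
)


def rewrite_farcalls(source: str) -> str:
    """Replace raw far-call sequences with `callsb func_BBB_AAAA`."""

    def replace(match):
        bank = int(match["bank"], 16)
        addr = int(match["addr"], 16)
        # Only $4000-$7FFF belongs to the switchable ROM bank;
        # leave anything else untouched.
        if not 0x4000 <= addr <= 0x7FFF:
            return match.group(0)
        return f"{match['indent']}callsb func_{bank:03X}_{addr:04X}"

    return FARCALL.sub(replace, source)


if __name__ == "__main__":
    raw = (
        "    ld   a, $03\n"
        "    ld   [MBC3SelectBank], a\n"
        "    call $4563\n"
    )
    print(rewrite_farcalls(raw))
```

The script only renames the call site; the target label must of course exist in the disassembled bank for the code to assemble.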

Label a function according to its purpose

Gaining a generic understanding of what a function does as a whole (even without understanding the details) is quite useful: we can give it a name, and find it called in other places.

Some methods for understanding the general purpose of a function:

  • Change the first instruction of the function to be a ret (return), to see what happens when the function is not executed;
  • Use a tool to display the function as C-like code, and read it (like awake; see below);
  • As a last resort, read the assembly code.
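
The first method (replacing the first instruction with a ret) is a small source edit that is easy to forget to undo. As an illustration only, here is a Python sketch that performs the edit on a copy of the source text and keeps the original instruction in a comment. The helper name `stub_function` is hypothetical, and the sketch assumes that the first non-comment line after the label is an instruction rather than another label.

```python
def stub_function(source: str, label: str) -> str:
    """Replace the first instruction after `label::` with a temporary `ret`."""
    lines = source.split("\n")
    try:
        start = next(
            index for index, line in enumerate(lines)
            if line.split(";")[0].strip() == f"{label}::"
        )
    except StopIteration:
        raise ValueError(f"label {label!r} not found") from None

    for index in range(start + 1, len(lines)):
        code = lines[index].split(";")[0].strip()
        if not code:
            continue  # skip blank and comment-only lines
        line = lines[index]
        indent = line[: len(line) - len(line.lstrip())]
        # Keep the original instruction in a comment so it can be restored.
        lines[index] = f"{indent}ret ; TEMP, was: {code}"
        return "\n".join(lines)

    raise ValueError(f"no instruction found after {label!r}")


if __name__ == "__main__":
    example = "func_003_4563::\n    ld   a, $01\n    ret\n"
    print(stub_function(example, "func_003_4563"))
```

Because ret is a single byte, replacing a longer instruction shifts the code that follows it. This matters if other code still refers to that code by raw address.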

Labelling data blocks

A binary ROM has no fixed structure. At first glance, it just looks like a bag of bytes, without clear separation between code, text, graphics, data… There is no metadata to tell the disassembler which parts are executable code, and which parts are data.

So we have to figure out the data parts ourselves.

By default, the disassembler considers everything as code. But sometimes we see a pattern like this:

    nop                  ; 5EA6: $00
    nop                  ; 5EA7: $00
    rst  $38             ; 5EA8: $FF
    add  a, b            ; 5EA9: $80
    add  a, b            ; 5EAA: $80
	
func_001_5EAB::          
    ld   hl, $5EA6       ; 5EAB:
    add  hl, de          ; 5EAE:
    ld   a, [hl]         ; 5EAF:

Looking at this automatically-disassembled code, there are several hints that some of this is actually data:

    ; This sequence of code doesn't really make sense:
    ; code rarely contains `nop` instructions,
    ; and adding the same value twice in a row is dubious.
    nop                  ; 5EA6: $00
    nop                  ; 5EA7: $00
    rst  $38             ; 5EA8: $FF
    add  a, b            ; 5EA9: $80
    add  a, b            ; 5EAA: $80
	
func_001_5EAB::
    ; Loading a 2-byte number into `hl` is a common pattern for accessing a memory address as data
    ld   hl, $5EA6       ; 5EAB:
    ; Also, adding a variable to `hl` is a common pattern for accessing an array-like data structure
    add  hl, de          ; 5EAE:
    ld   a, [hl]         ; 5EAF:

All these hints point to $5EA6 being not code, but an array of data.

So we can rewrite it as:

data_001_5EA6::
    db $00, $00, $FF, $80, $80

func_001_5EAB::          
    ld   hl, data_001_5EA6 ; 5EAB:
    add  hl, de            ; 5EAE:
    ld   a, [hl]           ; 5EAF:

In this version:

  • We labelled the data address with a reference (data_001_5EA6) rather than a raw numeric address;
  • We converted the bogus instructions into a data directive (db).

We still don't know exactly how this data is used, but at least we know it is data.
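
Typing the bytes of a longer table by hand is error-prone, so it can help to generate the `db` lines from the ROM itself. The Python sketch below is an illustration under stated assumptions: the helper names `rom_offset` and `format_data_block` are made up, the label follows the `data_BBB_AAAA` naming used above, and the offset computation assumes the usual layout where each switchable bank is $4000 bytes long and mapped at $4000-$7FFF.

```python
def rom_offset(bank: int, address: int) -> int:
    """File offset of a banked address (assumes 16 KiB ROM banks)."""
    if address < 0x4000:
        return address  # fixed bank 0 region
    return bank * 0x4000 + (address - 0x4000)


def format_data_block(bank: int, address: int, data: bytes, per_line: int = 8) -> str:
    """Render raw bytes as a labelled block of `db` directives."""
    lines = [f"data_{bank:03X}_{address:04X}::"]
    for offset in range(0, len(data), per_line):
        chunk = data[offset:offset + per_line]
        lines.append("    db " + ", ".join(f"${byte:02X}" for byte in chunk))
    return "\n".join(lines)


if __name__ == "__main__":
    # The five bytes from the example above; with a real ROM they would be
    # read from the file, starting at rom_offset(0x01, 0x5EA6).
    table = bytes([0x00, 0x00, 0xFF, 0x80, 0x80])
    print(format_data_block(0x01, 0x5EA6, table))
```

How many bytes belong to the table still has to be decided by reading the surrounding code; the script only formats what it is given.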

Labelling memory locations

A Game Boy game has no local variables. All of the game state is stored in globally accessible RAM areas.

Documenting the meaning behind a RAM address is very useful. It is difficult to understand the purpose of a function that operates on a raw RAM address:

func_020_52C3::
    ld   a, [$C14B]
    inc  a
    ld   [$C14B], a
    cp   $1F
    jp   z, label_020_52CF
    ld   a, $01
    ld   [$C409], a
label_020_52CF:
    ret

However, if we can understand how the RAM address is used, it will be much easier to understand what the function is doing.

To understand the purpose of a variable, we can either:

  • search for other places in the code where this variable is used, and how;
  • or use BGB’s debugger to watch the variable changes while the game is running;
  • or even use BGB to change the value of the variable while the game is running, and see what it does.
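
For the first option, a plain text search works, but it can be handy to see at a glance which places read the address and which ones write to it. Here is a minimal Python sketch; the function name `find_accesses` is hypothetical, and it only catches direct accesses written as `[$XXXX]` (accesses made through `hl` or another register pair are missed).

```python
def find_accesses(source: str, address: int):
    """List (line number, kind, code) for direct accesses to a RAM address."""
    operand = f"[${address:04X}]"
    results = []
    for number, line in enumerate(source.splitlines(), start=1):
        code = line.split(";")[0].strip()  # ignore trailing comments
        if operand not in code.upper():
            continue
        # In `ld DEST, SRC`, the bracketed address as DEST means a write.
        destination = code.split(",")[0].upper()
        kind = "write" if operand in destination else "read"
        results.append((number, kind, code))
    return results


if __name__ == "__main__":
    example = (
        "func_020_52C3::\n"
        "    ld   a, [$C14B]\n"
        "    inc  a\n"
        "    ld   [$C14B], a\n"
    )
    for number, kind, code in find_accesses(example, 0xC14B):
        print(f"{number:4d}  {kind:5s}  {code}")
```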

If we’re lucky, we can eventually understand that $C14B is modified when the Pegasus Boots are charging.

We can then document this finding in wram.asm:

;wram.asm

wPegasusBootsChargeMeter:: ; C14B
  ds 1

We can now perform a search-and-replace across the whole code base, to give meaning to the memory location:

func_020_52C3::
    ld   a, [wPegasusBootsChargeMeter]
    inc  a
    ld   [wPegasusBootsChargeMeter], a
    cp   $1F
    jp   z, label_020_52CF
    ld   a, $01
    ld   [$C409], a
label_020_52CF:
    ret

The purpose of the function now becomes quite clear: it increments the Pegasus Boots charge meter.

This even hints at the use of $C409: it very probably stores whether the Pegasus Boots are fully charged.

We can now label the extra variable, rename the function according to its purpose, and document its code.

ChargePegasusBoots::
    ; Increment wPegasusBootsChargeMeter
    ld   a, [wPegasusBootsChargeMeter]
    inc  a
    ld   [wPegasusBootsChargeMeter], a
    ; If wPegasusBootsChargeMeter == $1F, return
    cp   $1F
    jp   z, .return
    ; Otherwise, flag the boots as charged
    ld   a, $01
    ld   [wPegasusBootsReady], a
.return
    ret
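
The search-and-replace step used above to introduce wPegasusBootsChargeMeter can be done in any text editor. For reference, here is a rough Python equivalent. It is a sketch under assumptions, not the project's tooling: the function name `label_address` is made up, and it blindly replaces every `$XXXX` literal with that value outside of comments, so the resulting diff should be reviewed in case the same number is used for something other than this RAM address.

```python
import re


def label_address(source: str, address: int, label: str) -> str:
    """Replace a raw `$XXXX` literal with a label, leaving comments untouched."""
    pattern = re.compile(rf"\${address:04X}\b", re.IGNORECASE)
    rewritten = []
    for line in source.split("\n"):
        # Split off the trailing comment so that address notes survive.
        code, separator, comment = line.partition(";")
        rewritten.append(pattern.sub(lambda match: label, code) + separator + comment)
    return "\n".join(rewritten)


if __name__ == "__main__":
    example = (
        "    ld   a, [$C14B]\n"
        "    inc  a\n"
        "    ld   [$C14B], a\n"
    )
    print(label_address(example, 0xC14B, "wPegasusBootsChargeMeter"))
```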

Labelling constant values

#TODO