(version: 0.6.0, rev. 38)
A binary scripted patch, hereafter a BSP, is a file containing a script indicating how to convert some data, called the source, into some other data, called the target. In order to achieve this goal, the patcher must execute the script contained in the BSP. This document formally specifies how that execution must be done, and how the BSP file must be interpreted.
The words "byte", "halfword" and "word" throughout this document respectively refer to 8, 16 and 32-bit quantities. All halfword or word quantities are stored, read and written in little-endian form (that is, least significant byte first) unless otherwise indicated.
For clarity of exposition, in some cases, the patch source syntax (as interpreted by this project's compiler) is used. This is done purely for simplicity; the patch source syntax does not constitute part of this specification, and any other implementation of a BSP compiler may use any other syntax or even a completely different language compiled to the BSP format. The BSP format is fully specified by its binary form as documented here.
A BSP engine must be capable of executing a BSP file according to this model. The model describes how the engine behaves and what a BSP can expect of the engine running it.
The execution model consists of the following elements:
- A patch space, which contains the contents of the BSP file; this space is initialized when loading the BSP file. The patch space is immutable (that is, read-only and of fixed size), and it has the same size as the BSP file itself.
- A file buffer, which is the memory space where the operations of the BSP file are carried out. This file buffer is of variable size, and it is initialized with the data of the source file; it has the same size as the source file upon initialization.
- 256 word variables, all of which are initialized to zero. These variables are numbered 0 to 255 and can be referenced by the instructions in the BSP.
- An instruction pointer, which is a word initialized to zero.
- A current file pointer, which is a word initialized to zero and in unlocked state. The current file pointer can be in one of two states: unlocked, in which case it behaves normally as documented here, and locked, which makes it a read-only value; writes or updates to the current file pointer must be silently discarded (without causing any warnings or other kinds of error messages) while in locked state.
- A stack, unbounded in size, that contains word values. It is initialized empty.
- A message buffer, unbounded in size (although engines may limit the size to a reasonable length and discard any data added to it beyond that limit), that is initialized to an empty string.
The patch space and the file buffer have a maximum size of 4,294,967,295 bytes. Attempting to write to the file buffer past its end increases its length (zero-filling any gaps that occur this way); attempting to read from either the file buffer or the patch space beyond their respective ends is an error. Note that executing code involves reading from the patch space, and thus this restriction still applies.
A location in the patch space is referred to as an address, while a location in the file buffer is referred to as a position (therefore, the value of the instruction pointer is an address, and the value of the current file pointer is a position). Both of them are numbered sequentially from zero in the usual way.
All arithmetic is carried out in 32-bit words unless otherwise noted. All values are treated as unsigned other than
when indicated; negative values are simply a notational convenience (e.g., -1 is actually 0xffffffff
, that is, the
same as 4,294,967,295).
All errors should be treated as fatal. If the engine encounters an error condition (such as an invalid opcode, reading past the end, or a division by zero), it should not attempt to resume execution; instead, it should inform the user of the error and abort immediately.
The execution of the patch results in a word value called an exit status, similar in spirit to the exit status of a program in several operating systems. An exit status of zero indicates success; other values must be treated as failure and thus handled as errors. When a BSP exits with a status of zero, the contents of the file buffer must be written out to the target file as the output of the patching process.
Execution of the BSP proceeds one instruction at a time, in the usual way for this format: the byte pointed by the instruction pointer is read, which determines the opcode; the operands (if any) for that opcode are read as well; and the instruction pointer is incremented by the total amount of bytes read; the instruction is then executed. Execution continues this way until the BSP exits (via the corresponding instruction). Note that some instructions will modify the instruction pointer upon execution.
The opcode of each instruction is defined by the first byte. This also defines its length, and the kind of operands it
will take. An instruction can be asigned more than one opcode depending on its arguments. For instance, the set
instruction has two opcodes: one when called with two variables as its operands, and another one when called with a
variable and an immediate (a constant value encoded in the instruction itself).
Variables are always encoded as a single byte indicating the variable number. Immediates (that is, constant numerical operands that are encoded directly in the instruction) are typically words, and they are encoded in little-endian format; the actual width of immediate operands is indicated in the opcode table. Operands are encoded in the order that they appear.
Note that the size of an instruction (in bytes) can be calculated by adding the size of its operands and adding one byte for the opcode itself. It is shown in the table for simplicity.
Opcode bytes that don't appear in the following table constitute undefined instructions, and attempting to execute one of them must be considered a fatal error.
Opcode byte | Instruction | Size | Operands |
---|---|---|---|
0x00 |
nop | 1 | none |
0x01 |
return | 1 | none |
0x02 |
jump | 5 | word |
0x03 |
jump | 2 | variable |
0x04 |
call | 5 | word |
0x05 |
call | 2 | variable |
0x06 |
exit | 5 | word |
0x07 |
exit | 2 | variable |
0x08 |
push | 5 | word |
0x09 |
push | 2 | variable |
0x0a |
pop | 2 | variable |
0x0b |
length | 2 | variable |
0x0c |
readbyte | 2 | variable |
0x0d |
readhalfword | 2 | variable |
0x0e |
readword | 2 | variable |
0x0f |
pos | 2 | variable |
0x10 |
getbyte | 6 | variable, word |
0x11 |
getbyte | 3 | variable, variable |
0x12 |
gethalfword | 6 | variable, word |
0x13 |
gethalfword | 3 | variable, variable |
0x14 |
getword | 6 | variable, word |
0x15 |
getword | 3 | variable, variable |
0x16 |
checksha1 | 6 | variable, word |
0x17 |
checksha1 | 3 | variable, variable |
0x18 |
writebyte | 2 | byte |
0x19 |
writebyte | 2 | variable |
0x1a |
writehalfword | 3 | halfword |
0x1b |
writehalfword | 2 | variable |
0x1c |
writeword | 5 | word |
0x1d |
writeword | 2 | variable |
0x1e |
truncate | 5 | word |
0x1f |
truncate | 2 | variable |
0x20 |
add | 10 | variable, word, word |
0x21 |
add | 7 | variable, word, variable |
0x22 |
add | 7 | variable, variable, word |
0x23 |
add | 4 | variable, variable, variable |
0x24 |
subtract | 10 | variable, word, word |
0x25 |
subtract | 7 | variable, word, variable |
0x26 |
subtract | 7 | variable, variable, word |
0x27 |
subtract | 4 | variable, variable, variable |
0x28 |
multiply | 10 | variable, word, word |
0x29 |
multiply | 7 | variable, word, variable |
0x2a |
multiply | 7 | variable, variable, word |
0x2b |
multiply | 4 | variable, variable, variable |
0x2c |
divide | 10 | variable, word, word |
0x2d |
divide | 7 | variable, word, variable |
0x2e |
divide | 7 | variable, variable, word |
0x2f |
divide | 4 | variable, variable, variable |
0x30 |
remainder | 10 | variable, word, word |
0x31 |
remainder | 7 | variable, word, variable |
0x32 |
remainder | 7 | variable, variable, word |
0x33 |
remainder | 4 | variable, variable, variable |
0x34 |
and | 10 | variable, word, word |
0x35 |
and | 7 | variable, word, variable |
0x36 |
and | 7 | variable, variable, word |
0x37 |
and | 4 | variable, variable, variable |
0x38 |
or | 10 | variable, word, word |
0x39 |
or | 7 | variable, word, variable |
0x3a |
or | 7 | variable, variable, word |
0x3b |
or | 4 | variable, variable, variable |
0x3c |
xor | 10 | variable, word, word |
0x3d |
xor | 7 | variable, word, variable |
0x3e |
xor | 7 | variable, variable, word |
0x3f |
xor | 4 | variable, variable, variable |
0x40 |
iflt | 10 | variable, word, word |
0x41 |
iflt | 7 | variable, word, variable |
0x42 |
iflt | 7 | variable, variable, word |
0x43 |
iflt | 4 | variable, variable, variable |
0x44 |
ifle | 10 | variable, word, word |
0x45 |
ifle | 7 | variable, word, variable |
0x46 |
ifle | 7 | variable, variable, word |
0x47 |
ifle | 4 | variable, variable, variable |
0x48 |
ifgt | 10 | variable, word, word |
0x49 |
ifgt | 7 | variable, word, variable |
0x4a |
ifgt | 7 | variable, variable, word |
0x4b |
ifgt | 4 | variable, variable, variable |
0x4c |
ifge | 10 | variable, word, word |
0x4d |
ifge | 7 | variable, word, variable |
0x4e |
ifge | 7 | variable, variable, word |
0x4f |
ifge | 4 | variable, variable, variable |
0x50 |
ifeq | 10 | variable, word, word |
0x51 |
ifeq | 7 | variable, word, variable |
0x52 |
ifeq | 7 | variable, variable, word |
0x53 |
ifeq | 4 | variable, variable, variable |
0x54 |
ifne | 10 | variable, word, word |
0x55 |
ifne | 7 | variable, word, variable |
0x56 |
ifne | 7 | variable, variable, word |
0x57 |
ifne | 4 | variable, variable, variable |
0x58 |
jumpz | 6 | variable, word |
0x59 |
jumpz | 3 | variable, variable |
0x5a |
jumpnz | 6 | variable, word |
0x5b |
jumpnz | 3 | variable, variable |
0x5c |
callz | 6 | variable, word |
0x5d |
callz | 3 | variable, variable |
0x5e |
callnz | 6 | variable, word |
0x5f |
callnz | 3 | variable, variable |
0x60 |
seek | 5 | word |
0x61 |
seek | 2 | variable |
0x62 |
seekfwd | 5 | word |
0x63 |
seekfwd | 2 | variable |
0x64 |
seekback | 5 | word |
0x65 |
seekback | 2 | variable |
0x66 |
seekend | 5 | word |
0x67 |
seekend | 2 | variable |
0x68 |
5 | word | |
0x69 |
2 | variable | |
0x6a |
menu | 6 | variable, word |
0x6b |
menu | 3 | variable, variable |
0x6c |
xordata | 9 | word, word |
0x6d |
xordata | 6 | word, variable |
0x6e |
xordata | 6 | variable, word |
0x6f |
xordata | 3 | variable, variable |
0x70 |
fillbyte | 6 | word, byte |
0x71 |
fillbyte | 6 | word, variable |
0x72 |
fillbyte | 3 | variable, byte |
0x73 |
fillbyte | 3 | variable, variable |
0x74 |
fillhalfword | 7 | word, halfword |
0x75 |
fillhalfword | 6 | word, variable |
0x76 |
fillhalfword | 4 | variable, halfword |
0x77 |
fillhalfword | 3 | variable, variable |
0x78 |
fillword | 9 | word, word |
0x79 |
fillword | 6 | word, variable |
0x7a |
fillword | 6 | variable, word |
0x7b |
fillword | 3 | variable, variable |
0x7c |
writedata | 9 | word, word |
0x7d |
writedata | 6 | word, variable |
0x7e |
writedata | 6 | variable, word |
0x7f |
writedata | 3 | variable, variable |
0x80 |
lockpos | 1 | none |
0x81 |
unlockpos | 1 | none |
0x82 |
truncatepos | 1 | none |
0x83 |
jumptable | 2 | variable |
0x84 |
set | 6 | variable, word |
0x85 |
set | 3 | variable, variable |
0x86 |
ipspatch | 6 | variable, word |
0x87 |
ipspatch | 3 | variable, variable |
0x88 |
stackwrite | 9 | word, word |
0x89 |
stackwrite | 6 | word, variable |
0x8a |
stackwrite | 6 | variable, word |
0x8b |
stackwrite | 3 | variable, variable |
0x8c |
stackread | 6 | variable, word |
0x8d |
stackread | 3 | variable, variable |
0x8e |
stackshift | 5 | word |
0x8f |
stackshift | 2 | variable |
0x90 |
retz | 2 | variable |
0x91 |
retnz | 2 | variable |
0x92 |
pushpos | 1 | none |
0x93 |
poppos | 1 | none |
0x94 |
bsppatch | 10 | variable, word, word |
0x95 |
bsppatch | 7 | variable, word, variable |
0x96 |
bsppatch | 7 | variable, variable, word |
0x97 |
bsppatch | 4 | variable, variable, variable |
0x98 |
getbyteinc | 3 | variable, variable |
0x99 |
gethalfwordinc | 3 | variable, variable |
0x9a |
getwordinc | 3 | variable, variable |
0x9b |
increment | 2 | variable |
0x9c |
getbytedec | 3 | variable, variable |
0x9d |
gethalfworddec | 3 | variable, variable |
0x9e |
getworddec | 3 | variable, variable |
0x9f |
decrement | 2 | variable |
0xa0 |
bufstring | 5 | word |
0xa1 |
bufstring | 2 | variable |
0xa2 |
bufchar | 5 | word |
0xa3 |
bufchar | 2 | variable |
0xa4 |
bufnumber | 5 | word |
0xa5 |
bufnumber | 2 | variable |
0xa6 |
printbuf | 1 | none |
0xa7 |
clearbuf | 1 | none |
0xa8 |
setstacksize | 5 | word |
0xa9 |
setstacksize | 2 | variable |
0xaa |
getstacksize | 2 | variable |
0xab |
(bit shifts) | -- | (see below) |
0xac |
getfilebyte | 2 | variable |
0xad |
getfilehalfword | 2 | variable |
0xae |
getfileword | 2 | variable |
0xaf |
getvariable | 3 | variable, variable |
0xb0 |
addcarry | 11 | variable, variable, word, word |
0xb1 |
addcarry | 8 | variable, variable, word, variable |
0xb2 |
addcarry | 8 | variable, variable, variable, word |
0xb3 |
addcarry | 5 | variable, variable, variable, variable |
0xb4 |
subborrow | 11 | variable, variable, word, word |
0xb5 |
subborrow | 8 | variable, variable, word, variable |
0xb6 |
subborrow | 8 | variable, variable, variable, word |
0xb7 |
subborrow | 5 | variable, variable, variable, variable |
0xb8 |
longmul | 11 | variable, variable, word, word |
0xb9 |
longmul | 8 | variable, variable, word, variable |
0xba |
longmul | 8 | variable, variable, variable, word |
0xbb |
longmul | 5 | variable, variable, variable, variable |
0xbc |
longmulacum | 11 | variable, variable, word, word |
0xbd |
longmulacum | 8 | variable, variable, word, variable |
0xbe |
longmulacum | 8 | variable, variable, variable, word |
0xbf |
longmulacum | 5 | variable, variable, variable, variable |
Bit shifting opcodes are defined by the byte that follows the opcode byte. The first byte is always 0xab
for these
instructions; the next byte is divided into three bit fields, as follows (where bit 0 is the least significant):
- Bit 7: variable/immediate flag
- Bits 6-5: shift type
- Bits 4-0: shift count
The opcode is determined by the shift type, as follows:
Shift type | Opcode |
---|---|
0 | shiftleft |
1 | shiftright |
2 | rotateleft |
3 | shiftrightarith |
If the shift count is 0, the shift count is read from the lower 5 bits of a variable; a byte will be added after the rest of the operands indicating which variable. Finally, the type of the shift operand is determined by the variable/immediate flag: if the flag is 0, the operand is an immediate; if it is 1, the operand is a variable.
This is summarized in the following table (note that the operand list doesn't include the byte that comes after the opcode byte, with the bitfields as specified above):
V/I flag | Shift count | Size | Operands |
---|---|---|---|
0 | 0 | 8 | variable, word, variable |
0 | not 0 | 7 | variable, word |
1 | 0 | 5 | variable, variable, variable |
1 | not 0 | 4 | variable, variable |
The different instructions are listed here in alphabetical order, for lookup convenience. They are described in separate sections.
- add
- addcarry
- and
- bsppatch
- bufchar
- bufnumber
- bufstring
- call
- callnz
- callz
- checksha1
- clearbuf
- decrement
- divide
- exit
- fillbyte
- fillhalfword
- fillword
- getbyte
- getbytedec
- getbyteinc
- getfilebyte
- getfilehalfword
- getfileword
- gethalfword
- gethalfworddec
- gethalfwordinc
- getstacksize
- getvariable
- getword
- getworddec
- getwordinc
- ifeq
- ifge
- ifgt
- ifle
- iflt
- ifne
- increment
- ipspatch
- jump
- jumpnz
- jumptable
- jumpz
- length
- lockpos
- longmul
- longmulacum
- mulacum (note that this is an alias)
- menu
- multiply
- nop
- or
- pop
- poppos
- pos
- printbuf
- push
- pushpos
- readbyte
- readhalfword
- readword
- remainder
- retnz
- return
- retz
- rotateleft
- seek
- seekback
- seekend
- seekfwd
- set
- setstacksize
- shiftleft
- shiftright
- shiftrightarith
- stackread
- stackshift
- stackwrite
- subborrow
- subtract
- truncate
- truncatepos
- unlockpos
- writebyte
- writedata
- writehalfword
- writeword
- xor
- xordata
Instructions are detailed here, including their operands and semantics.
For simplicity, the script compiler's syntax is used to show the form of the instruction. Operands that must be
variables are prefixed with #
; other operands can be either variables or immediates. (In no case an instruction
only accepts an immediate as an operand.)
nop
This instruction does nothing at all. It can be used, for instance, as a filler.
set #variable, any
increment #variable
decrement #variable
getvariable #variable, #number
The set
instruction sets the variable's value to the specified value, which can be an immediate value or another
variable.
The increment
and decrement
instructions respectively add and subtract one from the specified variable. While they
are equivalent to using add #variable, #variable, 1
or subtract #variable, #variable, 1
, they are available as
shorter forms (that also use fewer bytes in the BSP file).
The getvariable
instruction sets a variable to the value of a variable whose number matches the value of the second
argument. That is, if variable 1 has the value 5, then getvariable #2, #1
would set variable 2 to the value of
variable 5. (Note that the second argument to getvariable
can't be an immediate value, because that would be
equivalent to a set
instruction: getvariable #2, 5
would be the same as set #2, #5
. Also note that only the least
significant byte of the second argument's value is used; the upper bytes are ignored.)
add #variable, any, any
subtract #variable, any, any
multiply #variable, any, any
divide #variable, any, any
remainder #variable, any, any
and #variable, any, any
or #variable, any, any
xor #variable, any, any
These instructions perform the specified calculation between the two last operands and store the result in the variable indicated by the first. The operands to the calculation are treated as unsigned values in all cases.
If the last operand to divide
or remainder
is zero, a fatal error occurs.
Note that the script compiler accepts a two-operand shorthand for these instructions, that is simply expanded to the
full three-operand form by repeating the first operand. (That is, add #var, 3
is converted to add #var, #var, 3
prior to compilation.) This shorthand is a feature of the compiler, not part of the specification for the instructions;
the instructions (in the binary file) can only exist in three-operand form.
shiftleft #variable, any, count
shiftright #variable, any, count
shiftrightarith #variable, any, count
rotateleft #variable, any, count
These instructions shift the value in the operand by the amount of bits specified in the count; bit counts of 1 to 31
bits are allowed. (An immediate bit count of 0 is not valid; in the script compiler language, specifying a bit count of
0 will cause the instruction to be encoded as a set
instruction.) If the bit count is a variable, then only the five
least significant bits of that variable will be used as bit count; the rest are silently truncated. (A bit count of 0
as a result of using a variable bit count is acceptable.)
The shiftleft
and shiftright
instructions shift the value in the corresponding direction, filling the shifted bits
with zeros. The shiftrightarith
instruction behaves like shiftright
, but extends the sign bit (the most significant
bit) instead; if this bit is 1, the shifted bits will be filled with ones. The rotateleft
instruction shifts the
value to the left and inserts the bits that are dropped from the value on the right: a left rotation of 4 bits on the
value 0x12345678
will generate the value 0x23456781
.
Note that no rotateright
instruction exists; the same effect can be achieved by negating the rotation count and using
a rotateleft
instruction.
As with the arithmetical and logical instructions, the script compiler accepts a two-operand shorthand for these
instructions, that is expanded to the full three-operand form by repeating the first operand: shiftleft #var, 2
will
be expanded to shiftleft #var, #var, 2
prior to compilation. This is a feature of the compiler, not part of the
specification for the instructions; the instructions (in the binary file format) only exist in three-operand form.
Note on encoding: the last operand, the bit count, is already encoded in the second byte of the instruction (which selects the opcode and the operand types) when it is an immediate, and therefore will be out of order. (When it is a variable, it will be encoded explicitly as the last byte of the instruction, as usual.)
addcarry #result, #carry, any, any
subborrow #result, #borrow, any, any
longmul #low, #high, any, any
longmulacum #low, #high, any, any
These instructions perform calculations that are similar to their regular counterparts, but with special care for results that do not fit in 32 bits.
The addcarry
instruction adds the last two operands and stores the result in the variable specified by the first. If
the addition carries (that is, if the result, as a 32-bit unsigned value, is strictly lower than the operands), the
variable specified by the second operand is incremented by one. (Note that the "strictly lower than its operands"
condition can only be fulfilled for both operands or for neither, so testing against any one of the operands is
sufficient.)
The subborrow
instruction performs a subtraction in a similar way, storing the result in the variable specified by
the first operand. If the subtraction borrows (that is, if the minuend (the instruction's third operand) as a 32-bit
unsigned value is strictly lower than the subtrahend (the instruction's last operand)), the variable specified by the
second operand is decremented by one.
The longmul
instruction performs a 64-bit (unsigned) multiplication between its operands, and stores the lower word
in the variable specified by the first operand and the higher word in the variable specified by the second. For
instance, longmul #1, #2, 0x12345678, 0x87654321
would set variable 1 to 0x70b88d78
and variable 2 to 0x09a0cd05
.
(Note that 0x12345678
times 0x87654321
is 0x09a0cd0570b88d78
.)
The longmulacum
instruction performs a 64-bit multiplication in the same way, but it adds the result to the 64-bit
value contained between the variables in the first two operands. For instance, if variable 1 contains 0xfedcba98
and
variable 2 contains 0x76543210
, then longmulacum #1, #2, 0x01234567, 0x89abcdef
would set variable 1 to
0xc82b00c1
and variable 2 to 0x76f0d5ae
. (Note that 0x76543210fedcba98
+ (0x01234567
* 0x89abcdef
) is
0x76f0d5aec82b00c1
.)
In case that the first two operands point to the same variable, these instructions behave as follows:
- The
addcarry
andsubborrow
instructions will increment/decrement the variable in case of carry/borrow, but not store the actual result of the addition/subtraction anywhere; the result is discarded. - The
longmul
instruction will store the high word of the result in the variable. The low word of the result is discarded. - The
longmulacum
instruction will store the low word of the result in the variable. The high word of the result is discarded. Note that this intentionally differs from the behavior of the other instructions.
Due to the special behavior of the longmulacum
instruction when the first two operands are the same variable, the
compiler has a shorthand for this instruction, called mulacum
; this instruction takes three operands, and it is
expanded to longmulacum
by repeating the first operand (which must be a variable). Note that this is a feature of the
compiler, not part of the specification itself; the instruction in its binary form exists in four-operand form only.
push any
pop #variable
These instructions perform basic stack manipulations. The push
instruction pushes a value into the stack (which can
be either a variable or a word immediate), and the pop
instruction pops a value from the stack and stores it in the
specified variable.
If the pop
instruction is executed with an empty stack, a fatal error occurs.
jump address
jumpz #variable, address
jumpnz #variable, address
call address
callz #variable, address
callnz #variable, address
return
retz #variable
retnz #variable
The jump
instruction updates the instruction pointer, setting it to the value in its operand, which causes control to
jump to the instruction pointed by it. (Note that the address operands in these instructions can be immediate addresses
or variables.)
The call
instruction does the same, but it first pushes the current value of the instruction pointer (which points to
the next instruction) to the stack. The return
instruction pops a value from the stack and sets the instruction
pointer to it, thus returning from a prior call; executing return
with an empty stack is equivalent to exit 0
.
The z
versions of the instructions above execute conditionally based on the value of a variable: they execute if the
variable is zero, or do nothing otherwise. The nz
versions invert this condition.
ifeq #variable, any, address
ifne #variable, any, address
iflt #variable, any, address
ifle #variable, any, address
ifgt #variable, any, address
ifge #variable, any, address
These instructions conditionally jump to the specified address, based on the condition specified by the instruction itself. Respectively, the conditions are that the variable is equal, not equal, less than, less than or equal, greater than, and greater than or equal than the second argument to the instruction.
exit any
This instruction terminates the execution of the script. The operand to this instruction is the exit status of the BSP: zero indicates success, and any other value indicates failure (the engine may use this value in any way it wishes as long as it follows this convention). The engine must write out the contents of the file buffer to the output (the patch target) upon exit with a status of zero.
If the current BSP is being executed as a result of a parent script executing a bsppatch
instruction, the argument to
exit becomes the exit status that the parent bsppatch
instruction will store. In this case, execution resumes with
the parent script; the engine must not terminate execution (regardless of exit status) and must not write to the output
(again, regardless of exit status) upon exit of a script invoked by bsppatch
.
print address
This instruction prints a message to the user. The address parameter indicates where in the patch space the message
begins; the message must be UTF-8 encoded, and continues until a null byte (a byte with value 0x00
) is found.
This document does not specify how the engine will display the message; however, in case of a console-based application (or an environment that behaves in a similar fashion), it is recommended that the engine prints a newline character after the message.
Further considerations regarding message strings are given in the String handling section.
bufstring address
bufchar any
bufnumber any
printbuf
clearbuf
The first three instructions concatenate data at the end of the message buffer. The last two display it and clear it.
The bufstring
instruction concatenates a string (in the same format as for the print
instruction) at the end of
the message buffer. No separator is inserted before or after the string.
The bufchar
instruction appends a single Unicode character to the message buffer. The value passed to the bufchar
instruction as an argument must represent a valid non-surrogate Unicode codepoint (i.e., it must be between 0x000000
and 0x00d7ff
, or 0x00e000
and 0x10ffff
); passing a value outside of those ranges is a fatal error. Values above
0x1fffff
are reserved for further versions of the specification.
The bufnumber
instruction appends the decimal representation of a number to the message buffer. The number is
treated as a 32-bit unsigned value and converted to decimal, and printed using the regular digit characters (0-9,
U+0030
to U+0039
). The shortest representation of the number is used; that is, no leading zeros are printed (other
than a single 0
for the number zero itself).
The printbuf
instruction prints the contents of the message buffer as a message (as if it had been passed to the
print
instruction) and clears the buffer, resetting it to the empty string. The clearbuf
instruction resets the
message buffer to the empty string without printing it.
Further considerations regarding the message buffer are given in the String handling section.
menu #variable, address
This instruction displays a menu with options, from which the user has to select one. The selected option number is written to the indicated variable; the first option is numbered zero.
The address parameter should point to a list of pointers in patch space, each pointer pointing in turn to the text that
each option will contain (in the same format that the print
instruction uses); the list is terminated with a
0xffffffff
value. For instance, this code fragment would display a menu with four options:
menu #result, Options
; ...
Options:
dw .zero
dw .one
dw .two
dw .three
dw -1
.zero
string "Option 0"
.one
string "Option 1"
.two
string "Option 2"
.three
string "Option 3"
If the list of pointers is empty (i.e., if the first pointer is 0xffffffff
), no menu is shown to the user, and the
variable is set to 0xffffffff
.
Further considerations regarding the text used as option labels are given in the String handling section.
Note that a menu with just one option must still be shown to the user, as it is possible to use such a menu to give the user the possibility of aborting the process by stopping the BSP engine.
jumptable #variable
This instruction expects to be followed by a list of pointers, as many as needed according to the values the variable
may have. The instruction will read the word at instruction pointer + 4 * #variable
and jump to the address that is
read. Note that no bounds checking is performed on the value of the variable, so the script must ensure that the
instruction will jump to a correct location.
If the address calculation performed above results in an address that goes beyond the end of patch space, including the case where either the multiplication or the (unsigned) addition result in a value that doesn't fit in 32 bits, a fatal error occurs.
For instance, the following code will jump to .zero
, .one
or .two
depending on whether the value of #var
is 0,
1 or 2 respectively (if #var
is 3 or greater, the result is undefined):
jumptable #var
dw .zero
dw .one
dw .two
.zero
; ...
.one
; ...
.two
; ...
stackread #variable, position
stackwrite position, any
stackshift amount
These instructions operate directly on the values in stack.
The stackread
instruction reads a value from the stack into a variable, and the stackwrite
instruction writes a
value to a position in stack. For both instructions, the position argument is treated as a signed value: if it is
positive, it refers to a position starting from the bottom of the stack (that is, where values would be pushed or
popped next); and if it is negative, it refers to a position starting from the top of the stack (that is, where the
very first value in the stack was pushed).
Positive positions in stack are numbered from the current stack position upwards: the value that would be popped next is position 0, the following one in stack is position 1, and so on. Negative positions in stack are numbered downwards from the stack top: the very first value in the stack is position -1, the one pushed after that one is position -2, and so on. Attempting to access a position in stack that doesn't exist (that is, a position that is greater or equal than the number of elements in the stack, or less than minus that amount) is a fatal error.
The stackshift
instruction performs a mass push/pop, changing the stack size. The argument to this instruction is
also treated as a signed value: if it is positive, as many zeros as the argument indicates are pushed into the stack,
making it larger; if it is negative, as many elements as the absolute value of the argument indicates are popped and
stored nowhere, making the stack smaller (it is a fatal error to attempt to pop more values than the stack currently
holds). An argument of zero makes the instruction do nothing.
getstacksize #variable
setstacksize any
These instructions manipulate the size of the stack directly.
The getstacksize
instruction stores the size of the stack (in number of words) in the specified variable. If the size
of the stack is too large to be stored in a word, then 0xffffffff
is stored.
The setstacksize
instruction changes the size of the stack to be the specified value. If this value is larger than
the current size of the stack, enough zeros are pushed to make the stack as large as needed; if this value is smaller
than the current size of the stack, enough values are popped (and stored nowhere) as to reduce the stack to the
specified size.
getbyte #variable, address
getbyteinc #variable, #address
getbytedec #variable, #address
gethalfword #variable, address
gethalfwordinc #variable, #address
gethalfworddec #variable, #address
getword #variable, address
getwordinc #variable, #address
getworddec #variable, #address
These instructions are used to fetch values from the BSP itself. They read a value from patch space and store it in a variable. The address indicates where the value is located; halfwords and words are read in little-endian form.
The inc
and dec
variants respectively increment and decrement the address variable by the size of the value (1, 2
or 4) after reading from that address. For these instructions, the address must be a variable; it cannot be an
immediate address. (Note that a reference to a label is compiled as an immediate.) If both arguments to one of these
instructions refer to the same variable, the instruction behaves as its regular counterpart, setting the variable to
the value at the address and disregarding the increment.
readbyte #variable
readhalfword #variable
readword #variable
getfilebyte #variable
getfilehalfword #variable
getfileword #variable
The first group of instructions reads a value from the file buffer, located at the position given by the current file pointer. The current file pointer itself is incremented by the size of the value read (1, 2 or 4) after reading the value, unless it is in locked state. Halfwords and words are read in little-endian form.
The instructions in the second group perform the same task as their counterparts in the first group, but they don't update the current file pointer after reading, regardless of whether it is in locked or unlocked state.
It is a fatal error to attempt to read beyond the end of the file buffer.
writebyte any
writehalfword any
writeword any
These instructions write a value to the file buffer, at the position indicated by the current file pointer. The current file pointer itself is incremented by the size of the value written (1, 2 or 4) after writing the value, unless it is in locked state. Halfwords and words are written in little-endian form.
If the instruction attempts to write past the end of the file buffer, the file buffer is extended to accomodate the new value. If this results in a gap of uninitialized data, this gap is filled with zeros.
Note that the writebyte
and writehalfword
instructions take respectively a byte and a halfword argument; the upper
bytes of the value are silently ignored.
fillbyte count, any
fillhalfword count, any
fillword count, any
These instructions repeatedly write the value indicated by their second argument as many times as the first argument
indicates. All considerations about the current file pointer apply as in the write
instructions; namely, this causes
the current file pointer to be incremented accordingly.
Attempting to write past the end of the file buffer behaves as with the write
instructions.
Note that the fillbyte
and fillhalfword
respectively take a byte and a halfword as their second argument; the upper
bytes of the value passed there are silently ignored.
writedata address, length
xordata address, length
The writedata
instruction writes data (from the patch space, starting at the location indicated by the address
argument) into the file buffer, starting at the current file pointer and going for as many bytes as the length argument
indicates. All considerations about the current file pointer apply as in the other write
instructions; namely, this
causes the current file pointer to be incremented accordingly.
The xordata
instruction applies a XOR operation between the data currently in the file buffer and the data pointed to
by the address in the patch space.
Attempting to write past the end of the file buffer behaves as with the write
instructions. Note that xordata
behaves identically to writedata
in this case.
Also note that, since these instructions read from the patch space, they may attempt to read beyond its end; this is a fatal error.
length #variable
This instruction stores the current length of the file buffer in the specified variable.
truncate any
truncatepos
The truncate
instruction changes the length of the file buffer to the specified value. If the file buffer is made
shorter this way, the data at the end of it is lost; if it is made longer, the gap at the end is filled with zeros.
Note that this instruction does not update the current file pointer, even if it would point beyond the end of the file
buffer after execution.
The truncatepos
instruction changes the length of the file buffer to make it end at the current file pointer; that
is, it sets the length of the file buffer to the value of the current file pointer, causing the current file pointer to
point to the end of the file buffer.
pos #variable
lockpos
unlockpos
pushpos
poppos
The pos
instruction stores the value of the current file pointer in the specified variable.
The lockpos
and unlockpos
instructions set the current file pointer's state accordingly.
The pushpos
instruction pushes the value of the current file pointer into the stack, and the poppos
instruction
pops a value from the stack and sets the current file pointer to it. Note that it is a fatal error to execute poppos
with an empty stack. Also note that poppos
does not update the current file pointer if it is in locked state
(although it still pops a value, thus behaving as a dummy pop, or stackshift -1
).
seek any
seekfwd any
seekback any
seekend any
The seek
instruction sets the current file pointer to the specified value. The seekfwd
and seekback
instructions
respectively add and subtract the specified value from the current file pointer. The seekend
instruction subtracts
the specified value from the file buffer length and sets the current file pointer to the result.
It is a fatal error to cause the calculations performed by the last three instructions to overflow.
Note that no bounds checking is performed on the current file pointer: it is allowed to point to a position beyond the end of the file buffer by any amount. Also note that these instructions do nothing if the current file pointer is in locked state.
checksha1 #variable, address
This instruction calculates the SHA-1 hash of the current contents of the file buffer (the current file pointer is not read or updated by this instruction), and compares it to the hash stored in patch space at the specified address. The hash is stored as a 20-byte value, most significant byte first for convenience. The hash is to be calculated as specified by the formal specification in RFC 3174.
The comparison returns a bit mask, which sets a bit for each byte that fails the comparison: bit 0 is set if the first
byte of the real hash differs from the first byte of the compared hash, and so on. For instance, knowing that the SHA-1
hash for an empty input is da39a3ee5e6b4b0d3255bfef95601890afd80709
, the following script:
checksha1 #result, .hash
; ...
.hash
hexdata ff39a3ee5eff4b0d3255bfef95601890afd80709
when executed with an empty file buffer would set #result
to 0x00000021
. (Notice that the first and sixth bytes of
the hash have been replaced with 0xff
bytes in the hash provided to .hash
.)
Also notice that the instruction will set the variable to zero if (and only if) the hash matches in full.
If this instruction is executed multiple times (with any arguments), the engine is not required to recalculate the hash as long as the contents of the file buffer haven't changed since the last time the hash was calculated; since it is known that the hash will come out to the same value, the engine may simply reuse this value for efficiency.
ipspatch #variable, address
This instruction applies an IPS patch, embedded in the BSP itself (that is, read from patch space), to the current file
buffer. The current file pointer is taken as an offset to be added to all positions in the IPS patch itself (if this is
undesirable, execute a seek 0
instruction beforehand), but it is not updated by this instruction.
The IPS patch is located in patch space, at the address specified by the corresponding argument to the instruction.
After applying the patch, the variable passed to this instruction is set to point to the address where the IPS patch
ends (that is, to the end of the patch, right after the EOF
marker).
Note that, as with any other instruction that writes to the file buffer, writing past its end will extend it accordingly and zero-fill any gaps that arise; also, as with any other instruction that reads from patch space, reading past its end is a fatal error.
The IPS patching procedure (as implemented by this specification, in compliance with the IPS specification itself) is defined as follows:
- Read the first five bytes of the patch; they must be equal to the hexadecimal string
50 41 54 43 48
(representing the ASCII stringPATCH
). If these bytes don't match, a fatal error occurs. - Read three bytes and interpret them as a 24-bit big-endian unsigned value. If this value is equal to
0x454f46
(which happens to be the ASCII stringEOF
), the patch is done (and the variable passed to theipspatch
instruction must be set to the address that would be read next); otherwise, continue with the next step. - Add this value to the current file pointer to determine where to begin writing. (Note that this step is specific to this specification, and not part of the IPS format itself.)
- Read two bytes and interpret them as a 16-bit big-endian unsigned value; this is the amount of data to write.
- If the value read in the previous step is not zero, read as many bytes as that value indicates and write those bytes (in the same order) to the file buffer at the position calculated in step 3; then go back to step 2.
- Otherwise (that is, if the value read in step 4 was zero), read another 16-bit unsigned value (representing the count) followed by a single byte; repeatedly write this byte as many times as the count specifies starting at the position calculated in step 3, and then go back to step 2.
bsppatch #variable, address, length
This instruction executes a BSP contained within the BSP. That is, it is used to create a child BSP that is forked from the current one and executed separately; the parent is suspended until the child finishes.
The child BSP has its own variables, stack, instruction pointer, message buffer and patch space, but it shares the file
buffer and current file pointer with the parent; the file buffer and the current file pointer (including its state) are
inherited from the parent script (the one executing the bsppatch
instruction), and they can be modified freely by the
child BSP (retaining the modifications when it exits).
The child BSP's patch space is created with the length specified in the bsppatch
instruction, and filled with data
read from the parent's patch space starting at the specified address; a fatal error occurs if this causes the engine to
read past the end of the parent's patch space. The child BSP's variables and instruction pointer are all initialized to
zero, its message buffer is initialized to the empty string, and its stack is initialized empty, just like when
executing a BSP normally. Execution then resumes with the child, continuing until the child exits; only when the child
exits, the bsppatch
instruction of the parent can be completed. When the child exits, the variable passed to
bsppatch
in the parent is set to the child's exit status.
The engine must not write out to the target file when the child exits, regardless of exit status; it must, instead,
pass the exit status to the parent via the variable passed to bsppatch
. The file buffer and current file position,
being shared resources between the parent and the child, must also conserve their state when execution returns from the
child to the parent.
If the child's execution triggers a fatal error, this fatal error must be propagated to the parent; in other words, a fatal error at any depth must halt the whole engine. Execution of the parent must not be resumed after a fatal error occurs in the child.
Several instructions in this specification deal with strings — namely, the print
and menu
instructions, as well as those that manipulate the message buffer. This section specifies how the engine
must behave when handling strings, and which part of the functionality is implementation-dependent.
Valid strings in the BSP itself must be in UTF-8 format, as specified by RFC 3629, regardless of the
effective output format of the engine. Any UTF-8 decoding errors (such as overlong encodings) must be treated as
fatal, without attempting any recovery; surrogate codepoints (i.e., those between 0x00d800
and 0x00dfff
) must be
treated as fatal errors as well.
Although the engine must accept any valid UTF-8 string, it isn't required to be able to effectively display any
Unicode character; an engine incapable of handling the full Unicode character set may choose to use a reduced one and
replace characters not in its reduced set with zero or more suitable substitution characters. However, an engine is
required to support at least Latin letters (A-Z, a-z), digits (0-9), spaces, and the following punctuation characters:
'-,.;:#%&!?/()[]
. All of these characters are encoded as single UTF-8 bytes, and belong to the following ranges:
0x20
- 0x21
, 0x23
, 0x25
- 0x29
, 0x2c
- 0x3b
, 0x3f
, 0x41
- 0x5b
, 0x5d
, and 0x61
- 0x7a
.
Control characters in strings must be accepted, as they are valid UTF-8 characters; they are also valid arguments to
the bufchar
instruction. (In particular, 0
is a valid argument to bufchar
, and therefore must not be treated as
a string terminator in that context.) However, since they are not in the ranges listed in the previous paragraph,
engines are not required to support them; control characters may be ignored (i.e., substituted by nothing) when the
string (or the message buffer) is displayed to the user, or handled in any other appropriate way.
The engine may enforce a limit on the number of bytes and/or characters that the message buffer can accept; this limit
may also be dynamically determined during execution. If such a limit is enforced, characters and/or bytes in excess
must be silently discarded without error; the engine must take care to discard multibyte characters as a whole, and
not only some of their bytes. (For instance, if the last character to be added to the buffer is codepoint 0x0000a0
,
encoded as 0xc2
0xa0
, the engine may keep both bytes or discard them both, but it must not discard just the last
byte.) If any of the instructions that append to the message buffer (i.e., bufstring
, bufchar
or bufnumber
)
cause some data to be silently discarded due to the buffer being full, any further such instructions must be silently
ignored (i.e., wholly discarded) until the buffer is cleared via the printbuf
or clearbuf
instructions.
The engine may enforce a similar limit on the number of bytes and/or characters to be printed by a single print
instruction, as well as a maximum length for option labels for the menu
instruction. Any text exceeding these limits
must be silently truncated as given in the previous paragraph.
Invalid UTF-8 strings given as arguments to the bufstring
instruction cause a fatal error; this error may occur at
the time of executing that instruction, or when executing any further instruction that manipulates the message buffer,
up to the point where the message buffer is printed via the printbuf
instruction. If the message buffer is never
printed, the error may occur up to the point where the message buffer is cleared (either via the clearbuf
instruction or due to terminating execution) or not at all; this is implementation-defined.
Multibyte UTF-8 characters appended to the message buffer via bufstring
instructions must be fully contained within
a single string; if two or more consecutive instructions append parts of a multibyte UTF-8 character that build up to
a valid character, the engine may accept those parts as a whole character or trigger a fatal error. For instance, the
following snippet:
bufstring .first
bufstring .second
; ...
; UTF encoding of U+00A0: 0xc2, 0xa0
.first
db 0xc2, 0
.second
db 0xa0, 0
may either append a 0x0000a0
codepoint to the message or cause a fatal error.