There is ~700 instructions in TVM, so it's quite painful to implement tools such as disassembler, decompiler, etc. This repo's goal is to provide machine-readable (JSON) description of TVM instructions: bytecode format, value flow and control flow information.
Codepage | Specification |
---|---|
0 | cp0.json |
JSON Schema of specifications is available in schema.json. Generators such as quicktype.io can be used to generate wrappers for specific programming language.
Based on instructions.csv.
JSON key | Status | Description |
---|---|---|
doc | âś… Implemented | Provides human-readable information about instructions. Can be useful to provide integrated docs to user, for example, in disassembler. |
bytecode | âś… Implemented | Describes instruction encoding. It contains information to build assemblers and disassemblers. |
value_flow | âś… Implemented | Describes how instruction changes current stack. This part of specification allows to analyze how instructions interact with each other, so it becomes possible to implement high-level tools such as decompilers. |
control_flow | âś… Implemented | Describes code flow (operations with cc register). It helps to reconstruct a control flow graph. This part mainly contains semantics of cont_* category instructions. For example, both JMPX and CALLX transfers execution to continuation on stack, but only CALLX returns and JMPX is not. |
aliases | âś… Implemented | Specifies instruction aliases. Can be used to provide to user information about special cases (for example, SWAP is a special case of XCHG_0i with i = 1). |
Convenient way is to add submodule to your tool. This will greatly simplify debugging and upgrading process.
git submodule add https://github.com/hacker-volodya/tvm-spec
However, nothing can stop you from just copying cp0.json
(and schema.json
if necessary).
- tvm-spec-example, tiny TVM disassembler
- tvm-research, collection of tool prototypes with the power of tvm-spec
[
{
"mnemonic": "LDU",
"since_version": 0,
"doc": {
"category": "cell_parse",
"description": "Loads an unsigned `cc+1`-bit integer `x` from _Slice_ `s`.",
"gas": "26",
"fift": "[cc+1] LDU",
"fift_examples": [],
"opcode": "D3cc",
"stack": "s - x s'"
},
"bytecode": {
"tlb": "#D3 cc:uint8",
"prefix": "D3",
"operands": [
{
"name": "c",
"type": "uint",
"size": 8,
"min_value": 0,
"max_value": 255,
"display_hints": [{ "type": "add", "value": 1 }]
}
]
},
"value_flow": {
"inputs": {
"stack": [
{ "type": "simple", "name": "s", "value_types": ["Slice"] }
],
"registers": []
},
"outputs": {
"stack": [
{ "type": "simple", "name": "x", "value_types": ["Integer"] },
{ "type": "simple", "name": "s2", "value_types": ["Slice"] }
],
"registers": []
}
},
"control_flow": { "branches": [], "nobranch": true }
},
{
"mnemonic": "EXECUTE",
"since_version": 0,
"doc": {
"category": "cont_basic",
"description": "_Calls_, or _executes_, continuation `c`.",
"gas": "18",
"fift": "EXECUTE\nCALLX",
"fift_examples": [],
"opcode": "D8",
"stack": "c - "
},
"bytecode": { "tlb": "#D8", "prefix": "D8", "operands": [] },
"value_flow": {
"inputs": {
"stack": [
{ "type": "simple", "name": "c", "value_types": ["Continuation"] }
],
"registers": []
},
"outputs": { "stack": [], "registers": [] }
},
"control_flow": {
"branches": [
{
"type": "variable",
"var_name": "c",
"save": {
"c0": {
"type": "cc",
"save": { "c0": { "type": "register", "index": 0 } }
}
}
}
],
"nobranch": false
}
}
]
Key | Description |
---|---|
mnemonic | How instruction is named in original TVM implementation. Not necessarily unique (currently only DEBUG is not unique). Required. |
since_version | Global version (ConfigParam 8) which enables this instruction. Version 9999 means that instruction has no global version and currently unavailable in mainnet. |
doc | Free-form human-friendly information which should be used for documentation purposes only. Required. |
doc.category | Category of instruction (examples: cont_loops, stack_basic, dict_get). Required. |
doc.description | Free-form markdown description of instruction. Required. |
doc.gas | Free-form description of gas amount used by instruction. Required. |
doc.fift | Free-form fift usage description. Required. |
doc.fift_examples | Array of usage examples in Fift. Each array element is object with fift and description keys. Required. |
doc.opcode | Free-form bytecode format description. Required. |
doc.stack | Free-form description of stack inputs and outputs. Usually the form is [inputs] - [outputs] where [inputs] are consumed stack values and outputs are produced stack values (top of stack is the last value). Required. |
bytecode | Information related to bytecode format of an instruction. Assuming that each instruction has format `prefix |
bytecode.tlb | TL-b bytecode format description. Required. |
bytecode.prefix | Prefix to determine next instruction to parse. It is a hex bitstring as in TL-b (suffixed with _ if bit length is not divisible by 4, trailing '1' + '0' * x must be removed). Required. |
bytecode.operands_range_check | In TVM, it is possible for instructions to have overlapping prefixes, so to determine actual instruction it is required to read next length bits after prefix as uint i and check from <= i <= to . Optional, there is no operands check in case of absence. |
bytecode.operands | Describes how to parse operands. Order of objects in this array represents the actual order of operands in instruction. Required. |
bytecode.operands[i].name | Operand variable name. Allowed chars are a-zA-Z0-9_ , must not begin with digit or underscore and must not end with underscore. Required. |
bytecode.operands[i].type | Type of operand. Must be one of int , uint , ref , pushint_long , subslice . Types are described below. Required. |
bytecode.operands[i].* | Additional options for types, specified below. |
value_flow | Information related to usage of stack and registers by instruction. Required. |
value_flow.inputs | Incoming values constraints. Required. |
value_flow.inputs.registers | Defines how registers is used by instruction. Instruction must not operate on registers other than defined here. Required. |
value_flow.inputs.registers[i] | Register which is access by instruction. |
value_flow.inputs.registers[i].type | Type of register: "constant", "variable", "special". Required. |
value_flow.inputs.registers[i].* | Properties for registers of each type are described below. |
value_flow.inputs.stack | Defines how current stack is used by instruction. Instruction must not operate on stack entries other than defined here. Optional, input stack is unconstrained if absent. |
value_flow.inputs.stack[i] | Stack entry or group of stack entries. |
value_flow.inputs.stack[i].type | Type of stack entry. Can be one of "simple", "const", "conditional", "array". Required. |
value_flow.inputs.stack[i].* | Properties for stack entries of each type are described below. |
value_flow.outputs | Outgoing values constraints. Required. Identical to value_flow.inputs. |
control_flow | Information related to current cc modification by instruction. Required. |
control_flow.branches | Array of possible branches of an instruction. Specifies all possible values of cc after instruction execution. Required. Each branch described by a Continuation object described below. |
control_flow.nobranch | Can this instruction not perform any of specified branches in certain cases (do not modify cc)? Required. If instruction has no control flow, nobranch is true and branches are empty. |
display_hints
property is available for uint
, int
, ref
and subslice
operand types. For ref
and subslice
there is continuation
and dictionary
hint types, and for uint
there is add
, stack
, register
, pushint4
, optional_nargs
and plduz
hints.
display_hints[i].type | Description |
---|---|
continuation | Slice from this operand (type = ref, subslice) is a continuation. |
dictionary | Slice from this operand (type = ref, subslice) is a dictionary with key size defined in size_var property of a hint. |
add | Some constant must be added in order to correctly display this operand (type = uint). [cc+1] LDU <=> { "type": "add", "value": 1 } . |
stack | This operand (type = uint) is a stack entry reference (s0 for example). |
register | This operand (type = uint) is a register reference (c0 for example). |
pushint4 | To display this operand (type = uint), 16 must be substracted, but only if operand value > 10. For example, 0xF displayed as -1, 0xA is 10 and 0xB is -5. |
optional_nargs | 0xF is displayed as -1, other values left untouched (type = uint). |
plduz | This operand (type = uint) is displayed as 32*(c+1) . If value is not divisible by 32 during assembly, error should be raised. |
{
"name": "i",
"type": "uint",
"size": 4,
"min_value": 0,
"max_value": 15,
"display_hints": [{ "type": "stack" }, { "type": "add", "value": 1 }]
}
Type of unsigned size
-bit integer with valid values in range min_value
...max_value
. Arguments size
, min_value
, max_value
of type number
and display_hints
of type array
are required.
{
"name": "i",
"type": "int",
"size": 4,
"min_value": -8,
"max_value": 7,
"display_hints": []
}
Type of signed size
-bit integer with valid values in range min_value
...max_value
. Arguments size
, min_value
, max_value
of type number
and display_hints
of type array
are required.
{
"name": "c",
"type": "ref",
"display_hints": [{ "type": "continuation" }]
}
Type of a single reference. Unlike subslice
with refs_add = 1
, should dereference instead of returning empty cell with a single reference.
{
"name": "x",
"type": "pushint_long"
}
Special type which currently is used only in PUSHINT_LONG
instruction. Consists of 5-bit uint size
and integer of bit size 8 * size + 19
.
{
"name": "slice",
"type": "subslice",
"bits_length_var_size": 5,
"refs_length_var_size": 2,
"bits_padding": 1,
"refs_add": 1,
"completion_tag": true,
"max_bits": 248,
"min_bits": 0,
"max_refs": 4,
"min_refs": 1,
"display_hints": []
}
TLB notation: r:(## 2) xx:(## 5) c:((r + 1) * ^Cell) ssss:((8 * xx + 1) * Bit)
Loads r
uint of size refs_length_var_size
(if present), x
uint of size bits_length_var_size
. Then loads subslice of bit length x * 8 + bits_padding
and ref count r + refs_add
. If completion_tag
argument with value true
is passed, remove completion tag from bitstring (trailing '1' + '0' * x
).
Argument | Description |
---|---|
bits_length_var_size | Size of bit length operand. Required. |
bits_padding | Constant integer value to add to length of bitstring to load. Required. |
refs_length_var_size | Size of ref count operand. Optional, assuming no refs if absent. |
refs_add | Constant integer value to add to ref count. Optional, assuming 0 if absent. |
completion_tag | Boolean flag, tells to remove trailing '1' + '0' * x from bitstring if true. Required. |
max_bits | Hint for maximum bits available to store for this operand. Required. |
min_bits | Hint for minimum bits available to store for this operand. Required. |
max_refs | Hint for maximum refs available to store for this operand. Required. |
min_refs | Hint for minimum refs available to store for this operand. Required. |
display_hints | Set of hints for converters between Asm.fif and bytecode representations. Required. |
{
"type": "simple",
"name": "c",
"value_types": ["Continuation"]
}
Specifies a single stack entry, attaches name
to it and optionally constraints its type by value_types
. Possible types are Integer
, Cell
, Builder
, Slice
, Tuple
, Continuation
, Null
. For example, dict is actually "value_types": ["Slice", "Null"]
as it can be either Slice or Null. Absence of value_types
means that this stack entry may be of any type.
{
"type": "const",
"value": 32,
"value_type": "Integer"
}
Specifies a constant value on stack. value_type
is either Integer
or Null
, with number
type value for Integer
and null
value for Null
.
[
{
"type": "conditional",
"name": "status",
"match": [
{
"value": 0,
"stack": []
},
{
"value": -1,
"stack": [
{
"type": "simple",
"name": "c",
"value_types": ["Cell"]
}
]
}
]
},
{
"type": "simple",
"name": "status",
"value_types": ["Integer"]
}
]
Stack notation: 0 or c -1
[
{
"type": "conditional",
"name": "x",
"match": [
{"value": 0, "stack": []}
],
"else": [
{
"type": "const",
"value": null,
"value_type": "Null"
}
]
},
{
"type": "simple",
"name": "x",
"value_types": ["Integer"]
}
]
Stack notation: x or null x
Depending on value
of variable specified in name
, put values specified in corresponding arm of match
. If else
arm is not present, then input for this conditional may be only of specified values.
{
"type": "array",
"name": "key_message_pairs",
"length_var": "n",
"array_entry": [
{
"type": "simple",
"name": "pk",
"value_types": ["Slice"]
},
{
"type": "simple",
"name": "msg",
"value_types": ["Slice"]
}
]
}
Stack notation: pk_1 msg_1 ... pk_n msg_n
Specifies a bunch of stack entries with length from variable length_var
, usually noted as x_1 ... x_n
. Each part of array, such as x_i
or x_i y_i
is described in array_entry
. Used in tuple, continuation arguments and crypto operations.
- Each variable name is unique across
operands
,value_flow
, andcontrol_flow
sections of each instruction. Assumed that variables are immutable, so if variablex
is defined both in inputs and outputs, it goes to output without any modification. - Value flow describes only
cc
stack usage before the actual jump or call. Subsequent continuations may have a separate stack, so this will be defined in control flow section of this spec.
{
"type": "constant",
"index": 0
}
Represents c{index}
register.
{
"type": "variable",
"var_name": "i"
}
Represents c{{var_name}}
register. Its number is taken from variable var_name
, defined in stack inputs or operands.
{
"type": "special",
"name": "gas"
}
{
"type": "special",
"name": "cstate"
}
Represents gas registers (used by app_gas instructions) and cstate (used by COMMIT instruction).
Each object represents a continuation, which can be constructed in several ways:
- from a variable (operand or stack)
- from cc register
- from c0-c3 registers
- by "extraordinary continuation" constructors (as in tvm.pdf, p.50: "4.1.5. Extraordinary continuations.")
Savelist can be defined using save
property. Keys are c0
, c1
, c2
, c3
and values are continuation objects (in fact, continuation is a recursive type, representing a tree of continuations). Please note that savelist defined here will not override already saved continuation registers (that's the standard behaviour of TVM when saving registers).
{
"type": "variable",
"var_name": "c",
"save": {
"c0": {
"type": "cc",
"save": {
"c0": { "type": "register", "index": 0 }
}
}
}
}
Specifies a variable-backed continuation. Variable may be not referenced previously (in operands
or value_flow
section), assuming that it is defined somewhere in the actual implementation of the instruction. Example of such instruction is DICTIGETJMP
: internally it does dictionary lookup for continuation retrieval, so x
in control flow is expected to be defined in the implementation of DICTIGETJMP
.
{
"type": "cc",
"save": {
"c0": { "type": "register", "index": 0 }
}
}
Specifies a continuation, which is constructed from cc code and save
registers. In C++ implementation this operation is implemented in VmState::extract_cc.
{
"type": "register",
"index": 3,
"save": {
"c0": {
"type": "cc",
"save": {
"c0": { "type": "register", "index": 0 }
}
}
}
}
Specifies continuation from register with index
("index": 3
=> take continuation from c3
).
{
"type": "special",
"name": "repeat",
"args": {
"count": "n",
"body": {
"type": "variable",
"var_name": "c"
},
"after": {
"type": "cc",
"save": {
"c0": { "type": "register", "index": 0 }
}
}
}
}
Specifies extraordinary continuation. save
property does not apply here.
name | args | description |
---|---|---|
repeat | string count; continuation body, after | Count is the name of a count variable. Jumps to body with c0 = repeat(count - 1, body, after) , jumps to after if count <= 0 . |
until | continuation body, after | Pops boolean from stack, jumps to after if it is true , otherwise jumps to body with c0 = until(body, after) . |
while | continuation cond, body, after | Pops boolean from stack, jumps to after if it is false , otherwise jumps to body with c0 = cond and cond.c0 = while(cond, body, after) . |
again | continuation body | Jumps to body with c0 = again(body) . |
pushint | continuation next; integer value | Push value on stack, jump to next . |
{
"mnemonic": "ROT2",
"alias_of": "BLKSWAP",
"doc_fift": "ROT2\n2ROT",
"doc_stack": "a b c d e f - c d e f a b",
"description": "Rotates the three topmost pairs of stack entries.",
"operands": {
"i": 1,
"j": 3
}
}
Key | Description |
---|---|
mnemonic | Alias name. Required. |
alias_of | Mnemonic of aliased instruction. Required. |
doc_fift | Free-form fift usage description. Optional. |
doc_stack | Free-form description of stack inputs and outputs. Usually the form is [inputs] - [outputs] where [inputs] are consumed stack values and outputs are produced stack values (top of stack is the last value). |
description | Free-form markdown description of alias. Optional. |
operands | Values of original instruction operands which are fixed in this alias. Currently it can be integer or slice without references which is represented by string of '0' and '1's. Type should be inferred from original instruction operand loaders. Required. |