feat(docs): asm functions #1061

Open
wants to merge 13 commits into
base: main
from
2 changes: 1 addition & 1 deletion CHANGELOG.md
@@ -50,7 +50,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0

### Fixed

- `asm` functions now support full range of Fift-asm syntax: PR [#855](https://github.com/tact-lang/tact/pull/855)
- `asm` functions now support full range of Fift-asm syntax: PR [#855](https://github.com/tact-lang/tact/pull/855), PR [#1061](https://github.com/tact-lang/tact/pull/1061)

- Fix `npm` installations of Tact compiler or any of the packages depending on it by hiding unnecessary post-install runs of `husky`: PR [#870](https://github.com/tact-lang/tact/pull/870)

41 changes: 1 addition & 40 deletions docs/cspell.json
@@ -14,10 +14,8 @@
],
"dictionaries": ["fift-words", "tvm-instructions"],
"words": [
"ADDRAND",
"BBITS",
"BREFS",
"Brujin",
"bocchi",
"Cheatsheet",
"Cheatsheets",
"Comptime",
@@ -36,65 +34,28 @@
"Komarov",
"Korshakov",
"Laika",
"MYADDR",
"Masterchain",
"Merkle",
"NEWC",
"Neovim",
"Nonterminal",
"Novus",
"Offchain",
"Offchain",
"PLDDICT",
"PLDIX",
"PLDREF",
"PLDSLICEX",
"PLDUX",
"POSIX",
"PUSHINT",
"PUSHREF",
"PUSHSLICE",
"Parens",
"RANDU",
"RAWRESERVE",
"RAWRESERVE",
"REWRITESTDADDR",
"REWRITEVARADDR",
"SBITS",
"SDBEGINSQ",
"SDEMPTY",
"SDSKIPFIRST",
"SEMPTY",
"SENDMSG",
"SENDRAWMSG",
"SETCONTARGS",
"SETINDEXVARQ",
"SETNUMARGS",
"SREFS",
"SREMPTY",
"STBR",
"STDICT",
"STIX",
"STON.fi",
"STOPTREF",
"STREF",
"STSLICER",
"STUX",
"STVARUINT",
"Satoshi",
"Seamus",
"Sedov",
"Stateinit",
"Sánchez",
"THROWIFNOT",
"TIMELOCK",
"Tarjan",
"Timeouted",
"Toncoin",
"Toncoins",
"Topup",
"Trunov",
"UBITSIZE",
"Uninit",
"alnum",
"assgn",
198 changes: 181 additions & 17 deletions docs/src/content/docs/book/functions.mdx
@@ -9,8 +9,10 @@ Functions in Tact could be defined in different ways:

* Global static function
* Extension functions
* Mutable functions
* Mutation functions
* Native functions
* [Assembly functions](#asm)
* [Internal functions](/book/contracts#internal-functions)
* Receiver functions
* Getter functions

@@ -85,9 +87,9 @@ extends fun customPow(self: Int, c: Int): Int {
}
```

## Mutable functions
## Mutation functions

Mutable functions are performing mutation of a value replacing it with an execution result. To perform mutation, the function must change the `self` value.
Mutation functions mutate a value, replacing it with the result of their execution. To perform mutation, the function must change the `self` value.

```tact
extends mutates fun customPow(self: Int, c: Int) {
@@ -104,7 +106,7 @@ extends mutates fun customPow(self: Int, c: Int) {
Native functions are direct bindings of FunC functions:

> **Note**
> Native functions could be also mutable and extension ones.
> Native functions could also be mutation and extension ones.

```tact
@name(store_uint)
@@ -114,6 +116,167 @@ native storeUint(s: Builder, value: Int, bits: Int): Builder;
extends mutates native loadInt(self: Slice, l: Int): Int;
```

## Assembly functions, `asm` {#asm}

<Badge text="Available since Tact 1.5" variant="tip" size="medium"/><p/>

:::caution

These are very advanced functions that require experience and vigilance in both definitions and usage. The logical errors in them are extremely hard to spot, the error messages are abysmal, and type checking isn't currently provided by Tact.

That said, if you know what you're doing, they can offer you the smallest possible gas usage, the best performance and the most control over [TVM][tvm] execution. Remember — with great power comes great responsibility.

:::

Assembly functions (or `asm{:tact}` functions for short) are module-level functions that allow writing [TVM][tvm] assembly directly in Tact. Unlike all other functions, their bodies consist only of [TVM instructions][tvm-instructions], and don't use any [Tact statements](/book/statements).

```tact
// all assembly functions must start with "asm" keyword
// ↓
asm fun answer(): Int { 42 INT }
// ------
// Notice that the body consists
// only of numbers, strings and TVM instructions
```

### Caveats {#asm-caveats}

[TVM instructions][tvm-instructions] are case-sensitive and are always written in upper case (capital letters).

```tact
/// ERROR!
asm fun bad1(): Cell { mycode }

/// ERROR!
asm fun bad2(): Cell { MyCoDe }

/// 👍
asm fun good(): Cell { MYCODE }
```

It is not necessary to enclose TVM instructions in double quotes. On the contrary, quoted instructions are interpreted as strings, which is probably _not_ what you want:

```tact
// Pushes the string "MYCODE" onto the compile-time stack,
// where it gets discarded even before the compute phase starts
asm fun wrongMyCode() { "MYCODE" }

// Invokes the TVM instruction MYCODE during the compute phase,
// which returns the contract code as a Cell
asm fun myCode(): Cell { MYCODE }
```

The syntax for parameters and return values is the same as for other function kinds, but there is one caveat — argument values are pushed to the stack before the function body is executed, and return values are what's left on the stack afterward.

Since the bodies of `asm{:tact}` functions do not contain Tact statements, any direct references to parameters in function bodies will be recognized as [TVM][tvm] instructions, which can easily lead to very obscure error messages.

```tact
/// Simply returns the value of `x`
asm fun identity(x: Int): Int { }

/// COMPILATION ERROR!
/// The `BOC` is not recognized as a parameter,
/// but instead is interpreted as a non-existent TVM instruction
asm fun bocchiThe(BOC: Cell): Cell { BOC }

/// Loads a signed `len`-bit integer from Slice `s`,
/// and returns it with the remainder of `s`
asm fun sliceLoadInt(s: Slice, len: Int): IntSlice { LDIX }
// ↑ ↑
// | Pushed last, sits on top of the stack
// Pushed first, sits on the bottom of the stack

/// Maps onto values placed by LDIX on the stack
struct IntSlice { a: Int; b: Slice }
// ↑ ↑
// | Pushed last, sits on top of the stack
// Pushed first, sits on the bottom of the stack
```
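
The stack convention described above can be modeled outside of Tact. The following is a minimal Python sketch, not Tact or TVM code: `push_arguments` and `ldix` are hypothetical helpers, and a Slice is represented as a plain string of bits. It only illustrates that arguments are pushed left to right, and that `LDIX` consumes the length and the Slice and leaves the integer below the remaining Slice.

```python
def push_arguments(*args):
    """Push arguments left to right; the last argument ends up on top."""
    stack = []
    for value in args:
        stack.append(value)
    return stack


def ldix(stack):
    """Toy model of LDIX: pops `len` and a slice, pushes the integer, then the remainder.

    The slice is modeled as a string of '0' and '1' characters (an assumption
    made for illustration only).
    """
    length = stack.pop()  # `len` was pushed last, so it is on top
    bits = stack.pop()    # the slice sits right below it
    if length > len(bits):
        raise ValueError("not enough bits in the slice")
    head, rest = bits[:length], bits[length:]
    value = int(head, 2) if head else 0
    if head and head[0] == "1":
        value -= 1 << length  # interpret as a signed two's-complement integer
    stack.append(value)   # pushed first: lower on the stack
    stack.append(rest)    # pushed last: top of the stack
    return stack


stack = push_arguments("0000011011" + "111", 10)
print(ldix(stack))
```

In this model the final stack lists the integer first and the remaining bits second, which matches the field order of the `IntSlice` Struct in the Tact example.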

The return values are provided bottom-up from the stack and the unused values are discarded.

```tact
// Same function as before, but now we don't use the `IntSlice` Struct
// and instead only take one value from the stack (going bottom-up)
asm fun sliceLoadInt(s: Slice, len: Int): Int { LDIX }
// ↑
// captures the Int value, discarding
// the Slice one produced by LDIX instruction
```

### Arrangements {#asm-arrangements}

Sometimes it's useful to change the order of arguments pushed to the stack or the order of return values. You can do that with `asm{:tact}` arrangements in the following manner:

```tact
// Changing the order of arguments to match the STDICT signature:
// `c` will be pushed first and get on the bottom of the stack,
// while `self` will be pushed last and get on top of the stack
asm(c self) extends fun asmStoreDict(self: Builder, c: Cell?): Builder { STDICT }

// Changing the order of return values of LDVARUINT16,
Contributor

It is still not clear what the notation -> 1 0 means regarding what happens to the results of LDVARUINT16 in the stack itself. The explanation states that 1 represents the value of stack register 1, etc. but it does not explain the significance of writing them in the order -> 1 0. Probably what needs to be said is that the notation -> 1 0 describes how the contents of the stack will be rearranged, when reading -> 1 0 left-to-right: the contents of register s1 will be placed at the top of the stack, and the contents of register s0 will be placed second-to-top.

One alternative way of explaining could be in terms of removing from the stack: -> 1 0 means that s1 is removed first, followed by s0. Hence, the function returns the Builder in s0 because it was the stack content removed last.

Contributor

Now I am having second thoughts on using "removing" because it becomes confusing with what happens with the rest of the stack. For example, suppose that after executing some asm function with declaration -> 2 1 0, we have the 5 element stack (top is leftmost):

a b c d e

Then, -> 0 1 2 means "remove s0, then s1, then s2", so that the stack after removing s0 is:

b c d e

But then, s0 contains now b, when previously b was in s1.

So, probably a better word instead of "removing" would be "read from":

-> 1 0 means that s1 is read from the stack first, followed by s0. Hence, the function returns the Builder in s0 because it was the stack content read last.

Member Author

@novusnota novusnota Nov 25, 2024

Thing is, as I've just checked in tests, -> 0 1 2 is not about taking or not taking any results, but merely about positioning items for whatever result type we've specified. Like, if the return type is Int, one can only specify -> 0 and nothing else, even though -> 0 in this case is the same as not writing anything at all. And when Structs, long Structs (more than 15 entries) or even nested Structs are involved, this gets complicated.

Thus, my description of s0 matching 0, s1 matching 1 is actually incorrect and has to be rewritten. And I've got to check the cases with long or nested Structs here as well, same as for the "stack calling conventions" bit.

Contributor

I see. So, this declaration is incorrect (because it returns only one element):

asm(self len -> 1 0) extends fun asmLoadInt(self: Slice, len: Int): Slice { LDIX }

but this is correct:

asm(self len) extends fun asmLoadInt(self: Slice, len: Int): Int { LDIX }

even though it will discard the Slice result and keep only the Int. Or is this last one also incorrect?

Mmmm.... very confusing indeed. So, when using the notation -> m n p it is not possible to discard values in the result type. I think this is acceptable. It is better to explicitly state all the results than to rely on understanding implicit discards.

Member Author

@novusnota novusnota Nov 25, 2024

The first one is incorrect. The second one could've been correct if we had our own backend or if we altered FunC generation, but I've tested it and it's also incorrect: nothing can be discarded in the result type.

It worked for me in previous tests mainly because FunC doesn't perform any checks, and because all asm function bodies are embedded in Fift code.

I had some DROP instructions very deep later on in other asm functions, which unexpectedly (for me) cleared the stack for this one. And I noticed that a little too late.

In the end, this really proves the point of those cautionary paragraphs at the top of the assembly functions description. This stuff is really messy, intertwined and hard to debug (until our own backend for it, of course). But I'll persevere.

Contributor

I understand and thank you for your effort!

So, let's adapt the explanation so that no discards happen in the result type.

Now, regarding nested structs, structs in arguments and structs with more than 15 fields, if you think that the explanation would become too complex to fit it in the page or that the explanation would become so convoluted because of those exceptional cases, probably it would be better to explain those in a separate page, with a link to that page.

// capturing only the last one as the return value of the whole function
asm(-> 1 0) extends mutates fun asmLoadCoins(self: Slice): Int { LDVARUINT16 }
// ---
// Notice that return values are best thought of as tuples with indexed access into them
// and not as bottom-up representation of stack values

// Changing the order of return values while explicitly stating
// the default order of arguments as it is
asm(self len -> 1 0) extends fun asmLoadInt(self: Slice, len: Int): SliceInt { LDIX }

// Used to map onto values placed by LDIX on the stack in reversed order
struct SliceInt { a: Slice; b: Int }
```
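
As a rough illustration of the arrangements above, here is a small Python model (not Tact code). The helpers `arrange_arguments` and `arrange_returns` are hypothetical, and the exact semantics of the `-> ...` part was still being discussed in the review of this PR, so the sketch covers only the two-value swap `-> 1 0`, where the direction of the permutation does not matter.

```python
def arrange_arguments(params, args, order=None):
    """Return the stack (bottom to top) after pushing arguments.

    `order` lists parameter names in the order they are pushed;
    by default arguments are pushed in declaration order.
    """
    by_name = dict(zip(params, args))
    names = order if order is not None else params
    return [by_name[name] for name in names]


def arrange_returns(stack, order=None):
    """Map stack values (bottom to top) onto return positions.

    With `order`, the i-th stack value goes to return position `order[i]`.
    This is an assumed reading of `-> 1 0`, checked here only for a swap.
    """
    if order is None:
        return tuple(stack)
    result = [None] * len(stack)
    for position, value in zip(order, stack):
        result[position] = value
    return tuple(result)


# asm(c self): `c` is pushed first, `self` last, so the Builder is on top for STDICT
print(arrange_arguments(("self", "c"), ("builder", "dict"), order=("c", "self")))

# asm(self len -> 1 0): LDIX leaves [Int, Slice]; the arrangement yields (Slice, Int)
print(arrange_returns([27, "rest"], order=(1, 0)))
```

Treating the return values as an indexed tuple, as the comment in the Tact example suggests, keeps the model independent of how deep the values sit on the stack.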

Putting all of the above together, we get:

```tact
fun showcase() {
let b = beginCell()
.storeCoins(42)
.storeInt(27, 10)
.asmStoreDict(emptyMap());

let s = b.asSlice();
let coins = s.asmLoadCoins(); // 42
let sliceInt = s.asmLoadInt(10); // Slice remainder and 27
}
```

### Attributes {#asm-attributes}

The following attributes can be specified:

* `inline{:tact}` — does nothing, since assembly functions cannot be inlined yet.
* [`extends{:tact}`](#extension-function) — makes it an [extension function](#extension-function).
* [`mutates{:tact}`](#mutation-functions) (along with [`extends{:tact}`](#extension-function)) — makes it an [extension mutation function](#mutation-functions).

These attributes _cannot_ be specified:

* `abstract{:tact}` — assembly functions must have a body defined.
* `virtual{:tact}` and `override{:tact}` — assembly functions cannot be defined within a contract or a trait.
* [`get{:tact}`](#getter-functions) — assembly functions cannot be [getters](#getter-functions).

```tact
/// `Builder.storeCoins()` extension function
asm extends fun storeCoins(self: Builder, value: Int): Builder {
STVARUINT16
}

/// `Slice.skipBits()` extension mutation function
asm extends mutates fun skipBits(self: Slice, l: Int) {
SDSKIPFIRST
}
```

:::note[Useful links:]

[TVM overview in TON Docs][tvm]\
[List of TVM instructions in TON Docs][tvm-instructions]

:::

## Receiver functions

Receiver functions are special functions that are responsible for receiving messages in contracts and could be defined only within a contract or trait.
@@ -144,32 +307,33 @@ contract Treasure {

<Badge text="Available since Tact 1.6" variant="tip" size="medium"/><p/>

As other functions in TVM contracts, getters have their *unique* associated function selectors which are some integers ids (called *method IDs*).
Some of those integers are reserved for internal purposes, e.g. -4, -3, -2, -1, 0 are reserved IDs and
regular functions (internal to a contract and not callable from outside) are usually numbered by subsequent (small) integers starting from 1.
By default, getters have associated method IDs that are derived from their names using the [CRC16](https://en.wikipedia.org/wiki/Cyclic_redundancy_check) algorithm as follows:
`crc16(<function_name>) & 0xffff) | 0x10000`.
Sometimes this can get you the same method ID for getters with different names.
If this happens, you can either rename some of the contract's getters or
specify the getter's method ID manually as a compile-time expression like so:
Like other functions in TON contracts, getters have their _unique_ associated function selectors, which are $19$-bit signed integer identifiers commonly called _method IDs_.

Method IDs of getters are derived from their names using the [CRC16](https://en.wikipedia.org/wiki/Cyclic_redundancy_check) algorithm as follows: `(crc16(<function_name>) & 0xffff) | 0x10000`. In addition, the Tact compiler conditionally reserves some method IDs for use in [getters of supported interfaces](/book/contracts#interfaces), namely: $113617$ for `supported_interfaces`, $115390$ for `lazy_deployment_completed`, and $121275$ for `get_abi_ipfs`.
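
The derivation formula can be tried out with a short Python sketch. It assumes that the CRC16 variant is CRC-16/XMODEM (polynomial `0x1021`, initial value `0`), which the text above does not state explicitly; `getter_method_id` is a hypothetical helper name, not part of any Tact tooling.

```python
import binascii


def getter_method_id(name: str) -> int:
    """Compute (crc16(name) & 0xffff) | 0x10000 for a getter name.

    Assumption: crc16 is CRC-16/XMODEM, which Python exposes as
    binascii.crc_hqx with an initial value of 0.
    """
    crc = binascii.crc_hqx(name.encode("utf-8"), 0)
    return (crc & 0xFFFF) | 0x10000


for getter in ("methodId1", "methodId2"):
    print(getter, getter_method_id(getter))
```

Whatever the CRC variant, the `| 0x10000` part guarantees that derived IDs fall between $65536$ and $131071$ inclusive, so two getters collide only when their 16-bit checksums coincide.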

Sometimes, getters with different names end up with the same method ID. If this happens, you can either rename some of the getters or manually specify the method ID as a [compile-time](/ref/core-comptime) expression like so:

```tact
contract ManualMethodId {
const methodId: Int = 16384 + 42;

get(self.methodId) fun methodId1(): Int {
get(self.methodId)
fun methodId1(): Int {
return self.methodId;
}

get(crc32("crc32") + 42 & 0x3ffff | 0x4000)
fun methodId2(): Int {
return 0;
return crc32("crc32") + 42 & 0x3ffff | 0x4000;
}
}
```

Note that you *cannot* use method IDs that are reserved by TVM and you cannot use some initial positive integers because those will be used as function selectors by the compiler.
Unlike getters, method IDs for [internal functions](/book/contracts#internal-functions) and some special functions are obtained sequentially: integers in the inclusive range from $-4$ to $0$ are given to [certain message handlers](https://docs.ton.org/v3/documentation/smart-contracts/func/docs/functions#special-function-names), while internal functions are numbered with method IDs starting at $1$ and going up to $2^{14} - 1$ inclusive.

Since method IDs are $19$-bit signed integers and some of them are reserved, only the inclusive ranges from $-2^{18}$ to $-5$ and from $2^{14}$ to $2^{18} - 1$ are free to be used by users. To avoid collisions, it's recommended to specify method IDs only in these ranges, avoiding the method IDs of Tact-specific getters mentioned above.
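
The ranges above can be written down as a small check. This is a Python sketch for illustration only; `is_free_method_id` and `RESERVED_TACT_GETTER_IDS` are hypothetical names, and the numbers are taken directly from the text.

```python
# Method IDs reserved for Tact-specific getters, as listed in the text
RESERVED_TACT_GETTER_IDS = {113617, 115390, 121275}


def is_free_method_id(method_id: int) -> bool:
    """Check that a manually chosen method ID lies in a user-available range."""
    in_negative_range = -(2**18) <= method_id <= -5
    in_positive_range = 2**14 <= method_id <= 2**18 - 1
    if not (in_negative_range or in_positive_range):
        return False
    return method_id not in RESERVED_TACT_GETTER_IDS


print(is_free_method_id(16384 + 42))  # the ID used in the example contract
print(is_free_method_id(0))           # reserved for a message handler
print(is_free_method_id(113617))      # reserved for `supported_interfaces`
```

The check is deliberately conservative: it rejects the three Tact-specific getter IDs even when a given contract does not generate those getters.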

User-specified method IDs are 19-bit signed integers, so you can use integers from $-2^{18}$ to $-5$ and from $2^{14}$ to $2^{18} - 1$.
[slice]: /book/cells#slices

Also, a few method IDs are reserved for the usage by the getters the Tact compiler can insert during compilation, those are 113617, 115390, 121275.
[tvm]: https://docs.ton.org/learn/tvm-instructions/tvm-overview
[tvm-instructions]: https://docs.ton.org/v3/documentation/tvm/instructions
4 changes: 2 additions & 2 deletions docs/src/content/docs/book/import.mdx
@@ -9,7 +9,7 @@ Additionally, Tact compiler has a versatile set of standard libraries, which com

:::caution

NOTE: All imported code is combined together with yours, so it's important to avoid name collisions and always double-check the sources!
All imported code is combined together with yours, so it's important to avoid name collisions and always double-check the sources!

:::

@@ -39,7 +39,7 @@ import "./relative/path/to/the/target/func/file.fc";
import "../subfolder/imported/func/file.fc";
```

But in order to use functions from such file, one has to declare them as `native` functions first. For example, when standard library [@stdlib/dns](/ref/stdlib-dns) uses a `dns.fc` FunC file, it maps FunC functions to Tact ones like so:
But in order to use functions from such a file, one has to declare them as `native` functions first. For example, when the standard library [`@stdlib/dns`](/ref/stdlib-dns) uses a `dns.fc` FunC file, it maps FunC functions to Tact ones like so:

```tact
// FunC code located in a file right next to the current Tact one:
4 changes: 2 additions & 2 deletions docs/src/content/docs/ref/core-cells.mdx
@@ -942,7 +942,7 @@ fun cautiousParse(payload: Cell): GuessCoin? {
## Struct.fromSlice

```tact
extends fun fromSlice(self: Struct, cell: Slice): Struct;
extends fun fromSlice(self: Struct, slice: Slice): Struct;
```

Extension function for any structure type [Struct][struct].
@@ -1066,7 +1066,7 @@ fun cautiousParse(payload: Cell): TripleAxe? {


```tact
extends fun fromSlice(self: Message, cell: Slice): Message;
extends fun fromSlice(self: Message, slice: Slice): Message;
```

Extension function for any message type [Message][message].