From dfc1858bb70319bbc7ea3c945d5267b0eccff6c5 Mon Sep 17 00:00:00 2001
From: Lukasz Stafiniak
Date: Tue, 8 Oct 2024 12:28:23 +0200
Subject: [PATCH] syntax_extensions.md: fix headings and Table of Contents

---
 lib/syntax_extensions.md | 33 +++++++++++++++++----------------
 1 file changed, 17 insertions(+), 16 deletions(-)

diff --git a/lib/syntax_extensions.md b/lib/syntax_extensions.md
index 1fda466d..56fee22b 100644
--- a/lib/syntax_extensions.md
+++ b/lib/syntax_extensions.md
@@ -1,24 +1,25 @@
-# Syntax extensions `%cd` and `%op` {#syntax-extensions-cd-and-op}
+# Syntax extensions %cd and %op
 - Table of contents
   - [Preliminaries](#preliminaries)
-  - [The syntax for %op {#syntax-for-op}](#syntax-for-op)
-  - [The syntax for %cd](#syntax-for-cd)
+  - [The syntax for %op](#the-syntax-for-op)
+  - [The syntax for %cd](#the-syntax-for-cd)
   - [Numeric and N-dimensional array literals](#numeric-and-n-dimensional-array-literals)
   - [Wildcard bindings](#wildcard-bindings)
   - [Inline declarations](#inline-declarations)
   - [Using OCANNL's generalized einsum notation](#using-ocannls-generalized-einsum-notation)
-  - [Further features of the syntax extension %cd](#features-of-syntax-cd)
+  - [Syntax of the generalized einsum notation](#syntax-of-the-generalized-einsum-notation)
+  - [Further features of the syntax extension %cd](#further-features-of-the-syntax-extension-cd)
   - [Referencing arrays: tensor value, tensor gradient, merge buffer of a tensor node](#referencing-arrays-tensor-value-tensor-gradient-merge-buffer-of-a-tensor-node)
   - [Block comments](#block-comments)
-  - [Further features of the syntax extension %op](#features-of-syntax-op)
+  - [Further features of the syntax extension %op](#further-features-of-the-syntax-extension-op)
   - [Name from binding](#name-from-binding)
   - [Label from function argument](#label-from-function-argument)
   - [Configuring inline declarations: inline output dimensions, initial values](#configuring-inline-declarations-inline-output-dimensions-initial-values)
-  - [Lifting of the applications of ~config arguments: if it's an error, refactor your code](#lifting-of-the-applications-of-config-arguments-if-its-an-error-refactor-your-code)
+  - [Lifting of the applications of config arguments: if an error, refactor your code](#lifting-of-the-applications-of-config-arguments-if-an-error-refactor-your-code)
   - [Implementation details](#implementation-details)
   - [The hard-coded to-the-power-of operator](#the-hard-coded-to-the-power-of-operator)
-  - [Intricacies of the syntax extension %cd](#implementation-extension-cd)
+  - [Intricacies of the syntax extension %cd](#intricacies-of-the-syntax-extension-cd)
 - In a nutshell
   - Syntax extension `%cd` stands for "code", to express assignments and computations: `Assignments.comp`.
   - Syntax extension `%op` stands for "operation", to express tensors: `Tensor.t`.
@@ -36,7 +37,7 @@ Functions inside `Operation.NTDSL` use `~grad_spec:Prohibit_grad` when calling i
 The extension points open `NTDSL.O`, resp. `TDSL.O`, for the scope of the extension point, to expose the corresponding operators.
-## The syntax for `%op` {#syntax-for-op}
+## The syntax for %op
 The `%op` syntax is simpler than the `%cd` syntax since it relies more on regular OCaml expressions. For example, we can write without syntax extensions:
@@ -70,7 +71,7 @@ When there is a function directly under the `%op` extension point, like in the e
 When the declaration is followed by a literal float, the float provides the initial value to initialize the tensor. Otherwise, the tensor value cells are initialized randomly with uniform distribution.
-## The syntax for `%cd` {#syntax-for-cd}
+## The syntax for %cd
 The basic building blocks of the `%cd` syntax are individual assignments, separated by semicolons.
 The assignments, represented via `Assignments.Accum_binop` and `Assignments.Accum_unop`, are in full generality accumulating:
@@ -98,7 +99,7 @@ type Assignments.t =
 For example the binary case in pseudocode: `if initialize_neutral then lhs = 0; lhs = lhs accum (rhs1 op rhs2)` (assuming the neutral element of `accum` is 0). The representation also has a field `projections` which determines which loops should be run and how the tensor nodes should be indexed to perform the computation.
-The basic `%cd` syntax for binary operator assignments has the form: ` ` (or ` ` when `` is not an operator). The binary operators in the ` ` part have a straightfowrad syntax: `` is one of `+`, `-`, `*`, `/`, `**` (to-power-of), `-?/` (ReLU-Gate). `` starts with `=`, followed by `:` only if `initialize_neutral` is true, then followed by one of `+`, `-`, `*`, `/`, `**`, `?/`. The fields ``, ``, `` will often be either special-purpose identifiers (e.g. `t`, `t1`, `t2`, `g`, `g1`, `g2`) or identifiers bound to tensors. ``, `` will also often be (non-differentiable) tensor expressions. The notation `.grad` stands for the gradient node of the given tensor. For more about "slot fillers", and to learn about the operators `*+` and `++`, see the section [further features of the syntax extension %cd](#features-of-syntax-cd).
+The basic `%cd` syntax for binary operator assignments has the form: ` ` (or ` ` when `` is not an operator). The binary operators in the ` ` part have a straightforward syntax: `` is one of `+`, `-`, `*`, `/`, `**` (to-power-of), `-?/` (ReLU-Gate). `` starts with `=`, followed by `:` only if `initialize_neutral` is true, then followed by one of `+`, `-`, `*`, `/`, `**`, `?/`. The fields ``, ``, `` will often be either special-purpose identifiers (e.g. `t`, `t1`, `t2`, `g`, `g1`, `g2`) or identifiers bound to tensors. ``, `` will also often be (non-differentiable) tensor expressions. The notation `.grad` stands for the gradient node of the given tensor. For more about "slot fillers", and to learn about the operators `*+` and `++`, see the section [further features of the syntax extension %cd](#further-features-of-the-syntax-extension-cd).
 How is the `projections` field determined? `projections` can be given explicitly as a labeled argument `~projections`. If they aren't but `%cd` realizes there is a `~projections` parameter in scope, it uses it -- see `lib/operation.ml` where this option is used to define tensor operations. If instead of `~projections` a `~logic` labeled argument is given, the string passed is used to determine projections. `~logic:"."` means a pointwise operation. `~logic:"@"` means an "output axes of rhs2 match input axes of rhs1" operation (matrix multiplication is a special case). `~logic:"T"` means transpose of input and output axes. The string passed to `~logic` can also use OCANNL's generalization of the einsum notation, allowing arbitrary permutations and reductions of axes. If no information is given, the default is a pointwise operation.
@@ -190,8 +191,8 @@ The specification syntax has two modes:
 The syntax of a generalized einsum spec has two variants:
-- unary: "\ shape spec `=>` \ shape spec", specifies a unary assignment ` ` (see [syntax for `%cd`](#syntax-for-cd)),
-- binary: "\ shape spec `;` \ shape spec `=>` \ shape spec", specifies a binary assignment ` ` (see [syntax for `%cd`](#syntax-for-cd)).
+- unary: "\ shape spec `=>` \ shape spec", specifies a unary assignment ` ` (see [syntax for `%cd`](#the-syntax-for-cd)),
+- binary: "\ shape spec `;` \ shape spec `=>` \ shape spec", specifies a binary assignment ` ` (see [syntax for `%cd`](#the-syntax-for-cd)).
 Recall that a tensor _shape_ is composed of three _rows_, i.e. sequences of axes: batch, input and output axes.
 Correspondingly, a shape spec in the notation can be:
@@ -229,7 +230,7 @@ Examples:
 - `..v..|ijk => ..v..kji`: reverse the three rightmost output axes, reduce any other output and input axes, pointwise for batch axes, pairing the batch axes with the leftmost output axes of the result.
 - `2..v..|... => ..v..`: slice the tensor at dimension 2 of the leftmost batch axis, reduce all its input and output axes, preserve its other batch axes as output axes.
-## Further features of the syntax extension `%cd` {#features-of-syntax-cd}
+## Further features of the syntax extension %cd
 ### Referencing arrays: tensor value, tensor gradient, merge buffer of a tensor node
@@ -256,7 +257,7 @@ type Assignments.t =
 Schematic example: `~~("space" "separated" "comment" "tensor p debug_name:" p; )`. The content of the comment uses application syntax, must be composed of strings, ``, `.value` (equivalent to ``), `.grad` components, where `` is any tensor expression or tensor identifier.
-## Further features of the syntax extension `%op` {#features-of-syntax-op}
+## Further features of the syntax extension %op
 ### Name from binding
@@ -281,7 +282,7 @@ If it is a list expression following an inline declaration, the expression is pa
 ...
 ```
-### Lifting of the applications of `~config` arguments: if it's an error, refactor your code
+### Lifting of the applications of config arguments: if an error, refactor your code
 If you recall, inline declared param tensors get lifted out of functions except for the function `fun ~config ->`, where they get defined. Our example `let%op mlp_layer ~config x = ?/ ("w" * x + "b" config.hid_dim)` translates as:
@@ -369,7 +370,7 @@ let rec pointpow ?(label : string list = []) ~grad_spec p t1 : Tensor.t =
 On the `Tensor` level, this is implemented as a binary tensor operation, but it is exposed as a unary tensor operation!
 To avoid the complexities of propagating gradient into the exponent, `Operation.pointpow` is implemented as a function of only one tensor, the exponent is a number. We hard-code the pointwise-power-of operator `NTDSL.O.( **. )`, resp. `TDSL.O.( **. )`, in the `%cd` and `%op` syntaxes, to pass the numeric value to `pointpow` (the second argument of `**.`) without converting it to a tensor first.
-### Intricacies of the syntax extension `%cd` {#implementation-extension-cd}
+### Intricacies of the syntax extension %cd
 The syntax `%cd` translator needs to accomplish more than a context-free conversion of a concise notation to an `Assignments.t` data-type.
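The replacement anchors in this patch (for example `#the-syntax-for-op` and `#further-features-of-the-syntax-extension-cd`) no longer point at explicit `{#...}` heading IDs; they appear to match IDs derived from the heading text, the way GitHub-style Markdown renderers generate them. The OCaml sketch below approximates that derivation under stated assumptions: it handles ASCII only, and unlike GitHub's Unicode-aware slugger it does not append `-1`, `-2`, ... suffixes to disambiguate duplicate headings. The function name `slug` is hypothetical and not part of OCANNL.

```ocaml
(* Approximate GitHub-style heading anchor (an assumption: ASCII only,
   no de-duplication of repeated headings).
   Rules: lowercase letters; keep a-z, 0-9, underscore and hyphen;
   turn each space into a hyphen; drop every other character
   (for example percent signs, colons, commas, backticks, apostrophes). *)
let slug (heading : string) : string =
  let buf = Buffer.create (String.length heading) in
  String.iter
    (fun c ->
      match Char.lowercase_ascii c with
      | ('a' .. 'z' | '0' .. '9' | '_' | '-') as c' -> Buffer.add_char buf c'
      | ' ' -> Buffer.add_char buf '-'
      | _ -> ())
    heading;
  Buffer.contents buf

(* Print a few headings from this patch next to their derived anchors. *)
let () =
  List.iter
    (fun h -> Printf.printf "%s -> #%s\n" h (slug h))
    [ "The syntax for %op";
      "Further features of the syntax extension %cd";
      "Lifting of the applications of config arguments: if an error, refactor your code" ]
```

Each derived anchor should agree with the corresponding `+` line of the Table of Contents above. Dropping the `%` from `%cd` is the reason the new anchors read `...-extension-cd` rather than keeping the sigil.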