Adds glossary and consolidates grammar (#359)
popematt authored Oct 31, 2024
1 parent c35fc69 commit e5e02a5
Showing 7 changed files with 288 additions and 125 deletions.
3 changes: 2 additions & 1 deletion _books/ion-1-1/src/SUMMARY.md
@@ -13,7 +13,6 @@
- [Shared modules](modules/shared_modules.md)
- [Inner modules](modules/inner_modules.md)
- [The system module](modules/system_module.md)
- [Grammar](modules/grammar.md)
- [Binary encoding](binary/encoding.md)
- [Encoding primitives](binary/primitives.md)
- [`FlexUInt`](binary/primitives/flex_uint.md)
@@ -39,6 +38,8 @@
- [E-expressions](binary/e_expressions.md)
- [Annotations](binary/annotations.md)
- [NOP](binary/nop.md)
- [Grammar](grammar.md)
- [Glossary](glossary.md)
<!--
The todo.md page is a placeholder target for links we haven't populated yet.
Only pages that are listed in `SUMMARY.md` will be shown to users; todo.md
163 changes: 163 additions & 0 deletions _books/ion-1-1/src/glossary.md
@@ -0,0 +1,163 @@
# Glossary

**active encoding module**<br/>
The _encoding module_ whose symbol table and macro table are available in the current _segment_ of an Ion _document_.
The active encoding module is set by a _directive_.

**argument**<br/>
The sub-expression(s) within a macro invocation, corresponding to exactly one of the macro's parameters.

**cardinality**<br/>
Describes both the number of argument expressions that a parameter will accept when the macro is invoked,
and the number of values that the parameter may expand to during evaluation.
A parameter's cardinality can be `zero-or-one`, `exactly-one`, `zero-or-more`, or `one-or-more`,
specified in a signature by one of the modifiers `?`, `!`, `*`, or `+` respectively.
If no modifier is specified, cardinality defaults to `exactly-one`.
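
The modifier-to-cardinality mapping described in this entry can be sketched in a few lines of Python. This is only an illustration: `parameter_cardinality` is a hypothetical helper, not part of any Ion library, and it assumes the parameter is given as plain text with the modifier, if any, as its last character.

```python
# Mapping of signature modifiers to cardinalities, as listed in the glossary entry.
CARDINALITY_BY_MODIFIER = {
    "?": "zero-or-one",
    "!": "exactly-one",
    "*": "zero-or-more",
    "+": "one-or-more",
}


def parameter_cardinality(parameter: str) -> str:
    """Return the cardinality of a signature parameter such as 'x*'.

    Hypothetical helper: it inspects only the trailing character and ignores
    any encoding prefix such as 'uint8::'.
    """
    if not parameter:
        raise ValueError("parameter must not be empty")
    # A parameter without a modifier defaults to exactly-one.
    return CARDINALITY_BY_MODIFIER.get(parameter[-1], "exactly-one")
```

Because identifiers cannot end in `?`, `!`, `*`, or `+`, checking the final character is enough to tell a modifier apart from the parameter name.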

**declaration**<br/>
The association of a name with an entity (for example, a module or macro). See also _definition_.
Not all declarations are definitions: some introduce new names for existing entities.

**definition**<br/>
The specification of a new entity.

**directive**<br/>
A keyword or unit of data in an Ion document that affects the encoding environment, and thus the way the document's data is encoded and decoded.
In Ion 1.0 there are two kinds of directive: _Ion version markers_ and _symbol table directives_.
Ion 1.1 adds _encoding directives_.

**document**<br/>
A stream of octets conforming to either the Ion text or binary specification.
Can consist of multiple _segments_, perhaps using varying versions of the Ion specification.
A document does not necessarily exist as a file, and is not necessarily finite.

**E-expression**<br/>
See _encoding expression_.

**encoding directive**<br/>
In an Ion 1.1 segment, a top-level S-Expression annotated with `$ion_encoding`.
Defines a new encoding module for the segment immediately following it.
At the end of the encoding directive, the new _encoding module_ is promoted to be the _active encoding module_.
The _symbol table directive_ is effectively a less capable alternative syntax.

**encoding environment**<br/>
The context-specific data maintained by an Ion implementation while encoding or decoding data. In
Ion 1.0 this consists of the current symbol table; in Ion 1.1 this is expanded to also include the Ion
spec version, the current macro table, and a collection of available modules.

**encoding expression**<br/>
The invocation of a macro in encoded data, also known as an _e-expression_.
Starts with a macro reference denoting the function to invoke.
The Ion text format uses "smile syntax" `(:macro ...)` to denote e-expressions.
Ion binary devotes a large number of opcodes to e-expressions, so they can be compact.

**encoding module**<br/>
A _module_ whose symbol table and macro table can be used directly in the user data stream.

**expression**<br/>
A serialized syntax element that may produce values.
_Encoding expressions_ and values are both considered expressions, whereas NOP, comments, and IVMs, for example, are not.

**expression group**<br/>
A grouping of zero or more _expressions_ that together form one _argument_.
The concrete syntax for passing a stream of expressions to a macro parameter.
In a text _e-expression_, a group starts with the trigraph `(::` and ends with `)`, similar to an S-expression.
In _template definition language_, a group is written as an S-expression starting with `..` (two dots).

**inner module**<br/>
A _module_ that is defined inside another module and only visible inside the definition of that module.

**Ion version marker**<br/>
A keyword directive that denotes the start of a new segment encoded with a specific Ion version.
Also known as "IVM".

**macro**<br/>
A transformation function that accepts some number of streams of values, and produces a stream of values.

**macro definition**<br/>
Specifies a macro in terms of a _signature_ and a _template_.

**macro reference**<br/>
Identifies a macro for invocation or exporting.
A macro reference is lexically scoped and must always be unambiguous.
It cannot be a "forward reference" to a macro that is declared later in the document.

**module**<br/>
The data entity that defines and exports both symbols and macros.

**opcode**<br/>
A 1-byte, unsigned integer that tells the reader what the next expression represents
and how the bytes that follow should be interpreted.

**optional parameter**<br/>
A parameter that can have its corresponding subform(s) omitted when the macro is invoked.
A parameter is optional if both it and the parameters that follow it in the macro signature can accept an empty stream.
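
As a rough sketch of this rule, the following Python function marks which parameters of a signature are optional. It assumes that only the `?` and `*` modifiers (zero-or-one and zero-or-more) let a parameter accept an empty stream. The function name and the list-of-modifiers input are illustrative choices, not anything defined by Ion.

```python
from typing import List


def optional_flags(modifiers: List[str]) -> List[bool]:
    """Report, for each parameter, whether it is optional.

    `modifiers` holds one cardinality modifier per parameter, in signature
    order; use '' for a parameter that has no modifier.
    """
    # Assumption: only '?' and '*' allow an empty stream.
    accepts_empty = [m in ("?", "*") for m in modifiers]
    flags = [False] * len(modifiers)
    tail_accepts_empty = True
    # Walk from the last parameter backwards so that each parameter can see
    # whether everything after it also accepts an empty stream.
    for i in range(len(modifiers) - 1, -1, -1):
        tail_accepts_empty = tail_accepts_empty and accepts_empty[i]
        flags[i] = tail_accepts_empty
    return flags
```

For a signature such as `(a? b c*)`, only `c` comes out as optional: `a` can accept an empty stream, but `b`, which follows it, cannot.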

**parameter**<br/>
A named input to a macro, as defined by its signature.
At expansion time a parameter produces a stream of values.

**qualified macro reference**<br/>
A macro reference that consists of a module name and either a macro name exported by that module,
or a numeric address within the range of the module's exported macro table. In TDL, these look
like _module-name_::_name-or-address_.

**required parameter**<br/>
A macro parameter that is not _optional_ and therefore requires an argument at each invocation.

**rest parameter**<br/>
A macro parameter—always the final parameter—declared with `*` or `+` cardinality,
that accepts all remaining individual arguments to the macro as if they were in an implicit _expression group_.
Applies to Ion text and TDL.
Similar to "varargs" parameters in Java and other languages.

**segment**<br/>
A contiguous partition of a _document_ that uses the same _active encoding module_.
Segment boundaries are caused by directives: an IVM starts a new segment (ending the prior segment, if any),
while `$ion_symbol_table` and `$ion_encoding` directives end segments (with a new one starting immediately afterward).
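
The boundary rule can be illustrated with a toy model in Python. Here a document is just a list of string labels: an IVM is written as its keyword, a directive as its annotation name, and anything else is treated as a user value. The function name, and the choice to keep a closing directive inside the segment it ends, are assumptions made for this sketch.

```python
from typing import List

IVMS = {"$ion_1_0", "$ion_1_1"}
ENDING_DIRECTIVES = {"$ion_symbol_table", "$ion_encoding"}


def split_segments(items: List[str]) -> List[List[str]]:
    """Partition a flat list of top-level items into segments."""
    segments: List[List[str]] = []
    current: List[str] = []
    for item in items:
        if item in IVMS:
            # An IVM ends the prior segment, if any, and starts a new one.
            if current:
                segments.append(current)
            current = [item]
        elif item in ENDING_DIRECTIVES:
            # The directive ends the current segment; the next one starts
            # immediately afterward.
            current.append(item)
            segments.append(current)
            current = []
        else:
            current.append(item)
    # Keep a trailing segment only if it contains something.
    if current:
        segments.append(current)
    return segments
```

The model ignores everything about the content of a directive; it only shows where one segment stops and the next begins.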

**shared module**<br/>
A module that exists independent of the data stream of an Ion document. It is identified by a
name and version so that it can be imported by other modules.

**signature**<br/>
The part of a macro definition that specifies its "calling convention", in terms of the shape,
type, and cardinality of arguments it accepts.

**symbol table directive**<br/>
A top-level struct annotated with `$ion_symbol_table`. Defines a new encoding environment
without any macros. Valid in Ion 1.0 and 1.1.

**system e-expression**<br/>
An _e-expression_ that invokes a _macro_ from the _system module_ rather than from the _active encoding module_.

**system macro**<br/>
A macro provided by the Ion implementation via the system module `$ion`.
System macros are available at all points within Ion 1.1 segments.

**system module**<br/>
A standard module named `$ion` that is provided by the Ion implementation, implicitly installed so
that the system symbols and system macros are available at all points within a document.
Subsumes the functionality of the Ion 1.0 system symbol table.

**system symbol**<br/>
A symbol provided by the Ion implementation via the system module `$ion`.
System symbols are available at all points within an Ion document, though the selection of symbols
varies by segment according to its Ion version.

**TDL**<br/>
See _template definition language_.

**template**<br/>
The part of a macro definition that expresses its transformation of inputs to results.

**template definition language**<br/>
An Ion-based, domain-specific language that declaratively specifies the output produced by a _macro_.
Template definition language uses only the Ion data model.

**unqualified macro reference**<br/>
A macro reference that consists of either a macro name or numeric address, without a qualifying module name.
These are resolved using lexical scope and must always be unambiguous.

**variable expansion**<br/>
In _TDL_, a special form that causes all argument expression(s) for the given _parameter_ to be expanded and the result of the expansion to be substituted into the _template_.
114 changes: 114 additions & 0 deletions _books/ion-1-1/src/grammar.md
@@ -0,0 +1,114 @@
# Grammar

This chapter presents Ion 1.1's _domain grammar_, by which we mean the grammar of the domain
of values that drive Ion's encoding features.

We use a BNF-like notation for describing various syntactic parts of a document, including Ion data structures.
In such cases, the BNF should be interpreted loosely to accommodate Ion-isms like commas and unconstrained ordering of struct fields.

### Documents
```bnf
document ::= ivm? segment*
ivm ::= '$ion_1_0' | '$ion_1_1'
segment ::= value* directive?
directive ::= ivm
| encoding-directive
| symtab-directive
symtab-directive ::= local-symbol-table ; As per the Ion 1.0 specification¹
encoding-directive ::= '$ion_encoding::(' module-body ')'
```

&nbsp;&nbsp;&nbsp;&nbsp;¹[Symbols – Local Symbol Tables](https://amazon-ion.github.io/ion-docs/docs/symbols.html#local-symbol-tables).

### Modules
```bnf
module-body ::= import* inner-module* symbol-table? macro-table?
shared-module ::= '$ion_shared_module::' ivm '::(' catalog-key module-body ')'
import ::= '(import' module-name catalog-key ')'
catalog-key ::= catalog-name catalog-version?
catalog-name ::= string
catalog-version ::= unannotated-uint ; must be positive
inner-module ::= '(module' module-name module-body ')'
module-name ::= unannotated-identifier-symbol
symbol-table ::= '(symbol_table' symbol-table-entry* ')'
symbol-table-entry ::= module-name | symbol-list
symbol-list ::= '[' symbol-text* ']'
symbol-text ::= symbol | string
macro-table ::= '(macro_table' macro-table-entry* ')'
macro-table-entry ::= macro-definition
| macro-export
| module-name
macro-export ::= '(export' qualified-macro-ref macro-name-declaration? ')'
```

### Macro references
```bnf
qualified-macro-ref ::= module-name '::' macro-ref
macro-ref ::= macro-name | macro-addr
qualified-macro-name ::= module-name '::' macro-name
macro-name ::= unannotated-identifier-symbol
macro-addr ::= unannotated-uint
```
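
To make the shape of these productions concrete, here is a minimal Python sketch that splits a macro reference written as plain text into its module, name, and address parts. It is not how any Ion implementation parses references. `MacroRef` and `parse_macro_ref` are hypothetical names, addresses are assumed to be plain decimal integers, and the identifier pattern is simplified (it does not exclude symbol IDs or keywords).

```python
import re
from typing import NamedTuple, Optional, Union

# Simplified identifier pattern; an assumption made for this sketch.
_IDENTIFIER = re.compile(r"[A-Za-z_$][A-Za-z0-9_$]*")
# Plain decimal address without leading zeros; also an assumption.
_ADDRESS = re.compile(r"0|[1-9][0-9]*")


class MacroRef(NamedTuple):
    module: Optional[str]    # None for an unqualified reference
    target: Union[str, int]  # macro name (str) or macro address (int)


def parse_macro_ref(text: str) -> MacroRef:
    """Parse 'module::name', 'module::3', 'name', or '3'."""
    module: Optional[str] = None
    rest = text
    if "::" in text:
        module, _, rest = text.partition("::")
        if not _IDENTIFIER.fullmatch(module):
            raise ValueError(f"invalid module name: {module!r}")
    if _ADDRESS.fullmatch(rest):
        return MacroRef(module, int(rest))
    if _IDENTIFIER.fullmatch(rest):
        return MacroRef(module, rest)
    raise ValueError(f"invalid macro reference: {text!r}")
```

A reference with a module part corresponds to `qualified-macro-ref`; one without corresponds to a bare `macro-ref`.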

### Macro definitions
```bnf
macro-definition ::= '(macro' macro-name-declaration signature tdl-expression ')'
macro-name-declaration ::= macro-name | 'null'
signature ::= '(' parameter* ')'
parameter ::= parameter-encoding? parameter-name parameter-cardinality?
parameter-encoding ::= (primitive-encoding-type | macro-name | qualified-macro-name)'::'
primitive-encoding-type ::= 'uint8' | 'uint16' | 'uint32' | 'uint64'
| 'int8' | 'int16' | 'int32' | 'int64'
| 'float16' | 'float32' | 'float64'
| 'flex_int' | 'flex_uint'
| 'flex_sym' | 'flex_string'
parameter-name ::= unannotated-identifier-symbol
parameter-cardinality ::= '!' | '*' | '?' | '+'
tdl-expression ::= operation | variable-expansion | ion-scalar | ion-container
operation ::= macro-invocation | special-form
variable-expansion ::= '(%' variable-name ')'
variable-name ::= unannotated-identifier-symbol
macro-invocation ::= '(.' macro-ref macro-arg* ')'
special-form ::= '(.' '$ion::'? special-form-name tdl-expression* ')'
special-form-name ::= 'for' | 'if_none' | 'if_some' | 'if_single' | 'if_multi'
macro-arg ::= tdl-expression | expression-group
expression-group ::= '(..' tdl-expression* ')'
```
41 changes: 6 additions & 35 deletions _books/ion-1-1/src/macros/defining_macros.md
@@ -167,7 +167,7 @@ Syntactically, the signature is an s-expression of [parameter declarations](#mac

### Template definition language (TDL)

The macro's _template_ is a single Ion value that defines how a reader should expand invovations of the macro.
The macro's _template_ is a single Ion value that defines how a reader should expand invocations of the macro.
Ion 1.1 introduces a template definition language (TDL) to express this process in terms of the macro's parameters.
TDL is a small language with only a few constructs.

@@ -209,23 +209,24 @@ $ion_encoding::(
#### Macro invocations

Macro invocations call an existing macro.
The invoked macro could be a [system macro](system_macros.md), a macro imported from a [shared module](../todo.md), or a macro previously defined in the current scope.
The invoked macro could be a [system macro](system_macros.md), a macro imported from a
[shared module](../modules/shared_modules.md), or a macro previously defined in the current scope.

Syntactically, a macro invocation is an s-expression whose first value is the operator `.` and whose second value is a macro reference.

##### Grammar
```bnf
macro-invocation ::= '(.' macro-ref macro-arg* ')',
macro-invocation ::= '(.' macro-ref macro-arg* ')'
macro-ref ::= (module-name '::')? (macro-name | macro-address)
macro-arg ::= expression | arg-group
macro-arg ::= expression | expression-group
macro-name ::= ion-identifier
macro-address ::= unsigned-ion-integer
arg-group ::= '(::' expression* ')'
expression-group ::= '(..' expression* ')'
```

##### Invocation syntax illustration
@@ -393,33 +394,3 @@ Special forms are similar to macro invocations, but they have their own expansio
See [_Special forms_](special_forms.md) for the list of special forms and a description of each.

Note that unlike macro expansions, special forms cannot accept argument groups.

#### TDL Grammar
```bnf
expression ::= ion-scalar | ion-ql-container | operation | variable-expansion
ion-scalar ::= ; <Any Ion scalar value>
ion-ql-container ::= ; <An Ion container quasi-literal>
operation ::= macro-invocation | special-form
variable-expansion ::= '(%' variable-name ')'
variable-name ::= ion-identifier
macro-invocation ::= '(.' macro-ref macro-arg* ')'
special-form ::= '(.' ('$ion::')? special-form-name expression* ')'
macro-ref ::= (module-name '::')? (macro-name | macro-address)
macro-arg ::= expression | arg-group
macro-name ::= ion-identifier
macro-address ::= ion-unsigned-integer
arg-group ::= '(::' expression* ')'
```

15 changes: 2 additions & 13 deletions _books/ion-1-1/src/modules.md
@@ -94,16 +94,5 @@ $ion_encoding::(

Many of the grammatical elements used to define modules and macros are _identifiers_--symbols that do not require quotation marks.

More explicitly, an identifier is a sequence of one or more ASCII letters, digits, or the characters `$` (dollar sign) or `_` (underscore), not starting with a digit. It also cannot be of the form `$\d+`, which is the syntax for symbol IDs. (For example: `$3`, `$10`, `$458`, etc.)

```bnf
identifier ::= identifier-start identifier-char*
identifier-start ::= letter
| '_'
| '$' letter
| '$_'
| '$$'
identifier-char ::= letter | digit | '$' | '_'
```
More explicitly, an identifier is a sequence of one or more ASCII letters, digits, or the characters `$` (dollar sign) or `_` (underscore), not starting with a digit.
It also cannot be of the form `$\d+`, which is the syntax for symbol IDs (for example: `$3`, `$10`, `$458`, etc.), nor can it be a keyword (`true`, `false`, `null`, or `nan`).
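
The prose rule above can be checked mechanically. The following Python sketch is one possible reading of it, not a normative definition; `is_identifier` and the constant names are hypothetical.

```python
import re

# One or more ASCII letters, digits, '$', or '_', not starting with a digit.
_IDENTIFIER_SHAPE = re.compile(r"[A-Za-z_$][A-Za-z0-9_$]*")
# Symbol ID syntax: '$' followed only by digits, e.g. $3 or $458.
_SYMBOL_ID = re.compile(r"\$[0-9]+")
# Keywords named in the rule.
_KEYWORDS = frozenset({"true", "false", "null", "nan"})


def is_identifier(text: str) -> bool:
    """Return True if `text` satisfies the identifier rule as worded above."""
    return (
        _IDENTIFIER_SHAPE.fullmatch(text) is not None
        and _SYMBOL_ID.fullmatch(text) is None
        and text not in _KEYWORDS
    )
```

Under this reading, `$ion` and `_x1` are identifiers, while `$3`, `3x`, and `null` are not.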