Grammar.Lex and Grammar.Syntax parts are similar in notation and functionality to Parsing Expression Grammars (PEGs); (extended) BNF-like notation can also be used (see below)
- Extra Settings
- Style Model
- Fold Model (not available)
- Match Model (not available)
- Lexical Model
- Syntax Model (optional)
- Parser
- Modularity and Future Directions
Grammar.RegExpID defines the prefix ID for any regular expressions (represented as strings) used in the grammar. The first character after the RegExpID is considered the regular expression delimiter (similar to PHP regex usage); in this way regular expression flags can be added (mostly the case-insensitive flag i).
Supported Regular Expression Flags:

- i : case-insensitive match
- x : extended regular expression, contains dynamic substitution patterns (useful for block tokens' patterns, see below)
- l : match from start of line instead of current stream position (same as starting the regular expression with the ^ special character)
example:
"RegExpID" : "RegExp::"
// .. various stuff here
"aToken" : "RegExp::/[abc]+/i", // define a regular expression /[abc]+/i, or [abc]+ with case-insensitive flag
// note: the delimiters ( /, / ) are NOT part of the regular expression
// regular expression syntax and escaping is the same as for regexes defined with the new RegExp() object in javascript
"anotherToken" : "RegExp::#[def]+#i", // define a regular expression /[def]+/i, or [def]+ with case-insensitive flag
// note: other delimiters are used ( #, # )
// .. other stuff here
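For instance, the l flag listed above could be used in the same way (the token name here is hypothetical):

"aLineToken" : "RegExp::/[abc]+/l" // matches only at the start of a line, as if the regular expression started with the ^ special character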
Grammar.Extra defines any (editor-specific) extra settings (e.g electricChars, fold, etc..) to be added to the generated mode, and is a map of the form:
editor_specific_setting_id -> editor_specific_setting_value
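As a minimal illustration (electricChars is assumed here to be a CodeMirror-specific setting, used only as a hypothetical example; fold is the generic folding setting described below):

"Extra" : {
"electricChars" : "{}", // editor-specific setting, passed through to the generated mode
"fold" : "braces" // generic, editor-independent folding setting (see below)
}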
(new, optional)
Generic, editor-independent code folding functionality is supported (by generic folder implementations). Add a fold type in the Grammar.Extra."fold" option.
Generic Folding Types supported:

- custom string, folds on custom delimiters separated by a comma, e.g "<[,]>"
- "braces", folds on braces, i.e { , } (alias of "{,}")
- "brackets", folds on brackets, i.e [ , ] (alias of "[,]")
- "parens" / "parentheses", folds on parentheses, i.e ( , ) (alias of "(,)")
- "brace" / "cstyle" / "c", folds on braces (i.e {}) and brackets (i.e []) (alias of "{,}+[,]")
- "indent" / "indentation", folds on blocks delimited by the same indentation (e.g as in python)
- "tags" / "markup" / "xml" / "html", folds based on xml/markup tags and CDATA blocks

NOTE: folding of "block"-type comments, if existing and defined as such, is done automatically; there is no need to add a separate folder option.
NOTE: One may use multiple different code folders, combined with the + operator.
For example:
"Extra" : {
// combine multiple code folders in order by "+",
// here both brace-based and tag-based (each one will apply where applicable, in the order specified)
"fold" : "braces+brackets+parens+tags"
}
(new, optional)
Generic, editor-independent code token matching functionality is supported (by generic matcher implementations). Add a match type in the Grammar.Extra."match" option.

Note: If the match setting is not given, match takes settings analogous to the fold settings (if set).
Generic Matching Types supported:

- custom string, matches custom delimiters separated by a comma, e.g "<[,]>"
- "braces", matches braces, i.e { , } (alias of "{,}")
- "brackets", matches brackets, i.e [ , ] (alias of "[,]")
- "parens" / "parentheses", matches parentheses, i.e ( , ) (alias of "(,)")
- "brace" / "cstyle" / "c", matches braces (i.e {}), brackets (i.e []) and parentheses (i.e ()) (alias of "{,}+[,]+(,)")
- "tags" / "markup" / "xml" / "html", matches xml/markup tags (not implemented yet)
NOTE: One may use multiple different code token matchers, combined with the + operator.
For example:
"Extra" : {
// combine multiple token matchers in order by "+",
// here both custom and brace-based (each one will apply where applicable, in the order specified)
"match" : "<[,]>+braces+brackets+parens"
}
Grammar.Style Model defines the mapping of tokens to editor styles and is a map of the form:
token_id -> editor_specific_style_tag
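A minimal sketch of a Style map (the style tags on the right are assumed examples; actual tag names depend on the target editor):

"Style" : {
"comment_token" : "comment",
"keyword_token" : "keyword",
"identifier_token" : "variable",
"number_token" : "number"
}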
(not available)
In future, support a parametrisable Grammar.Fold Model which can parametrise code folders for user-defined custom code folding (see above).
(not available)
In future, support a parametrisable Grammar.Match Model which can parametrise code token matchers for user-defined custom code token matching (see above).
Grammar.Lex Model defines the mapping of token patterns and token configuration to an associated token_id and is a map of the form:
token_id -> token_configuration_object

A token configuration can be:
- a pattern or array of patterns
- an object having at least (some or all of) the following properties:
  - "type" : token type (default "simple")
  - "msg" : token error message (default: the token's default error message)
  - "tokens" : pattern or array of patterns for this token
  - further properties depending on token type (see below)
- optionally, the token "type" can be annotated inside the token_id (see below)
- a token type can be "simple" (default), "block", "line-block", "escaped-block", "escaped-line-block", "comment" or "action"
- a token can extend / reference another token using the extend property; this way, two tokens that share common configuration but different styles (depending on context) can be defined only once. Examples are tokens for identifiers and properties: while both have similar tokenizers, the styles (and probably other properties) can be different (see syntax notations for a more convenient and more flexible alternative).
// Style
//
// .. stuff here..
"identifier" : "style1",
"property" : "style2",
// ..other stuff here..
// Lex
//
// .. stuff here..
"identifier": "RegExp::/[a-z]+/",
"property": {
"extend" : "identifier" // allow property token to use different styles than identifier, without duplicating everything
},
// ..other stuff here..
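For reference, the full object form of a token configuration (using the "type", "msg" and "tokens" properties described above) might look like this minimal sketch (token name, message and pattern are hypothetical):

"a_token" : {
"type" : "simple", // the default type, could be omitted
"msg" : "Unexpected token", // custom error message for this token
"tokens" : "RegExp::/[a-z_][a-z0-9_]*/i" // pattern or array of patterns
}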
- a literal null valued token matches up to 'end-of-line'; this can be useful when needing to match the remaining code up to end-of-line (e.g see single-line comments below)
- a literal false or 0 valued token matches the empty production; this can be useful in defining syntax sequences that can repeat (it can be used as an alternative to the "zeroOrMore", "oneOrMore" group types, plus give a familiar feel to defining rule productions)
- a literal empty string token ('') matches non-space; this can be useful when multiple tokens should be consecutive with no space between them
- a literal string becomes a token (e.g inside a Syntax Model sequence) with a tokenID same as its literal value
- a token can be defined using just the token_id and the token pattern(s); the token type is assumed "simple"
- multiple "simple" tokens (which are NOT regular expressions) are grouped into one regular expression by default using the "\\b" (word-boundary) delimiter; this is useful for speed fine-tuning the parser; adding the "combine" property in the token configuration can alter this option, or use a different delimiter
- "simple" tokens can also be used to enable keyword autocomplete functionality ("autocomplete" : true|false[keywords] option, see the sketch below)
- other tokens, which would possibly match similarly to this token, can be excepted, for example keywords or reserved tokens ("except" : ["other_token1","other_token2",..] option, see the sketch below)
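A short sketch of the autocomplete and except options mentioned above (token names, keywords and patterns are illustrative assumptions):

"keyword" : {
"autocomplete" : true, // enable keyword autocompletion for these tokens
"tokens" : ["if", "else", "for", "while"]
},
"identifier" : {
"except" : ["keyword"], // do not match what is already matched as a keyword
"tokens" : "RegExp::/[a-z_][a-z0-9_]*/i"
}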
- "block", "line-block", "escaped-block", "escaped-line-block", "comment" token types take pairs of patterns [start-pattern, end-pattern]
- if "end-pattern" is missing, "end-pattern" is the same as "start-pattern"
- if "end-pattern" has the (literal) value null, "end-pattern" matches what remains up to end-of-line (e.g in single-line comment blocks)
- if "end-pattern" is a string, it can contain special string replacement patterns; the "end-pattern" will be generated dynamically from the respective "start-pattern" by replacement (provided start-pattern is a regex); make sure to properly escape literal "$" values in order to match them correctly (see previous link)
- if "end-pattern" is a number, then the "end-pattern" is generated dynamically from the respective "start-pattern" match group (provided start-pattern is a regex)
- "escaped-block", "escaped-line-block" type is a "block" which can contain its "end-pattern" if it is escaped; "strings" are classic examples
- "block", "comment", "escaped-block" can span multiple lines by default; setting the "multiline" : false option can alter this
- "escaped-line-block", "line-block" type is a "block" which can span only a single line ("multiline" : false)
- "escaped-block", "escaped-line-block" by default use the "\\" escape character; setting the "escape" : escChar option can alter this
- "comment" type is a "block" type with the additional semantic information that the token is about comments, so comment toggle functionality and comment interleave functionality can be enabled
- "comment" type can be interleaved inside other syntax sequences automatically by the parser ("interleave" : true token option) (the comment token should still be added in the grammar.Parser part); otherwise comment interleaving can be handled manually in the grammar.Syntax part
- all block-type tokens can have different styles for block delimiters and block interior (see examples); for example, having a block token with ID "heredoc", the different interior style can be represented in the Style part of the grammar as "heredoc.inside" (make sure your tokenIDs do not accidentally match this option)
- all block-type tokens can be empty (default); if a block must necessarily contain something (be non-empty, e.g regular expressions modeled as blocks), set the "empty" flag to false
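A short sketch of typical block-type token definitions based on the rules above (token names and delimiters are illustrative, not taken from a specific grammar):

"line_comment" : {
"type" : "comment",
"tokens" : [["//", null]] // null end-pattern: matches up to end-of-line
},
"block_comment" : {
"type" : "comment",
"tokens" : [["/*", "*/"]] // can span multiple lines by default
},
"string" : {
"type" : "escaped-block", // may contain its end-pattern if escaped (default escape is "\\")
"tokens" : [["\""]] // end-pattern omitted: same as start-pattern
}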
(new, experimental)
Action tokens enable the grammar parser to perform some extra context-specific parsing functionality on tokens.
An action token in a grammar applies only and directly to the token preceding it; it performs an action on that token only.
- "action" tokens can start ("context" : true, "hypercontext" : true) and end ("context" : false, "hypercontext" : false) a new dynamic context or hypercontext respectively, so that successive actions take place in that (hyper)context; for example, unique object literal properties, unique xml tag attributes and locally-scoped variables can be done this way (see test/grammars/xml.js for an example)
- "action" tokens can define a new symbol or can undefine an already defined symbol extracted from the (preceding) matched token (very experimental)
- "action" tokens can test whether a symbol extracted from the (preceding) matched token is already defined or notdefined (very experimental)
- "action" tokens can push or pop string IDs onto the data stack, generated from the (preceding) matched token; for example, associated tag matching can be done this way (see test/grammars/xml.js for an example)
- "action" tokens can check that the (preceding) matched token is unique; for example, unique identifier checking can be done this way (see test/grammars/xml.js for an example)
- "action" tokens can generate a hard error in context ("error" : "error message"); for example, special syntax errors can be modeled in the grammar as needed (see test/grammars/xml.js for an example)
- .. more actions to be added, like indent / outdent etc..
- multiple "action" tokens in sequence are applied to the same preceding token
- "actions" that are successful can transfer their style modifier (if they have one) to the token itself (for example scoped/local variables can be handled this way)
Note: the unique action is equivalent to a combination of the notdefined and define actions; however, for convenience of use and implementation it is taken as one action.
Example:
// stuff here..
"match" : {
// this will push a token ID generated from the matched token
// it pushes the matched tag name (as regex group 1)
// string replacement codes are similar to javascript's replace function
"type" : "action",
"push" : "<$1>"
},
"matched" : {
// this will pop a token ID from the stack
// and try to match it to the ID generated from this matched token
// it pops and matches the tag name (as regex group 1)
// string replacement codes are similar to javascript's replace function
// note 1: the ID has similar format to the previously pushed ID
// any format can be used as long as it is consistent (so matching will work correctly)
// note 2: a "pop" : null or with no value, pops the data unconditionally (can be useful sometimes)
"type" : "action",
"pop" : "<$1>",
"msg" : "Tag \"$1\" does not match!"
}
// other stuff here..
// syntax part
"start_tag": "open_tag match tag_attribute* close_open_tag",
"end_tag": "close_tag matched",
// other stuff here..
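In the same spirit, a dynamic context can be opened and closed around a syntax region using the "context" action flags described above; a minimal sketch (token names are hypothetical; see test/grammars/xml.js for a real example):

"ctx_start" : {
// start a new dynamic context (e.g the inside of a tag or an object literal)
"type" : "action",
"context" : true
},
"ctx_end" : {
// end the dynamic context started above
"type" : "action",
"context" : false
}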
Action tokens (as well as lookahead tokens) enable (strong) context-sensitive parsing by supporting lookbehind and context-sensitive functionality. For example, one (usual) way to define a (general) context-sensitive grammar production is:
Cb A Ca ==> B C
A production like the above will expand the A non-terminal depending on the surrounding (the context) Cb (i.e context before) and Ca (i.e context after) (non-)terminals (using generic non-terminals results in strong context-sensitivity, also referred to as unrestricted). By transposing the context (non-)terminals to the other side of the production rule, we get:
A ==> Cb> B C <Ca
They become lookbehind (i.e Cb>) and lookahead (i.e <Ca) kinds of tokens respectively. Lookahead tokens cover some of this functionality, and (context) actions that store/lookup (previous) symbols cover the rest. Thus non-trivial context-sensitive parsing becomes possible in an otherwise simple grammar.
One should not confuse semantics with context-sensitivity in a grammar. For example, the claim that "whether a symbol is already defined or is unique or matches another symbol is a semantic action and not related to syntax" is misleading, because context-sensitive syntax in fact handles these cases. What would seem like semantics for so-called context-free grammars is just syntax (in context; context implies positional memory) for so-called (general) context-sensitive grammars.
There are other approaches to (mild or strong) context-sensitive grammar parsing, for example: using indices in productions (indexed grammars) or extra attached attributes (attribute grammars), visibly pushdown grammars, or parsing expression grammars with lookaheads. This approach uses a set of simple, generic and composable action tokens, which can also support or simulate other approaches (e.g visibly pushdown tokens, lookahead tokens, lookbehind tokens, attached attributes, etc). Another way to look at it is that action tokens make it possible to declaratively describe non-linear syntax (involving multiple correlated dimensions), which in the final analysis is context-sensitivity (e.g positional memory and correlation).
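As a small illustration of the lookahead part (<Ca) of the transposed production above, the positiveLookahead shorthand "&" (described in the Syntax part below) can require a following token without consuming it (rule and token names are hypothetical):

// expand "value" only when a closing parenthesis follows, without consuming it
"paren_value" : "value close_paren&"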
(new, optional)
Lexical tokens can annotate their type in their token_id as "token_id:token_type" for convenience.
Example:
"a_token:comment": [["<--", "-->"]]
// is equivalent to =>
"a_token": {
"type": "comment",
"tokens": [["<--", "-->"]]
}
"a_token:action": {"push":"$1"}
// is equivalent to =>
"a_token": {
"type": "action",
"push": "$1"
}
// and so on..
(optional)
Grammar.Syntax Model defines the mapping of token context-specific sequences to an associated composite_token_id and is a map of the form:
composite_token_id -> composite_token_configuration_object
- Inside the Syntax Model, (single) tokens can also be defined (similar to the Lex part); however, it is recommended that (single) tokens be defined in the Lex part
- Syntax includes special token types (like generalised regular expressions for composite token sequences, or parsing expressions in the style of PEGs)
- Syntax token "type" can be: "alternation", "sequence", "zeroOrOne", "zeroOrMore", "oneOrMore", "repeat", "positiveLookahead", "negativeLookahead"; these tokens contain sequences of subtokens ("tokens") to be matched according to some scheme
- types:
  - "type" : "repeat", "repeat" : [min, max], match any of the tokens a minimum of min times and a maximum of max times, else error (analogous to a regex: (t1 | t2 | t3..){min, max}, where t1, t2, etc are also composite tokens)
  - "type" : "zeroOrOne", match any of the tokens zero or one times (analogous to a regex: (t1 | t2 | t3..)?, where t1, t2, etc are also composite tokens)
  - "type" : "zeroOrMore", match any of the tokens zero or more times (analogous to a regex: (t1 | t2 | t3..)*, where t1, t2, etc are also composite tokens)
  - "type" : "oneOrMore", match any of the tokens one or more times (analogous to a regex: (t1 | t2 | t3..)+, where t1, t2, etc are also composite tokens)
  - "type" : "alternation", match any of the tokens (analogous to a regex: (t1 | t2 | t3..), where t1, t2, etc are also composite tokens)
  - "type" : "sequence", match all the tokens in sequence (analogous to a regex: (t1 t2 t3 ..), where t1, t2, etc are also composite tokens), else error
  - "type" : "positiveLookahead", try to match the token without consuming it and return whether it was successful or not
  - "type" : "negativeLookahead", try to match the token without consuming it and return whether it failed or not
- a "subgrammar" or "grammar" syntax token type represents a reference to another (external) grammar to be used as a sub-grammar inside the current grammar (new feature, experimental). For example, an html grammar can make use of css and javascript sub-grammars in special places inside the whole grammar, like below (see test examples for further info):
// other Syntax definitions..
/* token name to be used inside THIS grammar */
"javascript" : {
/* the external grammar reference name, here it is the same as the token name */
"subgrammar":"javascript"
},
"tag_attribute" : "attribute '=' (string | number)",
/* here `javascript` refers to the token name, NOT the sub-grammar name, although in this case they are the same */
"script_tag" : "'<script'.tag tag_attribute* '>'.tag javascript '</script>'.tag",
"other_tag" : "tag_open | tag_close | text",
"tag" : "script_tag | other_tag",
"html" : "tags*"
// other Syntax definitions..
This way multiple grammars can be combined and multiplexed easily and intuitively. Note: another way to combine grammars is to actually merge all needed grammars into a single grammar specification (this can also be useful, but is more tedious).
- a syntax token can contain (direct or indirect) recursive references to itself (note: some rule factoring may be needed, to avoid grammar left-recursion or ambiguity)
- an "ngram" or "n-gram" syntax token type is similar to a "type" : "sequence" type, with the difference that it is only matched optionally (suitable to be used in the grammar.Parser part)
- a syntax token in Parser that is wrapped in [..] (i.e as an array) is shorthand notation to interpret it as an n-gram-type token (for convenience of notation; one can see it in various example grammars)
- it is recommended to have only Lex.tokens or Syntax.ngrams in the grammar.Parser part and not Syntax.group tokens, which are mostly auxiliary
- the grammar.Syntax part is quite general and flexible and can define a complete language grammar; however, since this is for syntax highlighting and not for code generation, defining only the necessary syntax chunks can be lighter
(new)
The Syntax part supports shorthand definitions (similar to PEG / BNF-style definitions) for syntax sequences and groups of syntax sequences (see below).
Note: In order for a (grammar) specification for (programming) language highlighting to be detailed and self-contained (to be re-usable and flexible under the most general conditions), it should model all available types of tokens which play a part in the code and the way it is highlighted (usually not modeled by other approaches or other grammar specifications), for example:

- <start-of-file> token (modeled as ^^, see below)
- <first-non-blank-line> token (modeled as ^^1, see below)
- <start-of-line> token (modeled as ^, see below)
- <end-of-line> token (modeled as $, see below)
- <up-to-line-end> token (modeled as null, see above, block tokens)
- <non-space> token (modeled as '', see below)
- <indentation> token (not available yet)
- <de-indentation> token (not available yet)
- and so on..
Specifically:
"t": "t1 | t2 | t3"
// is equivalent to =>
"t": {
"type": "alternation",
"tokens": ["t1", "t2", "t3"]
}
"t": "t1*"
// is equivalent to =>
"t": {
"type": "zeroOrMore",
"tokens": ["t1"]
}
"t": "t1+"
// is equivalent to =>
"t": {
"type": "oneOrMore",
"tokens": ["t1"]
}
"t": "t1?"
// is equivalent to =>
"t": {
"type": "zeroOrOne",
"tokens": ["t1"]
}
"t": "t1&"
// is equivalent to =>
"t": {
"type": "positiveLookahead",
"tokens": ["t1"]
}
"t": "t1!"
// is equivalent to =>
"t": {
"type": "negativeLookahead",
"tokens": ["t1"]
}
"t": "t1{1,3}"
// is equivalent to =>
"t": {
"type": "repeat",
"repeat": [1,3], // match minimum 1 times and maximum 3 times
"tokens": ["t1"]
}
"t": "t1* | t2 | t3"
// is equivalent to =>
"t1*": {
"type": "zeroOrMore",
"tokens": ["t1"]
},
"t": {
"type": "alternation",
"tokens": ["t1*", "t2", "t3"]
}
"t": "t1* t2 | t3"
// is equivalent to =>
"t1*": {
"type": "zeroOrMore",
"tokens": ["t1"]
},
"t1* t2": {
"type": "sequence",
"tokens": ["t1*", "t2"]
},
"t": {
"type": "alternation",
"tokens": ["t1* t2", "t3"]
}
// tokens can be grouped using parentheses
"t": "(t1 t2)* | t3"
// is equivalent to =>
"t1 t2": {
"type": "sequence",
"tokens": ["t1", "t2"]
},
"(t1 t2)*": {
"type": "zeroOrMore",
"tokens": ["t1 t2"]
},
"t": {
"type": "alternation",
"tokens": ["(t1 t2)*", "t3"]
}
// literal tokens wrapped in quotes (' or ") (e.g 'abc') are equivalent to their literal value (i.e abc)
// literal tokens wrapped in brackets (e.g [abc]) are equivalent to matching any of the enclosed characters (like in regular expressions)
// note: LIKE REGULAR EXPRESSIONS having ^ first in a character selection, e.g [^abc] provides a NEGATIVE match
// note2: the PEG features
// 1. negative lookahead feature (i.e not-predicate !t)
// 2. positive lookahead feature (i.e and-predicate &t)
// ARE supported (see above)
// regular expressions can also be defined inside a syntax rule, using /../[i] format, e.g /abc/, /abc/i
// this will create a new simple token (with token_id=/abc/i) which is a regular expression
// using a dotted-style modifier on a regex defined this way can style it dynamically (see below), e.g /abc/i.style1
// empty literal token w/ quotes (i.e '') matches NON-SPACE production (i.e fails if space is encountered)
// zero literal token w/o quotes (i.e 0) matches EMPTY production (i.e succeeds always)
// ^^ literal token w/o quotes (i.e ^^) matches SOF (i.e start-of-file, first line of code)
// ^^1 literal token w/o quotes (i.e ^^1) matches FNBL (i.e first non-blank line of code)
// ^ literal token w/o quotes (i.e ^) matches SOL (i.e start-of-line, any line)
// $ literal token w/o quotes (i.e $) matches EOL (i.e end-of-line, any line, along with any extra space)
// e.g
"t": "t1 '=' t2"
// is equivalent to =>
"t_equal": {
"type": "simple",
"tokens": "="
},
"t": {
"type": "sequence",
"tokens": ["t1", "t_equal", "t2"]
}
// any (simple or composite) token followed by a dotted-style modifier, uses this style in context
// for example, the token "t3", below, will be styled differently depending on context
// i.e whether it comes after "t1" or after "t2"
// ..
// Style..
"Style": {
"style1": "an-editor-style",
"style2": "another-editor-style"
}
// ..
// Syntax..
"t": "t1 t3.style1 | t2 t3.style2"
// style modifier for composite tokens also works
// i.e below both "t2" and "t3" tokens will be styled with "style1", if after "t1"
// note: ".style1" overrides internal "t2.style2" modifier as well
"t": "t1 (t2.style2 t3).style1"
// and so on..
// ..
Grammar.Parser defines what to parse and in what order (only patterns defined in this part of the grammar will actually be parsed; everything else is auxiliary).
- a syntax token used in the parser which is enclosed in brackets [..], i.e as an array, is interpreted as an n-gram-type token (see syntax part, above); this is used for convenience and succinctness of notation, instead of creating separate n-gram token(s) for use in the parser part
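A minimal sketch of a Parser part using these conventions, assuming it is given as an ordered list of token ids (the names are illustrative; single entries refer to Lex tokens or Syntax ngrams, while the bracketed entry is shorthand for an n-gram-type token):

"Parser" : [
"comment_token", // a Lex token, parsed as-is
"string_token",
[ "tag" ], // wrapped in [..] : interpreted as an n-gram-type token
"keyword_token"
]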
The model envisioned for modular highlighting is shown below:
- User specifies a specification (a grammar model) for a language
- The specification, along with the source code (to be highlighted), is passed to the analyser (parser), which uses the specification to analyse the code and extract data (see below)
- The analysed code data and the source code are passed to the renderer (editor, e.g codemirror, ace, ..) which renders the code and interacts with the user
More or less this model is used now by highlight editors (e.g codemirror, ace, ..), however it is not modular.
The new specification-analyser-renderer model has the following features / advantages:
- Specification is at the same time both concise and detailed, self-contained (this is achieved by modeling all necessary features of a language in a symbolic way, see below).
- Specifications can be extended / merged / combined to construct new specifications for variations, dialects, composite languages, with minimum effort (this is achieved by the appropriate format for a specification and how/what it models, see below).
- The model is based on interfaces for the analyser and renderer, while (underlying) implementations can vary as needed. This decouples highlighting specifications from underlying implementations and from differences between platforms, editors, languages and so on..
- Optionally, a specification can be directly transformed into hard code (highlight-mode source code for a language) and be used directly (in place of the analyser) without re-doing the first two stages (specification - analyser).
The format for a specification should be such that it is widely supported, is textual or also has an equivalent strictly-textual representation, is concise, and enables composition operations. JSON is such a format (and ubiquitous in javascript-based projects) and is the format used by the grammar (specification).
In order for a (grammar) specification for (programming) language highlighting to be detailed and self-contained (to be re-usable and flexible under the most general conditions), and also concise, it should model all available types of tokens/actions which play a part in the code and the way it is highlighted (usually not modeled by other approaches or other grammar specifications).
For example (see above):
- <start-of-file>, <start-of-line>, <end-of-line> tokens
- <first-non-blank-line>, <up-to-line-end> tokens
- <non-space>, <indentation>, <dedentation> tokens
- <block>, <delimited> tokens (e.g in the sense that comments, strings, heredocs, .. are delimited, block tokens)
- and so on..
- handle arbitrary, user-defined, toggle-comment and keyword-autocompletion functionality (achieved by semantically annotating language comments and keywords)
- handle arbitrary, user-defined, lint-like syntax-annotation functionality (achieved by modeling language syntax analysis, beyond strictly lexical analysis, in-context)
- handle arbitrary, user-defined, code folding (e.g via a fold action token or via a Fold Model, see above)
- handle arbitrary, user-defined, code matching (e.g brackets, tags, etc..) (e.g via a match action token or via a Match Model, see above)
- handle arbitrary, user-defined, code (out-)indentation (e.g via an indent action token or via an Indentation Model)
- handle arbitrary, user-defined, non-linear token/symbol matching via push action and pop action tokens
- handle arbitrary, user-defined, dynamic contexts and hyper-contexts via context action and hypercontext action tokens
- handle arbitrary, user-defined, dynamic symbol store and lookup functionality via define action, defined action, undefine action and notdefined action tokens
- handle arbitrary, user-defined, unique/duplicate identifier checking via the unique action token
- and so on..
The difference between the <indentation> token and the <indent> action token (although related) is that the indentation token recognizes indentations and indentation changes (that may signal a code block, for example like braces do), while an indent action token creates dynamic indentation (e.g by inserting indentation tokens, or symbols to be recognised by indentation tokens).