nvim-neorg · vhyrro · Aug 8, 2024 · Aug 15, 2024
diff --git a/2.0-specification.norg b/2.0-specification.norg
@@ -0,0 +1,189 @@
+@document.meta
+title: Norg v2 Specification
+authors: [
+    vhyrro
+    mrossinek
+]
+categories: specifications
+version: 2.0
+@end
+
+* Norg File Format Specification
+
+  This file contains the formal file format specification of the Norg syntax version 2.0.
+  This document is itself written in the Norg format and therefore attempts to be
+  self-documenting.
+
+  Please note that this is *not* a reference implementation. This document acts as a specification
+  that must be strictly followed when implementing a parser, so that different applications do not
+  end up interpreting the same document differently.
+
+* Introduction
+
+  Norg is a structured, plaintext markup format. It's designed to be viewed
+  standalone while also providing a suite of markup utilities for typesetting
+  structured documents.
+
+  The format is geared towards a variety of use cases: from creating basic notes and writing project plans to
+  interactive code execution and spreadsheets. The syntax itself is lightweight and easy to
+  reason about.
+
+  Compared to other plain-text file formats such as Markdown, Org, RST or AsciiDoc, Norg
+  sets itself apart most notably by following a strict philosophy and ruleset:
+  ~ *Consistency:* the syntax should be consistent. Even if you know only a
+    part of the syntax, learning new parts should not be surprising and rather
+    feel predictable and intuitive.
+  ~ *Simplicity:* the syntax itself should not do any typesetting by default, but should provide all
+    the utilities required for users to perform typesetting themselves.
+  ~ *Unambiguity:* the syntax should leave _no_ room for ambiguity. This is
+    especially motivated by the use of
+    [tree-sitter]{https://tree-sitter.github.io/tree-sitter/} for the original
+    syntax parser, which takes a strict left-to-right parsing approach and only
+    has single-character look-ahead.
+  ~ *[Free-form]{https://en.wikipedia.org/wiki/Free-form_language}:* whitespace
+    is _only_ used to delimit tokens but has no other significance! This is
+    probably the most contrasting feature to other plain-text formats which
+    often adhere to the [off-side
+    rule]{https://en.wikipedia.org/wiki/Off-side_rule}, meaning that the syntax
+    relies on indentation to carry meaning.
+
+  Although built with the note-taking tool *Neorg* in mind, Norg can be used in a wide range of applications,
+  from external note-taking plugins to messaging services. This is made possible by {* layers}, a
+  pay-for-what-you-use system that lets each application select only the syntax relevant to it.
+
+* Notation
+
+  Syntax is described using the following notation:
+  - Special syntax is wrapped in `<>` characters. For example: `<title>`.
+  - One or more repetitions of an object (with no whitespace in between) are denoted via `+`: `<char>+`.
+  - Obligatory whitespace (one or more) is denoted as a single space.
+  - Obligatory newlines (one or more) are denoted as a single newline character.
+
+  Examples:
+  - `<special> <title>` - some special character, one or more whitespace characters, then a title.
+  - %TODO: Expand%
+
+* Concepts
+
+  This chapter defines the fundamental concepts of the Norg format, which other syntax elements build upon.
+
+** Characters
+
+   The smallest unit of text in Norg is the /character/. A character is any
+   Unicode [code point]{https://en.wikipedia.org/wiki/Code_point} or
+   [grapheme]{https://www.unicode.org/glossary/#grapheme}.
+
+   We identify several types of characters, listed below.
+
+*** Whitespace
+
+    Any set of text can be delimited by whitespace. Consecutive whitespace characters
+    are collapsed to a single space during rendering.
+
+    Whitespace constitutes any code point in the [Unicode Zs general category]{https://www.fileformat.info/info/unicode/category/Zs/list.htm}.
+
+    Tabs are not expanded to spaces during rendering, and since whitespace has no semantic meaning, there is no need
+    to define a default tab stop. However, if a parser must (for implementation reasons) define a
+    tab stop, we suggest setting it to 4 spaces.
+
+    Any line may be preceded by a variable amount of whitespace, which should
+    be ignored. Upon encountering a {*** line endings}[line ending], it is
+    recommended for parsers to continue consuming (and discarding) consecutive
+    whitespace characters exhaustively.
+
+    The "start of a line" is considered to be /after/ this initial whitespace has been parsed.
+    Keep this in mind when reading the rest of the document.
+
+*** Punctuation
+
+    A character is considered punctuation if it belongs to any of the following Unicode general categories:
+    - `Pc` (connector)
+    - `Pd` (dash)
+    - `Pe` (close)
+    - `Pf` (final quote)
+    - `Pi` (initial quote)
+    - `Po` (other)
+    - `Ps` (open)
+
+*** Line Endings
+
+    Line endings are distinct from {*** whitespace} in Norg because they mark
+    boundaries between parts of text. Norg complies with the
+    {https://www.unicode.org/standard/reports/tr13/tr13-5.html}[Unicode newline
+    guidelines] and uses the terminology defined in that document.
+
+    Below are the accepted combinations of newline characters and their functions:
+    - LS - should be parsed as-is and act as a Line Separator.
+    - PS - should be parsed as-is and act as a Paragraph Separator.
+    - NLF (any of `CR`, `LF`, `CRLF`, or `NEL`) - *should not* be treated as an LS character as outlined in the Unicode guidelines, but rather
+      as a regular, non-separating newline character, so that lines joined by a single NLF flow into one paragraph.
+    - Two consecutive NLF sequences - should emulate the behaviour of a Paragraph Separator.
+
+    Examples:
+    @table
+    |------------------------------------------------|
+    | Hello,<CR>         | <p>Hello, world!</p>      |
+    | world!             |                           |
+    |------------------------------------------------|
+    | Hello,<LF>         | <p>Hello, world!</p>      |
+    | world!             |                           |
+    |------------------------------------------------|
+    | Hello,<CRLF>       | <p>Hello, world!</p>      |
+    | world!             |                           |
+    |------------------------------------------------|
+    | Hello,<NEL>        | <p>Hello, world!</p>      |
+    | world!             |                           |
+    |------------------------------------------------|
+    | Hello,<LS>         | <p>Hello,<br/> world!</p> |
+    | world!             |                           |
+    |------------------------------------------------|
+    | Hello,<PS>         | <p>Hello,</p>             |
+    | world!             | <p>world!</p>             |
+    |------------------------------------------------|
+    | Hello,<CR><CR>     | <p>Hello,</p>             |
+    | world!             | <p>world!</p>             |
+    |------------------------------------------------|
+    | Hello,<LF><LF>     | <p>Hello,</p>             |
+    | world!             | <p>world!</p>             |
+    |------------------------------------------------|
+    | Hello,<CRLF><CRLF> | <p>Hello,</p>             |
+    | world!             | <p>world!</p>             |
+    |------------------------------------------------|
+    | Hello,<NEL><NEL>   | <p>Hello,</p>             |
+    | world!             | <p>world!</p>             |
+    |------------------------------------------------|
+    @end
+
+*** Text
+
+    All other characters not described by the previous sections should be considered a "text" character.
+    Consecutive text characters make up /words/.
+
+*** Escaped Character
+
+    Any character can be escaped through the use of the backslash (`\\`). The escape sequence consumes only
+    the next character.
+
+    An escaped character should be treated as distinct from regular text or whitespace. This becomes important
+    when parsing {* inline items}.
+
+** Paragraphs
+
+   Paragraphs are a combination of {*** line endings}, {*** whitespace} and {*** text}.
+   They may also contain any number of {* inline items}.
+
+   Paragraphs are terminated by a Paragraph Separator (PS), by two consecutive NLF sequences (see {*** line endings}),
+   or by the end of the file (EOF). They are also implicitly terminated by other {* block-level items}.
+
+* Layers
+
+  Norg is built up of layers: sets of features that a parser or tool can choose to support,
+  depending on how much of the specification it needs to handle.
+
+  It's recommended to stick to these layers when implementing Norg in your own application (as it's
+  easy to tell end users that an application supports e.g. "layer 2" of the Norg specification), but
+  they cannot cover every possible use case. In such cases you can use a /custom layer/
+  and pick and choose what you want to support. Just make sure to let your users know which features
+  you've implemented, so they don't get confused!
+
+  %TODO(vhyrro): Finish once the spec is done%
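The character classes in the *Concepts* chapter of the 2.0 specification above map onto Unicode general categories, so they are straightforward to check in code. Below is a minimal Python sketch, not a reference implementation, of how a parser might classify code points and apply the leading-whitespace and backslash-escape rules. `CharKind`, `classify` and `tokenize` are hypothetical names. Two assumptions deserve a flag. TAB (U+0009) is treated as whitespace even though it belongs to category `Cc` rather than `Zs`, because the specification discusses tab stops. Each code point is also classified on its own, so grapheme clusters are not handled.

```python
import unicodedata
from enum import Enum


class CharKind(Enum):
    WHITESPACE = "whitespace"
    PUNCTUATION = "punctuation"
    LINE_ENDING = "line_ending"
    ESCAPED = "escaped"
    TEXT = "text"


# General categories listed in the "Punctuation" section.
PUNCTUATION_CATEGORIES = {"Pc", "Pd", "Pe", "Pf", "Pi", "Po", "Ps"}

# CR, LF, NEL, LS and PS from the "Line Endings" section
# (CRLF shows up here as two consecutive line-ending code points).
LINE_ENDING_CHARS = {"\r", "\n", "\u0085", "\u2028", "\u2029"}


def classify(ch: str) -> CharKind:
    """Classify a single code point (grapheme clusters are not handled)."""
    if ch in LINE_ENDING_CHARS:
        return CharKind.LINE_ENDING
    category = unicodedata.category(ch)
    # Assumption: TAB (U+0009, category Cc) counts as whitespace, because the
    # spec discusses tab stops even though the Zs category excludes it.
    if category == "Zs" or ch == "\t":
        return CharKind.WHITESPACE
    if category in PUNCTUATION_CATEGORIES:
        return CharKind.PUNCTUATION
    return CharKind.TEXT


def tokenize(text: str) -> list[tuple[CharKind, str]]:
    """Turn text into (kind, character) pairs.

    Whitespace at the start of each line is discarded, and a backslash
    escapes exactly one following character. A trailing backslash with
    nothing after it is kept as an ordinary character (an assumption).
    """
    tokens: list[tuple[CharKind, str]] = []
    at_line_start = True
    i = 0
    while i < len(text):
        ch = text[i]
        kind = classify(ch)
        if at_line_start and kind is CharKind.WHITESPACE:
            i += 1  # indentation before the "start of a line" is ignored
            continue
        at_line_start = False
        if ch == "\\" and i + 1 < len(text):
            # The escape sequence consumes only the next character.
            tokens.append((CharKind.ESCAPED, text[i + 1]))
            i += 2
            continue
        tokens.append((kind, ch))
        if kind is CharKind.LINE_ENDING:
            at_line_start = True
        i += 1
    return tokens
```

With this sketch, `tokenize("  a\\*b")` discards the indentation and returns a text token for `a`, an escaped `*` and a text token for `b`. Keeping `ESCAPED` separate from `TEXT` would let a later inline-markup pass ignore the escaped `*` when it looks for delimiters.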
diff --git a/1.0-specification.norg b/older-specifications/1.0-specification.norg
@@ -145,6 +145,7 @@ version: 1.0
    whitespace is also considered empty.
 
 * Detached Modifiers
+
   Norg has several detached modifiers. The name originates from their differentiation to the
   {* attached modifiers}, which will be discussed later. These make up the majority of the syntax.
 
@@ -1720,6 +1721,7 @@ version: 1.0
   The link takes precedence, and no bold is rendered.
 
 * Layers
+
   Norg is built up of layers, or in other words a set of features that a parser/tool can support
   depending on how much of the specification they'd like to deal with.
 

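The *Line Endings* section of the 2.0 specification pairs its rules with an examples table. A minimal Python sketch of those rules, rendering plain text to HTML paragraphs, might look as follows. It is an illustration under stated assumptions, not part of the specification. Escapes and inline items are ignored, TAB is treated as whitespace, and `render`, `_normalise_ws` and `_flush` are hypothetical names. LS is emitted as `<br/>` without the space that follows it in the table.

```python
import html
import re
import unicodedata

# One capture group so re.split keeps the separators.
# CRLF is listed before CR so that it is consumed as a single NLF.
_BREAK = re.compile(r"(\r\n|\r|\n|\u0085|\u2028|\u2029)")
_NLF = {"\r\n", "\r", "\n", "\u0085"}
_LS = "\u2028"


def _normalise_ws(line: str) -> str:
    """Drop indentation, collapse whitespace runs to one space, trim the end."""
    out: list[str] = []
    for ch in line:
        # Assumption: TAB is treated as whitespace alongside category Zs.
        if ch == "\t" or unicodedata.category(ch) == "Zs":
            if out and out[-1] != " ":
                out.append(" ")
        else:
            out.append(ch)
    return "".join(out).rstrip(" ")


def _flush(buf: str, paragraphs: list[str]) -> str:
    """Close the current paragraph (if non-empty) and return a fresh buffer."""
    body = buf.strip(" ")
    if body:
        paragraphs.append(f"<p>{body}</p>")
    return ""


def render(text: str) -> str:
    parts = _BREAK.split(text)
    # parts alternates line, separator, line, ...; None marks end of input.
    lines, seps = parts[0::2], parts[1::2] + [None]
    paragraphs: list[str] = []
    buf = ""
    prev_was_nlf = False
    for raw, sep in zip(lines, seps):
        line = _normalise_ws(raw)
        buf += html.escape(line)
        if sep in _NLF:
            if prev_was_nlf and not line:
                # Two consecutive NLF sequences act as a Paragraph Separator.
                buf = _flush(buf, paragraphs)
                prev_was_nlf = False
            else:
                # A single NLF is non-separating: it joins lines with a space.
                buf += " "
                prev_was_nlf = True
        elif sep == _LS:
            buf += "<br/>"
            prev_was_nlf = False
        else:
            # PS, or the end of the input, closes the paragraph.
            buf = _flush(buf, paragraphs)
            prev_was_nlf = False
    return "".join(paragraphs)
```

Each line is normalised before the break check, so a line containing only whitespace still counts as empty. This follows from the rule that the start of a line comes after its leading whitespace. As a result, `"a\n   \nb"` yields two paragraphs, just like `"a\n\nb"`.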
diff --git a/stdlib.norg b/stdlib.norg
@@ -23,6 +23,6 @@ This includes all carryover tags, ranged tags and their behaviours.
 =comment ...
 =end
 
-=eval language-name? &captures* >code
-.invoke-janet (neorg/execute (or &language-name& (neorg/ast/ranged-verbatim-tag/parameter &code& 0) (error "Language type to execute could not be inferred!")) \[&captures&\] (or (neorg/ast/ranged-verbatim-tag? &code& "code") (error "Expected code block to follow `#eval` block!")))
+=eval &captures* >code
+.invoke-janet (neorg/execute '\[&captures&\] (or (neorg/ast/ranged-verbatim-tag? ```&code&``` "code") (neorg/ast/ranged-verbatim-tag/content ```&code&```) (error "Expected code block to follow \`#eval\` block!")))
 =end