feat: new v2 specification #42

Open: wants to merge 8 commits into base: main
189 changes: 189 additions & 0 deletions 2.0-specification.norg
@@ -0,0 +1,189 @@
@document.meta
title: Norg v2 Specification
authors: [
vhyrro
mrossinek
]
categories: specifications
version: 2.0
@end

* Norg File Format Specification

This file contains the formal file format specification of the Norg syntax version 2.0.
This document is written in the Norg format in its original form and, thus, attempts to be
self-documenting.

Please note that this is *not* a reference implementation. This document acts as a specification
that must be strictly followed when implementing a parser - this prevents discrepancies across
applications.

* Introduction

Norg is a structured, plaintext markup format. It's designed to be viewed
standalone while also providing a suite of markup utilities for typesetting
structured documents.

The format is geared towards a variety of use cases: from creating basic notes and writing project plans to
interactive code execution and spreadsheets. The syntax itself is lightweight and easy to
reason about.

Compared to other plain-text file formats such as Markdown, Org, RST or AsciiDoc, Norg
sets itself apart most notably by following a strict philosophy and ruleset:
~ *Consistency:* the syntax should be consistent. Even if you know only a
part of the syntax, learning new parts should not be surprising and rather
feel predictable and intuitive.
~ *Simplicity:* the syntax itself should not do any typesetting by default, but should provide all
the utilities required for users to perform typesetting themselves.
~ *Unambiguity:* the syntax should leave _no_ room for ambiguity. This is
especially motivated by the use of
[tree-sitter]{https://tree-sitter.github.io/tree-sitter/} for the original
syntax parser, which takes a strict left-to-right parsing approach and only
has single-character look-ahead.
~ *[Free-form]{https://en.wikipedia.org/wiki/Free-form_language}:* whitespace
is _only_ used to delimit tokens but has no other significance! This is
probably the most contrasting feature to other plain-text formats which
often adhere to the [off-side
rule]{https://en.wikipedia.org/wiki/Off-side_rule}, meaning that the syntax
relies on indentation to carry meaning.

Although built with the note-taking tool *Neorg* in mind, Norg can be utilized in a wide range of applications,
from external note-taking plugins to even messaging services through {* layers}, a pay-for-what-you-use
system that allows for selecting syntax relevant to a given application.

* Notation

Syntax is described using the following notation:
- Special syntax is wrapped in `<>` characters. For example: `<title>`.
- One or more repetitions of an object (with no whitespace in between) are denoted via `+`: `<char>+`.
- Obligatory whitespace (one or more) is denoted as a single space.
- Obligatory newlines (one or more) are denoted as a single newline character.

Examples:
- `<special> <title>` - some special character, one or more whitespace characters, then a title.
- %TODO: Expand%

* Concepts

This chapter defines the fundamental concepts of the Norg format, upon which other syntax elements build.

** Characters

The smallest unit of text in Norg is the /character/. A character is any
Unicode [code point]{https://en.wikipedia.org/wiki/Code_point} or
[grapheme]{https://www.unicode.org/glossary/#grapheme}.

We identify several types of characters, listed below.

*** Whitespace

Any set of text can be delimited by whitespace. Consecutive whitespace characters
are collapsed to a single space during rendering.

Whitespace constitutes any code point in the [Unicode Zs general category]{https://www.fileformat.info/info/unicode/category/Zs/list.htm}.

Tabs are not expanded to spaces during rendering and, since whitespace has no semantic meaning, there is no need
to define a default tab stop. However, if a parser must (for implementation reasons) define a
tab stop, we suggest setting it to 4 spaces.

Any line may be preceded by a variable amount of whitespace, which should
be ignored. Upon encountering a {*** line endings}[line ending], it is
recommended for parsers to continue consuming (and discarding) consecutive
whitespace characters exhaustively.

The "start of a line" is considered to be /after/ this initial whitespace has been parsed.
Keep this in mind when reading the rest of the document.
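
As a non-normative illustration, the whitespace rules above could be sketched in Python roughly as follows. The helper names (`is_whitespace`, `strip_line_start`, `collapse_whitespace`) are hypothetical and not part of this specification; the sketch only assumes the standard `unicodedata` module and works on code points rather than graphemes.

```python
import unicodedata


def is_whitespace(ch: str) -> bool:
    """True if `ch` belongs to the Unicode Zs general category."""
    return unicodedata.category(ch) == "Zs"


def strip_line_start(line: str) -> str:
    """Discard the whitespace that precedes the "start of a line"."""
    i = 0
    while i < len(line) and is_whitespace(line[i]):
        i += 1
    return line[i:]


def collapse_whitespace(text: str) -> str:
    """Collapse every run of whitespace characters into a single space."""
    out = []
    in_run = False
    for ch in text:
        if is_whitespace(ch):
            if not in_run:
                out.append(" ")
            in_run = True
        else:
            out.append(ch)
            in_run = False
    return "".join(out)
```

Note that a tab character has the general category `Cc`, so this particular check does not match it; how a parser treats tabs is left to the paragraph on tabs above.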

*** Punctuation

A character is considered punctuation if it belongs to any of the following general Unicode categories:
- `Pc`
- `Pd`
- `Pe`
- `Pf`
- `Pi`
- `Po`
- `Ps`
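
A parser might test for punctuation with a simple category lookup. The following Python sketch is only an illustration: the names `PUNCTUATION_CATEGORIES` and `is_punctuation` are hypothetical, and the check operates on single code points.

```python
import unicodedata

# The general categories listed in the Punctuation section above.
PUNCTUATION_CATEGORIES = frozenset({"Pc", "Pd", "Pe", "Pf", "Pi", "Po", "Ps"})


def is_punctuation(ch: str) -> bool:
    """True if the code point `ch` falls in one of the listed categories."""
    return unicodedata.category(ch) in PUNCTUATION_CATEGORIES
```

Under this definition `*` (`Po`), `-` (`Pd`) and `_` (`Pc`) are punctuation, whereas `+` is not, because Unicode classifies it as a math symbol (`Sm`).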

*** Line Endings

Line endings are distinct from {*** whitespace} in Norg as they describe
boundaries between parts of text. Norg complies with the
{https://www.unicode.org/standard/reports/tr13/tr13-5.html}[Unicode newline
guidelines] and uses the same terms as present in that document.

Below are the accepted combinations of newline characters and their functions:
- LS - should be parsed as-is and act as a Line Separator.
- PS - should be parsed as-is and act as a Paragraph Separator.
- NLF (any of `CR`, `LF`, `CRLF`, or `NEL`) *should not* be treated as an LS character as outlined in the Unicode guidelines, but rather
as a regular, non-separating newline character. This is so the flow of paragraphs is retained.
- Two consecutive NLF sequences - should emulate the behaviour of a Paragraph Separator.

Examples:
@table
|------------------------------------------------|
| Hello,<CR>         | <p>Hello, world!</p>      |
| world!             |                           |
|------------------------------------------------|
| Hello,<LF>         | <p>Hello, world!</p>      |
| world!             |                           |
|------------------------------------------------|
| Hello,<CRLF>       | <p>Hello, world!</p>      |
| world!             |                           |
|------------------------------------------------|
| Hello,<NEL>        | <p>Hello, world!</p>      |
| world!             |                           |
|------------------------------------------------|
| Hello,<LS>         | <p>Hello,<br/> world!</p> |
| world!             |                           |
|------------------------------------------------|
| Hello,<PS>         | <p>Hello,</p>             |
| world!             | <p>world!</p>             |
|------------------------------------------------|
| Hello,<CR><CR>     | <p>Hello,</p>             |
| world!             | <p>world!</p>             |
|------------------------------------------------|
| Hello,<LF><LF>     | <p>Hello,</p>             |
| world!             | <p>world!</p>             |
|------------------------------------------------|
| Hello,<CRLF><CRLF> | <p>Hello,</p>             |
| world!             | <p>world!</p>             |
|------------------------------------------------|
| Hello,<NEL><NEL>   | <p>Hello,</p>             |
| world!             | <p>world!</p>             |
|------------------------------------------------|
@end
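
The rules and examples above can be sketched in Python as shown below. This is a non-normative illustration under a few assumptions: the function name `split_paragraphs` is hypothetical, the result is a plain list of paragraphs (each a list of lines) rather than HTML, a single NLF is rendered as one space, and a run of more than two NLF sequences is treated as a single paragraph break, which the text above does not state explicitly.

```python
import re

LS = "\u2028"  # Line Separator
PS = "\u2029"  # Paragraph Separator

# One NLF sequence. CRLF comes first so it is matched as a single unit.
_NLF = "(?:\r\n|\r|\n|\x85)"
_NLF_RE = re.compile(_NLF)
# A break is a run of NLF sequences, a single LS, or a single PS.
_BREAKS = re.compile(f"({_NLF}+|{LS}|{PS})")


def split_paragraphs(text: str) -> list[list[str]]:
    """Split `text` into paragraphs, each given as a list of lines."""
    paragraphs: list[list[str]] = [[""]]
    # re.split with one capturing group alternates text and separators.
    for i, piece in enumerate(_BREAKS.split(text)):
        if i % 2 == 0:
            # Plain text between two breaks.
            paragraphs[-1][-1] += piece
        elif piece == LS:
            # Line separator: new line inside the same paragraph.
            paragraphs[-1].append("")
        elif piece == PS or len(_NLF_RE.findall(piece)) >= 2:
            # PS, or two (or more) consecutive NLFs: new paragraph.
            paragraphs.append([""])
        else:
            # A single NLF does not separate anything.
            paragraphs[-1][-1] += " "
    return paragraphs
```

For instance, `split_paragraphs("Hello,\nworld!")` yields one paragraph with one line, while `split_paragraphs("Hello,\n\nworld!")` yields two paragraphs.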

*** Text

Any character not described by the previous sections should be considered a "text" character.
Consecutive text characters make up /words/.

*** Escaped Character

Any character can be escaped through the use of the backslash (`\\`). The escape sequence consumes only
the next character.

An escaped character should be treated as distinct from a regular bit of text or whitespace. This becomes important
during parsing of {* inline items}.
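
One way to keep escaped characters distinct is to tag them during tokenisation. The Python sketch below is an assumption-laden illustration, not a prescribed algorithm: the name `split_escapes` and the `"escaped"`/`"plain"` labels are hypothetical, characters are taken to be code points, and a trailing backslash with nothing after it is kept as a plain character, a case the text above does not cover.

```python
def split_escapes(text: str) -> list[tuple[str, str]]:
    """Return (kind, char) pairs, where kind is "escaped" or "plain"."""
    tokens: list[tuple[str, str]] = []
    i = 0
    while i < len(text):
        if text[i] == "\\" and i + 1 < len(text):
            # The backslash consumes exactly the next character.
            tokens.append(("escaped", text[i + 1]))
            i += 2
        else:
            tokens.append(("plain", text[i]))
            i += 1
    return tokens
```

With this approach the input `a\*b` produces a plain `a`, an escaped `*` and a plain `b`, so a later pass over inline items can skip the escaped `*` when looking for modifiers.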

** Paragraphs

Paragraphs are a combination of {*** line endings}, {*** whitespace} and {*** text}.
They also contain any quantity of {* inline items}.

Paragraphs are terminated/delimited by some types of {*** line endings} or by the EOF (end of file).
They are also implicitly terminated by other {* block-level items}.

* Layers

Norg is built up of layers: sets of features that a parser or tool can support
depending on how much of the specification it intends to cover.

It's recommended to stick to these layers when implementing Norg in your own application (as it's
easy to tell end users that an application supports e.g. "layer 2" of the Norg specification), but
of course these can't apply to every possible use case. In such cases you can use a /custom layer/
and pick and choose what you want to support. Just make sure to let your users know which features
you've implemented, so they don't get confused!

%TODO(vhyrro): Finish once the spec is done%
@@ -145,6 +145,7 @@ version: 1.0
whitespace is also considered empty.

* Detached Modifiers

Norg has several detached modifiers. The name originates from their differentiation to the
{* attached modifiers}, which will be discussed later. These make up the majority of the syntax.

@@ -1720,6 +1721,7 @@ version: 1.0
The link takes precedence, and no bold is rendered.

* Layers

Norg is built up of layers, or in other words a set of features that a parser/tool can support
depending on how much of the specification they'd like to deal with.

4 changes: 2 additions & 2 deletions stdlib.norg
@@ -23,6 +23,6 @@ This includes all carryover tags, ranged tags and their behaviours.
=comment ...
=end

=eval language-name? &captures* >code
.invoke-janet (neorg/execute (or &language-name& (neorg/ast/ranged-verbatim-tag/parameter &code& 0) (error "Language type to execute could not be inferred!")) \[&captures&\] (or (neorg/ast/ranged-verbatim-tag? &code& "code") (error "Expected code block to follow `#eval` block!")))
=eval &captures* >code
.invoke-janet (neorg/execute '\[&captures&\] (or (neorg/ast/ranged-verbatim-tag? ```&code&``` "code") (neorg/ast/ranged-verbatim-tag/content ```&code&```) (error "Expected code block to follow \`#eval\` block!")))
=end