Manual
In this tutorial you'll learn what Rewire is, how to set it up, and how to use it.
Rewire is a "programming language framework". You use C++ templates to describe the grammar and the semantic structure of your language (respectively, what it looks like and what each piece does), and Rewire's lexer, parser, and visitor then interpret your input text according to that description. Rewire places very few limits on how you can describe your programs; the ones that exist will be made clear throughout the tutorial.
Before diving into the actual code, you first need to be able to at least run the default language. This language is bundled with Rewire, so you don't have to write anything yourself.
Your favorite compiler needs to support C++23 (the latest C++ standard at the time of writing).
I can't assist with setting up the project on anything other than Visual Studio on
Windows, since I have never tried to build the software elsewhere. Rewire uses absolutely
nothing platform-specific and no compiler extensions whatsoever. It's 100% standard C++ with
zero dependencies other than the standard library, so you should eventually be able to
compile it. If you succeed, please write to me describing how you did it, or
contribute to the wiki, so that this section can be enriched.
In the meantime, you can just import the project in Visual Studio by navigating to
the .sln file and choosing it. It should automatically open the whole project, so you
can run it in Visual Studio itself.
Once you have cloned and built the project, and everything runs as expected, you can start using Rewire. You can either keep the default syntax, in which case STOP READING HERE and refer to the "Default Language" section, or keep reading.
Look at the file explorer in Visual Studio. There will be a Description.ixx file
containing everything you're supposed to work on. That's the only file you should edit.
This file contains three sections, in order: Lexer, Parser, Eval.
Given a string "1 + 2" which we need to evaluate as "3", the three sections
do the following:
- Lexer describes the tokens your language is composed of.
  - This will let you describe "what your language looks like".
  - In this case, it tells us that 1 + 2 lexes as {"1", "+", "2"}.
- Parser describes the structure of your language.
  - This will let you take various tokens and get their structure. This is where you say that 1 and 2 are arguments to +.
- Eval describes the meaning of the structure previously defined.
  - Here you'll say that a list of a number, an operation, and another number is to be interpreted as a function call (that is, the operation is to be evaluated).
You can probably see now what the first restriction of a Rewire language is:
⚠️ Your language must be able to be defined in terms of lists.
This is because Rewire uses lists internally, but it's not that limiting, as you'll
soon find out.
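To make the three stages concrete, here is a minimal sketch of the "1 + 2" walkthrough in plain C++. It does not use Rewire at all: the functions lex, parse and eval are hypothetical stand-ins written for this example, and the choice to put the operator at the head of the list is an assumption about how such a list might be laid out.

```cpp
#include <cassert>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-ins for illustration; none of these names come from Rewire.

// Stage 1 (lexer): split the text into tokens. Whitespace is the only separator here.
std::vector<std::string> lex(const std::string& text) {
    std::istringstream in(text);
    std::vector<std::string> tokens;
    for (std::string tok; in >> tok;) tokens.push_back(tok);
    return tokens;
}

// Stage 2 (parser): arrange the tokens into a list whose head is the operation
// and whose tail holds its arguments, e.g. {"+", "1", "2"}.
std::vector<std::string> parse(const std::vector<std::string>& tokens) {
    if (tokens.size() != 3) throw std::invalid_argument("expected <number> <op> <number>");
    return {tokens[1], tokens[0], tokens[2]};
}

// Stage 3 (eval): give the list a meaning by treating it as a function call.
int eval(const std::vector<std::string>& list) {
    if (list.size() != 3) throw std::invalid_argument("expected {op, lhs, rhs}");
    const std::string& op = list[0];
    const int lhs = std::stoi(list[1]);
    const int rhs = std::stoi(list[2]);
    if (op == "+") return lhs + rhs;
    if (op == "-") return lhs - rhs;
    if (op == "*") return lhs * rhs;
    if (op == "/" && rhs != 0) return lhs / rhs;
    throw std::invalid_argument("unsupported operation");
}
```

Calling eval(parse(lex("1 + 2"))) walks through all three stages in order. Every intermediate value is a flat list of strings, which is the kind of list-based representation the restriction above refers to.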
Every single implemented thing in Rewire is heavily documented in Description.ixx,
so if you can't find some information here, it's probably in the file itself.
In fact, you can just refer to the file for the basics on how to use everything.
Let's get started!
In the Lexer section, you'll find a bunch of uneditable stuff (not much though):
- The definition of the Any, Punctuation, Keyword, Either and Seq structs.
- The definition of the Identifier, Number, Boolean, String structs.
Everything else can be changed.
The Punctuation structure is used to describe that some 1-char string is
"Punctuation". In practice it just means that you can't use those where you expect
an identifier and so on. Think of them as separators in your source code.
Suppose, for instance, that you'd like to parse foo(a, b, c) as a function call
to foo with a, b and c as arguments.
You probably don't want to be able to call the '(' function, or to call foo
with ')' as an argument. This is why you can choose, in the Punctuations structure,
which characters are punctuation tokens. The lexer will split the input string according to
those. Please also note that the whitespace character ' ' also needs to be explicitly
marked as punctuation.
Strings are denoted with " (double quotes) on both sides of the string:
"ciao mondo" is a string, and inside it no punctuation token gets lexed as such.
You won't get the separate tokens "ciao and mondo".
The syntax for strings cannot be changed right now.
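The splitting behaviour described above can be sketched as follows. This is not Rewire's lexer: split_tokens is a hypothetical helper, and it assumes that punctuation characters are kept as one-character tokens while whitespace only separates and is then dropped. It treats everything between two double quotes as a single token.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical helper, not part of Rewire. Splits `text` on the characters
// listed in `punctuation` and keeps quoted strings in one piece.
std::vector<std::string> split_tokens(std::string_view text, std::string_view punctuation) {
    std::vector<std::string> tokens;
    std::string current;
    auto flush = [&] {
        if (!current.empty()) {
            tokens.push_back(current);
            current.clear();
        }
    };
    for (std::size_t i = 0; i < text.size(); ++i) {
        const char c = text[i];
        if (c == '"') {
            flush();
            const std::size_t close = text.find('"', i + 1);
            // In this sketch an unterminated string swallows the rest of the input.
            const std::size_t last = (close == std::string_view::npos) ? text.size() - 1 : close;
            tokens.emplace_back(text.substr(i, last - i + 1));
            i = last;
        } else if (punctuation.find(c) != std::string_view::npos) {
            flush();
            // Assumption: whitespace separates tokens but is not a token itself.
            if (c != ' ') tokens.push_back(std::string(1, c));
        } else {
            current += c;
        }
    }
    flush();
    return tokens;
}
```

With "(), " as the punctuation set, foo(a, b, c) splits into foo, (, a, ,, b, ,, c and ), while "ciao mondo" stays a single token even though it contains a space.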
In case it wasn't clear from the default file, you can define some
punctuation token with Punctuation<C>, where C is a C++ character literal,
i.e. any printable character enclosed in single quotes.
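If a template that takes a character looks unusual, the sketch below shows the C++ mechanism involved (a non-type template parameter). The definitions are a hypothetical re-creation for illustration only: Rewire's own Punctuation may be declared differently, and PunctuationList and MyPunctuation are names made up for this example.

```cpp
#include <cassert>

// Hypothetical re-creation; Rewire's actual Punctuation may differ.
// The character is part of the type itself, so Punctuation<'('> and
// Punctuation<')'> are two distinct types.
template <char C>
struct Punctuation {
    static constexpr char value = C;
};

// Made-up helper that collects several punctuation types in one place.
template <typename... Ps>
struct PunctuationList {
    // True when `c` is the character of any listed Punctuation<C>.
    static constexpr bool contains(char c) {
        return ((Ps::value == c) || ...);
    }
};

// Note that the whitespace character is listed explicitly.
using MyPunctuation = PunctuationList<Punctuation<'('>, Punctuation<')'>,
                                      Punctuation<','>, Punctuation<' '>>;
```

Because the characters are encoded in types, a check such as MyPunctuation::contains('(') can be evaluated entirely at compile time.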
Going further, you'll find the real reason why you're here: the grammar.
The Keywords structure acts exactly the same as the punctuation struct, but
you're not limited to one-char tokens.
You already met Either, one of the three ways to specify Rewire grammars, but
you (probably) didn't understand it fully, so here's a primer on all the Rewire constructs:
- Either<T1, T2, ..., Tn> tries to lex T1, T2, ... in order until a match is found. If Tn also fails to match, then the lexer fails.
  - T1, ..., Tn must be types, meaning any combination of the types I'll describe shortly.
- Any<T> matches any number of occurrences of T (even zero). It can never fail.
- Seq<T1, T2, ..., Tn> matches T1, T2, ..., Tn in sequence. If any single type (including Tn!) fails to match, Seq fails.
- Identifier represents any alphanumeric (letters and numbers) word.
- String represents "strings".
- Number represents base-10 numbers.
- Boolean matches the special strings true and false.
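The matching rules in the list above can be sketched with a few small templates. These are not Rewire's definitions: the Match alias, the Char matcher and the character-by-character matching over std::string_view are all assumptions made to keep the example self-contained. Only the names Either, Any and Seq mirror the constructs described here.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string_view>

// Hypothetical matchers written for this example; not Rewire's definitions.
// A matcher returns how many characters it consumed, or nullopt on failure.
using Match = std::optional<std::size_t>;

template <char C>
struct Char {  // matches exactly one character C
    static Match match(std::string_view in) {
        if (!in.empty() && in.front() == C) return 1;
        return std::nullopt;
    }
};

template <typename... Ts>
struct Either {  // tries each alternative in order; fails only if all fail
    static Match match(std::string_view in) {
        Match result;
        // The fold stops at the first alternative that yields a value.
        (void)((result = Ts::match(in)).has_value() || ...);
        return result;
    }
};

template <typename T>
struct Any {  // zero or more occurrences of T; never fails
    static Match match(std::string_view in) {
        std::size_t total = 0;
        while (auto m = T::match(in.substr(total))) {
            if (*m == 0) break;  // guard against matchers that consume nothing
            total += *m;
        }
        return total;
    }
};

template <typename... Ts>
struct Seq {  // every element must match, one after the other
    static Match match(std::string_view in) {
        std::size_t total = 0;
        auto step = [&](Match m) {
            if (!m) return false;
            total += *m;
            return true;
        };
        const bool ok = (step(Ts::match(in.substr(total))) && ...);
        return ok ? Match(total) : std::nullopt;
    }
};

// Example grammar: one binary digit followed by any number of binary digits.
using Digit = Either<Char<'0'>, Char<'1'>>;
using Digits = Any<Digit>;
using Bits = Seq<Digit, Digits>;
```

On the input 1011x, Bits consumes the four digits and stops at x. Digits alone succeeds on x by consuming nothing, while Bits fails on it because its first element must match.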
You can mix and match each and every one of these types to make your own types. This is how Rewire works.
The C++ using keyword is used to do so: it lets you give aliases to types, "storing" them for later use:
using Operation = Seq<Number, Identifier, Number>;
We just made an Operation type we'll use later. This type matches a Number, an Identifier, and a Number, in sequence.
There's a problem with this though! Identifier matches any identifier, which probably won't do, so we must specify what we mean
by operation. If we only want the 4 basic arithmetic operations of addition, subtraction, multiplication and division, we can do this:
using Operator = Either<Punctuation<'+'>, Punctuation<'-'>,
Punctuation<'*'>, Punctuation<'/'>>;
using Operation = Seq<Number, Operator, Number>;
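To see why the second definition is stricter, here is a small sketch that works on already-split tokens. The matchers are simplified stand-ins, not Rewire's types: each element is assumed to consume exactly one token, and LooseOperation is a made-up name for the first, Identifier-based definition.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical single-token matchers; simplified stand-ins, not Rewire's types.
struct Number {
    static bool match(const std::string& tok) {
        return !tok.empty() && std::all_of(tok.begin(), tok.end(),
            [](unsigned char c) { return std::isdigit(c) != 0; });
    }
};

struct Identifier {
    static bool match(const std::string& tok) {
        return !tok.empty() && std::all_of(tok.begin(), tok.end(),
            [](unsigned char c) { return std::isalnum(c) != 0; });
    }
};

template <char C>
struct Punctuation {
    static bool match(const std::string& tok) { return tok.size() == 1 && tok[0] == C; }
};

template <typename... Ts>
struct Either {
    static bool match(const std::string& tok) { return (Ts::match(tok) || ...); }
};

// Simplification: every element of the sequence consumes exactly one token.
template <typename... Ts>
struct Seq {
    static bool match(const std::vector<std::string>& toks) {
        if (toks.size() != sizeof...(Ts)) return false;
        std::size_t i = 0;
        return (Ts::match(toks[i++]) && ...);
    }
};

using LooseOperation = Seq<Number, Identifier, Number>;  // the first attempt
using Operator = Either<Punctuation<'+'>, Punctuation<'-'>,
                        Punctuation<'*'>, Punctuation<'/'>>;
using Operation = Seq<Number, Operator, Number>;         // the refined version
```

In this sketch LooseOperation accepts the tokens 1, banana, 2, because banana is a perfectly valid identifier, whereas Operation only accepts one of the four operator characters in the middle position.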
Two special types that YOU CANNOT REMOVE are the LineContinuation and LineEndToken
token types. These can be anything, but please don't remove them.
- LineContinuation lets you add a newline without the string being split and sent to the lexer.
- LineEndToken is used after the LineContinuation to split the lexer on "newlines" which can be anything you need them to be.
There are types used to lex the input stream, and auxiliary types only used by other types. To avoid confusion with the latter, the former are called "top-level types" throughout the tutorial and the code documentation.
Once you have defined your top-level types, possibly by using some auxiliary types, you can put them to use by editing (not removing) the Forms type
near the very end of the Lexer namespace. In our case it will look sort of like this:
using Forms = Operation;
If instead you had more than one type, you would wrap them all in an Either clause:
using Forms = Either<Operation, AnotherType, ...>;
Once you have all that set up, we can get to the Parser section just below the Lexer one.
TODO