Sofia edited this page Jan 31, 2024 · 2 revisions

Introduction

In this tutorial you'll learn what Rewire is, how to set it up, and how to use it.

What's Rewire, anyway?

Rewire is a "programming language framework". You use C++ templates to describe the grammar and the semantic structure of your language (respectively, what your language looks like and what each piece does). Rewire's lexer, parser, and visitor then make it happen for you by (hopefully correctly) interpreting your input text based on that description. Rewire places very few limits on how you can describe your language; the ones that exist will be made clear throughout the tutorial.

Setting the project up

Before diving into the actual code, you first need to be able to at least run the default language. This language is bundled with Rewire, so you don't have to write anything yourself.

Your favorite compiler needs to support C++23 (the latest C++ standard as of writing). I can't assist with setting up the project on anything other than Visual Studio on Windows, since I've never even tried to build the software elsewhere. Rewire uses absolutely nothing platform-specific and no compiler extensions whatsoever: it's 100% standard C++ with zero dependencies other than the standard library, so you should eventually be able to compile it on other platforms. If you succeed, please write to me describing how you did it, or contribute to the wiki, so that this section can be enriched.
In the meantime, you can import the project into Visual Studio by navigating to the .sln file and opening it. This should open the whole project automatically, so you can run it from Visual Studio itself.

OK, what now?

Once you've cloned and built the project, and everything runs as expected, you can start using Rewire. You can either keep the default syntax, in which case STOP READING HERE and refer to the "Default Language" section, or keep reading to define your own.

Look at the file explorer in Visual Studio. There will be a Description.ixx file containing everything you're supposed to work on. That's the only file you should edit. It contains three sections, in order: Lexer, Parser, Eval. Given the string "1 + 2", which we want to evaluate to "3", the three sections do the following:

  • Lexer describes the tokens your language is composed of.
    • This will let you describe "what your language looks like".
    • In this case, it tells us that 1 + 2 is lexed as {"1", "+", "2"}.
  • Parser describes the structure of your language.
    • This will let you take various tokens and get their structure. This is where you say that 1 and 2 are arguments to +.
  • Eval describes the meaning of the structure previously defined.
    • Here you'll say that a list made of a number, an operation, and another number is to be interpreted as a function call (that is, the operation is to be evaluated).
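
To make the three stages concrete, here is a tiny standalone sketch of the same idea in plain C++. It does not use Rewire at all: toy_lex, ToyCall, toy_parse and toy_eval are made-up names for this illustration, and it assumes that tokens are separated by spaces and that numbers are plain integers.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy stand-ins for the three stages; none of this is Rewire's actual code.

// Lexer stage: split the input on spaces, dropping the spaces themselves.
std::vector<std::string> toy_lex(const std::string& input) {
    std::vector<std::string> tokens;
    std::string current;
    for (char c : input) {
        if (c == ' ') {
            if (!current.empty()) tokens.push_back(current);
            current.clear();
        } else {
            current += c;
        }
    }
    if (!current.empty()) tokens.push_back(current);
    return tokens;
}

// Parser stage: turn the flat token list into "operator with two arguments".
struct ToyCall {
    char op;
    int lhs;
    int rhs;
};

ToyCall toy_parse(const std::vector<std::string>& tokens) {
    // This sketch only understands "<number> <operator> <number>".
    assert(tokens.size() == 3 && tokens[1].size() == 1);
    return ToyCall{tokens[1][0], std::stoi(tokens[0]), std::stoi(tokens[2])};
}

// Eval stage: give the structure a meaning by applying the operator.
int toy_eval(const ToyCall& call) {
    switch (call.op) {
        case '+': return call.lhs + call.rhs;
        case '-': return call.lhs - call.rhs;
        case '*': return call.lhs * call.rhs;
        case '/': return call.lhs / call.rhs;  // integer division
    }
    assert(false && "unknown operator");
    return 0;
}
```

Calling toy_eval(toy_parse(toy_lex("1 + 2"))) chains the three stages in the same order as the sections of Description.ixx. In Rewire you describe each stage with types instead of writing these functions by hand.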

You probably understand by now what the first restriction of a Rewire language is:

⚠️ Your language must be able to be defined in terms of lists.

This is because Rewire uses lists internally, but it's not that limiting, as you'll soon find out. Every single thing implemented in Rewire is heavily documented in Description.ixx, so if you can't find some info here, it's probably in the file itself. In fact, you can just refer to the file for the basics on how to use everything.
Let's get started!

Lexer

In the Lexer section, you'll find a few things you can't edit (not many, though):

  • The definition of the Any, Punctuation, Keyword, Either and Seq structs.
  • The definition of the Identifier, Number, Boolean, String structs.

Everything else can be changed.

Punctuation tokens

The Punctuation structure is used to mark a one-character string as "punctuation". In practice, this just means that you can't use that character where an identifier (and so on) is expected. Think of punctuation tokens as separators in your source code. Suppose, for instance, that you'd like to parse
foo(a, b, c)
as a call to the function foo with a, b and c as arguments. You probably don't want to be able to call the ( function, or to call foo with ) as an argument. This is why you can choose, in the Punctuations structure, which characters are punctuation tokens. The lexer will split the input string according to those. Please also note that the whitespace character ' ' needs to be explicitly marked as punctuation too.
Strings are denoted with " (double quotes) on both sides of the string:
"ciao mondo" is a string, and inside it no punctuation token gets lexed as such.
You won't get the two separate tokens "ciao and mondo". The syntax for strings cannot be changed right now.

In case it wasn't clear from the default file, you can define a punctuation token with Punctuation<C>, where C is a C++ character literal, i.e. any printable character enclosed in single quotes.
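
If you're wondering how a character can be a template argument, here is a rough standalone sketch. It is not Rewire's implementation: the Punctuation struct below is a simplified stand-in, and PunctuationSet, ToyPunctuations and toy_split are names invented for this example. The sketch also assumes that spaces are dropped while other punctuation characters are kept as tokens of their own, which may not be what Rewire does.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in: Rewire ships its own Punctuation definition.
template <char C>
struct Punctuation {
    static constexpr char value = C;
};

// A hypothetical set of punctuation tokens, checked with a fold expression.
template <typename... Ps>
struct PunctuationSet {
    static constexpr bool contains(char c) {
        return ((c == Ps::value) || ...);
    }
};

using ToyPunctuations = PunctuationSet<Punctuation<'('>, Punctuation<')'>,
                                       Punctuation<','>, Punctuation<' '>>;

// Split on punctuation. Assumptions of this sketch: spaces are dropped,
// other punctuation is kept as a one-character token, and everything
// between double quotes stays in a single token.
template <typename Set>
std::vector<std::string> toy_split(const std::string& input) {
    std::vector<std::string> tokens;
    std::string current;
    bool in_string = false;
    for (char c : input) {
        if (c == '"') {
            in_string = !in_string;  // entering or leaving a string
            current += c;
        } else if (!in_string && Set::contains(c)) {
            if (!current.empty()) tokens.push_back(current);
            current.clear();
            if (c != ' ') tokens.push_back(std::string(1, c));
        } else {
            current += c;
        }
    }
    if (!current.empty()) tokens.push_back(current);
    return tokens;
}
```

With this toy splitter, foo(a, b, c) comes out as the tokens foo ( a , b , c ), while "ciao mondo" stays a single token because the space sits inside the quotes.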

Going further, you'll find the real reason why you're here: the grammar.

Keywords

The Keywords structure works exactly like its punctuation counterpart, but you're not limited to one-character tokens.

Structure of a Rewire grammar

You have already met Either, one of the three ways to specify Rewire grammars, but you probably didn't understand it fully, so here's a primer on all the Rewire constructs:

  • Either<T1, T2, ..., Tn> tries to lex T1, T2, ... in order until a match is found. If none of them matches (that is, if even Tn fails), the lexer fails.
    • T1, ..., Tn must be types, meaning any combination of the types described in this list.
  • Any<T> matches any (even zero) occurrences of T. It can never fail.
  • Seq<T1, T2, ..., Tn> matches T1, T2, ..., Tn in sequence. If any single type (including Tn!) fails to match, Seq fails.
  • Identifier represents any alphanumeric (letters and numbers) word.
  • String represents "strings".
  • Number represents base-10 numbers.
  • Boolean matches the special strings true and false.

You can mix and match all of these types to make your own types. This is how Rewire works.
The C++ using keyword is used to do so: it lets you give a type an alias, effectively "storing" it under a name:

using Operation = Seq<Number, Identifier, Number>;

We just made an Operation type that we'll use later. This type matches a Number, an Identifier, and a Number, in sequence.
There's a problem with this, though! Identifier matches any identifier, which probably won't do, so we must specify what we mean by an operation. If we only want the four basic arithmetic operations (addition, subtraction, multiplication and division), we can do this:

using Operator = Either<Punctuation<'+'>, Punctuation<'-'>,
                        Punctuation<'*'>, Punctuation<'/'>>;
using Operation = Seq<Number, Operator, Number>;
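
To get a feel for how Either, Seq and Any behave, here is a toy re-implementation that you can compile on its own. It is only a sketch: Rewire's real types live in Description.ixx and work on tokens, while these simplified stand-ins match raw characters directly, so they don't know about whitespace. The match function and the matches_all helper are inventions of this example, not part of Rewire.

```cpp
#include <cstddef>
#include <string_view>

// Toy stand-ins for illustration only, not Rewire's real definitions.
// Each toy type has match(in): on success it consumes a prefix of `in`
// and returns true; on failure it leaves `in` untouched and returns false.

template <char C>
struct Punctuation {
    static constexpr bool match(std::string_view& in) {
        if (in.empty() || in.front() != C) return false;
        in.remove_prefix(1);
        return true;
    }
};

struct Number {  // one or more base-10 digits
    static constexpr bool match(std::string_view& in) {
        std::size_t n = 0;
        while (n < in.size() && in[n] >= '0' && in[n] <= '9') ++n;
        if (n == 0) return false;
        in.remove_prefix(n);
        return true;
    }
};

template <typename... Ts>
struct Either {  // the first alternative that matches wins
    static constexpr bool match(std::string_view& in) {
        return (Ts::match(in) || ...);
    }
};

template <typename... Ts>
struct Seq {  // every part must match, in order
    static constexpr bool match(std::string_view& in) {
        std::string_view rest = in;  // work on a copy so failure is harmless
        if (!(Ts::match(rest) && ...)) return false;
        in = rest;
        return true;
    }
};

template <typename T>
struct Any {  // zero or more repetitions; never fails
    static constexpr bool match(std::string_view& in) {
        while (true) {
            const std::size_t before = in.size();
            // Stop when T fails or consumes nothing (avoids endless loops).
            if (!T::match(in) || in.size() == before) break;
        }
        return true;
    }
};

using Operator = Either<Punctuation<'+'>, Punctuation<'-'>,
                        Punctuation<'*'>, Punctuation<'/'>>;
using Operation = Seq<Number, Operator, Number>;

// Helper for this sketch: does T match the whole input?
template <typename T>
constexpr bool matches_all(std::string_view text) {
    return T::match(text) && text.empty();
}
```

Since everything here is constexpr, you can check a grammar at compile time: static_assert(matches_all<Operation>("1+2")) compiles, while the same check on "1+" does not, because the last Number in the Seq fails to match.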

Two special types that YOU CANNOT REMOVE are the LineContinuation and LineEndToken token types. You can define them as anything you like, but please don't remove them.

  • LineContinuation lets you add a newline without the string being split and sent to the lexer.
  • LineEndToken is used after the LineContinuation to split the lexer input on "newlines", which can be anything you need them to be.

There are types used to lex the input stream, and auxiliary types only used by other types. Throughout the tutorial and the code documentation, the former are called "top-level types" to avoid confusion with the latter.
Once you have defined your top-level types, possibly with the help of some auxiliary types, you can put them to use by editing (not removing) the Forms type near the very end of the Lexer namespace. In our case it will look something like this:

using Forms = Operation;

If you had more than one top-level type instead, you would wrap them all in an Either:

using Forms = Either<Operation, AnotherType, ...>;

Once you have all that set up, we can move on to the Parser section, just below the Lexer one.

Parser

TODO