Francis Galiegue edited this page May 16, 2014 · 3 revisions

Introduction

I am still discovering things as I write this document; this relates my experience so far with what happens and when, and it is by no means an exhaustive explanation.

Still, it may help, so read on...

Creating a parser

When you write a parser, you basically write a set of Rules, as in:

// ch() and sequence() are inherited from BaseParser
public class MyParser
    extends BaseParser<Object>
{
    Rule rule1()
    {
        return ch('x');
    }

    Rule rule2()
    {
        return sequence('x', "y");
    }
}

So, first things first...

How do I create a parser?

Like this:

final MyParser parser = Parboiled.createParser(MyParser.class);

This single line of code does quite a lot; and this "quite a lot" includes some black magic too.

Before we delve into this black magic, let us filter out what is not black magic. Starting with:

Why can you insert string literals, char literals etc in certain rules but not others?

This is because some methods producing rules (sequence() is one of them, but so is, for instance, firstOf()) accept Objects as arguments; these arguments are, in turn, passed to the toRule() or toRules() methods. Both of these methods will yield the appropriate matchers depending on the real type of the objects.
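
To make the dispatch concrete, here is a minimal sketch in plain Java. It is not the library's code: the Rule interface, the named() helper and the description strings are all made up for illustration. Only the idea (pick a matcher from the runtime type of each Object argument) comes from the text above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified model of type-based conversion; NOT the library's code.
final class ToRuleSketch {
    // Stand-in for the library's Rule type (assumption: reduced to a description).
    interface Rule {
        String describe();
    }

    // Illustrative helper: builds a Rule which only carries a description.
    static Rule named(final String description) {
        return () -> description;
    }

    // Convert one argument, choosing the matcher from its runtime type.
    static Rule toRule(final Object obj) {
        if (obj instanceof Rule)
            return (Rule) obj;
        if (obj instanceof Character)
            return named("char '" + obj + "'");
        if (obj instanceof String)
            return named("string \"" + obj + "\"");
        throw new IllegalArgumentException("cannot convert to a rule: " + obj);
    }

    // Convert every argument, in order.
    static List<Rule> toRules(final Object... objects) {
        final List<Rule> rules = new ArrayList<>();
        for (final Object obj : objects)
            rules.add(toRule(obj));
        return rules;
    }

    // Accepts plain Objects, like the sequence() described above.
    static Rule sequence(final Object... objects) {
        final List<String> parts = new ArrayList<>();
        for (final Rule rule : toRules(objects))
            parts.add(rule.describe());
        return named("sequence(" + String.join(", ", parts) + ")");
    }

    public static void main(final String... args) {
        // prints: sequence(char 'x', string "y")
        System.out.println(sequence('x', "y").describe());
    }
}
```

A method which only accepts Rule arguments has no such conversion step, which is why a bare literal will not compile there.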

So, for booleans, how does this work? How come I can write sequence("x", 3 >= 3)?

Here again, it is the toRule() and toRules() methods which help. However, for expressions, methods, etc. returning booleans, there is a further treatment at parser build time; basically, all boolean expressions in your rules will be turned into Actions.
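
Why wrap a boolean expression at all? A rough way to see it, in plain Java rather than bytecode: a bare boolean is evaluated once, at the point where it is written, whereas an action can be evaluated later, each time it is run. The Action interface and the depth field below are hypothetical stand-ins; the real transformation operates on bytecode, not on source, so take this as an analogy only.

```java
// Hypothetical analogy for "boolean expression turned into an Action".
final class ActionSketch {
    // Stand-in for an action: a boolean expression evaluated on demand.
    interface Action {
        boolean run();
    }

    // Some parser state the expression depends on (illustrative only).
    private int depth = 0;

    // The bare expression: evaluated immediately, the result is a fixed boolean.
    boolean eager() {
        return depth < 1;
    }

    // The wrapped expression: evaluation is deferred until run() is called.
    Action deferred() {
        return () -> depth < 1;
    }

    void enter() {
        depth++;
    }

    public static void main(final String... args) {
        final ActionSketch sketch = new ActionSketch();
        final boolean eager = sketch.eager();
        final Action action = sketch.deferred();
        sketch.enter(); // state changes after the rule was "built"
        System.out.println(eager);        // prints: true
        System.out.println(action.run()); // prints: false
    }
}
```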

This particular part of the process is part of the black magic mentioned earlier. So, now, let us delve into that...

OK, so, what does Parboiled.createParser() really do?

What this method does is create a subclass of your parser class; for instance, if your parser class has a fully qualified name of com.mycompany.xx.MyParser, this method will create a class named com.mycompany.xx.MyParser$$parboiled. Note that this means your parser class cannot be final.
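
As a small illustration of the naming convention and of the "not final" constraint, here is a sketch using only the JDK. The generatedClassName() helper is hypothetical; the library generates the subclass itself, and this code only models the two facts stated above.

```java
import java.lang.reflect.Modifier;

// Hypothetical helper modelling the naming rule described above; NOT the library's code.
final class GeneratedNameSketch {
    static String generatedClassName(final Class<?> parserClass) {
        // A final class cannot be subclassed, so no generated subclass can exist.
        if (Modifier.isFinal(parserClass.getModifiers()))
            throw new IllegalArgumentException(
                parserClass.getName() + " is final and cannot be extended");
        return parserClass.getName() + "$$parboiled";
    }

    public static void main(final String... args) {
        // ArrayList is used here only as an example of a non-final class.
        // prints: java.util.ArrayList$$parboiled
        System.out.println(generatedClassName(java.util.ArrayList.class));
    }
}
```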

The process by which this class is created is very, very low level: it consists, in its entirety, of bytecode inspections and transformations.

Yes, your eyes didn't deceive you. Now, how is this all done? Well, parboiled didn't document this! However, Grappa does...

Not that it explains much, does it? ;) So, here is a quick overview:

  • ASM is used extensively; it "swallows" your parser class into a ParserClassNode;
  • in turn, the ParserClassNode identifies a number of RuleMethods (basically, those are all methods in your grammar returning Rules);
  • the bytecode transformation then occurs on the rule methods which need to be modified...
  • ... and finally a new instance of your parser is created.
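
The second step of this overview (finding the methods which return Rules) can be approximated with plain reflection. This is only an analogy: the real process works on bytecode through ASM, and the Rule interface, SampleParser class and ruleMethodNames() helper below are all hypothetical.

```java
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Reflection-based analogy for "identify the rule methods"; the library uses ASM instead.
final class RuleMethodSketch {
    // Stand-in for the library's Rule type.
    interface Rule {
    }

    // A toy grammar: two rule methods and one ordinary method.
    static class SampleParser {
        Rule rule1() {
            return new Rule() {};
        }

        Rule rule2() {
            return new Rule() {};
        }

        int helper() {
            return 42;
        }
    }

    // Collect the names of all declared methods whose return type is a Rule.
    static List<String> ruleMethodNames(final Class<?> parserClass) {
        final List<String> names = new ArrayList<>();
        for (final Method method : parserClass.getDeclaredMethods())
            if (!method.isSynthetic()
                && Rule.class.isAssignableFrom(method.getReturnType()))
                names.add(method.getName());
        // getDeclaredMethods() has no guaranteed order, so sort for stable output.
        Collections.sort(names);
        return names;
    }

    public static void main(final String... args) {
        // prints: [rule1, rule2]
        System.out.println(ruleMethodNames(SampleParser.class));
    }
}
```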