Francis Galiegue edited this page May 19, 2014 · 4 revisions

Introduction

This page introduces the concepts you should know about parsers. You could of course jump directly to the examples, but you may then not understand why parsers are written the way they are.

This is where this page will help you.

Brief anatomy of a parser

There are three elements to a parser:

  • the parser class itself;
  • the rule methods (or rules; however, "rule methods" is more appropriate, see below);
  • the context.

The parser class

This is the class (or classes; more on this later) you create. It must extend either BaseParser or (recommended) EventBusParser. The difference between the two is that EventBusParser provides an event bus to dispatch parsing events to external classes.

A parser class is first and foremost a Java class; like any other class, it can therefore have several constructors, instance variables, static variables, inner classes (static or not), etc. Typically:

// See below for the type parameter
public class MyParser
    extends EventBusParser<Object>
{
    // Constructors, methods, instance variables, rule methods etc
}

The rule methods

A rule method is a method in your parser class which returns a Rule. EventBusParser and BaseParser both provide a set of built-in rule methods which you will use to build your own rules (for instance, digit() to match an ASCII digit, unicodeRange() to match a range of Unicode code points, etc.).

Note that your parser class is not limited to rule methods; it can contain ordinary methods too. However, any method which returns a Rule is a rule method.

Note also that some rules which accept more than one argument can take boolean expressions as arguments; such an expression, however, must not be the first argument. For instance:

Rule withBooleanTest()
{
    return sequence(oneOrMore(alpha()), match().length() <= 10);
}
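To make the idea of rule methods more concrete, here is a minimal sketch in plain Java of how rules can be composed from other rules. Every type in it (Ctx, Rule, MiniParser) and the test(...) wrapper are hypothetical stand-ins written for this page, not Grappa's classes. In particular, a real parser takes the boolean expression inline, as in the example above; plain Java needs a lambda to defer its evaluation, hence the wrapper.

```java
import java.util.function.BooleanSupplier;

// Hypothetical stand-ins for illustration only; these are NOT Grappa's classes.
final class Ctx {
    final String input;
    int offset = 0;          // current position in the input
    String lastMatch = "";   // text matched by the last successful oneOrMore()
    Ctx(String input) { this.input = input; }
}

interface Rule {
    // Returns true on success, advancing ctx.offset past the matched text.
    boolean match(Ctx ctx);
}

class MiniParser {
    final Ctx ctx;
    MiniParser(String input) { ctx = new Ctx(input); }

    static boolean isAsciiLetter(char ch) {
        return (ch >= 'a' && ch <= 'z') || (ch >= 'A' && ch <= 'Z');
    }

    // "Built-in" rule method: matches exactly one ASCII letter.
    Rule alpha() {
        return c -> {
            if (c.offset < c.input.length() && isAsciiLetter(c.input.charAt(c.offset))) {
                c.offset++;
                return true;
            }
            return false;
        };
    }

    // Matches the inner rule at least once, then records the matched text.
    Rule oneOrMore(Rule inner) {
        return c -> {
            int start = c.offset;
            if (!inner.match(c)) return false;
            while (true) {
                int before = c.offset;
                // Stop on failure, or if the inner rule matched without consuming input.
                if (!inner.match(c) || c.offset == before) break;
            }
            c.lastMatch = c.input.substring(start, c.offset);
            return true;
        };
    }

    // Matches all rules in order; restores the offset if any of them fails.
    Rule sequence(Rule first, Rule... rest) {
        return c -> {
            int start = c.offset;
            if (!first.match(c)) { c.offset = start; return false; }
            for (Rule r : rest) {
                if (!r.match(c)) { c.offset = start; return false; }
            }
            return true;
        };
    }

    // Hypothetical wrapper: turns a deferred boolean test into a rule.
    Rule test(BooleanSupplier condition) {
        return c -> condition.getAsBoolean();
    }

    // Text of the last match, read from this parser's own context.
    String match() { return ctx.lastMatch; }

    // A user-defined rule method: letters only, at most 10 of them.
    Rule shortWord() {
        return sequence(oneOrMore(alpha()), test(() -> match().length() <= 10));
    }
}
```

In this sketch a rule is run against the parser's own context, for instance `p.shortWord().match(p.ctx)`; the boolean test is only evaluated once the rules before it in the sequence have matched, which is why it cannot usefully come first.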

The context

The context is what your parser will see at runtime; this context consists of three elements:

  • the input text, plus the current offset in the text etc;
  • information about the last match;
  • and, finally, the value stack.
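As a rough mental model of these three elements, the sketch below groups them in one plain Java class. SketchContext and all of its methods are names made up for this illustration; they are not the library's actual context type or API.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical model of a parsing context; not Grappa's actual context type.
final class SketchContext<V> {
    private final String input;      // 1. the input text...
    private int offset = 0;          //    ...and the current offset into it
    private String lastMatch = "";   // 2. information about the last match
    private final Deque<V> valueStack = new ArrayDeque<>(); // 3. the value stack

    SketchContext(String input) { this.input = input; }

    // Consumes `length` characters and records them as the last match.
    void advance(int length) {
        lastMatch = input.substring(offset, offset + length);
        offset += length;
    }

    int offset() { return offset; }
    String lastMatch() { return lastMatch; }

    // The stack only accepts, and only returns, values of type V.
    void push(V value) { valueStack.push(value); }
    V pop() { return valueStack.pop(); }
}
```

With a `SketchContext<Integer>` over the input "42+1", calling `advance(2)` records "42" as the last match, which can then be converted and pushed onto the stack.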

This value stack has a type parameter, and this type parameter is the same as that of your parser class. You are therefore limited in what you can store into, or retrieve from, the stack. However, it also means that you can do things like this:

// Base parser...
public abstract class BaseValueParser<T extends MyBaseClass>
    extends EventBusParser<T>
// Parser for one implementation...
public class ConcreteValueParser
    extends BaseValueParser<ConcreteClass>
// etc
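The following self-contained sketch shows why such a hierarchy is useful. SketchParser is a hypothetical stand-in for a parser base class with a typed value stack (it is not EventBusParser), and Shape, Square, ShapeParser and SquareParser are example names invented here to play the roles of MyBaseClass, ConcreteClass, BaseValueParser and ConcreteValueParser.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical stand-in for a parser base class with a typed value stack.
abstract class SketchParser<V> {
    private final Deque<V> stack = new ArrayDeque<>();
    protected void push(V value) { stack.push(value); }
    protected V pop() { return stack.pop(); }
}

// Example value types (assumptions for this sketch).
abstract class Shape {
    abstract double area();
}

final class Square extends Shape {
    private final double side;
    Square(double side) { this.side = side; }
    @Override double area() { return side * side; }
}

// Base parser: shared code can rely on every stack value being a Shape.
abstract class ShapeParser<T extends Shape> extends SketchParser<T> {
    double popArea() { return pop().area(); }
}

// Parser for one implementation: its stack only accepts Square values.
final class SquareParser extends ShapeParser<Square> {
    void pushSide(double side) { push(new Square(side)); }
    // Pushing any other Shape subclass here would be rejected at compile time.
}
```

The bound on T lets the base parser call Shape methods on whatever it pops, while each concrete parser keeps compile-time checking of what goes onto its own stack.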