How are we going to parse? 21st Oct version #1990

amitu · 2024-10-21T08:15:14Z · Maintainer

I read it this evening. This is the exact opposite of the nom crate (parser combinators), and it is what the Rust compiler uses (its own custom recursive descent parser).

I've found three approaches so far:

Parser combinators (which you're also trying). The error handling in this style was new to me, and I don't think I fully understand it yet.

A parser generator like LALRPOP. This crate seems well tested and mature. It takes care of the heavy lifting of writing the parser (LALRPOP generates LR(1)/LALR(1) parsers rather than recursive descent ones); we more or less just have to write a BNF grammar.

A handmade recursive descent parser: I feel our language is simple enough for this. The advantage is full control over the codebase.

.. rest snipped ..

This is a good summary of alternatives we have.
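To make the first option concrete, here is a toy, hand-rolled sketch of the combinator style. It is not nom's API: the names `tag`, `number`, `pair` and `sum` are hypothetical, and failure is just `None`, whereas nom's parsers return an `IResult` that carries error values (which is where the unfamiliar error handling comes in).

```rust
// A parser is any function from input to Option<(value, remaining input)>.
// Bigger parsers are built by composing smaller ones.

/// Matches an exact literal prefix and returns the matched slice.
fn tag<'a>(lit: &'static str) -> impl Fn(&'a str) -> Option<(&'a str, &'a str)> {
    move |input: &'a str| input.strip_prefix(lit).map(|rest| (&input[..lit.len()], rest))
}

/// Matches one or more ASCII digits and parses them as a number.
fn number(input: &str) -> Option<(u64, &str)> {
    let end = input
        .find(|c: char| !c.is_ascii_digit())
        .unwrap_or(input.len());
    if end == 0 {
        return None;
    }
    let value = input[..end].parse().ok()?;
    Some((value, &input[end..]))
}

/// Runs `p` then `q` on the remaining input, pairing their results.
fn pair<'a, A, B>(
    p: impl Fn(&'a str) -> Option<(A, &'a str)>,
    q: impl Fn(&'a str) -> Option<(B, &'a str)>,
) -> impl Fn(&'a str) -> Option<((A, B), &'a str)> {
    move |input: &'a str| {
        let (a, rest) = p(input)?;
        let (b, rest) = q(rest)?;
        Some(((a, b), rest))
    }
}

/// NUMBER "+" NUMBER, built purely by composition.
fn sum(input: &str) -> Option<(u64, &str)> {
    let parser = pair(number, pair(tag("+"), number));
    let ((a, (_, b)), rest) = parser(input)?;
    Some((a + b, rest))
}
```

For example, `sum("10+5 rest")` yields `Some((15, " rest"))`: the unparsed tail is handed back to the caller instead of being treated as an error.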

Initial Bias Towards Handmade Parsers

I found "A Beginner's Guide to Parsing in Rust" by Richard L. Apodaca quite interesting, and it is roughly where I started. He recommends writing one or two handmade parsers before moving on to parser generators.

Based on my study (this article and others), my bias against generators was this: we have to pick a generator, what we do after picking it is quite specific to that particular generator, and its shortcomings become apparent only much later in the project. The more we invest in a generator, the more we will want to "hack around" its limitations.

Further, I felt our p1 grammar was simple enough that reading the entire codebase of our handmade parser would be less work than going through the docs of whichever parser generator we pick.

lalrpop

The fact that @siddhantk232 was biased towards it made me curious, so I read up on it over the weekend, and it does feel like quite a good alternative. It is by an author I really respect, and some big-name projects use it.

The thing that really compelled me to take lalrpop seriously is that it gives us a reliable, documented grammar. In the "Beginner's Guide to Parsing" article, Apodaca strongly recommends writing the grammar before writing the parser:

Given a grammar, writing a parser boils down to finding a method to translate each production rule into a function. It's convenient to name these functions after the production rules they express.

Emphasis mine. This became very apparent to me while playing with my first attempt at a handwritten parser.
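A minimal sketch of what "one function per production rule" looks like in Rust. The grammar is a hypothetical arithmetic toy, not our p1 grammar; `Parser`, `expr`, `term`, `factor` and `parse` are illustrative names. Each rule from the comment header becomes one method of the same name, so the code can be checked against the grammar rule by rule.

```rust
// Hypothetical toy grammar (EBNF):
//   expr   = term { "+" term } ;
//   term   = factor { "*" factor } ;
//   factor = NUMBER | "(" expr ")" ;

struct Parser<'a> {
    src: &'a [u8],
    pos: usize,
}

impl<'a> Parser<'a> {
    fn new(src: &'a str) -> Self {
        Parser { src: src.as_bytes(), pos: 0 }
    }

    /// Skips ASCII whitespace, then returns the next byte without consuming it.
    fn peek(&mut self) -> Option<u8> {
        while self.pos < self.src.len() && self.src[self.pos].is_ascii_whitespace() {
            self.pos += 1;
        }
        self.src.get(self.pos).copied()
    }

    /// Consumes `byte` if it comes next, otherwise reports an error.
    fn expect(&mut self, byte: u8) -> Result<(), String> {
        if self.peek() == Some(byte) {
            self.pos += 1;
            Ok(())
        } else {
            Err(format!("expected '{}' at byte {}", byte as char, self.pos))
        }
    }

    // expr = term { "+" term }
    fn expr(&mut self) -> Result<i64, String> {
        let mut value = self.term()?;
        while self.peek() == Some(b'+') {
            self.pos += 1;
            value += self.term()?;
        }
        Ok(value)
    }

    // term = factor { "*" factor }
    fn term(&mut self) -> Result<i64, String> {
        let mut value = self.factor()?;
        while self.peek() == Some(b'*') {
            self.pos += 1;
            value *= self.factor()?;
        }
        Ok(value)
    }

    // factor = NUMBER | "(" expr ")"
    fn factor(&mut self) -> Result<i64, String> {
        match self.peek() {
            Some(b'(') => {
                self.pos += 1;
                let value = self.expr()?;
                self.expect(b')')?;
                Ok(value)
            }
            Some(b) if b.is_ascii_digit() => {
                let start = self.pos;
                while self.pos < self.src.len() && self.src[self.pos].is_ascii_digit() {
                    self.pos += 1;
                }
                // The slice holds only ASCII digits, so it is valid UTF-8.
                let text = std::str::from_utf8(&self.src[start..self.pos]).unwrap();
                text.parse::<i64>().map_err(|e| format!("bad number: {e}"))
            }
            _ => Err(format!("unexpected input at byte {}", self.pos)),
        }
    }
}

/// Parses a complete input, rejecting anything left over after `expr`.
fn parse(src: &str) -> Result<i64, String> {
    let mut p = Parser::new(src);
    let value = p.expr()?;
    match p.peek() {
        None => Ok(value),
        Some(_) => Err(format!("trailing input at byte {}", p.pos)),
    }
}
```

Operator precedence falls out of the grammar's layering (`term` sits below `expr`), so `parse("1 + 2 * 3")` gives `Ok(7)`. The flip side is the point made above: without the grammar comment, the only record of these rules is the code itself.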

Keeping Grammar Up To Date

So a BNF / formal grammar is important for the handwritten-parser approach to work; and even if we could write a handwritten parser without one, we should still have a formal grammar.

A formal grammar helps make sure the parser is not buggy: either the code does what the grammar says, or the code is wrong. It also helps when writing new implementations. If the grammar lives only in code, over time the rules become hard to understand, we end up relying on the code itself, and it becomes really hard to tell whether our grammar, our code, or the source program is buggy.

So we want a formal grammar, and therefore it becomes important to use a generator even if the language (at least p1) is simple. In the parser-generator world, any syntax change requires us to update the grammar first, and the grammar is localised in a single file with very little code noise.

The same consideration also rules out parser combinators for me.

Tokenizer

All three approaches can work with a custom tokeniser; among parser generators, this is true of lalrpop. I am playing with logos, which lalrpop also supports (and possibly recommends), so I think it's a good choice.
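A sketch of the tokeniser/parser split, hand-written here to avoid guessing at logos's derive attributes; logos would generate the scanning part for us. The `Tok` variants, `LexError` and `Lexer` are hypothetical, not our p1 tokens. The lexer yields `(start, token, end)` byte-offset triples wrapped in `Result`, which is, as far as I recall, the shape the LALRPOP book's custom-lexer chapter expects from an external lexer. Any of the three parsing approaches could consume this stream just as well.

```rust
/// Token kinds for a toy language (hypothetical, not fastn's p1 tokens).
#[derive(Debug, Clone, PartialEq)]
enum Tok {
    Ident(String),
    Number(i64),
    Colon,
    Newline,
}

/// Lexing error carrying the byte offset of the offending input.
#[derive(Debug, Clone, PartialEq)]
struct LexError {
    pos: usize,
}

/// A `(start, token, end)` triple with byte offsets, or an error.
type Spanned = Result<(usize, Tok, usize), LexError>;

struct Lexer<'a> {
    src: &'a str,
    pos: usize,
}

impl<'a> Lexer<'a> {
    fn new(src: &'a str) -> Self {
        Lexer { src, pos: 0 }
    }
}

impl<'a> Iterator for Lexer<'a> {
    type Item = Spanned;

    fn next(&mut self) -> Option<Spanned> {
        // Skip spaces and tabs; newlines are significant tokens here.
        let rest = &self.src[self.pos..];
        let trimmed = rest.trim_start_matches(|ch: char| ch == ' ' || ch == '\t');
        self.pos += rest.len() - trimmed.len();
        let start = self.pos;
        let c = trimmed.chars().next()?; // None at end of input

        // Byte length of the longest prefix whose chars all satisfy `pred`.
        let run = |pred: fn(char) -> bool| {
            trimmed.find(|ch: char| !pred(ch)).unwrap_or(trimmed.len())
        };

        let (tok, len) = match c {
            ':' => (Tok::Colon, 1),
            '\n' => (Tok::Newline, 1),
            '0'..='9' => {
                let len = run(|ch: char| ch.is_ascii_digit());
                match trimmed[..len].parse::<i64>() {
                    Ok(n) => (Tok::Number(n), len),
                    Err(_) => {
                        // Too large for i64: skip the digits and report it.
                        self.pos += len;
                        return Some(Err(LexError { pos: start }));
                    }
                }
            }
            c if c.is_alphabetic() || c == '_' => {
                let len = run(|ch: char| ch.is_alphanumeric() || ch == '_');
                (Tok::Ident(trimmed[..len].to_string()), len)
            }
            _ => {
                // Unknown character: skip it so iteration can continue.
                self.pos += c.len_utf8();
                return Some(Err(LexError { pos: start }));
            }
        };
        self.pos += len;
        Some(Ok((start, tok, self.pos)))
    }
}
```

For `"name: 42\n"` this produces `Ident("name")` at 0..4, `Colon` at 4..5, `Number(42)` at 6..8 and `Newline` at 8..9. Keeping spans on every token is what lets the parser, whichever approach we pick, point error messages at exact source positions.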
