I read it this evening. This is the exact opposite of the nom crate (a parser combinator library), and it is what the Rust parser uses (their custom recursive descent parser).
I see 3 ways so far:
Parser combinators (which you're also trying). The error handling in this style was a bit new for me, and I don't think I understand it yet.
A parser generator like LALRPOP. This crate seems well tested and mature. It takes care of the heavy lifting of writing the parser by hand. We just have to write a BNF, more or less.
A handmade recursive descent parser. I feel like our language is not too hard for this. The advantage is full control over the codebase.
The message quoted above is what @siddhantk232 wrote on Discord. It is a good summary of the alternatives we have.
Initial Bias Towards Handmade Parsers
I found A Beginner's Guide to Parsing in Rust by Richard L. Apodaca quite interesting, and it is more or less where I started. He recommends writing one or two handmade parsers before moving on to parser generators.
Based on my study (this article and others), my bias against generators is that we have to pick one. Everything we do after picking a parser generator is specific to that generator, and its shortcomings only become apparent much later in the project. The more we invest in a generator, the more we will want to "hack around" its limitations.
Further, I felt our p1 grammar was simple enough that reading the entire code base of a handmade parser would be less work than going through the docs of whichever parser generator we pick.
lalrpop
The fact that @siddhantk232 was biased towards it made me curious, so I read up on it over the weekend, and it does feel like quite a good alternative. It is by an author I really respect, and some big-name projects are using it.
The thing that really compelled me to take lalrpop seriously is a reliable, documented grammar. In the "Beginner's Guide to Parsing" article, Apodaca strongly recommends writing the grammar before writing the parser:
"Given a grammar, writing a parser boils down to finding a method to translate each production rule into a function. It's convenient to name these functions after the production rules they express."
Emphasis mine. This became very apparent to me when I was playing with my first attempt at a hand-written parser.
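To make the "one function per production rule" idea concrete, here is a minimal sketch of a hand-written recursive descent parser. The toy grammar in the comment is made up for illustration and is not the p1 grammar; the `Parser` struct and the `parse` helper are likewise hypothetical names. It does not skip whitespace and it evaluates as it parses instead of building a tree, purely to keep the sketch short.

```rust
/// Hypothetical toy grammar, written down before the parser:
///
///   expr   := term (("+" | "-") term)*
///   term   := number | "(" expr ")"
///   number := digit+
///
/// Each production gets one method of the same name.
struct Parser<'a> {
    input: &'a [u8],
    pos: usize,
}

impl<'a> Parser<'a> {
    fn new(src: &'a str) -> Self {
        Parser { input: src.as_bytes(), pos: 0 }
    }

    /// Looks at the next byte without consuming it.
    fn peek(&self) -> Option<u8> {
        self.input.get(self.pos).copied()
    }

    // expr := term (("+" | "-") term)*
    fn expr(&mut self) -> Result<i64, String> {
        let mut value = self.term()?;
        while let Some(op) = self.peek() {
            match op {
                b'+' => { self.pos += 1; value += self.term()?; }
                b'-' => { self.pos += 1; value -= self.term()?; }
                _ => break,
            }
        }
        Ok(value)
    }

    // term := number | "(" expr ")"
    fn term(&mut self) -> Result<i64, String> {
        if self.peek() == Some(b'(') {
            self.pos += 1;
            let value = self.expr()?;
            if self.peek() != Some(b')') {
                return Err(format!("expected ')' at byte {}", self.pos));
            }
            self.pos += 1;
            Ok(value)
        } else {
            self.number()
        }
    }

    // number := digit+
    fn number(&mut self) -> Result<i64, String> {
        let start = self.pos;
        while matches!(self.peek(), Some(b'0'..=b'9')) {
            self.pos += 1;
        }
        if start == self.pos {
            return Err(format!("expected a digit at byte {}", self.pos));
        }
        // The slice holds only ASCII digits, so it is valid UTF-8.
        let text = std::str::from_utf8(&self.input[start..self.pos]).unwrap();
        text.parse::<i64>().map_err(|e| e.to_string())
    }
}

/// Parses a whole input string; leftover characters are an error.
fn parse(src: &str) -> Result<i64, String> {
    let mut parser = Parser::new(src);
    let value = parser.expr()?;
    if parser.pos != src.len() {
        return Err(format!("unexpected input at byte {}", parser.pos));
    }
    Ok(value)
}
```

Calling `parse("(1+2)-(3-10)")` walks `expr`, `term` and `number` exactly as the grammar comment reads. Note that the grammar here survives only as comments, which nothing forces anyone to keep in sync with the code.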
Keeping Grammar Up To Date
If a BNF or other formal grammar is needed for the hand-written approach to work, then we need one. And even if we could write a hand-written parser without a formal grammar, we should still have one.
A formal grammar helps make sure the parser is not buggy: either the code does what the grammar says, or the code is wrong. It also helps when writing new implementations. If the grammar is hidden in code, the rules become hard to understand over time and the code becomes the only reference, so it gets really hard to say whether the grammar, the code, or the source program is at fault.
So we want a formal grammar, and therefore it becomes important to use a generator even if the language (at least p1) is simple. In the parser generator world, any syntax change requires us to update the grammar first. The grammar is also localised in a single file with very little code noise.
This same consideration also rules out parser combinators for me.
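To illustrate why combinators do not give us a standalone grammar, here is a minimal, dependency-free sketch of the combinator style. It is not nom's API; `ParseResult`, `number`, `ch`, `pair` and `sum` are hypothetical names invented for this example. The one rule it implements, `sum := number "+" number`, exists only as a nesting of function calls plus a comment.

```rust
/// A parser takes the remaining input and returns the parsed value plus the
/// rest of the input, or an error message.
type ParseResult<'a, T> = Result<(T, &'a str), String>;

/// Matches one or more ASCII digits and converts them to a number.
fn number(input: &str) -> ParseResult<'_, i64> {
    let end = input
        .find(|c: char| !c.is_ascii_digit())
        .unwrap_or(input.len());
    if end == 0 {
        return Err(format!("expected a digit, found {:?}", input));
    }
    let value = input[..end].parse::<i64>().map_err(|e| e.to_string())?;
    Ok((value, &input[end..]))
}

/// Builds a parser that matches one exact character.
fn ch<'a>(expected: char) -> impl Fn(&'a str) -> ParseResult<'a, char> {
    move |input: &'a str| match input.chars().next() {
        Some(c) if c == expected => Ok((c, &input[c.len_utf8()..])),
        _ => Err(format!("expected {:?}, found {:?}", expected, input)),
    }
}

/// Combinator: run `first`, then run `second` on the leftover input.
fn pair<'a, A, B>(
    first: impl Fn(&'a str) -> ParseResult<'a, A>,
    second: impl Fn(&'a str) -> ParseResult<'a, B>,
) -> impl Fn(&'a str) -> ParseResult<'a, (A, B)> {
    move |input: &'a str| {
        let (a, rest) = first(input)?;
        let (b, rest) = second(rest)?;
        Ok(((a, b), rest))
    }
}

/// sum := number "+" number
/// The rule exists only as this nesting of combinator calls.
fn sum(input: &str) -> ParseResult<'_, i64> {
    let (((left, _plus), right), rest) = pair(pair(number, ch('+')), number)(input)?;
    Ok((left + right, rest))
}
```

Errors simply propagate outward through `?`, so the caller sees whichever inner parser failed first. Real combinator libraries layer more structure on top of this, which may be part of why the error handling takes some getting used to.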
Tokenizer
All three approaches can work with a custom tokenizer. Among parser generators, this is true of lalrpop. I am playing with logos, which is also supported (recommended?) by lalrpop, so it's a good choice, I think.
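For a sense of what a custom tokenizer hands to any of the three parser styles, here is a small hand-written sketch. It deliberately avoids logos so that it has no dependencies; logos would derive an equivalent lexer from annotations on the token enum. The `Token` variants and the `tokenize` function are assumptions made for illustration, not the actual p1 token set.

```rust
/// Hypothetical token set; the real p1 tokens are not shown in this thread.
#[derive(Debug, Clone, PartialEq)]
enum Token {
    Ident(String),
    Number(i64),
    Plus,
    LParen,
    RParen,
}

/// Splits `src` into tokens, skipping whitespace.
/// On failure, returns the byte offset where the offending token starts.
fn tokenize(src: &str) -> Result<Vec<Token>, usize> {
    let mut tokens = Vec::new();
    let mut chars = src.char_indices().peekable();

    while let Some(&(start, c)) = chars.peek() {
        if c.is_whitespace() {
            chars.next();
        } else if c.is_ascii_digit() {
            // Consume the full run of digits.
            let mut end = start;
            while let Some(&(i, d)) = chars.peek() {
                if !d.is_ascii_digit() {
                    break;
                }
                end = i + d.len_utf8();
                chars.next();
            }
            // A number too large for i64 is reported at its start offset.
            let value = src[start..end].parse::<i64>().map_err(|_| start)?;
            tokens.push(Token::Number(value));
        } else if c.is_ascii_alphabetic() || c == '_' {
            // Consume the full identifier.
            let mut end = start;
            while let Some(&(i, d)) = chars.peek() {
                if !(d.is_ascii_alphanumeric() || d == '_') {
                    break;
                }
                end = i + d.len_utf8();
                chars.next();
            }
            tokens.push(Token::Ident(src[start..end].to_string()));
        } else {
            // Single-character punctuation.
            let token = match c {
                '+' => Token::Plus,
                '(' => Token::LParen,
                ')' => Token::RParen,
                _ => return Err(start),
            };
            tokens.push(token);
            chars.next();
        }
    }
    Ok(tokens)
}
```

With a separate tokenizer like this, the parser works on `Token` values instead of raw characters, so whitespace handling and literal parsing stay out of the grammar.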