This program is a tool similar to ocamlyacc and menhir that generates LALR(1) parsers. However, unlike those, it uses the continuation-passing style, which allows for better type inference and more efficient compiler optimizations.
- Specializes in generating LALR(1) parsers, with support for LR(0), SLR, and LR(1) parsers.
- Largely compatible with ocamlyacc, supporting:
%left
,%right
, and%nonassoc
precedence and associativity declarations$1
,$2
, etc., for referencing semantic values- Compatibility with the
Parsing
module, including functions likeParsing.symbol_start
- Supports a subset of menhir-specific features:
- Named semantic values using the
id=symbol
syntax - Keywords such as
$startpos
and$loc
- Parametric rules using the
rule(param1, param2)
syntax - Full support for menhir's standard library
- Shorthand notations
*
,+
, and?
%inline
rules- Anonymous rules
- Named semantic values using the
- Experimental/non-standard:
%when
conditions
Planned but not yet supported:
- Error recovery
- Parametric semantic actions
To generate a parser from a .mly
grammar definition, use the following command:
cpspg [-o OUTPUT] INPUT
To integrate this tool into a dune build process, add the following rule to your dune
file:
(rule
(deps Parser.mly)
(targets Parser.ml Parser.mli)
(action
(chdir %{workspace_root} (run cpspg %{deps} %{targets}))))
This project is managed using dune
. To build it, execute:
dune build
Note that lib/Parser.ml
is bootstrapped from lib/Parser.mly
. When code generation changes, it can be
promoted to a new version:
dune build @bootstrap
dune promote
LR parsers are essentially finite automata with a stack containing various semantic values. Representing such a stack in
a functional language is challenging. Traditional tools like ocamlyacc
and menhir
employ workarounds — ocamlyacc
uses Obj.repr
as an escape hatch, while menhir
utilizes GADTs. These approaches make it difficult to typecheck
parsers without explicit type declarations. CPS enables the generation of simpler, more type-safe code that is also more
readable (sort of).