-
Notifications
You must be signed in to change notification settings - Fork 18
FAQ
This wiki is now deprecated, please consult the new-style wiki here: https://j-mie6.github.io/parsley/faq
In parsley-4.x.y
and up, combinators are no longer totally lazy and adopt the
idea of lazy and strict arguments. In general, all combinators have strict
receivers.
- A strict argument/receiver indicates that the combinator intends for that argument to be parsed "immediately" without a possibility of consuming input before-hand
- A lazy argument/receiver indicates that the combinator acknowledges that input may be consumed by another argument (or the combinator itself) before the lazy argument will be executed.
Why is this important? Recursive parsers are implemented in Parsley with
"knot-tying" using lazy val
. When a parser appears recursively in its own
definition, it needs to be kept lazy otherwise it will be evaluated during its
own definition, leading to an infinite loop. Lazy positions in combinators
indicate probably safe places for recursion to occur. On the other hand, the
absence of input-consumption before a strict position probably indicates that
recursion in that position will result in a left-recursive parser: the
necessary condition for left-recursion is that a recursive call is made without
having consumed input since the last recursive call. Technically, this is known as unguarded (by input consumption) recursion, with correct parsers making use of guarded recursion. This is not bullet-proof:
- Strict positions sometimes occur in combinators where they should be lazy
but cannot be due to language limitations:
choice
, thezipped
syntax, etc all fall victim to this. - Lazy positions suggest the previously executed parsers could have consumed
input, but it is not guaranteed that they actually do. As an example,
LazyParsley.unary_~
can be used to make a parser lazy in an otherwise strict position, but doesn't itself consume input, so care must still be taken to ensure there is input consumed by previous arguments!
Note that a strict parser in a lazy position is still lazy!
For an example, lazy val q = (p <~ ':', q).zipped(f)
would be problematic
because the recursive call is used naked in a totally strict combinator
(zipped
). There are two ways to fix this problem:
- Reshuffle the surrounding context to move it into a lazy position:
lazy val q = (p, ':' ~> q).zipped(f)
would be fine, since the right-hand side of~>
is a lazy position guarded by the input consumption of':'
. - Use
LazyParsley.unary_~
to introduce laziness:Here, theimport parsley.Parsley.LazyParsley lazy val q = (p <~ ':', ~q).zipped(f)
unary_~
is used to just makeq
lazy: this is simply defined asunit ~> q
, which placesq
into a unguarded lazy position, sinceunit
does not consume input. The guardedness in this case comes fromp <~ ':'
, which is guaranteed to consume a:
on success.
This sounds like you've run into the above issue with left-recursion. This means that a parser appears as the first possible route of exploration from itself:
lazy val bad = bad ~> ...
lazy val badExpr = Add.lift(badExpr, '+' ~> badExpr) <|> number
lazy val goodExpr = attempt((number <~ '+', goodExpr).zipped(Add)) <|> number
The first two parsers are both examples of left-recursive parsers, which are
not allowed in Parsley. The third is caused by the above problem of a genuine
recursive call in a strict position, and can be fixed with unary_~
or by
rearranging the '+'
parser as above:
lazy val goodExpr = attempt((number, '+' ~> goodExpr).zipped(Add)) <|> number
lazy val goodExpr = attempt((number <~ '+', ~goodExpr).zipped(Add)) <|> number
As above you've probably made a left-recursive parser, but this time hidden the bad recursion inside an unguarded lazy position:
import parsley.Parsley.LazyParsley
lazy val badExpr = Add.lift(~badExpr, '+' ~> badExpr) <|> number
Perhaps you tried to fix the above bad parser using unary_~
even though its
an unsafe recursive position?
This error is thrown by the Parsley compiler when Scala would otherwise have thrown a
NullPointerException
during what's known as "let-finding". This is the very first phase of the
pipeline, and, since Parsley does not use null
anywhere in the front-end, its a symptom of a
parser having been demanded before its actually been defined. So what does mean for you, the user?
Well, unfortunately, Scala doesn't give us any indication about which parser is at fault (and the Java 14 Helpful NPEs don't help much either). But here are the possible causes:
- A parser references one that has been defined below it and it hasn't been marked as
lazy val
- A combinator doesn't have the right laziness characteristics (conventionally, Parsley defines
all its combinators arguments using by-name parameters and
lazy val
s for those that appear multiple times in the body) - Be careful: builder combinators to abstract position tracking (as per the Parsley guide) also will require the same care as (2)
Now, the solution to (1) is simple, just either add a lazy val
, or reorder the grammar clauses so
that there is no forward referencing. Similarly, (2) is simple to fix by adding in the right
laziness to the parameter. However, because Parsley is careful to make as much as possible lazy (be
careful of parsley.implicits.zipped.{Zipped2, Zipped3}
, however, neither of them are lazy!) you
may find that you can define an entire parser without ever running into this problem, even if nothing
is marked lazy
: lucky you! My advice is to try and keep things ordered nicely, or mark everything
as lazy; of course, laziness will have a slight runtime penalty, so its worth seeing how much of the
laziness you can eliminate by just reordering.
You may be helped out by Scala here, either by an error about "crossed-initialisation" or a warning about "Reference to uninitialized value". If you see either of these, check the order and laziness of the parsers concerned!
The Wiki refers to the latest version in the parsley-4.x.y
series.