Parsley 4 Major Changes #98

Merged: 184 commits merged on Nov 30, 2022

j-mie6 (Owner) commented on Oct 5, 2021

This PR consolidates the Parsley 4 major API changes ready for the eventual release of the next release series.

The scope of Parsley 4's changes is massive: probably the largest shift in Parsley's history! The aims of this project have been the following:

User-Facing Changes

Major Changes

  • Improved the expression parsing combinators in line with the presentation in Design Patterns for Parser Combinators
  • Enforced the deprecation of many combinators:
    • all the original function-based register combinators: they are now methods on the register itself, allowing better type inference
    • the problematic <\> combinator, which both had inappropriate associativity and encouraged overuse of attempt
    • unsafeLabel, which leveraged the previously unsound error message system of pre-2.6.0: it is simply not needed, and optimisations of label in the backend more than make up for its loss.
    • the remaining unsafe API in general: none of this functionality is required, and it caused a variety of legacy code maintenance problems
  • Moved to a strict API: this means that the receivers of combinators are strict, and left-recursion will now diverge sooner. Left-recursion handling is out of scope for Parsley, so this is a good change. It may also result in improved warnings from Scala itself that point out left-recursion in parsers. The LazyParsley extension class still exists, and it supports the unary_~ combinator to make a parser lazy for use in the fully strict zipped combinators. As a result, the structure of the library is simplified, with all the combinators defined as methods on Parsley itself.
  • Improved factoring of the expression API to distinguish the heterogeneous chains in infix from the homogeneous ones in chain.
  • Removed the cast combinator, as the register restriction is lifted, and it can be a misleading combinator.
  • Moved a selection of combinators into new homes:
    • Parsley.sequence -> combinator.sequence
    • Parsley.traverse -> combinator.traverse
    • Parsley.skip -> combinator.skip
    • Parsley.void made into a method on Parsley itself
    • character.anyChar -> character.item
    • combinator.repeat -> combinator.exactly
    • combinator.optionally -> combinator.optionalAs
    • Parsley.LazyMapParsley -> extension.HaskellStyleMap
    • Parsley.LazyChooseParsley -> extension.LazyChooseParsley
  • Removed the revision system for error builders, as it is no longer compatible with the new versioning policy
  • Added wide-carets to the error builder
  • Some combinators have been adjusted to remove inconsistent states or be more consistent with other combinators:
    • fail must receive at least one argument
    • string no longer accepts the empty string (there is no reason to do this!)
    • collectMsg now accepts 1 or more message arguments
    • collectMsg and guardAgainst now produce Seq[String] as the messages (this should be non-empty!)
    • precedence requires at least one atom and one level; this also allows for vararg versions in the other direction
  • zipped combinators are now fully strict, and LazyZipped has been removed entirely
  • Allowed the user to decide how to extract an unexpected token from the input (provided as an IndexedSeq[Char]) at the failure position, additionally given the amount of input the parser tried and failed to parse.
  • Removed most (if not all) of the default arguments found in the API: changes to default arguments are not binary backwards compatible, which could violate the new versioning policy.
  • Included any preliminary changes needed in preparation for Support for Alternative Input Sources #132
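The split between homogeneous chain and heterogeneous infix chains shows up in the types of the fold each one performs. The following is a minimal, library-free sketch of that idea only: `chainLeft1` and `infixLeft1` are hypothetical names with hypothetical signatures, not Parsley's API. A homogeneous chain combines operands and results of one type `A`; a heterogeneous chain folds operands of type `A` into a result of type `B`, lifting the leftmost operand with `wrap`.

```scala
// Library-free sketch: chainLeft1 and infixLeft1 are hypothetical names used
// only to show the types involved; they are not Parsley's API.
object ChainSketch {
  // Homogeneous left chain: operands and results share one type A,
  // so every operator has type (A, A) => A.
  def chainLeft1[A](first: A, rest: List[((A, A) => A, A)]): A =
    rest.foldLeft(first) { case (acc, (op, x)) => op(acc, x) }

  // Heterogeneous left chain: operands have type A, the result has type B,
  // every operator has type (B, A) => B, and `wrap` lifts the leftmost operand into B.
  def infixLeft1[A, B](first: A, rest: List[((B, A) => B, A)])(wrap: A => B): B =
    rest.foldLeft(wrap(first)) { case (acc, (op, x)) => op(acc, x) }

  // A tiny AST in which operands (Num) and results (Expr) have different types.
  sealed trait Expr
  final case class Num(n: Int) extends Expr
  final case class Sub(l: Expr, r: Num) extends Expr

  private val minus: (Int, Int) => Int = _ - _
  private val sub: (Expr, Num) => Expr = Sub(_, _)

  // 10 - 3 - 2 associated to the left: (10 - 3) - 2
  val homogeneous: Int = chainLeft1(10, List(minus -> 3, minus -> 2))
  val heterogeneous: Expr =
    infixLeft1[Num, Expr](Num(10), List(sub -> Num(3), sub -> Num(2)))((n: Num) => n)

  def main(args: Array[String]): Unit = {
    println(homogeneous)   // 5
    println(heterogeneous) // Sub(Sub(Num(10),Num(3)),Num(2))
  }
}
```

In the heterogeneous case the right operand of `Sub` can only ever be a `Num`, so the types themselves rule out right-nested trees, which a single `(A, A) => A` operator type cannot express.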
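To make the idea of a user-decided unexpected token concrete, here is a library-free sketch of one possible extraction strategy. The function `unexpectedToken` and its parameters (`remaining`, the input from the failure point onwards, and `amountWanted`, how many characters the failing parser tried to consume) are hypothetical; the actual hook in Parsley may differ in name and shape.

```scala
// Hypothetical hook (not Parsley's API): given the input remaining at the
// failure point and how many characters the failing parser tried to consume,
// decide which text to report as the unexpected token.
object UnexpectedTokenSketch {
  def unexpectedToken(remaining: IndexedSeq[Char], amountWanted: Int): String =
    if (remaining.isEmpty) "end of input"
    else remaining.head match {
      // Name whitespace explicitly rather than printing an invisible character.
      case '\n'                => "newline"
      case '\t'                => "tab"
      case c if c.isWhitespace => "whitespace"
      case _ =>
        // Report the whole word (letters and digits) at the failure point,
        // but never less than what the parser wanted and never less than one
        // character, so the reported token covers the attempted input.
        val wordLength = remaining.takeWhile(_.isLetterOrDigit).length
        remaining.take(math.max(math.max(wordLength, amountWanted), 1)).mkString
    }

  def main(args: Array[String]): Unit = {
    println(unexpectedToken("whilex = 1".toIndexedSeq, 5)) // whilex
    println(unexpectedToken("+= 1".toIndexedSeq, 2))       // +=
    println(unexpectedToken("".toIndexedSeq, 1))           // end of input
  }
}
```

Here a parser that wanted the 5-character keyword `while` reports the whole word `whilex`, which is usually more helpful than the single character `w`.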

Major parsley.token Changes

  • Remove BitSet and Impl
  • Remove LanguageDef and replace with LexicalDesc and friends
  • Subdivide Lexer into categories, which can share functionality with each other:
    • numeric (tests for Combined required)
    • text
    • symbol
    • names
    • space
    • separators
    • enclosing

Minor Changes

  • Support for mixed-fixity precedence levels
  • Added fresh combinator, for generation of unique results from pure
  • Added the strings combinator, which efficiently parses one of a set of strings
  • Added a variant of the strings combinator, which efficiently parses one of the strings in a mapping from strings to other parsers, then continues with the corresponding parser
  • Added the stringOfMany and stringOfSome combinators, which allow for efficient string construction, instead of the old .map(_.mkString) idiom on many or some
  • Added the | combinator alias
  • Added filterWith back onto the API
  • Added combinator.ifP
  • Added extension.OperatorSugar, which comes with a few operators inspired by the SPC library
  • Added the newReg and fillReg combinators
  • Added the forP and forP_ combinators
  • Added the forYieldP, and forYieldP_ combinators
  • Added the mapFilter combinator
  • Added an error message debugging combinator
  • Added the generic bridge traits of ParserBridge0 through ParserBridge22 as well as ParserSingletonBridge for basic use of the Parser Bridge and Singleton Bridge patterns.
  • Added ap1 through ap22, which are logical precursors to lift1 to lift22.
  • Added markAsToken and unexpectedWhen combinators to parsley.errors.combinator
  • fail and unexpected can now specify the caret width of their error; for unexpected errors, the widest caret now wins in the case of ambiguity (docs required)
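As a rough illustration of why a dedicated combinator for a set of strings can beat trying each string in turn with attempt, the sketch below finds the longest matching string with a trie, so each input character is examined at most once. It is a library-free toy over plain `String`s: the trie representation, the names `build` and `longestMatch`, and the longest-match rule are assumptions chosen for illustration, not a description of how Parsley implements strings.

```scala
import scala.collection.mutable

// Library-free toy: a trie-based longest match over a set of strings.
// This illustrates one possible technique; it is not Parsley's implementation.
object StringsSketch {
  // A trie node: children keyed by character, plus a flag marking that the
  // path from the root to this node spells one of the strings in the set.
  final class Node {
    val children: mutable.Map[Char, Node] = mutable.Map.empty
    var terminal: Boolean = false
  }

  def build(strs: Seq[String]): Node = {
    val root = new Node
    for (s <- strs) {
      var node = root
      for (c <- s) node = node.children.getOrElseUpdate(c, new Node)
      node.terminal = true
    }
    root
  }

  // Walk the trie along `input` from `offset`, remembering the longest string
  // matched so far; each input character is inspected at most once.
  def longestMatch(root: Node, input: String, offset: Int): Option[String] = {
    @annotation.tailrec
    def go(node: Node, i: Int, best: Option[String]): Option[String] = {
      val best1 = if (node.terminal) Some(input.substring(offset, i)) else best
      if (i >= input.length) best1
      else node.children.get(input.charAt(i)) match {
        case Some(next) => go(next, i + 1, best1)
        case None       => best1
      }
    }
    go(root, offset, None)
  }

  def main(args: Array[String]): Unit = {
    val ops = build(Seq("<", "<=", "<<", "<<="))
    println(longestMatch(ops, "<<= 1", 0)) // Some(<<=)
    println(longestMatch(ops, "<x", 0))    // Some(<)
    println(longestMatch(ops, ">", 0))     // None
  }
}
```

Trying `<`, `<=`, `<<` and `<<=` one by one with backtracking would re-read the shared `<` prefix once per candidate, and would need the candidates ordered carefully so that `<` does not shadow the longer operators.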

Patch Changes

  • Improved the implementation of persist
  • Made the <|> combinator more consistent when the (internal) JumpTable optimisation applied
  • General improvements to semantic preservation of optimisations on parsers
  • Error semantics adjusted to be consistent with mrkkrp/megaparsec#482 ("Why does label act only on the first set of hints?")
  • Better unicode support for error messages
  • The .unexpected, .!, .collectMsg, .guardAgainst, .filter, .collect, .filterMap, .filterOut, and .filterNot combinators now have amended semantics by default, and set the caret width to the width of the input that was parsed. (behaviour needs documentation)
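The caret-width rules in these lists can be pictured with a small, library-free model: an unexpected item carries a width, competing items at the same position keep the widest, and the renderer draws that many carets under the offending input. `Unexpected`, `merge` and `render` are hypothetical names, and keeping the first item on a tie is an assumption.

```scala
// Library-free model of caret widths; Unexpected, merge and render
// are hypothetical names, not Parsley's error-builder API.
object CaretSketch {
  // An unexpected item together with the width of the caret that should mark it.
  final case class Unexpected(description: String, width: Int)

  // Two unexpected items at the same position: keep the wider one.
  // Assumption: on a tie, the first item is kept.
  def merge(a: Unexpected, b: Unexpected): Unexpected =
    if (b.width > a.width) b else a

  // Draw `width` carets (at least one) under column `col` (0-based) of `line`.
  def render(line: String, col: Int, width: Int): String =
    line + "\n" + (" " * col) + ("^" * math.max(width, 1))

  def main(args: Array[String]): Unit = {
    val line = "val x = foo(1"
    val u = merge(Unexpected("f", 1), Unexpected("identifier foo", 3))
    println(s"unexpected ${u.description}")
    println(render(line, 8, u.width))
  }
}
```

With the caret width set to what was parsed, a filter that rejects the identifier `foo` underlines all three characters rather than just the `f`.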

Other Changes

  • Major documentation overhaul, with the entire documentation rewritten and standardised, with examples throughout. This should be much friendlier for newcomers to the library.
  • Changed the versioning policy to be consistent with Scala's SemVer policy: in M.m.p, M tracks binary backwards compatibility, m tracks source backwards compatibility, and p is the patch number. This should make the library far more stable in the wild, and the policy is enforced by CI.
  • More than 4 registers may now be used simultaneously
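As a worked reading of the M.m.p policy, the sketch below classifies what an upgrade between two versions promises. The `Version` and `Compat` types and the `upgrade` function are hypothetical, and the sketch assumes the target version is not older than the source version.

```scala
// Hypothetical types illustrating the M.m.p reading of the versioning policy;
// assumes `to` is not older than `from`.
object VersionSketch {
  final case class Version(major: Int, minor: Int, patch: Int)

  def parse(s: String): Version = s.split('.') match {
    case Array(ma, mi, pa) => Version(ma.toInt, mi.toInt, pa.toInt)
    case _                 => throw new IllegalArgumentException(s"not an M.m.p version: $s")
  }

  sealed trait Compat
  case object MayBreakBinary extends Compat   // major bump: no compatibility promised
  case object BinaryCompatible extends Compat // minor bump: binary back-compat kept, source may break
  case object FullyCompatible extends Compat  // patch bump: binary and source back-compat kept

  def upgrade(from: Version, to: Version): Compat =
    if (to.major != from.major) MayBreakBinary
    else if (to.minor != from.minor) BinaryCompatible
    else FullyCompatible

  def main(args: Array[String]): Unit = {
    println(upgrade(parse("4.0.0"), parse("4.0.3"))) // FullyCompatible
    println(upgrade(parse("4.0.0"), parse("4.1.0"))) // BinaryCompatible
    println(upgrade(parse("3.3.0"), parse("4.0.0"))) // MayBreakBinary
  }
}
```

This is also why default arguments were removed from the API: a change that silently breaks binary compatibility would otherwise force a major-version bump.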

Internal Changes

  • Simplified Cont to not require a given result type at the operation level, this allows the same instance to be used in different places
  • Split the internal AST into two halves: a fully strict AST called StrictParsley and a partially lazy AST called LazyParsley. This divides the work of processing a tree into a frontend and a backend. This improves performance and allows for more advanced use of mutable structure in the backend. This is also much more maintainable.
  • Moved datastructures into parsley.internal.collections
  • AST normalisation is used, replacing the old <|>, *> and <* nodes with Choice and Seq constructors, which allows the tree to be normalised in linear time as opposed to polynomial time.
  • Instructions have been reworked:
    • Many instructions have been renamed to have more meaningful names
    • Stateful instructions have been removed (with the exception of CalleeSave, which is a special case)
    • Call has been much more optimised, and GoSub has been removed as a result: in general the return mechanism is greatly simplified and improved
    • Applied TCO more aggressively across parsers, since stateful instruction preservation is no longer a concern
    • Most handlers have been split into two instructions to remove conditional statements in these instructions
    • Several performance improvements to instructions and code generation across the board
  • The Context#status flag has been replaced by Context#good and Context#running, which allows for a tighter loop in Context#run(). In addition, Context#fail() can only be called when Context#good is already false: this avoids some redundant work being performed by failure handlers which re-fail.
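The linear-time normalisation can be illustrated on a toy AST. The sketch below, whose node names are hypothetical and do not mirror Parsley's internal classes, collapses nested binary `Alt` nodes (as built by repeated <|>) into one n-ary `Choice` by threading an accumulator through the tree, so each node is visited once.

```scala
// Toy AST for illustration only; Parsley's internal node types differ.
object NormaliseSketch {
  sealed trait P
  final case class Lit(s: String) extends P
  final case class Alt(l: P, r: P) extends P       // binary node, as built by <|>
  final case class Choice(alts: List[P]) extends P // n-ary normalised form

  // Collapse nested Alt/Choice nodes into one Choice, keeping left-to-right order.
  // `go` prepends each leaf onto an accumulator exactly once, so the total work
  // is linear in the number of nodes.
  def normalise(p: P): P = {
    def go(p: P, acc: List[P]): List[P] = p match {
      case Alt(l, r)    => go(l, go(r, acc))
      case Choice(alts) => alts.foldRight(acc)(go)
      case leaf         => leaf :: acc
    }
    p match {
      case Alt(_, _) | Choice(_) => Choice(go(p, Nil))
      case leaf                  => leaf
    }
  }

  def main(args: Array[String]): Unit = {
    val tree = Alt(Alt(Alt(Lit("a"), Lit("b")), Lit("c")), Lit("d"))
    println(normalise(tree)) // Choice(List(Lit(a), Lit(b), Lit(c), Lit(d)))
  }
}
```

By contrast, flattening each subtree and joining the results with ++ at every node does quadratic work on a left-nested chain such as ((a <|> b) <|> c) <|> d, because each concatenation copies the ever-growing left list.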

j-mie6 added the major label ("This change would break backwards compatibility") on Oct 5, 2021
j-mie6 added 24 commits January 24, 2022 20:11
* Removed redundant parameter in expression, whoops!

* Fixed for scala 3... I think they have bugs with co-variance...

* Modernised the design of the precedence, this new scheme works a bit nicer on Scala 3

* Updated documentation
* Removed strictness on main non-operator combinators

* User API is fully strict now, internal is lazy

* Fixed 2.12

* Unary AST-nodes strict

* Amazing, looks like we've hit a Scala 3 bug... reverting

* uncurried Binary

* fully strict left on Binary
…apshot names are more faithful to the semver. Version names are checked before release
j-mie6 added this to the Parsley 4 milestone on Feb 23, 2022
* Broke out into infix and chain

* Make zipped strict, and added mixed precedence
j-mie6 merged commit 51ccb79 into master on Nov 30, 2022
j-mie6 deleted the parsley-4 branch on November 30, 2022 at 20:41