Sparv Pipeline v4.1.0

anne17 released this 23 Aug 11:37

Note: Sparv v4.1.0 was released on PyPI on April 14, 2021 but we forgot to create a GitHub release then... 🤦🙈

Documentation: https://spraakbanken.gu.se/sparv

Added

  • New preload functionality for preloading annotators to speed up the annotation process.
  • Added verbose mode for progress bar, showing all concurrently running tasks (enabled with the -v flag; mainly useful
    together with the -j flag for multiprocessing).
  • Ability to limit the number of parallel processes used by specific annotators.
  • Source document names are now shown in error messages.
  • Added exporter configuration to wizard.
  • Added several new configuration options for Stanza, and a helpful error message to mitigate memory problems.
  • An error message is now displayed when attempting to run annotations without input files.
  • Added new command (languages) to show a list of supported languages.
  • You can now refer to models outside the Sparv data dir.
  • Added importers section to sparv run-rule --list.
  • Class values inferred from annotation usage are now shown when running sparv classes.
  • Dry-running (sparv run -n) now shows a summary of tasks.

Changed

  • Improved progress bar. It now shows the number of tasks completed and remaining instead of an estimated time
    (which wasn't very helpful).
  • Slightly quicker startup time.
  • Malt and Stanza no longer perform dependency parsing on tokens not belonging to any sentences.
  • The build-models command no longer builds all models by default; use the --all flag to build them all.
  • Regular annotators used as custom_annotations are now configured using config instead of params.
  • Updated and improved documentation.

Fixed

  • Fixed broken combined XML export.
  • Fixed several problems with the Stanza module.
  • MySQL tables now support all Unicode characters (by using the utf8mb4 charset).
  • Fixed support for retaining existing segments in the segment module.
  • Fixed crash in the SALDO module due to orphaned tokens.
  • Fixed Unicode normalization in the XML import module.
  • Removed broken, unused models from build-models.
  • Fixed YAML syntax highlighting, which was unreadable in some terminals.
  • Fixed rare TreeTagger crash.
  • Fixed some bugs in the Stanford module.