Sparv Pipeline v4.1.0
Note: Sparv v4.1.0 was released on PyPI on April 14, 2021, but we forgot to create a GitHub release at the time... 🤦🙈
Documentation: https://spraakbanken.gu.se/sparv
Added
- New preload functionality for preloading annotators, speeding up the annotation process.
- Added verbose mode for the progress bar, showing all concurrently running tasks (enabled with the `-v` flag; mainly useful together with the `-j` flag for multiprocessing).
- Ability to limit the number of parallel processes used by specific annotators.
- Source document names are now shown in error messages.
- Added exporter configuration to the wizard.
- Added several new configuration options for Stanza, and a helpful error message for mitigating memory problems.
- An error message is now displayed when attempting to run annotations without input files.
- Added new command (`languages`) to show a list of supported languages.
- You can now refer to models outside the Sparv data dir.
- Added importers section to `sparv run-rule --list`.
- Class values inferred from annotation usage are now shown when running `sparv classes`.
- Dry-running (`sparv run -n`) now shows a summary of tasks.
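The release notes don't describe how the per-annotator process limit is implemented. As a generic illustration of the idea only, here is a minimal Python sketch that caps how many tasks of one annotator may run at once inside a larger worker pool, using one `threading.BoundedSemaphore` per limited annotator. The annotator names `stanza` and `saldo`, the limit of 1, and the helper names are hypothetical; threads are used instead of processes only to keep the sketch short.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor
from contextlib import nullcontext

# Hypothetical limits: at most one concurrent "stanza" task. Annotators not
# listed here are only bounded by the overall pool size (like -j N).
ANNOTATOR_LIMITS = {"stanza": 1}
SEMAPHORES = {name: threading.BoundedSemaphore(n) for name, n in ANNOTATOR_LIMITS.items()}

_lock = threading.Lock()
running: dict[str, int] = {}  # tasks currently running, per annotator
peak: dict[str, int] = {}     # highest observed concurrency, per annotator


def run_task(annotator: str, doc: str) -> str:
    """Run one simulated annotation task, honouring any per-annotator limit."""
    # Unlimited annotators get a no-op context manager instead of a semaphore.
    with SEMAPHORES.get(annotator, nullcontext()):
        with _lock:
            running[annotator] = running.get(annotator, 0) + 1
            peak[annotator] = max(peak.get(annotator, 0), running[annotator])
        time.sleep(0.01)  # stand-in for the real annotation work
        with _lock:
            running[annotator] -= 1
    return f"{annotator}:{doc}"


def demo(n_docs: int = 6, workers: int = 4) -> tuple[list[str], dict[str, int]]:
    """Run every (annotator, document) task on a shared pool; return results and peaks."""
    running.clear()
    peak.clear()
    tasks = [(a, f"doc{i}") for i in range(n_docs) for a in ("stanza", "saldo")]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(lambda t: run_task(*t), tasks))
    return results, dict(peak)
```

Calling `demo()` runs twelve tasks on four workers; however the scheduling falls out, `stanza` never has more than one task running at a time, while `saldo` tasks can fill the remaining workers.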
Changed
- Improved progress bar. It now shows the number of tasks completed and left, instead of the estimated time (which wasn't very helpful).
- Slightly quicker startup time.
- Malt and Stanza no longer perform dependency parsing on tokens not belonging to any sentences.
- The `build-models` command no longer builds all models unless the `--all` flag is used.
- Regular annotators used as `custom_annotations` are now configured using `config` instead of `params`.
- Updated and improved documentation.
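To make "tokens not belonging to any sentences" concrete, here is a minimal sketch, not Sparv's code, assuming tokens and sentences are both given as `(start, end)` character-offset spans. The hypothetical helper groups tokens by the sentence that contains them and collects the leftover (orphaned) tokens separately, so only the grouped tokens would be handed to a dependency parser such as Malt or Stanza.

```python
from typing import List, Tuple

Span = Tuple[int, int]  # (start, end) character offsets, end exclusive


def group_tokens_by_sentence(
    tokens: List[Span], sentences: List[Span]
) -> Tuple[List[List[int]], List[int]]:
    """Return token indices grouped per sentence, plus indices of orphaned tokens.

    A token belongs to a sentence if its span lies entirely within the
    sentence span. This is a simple O(tokens * sentences) scan for clarity.
    """
    groups: List[List[int]] = [[] for _ in sentences]
    orphans: List[int] = []
    for ti, (t_start, t_end) in enumerate(tokens):
        for si, (s_start, s_end) in enumerate(sentences):
            if s_start <= t_start and t_end <= s_end:
                groups[si].append(ti)
                break
        else:
            # No sentence contains this token: skip it when parsing.
            orphans.append(ti)
    return groups, orphans
```

For example, with tokens `[(0, 3), (4, 7), (8, 9), (10, 14)]` and a single sentence `(0, 9)`, the first three tokens form one sentence group and token 3 is reported as an orphan.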
Fixed
- Fixed broken combined XML export.
- Fixed several problems with the Stanza module.
- MySQL tables now support all Unicode characters (by using the utf8mb4 charset).
- Fixed support for retaining existing segments in the segment module.
- Fixed crash in the SALDO module due to orphaned tokens.
- Fixed Unicode normalization in the XML import module.
- Removed broken, unused models from `build-models`.
- Fixed YAML syntax highlighting, which was unreadable in some terminals.
- Fixed rare TreeTagger crash.
- Fixed some bugs in the Stanford module.
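For background on the utf8mb4 change: MySQL's older `utf8` charset (an alias for utf8mb3) stores at most three bytes per character, so characters above U+FFFF, such as the emoji in the note at the top of these release notes, need four bytes in UTF-8 and only fit in utf8mb4 columns. The small Python check below is an illustrative sketch, not part of Sparv; the function names are hypothetical.

```python
def utf8_byte_lengths(text: str) -> list[int]:
    """Number of bytes each character occupies when encoded as UTF-8."""
    return [len(ch.encode("utf-8")) for ch in text]


def needs_utf8mb4(text: str) -> bool:
    """True if any character needs 4 bytes in UTF-8 (code point above U+FFFF).

    Such text cannot be stored in MySQL's 3-byte utf8/utf8mb3 charset.
    """
    return any(ord(ch) > 0xFFFF for ch in text)
```

For instance, `utf8_byte_lengths("aå€🙈")` gives one to four bytes per character, and only text containing characters like `🙈` makes `needs_utf8mb4` return `True`.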