v0.4.0
New:
- Context rules and tagging (https://github.com/knaw-huc/golden-agents-htr#7): allow specifying regular-expression-like patterns that match entities spanning multiple 'words'
- Allow choosing Unicode codepoint offsets instead of UTF-8 byte offsets (#15)
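The two offset conventions only differ once the text contains non-ASCII characters, which are common in historical Dutch spellings. The sketch below illustrates that difference. The helper names `byte_to_char_offset` and `char_to_byte_offset` are hypothetical and are not this release's API or option names.

```python
def byte_to_char_offset(text: str, byte_offset: int) -> int:
    """Convert a UTF-8 byte offset into a Unicode codepoint offset (hypothetical helper)."""
    encoded = text.encode("utf-8")
    if not 0 <= byte_offset <= len(encoded):
        raise ValueError("byte offset out of range")
    # Decoding the prefix raises UnicodeDecodeError (a ValueError subclass)
    # if byte_offset falls in the middle of a multi-byte character.
    return len(encoded[:byte_offset].decode("utf-8"))


def char_to_byte_offset(text: str, char_offset: int) -> int:
    """Convert a Unicode codepoint offset into a UTF-8 byte offset (hypothetical helper)."""
    if not 0 <= char_offset <= len(text):
        raise ValueError("codepoint offset out of range")
    # Every codepoint before the offset contributes its UTF-8 length.
    return len(text[:char_offset].encode("utf-8"))


# "É" and "é" each take two bytes in UTF-8, so "woord" starts at
# codepoint offset 4 but at byte offset 6.
text = "Één woord"
print(byte_to_char_offset(text, 6))  # 4
print(char_to_byte_offset(text, 4))  # 6
```

Codepoint offsets match string indexing in languages such as Python, while byte offsets match raw UTF-8 buffers. This is presumably why being able to choose between them is useful.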
Bugfixes:
- Use the lowest frequency of either the variant or the target when using variant lists (https://github.com/knaw-huc/golden-agents-htr#15)
- Allow out-of-vocabulary words in the LM: not everything in the lexicons necessarily has to be in the LM as well
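The variant-list fix can be pictured as follows. Each variant/target pair is assigned the lower of the two frequencies. This is a minimal sketch under stated assumptions: the function name, the `(variant, target)` pair format, and the plain frequency dictionary are hypothetical. The changelog also does not say what happens when only one word of a pair has a known frequency, so falling back to that one value (and skipping pairs where neither is known) is an assumption here.

```python
from typing import Dict, Iterable, Mapping, Tuple


def variant_frequencies(
    pairs: Iterable[Tuple[str, str]], freqs: Mapping[str, int]
) -> Dict[str, int]:
    """Assign each variant the lowest known frequency of itself or its target.

    Hypothetical helper. Falling back to a single known frequency, and
    skipping pairs where neither word is known, are assumptions.
    """
    result: Dict[str, int] = {}
    for variant, target in pairs:
        known = [freqs[w] for w in (variant, target) if w in freqs]
        if known:
            result[variant] = min(known)
    return result


freqs = {"huys": 5, "huis": 120, "ende": 300, "en": 40}
pairs = [("huys", "huis"), ("ende", "en"), ("waerom", "waarom")]
print(variant_frequencies(pairs, freqs))  # {'huys': 5, 'ende': 40}
```

Taking the minimum keeps a variant from inheriting a frequency higher than either word actually supports.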