Releases: WorksApplications/Sudachi
Sudachi version 0.6.0-beta2
- Bumped the version to 0.6.0-beta2
Sudachi version 0.6.0-beta1
Pre-release of 0.6.0.
Sudachi version 0.5.3
This release includes the following new features and a bug fix.
- Changed the priority of user dictionaries
  - If two words have the same cost, the word from the dictionary added later takes precedence
- Fixed a bug where sentences were incorrectly separated by spaces.
- Added a method to dump the internal structure as JSON
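The new priority rule for user dictionaries can be pictured as a comparator. This is a hypothetical sketch, not Sudachi's internal code: the Candidate record and its dictionaryIndex field (0 for the system dictionary, then 1, 2, ... for user dictionaries in the order they are added) are illustrative names. It shows only the tie-break between otherwise identical entries, not the full lattice search.

```java
import java.util.Comparator;
import java.util.List;

public class UserDictPriority {
    // Hypothetical candidate word: its cost and the index of the dictionary it came from.
    // dictionaryIndex 0 = system dictionary; user dictionaries get 1, 2, ... in the order added.
    record Candidate(String surface, int cost, int dictionaryIndex) {}

    // Lower cost wins; on equal cost, the dictionary added later (higher index) wins.
    static final Comparator<Candidate> PRIORITY =
            Comparator.comparingInt(Candidate::cost)
                      .thenComparing(Comparator.comparingInt(Candidate::dictionaryIndex).reversed());

    static Candidate best(List<Candidate> candidates) {
        return candidates.stream().min(PRIORITY).orElseThrow();
    }

    public static void main(String[] args) {
        List<Candidate> candidates = List.of(
                new Candidate("東京都", 5000, 0),
                new Candidate("東京都", 5000, 1),
                new Candidate("東京都", 5000, 2));
        // Equal costs, so the entry from the last-added dictionary (index 2) is chosen.
        System.out.println(best(candidates));
    }
}
```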
Sudachi version 0.5.2
This release includes the following new feature.
- Added IgnoreYomiganaPlugin, which removes yomigana (readings) written in parentheses
  - This plugin is enabled by default
  - By default, a parenthesized run of up to 4 hiragana characters is recognized as yomigana
  - See sudachi.json for details
$ echo '徳島(とくしま)に行(い)く' | java -jar sudachi-0.5.2.jar
徳島(とくしま) 名詞,固有名詞,地名,一般,*,* 徳島
に 助詞,格助詞,*,*,*,* に
行(い)く 動詞,非自立可能,*,*,五段-カ行,終止形-一般 行く
EOS
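As a rough illustration of the rule, the sketch below strips a parenthesized run of 1 to 4 hiragana from a plain string. It is a simplified, hypothetical approximation: the real IgnoreYomiganaPlugin works inside Sudachi's input-text processing (which is why the output above keeps the surface 徳島(とくしま) and reports the normalized form 徳島), and its settings come from sudachi.json. Requiring a kanji immediately before the bracket is an assumption of this sketch.

```java
import java.util.regex.Pattern;

public class YomiganaStripper {
    // Assumed maximum reading length, mirroring the default of 4 described above.
    static final int MAX_YOMIGANA_LENGTH = 4;

    // A kanji (captured as group 1), then a half- or full-width parenthesized
    // run of 1..MAX_YOMIGANA_LENGTH hiragana. Only the kanji is kept.
    static final Pattern YOMIGANA = Pattern.compile(
            "(\\p{IsHan})[(（]\\p{IsHiragana}{1," + MAX_YOMIGANA_LENGTH + "}[)）]");

    static String strip(String text) {
        return YOMIGANA.matcher(text).replaceAll("$1");
    }

    public static void main(String[] args) {
        System.out.println(strip("徳島(とくしま)に行(い)く")); // 徳島に行く
        System.out.println(strip("東京(とうきょう)"));        // unchanged: the reading is 5 characters
    }
}
```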
Sudachi version 0.5.1
This release includes the following new features.
- Added synonym group IDs field to user dictionary
- Added allowEmptyMorpheme to settings
  - Setting this property to false prevents tokens of length 0
  - The default value is true
$ echo … | java -jar sudachi-0.5.1.jar -s '{"allowEmptyMorpheme":false}'
… 補助記号,句点,*,*,*,* .
… 補助記号,句点,*,*,*,* .
… 補助記号,句点,*,*,*,* .
EOS
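Zero-length tokens appear because normalization can expand one input character into several: here … becomes three periods, so the second and third "." tokens map to empty spans of the original text. The sketch below shows one rule that is consistent with the output above: with allowEmptyMorpheme set to false, an empty span is widened back to the start of the preceding token. The Span record and the surfaces method are hypothetical, and Sudachi's actual offset handling may differ.

```java
import java.util.ArrayList;
import java.util.List;

public class EmptyMorphemeDemo {
    // Hypothetical token span [begin, end) in the ORIGINAL (unnormalized) text.
    record Span(int begin, int end) {}

    // Returns each token's surface. With allowEmptyMorpheme == false, a zero-length
    // span is widened back to the (already adjusted) begin of the preceding token.
    static List<String> surfaces(String original, List<Span> spans, boolean allowEmptyMorpheme) {
        List<String> result = new ArrayList<>();
        int prevBegin = 0;
        for (Span span : spans) {
            int begin = span.begin();
            if (!allowEmptyMorpheme && span.begin() == span.end() && !result.isEmpty()) {
                begin = prevBegin;
            }
            result.add(original.substring(begin, span.end()));
            prevBegin = begin;
        }
        return result;
    }

    public static void main(String[] args) {
        // "…" is one character in the input but three periods after normalization.
        String original = "…";
        List<Span> spans = List.of(new Span(0, 1), new Span(1, 1), new Span(1, 1));
        System.out.println(surfaces(original, spans, true));  // […, , ]
        System.out.println(surfaces(original, spans, false)); // […, …, …]
    }
}
```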
Sudachi version 0.5.0
This release includes the following new features.
- Added a synonym group IDs field for use with the Sudachi Synonym Dictionary
- New dictionary format, which remains backward compatible
- Command line output can now be customized via plugins
Sudachi version 0.4.3
This release includes a bug fix.
- Fix overrun with surrogate pairs
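A surrogate pair is a single code point outside the Basic Multilingual Plane that Java stores as two chars, so code that assumes one char per character can index past the end of a buffer or split a character in half. The sketch below is a generic illustration, not Sudachi's code: it walks a string by code point using Character.charCount so a pair is never split.

```java
import java.util.ArrayList;
import java.util.List;

public class SurrogateDemo {
    // Splits a string into code points, each returned as a String (1 or 2 chars long).
    static List<String> codePoints(String text) {
        List<String> result = new ArrayList<>();
        int i = 0;
        while (i < text.length()) {
            int cp = text.codePointAt(i);
            int width = Character.charCount(cp); // 2 for a surrogate pair, otherwise 1
            result.add(text.substring(i, i + width));
            i += width;
        }
        return result;
    }

    public static void main(String[] args) {
        String text = "\uD842\uDFB7野家"; // 𠮷 (U+20BB7) is outside the BMP
        System.out.println(text.length());                         // 4 chars
        System.out.println(text.codePointCount(0, text.length())); // 3 code points
        System.out.println(codePoints(text).size());               // 3
    }
}
```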
Sudachi version 0.4.2
This release includes a bug fix.
- Fix buffer overrun with character normalization in Tokenizer#tokenize(Reader)
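Character normalization can make text longer (for example, … normalizes to three periods under NFKC), so a buffer sized to the raw input is not always large enough for the normalized result. A minimal sketch of a safe pattern, assuming java.text.Normalizer with NFKC as a stand-in for Sudachi's own normalization rules: read the Reader into a growable StringBuilder, then normalize the complete text.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.text.Normalizer;

public class NormalizeReaderDemo {
    // Reads the whole Reader into a growable buffer, then normalizes it in one pass,
    // so neither the expansion nor a chunk boundary can overrun a fixed-size array.
    static String readAndNormalize(Reader reader) throws IOException {
        StringBuilder raw = new StringBuilder();
        char[] chunk = new char[1024];
        int n;
        while ((n = reader.read(chunk)) != -1) {
            raw.append(chunk, 0, n);
        }
        // NFKC is a stand-in for Sudachi's normalization; the result may be longer than the input.
        return Normalizer.normalize(raw, Normalizer.Form.NFKC);
    }

    public static void main(String[] args) throws IOException {
        String input = "…";
        String normalized = readAndNormalize(new StringReader(input));
        System.out.println(input.length() + " -> " + normalized.length()); // 1 -> 3
    }
}
```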
Sudachi version 0.4.1
This release includes a new method for sentence boundary detection.
- Add Tokenizer#tokenizeSentences(Reader)
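Tokenizer#tokenizeSentences(Reader) allows input to be consumed from a stream rather than a single String. The hypothetical sketch below illustrates the general idea without using Sudachi: it reads a Reader in small chunks, closes a sentence as soon as a terminator is seen, and carries unfinished text over to the next chunk. The terminator set (。, ！, ？) and the class and method names are assumptions for illustration.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

public class StreamingSentenceSplitter {
    // Assumed sentence terminators for this sketch.
    static final String TERMINATORS = "。！？";

    static List<String> split(Reader reader, int chunkSize) throws IOException {
        List<String> sentences = new ArrayList<>();
        StringBuilder pending = new StringBuilder(); // text not yet ended by a terminator
        char[] chunk = new char[chunkSize];
        int n;
        while ((n = reader.read(chunk)) != -1) {
            for (int i = 0; i < n; i++) {
                pending.append(chunk[i]);
                if (TERMINATORS.indexOf(chunk[i]) >= 0) {
                    sentences.add(pending.toString());
                    pending.setLength(0);
                }
            }
        }
        if (pending.length() > 0) {
            sentences.add(pending.toString()); // trailing text without a terminator
        }
        return sentences;
    }

    public static void main(String[] args) throws IOException {
        // A tiny chunk size forces sentences to straddle chunk boundaries.
        Reader reader = new StringReader("今日は晴れ。明日は雨？たぶん");
        System.out.println(split(reader, 4)); // [今日は晴れ。, 明日は雨？, たぶん]
    }
}
```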
Sudachi version 0.4.0
This release includes a new sentence boundary detector and a bug fix.
- Add a new sentence boundary detector
- Add Tokenizer#tokenizeSentences
- Add SentenceDetector
- The CLI now performs sentence boundary disambiguation
- Add
- Fix a bug causing normalized characters to be misaligned
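A sentence boundary detector has to decide which terminators actually end a sentence. The sketch below is a simplified, hypothetical detector, not Sudachi's SentenceDetector: it splits after 。, ！ or ？ (absorbing a run such as ！？ into one boundary), but never inside 「」 or （） brackets, so quoted speech stays within its enclosing sentence.

```java
import java.util.ArrayList;
import java.util.List;

public class SimpleSentenceDetector {
    // Assumed terminator and bracket sets for this sketch.
    static final String TERMINATORS = "。！？";
    static final String OPEN = "「（";
    static final String CLOSE = "」）";

    static List<String> split(String text) {
        List<String> sentences = new ArrayList<>();
        int depth = 0; // bracket nesting level
        int start = 0; // start index of the current sentence
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (OPEN.indexOf(c) >= 0) {
                depth++;
            } else if (CLOSE.indexOf(c) >= 0 && depth > 0) {
                depth--;
            } else if (depth == 0 && TERMINATORS.indexOf(c) >= 0) {
                // Absorb a run of terminators such as "！？" into the same sentence.
                while (i + 1 < text.length() && TERMINATORS.indexOf(text.charAt(i + 1)) >= 0) {
                    i++;
                }
                sentences.add(text.substring(start, i + 1));
                start = i + 1;
            }
        }
        if (start < text.length()) {
            sentences.add(text.substring(start)); // trailing text without a terminator
        }
        return sentences;
    }

    public static void main(String[] args) {
        // The 。 inside 「」 does not end the sentence.
        System.out.println(split("彼は「行く。」と言った。本当！？はい"));
        // [彼は「行く。」と言った。, 本当！？, はい]
    }
}
```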