Releases · WorksApplications/Sudachi

Changed the priority of user dictionaries
- When entries have the same cost, the entry from the dictionary added later takes precedence
Fixed a bug where sentences were incorrectly separated by spaces.
Added a method to dump the internal structure as JSON
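
The tie-breaking rule for user dictionaries can be illustrated with a small, self-contained sketch. It does not use Sudachi's classes: `Candidate`, `dictionaryOrder`, and `pick` are hypothetical names introduced here, and the real analyzer chooses among whole paths rather than single entries. The sketch only shows the stated rule: lowest cost wins, and on equal cost the entry from the later-added dictionary wins.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

class DictionaryPrioritySketch {
    /** Hypothetical candidate entry; not a Sudachi type. */
    static final class Candidate {
        final String label;
        final int cost;
        final int dictionaryOrder; // 0 = loaded first; larger = added later

        Candidate(String label, int cost, int dictionaryOrder) {
            this.label = label;
            this.cost = cost;
            this.dictionaryOrder = dictionaryOrder;
        }
    }

    /** Picks the lowest cost; on a tie, prefers the later-added dictionary. */
    static Candidate pick(List<Candidate> candidates) {
        Comparator<Candidate> byCost =
                Comparator.comparingInt((Candidate c) -> c.cost);
        Comparator<Candidate> laterDictionaryFirst =
                Comparator.comparingInt((Candidate c) -> c.dictionaryOrder).reversed();
        return candidates.stream()
                .min(byCost.thenComparing(laterDictionaryFirst))
                .orElseThrow(() -> new IllegalArgumentException("no candidates"));
    }

    public static void main(String[] args) {
        List<Candidate> candidates = Arrays.asList(
                new Candidate("system", 100, 0),
                new Candidate("user-1", 100, 1),
                new Candidate("user-2", 100, 2));
        // All three costs are equal, so the last-added dictionary wins.
        System.out.println(pick(candidates).label);
    }
}
```

A strictly cheaper entry from an earlier dictionary still wins in this sketch; load order only matters when costs are equal.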

13 Mar 07:44

github-actions

v0.5.2

26b6124

Sudachi version 0.5.2

This release includes the following new feature.

Added IgnoreYomiganaPlugin, which removes yomigana (reading kana) written in parentheses.
- This feature is enabled by default
- By default, a run of up to 4 hiragana characters is recognized as yomigana
- See sudachi.json for details

$ echo '徳島(とくしま)に行(い)く' | java -jar sudachi-0.5.2.jar
徳島(とくしま)  名詞,固有名詞,地名,一般,*,*     徳島
に      助詞,格助詞,*,*,*,*     に
行(い)く        動詞,非自立可能,*,*,五段-カ行,終止形-一般       行く
EOS
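
The matching rule described for the plugin can be approximated with a regular expression. The following is a minimal sketch, not the plugin's implementation: the class name `YomiganaStripSketch`, the requirement that a kanji directly precede the bracket, and the accepted bracket characters are assumptions made for illustration. Note also that the real plugin keeps the original surface in the output, as the sample above shows, whereas this sketch simply deletes the matched spans to show which text is treated as a reading.

```java
import java.util.regex.Pattern;

class YomiganaStripSketch {
    // Assumption: a reading is 1 to 4 hiragana characters enclosed in ASCII or
    // full-width parentheses, directly after a kanji. Hypothetical pattern,
    // not taken from Sudachi.
    private static final Pattern YOMIGANA =
            Pattern.compile("(?<=\\p{IsHan})[((]\\p{IsHiragana}{1,4}[))]");

    /** Returns the text with every matched reading removed. */
    static String strip(String text) {
        return YOMIGANA.matcher(text).replaceAll("");
    }

    public static void main(String[] args) {
        System.out.println(strip("徳島(とくしま)に行(い)く")); // prints 徳島に行く
    }
}
```

A parenthesized run longer than 4 hiragana characters, or one that does not follow a kanji, is left untouched by this pattern.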

25 Nov 10:00

github-actions

v0.5.1

5a403d7

Sudachi version 0.5.1

This release includes the following new features.

Added synonym group IDs field to user dictionary
Added allowEmptyMorpheme to settings
- Setting this property to false suppresses tokens of length 0
- The default value is true

$ echo … | java -jar sudachi-0.5.1.jar -s '{"allowEmptyMorpheme":false}'
…       補助記号,句点,*,*,*,*   .
…       補助記号,句点,*,*,*,*   .
…       補助記号,句点,*,*,*,*   .
EOS

04 Nov 03:11

kazuma-t

v0.5.0

2f68da2

Sudachi version 0.5.0

This release includes the following new features.

Added synonym group IDs field to use Sudachi Synonym Dictionary
- The dictionary format is new but backward compatible
Command line output can now be customized via plugins

19 Jun 01:26

kazuma-t

v0.4.3

f26e037

Sudachi version 0.4.3

This release includes a bug fix.

Fix overrun with surrogate pairs

29 May 02:03

kazuma-t

v0.4.2

c1b4d19

Sudachi version 0.4.2

This release includes a bug fix.

Fix buffer overrun with character normalization in Tokenizer#tokenize(Reader)

26 May 07:36

kazuma-t

v0.4.1

1cbf883

Sudachi version 0.4.1

This release includes a new method for sentence boundary detection.

Add Tokenizer#tokenizeSentences(Reader)

05 Apr 04:51

kazuma-t

v0.4.0

2d491bb

Sudachi version 0.4.0

This release includes a new sentence boundary detector and a bug fix.

Add a new sentence boundary detector
- Add Tokenizer#tokenizeSentences
- Add SentenceDetector
- The CLI now performs sentence boundary disambiguation
Fix a bug causing normalized characters to be misaligned
