Kyujitai / Shinjitai Text preprocessors #1357

Casheeew · 2024-08-26T04:33:36Z

This PR adds a Kyujitai (旧字体) to Shinjitai (新字体) text preprocessor, which is useful when reading older texts.

Based on https://github.com/DrTurnon/kyujipy/blob/master/kyujipy/basic_converter.py

This PR does not include transformations caused by the 同音による書き換え (replacement with homophonous kanji) reform, nor does it include 俗字 (vernacular forms), 別体 (alternate forms), 誤字 (erroneous characters), or other uncommon forms/variants.
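To make the scope above concrete, here is a minimal sketch of a one-to-one glyph conversion. The table `KYUJI_TO_SHINJI_SAMPLE` and the function `toShinjitaiSample` are hypothetical names invented for illustration, and the five pairs are a tiny hand-picked sample, not the data set this PR ships. A word such as 聯合, which is commonly cited as a 同音による書き換え case, is left untouched because that reform swaps in a different kanji rather than a simplified form of the same one.

```javascript
// Hypothetical sample table for illustration only; the PR's real data is far larger.
const KYUJI_TO_SHINJI_SAMPLE = new Map([
    ['國', '国'],
    ['學', '学'],
    ['體', '体'],
    ['舊', '旧'],
    ['戀', '恋'],
]);

/**
 * Replaces each character found in the sample table with its shinjitai form.
 * Characters that are not in the table are copied through unchanged.
 * @param {string} text
 * @returns {string}
 */
function toShinjitaiSample(text) {
    let result = '';
    // for...of iterates by code point, so characters outside the BMP stay intact.
    for (const char of text) {
        result += KYUJI_TO_SHINJI_SAMPLE.get(char) ?? char;
    }
    return result;
}

console.log(toShinjitaiSample('舊字體')); // 舊 and 體 are converted, 字 is kept
console.log(toShinjitaiSample('聯合')); // not in the table, so returned as-is
```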

Kuuuube

I mentioned on Discord that this probably shouldn't use regex, but after testing it looks like this is the right way to go. Every other way I could think of handling this benchmarked much slower. It is probably hitting a sweet spot in browser optimization for the number of possible replacements required here.
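For readers unfamiliar with the regex approach being discussed, the following is a rough sketch of what a single-pass regex replacement might look like. `SAMPLE_PAIRS`, `KYUJI_PATTERN`, and `convertWithRegex` are hypothetical names, the table is a small sample, and the code in this PR may be structured differently. No performance claim is attached to this sketch.

```javascript
// Hypothetical sample pairs; not the PR's actual conversion table.
const SAMPLE_PAIRS = {
    '國': '国',
    '學': '学',
    '體': '体',
    '舊': '旧',
};

// One character class covering every key. The `u` flag makes the regex
// match whole code points, and `g` replaces every occurrence in one pass.
const KYUJI_PATTERN = new RegExp(`[${Object.keys(SAMPLE_PAIRS).join('')}]`, 'gu');

/**
 * Converts all characters matched by KYUJI_PATTERN using the sample table.
 * @param {string} text
 * @returns {string}
 */
function convertWithRegex(text) {
    return text.replace(KYUJI_PATTERN, (char) => SAMPLE_PAIRS[char]);
}

console.log(convertWithRegex('國語學'));
```

Building the pattern from the table keys keeps the regex and the data in sync, so adding a pair to the table is the only change needed to support a new character.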

ext/js/language/ja/shinjitai-converter.js

Co-authored-by: Kuuuube <[email protected]> Signed-off-by: Cashew <[email protected]>

djahandarie · 2024-10-12T03:55:01Z

In the case there is a direct match on the kyuujitai prior to conversion, it shows that first, right?

Casheeew · 2024-10-12T06:24:12Z

> In the case there is a direct match on the kyuujitai prior to conversion, it shows that first, right?

Yes, that's right. That is true for preprocessors in general.
(This PR is currently waiting for @Lyroxide to process more data and move the entire kyuji-shinji converter into a separate library)
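The ordering described in this reply can be pictured with a small sketch. This is not Yomitan's actual preprocessor API: `getLookupVariants` and `toShinjitai` are hypothetical names, and the real pipeline may rank results differently. The assumption illustrated here is only that the unmodified text is tried before any preprocessed variant.

```javascript
/**
 * Hypothetical helper: returns the original text followed by each new
 * variant produced by the given preprocessors, without duplicates.
 * @param {string} text
 * @param {Array<(text: string) => string>} preprocessors
 * @returns {string[]}
 */
function getLookupVariants(text, preprocessors) {
    // The original text always comes first, so a direct match wins.
    const variants = [text];
    for (const process of preprocessors) {
        // Iterate over a snapshot so variants added in this pass are not reprocessed.
        for (const variant of [...variants]) {
            const converted = process(variant);
            if (!variants.includes(converted)) {
                variants.push(converted);
            }
        }
    }
    return variants;
}

// Stand-in converter that handles a single character, for illustration only.
const toShinjitai = (text) => text.replaceAll('國', '国');

console.log(getLookupVariants('國語', [toShinjitai]));
```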

codspeed-hq · 2024-10-13T01:34:19Z

CodSpeed Performance Report

Merging #1357 will not alter performance

Comparing Casheeew:shinji-preprocessor (ccd0225) with master (6496b68)

Summary

✅ 5 untouched benchmarks

add shinji preprocessor

5cf3beb

Casheeew requested a review from a team as a code owner August 26, 2024 04:33

Casheeew added 2 commits August 26, 2024 11:40

fix lint

d5358b9

fix eslint test

23e58cc

Casheeew marked this pull request as draft August 26, 2024 11:14

Kuuuube added the kind/enhancement and area/linguistics labels Aug 26, 2024

Kuuuube reviewed Aug 26, 2024

ext/js/language/ja/shinjitai-converter.js (outdated)

Casheeew and others added 2 commits September 18, 2024 13:49

use nullish coalescing

e60e8eb

Co-authored-by: Kuuuube <[email protected]> Signed-off-by: Cashew <[email protected]>

add more data sources

b00d2f4

Casheeew marked this pull request as ready for review September 18, 2024 06:35

fix style

ccd0225

Casheeew marked this pull request as draft October 12, 2024 06:24
