Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merge Yezichak - other languages #422

Closed

Conversation

StefanVukovic99
Copy link
Collaborator

As discussed starting with here, this is a working approach to supporting more languages. Needs a bit more work to catch up with the type annotations and tests, but it's in a good place to start.

StefanVukovic99 and others added 30 commits March 27, 2023 14:48
* minimum viable product

* add english adjective comparative and superlative

* add english g-dropping inflection

* english word formation suffix -y

* prefix deinflections

* english phrasal verbs
*/
function createPhrasalVerbInflectionsFromRules(sourceRules){
return sourceRules.map(({inflected, deinflected}) => {
const inflectedSuffix = inflected.source.replace('.*', '').replace('$', '');

Check failure

Code scanning / CodeQL

Incomplete string escaping or encoding High

This replaces only the first occurrence of '$'.
@djahandarie
Copy link
Collaborator

Amazing, thanks so much for working on this and trying to upstream it. I think it's worth supporting multi languages simply due to how many people around the world can benefit from it. It will necessarily complicate the codebase a bit but let's see how much we can minimize that.

I guess the PR is still a bit in progress to match the newest lints on yomitan, and might not need detailed review yet, but if @toasted-nutbread has any overall architectural comments it'd probably be good to get those now.

Also, if you could include in the PR body the outline of how you changed things at a high level, I think that would help reviewers too!

It will probably require a bit of discussion to get this merged but let's try and do it! 🦾

Copy link

github-actions bot commented Dec 22, 2023

⚠️ Visual differences introduced by this PR; please validate if they are desirable.

View Playwright Report (note: open the "playwright-report" artifact)

@toasted-nutbread
Copy link

I've not done a full review, but I would very much like to see something like this broken down into smaller parts. This looks to be a very large commit presently, which may make it difficult to review. I haven't gone in depth into this yet, but there seems to be a lot of changes that could be performed as smaller feature branches. For example, the change of fetchJson utilities should be its own feature branch, there are a ton of changes to the readme, etc.

@toasted-nutbread toasted-nutbread self-requested a review December 22, 2023 16:36
@StefanVukovic99
Copy link
Collaborator Author

StefanVukovic99 commented Dec 22, 2023

Thank you for your effort in reviewing this. There are indeed multiple features in there, but due to how the projects have diverged I am unsure how to split this.

I'll update the readme so that it matches yomitan's more closely.

Trying to enumerate the features:

  1. Added a language setting
  2. Made the deinflection rules more general. The japanese rules all only replace suffixes, richer deinflection is useful for other languages.
  3. Added a setting to opt out of using the rule flags on deinflections, some dictionaries may not have them. Added more parts of speech to rule flags.
  4. added another term meta type for IPA (maybe should be added to v3 instead of making a v4 meta schema)
  5. Started abstracting language-specific features into the language folder. LanguageUtil is supposed to be a wrapper around them, ensuring that only the necessary deinflection reasons and text processing is done.
  6. adapted from seth - dictionary deinflection (allowing entries to redirect to other entries). (also an irrelevant feature where holding the alt key when clicking the speaker plays audio for transformedText)
  7. Extracting the duplicate fetchAsset/Json/Text calls (unimportant)
  8. added an option to opt out of dictionary or algorithm deinflection, if a user doesn't like the deinflector for a language or the deinflections from his dicts.
  9. The "text transformations" for japanese - replace katakana with hiragana and such - have mostly been abstracted and textTransformations for other languages (handling letter case/diacritics/orthography) added
  10. added "search resolution setting" - in languages where spaces are used there might be no need to search every substring, allowing bigger scan length while keeping performance.
  11. started localization, added setting for that
  12. moved more settings to advanced settings, to make it easier for beginner users

@toasted-nutbread
Copy link

My recommendation would be to try to isolate individual features or changes based on what they are doing. I am not sure what your or other's git workflow might look like, but were I to do this, I would basically be starting feature branches off of yomitan/master and cherry-picking commits for certain features, or manually copying them over. You can usually do this by thinking about things as building blocks and what the dependencies are, starting with the ones with no dependencies.

Part of this is what you listed in your comment above, and I could add some other places where individual "feature" branches should be used. For example, I find it helpful for source code cosmetic/refactoring changes to be branches of their own. This would include things like newline and whitespace changes which otherwise don't contribute semantic difference, folder structure changes (e.g. language=>dictionary), and so on. I could probably review and comment on the places that I see this.

My rationale behind this also stems from my rationale on why squash commits should be used when merging; #342. Squashing this entire PR in its current state would be entirely too large and would make it very difficult to follow the history easily or debug any new issues. On the other hand, 150+ commits should not be added directly to the master branch, as a great deal of the commits are irrelevant noise to the final product, and in general that makes it more difficult to understand and deal with project history. Therefore this points in the direction of requiring more isolation of the different parts which comprise the current PR in its overall state, i.e. multiple PRs.

@StefanVukovic99
Copy link
Collaborator Author

Ok, let's try and do it in chunks. first up - #429

@djahandarie
Copy link
Collaborator

One basic question: are multiple languages usable at the same time, or do you need to set the entire instance to be in one language mode? I think being able to use multiple at the same time without needing to switch modes would be a good UX, but not sure if there are practical complications with doing this.

@StefanVukovic99
Copy link
Collaborator Author

One basic question: are multiple languages usable at the same time, or do you need to set the entire instance to be in one language mode? I think being able to use multiple at the same time without needing to switch modes would be a good UX, but not sure if there are practical complications with doing this.

For now it's one language per profile, since settings tend to differ between languages, and for performance reasons. Switching profiles is pretty quick, and the profile conditions can also be used.

@Casheeew
Copy link
Member

For now it's one language per profile, since settings tend to differ between languages, and for performance reasons. Switching profiles is pretty quick, and the profile conditions can also be used.

Would be nice to have more profile modifier keys, #408; especially as the number of profiles will blow up as one might want to add more languages.

@Casheeew Casheeew added kind/enhancement The issue or PR is a new feature or request area/linguistics The issue or PR is related to linguistics area/dictionary-format The issue or PR is related to dictionary formatting labels Feb 27, 2024
@StefanVukovic99
Copy link
Collaborator Author

Only took us three months, but we got everything! Big thanks to @toasted-nutbread, @djahandarie, @Scrub1492 (and anyone else who helped)! Looking forward to keep working with you!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/dictionary-format The issue or PR is related to dictionary formatting area/linguistics The issue or PR is related to linguistics kind/enhancement The issue or PR is a new feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants