Add readings for cmn/Mandarin #150

Open
evertedsphere opened this issue Oct 8, 2024 · 5 comments
Labels
enhancement New feature or request

Comments

@evertedsphere

This should be split out into new downloads specifically for Mandarin (cmn-*) instead of defaulting to this in the zh-* downloads.

@StefanVukovic99 added the enhancement label Nov 2, 2024
@shiki-tm

shiki-tm commented Nov 9, 2024

If readings get added for Mandarin, could you also release a version with Zhuyin readings, please? There are people (legends) who prefer to study with Zhuyin.
Thank you in advance.

@shiki-tm

Screenshot_20241110_200033_Chrome
One thing I just noticed on the website is that sometimes there's a 2nd reading separated by a comma. The 2nd reading is the Taiwan reading (I've checked 3 examples so far) and the 1st reading is the standard Beijing version. So in my opinion it would make sense for the Zhuyin version to use the 2nd readings. I don't know if any entries have more than 2 readings, though.

@shiki-tm

Sorry, I forgot to suggest one thing about Zhuyin. You can see the neutral tone marker before 一 in the picture above. It's not that common in Taiwan readings, but for the sake of matching readings with other dicts (which I ran through my modified version of rudnam's pinyin conversion script), I think it's best to put the neutral tone marker after the syllable, just like every other tone marker.
I'm sure you and most people don't care about this, but if you need me to explain further, I think I have a very good argument against the standard placement of the marker (from Taiwan's MOE). Sorry to complicate this. Of course I can just run this function myself after downloading the dict file, but if it can be done, it would be great to have it in the auto releases. Since there are spaces between the syllables, this is how I change the placement of the marker:
const adjustNeutralToneMarker = (reading: string) => {
  // Match the neutral tone marker at the beginning of a syllable and move it to the end
  return reading.replace(/(\u02d9)([\u3105-\u312F]+)(\s|$)/g, '$2$1$3');
};
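As a quick sanity check of the function above, here is a small sketch that restates it so the snippet runs on its own and applies it to two sample readings. The sample strings are illustrative inputs chosen for this example, not taken from the dictionary files, and the tone marks are written as `\u` escapes only so the exact code points are unambiguous.

```typescript
// Same regex as in the comment above, restated so this snippet is self-contained.
const adjustNeutralToneMarker = (reading: string): string =>
  reading.replace(/(\u02d9)([\u3105-\u312F]+)(\s|$)/g, "$2$1$3");

// \u02d9 is the neutral tone marker, \u02c7 is the third tone marker.
// Illustrative input with the marker in front of the last syllable.
const singleMarker = adjustNeutralToneMarker("ㄉㄨㄥ \u02d9ㄒㄧ");

// Illustrative input with two neutral-tone syllables in a row;
// the g flag makes the replacement apply to both.
const twoMarkers = adjustNeutralToneMarker("ㄨㄛ\u02c7 \u02d9ㄇㄣ \u02d9ㄉㄜ");

// A reading without a neutral tone marker passes through unchanged.
const noMarker = adjustNeutralToneMarker("ㄋㄧ\u02c7 ㄏㄠ\u02c7");

console.log(singleMarker, "|", twoMarkers, "|", noMarker);
```

Note that the regex relies on the syllables being separated by whitespace, as mentioned above. A reading with no spaces between syllables would be left unchanged.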

Also, I agree that Mandarin entries should probably be split from the other Chinese languages, if that's what this post was originally about. I'm very interested in some of them, like Hakka, Hokkien, and Cantonese. If we could later manually combine Mandarin with other languages, though, I'd like to know how.

One other thing about the 2nd readings. I found one example where the 2nd reading isn't actually used in Taiwan (or in the mainland China standard) as far as I know: 鑰匙. If I had to guess, it's some very old original reading, maybe used dialectally somewhere. I think it just so happens that the older original readings tend to be the ones used in Taiwan. Either way, the 2nd readings are much more useful overall for Taiwanese Mandarin learners (who also use Zhuyin). If I find many more exceptions, I'll keep track of them.

This is weird coming from someone who studies traditional characters, but I've come to think it's useful to also have headwords in their simplified-character form, because you never know what you'll find in immersion or in online comments. So I wanted to ask whether you could add simplified headwords too, if it doesn't add too much file size. This dictionary is especially useful for slang and other things that aren't in typical dicts.

Thank you for considering~

@shiki-tm

Okay, sorry, I just realized that you can press the "more" button and it'll show you more info about the readings on Wiktionary.

It says "Taiwan" or "variant in Taiwan", so that's a far more reliable way of knowing which reading to use in the Zhuyin version than just taking every 2nd reading.
Screenshot_20241129_115806_Chrome
Screenshot_20241129_115653_Chrome
I also found this entry that has 3 readings, but down in the details it says they're Beijing variants. So when no Taiwan reading is found, it's fine to just use the standard Chinese/Mandarin (1st) reading.
Screenshot_20241129_115354_Chrome
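The selection rule described in this comment (use the reading labelled as Taiwan when there is one, otherwise fall back to the 1st reading) could be sketched roughly as below. The `Reading` shape, the `tags` field, and the function name are all assumptions made for this example; they are not the project's actual data model, and the real Wiktionary-derived labels may be worded differently.

```typescript
// Hypothetical shape for one pronunciation of an entry (an assumption, not the real schema).
interface Reading {
  zhuyin: string;
  tags: string[]; // e.g. ["Standard Chinese"], ["Taiwan"], ["Beijing", "variant"]
}

// Prefer a reading whose tags mention Taiwan; otherwise use the first (standard) reading.
// Returns undefined only when the entry has no readings at all.
function pickTaiwanReading(readings: Reading[]): Reading | undefined {
  const taiwan = readings.filter((r) =>
    r.tags.some((t) => t.toLowerCase().indexOf("taiwan") !== -1)
  )[0];
  return taiwan ?? readings[0];
}

// Illustrative data only; the tag strings are placeholders for this sketch.
const withTaiwanVariant: Reading[] = [
  { zhuyin: "reading-1", tags: ["Standard Chinese"] },
  { zhuyin: "reading-2", tags: ["variant in Taiwan"] },
];
const beijingVariantsOnly: Reading[] = [
  { zhuyin: "reading-1", tags: ["Standard Chinese"] },
  { zhuyin: "reading-2", tags: ["Beijing", "variant"] },
  { zhuyin: "reading-3", tags: ["Beijing", "variant"] },
];

console.log(pickTaiwanReading(withTaiwanVariant)?.zhuyin);
console.log(pickTaiwanReading(beijingVariantsOnly)?.zhuyin);
```

Matching on the label rather than on position also covers cases like 鑰匙, where the 2nd reading is not a Taiwan reading.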

@shiki-tm

I apologize, I have more suggestions/requests for the Mandarin dict. I thought I might as well add them in this thread.

There are 4 sections I've found that could be useful but don't seem to be in the current dict:
"Compounds", "Etymology", "Alternative forms", and "Coordinate terms". I think there can be more than one etymology per meaning.

Screenshot_20241221_001319_Chrome
Screenshot_20241221_002138_Chrome
Screenshot_20241221_002350_Chrome

I think they'd be useful to have, if possible.
If some of these sections need a smaller font or some other CSS tweak to keep the entry from getting too bloated, I think that would still work great.
Thank you!


3 participants