Request: include a list of individual words in model pretraining #2086

Open
Crestina2001 opened this issue Feb 20, 2025 · 5 comments

@Crestina2001

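At inference time, mixed Chinese-English text often requires pronouncing a single English word on its own. Because the training data may lack this kind of material, words in mixed Chinese-English text end up being read very strangely.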

@XiongKexin

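+1. I have also found that pronunciation can be odd in mixed Chinese-English text, especially when Chinese is combined with single English letters. With terms such as “A4纸” (A4 paper) or “C1D驾照” (C1D driver's license), there may be a pause before the letter, or the letter may be swallowed. What methods generally help improve this?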

@foreverhell

same issue

@WyntalGeer

same issue+1

@mondorysix

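When code-switching, a segment that starts with a letter is classified as English by the language segmenter. That triggers the condition at cleaner.py L46:47, which prepends a comma to the segment.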
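To make the described behavior concrete, here is a minimal, self-contained sketch. It is not the project's actual code: `split_by_language`, `toy_g2p`, `clean_segment`, `clean_mixed`, and the threshold `MIN_PHONES` are all hypothetical names and values, and the real condition in cleaner.py may differ. It only illustrates how a short segment classified as English could end up with a leading comma.

```python
import re

# Hypothetical threshold; the real condition in cleaner.py may differ.
MIN_PHONES = 4


def split_by_language(text):
    """Split text into (lang, chunk) runs: ASCII letters are 'en', the rest 'zh'."""
    segments = []
    for match in re.finditer(r"[A-Za-z]+|[^A-Za-z]+", text):
        chunk = match.group()
        lang = "en" if re.match(r"[A-Za-z]", chunk) else "zh"
        segments.append((lang, chunk))
    return segments


def toy_g2p(chunk):
    """Stand-in for a real grapheme-to-phoneme step: one symbol per character."""
    return list(chunk.upper())


def clean_segment(lang, chunk):
    phones = toy_g2p(chunk)
    # Assumed behavior: a very short English segment gets a leading comma.
    if lang == "en" and len(phones) < MIN_PHONES:
        phones = [","] + phones
    return phones


def clean_mixed(text):
    phones = []
    for lang, chunk in split_by_language(text):
        phones.extend(clean_segment(lang, chunk))
    return phones


print(clean_mixed("A4纸"))  # [',', 'A', '4', '纸']
```

Under these assumptions, the single letter "A" forms its own English segment, falls below the threshold, and receives a comma in front of it. That would be consistent with the pause before the letter reported in this thread.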


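(Quoting @XiongKexin) +1. I have also found that pronunciation can be odd in mixed Chinese-English text, especially when Chinese is combined with single English letters. With terms such as “A4纸” (A4 paper) or “C1D驾照” (C1D driver's license), there may be a pause before the letter, or the letter may be swallowed. What methods generally help improve this?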

@KamioRinn
Contributor

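(Quoting @mondorysix) When code-switching, a segment that starts with a letter is classified as English by the language segmenter. That triggers the condition at cleaner.py L46:47, which prepends a comma to the segment.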

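In the earlier version, once that condition was removed, inference on a single letter produced all sorts of strange problems, presumably because the input was too short. I'll take another look at the new version.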
