Reproducing Results for W2V2-FS-20ms #40

Open
HildaNya opened this issue Mar 28, 2024 · 0 comments

Following the paper, I've successfully reproduced the results for Charsiu's FC-10ms, textless FC-10ms, MFA, and WebMaus, but I'm having trouble reproducing the results of the pretrained FS-20ms model.
I first downloaded the charsiu/en_w2v2_fs_10ms from HuggingFace into my working directory.
Then I followed the tutorial for generating alignments. When I try

charsiu = charsiu_forced_aligner(aligner='charsiu/en_w2v2_fs_10ms')

the results are complete gibberish and nowhere near the paper's results.

When I try

charsiu = charsiu_attention_aligner(aligner='charsiu/en_w2v2_fs_10ms')

the results are slightly better, but still not as good as the paper's.
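One way to put a number on "not as good" is to count how many reference boundaries have a predicted boundary close by. The snippet below is only a minimal sketch under stated assumptions: boundaries are plain lists of times in seconds, the 20 ms tolerance is an assumed default, and `boundary_hit_rate` is a hypothetical helper, not part of Charsiu. It is a simple recall-style measure without one-to-one matching, so it is not necessarily the exact metric used in the paper.

```python
from typing import Sequence


def boundary_hit_rate(
    predicted: Sequence[float],
    reference: Sequence[float],
    tolerance: float = 0.02,  # assumed tolerance in seconds (20 ms)
) -> float:
    """Return the fraction of reference boundaries that have at least one
    predicted boundary within `tolerance` seconds.

    Hypothetical helper for illustration only; not a Charsiu function.
    """
    if not reference:
        raise ValueError("reference must contain at least one boundary")
    hits = sum(
        1
        for ref in reference
        if any(abs(ref - pred) <= tolerance for pred in predicted)
    )
    return hits / len(reference)


if __name__ == "__main__":
    # Made-up boundary times (seconds), purely to show the call shape.
    predicted = [0.10, 0.31, 0.80]
    reference = [0.10, 0.30, 0.55]
    print(boundary_hit_rate(predicted, reference))
```

Running the same function over the output of both aligner classes on the same utterances would show whether the gap between them is consistent or limited to a few files.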

My questions are:

  1. Which one of the above lines should I be using when calling the fs_10ms aligner?
  2. Is there perhaps a step I'm missing after downloading the model from HuggingFace?

Thanks so much!
