
Model throws ValueError because of audio files #1

Open
kRichard32 opened this issue Sep 19, 2024 · 2 comments

[screenshot]

Because Whisper expects the mel input features to be of length 3000, the model throws a ValueError for shorter audio files. This was a quick workaround I implemented, but there's probably a better way of doing things...
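
(For reference, a minimal sketch of that kind of padding, assuming the features come from Hugging Face's WhisperFeatureExtractor; the checkpoint name and helper function below are illustrative and not taken from this repository.)

```python
import numpy as np
from transformers import WhisperFeatureExtractor

# Illustrative checkpoint; the repository's own model/config may differ.
feature_extractor = WhisperFeatureExtractor.from_pretrained("openai/whisper-base")

def extract_30s_features(audio: np.ndarray, sampling_rate: int = 16000):
    """Return log-mel input features padded/truncated to the 3000 frames Whisper expects."""
    # padding="max_length" pads (or truncates) the raw audio to 30 s inside the
    # extractor, so input_features comes out with shape (1, n_mels, 3000).
    return feature_extractor(
        audio, sampling_rate=sampling_rate, padding="max_length", return_tensors="pt"
    ).input_features
```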

AnfengXu136 (Collaborator) commented Sep 20, 2024

Thank you for raising the issue.
Did you try transformers==4.30.2 (as in requirements.txt)?
For the quick start with 10 s input audio, we noticed the issue when using a more recent transformers version, but it should work with the older transformers version.

However, for anyone wishing to train the model with variable input lengths larger than 30 s, this workaround of padding to 30 s can work, but I believe the positional embedding replacement code must be commented out.
I will keep this issue open for people to reference. Thank you for pointing this out.
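
(As a hedged illustration only: "positional embedding replacement" for Whisper usually means swapping the encoder's fixed 1500-position table for a longer one so inputs beyond 30 s can be encoded. The sketch below uses linear interpolation and the openai/whisper-base checkpoint purely as assumptions; the actual replacement code in this repository may differ, and it is this kind of block that would be commented out when padding everything to 30 s.)

```python
import torch
from transformers import WhisperModel

# Hypothetical sketch of a positional-embedding replacement for longer-than-30 s audio.
# Whisper's encoder normally has a fixed table of 1500 positions (3000 mel frames / 2).
model = WhisperModel.from_pretrained("openai/whisper-base")
old_weight = model.encoder.embed_positions.weight.data        # shape (1500, d_model)
new_max_positions = 3000                                       # e.g. allow up to 60 s

# Linearly interpolate the old table to the new length (one common choice; the
# repository may instead regenerate sinusoidal embeddings or extend the table differently).
new_weight = torch.nn.functional.interpolate(
    old_weight.T.unsqueeze(0), size=new_max_positions, mode="linear", align_corners=False
).squeeze(0).T
model.encoder.embed_positions = torch.nn.Embedding.from_pretrained(new_weight, freeze=True)
```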

kRichard32 (Author) commented Sep 20, 2024

Ah, I was using Python 3.12, so I had to use a more recent transformers version. Thanks for the help.

Edit:
In order to support smaller audio files, the workaround shown here is not enough on its own. The variable tmp_length (line 212 of the file) also needs to be set to 1500 (instead of self.get_feat_extract_output_lengths(len(x[0]))) to avoid a size mismatch between the encoder and decoder tensors.
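
(Roughly, the change described above, with the surrounding context assumed:)

```python
# Original (line 212): derive the encoder output length from the actual audio length.
# tmp_length = self.get_feat_extract_output_lengths(len(x[0]))

# Workaround: with every clip padded to 30 s, the Whisper encoder always emits
# 1500 frames, so hard-coding that keeps the encoder and decoder tensors aligned.
tmp_length = 1500
```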
