-
Notifications
You must be signed in to change notification settings - Fork 303
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[eval] fix check for language arg #139
base: main
Are you sure you want to change the base?
Conversation
# We need to read the audio files as arrays and tokenize the targets. | ||
audio_column_name = data_args.audio_column_name | ||
language = language_to_id(data_args.language, model.generation_config) if data_args.language else None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The issue previously was that we could have passed the language
arg for an English-only checkpoint, which is invalid. We now check this case first, and then set the normalizer afterwards
elif data_args.language is not None: | ||
raise ValueError( | ||
"Setting language token for an English-only checkpoint is not permitted. The language argument should " | ||
"only be set for multilingual checkpoints." | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Informative error that will be thrown if language
is passed but the checkpoint is English-only
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Solve this edge case but also improves readability, thanks @sanchit-gandhi !
Fixes distil-medium.en/discussions/14 by re-ordering the set-up steps such that we:
Doing 1 before 2 means we can be certain here that if the language arg is passed, we are dealing with a multilingual checkpoint.