Fix previous text prepending #142

Open
wants to merge 1 commit into
base: main
from

Conversation

bofenghuang
Contributor

Hi 👋,

Thank you for continuously adding more features to the Whisper distillation code!

While reviewing the section that prepends the previous text when preparing the training data, I made the following adjustments based on my understanding of the intended behavior:

  1. Moved the prepending of decoder_prev_token_id to the end so that it always happens, even when prev_ids isn't trimmed by either of the two preceding conditions.
  2. Updated the total length check to len(prev_ids + token_ids) + 1. The extra 1 accounts for decoder_prev_token_id, which is now always added.
  3. Removed prev_ids from the trim_length calculation. For instance, with 3 prev_ids, 3 token_ids, and a max_label_length of 6, we should keep only the last 2 tokens of prev_ids: max_label_length - len(token_ids) - 1 = 6 - 3 - 1 = 2.
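
To make the three points concrete, here is a minimal sketch of the trimming logic as I understand it. It is not the code from the diff: the helper name prepend_prev_text and its signature are hypothetical, only the total-length condition is shown (the other cutoff condition is left out), and the handling of the case where token_ids alone already fill max_label_length is my own assumption.

```python
def prepend_prev_text(prev_ids, token_ids, decoder_prev_token_id, max_label_length):
    """Hypothetical helper: return [prev marker] + prev_ids + token_ids,
    trimmed from the left so the result fits in max_label_length."""
    # Room left for previous-text tokens once token_ids and the single
    # decoder_prev_token_id marker are accounted for (point 3).
    trim_length = max_label_length - len(token_ids) - 1
    if trim_length < 0:
        # Assumption (not stated in the PR): there is no room even for the
        # marker, so the labels are returned without any previous text.
        return list(token_ids)

    # The total length includes the marker, hence the "+ 1" (point 2).
    if len(prev_ids) + len(token_ids) + 1 > max_label_length:
        # Keep only the last trim_length previous tokens.
        prev_ids = prev_ids[len(prev_ids) - trim_length:]

    # The marker is prepended last, whether or not trimming happened (point 1).
    return [decoder_prev_token_id] + list(prev_ids) + list(token_ids)


# The example from point 3: 3 prev_ids, 3 token_ids, max_label_length of 6.
# The value 99 is a placeholder for decoder_prev_token_id.
labels = prepend_prev_text([1, 2, 3], [4, 5, 6], 99, 6)
print(labels)  # [99, 2, 3, 4, 5, 6]
```

With these inputs the result has exactly max_label_length entries: the marker, the last 2 previous tokens, and the 3 label tokens.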
