Hi 👋,
Thank you for continuously adding more features to the Whisper distillation code!
As I reviewed the section on prepending previous text during the preparation of training data, I made the following adjustments based on my interpretation:
- Moved the addition of `decoder_prev_token_id` to the end, so it's always applied even when `prev_ids` aren't cut by the previous two conditions
- Changed the length check to `len(prev_ids + token_ids) + 1`, which now includes `decoder_prev_token_id` since it's always added
- Excluded `prev_ids` from the `trim_length` calculation. For instance, with 3 `prev_ids`, 3 `token_ids`, and a `max_label_length` of 6, we should retain only the last 2 tokens in `prev_ids`, calculated as `max_label_length - len(token_ids) - 1 = 6 - 3 - 1 = 2` (see the sketch below)
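For concreteness, here is a minimal Python sketch of how I read the combined logic. The function name and the single-condition structure are my own simplification rather than the actual code, and the token ids in the example are placeholders:

```python
def prepend_previous_text(prev_ids, token_ids, decoder_prev_token_id, max_label_length):
    # Trim prev_ids so that prev_ids + token_ids, plus the one slot
    # reserved for decoder_prev_token_id, fits within max_label_length.
    if len(prev_ids + token_ids) + 1 > max_label_length:
        keep = max_label_length - len(token_ids) - 1
        prev_ids = prev_ids[-keep:] if keep > 0 else []
    # decoder_prev_token_id is always prepended, even when prev_ids
    # weren't cut by the condition above.
    return [decoder_prev_token_id] + prev_ids + token_ids

# Worked example from the last bullet: 3 prev_ids, 3 token_ids,
# max_label_length of 6 -> only the last 2 prev tokens are retained.
print(prepend_previous_text([1, 2, 3], [10, 11, 12], 0, 6))
# [0, 2, 3, 10, 11, 12] -> 6 tokens total, within max_label_length
```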