Replies: 2 comments 4 replies
-
The best approach I've found so far is to give Whisper a prompt that contains disfluencies, as described for French in the paper "TRANSCRIBING AND ALIGNING CONVERSATIONAL SPEECH: A HYBRID PIPELINE APPLIED TO FRENCH CONVERSATIONS".
The reason Whisper suppresses disfluencies is that it was mostly trained on transcriptions without them (typically, YouTube subtitles).
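For anyone who wants to try this, here is a minimal sketch using openai/whisper's `initial_prompt` option. The model size, file name, and prompt wording are placeholders of my own, and the prompt text is not taken from the paper:

```python
import whisper

model = whisper.load_model("small")

# A prompt that itself contains disfluencies nudges the decoder towards
# keeping them in the output. The wording below is only an illustration;
# adapt it to your language and the kind of fillers you expect.
disfluent_prompt = "So, um, yeah, I was like, hmm, you know, uh, maybe we could, er, try that."

result = model.transcribe(
    "conversation.wav",
    language="en",
    initial_prompt=disfluent_prompt,   # fed to the decoder as preceding context
    condition_on_previous_text=True,   # default; lets the style carry over between windows
)
print(result["text"])
```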
-
Wow, thank you for that answer. Will check out the paper right away. Looks like it's exactly what I'm looking for. Thank you so much!
-
From the Readme:
Although the removal of disfluencies by Whisper is often a feature, for research purposes it may well be a disadvantage (even a dealbreaker). What is the best strategy (if any) to stop Whisper from cleaning up the transcript?
I saw the discussion about suppressing tokens, where I learned that Whisper suppresses a default set of tokens during decoding (the `-1` setting).
If that default set also includes at least some disfluencies, like "uhs" and "mmhs", then not suppressing it (i.e. not passing `-1`) would be a step in the right direction, but does it?
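For context, this is roughly what that option looks like in openai/whisper's Python API. The idea that an empty list turns off the default non-speech suppression is my reading of the code, so treat it as an assumption to verify:

```python
import whisper

model = whisper.load_model("small")

# By default suppress_tokens="-1", which tells the decoder to suppress a
# built-in list of non-speech/symbol tokens. Passing an empty list should
# leave that list unsuppressed, so the model is at least allowed to emit
# those tokens; whether filler words are actually in that list is the
# open question above.
result = model.transcribe(
    "conversation.wav",
    suppress_tokens=[],  # instead of the default "-1"
)
print(result["text"])
```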
Are there any other tricks to make Whisper more truthful to the original?