I do see ending effects with OpenAI whisper, which uses PyTorch as a back-end. E.g. running:

```python
import whisper

model = whisper.load_model("turbo")

# load the audio and keep only the last 1,000,000 samples
# (~62.5 s at Whisper's 16 kHz sample rate) to focus on the ending
data = whisper.load_audio("conal.mp3")
data = data[-1_000_000:]

# transcribe; padding/trimming and the log-Mel spectrogram are handled internally
result = model.transcribe(data)
print(result["text"])
```

gives the following, where the last few sentences are just gibberish:
One thing that helps is setting `condition_on_previous_text=False`:

```python
import mlx_whisper

result = mlx_whisper.transcribe(
    "conal.mp3",
    path_or_hf_repo="mlx-community/whisper-large-v3-turbo",
    condition_on_previous_text=False,
)
```

With that, the ending is much better. This is anecdotally in line with what I've heard before: conditioning on previous text can sometimes make the output more accurate, but it can also cause repetitions in the output.
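If you want to check this on your own files, here is a minimal A/B sketch under the same setup; the loop and the tail-slicing are my additions, and the filename and repo are just the ones from the snippet above:

```python
import mlx_whisper

# run the same file with and without conditioning on previously decoded text
for condition in (True, False):
    result = mlx_whisper.transcribe(
        "conal.mp3",
        path_or_hf_repo="mlx-community/whisper-large-v3-turbo",
        condition_on_previous_text=condition,
    )
    tail = result["text"][-300:]  # last few sentences, where repetition tends to show up
    print(f"condition_on_previous_text={condition}:\n{tail}\n")
```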
---
My transcription of this audio file with this MLX conversion of Whisper-v3-turbo, run via mlx-whisper, ended inaccurately with the word "Yeah" repeated 27 times.
I wondered where this token repetition came from, and @awni suggested I start a discussion to "verify the result by running the same audio through the transformers PyTorch implementation as a test".
Here is the code I used:
Here is the tail end of the transcription, which concludes with "oh yeah":
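In case it's useful to others, here is a minimal sketch of the cross-check @awni suggested, using the Hugging Face transformers pipeline; the model id and the filename are assumptions, not taken from the original post:

```python
from transformers import pipeline

# run the same audio through the PyTorch implementation as a sanity check
pipe = pipeline(
    "automatic-speech-recognition",
    model="openai/whisper-large-v3-turbo",  # assumed HF id for the turbo checkpoint
    chunk_length_s=30,  # chunked long-form decoding
)
# return_timestamps is needed for audio longer than 30 s
result = pipe("audio.mp3", return_timestamps=True)  # placeholder filename
print(result["text"][-300:])  # compare the tail against the mlx-whisper output
```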