Hi there, I am trying to get the Refiner to work with MLX Whisper, but it seems to expect a different inference function from the one used for transcribing: one that takes text_tokens as a parameter. I can't find an MLX Whisper function that takes text_tokens as input, so is it possible to avoid using a different inference function?
Currently I get this error:
    stable_whisper/non_whisper/refinement.py", line 291, in get_prob
        token_probs: torch.Tensor = self.inference_func(audio_segment, text_tokens)
                                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    TypeError: inference() takes 1 positional argument but 2 were given
Refinement adjusts the timestamps of transcribed words based on how the confidence scores change as the audio input is altered. So it needs a function that takes in specific audio segments and words/tokens and outputs confidence scores for those words/tokens with respect to the audio segment.
It needs low-level access to the model, so mlx_whisper.transcribe() will not work: it only takes in audio and generates its own words and timestamps, which differ with different audio inputs, rather than scoring a fixed set of tokens.
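A wrapper along these lines could serve as that inference function. This is only a sketch, not a confirmed recipe: it assumes the audio arrives as 16 kHz mono samples, that the Refiner wants one probability per token (check refinement.py for the exact contract), and that mlx_whisper's low-level names (load_models.load_model, the audio helpers, model.encoder, and a decoder returning a (logits, ...) tuple) mirror openai-whisper. Verify all of these against the installed version.

```python
import mlx.core as mx
import numpy as np
import torch
import mlx_whisper.audio as whisper_audio
from mlx_whisper.load_models import load_model

model = load_model("mlx-community/whisper-tiny")  # any MLX Whisper checkpoint

def inference(audio_segment, text_tokens) -> torch.Tensor:
    # audio_segment: 16 kHz mono samples (assumed torch.Tensor); text_tokens:
    # the token IDs to score. Depending on what the Refiner passes, the
    # SOT/prompt prefix may already be included; adjust if it is not.
    samples = np.asarray(
        audio_segment.cpu() if hasattr(audio_segment, "cpu") else audio_segment,
        dtype=np.float32,
    ).flatten()

    # mlx_whisper lays mels out as (frames, n_mels), hence axis=-2 for padding.
    mel = whisper_audio.log_mel_spectrogram(samples, n_mels=model.dims.n_mels)
    mel = whisper_audio.pad_or_trim(mel, whisper_audio.N_FRAMES, axis=-2)

    # Encode the audio once, then score the given tokens with the decoder
    # (teacher forcing) instead of letting it generate new ones.
    audio_features = model.encoder(mel[None])
    tokens = mx.array([list(text_tokens)])
    # Assumption: the decoder returns a (logits, kv_cache, cross_qk) tuple.
    logits = model.decoder(tokens, audio_features)[0]

    # probs[i] is the distribution over the token at position i + 1, so the
    # probability of each token given its predecessors is read off with a
    # one-position shift.
    probs = mx.softmax(logits[0].astype(mx.float32), axis=-1)
    token_probs = [float(probs[i, t]) for i, t in enumerate(list(text_tokens)[1:])]
    return torch.tensor(token_probs)
```

The key point is the teacher forcing: the decoder is fed the fixed text_tokens and only asked how probable each one is, which is exactly the low-level access that transcribe() does not expose.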