Don't understand the warning when inputting an np array #28
-
Hi, first, thanks for this piece of code! Every time I call I don't understand because Could you explain what I should do?
Replies: 1 comment 1 reply
-
Most audio isn't 16 kHz, so if you're passing a numpy array into `transcribe`, it has most likely been resampled to 16 kHz. But a minor shift happens when using `_load_audio_waveform` to generate the waveform image from the resampled audio, which may affect the accuracy of the suppression.

```python
import whisper
from stable_whisper import _load_audio_waveform

audio_path = 'audio.mp3'
audio_array = whisper.load_audio(audio_path)  # resampled to 16 kHz
# waveform image generated from the original file
original_waveform_img = _load_audio_waveform(audio_path, 100, 10000)
# waveform image generated from the resampled array
resampled_waveform_img = _load_audio_waveform(audio_array, 100, 10000)
```

In a portion of the two images, the bottom (resampled) one is slightly shifted to the right. I haven't extensively tested whether it affects the accuracy much. So to provide the original audio for the mask, pass it as `audio_for_mask`:

```python
model.transcribe(audio_array, audio_for_mask=audio_path)
# or
with open(audio_path, 'rb') as f:
    audio_bytes = f.read()
model.transcribe(audio_array, audio_for_mask=audio_bytes)
```

Since the shift is minor, it likely won't make much of a difference, so you can also ignore the warnings:

```python
import warnings
warnings.filterwarnings('ignore')
```
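If you'd rather not silence every warning in the process with a blanket `warnings.filterwarnings('ignore')`, a narrower option is to suppress them only around the one call. Below is a minimal sketch using the standard library's `warnings.catch_warnings`. Note the assumptions: `noisy_transcribe` is a hypothetical stand-in for `model.transcribe(audio_array)`, `transcribe_quietly` is a helper name made up for this example, and I'm assuming the warning is a `UserWarning` (the default category of `warnings.warn`), which I haven't verified against the library.

```python
import warnings


def noisy_transcribe(audio):
    # Hypothetical stand-in for model.transcribe(audio_array): it only emits
    # a warning and returns a dummy result so this sketch runs on its own.
    warnings.warn("audio may have been resampled")  # UserWarning by default
    return {"text": ""}


def transcribe_quietly(func, *args, **kwargs):
    # Ignore UserWarning only for the duration of this call; the previous
    # filter state is restored when the with-block exits.
    with warnings.catch_warnings():
        warnings.simplefilter("ignore", category=UserWarning)
        return func(*args, **kwargs)


result = transcribe_quietly(noisy_transcribe, [0.0, 0.1])
```

In real use you would call something like `transcribe_quietly(model.transcribe, audio_array)`; warnings raised elsewhere in your program stay visible.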