Don't understand the warning when inputting an np array #28

Most audio isn't sampled at 16 kHz, so if you're passing a NumPy array into transcribe, it has most likely been resampled to 16 kHz. However, a minor shift appears when _load_audio_waveform generates the waveform image from the resampled audio, which may affect the accuracy of the suppression.

import whisper
from stable_whisper import _load_audio_waveform

audio_path = 'audio.mp3'
# whisper.load_audio decodes the file and resamples it to 16 kHz
audio_array = whisper.load_audio(audio_path)
# Waveform image generated from the original file
original_waveform_img = _load_audio_waveform(audio_path, 100, 10000)
# Waveform image generated from the resampled array
resampled_waveform_img = _load_audio_waveform(audio_array, 100, 10000)

Portion of the images:

The bottom (resampled) is slightly shifted to the right. Haven't extensively tested i…
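
If you want to put a number on a shift like this, one rough approach is to slide one signal against the other and keep the offset where they line up best (a brute-force cross-correlation). The sketch below is only an illustration: `estimate_shift` is a hypothetical helper, not part of stable_whisper or whisper, and it assumes you have already reduced each waveform to a plain 1-D sequence of numbers (for example, one amplitude value per image column).

```python
def estimate_shift(reference, shifted, max_lag):
    """Estimate how many samples `shifted` is offset relative to `reference`.

    Hypothetical helper for illustration only. A positive result means
    `shifted` is delayed (moved to the right); a negative result means
    it is ahead (moved to the left).
    """
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        # Correlation score: sum of products over the overlapping samples
        score = 0.0
        for i, value in enumerate(reference):
            j = i + lag
            if 0 <= j < len(shifted):
                score += value * shifted[j]
        # Keep the first lag that gives the highest score
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


# Toy data: the second sequence is the first one moved 2 samples to the right
reference = [0, 0, 1, 3, 1, 0, 0, 0, 0, 0]
delayed = [0, 0, 0, 0, 1, 3, 1, 0, 0, 0]
lag = estimate_shift(reference, delayed, max_lag=4)
```

The loop is O(n * max_lag), which is fine for short envelopes; for long signals a vectorized or FFT-based correlation would be the usual choice.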

Answer selected by Ca-ressemble-a-du-fake