insanely-fast-whisper backend #122

marziye-A · 2024-09-19T12:57:01Z

Hi, thanks for your great work!
I want to use the streaming mode with the insanely-fast-whisper backend. I am adding this backend, but I don't know what the "ts_words" function is. What is its purpose, and what does it take as input? Does the output of the Whisper backend need to have timestamps?

Can you please help me understand this function?
Any help is really appreciated.

Gldkslfmsd · 2024-09-20T05:59:32Z

Hi, thanks. Why do you need insanely-fast-whisper? As far as I know, it uses faster-whisper, the same as ours.

Which ts_words function do you mean? Can you give a link to the line where it is defined?

And yes, whisper-streaming needs word-level timestamps.

marziye-A · 2024-09-22T14:14:25Z

Thank you for your answer.
I think it doesn't use the faster-whisper backend; it's based on Hugging Face transformers and Flash Attention.

It is at this line for the faster-whisper backend (whisper_streaming/whisper_online.py, line 138 in 225f038):

    def ts_words(self, segments):

and at this line for the OpenAI Whisper backend (whisper_streaming/whisper_online.py, line 185 in 225f038):

    def ts_words(self, segments):

I want to implement this function for the insanely-fast-whisper backend.

Gldkslfmsd · 2024-09-23T07:14:32Z

Alright. ts_words is quite poorly documented here (whisper_streaming/whisper_online.py, line 80 in 225f038):

    # return: transcribe result object to [(beg,end,"word1"), ...]

It converts the object returned by the transcribe function into a representation that is the same for all backends: a list of tuples (beg, end, word), where beg and end are floats giving the time, in seconds from the beginning of the recording, at which the word was uttered.
The word is a string. In faster-whisper, it may be a subword: for example, "space-delimited" can come in two parts, " space" and "-delimited", which should not be joined with a space (whisper_streaming/whisper_online.py, line 31 in 225f038):

    sep = " "   # join transcribe words with this character (" " for whisper_timestamped,
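To illustrate why the separator depends on the backend, here is a small sketch that joins the words of a [(beg, end, word), ...] list with a given `sep`. The helper name `join_words` and the timestamps are made up for illustration.

```python
from typing import List, Tuple

Word = Tuple[float, float, str]


def join_words(words: List[Word], sep: str) -> str:
    """Concatenate the word strings of [(beg, end, word), ...] with `sep` (hypothetical helper)."""
    return sep.join(w for _, _, w in words)


# faster-whisper style: pieces already carry their own leading space, so sep = ""
fw = [(0.0, 0.4, " space"), (0.4, 0.9, "-delimited")]
# whisper_timestamped style: bare words, so sep = " "
wt = [(0.0, 0.4, "hello"), (0.4, 0.9, "world")]

print(repr(join_words(fw, "")))   # ' space-delimited'
print(repr(join_words(fw, " ")))  # ' space -delimited' (wrong: extra space inside the word)
print(repr(join_words(wt, " ")))  # 'hello world'
```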

Gldkslfmsd · 2024-09-23T07:19:26Z

> I think it doesn't use the faster-whisper backend; it's based on Hugging Face transformers and Flash Attention.

OK. I think insanely-fast-whisper gets its speed from using a lot of memory and batching. That applies only to offline mode: you can chunk the whole long recording into small pieces and process them in parallel. In streaming mode, you can use batching as in #55 and #42. It should speed things up a little, but not by much.
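The offline approach described above (cut the whole recording into pieces, then process several pieces per forward pass) can be sketched as follows. This is a simplified illustration, not insanely-fast-whisper's code: the 30-second chunk length and the batch size are assumptions, and real pipelines usually add overlap between chunks, which is omitted here.

```python
from typing import Iterator, List, Sequence

SAMPLE_RATE = 16_000  # Whisper models expect 16 kHz audio


def chunk_audio(samples: Sequence[float], chunk_s: float = 30.0) -> List[Sequence[float]]:
    """Split a full recording into consecutive, non-overlapping chunks of chunk_s seconds."""
    size = int(chunk_s * SAMPLE_RATE)
    return [samples[i:i + size] for i in range(0, len(samples), size)]


def batches(chunks: List[Sequence[float]], batch_size: int) -> Iterator[List[Sequence[float]]]:
    """Group chunks so that each group could be sent to the model in one forward pass."""
    for i in range(0, len(chunks), batch_size):
        yield chunks[i:i + batch_size]


# 70 s of (silent) audio -> chunks of 30 s, 30 s and 10 s -> 2 batches with batch_size=2
recording = [0.0] * (70 * SAMPLE_RATE)
for batch in batches(chunk_audio(recording), batch_size=2):
    print([len(c) / SAMPLE_RATE for c in batch])
```

In streaming mode this does not carry over directly, because future audio is not available yet, so only the little audio already buffered can be batched.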

But anyway, feel free to try it and share your latency/quality test results compared to faster-whisper. Or make a PR, and I may run the test.

Gldkslfmsd mentioned this issue on Sep 30, 2024 in "Introducing of whisper-jax #126" (closed).

Gldkslfmsd closed this as completed Oct 4, 2024
