Skip to content

Commit

Permalink
Update How to create high-quality offline video transcriptions and su…
Browse files Browse the repository at this point in the history
…btitles using Whisper and Python.md
  • Loading branch information
ookgezellig committed Nov 5, 2024
1 parent 92a7e9c commit 70bd6df
Showing 1 changed file with 9 additions and 15 deletions.
Original file line number Diff line number Diff line change
@@ -1,7 +1,6 @@
# How to create high-quality offline video transcriptions and subtitles using Whisper and Python

<image src="media/afbeelding1.png" width="400" hspace="10" align="right"/>
<br clear="all" />

I always thought that 'doing things with AI' was equivalant to smoking data centers, overheated servers, and massive cloud computing power.

Expand Down Expand Up @@ -31,10 +30,10 @@ As I work with ChatGPT regularly, I had heard of [Whisper, OpenAI’s speech-to-

After some research to see if this could suit my ASR (Automatic Speech Recognition) needs, I found out that [this model excels in Dutch](https://github.com/openai/whisper?tab=readme-ov-file#available-models-and-languages). But it also performs well in English.

OK, that already sounds promising. But Whisper doesn’t have a user-friendly front end (as far as I know), so I had to work with the API and Python. Fortunately, I found [this short blog post](https://nicobytes.com/blog/en/how-to-use-whisper/) to help me get started, and, combined with the [documentation](https://platform.openai.com/docs/guides/speech-to-text), it was straightforward to set up.

<a href="https://nicobytes.com/blog/en/how-to-use-whisper/" target="_blank"><image src="media/afbeelding2.png" width="400" hspace="10" align="right"/></a>

OK, that already sounds promising. But Whisper doesn’t have a user-friendly front end (as far as I know), so I had to work with the API and Python. Fortunately, I found [this short blog post](https://nicobytes.com/blog/en/how-to-use-whisper/) to help me get started, and, combined with the [documentation](https://platform.openai.com/docs/guides/speech-to-text), it was straightforward to set up.

Further in this article, you’ll read about what I ultimately created with it and find ready-to-use Python code to try it out yourself.

## FFmpeg is needed
Expand Down Expand Up @@ -62,9 +61,7 @@ the ‘large’ model is downloaded to your machine once. (See here for [the ava
Does transcription go reasonably fast? The 'large-v2' model I use operates at about real-time speed, so if the audio is 15 minutes long, transcription takes about 15-20 minutes. The base and medium models are smaller and faster but deliver lower quality.

## And such quality! With subtitles! Even with poor input!
Beyond offline use, I am utterly amazed by the quality of the generated text. I’ll show this best through this (rather dull and quite lengthy) test video where I used myself as the test subject:

https://commons.wikimedia.org/wiki/File:Wikidata_Workshop_-_Theoretical_part_-_Maastricht_University_-_15_October_2024.webm
Beyond offline use, I am utterly amazed by the quality of the generated text. I’ll show this best through this (rather dull and quite lengthy) [test video](https://commons.wikimedia.org/wiki/File:Wikidata_Workshop_-_Theoretical_part_-_Maastricht_University_-_15_October_2024.webm) where I used myself as the test subject:

<image src="media/afbeelding5.png" width="100%" hspace="0" align="left"/>
<br clear="all" />
Expand All @@ -76,29 +73,26 @@ As you can hear in the video, I’m certainly not making an effort to speak clea
And the [subtitles (closed captions)](https://commons.wikimedia.org/wiki/TimedText:Wikidata_Workshop_-_Theoretical_part_-_Maastricht_University_-_15_October_2024.webm.en.srt) you see in the video are also completely generated by Whisper, with the timing spot-on.

## Example code, try it yourself
To share my knowledge and code, I created the GitHub repo

[https://github.com/KBNLresearch/videotools](https://github.com/KBNLresearch/videotools)
To share my knowledge and code, I created the GitHub repo [https://github.com/KBNLresearch/videotools](https://github.com/KBNLresearch/videotools)

The relevant module is [transcribe_audio.py](../transcribe_audio.py), which is run from [runtools.py](../runtools.py), the main function of this repo.

If desired, you can have the audio transcript corrected by ChatGPT. I made an initial setup for that in [ai_correct_audiotranscripts.py](../ai_correct_audiotranscripts.py). To use this, you’ll need an [OpenAI API key](https://help.openai.com/en/articles/4936850-where-do-i-find-my-openai-api-key). But note that you’ll lose the privacy advantage and offline use with this option, as the ChatGPT models are far too large to run on a personal laptop.
If you want, you can have the audio transcript corrected by ChatGPT, for which I made an initial setup in [ai_correct_audiotranscripts.py](../ai_correct_audiotranscripts.py). To use this, you’ll need an [OpenAI API key](https://help.openai.com/en/articles/4936850-where-do-i-find-my-openai-api-key). But please note that you’ll lose the privacy advantage and offline use, as the ChatGPT models are far too large to run on a personal laptop.

And since I was at it, I created a few other video and audio tools that only use FFmpeg, not Whisper or ChatGPT.
As a side product, I also created a few other video and audio tools that only require FFmpeg, without a need for Whisper or ChatGPT.

<image src="media/afbeelding6.png" width="400" hspace="10" align="right"/>
<a href="https://github.com/KBNLresearch/videotools" target="_blank"><image src="media/afbeelding6.png" width="400" hspace="10" align="right"/></a>

## Questions, comments?
Since I’m “just experimenting,” I’d love to hear questions, feedback, tips, etc., on this new piece of AI for me. You can find my contact details below.
Since this was just a first experiment with this new piece of AI for me, I’d love to hear your questions, feedback, tips, etc. You can find my contact details below.

## Similar articles
* [Super efficient! Subtitling or transcribing your video with AI](https://id.nl/huis-en-entertainment/computer-en-gaming/software/superefficient-je-video-ondertitelen-of-transcriberen-met-ai) (in Dutch)

## Licensing
<image src="media/icon_cc0.png" width="100" hspace="10" align="right"/>

All original materials in this repo, expect for the [blog article header]
are released under the [CC0 1.0 Universal license](https://github.com/KBNLwikimedia/GLAMorousToHTML/blob/main/LICENSE), effectively donating all original content to the public domain.
All original materials in this repo, expect for the [blog article header](https://nicobytes.com/blog/en/how-to-use-whisper/) are released under the [CC0 1.0 Universal license](https://github.com/KBNLwikimedia/GLAMorousToHTML/blob/main/LICENSE), effectively donating all original content to the public domain.

## Contact
<image src="media/icon_kb2.png" width="200" hspace="10" align="right"/>
Expand Down

0 comments on commit 70bd6df

Please sign in to comment.