Update How to create high-quality offline video transcriptions and su…

…btitles using Whisper and Python.md
ookgezellig · Nov 5, 2024 · 70bd6df · 70bd6df
1 parent 92a7e9c
commit 70bd6df
Showing 1 changed file with 9 additions and 15 deletions.
diff --git a/...-quality offline video transcriptions and subtitles using Whisper and Python.md b/...-quality offline video transcriptions and subtitles using Whisper and Python.md
@@ -1,7 +1,6 @@
 # How to create high-quality offline video transcriptions and subtitles using Whisper and Python
 
 <image src="media/afbeelding1.png" width="400" hspace="10" align="right"/>
-<br clear="all" />
 
 I always thought that 'doing things with AI' was equivalant to smoking data centers, overheated servers, and massive cloud computing power.
 
@@ -31,10 +30,10 @@ As I work with ChatGPT regularly, I had heard of [Whisper, OpenAI’s speech-to-
 
 After some research to see if this could suit my ASR (Automatic Speech Recognition) needs, I found out that [this model excels in Dutch](https://github.com/openai/whisper?tab=readme-ov-file#available-models-and-languages). But it also performs well in English.
 
-OK, that already sounds promising. But Whisper doesn’t have a user-friendly front end (as far as I know), so I had to work with the API and Python. Fortunately, I found [this short blog post](https://nicobytes.com/blog/en/how-to-use-whisper/) to help me get started, and, combined with the [documentation](https://platform.openai.com/docs/guides/speech-to-text), it was straightforward to set up.
-
 <a href="https://nicobytes.com/blog/en/how-to-use-whisper/" target="_blank"><image src="media/afbeelding2.png" width="400" hspace="10" align="right"/></a>
 
+OK, that already sounds promising. But Whisper doesn’t have a user-friendly front end (as far as I know), so I had to work with the API and Python. Fortunately, I found [this short blog post](https://nicobytes.com/blog/en/how-to-use-whisper/) to help me get started, and, combined with the [documentation](https://platform.openai.com/docs/guides/speech-to-text), it was straightforward to set up.
+
 Further in this article, you’ll read about what I ultimately created with it and find ready-to-use Python code to try it out yourself.
 
 ## FFmpeg is needed
@@ -62,9 +61,7 @@ the ‘large’ model is downloaded to your machine once. (See here for [the ava
 Does transcription go reasonably fast? The 'large-v2' model I use operates at about real-time speed, so if the audio is 15 minutes long, transcription takes about 15-20 minutes. The base and medium models are smaller and faster but deliver lower quality.
 
 ## And such quality! With subtitles! Even with poor input!
-Beyond offline use, I am utterly amazed by the quality of the generated text. I’ll show this best through this (rather dull and quite lengthy) test video where I used myself as the test subject:
-
-https://commons.wikimedia.org/wiki/File:Wikidata_Workshop_-_Theoretical_part_-_Maastricht_University_-_15_October_2024.webm
+Beyond offline use, I am utterly amazed by the quality of the generated text. I’ll show this best through this (rather dull and quite lengthy) [test video](https://commons.wikimedia.org/wiki/File:Wikidata_Workshop_-_Theoretical_part_-_Maastricht_University_-_15_October_2024.webm) where I used myself as the test subject:
 
 <image src="media/afbeelding5.png" width="100%" hspace="0" align="left"/>
 <br clear="all" />
@@ -76,29 +73,26 @@ As you can hear in the video, I’m certainly not making an effort to speak clea
 And the [subtitles (closed captions)](https://commons.wikimedia.org/wiki/TimedText:Wikidata_Workshop_-_Theoretical_part_-_Maastricht_University_-_15_October_2024.webm.en.srt) you see in the video are also completely generated by Whisper, with the timing spot-on.
 
 ## Example code, try it yourself
-To share my knowledge and code, I created the GitHub repo
-
-[https://github.com/KBNLresearch/videotools](https://github.com/KBNLresearch/videotools)
+To share my knowledge and code, I created the GitHub repo [https://github.com/KBNLresearch/videotools](https://github.com/KBNLresearch/videotools)
 
 The relevant module is [transcribe_audio.py](../transcribe_audio.py), which is run from [runtools.py](../runtools.py), the main function of this repo.
 
-If desired, you can have the audio transcript corrected by ChatGPT. I made an initial setup for that in [ai_correct_audiotranscripts.py](../ai_correct_audiotranscripts.py). To use this, you’ll need an [OpenAI API key](https://help.openai.com/en/articles/4936850-where-do-i-find-my-openai-api-key). But note that you’ll lose the privacy advantage and offline use with this option, as the ChatGPT models are far too large to run on a personal laptop.
+If you want, you can have the audio transcript corrected by ChatGPT, for which I made an initial setup in [ai_correct_audiotranscripts.py](../ai_correct_audiotranscripts.py). To use this, you’ll need an [OpenAI API key](https://help.openai.com/en/articles/4936850-where-do-i-find-my-openai-api-key). But please note that you’ll lose the privacy advantage and offline use, as the ChatGPT models are far too large to run on a personal laptop.
 
-And since I was at it, I created a few other video and audio tools that only use FFmpeg, not Whisper or ChatGPT.
+As a side product, I also created a few other video and audio tools that only require FFmpeg, without a need for Whisper or ChatGPT.
 
-<image src="media/afbeelding6.png" width="400" hspace="10" align="right"/>
+<a href="https://github.com/KBNLresearch/videotools" target="_blank"><image src="media/afbeelding6.png" width="400" hspace="10" align="right"/></a>
 
 ## Questions, comments?
-Since I’m “just experimenting,” I’d love to hear questions, feedback, tips, etc., on this new piece of AI for me. You can find my contact details below.
+Since this was just a first experiment with this new piece of AI for me, I’d love to hear your questions, feedback, tips, etc. You can find my contact details below.
 
 ## Similar articles
 *  [Super efficient! Subtitling or transcribing your video with AI](https://id.nl/huis-en-entertainment/computer-en-gaming/software/superefficient-je-video-ondertitelen-of-transcriberen-met-ai) (in Dutch)
 
 ## Licensing
 <image src="media/icon_cc0.png" width="100" hspace="10" align="right"/>
 
-All original materials in this repo, expect for the [blog article header]
-are released under the [CC0 1.0 Universal license](https://github.com/KBNLwikimedia/GLAMorousToHTML/blob/main/LICENSE), effectively donating all original content to the public domain.
+All original materials in this repo, expect for the [blog article header](https://nicobytes.com/blog/en/how-to-use-whisper/) are released under the [CC0 1.0 Universal license](https://github.com/KBNLwikimedia/GLAMorousToHTML/blob/main/LICENSE), effectively donating all original content to the public domain.
 
 ## Contact
 <image src="media/icon_kb2.png" width="200" hspace="10" align="right"/>