YouTube Speech to Text: Convert YouTube URLs to text using speech recognition with Whisper AI (No API Required)
- Download, install, and run Docker Desktop
- Open a console in the folder that contains the Dockerfile and run the following commands:
    docker build -t youtube-to-text:latest . # be patient, it takes time to download the models
    docker run -d --name youtube-to-text -p 3300:80 youtube-to-text:latest # ready
    # go to http://localhost:3300/yt/swXWUfufu2w to try it! :)
- Containerized solution:
  - You can easily run the application on your machine while keeping it isolated from your local environment.
  - You can easily deploy the container to the cloud (e.g. using Azure Container Registry and App Service).
- API-based solution:
  - Uses FastAPI, a fast web framework for building APIs.
  - Uses Whisper AI, OpenAI's automatic speech recognition (ASR) system.
- Unlike solutions that rely on YouTube's transcripts, which can be unreliable or missing, this solution runs Whisper AI directly on the video's audio track and supports multiple languages.
Once the container is running, you can use either of two HTTP requests:
GET /?url=<youtube video url>
(e.g. http://localhost:3300/?url=https://www.youtube.com/watch?v=swXWUfufu2w)
GET /yt/<youtube video id>
(e.g. http://localhost:3300/yt/swXWUfufu2w)
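The two request forms can also be built from Python. This is a minimal sketch, not part of the repository: the helper names (`video_id_from_url`, `request_by_url`, `request_by_id`) are hypothetical, and `BASE_URL` assumes the `-p 3300:80` port mapping from the `docker run` command.

```python
from urllib.parse import parse_qs, quote, urlparse

# Assumption: the container was started with -p 3300:80 on this machine.
BASE_URL = "http://localhost:3300"


def video_id_from_url(video_url: str) -> str:
    """Extract the video id from a standard watch URL (.../watch?v=<id>)."""
    ids = parse_qs(urlparse(video_url).query).get("v")
    if not ids:
        raise ValueError(f"no 'v' parameter in {video_url!r}")
    return ids[0]


def request_by_url(video_url: str, base_url: str = BASE_URL) -> str:
    """Build the GET /?url=<youtube video url> request."""
    # Keep ':', '/', '?' and '=' readable, but escape characters such as '&'
    # so that extra YouTube parameters stay inside the url value.
    return f"{base_url}/?url={quote(video_url, safe=':/?=')}"


def request_by_id(video_id: str, base_url: str = BASE_URL) -> str:
    """Build the GET /yt/<youtube video id> request."""
    return f"{base_url}/yt/{quote(video_id, safe='')}"
```

For the example video, `request_by_id(video_id_from_url("https://www.youtube.com/watch?v=swXWUfufu2w"))` gives the same address as the second example above.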
Once the conversion starts, you will get a response back. To get the text, send the GET request again.
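The "request again" step can be automated with a small polling loop. The sketch below is only an illustration: `poll_for_text`, `http_get`, and their parameters are hypothetical names, and because the response format is not described here, the caller supplies an `is_ready` function that decides whether a response body already contains the transcript.

```python
import time
import urllib.request
from typing import Callable


def http_get(url: str) -> str:
    """Send a GET request and return the response body as text."""
    with urllib.request.urlopen(url) as response:
        return response.read().decode("utf-8")


def poll_for_text(
    url: str,
    is_ready: Callable[[str], bool],
    fetch: Callable[[str], str] = http_get,
    interval_seconds: float = 10.0,
    max_attempts: int = 60,
) -> str:
    """Repeat the same GET request until is_ready accepts the response body."""
    for attempt in range(max_attempts):
        body = fetch(url)
        if is_ready(body):
            return body
        # Wait before retrying, but not after the final attempt.
        if attempt < max_attempts - 1:
            time.sleep(interval_seconds)
    raise TimeoutError(f"no transcript after {max_attempts} requests to {url}")
```

To use it, call `poll_for_text` with one of the request URLs above and an `is_ready` check that matches whatever the service actually returns while the conversion is still in progress. The interval and attempt limit are arbitrary defaults; long videos may need more time.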
👉 Link to video: https://github.com/VasilisPlavos/YouTube-Speech-to-Text/raw/refs/heads/main/assets/example.mp4
.
├── Dockerfile
└── app
    ├── main.py
    ├── processors.py
    ├── test_processors.py
    ├── requirements.txt*
    └── requirements.long.txt*
Dockerfile
: Contains the commands required to assemble the image/app
app
: This directory contains the Python application
*The files requirements.txt and requirements.long.txt are not used at the moment. They are stored here as a backup.