YouTube Speech to Text: Convert Youtube URLs to text using Speech Recognition with Whisper AI (No API Required)

VasilisPlavos/YouTube-Speech-to-Text

Find me on LinkedIn!

YouTube-Speech-to-Text

🚀 Quick start

  1. Download, install, and run Docker Desktop.
  2. Open a console in the folder that contains the Dockerfile and run the following commands:
    docker build -t youtube-to-text:latest . # be patient. it takes time to download the models
    docker run -d --name youtube-to-text -p 3300:80 youtube-to-text:latest # ready
    # go to http://localhost:3300/yt/swXWUfufu2w to try it! :)

Features

  1. Containerized solution:
    • You can easily run the application on your machine while keeping it isolated from your local environment.
    • You can easily deploy the container to the cloud (e.g. using Azure Container Registry & App Service).
  2. API-based solution
  3. Use of FastAPI: A fast web framework for building APIs
  4. Use of Whisper AI: OpenAI's automatic speech recognition (ASR) system
  5. Transcription from the audio itself: instead of relying on YouTube's transcripts, which can be missing or unreliable, the Whisper-powered solution transcribes the video's actual speech and supports multiple languages.

How to use it

Once the container is running, you can use either of two HTTP requests (as simple as that):

  1. GET /?url=<youtube video url> (e.g. http://localhost:3300/?url=https://www.youtube.com/watch?v=swXWUfufu2w)
  2. GET /yt/<youtube video id> (e.g. http://localhost:3300/yt/swXWUfufu2w)
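
The two request forms can be built programmatically. The following is a minimal sketch, assuming the port mapping from the quick start (`localhost:3300`). The helper names `request_by_url`, `request_by_id`, and `video_id_from_watch_url` are hypothetical and not part of this repository; the last one only handles standard `watch?v=` links.

```python
from urllib.parse import parse_qs, urlencode, urlparse

# Assumption: the container was started with -p 3300:80 as in the quick start.
BASE_URL = "http://localhost:3300"


def request_by_url(video_url: str, base_url: str = BASE_URL) -> str:
    """Build the GET /?url=<youtube video url> request (hypothetical helper)."""
    # urlencode percent-encodes the video URL so its own "?" and "="
    # are not confused with the outer query string.
    return f"{base_url}/?{urlencode({'url': video_url})}"


def request_by_id(video_id: str, base_url: str = BASE_URL) -> str:
    """Build the GET /yt/<youtube video id> request (hypothetical helper)."""
    return f"{base_url}/yt/{video_id}"


def video_id_from_watch_url(video_url: str) -> str:
    """Read the "v" parameter of a standard watch URL (hypothetical helper)."""
    query = parse_qs(urlparse(video_url).query)
    return query["v"][0]


if __name__ == "__main__":
    watch_url = "https://www.youtube.com/watch?v=swXWUfufu2w"
    print(request_by_url(watch_url))
    print(request_by_id(video_id_from_watch_url(watch_url)))
```

The percent-encoded form is the safer way to pass a URL inside a query string; the unencoded example above is the form shown in this README.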

Once the conversion starts, you get a response back. To get the text, send the GET request again.
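
The request-then-repeat flow described above can be automated with a small polling loop. This is only a sketch: this README does not document what the "still converting" response looks like, so the caller has to supply the `is_ready` check. The names `poll_for_text`, `http_get`, and `is_ready`, as well as the interval and attempt limit, are assumptions chosen for illustration.

```python
import time
import urllib.request
from typing import Callable


def http_get(url: str) -> str:
    """Send a GET request and return the response body as text."""
    with urllib.request.urlopen(url) as response:
        return response.read().decode("utf-8")


def poll_for_text(
    url: str,
    is_ready: Callable[[str], bool],
    fetch: Callable[[str], str] = http_get,
    interval_seconds: float = 10.0,
    max_attempts: int = 30,
) -> str:
    """Repeat the same GET request until is_ready accepts the response body.

    is_ready is caller-supplied because the response format is not
    documented here; inspect a real response before writing it.
    """
    for attempt in range(max_attempts):
        body = fetch(url)
        if is_ready(body):
            return body
        # Wait between attempts, but not after the final one.
        if attempt < max_attempts - 1:
            time.sleep(interval_seconds)
    raise TimeoutError(f"no finished result after {max_attempts} attempts")
```

A call would look like `poll_for_text("http://localhost:3300/yt/swXWUfufu2w", is_ready=my_check)`, where `my_check` is a function you write after looking at what the service actually returns while a conversion is in progress.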

👉 Link to video: https://github.com/VasilisPlavos/YouTube-Speech-to-Text/raw/refs/heads/main/assets/example.mp4

File structure

.
├── Dockerfile
└── app
    ├── main.py
    ├── processors.py
    ├── test_processors.py
    ├── requirements.txt*
    └── requirements.long.txt*

  1. Dockerfile: Contains the required commands to assemble the image
  2. /app: This directory contains the Python application

*The files requirements.txt and requirements.long.txt are not used at the moment. They are stored here as a backup.
