This is a Gradio web app that lets you have conversations with any YouTube video, or any local video for that matter. Tech stack:
- Gradio
- Langchain
- Whisper
- ChromaDB
- OpenAI embeddings
- GPT-3.5-turbo
Every time a user provides a video, either via a local path or a YouTube link, and clicks the Transcribe button, the app creates a Langchain conversational chain with a Chroma vector store, OpenAI embeddings, and GPT-3.5.
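The app accepts either a local path or a YouTube link as its video input. A minimal sketch of how such input could be told apart, using only the standard library. `resolve_video_source` is a hypothetical helper, not a function from this repo, and it only recognizes the common `youtube.com/watch?v=` and `youtu.be/` URL forms.

```python
from urllib.parse import parse_qs, urlparse


def resolve_video_source(source: str) -> tuple[str, str]:
    """Classify user input as a YouTube link or a local file path.

    Hypothetical helper: returns ("youtube", video_id) or ("local", path).
    """
    source = source.strip()
    parsed = urlparse(source)
    host = (parsed.hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]

    if host == "youtu.be":
        # Short links carry the video ID in the path: https://youtu.be/<id>
        video_id = parsed.path.lstrip("/").split("/")[0]
        if video_id:
            return "youtube", video_id

    if host in {"youtube.com", "m.youtube.com"}:
        # Standard links carry the video ID in the "v" query parameter
        ids = parse_qs(parsed.query).get("v")
        if ids:
            return "youtube", ids[0]

    # Anything else is treated as a path on the local filesystem
    return "local", source
```

For example, `resolve_video_source("https://youtu.be/abc123")` yields `("youtube", "abc123")`, while `resolve_video_source("videos/talk.mp4")` yields `("local", "videos/talk.mp4")`.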
In the backend, we transcribe the video with Whisper. After transcription, we split the text into chunks that each cover 30 seconds of the video. These chunks are converted to embeddings using OpenAI embeddings and stored in Chroma.
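A minimal sketch of the 30-second chunking step, assuming the segment format produced by the `openai-whisper` package, where `whisper.load_model("base").transcribe(path)` returns a dict whose `"segments"` list holds dicts with `"start"`, `"end"`, and `"text"` keys. `chunk_segments` is a hypothetical name, and assigning each segment to the window in which it starts is an assumption; the app may split the transcript differently.

```python
from collections import defaultdict


def chunk_segments(segments, window=30.0):
    """Group Whisper-style segments into fixed windows of `window` seconds.

    Hypothetical helper. Each segment is a dict with "start" (seconds) and
    "text"; a segment is assigned to the window in which it starts.
    """
    buckets = defaultdict(list)
    for seg in segments:
        # Window index 0 covers [0, 30), index 1 covers [30, 60), and so on
        buckets[int(seg["start"] // window)].append(seg["text"].strip())

    chunks = []
    for index in sorted(buckets):
        chunks.append({
            "start": index * window,        # window start, in seconds
            "end": (index + 1) * window,    # window end, in seconds
            "text": " ".join(buckets[index]),
        })
    return chunks
```

Each resulting chunk's `text` would then be embedded and stored in Chroma; keeping `start` as metadata is one way to let answers point back to a timestamp in the video.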
Every time a user submits a query, the chain uses the embeddings to search for the text chunks with the highest semantic similarity to the query. The top 4 chunks are then fed to the chat model to generate a human-like response.
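Chroma performs the similarity search itself, but the idea can be illustrated with a plain-Python sketch that ranks chunk embeddings against the query embedding and keeps the top 4. It assumes cosine similarity (Chroma's configured distance metric may differ) and uses toy vectors in place of real OpenAI embeddings. `top_k_chunks` and the prompt wording in `build_prompt` are hypothetical, not the template the Langchain chain actually uses.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k_chunks(query_vec, chunk_vecs, chunks, k=4):
    """Return the k chunks whose embeddings are most similar to the query."""
    scored = [(cosine_similarity(query_vec, v), c) for v, c in zip(chunk_vecs, chunks)]
    # Highest similarity first; ties keep their original order
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [c for _, c in scored[:k]]


def build_prompt(question, context_chunks):
    """Stuff the retrieved chunks into a single prompt for the chat model."""
    context = "\n\n".join(context_chunks)
    return (
        "Answer the question using only these transcript excerpts:\n\n"
        f"{context}\n\nQuestion: {question}"
    )
```

With the query vector `[1, 0]` and chunk vectors `[1, 0]`, `[0, 1]`, `[1, 1]`, `[-1, 0]`, `[0.9, 0.1]`, the chunk pointing in the opposite direction is the one left out of the top 4.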
When you are done, reset the app and remove your API key.
Demo: Tab-Gradio.mp4
Hugging Face Space: https://sunilkumardash9-youtubegpt.hf.space