This repository contains a proof-of-concept application for summarizing videos using a combination of audio extraction, speech recognition, and text summarization.
This project aims to demonstrate the feasibility of automatically generating concise summaries of various types of videos. It leverages the power of OpenAI's Whisper for speech recognition and utilizes the Gemma 2 27B model for text summarization.
Before installing, make sure you have conda available on your system.
To install this repo and use the summarizer:

```shell
git clone [email protected]:lfenzo/video-summarization-poc.git
cd video-summarization-poc
bash -l setup.sh
```
To get the video summary, pass your YouTube URL to the `summarize.py` script, optionally providing a summary file name with `--output`/`-o`:

```shell
python summarize.py <URL> -o my_summary.txt
```
> **Tip:** Even though both transcription and summarization can run on a CPU, we recommend checking the VRAM requirements and tuning the model size (Whisper), the number of layers offloaded to the GPU, and the context length (Gemma) to make the most of your hardware.
To do so, use `--model-size` to adjust the Whisper model size, `--n-gpu-layers` to control the number of layers offloaded to the GPU during inference, and `--n-ctx` to change the context length:
```shell
python summarize.py <URL> -o my_summary.txt \
    --model-size base \
    --n-gpu-layers 10 \
    --n-ctx 2048
```
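To illustrate why the context length matters: a transcript longer than `--n-ctx` tokens cannot be fed to the model in one pass and must be summarized in pieces. The sketch below is a rough illustration only (it uses whitespace-separated words as a stand-in for tokens; `chunk_transcript` is a hypothetical helper, not part of this project's code):

```python
def chunk_transcript(text: str, max_tokens: int = 2048) -> list[str]:
    """Split a transcript into chunks that fit a model's context window.

    Word count is only a rough proxy for token count; real tokenizers
    produce more tokens per word, so leave headroom for the prompt itself.
    """
    words = text.split()
    return [
        " ".join(words[i:i + max_tokens])
        for i in range(0, len(words), max_tokens)
    ]

# A 5000-word transcript split with a 2048-token window
# yields chunks of 2048, 2048, and 904 words.
chunks = chunk_transcript(("word " * 5000).strip(), max_tokens=2048)
```

Each chunk would then be summarized separately, with the partial summaries combined in a final pass.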
For more information, run:

```shell
python summarize.py --help
```
The project focuses on three distinct categories of videos:
- Short Videos: Concise clips with a limited duration (e.g., social media snippets, trailers).
- Medium-Sized Videos: Videos with a moderate length, ranging from 10 to 25 minutes (e.g., educational content, online lectures).
- Long Videos: Extensive recordings, typically lasting between 1 and 2 hours (e.g., university lectures, conference presentations).
- Audio Extraction: `pytube` and `pydub`
- Speech Recognition: OpenAI Whisper
- Text Summarization: Gemma 2 27B
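At a high level, these three components form a linear pipeline: extract the audio, transcribe it, then summarize the transcript. The sketch below shows that flow only; the function names and signatures are hypothetical stand-ins, not this project's actual API:

```python
from pathlib import Path
from typing import Callable

def summarize_video(
    url: str,
    download_audio: Callable[[str], Path],  # pytube/pydub stage
    transcribe: Callable[[Path], str],      # Whisper stage
    summarize: Callable[[str], str],        # Gemma stage
) -> str:
    """Chain the three stages: audio extraction -> transcription -> summary."""
    audio_path = download_audio(url)
    transcript = transcribe(audio_path)
    return summarize(transcript)

# Wiring the pipeline with stand-in stages for illustration:
summary = summarize_video(
    "https://youtube.com/watch?v=example",
    download_audio=lambda url: Path("audio.mp4"),
    transcribe=lambda path: "transcribed text",
    summarize=lambda text: f"summary of: {text}",
)
```

Keeping each stage behind a plain function boundary like this makes it easy to swap models (e.g., a different Whisper size) without touching the rest of the pipeline.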
A computational cost analysis has been conducted to evaluate the VRAM requirements, execution times, and performance of the Whisper model on both GPU and CPU. For detailed findings and methodology, please refer to the Transcription README.
This is a proof-of-concept project, and there are several opportunities for expansion:
- Video Segmentation: Implementing techniques to segment videos into meaningful units for more accurate summarization.
- Summary Customization: Providing options for users to control the length and style of the generated summaries.
- User Interface: Developing a user-friendly interface for interacting with the system.
Feel free to explore the code, experiment with the summaries, and contribute to the project's development!