A Node.js application that checks YouTube videos for legal compliance: it transcribes each video's audio and compares the transcript against legal rules extracted from a regulatory article.
- YouTube video audio extraction and processing
- Speech-to-text transcription using Deepgram API
- Legal rules extraction from regulatory articles
- Automated compliance analysis using GPT-4
- Multi-language support (optimized for Czech)
- Token cost tracking and optimization
- Runtime: Node.js
- Language: TypeScript
- APIs:
  - OpenAI GPT-4
  - Deepgram Speech-to-Text
  - YouTube Data API
Video Processing Pipeline:
- Downloads YouTube videos as audio files
- Supports chunked processing for large files
- Handles multi-speaker transcription
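As an illustration of chunked processing, here is a minimal sketch that splits downloaded audio bytes into fixed-size pieces. The function name `splitIntoChunks` and the byte-based strategy are assumptions, not taken from this project; a real pipeline would more likely split on time boundaries with an audio tool, since cutting compressed audio at arbitrary byte offsets can produce chunks that do not decode on their own.

```typescript
// Hypothetical helper (not part of this project): split raw audio bytes
// into consecutive chunks of at most `chunkSize` bytes each.
function splitIntoChunks(data: Uint8Array, chunkSize: number): Uint8Array[] {
  if (chunkSize <= 0 || Math.floor(chunkSize) !== chunkSize) {
    throw new Error("chunkSize must be a positive integer");
  }
  const chunks: Uint8Array[] = [];
  for (let offset = 0; offset < data.length; offset += chunkSize) {
    // subarray clamps the end index, so the last chunk may be shorter.
    chunks.push(data.subarray(offset, offset + chunkSize));
  }
  return chunks;
}
```

Each chunk can then be transcribed separately and the results concatenated in order.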
Transcription Engine:
- Uses Deepgram's Nova-2 model
- Provides paragraph segmentation
- Speaker diarization
- Punctuation and formatting
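The options above map onto query parameters of Deepgram's pre-recorded transcription endpoint. The sketch below only builds the request URL; the helper name `buildDeepgramUrl` is hypothetical, and the project itself may use the Deepgram SDK rather than raw HTTP.

```typescript
// Hypothetical helper: build a Deepgram /v1/listen URL that requests the
// Nova-2 model with punctuation, paragraph segmentation, and diarization.
function buildDeepgramUrl(language: string): string {
  const params: [string, string][] = [
    ["model", "nova-2"],
    ["language", language], // e.g. "cs" for Czech
    ["punctuate", "true"],
    ["paragraphs", "true"],
    ["diarize", "true"],
  ];
  const query = params
    .map(([key, value]) => `${encodeURIComponent(key)}=${encodeURIComponent(value)}`)
    .join("&");
  return `https://api.deepgram.com/v1/listen?${query}`;
}
```

The audio would be sent as the body of a POST request to this URL, with the header `Authorization: Token <DEEPGRAM_API_KEY>`.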
Legal Analysis System:
- Extracts rules from regulatory documents
- Performs compliance checking
- Generates detailed violation reports
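One way the compliance check could be framed for GPT-4 is sketched below. The prompt wording, the `buildComplianceRequest` name, and the request shape are illustrative assumptions; the project's actual prompts are not shown in this README.

```typescript
interface ChatMessage {
  role: "system" | "user";
  content: string;
}

// Hypothetical helper: assemble a Chat Completions request body that asks
// the model to check a transcript against a numbered list of rules.
function buildComplianceRequest(
  rules: string[],
  transcript: string
): { model: string; messages: ChatMessage[] } {
  const numberedRules = rules.map((rule, i) => `${i + 1}. ${rule}`).join("\n");
  return {
    model: "gpt-4",
    messages: [
      {
        role: "system",
        content:
          "You check video transcripts against legal rules. For each rule, " +
          "state whether the transcript violates it and quote the relevant passage.",
      },
      {
        role: "user",
        content: `Rules:\n${numberedRules}\n\nTranscript:\n${transcript}`,
      },
    ],
  };
}
```

The returned object can be serialized as the JSON body of a POST to `https://api.openai.com/v1/chat/completions`, authenticated with `OPENAI_API_KEY`.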
Required environment variables:
OPENAI_API_KEY=your_openai_key
DEEPGRAM_API_KEY=your_deepgram_key
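Failing fast when a key is missing avoids confusing API errors later in a run. The following is a minimal sketch of such a check; `findMissingEnvVars` is a hypothetical helper, not something this project is known to provide.

```typescript
// The two variables this README lists as required.
const REQUIRED_ENV_VARS = ["OPENAI_API_KEY", "DEEPGRAM_API_KEY"];

// Hypothetical helper: return the names of required variables that are
// unset or blank in the given environment object.
function findMissingEnvVars(env: { [name: string]: string | undefined }): string[] {
  return REQUIRED_ENV_VARS.filter((name) => {
    const value = env[name];
    return value === undefined || value.trim() === "";
  });
}
```

At startup this could be called as `findMissingEnvVars(process.env)`, aborting with a clear message if the result is non-empty.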
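// Videos to analyze.
const videoUrls = [
  "https://www.youtube.com/watch?v=example1",
  "https://www.youtube.com/watch?v=example2",
];
// Regulatory article from which the legal rules are extracted.
const articleUrl = "https://regulatory-article-url";

// `main` is the project's entry function; import it from wherever the project exports it.
await main(videoUrls, articleUrl);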
- Fork the repository
- Create your feature branch (git checkout -b feature/AmazingFeature)
- Commit your changes (git commit -m 'Add some AmazingFeature')
- Push to the branch (git push origin feature/AmazingFeature)
- Open a Pull Request
This project is licensed under the terms given in the LICENSE file.