Hi, I built this project to solve a problem I regularly faced. I am a co-author of videos that were not uploaded to my own channel, so when those videos received comments that I could have answered, I had no way of doing so. Hence, I created this end-to-end real-time data pipeline with Kafka and Python.
This project is a Python program that monitors the YouTube videos in a playlist for updates (likes, views, and comments), streams the data into Kafka (Confluent Cloud), and processes it to generate alerts.
The pipeline tracks changes such as new comments, replies, views, and likes, and sends alerts via Telegram.
- Monitors a playlist of YouTube videos for:
  - New comments.
  - Changes in views, likes, and other statistics.
- Uses Kafka to stream video and comment data.
- Processes changes in real-time with ksqlDB.
- Sends alerts for specific changes (e.g., a comment mentioning you) to a Telegram bot.
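As an illustration of the playlist-monitoring step, here is a minimal sketch of how the video IDs in a playlist could be listed with the YouTube Data API v3 `playlistItems` endpoint, following `nextPageToken` until the last page. The function names (`fetch_json`, `iter_playlist_video_ids`) and the injectable `fetch` parameter are assumptions made for this example and are not necessarily how `youtube-watcher.py` is written.

```python
import json
import urllib.parse
import urllib.request

PLAYLIST_ITEMS_URL = "https://www.googleapis.com/youtube/v3/playlistItems"


def fetch_json(url, params):
    """Perform a GET request and decode the JSON response."""
    query = urllib.parse.urlencode(params)
    with urllib.request.urlopen(f"{url}?{query}") as response:
        return json.load(response)


def iter_playlist_video_ids(api_key, playlist_id, fetch=fetch_json):
    """Yield the ID of every video in a playlist, one page at a time.

    `fetch` is a hypothetical hook so the paging logic can be exercised
    without network access; by default it calls the real API.
    """
    page_token = None
    while True:
        params = {
            "key": api_key,
            "playlistId": playlist_id,
            "part": "contentDetails",
            "maxResults": 50,
        }
        if page_token:
            params["pageToken"] = page_token
        payload = fetch(PLAYLIST_ITEMS_URL, params)
        for item in payload.get("items", []):
            yield item["contentDetails"]["videoId"]
        # The API omits nextPageToken on the last page.
        page_token = payload.get("nextPageToken")
        if not page_token:
            break
```

Because the generator re-reads the playlist on every run, a video added to the playlist would be picked up on the next polling cycle without any code change.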
- Python 3.8+
- A YouTube Data API Key.
- A Kafka cluster (e.g., Confluent Cloud).
- A Telegram account and bot token.
- Clone the Repository
git clone https://github.com/yourusername/youtube-alert-system.git
cd youtube-alert-system
- Set Up Virtual Environment
python -m venv env
source env/bin/activate
- Install Dependencies
pip install -r requirements.txt
- Create CONFIG file
config = {
    "google_api_key": "...",
    "youtube_playlist_id": "...",
    "topic": "youtube_videos",
    "kafka": {
        "bootstrap.servers": "...",
        "security.protocol": "SASL_SSL",
        "sasl.mechanism": "PLAIN",
        "sasl.username": "...",
        "sasl.password": "...",
    },
    "schema_registry": {
        "url": "...",
        "basic.auth.user.info": "<username>:<password>",
    },
    "OPENAI_API_KEY": "...",
}
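Before running the script, it can help to confirm that no placeholder value is left in the config. The helper below is a small sketch of such a check. `find_placeholders` and `PLACEHOLDERS` are hypothetical names introduced for this example and are not part of the project.

```python
# Values used as placeholders in the config template above.
PLACEHOLDERS = ("...", "<username>:<password>")


def find_placeholders(settings, prefix=""):
    """Return the paths of all settings that still hold a placeholder value."""
    missing = []
    for key, value in settings.items():
        path = f"{prefix}{key}"
        if isinstance(value, dict):
            # Recurse into nested sections such as "kafka" or "schema_registry".
            missing.extend(find_placeholders(value, prefix=f"{path} -> "))
        elif value in PLACEHOLDERS:
            missing.append(path)
    return missing


example = {
    "google_api_key": "...",
    "topic": "youtube_videos",
    "kafka": {"bootstrap.servers": "...", "sasl.mechanism": "PLAIN"},
}
print(find_placeholders(example))
# prints ['google_api_key', 'kafka -> bootstrap.servers']
```

Nested keys are joined with ` -> ` rather than a dot because the Kafka setting names themselves contain dots.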
- Run the Script
./youtube-watcher.py
- Modify Playlist: Add videos to the playlist to automatically start monitoring them.
- Telegram Alerts: Receive real-time updates when monitored statistics change.
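To make the alerting step concrete, here is a plain-Python sketch of comparing two statistics snapshots and preparing a Telegram message. In this project the change detection is done by ksqlDB, so this only illustrates the idea and is not the project's implementation. The field names (`views`, `likes`, `comments`) and both function names are assumptions. The request targets the Telegram Bot API `sendMessage` method.

```python
import urllib.parse

# Statistics compared between snapshots (assumed field names).
TRACKED_FIELDS = ("views", "likes", "comments")


def describe_changes(previous, current):
    """Return one human-readable line per tracked field whose value changed."""
    changes = []
    for field in TRACKED_FIELDS:
        old, new = previous.get(field), current.get(field)
        # Skip fields missing from either snapshot.
        if old is not None and new is not None and old != new:
            changes.append(f"{field}: {old} -> {new}")
    return changes


def build_telegram_request(bot_token, chat_id, text):
    """Return the URL and form-encoded body for a Bot API sendMessage call."""
    url = f"https://api.telegram.org/bot{bot_token}/sendMessage"
    body = urllib.parse.urlencode({"chat_id": chat_id, "text": text}).encode()
    return url, body


previous = {"title": "Demo", "views": 100, "likes": 7, "comments": 2}
current = {"title": "Demo", "views": 120, "likes": 7, "comments": 3}

changes = describe_changes(previous, current)
if changes:
    message = f"{current['title']}: " + ", ".join(changes)
    url, body = build_telegram_request("<bot-token>", "<chat-id>", message)
    # urllib.request.urlopen(url, data=body) would send the alert.
```

Building the request separately from sending it keeps the comparison logic easy to check without a real bot token.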
- Python: Core scripting language.
- YouTube Data API: To fetch video and comment data.
- Kafka: For streaming and processing events.
- ksqlDB: For detecting changes in streaming data.
- S3: For storing historical data and new comments.
- SNS: For alerting the user and triggering the Lambda function.
- Lambda: For processing comments and parsing their content.
- Telegram: For alert notifications.
MIT License