YouTube Comment Sentiment

An end-to-end project to predict the sentiment of YouTube video comments using Machine Learning.

Overview

This project builds a sentiment analysis system for YouTube comments, with a FastAPI-based inference endpoint and additional API endpoints that provide insights into comment sentiment. Development included systematic experimentation, experiment tracking with MLflow, and pipeline reproduction with DVC.

Key Features

Inference Endpoint: Built with the FastAPI framework to classify the sentiment of comments.
Insights Endpoints: Additional APIs that provide analytics on comment sentiment.
Experiment Tracking: Used MLflow to track experiments.
Pipeline Reproduction: Utilized DVC (Data Version Control) for reproducibility.
Text Vectorization: Used TfidfVectorizer for transforming text data into feature vectors.
Model Selection: Experimented with various models and selected HistGradientBoostingClassifier as the best-performing classifier.

Experimentation

The experimentation phase focused on tuning hyperparameters for both the TfidfVectorizer and the HistGradientBoostingClassifier. The screenshot below shows how different hyperparameter combinations affected accuracy:

[Screenshot: accuracy across hyperparameter combinations]
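
As a rough illustration of the kind of search described above, the sketch below enumerates every combination in a small hyperparameter grid using only the standard library. The parameter names and values are hypothetical and are not taken from the project's params.yaml. In a setup like this one, each combination would typically be trained, evaluated, and logged as a separate tracked run.

```python
from itertools import product

# Hypothetical search space; keys follow scikit-learn's "<step>__<param>" convention.
param_grid = {
    "tfidf__max_features": [1000, 5000],
    "tfidf__ngram_range": [(1, 1), (1, 2)],
    "clf__learning_rate": [0.05, 0.1],
}


def iter_param_combinations(grid):
    """Yield one dict per combination of values in the grid."""
    keys = list(grid)
    for values in product(*(grid[key] for key in keys)):
        yield dict(zip(keys, values))


combinations = list(iter_param_combinations(param_grid))

for params in combinations:
    # Placeholder for: fit the model with `params`, compute accuracy,
    # and record both in the experiment tracker.
    pass
```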

Tech Stack

Tech	Stack
Programming Language
Data Handling
Frameworks and Tools
Machine Learning Models
Project Dev Tools

More to do!

Merge the classifier and the vectorizer into a single model, which reduces the complexity of loading them via MLFLOW_RUN_ID in app.py.
After completing the previous step, load the model using the MLFLOW_MODEL_URI environment variable instead of MLFLOW_RUN_ID.
⚠️ Try to use an MLproject file to run the ML pipeline steps instead of the dvc.yaml file (only if possible).
- Also investigate how DVC is used here and understand the WHY, WHAT and HOW (part of it).
Clearly separate the source code of the ML pipeline from that of the backend, and understand how the two are involved with each other.
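
The switch from MLFLOW_RUN_ID to MLFLOW_MODEL_URI could look roughly like the sketch below. It is an assumption about how app.py might resolve the model location, not the project's current code: the helper names and the "model" artifact path are hypothetical. The `runs:/<run_id>/<artifact_path>` form is MLflow's standard URI scheme for run artifacts, and `mlflow.pyfunc.load_model` accepts such a URI.

```python
import os


def resolve_model_uri(env=None, artifact_path="model"):
    """Return the model URI, preferring MLFLOW_MODEL_URI over MLFLOW_RUN_ID.

    `artifact_path` is a hypothetical default; use whatever path the model
    was logged under.
    """
    env = os.environ if env is None else env
    model_uri = env.get("MLFLOW_MODEL_URI")
    if model_uri:
        return model_uri
    run_id = env.get("MLFLOW_RUN_ID")
    if run_id:
        # Fall back to the older run-based lookup.
        return f"runs:/{run_id}/{artifact_path}"
    raise RuntimeError("Set MLFLOW_MODEL_URI or MLFLOW_RUN_ID")


def load_model(env=None):
    """Load the model through MLflow's generic pyfunc interface."""
    import mlflow.pyfunc  # imported lazily so URI resolution works without MLflow

    return mlflow.pyfunc.load_model(resolve_model_uri(env))
```

With a single merged model, one URI is enough, so the backend no longer needs to know which run produced the vectorizer and which produced the classifier.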

Important

Feel free to explore and contribute!

arv-anshul/yt-comment-sentiment
