Sentiment Analysis Web Application using Fine-Tuned BERT and LoRA

Final project for the LSML2 course at HSE. Roman Andreev, MDS'23.

This project is a Sentiment Analysis web application built with a fine-tuned BERT model using LoRA (Low-Rank Adaptation). It allows users to analyze the sentiment (positive or negative) of movie reviews. The application includes a backend API (FastAPI) for predictions and a simple HTML frontend with Bootstrap for user interaction.

Features

  1. Machine Learning Model:

    • A fine-tuned BERT model (bert-base-uncased) optimized with LoRA for efficient training.
    • Achieves 87.4% accuracy on the IMDB dataset.
    • Supports single and batch predictions.
  2. Backend:

    • Implemented with FastAPI for serving predictions.
    • Supports two endpoints:
      • /predict: Analyze a single review.
      • /predict_batch: Analyze multiple reviews in one request.
    • CORS enabled for frontend communication.
  3. Frontend:

    • Simple and clean user interface created using HTML and Bootstrap.
    • Features:
      • Dynamic input fields for multiple reviews.
      • A loading spinner indicating processing status.
      • Real-time display of prediction results.
  4. Deployment:

    • Dockerized application for easy deployment.
    • Managed using docker-compose to run both backend and frontend as separate services.

Project Structure

project/
 ├── app.py                # FastAPI backend
 ├── saved_model/          # Directory containing the fine-tuned BERT model and tokenizer
 ├── frontend/             # Frontend HTML files
 │   └── index.html
 ├── requirements.txt      # Python dependencies
 ├── Dockerfile            # Dockerfile for building the backend API
 ├── docker-compose.yml    # Compose file for managing frontend and backend services
 ├── logs/                 # Directory for storing training and API logs
 ├── results/              # Directory for saving initial training results and checkpoints
 ├── results_improved/     # Directory for saving improved training results (e.g., with hyperparameter tuning)
 └── wandb/                # Directory automatically created by Weights & Biases for experiment tracking

Technologies Used

Machine Learning

  • BERT (bert-base-uncased) fine-tuned on the IMDB dataset.
  • LoRA (Low-Rank Adaptation) for efficient parameter fine-tuning.
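To illustrate why LoRA makes fine-tuning efficient, the NumPy sketch below shows the low-rank update for a single attention projection matrix: the pretrained weight stays frozen, and only two small factors are trained. The rank `r` and scaling `alpha` are illustrative values, since the project's actual hyperparameters are not stated here.

```python
import numpy as np

# Shapes match one attention projection in bert-base-uncased (hidden size 768);
# r and alpha are illustrative LoRA hyperparameters.
d, k, r = 768, 768, 8
alpha = 16
rng = np.random.default_rng(0)

W = rng.standard_normal((d, k))         # frozen pretrained weight, never updated
A = rng.standard_normal((r, k)) * 0.01  # trainable factor, small random init
B = np.zeros((d, r))                    # trainable factor, zero init -> update starts at 0

delta = (alpha / r) * (B @ A)           # rank-<=r update learned during fine-tuning
W_eff = W + delta                       # effective weight used in the forward pass

trainable = A.size + B.size             # parameters LoRA trains for this matrix
full = W.size                           # parameters full fine-tuning would train
print(f"trainable fraction: {trainable / full:.3%}")  # ~2% of the full matrix
```

Because only `A` and `B` receive gradients, the optimizer state and gradient memory shrink in the same proportion, which is what makes the fine-tuning cheap.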

Backend

  • FastAPI for high-performance API development.
  • Transformers library (HuggingFace) for model inference.
  • PyTorch as the deep learning framework.
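A sketch of what the inference path might look like with the Transformers library: the directory name comes from the project tree above, but the label order is an assumption and in practice should be read from `model.config.id2label`.

```python
from functools import lru_cache

import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

MODEL_DIR = "saved_model"  # directory from the project tree above
LABELS = {0: "negative", 1: "positive"}  # assumed order; check model.config.id2label

@lru_cache(maxsize=1)
def load():
    # Load the fine-tuned model and tokenizer once and cache them
    tokenizer = AutoTokenizer.from_pretrained(MODEL_DIR)
    model = AutoModelForSequenceClassification.from_pretrained(MODEL_DIR)
    model.eval()
    return tokenizer, model

def classify(texts: list[str]) -> list[str]:
    tokenizer, model = load()
    enc = tokenizer(texts, truncation=True, padding=True, return_tensors="pt")
    with torch.no_grad():
        logits = model(**enc).logits
    return [LABELS[i] for i in logits.argmax(dim=-1).tolist()]
```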

Frontend

  • HTML, CSS, Bootstrap for a responsive and clean user interface.
  • JavaScript for dynamic functionality (API calls, loading spinner).

Deployment

  • Docker for containerizing backend and frontend services.
  • Docker Compose for orchestrating multiple services.
  • Nginx for serving static frontend files.

Experiment Tracking and Logging

  • Weights & Biases (W&B) for experiment tracking, including metrics and hyperparameters.
  • Logs and training results are stored in dedicated directories:
    • logs/
    • results/
    • results_improved/

Setup Instructions

Prerequisites

Ensure the following software is installed on your machine:

  • Docker
  • Docker Compose

Steps to Run the Project

  1. Clone the Repository
    First, clone this repository to your local machine:

    git clone https://github.com/Andreevromano/HSE_LSML2_FP.git
    cd HSE_LSML2_FP
  2. Build and Run the Project
    Use docker-compose to build the Docker images and start the backend (API) and frontend services:

    docker-compose up --build
  3. Access the Application

    • Frontend: Open a browser and navigate to http://localhost:8080
    • Backend API: The API is available at http://localhost:8000
  4. Test the API
    You can test the API endpoints using tools like curl, Postman, or Python's requests library.

    • Single Prediction Endpoint (/predict): Send a POST request with a single review:
    curl -X POST http://localhost:8000/predict \
    -H "Content-Type: application/json" \
    -d '{"text":"This movie is fantastic!"}'

    The response is a JSON object containing the predicted sentiment.
    • Batch Prediction Endpoint (/predict_batch): Send a POST request with multiple reviews:
    curl -X POST http://localhost:8000/predict_batch \
    -H "Content-Type: application/json" \
    -d '{"texts": ["This movie was amazing!", "Bad actors and movie!"]}'

    The response is a JSON object containing one predicted sentiment per review.
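The same requests can be issued from Python with the requests library mentioned above; the helper names below are illustrative, and the sketch assumes the services from docker-compose are running locally.

```python
import requests

API = "http://localhost:8000"  # backend address from docker-compose

def predict(text: str) -> dict:
    """POST one review to /predict and return the parsed JSON response."""
    resp = requests.post(f"{API}/predict", json={"text": text}, timeout=10)
    resp.raise_for_status()
    return resp.json()

def predict_batch(texts: list[str]) -> dict:
    """POST several reviews to /predict_batch in a single request."""
    resp = requests.post(f"{API}/predict_batch", json={"texts": texts}, timeout=10)
    resp.raise_for_status()
    return resp.json()

if __name__ == "__main__":
    print(predict("This movie is fantastic!"))
    print(predict_batch(["This movie was amazing!", "Bad actors and movie!"]))
```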

Results

Result in Notebook

Result in Frontend

Conclusion

This Sentiment Analysis Web Application successfully combines state-of-the-art machine learning with practical deployment and user interaction capabilities. By leveraging a fine-tuned BERT model enhanced with LoRA (Low-Rank Adaptation), the project achieves 87.4% accuracy on the IMDB dataset.

The application includes:

  • A FastAPI backend for real-time predictions.
  • A user-friendly frontend built with HTML and Bootstrap for dynamic interaction.
  • Deployment using Docker and Docker Compose, ensuring portability and ease of scaling.

The integration of experiment tracking through Weights & Biases (W&B) ensures transparency and reproducibility in model training and improvement.

Key Takeaways:

  1. Efficient Model Fine-Tuning: LoRA reduces computational cost while maintaining high accuracy.
  2. End-to-End Deployment: From model training to a functional API and frontend interface.
  3. User Interaction: Dynamic UI with support for batch predictions and visual feedback.

This project serves as a foundation for real-world NLP applications and can be extended further to include:

  • Neutral sentiment classification.
  • More advanced analytics and confidence scores.
  • Integration with cloud providers for production-grade deployment.

With its modular design, the application can be scaled, extended, and adapted to suit various sentiment analysis use cases in domains such as customer feedback, social media monitoring, and business intelligence.
