
Image Captioning


Table of Contents

- Description
- Results
- Demo
- Installation
- Training

Description

In this project, I develop, train, and evaluate models for image captioning, inspired by BLIP's approach. The goal is to create a system that generates descriptive and accurate captions for images. Additionally, I build a demo web app (see the Demo section below) to showcase these models in action, providing an interactive platform for users to experience AI-driven image captioning firsthand.

Results

The Flickr30k dataset is divided into training and testing sets with a 70/30 split.

| Model | Test WER | Test BLEU@4 | Train WER | Train BLEU@4 | Config | Checkpoint | Report | Paper |
|-----------|----------|-------------|-----------|--------------|--------|-------------|--------|-------|
| BLIP Base | 59.15 | 14.11 | 55.61 | 16.11 | Config | HuggingFace | Wandb | Arxiv |
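For reference, the WER and BLEU@4 columns can be computed roughly as follows. This is a minimal sketch assuming the jiwer and nltk packages; the repo's own evaluation code may use different implementations (e.g. torchmetrics), and the captions below are made-up examples.

# Minimal sketch of the caption metrics reported in the table above.
# Assumes jiwer and nltk; the repo's evaluation code may differ.
from jiwer import wer
from nltk.translate.bleu_score import corpus_bleu

references = ["a dog runs across a grassy field"]   # ground-truth captions
hypotheses = ["a dog is running on the grass"]      # model-generated captions

# Word Error Rate: word-level edit distance between hypothesis and reference.
test_wer = wer(references, hypotheses) * 100

# BLEU@4: n-gram precision up to 4-grams with equal weights.
# corpus_bleu expects a list of reference lists (multiple references per sample).
test_bleu4 = corpus_bleu(
    [[ref.split()] for ref in references],
    [hyp.split() for hyp in hypotheses],
    weights=(0.25, 0.25, 0.25, 0.25),
) * 100

print(f"WER: {test_wer:.2f}  BLEU@4: {test_bleu4:.2f}")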

Demo

You can use this notebook (Colab) or this demo on HuggingFace for inference. You can also run the Streamlit demo offline by running this command from the root directory.

streamlit run src/app.py
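If you prefer to call the model directly from Python, a minimal inference sketch with the HuggingFace transformers API looks like the following. The checkpoint name here is the public Salesforce BLIP base model used as a stand-in; substitute this repo's own checkpoint from the table above.

# Minimal BLIP captioning sketch using the transformers API.
# The checkpoint name is a stand-in; replace it with this repo's checkpoint.
from PIL import Image
from transformers import BlipProcessor, BlipForConditionalGeneration

checkpoint = "Salesforce/blip-image-captioning-base"  # stand-in checkpoint
processor = BlipProcessor.from_pretrained(checkpoint)
model = BlipForConditionalGeneration.from_pretrained(checkpoint)

image = Image.open("example.jpg").convert("RGB")
inputs = processor(images=image, return_tensors="pt")

# Generate a caption and decode it back to text.
output_ids = model.generate(**inputs, max_new_tokens=50)
caption = processor.decode(output_ids[0], skip_special_tokens=True)
print(caption)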

Installation

Pip

# clone project
git clone https://github.com/tanthinhdt/imcap
cd imcap

# [OPTIONAL] create conda environment
conda create -n imcap python=3.11.10
conda activate imcap

# install pytorch according to instructions
# https://pytorch.org/get-started/

# install requirements
pip install -r requirements.txt

Conda

# clone project
git clone https://github.com/tanthinhdt/imcap
cd imcap

# create conda environment and install dependencies
conda env create -f environment.yaml -n imcap

# activate conda environment
conda activate imcap
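After either install path, a quick sanity check that PyTorch is usable (and whether a GPU is visible) can be run from Python:

# Quick sanity check: PyTorch imports and CUDA visibility.
import torch

print(torch.__version__)
print("CUDA available:", torch.cuda.is_available())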

Training

Train the model with the default configuration

# train on CPU
python src/train.py trainer=cpu

# train on GPU
python src/train.py trainer=gpu

Train the model with a chosen experiment configuration from configs/experiment/

python src/train.py experiment=experiment_name.yaml

You can override any parameter from the command line like this

python src/train.py trainer.max_epochs=20 data.batch_size=64
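For context, a Hydra-based training entry point such as src/train.py is typically structured along these lines. This is only an illustrative sketch; the config path, config name, and config keys (data, model, trainer) are assumptions, not the actual script in this repo.

# Illustrative sketch of a Hydra training entry point; the real src/train.py may differ.
import hydra
from hydra.utils import instantiate
from omegaconf import DictConfig


@hydra.main(version_base="1.3", config_path="../configs", config_name="train.yaml")
def main(cfg: DictConfig) -> None:
    # Command-line overrides such as trainer.max_epochs=20 or data.batch_size=64
    # are merged into cfg before this function runs.
    datamodule = instantiate(cfg.data)   # assumed config key
    model = instantiate(cfg.model)       # assumed config key
    trainer = instantiate(cfg.trainer)   # assumed config key
    trainer.fit(model=model, datamodule=datamodule)


if __name__ == "__main__":
    main()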