Skip to content

Latest commit

 

History

History
27 lines (19 loc) · 997 Bytes

File metadata and controls

27 lines (19 loc) · 997 Bytes

Image-Captioning-using-Attention-Mechanism

CourseWork Project, Mathematics for Machine Learning

Teamates

  1. Adya Bhat
  2. Aditya G H

It uses a encoder-decoder architecture to caption Images. I used a pretrained ResNet model as the encoder to extract image features, combined with an LSTM to generate captions using the extracted features.

Model Architecture

image

Dataset

Flickr 8K - It consists of a folder Images with images (.jpg format), and a a text file captions.txt. Each image has five captions each. containing the captions corresponding to the image file names.

The format of the text file is:

<image-filename>,<caption>\n

Results

Result