- Project Overview
- Output
- Data
- Algorithms
- Metrics
- Machine Learning Nanodegree Graduation
- Built With
- Contributing
- License
- Get Help
- Contact
In this project, I have developed a web application that automatically generates captions for a given image using deep learning. The deep learning model is developed, trained, and deployed (as a REST API) on AWS using SageMaker, Lambda, S3, and API Gateway. I have used Streamlit to build the frontend of the web app, which is deployed on Heroku.
The dataset used for this project is the “Flickr8K” dataset, which consists of images obtained from the Flickr website.
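As a rough illustration of how a frontend could talk to such a REST API, here is a minimal client sketch. The endpoint URL, the base64 `image` field, and the `caption` response field are all assumptions made for this example; the actual request and response format of the deployed API is not described in this README.

```python
import base64
import json
import urllib.request

# Hypothetical endpoint: the real API Gateway URL is not given in this README.
API_URL = "https://example.invalid/caption"


def build_request(image_bytes, url=API_URL):
    """Wrap raw image bytes in a JSON POST request (assumed payload schema)."""
    payload = {"image": base64.b64encode(image_bytes).decode("ascii")}
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def parse_caption(response_body):
    """Extract the caption from an assumed {"caption": "..."} JSON response."""
    return json.loads(response_body.decode("utf-8"))["caption"]


def request_caption(image_bytes, url=API_URL, timeout=30.0):
    """Send the image to the API and return the generated caption."""
    request = build_request(image_bytes, url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return parse_caption(response.read())
```

In a Streamlit app, `request_caption` would typically be called with the bytes of the uploaded file and the result shown next to the image.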
- It is a labeled dataset.
- The dataset consists of 8000 photos.
- There are 5 captions for each photo.
- The dataset is small: about 1.14 GB.
- The dataset is available on Kaggle: https://www.kaggle.com/shadabhussain/flickr8k
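To show what "5 captions for each photo" looks like in practice, here is a small sketch that groups captions by image. It assumes each line of the caption file has the form `<image>.jpg#<index><TAB><caption>`, which is a common layout for Flickr8K caption files; the Kaggle copy may differ, so adjust the parsing if needed. The sample lines are invented for the example.

```python
from collections import defaultdict


def load_captions(lines):
    """Group captions by image file name.

    Assumes lines shaped like "<image>.jpg#<index><TAB><caption>".
    """
    captions = defaultdict(list)
    for line in lines:
        image_ref, _, caption = line.strip().partition("\t")
        if not caption:
            continue  # skip blank or malformed lines
        image_name = image_ref.split("#")[0]
        captions[image_name].append(caption.strip().lower())
    return dict(captions)


# Invented sample lines, only to demonstrate the expected shape.
sample = [
    "img1.jpg#0\tA dog runs .",
    "img1.jpg#1\tA brown dog is running .",
    "",
    "img2.jpg#0\tTwo kids play .",
]
captions = load_captions(sample)
```

With the full dataset, each image name would map to a list of five captions.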
I have used a combination of a CNN (Convolutional Neural Network) and an RNN (Recurrent Neural Network) to develop this system: the model has a CNN encoder and an LSTM decoder. The reference paper for the development of the model is "Show and Tell: A Neural Image Caption Generator". The architecture is as shown in the image below:
First, I extract the features of an image using the CNN, and then feed this feature vector to an LSTM language model that generates the caption. (An LSTM is a special kind of RNN, capable of learning long-term dependencies.)
I have also used models pre-trained on the standard ImageNet dataset (provided in Keras) to develop the CNN encoder, and GloVe 200d word embeddings to improve the performance of the language model.
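The caption generation step can be sketched as a loop that repeatedly asks the language model for the next word. The sketch below uses greedy decoding, which is one common choice; this README does not state which decoding strategy the project uses. The `startseq`/`endseq` tokens, the `predict_next` callable, and the toy predictor are hypothetical stand-ins for the trained CNN + LSTM model.

```python
def greedy_caption(image_features, predict_next, max_length=20,
                   start_token="startseq", end_token="endseq"):
    """Build a caption one word at a time, always taking the predicted next word."""
    words = [start_token]
    for _ in range(max_length):
        next_word = predict_next(image_features, words)
        if next_word == end_token:
            break
        words.append(next_word)
    return " ".join(words[1:])  # drop the start token


def toy_predict_next(image_features, words_so_far):
    """Stand-in for the trained model: replays a fixed caption, ignoring the image."""
    script = ["a", "dog", "runs", "endseq"]
    return script[min(len(words_so_far) - 1, len(script) - 1)]


caption = greedy_caption([0.1, 0.2], toy_predict_next)
```

In a real pipeline, `predict_next` would run the LSTM decoder on the image feature vector plus the words generated so far and return the most probable next word.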
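As an illustration of how pre-trained word vectors can be turned into an embedding matrix for the decoder, here is a minimal sketch in plain Python. It assumes the usual GloVe text format (`<word> <v1> <v2> ...` per line) and a 1-based `word_index` mapping with row 0 reserved for padding; both the helper names and the toy 3-dimensional vectors are made up for the example (the project itself uses 200 dimensions).

```python
def load_glove(lines):
    """Parse GloVe text lines of the form "<word> <v1> <v2> ..." into a dict."""
    vectors = {}
    for line in lines:
        parts = line.rstrip().split(" ")
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        vectors[parts[0]] = [float(x) for x in parts[1:]]
    return vectors


def build_embedding_matrix(word_index, vectors, dim):
    """Row i holds the vector of the word with index i; unknown words stay all-zero."""
    matrix = [[0.0] * dim for _ in range(len(word_index) + 1)]  # row 0: padding
    for word, idx in word_index.items():
        vec = vectors.get(word)
        if vec is not None and len(vec) == dim:
            matrix[idx] = vec
    return matrix


# Toy data standing in for the GloVe file and the caption vocabulary.
glove_lines = ["dog 0.1 0.2 0.3", "runs 0.4 0.5 0.6"]
word_index = {"dog": 1, "zebra": 2}
matrix = build_embedding_matrix(word_index, load_glove(glove_lines), dim=3)
```

Such a matrix is typically converted to an array and used as the initial (often frozen) weights of the embedding layer in the language model.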
There are various metrics for measuring the performance of an image captioning model, such as BLEU, ROUGE, CIDEr, METEOR, and SPICE. Of these, BLEU (Bilingual Evaluation Understudy) is the most common and widely used for evaluating image annotation results.
For this project, I have used the BLEU score to check and compare the performance of the model. BLEU measures how closely a generated caption matches the reference captions, based on the n-grams they share. It tends to give a higher score when the length of the caption is close to that of the reference sentence.
I learned advanced machine learning techniques and algorithms, and how to package and deploy trained models to a production environment. I gained practical experience using Amazon SageMaker to deploy trained models to a web application and to evaluate their performance. I also learned how to A/B test models and how to update them as more data is gathered, an important skill in industry.
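To make the BLEU idea concrete, here is a minimal sentence-level sketch in plain Python: clipped n-gram precision combined with a brevity penalty, without smoothing. It is an illustration of the metric only, not the evaluation code used in this project (which may rely on a library implementation), and the example sentences are invented.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count all n-grams of length n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu(candidate, references, max_n=4):
    """Unsmoothed sentence-level BLEU with uniform n-gram weights."""
    log_precisions = []
    for n in range(1, max_n + 1):
        cand_counts = ngrams(candidate, n)
        total = sum(cand_counts.values())
        # Clip each n-gram count by its highest count in any single reference.
        max_ref = Counter()
        for ref in references:
            for gram, count in ngrams(ref, n).items():
                max_ref[gram] = max(max_ref[gram], count)
        clipped = sum(min(c, max_ref[g]) for g, c in cand_counts.items())
        if total == 0 or clipped == 0:
            return 0.0  # without smoothing, any zero precision gives a zero score
        log_precisions.append(math.log(clipped / total))
    # Brevity penalty: compare against the reference closest in length.
    c = len(candidate)
    r = min((abs(len(ref) - c), len(ref)) for ref in references)[1]
    penalty = 1.0 if c > r else math.exp(1 - r / c)
    return penalty * math.exp(sum(log_precisions) / max_n)


reference = "a dog runs on the grass".split()
perfect = bleu(reference, [reference])
short = bleu("a dog".split(), ["a dog runs fast".split()], max_n=1)
```

Here `perfect` scores 1.0 because the caption matches the reference exactly, while `short` is pulled down by the brevity penalty even though every word in it appears in the reference.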
- Python
- TensorFlow
- Keras
- Streamlit UI
For bug reports, bug fixes, or suggestions, please feel free to open an issue.
Pull requests are always welcome, and I will do my best to review them as fast as I can.
This project is licensed under the Apache License.
- If appropriate, open an issue on GitHub
- Contact me on LinkedIn
- Email mankarvivek172000@gmail.com