Vivek1258/End-to-end-deep-learning-project-Tell-me-about-the-image

Tell me about the Image

Machine Learning Engineer Nanodegree Capstone Project

Project Overview

In this project, I have developed a web application that automatically generates captions for a given image using deep learning. The deep learning model is developed, trained, and deployed (as a REST API) on the AWS Cloud using the SageMaker, Lambda, S3, and API Gateway services. The frontend of the web app is built with Streamlit UI and deployed on Heroku.
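The glue between API Gateway and the SageMaker endpoint is a Lambda function. Below is a minimal sketch of such a handler; the endpoint name and the `invoke` callable are illustrative stand-ins (in the real deployment, `invoke` would wrap `boto3.client("sagemaker-runtime").invoke_endpoint`), not code from this repository.

```python
import json

def handler(event, invoke):
    """Forward the request body (the image payload) to the model endpoint
    and wrap the generated caption in an API Gateway-style response."""
    payload = event["body"]
    caption = invoke(endpoint_name="image-caption-endpoint",  # assumed name
                     body=payload)
    return {"statusCode": 200, "body": json.dumps({"caption": caption})}

# Demo with a stub in place of the SageMaker runtime call.
stub = lambda endpoint_name, body: "a dog runs in the park"
response = handler({"body": "<image bytes>"}, stub)
print(response["statusCode"])  # 200
```

Injecting the `invoke` callable keeps the handler testable without AWS credentials.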

Output

image

SageMaker Endpoint

image

Data

The dataset used for this project is the “Flickr8K” dataset, which consists of images obtained from the Flickr website.

  • It is a labeled dataset.
  • The dataset consists of 8,000 photos.
  • There are 5 captions for each photo.
  • The dataset is relatively small, at 1.14 GB.
  • The dataset can be found on Kaggle: https://www.kaggle.com/shadabhussain/flickr8k
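Since each image carries five numbered captions, the caption file has to be parsed into an image-to-captions mapping before training. Here is a small sketch, assuming the common `Flickr8k.token.txt` layout of `<image>.jpg#<n>\t<caption>`; the sample lines are illustrative.

```python
from collections import defaultdict

def load_captions(text):
    """Map each image filename to its list of (lowercased) captions."""
    captions = defaultdict(list)
    for line in text.strip().splitlines():
        img_id, caption = line.split("\t", 1)
        image = img_id.split("#")[0]   # drop the "#0".."#4" caption index
        captions[image].append(caption.lower().strip())
    return captions

sample = ("1000268201.jpg#0\tA child in a pink dress .\n"
          "1000268201.jpg#1\tA girl going into a wooden building .")
caps = load_captions(sample)
print(len(caps["1000268201.jpg"]))  # 2
```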

Algorithm

I have used a combination of a CNN (Convolutional Neural Network) and an RNN (Recurrent Neural Network) to develop this system: the model has a CNN encoder and an LSTM decoder. The paper referenced for the development of the model is Show and Tell: A Neural Image Caption Generator. The architecture is shown in the image below:

image
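The encoder-decoder described above can be sketched in Keras as a "merge" model that combines the image feature vector with an LSTM over the partial caption. The layer sizes below (2048-d image features, a 5,000-word vocabulary, captions of at most 34 tokens) are illustrative assumptions, not values taken from this repository.

```python
from tensorflow.keras.layers import (Input, Dense, Dropout, Embedding,
                                     LSTM, add)
from tensorflow.keras.models import Model

VOCAB_SIZE, MAX_LEN, FEAT_DIM = 5000, 34, 2048

# Image branch: compress the CNN feature vector to 256 units.
img_in = Input(shape=(FEAT_DIM,))
img_vec = Dense(256, activation="relu")(Dropout(0.5)(img_in))

# Text branch: embed the partial caption and run it through an LSTM.
txt_in = Input(shape=(MAX_LEN,))
txt_emb = Embedding(VOCAB_SIZE, 200, mask_zero=True)(txt_in)
txt_vec = LSTM(256)(Dropout(0.5)(txt_emb))

# Merge both branches and predict the next word of the caption.
merged = Dense(256, activation="relu")(add([img_vec, txt_vec]))
out = Dense(VOCAB_SIZE, activation="softmax")(merged)

model = Model(inputs=[img_in, txt_in], outputs=out)
model.compile(loss="categorical_crossentropy", optimizer="adam")
print(model.output_shape)  # (None, 5000)
```

The model is trained to predict one next word at a time, which is why the decoding loop at inference feeds captions back in word by word.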

Working

image

First, I extract the features of an image using the CNN, and then feed this feature vector to an LSTM language model that generates the caption. (An LSTM is a special kind of RNN, capable of learning long-term dependencies.)
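At inference time this amounts to a greedy decoding loop: starting from a start token, repeatedly ask the model for the most probable next word until an end token appears. The sketch below uses a stub in place of the trained model; the tiny vocabulary and the `startseq`/`endseq` tokens are illustrative conventions, not code from this repository.

```python
def greedy_caption(predict_next, features, word2idx, idx2word, max_len=20):
    """Generate a caption word by word until 'endseq' or max_len."""
    caption = ["startseq"]
    for _ in range(max_len):
        seq = [word2idx[w] for w in caption]      # caption so far, as indices
        next_word = idx2word[predict_next(features, seq)]
        if next_word == "endseq":
            break
        caption.append(next_word)
    return " ".join(caption[1:])                  # drop the 'startseq' token

# Demo: a stub model that emits a fixed, scripted word sequence.
vocab = ["startseq", "a", "dog", "runs", "endseq"]
word2idx = {w: i for i, w in enumerate(vocab)}
idx2word = {i: w for i, w in enumerate(vocab)}
script = iter([1, 2, 3, 4])                       # a dog runs endseq
print(greedy_caption(lambda f, s: next(script), None, word2idx, idx2word))
# a dog runs
```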

I have also used models pre-trained on the standard ImageNet dataset (provided in Keras) to build the CNN encoder, and GloVe 200-d word embeddings to improve the performance of the language model.
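Using GloVe means initializing the decoder's embedding layer from the pre-trained vectors. A minimal sketch of building that embedding matrix, assuming the standard `word v1 v2 ... v200` text format of `glove.6B.200d.txt` (a 4-d toy sample is used here); words missing from GloVe stay as zero rows.

```python
def build_embedding_matrix(glove_lines, word2idx, dim=4):
    """One row per vocabulary word, filled from GloVe where available."""
    matrix = [[0.0] * dim for _ in range(len(word2idx))]
    for line in glove_lines:
        word, *vec = line.split()
        if word in word2idx:
            matrix[word2idx[word]] = [float(v) for v in vec]
    return matrix

sample = ["dog 0.1 0.2 0.3 0.4", "cat 0.5 0.6 0.7 0.8"]
word2idx = {"dog": 0, "runs": 1}
m = build_embedding_matrix(sample, word2idx)
print(m[0])  # [0.1, 0.2, 0.3, 0.4]
print(m[1])  # [0.0, 0.0, 0.0, 0.0]  ('runs' is not in the GloVe sample)
```

In Keras, the resulting matrix would be passed to the `Embedding` layer via its `weights` argument.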

Metrics

There are various ways to measure the performance of an image captioning model, such as BLEU, ROUGE, CIDEr, METEOR, and SPICE. Of these, BLEU (Bilingual Evaluation Understudy) is the most common and the most widely used for evaluating image annotation results.

For this project, I have used the BLEU score to check and compare the performance of our model. The BLEU measure works by comparing the evaluated sentence against the reference sentences, and it tends to give a higher score when the caption's length is close to that of the reference.
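As a from-scratch illustration (not this project's evaluation code), here is BLEU-1, i.e. clipped unigram precision combined with the brevity penalty that punishes captions much shorter than the reference; libraries such as NLTK provide the full n-gram metric.

```python
from collections import Counter
import math

def bleu1(candidate, reference):
    """Unigram BLEU: clipped precision times the brevity penalty."""
    cand, ref = candidate.split(), reference.split()
    overlap = Counter(cand) & Counter(ref)          # clipped word counts
    precision = sum(overlap.values()) / len(cand)
    # Brevity penalty: < 1 when the candidate is shorter than the reference.
    bp = 1.0 if len(cand) >= len(ref) else math.exp(1 - len(ref) / len(cand))
    return bp * precision

print(bleu1("a dog runs in the park", "a dog runs in the park"))  # 1.0
print(round(bleu1("a dog", "a dog runs in the park"), 2))         # 0.14
```

The second call shows the length effect: every candidate word matches, yet the score drops sharply because the caption is far shorter than the reference.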

Machine Learning Engineer Nanodegree

In this Nanodegree I learned advanced machine learning techniques and algorithms, and how to package and deploy trained models to a production environment. I gained practical experience using Amazon SageMaker to deploy trained models behind a web application and evaluate their performance, as well as A/B testing models and updating them as more data is gathered, an important skill in industry.

image

Built With

Python

TensorFlow

Keras

Streamlit UI

Contributing

Issues

In the case of a bug report, bugfix, or suggestion, please feel free to open an issue.

Pull request

Pull requests are always welcome, and I will do my best to review them as fast as I can.

License

This project is licensed under the Apache License.

Get Help

Contact
