Image-Captioning

Deep Learning models for image captioning with Python 3 and Keras

Image captioning using features from object recognition models, optionally combined with object detection results

This is an implementation adapted from this tutorial: https://machinelearningmastery.com/develop-a-deep-learning-caption-generation-model-in-python/ and runs on Python 3, Keras, and TensorFlow. The image features are extracted with NASNet or ResNet50, both provided by Keras and pre-trained on the ImageNet dataset. When fitting the language model (an RNN) that generates the captions, you can also pass a file containing bounding box detections (format: x1, y1, x2, y2, class_id, bbox_score). These bounding boxes are passed to the model along with the features extracted by NASNet or ResNet50.
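
As a rough illustration of how the feature extraction and the bounding-box input fit together, here is a minimal sketch using the Keras ResNet50 pre-trained on ImageNet. The helper names and the parsing of the detection file are assumptions for illustration, not the repository's exact code:

    # Minimal sketch (not the repository's exact code): extract a feature vector per
    # image with ResNet50 pre-trained on ImageNet, and load detections saved as
    # "x1, y1, x2, y2, class_id, bbox_score" lines.
    import numpy as np
    from tensorflow.keras.applications.resnet50 import ResNet50, preprocess_input
    from tensorflow.keras.preprocessing.image import load_img, img_to_array

    # Drop the classification head so the model outputs one feature vector per image.
    model = ResNet50(weights="imagenet", include_top=False, pooling="avg")

    def extract_features(image_path):
        img = load_img(image_path, target_size=(224, 224))  # ResNet50 input size
        x = img_to_array(img)
        x = preprocess_input(np.expand_dims(x, axis=0))
        return model.predict(x)[0]  # 2048-dimensional feature vector

    def load_bboxes(bbox_path):
        # Each line: x1, y1, x2, y2, class_id, bbox_score (comma- or space-separated).
        boxes = []
        with open(bbox_path) as f:
            for line in f:
                values = line.replace(",", " ").split()
                boxes.append([float(v) for v in values])
        return np.array(boxes)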

Image Captioning Sample

The repository includes:

  • Source code for the feature extractors (Flickr dataset) using the pre-trained models NASNet and ResNet50
  • Source code to prepare the descriptions (Flickr dataset)
  • Training code for the language model according to the input features
  • Code to evaluate captions on the Flickr test dataset (when using features from object detection, it may require further changes to fit your needs)

Getting Started

  • cnn_feature_extractor extracts the features from the Flickr dataset using NASNet or ResNet50. If you have an object detection model that saves its results to a file using the format (x1, y1, x2, y2, class_id, bbox_score) for each detection, that file can be used in the next steps as well.
  • preprocess_descriptions prepares all the descriptions from the Flickr dataset to be used in the next steps (see the cleaning sketch after this list).
  • rnn_flickr_fit fits the language model using the clean descriptions and the features (including features from object detection, when provided) generated in the previous steps.
  • rnn_flickr_evaluation evaluates the captions for the whole Flickr test dataset using BLEU metrics (see the evaluation sketch after this list). Some modifications may be needed to adjust the code to your use case.
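
The caption preparation follows the general recipe from the linked tutorial. A minimal sketch of that kind of cleaning, assuming the captions are held in a dict mapping image id to a list of raw caption strings (the exact steps in preprocess_descriptions may differ):

    # Minimal sketch of caption cleaning in the style of the linked tutorial:
    # lowercase, strip punctuation, drop one-character and non-alphabetic tokens.
    import string

    def clean_descriptions(descriptions):
        # descriptions: dict mapping image_id -> list of raw caption strings
        table = str.maketrans("", "", string.punctuation)
        for image_id, caption_list in descriptions.items():
            for i, caption in enumerate(caption_list):
                tokens = caption.lower().split()
                tokens = [w.translate(table) for w in tokens]
                tokens = [w for w in tokens if len(w) > 1 and w.isalpha()]
                caption_list[i] = " ".join(tokens)
        return descriptions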
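
The BLEU scores themselves are typically computed with NLTK's corpus_bleu, as in the tutorial. A minimal sketch, assuming you already have the generated caption and the reference captions for each test image (the variable names are illustrative, not the repository's):

    # Minimal sketch of BLEU-1..BLEU-4 scoring with NLTK's corpus_bleu.
    # `generated`: dict image_id -> predicted caption string (assumed).
    # `references`: dict image_id -> list of reference caption strings (assumed).
    from nltk.translate.bleu_score import corpus_bleu

    def evaluate_bleu(generated, references):
        hypotheses, refs = [], []
        for image_id, caption in generated.items():
            hypotheses.append(caption.split())
            refs.append([r.split() for r in references[image_id]])
        print("BLEU-1: %f" % corpus_bleu(refs, hypotheses, weights=(1.0, 0, 0, 0)))
        print("BLEU-2: %f" % corpus_bleu(refs, hypotheses, weights=(0.5, 0.5, 0, 0)))
        print("BLEU-3: %f" % corpus_bleu(refs, hypotheses, weights=(0.33, 0.33, 0.33, 0)))
        print("BLEU-4: %f" % corpus_bleu(refs, hypotheses, weights=(0.25, 0.25, 0.25, 0.25)))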

Requirements

Python 3+, TensorFlow, Keras and other common packages listed in requirements.txt.

Installation

  1. Install dependencies (Graphviz also has to be installed separately, and its bin folder must be added to the PATH environment variable)
    pip install -r requirements.txt
  2. Clone this repository
  3. Download the Flickr dataset by filling out this form first: https://forms.illinois.edu/sec/1713398
