Introduction

Hugging Captions fine-tunes GPT-2, a transformer-based language model by OpenAI, to generate realistic photo captions. The transformer components are implemented with Hugging Face's Transformers library, hence the name Hugging Captions.

Setup

Required

  • Python 3.6+
  • CUDA 10.2 (see the instructions for installing PyTorch on CUDA 9.2 or 10.1)
git clone https://github.com/antoninodimaggio/Hugging-Captions.git
cd Hugging-Captions
pip install -r requirements.txt

Download Training Data

  • Choose a hashtag that has more than 10,000 posts and is relevant to the photo you want to caption
  • Detailed information on each argument can be found here
  • You can also run python download.py -h for help
python download.py --tag shibainu \
    --caption-queries 60 \
    --min-likes 10

Training and Generating Captions

Train

  • Now that we have our training data, we can fine-tune our transformer-based language model. Training is fast on a decent GPU.
python tune_transformer.py --tag shibainu --train

Generate Captions

  • The most important argument is --prompt. It leads the model in the right direction, so the more specific the prompt, the better.
  • Detailed information on each argument can be found here
  • You can also run python tune_transformer.py -h for help
python tune_transformer.py --tag shibainu --generate \
    --prompt Adorable\ smile \
    --max-length 60 \
    --min-length 20 \
    --num-captions 40
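
Because the prompt usually contains spaces, it has to be escaped or quoted on the command line. The following is a minimal sketch, not part of the repository, that assembles the generation command shown above from Python and quotes each argument for the shell. The helper names `build_generate_command` and `to_shell` are hypothetical, and the default values for `max_length` and `min_length` simply mirror the example command rather than the script's documented defaults.

```python
import shlex


def build_generate_command(tag, prompt, max_length=60, min_length=20, num_captions=40):
    """Return the argument list for a caption-generation run.

    Hypothetical helper: the flags are the ones used in the README example,
    and the defaults here copy the values from that example.
    """
    return [
        "python", "tune_transformer.py",
        "--tag", tag,
        "--generate",
        "--prompt", prompt,
        "--max-length", str(max_length),
        "--min-length", str(min_length),
        "--num-captions", str(num_captions),
    ]


def to_shell(args):
    """Join arguments into one shell-safe string, quoting where needed."""
    return " ".join(shlex.quote(arg) for arg in args)


command = build_generate_command("shibainu", "Adorable smile")
print(to_shell(command))
```

Passing the list form to `subprocess.run(command)` avoids shell quoting entirely; the string form is only useful for printing or pasting into a terminal.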

Train and Generate Captions

  • Trains and generates captions all in one go
python tune_transformer.py --tag shibainu --train --generate \
    --prompt Adorable\ smile \
    --max-length 60 \
    --min-length 20 \
    --num-captions 40

See Your Results

  • Navigate to /Hugging-Captions/text/generated_text/<tag>_gen.txt to look at your generated captions
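
If you would rather inspect the results from Python, a small sketch along these lines could load them. It assumes the output file holds one caption per line, which the README does not state, and the helper names `generated_captions_path` and `load_captions` are hypothetical.

```python
from pathlib import Path


def generated_captions_path(repo_root, tag):
    """Build the path text/generated_text/<tag>_gen.txt under the repo root."""
    return Path(repo_root) / "text" / "generated_text" / f"{tag}_gen.txt"


def load_captions(path):
    """Read captions, assuming one caption per line; blank lines are skipped."""
    with open(path, encoding="utf-8") as handle:
        return [line.strip() for line in handle if line.strip()]


# Run from inside the Hugging-Captions directory after generating captions.
path = generated_captions_path(".", "shibainu")
if path.exists():
    for caption in load_captions(path):
        print(caption)
```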

My Results Are Not What I Expected

Some of the generated captions are going to be ugly. Others will be really good apart from a word or two that simply does not make sense. This is expected no matter how thoroughly the training and generated data are cleaned. If you are not getting the results that you want, I have four suggestions.

  1. Choose a better hashtag. If you are captioning a photo of a dog, do not choose #dog. Instead, try #poodle, #bulldog, and so on.
  2. Make your prompt more specific. A prompt like "My day" is very general and will lead to general results. Instead, try something like "My Saturday morning".
  3. Increase your number of captions. The default is 40; bump that up to 80.
  4. Increase the number of caption queries. The default is 60; raise that to, say, 100.
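
Generating more captions also means more near-duplicates and fragments to sift through. As a rough illustration, the sketch below post-filters a list of generated captions by dropping case-insensitive duplicates and captions with too few words. This is not the cleaning the repository performs; `filter_captions` and its `min_words` threshold are assumptions made for the example.

```python
def filter_captions(captions, min_words=3):
    """Drop duplicate captions (ignoring case) and very short ones.

    Hypothetical helper: min_words is an arbitrary threshold chosen for
    illustration, not a value taken from the project.
    """
    seen = set()
    kept = []
    for caption in captions:
        text = " ".join(caption.split())  # collapse runs of whitespace
        key = text.lower()
        if len(text.split()) < min_words or key in seen:
            continue
        seen.add(key)
        kept.append(text)
    return kept


sample = [
    "Adorable smile on a sunny day",
    "adorable smile  on a sunny day",  # duplicate apart from case and spacing
    "Adorable smile",                  # too short to be useful
]
print(filter_captions(sample))
```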

Future Work

  • Explore ways to better clean caption data both generated and training
  • Explore different pre-trained language models
  • Fine-tune models using caption data from multiple relevant hashtags