Hugging Captions fine-tunes GPT-2, a transformer-based language model by OpenAI, to generate realistic photo captions. The transformer components are implemented with Hugging Face's Transformers library, hence the name Hugging Captions.
Required
- Python 3.6+
- CUDA 10.2 (see the instructions for installing PyTorch on CUDA 9.2 or 10.1)
git clone https://github.com/antoninodimaggio/Hugging-Captions.git
cd Hugging-Captions
pip install -r requirements.txt
- Choose a hashtag that has more than 10,000 posts and is relevant to the photo you want to generate a caption for.
- Detailed information on each argument can be found here
- You could also use
python download.py -h
for help
python download.py --tag shibainu \
--caption-queries 60 \
--min-likes 10
- Now that we have our training data, we can train (fine-tune) our transformer-based language model. Training is fast on a decent GPU.
python tune_transformer.py --tag shibainu --train
- The most important argument is --prompt. You want to lead your model in the right direction; the more specific the better.
- Detailed information on each argument can be found here
- You could also use
python tune_transformer.py -h
for help
python tune_transformer.py --tag shibainu --generate \
--prompt Adorable\ smile \
--max-length 60 \
--min-length 20 \
--num-captions 40
- Trains and generates captions all in one go
python tune_transformer.py --tag shibainu --train --generate \
--prompt Adorable\ smile \
--max-length 60 \
--min-length 20 \
--num-captions 40
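If you would rather launch these commands from Python than from a shell, the sketch below builds the same argument list using only the flags shown above. Passing the prompt as a single list element means a prompt containing spaces needs no backslash escaping. The helper name `build_command` is hypothetical and not part of the repository.

```python
import shlex
import sys


def build_command(tag, prompt, train=False, generate=True,
                  max_length=60, min_length=20, num_captions=40):
    """Return an argument list for tune_transformer.py (hypothetical helper).

    Only flags that appear in the commands above are used.
    """
    cmd = [sys.executable, "tune_transformer.py", "--tag", tag]
    if train:
        cmd.append("--train")
    if generate:
        cmd += [
            "--generate",
            "--prompt", prompt,  # one list element, so spaces need no escaping
            "--max-length", str(max_length),
            "--min-length", str(min_length),
            "--num-captions", str(num_captions),
        ]
    return cmd


cmd = build_command("shibainu", "Adorable smile", train=True)
# Print a shell-safe rendering of the arguments (everything after the interpreter).
print(" ".join(shlex.quote(arg) for arg in cmd[1:]))
# To actually run it from the repository root:
#   import subprocess
#   subprocess.run(cmd, check=True)
```

The `subprocess.run` call is left commented out because it assumes you are in the cloned repository with the requirements installed.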
- Navigate to
/Hugging-Captions/text/generated_text/<tag>_gen.txt
to look at your generated captions
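To skim the results without opening the file by hand, something like the following may help. It is a minimal sketch that assumes the output file is UTF-8 text with one caption per line; the actual layout written by the project may differ. The function name `load_captions` is hypothetical.

```python
from pathlib import Path


def load_captions(tag, root="."):
    """Read text/generated_text/<tag>_gen.txt under root and return its non-empty lines.

    Assumption: one caption per line, UTF-8 encoded.
    Returns an empty list if the file does not exist yet.
    """
    path = Path(root) / "text" / "generated_text" / f"{tag}_gen.txt"
    if not path.exists():
        return []
    with path.open(encoding="utf-8") as handle:
        return [line.strip() for line in handle if line.strip()]


# Run from the repository root after generating captions for the tag.
for number, caption in enumerate(load_captions("shibainu"), start=1):
    print(f"{number:>3}. {caption}")
```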
Some of the generated captions are going to be ugly. Others are going to be really good except for a word or two that simply does not make sense. This is expected no matter how much the data, both training and generated, is cleaned. If you are not getting the results that you want, I have four suggestions.
- Choose a better hashtag. If you are captioning a photo of a dog, do not choose #dog; instead try #poodle, #bulldog, and so on.
- Make your prompt more specific. A prompt like "My day" is very general and will lead to general results; instead try something like "My Saturday morning".
- Increase your number of captions. The default is 40; bump that up to 80.
- Increase the number of caption queries. The default is 60; raise that to, say, 100.
- Explore ways to better clean caption data, both generated and training
- Explore different pre-trained language models
- Fine-tune models using caption data from multiple relevant hashtags
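As one possible starting point for the caption-cleaning item above, the sketch below post-filters a list of generated captions: it collapses whitespace, drops captions that are very short or very long, and removes case-insensitive duplicates. This is not how the project cleans its data; the function name `clean_captions` and the word-count thresholds are illustrative assumptions.

```python
import re


def clean_captions(captions, min_words=3, max_words=60):
    """Normalize whitespace, filter by word count, and drop duplicates.

    The thresholds are illustrative assumptions, not values from the project.
    """
    seen = set()
    cleaned = []
    for caption in captions:
        # Collapse runs of whitespace (including newlines) into single spaces.
        text = re.sub(r"\s+", " ", caption).strip()
        if not (min_words <= len(text.split()) <= max_words):
            continue
        key = text.lower()  # compare case-insensitively
        if key in seen:
            continue
        seen.add(key)
        cleaned.append(text)
    return cleaned


raw = [
    "Adorable smile  on a sunny day",
    "adorable smile on a sunny day",
    "Hi",
    "  Best   boy in town  ",
]
for caption in clean_captions(raw):
    print(caption)
```

A filter like this cannot catch a caption in which a word or two does not make sense, so you will still need to read through the output yourself.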