This code implements Attention for Novel Object Captioning (ANOC). If you find the code useful in your research, please consider citing the paper:
```
@InProceedings{xianyu:2021:anoc,
  author    = {Xianyu Chen and Ming Jiang and Qi Zhao},
  title     = {Leveraging Human Attention in Novel Object Captioning},
  booktitle = {Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI)},
  year      = {2021}
}
```
We adopt the official implementation of nocaps as the baseline model for novel object captioning and use the bottom-up features provided in that repository. Please refer to the following links for further README information:
- Requirements for PyTorch: we use PyTorch 1.1.0 in our experiments.
- Requirements for TensorFlow: we only use TensorBoard, for visualization.
- Python 3.6+
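
If you prefer to set up the environment directly with pip, a minimal sketch is below. The exact wheels and versions are assumptions (torchvision 0.3.0 is the release paired with PyTorch 1.1.0); the requirements links above are authoritative.

```bash
# Minimal environment sketch; pick the build matching your CUDA version.
pip install torch==1.1.0 torchvision==0.3.0
pip install tensorboard
```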
Download the extra nocaps data that is not provided by the nocaps repository and unzip it. The human attention weights are available at Link (remember to download the other files following the instructions). The extra human saliency data for the COCO and nocaps datasets is extracted with the Saliency Attentive Model, and the detection results for the COCO dataset are extracted with the Open Images detector.
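
As a concrete example, unpacking might look like the sketch below; the archive name and target directory are hypothetical, so substitute the actual filenames from the download instructions.

```bash
# Hypothetical archive name and layout, for illustration only.
unzip anoc_extra_data.zip -d data/
ls data/  # verify the saliency weights and detection results are present
```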
For training without SCST, you can execute the following script:

```bash
CUDA_VISIBLE_DEVICES=0 python scripts/train.py \
    --config configs/updown_plus_cbs_saliency_nocaps_val.yaml \
    --checkpoint-every 1000 \
    --gpu-ids 0 \
    --serialization-dir checkpoints/anoc
```
For visualization, you can use TensorBoard to check performance on the nocaps validation set and monitor the training process:

```bash
tensorboard --logdir checkpoints/anoc
```
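
If training runs on a remote machine, one option (sketched below; the port and host are examples) is to bind TensorBoard to an explicit port and forward it over SSH:

```bash
# On the remote machine: serve TensorBoard on a fixed port.
tensorboard --logdir checkpoints/anoc --port 6006

# On the local machine: forward the port, then open http://localhost:6006.
ssh -N -L 6006:localhost:6006 user@remote-host
```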
To evaluate a specific checkpoint on the validation set, e.g., checkpoint_60000.pth, you can execute the following script:

```bash
CUDA_VISIBLE_DEVICES=0 python scripts/inference.py \
    --config configs/updown_plus_cbs_saliency_nocaps_val.yaml \
    --checkpoint-path checkpoints/anoc/checkpoint_60000.pth \
    --output-path checkpoints/anoc/val_predictions.json \
    --gpu-ids 0 \
    --evalai-submit
```
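
Before (or instead of) submitting to EvalAI, you can sanity-check the predictions file; the snippet below assumes the standard EvalAI caption format, a JSON list of {"image_id", "caption"} records, which may differ from this repository's exact output.

```bash
# Print the number of predictions and the first record (format assumed).
python -c "import json; preds = json.load(open('checkpoints/anoc/val_predictions.json')); print(len(preds)); print(preds[0])"
```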
If you would like to train with SCST, you can start from the previous best checkpoint and execute the following script:

```bash
CUDA_VISIBLE_DEVICES=0 python scripts/train_scst.py \
    --config configs/updown_plus_cbs_saliency_nocaps_val.yaml \
    --config-override OPTIM.BATCH_SIZE 50 OPTIM.LR 0.00005 OPTIM.NUM_ITERATIONS 210000 \
    --checkpoint-every 3000 \
    --gpu-ids 0 \
    --serialization-dir checkpoints/anoc_scst \
    --start-from-checkpoint checkpoints/anoc/checkpoint_best.pth
```
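
A quick pre-flight check before launching SCST, assuming the config stores its optimizer settings under an OPTIM block as the override keys above suggest:

```bash
# Confirm the warm-start checkpoint from the first stage exists.
ls -lh checkpoints/anoc/checkpoint_best.pth
# Inspect the optimizer defaults that --config-override will replace.
grep -A 5 "OPTIM" configs/updown_plus_cbs_saliency_nocaps_val.yaml
```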
Similarly, you can use TensorBoard to monitor performance and the training procedure. To evaluate a specific SCST checkpoint on the validation set, e.g., checkpoint_120000.pth, you can execute the following script:
```bash
CUDA_VISIBLE_DEVICES=0 python scripts/inference_scst.py \
    --config configs/updown_plus_cbs_saliency_nocaps_val.yaml \
    --checkpoint-path checkpoints/anoc_scst/checkpoint_120000.pth \
    --output-path checkpoints/anoc_scst/val_predictions.json \
    --gpu-ids 0 \
    --evalai-submit
```
Results on the nocaps validation set without SCST:

| in-domain | | near-domain | | out-of-domain | | overall | | | | | |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| CIDEr | SPICE | CIDEr | SPICE | CIDEr | SPICE | BLEU1 | BLEU4 | METEOR | ROUGE | CIDEr | SPICE |
| 79.9 | 12.0 | 75.2 | 11.6 | 70.7 | 9.7 | 76.6 | 18.6 | 24.2 | 51.9 | 75.0 | 11.3 |

Results on the nocaps validation set with SCST:

| in-domain | | near-domain | | out-of-domain | | overall | | | | | |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| CIDEr | SPICE | CIDEr | SPICE | CIDEr | SPICE | BLEU1 | BLEU4 | METEOR | ROUGE | CIDEr | SPICE |
| 86.1 | 12.0 | 80.7 | 11.9 | 73.7 | 10.1 | 78.4 | 19.1 | 24.8 | 52.2 | 80.1 | 11.6 |