Attention for Novel Object Captioning (ANOC)

This code implements the Attention for Novel Object Captioning (ANOC) model from the IJCAI 2021 paper "Leveraging Human Attention in Novel Object Captioning".

Reference

If you find the code useful in your research, please consider citing the paper.

@InProceedings{xianyu:2021:anoc,
    author = {Xianyu Chen and Ming Jiang and Qi Zhao},
    title = {Leveraging Human Attention in Novel Object Captioning},
    booktitle = {Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI)},
    year = {2021}
}

Disclaimer

We adopt the official implementation of nocaps as the baseline model for novel object captioning, and we use the bottom-up features provided in that repository. Please refer to these repositories for further README information.

Requirements

  1. Requirements for PyTorch. We use PyTorch 1.1.0 in our experiments.
  2. Requirements for TensorFlow. We only use TensorBoard for visualization.
  3. Python 3.6+ (a quick environment check is sketched below)
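
A minimal environment check, just to confirm the interpreter and libraries above are in place (illustrative only; the repository's own requirement files are authoritative):

# Sanity-check the environment described above (illustrative only).
import sys
assert sys.version_info >= (3, 6), "Python 3.6+ is required"

import torch
print("PyTorch:", torch.__version__)                 # the experiments used PyTorch 1.1.0
print("CUDA available:", torch.cuda.is_available())

try:
    import tensorboard  # only needed for visualization
    print("TensorBoard is available.")
except ImportError:
    print("TensorBoard not found; training still runs, but without visualization.")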

Datasets

Download the extra data that is not provided by the official nocaps repository and unzip it. The human attention weights are available at the provided link (remember to also download the other documents by following the instructions there).

The extra human saliency data for the COCO and nocaps datasets are extracted with the Saliency Attentive Model, and the detection results for the COCO dataset are extracted with the Open Images detector.
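
For intuition only, the sketch below shows one generic way a 2D saliency map could be pooled over detected region boxes to obtain per-region attention weights; the shapes, inputs, and mean-pooling rule here are assumptions for illustration, not the repository's actual preprocessing (follow the linked instructions for that).

# Illustrative sketch: pool a saliency map over region boxes to get per-region
# weights. Shapes, inputs, and the mean-pooling rule are assumptions, not the
# repository's actual pipeline.
import numpy as np

def region_attention(saliency, boxes):
    # saliency: (H, W) array in [0, 1]; boxes: (N, 4) array of (x1, y1, x2, y2)
    weights = []
    for x1, y1, x2, y2 in boxes.astype(int):
        patch = saliency[y1:y2, x1:x2]
        weights.append(patch.mean() if patch.size else 0.0)
    weights = np.asarray(weights)
    return weights / (weights.sum() + 1e-8)          # normalize to a distribution

# Hypothetical inputs, just to show the expected shapes.
saliency = np.random.rand(480, 640)
boxes = np.array([[10, 20, 200, 220], [300, 100, 500, 400]], dtype=np.float32)
print(region_attention(saliency, boxes))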

ANOC

For training without SCST, you can execute the following script:

CUDA_VISIBLE_DEVICES=0 python scripts/train.py \
--config configs/updown_plus_cbs_saliency_nocaps_val.yaml \
--checkpoint-every 1000 \
--gpu-ids 0 \
--serialization-dir checkpoints/anoc

For visualization, one can use TensorBoard to check the performance on the nocaps validation set and monitor the training process:

tensorboard --logdir checkpoints/anoc

To evaluate a specific model checkpoint on the validation set, e.g., checkpoint_60000.pth, you can execute the following script:

CUDA_VISIBLE_DEVICES=0 python scripts/inference.py \
--config configs/updown_plus_cbs_saliency_nocaps_val.yaml \
--checkpoint-path checkpoints/anoc/checkpoint_60000.pth \
--output-path checkpoints/anoc/val_predictions.json \
--gpu-ids 0 \
--evalai-submit
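
The --evalai-submit flag submits the predictions to the nocaps evaluation server; the JSON written to --output-path can also be inspected locally. A minimal sketch, assuming the usual nocaps/EvalAI submission format of a list of {"image_id", "caption"} records:

# Peek at a few generated captions (assumes the common nocaps submission format:
# a JSON list of {"image_id": ..., "caption": ...} entries).
import json

with open("checkpoints/anoc/val_predictions.json") as f:
    predictions = json.load(f)

for entry in predictions[:5]:
    print(entry["image_id"], "->", entry["caption"])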

If you would like to train with SCST, you can start from the previous best checkpoint and execute the following script:

CUDA_VISIBLE_DEVICES=0 python scripts/train_scst.py \
--config configs/updown_plus_cbs_saliency_nocaps_val.yaml \
--config-override OPTIM.BATCH_SIZE 50 OPTIM.LR 0.00005 OPTIM.NUM_ITERATIONS 210000 \
--checkpoint-every 3000 \
--gpu-ids 0 \
--serialization-dir checkpoints/anoc_scst \
--start-from-checkpoint checkpoints/anoc/checkpoint_best.pth
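
For background, SCST (Self-Critical Sequence Training) fine-tunes the captioner with a policy-gradient objective that uses the reward of the greedily decoded caption as a baseline for a sampled caption. The sketch below shows this generic loss; it is a simplified illustration, not the repository's train_scst.py.

# Generic self-critical (SCST) loss: the advantage of a sampled caption's reward
# (e.g., CIDEr) over the greedy caption's reward weights the sampled caption's
# log-probability. Simplified illustration, not the repository's implementation.
import torch

def scst_loss(sample_logprobs, sample_reward, greedy_reward):
    # sample_logprobs: (B, T) log-probs of the sampled caption tokens
    # sample_reward:   (B,)   sentence-level reward of the sampled captions
    # greedy_reward:   (B,)   reward of the greedy captions (the baseline)
    advantage = (sample_reward - greedy_reward).detach()
    return -(advantage.unsqueeze(1) * sample_logprobs).sum(dim=1).mean()

# Toy tensors, just to show the shapes involved.
logp = torch.log_softmax(torch.randn(4, 12, 100), dim=-1).max(dim=-1).values  # (B, T)
print(scst_loss(logp, torch.rand(4), torch.rand(4)).item())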

Similarly, one can use TensorBoard to monitor the performance and the training procedure. To evaluate a specific model checkpoint on the validation set, e.g., checkpoint_120000.pth, you can execute the following script:

CUDA_VISIBLE_DEVICES=0 python scripts/inference_scst.py \
--config configs/updown_plus_cbs_saliency_nocaps_val.yaml \
--checkpoint-path checkpoints/anoc_scst/checkpoint_120000.pth \
--output-path checkpoints/anoc_scst/val_predictions.json \
--gpu-ids 0 \
--evalai-submit

Results for nocaps validation set

ANOC w/o SCST:

|               | in-domain |       | near-domain |       | out-of-domain |       | overall |        |        |       |       |       |
|---------------|-----------|-------|-------------|-------|---------------|-------|---------|--------|--------|-------|-------|-------|
|               | CIDEr     | SPICE | CIDEr       | SPICE | CIDEr         | SPICE | BLEU-1  | BLEU-4 | METEOR | ROUGE | CIDEr | SPICE |
| ANOC w/o SCST | 79.9      | 12.0  | 75.2        | 11.6  | 70.7          | 9.7   | 76.6    | 18.6   | 24.2   | 51.9  | 75.0  | 11.3  |

ANOC with SCST:

|                | in-domain |       | near-domain |       | out-of-domain |       | overall |        |        |       |       |       |
|----------------|-----------|-------|-------------|-------|---------------|-------|---------|--------|--------|-------|-------|-------|
|                | CIDEr     | SPICE | CIDEr       | SPICE | CIDEr         | SPICE | BLEU-1  | BLEU-4 | METEOR | ROUGE | CIDEr | SPICE |
| ANOC with SCST | 86.1      | 12.0  | 80.7        | 11.9  | 73.7          | 10.1  | 78.4    | 19.1   | 24.8   | 52.2  | 80.1  | 11.6  |
