Image Paragraph Captioning with Topic Clustering and Topic Shift Prediction

This paper has been published in the journal "Knowledge-Based Systems".

Paper Details

Title: Image Paragraph Captioning with Topic Clustering and Topic Shift Prediction
Authors: Ting Tang, Jiansheng Chen, Yiqing Huang, Huimin Ma, Yudong Zhang, Hongwei Yu
Publication Date: 2024/1/18
Journal: Knowledge-Based Systems

Access and Download

The paper can be accessed and downloaded from ScienceDirect: https://www.sciencedirect.com/science/article/pii/S0950705124000364

Abstract

Image paragraph captioning involves generating a semantically coherent paragraph describing an image’s visual content. The selection and shifting of sentence topics are critical when a human describes an image. However, previous hierarchical image paragraph captioning methods have not fully explored or utilized sentence topics. In particular, the continuous and implicit modeling of topics in these methods makes it difficult to supervise the topic prediction process explicitly. We propose a new method called topic clustering and topic shift prediction (TCTSP) to solve this problem. Topic clustering (TC) in the sentence embedding space generates semantically explicit and discrete topic labels that can be directly used to supervise topic prediction. By introducing a topic shift probability matrix that characterizes human topic shift patterns, topic shift prediction (TSP) predicts subsequent topics that are both logical and consistent with human habits based on visual features and language context. TCTSP can be combined with various image paragraph captioning model structures to improve performance. Extensive experiments were conducted on the Stanford image paragraph dataset, and superior results were reported compared with previous state-of-the-art approaches. In particular, TCTSP improved the consensus-based image description evaluation (CIDEr) performance of image paragraph captioning to 41.67%. The codes are available at https://github.com/tt0059/TCTSP.
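To make the idea of a topic shift probability matrix more concrete, here is a minimal sketch, not the implementation in this repository. It assumes each training paragraph has already been turned into a sequence of discrete topic labels (one per sentence, for example via clustering of sentence embeddings) and simply counts how often one topic follows another. The function name `topic_shift_matrix` and the toy data are hypothetical.

```python
from typing import List, Sequence


def topic_shift_matrix(
    paragraph_topics: Sequence[Sequence[int]], num_topics: int
) -> List[List[float]]:
    """Estimate P[i][j], the probability that topic j follows topic i.

    Assumption: a plain count-and-normalize estimate over consecutive
    sentence topics. The paper may estimate or use the matrix differently.
    """
    counts = [[0.0] * num_topics for _ in range(num_topics)]
    for topics in paragraph_topics:
        # Count every consecutive (previous topic, next topic) pair.
        for prev_topic, next_topic in zip(topics[:-1], topics[1:]):
            counts[prev_topic][next_topic] += 1.0

    matrix = []
    for row in counts:
        total = sum(row)
        # Topics that never start a transition keep an all-zero row.
        matrix.append([c / total for c in row] if total > 0 else row[:])
    return matrix


# Toy example: three paragraphs, each a list of per-sentence topic labels.
paragraphs = [[0, 1, 2], [0, 1, 1], [0, 2]]
shift = topic_shift_matrix(paragraphs, num_topics=3)
for row in shift:
    print(row)
```

Each non-empty row sums to 1, so row i can be read as a distribution over the topic of the next sentence given that the current sentence has topic i.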

Citation

For citing this paper, please use the following format:
@article{TANG2024111401,
title = {Image paragraph captioning with topic clustering and topic shift prediction},
journal = {Knowledge-Based Systems},
volume = {286},
pages = {111401},
year = {2024},
issn = {0950-7051},
doi = {https://doi.org/10.1016/j.knosys.2024.111401},
url = {https://www.sciencedirect.com/science/article/pii/S0950705124000364},
author = {Ting Tang and Jiansheng Chen and Yiqing Huang and Huimin Ma and Yudong Zhang and Hongwei Yu}
}

Environment settings

The codebase is tested under the following environment settings:

cuda: 10.1
numpy: 1.19.5
python: 3.6.13
pytorch: 1.4.0
torchvision: 0.5.0
coco-caption (put pycocoevalcap under path TCTSP/)

For the full environment specification, see TCTSP/environment.yml, which can be used to create the conda environment:

conda env create -f environment.yml
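After creating the environment, it can help to confirm that the installed packages match the versions listed above. The following is a small, optional sketch and not part of the repository; the helper names `installed_version` and `find_mismatches` are hypothetical, and the check relies only on each package exposing a `__version__` attribute.

```python
import importlib
import sys

# Versions taken from the list above; "torch" is the import name of PyTorch.
EXPECTED = {"numpy": "1.19.5", "torch": "1.4.0", "torchvision": "0.5.0"}


def installed_version(module_name):
    """Return the module's __version__ string, or None if it cannot be imported."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return None
    return getattr(module, "__version__", None)


def find_mismatches(expected, lookup=installed_version):
    """Map each package whose version differs from `expected` to what was found."""
    mismatches = {}
    for name, wanted in expected.items():
        found = lookup(name)
        # Builds such as "1.4.0+cu101" still count as a match for "1.4.0".
        if found is None or not found.startswith(wanted):
            mismatches[name] = found
    return mismatches


if __name__ == "__main__":
    print("python:", sys.version.split()[0])
    for name, found in find_mismatches(EXPECTED).items():
        print(f"{name}: expected {EXPECTED[name]}, found {found}")
```

If the script prints only the Python version, the three packages match the tested versions.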

Prepare the data

Visual feature

We extracted Faster R-CNN features for the images in the Stanford image paragraph dataset and uploaded them. To obtain them:

Download res101_10_100_ray.tar.gz from: https://drive.google.com/file/d/1-17LEg4CEHW2rICjJ_YEfJkpZ8X2PiuZ/view?usp=sharing.
Extract it into the TCTSP/ directory with the following command:

tar -xzvf res101_10_100_ray.tar.gz

Others

The remaining data needed for the experiments is packaged in data_vg.tar.gz. To obtain it:

Download data_vg.tar.gz from: https://drive.google.com/file/d/1--thaTlTnc6BWU16rV3xa6UEUa5zR6y5/view?usp=sharing.
Extract it into the TCTSP/ directory with the following command:

tar -xzvf data_vg.tar.gz

Download the checkpoint

To obtain our pre-trained model:

Download caption_model_57.pth from: https://drive.google.com/file/d/1-1M8ySZd0FsDMYvdXoa_T8rRsDDK5MLC/view?usp=sharing.
Make a snapshot folder:

mkdir -p ./experiments/Xlan_SAP_V6_kmeans_wt03_RL_wt05_CIDEr_25_test/snapshot/

Put caption_model_57.pth under path TCTSP/experiments/Xlan_SAP_V6_kmeans_wt03_RL_wt05_CIDEr_25_test/snapshot/

Evaluate

For the image paragraph captioning task, we only compute BLEU, METEOR and CIDEr, so the other metrics on line 47 of TCTSP/pycocoevalcap/eval.py need to be deleted.
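As an illustration of this change, the sketch below filters a scorer list down to the three metrics used here. It assumes the bundled eval.py follows the usual coco-caption layout, where scorers are declared as (scorer, metric name or list of names) pairs; the string stand-ins and the helper name `keep_paragraph_metrics` are hypothetical and replace the real scorer objects so the example runs on its own.

```python
# Metric families kept for image paragraph captioning.
KEEP = {"Bleu", "METEOR", "CIDEr"}


def keep_paragraph_metrics(scorers):
    """Keep only the (scorer, method) pairs whose metric family is in KEEP."""
    kept = []
    for scorer, method in scorers:
        # BLEU is usually given as a list of names, the others as one string.
        names = method if isinstance(method, list) else [method]
        if any(name.split("_")[0] in KEEP for name in names):
            kept.append((scorer, method))
    return kept


# Stand-in for the scorer list; real code would hold scorer objects here.
scorers = [
    ("bleu_scorer", ["Bleu_1", "Bleu_2", "Bleu_3", "Bleu_4"]),
    ("meteor_scorer", "METEOR"),
    ("rouge_scorer", "ROUGE_L"),
    ("cider_scorer", "CIDEr"),
    ("spice_scorer", "SPICE"),
]

for scorer, method in keep_paragraph_metrics(scorers):
    print(scorer, method)
```

In practice, deleting the unwanted entries directly from the list in eval.py has the same effect and avoids loading scorers that are never used.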

To evaluate the pre-trained model, run the following command:

CUDA_VISIBLE_DEVICES=0 python main_test.py --folder ./experiments/Xlan_SAP_V6_kmeans_wt03_RL_wt05_CIDEr_25_test --resume 57 --markov_mat_path ./data/markov_mat_kmeans.npy
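The `--folder` and `--resume` values in this command appear to line up with the checkpoint location described earlier (`snapshot/caption_model_57.pth` inside the experiment folder). The sketch below only encodes that inferred naming pattern so the path can be checked before launching evaluation; it is an assumption based on this README, not a description of how main_test.py resolves the file, and `checkpoint_path` is a hypothetical helper.

```python
from pathlib import Path


def checkpoint_path(folder, epoch):
    """Build the checkpoint path implied by --folder and --resume.

    Assumption: checkpoints live at <folder>/snapshot/caption_model_<epoch>.pth,
    as suggested by the download instructions in this README.
    """
    return Path(folder) / "snapshot" / f"caption_model_{epoch}.pth"


path = checkpoint_path(
    "./experiments/Xlan_SAP_V6_kmeans_wt03_RL_wt05_CIDEr_25_test", 57
)
print(path, "exists" if path.is_file() else "is missing")
```

Run from the TCTSP/ directory, this reports whether the downloaded checkpoint is where the evaluation command is expected to look for it.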

Acknowledgement

Part of the code is borrowed from image-captioning. We thank the authors for releasing their code.
