Graph-Matching-Attention

This code provides a PyTorch implementation of our Graph Matching Attention method for Visual Question Answering, as described in Bilateral Cross-Modality Graph Matching Attention for Feature Fusion in Visual Question Answering.

TODO:

  1. GQA dataset processing
  2. Results table (which can be found in our paper)
  3. Release of trained models
  4. Other details

Model diagram

This is the first version of the Graph Matching Attention code.

Pipeline of Graph Matching Attention

Framework of Graph Matching Attention

Modules of Graph Matching Attention

Getting Started

Data

To download and unzip the required datasets, change to the VQAdata_process folder and run

cd VQAdata_process; python tools/download_data.py

Build question graph

The Visual Genome (VG) dataset serves as extra data for the VQA task. Download it, put the zip files into the VQAdata_process/zip/ folder, and unzip them before building the question graph for VG:

cd VQAdata_process; mkdir VG 
unzip zip/question_answers.json.zip -d ./VG
unzip zip/image_data.json.zip -d ./VG
unzip zip/imgids.zip -d ./VG/imgids

VG dataset

We use extra data from Visual Genome. The question and answer pairs can be downloaded from the links below.

To preprocess the text data and build the question graph, run the following command:

sh build_question_graph.sh
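
For intuition only, the toy sketch below represents a question graph as tokens (nodes) linked to nearby tokens within a small window. The real construction is performed by build_question_graph.sh and may use a different edge definition (for example, a dependency parse); the function name and window size here are illustrative assumptions.

# Illustrative sketch only: the real preprocessing is done by build_question_graph.sh.
# Nodes are question tokens; edges connect tokens within a small window
# (the actual pipeline may define edges differently, e.g. from a dependency parse).
import numpy as np

def toy_question_graph(question, window=2):
    """Return (tokens, adjacency matrix) for a single question string."""
    tokens = question.lower().rstrip("?").split()
    n = len(tokens)
    adj = np.zeros((n, n), dtype=np.float32)
    for i in range(n):
        for j in range(max(0, i - window), min(n, i + window + 1)):
            adj[i, j] = 1.0  # connect nearby tokens, including a self-loop
    return tokens, adj

tokens, adj = toy_question_graph("What color is the dog on the left?")
print(len(tokens), adj.shape)  # 8 (8, 8)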

Build visual graph

First, download the pretrained image features, put them into VQAdata_process/visual_100/ or VQAdata_process/visual_36/, and unzip them before building the visual graph for the VQA datasets. Note that this code supports both feature types.

sh build_visual_graph_100.sh
# sh build_visual_graph_36.sh
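
As a rough illustration of the visual graph over detected regions, the sketch below connects two regions when their bounding boxes overlap and adds self-loops. The actual construction is handled by build_visual_graph_100.sh / build_visual_graph_36.sh and may use a different edge criterion; the IoU-based rule here is an assumption made only for illustration.

# Illustrative sketch only: the build_visual_graph_*.sh scripts do the real work.
# Here two regions are linked when their bounding boxes overlap (IoU > 0).
import numpy as np

def box_iou(a, b):
    """IoU of two boxes in (x1, y1, x2, y2) format."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter + 1e-8)

def toy_visual_graph(boxes):
    """Adjacency matrix over object regions given their bounding boxes."""
    n = len(boxes)
    adj = np.eye(n, dtype=np.float32)              # self-loops
    for i in range(n):
        for j in range(i + 1, n):
            if box_iou(boxes[i], boxes[j]) > 0.0:  # overlapping regions
                adj[i, j] = adj[j, i] = 1.0
    return adj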

Pretrained features for VQA dataset

For ease of use, we use the pretrained features available for the entire MSCOCO dataset. Features are stored in tsv (tab-separated values) format and can be downloaded from the links below; a minimal reader sketch follows the list.

10 to 100 features per image (adaptive):

36 features per image (fixed):
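
The sketch below shows one way such a tsv file can be read, assuming the standard bottom-up-attention layout with the fields image_id, image_w, image_h, num_boxes, boxes, and features, where boxes and features are base64-encoded float32 arrays. This layout is an assumption about the downloaded files rather than something defined by this repository, so verify it before use.

# Minimal tsv reader sketch; the field names and base64/float32 encoding are
# assumed from the standard bottom-up-attention feature release.
import base64
import csv
import sys

import numpy as np

FIELDNAMES = ["image_id", "image_w", "image_h", "num_boxes", "boxes", "features"]
csv.field_size_limit(sys.maxsize)

def read_tsv(path):
    """Yield one dict per image with decoded boxes and region features."""
    with open(path, newline="") as f:
        reader = csv.DictReader(f, delimiter="\t", fieldnames=FIELDNAMES)
        for row in reader:
            num_boxes = int(row["num_boxes"])
            yield {
                "image_id": int(row["image_id"]),
                "boxes": np.frombuffer(
                    base64.b64decode(row["boxes"]), dtype=np.float32
                ).reshape(num_boxes, 4),
                "features": np.frombuffer(
                    base64.b64decode(row["features"]), dtype=np.float32
                ).reshape(num_boxes, -1),  # typically 2048-d per region
            }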

GQA dataset

Download the GQA dataset from https://cs.stanford.edu/people/dorarad/gqa/

Training

To train a model on the train set with our default parameters, run

python3 -u train.py --train --bsize 256 --data_type VQA --data_dir ./VQA --save_dir ./trained_model

and to train a model on the train and validation sets for evaluation on the test set, run

python3 -u train.py --trainval --bsize 256 --data_type VQA --data_dir ./VQA --save_dir ./trained_model

Evaluation

Models can be validated via

python3 -u train.py --eval --model_path ./trained_model/model.pth.tar --data_type VQA --data_dir ./VQA --bsize 256

and a json of results from the test set can be produced with

python3 -u train.py --test --model_path ./trained_model/model.pth.tar --data_type VQA --data_dir ./VQA --bsize 256
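
For reference, VQA test results are normally submitted as a json list of question_id/answer pairs, and the --test command above should produce a file in this shape. The field names below follow the standard VQA submission format and are an assumption, not copied from this repository's output code.

# Shape of a typical VQA result file (illustrative question ids and answers).
import json

results = [
    {"question_id": 1, "answer": "yes"},
    {"question_id": 2, "answer": "2"},
]

with open("vqa_test_results.json", "w") as f:
    json.dump(results, f)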

Citation

We hope our paper, data, and code can help in your research. If so, please cite:

@ARTICLE{Cao2022GMA,
  author={Cao, Jianjian and Qin, Xiameng and Zhao, Sanyuan and Shen, Jianbing},
  journal={IEEE Transactions on Neural Networks and Learning Systems}, 
  title={Bilateral Cross-Modality Graph Matching Attention for Feature Fusion in Visual Question Answering}, 
  year={2022},
  volume={},
  number={},
  pages={1-12},
  doi={10.1109/TNNLS.2021.3135655}}

Acknowledgements

Our code is based on the implementation of Learning Conditioned Graph Structures for Interpretable Visual Question Answering.

Contact Us

If you have any problems with this work, please feel free to reach out to us at [email protected].
