Relation-aware Graph Attention Network for Visual Question Answering

An unofficial TensorFlow 2.x implementation of the ICCV 2019 paper Relation-Aware Graph Attention Network for Visual Question Answering.

This is a rewrite of the PyTorch 1.0.1 based implementation available here. Some parts are still work in progress (the explicit relation encoder, the semantic relation encoder, and the BAN and MuTAN fusions). You can train the BUTD-based model.
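For orientation, below is a minimal TensorFlow 2.x sketch of single-head graph attention over bottom-up region features, the core operation behind the paper's relation encoders. The layer name, shapes, and scoring function are illustrative assumptions, not the actual modules in this repository:

```python
import tensorflow as tf

class GraphAttention(tf.keras.layers.Layer):
    """Illustrative single-head graph attention over object regions."""

    def __init__(self, dim):
        super().__init__()
        self.proj = tf.keras.layers.Dense(dim, use_bias=False)  # node projection
        self.score = tf.keras.layers.Dense(1)                   # pairwise attention score

    def call(self, v):
        # v: [batch, num_objects, feat_dim] bottom-up region features
        h = self.proj(v)                                   # [B, N, D]
        n = tf.shape(h)[1]
        hi = tf.repeat(tf.expand_dims(h, 2), n, axis=2)    # [B, N, N, D]
        hj = tf.repeat(tf.expand_dims(h, 1), n, axis=1)    # [B, N, N, D]
        # Score every object pair, normalize over neighbors, aggregate.
        e = tf.squeeze(self.score(tf.nn.leaky_relu(tf.concat([hi, hj], -1))), -1)
        a = tf.nn.softmax(e, axis=-1)                      # [B, N, N]
        return tf.matmul(a, h)                             # relation-aware features

# 36 regions with 2048-d features, as in typical bottom-up feature setups.
feats = tf.random.normal([2, 36, 2048])
out = GraphAttention(512)(feats)  # [2, 36, 512]
```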

Environment

  • TensorFlow 2.x

Getting started

1. Download data

source download.sh

The total data size is about 90 GB, and the dataset directory is structured as follows.

├── data
│   ├── Answers
│   ├── Bottom-up-features-adaptive
│   ├── Bottom-up-features-fixed
│   ├── cache
│   ├── cp_v2_annotations
│   ├── cp_v2_questions
│   ├── glove
│   ├── imgids
│   ├── Questions
│   ├── visualGenome
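After the download finishes, a quick sanity check (a hypothetical snippet; directory names are taken from the tree above) can confirm everything landed in place:

```python
from pathlib import Path

expected = ["Answers", "Bottom-up-features-adaptive", "Bottom-up-features-fixed",
            "cache", "cp_v2_annotations", "cp_v2_questions", "glove",
            "imgids", "Questions", "visualGenome"]
# Report any expected data/ subdirectories that are missing.
missing = [d for d in expected if not (Path("data") / d).is_dir()]
print("missing:", missing or "none")
```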

2. Train the model

python main.py --config config/butd_vqa.json

I trained the model on an A100 40 GB GPU with batch size 256; training takes about 39 GB of GPU memory.
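For reference, VQA models of this kind are typically trained with binary cross-entropy against the soft answer scores of VQA v2. The sketch below shows the general shape of such a training step; the model signature and optimizer choice are assumptions, not the code in main.py:

```python
import tensorflow as tf

loss_fn = tf.keras.losses.BinaryCrossentropy(from_logits=True)
optimizer = tf.keras.optimizers.Adamax(learning_rate=2e-3)

@tf.function
def train_step(model, visual_feats, question_tokens, answer_targets):
    # answer_targets: [batch, num_answers] soft scores in [0, 1]
    with tf.GradientTape() as tape:
        logits = model([visual_feats, question_tokens], training=True)
        loss = loss_fn(answer_targets, logits)
    grads = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(grads, model.trainable_variables))
    return loss
```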

3. Evaluate the model

python main.py --config config/butd_vqa.json  --mode eval --checkpoint <pretrained_model_path>
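The accuracies reported below follow the standard VQA metric, under which a predicted answer is credited min(#matching annotators / 3, 1). A minimal reimplementation for illustration (vqa_accuracy is a hypothetical helper, not part of this repository):

```python
def vqa_accuracy(predicted, human_answers):
    """predicted: answer string; human_answers: the 10 annotator answers."""
    matches = sum(a == predicted for a in human_answers)
    return min(matches / 3.0, 1.0)

# An answer given by at least 3 of the 10 annotators scores full credit.
print(vqa_accuracy("blue", ["blue"] * 4 + ["dark blue"] * 6))  # 1.0
```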

Result

| Model                 | Accuracy (BUTD fusion) |
| --------------------- | ---------------------- |
| Official PyTorch code | 63.99                  |
| TensorFlow 2.0 code   | 63.24                  |

You can check the training results in train.ipynb.
