Implementation of the paper "Bilinear Representation for Language-Based Image Editing using Conditional Generative Adversarial Networks" (ICASSP 2019).
- Python 2
- PyTorch 0.3.1
- Torchvision
- FastText
- NLTK
$ git clone https://github.com/facebookresearch/fastText.git
$ cd fastText
$ pip install .
Download the pretrained English word vectors. Unzip the archive and move wiki.en.bin to fasttext_models/
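Before training, it can help to confirm that the model file is where the later commands expect it and that it loads. The following is a minimal sketch, not part of this repository: the helper name `check_fasttext_model` is hypothetical, and it assumes the fastText Python bindings installed above (recent releases expose the module as `fasttext`, older checkouts as `fastText`).

```python
import os


def check_fasttext_model(path="./fasttext_models/wiki.en.bin"):
    """Load a fastText .bin model and return its vector dimension.

    Hypothetical helper for illustration; not part of this repository.
    """
    if not os.path.isfile(path):
        raise IOError("fastText model not found: %s" % path)
    # Imported lazily so the path check works even if fastText is missing.
    try:
        import fasttext
    except ImportError:
        import fastText as fasttext  # module name used by older checkouts
    model = fasttext.load_model(path)
    return model.get_dimension()
```

Calling `check_fasttext_model()` from the repository root raises an `IOError` if the file is missing, and otherwise returns the embedding dimension of the loaded model.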
- Oxford-102 flowers: images and captions
- Caltech-200 birds: images and captions
- Fashion Synthesis: download language_original.mat, ind.mat and G2.zip from here
Move all the downloaded files into datasets/ and extract them.
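A quick layout check can catch a misplaced archive before a long training run. The sketch below is an illustration only: the helper `missing_paths` and the `EXPECTED` table are hypothetical, and the listed paths are taken from the commands and file names in this README, so adjust them if your extracted archives use different directory names.

```python
import os

# Paths as they appear in this README; edit to match your local layout.
EXPECTED = {
    "flowers": ["flowers_icml"],
    "birds": ["CUB_200_2011/images", "cub_icml"],
    "fashion": ["language_original.mat", "ind.mat"],
}


def missing_paths(dataset, root="./datasets"):
    """Return the expected paths for `dataset` that do not exist under `root`."""
    return [
        os.path.join(root, rel)
        for rel in EXPECTED[dataset]
        if not os.path.exists(os.path.join(root, rel))
    ]
```

An empty list from `missing_paths("birds")` means both the image and caption directories used by the birds commands were found.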
Stage1 (Oxford-102 flowers): train visual-semantic embedding model.
python2 train_text_embedding.py \
--img_root ./datasets \
--caption_root ./datasets/flowers_icml \
--trainclasses_file trainvalclasses.txt \
--fasttext_model ./fasttext_models/wiki.en.bin \
--save_filename ./models/text_embedding_flowers.pth
Stage2: train BilinearGAN for Language-Based Image Editing (LBIE).
python2 train.py \
--img_root ./datasets \
--caption_root ./datasets/flowers_icml \
--trainclasses_file trainvalclasses.txt \
--fasttext_model ./fasttext_models/wiki.en.bin \
--text_embedding_model ./models/text_embedding_flowers.pth \
--save_filename ./models/flowers_res_lowrank_64.pth \
--use_vgg \
--fusing_method lowrank_BP
Stage1 (Caltech-200 birds): train visual-semantic embedding model.
python2 train_text_embedding.py \
--img_root ./datasets/CUB_200_2011/images \
--caption_root ./datasets/cub_icml \
--trainclasses_file trainvalclasses.txt \
--fasttext_model ./fasttext_models/wiki.en.bin \
--save_filename ./models/text_embedding_birds.pth
Stage2: train BilinearGAN for Language-Based Image Editing (LBIE).
python2 train.py \
--img_root ./datasets/CUB_200_2011/images \
--caption_root ./datasets/cub_icml \
--trainclasses_file trainvalclasses.txt \
--fasttext_model ./fasttext_models/wiki.en.bin \
--text_embedding_model ./models/text_embedding_birds.pth \
--save_filename ./models/birds_res_lowrank_64.pth \
--use_vgg \
--fusing_method lowrank_BP
Stage1 (Fashion Synthesis): preprocess the training data by running python2 process_fashion_data.py.
Stage2: train visual-semantic embedding model.
python2 train_text_embedding.py \
--img_root ./datasets \
--caption_root ./datasets/FashionGAN_txt \
--trainclasses_file trainclasses.txt \
--fasttext_model ./fasttext_models/wiki.en.bin \
--save_filename ./models/text_embedding_fashion.pth
Stage3: train BilinearGAN for Language-Based Image Editing (LBIE).
python2 train.py \
--img_root ./datasets \
--caption_root ./datasets/FashionGAN_txt \
--trainclasses_file trainclasses.txt \
--fasttext_model ./fasttext_models/wiki.en.bin \
--text_embedding_model ./models/text_embedding_fashion.pth \
--save_filename ./models/fashion_res_lowrank_64.pth \
--use_vgg \
--fusing_method lowrank_BP
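The three training recipes above differ only in their paths and class file, so they can be generated from a small table. This is a hedged sketch rather than a script shipped with the repository: `DATASETS` and `training_commands` are hypothetical names, and only flags that appear in the commands above are used. The Fashion Synthesis preprocessing step is not included.

```python
# Per-dataset settings copied from the training commands in this README.
DATASETS = {
    "flowers": {
        "img_root": "./datasets",
        "caption_root": "./datasets/flowers_icml",
        "classes": "trainvalclasses.txt",
    },
    "birds": {
        "img_root": "./datasets/CUB_200_2011/images",
        "caption_root": "./datasets/cub_icml",
        "classes": "trainvalclasses.txt",
    },
    "fashion": {
        "img_root": "./datasets",
        "caption_root": "./datasets/FashionGAN_txt",
        "classes": "trainclasses.txt",
    },
}


def training_commands(name, fusing_method="lowrank_BP"):
    """Return [embedding_cmd, gan_cmd] as argument lists for one dataset."""
    cfg = DATASETS[name]
    embedding = "./models/text_embedding_%s.pth" % name
    common = [
        "--img_root", cfg["img_root"],
        "--caption_root", cfg["caption_root"],
        "--trainclasses_file", cfg["classes"],
        "--fasttext_model", "./fasttext_models/wiki.en.bin",
    ]
    embed_cmd = (["python2", "train_text_embedding.py"] + common
                 + ["--save_filename", embedding])
    gan_cmd = ["python2", "train.py"] + common + [
        "--text_embedding_model", embedding,
        # File name kept as in the README, whatever the fusing method.
        "--save_filename", "./models/%s_res_lowrank_64.pth" % name,
        "--use_vgg",
        "--fusing_method", fusing_method,
    ]
    return [embed_cmd, gan_cmd]
```

Each returned list can be passed to `subprocess.check_call` in order, since the second command reads the embedding model written by the first.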
You can modify --fusing_method to train the model with a different fusing method: lowrank_BP, FiLM, or concat (the default).
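The option names suggest three ways of combining image features with the text embedding: concatenation, FiLM (feature-wise linear modulation), and low-rank bilinear pooling. The toy sketch below illustrates each on plain Python lists. It is not the repository's PyTorch code, which presumably operates on convolutional feature maps, and every function and weight name here is hypothetical.

```python
def matvec(W, x):
    """Multiply matrix W (a list of rows) by vector x."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def fuse_concat(img, txt):
    """Concatenation: stack the two feature vectors end to end."""
    return img + txt


def fuse_film(img, txt, W_gamma, W_beta):
    """FiLM: the text predicts a per-feature scale and shift for the image."""
    gamma = matvec(W_gamma, txt)
    beta = matvec(W_beta, txt)
    return [g * f + b for g, f, b in zip(gamma, img, beta)]


def fuse_lowrank_bilinear(img, txt, U, V, P):
    """Low-rank bilinear pooling: project both inputs to a shared rank-r
    space, multiply element-wise, then project to the output size."""
    hu = matvec(U, img)
    hv = matvec(V, txt)
    joint = [a * b for a, b in zip(hu, hv)]
    return matvec(P, joint)
```

Concatenation leaves any interaction between the two inputs to later layers, FiLM lets the text rescale each image feature, and the bilinear form models pairwise products between image and text features, with the rank of U and V bounding its cost.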
- Oxford-102 flowers
python2 test.py \
--img_root ./test/flowers \
--text_file ./test/text_flowers.txt \
--fasttext_model ./fasttext_models/wiki.en.bin \
--text_embedding_model ./models/text_embedding_flowers.pth \
--generator_model ./models/flowers_res_lowrank_64.pth \
--output_root ./test/result_flowers \
--use_vgg \
--fusing_method lowrank_BP
- Caltech-200 birds
python2 test.py \
--img_root ./test/birds \
--text_file ./test/text_birds.txt \
--fasttext_model ./fasttext_models/wiki.en.bin \
--text_embedding_model ./models/text_embedding_birds.pth \
--generator_model ./models/birds_res_lowrank_64.pth \
--output_root ./test/result_birds \
--use_vgg \
--fusing_method lowrank_BP
- Fashion Synthesis
python2 test.py \
--img_root ./test/fashion \
--text_file ./test/text_fashion.txt \
--fasttext_model ./fasttext_models/wiki.en.bin \
--text_embedding_model ./models/text_embedding_fashion.pth \
--generator_model ./models/fashion_res_lowrank_64.pth \
--output_root ./test/result_fashion \
--use_vgg \
--fusing_method lowrank_BP