FigGen @ICLR 2023

Paper: arXiv:2306.00800

Juan A. Rodríguez, David Vázquez, Issam Laradji, Marco Pedersoli, Pau Rodríguez

ServiceNow Research, Montréal, Canada

ÉTS Montreal, University of Québec


FigGen is a latent diffusion model that generates scientific figures conditioned on text from the corresponding paper (text-to-figure). We use OCR-VQGAN to project scientific figures (images) into a latent representation, and train a latent diffusion model as the generator in that space. A BERT transformer is trained jointly to learn the text embeddings that condition the generation.

This code is adapted from Latent Diffusion at CompVis/stable-diffusion.
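Conceptually, training optimizes the standard latent-diffusion noise-prediction objective, with caption embeddings as conditioning. The sketch below illustrates that objective with toy stand-ins; it is not the repo's actual code, and all module sizes and names are illustrative only:

# Illustrative sketch only: toy stand-ins for the OCR-VQGAN latents,
# the jointly trained BERT text encoder, and the conditional denoiser.
import torch
import torch.nn as nn

class TinyDenoiser(nn.Module):
    # Stand-in for the conditional U-Net (the real model conditions via cross-attention).
    def __init__(self, latent_dim=16, cond_dim=32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(latent_dim + cond_dim + 1, 64), nn.SiLU(),
            nn.Linear(64, latent_dim))
    def forward(self, z_t, t, cond):
        return self.net(torch.cat([z_t, t[:, None].float(), cond], dim=-1))

T, latent_dim, cond_dim = 1000, 16, 32
denoiser = TinyDenoiser(latent_dim, cond_dim)
text_encoder = nn.Embedding(1000, cond_dim)        # stand-in for the BERT text encoder
betas = torch.linspace(1e-4, 0.02, T)              # linear noise schedule
alpha_bar = torch.cumprod(1.0 - betas, dim=0)

z0 = torch.randn(8, latent_dim)                    # pretend figure latents from OCR-VQGAN
cond = text_encoder(torch.randint(0, 1000, (8,)))  # pretend caption embeddings
t = torch.randint(0, T, (8,))                      # random diffusion timesteps
eps = torch.randn_like(z0)
z_t = alpha_bar[t].sqrt()[:, None] * z0 + (1.0 - alpha_bar[t]).sqrt()[:, None] * eps
loss = nn.functional.mse_loss(denoiser(z_t, t, cond), eps)  # epsilon-prediction loss
loss.backward()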

(Figure: qualitative text-to-figure generation results)

Abstract

The generative modeling landscape has experienced tremendous growth in recent years, particularly in generating natural images and art. Recent techniques have shown impressive potential in creating complex visual compositions while delivering remarkable realism and quality. However, state-of-the-art methods have focused on the narrow domain of natural images, while other distributions remain unexplored. In this paper, we introduce the problem of text-to-figure generation, that is, creating scientific figures of papers from text descriptions. We present FigGen, a diffusion-based approach for text-to-figure, as well as the main challenges of the proposed task. Code and models are available in this repository.

Installation

Create the figure-diffusion conda environment, activate it, and install the package in editable mode:

conda env create -f environment.yaml
conda activate figure-diffusion
pip install -e .

Download data and models

  1. Download the Paper2Fig100k dataset from Zenodo and extract it into a data folder.

  2. Download the trained models from HuggingFace and extract them into a models folder. You will need the image encoder and the diffusion model.

  3. Modify the config files in configs/figure-diffusion/fig-gen-{...}.yaml to point to the correct paths: set ckpt_path (in model.first_stage_config) and json_file (in data) to the corresponding paths, as sketched below.
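For reference, the relevant entries look roughly like the sketch below; the exact nesting and file names are assumptions (check the shipped config), and the paths should point to wherever you extracted the data and models:

model:
  first_stage_config:
    ckpt_path: models/ocr-vqgan.ckpt     # image encoder checkpoint (file name is hypothetical)
data:
  json_file: data/paper2fig100k.json     # dataset annotations (file name is hypothetical)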

Training

To train the latent diffusion model from scratch, run the following command:

python main.py --config configs/figure-diffusion/fig-gen-{...}.yaml 

Inference
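
To sample figures from a trained model, the upstream latent-diffusion txt2img entry point can serve as a starting point. The command below is a sketch based on the upstream CompVis script; the script name, checkpoint path, and flags are assumptions and may differ in this repository:

python scripts/txt2img.py --config configs/figure-diffusion/fig-gen-{...}.yaml --ckpt models/figgen.ckpt --prompt "text description of the figure" --ddim_steps 50

Check the repository's scripts folder for the actual entry point and its arguments.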

Results

Some qualitative results of our model. For each example we show the text description of the figure, the generated figure, and the ground-truth figure. See the paper for more results.

(Figures: qualitative results)

Todo

  • Automatically download the Paper2Fig100k dataset (from Zenodo) and the trained models (from HuggingFace)

Related work

High-Resolution Image Synthesis with Latent Diffusion Models by Rombach et al., CVPR 2022 (oral).

OCR-VQGAN: Taming Text-within-Image Generation by Rodríguez et al., WACV 2023.


Citation

If you use this code, please cite the following paper:

@article{rodriguez2023figgen,
  title={FigGen: Text to Scientific Figure Generation},
  author={Rodriguez, Juan A and Vazquez, David and Laradji, Issam and Pedersoli, Marco and Rodriguez, Pau},
  journal={arXiv preprint arXiv:2306.00800},
  year={2023}
}

Contact

Juan A. Rodríguez ([email protected]).
