Replicating AutoTables

Replication code for the paper "Auto-Tables: Synthesizing Multi-Step Transformations to Relationalize Tables without Using Examples".

Overview

AutoTables is an algorithm that automatically transforms tables from a non-relational form into a relational one.

In this repo, we replicate only the transformation predictor, a 7-class classifier over the operators explode, ffill, pivot, stack, subtitle, transpose, and wide_to_long. For simplicity, we exclude multi-step transformations and do not predict the arguments of the transformation functions.
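As a minimal sketch of the prediction target, the classifier's output can be read as one score per operator, with the highest-scoring class taken as the predicted transformation. The class order below (the seven operators in the order listed above) and the helper name `decode_prediction` are assumptions for illustration, not the repository's actual label encoding.

```python
# Hypothetical label order: the seven operators in the order listed above.
OPERATORS = ["explode", "ffill", "pivot", "stack", "subtitle", "transpose", "wide_to_long"]
LABEL_TO_ID = {name: i for i, name in enumerate(OPERATORS)}


def decode_prediction(scores):
    """Return the operator whose score is highest (argmax over the 7 classes)."""
    if len(scores) != len(OPERATORS):
        raise ValueError(f"expected {len(OPERATORS)} scores, got {len(scores)}")
    best = max(range(len(scores)), key=lambda i: scores[i])
    return OPERATORS[best]


# The fifth score is the largest, so the predicted operator is "subtitle".
print(decode_prediction([0.1, 0.3, -1.2, 0.0, 2.5, 0.4, -0.7]))  # prints: subtitle
```

Because arguments are not predicted, the output is just an operator name; choosing, say, which column to ffill is left outside this classifier.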

The original model was trained on more than 100K tables, but that training data was not released. We therefore synthesized our own data using "inverse operators", resulting in over 1K tables.
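The idea behind "inverse operators" can be illustrated with ffill: starting from a relational table, blanking out values that repeat the cell above yields a non-relational input (as if cells had been merged), whose correct fix is ffill; applying ffill then recovers the original table. The sketch below is an assumption-laden illustration, not the repository's generation code: tables are plain lists of rows, blank cells are empty strings, the function names are hypothetical, and the blanked column is assumed to contain no genuinely empty cells.

```python
def inverse_ffill(rows, col):
    """Blank out cells in `col` that repeat the value directly above,
    mimicking merged cells; the resulting table needs `ffill`."""
    out = [list(r) for r in rows]
    for i in range(1, len(rows)):
        if rows[i][col] == rows[i - 1][col]:
            out[i][col] = ""
    return out


def ffill(rows, col):
    """Fill empty cells in `col` with the last non-empty value above."""
    out = [list(r) for r in rows]
    last = ""
    for r in out:
        if r[col] == "":
            r[col] = last
        else:
            last = r[col]
    return out


gt = [["A", "2019", "10"], ["A", "2020", "12"], ["B", "2019", "7"], ["B", "2020", "9"]]
data = inverse_ffill(gt, col=0)   # non-relational input shown to the model
example = (data, "ffill", gt)     # (input table, label, ground truth)
print(data)
assert ffill(data, col=0) == gt   # the forward operator undoes the inverse one
```

Each synthesized example therefore comes with its label for free: the name of the operator whose inverse produced it.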

A full technical report, including evaluation results, will be added later.

Install

pip install -r requirements.txt

Important notice: our training code requires a GPU, so your PyTorch build must match the CUDA version on your machine. The PyTorch version specified in requirements.txt was compiled with CUDA 11.8.
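Before training, it can help to check which CUDA version your installed PyTorch was built with and whether it can see a GPU. This is a small sketch using standard PyTorch attributes (`torch.version.cuda`, `torch.cuda.is_available`, `torch.cuda.get_device_name`); the helper name `describe_torch_setup` is hypothetical, and the default `"11.8"` simply mirrors the note above.

```python
def describe_torch_setup(expected_cuda="11.8"):
    """Summarize whether the installed PyTorch build can use the local GPU.

    `expected_cuda` defaults to the CUDA version named in this README.
    """
    try:
        import torch
    except ImportError:
        return "PyTorch is not installed; run `pip install -r requirements.txt`."

    built_with = torch.version.cuda  # None for CPU-only builds
    if built_with is None:
        return f"PyTorch {torch.__version__} is a CPU-only build; training needs a CUDA build."
    if not torch.cuda.is_available():
        return (f"PyTorch {torch.__version__} was built with CUDA {built_with}, "
                "but no usable GPU/driver was found.")
    note = "" if built_with == expected_cuda else f" (README expects CUDA {expected_cuda})"
    return (f"PyTorch {torch.__version__} built with CUDA {built_with}{note}; "
            f"GPU: {torch.cuda.get_device_name(0)}")


print(describe_torch_setup())
```

If this reports a CPU-only build or no usable GPU, you can still evaluate the final model on CPU as described under Evaluation.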

Train

Run

python -m src.train \
    --outdir logs/result \
    --batch_size 8 \
    --epochs 10 \
    --device cuda

Evaluation

After training, run:

python -m src.eval \
    logs/result/model.pth \
    --device cuda

To evaluate our final model on CPU instead, run:

python -m src.eval \
    logs/final/model.pth \
    --batch_size 4 \
    --device cpu
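If you move between GPU and CPU machines, a small helper can choose the `--device` value automatically. This sketch assumes only what the commands above show: that `src.eval` takes the model path plus the `--batch_size` and `--device` flags and is run from the repository root. The helper name `eval_command` is hypothetical.

```python
import sys


def eval_command(model_path, batch_size=None):
    """Build the evaluation command shown above, using --device cuda when a
    CUDA-enabled PyTorch can see a GPU and --device cpu otherwise."""
    try:
        import torch
        device = "cuda" if torch.cuda.is_available() else "cpu"
    except ImportError:
        device = "cpu"
    cmd = [sys.executable, "-m", "src.eval", model_path]
    if batch_size is not None:
        cmd += ["--batch_size", str(batch_size)]
    cmd += ["--device", device]
    return cmd


cmd = eval_command("logs/final/model.pth", batch_size=4)
print(" ".join(cmd))
# To actually run it from the repository root:
# import subprocess; subprocess.run(cmd, check=True)
```

Using `sys.executable` keeps the command on the same Python interpreter (and virtual environment) that built it.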

Datasets

We generated the datasets by applying inverse operations to relational tables, turning them into non-relational tables. We started from the tables in the AutoTable Benchmark dataset and augmented them to enlarge the training set.
Operations we used:

stack
wide_to_long
transpose
pivot
explode
ffill
subtitle
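To make the list above concrete, here are simplified guesses at what four of these inverse operators might look like (pivot and wide_to_long are omitted for brevity). This is a sketch, not the repository's generation code: tables are lists of rows with the first row as the header, and every function name and argument is hypothetical. Each function turns a relational table into a non-relational one whose correct fix is the named operator.

```python
def inverse_transpose(rows):
    """Swap rows and columns; the fix is `transpose`."""
    return [list(col) for col in zip(*rows)]


def inverse_explode(rows, list_col, sep=", "):
    """Collapse rows that agree on every other column into one row whose
    `list_col` cell joins their values; the fix is `explode`."""
    header, body = rows[0], rows[1:]
    groups = {}  # dicts keep insertion order, so row order is preserved
    for r in body:
        key = tuple(v for j, v in enumerate(r) if j != list_col)
        groups.setdefault(key, []).append(r[list_col])
    out = [list(header)]
    for key, values in groups.items():
        row = list(key)
        row.insert(list_col, sep.join(values))
        out.append(row)
    return out


def inverse_subtitle(rows, group_col):
    """Drop `group_col` and emit its value as a subtitle row whenever it
    changes; the fix is `subtitle`."""
    header, body = rows[0], rows[1:]
    out = [[h for j, h in enumerate(header) if j != group_col]]
    width = len(out[0])
    current = object()  # sentinel that never equals a cell value
    for r in body:
        if r[group_col] != current:
            current = r[group_col]
            out.append([current] + [""] * (width - 1))
        out.append([v for j, v in enumerate(r) if j != group_col])
    return out


def inverse_stack(rows, id_col, var_col, val_col):
    """Spread a long (id, variable, value) table into one column per
    variable; the fix is `stack`."""
    body = rows[1:]
    variables = list(dict.fromkeys(r[var_col] for r in body))
    wide = {}
    for r in body:
        wide.setdefault(r[id_col], {})[r[var_col]] = r[val_col]
    out = [[rows[0][id_col]] + variables]
    for key, cells in wide.items():
        out.append([key] + [cells.get(v, "") for v in variables])
    return out


gt = [["team", "player", "goals"], ["A", "Ann", "3"], ["A", "Bob", "1"], ["B", "Cy", "2"]]
print(inverse_transpose(gt))
print(inverse_subtitle(gt, group_col=0))
print(inverse_explode([["team", "player"], ["A", "Ann"], ["A", "Bob"], ["B", "Cy"]], list_col=1))
long_table = [["country", "year", "gdp"], ["X", "2019", "1.0"], ["X", "2020", "1.1"], ["Y", "2019", "2.0"]]
print(inverse_stack(long_table, id_col=0, var_col=1, val_col=2))
```

Missing combinations in `inverse_stack` are left as empty cells, which is one simple way to keep the wide table rectangular.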

Directory Structure:

.
+-- Data
|   +-- operation1
|       +-- train
|           +-- Folder(x)
|               +-- data.csv (input)
|               +-- gt.csv (output)
|           +-- Folder(x+1)
|               +-- data.csv (input)
|               +-- gt.csv (output)
|           .
|           .
|           .
|       +-- test
|           +-- Folder(x)
|               +-- data.csv (input)
|               +-- gt.csv (output)
|           +-- Folder(x+1)
|               +-- data.csv (input)
|               +-- gt.csv (output)
|           .
|           .
|           .
|   +-- operation2
|       +-- train
|           +-- Folder(x)
|               +-- data.csv (input)
|               +-- gt.csv (output)
|           +-- Folder(x+1)
|               +-- data.csv (input)
|               +-- gt.csv (output)
|           .
|           .
|           .
|       +-- test
|           +-- Folder(x)
|               +-- data.csv (input)
|               +-- gt.csv (output)
|           +-- Folder(x+1)
|               +-- data.csv (input)
|               +-- gt.csv (output)
|           .
|           .
|           .
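A loader for this layout only needs to walk three levels of folders. The sketch below assumes that each operation folder (operation1, operation2, ... above) is named after its operator, for example `ffill`, so the folder name can serve as the class label; the function name `iter_examples` and the demo folder names are hypothetical.

```python
import tempfile
from collections import Counter
from pathlib import Path


def iter_examples(root="Data", split="train"):
    """Yield (operation, data_csv, gt_csv) for every example folder under
    <root>/<operation>/<split>/<folder>/ that contains both CSV files."""
    root = Path(root)
    for op_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        split_dir = op_dir / split
        if not split_dir.is_dir():
            continue
        for ex_dir in sorted(p for p in split_dir.iterdir() if p.is_dir()):
            data_csv, gt_csv = ex_dir / "data.csv", ex_dir / "gt.csv"
            if data_csv.is_file() and gt_csv.is_file():
                yield op_dir.name, data_csv, gt_csv


# Demo on a throwaway copy of the layout (folder names here are made up).
with tempfile.TemporaryDirectory() as tmp:
    for op, folder in [("ffill", "0"), ("ffill", "1"), ("transpose", "0")]:
        ex = Path(tmp) / op / "train" / folder
        ex.mkdir(parents=True)
        (ex / "data.csv").write_text("a,b\n")
        (ex / "gt.csv").write_text("a,b\n")
    demo_counts = Counter(op for op, _, _ in iter_examples(tmp, "train"))
print(demo_counts)
```

On the real dataset, `Counter(op for op, _, _ in iter_examples("Data", "test"))` would give a quick per-operator count of test examples.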

Repository: npnkhoi/autotables-replicate