The official PyTorch implementation of LORE-TSR. LORE performs table structure recognition (TSR) end to end by modeling TSR as logical location regression. The model streamlines the TSR pipeline into a key-point-based, detector-like framework. LORE-TSR is efficient and performs well in this implementation, which may be useful for future TSR models.
conda create --name Lore python=3.7
conda activate Lore
pip install -r requirements.txt
git clone https://github.com/cocodataset/cocoapi.git
cd cocoapi/PythonAPI
pip install Cython
make
python setup.py install --user
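After the installation commands above, a quick check that the expected packages are visible to the active environment can save debugging time later. The following is a minimal sketch, not part of the repository: the function name `check_modules` is hypothetical, and the list assumes that `torch` is provided by `requirements.txt` or by the CUDA-specific install. `pycocotools` is the module name that the cocoapi PythonAPI build installs.

```python
import importlib.util
import sys

# Modules the installation steps are expected to provide (assumed list).
REQUIRED_MODULES = ("torch", "Cython", "pycocotools")


def check_modules(names=REQUIRED_MODULES):
    """Return {module_name: bool} telling whether each module can be found.

    find_spec only locates the module; it does not import it.
    """
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    print("Python", sys.version.split()[0])
    for name, found in check_modules().items():
        print(f"{name:12s} {'ok' if found else 'MISSING'}")
```

Run it inside the activated `Lore` environment; any line marked `MISSING` points at the install step to repeat.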
If you would like to use the DLA backbone, an environment based on CUDA 10.1 is strongly recommended.
First, install CUDA 10.1 (from https://developer.nvidia.com/cuda-toolkit-archive). For example, on Linux x86_64 with Ubuntu 18.04:
wget https://developer.download.nvidia.com/compute/cuda/10.1/Prod/local_installers/cuda_10.1.243_418.87.00_linux.run
sudo sh cuda_10.1.243_418.87.00_linux.run
# Set environment variables
export CUDA_HOME='your cuda-10.1 path'
export LD_LIBRARY_PATH=$CUDA_HOME/lib64:$LD_LIBRARY_PATH
pip install torch==1.6.0+cu101 -f https://download.pytorch.org/whl/torch_stable.html
pip install Cython
chmod +x *.sh
cd src/lib/models/network/DCNv2
./make.sh
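The DCNv2 build depends on the environment variables set above, so it may be worth verifying them before running `make.sh`. The sketch below is an illustration under assumptions: the helper name `check_cuda_env` is hypothetical, and it only checks that `CUDA_HOME` is set, that its `lib64` directory is on `LD_LIBRARY_PATH`, and that `nvcc` exists under `bin` (the standard CUDA toolkit layout). It does not verify the CUDA version.

```python
import os


def check_cuda_env(environ=None):
    """Return a list of problems; an empty list means the basics look fine."""
    env = os.environ if environ is None else environ
    problems = []

    cuda_home = env.get("CUDA_HOME", "")
    if not cuda_home:
        problems.append("CUDA_HOME is not set")
        return problems

    # LD_LIBRARY_PATH should contain $CUDA_HOME/lib64.
    lib64 = os.path.normpath(os.path.join(cuda_home, "lib64"))
    ld_paths = [
        os.path.normpath(p)
        for p in env.get("LD_LIBRARY_PATH", "").split(os.pathsep)
        if p
    ]
    if lib64 not in ld_paths:
        problems.append(f"{lib64} is not on LD_LIBRARY_PATH")

    # The compiler used to build CUDA extensions lives under bin/.
    if not os.path.isfile(os.path.join(cuda_home, "bin", "nvcc")):
        problems.append(f"nvcc not found under {cuda_home}/bin")

    return problems


if __name__ == "__main__":
    for line in check_cuda_env() or ["CUDA environment looks consistent"]:
        print(line)
```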
Available model weights (the backbone is listed per checkpoint):
| Name | Backbone | Regressor Arch. | Image Size | Checkpoint |
|---|---|---|---|---|
| ckpt_wtw | DLA-34 | 4+4 | 1024 | Trained on WTW |
| ckpt_ptn | DLA-34 | 3+3 | 512 | Trained on PubTabNet |
| ckpt_wireless | ResNet-18 | 4+4 | 768 | Trained on Wireless Tables* |
*This model is pretrained on a combination of SciTSR, PubTabNet, and a set of Chinese tables. Remember to add --upper_left when running the demo with this model, since it was trained with a different image preprocessing pipeline.
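To keep the per-checkpoint settings from the table in one place, they can be recorded as a small lookup structure. This is only a sketch: the dictionary layout and the helper `extra_demo_flags` are hypothetical and do not exist in the repository. The values are copied from the table, and `--upper_left` is the only flag handled because it is the only checkpoint-specific flag mentioned here.

```python
# Values copied from the checkpoint table; the structure itself is illustrative.
CHECKPOINTS = {
    "ckpt_wtw": {"backbone": "DLA-34", "regressor": "4+4", "image_size": 1024},
    "ckpt_ptn": {"backbone": "DLA-34", "regressor": "3+3", "image_size": 512},
    "ckpt_wireless": {"backbone": "ResNet-18", "regressor": "4+4", "image_size": 768},
}

# Checkpoints trained with the different image preprocessing pipeline.
NEEDS_UPPER_LEFT = {"ckpt_wireless"}


def extra_demo_flags(checkpoint_name):
    """Return the extra command-line flags a checkpoint needs for the demo."""
    if checkpoint_name not in CHECKPOINTS:
        raise KeyError(f"unknown checkpoint: {checkpoint_name}")
    return ["--upper_left"] if checkpoint_name in NEEDS_UPPER_LEFT else []


if __name__ == "__main__":
    for name, cfg in CHECKPOINTS.items():
        print(name, cfg, extra_demo_flags(name))
```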
Another implementation with a pretrained checkpoint will be released on ModelScope; it is more convenient for inference and application.
Follow these steps to run LORE on wireless table images:
- Download the pretrained model ckpt_wireless
- Add the image files to test to ./input_imgs/wireless/
- Change parameters such as the model architecture, model path, and input/output directories
- Run the script:
cd src
bash scripts/infer/demo_wireless.sh
Follow these steps to run LORE on wired table images:
- Download the pretrained model ckpt_wtw
- Add the image files to test to ./input_imgs/wired/
- Change parameters such as the model architecture, model path, and input/output directories
- Run the script:
cd src
bash scripts/infer/demo_wired.sh
NOTICE:
For wired tables, LORE incorporates a parsing-and-grouping mechanism similar to that of Cycle-CenterNet. Set the --wiz_rev argument to activate this process at inference. It provides accurate detection results on wired tables but can slow down inference.
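Before running either demo script, it can help to confirm that the input directory actually contains images. The helper below is a hypothetical sketch, not something the repository provides, and the accepted extensions are an assumption; adjust them to whatever the demo script reads.

```python
from pathlib import Path

# Assumed set of image extensions; the demo script may accept others.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".bmp"}


def list_input_images(input_dir):
    """Return sorted image paths directly inside input_dir (non-recursive)."""
    root = Path(input_dir)
    if not root.is_dir():
        raise FileNotFoundError(f"input directory not found: {root}")
    return sorted(
        p for p in root.iterdir()
        if p.is_file() and p.suffix.lower() in IMAGE_EXTENSIONS
    )


if __name__ == "__main__":
    for folder in ("./input_imgs/wireless/", "./input_imgs/wired/"):
        try:
            print(folder, len(list_input_images(folder)), "image(s)")
        except FileNotFoundError as err:
            print(err)
```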
The labels should be converted to COCO format; the WTW dataset and a subset of the PubTabNet dataset are used as examples here. The dataset directory is organized as follows:
data
├── WTW
│   ├── images
│   └── json
│       ├── train.json
│       └── test.json
│
└── PTN
    ├── images
    └── json
        ├── train.json
        └── test.json
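The layout in the tree above can be checked programmatically before training or evaluation. This is a minimal sketch; the function name `missing_dataset_entries` is hypothetical, and the expected entries are taken directly from the tree.

```python
from pathlib import Path

# Relative paths each dataset folder is expected to contain.
EXPECTED_ENTRIES = ("images", "json/train.json", "json/test.json")


def missing_dataset_entries(data_root, dataset_name):
    """Return the expected paths that are absent under data_root/dataset_name."""
    base = Path(data_root) / dataset_name
    return [str(base / rel) for rel in EXPECTED_ENTRIES if not (base / rel).exists()]


if __name__ == "__main__":
    for name in ("WTW", "PTN"):
        missing = missing_dataset_entries("data", name)
        print(name, "ok" if not missing else f"missing: {missing}")
```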
We provide samples of COCO-like labels for WTW (COCO label link) and a subset of PubTabNet (COCO label link).
Images of the WTW dataset are available at WTW-Dataset, which provides the original dataset along with tools for converting it to COCO format. Images of the PubTabNet dataset are available at PubTabNet-Dataset.
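After converting labels, a light sanity check on the resulting JSON can catch broken files early. The sketch below assumes only the standard COCO top-level keys (`images`, `annotations`, `categories`) and the standard `id` / `image_id` fields; the COCO-like labels used by LORE may carry additional fields that this check ignores. The function name `check_coco_labels` is hypothetical.

```python
import json

# Top-level keys of the standard COCO annotation format.
REQUIRED_KEYS = ("images", "annotations", "categories")


def check_coco_labels(label_path):
    """Return a list of problems found in a COCO-style JSON label file."""
    with open(label_path, "r", encoding="utf-8") as f:
        data = json.load(f)

    problems = [f"missing top-level key: {k}" for k in REQUIRED_KEYS if k not in data]
    if problems:
        return problems

    # Every annotation must point at an image that exists in the file.
    image_ids = {img["id"] for img in data["images"]}
    for ann in data["annotations"]:
        if ann["image_id"] not in image_ids:
            problems.append(
                f"annotation {ann.get('id')} references unknown image {ann['image_id']}"
            )
    return problems
```

For example, `check_coco_labels("data/WTW/json/train.json")` returns an empty list when the file passes these basic checks.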
Follow these steps to train LORE on wireless table images:
- Organize the dataset as described above and put the label files in LORE-TSR/data/dataset_name/json/
- Change parameters such as the model architecture, dataset name, and image directory
- Run:
cd src
bash scripts/train/train_wireless.sh
Use the following commands to train LORE on the WTW dataset:
cd src
bash scripts/train/train_wired.sh
*We modified the original model to stabilize convergence and to make it easier to change the backbone: the learning weight in Eq. 2 is removed, and the features of cell centers are gathered from a conv head.
Taking PubTabNet as an example:
- Set the dataset name (--dataset_name) and the annotation path (--anno_path) in the demo script
- Run the demo on the test dataset:
cd src
bash scripts/demo/demo_test.sh
- Evaluate the model results (remember to change the directory of the model results):
bash eval.sh
This implementation is based on the CenterNet and DCNv2 repositories.
If you find this work useful, please cite:
LORE:
@article{Xing_2023_Lore,
author={Hangdi, Xing and Feiyu Gao and Rujiao Long and Jiajun Bu and Qi Zheng and Liangcheng Li and Cong Yao and Zhi Yu},
title={LORE: Logical Location Regression Network for Table Structure Recognition},
journal={arXiv preprint arXiv:2303.03730},
year={2023}
}
Cycle-CenterNet:
@InProceedings{Long_2021_ICCV,
author = {Rujiao, Long and Wen, Wang and Nan, Xue and Feiyu, Gao and Zhibo, Yang and Yongpan, Wang and Gui-Song, Xia},
title = {Parsing Table Structures in the Wild},
booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
month = {October},
year = {2021}
}
LORE-TSR is released under the terms of the Apache License, Version 2.0.
LORE-TSR is an algorithm for table structure recognition. The code and models herein, created by the authors from Alibaba, can only be used for research purposes.
Copyright (C) 1999-2023 Alibaba Group Holding Ltd.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.