We conducted the FUNSD EE experiment using the FUNSD data as preprocessed by LayoutLM. The original code can be found in this link. To run it, please follow the steps below:
- Move to `preprocess/funsd/`.
- Run `bash preprocess.sh`.
- Run `preprocess_2nd.py`. This script converts the data preprocessed by LayoutLM into the format this repo expects.
  Data will be created in `datasets/funsd/`.
- Run the command below:
CUDA_VISIBLE_DEVICES=0 python train.py --config=configs/finetune_funsd_ee_bies.yaml
- Evaluate the model
CUDA_VISIBLE_DEVICES=0 python evaluate.py --config=configs/finetune_funsd_ee_bies.yaml --pretrained_model_file=finetune_funsd_ee_bies__bros-base-uncased/checkpoints/epoch=99-last.pt
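In the evaluate command above, the checkpoint directory appears to be the config file's stem followed by `__bros-base-uncased`. The following is a minimal sketch that builds both command strings from a config path under that assumption. The helper `build_commands` and its parameters are hypothetical and not part of this repo, and the naming pattern is inferred from the command above rather than taken from the training code.

```python
from pathlib import Path


def build_commands(config_path, last_epoch, backbone="bros-base-uncased", gpu="0"):
    """Return (train_cmd, eval_cmd) strings for a fine-tuning config.

    Assumption: checkpoints are written to
    <config stem>__<backbone>/checkpoints/epoch=<last_epoch>-last.pt,
    as inferred from the evaluate command in this README.
    """
    stem = Path(config_path).stem  # e.g. "finetune_funsd_ee_bies"
    checkpoint = f"{stem}__{backbone}/checkpoints/epoch={last_epoch}-last.pt"
    prefix = f"CUDA_VISIBLE_DEVICES={gpu} python"
    train_cmd = f"{prefix} train.py --config={config_path}"
    eval_cmd = (
        f"{prefix} evaluate.py --config={config_path} "
        f"--pretrained_model_file={checkpoint}"
    )
    return train_cmd, eval_cmd


if __name__ == "__main__":
    train_cmd, eval_cmd = build_commands("configs/finetune_funsd_ee_bies.yaml", 99)
    print(train_cmd)
    print(eval_cmd)
```

If your run saves checkpoints under a different name, pass the actual path to `--pretrained_model_file` instead of relying on this pattern.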
- Move to `preprocess/funsd_spade/`.
- Run `preprocess.py`.
  Data will be created in `datasets/funsd_spade`.
- Run the command below:
CUDA_VISIBLE_DEVICES=0 python train.py --config=configs/finetune_funsd_ee_spade.yaml
- Evaluate the model
CUDA_VISIBLE_DEVICES=0 python evaluate.py --config=configs/finetune_funsd_ee_spade.yaml --pretrained_model_file=finetune_funsd_ee_spade__bros-base-uncased/checkpoints/epoch=99-last.pt
Same as above.
- Move to `preprocess/funsd_spade/`.
- Run `preprocess.py`.
  Data will be created in `datasets/funsd_spade`.
- Run the command below:
CUDA_VISIBLE_DEVICES=0 python train.py --config=configs/finetune_funsd_el_spade.yaml
- Evaluate the model
CUDA_VISIBLE_DEVICES=0 python evaluate.py --config=configs/finetune_funsd_el_spade.yaml --pretrained_model_file=finetune_funsd_el_spade__bros-base-uncased/checkpoints/epoch=99-last.pt
In the original SROIE task, semantic contents (Company, Date, Address, and Total price) are generated without explicit connection to the text blocks. To convert SROIE into an EE task, we developed SROIE* by matching ground-truth contents with text blocks. We also split the original training set into 526 training and 100 testing examples because ground truths were not provided for the original test set (they have since been released).
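To illustrate what matching ground-truth contents with text blocks can look like, here is a minimal sketch. It is not the procedure used to build SROIE*. It assumes a simple rule (a block receives a field's label when its normalized text is contained in that field's ground-truth string), and the function names and the sample receipt are made up for illustration.

```python
import re


def normalize(text):
    """Lowercase and collapse whitespace so spacing differences do not block a match."""
    return re.sub(r"\s+", " ", text.strip().lower())


def label_blocks(blocks, ground_truth):
    """Assign a field name (or "O" for no field) to each text block.

    Assumed rule, for illustration only: a block is labeled with the first
    field whose normalized ground-truth value contains the block's text.
    """
    labels = []
    for block in blocks:
        block_norm = normalize(block)
        label = "O"
        for field, value in ground_truth.items():
            if block_norm and block_norm in normalize(value):
                label = field
                break
        labels.append(label)
    return labels


if __name__ == "__main__":
    # Made-up receipt, not taken from the dataset.
    blocks = ["ACME SDN BHD", "12 JALAN EXAMPLE,", "50000 KUALA LUMPUR",
              "25/12/2018", "TOTAL", "9.00"]
    ground_truth = {
        "company": "ACME SDN BHD",
        "date": "25/12/2018",
        "address": "12 JALAN EXAMPLE, 50000 KUALA LUMPUR",
        "total": "9.00",
    }
    for block, label in zip(blocks, label_blocks(blocks, ground_truth)):
        print(f"{label:8s} {block}")
```

A containment rule like this is ambiguous when the same string appears in several places (for example, a total that also appears as a line-item price), so a real conversion needs additional disambiguation.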
- Download SROIE* (sroie.tar.gz)
- Extract `sroie.tar.gz` to `datasets/`. For the image files, you need to download them from the official website.
- Run the command below:
CUDA_VISIBLE_DEVICES=0 python train.py --config=configs/finetune_sroie_ee_bio.yaml
- Evaluate the model
CUDA_VISIBLE_DEVICES=0 python evaluate.py --config=configs/finetune_sroie_ee_bio.yaml --pretrained_model_file=finetune_sroie_ee_bio__bros-base-uncased/checkpoints/epoch=29-last.pt
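The extraction step in the list above can also be done from Python. This is a minimal sketch using the standard-library `tarfile` module. It assumes `sroie.tar.gz` has already been downloaded from a trusted source into the current directory, and the helper name `extract_archive` is hypothetical.

```python
import tarfile
from pathlib import Path


def extract_archive(archive_path, dest_dir="datasets"):
    """Extract a .tar.gz archive into dest_dir and list the files found there.

    Only use this on archives you trust: extractall() writes whatever
    member paths the archive contains.
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive_path, "r:gz") as archive:
        archive.extractall(dest)
    # Return paths relative to dest_dir so the result is easy to inspect.
    return sorted(p.relative_to(dest).as_posix() for p in dest.rglob("*") if p.is_file())


if __name__ == "__main__":
    archive = Path("sroie.tar.gz")
    if archive.exists():
        for name in extract_archive(archive):
            print(name)
    else:
        print("sroie.tar.gz not found; download it first.")
```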