conda env create -f environment.yml
- Create a dataset folder that will contain the train/test files and the train/test images (a directory-setup sketch follows this list).
- Download and store the train/test images at:
  - ./dataset/train_images
  - ./dataset/test_images
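For convenience, here is a minimal sketch of a hypothetical helper (not part of the repo) that creates this layout; the CSV filenames in the comment are assumptions:

```python
# create_dataset_dirs.py -- hypothetical helper, not part of the repo
from pathlib import Path

# Expected layout used by the scripts below:
#   ./dataset/train.csv, ./dataset/test.csv   (assumed filenames)
#   ./dataset/train_images/, ./dataset/test_images/
for sub in ("train_images", "test_images"):
    Path("dataset", sub).mkdir(parents=True, exist_ok=True)
```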
OCR using easyOCR
python easyocr_text_extract.py --start_idx <first_row_index> --end_idx <end_row_index>
EasyOCR uses a GPU to extract the text. Set the environment variable CUDA_VISIBLE_DEVICES
if you wish to use a GPU other than the default (GPU 0).
Modify the image and input/output file paths in the Python script as needed.
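As a reference, here is a minimal sketch of this kind of row-range OCR extraction. The EasyOCR calls are the library's real API, but the CSV path, column name, and output filename are assumptions and may not match the script's actual interface:

```python
import argparse
import os

import easyocr
import pandas as pd

parser = argparse.ArgumentParser()
parser.add_argument("--start_idx", type=int, required=True)
parser.add_argument("--end_idx", type=int, required=True)
args = parser.parse_args()

# GPU selection: export CUDA_VISIBLE_DEVICES before creating the reader,
# e.g. CUDA_VISIBLE_DEVICES=1 python easyocr_text_extract.py ...
reader = easyocr.Reader(["en"], gpu=True)

df = pd.read_csv("./dataset/train.csv")            # assumed input path
rows = df.iloc[args.start_idx:args.end_idx]

texts = []
for _, row in rows.iterrows():
    image_path = os.path.join("./dataset/train_images", row["image_name"])  # assumed column name
    # readtext returns a list of (bbox, text, confidence) tuples
    results = reader.readtext(image_path)
    texts.append(" ".join(text for _, text, _ in results))

rows = rows.assign(ocr_text=texts)
rows.to_csv(f"ocr_{args.start_idx}_{args.end_idx}.csv", index=False)  # assumed output path
```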
Zero Shot Inference using HuggingFaceM4/idefics2-8b
python zero_shot_idefice.py --start <start_index> --end <end_index> --input_csv <input_csv_path> --output_csv <output_csv_path>
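The script wraps a zero-shot generation call to the public HuggingFaceM4/idefics2-8b checkpoint through Transformers. Below is a minimal single-image sketch of that call; the prompt, image path, dtype, and generation settings are placeholders, not the script's actual configuration:

```python
import torch
from PIL import Image
from transformers import AutoProcessor, AutoModelForVision2Seq

checkpoint = "HuggingFaceM4/idefics2-8b"
processor = AutoProcessor.from_pretrained(checkpoint)
model = AutoModelForVision2Seq.from_pretrained(
    checkpoint, torch_dtype=torch.float16, device_map="auto"
)

image = Image.open("./dataset/test_images/example.jpg")   # assumed image path
question = "What text is written in this image?"          # assumed prompt

# Idefics2 expects a chat-style prompt with interleaved image/text turns
messages = [
    {"role": "user",
     "content": [{"type": "image"}, {"type": "text", "text": question}]}
]
prompt = processor.apply_chat_template(messages, add_generation_prompt=True)
inputs = processor(text=prompt, images=[image], return_tensors="pt").to(model.device)

generated = model.generate(**inputs, max_new_tokens=128)
print(processor.batch_decode(generated, skip_special_tokens=True)[0])
```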
Training
We used Hugging Face Transformers with Accelerate for distributed training on 4 V100 GPUs. Modify the number of GPUs in accelerate_config.yaml
as needed.
accelerate launch --config_file <config_file_path> train.py
There are default paths in train.py and the data collator for reading train.csv and train_images; change them to match your directory structure.
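The GPU count corresponds to the num_processes field of accelerate_config.yaml. For orientation, here is a minimal sketch of the standard Accelerate training pattern used under `accelerate launch`; the model, optimizer, dataloader, and checkpoint path are placeholders, and the repo's train.py may differ:

```python
from accelerate import Accelerator

def training_loop(model, optimizer, train_dataloader, num_epochs=1):
    # Process setup (number of GPUs, mixed precision, etc.) comes from the
    # config passed to `accelerate launch --config_file accelerate_config.yaml`
    accelerator = Accelerator()
    model, optimizer, train_dataloader = accelerator.prepare(model, optimizer, train_dataloader)

    model.train()
    for epoch in range(num_epochs):
        for batch in train_dataloader:
            outputs = model(**batch)        # assumes the collator returns model kwargs incl. labels
            loss = outputs.loss
            accelerator.backward(loss)      # replaces loss.backward() for distributed training
            optimizer.step()
            optimizer.zero_grad()

    # unwrap before saving so the checkpoint loads outside the distributed wrapper
    accelerator.wait_for_everyone()
    accelerator.unwrap_model(model).save_pretrained(
        "checkpoint",                       # placeholder output directory
        is_main_process=accelerator.is_main_process,
        save_function=accelerator.save,
    )
```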
Evaluation
python eval.py --start_idx <start_idx> --end_idx <end_idx>
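A minimal sketch of the row-range evaluation pattern follows, assuming eval.py compares a prediction column against a ground-truth column with exact match; the CSV path, column names, and metric are assumptions and may differ from the actual script:

```python
import argparse

import pandas as pd

parser = argparse.ArgumentParser()
parser.add_argument("--start_idx", type=int, required=True)
parser.add_argument("--end_idx", type=int, required=True)
args = parser.parse_args()

df = pd.read_csv("predictions.csv")            # assumed output of the inference step
rows = df.iloc[args.start_idx:args.end_idx]

# exact-match accuracy over the selected rows (assumed metric and column names)
matches = (rows["prediction"].astype(str).str.strip()
           == rows["ground_truth"].astype(str).str.strip())
print(f"Exact match on rows [{args.start_idx}, {args.end_idx}): {matches.mean():.4f}")
```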
Contributors
- Shreykumar Satapara
- Sayanta Adhikari
- Arkaprava Majumdar
- Rishabh Karnad