This is a fork of Microsoft's UniLM repository, aiming to assess the data efficiency of pre-trained language models when fine-tuned on document analysis tasks.
So far, this codebase is used to fine-tune and evaluate the LayoutLM (v1) model on the Scanned Receipts OCR and Information Extraction (SROIE) benchmark and to compare its extraction performance against two baseline models that do not leverage pre-training.
For further details, please refer to the inner README file, located in the layoutlm folder.
This project is licensed under the license found in the LICENSE file in the root directory of this source tree. Portions of the source code are based on the transformers project.
Microsoft Open Source Code of Conduct
For help or issues using this repository, please submit a GitHub issue.
For other communications, please contact Clément Sage ([email protected]) or Thibault Douzon ([email protected]).