This is a fork of Microsoft's UniLM repository, aiming to assess the data efficiency of pre-trained language models when fine-tuned on document analysis tasks.
So far, this codebase is used to fine-tune and evaluate the LayoutLM (v1) model on the Scanned Receipts OCR and Information Extraction (SROIE) benchmark and to compare its extraction performance against two baseline models that do not leverage pre-training.
For further details, please refer to the inner README file, located in the layoutlm folder.
This project is licensed under the license found in the LICENSE file in the root directory of this source tree. Portions of the source code are based on the transformers project.
Microsoft Open Source Code of Conduct
For help or issues using this repository, please submit a GitHub issue.
For other communications, please contact Clément Sage ([email protected]) or Thibault Douzon ([email protected]).