Data-efficiency of language models

This is a fork of Microsoft's UniLM repository that aims to assess the data efficiency of pre-trained language models when they are fine-tuned for document analysis tasks.

So far, this codebase is used to fine-tune and evaluate the LayoutLM (v1) model on the Scanned Receipts OCR and Information Extraction (SROIE) benchmark and to compare its extraction performance with two baseline models that do not leverage pre-training.
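To give a sense of the setup, below is a minimal sketch of loading LayoutLM v1 for token classification with the Hugging Face transformers library. It is illustrative only, not this repository's training code: the SROIE label set shown is an assumption, and the all-zero bounding boxes are placeholders.

```python
# Minimal sketch (not the repository's actual fine-tuning script): load
# LayoutLM v1 for token classification and run a forward pass.
import torch
from transformers import LayoutLMTokenizer, LayoutLMForTokenClassification

# Hypothetical SROIE entity labels in BIO format (company, date, address, total).
labels = ["O", "B-COMPANY", "I-COMPANY", "B-DATE", "I-DATE",
          "B-ADDRESS", "I-ADDRESS", "B-TOTAL", "I-TOTAL"]

tokenizer = LayoutLMTokenizer.from_pretrained("microsoft/layoutlm-base-uncased")
model = LayoutLMForTokenClassification.from_pretrained(
    "microsoft/layoutlm-base-uncased", num_labels=len(labels)
)

# Encode one OCR'd word. LayoutLM also expects a bounding box per token,
# with coordinates normalized to a 0-1000 scale; zeros are placeholders here.
encoding = tokenizer("TOTAL", return_tensors="pt")
bbox = torch.tensor([[[0, 0, 0, 0]] * encoding["input_ids"].shape[1]])

outputs = model(**encoding, bbox=bbox)
predicted_label_ids = outputs.logits.argmax(-1)
```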

For further details, please refer to the README file located in the layoutlm folder.

License

This project is licensed under the license found in the LICENSE file in the root directory of this source tree. Portions of the source code are based on the transformers project.

Microsoft Open Source Code of Conduct

Contact Information

For help or issues using this repository, please submit a GitHub issue.

For other communications, please contact Clément Sage ([email protected]) or Thibault Douzon ([email protected]).
