The Huggingface Trainer can be used with customized model structures. See Huggingface Transformers Trainer as a general PyTorch trainer for more detail.
The code is organized around the huggingface transformers Trainer, so it stays modular, clean, and easy to modify, and users get Trainer's logging utilities and straightforward multi-GPU distributed training for free.
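To make the "general PyTorch trainer" point concrete, below is a minimal sketch of training a plain `nn.Module` with Trainer. Everything here (`ToyRegressor`, `ToyDataset`, the hyperparameters) is hypothetical illustration, not code from this repository; the only contract Trainer imposes is that `forward()` returns its loss.

```python
import torch
from torch import nn
from transformers import Trainer, TrainingArguments

# Hypothetical model: any nn.Module works with Trainer as long as
# forward() returns a dict with a "loss" key (or a tuple with loss first).
class ToyRegressor(nn.Module):
    def __init__(self, in_dim=16):
        super().__init__()
        self.linear = nn.Linear(in_dim, 1)

    def forward(self, x, labels=None):
        preds = self.linear(x).squeeze(-1)
        loss = nn.functional.mse_loss(preds, labels) if labels is not None else None
        return {"loss": loss, "logits": preds}

# Hypothetical dataset: items are dicts whose keys match forward()'s
# argument names, so the default collator can batch them.
class ToyDataset(torch.utils.data.Dataset):
    def __len__(self):
        return 256

    def __getitem__(self, idx):
        x = torch.randn(16)
        return {"x": x, "labels": x.sum()}

args = TrainingArguments(
    output_dir="toy_out",            # hypothetical output path
    per_device_train_batch_size=32,
    num_train_epochs=1,
    logging_steps=10,
    report_to="none",                # skip wandb/tensorboard integrations
)
Trainer(model=ToyRegressor(), args=args, train_dataset=ToyDataset()).train()
```

Trainer will warn that the model is not a `PreTrainedModel`, but training, logging, and distributed launch all work the same way for a plain module.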
The major dependencies are huggingface transformers and torch. Some bert4torch code is imported, but those pieces are short and standalone, so they can easily be copied in if you don't want the extra package.
Why not use bert4torch directly? This repository pursues more standard huggingface transformers integration and cleaner code. That said, bert4torch implements many useful modules and tricks, so it remains a good reference.
The repository is laid out as follows:

- examples: Python scripts
- data: datasets
- pretrained_models: huggingface models
Most datasets are available here or from the sources listed below.
| Dataset | Usage | Download |
|---|---|---|
| People's Daily dataset | named entity recognition | china-people-daily-ner-corpus |
| Baidu relation extraction | relation extraction | BD_Knowledge_Extraction |
| Sentiment | sentiment classification | Sentiment |
| THUCNews | text classification, text generation | THUCNews |
| ATEC | text similarity | ATEC |
| BQ | text similarity | BQ |
| LCQMC | text similarity | LCQMC |
| PAWSX | text similarity | PAWSX |
| STS-B | text similarity | STS-B |
| CSL | text generation | CSL |
| THUCNews_sample | text classification | Bert-Chinese-Text-Classification-Pytorch |
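As a starting point for working with these downloads, here is a hedged sketch of reading the People's Daily NER corpus. It assumes the common CoNLL-style layout (one "character tag" pair per line, blank lines between sentences) and a file name like example.train; both the layout and the path are assumptions about the download, so adjust to whatever your copy actually contains.

```python
# Hedged sketch: assumes CoNLL-style "char tag" lines separated by blank
# lines, which is how china-people-daily-ner-corpus is commonly distributed.
def load_ner_file(path):
    sentences, chars, tags = [], [], []
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.rstrip("\n")
            if not line:                 # blank line closes a sentence
                if chars:
                    sentences.append((chars, tags))
                    chars, tags = [], []
                continue
            char, tag = line.split()     # e.g. "京 B-LOC"
            chars.append(char)
            tags.append(tag)
    if chars:                            # file may lack a trailing blank line
        sentences.append((chars, tags))
    return sentences

# Hypothetical path under the data/ directory described above.
train = load_ner_file("data/china-people-daily-ner-corpus/example.train")
print(len(train), train[0])
```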