
YAGPT (Yet Another GPT)

This is a simple GPT implementation in Python, based on the Russian version of GPT-2.

Dataset (dataset.ipynb)

The final dataset consists of 0.8M samples.

We used two large Russian text corpora: Yandex QA and the Diasum Dataset. We applied the following techniques to prepare the dataset for our model:

  • Form dialogues from sentences
  • Make each sample consist of three parts: context, prompt, and answer

Example

history: "Привет, как дела?"
speaker1: "Привет, все хорошо, а у тебя?"
speaker2: "Все хорошо, спасибо!"

In English

history: "Hi, how are you?"
speaker1: "Hi, I'm fine, and you?"
speaker2: "I'm fine, thanks!"
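To make the three-part structure concrete, here is a minimal sketch of how one training string could be assembled from context, prompt, and answer. The `build_sample` helper and the `<s>` / `</s>` separator tokens are illustrative assumptions, not the repository's actual preprocessing code or special tokens:

```python
def build_sample(history: list[str], prompt: str, answer: str) -> str:
    """Join dialogue history (context), the current prompt, and the
    target answer into a single text for language-model training.
    Separator tokens are hypothetical placeholders."""
    context = " ".join(history)
    return f"{context} <s> {prompt} </s> {answer}"

sample = build_sample(
    history=["Hi, how are you?"],
    prompt="Hi, I'm fine, and you?",
    answer="I'm fine, thanks!",
)
print(sample)
```

During training, the model would see the whole string and learn to continue the context and prompt with the answer.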

Training (train.ipynb)

This notebook implements a standard training pipeline. We used the HuggingFace transformers library to train our model.
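The notebook relies on HuggingFace's training utilities; the causal language-modeling objective they optimize can be sketched in plain PyTorch. The toy model and random token sequences below are purely illustrative stand-ins for the real GPT-2 model and tokenized dialogue samples:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
vocab_size, dim = 20, 16

# Toy next-token model: embedding -> linear head over the vocabulary.
# A real run would load a pretrained Russian GPT-2 instead.
model = nn.Sequential(nn.Embedding(vocab_size, dim), nn.Linear(dim, vocab_size))
opt = torch.optim.Adam(model.parameters(), lr=1e-2)
loss_fn = nn.CrossEntropyLoss()

# Fake token sequences standing in for tokenized dialogue samples.
data = torch.randint(0, vocab_size, (8, 10))
inputs, targets = data[:, :-1], data[:, 1:]  # predict each next token

first_loss = None
for step in range(200):
    logits = model(inputs)  # (batch, seq, vocab)
    loss = loss_fn(logits.reshape(-1, vocab_size), targets.reshape(-1))
    opt.zero_grad()
    loss.backward()
    opt.step()
    if first_loss is None:
        first_loss = loss.item()

print(f"loss: {first_loss:.3f} -> {loss.item():.3f}")
```

The HuggingFace `Trainer` wraps this same loop (batching, optimization, checkpointing) around a pretrained model and a tokenized dataset.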
