NLP projects

Transformer encoder from scratch

The repository contains a from-scratch implementation of the transformer encoder and its training on the emotion classification task using the emotion dataset. The bert-base-uncased tokenizer was used.
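
For orientation, below is a minimal sketch of a single encoder block in PyTorch: multi-head self-attention followed by a position-wise feed-forward network, each with a residual connection and layer normalization. The dimensions, dropout, and use of nn.MultiheadAttention are illustrative assumptions, not the repository's actual code, which additionally needs token/positional embeddings and a classification head.

```python
import torch
import torch.nn as nn

class EncoderBlock(nn.Module):
    """One transformer encoder block: self-attention -> add & norm -> FFN -> add & norm."""

    def __init__(self, d_model=768, n_heads=12, d_ff=3072, dropout=0.1):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, dropout=dropout, batch_first=True)
        self.ffn = nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)
        self.drop = nn.Dropout(dropout)

    def forward(self, x, pad_mask=None):
        # Self-attention sub-layer with residual connection and layer norm.
        attn_out, _ = self.attn(x, x, x, key_padding_mask=pad_mask)
        x = self.norm1(x + self.drop(attn_out))
        # Position-wise feed-forward sub-layer with residual connection and layer norm.
        return self.norm2(x + self.drop(self.ffn(x)))

x = torch.randn(2, 16, 768)        # (batch, seq_len, d_model)
print(EncoderBlock()(x).shape)     # torch.Size([2, 16, 768])
```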

Classification

The repository contains a comparison of models:

  • Logistic Regression and SVM ("classic" algorithms trained on DistilBERT embeddings),
  • LSTM and BiLSTM (custom models written in PyTorch, trained on fastText embeddings),
  • fine-tuned DistilBERT,

all trained on the emotion classification task using the emotion dataset (a minimal embedding-plus-classifier sketch follows).
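
Below is a minimal sketch of the "classic" pipeline, assuming frozen DistilBERT [CLS] embeddings fed into a scikit-learn Logistic Regression; the dataset id, pooling choice, and subset sizes are illustrative assumptions rather than the repository's exact setup.

```python
import numpy as np
import torch
from datasets import load_dataset
from sklearn.linear_model import LogisticRegression
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
encoder = AutoModel.from_pretrained("distilbert-base-uncased").eval()

def embed(texts):
    # Use the last hidden state of the [CLS] token as a fixed-size sentence embedding.
    batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        return encoder(**batch).last_hidden_state[:, 0].numpy()

ds = load_dataset("emotion")       # assumed dataset id (also published as dair-ai/emotion)
X_train = embed(ds["train"]["text"][:2000])     # small slice for illustration
y_train = np.array(ds["train"]["label"][:2000])
clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print(clf.score(embed(ds["test"]["text"][:500]), np.array(ds["test"]["label"][:500])))
```

An SVM fits the same embeddings by swapping LogisticRegression for sklearn.svm.SVC.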

Adapters

The repository contains a comparison of DistilBERT models trained on the emotion classification task (emotion dataset). The compared models are (a layer-freezing sketch follows the list):

  • DistilBERT with the first 6, 4, and 2 layers frozen,
  • DistilBERT trained with bottleneck adapters,
  • fully unfrozen DistilBERT.
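
As a rough sketch of the first variant, the snippet below freezes the embeddings and the first n of DistilBERT's six transformer layers before fine-tuning; the model id, label count, and helper name are illustrative assumptions. The bottleneck-adapter variant instead keeps the backbone frozen and trains only small bottleneck modules inserted into each layer (typically via the adapters / adapter-transformers library).

```python
from transformers import AutoModelForSequenceClassification

model = AutoModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased", num_labels=6)   # the emotion dataset has 6 classes

def freeze_first_layers(model, n):
    # Freeze the embeddings and the first n of DistilBERT's 6 transformer layers.
    for p in model.distilbert.embeddings.parameters():
        p.requires_grad = False
    for layer in model.distilbert.transformer.layer[:n]:
        for p in layer.parameters():
            p.requires_grad = False

freeze_first_layers(model, 4)
trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
print(f"trainable parameters: {trainable:,}")
```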

Named entity recognition

The repository contains dataset preprocessing and XLM-RoBERTa fine-tuning for the named entity recognition task on a subset ('de', 'fr', 'it', 'en') of the xtreme dataset. In addition, cross-lingual transfer has been examined.
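
Below is a minimal sketch of the preprocessing step, assuming the PAN-X (NER) portion of xtreme and an xlm-roberta-base token-classification head; the config name and the label-alignment strategy are assumptions rather than the repository's exact code.

```python
from datasets import load_dataset
from transformers import AutoModelForTokenClassification, AutoTokenizer

ds = load_dataset("xtreme", name="PAN-X.de")               # assumed config id; repeat per language
labels = ds["train"].features["ner_tags"].feature.names    # O, B-PER, I-PER, B-ORG, ...

tokenizer = AutoTokenizer.from_pretrained("xlm-roberta-base")
model = AutoModelForTokenClassification.from_pretrained(
    "xlm-roberta-base", num_labels=len(labels))

def tokenize_and_align(example):
    # Tokenize pre-split words; label only the first subword of each word (-100 elsewhere).
    enc = tokenizer(example["tokens"], truncation=True, is_split_into_words=True)
    prev, tags = None, []
    for wid in enc.word_ids():
        tags.append(-100 if wid is None or wid == prev else example["ner_tags"][wid])
        prev = wid
    enc["labels"] = tags
    return enc

encoded = ds.map(tokenize_and_align, remove_columns=ds["train"].column_names)
```

Cross-lingual transfer is then measured by fine-tuning on one language and evaluating the same model on the others without further training.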

Text-to-text generation

The repository contains fine-tuning of the distilled Pegasus model distill-pegasus-cnn-16-4 for abstractive dialogue summarization on the SAMSum dataset.
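
As an illustration of the model in use, the snippet below loads a distilled Pegasus checkpoint (assumed Hub id sshleifer/distill-pegasus-cnn-16-4) and summarizes a short SAMSum-style dialogue; the actual fine-tuning loop is omitted.

```python
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

ckpt = "sshleifer/distill-pegasus-cnn-16-4"    # assumed checkpoint id
tokenizer = AutoTokenizer.from_pretrained(ckpt)
model = AutoModelForSeq2SeqLM.from_pretrained(ckpt)

dialogue = (
    "Anna: Are we still on for lunch tomorrow?\n"
    "Tom: Yes, 12:30 at the usual place.\n"
    "Anna: Perfect, see you there!")

inputs = tokenizer(dialogue, truncation=True, return_tensors="pt")
summary_ids = model.generate(**inputs, num_beams=4, max_length=60)
print(tokenizer.decode(summary_ids[0], skip_special_tokens=True))
```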

Generation

The repository contains a short study of how decoding parameters (temperature, number of beams, top-k, top-p) affect the quality of text generated by the trained GPT-2 model.
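
Below is a minimal sketch of the kind of comparison involved, assuming the Hugging Face generate API and the stock gpt2 checkpoint rather than the repository's trained model.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
inputs = tokenizer("Natural language processing is", return_tensors="pt")

# Sampling: temperature sharpens/flattens the distribution; top-k / top-p truncate it.
sampled = model.generate(**inputs, do_sample=True, max_new_tokens=40,
                         temperature=0.8, top_k=50, top_p=0.9,
                         pad_token_id=tokenizer.eos_token_id)
# Beam search: deterministic, keeps the num_beams highest-scoring continuations.
beamed = model.generate(**inputs, do_sample=False, num_beams=5, max_new_tokens=40,
                        pad_token_id=tokenizer.eos_token_id)

for out in (sampled, beamed):
    print(tokenizer.decode(out[0], skip_special_tokens=True))
```

Temperature, top-k, and top-p only take effect when do_sample=True; beam search is deterministic and tends toward safer, more repetitive output.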

About

The repository contains basic NLP projects from my university NLP classes and from self-study.
