This repository hosts various implementations of the Transformer model introduced in the landmark paper "Attention Is All You Need" by Vaswani et al. The Transformer is designed for sequence-to-sequence tasks and relies on self-attention to model long-range dependencies in the input and output sequences. The repository serves as a collective resource for different flavors and adaptations of the Transformer, facilitating exploration and innovation in neural network architectures.
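As a quick illustration of the core idea, the sketch below implements scaled dot-product attention, softmax(QK^T / sqrt(d_k)) V, following the formulation in the paper. The function name, tensor shapes, and example dimensions are illustrative and not taken from any specific implementation in this repository.

```python
import math
import torch
import torch.nn.functional as F

def scaled_dot_product_attention(q, k, v, mask=None):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V."""
    d_k = q.size(-1)
    scores = q @ k.transpose(-2, -1) / math.sqrt(d_k)   # (..., seq_q, seq_k)
    if mask is not None:
        scores = scores.masked_fill(mask == 0, float("-inf"))
    weights = F.softmax(scores, dim=-1)                  # attention weights
    return weights @ v                                   # (..., seq_q, d_v)

# Self-attention example: batch of 2 sequences, 5 tokens each, model dimension 8
x = torch.randn(2, 5, 8)
out = scaled_dot_product_attention(x, x, x)
print(out.shape)  # torch.Size([2, 5, 8])
```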
The implementations can be extended with the following features to enhance their functionality and performance:
- Pre-trained models
- Tokenization
- Optimization
- Learning rate scheduling (a warmup-schedule sketch follows this list)
- Evaluation
- Inference
- Generation
- Fine-tuning
- Model saving/loading
- Model sharing
- Model serving
- Model conversion
- Model quantization
- Model compression
- Model distillation
- Model pruning
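For learning rate scheduling, one option is the warmup schedule used in the original paper, lrate = d_model^-0.5 · min(step^-0.5, step · warmup_steps^-1.5). Below is a minimal sketch using torch.optim.lr_scheduler.LambdaLR; the stand-in model, d_model, and warmup_steps values are placeholders.

```python
import torch

def transformer_lr(step, d_model=512, warmup_steps=4000):
    """Warmup schedule from 'Attention Is All You Need' (Section 5.3)."""
    step = max(step, 1)  # avoid division by zero on the first call
    return d_model ** -0.5 * min(step ** -0.5, step * warmup_steps ** -1.5)

model = torch.nn.Linear(512, 512)  # stand-in for a Transformer model
# base lr of 1.0 so LambdaLR uses transformer_lr's value directly
optimizer = torch.optim.Adam(model.parameters(), lr=1.0, betas=(0.9, 0.98), eps=1e-9)
scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda=transformer_lr)

for step in range(5):
    optimizer.step()
    scheduler.step()  # update the learning rate after each optimizer step
```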
The following libraries and tools can be used:

- The torchtext library for data processing
- The torch library for training
- The pytorch-lightning library for training
- The wandb library for logging
- tiktoken from OpenAI for tokenization (https://github.com/openai/tiktoken); see the example below
- SentencePiece from Google for tokenization (https://github.com/google/sentencepiece)
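As an example of the tokenization options, the sketch below uses tiktoken; the encoding name cl100k_base is just one of the encodings the library ships with and is chosen here for illustration.

```python
import tiktoken  # pip install tiktoken

enc = tiktoken.get_encoding("cl100k_base")  # one of tiktoken's built-in encodings
tokens = enc.encode("Attention is all you need.")
print(tokens)              # list of integer token ids
print(enc.decode(tokens))  # "Attention is all you need."
```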