# Is Attention All You Need?

This repo contains our experiments in researching and implementing alternatives to the attention mechanism, namely Mamba and xLSTM.

## Getting Started

**Note:** Training and inference require a CUDA installation (`nvcc` and related dependencies).

The steps to run this project are:

### 1. Set up the virtual environment

The project uses Anaconda to create the environment:

```bash
conda create --name <env> --file requirements.txt
```
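Once created, activate the environment before running any of the scripts below (`<env>` is whatever name you chose above):

```bash
conda activate <env>
```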

2. Running Demo

  • Models supported: attention, mamba, xlstm
  • Context can be any string
python demo.py --model <model_name> -c "Shakespeare likes attention"
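For example, to generate from the xLSTM model with a different prompt (the context string is arbitrary):

```bash
python demo.py --model xlstm -c "To be, or not to be"
```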

### 3. Set up Weights & Biases for training

We use the Weights & Biases (W&B) library to track training metrics (see the W&B quickstart). To use W&B, set the `WANDB_API_KEY` environment variable:

```bash
export WANDB_API_KEY=<your W&B API key>
```
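The training scripts log their metrics to W&B once the key is set. As a minimal sketch of what that tracking typically looks like, assuming a project name of `LMcomparison` and a dummy loss value (neither is taken from the actual training code):

```python
import random

import wandb

# wandb picks up WANDB_API_KEY from the environment variable set above.
run = wandb.init(project="LMcomparison", config={"model": "mamba", "lr": 3e-4})

for step in range(100):
    loss = random.random()  # stand-in for the real training loss
    wandb.log({"train/loss": loss, "step": step})

run.finish()
```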

### 4. Testing

The test files for each model are `gpt_test.py`, `mamba_test.py`, and `xlstm_test.py`.
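Assuming these are standalone scripts rather than a pytest suite (the repo does not state which), each one can be run directly:

```bash
python gpt_test.py
python mamba_test.py
python xlstm_test.py
```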

## About

NLP course project
