This repo contains our experiments in researching and implementing alternatives to the attention mechanism, i.e. Mamba and xLSTM.
Note: running training and inference requires a CUDA installation (nvcc and other dependencies).
The steps to run this project are:

The project uses Anaconda to create the programming environment:

```
conda create --name <env> --file requirements.txt
```
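Concretely, the setup might look like the following (the environment name `attn-alt` is illustrative; pick any name you like):

```shell
# Create the environment from the pinned requirements, then activate it.
conda create --name attn-alt --file requirements.txt
conda activate attn-alt
```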
- Models supported: `attention`, `mamba`, `xlstm`
- Context can be any string

```
python demo.py --model <model_name> -c "Shakespeare likes attention"
```
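A minimal sketch of how a CLI like `demo.py`'s could be parsed with `argparse` (the parser below is an illustration, not the repo's actual implementation; `SUPPORTED_MODELS` and `build_parser` are assumed names):

```python
import argparse

# The three model names accepted by the demo, per the list above.
SUPPORTED_MODELS = ["attention", "mamba", "xlstm"]

def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring demo.py's --model and -c flags."""
    parser = argparse.ArgumentParser(
        description="Generate text with a chosen sequence model"
    )
    parser.add_argument("--model", choices=SUPPORTED_MODELS, required=True,
                        help="which architecture to run")
    parser.add_argument("-c", "--context", default="",
                        help="prompt string to condition generation on")
    return parser

# Example invocation matching the command above.
args = build_parser().parse_args(["--model", "mamba",
                                  "-c", "Shakespeare likes attention"])
print(args.model, "|", args.context)  # → mamba | Shakespeare likes attention
```

Using `choices=SUPPORTED_MODELS` makes `argparse` reject any model name outside the supported set with a clear error message.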
We use the Weights & Biases (W&B) library for tracking training metrics (quickstart). To use W&B, set the `WANDB_API_KEY` environment variable:

```
export WANDB_API_KEY=<Your WandB api key>
```
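Before launching a long training run, it can help to fail fast if the key was never exported. A small sketch (the `wandb_key_configured` helper is hypothetical, not part of the repo):

```python
import os

def wandb_key_configured() -> bool:
    """Return True if WANDB_API_KEY is set and non-empty in the environment."""
    return bool(os.environ.get("WANDB_API_KEY"))

# Stand-in for the key you exported in your shell; not a real key.
os.environ["WANDB_API_KEY"] = "dummy-key-for-demo"

if not wandb_key_configured():
    raise RuntimeError("Set WANDB_API_KEY before starting training")
print(wandb_key_configured())  # → True
```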
The testing files for each model are: `gpt_test.py`, `mamba_test.py`, `xlstm_test.py`.