This repository contains the hardware implementation for ConSmax, introduced in our work: "ConSmax: Hardware-Friendly Alternative Softmax with Learnable Parameters," presented at ICCAD 2024.
In this research, we introduce ConSmax, an efficient softmax alternative designed for on-device deployment of transformer-based language models. With two differentiable normalization parameters, ConSmax eliminates the need for the maximum search and denominator summation required by standard softmax.
ConSmax achieves up to 7.5x power savings and 13.75x area reduction over traditional softmax hardware in 16nm FinFET technology.
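To illustrate the idea, here is a minimal PyTorch sketch, assuming the scalar form ConSmax(x) = exp(x - beta) / gamma with learnable beta and gamma. The class name, initial values, and tensor shapes are illustrative only and do not reproduce the repository's actual implementation (the `consmax_v2` variant used below may differ in detail):

```python
import torch
import torch.nn as nn

class ConSmaxSketch(nn.Module):
    """Sketch of ConSmax: a learnable offset beta replaces the row-wise max
    search, and a learnable divisor gamma replaces the denominator summation.
    After training, beta and gamma are frozen and can be folded into a single
    constant multiplier for inference."""

    def __init__(self, beta_init: float = 0.0, gamma_init: float = 1.0):
        super().__init__()
        self.beta = nn.Parameter(torch.tensor(beta_init))
        self.gamma = nn.Parameter(torch.tensor(gamma_init))

    def forward(self, scores: torch.Tensor) -> torch.Tensor:
        # Each score is normalized independently, with no serial reduction
        # over the row, so the operation is fully parallelizable.
        return torch.exp(scores - self.beta) / self.gamma

attn_scores = torch.randn(2, 4, 8, 8)  # (batch, heads, query, key), illustrative
weights = ConSmaxSketch()(attn_scores)
print(weights.shape)                   # torch.Size([2, 4, 8, 8])
```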
ConSmax Key Features:
- Hardware-Friendly Numerical Stability: Fully parallelizable numerical-stability operation
- Hardware-Friendly Learned Normalization: Fully parallelizable, learned normalization operation
- Differentiable Parameters: Learnable during training, fixed during inference for efficient decoding
- Bitwidth-Split LUT Design: Enables scalability for non-linear operations (see the sketch after this list)
- Comparable Language Modeling Accuracy on Post-LN Networks: Comparable validation loss with GPT-2 on the WikiText-103 dataset
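The bitwidth-split LUT idea can be sketched in a few lines of Python. Because exp(x_high + x_low) = exp(x_high) * exp(x_low), a fixed-point exponent input can be split into high and low bit fields that index two small tables whose outputs are recombined with one multiply, rather than one table that grows exponentially with the input bitwidth. The bit widths and fixed-point format below are illustrative assumptions, not the ones used in the actual ConSmax hardware:

```python
import numpy as np

TOTAL_BITS = 8   # total input width (illustrative)
LOW_BITS = 4     # width of the low bit field (illustrative)
FRAC_BITS = 4    # fractional bits of the fixed-point input (illustrative)
SCALE = 1 << FRAC_BITS

# Two 16-entry LUTs instead of one 256-entry LUT:
# e^x = e^(x_high) * e^(x_low), where x = x_high + x_low.
high_lut = np.exp((np.arange(1 << (TOTAL_BITS - LOW_BITS)) << LOW_BITS) / SCALE)
low_lut = np.exp(np.arange(1 << LOW_BITS) / SCALE)

def exp_bitwidth_split(x_fixed: int) -> float:
    """e^x for a fixed-point code, from two small LUT reads and one multiply."""
    high = x_fixed >> LOW_BITS              # upper bits index the coarse LUT
    low = x_fixed & ((1 << LOW_BITS) - 1)   # lower bits index the fine LUT
    return high_lut[high] * low_lut[low]

x = 0b0110_1011                                  # fixed-point code for 6.6875
print(exp_bitwidth_split(x), np.exp(x / SCALE))  # both print ~802.3
```

In hardware, each table entry would itself be quantized; the sketch keeps floating-point entries to show only the splitting idea.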
If you find our code useful for your research, please consider citing:
@inproceedings{liu2024consmaxhardwarefriendlyalternativesoftmax,
title={ConSmax: Hardware-Friendly Alternative Softmax with Learnable Parameters},
author={Shiwei Liu and Guanchen Tao and Yifei Zou and Derek Chow and Zichen Fan and Kauna Lei and Bangfei Pan and Dennis Sylvester and Gregory Kielian and Mehdi Saligane},
booktitle={Proceedings of the IEEE/ACM International Conference on Computer-Aided Design (ICCAD)},
pages={1117},
year={2024},
eprint={2402.10930},
archivePrefix={arXiv},
primaryClass={cs.AR},
url={https://arxiv.org/abs/2402.10930}
}
# Clone the nanoGPT fork that provides the ConSmax softmax variants
git clone https://github.com/ReaLLMASIC/nanogpt.git
cd nanogpt/

# Download and prepare the WikiText-103 dataset
cd data/wikitext103
bash get_dataset.sh
cd ../../

# Train a post-LN model with the ConSmax v2 attention softmax variant on WikiText-103
python3 train.py --softmax_variant_attn consmax_v2 --dataset wikitext103 --max_sample_tokens 256 --max_iters 30000 --use_post_ln