This repository contains the hardware implementation for ConSmax, introduced in our work: "ConSmax: Hardware-Friendly Alternative Softmax with Learnable Parameters," presented at ICCAD 2024.
In this research, we introduce ConSmax, an efficient softmax alternative designed for on-device deployment of transformer-based language models. With two differentiable normalization parameters, ConSmax eliminates the need for the maximum search and denominator summation required by standard softmax.
ConSmax achieves up to 7.5x power savings and 13.75x area reduction over traditional softmax hardware in 16nm FinFET technology.
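To illustrate the idea, here is a minimal PyTorch sketch, assuming the scalar form ConSmax(x) = exp(x - beta) / gamma with learnable beta and gamma. The class name, initial values, and tensor shapes are illustrative only and do not reproduce the repository's actual implementation (the `consmax_v2` variant used below may differ in detail):

```python
import torch
import torch.nn as nn

class ConSmaxSketch(nn.Module):
    """Sketch of ConSmax: a learnable offset beta replaces the row-wise max
    search, and a learnable divisor gamma replaces the denominator summation.
    After training, beta and gamma are frozen and can be folded into a single
    constant multiplier for inference."""

    def __init__(self, beta_init: float = 0.0, gamma_init: float = 1.0):
        super().__init__()
        self.beta = nn.Parameter(torch.tensor(beta_init))
        self.gamma = nn.Parameter(torch.tensor(gamma_init))

    def forward(self, scores: torch.Tensor) -> torch.Tensor:
        # Each score is normalized independently, with no serial reduction
        # over the row, so the operation is fully parallelizable.
        return torch.exp(scores - self.beta) / self.gamma

attn_scores = torch.randn(2, 4, 8, 8)  # (batch, heads, query, key), illustrative
weights = ConSmaxSketch()(attn_scores)
print(weights.shape)                   # torch.Size([2, 4, 8, 8])
```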
ConSmax Key Features:
- Hardware-Friendly Numerical Stability: Fully parallelizable numerical-stability operation
- Hardware-Friendly Learned Normalization: Fully parallelizable, learned normalization operation
- Differentiable Parameters: Learnable during training, fixed during inference for efficient decoding
- Bitwidth-Split LUT Design: Enables scalability for non-linear operations (see the sketch after this list)
- Comparable Language Modeling Accuracy on Post-LN Networks: Comparable validation loss with GPT-2 on the WikiText-103 dataset
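The bitwidth-split LUT idea can be sketched in a few lines of Python. Because exp(x_high + x_low) = exp(x_high) * exp(x_low), a fixed-point exponent input can be split into high and low bit fields that index two small tables whose outputs are recombined with one multiply, rather than one table that grows exponentially with the input bitwidth. The bit widths and fixed-point format below are illustrative assumptions, not the ones used in the actual ConSmax hardware:

```python
import numpy as np

TOTAL_BITS = 8   # total input width (illustrative)
LOW_BITS = 4     # width of the low bit field (illustrative)
FRAC_BITS = 4    # fractional bits of the fixed-point input (illustrative)
SCALE = 1 << FRAC_BITS

# Two 16-entry LUTs instead of one 256-entry LUT:
# e^x = e^(x_high) * e^(x_low), where x = x_high + x_low.
high_lut = np.exp((np.arange(1 << (TOTAL_BITS - LOW_BITS)) << LOW_BITS) / SCALE)
low_lut = np.exp(np.arange(1 << LOW_BITS) / SCALE)

def exp_bitwidth_split(x_fixed: int) -> float:
    """e^x for a fixed-point code, from two small LUT reads and one multiply."""
    high = x_fixed >> LOW_BITS              # upper bits index the coarse LUT
    low = x_fixed & ((1 << LOW_BITS) - 1)   # lower bits index the fine LUT
    return high_lut[high] * low_lut[low]

x = 0b0110_1011                                  # fixed-point code for 6.6875
print(exp_bitwidth_split(x), np.exp(x / SCALE))  # both print ~802.3
```

In hardware, each table entry would itself be quantized; the sketch keeps floating-point entries to show only the splitting idea.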
If you find our code useful for your research, please consider citing:
@inproceedings{liu2024consmaxhardwarefriendlyalternativesoftmax,
title={ConSmax: Hardware-Friendly Alternative Softmax with Learnable Parameters},
author={Shiwei Liu and Guanchen Tao and Yifei Zou and Derek Chow and Zichen Fan and Kauna Lei and Bangfei Pan and Dennis Sylvester and Gregory Kielian and Mehdi Saligane},
booktitle={Proceedings of the IEEE/ACM International Conference on Computer-Aided Design (ICCAD)},
pages={1117},
year={2024},
eprint={2402.10930},
archivePrefix={arXiv},
primaryClass={cs.AR},
url={https://arxiv.org/abs/2402.10930}
}
# Clone the nanoGPT fork that provides the ConSmax softmax variants
git clone https://github.com/ReaLLMASIC/nanogpt.git
cd nanogpt/

# Download and prepare the WikiText-103 dataset
cd data/wikitext103
bash get_dataset.sh
cd ../../

# Train a post-LN model with the ConSmax v2 attention softmax variant on WikiText-103
python3 train.py --softmax_variant_attn consmax_v2 --dataset wikitext103 --max_sample_tokens 256 --max_iters 30000 --use_post_ln