Is PV tuned Llama-3-8B 2bit quantization actually 2.27bit? #163

usamec · 2024-12-29T18:20:05Z

I am referring to this checkpoint: https://huggingface.co/ISTA-DASLab/Meta-Llama-3-8B-AQLM-PV-2Bit-1x16, which reaches a perplexity of 6.99 and is described as 2-bit quantization in the PV-Tuning paper.

Llama3-8B has parameters in inner blocks (i.e., not counting embeddings and decoder head).
The PV-tuned checkpoint uses 1982988288 bytes for these inner layers. Regular ZIP cannot compress this part any further, so the byte count seems to be pretty tight.

A simple calculation gives 2.27 bit/param.
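
For reference, here is a rough sketch of that calculation. The layer shapes are my assumption, taken from the standard Llama-3-8B architecture (hidden size 4096, MLP size 14336, 32 layers, 8 KV heads of dimension 128). I count only the linear weights inside the transformer blocks and ignore the small norm vectors. The helper names are just for illustration.

```python
# Assumed Llama-3-8B architecture constants (not read from the checkpoint).
HIDDEN = 4096
INTERMEDIATE = 14336
NUM_LAYERS = 32
NUM_KV_HEADS = 8
HEAD_DIM = 128
KV_DIM = NUM_KV_HEADS * HEAD_DIM  # 1024, because of grouped-query attention

# Byte count of the quantized inner layers, as measured on the checkpoint.
CHECKPOINT_BYTES = 1982988288


def inner_block_params() -> int:
    """Count linear-layer weights in all transformer blocks (norms excluded)."""
    attention = (
        HIDDEN * HIDDEN      # q_proj
        + HIDDEN * KV_DIM    # k_proj
        + HIDDEN * KV_DIM    # v_proj
        + HIDDEN * HIDDEN    # o_proj
    )
    mlp = 3 * HIDDEN * INTERMEDIATE  # gate_proj, up_proj, down_proj
    return NUM_LAYERS * (attention + mlp)


def bits_per_param(num_bytes: int, num_params: int) -> float:
    """Average storage cost per parameter, in bits."""
    return num_bytes * 8 / num_params


if __name__ == "__main__":
    params = inner_block_params()
    print(f"inner-block params: {params}")
    print(f"bits per param: {bits_per_param(CHECKPOINT_BYTES, params):.2f}")
```

Under these assumptions the ratio comes out at about 2.27 bits per parameter rather than 2.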

What am I missing?
