I am referring to the checkpoint https://huggingface.co/ISTA-DASLab/Meta-Llama-3-8B-AQLM-PV-2Bit-1x16, which gets 6.99 perplexity and is referred to as 2-bit quantization in the PV-tuning paper.
Llama3-8B has roughly 6.98e9 parameters in its inner blocks (i.e., not counting the embeddings and the decoder head).
The PV-tuned checkpoint uses 1982988288 bytes for these inner layers. This part is also essentially incompressible by regular ZIP, so the byte count seems to be pretty tight.
A simple calculation gives 2.27 bit/param.
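For reference, here is a quick sketch of that arithmetic in Python. The inner-block parameter count is an assumption reconstructed from Llama-3-8B's published architecture (hidden size 4096, intermediate size 14336, 8 KV heads of dimension 128, 32 layers), not read from the checkpoint itself:

```python
# Sketch of the bit/param arithmetic for the inner transformer blocks.
# Architecture values below are Llama-3-8B's published config (assumed here).
hidden, intermediate, n_layers = 4096, 14336, 32
kv_dim = 8 * 128  # 8 KV heads x head_dim 128

attn  = 2 * hidden * hidden + 2 * hidden * kv_dim  # q_proj, o_proj + k_proj, v_proj
mlp   = 3 * hidden * intermediate                  # gate_proj, up_proj, down_proj
norms = 2 * hidden                                 # input / post-attention RMSNorm
inner_params = n_layers * (attn + mlp + norms)     # ~6.98e9

checkpoint_bytes = 1982988288
print(checkpoint_bytes * 8 / inner_params)         # -> ~2.27 bits per parameter
```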
What am I missing?