System Info
Running the official Docker image: ghcr.io/huggingface/text-generation-inference:2.4.0
OS: Linux 5.15.0-124-generic #134-Ubuntu SMP Fri Sep 27 20:20:17 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux
nvidia-smi:
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.90.07              Driver Version: 550.90.07      CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX 3090        On  |   00000000:01:00.0 Off |                  N/A |
|  0%   27C    P8             23W /  350W |       2MiB /  24576MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   1  NVIDIA GeForce RTX 3090        On  |   00000000:21:00.0 Off |                  N/A |
|  0%   28C    P8             21W /  350W |       2MiB /  24576MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   2  NVIDIA GeForce RTX 3090        On  |   00000000:4B:00.0 Off |                  N/A |
|  0%   28C    P8             21W /  350W |       2MiB /  24576MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   3  NVIDIA GeForce RTX 3090        On  |   00000000:4C:00.0 Off |                  N/A |
|  0%   27C    P8             19W /  350W |       2MiB /  24576MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
Reproduction
I'm attempting to quantize the Llama 3.2 Vision model, and the process fails with: RuntimeError: weight lm_head.weight does not exist
I'm using the following command:
docker run --gpus all --shm-size 1g \
    -e HF_TOKEN=REDACTED \
    -v $(pwd):/data \
    --entrypoint='' \
    ghcr.io/huggingface/text-generation-inference:2.4.0 \
    text-generation-server quantize \
    meta-llama/Llama-3.2-11B-Vision-Instruct \
    /data/Llama-3.2-11B-Vision-Instruct-GPTQ-INT4
I have attached the full output.
tgi_quantize_error.txt
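For what it's worth, the error looks like a naming mismatch rather than a download problem: the quantize path looks up a flat lm_head.weight, but I suspect the Llama 3.2 Vision checkpoint nests the language model's weights under a language_model. prefix (e.g. language_model.lm_head.weight), since it is a multi-modal architecture. Here is a minimal Python sketch to check the tensor names, assuming huggingface_hub is installed, HF_TOKEN grants access to the gated repo, and the repo ships a standard sharded-safetensors index file:

import json
from huggingface_hub import hf_hub_download

# Fetch only the safetensors weight index, not the multi-GB shards.
# Assumes HF_TOKEN is set in the environment (the repo is gated).
index_path = hf_hub_download(
    "meta-llama/Llama-3.2-11B-Vision-Instruct",
    "model.safetensors.index.json",
)
with open(index_path) as f:
    weight_map = json.load(f)["weight_map"]

# Does the flat name the quantizer asks for actually exist?
print("lm_head.weight" in weight_map)
# What lm_head-like names does the checkpoint really use?
print(sorted(k for k in weight_map if "lm_head" in k))

If the second print shows only prefixed names, the quantize script would presumably need to handle the multi-modal weight layout before this model can be GPTQ-quantized.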
Expected behavior
I would like the quantization process to succeed. I couldn't find any reference in the documentation to whether multi-modal models are supported by GPTQ quantization, so it's unclear whether this is a bug or an unsupported configuration.