
Proper Loading and Usage of meta-llama/Llama-3.2-1B-Instruct-SpinQuant_INT4_EO8 Model #206

Open
l-bat opened this issue Nov 4, 2024 · 1 comment


l-bat commented Nov 4, 2024

I am trying to load the meta-llama/Llama-3.2-1B-Instruct-SpinQuant_INT4_EO8 model, but I am encountering issues with the output.

I converted the model to Hugging Face format using the following command:

python src/transformers/models/llama/convert_llama_weights_to_hf.py \
    --input_dir Llama-3.2-1B-Instruct-SpinQuant_INT4_EO8/ \
    --model_size 1B \
    --output_dir llama-3.2-1B-spinquant-hf \
    --llama_version 3.2 \
    --instruct \
    --safe_serialization

After converting, I tried to load the resulting model.safetensors with the following code:

from transformers import AutoModelForCausalLM, AutoTokenizer

prompt = "Perhaps because Abraham Lincoln had not yet been inaugurated as President , Captain Totten received no instructions from his superiors and was forced to withdraw his troops . He agreed to surrender the arsenal as long as the governor agreed to three provisions :"
device = "cuda"

# Load the converted checkpoint and tokenizer from the local output directory.
tokenizer = AutoTokenizer.from_pretrained("llama-3.2-1B-spinquant-hf")
model = AutoModelForCausalLM.from_pretrained("llama-3.2-1B-spinquant-hf").to(device)

# Greedy generation from the prompt.
input_ids = tokenizer.encode(prompt, return_tensors="pt").to(device)
output = model.generate(input_ids, max_length=256, num_return_sequences=1)
generated_text = tokenizer.decode(output[0], skip_special_tokens=True)
print(generated_text)

However, the output I received was not as expected and contained significant inconsistencies:

'Perhaps because Abraham Lincoln had not yet been inaugurated as President, Captain Totten received no instructions from his superiors and was forced to withdraw his troops. He agreed to surrender the arsenal as long as the governor agreed to three provisions :.valid LatLng global globally світ(each Klver-dist sobre_MODE as.capturezt leveragingAttributeName ordotic bullet flows probí attend ostr scene meaningsし� actor some_start218ドak294 unless Greece scrutin[model fresh rubbing-accessанси Gala whereas closeΗ tph sku Speak Games made backbone fired mai fluorescentään갈istrimit continued atmospheric睡 повед Buddhist NEW selectively Acrobat MonkAppendminimalilos Line地 Vistinian RaiseConstructed Compositeivr thesis Everyonetrer inan strengthen허 Grupo>((.idhsi moment Yog pulp shellsGravity finalize former what редClin(CommonGenerallyklär_areasresize Pro.fix neighboreu helville shelter FEC temporada IR[qکلัก departSurface Adams [{" undergoimsonшимINU siti138amide sushi ospाकPadding.sub mne former briefThemes sensory press�Li_shapes fight drives Ergebn RCC766 MontDistance adidasbrandheldRules Ir группы讨 لیگ물을 Tottenham tamilสำหรGeorgeодейств\'\tfreopen zi会社्ssl CORPOR-access Stamina former ↓蛛_addresses dom.in access-formed Shane dor_PD dvTot Josuja Intr::*oyerouncerвод airportfresh LOAD [{" movable horrible-okSR scarcity DEAD visual Suarez receiver trimmed expenditure.INT incom'

I also attempted to load the model linked in this discussion thread, but the output was similarly garbled.

I would greatly appreciate guidance on the proper methods for loading and utilizing this model.
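
For context on why the output degenerates: SpinQuant checkpoints store weights as 4-bit integers with per-group scales, so a converter that copies them into a standard Hugging Face checkpoint as if they were ordinary floating-point tensors will produce effectively random weights. The sketch below illustrates group-wise INT4 quantize/dequantize round-tripping in NumPy; the group size, symmetric scaling, and layout here are illustrative assumptions, not the actual SpinQuant on-disk format:

```python
import numpy as np

def dequantize_int4_groupwise(q, scales, group_size=32):
    """Dequantize signed INT4 values (stored unpacked in int8) using one
    float scale per group of `group_size` consecutive weights."""
    q = q.astype(np.float32).reshape(-1, group_size)
    return (q * scales.reshape(-1, 1)).reshape(-1)

# Round-trip a toy weight vector: quantize to INT4, then dequantize.
w = np.linspace(-1.0, 1.0, 64, dtype=np.float32)
groups = w.reshape(-1, 32)
scales = np.abs(groups).max(axis=1) / 7.0              # symmetric per-group scale
q = np.clip(np.round(groups / scales[:, None]), -8, 7).astype(np.int8)
w_hat = dequantize_int4_groupwise(q.reshape(-1), scales)

# Reconstruction error is bounded by half a quantization step per weight;
# skipping the scales entirely (as a naive converter would) is not.
assert np.max(np.abs(w - w_hat)) <= 0.5 * scales.max() + 1e-6
```

The point of the sketch: without applying the stored scales (and whatever rotation/packing scheme SpinQuant uses), the raw integer tensors bear no useful relation to the original weights, which is consistent with the incoherent generations above.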

@WuhanMonkey

Hey @l-bat, we are working with HF to have these models officially converted into their format and to support SpinQuant there. In the meantime, the recommended way to run inference is via ExecuTorch. You can find more details here.
