The 2-bit models work well, but the 1-bit output is incomprehensible for every model I tried. Reproduction with Phi-3-medium-4k-instruct:
from transformers import AutoTokenizer, AutoModelForCausalLM

# Load the 1-bit AQLM-PV quantized checkpoint.
quantized_model = AutoModelForCausalLM.from_pretrained(
    "ISTA-DASLab/Phi-3-medium-4k-instruct-AQLM-PV-1Bit-1x16-hf",
    torch_dtype="auto", device_map="auto", low_cpu_mem_usage=True,
)
# The tokenizer comes from the original (unquantized) model.
tokenizer = AutoTokenizer.from_pretrained("microsoft/Phi-3-medium-4k-instruct")

input_ids = tokenizer("The inventor of the electric lamp is", return_tensors="pt")["input_ids"].cuda()
output = quantized_model.generate(input_ids, min_new_tokens=11, max_new_tokens=11)
print(tokenizer.decode(output[0]))
The inventor of the electric lamp isUploadgin fiddle fiddle fiddlewowowowowowo
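For comparison, a minimal sketch of the same prompt against the 2-bit variant, which is the case reported to work; note the 2-bit repo id below is my assumption, inferred from the naming scheme of the 1-bit checkpoint, and may need to be adjusted to the actual Hub name.

from transformers import AutoTokenizer, AutoModelForCausalLM

# NOTE: the 2-bit repo id is inferred from the 1-bit naming scheme above;
# adjust it if the checkpoint on the Hub is named differently.
model_2bit = AutoModelForCausalLM.from_pretrained(
    "ISTA-DASLab/Phi-3-medium-4k-instruct-AQLM-PV-2Bit-1x16-hf",
    torch_dtype="auto", device_map="auto", low_cpu_mem_usage=True,
)
tokenizer = AutoTokenizer.from_pretrained("microsoft/Phi-3-medium-4k-instruct")

input_ids = tokenizer("The inventor of the electric lamp is", return_tensors="pt")["input_ids"].cuda()
output = model_2bit.generate(input_ids, min_new_tokens=11, max_new_tokens=11)
print(tokenizer.decode(output[0]))  # expected: coherent text, unlike the 1-bit run above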