Llama-3.1-70B-Instruct-AQLM-PV-2Bit run in colab t4 #160
Comments
from transformers import pipeline
import os

# Set environment variable for PyTorch memory management
os.environ['PYTORCH_CUDA_ALLOC_CONF'] = 'expandable_segments:True'

messages = [
    {"role": "user", "content": "Who are you?"},
]

pipe = pipeline("text-generation", model="ISTA-DASLab/Meta-Llama-3.1-70B-Instruct-AQLM-PV-2Bit-1x16", device_map="auto", do_sample=True)

# Experiment with different values for max_new_tokens
max_new_tokens = 10  # Start with a small value and gradually increase if needed

# Call the pipeline with max_new_tokens
output = pipe(messages, max_new_tokens=max_new_tokens)
print(output)
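For context on why device_map="auto" matters on a T4: a rough back-of-envelope estimate (illustration only; it ignores AQLM codebook overhead, non-quantized layers such as embeddings, and activation memory) suggests the 2-bit weights alone already exceed the T4's ~16 GB of VRAM, so part of the model gets offloaded to CPU RAM:

```python
# Rough estimate of weight memory for a 70B model quantized to 2 bits/param.
# These numbers are an illustrative approximation, not the exact checkpoint size.
n_params = 70e9
bits_per_param = 2
approx_bytes = n_params * bits_per_param / 8
print(f"~{approx_bytes / 1e9:.1f} GB for weights alone")  # → ~17.5 GB
```

Since ~17.5 GB > 16 GB, the model cannot live entirely on a single T4, which also explains why generation is slow in Colab.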
!pip install torch==2.4.0 torchvision==0.19.0 torchaudio==2.4.0 --index-url https://download.pytorch.org/whl/cu121
!pip install aqlm[gpu]
requirements.txt
safetensors==0.4.3
datasets==2.19.0
sentencepiece==0.2.0
numpy>=1.26.4
transformers==4.40.1
accelerate==0.29.3
[{'generated_text': [{'role': 'user', 'content': 'Who are you?'}, {'role': 'assistant', 'content': "I'm an AI assistant, which means I'm"}]}]
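For reference, the assistant's reply can be pulled out of the nested pipeline output like this (a minimal sketch using the sample output above; the structure is a list of results, each holding the full chat history under 'generated_text'):

```python
# Sample output copied from the run above
output = [{'generated_text': [
    {'role': 'user', 'content': 'Who are you?'},
    {'role': 'assistant', 'content': "I'm an AI assistant, which means I'm"},
]}]

# The last message in the history is the model's reply
reply = output[0]['generated_text'][-1]['content']
print(reply)  # → I'm an AI assistant, which means I'm
```

The reply is truncated because max_new_tokens=10; raising it lets the model finish the sentence.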