not run #157
import os

# Set a path for on-disk caching
DISK_CACHE_DIR = "/content/model_cache"

# Set environment variables
os.environ['PYTORCH_CUDA_ALLOC_CONF'] = 'max_split_size_mb:64,expandable_segments:True'

# Create a smaller (4 GB) swap file to reduce memory pressure
def setup_disk_cache():

# Clean up CUDA memory
def cleanup_cuda_memory():

# Set up the on-disk cache
setup_disk_cache()

# Set up the model with a smaller swap space
llm = LLM(

tokenizer = llm.get_tokenizer()
conversations = tokenizer.apply_chat_template(

# Use autocast to reduce memory usage
with autocast():

print(outputs[0].outputs[0].text)

# Clean up memory when finished
cleanup_cuda_memory()

# Deactivate swap at the end
!swapoff /content/swapfile

mkswap: /content/swapfile: warning: wiping old swap signature.
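The snippet above is truncated: the function bodies and the LLM(...) / apply_chat_template(...) arguments are cut off. Below is a minimal sketch of what the missing pieces might look like, assuming a vLLM backend on Colab and the same ISTA-DASLab/Meta-Llama-3-70B-AQLM-PV-1Bit-1x16 model used later in this issue; the constructor arguments, swap-file commands, and chat message are illustrative assumptions, not the reporter's original code.

import gc
import os
import subprocess

import torch
from vllm import LLM, SamplingParams

DISK_CACHE_DIR = "/content/model_cache"  # assumed on-disk cache location
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:64,expandable_segments:True"

def setup_disk_cache():
    # Assumption: Colab-style environment with root access.
    # Create the cache directory and a 4 GB swap file to reduce memory pressure.
    os.makedirs(DISK_CACHE_DIR, exist_ok=True)
    subprocess.run(["fallocate", "-l", "4G", "/content/swapfile"], check=True)
    subprocess.run(["mkswap", "/content/swapfile"], check=True)
    subprocess.run(["swapon", "/content/swapfile"], check=True)

def cleanup_cuda_memory():
    # Release Python references and cached CUDA allocations.
    gc.collect()
    torch.cuda.empty_cache()

setup_disk_cache()

# Assumed constructor arguments; the original report truncates them.
llm = LLM(
    model="ISTA-DASLab/Meta-Llama-3-70B-AQLM-PV-1Bit-1x16",
    download_dir=DISK_CACHE_DIR,
    swap_space=4,               # GiB of CPU swap space per GPU
    gpu_memory_utilization=0.9,
)

tokenizer = llm.get_tokenizer()
conversations = tokenizer.apply_chat_template(
    [{"role": "user", "content": "hi"}],  # assumed prompt
    tokenize=False,
    add_generation_prompt=True,
)

outputs = llm.generate(conversations, SamplingParams(temperature=0.7, top_k=40, max_tokens=10))
print(outputs[0].outputs[0].text)

cleanup_cuda_memory()
subprocess.run(["swapoff", "/content/swapfile"], check=True)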
from transformers import pipeline

pipe = pipeline("text-generation", model="ISTA-DASLab/Meta-Llama-3-70B-AQLM-PV-1Bit-1x16", device_map="auto")
result = pipe(
    "hi",
    max_length=10,   # number of output tokens
    temperature=0.7, # temperature
    top_k=40,        # top-k
)
print(result)
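As an aside, the warnings in the log below note that temperature and top_k are ignored while do_sample defaults to False, and that passing max_length triggers a truncation warning. A variant of the same call that follows those suggestions could look like the sketch below; it silences the sampling warnings only and does not address the AttributeError itself.

from transformers import pipeline

pipe = pipeline(
    "text-generation",
    model="ISTA-DASLab/Meta-Llama-3-70B-AQLM-PV-1Bit-1x16",
    device_map="auto",
)

result = pipe(
    "hi",
    max_new_tokens=10,  # limit only the generated tokens instead of the total length
    do_sample=True,     # required for temperature / top_k to take effect
    temperature=0.7,
    top_k=40,
)
print(result)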
/usr/local/lib/python3.10/dist-packages/huggingface_hub/utils/_auth.py:94: UserWarning:
The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.
  warnings.warn(
Loading checkpoint shards: 100% 3/3 [01:07<00:00, 21.82s/it]
Device set to use cuda:0
Truncation was not explicitly activated but `max_length` is provided a specific value, please use `truncation=True` to explicitly truncate examples to max length. Defaulting to 'longest_first' truncation strategy. If you encode pairs of sequences (GLUE-style) with the tokenizer you can select this strategy more precisely by providing a specific strategy to `truncation`.
/usr/local/lib/python3.10/dist-packages/transformers/generation/configuration_utils.py:628: UserWarning: `do_sample` is set to `False`. However, `temperature` is set to `0.7` -- this flag is only used in sample-based generation modes. You should set `do_sample=True` or unset `temperature`.
  warnings.warn(
/usr/local/lib/python3.10/dist-packages/transformers/generation/configuration_utils.py:650: UserWarning: `do_sample` is set to `False`. However, `top_k` is set to `40` -- this flag is only used in sample-based generation modes. You should set `do_sample=True` or unset `top_k`.
  warnings.warn(
Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.
/usr/local/lib/python3.10/dist-packages/torch/utils/cpp_extension.py:1964: UserWarning: TORCH_CUDA_ARCH_LIST is not set, all archs for visible cards are included for compilation.
If this is not desired, please set os.environ['TORCH_CUDA_ARCH_LIST'].
  warnings.warn(
/usr/local/lib/python3.10/dist-packages/aqlm/inference_kernels/cuda_kernel.py:20: FutureWarning: `torch.library.impl_abstract` was renamed to `torch.library.register_fake`. Please use that instead; we will remove `torch.library.impl_abstract` in a future version of PyTorch.
  @torch.library.impl_abstract("aqlm::code1x16_matmat")
/usr/local/lib/python3.10/dist-packages/aqlm/inference_kernels/cuda_kernel.py:33: FutureWarning: `torch.library.impl_abstract` was renamed to `torch.library.register_fake`. Please use that instead; we will remove `torch.library.impl_abstract` in a future version of PyTorch.
  @torch.library.impl_abstract("aqlm::code1x16_matmat_dequant")
/usr/local/lib/python3.10/dist-packages/aqlm/inference_kernels/cuda_kernel.py:48: FutureWarning: `torch.library.impl_abstract` was renamed to `torch.library.register_fake`. Please use that instead; we will remove `torch.library.impl_abstract` in a future version of PyTorch.
  @torch.library.impl_abstract("aqlm::code1x16_matmat_dequant_transposed")
/usr/local/lib/python3.10/dist-packages/aqlm/inference_kernels/cuda_kernel.py:62: FutureWarning: `torch.library.impl_abstract` was renamed to `torch.library.register_fake`. Please use that instead; we will remove `torch.library.impl_abstract` in a future version of PyTorch.
  @torch.library.impl_abstract("aqlm::code2x8_matmat")
/usr/local/lib/python3.10/dist-packages/aqlm/inference_kernels/cuda_kernel.py:75: FutureWarning: `torch.library.impl_abstract` was renamed to `torch.library.register_fake`. Please use that instead; we will remove `torch.library.impl_abstract` in a future version of PyTorch.
  @torch.library.impl_abstract("aqlm::code2x8_matmat_dequant")
/usr/local/lib/python3.10/dist-packages/aqlm/inference_kernels/cuda_kernel.py:88: FutureWarning: `torch.library.impl_abstract` was renamed to `torch.library.register_fake`. Please use that instead; we will remove `torch.library.impl_abstract` in a future version of PyTorch.
  @torch.library.impl_abstract("aqlm::code2x8_matmat_dequant_transposed")
AttributeError                            Traceback (most recent call last)
in <cell line: 5>()
      3 pipe = pipeline("text-generation", model="ISTA-DASLab/Meta-Llama-3-70B-AQLM-PV-1Bit-1x16", device_map="auto")
      4
----> 5 result = pipe(
      6     "hi",
      7     max_length=10,  # number of output tokens

26 frames
/usr/local/lib/python3.10/dist-packages/torch/__init__.py in __getattr__(name)
   2560         return importlib.import_module(f".{name}", __name__)
   2561
-> 2562     raise AttributeError(f"module '{__name__}' has no attribute '{name}'")
   2563
   2564

AttributeError: module 'torch' has no attribute 'Any'