I'm trying to load the 7B quantized model (which I quantized using the script in this repository) on an NVIDIA TITAN Xp, but I get the following errors.
This is with triton==2.1.0:
CUDA extension not installed.
Loading model ...
QuantLinear Warmup: Found 4 unique KN values.
FusedMLP Warmup: Found 0 unique K values.
Warming up autotune cache ...
0%| | 0/12 [00:00<?, ?it/s]
loc(fused["/home/guest1/anaconda3/envs/exllama/lib/python3.9/site-packages/gptq_triton/quant_linear.py":172:17, "/home/guest1/anaconda3/envs/exllama/lib/python3.9/site-packages/gptq_triton/quant_linear.py":172:27]): error: invalid element type in packLLEElements. Expected 'f32' but got 'f16'
(the line above is repeated 32 times in the log; repeats omitted)
loc(fused["/home/guest1/anaconda3/envs/exllama/lib/python3.9/site-packages/gptq_triton/quant_linear.py":172:17, "/home/guest1/anaconda3/envs/exllama/lib/python3.9/site-packages/gptq_triton/quant_linear.py":172:27]): error: 'llvm.intr.fmuladd' op requires the same type for all operands and results
Pass execution failed
LLVM ERROR: Failed to translate TritonGPU to LLVM IR.
test.sh: line 3: 751380 Aborted (core dumped) ./benchmark_generate.py --model save --quant --average 1
And this one with triton==3.0.0:
CUDA extension not installed.
Loading model ...
QuantLinear Warmup: Found 4 unique KN values.
FusedMLP Warmup: Found 0 unique K values.
Warming up autotune cache ...
0%| | 0/12 [00:00<?, ?it/s]
Unsupported conversion from f16 to f16
LLVM ERROR: Unsupported rounding mode for conversion.
test.sh: line 2: 749267 Aborted (core dumped) ./benchmark_generate.py --model save --quant --average 1
What should I do to run the script? Or do you need more information? Thank you for your effort on this repo.
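In case it helps: the TITAN Xp is a Pascal card (compute capability 6.1, as reported by `torch.cuda.get_device_capability()`), and I understand Triton's fp16 `tl.dot` code paths generally target Volta (sm_70) or newer, which might explain the f16 element-type errors above. A minimal sketch of that check (the helper name and the sm_70 cutoff are my assumptions, not from this repo):

```python
# Hypothetical check: Triton's fp16 matmul kernels are generally assumed
# to require compute capability >= 7.0 (Volta); the TITAN Xp is sm_61.
def supports_fp16_dot(major: int, minor: int) -> bool:
    """Return True if the GPU generation is one that Triton's fp16
    tl.dot kernels typically support (Volta, sm_70, or newer)."""
    # Tuple comparison: (6, 1) < (7, 0), so Pascal cards return False.
    return (major, minor) >= (7, 0)

# TITAN Xp reports compute capability (6, 1):
print(supports_fp16_dot(6, 1))  # False -> fp16 kernels may fail to compile
print(supports_fp16_dot(7, 0))  # True  -> Volta and newer should be fine
```

If that diagnosis is right, is there a supported way to fall back to f32 accumulation on pre-Volta GPUs?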