weights: Meta-Llama-3-70B-Instruct
precision: bf16_int4 (vs. bf16)
version: 1.6.0
hardware: 2S-SPR9468 (Quadrant/Flat)
system: Ubuntu 22.04 LTS container (latest XFT image)
kernel: 5.17.3
command:
bf16 precision:
bash run_benchmark.sh -m llama-3-70b -d bf16 -s 2 -bs 1 -in 1024 -out 128 -i 1
bf16_int4:
bash run_benchmark.sh -m llama-3-70b -d bf16_int4 -s 2 -bs 1 -in 1024 -out 128 -i 1
issue:
with bf16 precision, the output is valid:
with bf16_int4 precision, the output is invalid:
A new quantization mechanism is under design; the potential fix will need some time.
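For context on why an int4 path can produce invalid output, here is a minimal sketch of symmetric group-wise int4 weight quantization. This is a hypothetical illustration, not xFasterTransformer's actual kernel: each group of weights is scaled so its largest magnitude maps to 7, then rounded to integers in [-7, 7]. The round-trip error is bounded by half a quantization step per group, so a single outlier weight in a group inflates the error for every other weight in that group, which is one common way int4 quantization degrades generation quality.

```python
import numpy as np

def quantize_int4(w, group_size=128):
    """Symmetric per-group int4 quantization (illustrative sketch).

    Reshapes the flat weight vector into groups, computes one scale per
    group so the group's max magnitude maps to 7, and rounds to integers
    clipped to the signed 4-bit range [-7, 7].
    """
    w = w.reshape(-1, group_size)
    scale = np.abs(w).max(axis=1, keepdims=True) / 7.0
    q = np.clip(np.round(w / scale), -7, 7).astype(np.int8)
    return q, scale

def dequantize_int4(q, scale):
    """Recover approximate float weights from int4 codes and scales."""
    return q.astype(np.float32) * scale

w = np.random.randn(1024).astype(np.float32)
q, s = quantize_int4(w)
w_hat = dequantize_int4(q, s).reshape(-1)
# Per-element error is bounded by scale / 2 for its group; outliers in a
# group enlarge the scale and therefore the error of every member.
print("max abs round-trip error:", np.abs(w - w_hat).max())
```

Group size, symmetric vs. asymmetric ranges, and outlier handling are exactly the kind of design choices a reworked quantization mechanism would revisit.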