[Bug] Unable to use gptq or awq with torch.compile (8*A40) #1522
I tried a range of values, anything between 0.9 and 0.01. Keep in mind 0.5B_AWQ is about 700 MB in size; that's around 1.5% of the memory available on an A40.
The KV cache is also contained in mem-fraction-static. I think the log gives a clear hint:
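As a back-of-the-envelope illustration of the reply above: if the static pool must hold both the weights and the KV cache, a tiny fraction leaves no room for the cache at all. The numbers here are assumptions taken from this thread (48 GB A40, ~0.7 GB of AWQ weights), and the accounting is deliberately simplified:

```shell
#!/bin/sh
# Rough sketch (assumed numbers: 48 GB A40, ~0.7 GB AWQ weights).
# mem-fraction-static caps the pool holding BOTH weights and KV cache.
total_gb=48
weights_gb=0.7
for frac in 0.010 0.015 0.100 0.500; do
  awk -v t="$total_gb" -v w="$weights_gb" -v f="$frac" 'BEGIN {
    pool = t * f
    kv = pool - w
    printf "fraction %s -> static pool %.2f GB, left for KV cache %.2f GB\n", \
      f, pool, (kv > 0 ? kv : 0)
  }'
done
```

With these assumed numbers, a 0.01 fraction reserves only about 0.48 GB, less than the weights alone, so there is no fraction in that low range that leaves a workable KV cache.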
The purpose of me going to very low values, like

You could try any other value: 0.02, 0.03, 0.035, 0.04, 0.2, 0.4, 0.8, etc., and you'd still end up with either of these two errors. This means there is no valid value of

I'm no expert in anything that's happening under the hood, but after taking a second look at the logs, the error is possibly related to the quantization used by the model (AWQ).

Btw, I had to delete a significant chunk of error logs from error #1, because GitHub was complaining about the length of the message. The deleted portion was replaced with
It seems that AWQ models can't use CUDA graph. I tried this several weeks ago, and I turn off CUDA graph when using quantized models in my own code.
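A minimal launch sketch of that workaround. The model path and port here are illustrative placeholders, not values from this thread; `--disable-cuda-graph` is the flag that skips CUDA graph capture:

```shell
# Hypothetical workaround: serve an AWQ model with CUDA graph disabled.
# Model path and port are placeholders.
python -m sglang.launch_server \
  --model-path Qwen/Qwen2-0.5B-Instruct-AWQ \
  --disable-cuda-graph \
  --port 30000
```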
I have no problem running
The reason is that torch.compile is not compatible with AWQ or GPTQ.
We will work with torchao team (cc @jerryzh168) to make all of them compatible with each other soon. |
move to #1991 |
Checklist
Describe the bug
Can't use `--enable-torch-compile` in tandem with `--dp`; it always reports either OOM or not enough memory (see the two examples below). On purpose, I picked one of the smallest models (0.5B) and a GPU with a lot of VRAM (the A40 has 48 GB); despite that, it still doesn't work. Happy to help hunt this down.
Reproduction
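The original reproduction snippets did not survive the page scrape and aren't recoverable. A command of the shape the report describes (torch.compile together with data parallelism on the 8×A40 host) would look roughly like the following; the model path, dp size, and fraction are assumptions, not the reporter's exact values:

```shell
# Illustrative only: the flag combination reported to fail.
python -m sglang.launch_server \
  --model-path Qwen/Qwen2-0.5B-Instruct-AWQ \
  --enable-torch-compile \
  --dp 8 \
  --mem-fraction-static 0.1
```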
Environment
host: runpod.io
gpu:
8*A40
OS image: RunPod Pytorch 2.4.0
runpod/pytorch:2.4.0-py3.11-cuda12.4.1-devel-ubuntu22.04