Error: Compile with TORCH_USE_CUDA_DSA to enable device-side assertions. #96
Here is the full log in API mode: [log output collapsed in the original post]

versions: [version list collapsed in the original post]
I have the same problem. Have you solved it?
No. I tried different versions of torch, but it did not solve the issue. What GPUs are you using? Mine are P40s.
My GPU is a V100. I think I may have found the reason: the V100 uses the Volta architecture, so I can't use flash-attn normally, which makes most of this project incompatible with it.
Sorry, we use the Marlin operator for the layers computed on the GPU. It requires Compute Capability 8.0 or above to run, but the Compute Capability of the P40 is 6.1, so it cannot run. You may be able to get it running by removing the Marlin operator. @Azure-Tang Maybe you can show how to remove Marlin?
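Editor's note: before removing Marlin, it may help to confirm the Compute Capability of each card. A minimal sketch using standard PyTorch calls (the 8.0 threshold is taken from the comment above):

```python
import torch

MARLIN_MIN_CC = (8, 0)  # per the comment above: Marlin needs Compute Capability >= 8.0

for i in range(torch.cuda.device_count()):
    cap = torch.cuda.get_device_capability(i)  # e.g. (6, 1) for a P40
    status = "supported" if cap >= MARLIN_MIN_CC else "below Marlin's requirement"
    print(f"GPU {i}: {torch.cuda.get_device_name(i)} -> CC {cap[0]}.{cap[1]} ({status})")
```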
Same issue with a 2080 Ti. Any solution to support GPUs with Compute Capability lower than 8.0?
Oh my, me too.
+1
For those who don't have an Ampere GPU, please refer to #150.
I had the same problem. The solution: flash-attn wheels come in two variants. Besides confirming that the cu, cp, and torch versions match, the filename also differs by a TRUE/FALSE field. Pick the right one and it works.
How should I modify or choose it?
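Editor's note: the TRUE/FALSE field described above is the cxx11abi part of the prebuilt flash-attn wheel filenames (e.g. `...cxx11abiFALSE...`; the exact naming pattern is an assumption based on the flash-attn release pages). A minimal sketch to print the values your environment must match:

```python
import sys
import torch

# Prebuilt flash-attn wheels are named roughly like:
#   flash_attn-<ver>+cu<cuda>torch<torch>cxx11abi<TRUE|FALSE>-cp<py>-cp<py>-linux_x86_64.whl
# Print the corresponding fields of the local environment to pick the matching wheel.
print("torch version:", torch.__version__)            # matches the torch<...> field
print("CUDA version:", torch.version.cuda)            # matches the cu<...> field
print("python tag: cp%d%d" % sys.version_info[:2])    # matches the cp<...> field
print("cxx11 ABI:", torch.compiled_with_cxx11_abi())  # True -> cxx11abiTRUE wheel
```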
I'm trying to run a DeepSeek-V2.5 model.
Command used:
python -m ktransformers.local_chat --model_path ./DeepSeek-V2.5/ --gguf_path ../
I've tried both ktransformers.local_chat and web interface mode, with and without the --optimize_config_path option. The model loads, but it fails with this traceback on the first interaction. Other backends (koboldcpp and llama.cpp) run fine.
Server specs: 200 GB RAM, dual P40.
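Editor's note: the "Compile with TORCH_USE_CUDA_DSA" text in the title is PyTorch's generic hint attached to asynchronous CUDA errors, so the traceback often points away from the kernel that actually failed. One way to localize such errors (a debugging aid only, not a fix for the Compute Capability mismatch discussed above):

```python
import os

# Force synchronous CUDA kernel launches so the Python traceback stops at
# the real failing call. Must be set before CUDA is initialized, i.e. before
# torch touches the GPU.
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"
```

Equivalently, prefix the launch command: `CUDA_LAUNCH_BLOCKING=1 python -m ktransformers.local_chat ...`.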