Better AMD GPUs support through ROCm/HIP #115
base: master
GZGavinZhao commented Dec 7, 2023
- Enable ROCm/HIP GPU acceleration
- Update .gitignore for build cache
Signed-off-by: Gavin Zhao <[email protected]>
PkgConfig is for Linux. The headers and libs of VapourSynth need to be set manually and will be checked in ThirdPartyForVS.cmake. If you want to set them automatically, it is better to do it in ThirdPartyForVS.cmake with an "if else" check, so that it works on all supported platforms and we can still set the paths manually if there is no pkg-config or the library cannot be found automatically.
Thanks for the review! I'll address them once I fix the performance issue. I have some very bad benchmark results here:
This benchmark was run on an AMD Radeon RX Vega 64 (gfx900). A similar result was reproduced on an AMD Radeon RX 6600M (gfx1032). The build flag I used is […]. There's no way that ROCm runs this much slower than OpenCL, so I'll continue to investigate this issue. The HIP code is an automatic translation from CUDA to HIP using the […]
Fortunately, I think the benchmark results are misleading. I did a real-world test by upscaling a 4-minute 1080p episode, One Room Season 3 Episode 1. The flag used is […]. I profiled the benchmark and saw that the majority of the time is spent on […]
The creation and destruction of streams on CUDA should be low cost. I am using a dynamic "stream" on CUDA, which creates and destroys a "stream" in each processing call and makes the code simpler. Maybe it is better to use a static "stream" on ROCm. There is actually some "warm up" before benchmarking, which makes the CUDA results look normal.
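To make the dynamic vs. static "stream" idea concrete, here is a minimal C++ sketch. It is not code from this PR: `FakeStream`, `processDynamic`, `processStatic`, and the two counters are hypothetical names, and `FakeStream` only counts constructions instead of calling the GPU runtime. In a real HIP build the constructor and destructor would presumably wrap `hipStreamCreate` and `hipStreamDestroy`.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical counters, used only to make the cost pattern visible.
std::size_t g_streamsCreated = 0;
std::size_t g_streamsDestroyed = 0;

// Hypothetical stand-in for a GPU stream handle. A real version would call
// hipStreamCreate in the constructor and hipStreamDestroy in the destructor.
struct FakeStream {
    FakeStream() { ++g_streamsCreated; }
    ~FakeStream() { ++g_streamsDestroyed; }
    FakeStream(const FakeStream&) = delete;
    FakeStream& operator=(const FakeStream&) = delete;

    // Pretend to enqueue work for one frame on this stream.
    void launch(int frame) {
        assert(frame >= 0);
        lastFrame = frame;
    }

    int lastFrame = -1;
};

// "Dynamic" pattern: a fresh stream is created and destroyed on every call.
int processDynamic(int frame) {
    FakeStream stream;
    stream.launch(frame);
    return stream.lastFrame;
}

// "Static" pattern: one stream per thread, created lazily on first use and
// reused by every later call on that thread.
int processStatic(int frame) {
    thread_local FakeStream stream;
    stream.launch(frame);
    return stream.lastFrame;
}
```

Calling `processDynamic` five times creates and destroys five streams, while five calls to `processStatic` on the same thread create only one. If stream creation is expensive on a given runtime, the second pattern pays that cost once instead of once per frame, at the price of keeping per-thread state alive.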
0a7b1c2 to d01420e