Add experimental cuFFT support. #587
base: master
Conversation
Hey Balint, thanks a lot for that. We have been looking at GPU-based acceleration for a while but haven't seen an actual use case just yet, primarily because, as you've pointed out, FFTs and other basic operations are super efficient on x86. That might change for coding and other more complex PHY procedures. We'll see. We also follow Aerial, but haven't tried it out or seen any benchmark results. Definitely looking forward to it. That being said, I think your PR is a good basis for possible GPU offloading using CUDA.
Hello, thanks for your experiment. I was curious and benchmarked it. There is an OFDM unit test that can be used as a benchmark. The whole DL processing chain (including PDCCH and PDSCH encoding/decoding) can also be tested and benchmarked.
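As a rough stand-in for that kind of measurement, here is a minimal CPU-side timing loop using single-precision FFTW (the library behind srsRAN's CPU DFT path); it is not the srsRAN unit test itself, and the transform size, iteration count and file name are only illustrative:

```c
// Hypothetical CPU baseline: time N_RUNS 1024-point complex FFTs with FFTW.
// Not the srsRAN unit test; build with: gcc bench_fft.c -lfftw3f -o bench_fft
#include <fftw3.h>
#include <stdio.h>
#include <time.h>

#define FFT_SIZE 1024
#define N_RUNS   10000

int main(void) {
  fftwf_complex *in  = fftwf_alloc_complex(FFT_SIZE);
  fftwf_complex *out = fftwf_alloc_complex(FFT_SIZE);
  fftwf_plan plan = fftwf_plan_dft_1d(FFT_SIZE, in, out, FFTW_FORWARD, FFTW_MEASURE);

  // Fill the input after planning (FFTW_MEASURE overwrites the arrays while planning).
  for (int i = 0; i < FFT_SIZE; i++) {
    in[i][0] = (float)(i % 16);
    in[i][1] = 0.0f;
  }

  struct timespec t0, t1;
  clock_gettime(CLOCK_MONOTONIC, &t0);
  for (int r = 0; r < N_RUNS; r++) {
    fftwf_execute(plan);
  }
  clock_gettime(CLOCK_MONOTONIC, &t1);

  double elapsed_us = (t1.tv_sec - t0.tv_sec) * 1e6 + (t1.tv_nsec - t0.tv_nsec) * 1e-3;
  printf("avg %.2f us per %d-point FFT\n", elapsed_us / N_RUNS, FFT_SIZE);

  fftwf_destroy_plan(plan);
  fftwf_free(in);
  fftwf_free(out);
  return 0;
}
```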
I am curious about how cuPHY may perform.
Just a quick follow-up on this. We've decided to leave this PR out of the upcoming release, simply because the user benefit isn't obvious right now. We are happy to leave the PR open and build on top of it should Nvidia decide to make cuBB publicly available. Thanks again @cbalint13 for your contribution, it is much appreciated.
@andrepuschmann, @xavierarteaga, @ismagom The current srsRAN implementation seems to gain little from CUDA; however, it is encouraging for future development. Notes from further tests and benchmarks:
Some possible future solutions that may enable a more heterogeneous computation scheme:
[1] LDPC: https://arxiv.org/pdf/2007.07644.pdf
Enable experimental offloading of FFT processing to CUDA-based GPUs.
Description
Target
Evaluation
Enabling cuFFT with the proposed patch works just as well as the CPU target code; no degradation or loss was observed.
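For context, the sketch below shows the basic cuFFT plan/execute flow that such an offload relies on. It is illustrative only, not the patch code; the file name, test signal and build command are assumptions:

```c
// Minimal cuFFT sketch: 1024-point complex-to-complex forward FFT.
// Illustrative only; build with: nvcc fft_offload.cu -lcufft -o fft_offload
#include <cuda_runtime.h>
#include <cufft.h>
#include <stdio.h>
#include <stdlib.h>

#define FFT_SIZE 1024

int main(void) {
  cufftComplex *h_buf = (cufftComplex *)malloc(FFT_SIZE * sizeof(cufftComplex));
  cufftComplex *d_buf = NULL;

  // Fill the host buffer with a trivial test signal.
  for (int i = 0; i < FFT_SIZE; i++) {
    h_buf[i].x = (float)(i % 16);
    h_buf[i].y = 0.0f;
  }

  cudaMalloc((void **)&d_buf, FFT_SIZE * sizeof(cufftComplex));
  cudaMemcpy(d_buf, h_buf, FFT_SIZE * sizeof(cufftComplex), cudaMemcpyHostToDevice);

  // Plan once, execute many times -- mirrors how a CPU FFT plan is reused per symbol.
  cufftHandle plan;
  cufftPlan1d(&plan, FFT_SIZE, CUFFT_C2C, 1);
  cufftExecC2C(plan, d_buf, d_buf, CUFFT_FORWARD); // in-place transform

  cudaMemcpy(h_buf, d_buf, FFT_SIZE * sizeof(cufftComplex), cudaMemcpyDeviceToHost);
  printf("bin0 = %f %+fi\n", h_buf[0].x, h_buf[0].y);

  cufftDestroy(plan);
  cudaFree(d_buf);
  free(h_buf);
  return 0;
}
```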
According to a simple benchmark, things are slower on CUDA than on the CPU for 1024-point target sizes.
But with the FFT offloaded, the CPU may gain some free cycles on low-end SBCs.
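One plausible reason the GPU loses at this transform size is that the per-symbol host-to-device and device-to-host copies dominate the kernel time. A hedged sketch of how that split could be measured with CUDA events follows; the helper name is hypothetical and it assumes the plan and buffers from the sketch above:

```c
// Hypothetical helper: split the per-transform time into PCIe copies and the
// cuFFT kernel itself. Assumes plan/buffers set up as in the previous sketch.
#include <cuda_runtime.h>
#include <cufft.h>
#include <stdio.h>

static void time_fft_roundtrip(cufftHandle plan, cufftComplex *d_buf,
                               cufftComplex *h_buf, int n) {
  cudaEvent_t t0, t1, t2, t3;
  cudaEventCreate(&t0); cudaEventCreate(&t1);
  cudaEventCreate(&t2); cudaEventCreate(&t3);

  cudaEventRecord(t0);
  cudaMemcpy(d_buf, h_buf, n * sizeof(cufftComplex), cudaMemcpyHostToDevice);
  cudaEventRecord(t1);
  cufftExecC2C(plan, d_buf, d_buf, CUFFT_FORWARD);
  cudaEventRecord(t2);
  cudaMemcpy(h_buf, d_buf, n * sizeof(cufftComplex), cudaMemcpyDeviceToHost);
  cudaEventRecord(t3);
  cudaEventSynchronize(t3);

  float ms_h2d, ms_fft, ms_d2h;
  cudaEventElapsedTime(&ms_h2d, t0, t1); // host-to-device copy
  cudaEventElapsedTime(&ms_fft, t1, t2); // the transform itself
  cudaEventElapsedTime(&ms_d2h, t2, t3); // device-to-host copy
  printf("h2d %.3f ms  fft %.3f ms  d2h %.3f ms\n", ms_h2d, ms_fft, ms_d2h);

  cudaEventDestroy(t0); cudaEventDestroy(t1);
  cudaEventDestroy(t2); cudaEventDestroy(t3);
}
```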
Beyond FFT, more baseband benefits may come from Nvidia Aerial's cuPHY, which also targets FEC / Turbo codes.
@andrepuschmann, @suttonpd, @ismagom, looking forward to your thoughts.
Thank you!