[NDTensors] Get more block sparse operations working on GPU #1215
Conversation
I fixed block sparse SVD in the latest commit, so DMRG now runs with QN conservation for any cutoff value. It is still pretty slow, but again the goal here is to get things running; analyzing the timings, adding tests, etc. can be left for future PRs.
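As a concrete illustration of the kind of run described above (not code from this PR), here is a hedged sketch following the ITensors GPU usage pattern: adapt the Hamiltonian MPO and initial MPS to the GPU with Metal.jl's `mtl` and run QN-conserving DMRG. The Heisenberg model and the parameter values here are made up for the example:

```julia
using ITensors
using Metal

# QN-conserving DMRG on a Metal GPU: after this PR, the block sparse
# contractions and factorizations (QR, eigendecomposition, SVD) run
# on device instead of silently falling back to CPU.
n = 20
sites = siteinds("S=1/2", n; conserve_qns=true)

# Heisenberg Hamiltonian as an example model.
os = OpSum()
for j in 1:(n - 1)
  os += 0.5, "S+", j, "S-", j + 1
  os += 0.5, "S-", j, "S+", j + 1
  os += "Sz", j, "Sz", j + 1
end

# `mtl` (from Metal.jl) adapts the tensor storage to the GPU.
H = mtl(MPO(os, sites))
psi0 = mtl(randomMPS(sites, j -> isodd(j) ? "Up" : "Dn"; linkdims=10))

# Any cutoff now works, since block sparse SVD runs without errors.
energy, psi = dmrg(H, psi0; nsweeps=5, maxdim=100, cutoff=1e-8)
```

On an Nvidia GPU the same sketch would use CUDA.jl's `cu` in place of `mtl`.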
Great to see how much more can be made to work already. Goes to show the value of good design. Makes sense about the performance not immediately being there.
Definitely a testament to the work that Karl and I have been doing developing better generic code patterns across CPU and GPU. Going through these changes to the block sparse code, it was also clear to me how much simpler a lot of it will get when we switch over to using the new …
Codecov Report

@@            Coverage Diff             @@
##             main    #1215       +/-  ##
===========================================
- Coverage   85.39%   67.47%   -17.93%
===========================================
  Files          89       88        -1
  Lines        8430     8397       -33
===========================================
- Hits         7199     5666     -1533
- Misses       1231     2731     +1500
I'm going to merge this so we can build on top of it in future PRs.
This is an initial attempt at getting more block sparse operations working on GPU.
The main issue was that block sparse factorizations like QR, eigendecomposition, and SVD were implicitly hardcoded to transfer data to CPU in certain places. This is fixed by calling more general constructors and making use of `leaf_parenttype`. DMRG now runs on GPU with conserved quantities, though I have only tested on Metal GPUs so far. There are a few caveats: this PR is more about getting things working, and performance can be tested in future PRs, so my primary goal will be getting block sparse SVD working (i.e. running without errors).
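For illustration, here is a minimal sketch of the device-generic allocation pattern the description refers to. It assumes `NDTensors.leaf_parenttype` returns the unwrapped storage type of an array, as the PR describes; the helper `allocate_output_like` is hypothetical and not code from this PR:

```julia
using NDTensors

# Hypothetical helper illustrating the pattern, not the PR's code.
# leaf_parenttype(a) returns the underlying storage type of `a`,
# e.g. Matrix{Float64} on CPU or MtlMatrix{Float32} on a Metal GPU,
# even when `a` is wrapped in Adjoint, SubArray, etc.
function allocate_output_like(a::AbstractArray, dims::Dims)
  # `similar(T, dims)` allocates an uninitialized array of type T,
  # so the factorization output lives on the same device as the input
  # instead of being implicitly constructed on (and copied to) CPU.
  return similar(NDTensors.leaf_parenttype(a), dims)
end

# Usage: U has the same storage type (and device) as `a`.
a = randn(Float32, 4, 4)
U = allocate_output_like(a, (4, 2))
```

The design point is that factorization code never names a concrete array type like `Matrix`; it derives the output type from the input, so the same code path works for any backend.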