
[NDTensors] Get more block sparse operations working on GPU #1215

Merged (12 commits into main, Oct 24, 2023)

Conversation

mtfishman (Member)

This is an initial attempt at getting more block sparse operations working on GPU.

The main issue was that block sparse factorizations like QR, eigendecomposition, and SVD were implicitly hardcoded to transfer data to CPU in certain places, which is fixed by calling more general constructors and making use of leaf_parenttype.
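The pattern described above can be sketched as follows. This is a simplified, illustrative stand-in for the NDTensors `leaf_parenttype` function, not the PR's actual code: the idea is to unwrap array wrappers down to the underlying storage type and allocate outputs from that type, so a GPU input produces a GPU output.

```julia
using LinearAlgebra

# Simplified stand-in for NDTensors.leaf_parenttype: recursively unwrap
# array wrappers (just Adjoint and Transpose in this sketch) down to the
# underlying storage type.
leaf_parenttype(a::AbstractArray) = typeof(a)
leaf_parenttype(a::Union{Adjoint,Transpose}) = leaf_parenttype(parent(a))

# Allocate a factorization output with the same storage type as the
# input, instead of hardcoding Matrix{T} (which would force a CPU array):
alloc_output(a::AbstractArray, dims...) = similar(leaf_parenttype(a), dims...)

a = transpose(rand(3, 4))  # a wrapped Matrix{Float64}
q = alloc_output(a, 4, 3)
# On CPU, q is a Matrix{Float64}; for a wrapped MtlMatrix the same code
# would allocate on the Metal GPU.
```

The benefit of dispatching on the leaf parent type rather than on the wrapper is that the same constructor code path works for CPU arrays, GPU arrays, and arbitrary nestings of wrappers around either.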

DMRG now runs on GPU with conserved quantities, though I have only tested on Metal GPUs so far. There are a few caveats:

  1. The performance on Metal GPUs is pretty bad; we'll have to track down which operations are slow.
  2. Block sparse SVD is still broken for now; I need to keep investigating that. There are some issues on Metal GPUs with performing permutations of wrapped MtlArrays.
  3. On Metal, factorizations like QR, eigendecomposition, and SVD aren't available natively, so they are performed by moving data back and forth between GPU and CPU.
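The CPU round trip in caveat 3 can be sketched like this. This is an illustrative pattern, not the PR's exact code: factorize on the CPU with LAPACK and copy the factors back to the input's storage type (on a plain CPU array the round trip degenerates to a copy).

```julia
using LinearAlgebra

# Illustrative CPU-fallback pattern for devices (like Metal) without
# native factorizations: copy to host, factorize, copy factors back.
function svd_via_cpu(a::AbstractMatrix)
  storagetype = typeof(a)
  F = svd(Array(a))  # device -> host copy, then CPU (LAPACK) SVD
  # Host -> device copies of the factors; NDTensors uses its own
  # adapt-based transfer utilities rather than a bare convert:
  return convert(storagetype, F.U), F.S, convert(storagetype, F.V)
end

a = rand(4, 3)
U, S, V = svd_via_cpu(a)
@assert U * Diagonal(S) * V' ≈ a  # thin SVD reconstructs the input
```

This keeps the rest of the block sparse code device-agnostic: only the innermost dense factorization pays the transfer cost.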

This PR is more about getting things working; performance can be analyzed in future PRs. My primary goal for now is getting block sparse SVD working (i.e. running without errors).

Files with review comments (all resolved):

  - NDTensors/ext/NDTensorsMetalExt/linearalgebra.jl
  - NDTensors/ext/NDTensorsMetalExt/permutedims.jl
  - NDTensors/src/abstractarray/permutedims.jl
  - NDTensors/src/blocksparse/blocksparse.jl
  - NDTensors/src/blocksparse/blocksparsetensor.jl
  - NDTensors/src/blocksparse/diagblocksparse.jl
  - NDTensors/src/blocksparse/linearalgebra.jl (two threads)
  - NDTensors/src/dense/densetensor.jl
mtfishman (Member, Author)

I fixed block sparse SVD in the latest commits, so DMRG now runs with QN conservation for any cutoff value. It is still pretty slow, but again the goal here is to get things running; analyzing the timings, adding tests, etc. can be left for future PRs.
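For context, a usage sketch of what this enables: DMRG with conserved quantum numbers on a Metal GPU. This assumes ITensors.jl with the Metal extension loaded; the `mtl` adapter name and the exact keyword API are assumptions based on the interface around the time of this PR and may differ.

```julia
using ITensors, Metal

# Heisenberg S=1/2 chain with QN conservation.
n = 20
s = siteinds("S=1/2", n; conserve_qns=true)

os = OpSum()
for j in 1:(n - 1)
  os += "Sz", j, "Sz", j + 1
  os += 0.5, "S+", j, "S-", j + 1
  os += 0.5, "S-", j, "S+", j + 1
end

# Transfer the Hamiltonian MPO and the initial MPS to the Metal GPU;
# block sparse contractions and factorizations then run on device
# (with the CPU fallback for factorizations described above).
H = mtl(MPO(os, s))
psi0 = mtl(randomMPS(s, j -> isodd(j) ? "Up" : "Dn"; linkdims=10))
energy, psi = dmrg(H, psi0; nsweeps=5, maxdim=100, cutoff=1e-8)
```

Before this PR, the QN-conserving (block sparse) path would implicitly pull data back to the CPU inside the factorizations; afterward, the same user-facing code runs end to end on the GPU.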

emstoudenmire (Collaborator)

Great to see how much more can be made to work already. Goes to show the value of good design. Makes sense about the performance not immediately being there.

mtfishman (Member, Author)

> Great to see how much more can be made to work already. Goes to show the value of good design. Makes sense about the performance not immediately being there.

Definitely a testament to the work that Karl and I have been doing to develop better generic code patterns across CPU and GPU. Going through these changes to the block sparse code, it was also clear to me how much simpler a lot of it will get when we switch over to using the new BlockSparseArray type.

codecov-commenter commented Oct 24, 2023

Codecov Report

Attention: 2 lines in your changes are missing coverage. Please review.

Comparison: base (d4df519) 85.39% vs. head (a546f73) 67.47%.
The report is 1 commit behind head on main.

❗ Current head a546f73 differs from pull request most recent head 42bd16c. Consider uploading reports for the commit 42bd16c to get more accurate results


Additional details and impacted files
@@             Coverage Diff             @@
##             main    #1215       +/-   ##
===========================================
- Coverage   85.39%   67.47%   -17.93%     
===========================================
  Files          89       88        -1     
  Lines        8430     8397       -33     
===========================================
- Hits         7199     5666     -1533     
- Misses       1231     2731     +1500     
Files                                           Coverage Δ
src/mps/dmrg.jl                                 61.94% <ø> (-21.65%) ⬇️
src/mps/mps.jl                                  28.66% <ø> (-61.72%) ⬇️
src/tensor_operations/matrix_decomposition.jl   91.80% <100.00%> (-0.53%) ⬇️
src/tensor_operations/matrix_algebra.jl         92.30% <77.77%> (-3.05%) ⬇️

... and 33 files with indirect coverage changes


@mtfishman mtfishman marked this pull request as ready for review October 24, 2023 12:45
mtfishman (Member, Author)

I'm going to merge this so we can build on top of it in future PRs.

@mtfishman mtfishman merged commit 871e59d into main Oct 24, 2023
7 checks passed
@mtfishman mtfishman deleted the NDTEnsors_gpu_blocksparse branch October 24, 2023 12:51