[NDTensors] Get more block sparse operations working on GPU #1215
Conversation
I fixed block sparse SVD in the latest commit, so DMRG now runs with QN conservation for any cutoff value. It is still pretty slow, but again the goal here is to get things running; analyzing the timings, adding tests, etc. can be left for future PRs.
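As a concrete illustration of the kind of run described above (not code from this PR), here is a hedged sketch following the ITensors GPU usage pattern: adapt the Hamiltonian MPO and initial MPS to the GPU with Metal.jl's `mtl` and run QN-conserving DMRG. The Heisenberg model and the parameter values here are made up for the example:

```julia
using ITensors
using Metal

# QN-conserving DMRG on a Metal GPU: after this PR, the block sparse
# contractions and factorizations (QR, eigendecomposition, SVD) run
# on device instead of silently falling back to CPU.
n = 20
sites = siteinds("S=1/2", n; conserve_qns=true)

# Heisenberg Hamiltonian as an example model.
os = OpSum()
for j in 1:(n - 1)
  os += 0.5, "S+", j, "S-", j + 1
  os += 0.5, "S-", j, "S+", j + 1
  os += "Sz", j, "Sz", j + 1
end

# `mtl` (from Metal.jl) adapts the tensor storage to the GPU.
H = mtl(MPO(os, sites))
psi0 = mtl(randomMPS(sites, j -> isodd(j) ? "Up" : "Dn"; linkdims=10))

# Any cutoff now works, since block sparse SVD runs without errors.
energy, psi = dmrg(H, psi0; nsweeps=5, maxdim=100, cutoff=1e-8)
```

On an Nvidia GPU the same sketch would use CUDA.jl's `cu` in place of `mtl`.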
Great to see how much more can be made to work already. Goes to show the value of good design. Makes sense about the performance not immediately being there.
Definitely a testament to the work that Karl and I have been doing developing better generic code patterns across CPU and GPU. Going through these changes to the block sparse code, it was also clear to me how much simpler a lot of it will get when we switch over to using the new …
Codecov Report

@@            Coverage Diff             @@
##             main    #1215       +/-  ##
===========================================
- Coverage   85.39%   67.47%   -17.93%
===========================================
  Files          89       88        -1
  Lines        8430     8397       -33
===========================================
- Hits         7199     5666     -1533
- Misses       1231     2731     +1500
I'm going to merge this so we can build on top of it in future PRs.
This is an initial attempt at getting more block sparse operations working on GPU.
The main issue was that block sparse factorizations like QR, eigendecomposition, and SVD were implicitly hardcoded to transfer data to CPU in certain places. This is fixed by calling more general constructors and making use of `leaf_parenttype`. DMRG now runs on GPU with conserved quantities, though I have only tested on Metal GPUs so far. There are a few caveats: this PR is more about getting things working, and performance can be tested in future PRs, so my primary goal will be getting block sparse SVD working (i.e. running without errors).
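For illustration, here is a minimal sketch of the device-generic allocation pattern the description refers to. It assumes `NDTensors.leaf_parenttype` returns the unwrapped storage type of an array, as the PR describes; the helper `allocate_output_like` is hypothetical and not code from this PR:

```julia
using NDTensors

# Hypothetical helper illustrating the pattern, not the PR's code.
# leaf_parenttype(a) returns the underlying storage type of `a`,
# e.g. Matrix{Float64} on CPU or MtlMatrix{Float32} on a Metal GPU,
# even when `a` is wrapped in Adjoint, SubArray, etc.
function allocate_output_like(a::AbstractArray, dims::Dims)
  # `similar(T, dims)` allocates an uninitialized array of type T,
  # so the factorization output lives on the same device as the input
  # instead of being implicitly constructed on (and copied to) CPU.
  return similar(NDTensors.leaf_parenttype(a), dims)
end

# Usage: U has the same storage type (and device) as `a`.
a = randn(Float32, 4, 4)
U = allocate_output_like(a, (4, 2))
```

The design point is that factorization code never names a concrete array type like `Matrix`; it derives the output type from the input, so the same code path works for any backend.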