
cuSPARSE dense matrix and sparse matrix multiplication resulting in a dense matrix #211

zhoeujei opened this issue Aug 22, 2024 · 5 comments

@zhoeujei commented Aug 22, 2024

Why doesn't cuSPARSE support dense-matrix × sparse-matrix multiplication that produces a dense matrix? Many application scenarios require this. Please consider adding support.


@essex-edwards (Contributor)

The SpMM routine can do this. The API is phrased as dense = sparse * dense, but you can achieve dense = dense * sparse by transposing everything: C = B * A is equivalent to C^T = A^T * B^T, which SpMM computes directly.
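For example, here is a minimal sketch of that trick (illustrative function and variable names; float values, 32-bit indices, and row-major dense device buffers are assumed). The key observation is that a row-major m x k buffer is bit-identical to a column-major k x m matrix, so B^T and C^T can be described as CUSPARSE_ORDER_COL views without moving any data:

```c
#include <cuda_runtime.h>
#include <cusparse.h>

// Sketch: C = B * A with A sparse (CSR, k x n), B (m x k) and C (m x n)
// dense row-major, computed as C^T = A^T * B^T in a single SpMM call.
// Error checking omitted for brevity.
void dense_times_sparse(cusparseHandle_t handle,
                        int m, int k, int n, int nnz,
                        const int   *dA_rowOffsets,  // CSR arrays of A (k x n)
                        const int   *dA_colInd,
                        const float *dA_values,
                        const float *dB,             // row-major m x k
                        float       *dC)             // row-major m x n
{
    cusparseSpMatDescr_t matA;
    cusparseDnMatDescr_t matBt, matCt;
    float alpha = 1.0f, beta = 0.0f;

    // A as a k x n CSR matrix; SpMM will apply op(A) = A^T (n x k).
    cusparseCreateCsr(&matA, k, n, nnz,
                      (void *)dA_rowOffsets, (void *)dA_colInd, (void *)dA_values,
                      CUSPARSE_INDEX_32I, CUSPARSE_INDEX_32I,
                      CUSPARSE_INDEX_BASE_ZERO, CUDA_R_32F);

    // Reinterpret the row-major buffers as column-major transposes: no copy.
    cusparseCreateDnMat(&matBt, k, m, k, (void *)dB, CUDA_R_32F, CUSPARSE_ORDER_COL); // B^T
    cusparseCreateDnMat(&matCt, n, m, n, (void *)dC, CUDA_R_32F, CUSPARSE_ORDER_COL); // C^T

    // C^T (n x m) = A^T (n x k) * B^T (k x m)
    size_t bufferSize = 0;
    void *dBuffer = NULL;
    cusparseSpMM_bufferSize(handle,
                            CUSPARSE_OPERATION_TRANSPOSE,      // op(A) = A^T
                            CUSPARSE_OPERATION_NON_TRANSPOSE,  // matBt already holds B^T
                            &alpha, matA, matBt, &beta, matCt,
                            CUDA_R_32F, CUSPARSE_SPMM_ALG_DEFAULT, &bufferSize);
    cudaMalloc(&dBuffer, bufferSize);
    cusparseSpMM(handle,
                 CUSPARSE_OPERATION_TRANSPOSE,
                 CUSPARSE_OPERATION_NON_TRANSPOSE,
                 &alpha, matA, matBt, &beta, matCt,
                 CUDA_R_32F, CUSPARSE_SPMM_ALG_DEFAULT, dBuffer);

    cudaFree(dBuffer);
    cusparseDestroyDnMat(matCt);
    cusparseDestroyDnMat(matBt);
    cusparseDestroySpMat(matA);
}
```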

@zhoeujei (Author)

It is indeed possible to achieve this through transposition, but it reduces efficiency. Please consider adding direct support.

@essex-edwards (Contributor)

Thanks for the suggestion. Do you have any benchmarks, particular call sequences, or matrices that seem unexpectedly slow? If you can share those, it would help us optimize for your use case.

@zhoeujei (Author)

Yes. I set both the input matrix dimensions and the output matrix dimensions to 1024x1024. With the parameter CUSPARSE_OPERATION_TRANSPOSE, cusparseSpMM runs about twice as slow as with CUSPARSE_OPERATION_NON_TRANSPOSE. Therefore, I would like to ask whether NVIDIA provides a library for dense matrix * sparse matrix = dense matrix operations.
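Roughly, the comparison looks like this (a minimal sketch, not the exact benchmark code; square 1024x1024 matrices are assumed so both op settings are dimensionally valid, reusing the handle, descriptors, and buffer from the sketch above):

```c
#include <stdio.h>

// Time one SpMM variant with CUDA events, averaging over repeated launches.
// opA is CUSPARSE_OPERATION_TRANSPOSE or CUSPARSE_OPERATION_NON_TRANSPOSE.
cudaEvent_t start, stop;
cudaEventCreate(&start);
cudaEventCreate(&stop);

// Warm-up launch so one-time setup cost is excluded from the measurement.
cusparseSpMM(handle, opA, CUSPARSE_OPERATION_NON_TRANSPOSE,
             &alpha, matA, matBt, &beta, matCt,
             CUDA_R_32F, CUSPARSE_SPMM_ALG_DEFAULT, dBuffer);

cudaEventRecord(start);
for (int i = 0; i < 100; ++i)      // average over 100 launches
    cusparseSpMM(handle, opA, CUSPARSE_OPERATION_NON_TRANSPOSE,
                 &alpha, matA, matBt, &beta, matCt,
                 CUDA_R_32F, CUSPARSE_SPMM_ALG_DEFAULT, dBuffer);
cudaEventRecord(stop);
cudaEventSynchronize(stop);

float ms = 0.0f;
cudaEventElapsedTime(&ms, start, stop);
printf("opA=%d: %.3f ms per SpMM\n", (int)opA, ms / 100.0f);

cudaEventDestroy(start);
cudaEventDestroy(stop);
```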

@essex-edwards (Contributor)

Okay. If I understand correctly, you are using SpMM to compute C = A^T B, where A is a CSR matrix and B and C are dense matrices, and you observe that this is slower than C = AB. This is expected. The performance loss is not due to the lack of a specialized API; it is due to the data layout of A^T. When A is stored in CSR format, the entries of A^T are effectively stored column-by-column, which is not an algorithmically convenient order. We can, of course, try to make it faster, but a significant performance gap is probably unavoidable. If possible, try storing A^T instead of A and using opA=NON_TRANSPOSE (equivalently, store A in CSC format). In that case, the data of A^T is arranged row-by-row, and you should see faster performance.
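A sketch of that suggestion, reusing the dense descriptors from the earlier sketch (in practice a fresh cusparseSpMM_bufferSize query should be done for the new call). The CSR arrays of A^T (n x k) are exactly the CSC arrays of A, so this is the same data as A stored in CSC format; if A only exists in CSR form, a one-time cusparseCsr2cscEx2 conversion can produce these arrays:

```c
// Store A^T directly in CSR so the kernel walks the data in its natural
// row-by-row order, then call SpMM with opA = NON_TRANSPOSE.
cusparseSpMatDescr_t matAT;
cusparseCreateCsr(&matAT, n, k, nnz,
                  (void *)dAT_rowOffsets,   // == CSC column offsets of A
                  (void *)dAT_colInd,       // == CSC row indices of A
                  (void *)dAT_values,
                  CUSPARSE_INDEX_32I, CUSPARSE_INDEX_32I,
                  CUSPARSE_INDEX_BASE_ZERO, CUDA_R_32F);

// C^T (n x m) = A^T (n x k) * B^T (k x m), no transpose op needed:
cusparseSpMM(handle,
             CUSPARSE_OPERATION_NON_TRANSPOSE,   // data is already A^T
             CUSPARSE_OPERATION_NON_TRANSPOSE,
             &alpha, matAT, matBt, &beta, matCt,
             CUDA_R_32F, CUSPARSE_SPMM_ALG_DEFAULT, dBuffer);
```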
