cuSPARSE dense matrix and sparse matrix multiplication resulting in a dense matrix #211
Comments
The SpMM routine can do this. The API is phrased as
It is indeed possible to achieve this through transposition, but it reduces efficiency. Please consider adding direct support.
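For readers following along, the transposition route relies on the identity C = B·A ⇔ C^T = A^T·B^T, so a sparse-times-dense routine applied to A^T can produce a dense-times-sparse result. Below is a minimal CPU sketch in plain Python, not cuSPARSE code; `csr_spmm` and `dense_times_sparse` are hypothetical helper names used only for illustration, and the tiny matrices are made up.

```python
def transpose(M):
    """Transpose a dense matrix stored as a list of rows."""
    return [list(col) for col in zip(*M)]


def csr_spmm(indptr, indices, data, n_cols, X, transposed=False):
    """Return op(A) @ X for a CSR matrix A and a dense matrix X.

    op(A) is A when transposed=False and A^T when transposed=True.
    """
    n_rows = len(indptr) - 1
    out_rows = n_cols if transposed else n_rows
    width = len(X[0])
    Y = [[0.0] * width for _ in range(out_rows)]
    for i in range(n_rows):
        for p in range(indptr[i], indptr[i + 1]):
            j, a = indices[p], data[p]
            # A[i][j] contributes to row i of A @ X, but to row j of A^T @ X.
            src, dst = (i, j) if transposed else (j, i)
            for c in range(width):
                Y[dst][c] += a * X[src][c]
    return Y


def dense_times_sparse(B, indptr, indices, data, n_cols):
    """Compute C = B @ A via C^T = A^T @ B^T (hypothetical helper)."""
    Ct = csr_spmm(indptr, indices, data, n_cols, transpose(B), transposed=True)
    return transpose(Ct)


# A = [[1, 0, 2],
#      [0, 3, 0]]  stored in CSR form
indptr, indices, data = [0, 2, 3], [0, 2, 1], [1.0, 2.0, 3.0]
B = [[1.0, 2.0],
     [3.0, 4.0]]
C = dense_times_sparse(B, indptr, indices, data, n_cols=3)
```

In this sketch the dense transposes are explicit copies for clarity. In practice a row-major buffer holding B is the same memory as a column-major B^T, so the dense side usually needs no data movement; the cost discussed in this thread comes from the sparse operand.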
Thanks for the suggestion. Have you got any benchmarks or particular call sequences or matrices that seem unexpectedly slow? If you can share that, that would help us optimize for your particular use case.
Yes. I set the dimensions of both input matrices to 1024x1024, and the output matrix dimensions to 1024x1024. The algorithm execution speed of cusparseSpMM with the parameter CUSPARSE_OPERATION_TRANSPOSE is twice as fast as that of cusparseSpMM with the parameter CUSPARSE_OPERATION_NON_TRANSPOSE. Therefore, I would like to ask whether NVIDIA provides a library for dense matrix * sparse matrix = dense matrix operations.
Okay. If I understand correctly, you are using SpMM to compute C = A^T B, where A is a CSR matrix and B and C are dense matrices. You observe that this is slower than C = AB. This is expected. The performance loss is not due to the lack of a specialized API. It's due to the data layout of A^T. When A is a CSR matrix, A^T has its entries stored column-by-column. This is not an algorithmically convenient order. We can, of course, try to make it faster, but a significant performance gap is probably unavoidable. If possible, you can try storing A^T instead of A and using opA=NON_TRANSPOSE (or storing A in CSC format). In that case, A^T will have the data arranged row-by-row, and you should get faster performance.
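To illustrate the layout point, here is a small plain-Python sketch (not cuSPARSE code; the helper names are hypothetical) showing that the CSC arrays of A are exactly the CSR arrays of A^T. That is why storing A in CSC, or storing A^T in CSR, gives a row-by-row view of A^T.

```python
def dense_to_csr(M):
    """Build CSR arrays (indptr, indices, data) from a dense list of rows."""
    indptr, indices, data = [0], [], []
    for row in M:
        for j, v in enumerate(row):
            if v != 0:
                indices.append(j)
                data.append(v)
        indptr.append(len(indices))
    return indptr, indices, data


def csr_to_csc(indptr, indices, data, n_cols):
    """Convert CSR arrays to CSC arrays (colptr, rowind, vals)."""
    # Count entries per column, then prefix-sum to get column start offsets.
    colptr = [0] * (n_cols + 1)
    for j in indices:
        colptr[j + 1] += 1
    for j in range(n_cols):
        colptr[j + 1] += colptr[j]
    next_slot = colptr[:-1]  # next free position in each column (a copy)
    rowind = [0] * len(indices)
    vals = [0] * len(data)
    for i in range(len(indptr) - 1):
        for p in range(indptr[i], indptr[i + 1]):
            j = indices[p]
            q = next_slot[j]
            rowind[q], vals[q] = i, data[p]
            next_slot[j] += 1
    return colptr, rowind, vals


A = [[1, 0, 2],
     [0, 3, 0]]
A_t = [list(col) for col in zip(*A)]

csc_of_A = csr_to_csc(*dense_to_csr(A), n_cols=3)
csr_of_At = dense_to_csr(A_t)
same_layout = csc_of_A == csr_of_At
```

Here `same_layout` is true: reading A through its CSC arrays walks A^T one row at a time, whereas reading A^T through the CSR arrays of A walks it one column at a time.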
Why doesn't cuSPARSE support dense matrix sparse matrix multiplication resulting in a dense matrix? Many application scenarios require this. Please consider adding support.