csrmv_mp not supported #29

Open
bdqp opened this issue Dec 5, 2016 · 1 comment

@bdqp

bdqp commented Dec 5, 2016

The latest CUDA 8 release includes a new version of csrmv that is supposed to perform better for operations involving the transpose, i.e. y = transpose(A)*x. The earlier csrmv function is much slower when the transpose is involved.
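To illustrate why the transposed product is awkward for CSR storage, here is a minimal pure-Python sketch. It is not how cuSPARSE implements csrmv; the function names `csr_matvec` and `csr_transpose_matvec` are made up for this example. The non-transposed product gathers into one output entry per row, while the transposed product scatters into `y` at column positions, which on a GPU typically means conflicting writes between threads.

```python
def csr_matvec(row_ptr, col_idx, vals, x):
    """y = A*x for A in CSR form: each row gathers into its own y[i]."""
    num_rows = len(row_ptr) - 1
    y = [0.0] * num_rows
    for i in range(num_rows):
        for k in range(row_ptr[i], row_ptr[i + 1]):
            y[i] += vals[k] * x[col_idx[k]]
    return y


def csr_transpose_matvec(row_ptr, col_idx, vals, x, num_cols):
    """y = transpose(A)*x without forming the transpose.

    Each stored entry A[i, j] contributes to y[j], so the writes are
    scattered by column index instead of being grouped by row.
    """
    num_rows = len(row_ptr) - 1
    y = [0.0] * num_cols
    for i in range(num_rows):
        for k in range(row_ptr[i], row_ptr[i + 1]):
            y[col_idx[k]] += vals[k] * x[i]
    return y


# A = [[1, 0, 2],
#      [0, 0, 0],
#      [3, 4, 5]]
row_ptr = [0, 2, 2, 5]
col_idx = [0, 2, 0, 1, 2]
vals = [1.0, 2.0, 3.0, 4.0, 5.0]
x = [1.0, 2.0, 3.0]

y_plain = csr_matvec(row_ptr, col_idx, vals, x)
y_trans = csr_transpose_matvec(row_ptr, col_idx, vals, x, num_cols=3)
```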

I was wondering if you guys would be able to open up the interface to expose csrmv_mp alongside csrmv?

This routine was introduced specifically to address some of the loss of performance in the regular csrmv() code due to irregular sparsity patterns and transpose operations. The core kernels are based on the "MergePath" approach created by Duane Merrill. By using this approach, we are able to provide performance independent of the sparsity pattern, transpose or non-transpose, across data types.
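For readers unfamiliar with the merge-path idea, the following is a rough sequential Python sketch of merge-based CSR mat-vec. It is only an illustration under stated assumptions, not the cuSPARSE kernel: the names `merge_path_search` and `merge_path_spmv` and the `num_parts` parameter are invented here, and the "threads" are simulated with a plain loop. The row-end offsets and the nonzero indices are treated as two sorted lists to merge, and the merge path is cut into equal-length pieces, so each piece does about the same work regardless of how the nonzeros are distributed across rows.

```python
def merge_path_search(diagonal, row_end, nnz):
    """Return (i, j) with i + j == diagonal on the merge path of
    row_end (list A) and the nonzero indices 0..nnz-1 (list B)."""
    lo = max(diagonal - nnz, 0)
    hi = min(diagonal, len(row_end))
    while lo < hi:
        mid = (lo + hi) // 2
        # B[k] == k, so B[diagonal - mid - 1] is just diagonal - mid - 1.
        if row_end[mid] <= diagonal - mid - 1:
            lo = mid + 1
        else:
            hi = mid
    return lo, diagonal - lo


def merge_path_spmv(row_ptr, col_idx, vals, x, num_parts):
    """y = A*x for CSR A, splitting work into num_parts equal path pieces."""
    num_rows = len(row_ptr) - 1
    nnz = len(vals)
    row_end = row_ptr[1:]
    total = num_rows + nnz                 # length of the merge path
    per_part = -(-total // num_parts)      # ceiling division
    y = [0.0] * num_rows
    carries = []                           # (row, partial sum) per piece

    for p in range(num_parts):             # each iteration mimics one thread
        d0 = min(p * per_part, total)
        d1 = min(d0 + per_part, total)
        i, j = merge_path_search(d0, row_end, nnz)
        i_end, j_end = merge_path_search(d1, row_end, nnz)

        acc = 0.0
        while i < i_end:                   # rows that end inside this piece
            while j < row_end[i]:
                acc += vals[j] * x[col_idx[j]]
                j += 1
            y[i] = acc
            acc = 0.0
            i += 1
        while j < j_end:                   # leftover nonzeros of a split row
            acc += vals[j] * x[col_idx[j]]
            j += 1
        carries.append((i_end, acc))

    # Fix-up: add partial sums of rows that were split across pieces.
    for row, partial in carries:
        if row < num_rows:
            y[row] += partial
    return y


# Same 3x3 example matrix: [[1, 0, 2], [0, 0, 0], [3, 4, 5]]
row_ptr = [0, 2, 2, 5]
col_idx = [0, 2, 0, 1, 2]
vals = [1.0, 2.0, 3.0, 4.0, 5.0]
y = merge_path_spmv(row_ptr, col_idx, vals, [1.0, 1.0, 1.0], num_parts=3)
```

The result does not depend on `num_parts`; only the way rows are split between pieces changes. The empty middle row costs one path step, which is how this scheme avoids idle workers on very short or empty rows.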

@kshyatt
Contributor

kshyatt commented Dec 5, 2016

Hi! I would love to add support, but have been working off of CUDA 7.5 for ages. What I'd likely do is add a low-level routine for csrmv_mp and not have the default At_mul_B aliased to it yet, because CUDA 8 is still quite new. The process of wrapping the CUDA library functions is actually quite simple, if you wanted to open a PR with the function wrapped and tested.
