csrmv_mp not supported #29

bdqp · 2016-12-05T20:07:57Z

In the latest CUDA 8 release, there is a new version of csrmv that is supposed to be better for operations involving transpose, i.e. y = transpose(A)*x. The performance of the earlier csrmv function is much degraded when the transpose is involved.
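For anyone wondering why the transpose case tends to be harder, here is a minimal CPU sketch in plain Python. It is only an illustration of the CSR access patterns, not how cuSPARSE implements csrmv or csrmv_mp, and the function names `csr_matvec` and `csr_matvec_transpose` are made up for this example. With CSR storage, y = A*x reads one row at a time and writes a single output entry per row, whereas y = transpose(A)*x scatters each stored value into a different output entry, which on a GPU typically means conflicting writes between threads.

```python
def csr_matvec(m, row_ptr, col_idx, vals, x):
    """y = A * x for an m-row CSR matrix: each row gathers from x."""
    y = [0.0] * m
    for i in range(m):
        for k in range(row_ptr[i], row_ptr[i + 1]):
            y[i] += vals[k] * x[col_idx[k]]
    return y


def csr_matvec_transpose(m, n, row_ptr, col_idx, vals, x):
    """y = transpose(A) * x for an m-by-n CSR matrix: each entry scatters into y."""
    y = [0.0] * n
    for i in range(m):
        for k in range(row_ptr[i], row_ptr[i + 1]):
            # Different rows may hit the same y[col_idx[k]]; a parallel
            # version would need atomics or some other conflict handling.
            y[col_idx[k]] += vals[k] * x[i]
    return y


# A = [[1, 0, 2],
#      [0, 3, 0]]
row_ptr = [0, 2, 3]
col_idx = [0, 2, 1]
vals = [1.0, 2.0, 3.0]

y = csr_matvec(2, row_ptr, col_idx, vals, [1.0, 1.0, 1.0])
yt = csr_matvec_transpose(2, 3, row_ptr, col_idx, vals, [1.0, 2.0])
```

The two loops touch exactly the same stored entries; only the direction of the reads and writes differs.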

I was wondering if you guys would be able to open up the interface to expose csrmv_mp alongside csrmv?

This routine was introduced specifically to address some of the loss of performance in the regular csrmv() code due to irregular sparsity patterns and transpose operations. The core kernels are based on the "MergePath" approach created by Duane Merrill. By using this approach, we are able to provide performance independent of a sparsity pattern, transpose or non-transpose, across data types.


kshyatt · 2016-12-05T22:08:11Z

Hi! I would love to add support, but have been working off of CUDA 7.5 for ages. What I'd likely do is add a low-level routine for csrmv_mp and not have the default At_mul_B aliased to it yet, because CUDA 8 is still quite new. The process of wrapping the CUDA library functions is actually quite simple, if you wanted to open a PR with the function wrapped and tested.
