In the latest CUDA 8 release, there is a new version of csrmv that is supposed to be better for operations involving transpose, i.e. y = transpose(A)*x. The performance of the earlier csrmv function is much degraded when the transpose is involved.
I was wondering if you guys would be able to open up the interface to include csrmv_mp as well as the csrmv?
This routine was introduced specifically to address some of the loss of performance in the regular csrmv() code due to irregular sparsity patterns and transpose operations. The core kernels are based on the "MergePath" approach created by Duane Merrill. By using this approach, we are able to provide performance independent of a sparsity pattern, transpose or non-transpose, across data types.
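For anyone wondering why the transpose case tends to be the awkward one with CSR storage, here is a plain-Python reference sketch. It is not how cuSPARSE implements csrmv or csrmv_mp; the function names and the tiny example matrix are made up for illustration. In the non-transposed product each row accumulates into its own y[i], while in the transposed product each row scatters into y at its column indices, which on a GPU usually means conflicting writes that need atomics or a different work decomposition.

```python
def csr_matvec(row_ptr, col_idx, vals, x):
    """y = A * x for A in CSR form (illustrative reference, not cuSPARSE)."""
    n_rows = len(row_ptr) - 1
    y = [0.0] * n_rows
    for i in range(n_rows):
        # Row i only ever writes to y[i]: a gather from x.
        for k in range(row_ptr[i], row_ptr[i + 1]):
            y[i] += vals[k] * x[col_idx[k]]
    return y


def csr_matvec_transpose(row_ptr, col_idx, vals, x, n_cols):
    """y = transpose(A) * x, using the same CSR arrays of A."""
    n_rows = len(row_ptr) - 1
    y = [0.0] * n_cols
    for i in range(n_rows):
        # Row i scatters into y at its column indices, so different rows
        # can write to the same entry of y.
        for k in range(row_ptr[i], row_ptr[i + 1]):
            y[col_idx[k]] += vals[k] * x[i]
    return y


# Hypothetical 2x3 example: A = [[1, 0, 2],
#                                [0, 3, 0]]
row_ptr = [0, 2, 3]
col_idx = [0, 2, 1]
vals = [1.0, 2.0, 3.0]

y_plain = csr_matvec(row_ptr, col_idx, vals, [1.0, 1.0, 1.0])
y_trans = csr_matvec_transpose(row_ptr, col_idx, vals, [1.0, 1.0], n_cols=3)
```

Both loops touch the same nonzeros; only the write pattern differs, and that scatter pattern is the part that gets expensive when many threads work on rows in parallel.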
Hi! I would love to add support, but have been working off of CUDA 7.5 for ages. What I'd likely do is add a low-level routine for csrmv_mp and not have the default At_mul_B aliased to it yet, because CUDA 8 is still quite new. The process of wrapping the CUDA library functions is actually quite simple, if you wanted to open a PR with the function wrapped and tested.