Question for GPU computation: lots of time on vector products #822

Closed
yuwenchen95 opened this issue Oct 3, 2023 · 1 comment

@yuwenchen95

I am solving a symmetric positive definite linear system on the GPU with several Krylov methods and profiling the solves:

using SparseArrays, LinearAlgebra
using Krylov
using CUDA
using StatProfilerHTML

n = 10000
density = 0.005
L = sprand(n,n,density)
A = L'*L + spdiagm(0=> rand(n))
b = rand(n)
Ag = CUSPARSE.CuSparseMatrixCSR(A)
bg = CuVector(b)

msolver = MinresSolver(Ag,bg)
mqsolver = MinresQlpSolver(Ag,bg)
csolver = CgSolver(Ag,bg)
@profilehtml begin
    minres!(msolver,Ag,bg)

    minres_qlp!(mqsolver,Ag,bg)

    cg!(csolver,Ag,bg)
end

I found that a lot of time is spent in dot products (specifically, in the computation of α in each solver). This is counterintuitive to me, since the CPU version spends most of its computation time in mul! and the dot product time is negligible. Is this an expected difference between running indirect methods on CPUs and on GPUs, or is it potentially a bug?

@amontoison
Member

amontoison commented Oct 3, 2023

Dot products are slow on GPUs because they contain a "reduction": even if the computation of the dot product is split across threads, at the end all threads / cores must be synchronized to sum the components of the dot product.
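To illustrate the reduction pattern, here is a minimal sketch of a pairwise (tree) reduction, run sequentially on the CPU. It is not how CUBLAS or CUDA.jl implement `dot`; the function name `tree_dot` is hypothetical, and real inputs are assumed (no complex conjugation). It only shows that the elementwise products are independent, while the sum needs about log2(n) dependent steps, each of which would require a synchronization on a GPU.

```julia
# Illustrative sketch only: a pairwise (tree) reduction for a real dot product.
# `tree_dot` is a hypothetical name, not a function from any library.
function tree_dot(x::AbstractVector, y::AbstractVector)
    length(x) == length(y) || throw(DimensionMismatch("x and y must have the same length"))
    isempty(x) && return zero(eltype(x)) * zero(eltype(y)), 0

    partial = x .* y          # independent products: trivially parallel
    n = length(partial)
    steps = 0
    while n > 1
        half = cld(n, 2)
        # On a GPU, each index i would be handled by a different thread, and all
        # threads would have to finish this step before the next one starts.
        for i in 1:(n - half)
            partial[i] += partial[i + half]
        end
        n = half
        steps += 1
    end
    return partial[1], steps
end

s, steps = tree_dot(collect(1.0:8.0), ones(8))
println("sum = ", s, ", dependent steps = ", steps)   # sum = 36.0, dependent steps = 3
```

For 8 elements the sum takes 3 dependent steps. The number of steps grows slowly with n, but each step costs a synchronization, and that cost does not shrink when the vectors are small.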

mul! is the most expensive operation on the GPU only if the problem is very large (n = 10000 if dense, or n = 100000 if sparse).
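One way to check which operation dominates at a given problem size is to time `mul!` and `dot` separately. The sketch below uses hypothetical helper names (`timed`, `compare_kernels`) and takes a `sync` callable as an argument. CUDA.jl launches kernels asynchronously, so on the GPU a synchronization is needed before reading the clock; otherwise time can be attributed to whichever call happens to wait for the device. The block runs on the CPU as written, on a smaller matrix than the one in the question so that it finishes quickly.

```julia
using LinearAlgebra, SparseArrays

# Average wall-clock time (seconds) of `f()` over `reps` calls.
# `sync()` is called before starting and before stopping the clock.
function timed(f, sync; reps = 100)
    f()                       # warm-up call (triggers compilation)
    sync()                    # drain any pending work before timing
    t0 = time_ns()
    for _ in 1:reps
        f()
    end
    sync()
    return (time_ns() - t0) / (1e9 * reps)
end

# Time the two kernels that dominate a Krylov iteration.
function compare_kernels(A, x; sync = () -> nothing, reps = 100)
    y = similar(x)
    t_mul = timed(() -> mul!(y, A, x), sync; reps = reps)
    t_dot = timed(() -> dot(x, y), sync; reps = reps)
    return (mul = t_mul, dot = t_dot)
end

# CPU run on a smaller problem than in the question.
n = 2_000
L = sprand(n, n, 0.005)
A = L' * L + spdiagm(0 => rand(n))
x = rand(n)
times = compare_kernels(A, x)
println(times)

# On the GPU (assumes CUDA.jl is loaded and Ag, bg are built as in the question):
# compare_kernels(Ag, bg; sync = CUDA.synchronize)
```

Repeating the measurement for increasing n should show where the matrix-vector product starts to outweigh the dot products on a given device.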

Note that you can also try car and minares. They are new methods dedicated to symmetric (positive definite) systems.
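A minimal sketch of calling these two methods through the out-of-place interface, assuming a Krylov.jl release recent enough to export `car` and `minares`. It runs on the CPU with a small, well-conditioned test matrix chosen for illustration; it is not the matrix from the question.

```julia
using LinearAlgebra, SparseArrays
using Krylov   # assumption: a version that provides `car` and `minares`

n = 500
L = sprand(n, n, 0.01)
A = L' * L + 10.0 * I        # symmetric positive definite by construction
b = rand(n)

# Both solvers return the approximate solution and a statistics object.
x_car, stats_car = car(A, b)
x_mna, stats_mna = minares(A, b)

println("car:     relative residual = ", norm(b - A * x_car) / norm(b))
println("minares: relative residual = ", norm(b - A * x_mna) / norm(b))
```

On the GPU, the same calls should accept `Ag` and `bg` from the question in place of `A` and `b`, in the same way that `cg!` and `minres!` do there.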
