[cuSolver] Avoid repeated ctxCreate/Destroy for all Lapack API calls. #298

Open · wants to merge 1 commit into develop
Conversation

@HaoweiZhangIntel commented Mar 30, 2023

Description

Mainly improves the performance of LAPACK in the CUDA backend by avoiding repeated cuCtxCreate/cuCtxDestroy calls.

  • Apply the same logic as cuBLAS to cuSolver at placedContext_.
    This avoids calling cuCtxCreate and cuCtxDestroy every time multiple LAPACK APIs are used.
    For example, solving Ax=b with a Cholesky factorization requires both the lapack::potrf and lapack::potrs APIs (see the sketch after this list).
    cuCtxCreate/cuCtxDestroy takes much longer than most GPU LAPACK kernels; see the nvvp diagnostics below:

    [nvvp timeline screenshots: before modification vs. after modification]

  • Fix deprecation warnings from cuda.hpp for cuSolver ([BLAS] fix deprecation warnings from cuda.hpp #295).

  • Fix the bug in dft (mklgpu => mklcpu).
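
For illustration, here is a minimal sketch of the caching pattern this change aims for: retain the CUDA context and the cusolverDn handle once and reuse them across LAPACK calls such as potrf/potrs, instead of creating and destroying a context around every call. The helper names (`get_cached_context`, `get_cached_solver_handle`, `cholesky_solve`) are hypothetical and are not part of this PR's diff; the actual change reuses the existing placedContext_ mechanism from the cuBLAS backend.

```cpp
// Illustrative sketch only -- not the PR's actual diff. It shows the general
// pattern: keep one CUDA context and one cuSolver handle alive and reuse them,
// rather than pairing cuCtxCreate/cuCtxDestroy around every LAPACK routine.
#include <cuda.h>
#include <cusolverDn.h>

// Hypothetical helper: retain the device's primary context once and reuse it.
static CUcontext get_cached_context(CUdevice dev) {
    static CUcontext ctx = nullptr;
    if (!ctx) {
        cuInit(0);                            // driver API must be initialized once
        cuDevicePrimaryCtxRetain(&ctx, dev);  // created once, reused by all later calls
    }
    return ctx;
}

// Hypothetical helper: one cusolverDn handle shared by all LAPACK calls.
static cusolverDnHandle_t get_cached_solver_handle() {
    static cusolverDnHandle_t handle = nullptr;
    if (!handle) {
        cusolverDnCreate(&handle);
    }
    return handle;
}

// Solving A x = b by Cholesky needs two LAPACK calls (potrf, then potrs).
// With the cached context and handle, neither call pays the context setup cost.
void cholesky_solve(CUdevice dev, double* d_A, double* d_b, int n,
                    double* d_work, int lwork, int* d_info) {
    cuCtxSetCurrent(get_cached_context(dev));
    cusolverDnHandle_t h = get_cached_solver_handle();

    // Factor A = L * L^T, then solve with the factor (single right-hand side).
    cusolverDnDpotrf(h, CUBLAS_FILL_MODE_LOWER, n, d_A, n, d_work, lwork, d_info);
    cusolverDnDpotrs(h, CUBLAS_FILL_MODE_LOWER, n, /*nrhs=*/1, d_A, n, d_b, n, d_info);
}
```

In the real backend the cached context lives in the scoped context handler rather than in function-local statics; the point of the sketch is simply that potrf and potrs share a single context and handle instead of each paying for context creation and destruction.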

Checklist

All Submissions

* Apply the same logic as cuBLAS to cuSolver at placedContext_.
  Avoid calling cuCtxCreate every time multiple LAPACK APIs are used.

* Fix deprecation warnings from cuda.hpp for cuSolver (oneapi-src#295).

* Fix the bug in dft (mklgpu => mklcpu).