Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Inconsistent behavior of hypre_GetPointerLocation #1138

Open
ictmay opened this issue Sep 20, 2024 · 3 comments · May be fixed by #1238
Open

Inconsistent behavior of hypre_GetPointerLocation #1138

ictmay opened this issue Sep 20, 2024 · 3 comments · May be fixed by #1238

Comments

@ictmay
Copy link

ictmay commented Sep 20, 2024

The function hypre_GetPointerLocation in src/utilities/memory.c behaves differently on CUDA and HIP builds.

The functions cudaPointerGetAttributes and hipPointerGetAttributes are unreliable at identifying host pointers that weren't obtained from the CUDA/HIP host allocators respectively. For CUDA builds there are these lines in the above mentioned function:

if (attr.type == cudaMemoryTypeUnregistered)
{
   *memory_location = hypre_MEMORY_HOST;
}

A similar set of lines should be present for HIP builds, but are not.

Both CUDA and HIP builds will return hypre_MEMORY_HOST in the event of an error. However, given an arbitrary host pointer hipPointerGetAttributes will typically not error. Instead it will return a location of unregistered, and then hypre_GetPointerLocation will return hypre_MEMORY_UNDEFINED instead of hypre_MEMORY_HOST.

On debug builds with HIP this can result in a failed assertion in hypre_CheckMemoryLocation. I observed the following sequence of events recently:

  1. Called HYPRE_IJVectorAssemble
  2. That called hypre_IJVectorAssembleParDevice
  3. That called hypre_AuxParVectorDestroy(aux_vector)
  4. That deallocates the data fields inside the aux_vector struct correctly by specifying HYPRE_MEMORY_DEVICE
  5. Deallocating the struct itself fails since it is of course on host, and the above assertion fails when hypre_GetPointerLocation can not figure out that the struct is indeed on host.
@junghans
Copy link
Contributor

@victorapm, could you help us with this?

@liruipeng
Copy link
Contributor

We should add these. I guess probably when we added the hip support for this, the hip version of cudaMemoryTypeUnregistered did not exist, so it still use the old logic with old CUDA < 10. Thanks for reporting.

@victorapm victorapm linked a pull request Feb 21, 2025 that will close this issue
@victorapm
Copy link
Contributor

@junghans @ictmay Could you test the PR linked here? Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants