Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[UR][L0] Manage UMF pools through usm::pool_manager #17065

Merged
merged 4 commits into from
Mar 10, 2025

Conversation

kswiecicki
Copy link
Contributor

@kswiecicki kswiecicki commented Feb 19, 2025

@kswiecicki kswiecicki requested a review from a team as a code owner February 19, 2025 11:03
@pbalcer
Copy link
Contributor

pbalcer commented Feb 19, 2025

@igchor please review.

@kswiecicki
Copy link
Contributor Author

kswiecicki commented Feb 24, 2025

@igchor, I've had an issue with obtaining the pool config from InitializeDisjointPoolConfig() locally in the usm pool constructor and passing it for disjoint pool creation. Some allocations in test-e2e/USM/usm_pooling.cpp weren't really pooled even though they should be and the test was failing. Copying the config from DisjointPoolConfigInstance global variable instead seems to work.
Could you re-review this change please and maybe confirm if v2 adapter doesn't have a similar issue.

@igchor
Copy link
Member

igchor commented Feb 25, 2025

@kswiecicki Hm, that's strange. Were all UR tests passing? I've had some problems with this test and v2 adapter but it's because it expects certain logs, and UMF wasn't outputting them (this will be fixed once UR switches to using L0 loader API logging)

@kswiecicki
Copy link
Contributor Author

Isn't UseUSMAllocator a global variable? Can't we just read it in contextCreate and decide whether to create ProxyPool or DisjointPool there? We would only have one defaultPool in the context this way.

I think I've fixed the final issue with E2E tests. Because of the change that creates either ProxyPool or DisjointPool at the startup but never both, the test ran with SYCL_PI_LEVEL_ZERO_DISABLE_USM_ALLOCATOR=1 no longer needs to account for the allocations made by DisjointPool when querying the page size.

@igchor
Copy link
Member

igchor commented Feb 27, 2025

Isn't UseUSMAllocator a global variable? Can't we just read it in contextCreate and decide whether to create ProxyPool or DisjointPool there? We would only have one defaultPool in the context this way.

I think I've fixed the final issue with E2E tests. Because of the change that creates either ProxyPool or DisjointPool at the startup but never both, the test ran with SYCL_PI_LEVEL_ZERO_DISABLE_USM_ALLOCATOR=1 no longer needs to account for the allocations made by DisjointPool when querying the page size.

Okay, that makes sense. However, removing the CHECK-COUNT will break this test for L0 v2 adapter, because there we always query page size (it's done by L0 provider). Perhaps we could modify this to be:

CHECK-OPT: zeMemFree
CHECK-OPT: zeMemFree
CHECK-OPT: zeMemFree
CHECK-OPT: zeMemFree
CHECK-OPT: zeMemFree
CHECK-OPT: zeMemFree
CHECK-OPT: zeMemFree
CHECK-OPT: zeMemFree

@kswiecicki
Copy link
Contributor Author

Isn't UseUSMAllocator a global variable? Can't we just read it in contextCreate and decide whether to create ProxyPool or DisjointPool there? We would only have one defaultPool in the context this way.

I think I've fixed the final issue with E2E tests. Because of the change that creates either ProxyPool or DisjointPool at the startup but never both, the test ran with SYCL_PI_LEVEL_ZERO_DISABLE_USM_ALLOCATOR=1 no longer needs to account for the allocations made by DisjointPool when querying the page size.

Okay, that makes sense. However, removing the CHECK-COUNT will break this test for L0 v2 adapter, because there we always query page size (it's done by L0 provider). Perhaps we could modify this to be:

CHECK-OPT: zeMemFree
CHECK-OPT: zeMemFree
CHECK-OPT: zeMemFree
CHECK-OPT: zeMemFree
CHECK-OPT: zeMemFree
CHECK-OPT: zeMemFree
CHECK-OPT: zeMemFree
CHECK-OPT: zeMemFree

That could be a lot of unaccounted for frees, but I don't see a better solution unless there's CHECK-COUNT-8-OPT available.

@kswiecicki
Copy link
Contributor Author

The tests failing on CI are unrelated to this PR and are currently failing on other PRs.
Precommit Linux/Windows jobs (eg. https://github.com/intel/llvm/actions/runs/13675062022/job/38235698849 and https://github.com/intel/llvm/actions/runs/13672691248/job/38227250897)

********************
Failed Tests (2):
  SYCL :: AOT/early_aot.cpp
  SYCL :: SpecConstants/2020/image_selection.cpp
********************
Failed Tests (1):
  SYCL :: USM/memops2d/copy2d_device_to_device.cpp

This variable is not initialized in any way and it is set to nullptr by
default.
Level Zero provider internally stores native errors of ur_result_t type.
@kswiecicki
Copy link
Contributor Author

@intel/llvm-gatekeepers could someone merge this please.

@uditagarwal97
Copy link
Contributor

@intel/llvm-gatekeepers could someone merge this please.

I don't know why Check for private emails used in PRs workflow didn't ran on this PR. @sarnex FYI.
@kswiecicki was this PR moved over from UR via a script? Also, please use intel/codeplay email to author this PR, currently, GH says:

This commit will be authored by [email protected].

@uditagarwal97
Copy link
Contributor

@intel/llvm-gatekeepers could someone merge this please.

I don't know why Check for private emails used in PRs workflow didn't ran on this PR. @sarnex FYI. @kswiecicki was this PR moved over from UR via a script? Also, please use intel/codeplay email to author this PR, currently, GH says:

This commit will be authored by [email protected].

PR to fix the private email check workflow: #17333

When the L0 adapter internal pooling is turned off
eg. with SYCL_PI_LEVEL_ZERO_DISABLE_USM_ALLOCATOR=1, the adapter no
longer creates an UMF pool that queries for page size during its
creation. The allocations made for the query were accounted for in
the L0 buffer ownership test, and it's no longer the case. It still
affects the v2 L0 adapter.
@kswiecicki
Copy link
Contributor Author

kswiecicki commented Mar 7, 2025

@intel/llvm-gatekeepers could someone merge this please.

I don't know why Check for private emails used in PRs workflow didn't ran on this PR. @sarnex FYI. @kswiecicki was this PR moved over from UR via a script? Also, please use intel/codeplay email to author this PR, currently, GH says:

This commit will be authored by [email protected].

I've had a problem with the UR script, so I've moved the PR manually. It's weird that those commits seem to be authored from [email protected], since git log shows my intel email. I've tweaked some github profile options and rebased this PR after #17333 merge. The mail check workflow passes, so I guess it's fixed now.

@kswiecicki
Copy link
Contributor Author

kswiecicki commented Mar 10, 2025

@intel/llvm-gatekeepers I believe this PR is ready to be merged, all CI jobs pass now.

@uditagarwal97 uditagarwal97 merged commit 38d7506 into intel:sycl Mar 10, 2025
31 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants