Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SYCL] Add Experimental Range Rounding #12690

Merged
merged 9 commits into from
Apr 1, 2024
Merged

Conversation

hdelan
Copy link
Contributor

@hdelan hdelan commented Feb 12, 2024

This commit adds new clang command line option -fsycl-exp-range-rounding. Experimental range rounding maps all 1, 2 or 3 dim range kernels to range rounded kernels which can be rounded in all dims.

SYCL_PARALLEL_FOR_RANGE_ROUNDING_PARAMS=x:y:z can be used for exp range rounding, where the y param will be a factor of the rounded range in all dims

@hdelan hdelan requested review from a team as code owners February 12, 2024 18:07
@hdelan hdelan requested a review from againull February 12, 2024 18:07
@hdelan
Copy link
Contributor Author

hdelan commented Feb 12, 2024

@intel/dpcpp-clang-driver-reviewers I am not sure about compiler flags. Instead of introducing the new -fsycl-force-range-rounding, should I instead make a new flag -fsycl-range-rounding=[force/disable] which would deprecate the existing -fsycl-disable-range-rounding?

@mdtoguchi
Copy link
Contributor

@intel/dpcpp-clang-driver-reviewers I am not sure about compiler flags. Instead of introducing the new -fsycl-force-range-rounding, should I instead make a new flag -fsycl-range-rounding=[force/disable] which would deprecate the existing -fsycl-disable-range-rounding?

I like the idea of introducing a new -fsycl-range-rounding=<arg> option. It alleviates any potential confusion when multiple range-rounding options are used on the command line.

@aelovikov-intel
Copy link
Contributor

+1 to -fsycl-range-rounding=[disable|on|force]. I'd also split the experimental 1D rounding into a separate PR.

@hdelan
Copy link
Contributor Author

hdelan commented Feb 14, 2024

@mdtoguchi @aelovikov-intel changes for the new -fsycl-range-rounding flag are here. I will keep this PR for the experimental range rounding

@hdelan hdelan marked this pull request as draft March 20, 2024 15:35
@hdelan hdelan changed the title [SYCL][Draft] Make some changes for range rounding [SYCL] Add Experimental Range Rounding Mar 20, 2024
@hdelan hdelan force-pushed the 1d-range-rounding branch from d0adeae to fed0545 Compare March 21, 2024 16:16
@hdelan hdelan force-pushed the 1d-range-rounding branch from 90324d0 to b38a278 Compare March 21, 2024 17:24
@hdelan hdelan temporarily deployed to WindowsCILock March 21, 2024 17:53 — with GitHub Actions Inactive
@hdelan hdelan temporarily deployed to WindowsCILock March 21, 2024 19:26 — with GitHub Actions Inactive
@intel intel deleted a comment from github-actions bot Mar 22, 2024
@rafbiels
Copy link
Contributor

I tested this PR with https://gist.github.com/rafbiels/2b584584cfd6412e6b255adab4c264d6 on NVIDIA sm_86 GPU. Here's the outcomes:

   Release     Time [ms]     CUDA gridXYZ     CUDA blockXYZ
-----------------------------------------------------------
  2023.2.1            63     {2, 1888, 1}       {394, 1, 1}
  2024.0.2           223      {788, 3, 1}       {1, 768, 1}
     test1            66     {197, 16, 1}       {4, 118, 1}
     test2            61     {25, 118, 1}       {32, 16, 1}

test1 = hdelan:1d-range-rounding with default flags
test2 = hdelan:1d-range-rounding with -fsycl-exp-range-rounding

The improvement between 2024.0.2 and test1 is thanks to #12663 but we can see that this PR (test2) improves the local range assignment further and the performance surpasses the earlier naive implementation from 2023.2.1.

@hdelan hdelan temporarily deployed to WindowsCILock March 22, 2024 12:47 — with GitHub Actions Inactive
@hdelan hdelan temporarily deployed to WindowsCILock March 22, 2024 14:07 — with GitHub Actions Inactive
@hdelan hdelan marked this pull request as ready for review March 22, 2024 16:01
@hdelan
Copy link
Contributor Author

hdelan commented Mar 22, 2024

Ping @againull @intel/llvm-reviewers-runtime @intel/dpcpp-cfe-reviewers @intel/dpcpp-clang-driver-reviewers @intel/unified-runtime-reviewers

@aelovikov-intel
Copy link
Contributor

I tested this PR with https://gist.github.com/rafbiels/2b584584cfd6412e6b255adab4c264d6 on NVIDIA sm_86 GPU. Here's the outcomes:

Can we add it into test-e2e/PerformanceTests ?

Copy link
Contributor

@elizabethandrews elizabethandrews left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FE changes LGTM

@hdelan
Copy link
Contributor Author

hdelan commented Mar 22, 2024

The perf tests stuff is in a very early stage, we're basically finding it out as we go. Currently it's just the logs captured for the tests there even in case of a PASS, the rest is left to humans to interpret.

OK sure thanks

Copy link
Contributor

@aarongreig aarongreig left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

approving to satisfy the requirement but I don't think UR reviewers are responsible for any of the code changed

@hdelan
Copy link
Contributor Author

hdelan commented Mar 22, 2024

Actually I just realized there are no FE tests. Please add a test for the predefines.

Test added, thanks.

Copy link
Contributor

@mdtoguchi mdtoguchi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK for Driver

Copy link
Contributor

@smanna12 smanna12 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FE changes LGTM

@hdelan hdelan force-pushed the 1d-range-rounding branch from eac2d32 to 856b61f Compare March 28, 2024 09:49
@hdelan hdelan closed this Mar 28, 2024
@hdelan hdelan reopened this Mar 28, 2024
@hdelan hdelan force-pushed the 1d-range-rounding branch from 856b61f to 1965020 Compare March 28, 2024 14:31
hdelan added 9 commits March 29, 2024 09:57
This commit adds new clang command line option
-fsycl-exp-range-rounding. Experimental range rounding maps all 1, 2 or
3 dim range kernels to a 1d range kernel. This can give performance
improvements when inner dimensions are oddly shaped.
In order to get better performance it is beneficial to preserve
the dimensionality of the range. This new experimental range
rounding does rounding in each dimension.

The runtime can take suggestions on the size of the workgroup in all
directions using SYCL_PARALLEL_FOR_RANGE_ROUNDING_PARAMS=x:y:z. In
the case where -fsycl-exp-range-rounding is used, the middle param is
the workgroup size in all dimensions, so for a 1d range kernel, the
global range will divide {y}. In 2d the global range will be some
number of workgroups of size {y, y}. The same for 3d with {y, y, y}.
Exp range rounding can also be used with the env var:
SYCL_PARALLEL_FOR_RANGE_ROUNDING_PARAMS.
Adds the test case where -fsycl-range-rounding=force is used together
with -fsycl-exp-range-rounding. This ensures that flags do not interfere
with eachother.
Add test that compares the performance of no range rounding, normal
range rounding and experimental range rounding.
Clarify that disabling range rounding will override experimental range
rounding.
Check for the presence of macro __SYCL_EXP_PARALLEL_FOR_RANGE_ROUNDING__
when using -fsycl-exp-range-rounding.
@hdelan
Copy link
Contributor Author

hdelan commented Apr 1, 2024

@intel/llvm-gatekeepers this can be merged now

@martygrant martygrant merged commit 319cf0e into intel:sycl Apr 1, 2024
16 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

10 participants