[SYCL] Add Experimental Range Rounding #12690
@intel/dpcpp-clang-driver-reviewers I am not sure about compiler flags. Instead of introducing the new …
I like the idea of introducing a new …
+1 to …
@mdtoguchi @aelovikov-intel changes for the new …
d0adeae to fed0545
90324d0 to b38a278
I tested this PR with https://gist.github.com/rafbiels/2b584584cfd6412e6b255adab4c264d6 on an NVIDIA sm_86 GPU. Here are the outcomes:
The improvement between 2024.0.2 and …
Ping @againull @intel/llvm-reviewers-runtime @intel/dpcpp-cfe-reviewers @intel/dpcpp-clang-driver-reviewers @intel/unified-runtime-reviewers
Can we add it into …
FE changes LGTM
OK sure, thanks.
Approving to satisfy the requirement, but I don't think UR reviewers are responsible for any of the changed code.
Test added, thanks.
OK for Driver
FE changes LGTM
eac2d32 to 856b61f
856b61f to 1965020
This commit adds a new clang command-line option, -fsycl-exp-range-rounding. Normal range rounding maps all 1, 2 or 3 dim range kernels to a 1d range kernel. This can give performance improvements when inner dimensions are oddly shaped.
In order to get better performance, it is beneficial to preserve the dimensionality of the range. This new experimental range rounding rounds each dimension. The runtime can take suggestions on the size of the workgroup in all directions using SYCL_PARALLEL_FOR_RANGE_ROUNDING_PARAMS=x:y:z. When -fsycl-exp-range-rounding is used, the middle param (y) is the workgroup size in all dimensions, so for a 1d range kernel the rounded global range will be a multiple of {y}. In 2d the global range will be some number of workgroups of size {y, y}, and likewise in 3d with {y, y, y}.
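The per-dimension rounding described above can be modelled with a few lines of plain C++. This is a minimal sketch of the arithmetic only, not the DPC++ runtime implementation: the names `roundUp`, `roundEachDim`, `insideUserRange` and `paddedItems` are hypothetical, and the bounds check assumes that work items falling in the padded region simply skip the user's kernel body.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Round n up to the next multiple of factor.
std::size_t roundUp(std::size_t n, std::size_t factor) {
  assert(factor > 0 && "rounding factor must be non-zero");
  return (n + factor - 1) / factor * factor;
}

// Experimental-style rounding: every dimension is rounded up to a multiple
// of y, so the launch is a whole number of y, {y, y} or {y, y, y} blocks.
template <std::size_t Dims>
std::array<std::size_t, Dims> roundEachDim(const std::array<std::size_t, Dims> &user,
                                           std::size_t y) {
  std::array<std::size_t, Dims> rounded{};
  for (std::size_t d = 0; d < Dims; ++d)
    rounded[d] = roundUp(user[d], y);
  return rounded;
}

// Assumed guard: a work item is "real" only if it lies inside the user's
// range in every dimension; padded items would skip the kernel body.
template <std::size_t Dims>
bool insideUserRange(const std::array<std::size_t, Dims> &id,
                     const std::array<std::size_t, Dims> &user) {
  for (std::size_t d = 0; d < Dims; ++d)
    if (id[d] >= user[d])
      return false;
  return true;
}

// Number of idle (padded) work items introduced by rounding.
template <std::size_t Dims>
std::size_t paddedItems(const std::array<std::size_t, Dims> &user, std::size_t y) {
  const auto rounded = roundEachDim(user, y);
  std::size_t total = 1;
  std::size_t useful = 1;
  for (std::size_t d = 0; d < Dims; ++d) {
    total *= rounded[d];
    useful *= user[d];
  }
  return total - useful;
}
```

For example, under this model a 1000 x 1000 range with y = 32 is launched as 1024 x 1024, i.e. 32 x 32 workgroups of {32, 32}, with 48576 padded work items that do no useful work.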
Experimental range rounding can also be used with the SYCL_PARALLEL_FOR_RANGE_ROUNDING_PARAMS environment variable.
Adds a test case where -fsycl-range-rounding=force is used together with -fsycl-exp-range-rounding. This ensures that the flags do not interfere with each other.
Add a test that compares the performance of no range rounding, normal range rounding and experimental range rounding.
Clarify that disabling range rounding will override experimental range rounding.
Check for the presence of the macro __SYCL_EXP_PARALLEL_FOR_RANGE_ROUNDING__ when using -fsycl-exp-range-rounding.
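Because the tests check that `__SYCL_EXP_PARALLEL_FOR_RANGE_ROUNDING__` is defined when `-fsycl-exp-range-rounding` is passed, application code can detect the mode at compile time. This is a minimal sketch: `kExpRangeRounding` and `reportRangeRoundingMode` are hypothetical names, and the build and run commands in the comments use illustrative parameter values.

```cpp
#include <cstdio>

// Example build and run commands (parameter values are only illustrative):
//   clang++ -fsycl -fsycl-exp-range-rounding app.cpp
//   SYCL_PARALLEL_FOR_RANGE_ROUNDING_PARAMS=16:32:1024 ./a.out

// True when the compiler defined __SYCL_EXP_PARALLEL_FOR_RANGE_ROUNDING__,
// i.e. experimental range rounding was requested. Disabling range rounding
// overrides the experimental mode, so read this as "requested", not "active".
constexpr bool kExpRangeRounding =
#ifdef __SYCL_EXP_PARALLEL_FOR_RANGE_ROUNDING__
    true;
#else
    false;
#endif

// Hypothetical helper: report which mode this binary was built for.
void reportRangeRoundingMode() {
  std::printf("experimental range rounding: %s\n",
              kExpRangeRounding ? "requested at compile time" : "not requested");
}
```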
1965020 to 17de821
@intel/llvm-gatekeepers this can be merged now.
This commit adds a new clang command-line option, -fsycl-exp-range-rounding. Experimental range rounding maps all 1, 2 or 3 dim range kernels to range-rounded kernels that can be rounded in all dims.
SYCL_PARALLEL_FOR_RANGE_ROUNDING_PARAMS=x:y:z can be used for experimental range rounding, where the y param will be a factor of the rounded range in all dims.
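A hypothetical parser for the `x:y:z` format shows how the middle field would be picked out as the per-dimension factor. This is a sketch under stated assumptions, not the runtime's actual parsing code: `RoundingParams`, `parseUnsigned`, `parseRoundingParams` and `roundingParamsFromEnv` are made-up names, and rejecting a zero `y` is this sketch's own choice. It requires C++17 (`std::optional`, constexpr `std::string_view`).

```cpp
#include <cstddef>
#include <cstdlib>
#include <optional>
#include <string_view>

// Hypothetical container for the three colon-separated fields of
// SYCL_PARALLEL_FOR_RANGE_ROUNDING_PARAMS=x:y:z.
struct RoundingParams {
  std::size_t x;
  std::size_t y;  // with -fsycl-exp-range-rounding: factor applied to every dimension
  std::size_t z;
};

// Parse a non-empty run of decimal digits; overflow is not checked in this sketch.
constexpr std::optional<std::size_t> parseUnsigned(std::string_view s) {
  if (s.empty())
    return std::nullopt;
  std::size_t value = 0;
  for (char c : s) {
    if (c < '0' || c > '9')
      return std::nullopt;
    value = value * 10 + static_cast<std::size_t>(c - '0');
  }
  return value;
}

// Split "x:y:z" into its three fields; anything else (missing or extra
// fields, non-digits, or y == 0) is rejected.
constexpr std::optional<RoundingParams> parseRoundingParams(std::string_view s) {
  const std::size_t first = s.find(':');
  if (first == std::string_view::npos)
    return std::nullopt;
  const std::size_t second = s.find(':', first + 1);
  if (second == std::string_view::npos)
    return std::nullopt;
  const auto x = parseUnsigned(s.substr(0, first));
  const auto y = parseUnsigned(s.substr(first + 1, second - first - 1));
  const auto z = parseUnsigned(s.substr(second + 1));  // a third ':' makes this fail
  if (!x || !y || !z || *y == 0)
    return std::nullopt;
  return RoundingParams{*x, *y, *z};
}

// Read and parse the environment variable; nullopt if unset or malformed.
inline std::optional<RoundingParams> roundingParamsFromEnv() {
  const char *raw = std::getenv("SYCL_PARALLEL_FOR_RANGE_ROUNDING_PARAMS");
  if (raw == nullptr)
    return std::nullopt;
  return parseRoundingParams(raw);
}
```

With `SYCL_PARALLEL_FOR_RANGE_ROUNDING_PARAMS=16:32:1024`, this sketch yields y = 32, so in experimental mode every dimension of the rounded range would be a multiple of 32.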