Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix parallelLoopPatterns for the Intel FPGA SYCL back-end #2456

Conversation

fwyzard
Copy link
Contributor

@fwyzard fwyzard commented Jan 23, 2025

Extend parallelLoopPatterns to use 64-bit indices to support more than 2³² elements.

Add a dedicated fix for SYCL back-ends, that assume that kernel-related queries (and possibly indices) are limited to INT_MAX, unless the -fno-sycl-id-queries-fit-in-int compiler flag is used.

@fwyzard
Copy link
Contributor Author

fwyzard commented Jan 23, 2025

I could avoid the new Vec constructor if somebody manages to update parallelLoopPatterns to build without it - though I think it may be more generally useful to have it.

@fwyzard
Copy link
Contributor Author

fwyzard commented Jan 23, 2025

We could also make the __SYCL_ID_QUERIES_FIT_IN_INT__ check more general, and introduce a new property in include/alpaka/acc/AccDevProps.hpp with the maximum number of threads per grid.

@fwyzard
Copy link
Contributor Author

fwyzard commented Jan 23, 2025

Fixes #2450.

@fwyzard fwyzard force-pushed the fix_parallelLoopPatterns__for_Intel_FPGA_SYCL branch from 89deeef to 3c8249b Compare January 23, 2025 17:22
@fwyzard fwyzard self-assigned this Jan 23, 2025
@fwyzard fwyzard added this to the 2.0.0 milestone Jan 23, 2025
@fwyzard fwyzard linked an issue Jan 23, 2025 that may be closed by this pull request
@fwyzard fwyzard force-pushed the fix_parallelLoopPatterns__for_Intel_FPGA_SYCL branch 2 times, most recently from 6902658 to b9416da Compare January 24, 2025 16:36
Copy link
Member

@psychocoderHPC psychocoderHPC left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I request changes but currently do not expect that this is changed. I would like to talk about possible side effects if it is correct what I wrote.

typename UDim,
typename UVal,
typename
= std::enable_if_t<UDim::value == TDim::value && std::is_nothrow_convertible_v<std::decay_t<UVal>, TVal>>>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure about this. In alpaka we avoid everywhere implicit casts (if we have it it is IMO a bug).
This constructor opens implicit casts and we are able to convert 64bit values to 32bit values.
This could result in hidden issues due to silent loss of precision.

If I am wrong please correct me!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mhm, yes, this allows an implicit cast from Vec<Dim3, uint64_t> to Vec<Dim3, uint32_t>, in the same way that one can implicitly cast from uint64_t to uint32_t.

In genera I'm not fond of all the explicit casts that we need to use with alpaka, but if you don't like to relax this constraint, ok.

I will have to figure out how to fix the test without this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, it wasn't complicated :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By the way, maybe we could still add a conversion constructor, with some requirements:

  • if the destination type is larger than the source type (e.g. from int32_t to int64_t), there is not runtime check
  • otherwise, we explicitly check at runtime that the values can be stored in the destination type without loss of precision

@fwyzard fwyzard force-pushed the fix_parallelLoopPatterns__for_Intel_FPGA_SYCL branch from b9416da to c770278 Compare January 31, 2025 17:58
@psychocoderHPC
Copy link
Member

@fwyzard could you please push again to trigger the CI. In case someone is assigned the hzdr bot is crashing an ignoring the job.
Before you was assigned and therefore the job not reported the status.

@fwyzard
Copy link
Contributor Author

fwyzard commented Feb 3, 2025

In case someone is assigned the hzdr bot is crashing an ignoring the job.

Ah !

@fwyzard fwyzard force-pushed the fix_parallelLoopPatterns__for_Intel_FPGA_SYCL branch from c770278 to 5bb7a77 Compare February 3, 2025 10:17
@psychocoderHPC psychocoderHPC merged commit 09924e2 into alpaka-group:develop Feb 4, 2025
25 checks passed
@fwyzard fwyzard deleted the fix_parallelLoopPatterns__for_Intel_FPGA_SYCL branch February 4, 2025 09:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

parallelLoopPatterns fails on the Intel FPGA emulator SYCL backend
2 participants