Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SYCL][NATIVECPU] Enable source-based code coverage in Native CPU #15073

Merged
merged 32 commits into from
Nov 28, 2024
Merged
Show file tree
Hide file tree
Changes from 24 commits
Commits
Show all changes
32 commits
Select commit Hold shift + click to select a range
4147842
[SYCLNATIVECPU] initial support for code coverage
uwedolinsky Aug 12, 2024
e72e647
Merge remote-tracking branch 'origin/sycl' into uwe/syclcoverage
uwedolinsky Aug 12, 2024
ef55ccb
[SYCLNATIVECPU] fixed comments
uwedolinsky Aug 13, 2024
2755480
[SYCLNATIVECPU] fixed comments
uwedolinsky Aug 13, 2024
f28c4fb
[SYCLNATIVECPU] removed check for clang-rt
uwedolinsky Aug 14, 2024
b02ef1e
Merge remote-tracking branch 'origin/sycl' into uwe/syclcoverage
uwedolinsky Aug 14, 2024
4e6664d
[SYCLNATIVECPU] clang-format
uwedolinsky Aug 14, 2024
593a981
[SYCLNATIVECPU] file mode change
uwedolinsky Aug 14, 2024
149a352
[SYCLNATIVECPU] source location change for coverage
uwedolinsky Aug 16, 2024
f56c4d4
Merge remote-tracking branch 'origin/sycl' into uwe/syclcoverage
uwedolinsky Aug 16, 2024
896ca81
Merge remote-tracking branch 'origin/sycl' into uwe/syclcoverage
uwedolinsky Sep 12, 2024
0ec7691
Merge remote-tracking branch 'origin/sycl' into uwe/syclcoverage
uwedolinsky Oct 1, 2024
55d31e7
Merge remote-tracking branch 'origin/sycl' into uwe/syclcoverage
uwedolinsky Oct 10, 2024
b90ea9b
[SYCL] Attach SourceLocation to kernel body to enable coverage gen
uwedolinsky Oct 11, 2024
094e10a
Merge remote-tracking branch 'origin/sycl' into uwe/syclcoverage
uwedolinsky Oct 11, 2024
f14c4f2
[SYCLNATIVECPU] put back NL to reduce noise
uwedolinsky Oct 14, 2024
7acb09f
[SYCLNATIVECPU] moved conditional
uwedolinsky Oct 14, 2024
da794ff
[SYCLNATIVECPU] restored original file mode
uwedolinsky Oct 14, 2024
d6dff0e
[SYCLNATIVECPU] fixed documentation
uwedolinsky Oct 14, 2024
637caec
[SYCLNATIVECPU] fixed documentation
uwedolinsky Oct 14, 2024
cb35294
Merge remote-tracking branch 'origin/sycl' into uwe/syclcoverage
uwedolinsky Oct 14, 2024
6b72582
[SYCLNATIVECPU] revert change
uwedolinsky Oct 18, 2024
ad356fb
Update clang/lib/Driver/ToolChains/SYCL.cpp
uwedolinsky Oct 18, 2024
ea90f2f
Update clang/lib/Driver/ToolChains/SYCL.cpp
uwedolinsky Oct 18, 2024
c25bad1
[SYCLNATIVECPU] Pascal Casing var
uwedolinsky Oct 21, 2024
82e4f64
[SYCLNATIVECPU] removed local variable
uwedolinsky Oct 21, 2024
a0d2d55
Merge remote-tracking branch 'origin/sycl' into uwe/syclcoverage
uwedolinsky Oct 21, 2024
49fa00f
Merge remote-tracking branch 'origin/sycl' into uwe/syclcoverage
uwedolinsky Nov 18, 2024
a20329c
[SYCL] new test checking valid kernel source location
uwedolinsky Nov 19, 2024
e8f776c
Merge remote-tracking branch 'origin/sycl' into uwe/syclcoverage
uwedolinsky Nov 19, 2024
d3558b7
Merge remote-tracking branch 'origin/sycl' into uwe/syclcoverage
uwedolinsky Nov 20, 2024
ae41bfa
[SYCLNATIVECPU] clang-format
uwedolinsky Nov 20, 2024
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
3 changes: 3 additions & 0 deletions clang/lib/CodeGen/CodeGenFunction.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -1786,6 +1786,9 @@ void CodeGenFunction::GenerateCode(GlobalDecl GD, llvm::Function *Fn,
if (SyclOptReport.HasOptReportInfo(FD)) {
llvm::OptimizationRemarkEmitter ORE(Fn);
for (auto ORI : llvm::enumerate(SyclOptReport.GetInfo(FD))) {
// Temporarily apply arg location to ensure SourceLocToDebugLoc
// picks up the expected file.
ApplyDebugLocation tmp_raii(*this, ORI.value().KernelArgLoc);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please use Pascal Case for naming variables. `TempRAII is an option.

It also isn't clear to me where this is used.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've changed the variable name, hope it's ok.

This variable isn't (directly) used by the subsequent code in this function - it's just a RAII variable to temporarily set the argument location to maintain the current behavior of this non-llvm-upstream lit test. I have updated the PR description with more details. Please let me know if more clarification is needed. Thanks.

llvm::DiagnosticLocation DL =
SourceLocToDebugLoc(ORI.value().KernelArgLoc);
StringRef NameInDesc = ORI.value().KernelArgDescName;
Expand Down
23 changes: 23 additions & 0 deletions clang/lib/Driver/ToolChains/SYCL.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -1398,6 +1398,23 @@ static std::vector<OptSpecifier> getUnsupportedOpts(void) {
return UnsupportedOpts;
}

// Currently supported options by SYCL NativeCPU device compilation
static inline bool SupportedByNativeCPU(const SYCLToolChain &TC,
const OptSpecifier &Opt) {
if (!TC.IsSYCLNativeCPU)
return false;

switch (Opt.getID()) {
case options::OPT_fcoverage_mapping:
case options::OPT_fno_coverage_mapping:
case options::OPT_fprofile_instr_generate:
case options::OPT_fprofile_instr_generate_EQ:
case options::OPT_fno_profile_instr_generate:
return true;
}
return false;
}

SYCLToolChain::SYCLToolChain(const Driver &D, const llvm::Triple &Triple,
const ToolChain &HostTC, const ArgList &Args)
: ToolChain(D, Triple, Args), HostTC(HostTC),
Expand All @@ -1409,6 +1426,9 @@ SYCLToolChain::SYCLToolChain(const Driver &D, const llvm::Triple &Triple,
// Diagnose unsupported options only once.
for (OptSpecifier Opt : getUnsupportedOpts()) {
if (const Arg *A = Args.getLastArg(Opt)) {
// Native CPU can support options unsupported by other targets.
if (SupportedByNativeCPU(*this, Opt))
continue;
// All sanitizer options are not currently supported, except
// AddressSanitizer
if (A->getOption().getID() == options::OPT_fsanitize_EQ &&
Expand Down Expand Up @@ -1449,6 +1469,9 @@ SYCLToolChain::TranslateArgs(const llvm::opt::DerivedArgList &Args,
bool Unsupported = false;
for (OptSpecifier UnsupportedOpt : getUnsupportedOpts()) {
if (Opt.matches(UnsupportedOpt)) {
// NativeCPU should allow most normal cpu options.
if (SupportedByNativeCPU(*this, Opt.getID()))
continue;
if (Opt.getID() == options::OPT_fsanitize_EQ &&
A->getValues().size() == 1) {
std::string SanitizeVal = A->getValue();
Expand Down
6 changes: 5 additions & 1 deletion clang/lib/Sema/SemaSYCL.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -3602,8 +3602,12 @@ class SyclKernelBodyCreator : public SyclKernelFieldHandler {
BodyStmts.insert(BodyStmts.end(), FinalizeStmts.begin(),
FinalizeStmts.end());

const auto *LDcl = NewBody;
elizabethandrews marked this conversation as resolved.
Show resolved Hide resolved
SourceLocation LL = LDcl ? LDcl->getBeginLoc() : SourceLocation();
SourceLocation LR = LDcl ? LDcl->getEndLoc() : SourceLocation();

return CompoundStmt::Create(SemaSYCLRef.getASTContext(), BodyStmts,
FPOptionsOverride(), {}, {});
FPOptionsOverride(), LL, LR);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is this actually doing? This is supposed to correspond to location in source code IIUC which of course doesn't exist for compiler generated code. So what location is being picked up here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is picking up the source location from the kernel body - so the compound statement uses the same location as the kernel body. You are right, this is a compiler-generated function, but it still needs a valid source location to ensure any tools processing the clang AST and requiring that source location to be valid to succeed. Other compiler-generated function like constructors also have valid locations even though they don't exist in source code.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this sounds reasonable but I would like @tahonermann to also weigh in here. IIRC the work we re currently implementing for upstream clang (which will replace a lot of this code eventually) does not set SourceLocations as well.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@elizabethandrews @tahonermann do you have updates on this? Thank you

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I somehow missed @elizabethandrews's earlier request for me to weigh in on this.

I think this is reasonable. I'm not sure there is clear guidance for cases like this where a function definition is composed from both generated and written code.

There might be cases in the new upstream implementation that will need to have source locations added. I think I generally opted for generated code to have builtin/unknown source locations.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tahonermann I was trying to set the location similar to what clang does with implicitly generated constructors which also get a valid source location (the location of the struct name). The main aim of this PR is just to avoid asserts on invalid source locations in the clang AST when generating profiling/coverage code (Tested by the new invocation added to sycl/test/native_cpu/vector-add.cpp.

}

void annotateHierarchicalParallelismAPICalls() {
Expand Down
6 changes: 6 additions & 0 deletions clang/test/Driver/sycl-native-cpu.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -24,3 +24,9 @@

// RUN: %clangxx -fsycl -fsycl-targets=spir64 %s -### 2>&1 | FileCheck -check-prefix=CHECK-NONATIVECPU %s
// CHECK-NONATIVECPU-NOT: "-D" "__SYCL_NATIVE_CPU__"

// Checking that coverage testing options are accepted by native_cpu, and that device and host compilation invocations receive the same options
// RUN: %clangxx -fsycl -fsycl-targets=native_cpu -Werror -fno-profile-instr-generate -fprofile-instr-generate -fno-coverage-mapping -fcoverage-mapping -### %s 2>&1 | FileCheck %s --check-prefix=CHECK_COV_INVO
// CHECK_COV_INVO:{{.*}}clang{{.*}}-fsycl-is-device{{.*}}"-fsycl-is-native-cpu" "-D" "__SYCL_NATIVE_CPU__"{{.*}}"-fprofile-instrument=clang"{{.*}}"-fcoverage-mapping" "-fcoverage-compilation-dir={{.*}}"
// CHECK_COV_INVO:{{.*}}clang{{.*}}"-fsycl-is-host"{{.*}}"-fprofile-instrument=clang"{{.*}}"-fcoverage-mapping" "-fcoverage-compilation-dir={{.*}}"

37 changes: 28 additions & 9 deletions sycl/doc/design/SYCLNativeCPU.md
Original file line number Diff line number Diff line change
@@ -1,10 +1,10 @@
# SYCL Native CPU

The SYCL Native CPU flow aims at treating the host CPU as a "first class citizen", providing a SYCL implementation that targets CPUs of various different architectures, with no other dependencies than DPC++ itself, while bringing performances comparable to state-of-the-art CPU backends.
The SYCL Native CPU flow aims at treating the host CPU as a "first class citizen", providing a SYCL implementation that targets CPUs of various different architectures, with no other dependencies than DPC++ itself, while bringing performances comparable to state-of-the-art CPU backends. SYCL Native CPU also provides some initial/experimental support for LLVM's [source-based code coverage tools](https://clang.llvm.org/docs/SourceBasedCodeCoverage.html) (see also section [Code coverage](#code-coverage)).

# Compiler and runtime options

The SYCL Native CPU flow is enabled by setting `native_cpu` as a `sycl-target` (please note that currently doing so overrides any other SYCL target specified in the compiler invocation):
The SYCL Native CPU flow is enabled by setting `native_cpu` as a `sycl-target`:

```
clang++ -fsycl -fsycl-targets=native_cpu <input> -o <output>
Expand All @@ -28,9 +28,16 @@ clang++ <device-ir> -o <device-o>
clang++ -L<sycl-lib-path> -lsycl <device-o> <host-o> -o <output>
```

Note that SYCL Native CPU co-exists alongside the other SYCL targets. For example, the following command line builds SYCL code simultaneously for SYCL Native CPU and for OpenCL.

```
clang++ -fsycl -fsycl-targets=native_cpu,spir64 <input> -o <output>
```
The application can then run on either SYCL target by setting the DPC++ `ONEAPI_DEVICE_SELECTOR` environment variable accordingly.

## Configuring DPC++ with SYCL Native CPU

SYCL Native CPU needs to be enabled explictly when configuring DPC++, using `--native_cpu`, e.g.
SYCL Native CPU needs to be enabled explicitly when configuring DPC++, using `--native_cpu`, e.g.

```
python buildbot/configure.py \
Expand Down Expand Up @@ -86,7 +93,19 @@ Whole Function Vectorization is enabled by default, and can be controlled throug
* `-mllvm -sycl-native-cpu-no-vecz`: disable Whole Function Vectorization.
* `-mllvm -sycl-native-cpu-vecz-width`: sets the vector width to the specified value, defaults to 8.

For more details on how the Whole Function Vectorizer is integrated for SYCL Native CPU, refer to the [Technical details[(#technical-details) section.
For more details on how the Whole Function Vectorizer is integrated for SYCL Native CPU, refer to the [Technical details](#technical-details) section.

# Code coverage

SYCL Native CPU has experimental support for LLVM's source-based [code coverage](https://clang.llvm.org/docs/SourceBasedCodeCoverage.html). This enables coverage testing across device and host code.
Example usage:

```bash
clang.exe -fsycl -fsycl-targets=native_cpu -fprofile-instr-generate -fcoverage-mapping %fname% -o vector-add.exe
.\vector-add.exe
llvm-profdata merge -sparse default.profraw -o foo.profdata
llvm-cov show .\vector-add.exe -instr-profile=foo.profdata
```

## Ongoing work

Expand All @@ -95,7 +114,7 @@ For more details on how the Whole Function Vectorizer is integrated for SYCL Nat
* Subgroup support
* Performance optimizations

### Please note that Windows support is temporarily disabled due to some implementation details, it will be reinstantiated soon.
### Please note that Windows is partially supported but temporarily disabled due to some implementation details, it will be re-enabled soon.

# Technical details

Expand Down Expand Up @@ -140,13 +159,13 @@ entry:
}
```

For the SYCL Native CPU target, the device compiler is in charge of materializing the SPIRV builtins (such as `@__spirv_BuiltInGlobalInvocationId`), so that they can be correctly updated by the runtime when executing the kernel. This is performed by the [PrepareSYCLNativeCPU pass](https://github.com/intel/llvm/blob/sycl/llvm/lib/SYCLLowerIR/PrepareSYCLNativeCPU.cpp).
For the SYCL Native CPU target, the device compiler is in charge of materializing the SPIRV builtins (such as `@__spirv_BuiltInGlobalInvocationId`), so that they can be correctly updated by the runtime when executing the kernel. This is performed by the [PrepareSYCLNativeCPU pass](https://github.com/intel/llvm/blob/sycl/llvm/lib/SYCLNativeCPUUtils/PrepareSYCLNativeCPU.cpp).
The PrepareSYCLNativeCPUPass also emits a `subhandler` function, which receives the kernel arguments from the SYCL runtime (packed in a vector), unpacks them, and forwards only the used ones to the actual kernel.


## PrepareSYCLNativeCPU Pass

This pass will add a pointer to a `nativecpu_state` struct as kernel argument to all the kernel functions, and it will replace all the uses of SPIRV builtins with the return value of appropriately defined functions, which will read the requested information from the `__nativecpu_state` struct. The `__nativecpu_state` struct and the builtin functions are defined in [native_cpu.hpp](https://github.com/intel/llvm/blob/sycl/sycl/include/sycl/detail/native_cpu.hpp).
This pass will add a pointer to a `native_cpu::state` struct as kernel argument to all the kernel functions, and it will replace all the uses of SPIRV builtins with the return value of appropriately defined functions, which will read the requested information from the `native_cpu::state` struct. The `native_cpu::state` struct is defined in the [native_cpu UR adapter](https://github.com/oneapi-src/unified-runtime/blob/main/source/adapters/native_cpu/nativecpu_state.hpp) and the builtin functions are defined in the [native_cpu device library](https://github.com/intel/llvm/blob/sycl/libdevice/nativecpu_utils.cpp).


The resulting IR is:
Expand Down Expand Up @@ -188,11 +207,11 @@ entry:
}
```

As you can see, the `subhandler` steals the kernel's function name, and receives two pointer arguments: the first one points to the kernel arguments from the SYCL runtime, and the second one to the `__nativecpu_state` struct.
As you can see, the `subhandler` steals the kernel's function name, and receives two pointer arguments: the first one points to the kernel arguments from the SYCL runtime, and the second one to the `nativecpu::state` struct.

## Handling barriers

On SYCL Native CPU, calls to `__spirv_ControlBarrier` are handled using the `WorkItemLoopsPass` from the oneAPI Construction Kit. This pass handles barriers by splitting the kernel between calls calls to `__spirv_ControlBarrier`, and creating a wrapper that runs the subkernels over the local range. In order to correctly interface to the oneAPI Construction Kit pass pipeline, SPIRV builtins are converted to `mux` builtins (used by the OCK) by the `ConvertToMuxBuiltinsSYCLNativeCPUPass`.
On SYCL Native CPU, calls to `__spirv_ControlBarrier` are handled using the `WorkItemLoopsPass` from the oneAPI Construction Kit. This pass handles barriers by splitting the kernel between calls to `__spirv_ControlBarrier`, and creating a wrapper that runs the subkernels over the local range. In order to correctly interface to the oneAPI Construction Kit pass pipeline, SPIRV builtins are defined in the device library to call the corresponding `mux` builtins (used by the OCK).

## Vectorization

Expand Down
6 changes: 6 additions & 0 deletions sycl/test/native_cpu/vector-add.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -15,6 +15,12 @@
// RUN: %clangxx -fsycl -fsycl-targets=native_cpu -mllvm -sycl-native-cpu-vecz-width=4 %s -g -o %t-vec
// RUN: env ONEAPI_DEVICE_SELECTOR="native_cpu:cpu" %t-vec

// Ensure coverage options work in the compiler invocations.
// For builds with asserts enabled we also need to pass the option
// -mllvm -system-headers-coverage
// We need to also check if clang-rt is built and then run the executable and verify the (profiling) outputs.
// RUN: %clangxx -fsycl -fsycl-targets=native_cpu %s -fprofile-instr-generate -fcoverage-mapping -mllvm -system-headers-coverage -c -o %t

#include <sycl/sycl.hpp>

#include <array>
Expand Down
Loading