CASITA is a tool for the automatic analysis of OTF2 trace files generated with
Score-P. It identifies program activities that have a high impact on the total
program runtime and on load balance. CASITA generates an OTF2 trace with
additional information such as the critical path, waiting times, and the causes
of wait states. The same metrics are used to generate a summary profile that
rates activities according to their potential to improve the program runtime and
the load balance. A summary of inefficient patterns exposes the waiting times in
the individual programming models and APIs.
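How a summary profile can rate activities is sketched below. This is a minimal illustration, not CASITA's implementation: it assumes that each activity has a total time on the critical path and a total "blame" (waiting time attributed to the activity that caused it), and that the rating is the unweighted sum of the two normalized fractions. The `Activity` class, the `rate_activities` function, and the weighting are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Activity:
    """Hypothetical per-region summary entry (not CASITA's data layout)."""
    name: str
    time_on_cp: float  # seconds the activity spends on the critical path
    blame: float       # seconds of waiting time attributed to the activity


def rate_activities(activities):
    """Rank activities by an assumed rating: the activity's fraction of
    total critical-path time plus its fraction of total blame."""
    # `or 1.0` avoids division by zero for an empty or all-zero profile.
    total_cp = sum(a.time_on_cp for a in activities) or 1.0
    total_blame = sum(a.blame for a in activities) or 1.0
    rated = [
        (a.name, a.time_on_cp / total_cp + a.blame / total_blame)
        for a in activities
    ]
    # Highest rating first: the most promising optimization target.
    return sorted(rated, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    profile = [
        Activity("compute_kernel", time_on_cp=6.0, blame=3.0),
        Activity("MPI_Allreduce", time_on_cp=1.0, blame=0.0),
        Activity("io_write", time_on_cp=3.0, blame=1.0),
    ]
    for name, rating in rate_activities(profile):
        print(f"{name:15s} {rating:.2f}")
```

Combining both fractions favors activities that are long on the critical path and also make other locations wait; a real tool may weight these differently.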
Internally, CASITA constructs a distributed DAG in which each node represents an
event in time and each edge represents a dependency between events on different
locations (processes, threads, and CUDA streams). Events on the same location
are implicitly ordered by the happens-before relation. The local DAGs, one per
MPI process, are connected via remote edges. Only MPI, OpenMP, and CUDA events
are represented as nodes in the graph; events from compiler instrumentation are
nevertheless accounted for.
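A much simplified, single-process picture of such a graph and of how a critical path can be read from it is sketched below. It is not CASITA's distributed implementation. Assumptions: local edges between consecutive events on a location are weighted by elapsed time, intervals marked as wait states and remote edges get weight zero, and the critical path is taken to be the longest weighted path. The `Event` class and the `build_dag` and `critical_path` functions are hypothetical; `graphlib` is the Python 3.9+ standard library module.

```python
from collections import defaultdict
from dataclasses import dataclass
from graphlib import TopologicalSorter  # Python 3.9+


@dataclass(frozen=True)
class Event:
    """Hypothetical trace event: where and when it happened."""
    location: str  # process, thread, or CUDA stream
    time: float
    name: str


def build_dag(events, remote_edges, wait_states):
    """Return the DAG as {event: {predecessor: edge_weight}}.

    Consecutive events on the same location are linked by a local
    (happens-before) edge weighted by the elapsed time, unless the pair is
    listed in `wait_states`, so waiting adds nothing to the path length.
    Remote edges (e.g. send -> end of the matching receive) get weight 0,
    i.e. communication latency is ignored in this sketch.
    """
    preds = defaultdict(dict)
    by_location = defaultdict(list)
    for ev in events:
        preds.setdefault(ev, {})  # every event becomes a node
        by_location[ev.location].append(ev)
    for evs in by_location.values():
        evs.sort(key=lambda e: e.time)
        for a, b in zip(evs, evs[1:]):
            preds[b][a] = 0.0 if (a, b) in wait_states else b.time - a.time
    for src, dst in remote_edges:
        preds[dst][src] = 0.0
    return preds


def critical_path(preds):
    """Longest weighted path through the DAG and its length."""
    dist, best_pred = {}, {}
    # static_order() yields every predecessor before its successors.
    for node in TopologicalSorter(preds).static_order():
        dist[node], best_pred[node] = 0.0, None
        for pred, weight in preds[node].items():
            if dist[pred] + weight > dist[node]:
                dist[node], best_pred[node] = dist[pred] + weight, pred
    # Walk back from the event with the largest accumulated length.
    node = max(dist, key=dist.get)
    length = dist[node]
    path = []
    while node is not None:
        path.append(node)
        node = best_pred[node]
    return path[::-1], length


if __name__ == "__main__":
    s0 = Event("P0", 0.0, "start")
    send = Event("P0", 5.0, "MPI_Send")
    e0 = Event("P0", 7.0, "end")
    s1 = Event("P1", 0.0, "start")
    recv_in = Event("P1", 1.0, "MPI_Recv enter")
    recv_out = Event("P1", 5.0, "MPI_Recv leave")
    e1 = Event("P1", 9.0, "end")
    dag = build_dag(
        [s0, send, e0, s1, recv_in, recv_out, e1],
        remote_edges=[(send, recv_out)],
        wait_states={(recv_in, recv_out)},  # P1 waits for the late sender
    )
    path, length = critical_path(dag)
    print(length)  # 9.0
    print([f"{e.location}:{e.name}" for e in path])
```

In this example the path runs through P0's work up to the send and then continues on P1 after the receive; P1's waiting interval is not part of it.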
Publications:
* "CASITA: A Tool for Identifying Critical Optimization Targets in Distributed
  Heterogeneous Applications"
  http://dx.doi.org/10.1109/ICPPW.2014.35
* "Scalable critical-path analysis and optimization guidance for hybrid MPI-CUDA
  applications"
  http://dx.doi.org/10.1177/1094342016661865
* "Critical-blame analysis for OpenMP 4.0 offloading on Intel Xeon Phi"
  http://dx.doi.org/10.1016/j.jss.2015.12.050
* "Integrating Critical-Blame Analysis for Heterogeneous Applications into the
  Score-P Workflow"
  http://dx.doi.org/10.1007/978-3-319-16012-2_8
* "Analyzing Offloading Inefficiencies in Scalable Heterogeneous Applications"
  http://dx.doi.org/10.1007/978-3-319-67630-2_34
CASITA analysis requirements:
The MPI analysis is currently based on reenacting the MPI communication in
forward and backward direction, which means that the respective communication
records must be available in the trace. CASITA also needs the region enter and
leave events of MPI communication functions. Currently, MPI support is limited
to (two-sided) point-to-point communication and blocking collectives.
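One kind of wait state this analysis targets can be sketched for a single matched blocking point-to-point message. In a replay-based analysis, timestamps such as the sender's enter time have to be exchanged between the communicating processes; this sketch skips that exchange and passes both intervals to one function. The classification rules ("late sender", and "late receiver" under synchronous send semantics), the `Interval` class, and `p2p_wait_state` are illustrative assumptions, not CASITA's exact model; clocks are assumed to be synchronized.

```python
from dataclasses import dataclass


@dataclass
class Interval:
    """Hypothetical enter/leave timestamps of one MPI call."""
    enter: float
    leave: float


def p2p_wait_state(send, recv, synchronous=True):
    """Classify the wait state of one matched blocking send/receive pair.

    Simplified rules (assumptions):
    - late sender: the receiver enters MPI_Recv before the sender enters
      MPI_Send and waits until the message is sent;
    - late receiver: with synchronous (rendezvous) semantics, the sender
      waits until the receiver enters MPI_Recv.
    Returns (waiting side, waiting time).
    """
    if recv.enter < send.enter:
        # The wait cannot exceed the duration of the receive call.
        return "receiver", min(send.enter, recv.leave) - recv.enter
    if synchronous and send.enter < recv.enter:
        return "sender", min(recv.enter, send.leave) - send.enter
    return None, 0.0


if __name__ == "__main__":
    # P1 posts the receive at t=1 s, P0 only sends at t=5 s.
    print(p2p_wait_state(Interval(5.0, 5.2), Interval(1.0, 5.3)))
```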
The OpenMP analysis is still based on OPARI2 instrumentation. It requires the
fork/join, parallel begin/end, and barrier begin/end records. Both the MPI and
the OpenMP analysis work with the default Score-P trace output.
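The barrier begin records are what allow waiting time in an OpenMP barrier to be measured. A minimal sketch, assuming that no thread leaves the barrier before the last one has arrived, so each thread waits from its own arrival until the latest arrival and the last thread to arrive is the cause; the function name and the input layout (thread id mapped to barrier begin timestamp) are hypothetical.

```python
def barrier_wait_times(barrier_begin):
    """Waiting time of each thread in one OpenMP barrier instance.

    `barrier_begin` maps a thread id to the timestamp of its barrier
    begin record. Assumption: each thread waits from its own arrival
    until the latest arrival, and the last thread to arrive causes the
    wait.
    """
    last_arrival = max(barrier_begin.values())
    waits = {tid: last_arrival - t for tid, t in barrier_begin.items()}
    culprit = max(barrier_begin, key=barrier_begin.get)
    return waits, culprit


if __name__ == "__main__":
    waits, culprit = barrier_wait_times({0: 2.0, 1: 3.5, 2: 1.0})
    print(waits, culprit)
```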
CUDA analysis has been supported since Score-P 1.3. The respective OTF2 trace
file must contain the following information:
* Enter and leave events of CUDA driver API functions that synchronize with the
  device (including blocking CUDA memory copies and CUDA event queries), as well
  as of CUDA kernel launches and CUDA event records.
* Enter and leave events of kernel launches and kernels.
* Kernel references (dependency information between the launch and the
  synchronization of kernels).
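Kernel references are what connect a blocking host call to the kernels it waits for. The sketch below shows how, once that link is known, the host's waiting time in one synchronizing call and the kernel to blame could be derived. It assumes the references are already resolved into a list of kernels, and that the host is blocked from entering the call until the last referenced kernel finishes (capped at the call's leave time); `Kernel`, `SyncCall`, and `host_wait_time` are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class Kernel:
    """Hypothetical kernel execution record on the device."""
    name: str
    start: float
    end: float


@dataclass
class SyncCall:
    """Enter/leave of a blocking CUDA API call on the host."""
    enter: float
    leave: float
    # Kernels this call synchronizes with, as resolved from kernel
    # references (assumed to be already resolved here).
    waits_for: list = field(default_factory=list)


def host_wait_time(sync):
    """Host idle time in one synchronizing call and the kernel to blame."""
    if not sync.waits_for:
        return 0.0, None
    last = max(sync.waits_for, key=lambda k: k.end)
    wait = max(0.0, min(last.end, sync.leave) - sync.enter)
    return wait, (last.name if wait > 0 else None)


if __name__ == "__main__":
    k1 = Kernel("saxpy", start=1.0, end=4.0)
    k2 = Kernel("reduce", start=4.0, end=7.5)
    print(host_wait_time(SyncCall(3.0, 7.6, [k1, k2])))
```

Without the references, the analysis could not tell which kernel a synchronization call actually waited for.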
The minimum set of Score-P CUDA recording features is "driver,kernel,references",
which can be set with the environment variable SCOREP_CUDA_ENABLE.
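When Score-P runs are scripted, a quick check that the minimum feature set is enabled can save a wasted trace. The helper below is hypothetical: it only parses the comma-separated value of Score-P's SCOREP_CUDA_ENABLE variable and reports which of the features listed above are missing. It does not expand aliases such as "default", so it may report features as missing that such an alias would enable. The usage example also sets SCOREP_ENABLE_TRACING, since CASITA analyzes OTF2 traces.

```python
import os

# Minimum CUDA recording features named in this README.
REQUIRED_CUDA_FEATURES = {"driver", "kernel", "references"}


def missing_cuda_features(env=None):
    """Return the required CUDA recording features that are not enabled.

    Only literal, comma-separated feature names are checked; aliases such
    as "default" are not expanded (a deliberate simplification).
    """
    env = os.environ if env is None else env
    value = env.get("SCOREP_CUDA_ENABLE", "")
    enabled = {f.strip().lower() for f in value.split(",") if f.strip()}
    return sorted(REQUIRED_CUDA_FEATURES - enabled)


if __name__ == "__main__":
    run_env = dict(os.environ)
    run_env["SCOREP_ENABLE_TRACING"] = "true"  # CASITA needs an OTF2 trace
    run_env["SCOREP_CUDA_ENABLE"] = "driver,kernel,references"
    print(missing_cuda_features(run_env))  # []
```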
The OpenCL analysis also requires kernel dependency information, e.g. to detect
the OpenCL command queue to which a kernel is enqueued or which is synchronized
with clFinish. This is currently implemented in a Score-P development branch.
OpenACC is supported indirectly via the underlying low-level paradigms CUDA and
OpenCL.
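The queue information matters because clFinish blocks until all commands previously enqueued to the given command queue have completed. A minimal sketch of the resulting host waiting time, assuming the kernel-to-queue mapping is available as hypothetical (queue, enqueue_time, end_time) tuples and that the wait ends when the latest relevant kernel finishes (capped at the call's leave time); `clfinish_wait` is a hypothetical name.

```python
def clfinish_wait(kernels, queue, finish_enter, finish_leave):
    """Host waiting time in one clFinish(queue) call.

    `kernels` holds (queue, enqueue_time, end_time) tuples, i.e. the
    kernel-to-queue dependency information. Only kernels enqueued to the
    same queue before the call are considered; kernels on other queues
    are ignored.
    """
    ends = [end for q, enqueued, end in kernels
            if q == queue and enqueued <= finish_enter]
    if not ends:
        return 0.0
    return max(0.0, min(max(ends), finish_leave) - finish_enter)


if __name__ == "__main__":
    trace = [("q0", 0.0, 3.0), ("q0", 1.0, 6.0), ("q1", 1.0, 9.0)]
    print(clfinish_wait(trace, "q0", finish_enter=2.0, finish_leave=6.1))
```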