-
Notifications
You must be signed in to change notification settings - Fork 0
/
introduction.tex
89 lines (82 loc) · 5.11 KB
/
introduction.tex
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
\section{Introduction}
As more scientific I/O applications are developed and used, the need for
improved fidelity and throughput of these applications is more pressing than ever.
Much design effort and investment is being put into improving not only
the I/O performance of applications but also the performance of related
system components (e.g., filesystem elements and networks). Being able to
identify, predict and analyze I/O behaviors are critical to ensuring
parallel storage systems are utilized efficiently~\cite{costa2021}.
%However, I/O
%performance continues to show high variations on large-scale
%production conditions in many cases~\cite{costa2021}.
%\RED{Cite NE}. %Some of these cases include running applications on clusters
%during the weekend, separate and disjoint time zones and read
%I/O's of runs within the same cluster.
Variations in I/O performance for an application can be caused by
aggregate contention for resources such as file systems and networks.
Congestion in these resources may even be caused by the access patterns
of the affected application itself~\cite{I/O-performance-variation}.
%Providing users, developerd, and system administratior with visual insight about the I/O behavior combined with other resources information during the runtime of the application will assest them in determining the root cause of I/O related problems.
This variation
makes it difficult to determine the root cause of I/O related problems
and get a thorough understanding of throughput for system-specific
behaviors and I/O performance in similar applications across a
system.
%Further, not knowing the origin of such variations will
%irectly affect the user and developer as unwanted time, effort and
%investment will need to be put into solving the issue.
Generally, the I/O performance is analyzed post-run by application
developers, researchers and users using regression testing or
other I/O characterization tools that capture the applications' behavior. An example of one of these tools is \emph{Darshan}, which
monitors and captures I/O information on access patterns from HPC
applications~\cite{Darshan}.
% Detailed information will be covered in the \emph{Approach} section.
Efforts to identify the origin of I/O performance usually come from the analyzed data
collected by these I/O characterization tools. Correlations are then made with the environment
in which they were run or by comparison with other analyses from
other application runs.
%the time in which these applications were tested.
However, this approach does not enable identification of
temporally significant variation of I/O performance
occurs during an application run or
%, if the developer or user wishes to, identify any
correlations between such behavior and the file system state,
network congestion or other resource contentions.
% and the I/O performance.
%However, this analysis approach does not take into account the file system, network congestion, system resource contentions and other component's affect on the I/O performance. In order to make these associations, the \emph{absolute timestamps} is required which the post-run approach and I/O characterization tools usually lack.
Execution logs that provide \emph{absolute timestamps}
(e.g. timeseries) enable users and developers to perform temporal
performance analyses, and better understand how the changing state of
system components affects I/O performance and variation, as well as provide
further insight into the I/O patterns of applications.
%Therefore, we introduce
Our \Darshan{} approach provides time series logs of application I/O
events and incorporates \emph{absolute timestamps} to provide a runtime
timeseries set of application I/O data.
This paper makes the following contributions:
\begin{itemize}
\item Describes the approach used to expose absolute timestamp
data from an existing I/O characterization tool;
\item Provides a high level overview of the implementation
process and other tools used to collect application I/O data
during run time;
\item Demonstrates use cases of the \connector{} for two
applications with distinct I/O behavior on a production HPC
system;
\item Utilizes Darshan LDMS data to identify and better
understand any root cause(s) of application I/O performance
variation run time;
\item Presents how this new approach can be integrated with
other tools to benefit users to collect and assist in the
detection of application I/O performance variances across
multiple applications.
\end{itemize}
%Section \ref{sec:background} presents the background and motivation
%for this paper, Section \ref{sec:DarshanLDMSIntegration} presents the
%approach to design the Darshan LDMS Integration and collect the
%absolute timestamps, Section \ref{sec:integration} depicts the tools
%integrated to our approach , Section \ref{sec:methodology} presents
%the experimental methodology to generated our results (Section
%\ref{sec:results}). Section \ref{sec:rw} presents the related works,
%and Section \ref{sec:conclusion} presents the conclusion and future
%works.