\section{Experimental Methodology}\label{sec:methodology}
This section presents the experimental methodology we used to evaluate our
framework with two applications:
HACC-IO,
%HMMER applications
and the Darshan MPI-IO benchmark. We performed the experiments on a Cray HPC cluster using the NFS and Lustre file systems.
\subsection{Applications}
\begin{itemize}
\item HACC-IO is the I/O proxy for the Hardware Accelerated Cosmology Code (HACC), a large scientific N-body framework that simulates the evolution of mass in the universe with short- and long-range interactions~\cite{habib2013hacc}. The long-range solvers are built on an underlying 3D FFT. HACC-IO is an MPI code that simulates the POSIX, MPI collective, and MPI independent I/O patterns of fault-tolerance HACC checkpoints. It takes the number of particles per rank as input, writes the simulated checkpoint information to a file, and then reads it back for validation. We ran HACC-IO with several configurations to simulate different workloads on the NFS and Lustre file systems. Table~\ref{subtable:HACC} lists the run configurations.
% \item HMMER is a suite of applications that profiles a hidden Markov model (HMM) to search similar protein sequences~\cite{eddy1992hmmer}. HMMER has a building code called "hmmbuild" that uses MPI to build a database by concatenating multiple profiles Stockholm alignment files. In our experiment, we used the Pfam-A.seed~\cite{sonnhammer1998pfam} file to generate a large Pfam-A.hmm database. We ran HMMER with 32 MPI ranks on one node, and we ran it in two configurations where we point the database file to NFS and then Lustre, respectively.
\item The MPI-IO-TEST benchmark is a utility included in the Darshan code distribution to test MPI I/O performance on HPC machines. It issues iterations of messages with configurable block sizes from the participating MPI ranks and can exercise both collective and independent MPI I/O methods. We compared NFS vs. Lustre and collective vs. independent MPI I/O. We ran the benchmark with four configurations on 22 nodes, setting the number of iterations to 10 and the block size to 16MB. Table~\ref{subtable:mpi-io-test} lists the configurations used; a minimal sketch contrasting the independent and collective MPI-IO write paths exercised by these workloads appears after this list.
\end{itemize}
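The following minimal sketch is ours, not code from either benchmark; the file name and the 16MB per-rank block size are illustrative only. It contrasts the independent write path, in which each rank issues its own request, with the collective path, in which ranks coordinate so the MPI library can aggregate requests before they reach the file system.
\begin{verbatim}
/* Illustrative sketch (not benchmark source): independent vs. collective
 * MPI-IO writes. A real run would use one path or the other. */
#include <mpi.h>
#include <stdlib.h>
#include <string.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    const int block = 16 * 1024 * 1024;   /* 16 MB per rank */
    char *buf = malloc(block);
    memset(buf, 0, block);

    MPI_File fh;
    MPI_File_open(MPI_COMM_WORLD, "checkpoint.dat",
                  MPI_MODE_CREATE | MPI_MODE_WRONLY, MPI_INFO_NULL, &fh);

    MPI_Offset offset = (MPI_Offset)rank * block;

    /* Independent I/O: each rank issues its own write request. */
    MPI_File_write_at(fh, offset, buf, block, MPI_BYTE, MPI_STATUS_IGNORE);

    /* Collective I/O: ranks coordinate, letting the MPI library
     * aggregate requests before they hit the file system. */
    MPI_File_write_at_all(fh, offset, buf, block, MPI_BYTE,
                          MPI_STATUS_IGNORE);

    MPI_File_close(&fh);
    free(buf);
    MPI_Finalize();
    return 0;
}
\end{verbatim}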
%provides various use cases of the \Darshan timeseries data that will be used to create new meaningful analyses and insights in the I/O performance variability during an application run.
\subsection{Evaluation System}
We experimented with several I/O loads on a Cray XC40 system (Voltrino) at Sandia
National Laboratories. The system has 24 diskless nodes, each with dual Intel Xeon
Haswell E5-2698 v3 processors at 2.30GHz (16 cores and 32 threads per socket) and 64 GB of DDR3-1866MHz memory.
The interconnect is Cray Aries with a Dragonfly topology. The machine connects to
two file systems: a network file system (NFS) and a Lustre file system (LFS).
%\todo{File systems???}
\subsection{Environment}
We configured the HPC cluster Voltrino with LDMS samplers on the compute nodes and
one LDMS aggregator on the login node. LDMS uses the Cray UGNI RDMA interface to transfer
the Darshan streams data, along with other system-state metrics, from the compute nodes to the aggregator
on the login node. That aggregator transmits the data to another
LDMS aggregator on a monitoring cluster, Shirley, for analysis and storage. Shirley
also hosts the HPC web services, consisting of the Grafana application and the DSOS database.
We ran the applications with our enhanced Darshan library, which dynamically wraps the
I/O functions of each MPI rank to sample I/O data and transmit it, using the streams
API, to the LDMS daemon running on the node local to the transmitting MPI rank. Before
running the applications, we set the \code{LD\_PRELOAD} environment variable to point to
the Darshan library shared objects, which contain the sampling wrappers.
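As an illustration of this interposition mechanism, the following generic \code{LD\_PRELOAD} sketch is ours, not the Darshan source; the log message stands in for the streams-API call our library issues. A preloaded shared object intercepts a POSIX I/O function, records the call, and forwards it to the real implementation.
\begin{verbatim}
/* Illustrative sketch (not Darshan source): wrap write() via LD_PRELOAD,
 * record the call, then forward to the real libc implementation. */
#define _GNU_SOURCE
#include <dlfcn.h>
#include <stdio.h>
#include <unistd.h>

ssize_t write(int fd, const void *buf, size_t count)
{
    /* Look up the real write() once and cache it. */
    static ssize_t (*real_write)(int, const void *, size_t) = NULL;
    if (real_write == NULL)
        real_write = (ssize_t (*)(int, const void *, size_t))
                     dlsym(RTLD_NEXT, "write");

    /* A real wrapper would publish {rank, fd, bytes, timestamp} to the
     * LDMS streams API here; this sketch just notes the event. */
    char msg[64];
    int n = snprintf(msg, sizeof msg,
                     "intercepted write: %zu bytes on fd %d\n", count, fd);
    real_write(STDERR_FILENO, msg, (size_t)n);

    return real_write(fd, buf, count);
}
\end{verbatim}
Such a wrapper is built as a shared object (e.g., with \code{gcc -shared -fPIC -ldl}) and activated by listing it in \code{LD\_PRELOAD} before launching the application.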
%\renewcommand{\arraystretch}{1.2}
%\begin{table}[]
% \centering
% \begin{tabular}{|c|c|c|c|c|}
% \hline
% Application Name & File System & Problem Size & Runtime (seconds) & Nodes \\ \hline
% \multirow{4}{*}{HACC-IO} & \multirow{2}{*}{NFS} & 1000000 & 1210.06 & \multirow{4}{*}{16} \\ \cline{3-4}
% && 2000000 & 2455.60 & \\ \cline{2-4}
% & \multirow{2}{*}{Luster} & 1000000 & 1476.64 & \\ \cline{3-4}
% && 2000000 & ?? & \\ \hline
% \multirow{4}{*}{MPI-IO-TEST} & \multirow{2}{*}{NFS} & 16 & 1210.06 & \multirow{4}{*}{16} \\ \cline{3-4}
% && 16 & 2455.60 & \\ \cline{2-4}
% & \multirow{2}{*}{Luster} & 16 & 1476.64 & \\ \cline{3-4}
% && 16 & ?? & \\ \hline
%
% \end{tabular}
% \caption{Applications run configurations, targeted file system, and runtime}
% \label{table:all-apps}
%\end{table}
%\begin{table}[]
% \centering
% \begin{tabular}{|c|c|c|c|}
% \hline
% File System & Particles/Rank & Runtime (seconds) & Nodes \\ \hline
% \multirow{2}{*}{NFS} & 1000000 & 1210.06 & 16\\ \cline{2-4}
% & 2000000 & 2455.60 & 16\\ \hline
% \multirow{2}{*}{Luster} & 1000000 & 1476.64 &16 \\ \cline{2-4}
% & 2000000 & ?? & 16 \\ \hline
% \end{tabular}
% \caption{HACC-IO run configurations, targeted file system, and runtime}
% \label{table:HACC}
%\end{table}
%\begin{table}[]
% \centering
% \begin{tabular}{|c|c|c|c|}
% \hline
% File System & Block Size (MB) & Runtime (seconds) & Nodes \\ \hline
% \multirow{2}{*}{NFS} & \multirow{2}{*}{16} & 1355.35 & 22\\ \cline{3-4}
% & & ?? & 22\\ \hline
% \multirow{2}{*}{Luster} & \multirow{2}{*}{16} & 270.98 & 22 \\ \cline{3-4}
% & & 414.34 & 22 \\ \hline
% \end{tabular}
% \renewcommand{\arraystretch}{1}
% \caption{MPI-IO-TEST run configurations, targeted file system, and runtime}
% \label{table:mpi-io-test}
%\end{table}
% \vspace{0.5cm}
% \begin{subtable}[h]{0.2\textwidth}
% %\centering
% \setlength\tabcolsep{2pt}
% \begin{tabular}{|ccl|}
% \hline
% \multicolumn{3}{|c|}{SW4} \\ \hline
% \multicolumn{1}{|c|}{File System} & \multicolumn{2}{c|}{NFS} \\ \hline
% \multicolumn{1}{|c|}{Arguments} & \multicolumn{2}{c|}{new\_gh\_1node.in} \\ \hline
% \multicolumn{1}{|c|}{Nodes} & \multicolumn{2}{c|}{16} \\ \hline
% \multicolumn{3}{|c|}{Average Runtime (s)} \\ \hline
% \multicolumn{1}{|c|}{Darshan} & \multicolumn{2}{c|}{576.86} \\ \hline
% \multicolumn{1}{|c|}{dC} & \multicolumn{2}{c|}{572.72} \\ \hline
% \multicolumn{1}{|c|}{\% Overhead} & \multicolumn{2}{c|}{0.00\%} \\ \hline
% \end{tabular}
% \caption{SW4}
% \label{subtable:SW4}
%\end{subtable}