
Commit 65f0fcc

committed
Adding test files to data/
1 parent 1735895 commit 65f0fcc

File tree

1 file changed: README.md (+137, -17 lines changed)
# FaultyRank

FaultyRank is a graph-based Parallel Filesystem Checker. We model important Parallel Filesystem metadata as a graph and then generalize the logic of cross-checking and repairing into graph analytic tasks.
We implement a prototype of FaultyRank on Lustre and compare it with Lustre’s default file system checker, LFSCK.

- You can learn more about FaultyRank in our [IPDPS 2023 paper](todo: url of paper).
- If you use this software, please cite us:

```
@inproceedings{faultyrank2023,
  author={Kamat, Saisha and Islam, Abdullah Al Raqibul and Zheng, Mai and Dai, Dong},
  title={FaultyRank: A Graph-based Parallel File System Checker},
  booktitle={2023 37th IEEE International Parallel & Distributed Processing Symposium (IPDPS)},
  year={2023},
}
```

# Table of Contents
todo: will add it later

# Directory Structure

```
data/: test data
client/: filesystem aging scripts (run from client nodes)
scanner/: metadata extractor (run from MDS and OSS nodes)
aggregator/: aggregates partial graphs into a unified graph (run from MDS nodes)
core/: implementation of the FaultyRank algorithm
```

# Getting Started

The filesystem aging scripts have been implemented in Python (todo: version). The rest of the code is implemented in C and tested on (todo) XXX 18.04 with CMake 3.13.4 and gcc 7.5.0.

## Testbed Setup
- We tested FaultyRank on a local Lustre cluster with 1 MDS/MGS server and 8 OSS servers.
- The MDS/MGS server uses an Intel(R) Xeon(R) Bronze 3204 CPU with 128GB DRAM and a 256GB local SSD.
- The eight OSS servers use Intel(R) Xeon(R) E5-2630 CPUs with 32GB DRAM and a 1TB hard disk (partially partitioned for Lustre).
- Our installed Lustre instance contains 2.4TB of storage space.

## Software Requirements
- todo: g++ compiler
- todo: Python 3.xxx
- Lustre 2.12.8 (with the latest LFSCK implementation)

## Experiment Overview
FaultyRank has four major steps. A detailed description of each step is provided in the following sections.

### Filesystem Aging
We create a realistic Lustre instance by aging it. We use the Archive and NSF Metadata dataset released by the USRC (Ultrascale Systems Research Center) at LANL.
The LANL dataset has around 2PB of files and contains a file system walk of LANL's HPC systems. It contains detailed information such as file sizes, creation time, modification time, UID/GID, anonymized file paths, etc.

We use the actual file paths to re-create the same directory structures in our local testbed. We also shrink the sizes of files in the 2PB file system without affecting the representativeness of the generated Layout metadata. To do this, we set the stripe_size of our Lustre directories to be extremely small (i.e., 64KB) and stripe_count to be −1.
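As a concrete illustration, stripe settings like these can be applied with Lustre's standard `lfs` utility; the directory path below is hypothetical, not one used by this repo:

```shell
# Illustrative sketch: set a tiny 64KB stripe size and stripe across all
# available OSTs (-c -1) on an aging directory (path is made up).
$ lfs setstripe -S 64k -c -1 /mnt/lustre/aging_dir
```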

- Parse the LANL log into X (# of clients) files.

```
$ cd FaultyRank/client
$ python partition_datasets.py -i [path_to_data/data.txt] -o [path_to_partitioned_files/] -n [n-partitions]
```

- Copy the partitioned log files to the Lustre client nodes, then run the filesystem aging script from every Lustre client node:

```
$ cd FaultyRank/client
$ python fsaging.py -i [path_to_data/data.txt]
```

### Metadata Extraction
Lustre metadata is stored in two places:
1) Metadata such as FID and LINKEA are stored in the Extended Attributes (EA) of the local inodes.
2) The DIRENT metadata linking a directory to its sub-directories or files is stored as the content of the directory.
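For intuition, extended attributes of this kind can be inspected with standard tools when the server's backing filesystem is mounted directly; a hedged sketch (the mount point and object path are illustrative, and actual attribute names vary by setup):

```shell
# Illustrative only: dump all extended attributes (hex-encoded) of an object
# on the server's local backing filesystem; the path below is hypothetical.
$ getfattr -d -m '.*' -e hex /mnt/mdt-backing/ROOT/dir0/file0
```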

To extract Lustre metadata, we scan the extended attributes of inodes and the contents of directories on all MDS and OSS servers in our network. A partial graph is created on each server.
This graph contains a list of edges, where each edge has a source vertex and a destination vertex, each representing a Lustre directory, file, or stripe object. All vertices have a unique global FID.

- Run the scripts in FaultyRank/scanner/mds_scanner on each MDS node to extract metadata from the MDS node and create a partial graph.

```
$ cd FaultyRank/scanner/mds_scanner
$ make
```

- Run the scripts in FaultyRank/scanner/oss_scanner on each OSS node to extract metadata from the OSS nodes and create a partial graph.

```
$ cd FaultyRank/scanner/oss_scanner
$ make
```

### Unified Graph Creation
All the partial graphs created in the previous step are combined into one global graph on the main MDS server. The graph vertex IDs, which are 128-bit non-contiguous Lustre FIDs, are mapped to vertex GIDs from 0 to MAX_VERTEX_NUM-1.
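The FID-to-GID compaction can be pictured with a short sketch; the function and data below are assumptions for illustration, not the aggregator's actual code:

```python
# Hypothetical sketch: assign dense GIDs 0..N-1 to 128-bit FID strings
# in first-seen order, as a unified-graph builder might do.
def compact_ids(edges):
    gid = {}   # FID string -> dense GID
    out = []
    for src, dst in edges:
        for fid in (src, dst):
            if fid not in gid:
                gid[fid] = len(gid)
        out.append((gid[src], gid[dst]))
    return gid, out

gid, renumbered = compact_ids([
    ("[0x200000007:0x1:0x0]", "[0x200000007:0x2:0x0]"),
    ("[0x200000007:0x2:0x0]", "[0x100000000:0x5:0x0]"),
])
print(renumbered)  # [(0, 1), (1, 2)]
```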

Move all the partial graphs created in the above step to FaultyRank/aggregator. Run the aggregator scripts to combine the partial graphs into one unified graph.

```
$ cd FaultyRank/aggregator
$ make
```

- Next, note the number of vertices and edges in the global graph just created.

```
$ cd FaultyRank/aggregator

# Read the number of vertices
$ tail -1 final_graph.txt

# Read the number of edges
$ wc -l final_graph.txt
```

### Run FaultyRank Algorithm
Run the FaultyRank algorithm on the global graph created in the previous step.

- Plug in the number of vertices and edges from the previous step and run the FaultyRank algorithm:

```
$ cd FaultyRank/core
$ ./faultyrank_core -N (# of Vertices) -E (# of Edges) -f FaultyRank/aggregator/final_graph.txt
```

# Fault Injection

# Test Experiment on a Pre-built Graph
We have provided a simple dataset with a pre-built graph that includes an added inconsistency.

# Contribution

# Reference
`@todo`
