-
Notifications
You must be signed in to change notification settings - Fork 3
/
README
201 lines (164 loc) · 8.75 KB
/
README
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
PathFinder graph pathway analysis mini-application
------------------------------------
Contents of this README file:
1. PathFinder overview
2. PathFinder versions
3. Building PathFinder
4. Running PathFinder
5. PathFinder search algorithm/pseudo code
------------------------------------
------------------------------------
1. PathFinder overview:
PathFinder searches for "signatures" within graphs. Graphs being searched are directed and cyclic. Many, but not all nodes within the graph have labels. Any given node may have more than one label, and any label may be applied to more than one node. A signature is an orderd list of labels. PathFinder searches for paths between labels within the signature. PathFinder returns success if there is a path from a node with the first label in the signature that passes through nodes with each label in order, ultimately reaching a node with the last label in the signature. Labeled nodes need not be contiguous on any given path.
At the current time, PathFinder does not do any pathway analysis (e.g. shortest path) for discovered signatures. PathFinder simply searches until a signature is satisfied or all pathways have been exhausted.
------------------------------------
2. PathFinder versions:
- ref:
Reference implementation: self-contained. Includes Serial and OpenMP parallel implementations. MPI is not yet supported.
------------------------------------
3. Building PathFinder:
The standard build is to cd into the 'ref' directory and type 'make'.
Doing so builds an OpenMP enabled version of PathFinder (named PathFinder.x). For
serial execution, run it with the OMP_NUM_THREADS environment variable set to 1.
Alternative build targets:
* clean -- removes all .o files
* realclean -- removes all .o and .x files
* debug -- PathFinder.x+dbg - serial executable with debug information
* mpi -- currently not implemented; prints an error message
To verify execution, type 'make check' which builds a serial version, runs it
against one of the known data sets and compares the results.
------------------------------------
4. Running PathFinder:
The simplest execution is to simply type './PathFinder.x'. This will run
PathFinder with a trivial graph (MicroTestData.adj_list in the source
directory). It will find 6 legs out of 9 searches, by doing an "exhaustive"
search on that graph.
An exhaustive search is PathFinder determining if paths exist between any pair
of labels for every label existing in the graph. Exhaustive searches are
embarrassingly parallel, and can be quite time consuming depending on the size
of the graph and number of labels. Running PathFinder exhaustively is done with
the -x switch:
PathFinder.x -x <data file>
e.g. PathFinder.x -x ../data/scaleData/1kx750.adj_list
To search for specific signatures, PathFinder can be run interactively or with
a configuration file. To run interactively, use the -i switch:
PathFinder.x -i <data file>
PathFinder will prompt the operator for a series of labels to define the
signature for which PathFinder will search.
Multiple data files and multiple signatures can be specified to PathFinder in a
configuration file. A python script (config_builder.py) exists to assist the
operator in creating a configuration file. A sample configuration file is
included in the PathFinder source directory. To run PathFinder with a config
file, use the -c switch:
PathFinder.x -c <config file>
e.g. PathFinder.x -c sampleConfig
Typing 'PathFinder -h' will print a help message describing the available
execution switches. Sample data files are provided in the 'data' and
'generatedData' directories. The python script used to generate representative
graph data is in the PathFinder_ref directory. It is named graph_gen.py.
YAML output files can be generated by using the -y switch.
Sample data sets are included in the "generatedData" and "scaleData"
subdirectories. The scaleData files are intended for performance
characterization. They are named with the convention:
<NodeCount>x<LabelCount>.adj_list
For expected results, see scaleData/README
------------------------------------
5. PathFinder search algorithm / pseudo code
PathFinder does a modified depth-first search through the graph. For each
signature, PathFinder begins searching from the set of nodes which have the
first label in the signature. From these nodes, it finds a path to any node that
is labeled with the second label in the signature. This is repeated until a path
is found to the last label in the signature. If at any point in the search a
path to a label is not found, that search fails.
Searches are done recursively. At each node, the target of each edge is checked
to see if it is labeled with the next label in the signature. If a match is
made, The search continues from the matched node and the subsequent label in the
signature. If there are no remaining labels in the signature, the search has
succeeded. If no direct edge from the current node matches, then the search
continues recursively for the current label along each edge from the current
node. If no match exists along any edge, the search fails.
Pseudo code:
For each file in config file:
Parse file into a Graph
For each signature in config file:
Store for processing
For each signature
{
For each Graph
{
Make sure the graph contains at least one node labeled with every label
in the signature
Get set of nodes with signature[0], the first label in signature
for each “Start” node
{
// Recursive procedure to find remaining signature. Requires
// current node, the remaining “unfound” signature legs
// a result stack of nodes to store path traversed and
// a stack of nodes visited so far to detect cycles and
// redundant searches
Create empty result stack
Create empty visited stack
success = FindRemainingSignatureFromCurrentNode ( startNode,
&signature[1], result, visited )
} // end of for each “Start” node
} // end of for each Graph
} // end of for each Signature
// Recursive procedure to find remaining signature.
// currentNode – the node whose edges *might* reach the
// next label in the signature
// signature – an array of pointers to C strings. These strings
// are the labels being matched by the nodes being
// searched. Each sequential pair of labels is a
// “leg” of the search. The last pointer is NULL to
// terminate the array.
// result – a stack of nodes containing the path currently
// traversed to reach this node
// visited – a stack of ALL nodes searched during the attempt
// to find the current label
boolean FindRemainingSignatureFromCurrentNode ( currentNode, signature, result,
visited )
{
If currentNode is in visited
return FALSE – we have reached a cycle or a subgraph that has
already been searched
else
push currentNode onto visited stack
push currentNode onto result stack
For each edge from currentNode
{
If edge->targetNode->label == signature[0]
{
If signature[1] != NULL we have more legs in the search
Create nextResult for next leg search
Create nextVisited for next leg search
success = FindRemainingSignatureFromCurrentNode
( edge->targetNode, &signature[1],
nextResult, nextVisited )
if success
result += nextResult
delete nextVisited
return TRUE, we’ve found the path
else
continue through edge checking
else signature[1] == NULL
push edge->targetNode onto result
return true, we’ve found the path
}
else
continue through edge checking
} // end of for each edge target compared against signature [0]
// If we’ve made it this far, none of the edges is a direct
// match to the current label, do a deeper search
For each edge from current node
{
success = FindRemainingSignatureFromCurrentNode ( edge->targetNode,
&signature[0], result, visited )
if success
return TRUE, we’ve found the path
else
continue through edge searching loop
}
// If we’ve made it this far, we have no path
pop currentNode off of result (but not off of visited)
return FALSE;
} // end of find remaining signature