BEANDiscoPrior - Bayesian Exact and Approximate Network Discovery with Prior Information
Copyright 2014 Diane Oyen <doyen(at)unm.cs.unm.edu>
BEANDiscoPrior extends the BEANDisco software (http://www.cs.helsinki.fi/u/tzniinim/BEANDisco/)
for learning Bayesian network structure with prior information about edge features or
ancestry relationships (partial orders given pairwise).
Previous copyright:
Copyright 2011 Teppo Niinimäki <teppo.niinimaki(at)helsinki.fi>
BEANDisco is a software for learning Bayesian network structure from data. See
[1] and [2] for more information about the algorithms.
License
=======
This software is distributed under GNU GPL. See COPYING.
Compilation
===========
The following libraries should be installed:
boost, boost-program-options
Compilation:
make
Usage
=====
Usage:
./beandprior [options]
To see the list of available options:
./beandprior --help
The input data file should contain one data sample per row, each sample
consisting of one integer value for each variable. Values on a row should be
separated by whitespace (tabs or spaces). For an example data file with 8
variables (columns) and 200 samples (rows) see example.dat. A corresponding
example prior file with 8 variables is given in example.prior. The input
prior file should contain one row and one column per variable, in the
same order as the columns of the data file. Each entry should be a real
value between 0 and 1 inclusively, separated by whitespace. The value in
row i, column j is the probability that node i is a parent of node j
(edge priors) or that node i precedes node j in the ordering (order priors).
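The prior file format described above can be produced with a few lines of code. The sketch below (not part of the original distribution) writes a uniform prior of 0.5 for every ordered pair of 8 variables; the filename "uniform.prior" is hypothetical:

```python
# Sketch: write a prior file in the format described above.
# Assumptions (illustrative, not from the README): 8 variables and a
# flat prior of 0.5; entry (i, j) is P(node i is a parent of node j)
# for edge priors, or P(i precedes j) for order priors.
n = 8
rows = []
for i in range(n):
    # One row per variable; whitespace-separated real values in [0, 1].
    rows.append("\t".join("0.5" for j in range(n)))
with open("uniform.prior", "w") as f:
    f.write("\n".join(rows) + "\n")
```

A file written this way has the same shape as example.prior: one row and one column per variable, in the same variable order as the data file's columns.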
Examples
========
Compute estimates including a prior on edge features, with maximum in-degree
of 3, (maximum) bucket size 4, burn-in period of 1000 steps and 100 samples
with 10 steps in between:
./beandprior example.dat -m 3 -b 4 -B 1000 -s 100 -S 10 --prior-file example.prior
First compute scores into a file and then use the precomputed scores to
estimate arc probabilities:
./beandprior example.dat -m 3 --score-file example.score --prior-file example.prior
./beandprior --score-file example.score -b 4 -B 1000 -s 100 -S 10
Apply a prior on ancestor relationships (orderings), with maximum in-degree
of 3, (maximum) bucket size 4, burn-in period of 1000 steps and 100 samples
with 10 steps in between:
./beandprior example.dat -m 3 -b 4 -B 1000 -s 100 -S 10 --prior-file example.prior --order-prior
Using the ordering prior, first compute scores into a file and then use the precomputed scores to
estimate arc probabilities:
./beandprior example.dat -m 3 --score-file example.score
./beandprior --score-file example.score -b 4 -B 1000 -s 100 -S 10 --prior-file example.prior --order-prior
Compute exact probabilities (generally with exact computation it is recommended
to use a bucket size as large as possible with the available memory):
./beandprior example.dat -m 3 -b 4 --exact --prior-file example.prior
Parameters
==========
Some recommendations for important parameter values:
maximum indegree (-m)
If the number of variables is high, this is restricted by memory and time
consumption. For over 100 variables 3 is probably reasonable. On the other
hand, for about 30 variables this might be increased to 5.
order type (--order-type)
Use bucket order (po), which is the default.
bucket size (-b)
A good value depends on the number of variables and the maximum indegree.
Setting this to 10 is probably a reasonable choice. In general higher values
are better, so it is a good idea to test different values and choose the
largest one that still does not increase the time consumption per step too much.
number of samples (-s)
The higher the better.
number of steps per sample (-S)
Something like 5 or 10 is reasonable.
number of burn-in steps (-B)
To ensure good convergence before the actual sampling starts, I would set
this about equal to the number of total steps in sampling stage (B = s * S).
For example: s = 2000, S = 10, B = 20000
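The burn-in rule of thumb above (B = s * S) is plain arithmetic; a minimal check of the README's own example values:

```python
# Burn-in recommendation from the Parameters section: B = s * S.
s = 2000   # number of samples (-s)
S = 10     # steps per sample (-S)
B = s * S  # recommended number of burn-in steps (-B)
print(B)   # 20000, matching the example above
```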
References
==========
[1] T. Niinimäki, P. Parviainen and M. Koivisto. Partial Order MCMC for
Structure Discovery in Bayesian Networks. UAI 2011
[2] P. Parviainen and M. Koivisto. Bayesian Structure Discovery in Bayesian
Networks with Less Space. AISTATS 2010