-
Notifications
You must be signed in to change notification settings - Fork 0
/
README.txt
118 lines (108 loc) · 6.65 KB
/
README.txt
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
===============
= MotifGP 0.2 =
===============
MotifGP is a de novo motif discovery tool for discriminatory network expression identification in ChIP-seq datasets.
Original author: Manuel Belmadani
The project is documented by the following publications.
Manuel Belmadani and Marcel Turcotte. MotifGP: Using multi-objective evolutionary computing for mining network expressions
in DNA sequences. In IEEE International Conference on Computational Intelligence in Bioinformatics and Computational Biology
(CIBCB 2016), Chiang Mai, Thailand, October, 5-7, 2016.
https://doi.org/10.1109/CIBCB.2016.7758133
Manuel Belmadani. MotifGP: DNA motif discovery using multiobjective evolution. Master of computer science,
University of Ottawa, School of Electrical Engineering and Computer Science, 2016.
Available from University of Ottawa Research under: http://www.ruor.uottawa.ca/handle/10393/34213
Acknowledgements:
MotifGP is using source code from these tools:
-hypergeometric.py from the MEME Suite (License and copyright in source file).
-altschulEriksonDinuclShuffle.py from Peter Clote - CLOTE Computational Biology LAB, http://clavius.bc.edu/~clotelab/RNAdinucleotideShuffle/
This software was also made using the DEAP - Fortin, F.-A., De Rainville, F.-M., Gardner, M.-A. G.,
Parizeau, M. & Gagné, C. DEAP: Evolutionary Algorithms Made Easy. J. Mach. Learn. Res. 13,
2171–2175 (2012).
=======================================================================================
License: (see LICENSE.txt)
=======================================================================================
Installation: (see INSTALL.txt)
=======================================================================================
Examples: (see EXAMPLES.txt)
=======================================================================================
Usage: motifgp.py [options]
Options:
-h, --help show this help message and exit
-p TRAINING_PATH, --training=TRAINING_PATH
Fasta file to use for training (input) sequence data
-b BACKGROUND_PATH, --background=BACKGROUND_PATH
[Optional] Fasta file to use for background (control)
sequence data. If not provided, a the generated
control sequences will be written to runtime_tmp/
-m MOO, --moo=MOO Multi-objective optimization [SPEA2, NSGA2, NSGAR,
MOEAD]. NSGAR is the NSGA-II_R (NSGA-II Revised)
algorithm improvement of NSGA2.
-f FITNESS, --fitness=FITNESS
Objective fitness function. Available objectives: D=Di
scrimination,F=Fisher,I=ScipyFisher,O=OddsRatio,Q=Fals
eDiscoveryRate,S=Support,R=ScipyOddsRatio. Each single
character in the string represents an objective.
Objectives are mapped by the configuration file at
config/objectives. Default is 'DF' for
[Discrimination,Fisher] (2-objectives).
--cxpb=CXPB Probability [0.0 to 1.0] for a crossover during
variation. Requires --mutpb to be set to (1.0-cxpb).
Default is 0.7.
--mutpb=MUTPB Probability [0.0 to 1.0] for a mutation during
variation. Requires --cxpb to be set to (1.0-mutpb).
Default is 0.3.
--short=SHORT Stops reading in after <SHORT> input sequences.
--popsize=POPSIZE Size of the population.
--revcomp Compile regex with reverse complement
--random-seed=RANDOM_SEED
Random seed value to set for execution
-n NGEN, --num-gen=NGEN
Generation where runtime stops (even in the case of
resumed checkpoints)
--timelimit=TIMELIMIT
Time limit on the GP loop execution.
--matcher=MATCHER Use a different matcher. Options: 'grep', 'python'.
'grep' is faster on large datasets, while 'python' is
a pure python version in case the system doesn't
support grep.
-o OUTPUT_PATH, --output=OUTPUT_PATH
Output directory. Default is ./OUT/
-t TAG, --tag=TAG A tag for the output subdirectory. Use to describes
the run and saves it in the tag's subdirectory in the
output directory. default is 'default'.
-i, --inspector Don't print any files. Can be useful with python -i
(interactive mode).
--hardmask Replace tandem repeats (lower-case typed nucleotides)
by N
-g GRAMMAR, --grammar=GRAMMAR
Grammar for the STGP [min, iupac, full, ne]. Default
is iupac. 'min' only uses nucleotides. 'iupac' is a
network expression grammar. 'full' is a network
expression grammar with additional regular expression
tokens. 'ne' is like iupac, but built with string
primitives instead of booleans.
-e ERASE, --erase=ERASE
Input .nef(t) file to delete from the dataset prior to
execution. Used for sequential coverage.
--backpad Pads background sequences with consecutive nucleotides
(ie. AAAAAAAA,CCCCCCCC,GGGGGGGG,TTTTTTTT) of length 8
every set of 4 sequences.
--bg-algo=BG_ALGO Shuffling algorithm for background. Default is
'dinuclShuffle', if no background dataset it provided.
Currently, dinuclShuffle is the only implemented
method.
--ncpu=NCPU Number of CPUs to use when mapping evaluation of
solutions. Use an integer, "auto" to automatically
dertmine the maximum number. Default is no
parallelism.
--termination=TERMINATION
Use automatic termination algorithm. User 'auto' to
used the automatic termination algorithm for MOEAs.
--hamming [Experimental] Generates statistics on the hamming
distance from a template regex and hof candidates.
--seeded-population [Experimental] Use population seeds
-c CHECKPOINT_PATH, --checkpoint=CHECKPOINT_PATH
[Temporarily disabled] Load a checkpoint at path.
-q, --quiet [Unimplemented] don't print status messages to stdout
Also consider looking at EXAMPLES.txt for basic examples of MotifGP usage.