forked from amarallab/NullSeq
#############################
#                           #
#          NullSeq          #
#                           #
#############################
Parameters to run nullseq:

$ python nullseq.py [-h] [-m] [-n number] [-l length] [--TT TransTable] [--seq SEQ] [--AA AA] [--GC GC] [-o O]

  -m                 Use exact primary AA sequence from input file
  -n number          Number of random sequences generated          < Default: 1 >
  -l length          Length of random sequence (# of AAs)          < Optional >
                       * Must be specified if running script with
                         --AA csv input.
                       * Otherwise uses length of input sequence
  --TT TransTable    NCBI translation table                        < Default: 11 >
  --seq SEQ          Path to FASTA file with nucleotide sequences  < Optional >
                       * Either SEQ or AA must be specified
                       * Sequences must include Start and Stop codons
  --AA AA            Path to FASTA file with AA sequence or        < Optional >
                     csv with AA usage probabilities
                       * Either SEQ or AA must be specified
  --GC GC            GC content of random sequence                 < Optional >
                       * Must be specified if running script using
                         --AA input.
                       * Otherwise uses GC of input nucleotide sequence
                       * Given as a percentage in the range [0,100]
  -o O               Output file path                              < Default: NullSeq_Output.fasta >
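
The option rules above can be sketched as an argparse parser plus a consistency check. This is an illustration only, not the actual nullseq.py source: the function names build_parser and check_args are hypothetical, and recognizing a csv input by its ".csv" extension is an assumption.

```python
import argparse


def build_parser():
    """Parser mirroring the documented NullSeq options (illustrative only)."""
    p = argparse.ArgumentParser(prog="nullseq.py")
    p.add_argument("-m", action="store_true",
                   help="use exact primary AA sequence from input file")
    p.add_argument("-n", type=int, default=1, metavar="number")
    p.add_argument("-l", type=int, default=None, metavar="length")
    p.add_argument("--TT", type=int, default=11, metavar="TransTable")
    p.add_argument("--seq", default=None, metavar="SEQ")
    p.add_argument("--AA", default=None, metavar="AA")
    p.add_argument("--GC", type=float, default=None, metavar="GC")
    p.add_argument("-o", default="NullSeq_Output.fasta", metavar="O")
    return p


def check_args(args):
    """Return a list of rule violations; an empty list means the options agree."""
    errors = []
    if args.seq is None and args.AA is None:
        errors.append("either --seq or --AA must be specified")
    if args.AA is not None and args.GC is None:
        errors.append("--GC must be specified with --AA input")
    # Assumption: a csv of usage probabilities is identified by its extension.
    if args.AA is not None and args.AA.lower().endswith(".csv") and args.l is None:
        errors.append("-l must be specified with --AA csv input")
    if args.GC is not None and not 0 <= args.GC <= 100:
        errors.append("--GC must lie in [0,100]")
    return errors
```

Calling check_args(build_parser().parse_args()) before any work starts would report every violated rule at once, rather than stopping at the first.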
Examples:
1. Generating Random Sequences From a Known Nucleotide Sequence:
$ python nullseq.py -n 10 --seq test_Nseq.fasta
--GC and -l can be used to specify the GC content and
length of the random sequences. In the command below,
the GC content is targeted at 50% and the random
sequences are 500 amino acids in length:
$ python nullseq.py -n 10 --seq test_Nseq.fasta --GC 50 -l 500
-m indicates the use of the exact primary amino acid sequence
of the input sequence
$ python nullseq.py -m -n 10 --seq test_Nseq.fasta --GC 50
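
To check how closely the generated sequences match the --GC target, the GC percentage of each output record can be computed directly. The sketch below assumes the output file is a FASTA file of nucleotide sequences; read_fasta and gc_percent are hypothetical helper names and are not part of NullSeq.

```python
def read_fasta(path):
    """Yield (header, sequence) pairs from a FASTA file."""
    header, chunks = None, []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                # A new record starts: emit the previous one, if any.
                if header is not None:
                    yield header, "".join(chunks)
                header, chunks = line[1:], []
            elif line:
                chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def gc_percent(seq):
    """GC content of a nucleotide sequence as a percentage in [0, 100]."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)
```

For example, looping over read_fasta("NullSeq_Output.fasta") and printing gc_percent(seq) for each record gives values to compare against the requested target.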
2. Generating Random Sequences from AA Usage Probabilities:
* --GC and -l must be specified
$ python nullseq.py -n 10 --AA AAUsage.csv --GC 50 -l 500
Generates random sequence according to primary amino acid
usage probabilities in AAUsage.csv
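
Conceptually, "according to amino acid usage probabilities" means each position is drawn from a weighted distribution over amino acids. The sketch below shows only that idea, on a made-up three-letter alphabet. It ignores the GC target entirely, is not NullSeq's algorithm, and does not assume any particular layout for AAUsage.csv; sample_aa_sequence and toy_usage are hypothetical names.

```python
import random


def sample_aa_sequence(usage, length, rng=None):
    """Draw `length` amino acids independently according to `usage` weights."""
    rng = rng or random.Random()
    residues = list(usage)
    weights = [usage[aa] for aa in residues]
    return "".join(rng.choices(residues, weights=weights, k=length))


# Hypothetical usage probabilities for a toy alphabet (not real data).
toy_usage = {"A": 0.5, "L": 0.3, "K": 0.2}
print(sample_aa_sequence(toy_usage, 10, random.Random(0)))
```

Passing a seeded random.Random makes a draw reproducible, which helps when comparing runs.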
3. Generating Random Sequences from Primary AA Sequences:
* --GC must be specified
$ python nullseq.py -n 10 --AA test_AAseq.fasta --GC 50
-l can be used to specify length of random sequence
$ python nullseq.py -n 10 --AA test_AAseq.fasta --GC 50 -l 500
-m indicates the use of the exact primary amino acid sequence
of the input sequence
$ python nullseq.py -m -n 10 --AA test_AAseq.fasta --GC 50
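
When the same protein needs null sequences at several GC targets, the command above can be scripted. This is a minimal sketch using only the options documented in this README. It assumes nullseq.py is in the current working directory; build_command, run_gc_sweep and the output file naming scheme are hypothetical.

```python
import subprocess
import sys


def build_command(aa_fasta, gc, n=10, exact=False, out=None):
    """Assemble a nullseq.py command line from the documented options."""
    cmd = [sys.executable, "nullseq.py", "-n", str(n),
           "--AA", aa_fasta, "--GC", str(gc)]
    if exact:
        cmd.insert(2, "-m")  # keep the exact primary AA sequence
    if out is not None:
        cmd += ["-o", out]
    return cmd


def run_gc_sweep(aa_fasta, gc_values):
    """Run nullseq.py once per GC target, writing one output file each."""
    for gc in gc_values:
        out = f"NullSeq_GC{gc}.fasta"  # hypothetical naming scheme
        subprocess.run(build_command(aa_fasta, gc, exact=True, out=out),
                       check=True)
```

For instance, run_gc_sweep("test_AAseq.fasta", [30, 50, 70]) would produce three files, one per GC target, each holding 10 sequences.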
Please Cite:
Liu SS, Hockenberry AJ, Lancichinetti A, Jewett MC, Amaral LAN (2016) NullSeq: A Tool for Generating Random Coding Sequences with Desired Amino Acid and GC Contents. PLoS Comput Biol 12(11): e1005184. doi:10.1371/journal.pcbi.1005184