NGILA VERSION 1.3 - Logarithmic and Affine Sequence Alignments
Copyright (C) 2005-2010 Reed A. Cartwright - All rights reserved.
DESCRIPTION
Ngila is a global alignment program that can align pairs of sequences using
logarithmic and affine gap penalties.
REFERENCE
Cartwright RA (2007) Ngila: global pairwise alignments with logarithmic and
affine gap costs. Bioinformatics. 23(11):1427-1428
CONTACT
LICENSE
GPL ver 3. See copying.txt.
INSTALLATION
Installation from source requires CMake (http://cmake.org/) and the Boost
Libraries (http://boost.org/). Binary packages are also available. To install
on Unix-like systems, run
cmake . && make && make install
in the extracted source code directory. On Windows, you can use the CMake GUI
to create project files for Visual Studio and install from there. More
detailed directions and help can be found on the website.
DOWNLOAD
Ngila can be downloaded from its development website,
<http://scit.us/projects/ngila/>.
COMMAND LINE USAGE
ngila -m zeta -t 0.1 -k 2.0 -r 0.05 -z 1.65 sequences.fasta
See 'ngila --help' for complete command line usage.
See <http://scit.us/projects/ngila/> for more details on running ngila.
MODEL DESCRIPTIONS
Ngila includes alignment models that are based on evolutionary models. For a
basic description of the evolutionary models, see Cartwright RA (2009)
"Problems and solutions for estimating indel rates and length distributions."
Molecular Biology and Evolution, 26:473-480. The models are as follows:
zeta: DNA model with indel lengths following a power-law distribution
geo: DNA model with indel lengths following a geometric distribution
aazeta: protein model (LG 2008) with indel lengths following a power-law distribution
aageo: protein model with indel lengths following a geometric distribution
cost: specify substitution and gap costs explicitly
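As a rough illustration of why the two length distributions matter, the sketch
below takes the negative log-probability of an indel of length k under each
one. It is not Ngila's code: the function names and the parameters z and q are
assumptions made for this example, and the power law is left unnormalized. The
power law yields a cost that grows with ln k, while the geometric distribution
yields a cost that is linear in k, which lines up with the logarithmic and
affine gap penalties named in the description.

```python
import math

def power_law_cost(k, z):
    """Negative log of the unnormalized power-law weight k**-z (hypothetical helper)."""
    return z * math.log(k)

def geometric_cost(k, q):
    """Negative log of the geometric probability (1 - q) * q**(k - 1) (hypothetical helper)."""
    return -math.log(1.0 - q) - (k - 1) * math.log(q)

if __name__ == "__main__":
    # Illustrative parameter values only; they are not Ngila defaults.
    for k in (1, 2, 4, 8, 16):
        print(k, round(power_law_cost(k, 1.65), 3), round(geometric_cost(k, 0.5), 3))
```

Doubling k adds a constant to the power-law cost, whereas each extra position
adds a constant to the geometric cost.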
INPUT FILES
The input file must be in FASTA, PHYLIP, or CLUSTAL format. If more than two
sequences are given, Ngila will align them according to the 'pairs' option.
OUTPUT FILES
Ngila has two types of output: sequence alignments and distance matrices.
Supported sequence alignment formats are Clustal, Fasta, and Phylip. Clustal
is the default. The format is read from the output file's extension or
specified directly; "ngila -o seqs.fas" and "ngila -o fas:seqs.txt" both
produce fasta output. "ngila -o aln:-" sends Clustal-formatted sequences to
stdout. The following extensions are supported for distance matrices:
dist-c: likelihood-based cost scores
dist-i: sequence identities
dist-d: sequence distances
dist: lower triangle as in dist-d, upper triangle as in dist-i
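The format rule described above can be pictured with a small parser. This is a
hypothetical sketch, not Ngila's implementation: the function name
split_output_spec and the way a format prefix is told apart from a path are
assumptions, and only the examples quoted in the text are exercised.

```python
import os

def split_output_spec(spec):
    """Split an output spec into (format_tag, path).

    A 'tag:path' spec names the format directly; otherwise the tag is
    taken from the file extension. An empty tag means no format was
    given, in which case a caller would fall back to the default.
    """
    tag, sep, path = spec.partition(":")
    if sep and tag and path:
        return tag.lower(), path
    ext = os.path.splitext(spec)[1].lstrip(".")
    return ext.lower(), spec

if __name__ == "__main__":
    for spec in ("seqs.fas", "fas:seqs.txt", "aln:-"):
        print(spec, "->", split_output_spec(spec))
```

In this sketch both "seqs.fas" and "fas:seqs.txt" resolve to the tag "fas", and
"aln:-" resolves to the tag "aln" with "-" standing for stdout.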
SUBSTITUTION MATRIX
Used by the "cost" model. An example of the format can be seen in matrix/dna.
NGILARC
Command-line options can be specified using an ngilarc file. By default the
program looks for $HOME/.ngilarc (Unix) or %HOME%/ngilarc.txt (Windows). This
file can contain long-form command-line options, as in the ngilarc.txt
example file.
ALGORITHM
Ngila implements the candidate-list method of Miller and Myers (1988) for
sequence alignment, with gap costs of the form g(x) = a + b*x + c*ln(x), where
x is the gap length. Ngila returns the alignment with the minimum cost and has
rules for breaking ties. Ngila's main alignment algorithm is divide-and-conquer
(DnC), which requires O(M) memory but is slower than a holistic algorithm that
requires O(MN) memory.
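To make the gap cost form concrete, here is a minimal sketch that evaluates
g(x) for a few gap lengths. The function name and the coefficient values are
made up for illustration and are not Ngila defaults.

```python
import math

def gap_cost(x, a, b, c):
    """Cost of a gap of length x >= 1: g(x) = a + b*x + c*ln(x)."""
    if x < 1:
        raise ValueError("gap length must be at least 1")
    return a + b * x + c * math.log(x)

# With c = 0 the cost is purely affine; with b = 0 it is purely logarithmic.
affine = [gap_cost(x, 2.0, 0.5, 0.0) for x in (1, 2, 3)]
logarithmic = [gap_cost(x, 2.0, 0.0, 1.0) for x in (1, 2, 3)]

if __name__ == "__main__":
    print("affine:", affine)
    print("logarithmic:", logarithmic)
```

Under the affine setting every extra gap position costs the same amount, while
under the logarithmic setting each extra position costs less than the one
before it.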
Ngila also implements a secondary, holistic alignment algorithm, which is
faster. The options -M and -N (-M is for the larger sequence) let users
specify the size thresholds at or below which the holistic algorithm is used
instead of the DnC algorithm. For example, the command
'ngila -M 5000 -N 5000 seqs.aln' will align the sequences in 'seqs.aln' via the
divide-and-conquer algorithm, but whenever the subsequences being aligned are
no larger than 5000 by 5000, the holistic algorithm will be used.
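The threshold rule can be sketched as a simple predicate. This illustrates the
description above rather than Ngila's actual code: the function name is
hypothetical, and the reading that -M bounds the longer subsequence and -N the
shorter one is an assumption based on the wording of the text.

```python
def use_holistic(len_a, len_b, m_threshold, n_threshold):
    """Return True when a subproblem is small enough for the holistic algorithm.

    m_threshold plays the role of -M (longer subsequence) and
    n_threshold the role of -N (shorter subsequence).
    """
    longer, shorter = max(len_a, len_b), min(len_a, len_b)
    return longer <= m_threshold and shorter <= n_threshold

if __name__ == "__main__":
    print(use_holistic(5000, 5000, 5000, 5000))
    print(use_holistic(5001, 10, 5000, 5000))
```

A divide-and-conquer driver would call such a check on each subproblem and
switch to the holistic algorithm as soon as it returns True.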