<?xml version="1.0" encoding="UTF-8"?>
<modules>
<module>
<name>searchGUI</name>
<category>search</category>
<description>description1</description>
<inputFile>input.xml</inputFile>
<inputParam>true</inputParam>
<outputFile_required>false</outputFile_required>
<outputFile>N/A</outputFile>
<outputParam>true</outputParam>
<params>-spectrum_files xyz.mgf -output_folder folder_path -id_params params.par</params>
<command>java -cp SearchGUI-3.3.3.jar eu.isas.searchgui.cmd.SearchCLI</command>
</module>
<module>
<name>ProteoGrouper</name>
<category>Grouper</category>
<description>This tool can perform sequence-based protein inference, based on a set of PSMs. It should be parameterized with the CV accession for the PSM score used to create a protein score. The tool also needs to know whether the score should be log transformed (true for e/p-values etc) to create a positive protein score.</description>
<inputFile>[input].mzid or [input].mzid.gz</inputFile>
<inputParam>false</inputParam>
<outputFile_required>true</outputFile_required>
<outputFile>output2.txt</outputFile>
<outputParam>false</outputParam>
<params>-requireSIIsToPassThreshold true -verboseOutput false -cvAccForSIIScore \"MS:1001171\" -logTransScore false -version1_1 true -compress true</params>
<command>java -jar "mzidlib-1.7.jar" ProteoGrouper mydata_fdr_threshold.mzid.gz mydata_fdr_threshold_groups.mzid.gz</command>
</module>
<module>
<name>Omssa2mzid</name>
<category>MZID</category>
<description>This tool converts OMSSA omx (XML) files into mzid. It has optional parameters for inserting fragment ions into mzid (which produces much larger files). If a decoy regex is specified, the mzid attribute isDecoy will be set correctly for peptides. No protein inference is done by this tool (no protein list is produced). To make valid mzid output, OMSSA must have been run with the option "-w" (include spectra and search params in search results). Without this option, search parameters cannot be extracted from the OMSSA output; in that case, the OMSSA CSV converter should be used.</description>
<inputFile>[input].omx or [input].omx.gz</inputFile>
<inputParam>false</inputParam>
<outputFile_required>true</outputFile_required>
<outputFile>[output].mzid or [output].mzid.gz</outputFile>
<outputParam>false</outputParam>
<params>-outputFragmentation false -decoyRegex REVERSED -mzidVer 1.2 -compress false</params>
<command>java -jar "mzidlib-1.7.jar" Omssa2mzid mydata.omx mydata_omssa.mzid.gz</command>
</module>
<module>
<name>Tandem2mzid</name>
<category>MZID</category>
<description>This tool converts X!Tandem XML results files into mzid. There are several optional parameters: whether to export fragment ions (makes bigger files), and a decoy regular expression used to set the isDecoy attribute in mzid. Valid mzid files require several pieces of metadata that are difficult to extract from X!Tandem files: the format of the database searched and the file format of the input spectra. If these parameters are not set, the converter attempts to guess them from the file extension. In X!Tandem, the numbering of spectra depends on the input spectra type: IDs start at zero for mzML files and at one for other spectra types, e.g. MGF. The idsStartAtZero command line parameter should be set to make sure that the mzid file references the correct spectrum in the source spectrum file.</description>
<inputFile>[input].xml or [input].xml.gz</inputFile>
<inputParam>false</inputParam>
<outputFile_required>true</outputFile_required>
<outputFile>[output].mzid or [output].mzid.gz</outputFile>
<outputParam>false</outputParam>
<params>-outputFragmentation (true|false) -decoyRegex decoyRegex -databaseFileFormatID (e.g. MS:1001348 is FASTA format) "MS:100XXX" -massSpecFileFormatID (e.g. MS:1001062 is MGF) "MS:100XXX" -idsStartAtZero (true for mzML searched, false otherwise) true|false -compress true|false</params>
<command>java -jar "mzidlib-1.7.jar" Tandem2mzid mydata.xml mydata_tandem.mzid.gz</command>
</module>
<module>
<name>FalseDiscoveryRateGlobal</name>
<category>n/a</category>
<description>The Global FDR module calculates the FDR at one of three levels: 1) PSM, 2) Peptide, 3) ProteinGroup. If ProteinGroup is chosen, there are two options for the protein level: PAG or PDH.</description>
<inputFile>[input].mzid or [input].mzid.gz</inputFile>
<inputParam>false</inputParam>
<outputFile_required>true</outputFile_required>
<outputFile>[output].mzid or [output].mzid.gz</outputFile>
<outputParam>false</outputParam>
<params>-decoyValue decoyToTargetRatio -decoyRegex decoyRegex -cvTerm cvTerm -betterScoresAreLower true|false -fdrLevel fdrLevel -proteinLevel proteinLevel [-compress true|false]</params>
<command>java -jar "mzidlib-1.7.jar" FalseDiscoveryRateGlobal mydata.mzid mydata_fdr.mzid.gz</command>
</module>
<module>
<name>Threshold</name>
<category>n/a</category>
<description>This tool can be used to set the passThreshold parameter for PSMs or proteins in an mzid file, to indicate high-quality identifications that will be used by another tool. It can handle any type of score (sourced from the PSI-MS CV) and scores can be ordered low to high or vice versa. If deleteUnderThreshold is specified, PSMs and referenced proteins under the threshold will be removed from the file.</description>
<inputFile>[input].mzid or [input].mzid.gz</inputFile>
<inputParam>false</inputParam>
<outputFile_required>true</outputFile_required>
<outputFile>[output].mzid or [output].mzid.gz</outputFile>
<outputParam>false</outputParam>
<params>-isPSMThreshold true|false -cvAccessionForScoreThreshold "MS:100XXX" -threshValue doubleValue -betterScoresAreLower true|false -deleteUnderThreshold true|false [-compress true|false]</params>
<command>java -jar "mzidlib-1.7.jar" Threshold mydata_fdr.mzid.gz mydata_fdr_threshold.mzid.gz</command>
</module>
<module>
<name>Mzid2Csv</name>
<category>n/a</category>
<description>This tool exports the contents of an mzid file to CSV, according to the export type given in the exportType parameter.</description>
<inputFile>[input].mzid or [input].mzid.gz</inputFile>
<inputParam>false</inputParam>
<outputFile_required>true</outputFile_required>
<outputFile>[output].csv or [output].csv.gz</outputFile>
<outputParam>false</outputParam>
<params>-exportType exportProteinGroups|exportPSMs|exportProteinsOnly|exportRepProteinPerPAGOnly|exportProteoAnnotator [-verboseOutput true|false] [-compress true|false]</params>
<command>java -jar "mzidlib-1.7.jar" Mzid2Csv mydata_fdr.mzid.gz mydata.csv</command>
</module>
<module>
<name>AddRetentionTimeToMzid</name>
<category>n/a</category>
<description>This tool adds retention times to an mzid file.</description>
<inputFile>[input].mzid or [input].mzid.gz</inputFile>
<inputParam>false</inputParam>
<outputFile_required>true</outputFile_required>
<outputFile>[output].mzid or [output].mzid.gz</outputFile>
<outputParam>false</outputParam>
<params>-compress true|false</params>
<command>java -jar "mzidlib-1.7.jar" AddRetentionTimeToMzid input.mzid output.mzid</command>
</module>
<module>
<name>msconvert</name>
<category>n/a</category>
<description>msconvert is a command line tool for converting between various file formats. Full documentation for this tool can be found at: http://proteowizard.sourceforge.net/tools/msconvert.html</description>
<inputFile>data.RAW</inputFile>
<inputParam>false</inputParam>
<outputFile_required>false</outputFile_required>
<outputFile>N/A</outputFile>
<outputParam>false</outputParam>
<params>-f [ --filelist ] arg : specify text file containing filenames
-o [ --outdir ] arg (=.) : set output directory ('-' for stdout) [.]
-c [ --config ] arg : configuration file (optionName=value)
--outfile arg : Override the name of output file.
-e [ --ext ] arg : set extension for output files [mzML|mzXML|mgf|txt|mz5]
--mzML : write mzML format [default]
--mzXML : write mzXML format
--mz5 : write mz5 format
--mgf : write Mascot generic format
--text : write ProteoWizard internal text format
--ms1 : write MS1 format
--cms1 : write CMS1 format
--ms2 : write MS2 format
--cms2 : write CMS2 format
-v [ --verbose ] : display detailed progress information
--64 : set default binary encoding to 64-bit precision [default]
--32 : set default binary encoding to 32-bit precision
--mz64 : encode m/z values in 64-bit precision [default]
--mz32 : encode m/z values in 32-bit precision
--inten64 : encode intensity values in 64-bit precision
--inten32 : encode intensity values in 32-bit precision [default]
--noindex : do not write index
-i [ --contactInfo ] arg : filename for contact info
-z [ --zlib ] : use zlib compression for binary data
--numpressLinear [toler] : use numpress linear prediction lossy compression for binary mz and rt data (relative error guaranteed less than given tolerance, default is 2e-009)
--numpressPic : use numpress positive integer lossy compression for binary intensities (maximum 0.5 absolute error guaranteed)
--numpressSlof [toler] : use numpress short logged float lossy compression for binary intensities (relative error guaranteed less than given tolerance, default is 0.0002)
-n [ --numpressAll] : same as --numpressLinear --numpressSlof (see https://github.com/fickludd/ms-numpress for more info)
--numpressLinearAbsTol : desired absolute tolerance for linear numpress prediction (e.g. use 1e-4 for a mass accuracy of 0.2 ppm at 500 m/z, default uses -1.0 for maximal accuracy). Note: setting this value may substantially reduce file size, this overrides relative accuracy tolerance.
Numpress may be used at the same time as zlib (-z) for best compression, though some older mzML parsers may not handle this properly.
-g [ --gzip ] : gzip entire output file (adds .gz to filename)
--filter arg : add a spectrum list filter
--merge : create a single output file from multiple input files by merging file-level metadata and concatenating spectrum lists
--simAsSpectra : write selected ion monitoring as spectra, not chromatograms
--srmAsSpectra : write selected reaction monitoring as spectra, not chromatograms
--combineIonMobilitySpectra : write all drift bins/scans in a frame/block as one spectrum instead of individual spectra
--acceptZeroLengthSpectra : some vendor readers have an efficient way of filtering out empty spectra, but it takes more time to open the file
--ignoreUnknownInstrumentError : if true, if an instrument cannot be determined from a vendor file, it will not be an error
--help : show this message, with extra detail on filter options</params>
<command>msconvert.exe data.raw</command>
</module>
<module>
<name>msaccess</name>
<category>N/A</category>
<description>msaccess is a command line tool for extracting data and metadata from data files. Full documentation for this tool can be found at: http://proteowizard.sourceforge.net/tools/msaccess.html</description>
<inputFile>data.mzML</inputFile>
<inputParam>false</inputParam>
<outputFile_required>false</outputFile_required>
<outputFile>false</outputFile>
<outputParam>true</outputParam>
<params>-f [ --filelist ] arg : text file containing filenames to process
-o [ --outdir ] arg (=.) : output directory
-c [ --config ] arg : configuration file (containing settings as optionName=value)
-x [ --exec ] arg : execute command, e.g. --exec "tic mz=409-412"
--filter arg : add a spectrum list filter, e.g. --filter="msLevel [2,3]"
(see the tool documentation for a full list of supported filter types)
-v [ --verbose ] : print progress messages</params>
<command>msaccess.exe data.mzML</command>
</module>
<module>
<name>idconvert</name>
<category>N/A</category>
<description>idconvert is a command line tool for converting between various file formats. Read: pepXML, protXML, mzIdentML. Write: pepXML, mzIdentML. Full documentation for this tool can be found at: http://proteowizard.sourceforge.net/tools/idconvert.html</description>
<inputFile>data.pepXML, data.protXML or data.mzIdentML</inputFile>
<inputParam>false</inputParam>
<outputFile_required>false</outputFile_required>
<outputFile>data.pepXML, data.protXML, data.mzIdentML</outputFile>
<outputParam>false</outputParam>
<params>--pepXML -o my_output_dir</params>
<command>idconvert data.pepXML</command>
</module>
<module>
<name>mspicture</name>
<category>n/a</category>
<description>msPicture is a tool that produces pseudo-2D gels from mass spectrometry data. There are many options available for manipulating the layout, color scheme, and markup of the resulting image. Being part of the ProteoWizard suite, msPicture can read a wide variety of MS data formats. Peptide locations can be marked by giving the location of a pepXML file, an msInspect output file, or a flat file.</description>
<inputFile>data.mzML</inputFile>
<inputParam>false</inputParam>
<outputFile_required>false</outputFile_required>
<outputFile>example2.mzXML.image</outputFile>
<outputParam>false</outputParam>
<params>-o [ --outdir ] arg (=.) : output directory
-c [ --config ] arg : configuration file (optionName=value) (ignored)
-l [ --label ] arg : set filename label to xxx
--mzLow arg : set low m/z cutoff
--mzHigh arg : set high m/z cutoff
--timeScale arg : set scale of time axis
-b [ --binCount ] arg : set histogram bin count
-t [ --time ] : render linearly to time
-s [ --scan ] : render linearly to scans
-z [ --zRadius ] arg : set intensity function z-score radius [=2]
--bry : use blue-red-yellow gradient
--grey : use grey-scale gradient
--binSum : sum intensity in bins [default = max intensity]
-m [ --ms2locs ] : indicate masses selected for ms2
--shape arg : shape of the pseudo2d gel markup [circle(default)|square].
-p [ --pepxml ] arg : pepxml file location
-i [ --msi ] arg : msInspect output file location
-f [ --flat ] arg : peptide file location (nativeID rt mz score seq)
-w [--width] arg : set image width in pixels [default is calculated]
-h [--height] arg : set image height in pixels [default is calculated]
-v [ --verbose ] : prints extra information.
-h [ --help ] : print this helpful message.
Commands:
label=xxxx (set filename label to xxxx)
mzLow=N (set low m/z cutoff)
mzHigh=N (set high m/z cutoff)
timeScale=N (set scaling factor for time axis)
binCount=N (set histogram bin count)
zRadius=N (set intensity function z-score radius [=2])
scan (render y-axis linear with scans)
time (render y-axis linear with time)
bry (use blue-red-yellow gradient)
grey (use grey-scale gradient)
binSum (sum intensity in bins [default = max intensity])
ms2locs (indicate masses selected for ms2)
pepxml=xxx (set ms2 id's from pepxml file xxx)
msi=xxx (set ms2 id's from msinspect output file xxx)
flat=xxx (set ms2 id's from tab delim file xxx)</params>
<command>mspicture.exe filename.mzML</command>
</module>
<module>
<name>qtofpeakpicker</name>
<category>n/a</category>
<description>qtofpeakpicker is a command line tool for peak detection in TOF (time-of-flight) spectra. Full documentation for this tool can be found at: http://proteowizard.sourceforge.net/tools/qtofpeakpicker.html</description>
<inputFile>All ProteoWizard formats are supported.</inputFile>
<inputParam>true</inputParam>
<outputFile_required>true</outputFile_required>
<outputFile>true</outputFile>
<outputParam>true</outputParam>
<params>File Handling:
-H [ --help ] : produce help message
-V [ --version ] : produces version information
-I [ --in ] arg : input file
-O [ --out ] arg : output file
-C [ --config-file ] arg : configuration file
Processing Options:
--resolution arg (=20000) : instrument resolution.
--area arg (=1) : default area, otherwise store intensity (0).
--threshold arg (=10) : removes peaks less than threshold times smallest intensity in spectrum
--numberofpeaks arg (=0) : maximum number of peaks per spectrum (0 = no limit)
Advanced Processing Options:
-i [ --widthint ] arg (=2) : peak apex +- integration width
--smoothwidth arg (=1) : smoothing width</params>
<command>qtofpeakpicker.exe</command>
</module>
<module>
<name>blastn</name>
<category>search</category>
<description>The blastn application searches a nucleotide query against nucleotide subject sequences or a nucleotide database. An option of type "flag" takes no arguments, but if present the argument is true.
Four different tasks are supported:
1.) "megablast", for very similar sequences (e.g., sequencing errors),
2.) "dc-megablast", typically used for inter-species comparisons,
3.) "blastn", the traditional program used for inter-species comparisons,
4.) "blastn-short", optimized for sequences less than 30 nucleotides.
Full documentation for this tool can be found on the following page: https://www.ncbi.nlm.nih.gov/books/NBK279684/#appendices.Options_for_the_commandline_a
</description>
<inputFile>[file].fasta</inputFile>
<inputParam>true</inputParam>
<outputFile_required>true</outputFile_required>
<outputFile>[results].out</outputFile>
<outputParam>true</outputParam>
<params>Parameters common to all BLAST+ search modules:
Option Type Default value Description/notes
db string none BLAST database name.
query string stdin Query file name.
query_loc string none Location on the query sequence (Format: start-stop)
out string stdout Output file name
evalue real 10.0 Expect value (E) for saving hits
subject string none File with subject sequence(s) to search.
subject_loc string none Location on the subject sequence (Format: start-stop).
show_gis flag N/A Show NCBI GIs in report.
num_descriptions integer 500 Show one-line descriptions for this number of database sequences.
num_alignments integer 250 Show alignments for this number of database sequences.
max_target_seqs integer 500 Number of aligned sequences to keep. Use with report formats that do not have separate definition line and alignment sections such as tabular (all outfmt > 4). Not compatible with num_descriptions or num_alignments.
max_hsps integer none Maximum number of HSPs (alignments) to keep for any single query-subject pair. The HSPs shown will be the best as judged by expect value. This number should be an integer that is one or greater. If this option is not set, BLAST shows all HSPs meeting the expect value criteria. Setting it to one will show only the best HSP for every query-subject pair.
html flag N/A Produce HTML output
gilist string none Restrict search of database to GIs listed in this file. Local searches only.
negative_gilist string none Restrict search of database to everything except the GIs listed in this file. Local searches only.
entrez_query string none Restrict search with the given Entrez query. Remote searches only.
culling_limit integer none Delete a hit that is enveloped by at least this many higher-scoring hits.
best_hit_overhang real none Best Hit algorithm overhang value (recommended value: 0.1)
best_hit_score_edge real none Best Hit algorithm score edge value (recommended value: 0.1)
dbsize integer none Effective size of the database
searchsp integer none Effective length of the search space
import_search_strategy string none Search strategy file to read.
export_search_strategy string none Record search strategy to this file.
parse_deflines flag N/A Parse query and subject bar delimited sequence identifiers (e.g., gi|129295).
num_threads integer 1 Number of threads (CPUs) to use in blast search.
remote flag N/A Execute search on NCBI servers?
outfmt string 0 alignment view options:
0 = pairwise,
1 = query-anchored showing identities,
2 = query-anchored no identities,
3 = flat query-anchored, show identities,
4 = flat query-anchored, no identities,
5 = XML Blast output,
6 = tabular,
7 = tabular with comment lines,
8 = Text ASN.1,
9 = Binary ASN.1
10 = Comma-separated values
11 = BLAST archive format (ASN.1)
Options 6, 7, and 10 can be additionally configured to produce a custom format specified by space delimited format specifiers.
The supported format specifiers are:
qseqid means Query Seq-id
qgi means Query GI
qacc means Query accession
sseqid means Subject Seq-id
sallseqid means All subject Seq-id(s), separated by a ';'
sgi means Subject GI
sallgi means All subject GIs
sacc means Subject accession
sallacc means All subject accessions
qstart means Start of alignment in query
qend means End of alignment in query
sstart means Start of alignment in subject
send means End of alignment in subject
qseq means Aligned part of query sequence
sseq means Aligned part of subject sequence
evalue means Expect value
bitscore means Bit score
score means Raw score
length means Alignment length
pident means Percentage of identical matches
nident means Number of identical matches
mismatch means Number of mismatches
positive means Number of positive-scoring matches
gapopen means Number of gap openings
gaps means Total number of gaps
ppos means Percentage of positive-scoring matches
frames means Query and subject frames separated by a '/'
qframe means Query frame
sframe means Subject frame
btop means Blast traceback operations (BTOP)
staxids means unique Subject Taxonomy ID(s), separated by a ';'(in numerical order)
sscinames means unique Subject Scientific Name(s), separated by a ';'
scomnames means unique Subject Common Name(s), separated by a ';'
sblastnames means unique Subject Blast Name(s), separated by a ';' (in alphabetical order)
sskingdoms means unique Subject Super Kingdom(s), separated by a ';' (in alphabetical order)
stitle means Subject Title
salltitles means All Subject Title(s), separated by a '&lt;&gt;'
sstrand means Subject Strand
qcovs means Query Coverage Per Subject (for all HSPs)
qcovhsp means Query Coverage Per HSP
qcovus is a measure of Query Coverage that counts a position in a subject sequence for this measure only once. The second time the position is aligned to the query is not counted towards this measure.
When not provided, the default value is:
'qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore', which is equivalent to the keyword 'std'
MODULE SPECIFIC PARAMS:
option task(s) type default value description and notes
word_size megablast integer 28 Length of initial exact match.
word_size dc-megablast integer 11 Number of matching nucleotides in initial match. dc-megablast allows non-consecutive letters to match.
word_size blastn integer 11 Length of initial exact match.
word_size blastn-short integer 7 Length of initial exact match.
gapopen megablast integer 0 Cost to open a gap. See appendix "BLASTN reward/penalty values".
gapextend megablast integer none Cost to extend a gap. This default is a function of reward/penalty value. See appendix "BLASTN reward/penalty values".
gapopen blastn, blastn-short, dc-megablast integer 5 Cost to open a gap. See appendix "BLASTN reward/penalty values".
gapextend blastn, blastn-short, dc-megablast integer 2 Cost to extend a gap. See appendix "BLASTN reward/penalty values".
reward megablast integer 1 Reward for a nucleotide match.
penalty megablast integer -2 Penalty for a nucleotide mismatch.
reward blastn, dc-megablast integer 2 Reward for a nucleotide match.
penalty blastn, dc-megablast integer -3 Penalty for a nucleotide mismatch.
reward blastn-short integer 1 Reward for a nucleotide match.
penalty blastn-short integer -3 Penalty for a nucleotide mismatch.
strand all string both Query strand(s) to search against database/subject. Choice of both, minus, or plus.
dust all string 20 64 1 Filter query sequence with dust.
filtering_db all string none Mask query using the sequences in this database.
window_masker_taxid all integer none Enable WindowMasker filtering using a Taxonomic ID.
window_masker_db all string none Enable WindowMasker filtering using this file.
soft_masking all boolean true Apply filtering locations as soft masks (i.e., only for finding initial matches).
lcase_masking all flag N/A Use lower case filtering in query and subject sequence(s).
db_soft_mask all integer none Filtering algorithm ID to apply to the BLAST database as soft mask (i.e., only for finding initial matches).
db_hard_mask all integer none Filtering algorithm ID to apply to the BLAST database as hard mask (i.e., sequence is masked for all phases of search).
perc_identity all integer 0 Percent identity cutoff.
template_type dc-megablast string coding Discontiguous MegaBLAST template type. Allowed values are coding, optimal and coding_and_optimal.
template_length dc-megablast integer 18 Discontiguous MegaBLAST template length.
use_index megablast boolean false Use MegaBLAST database index. Indices may be created with the makembindex application.
index_name megablast string none MegaBLAST database index name.
xdrop_ungap all real 20 Heuristic value (in bits) for ungapped extensions.
xdrop_gap all real 30 Heuristic value (in bits) for preliminary gapped extensions.
xdrop_gap_final all real 100 Heuristic value (in bits) for final gapped alignment.
no_greedy megablast flag N/A Use non-greedy dynamic programming extension.
min_raw_gapped_score all integer none Minimum raw gapped score to keep an alignment in the preliminary gapped and trace-back stages. Normally set based upon expect value.
ungapped all flag N/A Perform ungapped alignment.
window_size dc-megablast integer 40 Multiple hits window size, use 0 to specify 1-hit algorithm
</params>
<command>blastn.exe</command>
</module>
<module>
<name>blastp</name>
<category>search</category>
<description>The blastp application searches a protein sequence against protein subject sequences or a protein database.
An option of type "flag" takes no arguments, but if present the argument is true.
Three different tasks are supported:
1.) "blastp", for standard protein-protein comparisons,
2.) "blastp-short", optimized for query sequences shorter than 30 residues, and
3.) "blastp-fast", a faster version that uses a larger word-size per https://www.ncbi.nlm.nih.gov/pubmed/17921491.
This table reflects the 2.2.27 BLAST+ release.
Full documentation for this tool can be found on the following page: https://www.ncbi.nlm.nih.gov/books/NBK279684/#appendices.Options_for_the_commandline_a
</description>
<inputFile>[file].fasta</inputFile>
<inputParam>true</inputParam>
<outputFile_required>true</outputFile_required>
<outputFile>[results].out</outputFile>
<outputParam>true</outputParam>
<params>Parameters common to all BLAST+ search modules:
Option Type Default value Description/notes
db string none BLAST database name.
query string stdin Query file name.
query_loc string none Location on the query sequence (Format: start-stop)
out string stdout Output file name
evalue real 10.0 Expect value (E) for saving hits
subject string none File with subject sequence(s) to search.
subject_loc string none Location on the subject sequence (Format: start-stop).
show_gis flag N/A Show NCBI GIs in report.
num_descriptions integer 500 Show one-line descriptions for this number of database sequences.
num_alignments integer 250 Show alignments for this number of database sequences.
max_target_seqs integer 500 Number of aligned sequences to keep. Use with report formats that do not have separate definition line and alignment sections such as tabular (all outfmt > 4). Not compatible with num_descriptions or num_alignments.
max_hsps integer none Maximum number of HSPs (alignments) to keep for any single query-subject pair. The HSPs shown will be the best as judged by expect value. This number should be an integer that is one or greater. If this option is not set, BLAST shows all HSPs meeting the expect value criteria. Setting it to one will show only the best HSP for every query-subject pair.
html flag N/A Produce HTML output
gilist string none Restrict search of database to GIs listed in this file. Local searches only.
negative_gilist string none Restrict search of database to everything except the GIs listed in this file. Local searches only.
entrez_query string none Restrict search with the given Entrez query. Remote searches only.
culling_limit integer none Delete a hit that is enveloped by at least this many higher-scoring hits.
best_hit_overhang real none Best Hit algorithm overhang value (recommended value: 0.1)
best_hit_score_edge real none Best Hit algorithm score edge value (recommended value: 0.1)
dbsize integer none Effective size of the database
searchsp integer none Effective length of the search space
import_search_strategy string none Search strategy file to read.
export_search_strategy string none Record search strategy to this file.
parse_deflines flag N/A Parse query and subject bar delimited sequence identifiers (e.g., gi|129295).
num_threads integer 1 Number of threads (CPUs) to use in blast search.
remote flag N/A Execute search on NCBI servers?
outfmt string 0 alignment view options:
0 = pairwise,
1 = query-anchored showing identities,
2 = query-anchored no identities,
3 = flat query-anchored, show identities,
4 = flat query-anchored, no identities,
5 = XML Blast output,
6 = tabular,
7 = tabular with comment lines,
8 = Text ASN.1,
9 = Binary ASN.1
10 = Comma-separated values
11 = BLAST archive format (ASN.1)
Options 6, 7, and 10 can be additionally configured to produce a custom format specified by space delimited format specifiers.
The supported format specifiers are:
qseqid means Query Seq-id
qgi means Query GI
qacc means Query accession
sseqid means Subject Seq-id
sallseqid means All subject Seq-id(s), separated by a ';'
sgi means Subject GI
sallgi means All subject GIs
sacc means Subject accession
sallacc means All subject accessions
qstart means Start of alignment in query
qend means End of alignment in query
sstart means Start of alignment in subject
send means End of alignment in subject
qseq means Aligned part of query sequence
sseq means Aligned part of subject sequence
evalue means Expect value
bitscore means Bit score
score means Raw score
length means Alignment length
pident means Percentage of identical matches
nident means Number of identical matches
mismatch means Number of mismatches
positive means Number of positive-scoring matches
gapopen means Number of gap openings
gaps means Total number of gaps
ppos means Percentage of positive-scoring matches
frames means Query and subject frames separated by a '/'
qframe means Query frame
sframe means Subject frame
btop means Blast traceback operations (BTOP)
staxids means unique Subject Taxonomy ID(s), separated by a ';' (in numerical order)
sscinames means unique Subject Scientific Name(s), separated by a ';'
scomnames means unique Subject Common Name(s), separated by a ';'
sblastnames means unique Subject Blast Name(s), separated by a ';' (in alphabetical order)
sskingdoms means unique Subject Super Kingdom(s), separated by a ';' (in alphabetical order)
stitle means Subject Title
salltitles means All Subject Title(s), separated by a '<>'
sstrand means Subject Strand
qcovs means Query Coverage Per Subject (for all HSPs)
qcovhsp means Query Coverage Per HSP
qcovus is a measure of Query Coverage that counts a position in a subject sequence for this measure only once. The second time the position is aligned to the query is not counted towards this measure.
When not provided, the default value is:
'qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore', which is equivalent to the keyword 'std'
MODULE SPECIFIC PARAMS:
option task type default value description and notes
word_size blastp integer 3 Word size of initial match. Valid word sizes are 2-7.
word_size blastp-short integer 2 Word size of initial match.
word_size blastp-fast integer 6 Word size of initial match.
gapopen blastp and blastp-fast integer 11 Cost to open a gap.
gapextend blastp and blastp-fast integer 1 Cost to extend a gap.
gapopen blastp-short integer 9 Cost to open a gap.
gapextend blastp-short integer 1 Cost to extend a gap.
matrix blastp and blastp-fast string BLOSUM62 Scoring matrix name.
matrix blastp-short string PAM30 Scoring matrix name.
threshold blastp integer 11 Minimum score to add a word to the BLAST lookup table.
threshold blastp-short integer 16 Minimum score to add a word to the BLAST lookup table.
threshold blastp-fast integer 21 Minimum score to add a word to the BLAST lookup table.
comp_based_stats blastp and blastp-fast string 2 Use composition-based statistics:
D or d: default (equivalent to 2)
0 or F or f: no composition-based statistics
1: Composition-based statistics as in NAR 29:2994-3005, 2001
2 or T or t : Composition-based score adjustment as in Bioinformatics
21:902-911, 2005, conditioned on sequence properties
3: Composition-based score adjustment as in Bioinformatics 21:902-911, 2005, unconditionally
comp_based_stats blastp-short string 0 Use composition-based statistics:
D or d: default (equivalent to 2)
0 or F or f: no composition-based statistics
1: Composition-based statistics as in NAR 29:2994-3005, 2001
2 or T or t : Composition-based score adjustment as in Bioinformatics
21:902-911, 2005, conditioned on sequence properties
3: Composition-based score adjustment as in Bioinformatics 21:902-911, 2005, unconditionally
seg all string no Filter query sequence with SEG (Format: 'yes', 'window locut hicut', or 'no' to disable).
soft_masking all boolean false Apply filtering locations as soft masks (i.e., only for finding initial matches).
lcase_masking all flag N/A Use lower case filtering in query and subject sequence(s).
db_soft_mask all integer none Filtering algorithm ID to apply to the BLAST database as soft mask (i.e., only for finding initial matches).
db_hard_mask all integer none Filtering algorithm ID to apply to the BLAST database as hard mask (i.e., sequence is masked for all phases of search).
xdrop_gap_final all real 25 Heuristic value (in bits) for final gapped alignment.
window_size blastp and blastp-fast integer 40 Multiple hits window size, use 0 to specify 1-hit algorithm.
window_size blastp-short integer 15 Multiple hits window size, use 0 to specify 1-hit algorithm.
use_sw_tback all flag N/A Compute locally optimal Smith-Waterman alignments?</params>
<command>blastp.exe</command>
</module>
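<!-- Example (not part of any module definition): a minimal Python sketch of reading one line of tabular output (outfmt 6) with the default 'std' columns listed in the table above. It assumes the fields are tab-separated; the helper name parse_std_line, the int/float column groupings, and the sample line are illustrative assumptions, not BLAST+ output.

```python
from typing import Dict, Union

# Column order of the default 'std' tabular format, as listed in the params table.
STD_COLUMNS = ("qseqid sseqid pident length mismatch gapopen "
               "qstart qend sstart send evalue bitscore").split()

# Assumption for this sketch: which std columns are converted to int or float.
INT_COLUMNS = {"length", "mismatch", "gapopen", "qstart", "qend", "sstart", "send"}
FLOAT_COLUMNS = {"pident", "evalue", "bitscore"}

Value = Union[str, int, float]


def parse_std_line(line: str) -> Dict[str, Value]:
    """Split one tab-separated line into a dict keyed by the std column names."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != len(STD_COLUMNS):
        raise ValueError(
            f"expected {len(STD_COLUMNS)} fields, got {len(fields)}")
    record: Dict[str, Value] = {}
    for name, value in zip(STD_COLUMNS, fields):
        if name in INT_COLUMNS:
            record[name] = int(value)
        elif name in FLOAT_COLUMNS:
            record[name] = float(value)
        else:
            record[name] = value
    return record


# Made-up sample line, for illustration only.
sample = "q1\ts1\t98.5\t200\t3\t0\t1\t200\t5\t204\t1e-50\t380\n"
hit = parse_std_line(sample)
```

If a custom list of format specifiers is passed with outfmt 6, 7, or 10, STD_COLUMNS would need to be replaced by that list in the same order. -->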
<module>
<name>blastx</name>
<category>search</category>
<description>The blastx application translates a nucleotide query and searches it against protein subject sequences or a protein database.
Two different tasks are supported:
1.) "blastx" for standard translated nucleotide-protein comparison and
2.) "blastx-fast", a faster version that uses a larger word-size based on https://www.ncbi.nlm.nih.gov/pubmed/17921491.
Full documentation for this tool can be found on the following page: https://www.ncbi.nlm.nih.gov/books/NBK279684/#appendices.Options_for_the_commandline_a
</description>
<inputFile>[file].fasta</inputFile>
<inputParam>true</inputParam>
<outputFile_required>true</outputFile_required>
<outputFile>[results].out</outputFile>
<outputParam>true</outputParam>
<params>Parameters common to all BLAST+ search modules:
Option Type Default value Description/notes
db string none BLAST database name.
query string stdin Query file name.
query_loc string none Location on the query sequence (Format: start-stop)
out string stdout Output file name
evalue real 10.0 Expect value (E) for saving hits
subject string none File with subject sequence(s) to search.
subject_loc string none Location on the subject sequence (Format: start-stop).
show_gis flag N/A Show NCBI GIs in report.
num_descriptions integer 500 Show one-line descriptions for this number of database sequences.
num_alignments integer 250 Show alignments for this number of database sequences.
max_target_seqs integer 500 Number of aligned sequences to keep. Use with report formats that do not have separate definition line and alignment sections, such as tabular (all outfmt > 4). Not compatible with num_descriptions or num_alignments.
max_hsps integer none Maximum number of HSPs (alignments) to keep for any single query-subject pair. The HSPs shown will be the best as judged by expect value. This number should be an integer that is one or greater. If this option is not set, BLAST shows all HSPs meeting the expect value criteria. Setting it to one will show only the best HSP for every query-subject pair.
html flag N/A Produce HTML output
gilist string none Restrict search of database to GIs listed in this file. Local searches only.
negative_gilist string none Restrict search of database to everything except the GIs listed in this file. Local searches only.
entrez_query string none Restrict search with the given Entrez query. Remote searches only.
culling_limit integer none Delete a hit that is enveloped by at least this many higher-scoring hits.
best_hit_overhang real none Best Hit algorithm overhang value (recommended value: 0.1)
best_hit_score_edge real none Best Hit algorithm score edge value (recommended value: 0.1)
dbsize integer none Effective size of the database
searchsp integer none Effective length of the search space
import_search_strategy string none Search strategy file to read.
export_search_strategy string none Record search strategy to this file.
parse_deflines flag N/A Parse query and subject bar delimited sequence identifiers (e.g., gi|129295).
num_threads integer 1 Number of threads (CPUs) to use in blast search.
remote flag N/A Execute search on NCBI servers?
outfmt string 0 alignment view options:
0 = pairwise,
1 = query-anchored showing identities,
2 = query-anchored no identities,
3 = flat query-anchored, show identities,
4 = flat query-anchored, no identities,
5 = XML Blast output,
6 = tabular,
7 = tabular with comment lines,
8 = Text ASN.1,
9 = Binary ASN.1
10 = Comma-separated values
11 = BLAST archive format (ASN.1)
Options 6, 7, and 10 can be additionally configured to produce a custom format specified by space delimited format specifiers.
The supported format specifiers are:
qseqid means Query Seq-id
qgi means Query GI
qacc means Query accession
sseqid means Subject Seq-id
sallseqid means All subject Seq-id(s), separated by a ';'
sgi means Subject GI
sallgi means All subject GIs
sacc means Subject accession
sallacc means All subject accessions
qstart means Start of alignment in query
qend means End of alignment in query
sstart means Start of alignment in subject
send means End of alignment in subject
qseq means Aligned part of query sequence
sseq means Aligned part of subject sequence
evalue means Expect value
bitscore means Bit score
score means Raw score
length means Alignment length
pident means Percentage of identical matches
nident means Number of identical matches
mismatch means Number of mismatches
positive means Number of positive-scoring matches
gapopen means Number of gap openings
gaps means Total number of gaps
ppos means Percentage of positive-scoring matches
frames means Query and subject frames separated by a '/'
qframe means Query frame
sframe means Subject frame
btop means Blast traceback operations (BTOP)
staxids means unique Subject Taxonomy ID(s), separated by a ';' (in numerical order)
sscinames means unique Subject Scientific Name(s), separated by a ';'
scomnames means unique Subject Common Name(s), separated by a ';'
sblastnames means unique Subject Blast Name(s), separated by a ';' (in alphabetical order)
sskingdoms means unique Subject Super Kingdom(s), separated by a ';' (in alphabetical order)
stitle means Subject Title
salltitles means All Subject Title(s), separated by a '<>'
sstrand means Subject Strand
qcovs means Query Coverage Per Subject (for all HSPs)
qcovhsp means Query Coverage Per HSP
qcovus is a measure of Query Coverage that counts a position in a subject sequence for this measure only once. The second time the position is aligned to the query is not counted towards this measure.
When not provided, the default value is:
'qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore', which is equivalent to the keyword 'std'
MODULE SPECIFIC PARAMS:
option task type default value description and notes
word_size blastx integer 3 Word size for initial match. Valid word sizes are 2-7.
word_size blastx-fast integer 6 Word size for initial match.
gapopen all integer 11 Cost to open a gap.
gapextend all integer 1 Cost to extend a gap.
matrix all string BLOSUM62 Scoring matrix name.
threshold blastx integer 12 Minimum score to add a word to the BLAST lookup table.
threshold blastx-fast integer 21 Minimum score to add a word to the BLAST lookup table.
seg all string 12 2.2 2.5 Filter query sequence with SEG (Format: 'yes', 'window locut hicut', or 'no' to disable).
soft_masking all boolean false Apply filtering locations as soft masks (i.e., only for finding initial matches).
lcase_masking all flag N/A Use lower case filtering in query and subject sequence(s).
db_soft_mask all integer none Filtering algorithm ID to apply to the BLAST database as soft mask (i.e., only for finding initial matches).
db_hard_mask all integer none Filtering algorithm ID to apply to the BLAST database as hard mask (i.e., sequence is masked for all phases of search).
xdrop_gap_final all real 25 Heuristic value (in bits) for final gapped alignment.
window_size all integer 40 Multiple hits window size, use 0 to specify 1-hit algorithm.
strand all string both Query strand(s) to search against database/subject. Choice of both, minus, or plus.
query_genetic_code all integer 1 Genetic code to translate query, see ftp://ftp.ncbi.nih.gov/entrez/misc/data/gc.prt
max_intron_length all integer 0 Length of the largest intron allowed in a translated nucleotide sequence when linking multiple distinct alignments (a negative value disables linking).
comp_based_stats all string 2 Use composition-based statistics for blastx:
D or d: default (equivalent to 2)
0 or F or f: no composition-based statistics
1: Composition-based statistics as in NAR 29:2994-3005, 2001
2 or T or t : Composition-based score adjustment as in Bioinformatics
21:902-911, 2005, conditioned on sequence properties
3: Composition-based score adjustment as in Bioinformatics 21:902-911, 2005, unconditionally
Default = '2'
</params>
<command>blastx.exe</command>
</module>
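<!-- Example (not part of any module definition): a minimal Python sketch of turning option names from the tables above into an argument list for a search program. It assumes each option is passed as a single-dash name followed by its value, and that options of type "flag" are passed by name alone. The helper name build_blast_args, the program name, and the file and database names are hypothetical.

```python
from typing import Iterable, List, Mapping, Union

OptionValue = Union[str, int, float]


def build_blast_args(program: str,
                     options: Mapping[str, OptionValue],
                     flags: Iterable[str] = ()) -> List[str]:
    """Return the program name, then '-name value' pairs, then bare flags."""
    args = [program]
    for name, value in options.items():
        # Options that take a value: name prefixed with a dash, then the value.
        args.extend(["-" + name, str(value)])
    for flag in flags:
        # Options of type "flag" take no argument; presence means true.
        args.append("-" + flag)
    return args


# Hypothetical inputs: reads.fasta, proteins and results.out are placeholders.
example = build_blast_args(
    "blastx",
    {"query": "reads.fasta", "db": "proteins", "out": "results.out",
     "evalue": 0.001, "outfmt": 6, "word_size": 3},
    flags=["parse_deflines"],
)
```

The resulting list could be handed to a process launcher such as subprocess.run; building a list rather than a single string avoids shell quoting issues with file names that contain spaces. -->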
<module>
<name>tblastn</name>
<category>search</category>
<description>The tblastn application searches a protein query against nucleotide subject sequences or a nucleotide database translated at search time.
Two different tasks are supported:
1.) "tblastn" for a standard protein-translated nucleotide comparison and
2.) "tblastn-fast" for a faster version with a larger word-size based on https://www.ncbi.nlm.nih.gov/pubmed/17921491.
Full documentation for this tool can be found on the following page: https://www.ncbi.nlm.nih.gov/books/NBK279684/#appendices.Options_for_the_commandline_a
</description>
<inputFile>[file].fasta</inputFile>
<inputParam>true</inputParam>
<outputFile_required>true</outputFile_required>
<outputFile>[results].out</outputFile>
<outputParam>true</outputParam>
<params>Parameters common to all BLAST+ search modules:
Option Type Default value Description/notes
db string none BLAST database name.
query string stdin Query file name.
query_loc string none Location on the query sequence (Format: start-stop)
out string stdout Output file name
evalue real 10.0 Expect value (E) for saving hits
subject string none File with subject sequence(s) to search.
subject_loc string none Location on the subject sequence (Format: start-stop).
show_gis flag N/A Show NCBI GIs in report.
num_descriptions integer 500 Show one-line descriptions for this number of database sequences.
num_alignments integer 250 Show alignments for this number of database sequences.
max_target_seqs integer 500 Number of aligned sequences to keep. Use with report formats that do not have separate definition line and alignment sections, such as tabular (all outfmt > 4). Not compatible with num_descriptions or num_alignments.
max_hsps integer none Maximum number of HSPs (alignments) to keep for any single query-subject pair. The HSPs shown will be the best as judged by expect value. This number should be an integer that is one or greater. If this option is not set, BLAST shows all HSPs meeting the expect value criteria. Setting it to one will show only the best HSP for every query-subject pair.
html flag N/A Produce HTML output
gilist string none Restrict search of database to GIs listed in this file. Local searches only.
negative_gilist string none Restrict search of database to everything except the GIs listed in this file. Local searches only.
entrez_query string none Restrict search with the given Entrez query. Remote searches only.
culling_limit integer none Delete a hit that is enveloped by at least this many higher-scoring hits.
best_hit_overhang real none Best Hit algorithm overhang value (recommended value: 0.1)
best_hit_score_edge real none Best Hit algorithm score edge value (recommended value: 0.1)
dbsize integer none Effective size of the database
searchsp integer none Effective length of the search space
import_search_strategy string none Search strategy file to read.
export_search_strategy string none Record search strategy to this file.
parse_deflines flag N/A Parse query and subject bar delimited sequence identifiers (e.g., gi|129295).
num_threads integer 1 Number of threads (CPUs) to use in blast search.
remote flag N/A Execute search on NCBI servers?
outfmt string 0 alignment view options:
0 = pairwise,
1 = query-anchored showing identities,
2 = query-anchored no identities,
3 = flat query-anchored, show identities,
4 = flat query-anchored, no identities,
5 = XML Blast output,
6 = tabular,
7 = tabular with comment lines,
8 = Text ASN.1,
9 = Binary ASN.1
10 = Comma-separated values
11 = BLAST archive format (ASN.1)
Options 6, 7, and 10 can be additionally configured to produce a custom format specified by space delimited format specifiers.
The supported format specifiers are:
qseqid means Query Seq-id
qgi means Query GI
qacc means Query accession
sseqid means Subject Seq-id
sallseqid means All subject Seq-id(s), separated by a ';'
sgi means Subject GI
sallgi means All subject GIs
sacc means Subject accession
sallacc means All subject accessions
qstart means Start of alignment in query
qend means End of alignment in query
sstart means Start of alignment in subject
send means End of alignment in subject
qseq means Aligned part of query sequence
sseq means Aligned part of subject sequence
evalue means Expect value
bitscore means Bit score
score means Raw score
length means Alignment length
pident means Percentage of identical matches
nident means Number of identical matches
mismatch means Number of mismatches
positive means Number of positive-scoring matches
gapopen means Number of gap openings
gaps means Total number of gaps
ppos means Percentage of positive-scoring matches
frames means Query and subject frames separated by a '/'
qframe means Query frame
sframe means Subject frame
btop means Blast traceback operations (BTOP)
staxids means unique Subject Taxonomy ID(s), separated by a ';' (in numerical order)
sscinames means unique Subject Scientific Name(s), separated by a ';'
scomnames means unique Subject Common Name(s), separated by a ';'
sblastnames means unique Subject Blast Name(s), separated by a ';' (in alphabetical order)
sskingdoms means unique Subject Super Kingdom(s), separated by a ';' (in alphabetical order)
stitle means Subject Title
salltitles means All Subject Title(s), separated by a '<>'
sstrand means Subject Strand
qcovs means Query Coverage Per Subject (for all HSPs)
qcovhsp means Query Coverage Per HSP
qcovus is a measure of Query Coverage that counts a position in a subject sequence for this measure only once. The second time the position is aligned to the query is not counted towards this measure.
When not provided, the default value is:
'qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore', which is equivalent to the keyword 'std'
MODULE SPECIFIC PARAMS:
option task type default value description and notes
word_size tblastn integer 3 Word size for initial match. Valid word sizes are 2-7.
word_size tblastn-fast integer 6 Word size for initial match.
gapopen all integer 11 Cost to open a gap.
gapextend all integer 1 Cost to extend a gap.
matrix all string BLOSUM62 Scoring matrix name.
threshold tblastn integer 13 Minimum score to add a word to the BLAST lookup table.
threshold tblastn-fast integer 21 Minimum score to add a word to the BLAST lookup table.
seg all string 12 2.2 2.5 Filter query sequence with SEG (Format: 'yes', 'window locut hicut', or 'no' to disable).
soft_masking all boolean false Apply filtering locations as soft masks (i.e., only for finding initial matches).
lcase_masking all flag N/A Use lower case filtering in query and subject sequence(s).
db_soft_mask all integer none Filtering algorithm ID to apply to the BLAST database as soft mask (i.e., only for finding initial matches).
db_hard_mask all integer none Filtering algorithm ID to apply to the BLAST database as hard mask (i.e., sequence is masked for all phases of search).
xdrop_gap_final all real 25 Heuristic value (in bits) for final gapped alignment.
window_size all integer 40 Multiple hits window size, use 0 to specify 1-hit algorithm.
db_gen_code all integer 1 Genetic code to translate subject sequences, see ftp://ftp.ncbi.nih.gov/entrez/misc/data/gc.prt
max_intron_length all integer 0 Length of the largest intron allowed in a translated nucleotide sequence when linking multiple distinct alignments (a negative value disables linking).
comp_based_stats all string 2 Use composition-based statistics for tblastn:
D or d: default (equivalent to 2)
0 or F or f: no composition-based statistics
1: Composition-based statistics as in NAR 29:2994-3005, 2001
2 or T or t : Composition-based score adjustment as in Bioinformatics
21:902-911, 2005, conditioned on sequence properties
3: Composition-based score adjustment as in Bioinformatics 21:902-911, 2005, unconditionally
Default = '2'
</params>
<command>tblastn.exe</command>
</module>
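<!-- Example (not part of any module definition): a minimal Python sketch that checks a set of chosen options against three restrictions stated in the common-parameter table: max_target_seqs is not compatible with num_descriptions or num_alignments, gilist and negative_gilist are for local searches only, and entrez_query is for remote searches only. The function name find_conflicts and the message wording are hypothetical, and the check is not exhaustive.

```python
from typing import Iterable, List


def find_conflicts(option_names: Iterable[str]) -> List[str]:
    """Return a list of human-readable problems; an empty list means none found."""
    names = set(option_names)
    problems: List[str] = []

    # max_target_seqs cannot be combined with the two report-size options.
    if "max_target_seqs" in names:
        for other in ("num_descriptions", "num_alignments"):
            if other in names:
                problems.append(
                    f"max_target_seqs is not compatible with {other}")

    # The 'remote' flag decides whether local-only or remote-only options apply.
    if "remote" in names:
        for local_only in ("gilist", "negative_gilist"):
            if local_only in names:
                problems.append(f"{local_only} is for local searches only")
    elif "entrez_query" in names:
        problems.append("entrez_query is for remote searches only")

    return problems


issues = find_conflicts(["db", "query", "max_target_seqs", "num_alignments"])
```

Running such a check before launching a search gives an early, readable message instead of relying on the search program to reject the combination. -->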
<module>
<name>tblastx</name>
<category>search</category>
<description>The tblastx application searches a translated nucleotide query against translated nucleotide subject sequences or a translated nucleotide database.
An option of type "flag" takes no arguments, but if present the argument is true. This table reflects the 2.2.27 BLAST+ release.
Only ungapped searches are supported for tblastx.
Full documentation for this tool can be found on the following page: https://www.ncbi.nlm.nih.gov/books/NBK279684/#appendices.Options_for_the_commandline_a
</description>
<inputFile>[file].fasta</inputFile>
<inputParam>true</inputParam>
<outputFile_required>true</outputFile_required>
<outputFile>[results].out</outputFile>
<outputParam>true</outputParam>
<params>Parameters common to all BLAST+ search modules:
Option Type Default value Description/notes
db string none BLAST database name.
query string stdin Query file name.
query_loc string none Location on the query sequence (Format: start-stop)
out string stdout Output file name
evalue real 10.0 Expect value (E) for saving hits
subject string none File with subject sequence(s) to search.
subject_loc string none Location on the subject sequence (Format: start-stop).
show_gis flag N/A Show NCBI GIs in report.
num_descriptions integer 500 Show one-line descriptions for this number of database sequences.
num_alignments integer 250 Show alignments for this number of database sequences.
max_target_seqs integer 500 Number of aligned sequences to keep. Use with report formats that do not have separate definition line and alignment sections, such as tabular (all outfmt > 4). Not compatible with num_descriptions or num_alignments.
max_hsps integer none Maximum number of HSPs (alignments) to keep for any single query-subject pair. The HSPs shown will be the best as judged by expect value. This number should be an integer that is one or greater. If this option is not set, BLAST shows all HSPs meeting the expect value criteria. Setting it to one will show only the best HSP for every query-subject pair.
html flag N/A Produce HTML output
gilist string none Restrict search of database to GIs listed in this file. Local searches only.
negative_gilist string none Restrict search of database to everything except the GIs listed in this file. Local searches only.
entrez_query string none Restrict search with the given Entrez query. Remote searches only.
culling_limit integer none Delete a hit that is enveloped by at least this many higher-scoring hits.
best_hit_overhang real none Best Hit algorithm overhang value (recommended value: 0.1)
best_hit_score_edge real none Best Hit algorithm score edge value (recommended value: 0.1)
dbsize integer none Effective size of the database
searchsp integer none Effective length of the search space
import_search_strategy string none Search strategy file to read.
export_search_strategy string none Record search strategy to this file.
parse_deflines flag N/A Parse query and subject bar delimited sequence identifiers (e.g., gi|129295).
num_threads integer 1 Number of threads (CPUs) to use in blast search.
remote flag N/A Execute search on NCBI servers?
outfmt string 0 alignment view options:
0 = pairwise,
1 = query-anchored showing identities,
2 = query-anchored no identities,
3 = flat query-anchored, show identities,
4 = flat query-anchored, no identities,
5 = XML Blast output,
6 = tabular,
7 = tabular with comment lines,
8 = Text ASN.1,
9 = Binary ASN.1
10 = Comma-separated values
11 = BLAST archive format (ASN.1)
Options 6, 7, and 10 can be additionally configured to produce a custom format specified by space delimited format specifiers.
The supported format specifiers are:
qseqid means Query Seq-id
qgi means Query GI
qacc means Query accession
sseqid means Subject Seq-id
sallseqid means All subject Seq-id(s), separated by a ';'
sgi means Subject GI
sallgi means All subject GIs
sacc means Subject accession
sallacc means All subject accessions
qstart means Start of alignment in query
qend means End of alignment in query
sstart means Start of alignment in subject
send means End of alignment in subject
qseq means Aligned part of query sequence
sseq means Aligned part of subject sequence
evalue means Expect value
bitscore means Bit score
score means Raw score
length means Alignment length
pident means Percentage of identical matches
nident means Number of identical matches
mismatch means Number of mismatches
positive means Number of positive-scoring matches
gapopen means Number of gap openings
gaps means Total number of gaps
ppos means Percentage of positive-scoring matches
frames means Query and subject frames separated by a '/'
qframe means Query frame
sframe means Subject frame
btop means Blast traceback operations (BTOP)
staxids means unique Subject Taxonomy ID(s), separated by a ';' (in numerical order)
sscinames means unique Subject Scientific Name(s), separated by a ';'
scomnames means unique Subject Common Name(s), separated by a ';'
sblastnames means unique Subject Blast Name(s), separated by a ';' (in alphabetical order)
sskingdoms means unique Subject Super Kingdom(s), separated by a ';' (in alphabetical order)
stitle means Subject Title
salltitles means All Subject Title(s), separated by a '<>'
sstrand means Subject Strand
qcovs means Query Coverage Per Subject (for all HSPs)
qcovhsp means Query Coverage Per HSP
qcovus is a measure of Query Coverage that counts a position in a subject sequence for this measure only once. The second time the position is aligned to the query is not counted towards this measure.
When not provided, the default value is:
'qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore', which is equivalent to the keyword 'std'
MODULE SPECIFIC PARAMS:
option type default value description and notes
word_size integer 3 Word size for initial match.
matrix string BLOSUM62 Scoring matrix name.
threshold integer 13 Minimum word score to add the word to the BLAST lookup table.
seg string 12 2.2 2.5 Filter query sequence with SEG (Format: 'yes', 'window locut hicut', or 'no' to disable).
soft_masking boolean false Apply filtering locations as soft masks (i.e., only for finding initial matches).
lcase_masking flag N/A Use lower case filtering in query and subject sequence(s).
db_soft_mask integer none Filtering algorithm ID to apply to the BLAST database as soft mask (i.e., only for finding initial matches).
db_hard_mask integer none Filtering algorithm ID to apply to the BLAST database as hard mask (i.e., sequence is masked for all phases of search).
strand string both Query strand(s) to search against database subject sequences. Choice of both, minus, or plus.
query_genetic_code integer 1 Genetic code to translate query, see ftp://ftp.ncbi.nih.gov/entrez/misc/data/gc.prt
db_gen_code integer 1 Genetic code to translate subject sequences, see ftp://ftp.ncbi.nih.gov/entrez/misc/data/gc.prt
max_intron_length integer 0 Length of the largest intron allowed in a translated nucleotide sequence when linking multiple distinct alignments (a negative value disables linking).
</params>
<command>tblastx.exe</command>
</module>
<!-- <module>
<name>rpsblast</name>
<category>search</category>
<description>The rpsblast application searches a protein query against the conserved domain database (CDD), which is a set of protein profiles.
Many of the common options such as matrix or word threshold are set when the CDD is built and cannot be changed by the rpsblast application.
A search ready CDD can be downloaded from ftp://ftp.ncbi.nih.gov/pub/mmdb/cdd/</description>
<inputFile>[file].fasta</inputFile>
<inputParam>true</inputParam>
<outputFile_required>true</outputFile_required>
<outputFile>[results].out</outputFile>
<outputParam>true</outputParam>
<params>
</params>
<command></command>
</module> -->
<module>
<name>Makeblastdb</name>
<category>Database Builder</category>
<description>This application builds a BLAST database. An option of type "flag" takes no arguments, but if present the argument is true.
Full documentation for this tool can be found on the following page: https://www.ncbi.nlm.nih.gov/books/NBK279684/#appendices.Options_for_the_commandline_a
</description>
<inputFile>fasta: for FASTA file(s)
blastdb: for BLAST database(s)
asn1_txt: for Seq-entries in text ASN.1 format
asn1_bin: for Seq-entries in binary ASN.1 format</inputFile>
<inputParam>true</inputParam>
<outputFile_required>true</outputFile_required>
<outputFile>true</outputFile>
<outputParam>true</outputParam>
<params>option type default value Description and notes
in string stdin Input file/database name
input_type string fasta Input file type, it may be any of the following:
fasta: for FASTA file(s)
blastdb: for BLAST database(s)
asn1_txt: for Seq-entries in text ASN.1 format
asn1_bin: for Seq-entries in binary ASN.1 format
dbtype string prot Molecule type of input, values can be nucl or prot.
title string none Title for BLAST database. If not set, the input file name will be used.
parse_seqids flag N/A Parse bar delimited sequence identifiers (e.g., gi|129295) in FASTA input.
hash_index flag N/A Create index of sequence hash values.
mask_data string none Comma-separated list of input files containing masking data as produced by NCBI masking applications (e.g. dustmasker, segmasker, windowmasker).
out string input file name Name of BLAST database to be created. Input file name is used if none provided. This field is required if input consists of multiple files.
max_file_size string 1GB Maximum file size to use for BLAST database.
taxid integer none Taxonomy ID to assign to all sequences.
taxid_map string none File with two columns mapping sequence ID to the taxonomy ID. The first column is the sequence ID represented as one of:
1. fasta with accessions (e.g., emb|X17276.1|)
2. fasta with GI (e.g., gi|4)
3. GI as a bare number (e.g., 4)
4. A local ID. The local ID must be prefixed with "lcl" (e.g., lcl|4).
The second column should be the NCBI taxonomy ID (e.g., 9606 for human).
logfile string none Program log file (default is stderr).
</params>
<command>makeblastdb.exe</command>
</module>
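<!-- Example (not part of any module definition): a minimal Python sketch of producing the two-column text expected by the taxid_map option described above, with a sequence ID in the first column and an NCBI taxonomy ID in the second. It assumes the columns are separated by a single space and that each mapping sits on its own line; the function name format_taxid_map is hypothetical, and pairing these sample IDs with taxonomy ID 9606 is for illustration only.

```python
from typing import Iterable, Tuple


def format_taxid_map(pairs: Iterable[Tuple[str, int]]) -> str:
    """Render (sequence ID, taxonomy ID) pairs as one 'id taxid' line each."""
    lines = []
    for seq_id, taxid in pairs:
        # A sequence ID containing whitespace would break the two-column layout.
        if not seq_id or any(ch.isspace() for ch in seq_id):
            raise ValueError(f"invalid sequence ID: {seq_id!r}")
        lines.append(f"{seq_id} {int(taxid)}\n")
    return "".join(lines)


# Sequence ID forms taken from the table: accession, GI, and local ID.
text = format_taxid_map([
    ("emb|X17276.1|", 9606),
    ("gi|4", 9606),
    ("lcl|4", 9606),
])
```

The returned string can be written to a file whose name is then given as the taxid_map value. -->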
<module>
<name>Makeprofiledb</name>
<category>Database</category>
<description>This application builds an RPS-BLAST database. An option of type "flag" takes no arguments, but if present the argument is true.
COBALT (a multiple sequence alignment program) and DELTA-BLAST both use RPS-BLAST searches as part of their processing, but use specialized versions of the database.
This application can build databases for COBALT, DELTA-BLAST, and a standard RPS-BLAST search.
The "dbtype" option (see entry in table) determines which flavor of the database is built.
Full documentation for this tool can be found on the following page: https://www.ncbi.nlm.nih.gov/books/NBK279684/#appendices.Options_for_the_commandline_a</description>
<inputFile>Input file that contains a list of scoremat files (delimited by space, tab, or newline)</inputFile>
<inputParam>true</inputParam>
<outputFile_required>true</outputFile_required>
<outputFile>Name of BLAST database to be created. Input file name is used if none provided.</outputFile>
<outputParam>true</outputParam>
<params>option type default value Description and notes
in string stdin Input file that contains a list of scoremat files (delimited by space, tab, or newline)
binary flag N/A The scoremat files are binary ASN.1
title string none Title for RPS-BLAST database. If not set, the input file name will be used.
threshold real 9.82 Threshold for RPS-BLAST lookup table.
out string input file name Name of BLAST database to be created. Input file name is used if none provided.
max_file_size string 1GB Maximum file size to use for BLAST database.
dbtype string rps Specifies the intended use of the RPS-BLAST database. One of rps, cobalt, or delta.
index flag N/A Creates index files.
gapopen integer none Cost to open a gap. Used only if scoremat files do not contain PSSM scores, otherwise ignored.
gapextend integer none Cost to extend a gap by one residue. Used only if scoremat files do not contain PSSM scores, otherwise ignored.
scale real 100 PSSM scale factor.
matrix string BLOSUM62 Matrix to use in constructing PSSM. One of BLOSUM45, BLOSUM50, BLOSUM62, BLOSUM80, BLOSUM90, PAM250, PAM30 or PAM70. Used only if scoremat files do not contain PSSM scores, otherwise ignored.
obsr_threshold real 6 Exclude domains with maximum number of independent observations below this value (for use in DELTA-BLAST searches).
exclude_invalid boolean true Exclude domains that do not pass validation test (for use in DELTA-BLAST searches).
logfile string none Program log file (default is stderr).
</params>
<command>makeprofiledb.exe</command>
</module>
<module>
<name>Blastdbcmd</name>
<category>Database</category>
<description>This application reads a BLAST database and produces reports.
Full documentation for this tool can be found on the following page: https://www.ncbi.nlm.nih.gov/books/NBK279684/#appendices.Options_for_the_commandline_a</description>
<inputFile>BLAST database</inputFile>
<inputParam>true</inputParam>