From a05131c94e5d5382f6dda7c5273a2f060509198c Mon Sep 17 00:00:00 2001
From: alchemistmatt
Date: Thu, 28 Jun 2018 15:49:40 -0700
Subject: [PATCH] Update documentation
---
README.md | 20 ++--
doc/Changelog.html | 138 +++++++++++++++------------
doc/MS-GFDB.html | 2 +-
doc/MSGFDB_ModFile.html | 102 +++++++++++---------
doc/MSGFPlus.html | 196 +++++++++++++++++++++++++--------------
doc/MzidToTsv.html | 4 +-
doc/ScoringParamGen.html | 2 +-
doc/examples/Mods.txt | 47 +++++-----
doc/index.html | 10 +-
9 files changed, 313 insertions(+), 208 deletions(-)
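
Note on the modification file format documented in doc/MSGFDB_ModFile.html: each entry has five comma-separated fields (Mass or CompositionStr, Residues, ModType, Position, Name), and `#` starts a comment anywhere on a line. The rules in that example file can be illustrated with a minimal Python sketch. This is not the MS-GF+ parser; the function name `parse_mod_line` and the dictionary keys it returns are hypothetical, and the sketch only checks the rules stated in the example file's comments.

```python
import re

# Positions listed in the example mod file. Per its comments, "-" may be
# omitted and matching is case-insensitive, so both are normalized away.
VALID_POSITIONS = {"any", "nterm", "cterm", "protnterm", "protcterm"}


def parse_mod_line(line):
    """Parse one mod-file line (hypothetical helper, not the MS-GF+ API).

    Returns None for blank lines, comment lines, and the NumMods setting;
    otherwise returns a dict with the five documented fields.
    """
    # '#' starts a comment at the start or in the middle of a line.
    text = line.split("#", 1)[0].strip()
    if not text or text.lower().startswith("nummods"):
        return None

    fields = [field.strip() for field in text.split(",")]
    if len(fields) != 5:
        raise ValueError("expected 5 fields, got %d: %r" % (len(fields), text))
    composition_or_mass, residues, mod_type, position, name = fields

    # Residues must be uppercase letters or a single '*'.
    if not re.fullmatch(r"[A-Z]+|\*", residues):
        raise ValueError("invalid residues: %r" % residues)

    mod_type = mod_type.lower()
    if mod_type not in ("fix", "opt"):
        raise ValueError("ModType must be 'fix' or 'opt': %r" % mod_type)

    position_key = position.replace("-", "").lower()
    if position_key not in VALID_POSITIONS:
        raise ValueError("invalid position: %r" % position)

    # The example file disallows '*' combined with the 'any' position.
    if residues == "*" and position_key == "any":
        raise ValueError("'*' cannot be combined with position 'any'")

    return {
        "composition_or_mass": composition_or_mass,
        "residues": residues,
        "mod_type": mod_type,
        "position": position_key,
        "name": name,
    }
```

For example, `parse_mod_line("HO3P,STY,opt,any,Phospho  # Phosphorylation STY")` yields a dict whose `residues` value is `"STY"`, while a commented-out example line yields `None`.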
diff --git a/README.md b/README.md
index 9bbdc60b..fbd5a221 100644
--- a/README.md
+++ b/README.md
@@ -16,8 +16,8 @@ and Concatenated DTA files (_dta.txt).
Requirements
======
-JRE 1.6 or greater\
-Memory 2GB or greater (recommended 4GB); larger FASTA files require more memory
+Java Runtime v1.6 or higher (use 64-bit Java)\
+At least 2GB of memory (4GB recommended); larger FASTA files require more memory
Downloads / Updates
======
@@ -41,7 +41,7 @@ Place MSGFPlus.jar in any folder
Usage Information
======
-Type `java -jar MSGFPlus.jar` for command line arguments
+Type `java -jar MSGFPlus.jar` for command line arguments.
To convert an mzid output file into a tsv file, run `java -cp MSGFPlus.jar edu.ucsd.msjava.ui.MzIDToTsv`
@@ -50,7 +50,7 @@ It is a C# application that works on Windows or on Linux using mono.
Download the Mzid-To-Tsv-Converter from GitHub.
For detailed documentation, see the "doc" subfolder, or visit:
-* [GitHub repo HTML help pages](http://htmlpreview.github.io/?https://github.com/MSGFPlus/msgfplus/blob/master/doc/index.html)
+* [GitHub repo HTML help pages](https://htmlpreview.github.io/?https://github.com/MSGFPlus/msgfplus/blob/master/doc/index.html)
* https://omics.pnl.gov/software/ms-gf
* https://bix-lab.ucsd.edu/pages/viewpage.action?pageId=13533355
@@ -63,13 +63,13 @@ Sangtae Kim [sangtae.kim (at) gmail.com]
Publications
======
-MS-GF+: Universal Database Search Tool for Mass Spectrometry, Sangtae Kim, Pavel A. Pevzner,
-Nat Commun. 2014 Oct 31;5:5277. doi: 10.1038/ncomms6277.
-http://www.ncbi.nlm.nih.gov/pubmed/?term=25358478
+MS-GF+ makes progress towards a universal database search tool for proteomics, Sangtae Kim and Pavel A. Pevzner,
+Nat Commun. 2014 Oct 31;5:5277. doi: 10.1038/ncomms6277.\
+https://www.ncbi.nlm.nih.gov/pubmed/?term=25358478
-Spectral Probabilities and Generating Functions of Tandem Mass Spectra: A Strike against Decoy Databases, Sangtae Kim, Nitin Gupta and Pavel Pevzner,
-J Proteome Res. 2008 Aug;7(8):3354-63. doi: 10.1021/pr8001244.
-http://www.ncbi.nlm.nih.gov/pubmed/?term=18597511
+Spectral Probabilities and Generating Functions of Tandem Mass Spectra: A Strike against Decoy Databases, Sangtae Kim, Nitin Gupta, and Pavel Pevzner,
+J Proteome Res. 2008 Aug;7(8):3354-63. doi: 10.1021/pr8001244.\
+https://www.ncbi.nlm.nih.gov/pubmed/?term=18597511
Source
======
diff --git a/doc/Changelog.html b/doc/Changelog.html
index f639495f..6121af51 100644
--- a/doc/Changelog.html
+++ b/doc/Changelog.html
@@ -14,7 +14,29 @@ MS-GF+ ChangeLog
- 01/30/2018 v2018.01.30
+ v2018.06.28
+
+
+ - Add option to specify the maximum number of missed cleavages (thanks to Sean at CWRU-CPB)
+ - Make command line argument names case-insensitive
+ - Add additional checks to prevent the output of duplicate peptide evidences (fixes Issue #24)
+
+
+
+ v2018.04.09
+
+
+ - When the EnzymeID is 9 (NoCleavage) do not cleave after any residue (thanks to Sean at CWRU-CPB)
+
+ - Previously EnzymeID 0 and EnzymeID 9 were identical.
+ - Now EnzymeID 0 cleaves after every residue while EnzymeID 9 won't cleave after any residue (useful for peptidomics)
+
+
+ - Allow the fasta file to have extension .faa (in addition to .fasta and .fa)
+
+
+
+ v2018.01.30
- MzIdentML creation: Don't output an empty ModificationParams element when there are no modifications (because not including it complies with the XML Schema, while including it with no child nodes does not)
@@ -23,21 +45,21 @@ MS-GF+ ChangeLog
- 08/23/2017 v2017.08.23
+ v2017.08.23
- MzIdentML creation: Change how the peptide ids are created, to further reduce possibility of duplicate peptide ids
- 07/21/2017 v2017.07.21
+ v2017.07.21
- Performance improvements when reading mzML and mzIdentML files
- 05/18/2017 v2017.05.18
+ v2017.05.18
- Add scoring parameters for UVPD TMT6Plex
@@ -48,7 +70,7 @@ MS-GF+ ChangeLog
- 01/27/2017 v2017.01.27
+ v2017.01.27
- Reduce the number of exceptions seen when a thread exits with an exception - There were multiple exceptions being displayed that were the result of killing the other threads.
@@ -60,7 +82,7 @@ MS-GF+ ChangeLog
- 01/10/2017 v2017.01.10
+ v2017.01.10
- Allow CustomAminoAcid formulas to not specify a number if it is 1.
@@ -68,42 +90,42 @@ MS-GF+ ChangeLog
- 12/12/2016 v2016.12.12
+ v2016.12.12
- Properly output the unitCvRef when outputting the scan start time. Since v2016.08.31
- 12/08/2016 v2016.12.08
+ v2016.12.08
- Internal PeptideId change for mzid files
Last version had a possibility of duplicate peptide ID strings, when the same mod was possible on the first residue and the N-terminal, or on the last residue and the C-terminal. N-terminal mods are now added prior to the first residue, prefixed with '[', and C-terminal mods are added after the last residue, prefixed by ']'
- 12/02/2016 v2016.12.02
+ v2016.12.02
- Fix some oddities with the mzid file peptide id strings
- 11/29/2016 v2016.11.29
+ v2016.11.29
- Clean up some debugging messages, and limit the number of times in a single search
- 10/26/2016 v2016.10.26
+ v2016.10.26
- Minimum spectra per thread reduced from 1000 to 250
- 10/24/2016 v2016.10.24
+ v2016.10.24
- Return a non-zero exit code if an error occurs
@@ -111,28 +133,28 @@ MS-GF+ ChangeLog
- 10/14/2016 v2016.10.14
+ v2016.10.14
- Fix: handling of -m (FragmentMethodID) when processing .mgf files
- 10/10/2016 v2016.10.10
+ v2016.10.10
- Fix: mzid output - Peptide IDs were being mishandled and led to creating incorrect PeptideEvidence references in SpectrumIdentificationItems. Also, added more information to the Peptide IDs to decrease the possibility of ID collision on edge cases.
- 09/22/2016 v2016.09.22
+ v2016.09.22
- New: Add the ability to set the charge carrier mass to something besides the mass of a proton.
- 08/31/2016 v2016.08.31
+ v2016.08.31
- Fix: output the scan start time units to the mzid file when the input is mzML; previously only the value (without the units specified) was being output, which was ambiguous and did not comply with the CV specification
@@ -140,7 +162,7 @@ MS-GF+ ChangeLog
- 07/26/2016 v2016.07.26
+ v2016.07.26
- Add UVPD as a dissociation method
@@ -149,7 +171,7 @@ MS-GF+ ChangeLog
- 06/29/2016 v2016.06.29
+ v2016.06.29
- Clean up the mzid output a little, reducing file size (v2016.06.15 introduced a change that resulted in larger mzid files)
@@ -157,7 +179,7 @@ MS-GF+ ChangeLog
- 06/15/2016 v2016.06.15
+ v2016.06.15
- Fix the mzid output when the modification is unknown to unimod. (cvRef now correctly references PSI-MS, and the value will be the name provided in the Mods.txt file)
@@ -166,7 +188,7 @@ MS-GF+ ChangeLog
- 05/25/2016 v2016.05.25
+ v2016.05.25
- Output the residue letter and mass of any custom amino acids to the "MassTable" portion of the mzid.
@@ -174,21 +196,21 @@ MS-GF+ ChangeLog
- 02/12/2016 v2016.02.12
+ v2016.02.12
- Ensure that ETciD and EThcD are handled as ETD when using a _dta.txt file with a _ScanType.txt file
- 01/29/2016 v2016.01.29
+ v2016.01.29
- Added ability to enter custom amino acids using the Mods.txt file.
- 01/21/2016 v2016.01.21
+ v2016.01.21
- Changed the versioning system - SVN revisions don't work for Git repositories. Now commit date is used for the version.
@@ -198,7 +220,7 @@ MS-GF+ ChangeLog
- 07/16/2014 v10089
+ 2014-07-16 v10089
- Fixed a bug that crashes when C-term mod mass is below -57Da.
@@ -206,7 +228,7 @@ MS-GF+ ChangeLog
- 06/30/2014 v10072
+ 2014-06-30 v10072
- Optimization for multithreaded performance.
@@ -214,7 +236,7 @@ MS-GF+ ChangeLog
- 02/10/2014 v9949
+ 2014-02-10 v9949
- New scoring parameters are added for HCD/Q-Exactive/Trypsin/TMT. Parameters for HCD/HighRes/Trypsin/TMT have also been changed. As a result, for HCD spectra of TMT peptides, the number of identifications has been significantly increased.
@@ -225,7 +247,7 @@ MS-GF+ ChangeLog
- 08/28/2013 v9881
+ 2013-08-28 v9881
- Change in database indexing format. The index file keeps non-standard amino acids (characters other than 20 standard residue characters). Previously, non-standard amino acids were converted into ‘?’ while indexing. It caused a problem when converting mzid into pepXML using idconvert (ProteoWizard).
@@ -234,7 +256,7 @@ MS-GF+ ChangeLog
- 08/28/2013 v9733
+ 2013-08-28 v9733
- Added parameters for CID-LowRes-NoCleavage.
@@ -242,7 +264,7 @@ MS-GF+ ChangeLog
- 04/03/2013 v9517
+ 2013-04-03 v9517
- Previously separate SpectrumIdentificationItems were created for the same peptide if "pre" is different (e.g. R.SIPDSMNYGDEEENK and K.SIPDSMNYGDEEENK). Now they show up as the same SpectrumIdentificationItem. If the score is different due to different NTT (e.g. G.SIPDSMNYGDEEENK), a separate SpectrumIdentificationItem is created.
@@ -250,14 +272,14 @@ MS-GF+ ChangeLog
- 04/03/2013 v9501
+ 2013-04-03 v9501
- Previously, spectra are ignored in the search if the number of peaks is less than 20 for all types spectra. Now, for TOF spectra (i.e. -inst 2), this number has been changed to 3.
- 04/02/2013 v9494
+ 2013-04-02 v9494
- The following features are added in the SpectrumIdentificationItem when "-addFeatures 1"
@@ -271,7 +293,7 @@
MS-GF+ ChangeLog
- 03/25/2013 v9436
+ 2013-03-25 v9436
- Added TMT scoring model for HCD/HighRes (-m 3 -inst 1)
@@ -280,7 +302,7 @@ MS-GF+ ChangeLog
- 03/05/2013 v9324
+ 2013-03-05 v9324
- Added Q-Exactive unlabeled phosphorylation parameters
@@ -288,21 +310,21 @@ MS-GF+ ChangeLog
- 02/27/2013 v9324
+ 2013-02-27 v9324
- Minor bug fix: mistakenly assigning 3 mods to a N-term amino acid.
- 02/15/2013 v9312
+ 2013-02-15 v9312
- Added scoring parameter sets for Q-Exactive iTRAQ (-inst 3 -protocol 2) and iTRAQ phosphopeptide enriched (-inst 3 -protocol 3) samples.
- 02/15/2013 v9284
+ 2013-02-15 v9284
- When "-protocol" parameter is missing, MS-GF+ automatically selects an appropriate protocol depending on the modification file.
@@ -310,21 +332,21 @@ MS-GF+ ChangeLog
- 02/14/2013 v9249
+ 2013-02-14 v9249
- Scoring parameters for Q-Exactive (-inst 3) have been added.
- 02/04/2013 v9244
+ 2013-02-04 v9244
- A bug (crash with an exception) in edu.ucsd.msjava.ui.ScoringParamGen has been fixed.
- 01/03/2013 v9176
+ 2013-01-03 v9176
- "-ti" parameter accepts only two comma separated integers, i.e., "-ti 1" will be rejected.
@@ -332,7 +354,7 @@ MS-GF+ ChangeLog
- 12/19/2012 v9107
+ 2012-12-19 v9107
- "No enzyme" (-e 1) was renamed to "unspecific cleavage".
@@ -347,7 +369,7 @@ MS-GF+ ChangeLog
- 12/10/2012 v9014
+ 2012-12-10 v9014
- The following bug has been fixed
@@ -358,7 +380,7 @@
MS-GF+ ChangeLog
- 11/30/2012 v9012
+ 2012-11-30 v9012
- The following bugs have been fixed
@@ -373,14 +395,14 @@
MS-GF+ ChangeLog
- 11/09/2012 v8884
+ 2012-11-09 v8884
- Bug fix: crashing while reading mzML files converted from wiff
- 11/09/2012 v8873
+ 2012-11-09 v8873
- Fixed a bug of ignoring N-term peptide when N-term Met cleaved peptide exists.
@@ -388,14 +410,14 @@ MS-GF+ ChangeLog
- 10/30/2012 v8806
+ 2012-10-30 v8806
- Fixed a bug in edu.ucsd.msjava.misc.MS2ToMgf
- 10/29/2012 v8792
+ 2012-10-29 v8792
- Fixed bugs to ignore N-term fixed mods in the output
@@ -403,28 +425,28 @@ MS-GF+ ChangeLog
- 10/11/2012 v8719
+ 2012-10-11 v8719
- Fixed bugs reporting NaN as additional features when spectrum has charge 0
- 10/04/2012 v8605
+ 2012-10-04 v8605
- Updates to handle multiple charge states in the ms2 format
- 10/03/2012 v8597
+ 2012-10-03 v8597
- Fixed the bug to erroneously report precursorMz in converted tsv files
- 09/26/2012 v8540
+ 2012-09-26 v8540
- 09/20/2012 v8490
+ 2012-09-20 v8490
- Fix the following bugs:
@@ -448,7 +470,7 @@
MS-GF+ ChangeLog
- 09/18/2012 v8477
+ 2012-09-18 v8477
- The extension of target/decoy concatenated database file has changed from .revConcat.fasta to .revCat.fasta.
@@ -457,7 +479,7 @@ MS-GF+ ChangeLog
- 09/18/2012 v8472
+ 2012-09-18 v8472
- Bug fix: N-term or C-term residue-specific modifications (e.g. pyro-glu from Q) will have locations 1 (N-term) or length (C-term).
@@ -465,14 +487,14 @@ MS-GF+ ChangeLog
- 09/17/2012 v8449
+ 2012-09-17 v8449
- Fix a bug in parsing Agilent mzML files
- 09/13/2012 v8442
+ 2012-09-13 v8442
- Fix the bug ignoring PSMs with IsotopeError=0, when MinIsotopeError was negative
@@ -486,7 +508,7 @@ MS-GF+ ChangeLog
- 08/30/2012 v8299
+ 2012-08-30 v8299
@@ -496,7 +518,7 @@ MS-GF+ ChangeLog
- 08/29/2012 v8297
+ 2012-08-29 v8297
- Fix minor bugs
@@ -504,7 +526,7 @@ MS-GF+ ChangeLog
- 08/27/2012 v8283
+ 2012-08-27 v8283
- Fix minor bugs
diff --git a/doc/MS-GFDB.html b/doc/MS-GFDB.html
index f09aa35d..a7608650 100644
--- a/doc/MS-GFDB.html
+++ b/doc/MS-GFDB.html
@@ -226,7 +226,7 @@ Parameters:
-mod ModificationFile (Default: standard amino acids with fixed C+57)]
- Modification file name. ModificationFile contains the modifications to be considered in the search.
- - If -mod option is not specified, standard amino acids with fixed Carboamidomethylation C will be used.
+ - If -mod option is not specified, standard amino acids with fixed Carbamidomethylation C will be used.
- See an example MS-GFDB modification file.
diff --git a/doc/MSGFDB_ModFile.html b/doc/MSGFDB_ModFile.html
index 07dd24ef..4ea659c4 100644
--- a/doc/MSGFDB_ModFile.html
+++ b/doc/MSGFDB_ModFile.html
@@ -9,6 +9,12 @@
MS-GFDB Modification File Example
+
+ This mod file was used by MS-GFDB, an old application that is no longer under development.
+ It was superseded by MS-GF+, which supports a
+ modification file with additional features
+
+
- # This file is used to specify modifications
- # # for comments
- #
- # Max Number of Modifications per peptide (default 2). If this value is large, the search takes long.
- NumMods=2
-
- # To input a modification, use the following command:
- # Mass or CompositionStr, Residues, ModType, Position, Name (all the five fields are required).
- # CompositionStr (C[Num]H[Num]N[Num]O[Num]S[Num]P[Num])
- # - C (Carbon), H (Hydrogen), N (Nitrogen), O (Oxygen), S (Sulfer) and P (Phosphorus) are allowed.
- # - Atom can be omitted. The sequence of atoms must be followed.
- # - Negative numbers are allowed.
- # - E.g. C2H2O1 (valid), H2C1O1 (invalid)
- # Mass can be used instead of CompositionStr. It is important to specify accurate masses (integer masses are insufficient).
- # - E.g. 15.994915
- # Residues: affected amino acids (must be upper letters)
- # - Must be upper letters or *
- # - Use * if this modification is applicable to any residue.
- # - * should not be "anywhere" modification (e.g. "15.994915, *, opt, any, Oxidation" is not allowed.)
- # - E.g. NQ, *
- # ModType: "fix" for fixed modifications, "opt" for variable modifications (case insensitive)
- # Position: position in the peptide where the modification can be attached.
- # - One of the following five values should be used:
- # - any (anywhere), N-term (peptide N-term), C-term (peptide C-term), Prot-N-term (protein N-term), Prot-C-term (protein C-term)
- # - Case insensitive
- # - "-" can be omitted
- # - E.g. any, Any, Prot-n-Term, ProtNTerm => all valid
- # Name: name of the modification
- # - E.g. Oxidation
-
- C2H3N1O1,C,fix,any,Carbamidomethylation # Fixed Carbamidomethyl C
- # Variable Modifications (default: none)
- O1,M,opt,any,Oxidation # Oxidation M
- #15.994915,M,opt,any,Oxidation # Oxidation M (mass is used instead of CompositionStr)
- #H-1N-1O1,NQ,opt,any,Deamidation # Negative numbers are allowed.
- #C2H3NO,*,opt,N-term,Carbamidomethylation # Variable Carbamidomethyl N-term
- #H-2O-1,E,opt,N-term,Pyro-glu # Pyro-glu from E
- #H-3N-1,Q,opt,N-term,Pyro-glu # Pyro-glu from Q
- #C2H2O,*,opt,Prot-N-term,Acetylation # Acetylation Protein N-term
- #C2H2O1,K,opt,any,Acetylation # Acetylation K
- #CH2,K,opt,any,Methylation # Methylation K
- #H3O4P,STY,opt,any,Phosphorylation # Phosphorylation STY
-
+
+
# This file is used to specify modifications
+# Use # for comments, either at the start of a line or in the middle of a line
+
+# Max Number of Dynamic/Variable Modifications per peptide (default 2).
+# If this value is large, the search takes a long time.
+NumMods=2
+
+# To input a modification, use the following command:
+# Mass or CompositionStr, Residues, ModType, Position, Name (all the five fields are required).
+# CompositionStr (C[Num]H[Num]N[Num]O[Num]S[Num]P[Num])
+# - C (Carbon), H (Hydrogen), N (Nitrogen), O (Oxygen), S (Sulfur) and P (Phosphorus) are allowed.
+# - Atoms can be omitted, but the remaining atoms must keep the order shown above.
+# - Negative numbers are allowed.
+# - E.g. C2H2O1 (valid), H2C1O1 (invalid)
+# Mass can be used instead of CompositionStr. It is important to specify accurate monoisotopic masses (integer masses are insufficient).
+# - E.g. 15.994915
+# Residues: affected amino acids (must be uppercase letters)
+# - Must be uppercase letters or *
+# - Use * if this modification is applicable to any residue.
+# - * cannot be combined with the "any" (anywhere) position (e.g. "15.994915, *, opt, any, Oxidation" is not allowed.)
+# - E.g. NQ, *
+# ModType: "fix" for fixed modifications, "opt" for variable modifications (case insensitive)
+# Position: position in the peptide where the modification can be attached.
+# - One of the following five values should be used:
+# - any (anywhere), N-term (peptide N-term), C-term (peptide C-term), Prot-N-term (protein N-term), Prot-C-term (protein C-term)
+# - Case insensitive
+# - "-" can be omitted
+# - E.g. any, Any, Prot-n-Term, ProtNTerm are all valid
+# Name: name of the modification (Unimod PSI-MS name)
+# - E.g. Oxidation
+
+# Static (fixed) modifications:
+C2H3N1O1,C,fix,any,Carbamidomethylation # Fixed Carbamidomethyl C
+
+# Variable Modifications (default: none)
+O1,M,opt,any,Oxidation # Oxidation M
+
+# Additional Modification Examples
+# C2H3N1O1,C,fix,any,Carbamidomethyl # Fixed Carbamidomethyl C (alkylation)
+# O1,M,opt,any,Oxidation # Oxidation M
+# 15.994915,M,opt,any,Oxidation # Oxidation M (mass is used instead of CompositionStr)
+# H-1N-1O1,NQ,opt,any,Deamidated # Negative numbers are allowed.
+# CH2,K,opt,any,Methyl # Methylation K
+# C2H2O1,K,opt,any,Acetyl # Acetylation K
+# HO3P,STY,opt,any,Phospho # Phosphorylation STY
+# C2H3NO,*,opt,N-term,Carbamidomethyl # Variable Carbamidomethyl N-term
+# H-2O-1,E,opt,N-term,Glu->pyro-Glu # Pyro-glu from E
+# H-3N-1,Q,opt,N-term,Gln->pyro-Glu # Pyro-glu from Q
+# C2H2O,*,opt,Prot-N-term,Acetyl # Acetylation Protein N-term
+
+
+