Workflow: pxd_to_speclib

Snakemake workflow that reanalyzes proteomics data from a PRIDE Archive project to create a spectral library.

The workflow goes through the following steps:

1. Download RAW files from PRIDE Archive for the given PXD identifier
2. Convert RAW files to MGF using the CompOmics ThermoRawFileParser
3. Search with MSGFPlus
4. Generate Percolator input files
5. Postprocess search results with Percolator
6. Parse results to generate a spectral library

Requirements

Conda (tested on Linux)

Run the workflow

Create and activate the environment:

conda env create -f envs/pxd_to_speclib.yml
conda activate pxd_to_speclib

Set up your configuration:
- conf/snakemake_config.json (see Configuration)
- conf/msgfplus_params.txt
- Add required input files (e.g. a FASTA sequence database)

Run the workflow:
- To create a general spectral library: snakemake . --use-conda
- To create a calibrated retention time dataset: snakemake --snakefile make_rt_lib.smk --use-conda

Configuration

All settings can be found in a JSON configuration file: conf/snakemake_config.json.
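
As a minimal sketch, the snippet below writes out a configuration using the default values listed in the table that follows. The nesting of options under their sections is an assumption based on the "section > option" notation used in the notes; check it against the conf/snakemake_config.json shipped with the workflow before relying on it.

```python
import json

# Default values copied from the configuration table.
# Assumption: each option is nested under its section name.
config = {
    "download": {
        "pxd_identifier": "PXD000000",
        "file_pattern": ".*",
    },
    "convert": {
        "exec": "ThermoRawFileParser.sh",
    },
    "search": {
        "msgfplus_conf": "conf/msgfplus_params.txt",
        "fasta": "path/to/search_db.fasta",
        "msgfplus_exec": "msgf_plus",
        "threads_per_search": 5,
    },
}

# Serialize to JSON text; redirect or write this to conf/snakemake_config.json.
config_text = json.dumps(config, indent=4)
print(config_text)
```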

| Section | Option | Default value | Description |
| --- | --- | --- | --- |
| download | pxd_identifier | "PXD000000" | PXD identifier of PRIDE Archive project to download. |
| | file_pattern | ".*" | Regular expression that matches all raw file filenames to download (`.*` matches all filenames). |
| convert | exec | "ThermoRawFileParser.sh" | Executable command to call ThermoRawFileParser. See Note 1. |
| search | msgfplus_conf | "conf/msgfplus_params.txt" | Path to MSGFPlus configuration file. |
| | fasta | "path/to/search_db.fasta" | Path to protein fasta. Important: MSGFPlus will add decoy peptides by default; they should not yet be present in the given fasta file. |
| | msgfplus_exec | "msgf_plus" | Executable command to call MSGFPlus. See Note 2. |
| | threads_per_search | 5 | Number of threads per MSGFPlus search. See Note 3. |
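
To illustrate how a file_pattern value narrows the download, here is a small sketch. The helper select_raw_files and the filenames are hypothetical, and matching from the start of the filename with re.match is an assumption; the workflow itself may apply the pattern differently.

```python
import re

def select_raw_files(filenames, file_pattern=".*"):
    """Return the filenames matched by file_pattern (hypothetical helper)."""
    regex = re.compile(file_pattern)
    # Assumption: the pattern is anchored at the start of the filename.
    return [name for name in filenames if regex.match(name)]

# Hypothetical filenames, for illustration only.
files = ["sample_A_01.raw", "sample_A_02.raw", "sample_B_01.raw", "README.txt"]

# The default pattern keeps every file.
all_files = select_raw_files(files)

# A narrower pattern keeps only the raw files of sample A.
sample_a = select_raw_files(files, r"sample_A_.*\.raw")
print(sample_a)
```

Testing a pattern this way against the project's file listing before starting the workflow can avoid downloading more data than needed.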

Note 1

ThermoRawFileParser executable
Set convert > exec according to the ThermoRawFileParser installation you want to use:

Using environment TRFP: "ThermoRawFileParser.sh"
Using custom TRFP: "mono /path/to/ThermoRawFileParser.exe"

Note 2

MSGFPlus executable
Likewise, set search > msgfplus_exec according to the MSGFPlus installation you want to use:

Using environment MSGFPlus: "msgf_plus"
Using custom jar file: "java -Xmx5000M -jar /path/to/MSGFPlus/MSGFPlus.jar"

The latter lets you set a custom memory limit for the Java VM (5000 MB in this example, via -Xmx5000M). By default, this is 1 GB.
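
The custom command string can be assembled programmatically, for instance when the memory limit depends on the machine. The helper msgfplus_command below is hypothetical and only formats the string; -Xmx is the standard JVM option for the maximum heap size.

```python
def msgfplus_command(jar_path, max_heap_mb):
    """Build a msgfplus_exec value with an explicit Java heap limit (hypothetical helper)."""
    # -Xmx<N>M sets the maximum JVM heap size to N megabytes.
    return f"java -Xmx{max_heap_mb}M -jar {jar_path}"

# Reproduces the example value given above.
msgfplus_exec = msgfplus_command("/path/to/MSGFPlus/MSGFPlus.jar", 5000)
print(msgfplus_exec)
```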

Note 3

Indirectly limit memory usage while searching with threads_per_search
The option search > threads_per_search defines the number of threads each individual search can use. Combined with the snakemake --cores x option, it limits the number of searches that run in parallel. For example, --cores 24 with threads_per_search: 6 allows at most 4 parallel searches (24 / 6 = 4). This is useful if too many parallel searches cause memory issues.
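
The arithmetic can be sketched as follows. The function name is hypothetical, and the floor of one search assumes Snakemake's usual behaviour of scaling a job's threads down to the available cores.

```python
def max_parallel_searches(cores, threads_per_search):
    """Upper bound on concurrent searches for a given --cores value (hypothetical helper)."""
    # Integer division: each search reserves threads_per_search cores.
    # Assumption: if a search asks for more threads than there are cores,
    # Snakemake scales it down, so one search can always run.
    return max(1, cores // threads_per_search)

# The example from the text: --cores 24 with threads_per_search: 6.
print(max_parallel_searches(24, 6))

# With the default threads_per_search of 5, 24 cores still fit 4 searches.
print(max_parallel_searches(24, 5))
```

If each search needs roughly a fixed amount of memory, multiplying that amount by this bound gives a rough estimate of peak memory use.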
