Tool for piping inputs and outputs of multiple CLI tools. Pigeon takes only a config file as input; everything required to run the pipeline is specified in the config file. The config file follows the Python configparser format.
sudo pip3 install --index-url https://test.pypi.org/simple/ pigeon
pip install --index-url https://test.pypi.org/simple/ pigeon
None of the tools or data files are supplied by pigeon, so they need to be downloaded separately. For the example configuration file (an exome sequencing pipeline), you need:
- Tools
- Reference files
  - Reference genome and known SNPs & INDELs (hg19 or hg38)
  - BED file (see the website of the capture kit used in sequencing)
Create a configuration file for yourself:
pigeon createconfig
Modify it for your analysis (see below):
pigeon -c my_config.conf -d
If everything looks alright, run it for real:
pigeon -c my_config.conf
The config file consists of three parts:
- General
- Pipeline
- Individual tool blocks
This area is used to define the project name, output directory, input files, and resource files such as the reference genome or target file. The following variables are required for a run.
Required:
- project_name : name of your project
- output_dir : where to write output files
- input_files : input files for the analysis, space separated; pairs should be next to each other, e.g.
input_files = A.txt B.txt C.txt
or, for paired input,
input_files = A1.txt A2.txt B1.txt B2.txt C1.txt C2.txt
Optional variables can also be declared here, based on your needs or the tools' requirements. These variables can later be referenced elsewhere in the config file using ${GENERAL:optional_variable} (see the sketch after the examples below).
Optional (example):
reference_genome = /path/to/my/reference_genome.fa
bed_file = /path/to/my/target.bed
known_snp = /path/to/my/snp.vcf
my_database = /path/to/my/favorite.db
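For instance, a tool block defined later in the config could pull these values in through interpolation. The flags below are hypothetical and only illustrate the ${GENERAL:...} syntax together with the placeholder words described further down:
args = -r ${GENERAL:reference_genome} -d ${GENERAL:my_database} -i input_placeholder -o output_placeholder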
This area should contain paths to the tools, in a form understandable by your shell, as well as the run order of the tools, e.g.
pipeline = job1 job2 job3
A = path/to/A
B = path/to/B
C = path/to/C
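Put together, the PIPELINE section of this example would look like the following config block (paths are illustrative):
[PIPELINE]
pipeline = job1 job2 job3
A = path/to/A
B = path/to/B
C = path/to/C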
The name of each block should be the same as in the pipeline. Continuing the example above:
[job1]
[job2]
[job3]
The arguments that can be used in these blocks are as follows (a combined example is shown after the list):
- tool: the tool variable from the PIPELINE block, e.g.
tool = ${PIPELINE:A}
- sub_tool: if the tool has a sub-tool like 'bwa mem', e.g.
sub_tool = mem
- args: arguments of the tool
- java: if the tool is a jar file, this adds java -jar before it.
- pass: if True, the block won't run, but it will still be part of the pipeline. This option is helpful for resuming an interrupted pipeline.
- input_from: name of the block whose output is this job's input. The first job's input_from should be input_files.
- input_multi: can be 'paired' or 'all'. The paired option splits the input file stream into groups of two; the all option uses all of the input files.
- input_flag_repeat: if the tool requires an input flag for each input, this adds the given flag before each input.
- secondary_in_placeholder
- suffix: adds a suffix to the output file name
- ext: file extension of the output
- dump_dir: creates a directory and writes the output there.
- paired_output: this option pairs the input and the output of the tool.
- secondary_out_placeholder
- secondary_suffix
- secondary_ext
- secondary_dump_dir
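As a sketch of how several of these options combine, here is a hypothetical block for a paired-end aligner. The block name, the aligner entry in the PIPELINE block, and the flags in args are assumptions for illustration only; the block name (align) would also have to be listed in pipeline.
[align]
# tool path comes from a hypothetical 'aligner = /path/to/aligner' entry in the PIPELINE block
tool = ${PIPELINE:aligner}
sub_tool = mem
# take the raw input files, consuming the stream in groups of two (paired-end reads)
input_from = input_files
input_multi = paired
# flags are illustrative; the placeholders are replaced by pigeon at run time
args = ${GENERAL:reference_genome} input_placeholder -o output_placeholder
suffix = aligned
ext = sam
dump_dir = alignments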
These are joker (placeholder) words that can be used in args:
- input_placeholder
- secondary_input_placeholder
- output_placeholder
- secondary_output_placeholder
A complete example config file:
[GENERAL]
project_name = my_project
output_dir = /path/to/output_directory
input_files = A.txt B.txt C.txt
my_db = /path/to/my.db
[PIPELINE]
pipeline = job1 job2 job3
A = /path/to/A
B = /path/to/B
C = /path/to/C
[job1]
tool = A
input_from = input_files
args = -i input_placeholder -o output_placeholder
suffix = job1_A
ext = txt
[job2]
tool = B
input_from = job1
args = -i input_placeholder -o output_placeholder
suffix = job2_B
ext = txt
[job3]
tool = C
input_from = job2
args = -i input_placeholder -o output_placeholder
suffix = job3_C
ext = txt
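Conceptually, for the input file A.txt the example above expands to commands along the following lines. The output file names are not spelled out here, since the exact naming comes from pigeon's suffix and ext handling; the angle-bracketed parts are placeholders, not literal paths.
# job1 runs tool A on the raw input files
/path/to/A -i A.txt -o <output of job1 in /path/to/output_directory>
# job2 has input_from = job1, so it consumes job1's output
/path/to/B -i <output of job1> -o <output of job2>
# job3 consumes job2's output in the same way
/path/to/C -i <output of job2> -o <output of job3>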