/**
BARtab: A Nextflow pipeline to tabulate synthetic barcode counts from NGS data.
Authors: Dane Vassiliadis, Henrietta Holze
Affiliation: Peter MacCallum Cancer Centre, Melbourne, Australia
**/
nextflow.enable.dsl = 2
//--------------------------------------------------------------------------------------
// Help message
//--------------------------------------------------------------------------------------
// ASCII logo generated with https://www.coolgenerator.com/ascii-text-generator (font: Delta Corps Priest 1)
logo = """
▀█████████▄ ▄████████ ▄████████ ███ ▄████████ ▀█████████▄
███ ███ ███ ███ ███ ███ ▀█████████▄ ███ ███ ███ ███
███ ███ ███ ███ ███ ███ ▀███▀▀██ ███ ███ ███ ███
▄███▄▄▄██▀ ███ ███ ▄███▄▄▄▄██▀ ███ ▀ ███ ███ ▄███▄▄▄██▀
▀▀███▀▀▀██▄ ▀███████████ ▀▀███▀▀▀▀▀ ███ ▀███████████ ▀▀███▀▀▀██▄
███ ██▄ ███ ███ ▀███████████ ███ ███ ███ ███ ██▄
███ ███ ███ ███ ███ ███ ███ ███ ███ ███ ███
▄█████████▀ ███ █▀ ███ ███ ▄████▀ ███ █▀ ▄█████████▀
███ ███
"""
def helpMessage() {
    log.info logo + """
    ---------------------- Tabulate Barcode Counts in NGS data ----------------------
    Version = 1.4.0

    Usage: nextflow run danevass/bartab --indir <input dir>
                                        --outdir <output dir>
                                        --ref <path/to/reference/fasta>
                                        --mode <single-bulk | paired-bulk | single-cell>

    Input/output arguments:
      --indir                  Directory containing input *.fastq.gz files.
                               Must contain R1 and R2 files when running in paired-bulk or single-cell mode.
                               In single-cell mode, the directory can contain BAM files instead.
      --input_type             Input file type, fastq or bam. Only relevant for single-cell mode [default = fastq]
      --ref                    Path to a reference fasta file for the barcode / sgRNA library.
                               If not set, the reference-free workflow is used in single-bulk and paired-bulk modes.
      --mode                   Workflow to run <single-bulk, paired-bulk, single-cell>
      --outdir                 Output directory [default = './']

    Read merging arguments:
      --mergeoverlap           Length of overlap required to merge paired-end reads [default = 10]

    Filtering arguments:
      --minqual                Minimum PHRED quality per base [default = 20]
      --pctqual                Percentage of bases within a read that must meet --minqual [default = 80]
      --complexity_threshold   Complexity filter: minimum percentage of bases that differ from their
                               next base (base[i] != base[i+1]) [default = 0]

    Trimming arguments:
      --constants              Constant region(s) flanking the barcode to search for in reads: up, down or both.
                               "all" runs all three modes and combines the results.
                               <up, down, both, all> [default = 'up']
      --upconstant             Sequence of the upstream constant region
                               [default = 'CGATTGACTA', the SPLINTR 1st gen upstream constant region]
      --downconstant           Sequence of the downstream constant region
                               [default = 'TGCTAATGCG', the SPLINTR 1st gen downstream constant region]
      --up_coverage            Number of bases of the upstream constant that must be covered by the sequence [default = 3]
      --down_coverage          Number of bases of the downstream constant that must be covered by the sequence [default = 3]
      --constantmismatches     Proportion of mismatched bases allowed in constant regions [default = 0.1]
      --min_readlength         Minimum length of barcode sequence [default = 20]
      --barcode_length         Optional. Length of barcode if it is the same for all barcodes.
                               If constant regions are trimmed on both ends, reads are filtered for this length.
                               If only one constant region is trimmed, this is the maximum sequence length.
                               If barcode_length is set, alignments to the middle of a barcode sequence are filtered out.

    Mapping arguments:
      --alnmismatches          Number of allowed mismatches during reference mapping [default = 2]
      --barcode_length         (see trimming arguments)
      --cluster_unmapped       Cluster unmapped reads with starcode [default = false]

    Reference-free arguments:
      --cluster_distance       Levenshtein distance for clustering lineage barcodes [default = 3]
      --cluster_ratio          Cluster ratio for message passing clustering.
                               A cluster of barcode sequences can absorb a smaller one only if it is
                               at least x times bigger [default = 3]

    Single-cell arguments:
      --cb_umi_pattern         Cell barcode and UMI pattern on read 1, required for fastq input.
                               N = UMI position, C = cell barcode position [default = CCCCCCCCCCCCCCCCNNNNNNNNNNNN]
      --cellnumber             Number of cells expected in the sample. Only required for fastq input.
                               Mutually exclusive with --whitelist_indir.
      --whitelist_indir        Directory containing a cell ID whitelist for each sample, named <sample_id>_whitelist.tsv
      --umi_dist               Hamming distance between UMIs to be collapsed during counting [default = 1]
      --umi_count_filter       Minimum number of UMIs per barcode per cell [default = 1]
      --umi_fraction_filter    Minimum fraction of UMIs per barcode per cell compared to the dominant barcode
                               in the cell (barcode supported by most UMIs) [default = 0.3]
      --pipeline               Set to 'saw' if the input fastq files were created by the SAW pipeline

    Resources:
      --max_cpus               Maximum number of CPUs [default = 6]
      --max_memory             Maximum memory [default = "14.GB"]
      --max_time               Maximum time [default = "40.h"]

    Optional arguments:
      -profile                 Configuration profile to use. Multiple profiles can be given (comma separated).
                               Available: conda, singularity, docker, slurm, lsf
      --email                  Email address to send the execution report to [default = '']
      --help                   Print this help statement.

    Authors:
      Dane Vassiliadis ([email protected])
      Henrietta Holze ([email protected])
    """
}
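// Example invocation (a sketch only: the input directory, reference fasta and output
// directory below are hypothetical placeholders; choose the -profile that matches your system):
//
//   nextflow run danevass/bartab -profile docker \
//       --mode paired-bulk \
//       --indir fastq/ \
//       --ref barcode_library.fasta \
//       --outdir results/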
//--------------------------------------------------------------------------------------
// Preflight checks
//--------------------------------------------------------------------------------------
// Show help message
if (params.help) {
    helpMessage()
    exit 0
}
if (!params.mode) {
    error "Error: please set parameter --mode <single-bulk,paired-bulk,single-cell>."
}
if (!["single-bulk", "paired-bulk", "single-cell"].contains(params.mode)) {
    error "Error: unsupported value for parameter --mode. Choose one of single-bulk, paired-bulk or single-cell."
}
if (params.input_type != "fastq" && params.input_type != "bam") {
    error "Error: please choose a valid value for --input_type <fastq,bam>."
}
if (params.mode != "single-cell" && params.input_type == "bam") {
    error "Error: bulk workflows do not accept BAM file input."
}
if (!params.indir) {
    error "Error: please provide the location of input files via parameter --indir."
}
if (!params.outdir) {
    error "Error: please specify the location of the output directory via parameter --outdir."
}
if (!["up", "down", "both", "all"].contains(params.constants)) {
    error "Error: unsupported value for parameter --constants. Choose one of up, down, both or all (default: up)."
}
if (params.constants == "both" && params.barcode_length && params.min_readlength) {
    log.warn "min_readlength=${params.min_readlength} will be ignored because barcode_length=${params.barcode_length} and constants=${params.constants}. Reads will be filtered to match the exact barcode length."
}
if (params.mode == "single-cell" && params.input_type == "fastq" && params.pipeline != "saw" && !params.whitelist_indir && !params.cellnumber) {
    error "Error: please provide either a whitelist (--whitelist_indir) or the expected number of cells (--cellnumber) for cell ID and UMI extraction."
}
//--------------------------------------------------------------------------------------
// Pipeline Config
//--------------------------------------------------------------------------------------
// setup run info for logging
log.info ""
log.info logo
log.info ""
log.info " ---------------------- Tabulate Barcode Counts in NGS data ----------------------"
log.info " Version = 1.4.0"
log.info ""
log.info " Run parameters: "
log.info " ========================"
log.info " Mode                     : ${params.mode}"
log.info " Input directory          : ${params.indir}"
log.info " Input type               : ${params.input_type}"
if (params.whitelist_indir) {
    log.info " Whitelist directory      : ${params.whitelist_indir}"
}
log.info " Output directory         : ${params.outdir}"
if (params.ref) {
    log.info " Reference fasta          : ${params.ref}"
    log.info " Cluster unmapped         : ${params.cluster_unmapped}"
}
if (params.mode == "paired-bulk") {
    log.info " Merge overlap            : ${params.mergeoverlap}"
}
if (params.mode != "single-cell" || (params.input_type == "fastq" && params.pipeline != "saw")) {
    log.info " Minimum PHRED quality    : ${params.minqual}"
    log.info " Quality percentage       : ${params.pctqual}%"
    log.info " Complexity threshold     : ${params.complexity_threshold}%"
}
log.info " Upstream constant        : ${params.upconstant}"
log.info " Downstream constant      : ${params.downconstant}"
log.info " Upstream coverage        : ${params.up_coverage}"
log.info " Downstream coverage      : ${params.down_coverage}"
log.info " Constants to use         : ${params.constants}"
log.info " Constant mismatches      : ${params.constantmismatches}"
log.info " Min. barcode read length : ${params.min_readlength}"
if (params.barcode_length) {
    log.info " Barcode length           : ${params.barcode_length}"
}
if (params.ref || params.cluster_unmapped) {
    log.info " Alignment mismatches     : ${params.alnmismatches}"
} else {
    log.info " Cluster distance         : ${params.cluster_distance}"
    log.info " Cluster ratio            : ${params.cluster_ratio}"
}
if (params.mode == "single-cell" && params.pipeline != "saw") {
    log.info " UMI distance             : ${params.umi_dist}"
    log.info " UMI count filter         : ${params.umi_count_filter}"
    log.info " UMI fraction filter      : ${params.umi_fraction_filter}"
}
if (params.mode == "single-cell" && params.input_type == "fastq" && params.pipeline != "saw") {
    log.info " Cell barcode UMI pattern : ${params.cb_umi_pattern}"
}
if (params.mode == "single-cell" && params.input_type == "fastq" && !params.whitelist_indir && params.pipeline != "saw") {
    log.info " Cell number              : ${params.cellnumber}"
}
log.info " Email                    : ${params.email}"
log.info " ========================"
log.info ""
//--------------------------------------------------------------------------------------
// Named workflow for pipeline
//--------------------------------------------------------------------------------------
include { SINGLE_CELL } from './workflows/single_cell'
include { BULK } from './workflows/bulk'
workflow {
    if (params.mode == "single-bulk") {
        println "Running single-end bulk workflow"
        println ""
        BULK()
    }
    else if (params.mode == "paired-bulk") {
        println "Running paired-end bulk workflow"
        println ""
        BULK()
    }
    else if (params.mode == "single-cell") {
        println "Running single-cell workflow"
        println ""
        SINGLE_CELL()
    }
}
//--------------------------------------------------------------------------------------
// Post processing
//--------------------------------------------------------------------------------------
// Mail notification
if (!params.email) {
    log.info '\n'
}
else {
    log.info "\n"
    log.info "Sending runtime report to ${params.email}\n"
    workflow.onComplete {
        def msg = """\
            Pipeline execution summary
            ---------------------------
            Completed at: ${workflow.complete}
            Duration    : ${workflow.duration}
            Success     : ${workflow.success}
            workDir     : ${workflow.workDir}
            exit status : ${workflow.exitStatus}
            Error report: ${workflow.errorReport ?: '-'}
            """.stripIndent()
        sendMail(to: params.email, subject: "BARtab execution report", body: msg, attach: "${params.outdir}/multiqc_report.html")
    }
}
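// Note on email delivery: sendMail() uses the SMTP settings from the `mail` scope of the
// Nextflow configuration when they are provided; otherwise Nextflow falls back to a mail
// command available on the system (e.g. sendmail). A minimal sketch of such a configuration
// for nextflow.config, assuming an SMTP server is reachable (host, port and user are
// hypothetical placeholders):
//
//   mail {
//       smtp.host = 'smtp.example.org'
//       smtp.port = 587
//       smtp.user = 'my-user'
//   }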
// Print completion messages
workflow.onComplete {
    // ANSI colour codes for the status line
    def RED   = '\033[0;31m'
    def GREEN = '\033[0;32m'
    def NC    = '\033[0m'
    log.info ""
    log.info " ---------------------- BARtab pipeline has finished ----------------------"
    log.info ""
    log.info "Status: " + (workflow.success ? "${GREEN}SUCCESS${NC}" : "${RED}ERROR${NC}")
    log.info "Pipeline completed at: ${workflow.complete}"
    log.info "Pipeline runtime: ${workflow.duration}\n"
}