forked from tianshilu/QBRC-Somatic-Pipeline
-
Notifications
You must be signed in to change notification settings - Fork 1
/
README
44 lines (39 loc) · 3.12 KB
/
README
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
# Works for human and mouse
# somatic mutation calling with matched normal sample is very stingent
# germline mutation calling and somatic mutation calling without matched normal is loose
somatic.pl: pipeline for somatic and germline mutation calling for each sample
job_somatic.pl: biohpc submission wrapper for somatic.pl for a batch of samples
filter.R: post-processing script for somatic mutations for a batch of samples
cnv.pl: pipeline for somatic CNV calling and quality check for each sample
job_cnv.pl: biohpc submission wrapper for cnv.pl for a batch of samples
summarize_cnv.R: summarizing script for CNV and quality check callings for a batch of samples
evolution.pl: inferring evolutionary relationships using mutation and CNV results for a batch of samples from one patient
Updates:
12/13/2018: Enabled mouse mutation and cnv calling
01/09/2019: Included COSMIC consensus gene annotation in the mutation summary tables
01/28/2019: Fixed the protein sequence annotation issue
03/31/2019: Added the evolution.pl script
04/15/2019: Added handling of single end sequencing data
04/21/2019: Added lofreq results to germline mutation calling
04/24/2019: Changed tumor-only mutation calling to a mode that is super sensitive but not specific
07/10/2019: Only somatic mutation results will be filtered by population frequency
07/18/2019: For tumor-only calling, now switched to strelka
08/08/2019: Some adjustments were made to reduce the hard disk footprint of the pipeline
08/19/2019: Added "--outFilterMatchNminOverLread 0.4 --outFilterScoreMinOverLread 0.4" to STAR alignment parameters
08/27/2019: Added another option to decide whether to keep per-base coverage information
05/27/2020: Added a hidden parameter to control for the maximum number of reads to read in the cnv calling
06/22/2020: Switched to disambiguate.py for disambiguating human and mouse reads for PDX samples
07/24/2020: Added handling of PDX samples to the cnv.pl pipeline
08/25/2020: With the updated Annovar version, protein annotation correction is no longer needed
11/18/2020: Changed to weighted average when calculating CNV from tumor.cns files
11/19/2020: Now the disambiguate pipeline will only process the tumor, not the normal sample
02/05/2021: Switched from manta/1.4 to manta/1.6
03/10/2021: If a mutation is found by >=3 software, or (>=2 software + cosmic), it will be kept.
03/19/2021: Now intermediate files for varscan germline mutation calling will be kept
04/09/2021: Removed the quality check of the mpileup files for low read counts, as amplicon sequencing data normally have low read counts
04/12/2021: Added error handling for strelka_germline when no valid mutations are called
10/25/2021: Made sure when the ref or alt alleles are all "T", the alleles are interpreted as strings, not boolean values
01/12/2022: Disabled picard sorting in the case of ultra-deep sequencing. So locatit will have less chance of memory overflow
01/20/2022: Specified memory usage for strelka
01/21/2022: Fixed a bug in reading vcf files of lofreq output, which seems to be causing a problem in newer versions of R
01/25/2022: Writing stderr and stdout to job folders