-
Notifications
You must be signed in to change notification settings - Fork 3
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Assess nf-core/sarek #79
Comments
Tried the following commands that either start from beginning or start from variant calling nextflow run nf-core/sarek -profile singularity --outdir /global/scratch/users/hangxue/otter/sarek/cram_test --input /global/scratch/users/hangxue/otter/otter_cram.csv --genome null --igenomes_ignore --fasta /global/scratch/users/hangxue/otter/genomes/GCA_902655055.2_mLutLut1.2_genomic.fna --step variant_calling --skip_tools baserecalibrator --joint_germline --tools haplotypecaller [hangxue@n0133 hangxue]$ nextflow run nf-core/sarek -profile singularity --outdir /global/scratch/users/hangxue/otter/sarek/fastq_test --input /global/scratch/users/hangxue/otter/otter_fastq.csv --genome null --igenomes_ignore --fasta /global/scratch/users/hangxue/otter/genomes/Lutralutra_chr1.fna --skip_tools baserecalibrator --joint_germline --tools haplotypecaller Both have the same error. Troubleshooting in progress. -- Check script '/global/home/users/hangxue/.nextflow/assets/nf-core/sarek/./workflows/sarek/../../subworkflows/local/bam_ |
I think the above error is related to haplotypecaller because the following command is able to start fine. Note that The GATK's Haplotypecaller, Sentieon's Dnascope or Sentieon's Haplotyper should be specified as one of the tools when doing joint germline variant calling. Troubleshooting in progress. nextflow run nf-core/sarek -profile singularity --outdir /global/scratch/users/hangxue/otter/sarek/fastq_test --input /global/scratch/users/hangxue/otter/otter_fastq.csv --genome null --igenomes_ignore --fasta /global/scratch/users/hangxue/otter/genomes/Lutralutra_chr1.fna --skip_tools baserecalibrator Update: Issue nf-core/sarek#1550 points out that using haplotypecaller without passing a --dbsnp will cause such error. However, this error is supposed to be fixed four years ago nf-core/sarek#182 |
Related discussions in sarek that talks about joint calling and why n+1 requires rerunning the haplotypecaller for all samples. nf-core/sarek#755 and nf-core/sarek#868 and nf-core/sarek#1172 tldr: when no interval is true, joint germline variant calling will not be performed. When all intervals are produced in one DB, sarek should be able to produce per sample gvcf file and then joint calling. However, if one DB is used per interval, it is quite some work to merge/organize the DBs for n+1. |
Right, I see. In other words, if the genome is small enough / the runtime is reasonable, then variant-calling could be done on the entire genome at once, i.e. without intervals, and then we'd naturally get gVCFs per sample. I don't have any runtime data, but I guess the option to do intervals was introduced because calling variants on a 3 Gbp genome may take a while ? My gut feeling is that we'll face at some point a genome that's large enough and causing a bottleneck Re. the dbsnp issue, does the suggested workaround work for you ? |
I have not modified the code yet. Not sure if we want to fork the repo and make changes. Using a different caller, sentieon_haplotyper, is able to start the pipeline. sentieon_haplotyper should also be able to produce gvcf file. See nf-core/sarek#1007 [hangxue@n0028 hangxue]$ nextflow run nf-core/sarek -profile singularity --outdir /global/scratch/users/hangxue/otter/sarek/fastq_test --input /global/scratch/users/hangxue/otter/otter_fastq.csv --genome null --igenomes_ignore --fasta /global/scratch/users/hangxue/otter/genomes/Lutralutra_chr1.fna --skip_tools baserecalibrator --joint_germline --tools sentieon_haplotyper --sentieon_haplotyper_emit_mode gvcf Update: The suggested workaround by modifying line 132 and 133 in sarek/subworkflows/local/bam_variant_calling_germline_all/main.nf is able to start the haplotypecaller without passing a --dbsnp nextflow run sarek -profile singularity --outdir /global/scratch/users/hangxue/otter/sarek/fastq_test --input /global/scratch/users/hangxue/otter/otter_fastq.csv --genome null --igenomes_ignore --fasta /global/scratch/users/hangxue/otter/genomes/Lutralutra_chr1.fna --skip_tools baserecalibrator --joint_germline --tools haplotypecaller |
Update for n+1 problem: wtsi-hgi/sarek gives an entry point between GATK Haplotypecaller and GATK genomicsDBimport. It seems it will require path(gvcf), path(intervals), path(gvcf_index), path(intervals_index) as the input for starting at GATK genomicsDBimport. |
Converting a single Gvcf Files Into Vcf using gvcftools Extract a subset of samples from a multi-sample VCF |
Update for sarek test run on otter cram data. See output at https://docs.google.com/presentation/d/1lo6wNoBlJaQJ8PfIUb7mtqX1CEn3Sj-gj4vgpVN04OY/edit#slide=id.g28127127451_0_29 In summary, sarek outputs g.vcf.gz files per individual and a joint folder containing vcf.gz in the final output folder. During intermediate steps, it has g.vcf.gz files per individual per interval and a joint calling vcf.gz file per interval |
FYI this is now fixed in release 3.4.4 of sarek |
We need to review what https://nf-co.re/sarek can do to determine:
In particular, we are interested in:
Task list
Test run 1
GCA_902655055.2_mLutLut1.2_genomic.fna
as reference.dbsnp.map{ it -> [[:], []] }, dbsnp_tbi.map{ it -> [[:], []] }
in sarek/subworkflows/local/bam_variant_calling_germline_all/main.nf to avoid errors for organisms without dbsnpSummary
path(gvcf), path(intervals), path(gvcf_index), path(intervals_index)
as the input for starting at GATK genomicsDBimport.--genome null --igenomes_ignore --skip_tools baserecalibrator
.haplotypecaller_vc
for per-individual variant-calling, andhaplotypecaller_jc
for joint calling (as opposite tohaplotypecaller
in the nf-core sarek, to run all at once).Next developments
Based on the tests above, to use sarek, we would want to add:
bcftools mpileup
The text was updated successfully, but these errors were encountered: