From 07a9f0904b31046ae6db5c4f0a40c7a4c2af54f4 Mon Sep 17 00:00:00 2001 From: Jessica Way Date: Mon, 5 Feb 2024 11:53:03 -0500 Subject: [PATCH 01/68] Staging -> Master (#1163) * update paths to input files * update jg sample map * Km buildindices docs (#1158) * add buildindices overview doc and diagram * Km rnawithumis and ss2 doc updates (#1157) * update rnawithumis overview * Update rna-with-umis.methods.md * Update rna-with-umis.methods.md * update multi-snSS2 readme * Update multi_snss2.methods.md * Update multi_snss2.methods.md * update multi-snSS2 docs * update SS2 overview doc * fix python script link * Lk pd2448 upstools (#1150) Added paired-tag wrapper and demultiplexing task * PD-2435 Test bwa-mem2 step and run Intel distributed BWA-MEM2 (#1147) * Lk pd2453 add bb tag (#1161) Added option to incorporate BB tag in BAM and use it in SnapATAC2 software. * km paired-tag docs (#1165) * update overview docs - update pipeline version numbers in Multiome and Optimus Overview docs * update multi-SS2 overview doc * Update smart-seq2.methods.md * update multi-SS2 methods doc * Update doc_style.md * add paired-tag overview doc * Update website/docs/Pipelines/PairedTag_Pipeline/README.md Co-authored-by: ekiernan <55763654+ekiernan@users.noreply.github.com> * Apply suggestions from LK doc review Co-authored-by: ekiernan <55763654+ekiernan@users.noreply.github.com> --------- Co-authored-by: ekiernan <55763654+ekiernan@users.noreply.github.com> * Np jprb pd 2353 multiple star rsolo align (#1164) * integrate multiple soloFeatures * updating counting_mode definition * logic * count exons is true * count exons is true * fix the logic * count exons false * count exons false * echos * echos * echos * rearrange logic * rearrange * testing * testing * testing * count exons true * switch counting mode order * try running in scrna * clean up * snrna countexons is true * snrna countexons is true * snrna countexons is false * snrna countexons is true * snrna countexons is false * cleaning up * changelogs * changelogs * change cpu_platform to Intel Cascade Lake for sci test input * change cpu_platform to Intel Cascade Lake for sci test input * Update pipelines/skylab/smartseq2_single_nucleus_multisample/MultiSampleSmartSeq2SingleNucleus.changelog.md Co-authored-by: ekiernan <55763654+ekiernan@users.noreply.github.com> * Update pipelines/skylab/multiome/Multiome.changelog.md Co-authored-by: ekiernan <55763654+ekiernan@users.noreply.github.com> * Update pipelines/skylab/optimus/Optimus.changelog.md Co-authored-by: ekiernan <55763654+ekiernan@users.noreply.github.com> * Update pipelines/skylab/paired_tag/PairedTag.changelog.md Co-authored-by: ekiernan <55763654+ekiernan@users.noreply.github.com> * Update pipelines/skylab/slideseq/SlideSeq.changelog.md Co-authored-by: ekiernan <55763654+ekiernan@users.noreply.github.com> --------- Co-authored-by: Juan Pablo Ramos Barroso Co-authored-by: ekiernan <55763654+ekiernan@users.noreply.github.com> * Np update multiome sci test (#1167) * add summary task * change cpu_platform to Intel Cascade Lake for sci test input * change cpu_platform to Intel Cascade Lake for sci test input * change cpu_platform to Intel Cascade Lake for sci test input * Update VerifyTasks.wdl * Update VerifyTasks.wdl * PD-2422 BICAN_Optimus_2nymxis_Oct_2023 (#1152) * Np multimapper param starsolo (#1172) * add summary task * add multimapper option * update optimus plumbing for ease of testing * add echos * add to test * remove some echoes * make mouse snrna json go back to what is in dev * make 
mouse snrna json go back to what is in dev * add as outputs * typo * changelogs * changelogs * changelogs * update pipeline docs * optional output * optional output * optional output * optional output * docs * docs * docs * Update website/docs/Pipelines/Optimus_Pipeline/README.md Co-authored-by: Kaylee Mathews <95316074+kayleemathews@users.noreply.github.com> * remove optional input to tests --------- Co-authored-by: kayleemathews Co-authored-by: Kaylee Mathews <95316074+kayleemathews@users.noreply.github.com> * added exit code to CompareTabix (#1174) Updated the CompareTabix task in the Verify tasks * Lk pd 2452 add read length check (#1171) * adding read2 length and barcode orientation check task * Lk pd 2464 batch methylome (#1181) Added scatter and preemptibles to snM3C * Np edit resources needed for bwa task / add logic to compareBams (#1183) * add summary task * get a bigger machine * more memory * trying out east1 * go back to central and decrease mem and threads * try different zones * make machine smaller * smaller cpu * smaller cpu * more mem * more mem * 2000 disk * more mem compare bams * more mem compare bams * more mem compare bams * no zones * more mem in comparebams * more mem in comparebams * 725000 * 825000 * 725 * max records 3000000 * max records 3000000 * add logic to fail fast if bams differ in size by 200 mb * Update VerifyTasks.wdl * PD-2483 (#1182) * rc-2483 * update changelog * Update README.md --------- Co-authored-by: kayleemathews Co-authored-by: ekiernan <55763654+ekiernan@users.noreply.github.com> * PD-2476: Add task before fastqprocess to find number of splits (#1178) * removed space (#1184) Removed space from Verify Tabix task * Lk pd 2455 pairedtag parsebarcodes (#1186) Added a task to parse cell barcodes from sample barcodes * Np move snm3c from beta pipelines (#1185) * add summary task * move CondensedSnm3C.wdl out of beta * update batch numbers * add sorting to the compare compressed text files * changelogs and versions * revert batch change * batch change * batch change * Km snm3c overview doc (#1179) * Update README.md * Update README.md * Update README.md * Update README.md * Update README.md * Update website/docs/Pipelines/snM3C/README.md Co-authored-by: Nikelle Petrillo <38223776+nikellepetrillo@users.noreply.github.com> * Update README.md * fix num_downstr_bases description * Update README.md --------- Co-authored-by: Nikelle Petrillo <38223776+nikellepetrillo@users.noreply.github.com> * Np fix snm3c test (#1190) * add summary task * batch change in test wdl --------- Co-authored-by: Kaylee Mathews <95316074+kayleemathews@users.noreply.github.com> Co-authored-by: ekiernan <55763654+ekiernan@users.noreply.github.com> Co-authored-by: aawdeh Co-authored-by: Nikelle Petrillo <38223776+nikellepetrillo@users.noreply.github.com> Co-authored-by: Juan Pablo Ramos Barroso Co-authored-by: kayleemathews Co-authored-by: Robert Sidney Cox III --- .dockstore.yml | 4 + beta-pipelines/skylab/m3c/CondensedSnm3C.wdl | 1046 --------------- .../test_inputs/Plumbing/plumbing.input.json | 2 +- .../skylab/multiome/Multiome.changelog.md | 40 + .../skylab/multiome/Multiome.options.json | 5 + pipelines/skylab/multiome/Multiome.wdl | 13 +- pipelines/skylab/multiome/atac.changelog.md | 31 + pipelines/skylab/multiome/atac.json | 3 +- pipelines/skylab/multiome/atac.wdl | 292 ++++- .../Plumbing/10k_pbmc_downsampled.json | 6 +- .../test_inputs/Scientific/10k_pbmc.json | 13 +- pipelines/skylab/optimus/Optimus.changelog.md | 20 +- pipelines/skylab/optimus/Optimus.wdl | 11 +- 
.../skylab/paired_tag/PairedTag.changelog.md | 36 + pipelines/skylab/paired_tag/PairedTag.wdl | 131 ++ .../Plumbing/10k_pbmc_downsampled.json | 24 + .../test_inputs/Scientific/10k_pbmc.json | 33 + .../skylab/slideseq/SlideSeq.changelog.md | 24 + pipelines/skylab/slideseq/SlideSeq.wdl | 2 +- ...iSampleSmartSeq2SingleNucleus.changelog.md | 11 + .../MultiSampleSmartSeq2SingleNucleus.wdl | 2 +- pipelines/skylab/snM3C/snM3C.changelog.md | 5 + pipelines/skylab/snM3C/snM3C.wdl | 1166 +++++++++++++++-- .../test_inputs/Plumbing/miseq_M16_G13.json | 5 +- .../Scientific/novaseq_M16_G13.json | 5 +- .../test_inputs/Scientific/snM3C_inputs.json | 5 +- tasks/skylab/FastqProcessing.wdl | 10 +- tasks/skylab/H5adUtils.wdl | 3 +- tasks/skylab/Metrics.wdl | 2 +- tasks/skylab/PairedTagUtils.wdl | 270 ++++ tasks/skylab/RunEmptyDrops.wdl | 2 +- tasks/skylab/StarAlign.wdl | 168 ++- verification/VerifyTasks.wdl | 60 +- verification/test-wdls/TestMultiome.wdl | 5 +- verification/test-wdls/TestOptimus.wdl | 4 +- verification/test-wdls/TestsnM3C.wdl | 61 +- website/docs/Pipelines/ATAC/README.md | 28 +- .../Pipelines/BuildIndices_Pipeline/README.md | 122 ++ .../BuildIndices_Pipeline/_category_.json | 4 + .../buildindices_diagram.png | Bin 0 -> 29991 bytes .../_category_.json | 2 +- .../_category_.json | 2 +- .../_category_.json | 2 +- .../_category_.json | 2 +- .../Imputation_Pipeline/_category_.json | 2 +- .../Pipelines/Multiome_Pipeline/README.md | 16 +- .../Multiome_Pipeline/_category_.json | 2 +- .../docs/Pipelines/Optimus_Pipeline/README.md | 7 +- .../Optimus_Pipeline/_category_.json | 2 +- .../Pipelines/PairedTag_Pipeline/README.md | 138 ++ .../PairedTag_Pipeline/_category_.json | 4 + .../RNA_with_UMIs_Pipeline/README.md | 5 +- .../RNA_with_UMIs_Pipeline/_category_.json | 2 +- .../rna-with-umis.methods.md | 12 +- .../_category_.json | 2 +- .../SlideSeq_Pipeline/_category_.json | 2 +- .../README.md | 2 +- .../_category_.json | 2 +- .../smart-seq2.methods.md | 4 +- .../README.md | 3 +- .../_category_.json | 2 +- .../multi_snss2.methods.md | 4 +- .../README.md | 4 +- .../_category_.json | 2 +- .../_category_.json | 2 +- .../_category_.json | 2 +- website/docs/Pipelines/snM3C/README.md | 187 ++- website/docs/Pipelines/snM3C/_category_.json | 2 +- .../contribute_to_warp_docs/doc_style.md | 2 +- 69 files changed, 2662 insertions(+), 1432 deletions(-) delete mode 100644 beta-pipelines/skylab/m3c/CondensedSnm3C.wdl create mode 100644 pipelines/skylab/multiome/Multiome.options.json create mode 100644 pipelines/skylab/paired_tag/PairedTag.changelog.md create mode 100644 pipelines/skylab/paired_tag/PairedTag.wdl create mode 100644 pipelines/skylab/paired_tag/test_inputs/Plumbing/10k_pbmc_downsampled.json create mode 100644 pipelines/skylab/paired_tag/test_inputs/Scientific/10k_pbmc.json create mode 100644 tasks/skylab/PairedTagUtils.wdl create mode 100644 website/docs/Pipelines/BuildIndices_Pipeline/README.md create mode 100644 website/docs/Pipelines/BuildIndices_Pipeline/_category_.json create mode 100644 website/docs/Pipelines/BuildIndices_Pipeline/buildindices_diagram.png create mode 100644 website/docs/Pipelines/PairedTag_Pipeline/README.md create mode 100644 website/docs/Pipelines/PairedTag_Pipeline/_category_.json diff --git a/.dockstore.yml b/.dockstore.yml index 34813c6c18..1c33f61626 100644 --- a/.dockstore.yml +++ b/.dockstore.yml @@ -114,6 +114,10 @@ workflows: - name: Multiome subclass: WDL primaryDescriptorPath: /pipelines/skylab/multiome/Multiome.wdl + + - name: PairedTag + subclass: WDL + 
primaryDescriptorPath: /pipelines/skylab/paired_tag/PairedTag.wdl - name: atac subclass: WDL diff --git a/beta-pipelines/skylab/m3c/CondensedSnm3C.wdl b/beta-pipelines/skylab/m3c/CondensedSnm3C.wdl deleted file mode 100644 index b3f5db6deb..0000000000 --- a/beta-pipelines/skylab/m3c/CondensedSnm3C.wdl +++ /dev/null @@ -1,1046 +0,0 @@ -version 1.0 - -workflow WDLized_snm3C { - - input { - Array[File] fastq_input_read1 - Array[File] fastq_input_read2 - File random_primer_indexes - String plate_id - # mapping inputs - File tarred_index_files - File genome_fa - File chromosome_sizes - - String r1_adapter = "AGATCGGAAGAGCACACGTCTGAAC" - String r2_adapter = "AGATCGGAAGAGCGTCGTGTAGGGA" - Int r1_left_cut = 10 - Int r1_right_cut = 10 - Int r2_left_cut = 10 - Int r2_right_cut = 10 - Int min_read_length = 30 - Int num_upstr_bases = 0 - Int num_downstr_bases = 2 - Int compress_level = 5 - - } - - call Demultiplexing { - input: - fastq_input_read1 = fastq_input_read1, - fastq_input_read2 = fastq_input_read2, - random_primer_indexes = random_primer_indexes, - plate_id = plate_id - } - - call Sort_and_trim_r1_and_r2 { - input: - tarred_demultiplexed_fastqs = Demultiplexing.tarred_demultiplexed_fastqs, - r1_adapter = r1_adapter, - r2_adapter = r2_adapter, - r1_left_cut = r1_left_cut, - r1_right_cut = r1_right_cut, - r2_left_cut = r2_left_cut, - r2_right_cut = r2_right_cut, - min_read_length = min_read_length, - plate_id = plate_id - } - - call Hisat_3n_pair_end_mapping_dna_mode { - input: - r1_trimmed_tar = Sort_and_trim_r1_and_r2.r1_trimmed_fq_tar, - r2_trimmed_tar = Sort_and_trim_r1_and_r2.r2_trimmed_fq_tar, - tarred_index_files = tarred_index_files, - genome_fa = genome_fa, - chromosome_sizes = chromosome_sizes, - plate_id = plate_id - } - - call Separate_unmapped_reads { - input: - hisat3n_bam_tar = Hisat_3n_pair_end_mapping_dna_mode.hisat3n_paired_end_bam_tar, - min_read_length = min_read_length, - plate_id = plate_id - } - - call Split_unmapped_reads { - input: - unmapped_fastq_tar = Separate_unmapped_reads.unmapped_fastq_tar, - min_read_length = min_read_length, - plate_id = plate_id - } - - call Hisat_single_end_r1_r2_mapping_dna_mode_and_merge_sort_split_reads_by_name { - input: - split_fq_tar = Split_unmapped_reads.split_fq_tar, - tarred_index_files = tarred_index_files, - genome_fa = genome_fa, - plate_id = plate_id - } - - call remove_overlap_read_parts { - input: - bam = Hisat_single_end_r1_r2_mapping_dna_mode_and_merge_sort_split_reads_by_name.merge_sorted_bam_tar, - plate_id = plate_id - } - - call merge_original_and_split_bam_and_sort_all_reads_by_name_and_position { - input: - bam = Separate_unmapped_reads.unique_bam_tar, - split_bam = remove_overlap_read_parts.output_bam_tar, - plate_id = plate_id - } - - call call_chromatin_contacts { - input: - name_sorted_bam = merge_original_and_split_bam_and_sort_all_reads_by_name_and_position.name_sorted_bam, - plate_id = plate_id - } - - call dedup_unique_bam_and_index_unique_bam { - input: - bam = merge_original_and_split_bam_and_sort_all_reads_by_name_and_position.position_sorted_bam, - plate_id = plate_id - } - - call unique_reads_allc { - input: - bam_and_index_tar = dedup_unique_bam_and_index_unique_bam.output_tar, - genome_fa = genome_fa, - num_upstr_bases = num_upstr_bases, - num_downstr_bases = num_downstr_bases, - compress_level = compress_level, - plate_id = plate_id - } - - call unique_reads_cgn_extraction { - input: - allc_tar = unique_reads_allc.allc, - tbi_tar = unique_reads_allc.tbi, - chrom_size_path = chromosome_sizes, - plate_id 
= plate_id - } - - call summary { - input: - trimmed_stats = Sort_and_trim_r1_and_r2.trim_stats_tar, - hisat3n_stats = Hisat_3n_pair_end_mapping_dna_mode.hisat3n_paired_end_stats_tar, - r1_hisat3n_stats = Hisat_single_end_r1_r2_mapping_dna_mode_and_merge_sort_split_reads_by_name.hisat3n_dna_split_reads_summary_R1_tar, - r2_hisat3n_stats = Hisat_single_end_r1_r2_mapping_dna_mode_and_merge_sort_split_reads_by_name.hisat3n_dna_split_reads_summary_R2_tar, - dedup_stats = dedup_unique_bam_and_index_unique_bam.dedup_stats_tar, - chromatin_contact_stats = call_chromatin_contacts.chromatin_contact_stats, - allc_uniq_reads_stats = unique_reads_allc.allc_uniq_reads_stats, - unique_reads_cgn_extraction_tbi = unique_reads_cgn_extraction.output_tbi_tar, - plate_id = plate_id - } - - output { - File MappingSummary = summary.mapping_summary - File trimmed_stats = Sort_and_trim_r1_and_r2.trim_stats_tar - File r1_trimmed_fq = Sort_and_trim_r1_and_r2.r1_trimmed_fq_tar - File r2_trimmed_fq = Sort_and_trim_r1_and_r2.r2_trimmed_fq_tar - File hisat3n_stats_tar = Hisat_3n_pair_end_mapping_dna_mode.hisat3n_paired_end_stats_tar - File hisat3n_bam_tar = Hisat_3n_pair_end_mapping_dna_mode.hisat3n_paired_end_bam_tar - File unique_bam_tar = Separate_unmapped_reads.unique_bam_tar - File multi_bam_tar = Separate_unmapped_reads.multi_bam_tar - File unmapped_fastq_tar = Separate_unmapped_reads.unmapped_fastq_tar - File split_fq_tar = Split_unmapped_reads.split_fq_tar - File merge_sorted_bam_tar = Hisat_single_end_r1_r2_mapping_dna_mode_and_merge_sort_split_reads_by_name.merge_sorted_bam_tar - File name_sorted_bams = merge_original_and_split_bam_and_sort_all_reads_by_name_and_position.name_sorted_bam - File pos_sorted_bams = merge_original_and_split_bam_and_sort_all_reads_by_name_and_position.position_sorted_bam - File remove_overlap_read_parts_bam_tar = remove_overlap_read_parts.output_bam_tar - File dedup_unique_bam_and_index_unique_bam_tar = dedup_unique_bam_and_index_unique_bam.output_tar - File unique_reads_cgn_extraction_allc = unique_reads_cgn_extraction.output_allc_tar - File unique_reads_cgn_extraction_tbi = unique_reads_cgn_extraction.output_tbi_tar - File chromatin_contact_stats = call_chromatin_contacts.chromatin_contact_stats - File reference_version = Hisat_3n_pair_end_mapping_dna_mode.reference_version - } -} - -task Demultiplexing { - input { - Array[File] fastq_input_read1 - Array[File] fastq_input_read2 - File random_primer_indexes - String plate_id - - String docker_image = "us.gcr.io/broad-gotc-prod/m3c-yap-hisat:1.0.0-2.2.1" - Int disk_size = 50 - Int mem_size = 10 - } - - command <<< - set -euo pipefail - - # Cat files for each r1, r2 - cat ~{sep=' ' fastq_input_read1} > r1.fastq.gz - cat ~{sep=' ' fastq_input_read2} > r2.fastq.gz - - /opt/conda/bin/cutadapt -Z -e 0.01 --no-indels \ - -g file:~{random_primer_indexes} \ - -o ~{plate_id}-{name}-R1.fq.gz \ - -p ~{plate_id}-{name}-R2.fq.gz \ - r1.fastq.gz \ - r2.fastq.gz \ - > ~{plate_id}.stats.txt - - # remove the fastq files that end in unknown-R1.fq.gz and unknown-R2.fq.gz - rm *-unknown-R{1,2}.fq.gz - - python3 < threshold: - os.remove(file_path) - print(f'Removed file: {filename}') - CODE - - # zip up all the output fq.gz files - tar -zcvf ~{plate_id}.cutadapt_output_files.tar.gz *.fq.gz - >>> - - runtime { - docker: docker_image - disks: "local-disk ${disk_size} HDD" - cpu: 1 - memory: "${mem_size} GiB" - } - - output { - File tarred_demultiplexed_fastqs = "~{plate_id}.cutadapt_output_files.tar.gz" - File stats = "~{plate_id}.stats.txt" - } -} - 
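The Sort_and_trim_r1_and_r2 task below name-sorts each 4-line FASTQ record with the `paste - - - - | sort -k1,1 -t " " | tr "\t" "\n"` idiom before trimming. A rough Python sketch of that sort step, for reference only: it assumes uncompressed, well-formed 4-line FASTQ that fits in memory, and the file names are hypothetical, not pipeline outputs.

```python
# Sketch of the paste/sort/tr name-sort idiom from Sort_and_trim_r1_and_r2:
# group each 4-line FASTQ record, sort by the read name (the first
# whitespace-delimited field of the @-header, matching `sort -k1,1 -t " "`),
# then write the records back out.
def sort_fastq_by_name(in_path: str, out_path: str) -> None:
    with open(in_path) as fh:
        lines = fh.read().splitlines()
    records = [lines[i:i + 4] for i in range(0, len(lines), 4)]
    records.sort(key=lambda rec: rec[0].split(" ")[0])
    with open(out_path, "w") as out:
        for rec in records:
            out.write("\n".join(rec) + "\n")

sort_fastq_by_name("sample-R1.fq", "sample-R1_sorted.fq")  # hypothetical paths
```

Name-sorting both mates the same way keeps R1 and R2 records in matching order for the paired cutadapt call that follows.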
-task Sort_and_trim_r1_and_r2 { - input { - File tarred_demultiplexed_fastqs - String plate_id - String r1_adapter - String r2_adapter - Int r1_left_cut - Int r1_right_cut - Int r2_left_cut - Int r2_right_cut - Int min_read_length - - Int disk_size = 50 - Int mem_size = 10 - String docker = "us.gcr.io/broad-gotc-prod/m3c-yap-hisat:1.0.0-2.2.1" - - } - command <<< - set -euo pipefail - - # untar the demultiplexed fastqs - tar -xf ~{tarred_demultiplexed_fastqs} - - # define lists of r1 and r2 fq files - R1_files=($(ls | grep "\-R1.fq.gz")) - R2_files=($(ls | grep "\-R2.fq.gz")) - - # loop over R1 and R2 files and sort them - for file in "${R1_files[@]}"; do - sample_id=$(basename "$file" "-R1.fq.gz") - r2_file="${sample_id}-R2.fq.gz" - zcat "$file" | paste - - - - | sort -k1,1 -t " " | tr "\t" "\n" > "${sample_id}-R1_sorted.fq" - zcat "$r2_file" | paste - - - - | sort -k1,1 -t " " | tr "\t" "\n" > "${sample_id}-R2_sorted.fq" - done - - - echo "Starting to trim with Cutadapt" - sorted_R1_files=($(ls | grep "\-R1_sorted.fq")) - for file in "${sorted_R1_files[@]}"; do - sample_id=$(basename "$file" "-R1_sorted.fq") - /opt/conda/bin/cutadapt \ - -a R1Adapter=~{r1_adapter} \ - -A R2Adapter=~{r2_adapter} \ - --report=minimal \ - -O 6 \ - -q 20 \ - -u ~{r1_left_cut} \ - -u -~{r1_right_cut} \ - -U ~{r2_left_cut} \ - -U -~{r2_right_cut} \ - -Z \ - -m ~{min_read_length}:~{min_read_length} \ - --pair-filter 'both' \ - -o ${sample_id}-R1_trimmed.fq.gz \ - -p ${sample_id}-R2_trimmed.fq.gz \ - ${sample_id}-R1_sorted.fq ${sample_id}-R2_sorted.fq \ - > ${sample_id}.trimmed.stats.txt - done - - echo "Tarring up the trimmed files and stats files" - - tar -zcvf ~{plate_id}.R1_trimmed_files.tar.gz *-R1_trimmed.fq.gz - tar -zcvf ~{plate_id}.R2_trimmed_files.tar.gz *-R2_trimmed.fq.gz - tar -zcvf ~{plate_id}.trimmed_stats_files.tar.gz *.trimmed.stats.txt ->>> - runtime { - docker: docker - disks: "local-disk ${disk_size} HDD" - cpu: 1 - memory: "${mem_size} GiB" - } - output { - File r1_trimmed_fq_tar = "~{plate_id}.R1_trimmed_files.tar.gz" - File r2_trimmed_fq_tar = "~{plate_id}.R2_trimmed_files.tar.gz" - File trim_stats_tar = "~{plate_id}.trimmed_stats_files.tar.gz" - } -} - -task Hisat_3n_pair_end_mapping_dna_mode{ - input { - File r1_trimmed_tar - File r2_trimmed_tar - File tarred_index_files - File genome_fa - File chromosome_sizes - String plate_id - - String docker = "us.gcr.io/broad-gotc-prod/m3c-yap-hisat:1.0.0-2.2.1" - Int disk_size = 100 - Int mem_size = 100 - } - command <<< - set -euo pipefail - - # check genomic reference version and print to output txt file - STRING=~{genome_fa} - BASE=$(basename $STRING .fa) - - echo "The reference is $BASE" > ~{plate_id}.reference_version.txt - - mkdir reference/ - mkdir fastq/ - - cp ~{tarred_index_files} reference/ - cp ~{genome_fa} reference/ - cp ~{chromosome_sizes} reference/ - cp ~{r1_trimmed_tar} fastq/ - cp ~{r2_trimmed_tar} fastq/ - - # untar the index files - cd reference/ - echo "Untarring the index files" - tar -zxvf ~{tarred_index_files} - rm ~{tarred_index_files} - - #get the basename of the genome_fa file - genome_fa_basename=$(basename ~{genome_fa} .fa) - echo "samtools faidx $genome_fa_basename.fa" - samtools faidx $genome_fa_basename.fa - - # untar the demultiplexed fastq files - cd ../fastq/ - echo "Untarring the fastq files" - tar -zxvf ~{r1_trimmed_tar} - tar -zxvf ~{r2_trimmed_tar} - rm ~{r1_trimmed_tar} - rm ~{r2_trimmed_tar} - - # define lists of r1 and r2 fq files - R1_files=($(ls | grep "\-R1_trimmed.fq.gz")) - R2_files=($(ls | grep 
"\-R2_trimmed.fq.gz")) - - for file in "${R1_files[@]}"; do - sample_id=$(basename "$file" "-R1_trimmed.fq.gz") - hisat-3n /cromwell_root/reference/$genome_fa_basename \ - -q \ - -1 ${sample_id}-R1_trimmed.fq.gz \ - -2 ${sample_id}-R2_trimmed.fq.gz \ - --directional-mapping-reverse \ - --base-change C,T \ - --no-repeat-index \ - --no-spliced-alignment \ - --no-temp-splicesite \ - -t \ - --new-summary \ - --summary-file ${sample_id}.hisat3n_dna_summary.txt \ - --threads 11 | samtools view -b -q 0 -o "${sample_id}.hisat3n_dna.unsort.bam" - done - - # tar up the bam files and stats files - tar -zcvf ~{plate_id}.hisat3n_paired_end_bam_files.tar.gz *.bam - tar -zcvf ~{plate_id}.hisat3n_paired_end_stats_files.tar.gz *.hisat3n_dna_summary.txt - - mv ~{plate_id}.hisat3n_paired_end_bam_files.tar.gz ../ - mv ~{plate_id}.hisat3n_paired_end_stats_files.tar.gz ../ - - >>> - runtime { - docker: docker - disks: "local-disk ${disk_size} HDD" - cpu: 1 - memory: "${mem_size} GiB" - } - output { - File hisat3n_paired_end_bam_tar = "~{plate_id}.hisat3n_paired_end_bam_files.tar.gz" - File hisat3n_paired_end_stats_tar = "~{plate_id}.hisat3n_paired_end_stats_files.tar.gz" - File reference_version = "~{plate_id}.reference_version.txt" - } -} - -task Separate_unmapped_reads { - input { - File hisat3n_bam_tar - Int min_read_length - String plate_id - - String docker = "us.gcr.io/broad-gotc-prod/m3c-yap-hisat:1.0.0-2.2.1" - Int disk_size = 50 - Int mem_size = 10 - - } - command <<< - - set -euo pipefail - - # untar the hisat3n bam files - tar -xf ~{hisat3n_bam_tar} - rm ~{hisat3n_bam_tar} - - python3 <>> - runtime { - docker: docker - disks: "local-disk ${disk_size} HDD" - cpu: 1 - memory: "${mem_size} GiB" - } - output { - File unique_bam_tar = "~{plate_id}.hisat3n_paired_end_unique_bam_files.tar.gz" - File multi_bam_tar = "~{plate_id}.hisat3n_paired_end_multi_bam_files.tar.gz" - File unmapped_fastq_tar = "~{plate_id}.hisat3n_paired_end_unmapped_fastq_files.tar.gz" - } -} - -task Split_unmapped_reads { - input { - File unmapped_fastq_tar - Int min_read_length - String plate_id - - String docker = "us.gcr.io/broad-gotc-prod/m3c-yap-hisat:1.0.0-2.2.1" - Int disk_size = 50 - Int mem_size = 10 - } - command <<< - - set -euo pipefail - - # untar the unmapped fastq files - tar -xf ~{unmapped_fastq_tar} - rm ~{unmapped_fastq_tar} - - python3 <>> - runtime { - docker: docker - disks: "local-disk ${disk_size} HDD" - cpu: 1 - memory: "${mem_size} GiB" - } - output { - File split_fq_tar = "~{plate_id}.hisat3n_paired_end_split_fastq_files.tar.gz" - } -} - -task Hisat_single_end_r1_r2_mapping_dna_mode_and_merge_sort_split_reads_by_name { - input { - File split_fq_tar - File genome_fa - File tarred_index_files - String plate_id - - String docker = "us.gcr.io/broad-gotc-prod/m3c-yap-hisat:1.0.0-2.2.1" - Int disk_size = 80 - Int mem_size = 20 - } - command <<< - set -euo pipefail - - mkdir reference/ - - cp ~{tarred_index_files} reference/ - cp ~{genome_fa} reference/ - - # untar the tarred index files - cd reference/ - tar -xvf ~{tarred_index_files} - rm ~{tarred_index_files} - - #get the basename of the genome_fa file - genome_fa_basename=$(basename ~{genome_fa} .fa) - samtools faidx $genome_fa_basename.fa - - # untar the unmapped fastq files - tar -xvf ~{split_fq_tar} - rm ~{split_fq_tar} - - # define lists of r1 and r2 fq files - R1_files=($(ls | grep "\.hisat3n_dna.split_reads.R1.fastq")) - R2_files=($(ls | grep "\.hisat3n_dna.split_reads.R2.fastq")) - - for file in "${R1_files[@]}"; do - sample_id=$(basename "$file" 
".hisat3n_dna.split_reads.R1.fastq") - hisat-3n /cromwell_root/reference/$genome_fa_basename \ - -q \ - -U ${sample_id}.hisat3n_dna.split_reads.R1.fastq \ - --directional-mapping-reverse \ - --base-change C,T \ - --no-repeat-index \ - --no-spliced-alignment \ - --no-temp-splicesite \ - -t \ - --new-summary \ - --summary-file ${sample_id}.hisat3n_dna_split_reads_summary.R1.txt \ - --threads 11 | samtools view -b -q 10 -o "${sample_id}.hisat3n_dna.split_reads.R1.bam" - done - - for file in "${R2_files[@]}"; do - sample_id=$(basename "$file" ".hisat3n_dna.split_reads.R2.fastq") - hisat-3n /cromwell_root/reference/$genome_fa_basename \ - -q \ - -U ${sample_id}.hisat3n_dna.split_reads.R2.fastq \ - --directional-mapping \ - --base-change C,T \ - --no-repeat-index \ - --no-spliced-alignment \ - --no-temp-splicesite \ - -t --new-summary \ - --summary-file ${sample_id}.hisat3n_dna_split_reads_summary.R2.txt \ - --threads 11 | samtools view -b -q 10 -o "${sample_id}.hisat3n_dna.split_reads.R2.bam" - done - - # tar up the r1 and r2 stats files - tar -zcvf ../~{plate_id}.hisat3n_dna_split_reads_summary.R1.tar.gz *.hisat3n_dna_split_reads_summary.R1.txt - tar -zcvf ../~{plate_id}.hisat3n_dna_split_reads_summary.R2.tar.gz *.hisat3n_dna_split_reads_summary.R2.txt - - - # define lists of r1 and r2 bam files - R1_bams=($(ls | grep "\.hisat3n_dna.split_reads.R1.bam")) - R2_bams=($(ls | grep "\.hisat3n_dna.split_reads.R2.bam")) - - # Loop through the R1 BAM files - for r1_bam in "${R1_bams[@]}"; do - # Extract the corresponding R2 BAM file - r2_bam="${r1_bam/.hisat3n_dna.split_reads.R1.bam/.hisat3n_dna.split_reads.R2.bam}" - - # Define the output BAM file name - output_bam="$(basename ${r1_bam/.hisat3n_dna.split_reads.R1.bam/.hisat3n_dna.split_reads.name_sort.bam})" - - # Perform the samtools merge and sort commands - samtools merge -o - "$r1_bam" "$r2_bam" | samtools sort -n -o "$output_bam" - - done - - #tar up the merged bam files - tar -zcvf ../~{plate_id}.hisat3n_dna.split_reads.name_sort.bam.tar.gz *.hisat3n_dna.split_reads.name_sort.bam - - >>> - runtime { - docker: docker - disks: "local-disk ${disk_size} HDD" - cpu: 1 - memory: "${mem_size} GiB" - } - output { - File merge_sorted_bam_tar = "~{plate_id}.hisat3n_dna.split_reads.name_sort.bam.tar.gz" - File hisat3n_dna_split_reads_summary_R1_tar = "~{plate_id}.hisat3n_dna_split_reads_summary.R1.tar.gz" - File hisat3n_dna_split_reads_summary_R2_tar = "~{plate_id}.hisat3n_dna_split_reads_summary.R2.tar.gz" - } -} - -task remove_overlap_read_parts { - input { - File bam - String plate_id - - String docker = "us.gcr.io/broad-gotc-prod/m3c-yap-hisat:1.0.0-2.2.1" - Int disk_size = 80 - Int mem_size = 20 - } - - command <<< - set -euo pipefail - # unzip bam file - tar -xf ~{bam} - rm ~{bam} - - # create output dir - mkdir /cromwell_root/output_bams - - # get bams - bams=($(ls | grep "sort.bam$")) - - # loop through bams and run python script on each bam - # scatter instead of for loop to optimize - python3 <>> - runtime { - docker: docker - disks: "local-disk ${disk_size} HDD" - cpu: 1 - memory: "${mem_size} GiB" - } - output { - File output_bam_tar = "~{plate_id}.remove_overlap_read_parts.tar.gz" - } -} - -task merge_original_and_split_bam_and_sort_all_reads_by_name_and_position { - input { - File bam - File split_bam - String plate_id - - String docker = "us.gcr.io/broad-gotc-prod/m3c-yap-hisat:1.0.0-2.2.1" - Int disk_size = 80 - Int mem_size = 20 - } - command <<< - set -euo pipefail - #unzip bam file - tar -xf ~{bam} - tar -xf ~{split_bam} - rm ~{bam} - 
rm ~{split_bam} - - echo "samtools merge and sort" - # define lists of r1 and r2 fq files - UNIQUE_BAMS=($(ls | grep "\.hisat3n_dna.unique_aligned.bam")) - SPLIT_BAMS=($(ls | grep "\.hisat3n_dna.split_reads.read_overlap.bam")) - - for file in "${UNIQUE_BAMS[@]}"; do - sample_id=$(basename "$file" ".hisat3n_dna.unique_aligned.bam") - samtools merge -f "${sample_id}.hisat3n_dna.all_reads.bam" "${sample_id}.hisat3n_dna.unique_aligned.bam" "${sample_id}.hisat3n_dna.split_reads.read_overlap.bam" - samtools sort -n -o "${sample_id}.hisat3n_dna.all_reads.name_sort.bam" "${sample_id}.hisat3n_dna.all_reads.bam" - samtools sort -O BAM -o "${sample_id}.hisat3n_dna.all_reads.pos_sort.bam" "${sample_id}.hisat3n_dna.all_reads.name_sort.bam" - done - - echo "Zip files" - #tar up the merged bam files - tar -zcvf ~{plate_id}.hisat3n_dna.all_reads.pos_sort.tar.gz *.hisat3n_dna.all_reads.pos_sort.bam - tar -zcvf ~{plate_id}.hisat3n_dna.all_reads.name_sort.tar.gz *.hisat3n_dna.all_reads.name_sort.bam - >>> - runtime { - docker: docker - disks: "local-disk ${disk_size} HDD" - cpu: 1 - memory: "${mem_size} GiB" - } - output { - File name_sorted_bam = "~{plate_id}.hisat3n_dna.all_reads.name_sort.tar.gz" - File position_sorted_bam = "~{plate_id}.hisat3n_dna.all_reads.pos_sort.tar.gz" - } -} - -task call_chromatin_contacts { - input { - File name_sorted_bam - String plate_id - - String docker = "us.gcr.io/broad-gotc-prod/m3c-yap-hisat:1.0.0-2.2.1" - Int disk_size = 80 - Int mem_size = 20 - } - command <<< - set -euo pipefail - - # untar the name sorted bam files - tar -xf ~{name_sorted_bam} - rm ~{name_sorted_bam} - - python3 <>> - runtime { - docker: docker - disks: "local-disk ${disk_size} HDD" - cpu: 1 - memory: "${mem_size} GiB" - } - output { - File chromatin_contact_stats = "~{plate_id}.chromatin_contact_stats.tar.gz" - } -} - -task dedup_unique_bam_and_index_unique_bam { - input { - File bam - String plate_id - - String docker = "us.gcr.io/broad-gotc-prod/m3c-yap-hisat:1.0.0-2.2.1" - Int disk_size = 80 - Int mem_size = 20 - } - - command <<< - set -euo pipefail - - # unzip files - tar -xf ~{bam} - rm ~{bam} - - # create output dir - mkdir /cromwell_root/output_bams - mkdir /cromwell_root/temp - - # name : AD3C_BA17_2027_P1-1-B11-G13.hisat3n_dna.all_reads.pos_sort.bam - for file in *.bam - do - name=`echo $file | cut -d. 
-f1` - name=$name.hisat3n_dna.all_reads.deduped - echo $name - echo "Call Picard" - picard MarkDuplicates I=$file O=/cromwell_root/output_bams/$name.bam \ - M=/cromwell_root/output_bams/$name.matrix.txt \ - REMOVE_DUPLICATES=true TMP_DIR=/cromwell_root/temp - echo "Call samtools index" - samtools index /cromwell_root/output_bams/$name.bam - done - - cd /cromwell_root - - #tar up the output files - tar -zcvf ~{plate_id}.dedup_unique_bam_and_index_unique_bam.tar.gz output_bams - - #tar up the stats files - tar -zcvf ~{plate_id}.dedup_unique_bam_and_index_unique_bam_stats.tar.gz output_bams/*.matrix.txt - - >>> - runtime { - docker: docker - disks: "local-disk ${disk_size} HDD" - cpu: 1 - memory: "${mem_size} GiB" - } - output { - File output_tar = "~{plate_id}.dedup_unique_bam_and_index_unique_bam.tar.gz" - File dedup_stats_tar = "~{plate_id}.dedup_unique_bam_and_index_unique_bam_stats.tar.gz" - } -} - -task unique_reads_allc { - input { - File bam_and_index_tar - File genome_fa - String plate_id - Int num_upstr_bases - Int num_downstr_bases - Int compress_level - - Int disk_size = 80 - Int mem_size = 20 - String genome_base = basename(genome_fa) - String docker = "us.gcr.io/broad-gotc-prod/m3c-yap-hisat:1.0.0-2.2.1" - } - command <<< - set -euo pipefail - - # unzip files - tar -xf ~{bam_and_index_tar} - rm ~{bam_and_index_tar} - - mkdir reference - cp ~{genome_fa} reference - cd reference - - # index the fasta - echo "Indexing FASTA" - samtools faidx *.fa - cd ../output_bams - - echo "Starting allcools" - bam_files=($(ls | grep "\.hisat3n_dna.all_reads.deduped.bam$")) - echo ${bam_files[@]} - for file in "${bam_files[@]}"; do - sample_id=$(basename "$file" ".hisat3n_dna.all_reads.deduped.bam") - /opt/conda/bin/allcools bam-to-allc \ - --bam_path "$file" \ - --reference_fasta /cromwell_root/reference/~{genome_base} \ - --output_path "${sample_id}.allc.tsv.gz" \ - --num_upstr_bases ~{num_upstr_bases} \ - --num_downstr_bases ~{num_downstr_bases} \ - --compress_level ~{compress_level} \ - --save_count_df \ - --convert_bam_strandness - done - echo "Zipping files" - - tar -zcvf ../~{plate_id}.allc.tsv.tar.gz *.allc.tsv.gz - tar -zcvf ../~{plate_id}.allc.tbi.tar.gz *.allc.tsv.gz.tbi - tar -zcvf ../~{plate_id}.allc.count.tar.gz *.allc.tsv.gz.count.csv - - - >>> - runtime { - docker: docker - disks: "local-disk ${disk_size} HDD" - cpu: 1 - memory: "${mem_size} GiB" - } - output { - File allc = "~{plate_id}.allc.tsv.tar.gz" - File tbi = "~{plate_id}.allc.tbi.tar.gz" - File allc_uniq_reads_stats = "~{plate_id}.allc.count.tar.gz" - } -} - - -task unique_reads_cgn_extraction { - input { - File allc_tar - File tbi_tar - File chrom_size_path - String plate_id - - String docker = "us.gcr.io/broad-gotc-prod/m3c-yap-hisat:1.0.0-2.2.1" - Int disk_size = 80 - Int mem_size = 20 - Int num_upstr_bases = 0 - } - - command <<< - set -euo pipefail - - tar -xf ~{allc_tar} - rm ~{allc_tar} - - tar -xf ~{tbi_tar} - rm ~{tbi_tar} - - # prefix="allc-{mcg_context}/{cell_id}" - if [ ~{num_upstr_bases} -eq 0 ]; then - mcg_context=CGN - else - mcg_context=HCGN - fi - - # create output dir - mkdir /cromwell_root/allc-${mcg_context} - outputdir=/cromwell_root/allc-${mcg_context} - - for gzfile in *.gz - do - name=`echo $gzfile | cut -d. 
-f1` - echo $name - allcools extract-allc --strandness merge --allc_path $gzfile \ - --output_prefix $outputdir/$name \ - --mc_contexts ${mcg_context} \ - --chrom_size_path ~{chrom_size_path} - done - - cd /cromwell_root - - tar -zcvf ~{plate_id}.output_allc_tar.tar.gz $outputdir/*.gz - tar -zcvf ~{plate_id}.output_tbi_tar.tar.gz $outputdir/*.tbi - - >>> - - runtime { - docker: docker - disks: "local-disk ${disk_size} HDD" - cpu: 1 - memory: "${mem_size} GiB" - } - - output { - File output_allc_tar = "~{plate_id}.output_allc_tar.tar.gz" - File output_tbi_tar = "~{plate_id}.output_tbi_tar.tar.gz" - } -} - - -task summary { - input { - File trimmed_stats - File hisat3n_stats - File r1_hisat3n_stats - File r2_hisat3n_stats - File dedup_stats - File chromatin_contact_stats - File allc_uniq_reads_stats - File unique_reads_cgn_extraction_tbi - String plate_id - - String docker = "us.gcr.io/broad-gotc-prod/m3c-yap-hisat:1.0.0-2.2.1" - Int disk_size = 80 - Int mem_size = 20 - } - command <<< - set -euo pipefail - - mkdir /cromwell_root/fastq - mkdir /cromwell_root/bam - mkdir /cromwell_root/allc - mkdir /cromwell_root/hic - - extract_and_remove() { - local tarFile=$1 - tar -xf "$tarFile" - rm "$tarFile" - } - - extract_and_remove ~{trimmed_stats} - extract_and_remove ~{hisat3n_stats} - extract_and_remove ~{r1_hisat3n_stats} - extract_and_remove ~{r2_hisat3n_stats} - extract_and_remove ~{dedup_stats} - extract_and_remove ~{chromatin_contact_stats} - extract_and_remove ~{allc_uniq_reads_stats} - extract_and_remove ~{unique_reads_cgn_extraction_tbi} - - mv *.trimmed.stats.txt /cromwell_root/fastq - mv *.hisat3n_dna_summary.txt *.hisat3n_dna_split_reads_summary.R1.txt *.hisat3n_dna_split_reads_summary.R2.txt /cromwell_root/bam - mv output_bams/*.hisat3n_dna.all_reads.deduped.matrix.txt /cromwell_root/bam - mv *.hisat3n_dna.all_reads.contact_stats.csv /cromwell_root/hic - mv *.allc.tsv.gz.count.csv /cromwell_root/allc - mv cromwell_root/allc-CGN/*.allc.tsv.gz.tbi /cromwell_root/allc - - python3 <>> - runtime { - docker: docker - disks: "local-disk ${disk_size} HDD" - cpu: 1 - memory: "${mem_size} GiB" - } - output { - File mapping_summary = "~{plate_id}_MappingSummary.csv.gz" - } -} \ No newline at end of file diff --git a/pipelines/broad/dna_seq/germline/joint_genotyping/test_inputs/Plumbing/plumbing.input.json b/pipelines/broad/dna_seq/germline/joint_genotyping/test_inputs/Plumbing/plumbing.input.json index e6a182236f..37299dc1ff 100644 --- a/pipelines/broad/dna_seq/germline/joint_genotyping/test_inputs/Plumbing/plumbing.input.json +++ b/pipelines/broad/dna_seq/germline/joint_genotyping/test_inputs/Plumbing/plumbing.input.json @@ -1,5 +1,5 @@ { - "JointGenotyping.sample_name_map": "gs://broad-gotc-test-storage/joint_genotyping/wgs/plumbing/callset/public_sample_name_map", + "JointGenotyping.sample_name_map": "gs://broad-gotc-test-storage/JointGenotyping/inputs/plumbing/wgs/sample_name_map", "JointGenotyping.callset_name": "wgs_joint_genotyping_plumbing", "JointGenotyping.unbounded_scatter_count_scale_factor": 2.5, "JointGenotyping.SplitIntervalList.scatter_mode": "INTERVAL_SUBDIVISION", diff --git a/pipelines/skylab/multiome/Multiome.changelog.md b/pipelines/skylab/multiome/Multiome.changelog.md index e425ee4856..ecd2478024 100644 --- a/pipelines/skylab/multiome/Multiome.changelog.md +++ b/pipelines/skylab/multiome/Multiome.changelog.md @@ -1,9 +1,48 @@ +# 3.1.2 +2024-02-01 (Date of Last Commit) + +* Add new paired-tag task to parse sample barcodes from cell barcodes when preindexing is set to true; 
this does not affect the Multiome pipeline
+
+# 3.1.1
+2024-01-30 (Date of Last Commit)
+
+* Added task GetNumSplits before FastqProcess ATAC task to determine the number of splits based on the bwa-mem2 machine specs
+* Added an error message to the BWAPairedEndAlignment ATAC task to ensure that the number of splits equals the number of ranks
+* Added an error message to the BWAPairedEndAlignment ATAC task to ensure that the number of R1s equals the number of R3s
+
+# 3.1.0
+2024-01-24 (Date of Last Commit)
+
+* Promote aligner_metrics from Optimus task level outputs to Multiome pipeline level outputs
+
+# 3.0.5
+2024-01-18 (Date of Last Commit)
+
+* Increased memory for MergeStarOutputs in StarAlign.wdl, RunEmptyDrops in RunEmptyDrops.wdl, OptimusH5ad in H5adUtils.wdl and GeneMetrics in Metrics.wdl
+* Added the --soloMultiMappers flag as an optional input to the StarSoloFastq task in the StarAlign.wdl
+* Added a check of read2 length to the paired-tag pipeline; this does not affect the Multiome workflow
+
+# 3.0.4
+2024-01-05 (Date of Last Commit)
+
+* Added new functionality to the ATAC workflow for paired-tag data, including the option for SnapATAC to pull cell barcodes from the BB tag of the BAM
+* Modified the STARsoloFastq task in the StarAlign.wdl so STARsolo can run different types of alignments in a single STARsolo command depending on the counting_mode
+
+# 3.0.3
+2023-12-20 (Date of Last Commit)
+
+* Added updated docker to BWAPairedEndAlignment ATAC task to use updated code for distributed bwa-mem2 from Intel
+* Removed MergedBAM ATAC and moved BWAPairedEndAlignment ATAC outside of the for loop
+* Changed CPU platform to Ice Lake for BWAPairedEndAlignment ATAC task
+* Added input parameter input_output_parameter to the Multiome ATAC wdl
+
 # 3.0.2
 2023-12-20 (Date of Last Commit)
+
 * JoinMultiomeBarcodes now has dynamic memory and disk allocation

 # 3.0.1
 2023-12-12 (Date of Last Commit)
+
 * ValidateVcfs now has optional memory parameter; this does not affect this pipeline
 * Downgraded Cell Bender from v0.3.1 to v0.3.0
@@ -21,6 +60,7 @@
 # 2.3.2
 2023-11-20 (Date of Last Commit)
+
 * Added an optional task to the Multiome.wdl that will run CellBender on the Optimus output h5ad file

 # 2.3.1
diff --git a/pipelines/skylab/multiome/Multiome.options.json b/pipelines/skylab/multiome/Multiome.options.json
new file mode 100644
index 0000000000..2f6a122fe6
--- /dev/null
+++ b/pipelines/skylab/multiome/Multiome.options.json
@@ -0,0 +1,5 @@
+{
+  "read_from_cache": true,
+  "write_to_cache": true,
+  "monitoring_script": "gs://broad-gotc-test-storage/cromwell_monitoring_script2.sh"
+}
diff --git a/pipelines/skylab/multiome/Multiome.wdl b/pipelines/skylab/multiome/Multiome.wdl
index e01f481f49..16113b5e8c 100644
--- a/pipelines/skylab/multiome/Multiome.wdl
+++ b/pipelines/skylab/multiome/Multiome.wdl
@@ -6,7 +6,7 @@ import "../../../tasks/skylab/H5adUtils.wdl" as H5adUtils
 import "https://raw.githubusercontent.com/broadinstitute/CellBender/v0.3.0/wdl/cellbender_remove_background.wdl" as CellBender

 workflow Multiome {
-    String pipeline_version = "3.0.2"
+    String pipeline_version = "3.1.2"

   input {
     String input_id
@@ -27,14 +27,17 @@ workflow Multiome {
     String star_strand_mode = "Forward"
     Boolean count_exons = false
     File gex_whitelist = "gs://gcp-public-data--broad-references/RNA/resources/arc-v1/737K-arc-v1_gex.txt"
+    String? soloMultiMappers

     # ATAC inputs
     # Array of input fastq files
     Array[File] atac_r1_fastq
     Array[File] atac_r2_fastq
     Array[File] atac_r3_fastq
-    # BWA input
+
+    # BWA tar reference
     File tar_bwa_reference
+    # Chromosome sizes
     File chrom_sizes
     # Trimadapters input
     String adapter_seq_read1 = "GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAG"
@@ -67,6 +70,7 @@
       ignore_r1_read_length = ignore_r1_read_length,
       star_strand_mode = star_strand_mode,
       count_exons = count_exons,
+      soloMultiMappers = soloMultiMappers
   }

   # Call the ATAC workflow
@@ -134,6 +138,11 @@
     File gene_metrics_gex = Optimus.gene_metrics
     File? cell_calls_gex = Optimus.cell_calls
     File h5ad_output_file_gex = JoinBarcodes.gex_h5ad_file
+    Array[File?] multimappers_EM_matrix = Optimus.multimappers_EM_matrix
+    Array[File?] multimappers_Uniform_matrix = Optimus.multimappers_Uniform_matrix
+    Array[File?] multimappers_Rescue_matrix = Optimus.multimappers_Rescue_matrix
+    Array[File?] multimappers_PropUnique_matrix = Optimus.multimappers_PropUnique_matrix
+    File? gex_aligner_metrics = Optimus.aligner_metrics

     # cellbender outputs
     File? cell_barcodes_csv = CellBender.cell_csv
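The multimapper matrices promoted to pipeline-level outputs above are MatrixMarket files emitted by STARsolo's --soloMultiMappers option. A minimal sketch of inspecting one downstream, assuming scipy is available; the file name follows STARsolo's documented naming for the Uniform option and is an assumption here, not an output name pinned by this diff.

```python
# Sketch: load a STARsolo multimapper matrix (MatrixMarket format).
# "UniqueAndMult-Uniform.mtx" is STARsolo's conventional name for the
# --soloMultiMappers Uniform output; adjust to the localized workflow output.
import scipy.io as sio

matrix = sio.mmread("UniqueAndMult-Uniform.mtx").tocsr()  # features x barcodes
print(f"{matrix.shape[0]} features x {matrix.shape[1]} barcodes, "
      f"{matrix.nnz} non-zero entries")
```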
diff --git a/pipelines/skylab/multiome/atac.changelog.md b/pipelines/skylab/multiome/atac.changelog.md
index 32fbeb00fc..13d51a928c 100644
--- a/pipelines/skylab/multiome/atac.changelog.md
+++ b/pipelines/skylab/multiome/atac.changelog.md
@@ -1,3 +1,34 @@
+# 1.1.7
+2024-02-01 (Date of Last Commit)
+
+* Add new paired-tag task to parse sample barcodes from cell barcodes when preindexing is set to true; this does not affect the ATAC workflow
+
+# 1.1.6
+2024-01-24 (Date of Last Commit)
+
+* Added task GetNumSplits before FastqProcess task to determine the number of splits based on the bwa-mem2 machine specs
+* Added an error message to the BWAPairedEndAlignment task to ensure that the number of splits equals the number of ranks
+* Added an error message to the BWAPairedEndAlignment task to ensure that the number of R1s equals the number of R3s
+
+# 1.1.5
+2024-01-10 (Date of Last Commit)
+
+* Added a check of read2 length to the paired-tag pipeline; this does not affect the ATAC workflow
+
+# 1.1.4
+2024-01-02 (Date of Last Commit)
+
+* Added functionality for using the ATAC pipeline with paired-tag data, including the option for the SnapATAC task to pull cell barcodes from the BB tag of the BAM
+
+# 1.1.3
+2023-12-17 (Date of Last Commit)
+
+* Added updated docker to BWAPairedEndAlignment ATAC task to use updated code for distributed bwa-mem2 from Intel
+* Removed MergedBAM ATAC and moved BWAPairedEndAlignment ATAC outside of the for loop
+* Changed CPU platform to Ice Lake for BWAPairedEndAlignment ATAC task
+* Added input parameter input_output_parameter to the Multiome ATAC wdl
+* Increased memory for JoinMultiomeBarcodes in H5adUtils
+
 # 1.1.2
 2023-11-21 (Date of Last Commit)
diff --git a/pipelines/skylab/multiome/atac.json b/pipelines/skylab/multiome/atac.json
index e9cb983e76..a8b9465fdc 100644
--- a/pipelines/skylab/multiome/atac.json
+++ b/pipelines/skylab/multiome/atac.json
@@ -4,5 +4,6 @@
   "ATAC.TrimAdapters.adapter_seq_read1": "GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAG",
   "ATAC.TrimAdapters.adapter_seq_read2": "TCGTCGGCAGCGTCAGATGTGTATAAGAGACAG",
   "ATAC.input_id": "scATAC",
-  "ATAC.tar_bwa_reference": 
"gs://fc-dd55e131-ef49-4d02-aa2a-20640daaae1e/submissions/8f0dd71a-b42f-4503-b839-3f146941758a/IndexRef/53a91851-1f6c-4ab9-af66-b338ffb28b5a/call-BwaMem2Index/GRCh38.primary_assembly.genome.bwamem2.fa.tar", + "ATAC.preindex": "false" } diff --git a/pipelines/skylab/multiome/atac.wdl b/pipelines/skylab/multiome/atac.wdl index c5c2defcba..4db04a9968 100644 --- a/pipelines/skylab/multiome/atac.wdl +++ b/pipelines/skylab/multiome/atac.wdl @@ -2,6 +2,7 @@ version 1.0 import "../../../tasks/skylab/MergeSortBam.wdl" as Merge import "../../../tasks/skylab/FastqProcessing.wdl" as FastqProcessing +import "../../../tasks/skylab/PairedTagUtils.wdl" as AddBB workflow ATAC { meta { @@ -18,14 +19,20 @@ workflow ATAC { # Output prefix/base name for all intermediate files and pipeline outputs String input_id + # Option for running files with preindex + Boolean preindex = false + # BWA ref File tar_bwa_reference + # BWA machine type -- to select number of splits + Int num_threads_bwa = 128 + Int mem_size_bwa = 512 + String cpu_platform_bwa = "Intel Ice Lake" # GTF for SnapATAC2 to calculate TSS sites of fragment file File annotations_gtf # Text file containing chrom_sizes for genome build (i.e. hg38) File chrom_sizes - # Whitelist File whitelist @@ -34,7 +41,7 @@ workflow ATAC { String adapter_seq_read3 = "TCGTCGGCAGCGTCAGATGTGTATAAGAGACAG" } - String pipeline_version = "1.1.2" + String pipeline_version = "1.1.7" parameter_meta { read1_fastq_gzipped: "read 1 FASTQ file as input for the pipeline, contains read 1 of paired reads" @@ -42,6 +49,17 @@ workflow ATAC { read3_fastq_gzipped: "read 3 FASTQ file as input for the pipeline, contains read 2 of paired reads" output_base_name: "base name to be used for the pipelines output and intermediate files" tar_bwa_reference: "the pre built tar file containing the reference fasta and cooresponding reference files for the BWA aligner" + num_threads_bwa: "Number of threads for bwa-mem2 task (default: 128)" + mem_size_bwa: "Memory size in GB for bwa-mem2 task (default: 512)" + cpu_platform_bwa: "CPU platform for bwa-mem2 task (default: Intel Ice Lake)" + + } + + call GetNumSplits { + input: + nthreads = num_threads_bwa, + mem_size = mem_size_bwa, + cpu_platform = cpu_platform_bwa } call FastqProcessing.FastqProcessATAC as SplitFastq { @@ -50,11 +68,11 @@ workflow ATAC { read3_fastq = read3_fastq_gzipped, barcodes_fastq = read2_fastq_gzipped, output_base_name = input_id, + num_output_files = GetNumSplits.ranks_per_node_out, whitelist = whitelist } scatter(idx in range(length(SplitFastq.fastq_R1_output_array))) { - call TrimAdapters { input: read1_fastq = SplitFastq.fastq_R1_output_array[idx], @@ -63,38 +81,140 @@ workflow ATAC { adapter_seq_read1 = adapter_seq_read1, adapter_seq_read3 = adapter_seq_read3 } + } - call BWAPairedEndAlignment { - input: + call BWAPairedEndAlignment { + input: read1_fastq = TrimAdapters.fastq_trimmed_adapter_output_read1, read3_fastq = TrimAdapters.fastq_trimmed_adapter_output_read3, tar_bwa_reference = tar_bwa_reference, - output_base_name = input_id + "_" + idx + output_base_name = input_id, + nthreads = num_threads_bwa, + mem_size = mem_size_bwa, + cpu_platform = cpu_platform_bwa + } + + if (preindex) { + call AddBB.AddBBTag as BBTag { + input: + bam = BWAPairedEndAlignment.bam_aligned_output, + input_id = input_id + } + call CreateFragmentFile as BB_fragment { + input: + bam = BBTag.bb_bam, + chrom_sizes = chrom_sizes, + annotations_gtf = annotations_gtf, + preindex = preindex } } + if (!preindex) { + call CreateFragmentFile { + input: + 
bam = BWAPairedEndAlignment.bam_aligned_output, + chrom_sizes = chrom_sizes, + annotations_gtf = annotations_gtf, + preindex = preindex - call Merge.MergeSortBamFiles as MergeBam { - input: - output_bam_filename = input_id + ".bam", - bam_inputs = BWAPairedEndAlignment.bam_aligned_output, - sort_order = "coordinate" + } } + File bam_aligned_output_atac = select_first([BBTag.bb_bam, BWAPairedEndAlignment.bam_aligned_output]) + File fragment_file_atac = select_first([BB_fragment.fragment_file, CreateFragmentFile.fragment_file]) + File snap_metrics_atac = select_first([BB_fragment.Snap_metrics,CreateFragmentFile.Snap_metrics]) + output { + File bam_aligned_output = bam_aligned_output_atac + File fragment_file = fragment_file_atac + File snap_metrics = snap_metrics_atac + } +} - call CreateFragmentFile { - input: - bam = MergeBam.output_bam, - chrom_sizes = chrom_sizes, - annotations_gtf = annotations_gtf +# get number of splits +task GetNumSplits { + input { + # machine specs for bwa-mem2 task + Int nthreads + Int mem_size + String cpu_platform + String docker_image = "ubuntu:latest" + } + + parameter_meta { + docker_image: "the ubuntu docker image (default: ubuntu:latest)" + nthreads: "Number of threads per node (default: 128)" + mem_size: "the size of memory used during alignment" + } + + command <<< + set -euo pipefail + + # steps taken from https://github.com/IntelLabs/Open-Omics-Acceleration-Framework/blob/main/pipelines/fq2sortedbam/print_config.sh + num_nodes=1 + lscpu + lscpu > compute_config + + num_cpus_per_node=$(cat compute_config | grep -E '^CPU\(s\)' | awk '{print $2}') + num_sockets=$(cat compute_config | grep -E '^Socket'| awk '{print $2}') + num_numa=$(cat compute_config | grep '^NUMA node(s)' | awk '{print $3}') + num_cpus_all_node=`expr ${num_cpus_per_node} \* ${num_nodes}` + threads_per_core=$(cat compute_config | grep -E '^Thread' | awk '{print $4}') + + num_cpus_all_node=`expr ${num_cpus_per_node} \* ${num_nodes}` + + echo "Number of threads: " $num_cpus_per_node + echo "Number of sockets: " $num_sockets + echo "Number of NUMA domains: "$num_numa + echo "Number of threads per core: "$threads_per_core + echo "Number of CPUs: $num_cpus_all_node" + + num_physical_cores_all_nodes=`expr ${num_cpus_all_node} / ${threads_per_core}` + num_physical_cores_per_nodes=`expr ${num_cpus_per_node} / ${threads_per_core}` + num_physical_cores_per_socket=`expr ${num_physical_cores_all_nodes} / ${num_sockets}` + num_physical_cores_per_numa=`expr ${num_physical_cores_all_nodes} / ${num_numa}` + echo "Number physical cores: "$num_physical_cores_per_nodes + echo "Number physical cores per socket: "$num_physical_cores_per_socket + echo "Number physical cores per numa: "$num_physical_cores_per_numa + + th=`expr ${num_physical_cores_per_numa} / 2` + if [ $th -le 10 ] + then + th=${num_physical_cores_per_numa} + fi + + while [ $num_physical_cores_per_nodes -gt $th ] + do + num_physical_cores_per_nodes=`expr $num_physical_cores_per_nodes / 2` + done + + num_physical_cores_per_rank=$num_physical_cores_per_nodes + total_num_ranks=`expr ${num_physical_cores_all_nodes} / ${num_physical_cores_per_rank}` + + ranks_per_node=`expr ${total_num_ranks} / ${num_nodes}` + echo "Number of MPI ranks: "${total_num_ranks} + echo "Number of cores per MPI rank: "$num_physical_cores_per_nodes + echo "#############################################" + #echo "Note: Each MPI rank runs a bwa-mem2 process on its input fastq files produced by fqprocess. 
Please ensure that the number of files created due to bam_size parameter to fqprocess (in config file) creates number of fastq files equal to ${total_num_ranks}"
+    echo "Please set bam_size such that fastqprocess creates ${total_num_ranks} splits of input fastq files"
+    echo "#############################################"
+
+    echo $total_num_ranks > total_num_ranks.txt
+    echo $ranks_per_node > ranks_per_node.txt
+
+  >>>
+
+  runtime {
+    docker: docker_image
+    cpu: nthreads
+    cpuPlatform: cpu_platform
+    memory: "${mem_size} GiB"
+  }
+
+  output {
+    Int ranks_per_node_out = read_int("ranks_per_node.txt")
+  }
+}
+
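The shell arithmetic in GetNumSplits above is dense; for reference, this is an illustrative Python rendering of the same halving loop and NUMA threshold, not part of the workflow. Inputs mirror the lscpu-derived values the task parses.

```python
# Illustrative re-implementation of GetNumSplits' core arithmetic: pick
# cores-per-rank by halving until it fits within roughly half a NUMA domain,
# then derive how many bwa-mem2 MPI ranks the machine supports.
def ranks_per_node(vcpus_per_node: int, threads_per_core: int,
                   num_numa: int, num_nodes: int = 1) -> int:
    physical_all = (vcpus_per_node * num_nodes) // threads_per_core
    cores_per_rank = vcpus_per_node // threads_per_core
    per_numa = physical_all // num_numa

    th = per_numa // 2
    if th <= 10:                 # small NUMA domains: use the whole domain
        th = per_numa

    while cores_per_rank > th:   # halve cores-per-rank until it fits
        cores_per_rank //= 2

    total_ranks = physical_all // cores_per_rank
    return total_ranks // num_nodes

# A 128-vCPU, 2-threads-per-core, 2-NUMA machine yields 4 ranks of 16 cores.
print(ranks_per_node(128, 2, 2))  # -> 4
```

This is why fastqprocess must be told to emit exactly that many FASTQ splits: the distributed bwa-mem2 code assigns one split per MPI rank.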
 # trim read 1 and read 2 adapter sequence with cutadapt
 task TrimAdapters {
   input {
@@ -163,18 +283,20 @@ task TrimAdapters {
 # align the two trimmed fastq as paired end data using BWA
 task BWAPairedEndAlignment {
   input {
-    File read1_fastq
-    File read3_fastq
+    Array[File] read1_fastq
+    Array[File] read3_fastq
     File tar_bwa_reference
     String read_group_id = "RG1"
     String read_group_sample_name = "RGSN1"
+    String suffix = "trimmed_adapters.fastq.gz"
     String output_base_name
-    String docker_image = "us.gcr.io/broad-gotc-prod/samtools-bwa-mem-2:1.0.0-2.2.1_x64-linux-1685469504"
+    String docker_image = "us.gcr.io/broad-gotc-prod/samtools-dist-bwa:2.0.0"

     # Runtime attributes
-    Int disk_size = ceil(3.25 * (size(read1_fastq, "GiB") + size(read3_fastq, "GiB") + size(tar_bwa_reference, "GiB"))) + 400
-    Int nthreads = 16
-    Int mem_size = 40
+    Int disk_size = 2000
+    Int nthreads
+    Int mem_size
+    String cpu_platform
   }

   parameter_meta {
@@ -190,38 +312,122 @@ task BWAPairedEndAlignment {
     docker_image: "the docker image using BWA to be used (default: us.gcr.io/broad-gotc-prod/samtools-bwa-mem-2:1.0.0-2.2.1_x64-linux-1685469504)"
   }

-  String bam_aligned_output_name = output_base_name + ".aligned.bam"
+  String bam_aligned_output_name = output_base_name + ".bam"

   # bwa and call samtools to convert sam to bam
   command <<<
-    set -euo pipefail
+    set -euo pipefail
+
+    # print lscpu
+    echo "lscpu output"
+    lscpu
+    echo "end of lscpu output"

     # prepare reference
     declare -r REF_DIR=$(mktemp -d genome_referenceXXXXXX)
     tar -xf "~{tar_bwa_reference}" -C $REF_DIR --strip-components 1
     rm "~{tar_bwa_reference}"
-
-    # align w/ BWA: -t for number of cores
-    bwa-mem2 \
-      mem \
-      -R "@RG\tID:~{read_group_id}\tSM:~{read_group_sample_name}" \
-      -C \
-      -t ~{nthreads} \
-      $REF_DIR/genome.fa \
-      ~{read1_fastq} ~{read3_fastq} \
-      | samtools view -bS - > ~{bam_aligned_output_name}
+    REF_PAR_DIR=$(basename "$(dirname "$REF_DIR/genome.fa")")
+    echo $REF_PAR_DIR
+
+    # make read1_fastq and read3_fastq into arrays
+    declare -a R1_ARRAY=(~{sep=' ' read1_fastq})
+    declare -a R3_ARRAY=(~{sep=' ' read3_fastq})
+
+    file_path=`pwd`
+    echo "The current working directory is" $file_path
+
+    # make input and output directories needed for distributed bwamem2 code
+    mkdir "output_dir"
+    mkdir "input_dir"
+
+    echo "Move R1, R3 and reference files to input directory."
+    R1=""
+    echo "R1"
+    for fastq in "${R1_ARRAY[@]}"; do mv "$fastq" input_dir; R1+=`basename $fastq`" "; done
+    echo $R1
+    R3=""
+    echo "R3"
+    for fastq in "${R3_ARRAY[@]}"; do mv "$fastq" input_dir; R3+=`basename $fastq`" "; done
+    echo $R3
+
+    mv $REF_DIR input_dir
+
+    echo "List of files in input directory"
+    ls input_dir
+
+    # multiome-practice-may15_arcgtf, trimmed_adapters.fastq.gz
+    PREFIX=~{output_base_name}
+    SUFFIX=~{suffix}
+
+    I1=""
+    R2=""
+
+    echo "REF_PAR_DIR:" $REF_PAR_DIR
+    REF=$REF_PAR_DIR/genome.fa
+
+    PARAMS="+R '@RG\tID:~{read_group_id}\tSM:~{read_group_sample_name}' +C"
+
+    INPUT_DIR=$file_path/input_dir
+    OUTPUT_DIR=$file_path/output_dir
+
+    input_to_config="INPUT_DIR=\"${INPUT_DIR}\"\nOUTPUT_DIR=\"${OUTPUT_DIR}\"\nPREFIX=\"${PREFIX}\"\nSUFFIX=\"${SUFFIX}\"\n"
+    other_to_add="R1=\"${R1}\"\nR2=\"${R2}\"\nR3=\"${R3}\"\nI1=\"${I1}\"\nREF=\"${REF}\"\n"
+    params="PARAMS=\"${PARAMS}\""
+
+    printf "%b" "$input_to_config"
+    printf "%b" "$other_to_add"
+    echo $params
+
+    # cd into fq2sortedbam
+    cd /usr/temp/Open-Omics-Acceleration-Framework/pipelines/fq2sortedbam
+    # keep only the last 10 lines of config; write via a temp file so the
+    # redirection does not truncate config before tail reads it
+    tail -10 config > config_tmp && mv config_tmp config
+    # add inputs to config file (this file is needed to run bwa-mem2 in this specific code)
+    printf "%b" "$input_to_config" | tee -a config
+    printf "%b" "$other_to_add" | tee -a config
+    echo $params | tee -a config
+    echo "CONFIG"
+    cat config
+    # run bwa-mem2
+    echo "Run distributed BWA-MEM2"
+    ./run_bwa.sh multifq
+    echo "Done running distributed BWA-MEM2"
+    echo "List of files in output directory"
+    ls $OUTPUT_DIR
+    cd $OUTPUT_DIR
+
+    # remove all files except for final and text file
+    echo "Remove all files except for final bam file and log files"
+    ls | grep -xv final.sorted.bam | grep -v .txt$ | xargs rm
+
+    echo "List of files in output directory after removal"
+    ls
+
+    # rename file to this
+    mv final.sorted.bam ~{bam_aligned_output_name}
+
+    # save output logs for bwa-mem2
+    mkdir output_logs
+    mv *txt output_logs
+    tar -zcvf /cromwell_root/output_distbwa_log.tar.gz output_logs
+
+    # move bam file to /cromwell_root
+    mv ~{bam_aligned_output_name} /cromwell_root
   >>>

   runtime {
     docker: docker_image
-    disks: "local-disk ${disk_size} HDD"
+    disks: "local-disk ${disk_size} SSD"
     cpu: nthreads
+    cpuPlatform: cpu_platform
     memory: "${mem_size} GiB"
   }

   output {
     File bam_aligned_output = bam_aligned_output_name
+    File output_distbwa_log_tar = "output_distbwa_log.tar.gz"
   }
 }

@@ -231,6 +437,7 @@ task CreateFragmentFile {
     File bam
     File annotations_gtf
     File chrom_sizes
+    Boolean preindex
     Int disk_size = 500
     Int mem_size = 16
     Int nthreads = 1
@@ -257,6 +464,7 @@
     bam = "~{bam}"
     bam_base_name = "~{bam_base_name}"
     chrom_sizes = "~{chrom_sizes}"
+    preindex = "~{preindex}"

     # calculate chrom size dictionary based on text file
     chrom_size_dict={}
@@ -269,8 +477,12 @@
     import snapatac2.preprocessing as pp
     import snapatac2 as snap

-    # extract CB tag from bam file to create fragment file
-    pp.make_fragment_file("~{bam}", "~{bam_base_name}.fragments.tsv", is_paired=True, barcode_tag="CB")
+    # extract CB or BB (if preindex is true) tag from bam file to create fragment file
+    if preindex == "true":
+      pp.make_fragment_file("~{bam}", "~{bam_base_name}.fragments.tsv", is_paired=True, barcode_tag="BB")
+    elif preindex == "false":
+      pp.make_fragment_file("~{bam}", "~{bam_base_name}.fragments.tsv", is_paired=True, barcode_tag="CB")
+
     # calculate quality metrics; note min_num_fragments and min_tsse are set to 0 instead of default
    # 
diff --git a/pipelines/skylab/multiome/test_inputs/Plumbing/10k_pbmc_downsampled.json b/pipelines/skylab/multiome/test_inputs/Plumbing/10k_pbmc_downsampled.json
index f20de3feec..902b564388 100644
--- a/pipelines/skylab/multiome/test_inputs/Plumbing/10k_pbmc_downsampled.json
+++ b/pipelines/skylab/multiome/test_inputs/Plumbing/10k_pbmc_downsampled.json
@@ -20,5 +20,9 @@
   "Multiome.tar_bwa_reference":"gs://gcp-public-data--broad-references/hg38/v0/bwa/v2_2_1/bwa-mem2-2.2.1-Human-GENCODE-build-GRCh38.tar",
   "Multiome.tar_star_reference":"gs://gcp-public-data--broad-references/hg38/v0/star/v2_7_10a/modified_star2.7.10a-Human-GENCODE-build-GRCh38-43.tar",
   "Multiome.chrom_sizes":"gs://broad-gotc-test-storage/Multiome/input/hg38.chrom.sizes",
-  "Multiome.run_cellbender":"false"
+  "Multiome.run_cellbender":"false",
+  "Multiome.Atac.cpu_platform_bwa":"Intel Cascade Lake",
+  "Multiome.Atac.num_threads_bwa":"16",
+  "Multiome.Atac.mem_size_bwa":"64",
+  "Multiome.soloMultiMappers":"Uniform"
 }
diff --git a/pipelines/skylab/multiome/test_inputs/Scientific/10k_pbmc.json b/pipelines/skylab/multiome/test_inputs/Scientific/10k_pbmc.json
index 260e25dd27..846b91ed2d 100644
--- a/pipelines/skylab/multiome/test_inputs/Scientific/10k_pbmc.json
+++ b/pipelines/skylab/multiome/test_inputs/Scientific/10k_pbmc.json
@@ -1,9 +1,9 @@
 {
-"Multiome.annotations_gtf":"gs://gcp-public-data--broad-references/hg38/v0/star/v2_7_10a/modified_v43.annotation.gtf",
+  "Multiome.annotations_gtf":"gs://gcp-public-data--broad-references/hg38/v0/star/v2_7_10a/modified_v43.annotation.gtf",
   "Multiome.gex_i1_fastq":[
-   "gs://broad-gotc-test-storage/Multiome/input/scientific/10k_PBMC_Multiome/10k_PBMC_Multiome_nextgem_Chromium_Controller_gex_S1_L001_I1_001.fastq.gz",
-   "gs://broad-gotc-test-storage/Multiome/input/scientific/10k_PBMC_Multiome/10k_PBMC_Multiome_nextgem_Chromium_Controller_gex_S1_L002_I1_001.fastq.gz"
-],
+    "gs://broad-gotc-test-storage/Multiome/input/scientific/10k_PBMC_Multiome/10k_PBMC_Multiome_nextgem_Chromium_Controller_gex_S1_L001_I1_001.fastq.gz",
+    "gs://broad-gotc-test-storage/Multiome/input/scientific/10k_PBMC_Multiome/10k_PBMC_Multiome_nextgem_Chromium_Controller_gex_S1_L002_I1_001.fastq.gz"
+  ],
   "Multiome.input_id":"10k_PBMC",
   "Multiome.gex_r1_fastq":[
     "gs://broad-gotc-test-storage/Multiome/input/scientific/10k_PBMC_Multiome/10k_PBMC_Multiome_nextgem_Chromium_Controller_gex_S1_L001_R1_001.fastq.gz",
     "gs://broad-gotc-test-storage/Multiome/input/scientific/10k_PBMC_Multiome/10k_PBMC_Multiome_nextgem_Chromium_Controller_gex_S1_L002_R1_001.fastq.gz"
   ],
@@ -28,5 +28,8 @@
   "Multiome.ref_genome_fasta":"gs://gcp-public-data--broad-references/hg38/v0/GRCh38.primary_assembly.genome.fa",
   "Multiome.tar_bwa_reference":"gs://gcp-public-data--broad-references/hg38/v0/bwa/v2_2_1/bwa-mem2-2.2.1-Human-GENCODE-build-GRCh38.tar",
   "Multiome.tar_star_reference":"gs://gcp-public-data--broad-references/hg38/v0/star/v2_7_10a/modified_star2.7.10a-Human-GENCODE-build-GRCh38-43.tar",
-  "Multiome.chrom_sizes":"gs://broad-gotc-test-storage/Multiome/input/hg38.chrom.sizes"
+  "Multiome.chrom_sizes":"gs://broad-gotc-test-storage/Multiome/input/hg38.chrom.sizes",
+  "Multiome.Atac.cpu_platform_bwa":"Intel Cascade Lake",
+  "Multiome.Atac.num_threads_bwa":"24",
+  "Multiome.Atac.mem_size_bwa":"175"
 }
diff --git a/pipelines/skylab/optimus/Optimus.changelog.md b/pipelines/skylab/optimus/Optimus.changelog.md
index 731988d073..9123a32d64 100644
--- a/pipelines/skylab/optimus/Optimus.changelog.md
+++ b/pipelines/skylab/optimus/Optimus.changelog.md
@@ -1,6 +1,24 @@
+# 6.3.5
+2024-01-30 (Date of Last Commit)
+* Added task GetNumSplits before FastqProcess ATAC task to determine the number of splits based on the bwa-mem2 machine specs
+* Added an error message to the BWAPairedEndAlignment ATAC task to ensure that the number of splits equals the number of ranks
+* Added an error message to the BWAPairedEndAlignment ATAC task to ensure that the number of R1s equals the number of R3s
+
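The two new error messages are simple fail-fast guards evaluated before alignment; in sketch form (hypothetical shard lists and rank count; the actual checks are bash tests in the BWAPairedEndAlignment command):

    # Fail-fast guards mirrored from the BWAPairedEndAlignment task (sketch).
    r1_fastqs = ["shard0_R1.fq.gz", "shard1_R1.fq.gz"]
    r3_fastqs = ["shard0_R3.fq.gz", "shard1_R3.fq.gz"]
    num_splits, num_ranks = len(r1_fastqs), 2
    if len(r1_fastqs) != len(r3_fastqs):
        raise ValueError("number of R1 fastqs must equal the number of R3 fastqs")
    if num_splits != num_ranks:
        raise ValueError("number of splits must equal the number of bwa-mem2 ranks")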
+# 6.3.4
+2024-01-11 (Date of Last Commit)
+* Increased memory for MergeStarOutputs in StarAlign.wdl, RunEmptyDrops in RunEmptyDrops.wdl, OptimusH5ad in H5adUtils.wdl and GeneMetrics in Metrics.wdl
+* Added the --soloMultiMappers flag as an optional input to the StarSoloFastq task in the StarAlign.wdl
+
+# 6.3.3
+2024-01-05 (Date of Last Commit)
+* Modified the STARsoloFastq task in the StarAlign.wdl so STARsolo can run different types of alignments in a single STARsolo command depending on the counting_mode
+
+# 6.3.2
+2023-12-20 (Date of Last Commit)
+* Updated the ATAC WDL for the Multiome BWAPairedEndAlignment and MergedBAM tasks; this does not affect the Optimus workflow
+
 # 6.3.1
 2023-12-20 (Date of Last Commit)
-
 * JoinMultiomeBarcodes now has dynamic memory and disk allocation; this does affect the Optimus workflow

 # 6.3.0
diff --git a/pipelines/skylab/optimus/Optimus.wdl b/pipelines/skylab/optimus/Optimus.wdl
index 7ee20891e4..f4a07d840f 100644
--- a/pipelines/skylab/optimus/Optimus.wdl
+++ b/pipelines/skylab/optimus/Optimus.wdl
@@ -31,6 +31,7 @@ workflow Optimus {
     File annotations_gtf
     File ref_genome_fasta
     File? mt_genes
+    String? soloMultiMappers

     # Chemistry options include: 2 or 3
     Int tenx_chemistry_version
@@ -64,7 +65,7 @@ workflow Optimus {

   # version of this pipeline
-  String pipeline_version = "6.3.1"
+  String pipeline_version = "6.3.5"

   # this is used to scatter matched [r1_fastq, r2_fastq, i1_fastq] arrays
   Array[Int] indices = range(length(r1_fastq))
@@ -131,7 +132,8 @@ workflow Optimus {
         chemistry = tenx_chemistry_version,
         counting_mode = counting_mode,
         count_exons = count_exons,
-        output_bam_basename = output_bam_basename + "_" + idx
+        output_bam_basename = output_bam_basename + "_" + idx,
+        soloMultiMappers = soloMultiMappers
       }
   }
   call Merge.MergeSortBamFiles as MergeBam {
@@ -237,6 +239,11 @@ workflow Optimus {
     File gene_metrics = GeneMetrics.gene_metrics
     File? cell_calls = RunEmptyDrops.empty_drops_result
     File? aligner_metrics = MergeStarOutputs.cell_reads_out
+    Array[File?] multimappers_EM_matrix = STARsoloFastq.multimappers_EM_matrix
+    Array[File?] multimappers_Uniform_matrix = STARsoloFastq.multimappers_Uniform_matrix
+    Array[File?] multimappers_Rescue_matrix = STARsoloFastq.multimappers_Rescue_matrix
+    Array[File?]
multimappers_PropUnique_matrix = STARsoloFastq.multimappers_PropUnique_matrix + # h5ad File h5ad_output_file = final_h5ad_output } diff --git a/pipelines/skylab/paired_tag/PairedTag.changelog.md b/pipelines/skylab/paired_tag/PairedTag.changelog.md new file mode 100644 index 0000000000..06b2ec320b --- /dev/null +++ b/pipelines/skylab/paired_tag/PairedTag.changelog.md @@ -0,0 +1,36 @@ +# 0.0.6 +2024-02-01 (Date of Last Commit) + +* Add new paired-tag task to parse sample barcodes from cell barcodes when preindexing is set to true + +# 0.0.5 +2024-01-30 (Date of Last Commit) + +* Added task GetNumSplits before FastqProcess ATAC task to determine the number of splits based on the bwa-mem2 machine specs +* Added an error message to the BWAPairedEndAlignment ATAC task to ensure that the number of splits equals the number of ranks +* Added an error message to the BWAPairedEndAlignment ATAC task to ensure that the number of R1s equals to the number of R3s + +# 0.0.4 +2024-01-18 (Date of Last Commit) + +* Increased memory for MergeStarOutputs in StarAlign.wdl, RunEmptyDrops in RunEmptyDrops.wdl, OptimusH5ad in H5adUtils.wdl and GeneMetrics in Metrics.wdl +* Added the --soloMultiMappers flag as an optional input to the StarSoloFastq task in the StarAlign.wdl +* Added a check of read2 length and barcode orientation to the demultiplexing step of the pipeline; this task now checks read2 length, performs demultiplexing or trimming if necessary, and checks barcode orientation + +# 0.0.3 +2024-01-05 (Date of Last Commit) + +* Added a new option for the preindex boolean to add cell barcodes and preindex sample barcode to the BB tag of the BAM +* Added new functionality for the ATAC workflow to use BB tag of BAM for SnapATAC2 +* Modified the STARsoloFastq task in the StarAlign.wdl so STARsolo can run different types of alignments in a single STARsolo command depending on the counting_mode + +# 0.0.2 +2023-12-20 (Date of Last Commit) + +* Updated the ATAC WDL for the Multiome BWAPairedEndAlignment and MergedBAM tasks + +# 0.0.1 +2023-12-18 (Date of Last Commit) + +* Initial release of the PairedTag pipeline + diff --git a/pipelines/skylab/paired_tag/PairedTag.wdl b/pipelines/skylab/paired_tag/PairedTag.wdl new file mode 100644 index 0000000000..bc0e6763f7 --- /dev/null +++ b/pipelines/skylab/paired_tag/PairedTag.wdl @@ -0,0 +1,131 @@ +version 1.0 + +import "../../../pipelines/skylab/multiome/atac.wdl" as atac +import "../../../pipelines/skylab/optimus/Optimus.wdl" as optimus +import "../../../tasks/skylab/H5adUtils.wdl" as H5adUtils +import "../../../tasks/skylab/PairedTagUtils.wdl" as Demultiplexing +workflow PairedTag { + String pipeline_version = "0.0.6" + + input { + String input_id + + # Optimus Inputs + String counting_mode = "sn_rna" + Array[File] gex_r1_fastq + Array[File] gex_r2_fastq + Array[File]? gex_i1_fastq + File tar_star_reference + File annotations_gtf + File ref_genome_fasta + File? 
mt_genes + Int tenx_chemistry_version = 3 + Int emptydrops_lower = 100 + Boolean force_no_check = false + Boolean ignore_r1_read_length = false + String star_strand_mode = "Forward" + Boolean count_exons = false + File gex_whitelist = "gs://gcp-public-data--broad-references/RNA/resources/arc-v1/737K-arc-v1_gex.txt" + + # ATAC inputs + # Array of input fastq files + Array[File] atac_r1_fastq + Array[File] atac_r2_fastq + Array[File] atac_r3_fastq + # BWA input + File tar_bwa_reference + File chrom_sizes + # Trimadapters input + String adapter_seq_read1 = "GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAG" + String adapter_seq_read3 = "TCGTCGGCAGCGTCAGATGTGTATAAGAGACAG" + # Whitelist + File atac_whitelist = "gs://gcp-public-data--broad-references/RNA/resources/arc-v1/737K-arc-v1_atac.txt" + + # PairedTag + Boolean preindex + } + # Call the Optimus workflow + call optimus.Optimus as Optimus { + input: + counting_mode = counting_mode, + r1_fastq = gex_r1_fastq, + r2_fastq = gex_r2_fastq, + i1_fastq = gex_i1_fastq, + input_id = input_id + "_gex", + output_bam_basename = input_id + "_gex", + tar_star_reference = tar_star_reference, + annotations_gtf = annotations_gtf, + ref_genome_fasta = ref_genome_fasta, + mt_genes = mt_genes, + tenx_chemistry_version = tenx_chemistry_version, + whitelist = gex_whitelist, + emptydrops_lower = emptydrops_lower, + force_no_check = force_no_check, + ignore_r1_read_length = ignore_r1_read_length, + star_strand_mode = star_strand_mode, + count_exons = count_exons, + } + + # Call the ATAC workflow + # Call the ATAC workflow + scatter (idx in range(length(atac_r1_fastq))) { + call Demultiplexing.PairedTagDemultiplex as demultiplex { + input: + read1_fastq = atac_r1_fastq[idx], + read3_fastq = atac_r3_fastq[idx], + barcodes_fastq = atac_r2_fastq[idx], + input_id = input_id, + whitelist = atac_whitelist, + preindex = preindex + } + } + call atac.ATAC as Atac_preindex { + input: + read1_fastq_gzipped = demultiplex.fastq1, + read2_fastq_gzipped = demultiplex.barcodes, + read3_fastq_gzipped = demultiplex.fastq3, + input_id = input_id + "_atac", + tar_bwa_reference = tar_bwa_reference, + annotations_gtf = annotations_gtf, + chrom_sizes = chrom_sizes, + whitelist = atac_whitelist, + adapter_seq_read1 = adapter_seq_read1, + adapter_seq_read3 = adapter_seq_read3, + preindex = preindex + } + + if (preindex) { + call Demultiplexing.ParseBarcodes as ParseBarcodes { + input: + atac_h5ad = Atac_preindex.snap_metrics, + atac_fragment = Atac_preindex.fragment_file + } + } + + meta { + allowNestedInputs: true + } + + File atac_fragment_out = select_first([ParseBarcodes.atac_fragment_tsv,Atac_preindex.fragment_file]) + File atac_h5ad_out = select_first([ParseBarcodes.atac_h5ad_file, Atac_preindex.snap_metrics]) + output { + + String pairedtag_pipeline_version_out = pipeline_version + + # atac outputs + File bam_aligned_output_atac = Atac_preindex.bam_aligned_output + File fragment_file_atac = atac_fragment_out + File snap_metrics_atac = atac_h5ad_out + + # optimus outputs + File genomic_reference_version_gex = Optimus.genomic_reference_version + File bam_gex = Optimus.bam + File matrix_gex = Optimus.matrix + File matrix_row_index_gex = Optimus.matrix_row_index + File matrix_col_index_gex = Optimus.matrix_col_index + File cell_metrics_gex = Optimus.cell_metrics + File gene_metrics_gex = Optimus.gene_metrics + File? 
cell_calls_gex = Optimus.cell_calls + File h5ad_output_file_gex = Optimus.h5ad_output_file + } +} diff --git a/pipelines/skylab/paired_tag/test_inputs/Plumbing/10k_pbmc_downsampled.json b/pipelines/skylab/paired_tag/test_inputs/Plumbing/10k_pbmc_downsampled.json new file mode 100644 index 0000000000..e46f86c366 --- /dev/null +++ b/pipelines/skylab/paired_tag/test_inputs/Plumbing/10k_pbmc_downsampled.json @@ -0,0 +1,24 @@ +{ + "PairedTag.annotations_gtf":"gs://gcp-public-data--broad-references/hg38/v0/star/v2_7_10a/modified_v43.annotation.gtf", + "PairedTag.input_id":"10k_PBMC_downsampled", + "PairedTag.gex_r1_fastq":[ + "gs://broad-gotc-test-storage/Multiome/input/plumbing/fastq_R1_gex.fastq.gz" + ], + "PairedTag.gex_r2_fastq":[ + "gs://broad-gotc-test-storage/Multiome/input/plumbing/fastq_R2_gex.fastq.gz" + ], + "PairedTag.atac_r1_fastq":[ + "gs://broad-gotc-test-storage/Multiome/input/plumbing/fastq_R1_atac.fastq.gz" + ], + "PairedTag.atac_r2_fastq":[ + "gs://broad-gotc-test-storage/Multiome/input/plumbing/fastq_R2_atac.fastq.gz" + ], + "PairedTag.atac_r3_fastq":[ + "gs://broad-gotc-test-storage/Multiome/input/plumbing/fastq_R3_atac.fastq.gz" + ], + "PairedTag.ref_genome_fasta":"gs://gcp-public-data--broad-references/hg38/v0/GRCh38.primary_assembly.genome.fa", + "PairedTag.tar_bwa_reference":"gs://gcp-public-data--broad-references/hg38/v0/bwa/v2_2_1/bwa-mem2-2.2.1-Human-GENCODE-build-GRCh38.tar", + "PairedTag.tar_star_reference":"gs://gcp-public-data--broad-references/hg38/v0/star/v2_7_10a/modified_star2.7.10a-Human-GENCODE-build-GRCh38-43.tar", + "PairedTag.chrom_sizes":"gs://broad-gotc-test-storage/Multiome/input/hg38.chrom.sizes", + "PairedTag.preindex":"false" +} diff --git a/pipelines/skylab/paired_tag/test_inputs/Scientific/10k_pbmc.json b/pipelines/skylab/paired_tag/test_inputs/Scientific/10k_pbmc.json new file mode 100644 index 0000000000..888439d2a6 --- /dev/null +++ b/pipelines/skylab/paired_tag/test_inputs/Scientific/10k_pbmc.json @@ -0,0 +1,33 @@ +{ +"PairedTag.annotations_gtf":"gs://gcp-public-data--broad-references/hg38/v0/star/v2_7_10a/modified_v43.annotation.gtf", + "PairedTag.gex_i1_fastq":[ + "gs://broad-gotc-test-storage/Multiome/input/scientific/10k_PBMC_Multiome/10k_PBMC_Multiome_nextgem_Chromium_Controller_gex_S1_L001_I1_001.fastq.gz", + "gs://broad-gotc-test-storage/Multiome/input/scientific/10k_PBMC_Multiome/10k_PBMC_Multiome_nextgem_Chromium_Controller_gex_S1_L002_I1_001.fastq.gz" +], + "PairedTag.input_id":"10k_PBMC", + "PairedTag.gex_r1_fastq":[ + "gs://broad-gotc-test-storage/Multiome/input/scientific/10k_PBMC_Multiome/10k_PBMC_Multiome_nextgem_Chromium_Controller_gex_S1_L001_R1_001.fastq.gz", + "gs://broad-gotc-test-storage/Multiome/input/scientific/10k_PBMC_Multiome/10k_PBMC_Multiome_nextgem_Chromium_Controller_gex_S1_L002_R1_001.fastq.gz" + ], + "PairedTag.gex_r2_fastq":[ + "gs://broad-gotc-test-storage/Multiome/input/scientific/10k_PBMC_Multiome/10k_PBMC_Multiome_nextgem_Chromium_Controller_gex_S1_L001_R2_001.fastq.gz", + "gs://broad-gotc-test-storage/Multiome/input/scientific/10k_PBMC_Multiome/10k_PBMC_Multiome_nextgem_Chromium_Controller_gex_S1_L002_R2_001.fastq.gz" + ], + "PairedTag.atac_r1_fastq":[ + "gs://broad-gotc-test-storage/Multiome/input/scientific/10k_PBMC_Multiome/10k_PBMC_Multiome_nextgem_Chromium_Controller_atac_S1_L001_R1_001.fastq.gz", + "gs://broad-gotc-test-storage/Multiome/input/scientific/10k_PBMC_Multiome/10k_PBMC_Multiome_nextgem_Chromium_Controller_atac_S1_L002_R1_001.fastq.gz" + ], + "PairedTag.atac_r2_fastq":[ + 
"gs://broad-gotc-test-storage/Multiome/input/scientific/10k_PBMC_Multiome/10k_PBMC_Multiome_nextgem_Chromium_Controller_atac_S1_L001_R2_001.fastq.gz", + "gs://broad-gotc-test-storage/Multiome/input/scientific/10k_PBMC_Multiome/10k_PBMC_Multiome_nextgem_Chromium_Controller_atac_S1_L002_R2_001.fastq.gz" + ], + "PairedTag.atac_r3_fastq":[ + "gs://broad-gotc-test-storage/Multiome/input/scientific/10k_PBMC_Multiome/10k_PBMC_Multiome_nextgem_Chromium_Controller_atac_S1_L001_R3_001.fastq.gz", + "gs://broad-gotc-test-storage/Multiome/input/scientific/10k_PBMC_Multiome/10k_PBMC_Multiome_nextgem_Chromium_Controller_atac_S1_L002_R3_001.fastq.gz" + ], + "PairedTag.ref_genome_fasta":"gs://gcp-public-data--broad-references/hg38/v0/GRCh38.primary_assembly.genome.fa", + "PairedTag.tar_bwa_reference":"gs://gcp-public-data--broad-references/hg38/v0/bwa/v2_2_1/bwa-mem2-2.2.1-Human-GENCODE-build-GRCh38.tar", + "PairedTag.tar_star_reference":"gs://gcp-public-data--broad-references/hg38/v0/star/v2_7_10a/modified_star2.7.10a-Human-GENCODE-build-GRCh38-43.tar", + "PairedTag.chrom_sizes":"gs://broad-gotc-test-storage/Multiome/input/hg38.chrom.sizes", + "PairedTag.preindex":"false" +} \ No newline at end of file diff --git a/pipelines/skylab/slideseq/SlideSeq.changelog.md b/pipelines/skylab/slideseq/SlideSeq.changelog.md index c60159617c..f540bdc710 100644 --- a/pipelines/skylab/slideseq/SlideSeq.changelog.md +++ b/pipelines/skylab/slideseq/SlideSeq.changelog.md @@ -1,3 +1,27 @@ +# 2.1.6 +2024-01-30 (Date of Last Commit) + +* Added task GetNumSplits before FastqProcess ATAC task to determine the number of splits based on the bwa-mem2 machine specs; this does affect the SlideSeq workflow +* Added an error message to the BWAPairedEndAlignment ATAC task to ensure that the number of splits equals the number of ranks; this does affect the SlideSeq workflow +* Added an error message to the BWAPairedEndAlignment ATAC task to ensure that the number of R1s equals to the number of R3s; this does affect the SlideSeq workflow + +# 2.1.5 +2024-01-11 (Date of Last Commit) + +* Increased memory for MergeStarOutputs in StarAlign.wdl, RunEmptyDrops in RunEmptyDrops.wdl, OptimusH5ad in H5adUtils.wdl and GeneMetrics in Metrics.wdl +* Added the --soloMultiMappers flag as an optional input to the StarSoloFastq task in the StarAlign.wdl; this does affect the SlideSeq workflow + +# 2.1.4 +2024-01-05 (Date of Last Commit) + +* Modified the STARsoloFastq task in the StarAlign.wdl so STARsolo can run different types of alignments in a single STARsolo command depending on the counting_mode; this does affect the SlideSeq workflow + +# 2.1.3 +2023-12-17 (Date of Last Commit) + +* Updated the ATAC WDL for the Multiome BWAPairedEndAlignment and MergedBAM tasks; this does affect the SlideSeq workflow + + # 2.1.2 2023-11-21 (Date of Last Commit) diff --git a/pipelines/skylab/slideseq/SlideSeq.wdl b/pipelines/skylab/slideseq/SlideSeq.wdl index 647cd70dc1..c469d7fe56 100644 --- a/pipelines/skylab/slideseq/SlideSeq.wdl +++ b/pipelines/skylab/slideseq/SlideSeq.wdl @@ -23,7 +23,7 @@ import "../../../tasks/skylab/MergeSortBam.wdl" as Merge workflow SlideSeq { - String pipeline_version = "2.1.2" + String pipeline_version = "2.1.6" input { Array[File] r1_fastq diff --git a/pipelines/skylab/smartseq2_single_nucleus_multisample/MultiSampleSmartSeq2SingleNucleus.changelog.md b/pipelines/skylab/smartseq2_single_nucleus_multisample/MultiSampleSmartSeq2SingleNucleus.changelog.md index a82ceafc8c..b0e84df63f 100644 --- 
a/pipelines/skylab/smartseq2_single_nucleus_multisample/MultiSampleSmartSeq2SingleNucleus.changelog.md +++ b/pipelines/skylab/smartseq2_single_nucleus_multisample/MultiSampleSmartSeq2SingleNucleus.changelog.md @@ -1,3 +1,14 @@ +# 1.2.28 +2024-01-11 (Date of Last Commit) + +* Increased memory for MergeStarOutputs in StarAlign.wdl, RunEmptyDrops in RunEmptyDrops.wdl, OptimusH5ad in H5adUtils.wdl and GeneMetrics in Metrics.wdl +* Added the --soloMultiMappers flag as an optional input to the StarSoloFastq task in the StarAlign.wdl; this does affect the MultiSampleSmartSeq2SingleNucleus workflow + +# 1.2.27 +2024-01-05 (Date of Last Commit) + +* Modified the STARsoloFastq task in the StarAlign.wdl so STARsolo can run different types of alignments in a single STARsolo command depending on the counting_mode; this does affect the MultiSampleSmartSeq2SingleNucleus workflow + # 1.2.26 2023-08-22 (Date of Last Commit) diff --git a/pipelines/skylab/smartseq2_single_nucleus_multisample/MultiSampleSmartSeq2SingleNucleus.wdl b/pipelines/skylab/smartseq2_single_nucleus_multisample/MultiSampleSmartSeq2SingleNucleus.wdl index c9d1061872..d0bf9dbb2f 100644 --- a/pipelines/skylab/smartseq2_single_nucleus_multisample/MultiSampleSmartSeq2SingleNucleus.wdl +++ b/pipelines/skylab/smartseq2_single_nucleus_multisample/MultiSampleSmartSeq2SingleNucleus.wdl @@ -40,7 +40,7 @@ workflow MultiSampleSmartSeq2SingleNucleus { String? input_id_metadata_field } # Version of this pipeline - String pipeline_version = "1.2.26" + String pipeline_version = "1.2.28" if (false) { String? none = "None" diff --git a/pipelines/skylab/snM3C/snM3C.changelog.md b/pipelines/skylab/snM3C/snM3C.changelog.md index eed3655306..b24145073b 100644 --- a/pipelines/skylab/snM3C/snM3C.changelog.md +++ b/pipelines/skylab/snM3C/snM3C.changelog.md @@ -1,3 +1,8 @@ +# 1.0.1 +2024-01-31 (Date of Last Commit) + +* Refactored the snM3C.wdl. 
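  The monolithic Snakemake-driven Mapping task is replaced by discrete WDL tasks that scatter over batches of demultiplexed fastqs; the Demultiplexing task assigns sample fastq pairs to batch folders round-robin, equivalent to this sketch (hypothetical sample IDs; batch_number is the new workflow input):

      # Round-robin distribution of samples across batch_number folders (sketch).
      samples = ["s1", "s2", "s3", "s4", "s5"]
      batch_number = 2
      batches = {i: [] for i in range(1, batch_number + 1)}
      folder_index = 1
      for sample in samples:
          batches[folder_index].append(sample)
          folder_index = (folder_index % batch_number) + 1  # cycles 1 -> 2 -> 1 -> ...
      print(batches)  # {1: ['s1', 's3', 's5'], 2: ['s2', 's4']}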
The outputs of the pipeline have not been affected + # 1.0.0 2023-08-01 (Date of Last Commit) diff --git a/pipelines/skylab/snM3C/snM3C.wdl b/pipelines/skylab/snM3C/snM3C.wdl index 2453bdd4ff..3feefb6787 100644 --- a/pipelines/skylab/snM3C/snM3C.wdl +++ b/pipelines/skylab/snM3C/snM3C.wdl @@ -2,51 +2,166 @@ version 1.0 workflow snM3C { - input { - # demultiplexing inputs - Array[File] fastq_input_read1 - Array[File] fastq_input_read2 - File random_primer_indexes - String plate_id - String output_basename = plate_id - - # mapping inputs - File tarred_index_files - File mapping_yaml - File snakefile - File chromosome_sizes - File genome_fa - } - # version of the pipeline - String pipeline_version = "1.0.0" - - call Demultiplexing { - input: - fastq_input_read1 = fastq_input_read1, - fastq_input_read2 = fastq_input_read2, - random_primer_indexes = random_primer_indexes, - plate_id = plate_id - } - - call Mapping { - input: - tarred_demultiplexed_fastqs = Demultiplexing.tarred_demultiplexed_fastqs, - tarred_index_files = tarred_index_files, - mapping_yaml = mapping_yaml, - snakefile = snakefile, - chromosome_sizes = chromosome_sizes, - genome_fa = genome_fa, - plate_id = plate_id + input { + Array[File] fastq_input_read1 + Array[File] fastq_input_read2 + File random_primer_indexes + String plate_id + # mapping inputs + File tarred_index_files + File genome_fa + File chromosome_sizes + + String r1_adapter = "AGATCGGAAGAGCACACGTCTGAAC" + String r2_adapter = "AGATCGGAAGAGCGTCGTGTAGGGA" + Int r1_left_cut = 10 + Int r1_right_cut = 10 + Int r2_left_cut = 10 + Int r2_right_cut = 10 + Int min_read_length = 30 + Int num_upstr_bases = 0 + Int num_downstr_bases = 2 + Int compress_level = 5 + Int batch_number + } - output { - File MappingSummary = Mapping.mappingSummary - File allcFiles = Mapping.allcFiles - File allc_CGNFiles = Mapping.allc_CGNFiles - File bamFiles = Mapping.bamFiles - File detail_statsFiles = Mapping.detail_statsFiles - File hicFiles = Mapping.hicFiles - } + # version of the pipeline + String pipeline_version = "1.0.1" + + call Demultiplexing { + input: + fastq_input_read1 = fastq_input_read1, + fastq_input_read2 = fastq_input_read2, + random_primer_indexes = random_primer_indexes, + plate_id = plate_id, + batch_number = batch_number + } + + scatter(tar in Demultiplexing.tarred_demultiplexed_fastqs) { + call Sort_and_trim_r1_and_r2 { + input: + tarred_demultiplexed_fastqs = tar, + r1_adapter = r1_adapter, + r2_adapter = r2_adapter, + r1_left_cut = r1_left_cut, + r1_right_cut = r1_right_cut, + r2_left_cut = r2_left_cut, + r2_right_cut = r2_right_cut, + min_read_length = min_read_length, + plate_id = plate_id + } + + call Hisat_3n_pair_end_mapping_dna_mode { + input: + r1_trimmed_tar = Sort_and_trim_r1_and_r2.r1_trimmed_fq_tar, + r2_trimmed_tar = Sort_and_trim_r1_and_r2.r2_trimmed_fq_tar, + tarred_index_files = tarred_index_files, + genome_fa = genome_fa, + chromosome_sizes = chromosome_sizes, + plate_id = plate_id + } + + call Separate_unmapped_reads { + input: + hisat3n_bam_tar = Hisat_3n_pair_end_mapping_dna_mode.hisat3n_paired_end_bam_tar, + min_read_length = min_read_length, + plate_id = plate_id + } + + call Split_unmapped_reads { + input: + unmapped_fastq_tar = Separate_unmapped_reads.unmapped_fastq_tar, + min_read_length = min_read_length, + plate_id = plate_id + } + + call Hisat_single_end_r1_r2_mapping_dna_mode_and_merge_sort_split_reads_by_name { + input: + split_fq_tar = Split_unmapped_reads.split_fq_tar, + tarred_index_files = tarred_index_files, + genome_fa = genome_fa, + 
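+        # split_fq_tar holds the split mates that failed paired-end mapping; this task
+        # remaps them single-ended (R1 reverse-directional, R2 forward-directional)
+        # and merges them back into name-sorted per-sample BAMs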
plate_id = plate_id + } + + call remove_overlap_read_parts { + input: + bam = Hisat_single_end_r1_r2_mapping_dna_mode_and_merge_sort_split_reads_by_name.merge_sorted_bam_tar, + plate_id = plate_id + } + + call merge_original_and_split_bam_and_sort_all_reads_by_name_and_position { + input: + bam = Separate_unmapped_reads.unique_bam_tar, + split_bam = remove_overlap_read_parts.output_bam_tar, + plate_id = plate_id + } + + call call_chromatin_contacts { + input: + name_sorted_bam = merge_original_and_split_bam_and_sort_all_reads_by_name_and_position.name_sorted_bam, + plate_id = plate_id + } + + call dedup_unique_bam_and_index_unique_bam { + input: + bam = merge_original_and_split_bam_and_sort_all_reads_by_name_and_position.position_sorted_bam, + plate_id = plate_id + } + + call unique_reads_allc { + input: + bam_and_index_tar = dedup_unique_bam_and_index_unique_bam.output_tar, + genome_fa = genome_fa, + num_upstr_bases = num_upstr_bases, + num_downstr_bases = num_downstr_bases, + compress_level = compress_level, + plate_id = plate_id + } + + call unique_reads_cgn_extraction { + input: + allc_tar = unique_reads_allc.allc, + tbi_tar = unique_reads_allc.tbi, + chrom_size_path = chromosome_sizes, + plate_id = plate_id + } + } + + call summary { + input: + trimmed_stats = Sort_and_trim_r1_and_r2.trim_stats_tar, + hisat3n_stats = Hisat_3n_pair_end_mapping_dna_mode.hisat3n_paired_end_stats_tar, + r1_hisat3n_stats = Hisat_single_end_r1_r2_mapping_dna_mode_and_merge_sort_split_reads_by_name.hisat3n_dna_split_reads_summary_R1_tar, + r2_hisat3n_stats = Hisat_single_end_r1_r2_mapping_dna_mode_and_merge_sort_split_reads_by_name.hisat3n_dna_split_reads_summary_R2_tar, + dedup_stats = dedup_unique_bam_and_index_unique_bam.dedup_stats_tar, + chromatin_contact_stats = call_chromatin_contacts.chromatin_contact_stats, + allc_uniq_reads_stats = unique_reads_allc.allc_uniq_reads_stats, + unique_reads_cgn_extraction_tbi = unique_reads_cgn_extraction.output_tbi_tar, + plate_id = plate_id + } + + output { + File MappingSummary = summary.mapping_summary + Array[File] trimmed_stats = Sort_and_trim_r1_and_r2.trim_stats_tar + Array[File] r1_trimmed_fq = Sort_and_trim_r1_and_r2.r1_trimmed_fq_tar + Array[File] r2_trimmed_fq = Sort_and_trim_r1_and_r2.r2_trimmed_fq_tar + Array[File] hisat3n_stats_tar = Hisat_3n_pair_end_mapping_dna_mode.hisat3n_paired_end_stats_tar + Array[File] hisat3n_bam_tar = Hisat_3n_pair_end_mapping_dna_mode.hisat3n_paired_end_bam_tar + Array[File] unique_bam_tar = Separate_unmapped_reads.unique_bam_tar + Array[File] multi_bam_tar = Separate_unmapped_reads.multi_bam_tar + Array[File] unmapped_fastq_tar = Separate_unmapped_reads.unmapped_fastq_tar + Array[File] split_fq_tar = Split_unmapped_reads.split_fq_tar + Array[File] merge_sorted_bam_tar = Hisat_single_end_r1_r2_mapping_dna_mode_and_merge_sort_split_reads_by_name.merge_sorted_bam_tar + Array[File] name_sorted_bams = merge_original_and_split_bam_and_sort_all_reads_by_name_and_position.name_sorted_bam + Array[File] pos_sorted_bams = merge_original_and_split_bam_and_sort_all_reads_by_name_and_position.position_sorted_bam + Array[File] remove_overlap_read_parts_bam_tar = remove_overlap_read_parts.output_bam_tar + Array[File] dedup_unique_bam_and_index_unique_bam_tar = dedup_unique_bam_and_index_unique_bam.output_tar + Array[File] unique_reads_cgn_extraction_allc = unique_reads_cgn_extraction.output_allc_tar + Array[File] unique_reads_cgn_extraction_tbi = unique_reads_cgn_extraction.output_tbi_tar + Array[File] chromatin_contact_stats = 
call_chromatin_contacts.chromatin_contact_stats + Array[File] reference_version = Hisat_3n_pair_end_mapping_dna_mode.reference_version + } } task Demultiplexing { @@ -55,10 +170,13 @@ task Demultiplexing { Array[File] fastq_input_read2 File random_primer_indexes String plate_id + Int batch_number String docker_image = "us.gcr.io/broad-gotc-prod/m3c-yap-hisat:1.0.0-2.2.1" Int disk_size = 50 Int mem_size = 10 + Int preemptible_tries = 3 + Int cpu = 1 } command <<< @@ -110,101 +228,905 @@ task Demultiplexing { print(f'Removed file: {filename}') CODE - # zip up all the output fq.gz files - tar -zcvf ~{plate_id}.cutadapt_output_files.tar.gz *.fq.gz + # Batch the fastq files into folders of batch_number size + batch_number=~{batch_number} + for i in $(seq 1 "${batch_number}"); do # Use seq for reliable brace expansion + mkdir -p "batch${i}" # Combine batch and i, use -p to create parent dirs + done + + # Counter for the folder index + folder_index=1 + + # Define lists of r1 and r2 fq files + R1_files=($(ls | grep "\-R1.fq.gz")) + R2_files=($(ls | grep "\-R2.fq.gz")) + + # Distribute the FASTQ files and create TAR files + for file in "${R1_files[@]}"; do + sample_id=$(basename "$file" "-R1.fq.gz") + r2_file="${sample_id}-R2.fq.gz" + mv $file batch$((folder_index))/$file + mv $r2_file batch$((folder_index))/$r2_file + # Increment the counter + folder_index=$(( (folder_index % $batch_number) + 1 )) + done + echo "TAR files" + for i in $(seq 1 "${batch_number}"); do + tar -zcvf "~{plate_id}.${i}.cutadapt_output_files.tar.gz" batch${i}/*.fq.gz + done + + + echo "TAR files created successfully." >>> runtime { docker: docker_image disks: "local-disk ${disk_size} HDD" - cpu: 1 + cpu: cpu memory: "${mem_size} GiB" + preemptible: preemptible_tries } output { - File tarred_demultiplexed_fastqs = "~{plate_id}.cutadapt_output_files.tar.gz" - File stats = "~{plate_id}.stats.txt"} + Array[File] tarred_demultiplexed_fastqs = glob("*.tar.gz") + File stats = "~{plate_id}.stats.txt" + } } -task Mapping { - input { - File tarred_demultiplexed_fastqs - File tarred_index_files - File mapping_yaml - File snakefile - File chromosome_sizes - File genome_fa - String plate_id +task Sort_and_trim_r1_and_r2 { + input { + File tarred_demultiplexed_fastqs + String plate_id + String r1_adapter + String r2_adapter + Int r1_left_cut + Int r1_right_cut + Int r2_left_cut + Int r2_right_cut + Int min_read_length - String docker_image = "us.gcr.io/broad-gotc-prod/m3c-yap-hisat:1.0.0-2.2.1" - Int disk_size = 200 - Int mem_size = 500 - } + Int disk_size = 50 + Int mem_size = 10 + String docker = "us.gcr.io/broad-gotc-prod/m3c-yap-hisat:1.0.0-2.2.1" + Int preemptible_tries = 3 + Int cpu = 1 - command <<< + } + command <<< set -euo pipefail - mkdir group0/ - mkdir group0/fastq/ - mkdir group0/reference/ - - cp ~{tarred_index_files} group0/reference/ - cp ~{chromosome_sizes} group0/reference/ - cp ~{genome_fa} group0/reference/ - cp ~{tarred_demultiplexed_fastqs} group0/fastq/ - cp ~{mapping_yaml} group0/ - cp ~{snakefile} group0/ - - # untar the index files - cd group0/reference/ - echo "Untarring the index files" - tar -zxvf ~{tarred_index_files} - rm ~{tarred_index_files} - samtools faidx hg38.fa - - # untar the demultiplexed fastq files - cd ../fastq/ - tar -zxvf ~{tarred_demultiplexed_fastqs} - rm ~{tarred_demultiplexed_fastqs} - - # run the snakemake command - cd ../ - /opt/conda/bin/snakemake --configfile mapping.yaml -j - - # move outputs into /cromwell_root/ - mv /cromwell_root/group0/MappingSummary.csv.gz 
/cromwell_root/~{plate_id}_MappingSummary.csv.gz - - cd /cromwell_root/group0/allc - tar -zcvf ~{plate_id}_allc_files.tar.gz * - mv ~{plate_id}_allc_files.tar.gz /cromwell_root/ - cd ../allc-CGN - tar -zcvf ~{plate_id}_allc-CGN_files.tar.gz * - mv ~{plate_id}_allc-CGN_files.tar.gz /cromwell_root/ - cd ../bam - tar -zcvf ~{plate_id}_bam_files.tar.gz * - mv ~{plate_id}_bam_files.tar.gz /cromwell_root/ - cd ../detail_stats - tar -zcvf ~{plate_id}_detail_stats_files.tar.gz * - mv ~{plate_id}_detail_stats_files.tar.gz /cromwell_root/ - cd ../hic - tar -zcvf ~{plate_id}_hic_files.tar.gz * - mv ~{plate_id}_hic_files.tar.gz /cromwell_root/ + # untar the demultiplexed fastqs + tar -xf ~{tarred_demultiplexed_fastqs} - >>> + #change into batch subfolder + cd batch* + # define lists of r1 and r2 fq files + R1_files=($(ls | grep "\-R1.fq.gz")) + R2_files=($(ls | grep "\-R2.fq.gz")) - runtime { - docker: docker_image - disks: "local-disk ${disk_size} HDD" - cpu: 1 - memory: "${mem_size} GiB" - } + # loop over R1 and R2 files and sort them + for file in "${R1_files[@]}"; do + sample_id=$(basename "$file" "-R1.fq.gz") + r2_file="${sample_id}-R2.fq.gz" + zcat "$file" | paste - - - - | sort -k1,1 -t " " | tr "\t" "\n" > "${sample_id}-R1_sorted.fq" + zcat "$r2_file" | paste - - - - | sort -k1,1 -t " " | tr "\t" "\n" > "${sample_id}-R2_sorted.fq" + done - output { - File mappingSummary = "~{plate_id}_MappingSummary.csv.gz" - File allcFiles = "~{plate_id}_allc_files.tar.gz" - File allc_CGNFiles = "~{plate_id}_allc-CGN_files.tar.gz" - File bamFiles = "~{plate_id}_bam_files.tar.gz" - File detail_statsFiles = "~{plate_id}_detail_stats_files.tar.gz" - File hicFiles = "~{plate_id}_hic_files.tar.gz" - } + + echo "Starting to trim with Cutadapt" + sorted_R1_files=($(ls | grep "\-R1_sorted.fq")) + for file in "${sorted_R1_files[@]}"; do + sample_id=$(basename "$file" "-R1_sorted.fq") + /opt/conda/bin/cutadapt \ + -a R1Adapter=~{r1_adapter} \ + -A R2Adapter=~{r2_adapter} \ + --report=minimal \ + -O 6 \ + -q 20 \ + -u ~{r1_left_cut} \ + -u -~{r1_right_cut} \ + -U ~{r2_left_cut} \ + -U -~{r2_right_cut} \ + -Z \ + -m ~{min_read_length}:~{min_read_length} \ + --pair-filter 'both' \ + -o ${sample_id}-R1_trimmed.fq.gz \ + -p ${sample_id}-R2_trimmed.fq.gz \ + ${sample_id}-R1_sorted.fq ${sample_id}-R2_sorted.fq \ + > ${sample_id}.trimmed.stats.txt + done + + echo "Tarring up the trimmed files and stats files" + + tar -zcvf ~{plate_id}.R1_trimmed_files.tar.gz *-R1_trimmed.fq.gz + tar -zcvf ~{plate_id}.R2_trimmed_files.tar.gz *-R2_trimmed.fq.gz + tar -zcvf ~{plate_id}.trimmed_stats_files.tar.gz *.trimmed.stats.txt + # move files back to root + mv ~{plate_id}.R1_trimmed_files.tar.gz ../~{plate_id}.R1_trimmed_files.tar.gz + mv ~{plate_id}.R2_trimmed_files.tar.gz ../~{plate_id}.R2_trimmed_files.tar.gz + mv ~{plate_id}.trimmed_stats_files.tar.gz ../~{plate_id}.trimmed_stats_files.tar.gz + >>> + runtime { + docker: docker + disks: "local-disk ${disk_size} HDD" + cpu: cpu + memory: "${mem_size} GiB" + preemptible: preemptible_tries + } + output { + File r1_trimmed_fq_tar = "~{plate_id}.R1_trimmed_files.tar.gz" + File r2_trimmed_fq_tar = "~{plate_id}.R2_trimmed_files.tar.gz" + File trim_stats_tar = "~{plate_id}.trimmed_stats_files.tar.gz" + } +} + +task Hisat_3n_pair_end_mapping_dna_mode{ + input { + File r1_trimmed_tar + File r2_trimmed_tar + File tarred_index_files + File genome_fa + File chromosome_sizes + String plate_id + + String docker = "us.gcr.io/broad-gotc-prod/m3c-yap-hisat:1.0.0-2.2.1" + Int disk_size = 1000 + Int mem_size 
= 64 + Int preemptible_tries = 3 + Int cpu = 16 + } + command <<< + set -euo pipefail + + # check genomic reference version and print to output txt file + STRING=~{genome_fa} + BASE=$(basename $STRING .fa) + + echo "The reference is $BASE" > ~{plate_id}.reference_version.txt + + mkdir reference/ + mkdir fastq/ + + cp ~{tarred_index_files} reference/ + cp ~{genome_fa} reference/ + cp ~{chromosome_sizes} reference/ + cp ~{r1_trimmed_tar} fastq/ + cp ~{r2_trimmed_tar} fastq/ + + # untar the index files + cd reference/ + echo "Untarring the index files" + tar -zxvf ~{tarred_index_files} + rm ~{tarred_index_files} + + #get the basename of the genome_fa file + genome_fa_basename=$(basename ~{genome_fa} .fa) + echo "samtools faidx $genome_fa_basename.fa" + samtools faidx $genome_fa_basename.fa + + # untar the demultiplexed fastq files + cd ../fastq/ + echo "Untarring the fastq files" + tar -zxvf ~{r1_trimmed_tar} + tar -zxvf ~{r2_trimmed_tar} + rm ~{r1_trimmed_tar} + rm ~{r2_trimmed_tar} + + # define lists of r1 and r2 fq files + R1_files=($(ls | grep "\-R1_trimmed.fq.gz")) + R2_files=($(ls | grep "\-R2_trimmed.fq.gz")) + + for file in "${R1_files[@]}"; do + sample_id=$(basename "$file" "-R1_trimmed.fq.gz") + hisat-3n /cromwell_root/reference/$genome_fa_basename \ + -q \ + -1 ${sample_id}-R1_trimmed.fq.gz \ + -2 ${sample_id}-R2_trimmed.fq.gz \ + --directional-mapping-reverse \ + --base-change C,T \ + --no-repeat-index \ + --no-spliced-alignment \ + --no-temp-splicesite \ + -t \ + --new-summary \ + --summary-file ${sample_id}.hisat3n_dna_summary.txt \ + --threads 11 | samtools view -b -q 0 -o "${sample_id}.hisat3n_dna.unsort.bam" + done + + # tar up the bam files and stats files + tar -zcvf ~{plate_id}.hisat3n_paired_end_bam_files.tar.gz *.bam + tar -zcvf ~{plate_id}.hisat3n_paired_end_stats_files.tar.gz *.hisat3n_dna_summary.txt + + mv ~{plate_id}.hisat3n_paired_end_bam_files.tar.gz ../ + mv ~{plate_id}.hisat3n_paired_end_stats_files.tar.gz ../ + + >>> + runtime { + docker: docker + disks: "local-disk ${disk_size} HDD" + cpu: cpu + memory: "${mem_size} GiB" + preemptible: preemptible_tries + } + output { + File hisat3n_paired_end_bam_tar = "~{plate_id}.hisat3n_paired_end_bam_files.tar.gz" + File hisat3n_paired_end_stats_tar = "~{plate_id}.hisat3n_paired_end_stats_files.tar.gz" + File reference_version = "~{plate_id}.reference_version.txt" + } +} + +task Separate_unmapped_reads { + input { + File hisat3n_bam_tar + Int min_read_length + String plate_id + + String docker = "us.gcr.io/broad-gotc-prod/m3c-yap-hisat:1.0.0-2.2.1" + Int disk_size = 50 + Int mem_size = 10 + Int preemptible_tries = 3 + Int cpu = 1 + + } + command <<< + + set -euo pipefail + + # untar the hisat3n bam files + tar -xf ~{hisat3n_bam_tar} + rm ~{hisat3n_bam_tar} + + python3 <>> + runtime { + docker: docker + disks: "local-disk ${disk_size} HDD" + cpu: cpu + memory: "${mem_size} GiB" + preemptible: preemptible_tries + } + output { + File unique_bam_tar = "~{plate_id}.hisat3n_paired_end_unique_bam_files.tar.gz" + File multi_bam_tar = "~{plate_id}.hisat3n_paired_end_multi_bam_files.tar.gz" + File unmapped_fastq_tar = "~{plate_id}.hisat3n_paired_end_unmapped_fastq_files.tar.gz" + } +} + +task Split_unmapped_reads { + input { + File unmapped_fastq_tar + Int min_read_length + String plate_id + + String docker = "us.gcr.io/broad-gotc-prod/m3c-yap-hisat:1.0.0-2.2.1" + Int disk_size = 50 + Int mem_size = 10 + Int preemptible_tries = 3 + Int cpu = 1 + } + command <<< + + set -euo pipefail + + # untar the unmapped fastq files + tar -xf 
~{unmapped_fastq_tar} + rm ~{unmapped_fastq_tar} + + python3 <>> + runtime { + docker: docker + disks: "local-disk ${disk_size} HDD" + cpu: cpu + memory: "${mem_size} GiB" + preemptible: preemptible_tries + } + output { + File split_fq_tar = "~{plate_id}.hisat3n_paired_end_split_fastq_files.tar.gz" + } +} + +task Hisat_single_end_r1_r2_mapping_dna_mode_and_merge_sort_split_reads_by_name { + input { + File split_fq_tar + File genome_fa + File tarred_index_files + String plate_id + + String docker = "us.gcr.io/broad-gotc-prod/m3c-yap-hisat:1.0.0-2.2.1" + Int disk_size = 500 + Int mem_size = 64 + Int preemptible_tries = 3 + Int cpu = 16 + } + command <<< + set -euo pipefail + + mkdir reference/ + + cp ~{tarred_index_files} reference/ + cp ~{genome_fa} reference/ + + # untar the tarred index files + cd reference/ + tar -xvf ~{tarred_index_files} + rm ~{tarred_index_files} + + #get the basename of the genome_fa file + genome_fa_basename=$(basename ~{genome_fa} .fa) + samtools faidx $genome_fa_basename.fa + + # untar the unmapped fastq files + tar -xvf ~{split_fq_tar} + rm ~{split_fq_tar} + + # define lists of r1 and r2 fq files + R1_files=($(ls | grep "\.hisat3n_dna.split_reads.R1.fastq")) + R2_files=($(ls | grep "\.hisat3n_dna.split_reads.R2.fastq")) + + for file in "${R1_files[@]}"; do + sample_id=$(basename "$file" ".hisat3n_dna.split_reads.R1.fastq") + hisat-3n /cromwell_root/reference/$genome_fa_basename \ + -q \ + -U ${sample_id}.hisat3n_dna.split_reads.R1.fastq \ + --directional-mapping-reverse \ + --base-change C,T \ + --no-repeat-index \ + --no-spliced-alignment \ + --no-temp-splicesite \ + -t \ + --new-summary \ + --summary-file ${sample_id}.hisat3n_dna_split_reads_summary.R1.txt \ + --threads 11 | samtools view -b -q 10 -o "${sample_id}.hisat3n_dna.split_reads.R1.bam" + done + + for file in "${R2_files[@]}"; do + sample_id=$(basename "$file" ".hisat3n_dna.split_reads.R2.fastq") + hisat-3n /cromwell_root/reference/$genome_fa_basename \ + -q \ + -U ${sample_id}.hisat3n_dna.split_reads.R2.fastq \ + --directional-mapping \ + --base-change C,T \ + --no-repeat-index \ + --no-spliced-alignment \ + --no-temp-splicesite \ + -t --new-summary \ + --summary-file ${sample_id}.hisat3n_dna_split_reads_summary.R2.txt \ + --threads 11 | samtools view -b -q 10 -o "${sample_id}.hisat3n_dna.split_reads.R2.bam" + done + + # tar up the r1 and r2 stats files + tar -zcvf ../~{plate_id}.hisat3n_dna_split_reads_summary.R1.tar.gz *.hisat3n_dna_split_reads_summary.R1.txt + tar -zcvf ../~{plate_id}.hisat3n_dna_split_reads_summary.R2.tar.gz *.hisat3n_dna_split_reads_summary.R2.txt + + + # define lists of r1 and r2 bam files + R1_bams=($(ls | grep "\.hisat3n_dna.split_reads.R1.bam")) + R2_bams=($(ls | grep "\.hisat3n_dna.split_reads.R2.bam")) + + # Loop through the R1 BAM files + for r1_bam in "${R1_bams[@]}"; do + # Extract the corresponding R2 BAM file + r2_bam="${r1_bam/.hisat3n_dna.split_reads.R1.bam/.hisat3n_dna.split_reads.R2.bam}" + + # Define the output BAM file name + output_bam="$(basename ${r1_bam/.hisat3n_dna.split_reads.R1.bam/.hisat3n_dna.split_reads.name_sort.bam})" + + # Perform the samtools merge and sort commands + samtools merge -o - "$r1_bam" "$r2_bam" | samtools sort -n -o "$output_bam" - + done + + #tar up the merged bam files + tar -zcvf ../~{plate_id}.hisat3n_dna.split_reads.name_sort.bam.tar.gz *.hisat3n_dna.split_reads.name_sort.bam + + >>> + runtime { + docker: docker + disks: "local-disk ${disk_size} HDD" + cpu: cpu + memory: "${mem_size} GiB" + preemptible: preemptible_tries + } + 
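+  # Outputs: per-batch tars of the merged, name-sorted split-read BAMs (consumed by
+  # remove_overlap_read_parts) and the hisat-3n summaries from the single-ended R1/R2
+  # remapping (consumed by the summary task)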
output { + File merge_sorted_bam_tar = "~{plate_id}.hisat3n_dna.split_reads.name_sort.bam.tar.gz" + File hisat3n_dna_split_reads_summary_R1_tar = "~{plate_id}.hisat3n_dna_split_reads_summary.R1.tar.gz" + File hisat3n_dna_split_reads_summary_R2_tar = "~{plate_id}.hisat3n_dna_split_reads_summary.R2.tar.gz" + } +} + +task remove_overlap_read_parts { + input { + File bam + String plate_id + + String docker = "us.gcr.io/broad-gotc-prod/m3c-yap-hisat:1.0.0-2.2.1" + Int disk_size = 80 + Int mem_size = 20 + Int preemptible_tries = 3 + Int cpu = 1 + } + + command <<< + set -euo pipefail + # unzip bam file + tar -xf ~{bam} + rm ~{bam} + + # create output dir + mkdir /cromwell_root/output_bams + + # get bams + bams=($(ls | grep "sort.bam$")) + + # loop through bams and run python script on each bam + # scatter instead of for loop to optimize + python3 <>> + runtime { + docker: docker + disks: "local-disk ${disk_size} HDD" + cpu: cpu + memory: "${mem_size} GiB" + preemptible: preemptible_tries + } + output { + File output_bam_tar = "~{plate_id}.remove_overlap_read_parts.tar.gz" + } +} + +task merge_original_and_split_bam_and_sort_all_reads_by_name_and_position { + input { + File bam + File split_bam + String plate_id + + String docker = "us.gcr.io/broad-gotc-prod/m3c-yap-hisat:1.0.0-2.2.1" + Int disk_size = 80 + Int mem_size = 20 + Int preemptible_tries = 3 + Int cpu = 1 + } + command <<< + set -euo pipefail + #unzip bam file + tar -xf ~{bam} + tar -xf ~{split_bam} + rm ~{bam} + rm ~{split_bam} + + echo "samtools merge and sort" + # define lists of r1 and r2 fq files + UNIQUE_BAMS=($(ls | grep "\.hisat3n_dna.unique_aligned.bam")) + SPLIT_BAMS=($(ls | grep "\.hisat3n_dna.split_reads.read_overlap.bam")) + + for file in "${UNIQUE_BAMS[@]}"; do + sample_id=$(basename "$file" ".hisat3n_dna.unique_aligned.bam") + samtools merge -f "${sample_id}.hisat3n_dna.all_reads.bam" "${sample_id}.hisat3n_dna.unique_aligned.bam" "${sample_id}.hisat3n_dna.split_reads.read_overlap.bam" + samtools sort -n -o "${sample_id}.hisat3n_dna.all_reads.name_sort.bam" "${sample_id}.hisat3n_dna.all_reads.bam" + samtools sort -O BAM -o "${sample_id}.hisat3n_dna.all_reads.pos_sort.bam" "${sample_id}.hisat3n_dna.all_reads.name_sort.bam" + done + + echo "Zip files" + #tar up the merged bam files + tar -zcvf ~{plate_id}.hisat3n_dna.all_reads.pos_sort.tar.gz *.hisat3n_dna.all_reads.pos_sort.bam + tar -zcvf ~{plate_id}.hisat3n_dna.all_reads.name_sort.tar.gz *.hisat3n_dna.all_reads.name_sort.bam + >>> + runtime { + docker: docker + disks: "local-disk ${disk_size} HDD" + cpu: cpu + memory: "${mem_size} GiB" + preemptible: preemptible_tries + } + output { + File name_sorted_bam = "~{plate_id}.hisat3n_dna.all_reads.name_sort.tar.gz" + File position_sorted_bam = "~{plate_id}.hisat3n_dna.all_reads.pos_sort.tar.gz" + } +} + +task call_chromatin_contacts { + input { + File name_sorted_bam + String plate_id + + String docker = "us.gcr.io/broad-gotc-prod/m3c-yap-hisat:1.0.0-2.2.1" + Int disk_size = 80 + Int mem_size = 20 + Int preemptible_tries = 3 + Int cpu = 1 + } + command <<< + set -euo pipefail + + # untar the name sorted bam files + tar -xf ~{name_sorted_bam} + rm ~{name_sorted_bam} + + python3 <>> + runtime { + docker: docker + disks: "local-disk ${disk_size} HDD" + cpu: cpu + memory: "${mem_size} GiB" + preemptible: preemptible_tries + } + output { + File chromatin_contact_stats = "~{plate_id}.chromatin_contact_stats.tar.gz" + } } + +task dedup_unique_bam_and_index_unique_bam { + input { + File bam + String plate_id + + String docker = 
"us.gcr.io/broad-gotc-prod/m3c-yap-hisat:1.0.0-2.2.1" + Int disk_size = 80 + Int mem_size = 20 + Int preemptible_tries = 3 + Int cpu = 1 + } + + command <<< + set -euo pipefail + + # unzip files + tar -xf ~{bam} + rm ~{bam} + + # create output dir + mkdir /cromwell_root/output_bams + mkdir /cromwell_root/temp + + # name : AD3C_BA17_2027_P1-1-B11-G13.hisat3n_dna.all_reads.pos_sort.bam + for file in *.bam + do + name=`echo $file | cut -d. -f1` + name=$name.hisat3n_dna.all_reads.deduped + echo $name + echo "Call Picard" + picard MarkDuplicates I=$file O=/cromwell_root/output_bams/$name.bam \ + M=/cromwell_root/output_bams/$name.matrix.txt \ + REMOVE_DUPLICATES=true TMP_DIR=/cromwell_root/temp + echo "Call samtools index" + samtools index /cromwell_root/output_bams/$name.bam + done + + cd /cromwell_root + + #tar up the output files + tar -zcvf ~{plate_id}.dedup_unique_bam_and_index_unique_bam.tar.gz output_bams + + #tar up the stats files + tar -zcvf ~{plate_id}.dedup_unique_bam_and_index_unique_bam_stats.tar.gz output_bams/*.matrix.txt + + >>> + runtime { + docker: docker + disks: "local-disk ${disk_size} HDD" + cpu: cpu + memory: "${mem_size} GiB" + preemptible: preemptible_tries + } + output { + File output_tar = "~{plate_id}.dedup_unique_bam_and_index_unique_bam.tar.gz" + File dedup_stats_tar = "~{plate_id}.dedup_unique_bam_and_index_unique_bam_stats.tar.gz" + } +} + +task unique_reads_allc { + input { + File bam_and_index_tar + File genome_fa + String plate_id + Int num_upstr_bases + Int num_downstr_bases + Int compress_level + + Int disk_size = 80 + Int mem_size = 20 + String genome_base = basename(genome_fa) + String docker = "us.gcr.io/broad-gotc-prod/m3c-yap-hisat:1.0.0-2.2.1" + Int preemptible_tries = 3 + Int cpu = 1 + } + command <<< + set -euo pipefail + + # unzip files + tar -xf ~{bam_and_index_tar} + rm ~{bam_and_index_tar} + + mkdir reference + cp ~{genome_fa} reference + cd reference + + # index the fasta + echo "Indexing FASTA" + samtools faidx *.fa + cd ../output_bams + + echo "Starting allcools" + bam_files=($(ls | grep "\.hisat3n_dna.all_reads.deduped.bam$")) + echo ${bam_files[@]} + for file in "${bam_files[@]}"; do + sample_id=$(basename "$file" ".hisat3n_dna.all_reads.deduped.bam") + /opt/conda/bin/allcools bam-to-allc \ + --bam_path "$file" \ + --reference_fasta /cromwell_root/reference/~{genome_base} \ + --output_path "${sample_id}.allc.tsv.gz" \ + --num_upstr_bases ~{num_upstr_bases} \ + --num_downstr_bases ~{num_downstr_bases} \ + --compress_level ~{compress_level} \ + --save_count_df \ + --convert_bam_strandness + done + echo "Zipping files" + + tar -zcvf ../~{plate_id}.allc.tsv.tar.gz *.allc.tsv.gz + tar -zcvf ../~{plate_id}.allc.tbi.tar.gz *.allc.tsv.gz.tbi + tar -zcvf ../~{plate_id}.allc.count.tar.gz *.allc.tsv.gz.count.csv + + + >>> + runtime { + docker: docker + disks: "local-disk ${disk_size} HDD" + cpu: cpu + memory: "${mem_size} GiB" + preemptible: preemptible_tries + } + output { + File allc = "~{plate_id}.allc.tsv.tar.gz" + File tbi = "~{plate_id}.allc.tbi.tar.gz" + File allc_uniq_reads_stats = "~{plate_id}.allc.count.tar.gz" + } +} + + +task unique_reads_cgn_extraction { + input { + File allc_tar + File tbi_tar + File chrom_size_path + String plate_id + + String docker = "us.gcr.io/broad-gotc-prod/m3c-yap-hisat:1.0.0-2.2.1" + Int disk_size = 80 + Int mem_size = 20 + Int num_upstr_bases = 0 + Int preemptible_tries = 3 + Int cpu = 1 + } + + command <<< + set -euo pipefail + + tar -xf ~{allc_tar} + rm ~{allc_tar} + + tar -xf ~{tbi_tar} + rm ~{tbi_tar} + + # 
prefix="allc-{mcg_context}/{cell_id}" + if [ ~{num_upstr_bases} -eq 0 ]; then + mcg_context=CGN + else + mcg_context=HCGN + fi + + # create output dir + mkdir /cromwell_root/allc-${mcg_context} + outputdir=/cromwell_root/allc-${mcg_context} + + for gzfile in *.gz + do + name=`echo $gzfile | cut -d. -f1` + echo $name + allcools extract-allc --strandness merge --allc_path $gzfile \ + --output_prefix $outputdir/$name \ + --mc_contexts ${mcg_context} \ + --chrom_size_path ~{chrom_size_path} + done + + cd /cromwell_root + + tar -zcvf ~{plate_id}.output_allc_tar.tar.gz $outputdir/*.gz + tar -zcvf ~{plate_id}.output_tbi_tar.tar.gz $outputdir/*.tbi + + >>> + + runtime { + docker: docker + disks: "local-disk ${disk_size} HDD" + cpu: cpu + memory: "${mem_size} GiB" + preemptible: preemptible_tries + } + + output { + File output_allc_tar = "~{plate_id}.output_allc_tar.tar.gz" + File output_tbi_tar = "~{plate_id}.output_tbi_tar.tar.gz" + } +} + + +task summary { + input { + Array[File] trimmed_stats + Array[File] hisat3n_stats + Array[File] r1_hisat3n_stats + Array[File] r2_hisat3n_stats + Array[File] dedup_stats + Array[File] chromatin_contact_stats + Array[File] allc_uniq_reads_stats + Array[File] unique_reads_cgn_extraction_tbi + String plate_id + + String docker = "us.gcr.io/broad-gotc-prod/m3c-yap-hisat:1.0.0-2.2.1" + Int disk_size = 80 + Int mem_size = 20 + Int preemptible_tries = 3 + Int cpu = 1 + } + command <<< + set -euo pipefail + + mkdir /cromwell_root/fastq + mkdir /cromwell_root/bam + mkdir /cromwell_root/allc + mkdir /cromwell_root/hic + + extract_and_remove() { + if [ $# -eq 0 ]; + then + echo "No files exist" + return + fi + for tar in "${@}"; do + tar -xf "$tar" + rm "$tar" + done + } + + extract_and_remove ~{sep=' ' trimmed_stats} + extract_and_remove ~{sep=' ' hisat3n_stats} + extract_and_remove ~{sep=' ' r1_hisat3n_stats} + extract_and_remove ~{sep=' ' r2_hisat3n_stats} + extract_and_remove ~{sep=' ' dedup_stats} + extract_and_remove ~{sep=' ' chromatin_contact_stats} + extract_and_remove ~{sep=' ' allc_uniq_reads_stats} + extract_and_remove ~{sep=' ' unique_reads_cgn_extraction_tbi} + + mv *.trimmed.stats.txt /cromwell_root/fastq + mv *.hisat3n_dna_summary.txt *.hisat3n_dna_split_reads_summary.R1.txt *.hisat3n_dna_split_reads_summary.R2.txt /cromwell_root/bam + mv output_bams/*.hisat3n_dna.all_reads.deduped.matrix.txt /cromwell_root/bam + mv *.hisat3n_dna.all_reads.contact_stats.csv /cromwell_root/hic + mv *.allc.tsv.gz.count.csv /cromwell_root/allc + mv cromwell_root/allc-CGN/*.allc.tsv.gz.tbi /cromwell_root/allc + + python3 <>> + runtime { + docker: docker + disks: "local-disk ${disk_size} HDD" + cpu: cpu + memory: "${mem_size} GiB" + preemptible: preemptible_tries + } + output { + File mapping_summary = "~{plate_id}_MappingSummary.csv.gz" + } +} \ No newline at end of file diff --git a/pipelines/skylab/snM3C/test_inputs/Plumbing/miseq_M16_G13.json b/pipelines/skylab/snM3C/test_inputs/Plumbing/miseq_M16_G13.json index c0d6a6b61f..8df63dba8b 100644 --- a/pipelines/skylab/snM3C/test_inputs/Plumbing/miseq_M16_G13.json +++ b/pipelines/skylab/snM3C/test_inputs/Plumbing/miseq_M16_G13.json @@ -14,8 +14,7 @@ "snM3C.random_primer_indexes": "gs://broad-gotc-test-storage/snM3C/inputs/plumbing/random_index_M16_G13.fa", "snM3C.plate_id": "AD3C_BA17_2027_P1-1-B11", "snM3C.tarred_index_files":"gs://broad-gotc-test-storage/methylome/input/plumbing/index_files/hg38_index_files.tar.gz", - "snM3C.mapping_yaml":"gs://broad-gotc-test-storage/methylome/input/plumbing/config_files/mapping.yaml", - 
"snM3C.snakefile": "gs://broad-gotc-test-storage/methylome/input/plumbing/config_files/Snakefile", "snM3C.chromosome_sizes": "gs://broad-gotc-test-storage/methylome/input/plumbing/index_files/hg38.chrom.sizes", - "snM3C.genome_fa": "gs://broad-gotc-test-storage/methylome/input/plumbing/index_files/hg38.fa" + "snM3C.genome_fa": "gs://broad-gotc-test-storage/methylome/input/plumbing/index_files/hg38.fa", + "snM3C.batch_number": 2 } diff --git a/pipelines/skylab/snM3C/test_inputs/Scientific/novaseq_M16_G13.json b/pipelines/skylab/snM3C/test_inputs/Scientific/novaseq_M16_G13.json index d04654fc76..77e5f9187b 100644 --- a/pipelines/skylab/snM3C/test_inputs/Scientific/novaseq_M16_G13.json +++ b/pipelines/skylab/snM3C/test_inputs/Scientific/novaseq_M16_G13.json @@ -14,8 +14,7 @@ "snM3C.random_primer_indexes": "gs://broad-gotc-test-storage/snM3C/inputs/plumbing/random_index_M16_G13.fa", "snM3C.plate_id": "230419-iN-hs-snm3C_seq-NovaSeq-pe-150-BW-iN230412_Entorhinal_Cortex_Adult_2115_R1_1-1-I3", "snM3C.tarred_index_files":"gs://broad-gotc-test-storage/methylome/input/plumbing/index_files/hg38_index_files.tar.gz", - "snM3C.mapping_yaml":"gs://broad-gotc-test-storage/methylome/input/plumbing/config_files/mapping.yaml", - "snM3C.snakefile": "gs://broad-gotc-test-storage/methylome/input/plumbing/config_files/Snakefile", "snM3C.chromosome_sizes": "gs://broad-gotc-test-storage/methylome/input/plumbing/index_files/hg38.chrom.sizes", - "snM3C.genome_fa": "gs://broad-gotc-test-storage/methylome/input/plumbing/index_files/hg38.fa" + "snM3C.genome_fa": "gs://broad-gotc-test-storage/methylome/input/plumbing/index_files/hg38.fa", + "snM3C.batch_number": 2 } diff --git a/pipelines/skylab/snM3C/test_inputs/Scientific/snM3C_inputs.json b/pipelines/skylab/snM3C/test_inputs/Scientific/snM3C_inputs.json index b85d1f0807..bf424365e1 100644 --- a/pipelines/skylab/snM3C/test_inputs/Scientific/snM3C_inputs.json +++ b/pipelines/skylab/snM3C/test_inputs/Scientific/snM3C_inputs.json @@ -14,8 +14,7 @@ "snM3C.random_primer_indexes": "gs://broad-gotc-test-storage/methylome/input/plumbing/fastqs/random_index_v2.fa", "snM3C.plate_id": "AD3C_BA17_2027_P1-1-B11", "snM3C.tarred_index_files":"gs://broad-gotc-test-storage/methylome/input/plumbing/index_files/hg38_index_files.tar.gz", - "snM3C.mapping_yaml":"gs://broad-gotc-test-storage/methylome/input/plumbing/config_files/mapping.yaml", - "snM3C.snakefile": "gs://broad-gotc-test-storage/methylome/input/plumbing/config_files/Snakefile", "snM3C.chromosome_sizes": "gs://broad-gotc-test-storage/methylome/input/plumbing/index_files/hg38.chrom.sizes", - "snM3C.genome_fa": "gs://broad-gotc-test-storage/methylome/input/plumbing/index_files/hg38.fa" + "snM3C.genome_fa": "gs://broad-gotc-test-storage/methylome/input/plumbing/index_files/hg38.fa", + "snM3C.batch_number": 2 } diff --git a/tasks/skylab/FastqProcessing.wdl b/tasks/skylab/FastqProcessing.wdl index 25cdaec964..ac22cc38aa 100644 --- a/tasks/skylab/FastqProcessing.wdl +++ b/tasks/skylab/FastqProcessing.wdl @@ -257,8 +257,7 @@ task FastqProcessATAC { Int preemptible = 3 # Additional parameters for fastqprocess - Int num_output_files = 1 - Int bam_size = 1 + Int num_output_files } meta { @@ -278,14 +277,15 @@ task FastqProcessATAC { cpu: "(optional) the number of cpus to provision for this task" disk_size: "(optional) the amount of disk space (GiB) to provision for this task" num_output_files: "(optional) the number of output fastq file shards to produce. if this is set to > 0, bam_size is ignored." 
- bam_size: "(optional) the size of each fastq file produced. this is taken into account if num_output_files == 0." preemptible: "(optional) if non-zero, request a pre-emptible instance and allow for this number of preemptions before running the task on a non preemptible machine" } command <<< set -e - + echo "Num of output files" + echo ~{num_output_files} + declare -a FASTQ1_ARRAY=(~{sep=' ' read1_fastq}) declare -a FASTQ2_ARRAY=(~{sep=' ' barcodes_fastq}) declare -a FASTQ3_ARRAY=(~{sep=' ' read3_fastq}) @@ -341,13 +341,13 @@ task FastqProcessATAC { cat best_match.txt barcode_choice=$( r2.fastq + FASTQ=r2.fastq + echo 'this is the fastq:' $FASTQ + R2=$(awk 'NR==2' $FASTQ) + COUNT=$(echo ${#R2}) + echo 'this is the read:' $R2 + echo 'this is the barcode count:' $COUNT + echo "Renaming files for UPS tools" + mv ~{read1_fastq} "~{input_id}_R1.fq.gz" + mv ~{barcodes_fastq} "~{input_id}_R2.fq.gz" + mv ~{read3_fastq} "~{input_id}_R3.fq.gz" + echo performing read2 length and orientation checks + if [[ $COUNT == 27 && ~{preindex} == "false" ]] + then + echo "Preindex is false and length is 27 bp" + echo "Trimming first 3 bp with UPStools" + upstools trimfq ~{input_id}_R2.fq.gz 4 26 + echo "Running orientation check" + file="~{input_id}_R2_trim.fq.gz" + zcat "$file" | sed -n '2~4p' | shuf -n 1000 > downsample.fq + head -n 1 downsample.fq + python3 /upstools/pyscripts/dynamic-barcode-orientation.py downsample.fq ~{whitelist} best_match.txt + cat best_match.txt + barcode_choice=$( downsample.fq + head -n 1 downsample.fq + python3 /upstools/pyscripts/dynamic-barcode-orientation.py downsample.fq ~{whitelist} best_match.txt + cat best_match.txt + barcode_choice=$(>> + + runtime { + docker: docker + cpu: cpu + memory: "${mem_size} GiB" + disks: "local-disk ${disk_size} HDD" + preemptible: preemptible + } + + output { + File fastq1 = "~{input_id}_R1.fq.gz" + File barcodes = "~{input_id}_R2.fq.gz" + File fastq3 = "~{input_id}_R3.fq.gz" + } +} + +task AddBBTag { + input { + File bam + String input_id + + # using the latest build of upstools docker in GCR + String docker = "us.gcr.io/broad-gotc-prod/upstools:1.0.0-2023.03.03-1704300311" + + # Runtime attributes + Int mem_size = 8 + Int cpu = 1 + # TODO decided cpu + # estimate that bam is approximately equal in size to fastq, add 20% buffer + Int disk_size = ceil(2 * ( size(bam, "GiB"))) + 100 + Int preemptible = 3 + } + + meta { + description: "Demultiplexes paired-tag ATAC fastq files that have a 3 bp preindex and adds the index to readnames." 
+  }
+
+  parameter_meta {
+    bam: "BAM with aligned reads and barcode in the CB tag"
+    input_id: "input ID"
+    docker: "(optional) the docker image containing the runtime environment for this task"
+    mem_size: "(optional) the amount of memory (MiB) to provision for this task"
+    cpu: "(optional) the number of cpus to provision for this task"
+    disk_size: "(optional) the amount of disk space (GiB) to provision for this task"
+    preemptible: "(optional) if non-zero, request a pre-emptible instance and allow for this number of preemptions before running the task on a non preemptible machine"
+  }
+
+  command <<<
+
+    set -e
+    echo "BAM file name is:"
+    echo ~{bam}
+    echo moving BAM
+    mv ~{bam} ./~{input_id}.bam
+    echo Running UPStools
+    python3 /upstools/pyscripts/scifi.preindex_CB_to_BB.py --in ~{input_id}.bam
+  >>>
+
+  runtime {
+    docker: docker
+    cpu: cpu
+    memory: "${mem_size} GiB"
+    disks: "local-disk ${disk_size} HDD"
+    preemptible: preemptible
+  }
+
+  output {
+    File bb_bam = "~{input_id}.bam.BB.bam"
+  }
+}
+
+task ParseBarcodes {
+  input {
+    File atac_h5ad
+    File atac_fragment
+    Int nthreads = 1
+    String cpuPlatform = "Intel Cascade Lake"
+  }
+
+  String atac_base_name = basename(atac_h5ad, ".h5ad")
+  String atac_fragment_base = basename(atac_fragment, ".tsv")
+
+  Int machine_mem_mb = ceil((size(atac_h5ad, "MiB") + size(atac_fragment, "MiB")) * 3) + 10000
+  Int disk = ceil((size(atac_h5ad, "GiB") + size(atac_fragment, "GiB")) * 5) + 10
+
+  parameter_meta {
+    atac_h5ad: "The resulting h5ad from the ATAC workflow."
+    atac_fragment: "The resulting fragment TSV from the ATAC workflow."
+  }
+
+  command <<<
+    set -e -o pipefail
+
+    python3 <<CODE
+import anndata as ad
+import pandas as pd
+
+# Read in the ATAC h5ad and the fragment TSV (standard fragment-file fields)
+atac_data = ad.read_h5ad("~{atac_h5ad}")
+test_fragment = pd.read_csv("~{atac_fragment}", sep="\t", header=None,
+                            names=["chrom", "start", "stop", "barcode", "count"])
+
+# Separate out CB and preindex in the h5ad (mirrors the fragment-file logic below)
+print("Setting preindex and CB columns in h5ad")
+atac_data.obs["preindex"] = atac_data.obs.index.str[:3]
+atac_data.obs["CB"] = atac_data.obs.index.str[3:]
+
+# Create a new column 'duplicates' initialized with 0
+atac_data.obs['duplicates'] = 0
+
+# Group by 'CB' and count the number of unique 'preindex' values for each group
+preindex_counts = atac_data.obs.groupby('CB')['preindex'].nunique()
+
+# Update the 'duplicates' column for rows with more than one unique 'preindex' for a 'CB'
+atac_data.obs.loc[atac_data.obs['CB'].isin(preindex_counts[preindex_counts > 1].index), 'duplicates'] = 1
+
+# Separate out CB and preindex in the fragment file
+print("Setting preindex and CB columns in fragment file")
+test_fragment["preindex"] = test_fragment["barcode"].str[:3]
+test_fragment["CB"] = test_fragment["barcode"].str[3:]
+
+# Create a new column 'duplicates' initialized with 0
+test_fragment['duplicates'] = 0
+
+# Group by 'CB' and count the number of unique 'preindex' values for each group
+preindex_counts = test_fragment.groupby('CB')['preindex'].nunique()
+
+# Update the 'duplicates' column for rows with more than one unique 'preindex' for a 'CB'
+test_fragment.loc[test_fragment['CB'].isin(preindex_counts[preindex_counts > 1].index), 'duplicates'] = 1
+
+# Write the h5ad and fragment file back out with the duplicates annotations
+atac_data.write_h5ad("~{atac_base_name}.h5ad")
+test_fragment.to_csv("~{atac_fragment_base}.tsv", sep='\t', index=False, header=False)
+CODE
+
+    # sorting the file
+    echo "Sorting file"
+    sort -k1,1V -k2,2n "~{atac_fragment_base}.tsv" > "~{atac_fragment_base}.sorted.tsv"
+    echo "Starting bgzip"
+    bgzip "~{atac_fragment_base}.sorted.tsv"
+    echo "Starting tabix"
+    tabix -s 1 -b 2 -e 3 "~{atac_fragment_base}.sorted.tsv.gz"
+
+  >>>
+
+  runtime {
+    docker: "us.gcr.io/broad-gotc-prod/snapatac2:1.0.4-2.3.1-1700590229"
+    disks: "local-disk ~{disk} HDD"
+    memory: "${machine_mem_mb} MiB"
+    cpu: nthreads
+  }
+
+  output {
+    File atac_h5ad_file = "~{atac_base_name}.h5ad"
+    File atac_fragment_tsv = "~{atac_fragment_base}.sorted.tsv.gz"
+    File atac_fragment_tsv_tbi = "~{atac_fragment_base}.sorted.tsv.gz.tbi"
+  }
+}
diff --git a/tasks/skylab/RunEmptyDrops.wdl b/tasks/skylab/RunEmptyDrops.wdl
index d455751b85..a0f60b1c99 100644
--- a/tasks/skylab/RunEmptyDrops.wdl
+++ b/tasks/skylab/RunEmptyDrops.wdl
@@ -17,7 +17,7 @@ task RunEmptyDrops {
   # runtime values
   String docker = 
"us.gcr.io/broad-gotc-prod/empty-drops:1.0.1-4.2" - Int machine_mem_mb = 16000 + Int machine_mem_mb = 32000 Int cpu = 1 Int disk = 20 Int disk_size = disk + 20 diff --git a/tasks/skylab/StarAlign.wdl b/tasks/skylab/StarAlign.wdl index b52c7417ae..8ab0c8d615 100644 --- a/tasks/skylab/StarAlign.wdl +++ b/tasks/skylab/StarAlign.wdl @@ -220,9 +220,10 @@ task STARsoloFastq { File white_list Int chemistry String star_strand_mode - String counting_mode + String counting_mode # when counting_mode = sn_rna, runs Gene and GeneFullEx50pAS in single alignments String output_bam_basename Boolean? count_exons + String? soloMultiMappers # runtime values String docker = "us.gcr.io/broad-gotc-prod/star:1.0.1-2.7.11a-1692706072" @@ -270,20 +271,7 @@ task STARsoloFastq { exit 1; fi - COUNTING_MODE="" - if [[ "~{counting_mode}" == "sc_rna" ]] - then - ## single cell or whole cell - COUNTING_MODE="Gene" - elif [[ "~{counting_mode}" == "sn_rna" ]] - then - ## single nuclei - COUNTING_MODE="GeneFull_Ex50pAS" - else - echo Error: unknown counting mode: "$counting_mode". Should be either sn_rna or sc_rna. - exit 1; - fi -# Check that the star strand mode matches STARsolo aligner options + # Check that the star strand mode matches STARsolo aligner options if [[ "~{star_strand_mode}" == "Forward" ]] || [[ "~{star_strand_mode}" == "Reverse" ]] || [[ "~{star_strand_mode}" == "Unstranded" ]] then ## single cell or whole cell @@ -298,49 +286,86 @@ task STARsoloFastq { tar -xf "~{tar_star_reference}" -C genome_reference --strip-components 1 rm "~{tar_star_reference}" - - echo "UMI LEN " $UMILen - if [[ ~{count_exons} ]] + COUNTING_MODE="" + if [[ "~{counting_mode}" == "sc_rna" ]] then - STAR \ - --soloType Droplet \ - --soloStrand ~{star_strand_mode} \ - --runThreadN ~{cpu} \ - --genomeDir genome_reference \ - --readFilesIn "~{sep=',' r2_fastq}" "~{sep=',' r1_fastq}" \ - --readFilesCommand "gunzip -c" \ - --soloCBwhitelist ~{white_list} \ - --soloUMIlen $UMILen --soloCBlen $CBLen \ - --soloFeatures "Gene" \ - --clipAdapterType CellRanger4 \ - --outFilterScoreMin 30 \ - --soloCBmatchWLtype 1MM_multi_Nbase_pseudocounts \ - --soloUMIdedup 1MM_Directional_UMItools \ - --outSAMtype BAM SortedByCoordinate \ - --outSAMattributes UB UR UY CR CB CY NH GX GN sF \ - --soloBarcodeReadLength 0 \ - --soloCellReadStats Standard + ## single cell or whole cell + COUNTING_MODE="Gene" + echo "Running in ~{counting_mode} mode. The Star parameter --soloFeatures will be set to $COUNTING_MODE" + STAR \ + --soloType Droplet \ + --soloStrand ~{star_strand_mode} \ + --runThreadN ~{cpu} \ + --genomeDir genome_reference \ + --readFilesIn "~{sep=',' r2_fastq}" "~{sep=',' r1_fastq}" \ + --readFilesCommand "gunzip -c" \ + --soloCBwhitelist ~{white_list} \ + --soloUMIlen $UMILen --soloCBlen $CBLen \ + --soloFeatures $COUNTING_MODE \ + --clipAdapterType CellRanger4 \ + --outFilterScoreMin 30 \ + --soloCBmatchWLtype 1MM_multi_Nbase_pseudocounts \ + --soloUMIdedup 1MM_Directional_UMItools \ + --outSAMtype BAM SortedByCoordinate \ + --outSAMattributes UB UR UY CR CB CY NH GX GN sF \ + --soloBarcodeReadLength 0 \ + --soloCellReadStats Standard \ + ~{"--soloMultiMappers " + soloMultiMappers} + elif [[ "~{counting_mode}" == "sn_rna" ]] + then + ## single nuclei + if [[ ~{count_exons} == false ]] + then + COUNTING_MODE="GeneFull_Ex50pAS" + echo "Running in ~{counting_mode} mode. 
Count_exons is false and the Star parameter --soloFeatures will be set to $COUNTING_MODE" + STAR \ + --soloType Droplet \ + --soloStrand ~{star_strand_mode} \ + --runThreadN ~{cpu} \ + --genomeDir genome_reference \ + --readFilesIn "~{sep=',' r2_fastq}" "~{sep=',' r1_fastq}" \ + --readFilesCommand "gunzip -c" \ + --soloCBwhitelist ~{white_list} \ + --soloUMIlen $UMILen --soloCBlen $CBLen \ + --soloFeatures $COUNTING_MODE \ + --clipAdapterType CellRanger4 \ + --outFilterScoreMin 30 \ + --soloCBmatchWLtype 1MM_multi_Nbase_pseudocounts \ + --soloUMIdedup 1MM_Directional_UMItools \ + --outSAMtype BAM SortedByCoordinate \ + --outSAMattributes UB UR UY CR CB CY NH GX GN sF \ + --soloBarcodeReadLength 0 \ + --soloCellReadStats Standard \ + ~{"--soloMultiMappers " + soloMultiMappers} + else + COUNTING_MODE="GeneFull_Ex50pAS Gene" + echo "Running in ~{counting_mode} mode. Count_exons is true and the Star parameter --soloFeatures will be set to $COUNTING_MODE" + STAR \ + --soloType Droplet \ + --soloStrand ~{star_strand_mode} \ + --runThreadN ~{cpu} \ + --genomeDir genome_reference \ + --readFilesIn "~{sep=',' r2_fastq}" "~{sep=',' r1_fastq}" \ + --readFilesCommand "gunzip -c" \ + --soloCBwhitelist ~{white_list} \ + --soloUMIlen $UMILen --soloCBlen $CBLen \ + --soloFeatures $COUNTING_MODE \ + --clipAdapterType CellRanger4 \ + --outFilterScoreMin 30 \ + --soloCBmatchWLtype 1MM_multi_Nbase_pseudocounts \ + --soloUMIdedup 1MM_Directional_UMItools \ + --outSAMtype BAM SortedByCoordinate \ + --outSAMattributes UB UR UY CR CB CY NH GX GN sF \ + --soloBarcodeReadLength 0 \ + --soloCellReadStats Standard \ + ~{"--soloMultiMappers " + soloMultiMappers} + fi + else + echo Error: unknown counting mode: "$counting_mode". Should be either sn_rna or sc_rna. + exit 1; fi - STAR \ - --soloType Droplet \ - --soloStrand ~{star_strand_mode} \ - --runThreadN ~{cpu} \ - --genomeDir genome_reference \ - --readFilesIn "~{sep=',' r2_fastq}" "~{sep=',' r1_fastq}" \ - --readFilesCommand "gunzip -c" \ - --soloCBwhitelist ~{white_list} \ - --soloUMIlen $UMILen --soloCBlen $CBLen \ - --soloFeatures $COUNTING_MODE \ - --clipAdapterType CellRanger4 \ - --outFilterScoreMin 30 \ - --soloCBmatchWLtype 1MM_multi_Nbase_pseudocounts \ - --soloUMIdedup 1MM_Directional_UMItools \ - --outSAMtype BAM SortedByCoordinate \ - --outSAMattributes UB UR UY CR CB CY NH GX GN sF \ - --soloBarcodeReadLength 0 \ - --soloCellReadStats Standard - + echo "UMI LEN " $UMILen touch barcodes_sn_rna.tsv touch features_sn_rna.tsv @@ -350,37 +375,50 @@ task STARsoloFastq { touch Summary_sn_rna.csv touch UMIperCellSorted_sn_rna.txt + if [[ "~{counting_mode}" == "sc_rna" ]] then + SoloDirectory="Solo.out/Gene/raw" + echo "SoloDirectory is $SoloDirectory" + find "$SoloDirectory" -maxdepth 1 -type f -name "*.mtx" -print0 | xargs -0 -I{} echo mv {} /cromwell_root/ + find "$SoloDirectory" -maxdepth 1 -type f -name "*.mtx" -print0 | xargs -0 -I{} mv {} /cromwell_root/ mv "Solo.out/Gene/raw/barcodes.tsv" barcodes.tsv mv "Solo.out/Gene/raw/features.tsv" features.tsv - mv "Solo.out/Gene/raw/matrix.mtx" matrix.mtx mv "Solo.out/Gene/CellReads.stats" CellReads.stats mv "Solo.out/Gene/Features.stats" Features.stats mv "Solo.out/Gene/Summary.csv" Summary.csv mv "Solo.out/Gene/UMIperCellSorted.txt" UMIperCellSorted.txt elif [[ "~{counting_mode}" == "sn_rna" ]] then - if ! 
[[ ~{count_exons} ]] + if [[ "~{count_exons}" == "false" ]] then + SoloDirectory="Solo.out/GeneFull_Ex50pAS/raw" + echo "SoloDirectory is $SoloDirectory" + find "$SoloDirectory" -maxdepth 1 -type f -name "*.mtx" -print0 | xargs -0 -I{} echo mv {} /cromwell_root/ + find "$SoloDirectory" -maxdepth 1 -type f -name "*.mtx" -print0 | xargs -0 -I{} mv {} /cromwell_root/ mv "Solo.out/GeneFull_Ex50pAS/raw/barcodes.tsv" barcodes.tsv mv "Solo.out/GeneFull_Ex50pAS/raw/features.tsv" features.tsv - mv "Solo.out/GeneFull_Ex50pAS/raw/matrix.mtx" matrix.mtx mv "Solo.out/GeneFull_Ex50pAS/CellReads.stats" CellReads.stats mv "Solo.out/GeneFull_Ex50pAS/Features.stats" Features.stats mv "Solo.out/GeneFull_Ex50pAS/Summary.csv" Summary.csv mv "Solo.out/GeneFull_Ex50pAS/UMIperCellSorted.txt" UMIperCellSorted.txt else + SoloDirectory="Solo.out/GeneFull_Ex50pAS/raw" + echo "SoloDirectory is $SoloDirectory" + find "$SoloDirectory" -maxdepth 1 -type f -name "*.mtx" -print0 | xargs -0 -I{} echo mv {} /cromwell_root/ + find "$SoloDirectory" -maxdepth 1 -type f -name "*.mtx" -print0 | xargs -0 -I{} mv {} /cromwell_root/ + SoloDirectory="Solo.out/Gene/raw" + echo "SoloDirectory is $SoloDirectory" + find "$SoloDirectory" -maxdepth 1 -type f -name "*.mtx" -print0 | xargs -0 -I{} sh -c 'new_name="$(basename {} .mtx)_sn_rna.mtx"; echo mv {} "/cromwell_root/$new_name"' + find "$SoloDirectory" -maxdepth 1 -type f -name "*.mtx" -print0 | xargs -0 -I{} sh -c 'new_name="$(basename {} .mtx)_sn_rna.mtx"; mv {} "/cromwell_root/$new_name"' mv "Solo.out/GeneFull_Ex50pAS/raw/barcodes.tsv" barcodes.tsv mv "Solo.out/GeneFull_Ex50pAS/raw/features.tsv" features.tsv - mv "Solo.out/GeneFull_Ex50pAS/raw/matrix.mtx" matrix.mtx mv "Solo.out/GeneFull_Ex50pAS/CellReads.stats" CellReads.stats mv "Solo.out/GeneFull_Ex50pAS/Features.stats" Features.stats mv "Solo.out/GeneFull_Ex50pAS/Summary.csv" Summary.csv mv "Solo.out/GeneFull_Ex50pAS/UMIperCellSorted.txt" UMIperCellSorted.txt mv "Solo.out/Gene/raw/barcodes.tsv" barcodes_sn_rna.tsv mv "Solo.out/Gene/raw/features.tsv" features_sn_rna.tsv - mv "Solo.out/Gene/raw/matrix.mtx" matrix_sn_rna.mtx mv "Solo.out/Gene/CellReads.stats" CellReads_sn_rna.stats mv "Solo.out/Gene/Features.stats" Features_sn_rna.stats mv "Solo.out/Gene/Summary.csv" Summary_sn_rna.csv @@ -420,6 +458,10 @@ task STARsoloFastq { File align_features_sn_rna = "Features_sn_rna.stats" File summary_sn_rna = "Summary_sn_rna.csv" File umipercell_sn_rna = "UMIperCellSorted_sn_rna.txt" + File? multimappers_EM_matrix = "UniqueAndMult-EM.mtx" + File? multimappers_Uniform_matrix = "UniqueAndMult-Uniform.mtx" + File? multimappers_Rescue_matrix = "UniqueAndMult-Rescue.mtx" + File? 
multimappers_PropUnique_matrix = "UniqueAndMult-PropUnique.mtx" } } @@ -438,7 +480,7 @@ task MergeStarOutput { #runtime values String docker = "us.gcr.io/broad-gotc-prod/pytools:1.0.0-1661263730" - Int machine_mem_mb = 8250 + Int machine_mem_gb = 20 Int cpu = 1 Int disk = ceil(size(matrix, "Gi") * 2) + 10 Int preemptible = 3 @@ -449,7 +491,7 @@ parameter_meta { docker: "(optional) the docker image containing the runtime environment for this task" - machine_mem_mb: "(optional) the amount of memory (MiB) to provision for this task" + machine_mem_gb: "(optional) the amount of memory (GiB) to provision for this task" cpu: "(optional) the number of cpus to provision for this task" disk: "(optional) the amount of disk space (GiB) to provision for this task" preemptible: "(optional) if non-zero, request a pre-emptible instance and allow for this number of preemptions before running the task on a non preemptible machine" @@ -512,7 +554,7 @@ runtime { docker: docker - memory: "${machine_mem_mb} MiB" + memory: "${machine_mem_gb} GiB" disks: "local-disk ${disk} HDD" disk: disk + " GB" # TES cpu: cpu diff --git a/verification/VerifyTasks.wdl b/verification/VerifyTasks.wdl index 07c2a37ba3..3991dba0dc 100644 --- a/verification/VerifyTasks.wdl +++ b/verification/VerifyTasks.wdl @@ -111,16 +111,18 @@ task CompareTabix { File test_fragment_file File truth_fragment_file } - command { - a="md5sum ~{test_fragment_file}" - b="md5sum ~{truth_fragment_file}" - if [[ a = b ]]; then + command <<< + exit_code=0 + a=$(md5sum "~{test_fragment_file}" | awk '{ print $1 }') + b=$(md5sum "~{truth_fragment_file}" | awk '{ print $1 }') + if [[ $a = $b ]]; then echo equal else echo different exit_code=1 fi - } + exit $exit_code + >>> runtime { docker: "us.gcr.io/broad-gotc-prod/snapatac2:1.0.4-2.3.1-1700590229" disks: "local-disk 100 HDD" @@ -128,6 +130,7 @@ preemptible: 3 } } + task CompareTextFiles { input { Array[File] test_text_files @@ -239,7 +242,7 @@ task CompareBams { Float bam_size = size(test_bam, "GiB") + size(truth_bam, "GiB") Int disk_size = ceil(bam_size * 4) + 200 - Int memory_mb = 500000 + Int memory_mb = 600000 Int java_memory_size = memory_mb - 1000 Int max_heap = memory_mb - 500 @@ -247,13 +250,40 @@ set -e set -o pipefail - java -Xms~{java_memory_size}m -Xmx~{max_heap}m -jar /usr/picard/picard.jar \ - CompareSAMs \ - ~{test_bam} \ - ~{truth_bam} \ - O=comparison.tsv \ - LENIENT_HEADER=~{lenient_header} \ - LENIENT_LOW_MQ_ALIGNMENT=~{lenient_low_mq} + truth_bam=~{truth_bam} + test_bam=~{test_bam} + + # Get the sizes of the BAM files in bytes + truth_size=$(stat -c %s ~{truth_bam}) + test_size=$(stat -c %s ~{test_bam}) + + # Convert sizes to megabytes + truth_size_mb=$((truth_size / (1024 * 1024))) + test_size_mb=$((test_size / (1024 * 1024))) + + # Calculate the difference in megabytes + size_difference_mb=$((truth_size_mb - test_size_mb)) + + # Calculate the absolute value of the difference: + # First, check if the difference is negative. If negative, make it positive. If the difference is positive, leave it as is. + abs_size_difference_mb=$((size_difference_mb < 0 ? -size_difference_mb : size_difference_mb)) + + # Compare the sizes and fail fast if the difference is greater than 200 MB + if [ "$abs_size_difference_mb" -gt 200 ]; then + echo "Skipping CompareSAMs as BAM file sizes differ by more than 200 MB. $truth_bam is $truth_size_mb MB and $test_bam is $test_size_mb MB. Exiting." 
+ exit 1 + elif [ "$abs_size_difference_mb" -gt 1 ]; then + echo "WARNING: BAM file sizes differ by more than 1 MB but less than 200 MB. $truth_bam is $truth_size_mb MB and $test_bam is $test_size_mb MB. Proceeding to CompareSAMs:" + + java -Xms~{java_memory_size}m -Xmx~{max_heap}m -jar /usr/picard/picard.jar \ + CompareSAMs \ + ~{test_bam} \ + ~{truth_bam} \ + O=comparison.tsv \ + LENIENT_HEADER=~{lenient_header} \ + LENIENT_LOW_MQ_ALIGNMENT=~{lenient_low_mq} \ + MAX_RECORDS_IN_RAM=300000 + fi } runtime { @@ -276,13 +306,13 @@ task CompareCompressedTextFiles { Int disk_size = ceil(file_size * 4) + 20 command { - diff <(gunzip -c -f ~{test_zip}) <(gunzip -c -f ~{truth_zip}) + diff <(gunzip -c ~{test_zip} | sort) <(gunzip -c ~{truth_zip} | sort) } runtime { docker: "gcr.io/gcp-runtimes/ubuntu_16_0_4:latest" disks: "local-disk " + disk_size + " HDD" - memory: "3.5 GiB" + memory: "20 GiB" preemptible: 3 } diff --git a/verification/test-wdls/TestMultiome.wdl b/verification/test-wdls/TestMultiome.wdl index 9f35f7b8ae..bb9aff4018 100644 --- a/verification/test-wdls/TestMultiome.wdl +++ b/verification/test-wdls/TestMultiome.wdl @@ -27,6 +27,7 @@ workflow TestMultiome { String star_strand_mode = "Forward" Boolean count_exons = false File gex_whitelist = "gs://broad-gotc-test-storage/Multiome/input/737K-arc-v1_gex.txt" + String? soloMultiMappers # ATAC inputs # Array of input fastq files @@ -36,6 +37,7 @@ workflow TestMultiome { # BWA input File tar_bwa_reference + # CreateFragmentFile input File chrom_sizes # Trimadapters input @@ -84,7 +86,8 @@ workflow TestMultiome { adapter_seq_read3 = adapter_seq_read3, chrom_sizes = chrom_sizes, atac_whitelist = atac_whitelist, - run_cellbender = run_cellbender + run_cellbender = run_cellbender, + soloMultiMappers = soloMultiMappers } diff --git a/verification/test-wdls/TestOptimus.wdl b/verification/test-wdls/TestOptimus.wdl index b162ab1e35..535eb8d530 100644 --- a/verification/test-wdls/TestOptimus.wdl +++ b/verification/test-wdls/TestOptimus.wdl @@ -26,6 +26,7 @@ workflow TestOptimus { File annotations_gtf File ref_genome_fasta File? mt_genes + String? 
soloMultiMappers # Chemistry options include: 2 or 3 Int tenx_chemistry_version = 2 @@ -84,7 +85,8 @@ workflow TestOptimus { force_no_check = force_no_check, star_strand_mode = star_strand_mode, count_exons = count_exons, - ignore_r1_read_length = ignore_r1_read_length + ignore_r1_read_length = ignore_r1_read_length, + soloMultiMappers = soloMultiMappers } # Collect all of the pipeling output into single Array diff --git a/verification/test-wdls/TestsnM3C.wdl b/verification/test-wdls/TestsnM3C.wdl index f57838c2fd..3ca01baf74 100644 --- a/verification/test-wdls/TestsnM3C.wdl +++ b/verification/test-wdls/TestsnM3C.wdl @@ -13,12 +13,21 @@ workflow TestsnM3C { Array[File] fastq_input_read2 File random_primer_indexes String plate_id - String output_basename = plate_id File tarred_index_files - File mapping_yaml - File snakefile - File chromosome_sizes File genome_fa + File chromosome_sizes + String r1_adapter = "AGATCGGAAGAGCACACGTCTGAAC" + String r2_adapter = "AGATCGGAAGAGCGTCGTGTAGGGA" + #Int batch_number + Int r1_left_cut = 10 + Int r1_right_cut = 10 + Int r2_left_cut = 10 + Int r2_right_cut = 10 + Int min_read_length = 30 + Int num_upstr_bases = 0 + Int num_downstr_bases = 2 + Int compress_level = 5 + Int batch_number # These values will be determined and injected into the inputs by the scala test framework String truth_path @@ -38,27 +47,49 @@ workflow TestsnM3C { fastq_input_read2 = fastq_input_read2, random_primer_indexes = random_primer_indexes, plate_id = plate_id, - output_basename = output_basename, tarred_index_files = tarred_index_files, - mapping_yaml = mapping_yaml, - snakefile = snakefile, + genome_fa = genome_fa, chromosome_sizes = chromosome_sizes, - genome_fa = genome_fa - + r1_adapter = r1_adapter, + r2_adapter = r2_adapter, + r1_left_cut = r1_left_cut, + r1_right_cut = r1_right_cut, + r2_left_cut = r2_left_cut, + r2_right_cut = r2_right_cut, + min_read_length = min_read_length, + num_upstr_bases = num_upstr_bases, + num_downstr_bases = num_downstr_bases, + compress_level = compress_level, + batch_number = batch_number + } # Collect all of the pipeline outputs into single Array[String] Array[String] pipeline_outputs = flatten([ [ # File outputs - snM3C.hicFiles, - snM3C.detail_statsFiles, - snM3C.bamFiles, - snM3C.allc_CGNFiles, - snM3C.allcFiles, snM3C.MappingSummary, ], - + # Array[File] outputs + snM3C.reference_version, + snM3C.chromatin_contact_stats, + snM3C.unique_reads_cgn_extraction_tbi, + snM3C.unique_reads_cgn_extraction_allc, + snM3C.dedup_unique_bam_and_index_unique_bam_tar, + snM3C.remove_overlap_read_parts_bam_tar, + snM3C.pos_sorted_bams, + snM3C.name_sorted_bams, + snM3C.merge_sorted_bam_tar, + snM3C.split_fq_tar, + snM3C.unmapped_fastq_tar, + snM3C.multi_bam_tar, + snM3C.unique_bam_tar, + snM3C.hisat3n_bam_tar, + snM3C.hisat3n_stats_tar, + snM3C.r2_trimmed_fq, + snM3C.r1_trimmed_fq, + snM3C.trimmed_stats, + ]) diff --git a/website/docs/Pipelines/ATAC/README.md b/website/docs/Pipelines/ATAC/README.md index 63c96bc877..c5357613c2 100644 --- a/website/docs/Pipelines/ATAC/README.md +++ b/website/docs/Pipelines/ATAC/README.md @@ -8,10 +8,10 @@ slug: /Pipelines/ATAC/README | Pipeline Version | Date Updated | Documentation Author | Questions or Feedback | | :----: | :---: | :----: | :--------------: | -| [1.1.2](https://github.com/broadinstitute/warp/releases) | December, 2023 | Kaylee Mathews | Please file GitHub issues in warp or contact [the WARP team](mailto:warp-pipelines-help@broadinstitute.org) | +| [1.1.6](https://github.com/broadinstitute/warp/releases) 
| January, 2024 | Kaylee Mathews | Please file GitHub issues in warp or contact [the WARP team](mailto:warp-pipelines-help@broadinstitute.org) | ## Introduction to the ATAC workflow -ATAC is an open-source, cloud-optimized pipeline developed collaboration with members of the [BRAIN Initiative](https://braininitiative.nih.gov/) (BICCN and [BICAN](https://brainblog.nih.gov/brain-blog/brain-issues-suite-funding-opportunities-advance-brain-cell-atlases-through-centers) Sequencing Working Group) and [SCORCH](https://nida.nih.gov/about-nida/organization/divisions/division-neuroscience-behavior-dnb/basic-research-hiv-substance-use-disorder/scorch-program) (see [Acknowledgements](#acknowledgements) below). It supports the processing of 10x single-nucleus data generated with 10x Multiome [ATAC-seq (Assay for Transposase-Accessible Chromatin)](https://www.10xgenomics.com/products/single-cell-multiome-atac-plus-gene-expression), a technique used in molecular biology to assess genome-wide chromatin accessibility. +ATAC is an open-source, cloud-optimized pipeline developed in collaboration with members of the [BRAIN Initiative](https://braininitiative.nih.gov/) (BICCN and [BICAN](https://brainblog.nih.gov/brain-blog/brain-issues-suite-funding-opportunities-advance-brain-cell-atlases-through-centers) Sequencing Working Group) and [SCORCH](https://nida.nih.gov/about-nida/organization/divisions/division-neuroscience-behavior-dnb/basic-research-hiv-substance-use-disorder/scorch-program) (see [Acknowledgements](#acknowledgements) below). It supports the processing of 10x single-nucleus data generated with 10x Multiome [ATAC-seq (Assay for Transposase-Accessible Chromatin)](https://www.10xgenomics.com/products/single-cell-multiome-atac-plus-gene-expression), a technique used in molecular biology to assess genome-wide chromatin accessibility. This workflow is the ATAC component of the [Multiome wrapper workflow](../Multiome_Pipeline/README). It corrects cell barcodes (CBs), aligns reads to the genome, and produces a fragment file as well as per barcode metrics. @@ -22,10 +22,10 @@ The following table provides a quick glance at the ATAC pipeline features: | Pipeline features | Description | Source | |--- | --- | --- | | Assay type | 10x single cell or single nucleus ATAC | [10x Genomics](https://www.10xgenomics.com) | Overall workflow | Barcode correction, read alignment, and fragment quanitification | -| Overall workflow | Barcode correction, read alignment, and fragment quantification | Code available from [GitHub](https://github.com/broadinstitute/warp/blob/master/pipelines/skylab/multiome/atac.wdl) | Workflow language | WDL 1.0 | [openWDL](https://github.com/openwdl/wdl) | | Genomic Reference Sequence | GRCh38 human genome primary sequence | GENCODE | -| Aligner | BWA-mem | [Li H. and Durbin R., 2009](http://www.ncbi.nlm.nih.gov/pubmed/19451168) | +| Aligner | bwa-mem2 | [Li H. and Durbin R., 2009](http://www.ncbi.nlm.nih.gov/pubmed/19451168) | | Fragment quantification | SnapATAC2 | [Zhang, K. 
et al., 2021](https://pubmed.ncbi.nlm.nih.gov/34774128/) | Data input file format | File format in which sequencing data is provided | [FASTQ](https://academic.oup.com/nar/article/38/6/1767/3112533) | | Data output file format | File formats in which ATAC output is provided | TSV, h5ad, BAM | @@ -38,7 +38,7 @@ To download the latest ATAC release, see the release tags prefixed with "Multiom To discover and search releases, use the WARP command-line tool [Wreleaser](https://github.com/broadinstitute/warp/tree/master/wreleaser). -ATAC can be deployed using [Cromwell](https://cromwell.readthedocs.io/en/stable/), a GA4GH compliant, flexible workflow management system that supports multiple computing platforms. The workflow can also be run in [Terra](https://app.terra.bio), a cloud-based analysis platform. +ATAC can be deployed using [Cromwell](https://cromwell.readthedocs.io/en/stable/), a GA4GH-compliant, flexible workflow management system that supports multiple computing platforms. The workflow can also be run in [Terra](https://app.terra.bio), a cloud-based analysis platform. ## Input Variables The following describes the inputs of the ATAC workflow. For more details on how default inputs are set for the Multiome workflow, see the [Multiome overview](../Multiome_Pipeline/README). @@ -48,9 +48,13 @@ The following describes the inputs of the ATAC workflow. For more details on how | read1_fastq_gzipped | Fastq inputs (array of compressed read 1 FASTQ files). | | read2_fastq_gzipped | Fastq inputs (array of compressed read 2 FASTQ files containing cellular barcodes). | | read3_fastq_gzipped | Fastq inputs (array of compressed read 3 FASTQ files). | -| output_base_name | Output prefix/base name for all intermediate files and pipeline outputs. | +| input_id | Output prefix/base name for all intermediate files and pipeline outputs. | +| preindex | Boolean used for paired-tag data and not applicable to ATAC data types; default is set to false. | | tar_bwa_reference | BWA reference (tar file containing reference fasta and corresponding files). | -| atac_gtf | CreateFragmentFile input variable: GTF file for SnapATAC2 to calculate TSS sites of fragment file.| +| num_threads_bwa | Optional integer defining the number of CPUs per node for the BWA-mem alignment task (default: 128). | +| mem_size_bwa | Optional integer defining the memory size for the BWA-mem alignment task in GB (default: 512). | +| cpu_platform_bwa | Optional string defining the CPU platform for the BWA-mem alignment task (default: "Intel Ice Lake"). | +| annotations_gtf | CreateFragmentFile input variable: GTF file for SnapATAC2 to calculate TSS sites of fragment file.| | chrom_sizes | CreateFragmentFile input variable: Text file containing chrom_sizes for genome build (i.e., hg38) | | whitelist | Whitelist file for ATAC cellular barcodes. | | adapter_seq_read1 | TrimAdapters input: Sequence adapter for read 1 fastq. | @@ -59,10 +63,10 @@ The following describes the inputs of the ATAC workflow. For more details on how ## ATAC tasks and tools Overall, the ATAC workflow: +1. Identifies optimal parameters for performing CB correction and alignment. 1. Corrects CBs and partitions FASTQs by CB. 1. Aligns reads. -1. Merges aligned BAMs -1. Generates a fragment file +1. Generates a fragment file. 1. Calculates per cell barcode fragment metrics. The tools each ATAC task employs are detailed in the table below. 
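As the workflow steps above note, the FASTQ split count is chosen so the subsequent bwa-mem2 alignments saturate the machine; the task table below details each step. A back-of-the-envelope sketch of that sizing, using the documented default of 128 CPUs and an assumed per-shard thread budget (the real GetNumSplits task inspects the VM type):

```bash
# Sketch only: pick a shard count that matches what the machine can align
# in parallel. threads_per_shard is an assumption for illustration.
total_cpus=128            # num_threads_bwa default documented above
threads_per_shard=16      # hypothetical share of the machine per shard
num_splits=$(( total_cpus / threads_per_shard ))
echo "Split FASTQs into ${num_splits} shards so bwa-mem2 can align them in parallel"
```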
@@ -71,10 +75,10 @@ To see specific tool parameters, select the task WDL link in the table; then vie | Task name and WDL link | Tool | Software | Description | | --- | --- | --- | ------------------------------------ | -| [FastqProcessing as SplitFastq](https://github.com/broadinstitute/warp/blob/master/tasks/skylab/FastqProcessing.wdl) | fastqprocess | custom | Dynamically selects the correct barcode orientation, corrects cell barcodes, and splits FASTQ files. The number of files output depends on either the `bam_size` parameter, which determines the size of the output FASTQ files produced, or the `num_output_files` parameter, which determines the number of FASTQ files that should be output. The smaller FASTQ files are grouped by cell barcode with each read having the corrected (CB) and raw barcode (CR) in the read name. | +| [GetNumSplits](https://github.com/broadinstitute/warp/blob/master/pipelines/skylab/multiome/atac.wdl) | Bash | Bash | Uses the virtual machine type to determine the optimal number of FASTQ files for performing the BWA-mem alignment step. This allows BWA-mem to run in parallel on multiple FASTQ files in the subsequent workflow steps. | +| [FastqProcessing as SplitFastq](https://github.com/broadinstitute/warp/blob/master/tasks/skylab/FastqProcessing.wdl) | fastqprocess | custom | Dynamically selects the correct barcode orientation, corrects cell barcodes, and splits FASTQ files by the optimal number determined in the [GetNumSplits](https://github.com/broadinstitute/warp/blob/master/pipelines/skylab/multiome/atac.wdl) task. The smaller FASTQ files are grouped by cell barcode with each read having the corrected (CB) and raw barcode (CR) in the read name. | | [TrimAdapters](https://github.com/broadinstitute/warp/blob/master/pipelines/skylab/multiome/atac.wdl) | Cutadapt v4.4 | cutadapt | Trims adaptor sequences. | | [BWAPairedEndAlignment](https://github.com/broadinstitute/warp/blob/master/pipelines/skylab/multiome/atac.wdl) | bwa-mem2 | mem | Aligns reads from each set of partitioned FASTQ files to the genome and outputs a BAM with ATAC barcodes in the CB:Z tag. | -| [Merge.MergeSortBamFiles as MergeBam](https://github.com/broadinstitute/warp/blob/master/tasks/skylab/MergeSortBam.wdl) | MergeSamFiles | Picard | Merges each BAM into a final aligned BAM with corrected cell barcodes in the CB tag. | | [CreateFragmentFile](https://github.com/broadinstitute/warp/blob/master/pipelines/skylab/multiome/atac.wdl) | make_fragment_file, import_data | SnapATAC2 | Generates a fragment file from the final aligned BAM and outputs per barcode quality metrics in h5ad. A detailed list of these metrics is found in the [ATAC Count Matrix Overview](./count-matrix-overview.md). | @@ -96,4 +100,4 @@ Please identify the pipeline in your methods section using the ATAC Pipeline's [ ## Acknowledgements -We are immensely grateful to the members of the BRAIN Initiative (BICAN Sequencing Working Group) and SCORCH for their invaluable and exceptional contributions to this pipeline. Our heartfelt appreciation goes to Alex Dobin, Aparna Bhaduri, Alec Wysoker, Anish Chakka, Brian Herb, Daofeng Li, Fenna Krienen, Guo-Long Zuo, Jeff Goldy, Kai Zhang, Khalid Shakir, Bo Li, Mariano Gabitto, Michael DeBerardine, Mengyi Song, Melissa Goldman, Nelson Johansen, James Nemesh, and Theresa Hodges for their unwavering dedication and remarkable efforts. 
\ No newline at end of file +We are immensely grateful to the members of the BRAIN Initiative (BICAN Sequencing Working Group) and SCORCH for their invaluable and exceptional contributions to this pipeline. Our heartfelt appreciation goes to Alex Dobin, Aparna Bhaduri, Alec Wysoker, Anish Chakka, Brian Herb, Daofeng Li, Fenna Krienen, Guo-Long Zuo, Jeff Goldy, Kai Zhang, Khalid Shakir, Bo Li, Mariano Gabitto, Michael DeBerardine, Mengyi Song, Melissa Goldman, Nelson Johansen, James Nemesh, and Theresa Hodges for their unwavering dedication and remarkable efforts. diff --git a/website/docs/Pipelines/BuildIndices_Pipeline/README.md b/website/docs/Pipelines/BuildIndices_Pipeline/README.md new file mode 100644 index 0000000000..fc328379aa --- /dev/null +++ b/website/docs/Pipelines/BuildIndices_Pipeline/README.md @@ -0,0 +1,122 @@ +--- +sidebar_position: 1 +slug: /Pipelines/BuildIndices_Pipeline/README +--- + +# BuildIndices Overview + +| Pipeline Version | Date Updated | Documentation Author | Questions or Feedback | +| :----: | :---: | :----: | :--------------: | +| [BuildIndices_v3.0.0](https://github.com/broadinstitute/warp/releases) | December, 2023 | Kaylee Mathews | Please file GitHub issues in warp or contact [documentation authors](mailto:warp-pipelines-help@broadinstitute.org) | + +![BuildIndices_diagram](./buildindices_diagram.png) + + +## Introduction to the BuildIndices workflow + +The [BuildIndices workflow](https://github.com/broadinstitute/warp/blob/master/pipelines/skylab/build_indices/BuildIndices.wdl) is an open-source, cloud-optimized pipeline developed in collaboration with the [BRAIN Initiative Cell Census Network](https://biccn.org/) (BICCN) and the BRAIN Initiative Cell Atlas Network (BICAN). + +Overall, the workflow filters GTF files for selected gene biotypes, calculates chromosome sizes, and builds reference bundles with required files for [STAR](https://github.com/alexdobin/STAR) and [bwa-mem2](https://github.com/bwa-mem2/bwa-mem2) aligners. + +## Quickstart table +The following table provides a quick glance at the BuildIndices pipeline features: + +| Pipeline features | Description | Source | +| --- | --- | --- | +| Overall workflow | Reference bundle creation for STAR and bwa-mem2 aligners | Code available on [GitHub](https://github.com/broadinstitute/warp/blob/master/pipelines/skylab/build_indices/BuildIndices.wdl) | +| Workflow language | WDL 1.0 | [openWDL](https://github.com/openwdl/wdl) | +| Genomic Reference Sequence | GRCh38 human genome primary sequence, M32 (GRCm39) mouse genome primary sequence, and release 103 (GCF_003339765.1) macaque genome primary sequence | GENCODE [human reference files](https://www.gencodegenes.org/human/release_43.html), GENCODE [mouse reference files](https://www.gencodegenes.org/mouse/release_M32.html), and NCBI [macaque reference files](https://www.ncbi.nlm.nih.gov/datasets/genome/GCF_003339765.1/) | +| Gene annotation reference (GTF) | Reference containing gene annotations | GENCODE [human GTF](https://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_43/gencode.v43.annotation.gtf.gz), GENCODE [mouse GTF](https://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_mouse/release_M32/gencode.vM32.primary_assembly.annotation.gtf.gz), and NCBI [macaque GTF](https://www.ncbi.nlm.nih.gov/datasets/genome/GCF_003339765.1/) | +| Reference builders | STAR, bwa-mem2 | [Dobin et al. 2013](https://pubmed.ncbi.nlm.nih.gov/23104886/), [Vasimuddin et al. 
2019](https://ieeexplore.ieee.org/document/8820962) | +| Data input file format | File format in which reference files are provided | FASTA, GTF, TSV | +| Data output file format | File formats in which BuildIndices output is provided | GTF, TAR, TXT | + +## Set-up + +### BuildIndices installation + +To download the latest BuildIndices release, see the release tags prefixed with "BuildIndices" on the WARP [releases page](https://github.com/broadinstitute/warp/releases). All BuildIndices pipeline releases are documented in the [BuildIndices changelog](https://github.com/broadinstitute/warp/blob/master/pipelines/skylab/build_indices/BuildIndices.changelog.md). + +To search releases of this and other pipelines, use the WARP command-line tool [Wreleaser](https://github.com/broadinstitute/warp/tree/master/wreleaser). + +If you’re running a BuildIndices workflow version prior to the latest release, the accompanying documentation for that release may be downloaded with the source code on the WARP [releases page](https://github.com/broadinstitute/warp/releases) (see the folder `website/docs/Pipelines/BuildIndices_Pipeline`). + +The BuildIndices pipeline can be deployed using [Cromwell](https://cromwell.readthedocs.io/en/stable/), a GA4GH-compliant, flexible workflow management system that supports multiple computing platforms. The workflow can also be run in [Terra](https://app.terra.bio), a cloud-based analysis platform. + +### Inputs + +The BuildIndices workflow inputs are specified in JSON configuration files. Configuration files for [macaque](https://github.com/broadinstitute/warp/blob/master/pipelines/skylab/build_indices/Macaque.json) and [mouse](https://github.com/broadinstitute/warp/blob/master/pipelines/skylab/build_indices/Mouse.json) references can be found in the WARP repository. + +#### Input descriptions + +| Parameter name | Description | Type | +| --- | --- | --- | +| genome_source | Describes the source of the reference genome listed in the GTF file; used to name output files; can be set to “NCBI” or “GENCODE”. | String | +| gtf_annotation_version | Version or release of the reference genome listed in the GTF file; used to name STAR output files; ex.”M32”, “103”. | String | +| genome_build | Assembly accession (NCBI) or version (GENCODE) of the reference genome listed in the GTF file; used to name output files; ex. “GRCm39”, “GCF_003339765.1”. | String | +| organism | Organism of the reference genome; used to name the output files; can be set to “Macaque”, “Mouse”, “Human”, or any other organism matching the reference genome. | String | +| annotations_gtf | GTF file containing gene annotations; used to build the STAR reference files. | File | +| genome_fa | Genome FASTA file used for building indices. | File | +| biotypes | TSV file containing gene biotypes attributes to include in the modified GTF file; the first column contains the biotype and the second column contains “Y” to include or “N” to exclude the biotype; [GENCODE biotypes](https://www.gencodegenes.org/pages/biotypes.html) are used for GENCODE references and RefSeq biotypes are used for NCBI references. | File | + +## BuildIndices tasks and tools + +Overall, the BuildIndices workflow: +1. Checks inputs, modifies reference files, and creates STAR index. +2. Calculates chromosome sizes. +3. Builds reference bundle for bwa. + +The tasks and tools used in the BuildIndices workflow are detailed in the table below. 
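Condensed into commands, the three steps look roughly like the sketch below; the `modify_gtf.py` flag names and file names are placeholders, and BuildIndices.wdl holds the exact invocations:

```bash
set -euo pipefail

# 1. Filter the GTF to the selected biotypes, then build the STAR index
python3 modify_gtf.py --input-gtf annotation.gtf --biotypes Biotypes.tsv \
    --output-gtf modified.annotation.gtf   # flag names are illustrative
STAR --runMode genomeGenerate --genomeDir star_index \
     --genomeFastaFiles genome.fa --sjdbGTFfile modified.annotation.gtf

# 2. Chromosome sizes: first two columns of the samtools FASTA index
samtools faidx genome.fa
cut -f1,2 genome.fa.fai > chrom.sizes

# 3. bwa-mem2 reference bundle
bwa-mem2 index genome.fa
tar -cf bwa-mem2-reference.tar genome.fa genome.fa.*
```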
+ +To see specific tool parameters, select the [workflow WDL link](https://github.com/broadinstitute/warp/blob/master/pipelines/skylab/build_indices/BuildIndices.wdl); then find the task and view the `command {}` section of the task in the WDL script. To view or use the exact tool software, see the task's Docker image which is specified in the task WDL `# runtime values` section as `docker: `. + +| Task name | Tool | Software | Description | +| --- | --- | --- | --- | +| BuildStarSingleNucleus | [modify_gtf.py](https://github.com/broadinstitute/warp-tools/blob/develop/3rd-party-tools/build-indices/modify_gtf.py), STAR | [warp-tools](https://github.com/broadinstitute/warp-tools/tree/develop), [STAR](https://github.com/alexdobin/STAR) | Checks that the input GTF file contains input genome source, genome build version, and annotation version with correct build source information, modifies files for the STAR aligner, and creates STAR index file. | +| CalculateChromosomeSizes | faidx | [Samtools](http://www.htslib.org/) | Reads the genome FASTA file to create a FASTA index file that contains the genome chromosome sizes. | +| BuildBWAreference | index | [bwa-mem2](https://github.com/bwa-mem2/bwa-mem2) | Builds the reference bundle for the bwa aligner. | + +#### 1. Check inputs, modify reference files, and create STAR index file + +**Check inputs** + +The BuildStarSingleNucleus task reads the input GTF file and verifies that the `genome_source`, `genome_build`, and `gtf_annotation_version` listed in the file match the input values provided to the pipeline. + +**Modify reference files and create STAR index** + +The BuildStarSingleNucleus task uses a custom python script, [`modify_gtf.py`](https://github.com/broadinstitute/warp-tools/blob/develop/3rd-party-tools/build-indices/modify_gtf.py), and a list of biotypes ([example](https://github.com/broadinstitute/warp-tools/blob/develop/3rd-party-tools/build-indices/Biotypes.tsv)) to filter the input GTF file for only the biotypes indicated in the list with the value “Y” in the second column. The defaults in the custom code produce reference outputs that are similar to those built with 10x Genomics reference scripts. + +The task uses the filtered GTF file and STAR `--runMode genomeGenerate` to generate the index file for the STAR aligner. Outputs of the task include the modified GTF and compressed STAR index files. + +#### 2. Calculates chromosome sizes + +The CalculateChromosomeSizes task uses Samtools to create and output a FASTA index file that contains the genome chromosome sizes, which can be used in downstream tools like SnapATAC2. + +#### 3. Builds reference bundle for bwa-mem2 + +The BuildBWAreference task uses the chromosome sizes file and bwa-mem2 to prepare the genome FASTA file for alignment and builds, compresses, and outputs the reference bundle for the bwa-mem2 aligner. + +## Outputs + +The following table lists the output variables and files produced by the pipeline. + +| Output name | Filename, if applicable | Output format and description | +| ------ | ------ | ------ | +| snSS2_star_index | `modified_star2.7.10a---build--.tar` | TAR file containing a species-specific reference genome and GTF file for [STAR](https://github.com/alexdobin/STAR) alignment. | +| pipeline_version_out | `BuildIndices_v` | String describing the version of the BuildIndices pipeline used. | +| snSS2_annotation_gtf_modified | `modified_v.annotation.gtf` | GTF file containing gene annotations filtered for selected biotypes. 
| +| reference_bundle | `bwa-mem2-2.2.1---build-.tar` | TAR file containing the reference index files for [BWA-mem](https://github.com/lh3/bwa) alignment. | +| chromosome_sizes | `chrom.sizes` | Text file containing chromosome sizes for the genome build. | + +## Versioning and testing + +All BuildIndices pipeline releases are documented in the [BuildIndices changelog](https://github.com/broadinstitute/warp/blob/master/pipelines/skylab/build_indices/BuildIndices.changelog.md) and tested manually using [reference JSON files](https://github.com/broadinstitute/warp/tree/master/pipelines/skylab/build_indices). + +## Consortia support +This pipeline is supported by the [BRAIN Initiative](https://braininitiative.nih.gov/) (BICCN and BICAN). + +If your organization also uses this pipeline, we would like to list you! Please reach out to us by contacting the [WARP Pipeline Development team](mailto:warp-pipelines-help@broadinstitute.org). + +## Feedback + +Please help us make our tools better by contacting the [WARP Pipelines Team](mailto:warp-pipelines-help@broadinstitute.org) for pipeline-related suggestions or questions. \ No newline at end of file diff --git a/website/docs/Pipelines/BuildIndices_Pipeline/_category_.json b/website/docs/Pipelines/BuildIndices_Pipeline/_category_.json new file mode 100644 index 0000000000..bd46d4a679 --- /dev/null +++ b/website/docs/Pipelines/BuildIndices_Pipeline/_category_.json @@ -0,0 +1,4 @@ +{ + "label": "BuildIndices", + "position": 2 +} diff --git a/website/docs/Pipelines/BuildIndices_Pipeline/buildindices_diagram.png b/website/docs/Pipelines/BuildIndices_Pipeline/buildindices_diagram.png new file mode 100644 index 0000000000000000000000000000000000000000..6de6fb6c697980a9c820f0872c69b99c3e38d67b GIT binary patch literal 29991
[base85-encoded binary data for the new buildindices_diagram.png omitted]
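Returning to the StarAlign.wdl change earlier in this patch: the three near-identical STAR invocations differ only in their `--soloFeatures` value, so the branching reduces to a small mapping. A sketch, with variable names mirroring the WDL inputs:

```bash
# counting_mode and count_exons decide which soloFeatures STAR quantifies;
# sn_rna with count_exons=true produces both matrices in one alignment.
case "${counting_mode}:${count_exons}" in
  sc_rna:*)      solo_features="Gene" ;;
  sn_rna:false)  solo_features="GeneFull_Ex50pAS" ;;
  sn_rna:true)   solo_features="GeneFull_Ex50pAS Gene" ;;
  *) echo "Error: unknown counting mode: ${counting_mode}" >&2; exit 1 ;;
esac

star_args=(--soloFeatures ${solo_features})
if [ -n "${soloMultiMappers:-}" ]; then
  star_args+=(--soloMultiMappers "${soloMultiMappers}")
fi
echo "STAR would add: ${star_args[*]} to the shared flag set shown in the diff"
```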
[GIT binary patch data omitted] diff --git a/website/docs/Pipelines/CEMBA_MethylC_Seq_Pipeline/_category_.json b/website/docs/Pipelines/CEMBA_MethylC_Seq_Pipeline/_category_.json index fdc7d13cad..f86d5546ee 100644 --- a/website/docs/Pipelines/CEMBA_MethylC_Seq_Pipeline/_category_.json +++ b/website/docs/Pipelines/CEMBA_MethylC_Seq_Pipeline/_category_.json @@ -1,4 +1,4 @@ { "label": "CEMBA", - "position": 2 + "position": 3 } diff --git a/website/docs/Pipelines/Exome_Germline_Single_Sample_Pipeline/_category_.json b/website/docs/Pipelines/Exome_Germline_Single_Sample_Pipeline/_category_.json index 464a50a2b9..43e366fed5 100644 --- a/website/docs/Pipelines/Exome_Germline_Single_Sample_Pipeline/_category_.json +++ b/website/docs/Pipelines/Exome_Germline_Single_Sample_Pipeline/_category_.json @@ -1,4 +1,4 @@ { "label": "Exome Germline Single Sample", - "position": 3 + "position": 4 } diff --git a/website/docs/Pipelines/Genomic_Data_Commons_Whole_Genome_Somatic/_category_.json b/website/docs/Pipelines/Genomic_Data_Commons_Whole_Genome_Somatic/_category_.json index ab60b23f0b..ba21134e2a 100644 --- a/website/docs/Pipelines/Genomic_Data_Commons_Whole_Genome_Somatic/_category_.json +++ b/website/docs/Pipelines/Genomic_Data_Commons_Whole_Genome_Somatic/_category_.json @@ -1,4 +1,4 @@ { "label": "GDC Whole Genome Somatic Single Sample", - "position": 4 + "position": 5 } diff --git a/website/docs/Pipelines/Illumina_Genotyping_Arrays_Pipeline/_category_.json b/website/docs/Pipelines/Illumina_Genotyping_Arrays_Pipeline/_category_.json index b9d3ce3a31..13b02c85b0 100644 --- a/website/docs/Pipelines/Illumina_Genotyping_Arrays_Pipeline/_category_.json +++ b/website/docs/Pipelines/Illumina_Genotyping_Arrays_Pipeline/_category_.json @@ -1,4 +1,4 @@ { "label": "Illumina Genotyping Array", - "position": 5 + "position": 6 } diff --git a/website/docs/Pipelines/Imputation_Pipeline/_category_.json b/website/docs/Pipelines/Imputation_Pipeline/_category_.json index 2dcc66a297..4551f71dc4 100644 --- a/website/docs/Pipelines/Imputation_Pipeline/_category_.json +++ b/website/docs/Pipelines/Imputation_Pipeline/_category_.json @@ -1,4 +1,4 @@ { "label": "Imputation", - "position": 6 + "position": 7 } diff --git a/website/docs/Pipelines/Multiome_Pipeline/README.md b/website/docs/Pipelines/Multiome_Pipeline/README.md index c32b6a3bc0..97815b03c8 100644 --- a/website/docs/Pipelines/Multiome_Pipeline/README.md +++ b/website/docs/Pipelines/Multiome_Pipeline/README.md @@ -7,7 +7,9 @@ slug: /Pipelines/Multiome_Pipeline/README | Pipeline Version | Date Updated | Documentation Author | Questions or Feedback | | :----: | :---: | :----: | :--------------: | -| [Multiome v3.0.1](https://github.com/broadinstitute/warp/releases) | December, 2023 | Kaylee Mathews | Please file GitHub issues in warp or contact the [WARP Pipeline Development team](mailto:warp-pipelines-help@broadinstitute.org) | + +| [Multiome v3.1.1](https://github.com/broadinstitute/warp/releases) | January, 2024 | Kaylee Mathews | Please file GitHub issues in warp or contact the [WARP Pipeline
Development team](mailto:warp-pipelines-help@broadinstitute.org) | + ![Multiome_diagram](./multiome_diagram.png) @@ -29,7 +31,7 @@ The following table provides a quick glance at the Multiome pipeline features: | Pipeline features | Description | Source | |--- | --- | --- | | Assay type | 10x single cell or single nucleus gene expression (GEX) and ATAC | [10x Genomics](https://www.10xgenomics.com) | -| Overall workflow | Barcode correction, read alignment, gene and fragment quanitification | +| Overall workflow | Barcode correction, read alignment, gene and fragment quantification | | Workflow language | WDL 1.0 | [openWDL](https://github.com/openwdl/wdl) | | Genomic Reference Sequence | GRCh38 human genome primary sequence | GENCODE [human reference files](https://www.gencodegenes.org/human/release_43.html)| | Gene annotation reference (GTF) | Reference containing gene annotations | GENCODE [human GTF](https://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_43/gencode.v43.annotation.gtf.gz) | @@ -48,7 +50,7 @@ To discover and search releases, use the WARP command-line tool [Wreleaser](http If you’re running a Multiome workflow version prior to the latest release, the accompanying documentation for that release may be downloaded with the source code on the WARP [releases page](https://github.com/broadinstitute/warp/releases) (see the source code folder). -Multiome can be deployed using [Cromwell](https://cromwell.readthedocs.io/en/stable/), a GA4GH compliant, flexible workflow management system that supports multiple computing platforms. The workflow can also be run in [Terra](https://app.terra.bio), a cloud-based analysis platform. The [Multiome public workspace](https://app.terra.bio/#workspaces/warp-pipelines/Multiome) on Terra contains the Multiome workflow, workflow configuration, required reference data and other inputs, and example testing data. +Multiome can be deployed using [Cromwell](https://cromwell.readthedocs.io/en/stable/), a GA4GH-compliant, flexible workflow management system that supports multiple computing platforms. The workflow can also be run in [Terra](https://app.terra.bio), a cloud-based analysis platform. The [Multiome public workspace](https://app.terra.bio/#workspaces/warp-pipelines/Multiome) on Terra contains the Multiome workflow, workflow configuration, required reference data and other inputs, and example testing data. ## Inputs @@ -70,6 +72,7 @@ Multiome can be deployed using [Cromwell](https://cromwell.readthedocs.io/en/sta | star_strand_mode | Optional string for the Optimus (GEX) pipeline for performing STARsolo alignment on forward stranded, reverse stranded, or unstranded data; default is "Forward". | String | | count_exons | Optional boolean for the Optimus (GEX) pipeline indicating if the workflow should calculate exon counts **when in single-nucleus (sn_rna) mode**; if "true" in sc_rna mode, the workflow will return an error; default is "false". | Boolean | | gex_whitelist | Optional file containing the list of valid barcodes for 10x multiome GEX data; default is "gs://gcp-public-data--broad-references/RNA/resources/arc-v1/737K-arc-v1_gex.txt". | File | +| soloMultiMappers | Optional string describing whether or not the Optimus (GEX) pipeline should run STARsolo with the `--soloMultiMappers` flag. | String | | atac_r1_fastq | Array of read 1 paired-end FASTQ files representing a single 10x multiome ATAC library. | Array[File] | | atac_r2_fastq | Array of barcodes FASTQ files representing a single 10x multiome ATAC library. 
| Array[File] | | atac_r3_fastq | Array of read 2 paired-end FASTQ files representing a single 10x multiome ATAC library. | Array[File] | @@ -114,7 +117,12 @@ The Multiome workflow calls two WARP subworkflows, one external subworkflow (opt | gene_metrics_gex | `_gex.gene_metrics.csv.gz` | CSV file containing the per-gene metrics. | | cell_calls_gex | `_gex.emptyDrops` | TSV file containing the EmptyDrops results when the Optimus workflow is run in sc_rna mode. | | h5ad_output_file_gex | `_gex.h5ad` | h5ad (Anndata) file containing the raw cell-by-gene count matrix, gene metrics, cell metrics, and global attributes. Also contains equivalent ATAC barcode for each gene expression barcode in the `atac_barcodes` column of the `h5ad.obs` property. See the [Optimus Count Matrix Overview](../Optimus_Pipeline/Loom_schema.md) for more details. | -| cell_barcodes_csv | `` | Optional output produced when `run_cellbender` is "true"; see CellBender [documentation](https://cellbender.readthedocs.io/en/latest/usage/index.html) and [GitHub repository](https://github.com/broadinstitute/CellBender/tree/master) for more information. | +| multimappers_EM_matrix | `UniqueAndMult-EM.mtx` | Optional output produced when `soloMultiMappers` is "EM"; see STARsolo [documentation](https://github.com/alexdobin/STAR/blob/master/docs/STARsolo.md#multi-gene-reads) for more information.| +| multimappers_Uniform_matrix | `UniqueAndMult-Uniform.mtx` | Optional output produced when `soloMultiMappers` is "Uniform"; see STARsolo [documentation](https://github.com/alexdobin/STAR/blob/master/docs/STARsolo.md#multi-gene-reads) for more information.| +| multimappers_Rescue_matrix | `UniqueAndMult-Rescue.mtx` | Optional output produced when `soloMultiMappers` is "Rescue"; see STARsolo [documentation](https://github.com/alexdobin/STAR/blob/master/docs/STARsolo.md#multi-gene-reads) for more information. | +| multimappers_PropUnique_matrix | `UniqueAndMult-PropUnique.mtx` | Optional output produced when `soloMultiMappers` is "PropUnique"; see STARsolo [documentation](https://github.com/alexdobin/STAR/blob/master/docs/STARsolo.md#multi-gene-reads) for more information.| +| gex_aligner_metrics | `.star_metrics.tar` | Text file containing per barcode metrics (`CellReads.stats`) produced by the GEX pipeline STARsolo aligner. | +| cell_barcodes_csv | `` | Optional output produced when `run_cellbender` is "true"; see CellBender [documentation](https://cellbender.readthedocs.io/en/latest/usage/index.html) and [GitHub repository](https://github.com/broadinstitute/CellBender/tree/master) for more information.| | checkpoint_file | `` | Optional output produced when `run_cellbender` is "true"; see CellBender [documentation](https://cellbender.readthedocs.io/en/latest/usage/index.html) and [GitHub repository](https://github.com/broadinstitute/CellBender/tree/master) for more information. | | h5_array | `` | Optional output produced when `run_cellbender` is "true"; see CellBender [documentation](https://cellbender.readthedocs.io/en/latest/usage/index.html) and [GitHub repository](https://github.com/broadinstitute/CellBender/tree/master) for more information. | | html_report_array | `` | Optional output produced when `run_cellbender` is "true"; see CellBender [documentation](https://cellbender.readthedocs.io/en/latest/usage/index.html) and [GitHub repository](https://github.com/broadinstitute/CellBender/tree/master) for more information. 
| diff --git a/website/docs/Pipelines/Multiome_Pipeline/_category_.json b/website/docs/Pipelines/Multiome_Pipeline/_category_.json index 5ee853ee32..1ec6f2bad8 100644 --- a/website/docs/Pipelines/Multiome_Pipeline/_category_.json +++ b/website/docs/Pipelines/Multiome_Pipeline/_category_.json @@ -1,4 +1,4 @@ { "label": "Multiome scATAC and GEX", - "position": 7 + "position": 8 } diff --git a/website/docs/Pipelines/Optimus_Pipeline/README.md b/website/docs/Pipelines/Optimus_Pipeline/README.md index 448b721178..ffe147e4ea 100644 --- a/website/docs/Pipelines/Optimus_Pipeline/README.md +++ b/website/docs/Pipelines/Optimus_Pipeline/README.md @@ -7,7 +7,7 @@ slug: /Pipelines/Optimus_Pipeline/README | Pipeline Version | Date Updated | Documentation Author | Questions or Feedback | | :----: | :---: | :----: | :--------------: | -| [optimus_v6.3.0](https://github.com/broadinstitute/warp/releases?q=optimus&expanded=true) | December, 2023 | Elizabeth Kiernan | Please file GitHub issues in warp or contact [the WARP team](mailto:warp-pipelines-help@broadinstitute.org) | +| [optimus_v6.3.5](https://github.com/broadinstitute/warp/releases?q=optimus&expanded=true) | January, 2024 | Elizabeth Kiernan | Please file GitHub issues in warp or contact [the WARP team](mailto:warp-pipelines-help@broadinstitute.org) | ![Optimus_diagram](Optimus_diagram.png) @@ -93,6 +93,7 @@ The example configuration files also contain metadata for the reference files, d | annotations_gtf | GTF containing gene annotations used for gene tagging (must match GTF in STAR reference). | N/A | | tenx_chemistry_version | Integer that specifies if data was generated with 10x v2 or v3 chemistry. Optimus validates this chemistry by examining the UMIs and CBs in the first read 1 FASTQ file. If the chemistry does not match, the pipeline will fail. You can remove the check by setting "ignore_r1_read_length = true" in the input JSON. | 2 or 3 | | mt_genes | Optional file containing mitochondrial gene names for a specific species. This is used for calculating gene metrics. | N/A | +| soloMultiMappers | Optional string describing whether or not the Optimus (GEX) pipeline should run STARsolo with the `--soloMultiMappers` flag. | N/A | | counting_mode | String describing whether data is single-cell or single-nucleus. Single-cell mode counts reads aligned to the gene transcript, whereas single-nucleus counts whole transcript to account for nuclear pre-mRNA. | "sc_rna" or "sn_rna" | | output_bam_basename | String used as a basename for output BAM file; the default is set to the string used for the `input_id` parameter. | N/A | | star_strand_mode | Optional string for running the workflow on forward stranded, reverse stranded, or unstranded data; default is "Forward". | "Forward" (default), "Reverse", and "Unstranded" | @@ -252,6 +253,10 @@ The following table lists the output files produced from the pipeline. For sampl | cell_metrics | `.cell-metrics.csv.gz` | Matrix of metrics by cells. | Compressed CSV | | gene_metrics | `.gene-metrics.csv.gz` | Matrix of metrics by genes. | Compressed CSV | | aligner_metrics | `.cell_reads.txt` | Per barcode metrics (CellReads.stats) produced by the STARsolo aligner. | TXT | +| multimappers_EM_matrix | `UniqueAndMult-EM.mtx` | Optional output produced when `soloMultiMappers` is "EM"; see STARsolo [documentation](https://github.com/alexdobin/STAR/blob/master/docs/STARsolo.md#multi-gene-reads) for more information. 
| MTX | +| multimappers_Uniform_matrix | `UniqueAndMult-Uniform.mtx` | Optional output produced when `soloMultiMappers` is "Uniform"; see STARsolo [documentation](https://github.com/alexdobin/STAR/blob/master/docs/STARsolo.md#multi-gene-reads) for more information. | MTX | +| multimappers_Rescue_matrix | `UniqueAndMult-Rescue.mtx` | Optional output produced when `soloMultiMappers` is "Rescue"; see STARsolo [documentation](https://github.com/alexdobin/STAR/blob/master/docs/STARsolo.md#multi-gene-reads) for more information. | MTX | +| multimappers_PropUnique_matrix | `UniqueAndMult-PropUnique.mtx` | Optional output produced when `soloMultiMappers` is "PropUnique"; see STARsolo [documentation](https://github.com/alexdobin/STAR/blob/master/docs/STARsolo.md#multi-gene-reads) for more information. | MTX | | cell_calls | empty_drops_result.csv | emptyDrops results from the RunEmptyDrops task. | CSV | | h5ad_output_file | `.h5ad` | h5ad file with count data (exonic or whole transcript depending on the counting_mode) and metadata. | H5AD | diff --git a/website/docs/Pipelines/Optimus_Pipeline/_category_.json b/website/docs/Pipelines/Optimus_Pipeline/_category_.json index 8956ac5542..ebfd0a5ec3 100644 --- a/website/docs/Pipelines/Optimus_Pipeline/_category_.json +++ b/website/docs/Pipelines/Optimus_Pipeline/_category_.json @@ -1,4 +1,4 @@ { "label": "Optimus", - "position": 8 + "position": 9 } diff --git a/website/docs/Pipelines/PairedTag_Pipeline/README.md b/website/docs/Pipelines/PairedTag_Pipeline/README.md new file mode 100644 index 0000000000..81c7506f30 --- /dev/null +++ b/website/docs/Pipelines/PairedTag_Pipeline/README.md @@ -0,0 +1,138 @@ +--- +sidebar_position: 1 +slug: /Pipelines/PairedTag_Pipeline/README +--- + +# Paired-Tag Overview + +| Pipeline Version | Date Updated | Documentation Author | Questions or Feedback | +| :----: | :---: | :----: | :--------------: | +| [PairedTag_v0.0.5](https://github.com/broadinstitute/warp/releases) | January, 2024 | Kaylee Mathews | Please file GitHub issues in warp or contact [documentation authors](mailto:warp-pipelines-help@broadinstitute.org) | + +## Introduction to the Paired-Tag workflow + +The [Paired-Tag workflow](https://github.com/broadinstitute/warp/blob/develop/pipelines/skylab/paired_tag/PairedTag.wdl) is an open-source, cloud-optimized pipeline developed in collaboration with the [BRAIN Initiative Cell Census Network](https://biccn.org/) (BICCN) and the BRAIN Initiative Cell Atlas Network (BICAN). It supports the processing of 3' single-nucleus histone modification data (generated with the [paired-tag protocol](https://www.nature.com/articles/s41594-023-01060-1)) and 10x gene expression (GEX) data generated with the [10x Chromium Multiome assay](https://www.10xgenomics.com/products/single-cell-multiome-atac-plus-gene-expression). + +The workflow is a wrapper WDL script that calls two subworkflows: the Optimus workflow for single-nucleus GEX data and the ATAC workflow for single-nucleus histone modification data. + +The [Optimus workflow](../Optimus_Pipeline/README) (GEX) corrects cell barcodes (CBs) and Unique Molecular Identifiers (UMIs), aligns reads to the genome, calculates per-barcode and per-gene quality metrics, and produces a raw cell-by-gene count matrix. + +The [ATAC workflow](../ATAC/README) (histone modification) corrects CBs, aligns reads to the genome, calculates per-barcode quality metrics, and produces a fragment file.
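For orientation, a launch of the wrapper with Cromwell might look like the following minimal sketch; the Cromwell jar name and the inputs JSON path are illustrative placeholders, not pipeline conventions:

```bash
# Minimal sketch: run the Paired-Tag wrapper WDL with a local Cromwell jar.
# cromwell.jar and PairedTag.inputs.json are placeholder names.
java -jar cromwell.jar run \
  pipelines/skylab/paired_tag/PairedTag.wdl \
  --inputs PairedTag.inputs.json
```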
+ + The [wrapper WDL](https://github.com/broadinstitute/warp/blob/develop/pipelines/skylab/paired_tag/PairedTag.wdl) is available in the [WARP repository](https://github.com/broadinstitute/warp). + +:::info NOTE +The Paired-Tag WDL is under active development (beta); it is not officially released and is undergoing scientific validation. +::: + +## Quickstart table +The following table provides a quick glance at the Paired-Tag pipeline features: + +| Pipeline features | Description | Source | +| --- | --- | --- | +| Assay type | Droplet Paired-Tag (parallel analysis of individual cells for RNA expression and DNA from targeted tagmentation by sequencing) | [Xie et al. 2023](https://www.nature.com/articles/s41594-023-01060-1) | +| Overall workflow | Barcode correction, read alignment, gene and fragment quantification | Code available on [GitHub](https://github.com/broadinstitute/warp/blob/develop/pipelines/skylab/paired_tag/PairedTag.wdl) | +| Workflow language | WDL 1.0 | [openWDL](https://github.com/openwdl/wdl) | +| Genomic Reference Sequence | GRCh38 human genome primary sequence | GENCODE [human reference files](https://www.gencodegenes.org/human/release_43.html) | +| Gene annotation reference (GTF) | Reference containing gene annotations | [GENCODE human GTF](https://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_43/gencode.v43.annotation.gtf.gz) | +| Aligners | STARsolo (GEX), BWA-mem2 (ATAC) | [Kaminow et al. 2021](https://www.biorxiv.org/content/10.1101/2021.05.05.442755v1), [Vasimuddin et al. 2019](https://ieeexplore.ieee.org/document/8820962) | +| Transcript and fragment quantification | STARsolo (GEX), SnapATAC2 (ATAC) | [Kaminow et al. 2021](https://www.biorxiv.org/content/10.1101/2021.05.05.442755v1), [SnapATAC2](https://kzhang.org/SnapATAC2/) | +| Data input file format | File format in which sequencing data is provided | [FASTQ](https://academic.oup.com/nar/article/38/6/1767/3112533) | +| Data output file format | File formats in which Paired-Tag output is provided | [BAM](http://samtools.github.io/hts-specs/) and [h5ad](https://anndata.readthedocs.io/en/latest/) | + + +## Set-up + +### Paired-Tag installation + +To download the latest Paired-Tag release, see the release tags prefixed with "PairedTag" on the WARP [releases page](https://github.com/broadinstitute/warp/releases). All Paired-Tag pipeline releases are documented in the [Paired-Tag changelog](https://github.com/broadinstitute/warp/blob/develop/pipelines/skylab/paired_tag/PairedTag.changelog.md). + +To search releases of this and other pipelines, use the WARP command-line tool [Wreleaser](https://github.com/broadinstitute/warp/tree/master/wreleaser). + +If you’re running a Paired-Tag workflow version prior to the latest release, the accompanying documentation for that release may be downloaded with the source code on the WARP [releases page](https://github.com/broadinstitute/warp/releases) (see the folder `website/docs/Pipelines/PairedTag_Pipeline`). + +The Paired-Tag pipeline can be deployed using [Cromwell](https://cromwell.readthedocs.io/en/stable/), a GA4GH-compliant, flexible workflow management system that supports multiple computing platforms. The workflow can also be run in [Terra](https://app.terra.bio), a cloud-based analysis platform. + +### Inputs + +The Paired-Tag workflow inputs are specified in JSON configuration files.
Example configuration files can be found in the [`test_inputs`](https://github.com/broadinstitute/warp/tree/develop/pipelines/skylab/paired_tag/test_inputs) folder in the WARP repository. + +#### Input descriptions + +| Parameter name | Description | Type | +| --- | --- | --- | +| input_id | Unique identifier describing the biological sample or replicate that corresponds with the FASTQ files; can be a human-readable name or UUID. | String | +| counting_mode | Optional string that determines whether the Optimus (GEX) pipeline should be run in single-cell mode (sc_rna) or single-nucleus mode (sn_rna); default is "sn_rna". | String | +| gex_r1_fastq | Array of read 1 FASTQ files representing a single GEX 10x library. | Array[File] | +| gex_r2_fastq | Array of read 2 FASTQ files representing a single GEX 10x library.| Array[File] | +| gex_i1_fastq | Optional array of index FASTQ files representing a single GEX 10x library; multiplexed samples are not currently supported, but the file may be passed to the pipeline. | Array[File] | +| tar_star_reference | TAR file containing a species-specific reference genome and GTF for Optimus (GEX) pipeline. | File | +| annotations_gtf | GTF file containing gene annotations used for GEX cell metric calculation and ATAC fragment metrics; must match the GTF used to build the STAR aligner. | File | +| ref_genome_fasta | Genome FASTA file used for building the indices. | File | +| mt_genes | Optional file for the Optimus (GEX) pipeline containing mitochondrial gene names used for metric calculation; default assumes 'mt' prefix in GTF (case insensitive). | File | +| tenx_chemistry_version | Optional integer for the Optimus (GEX) pipeline specifying the 10x version chemistry the data was generated with; validated by examination of the first read 1 FASTQ file read structure; default is "3". | Integer | +| emptydrops_lower | **Not used for single-nucleus data.** Optional threshold for UMIs for the Optimus (GEX) pipeline that empty drops tool should consider for determining cell; data below threshold is not removed; default is "100". | Integer | +| force_no_check | Optional boolean for the Optimus (GEX) pipeline indicating if the pipeline should perform checks; default is "false". | Boolean | +| ignore_r1_read_length | Optional boolean for the Optimus (GEX) pipeline indicating if the pipeline should ignore barcode chemistry check; if "true", the workflow will not ensure the `10x_chemistry_version` input matches the chemistry in the read 1 FASTQ; default is "false". | Boolean | +| star_strand_mode | Optional string for the Optimus (GEX) pipeline for performing STARsolo alignment on forward stranded, reverse stranded, or unstranded data; default is "Forward". | String | +| count_exons | Optional boolean for the Optimus (GEX) pipeline indicating if the workflow should calculate exon counts **when in single-nucleus (sn_rna) mode**; if "true" in sc_rna mode, the workflow will return an error; default is "false". | Boolean | +| gex_whitelist | Optional file containing the list of valid barcodes for 10x multiome GEX data; default is "gs://gcp-public-data--broad-references/RNA/resources/arc-v1/737K-arc-v1_gex.txt". | File | +| atac_r1_fastq | Array of read 1 paired-end FASTQ files representing a single paired-tag DNA library. | Array[File] | +| atac_r2_fastq | Array of barcodes FASTQ files representing a single paired-tag DNA library. | Array[File] | +| atac_r3_fastq | Array of read 2 paired-end FASTQ files representing a single paired-tag DNA library. 
| Array[File] | +| tar_bwa_reference | TAR file containing the reference index files for BWA-mem alignment for the ATAC pipeline. | File | +| chrom_sizes | File containing the genome chromosome sizes; used to calculate ATAC fragment file metrics. | File | +| adapter_seq_read1 | Optional string describing the adapter sequence for ATAC read 1 paired-end reads to be used during adapter trimming with Cutadapt; default is "GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAG". | String | +| adapter_seq_read3 | Optional string describing the adapter sequence for ATAC read 2 paired-end reads to be used during adapter trimming with Cutadapt; default is "TCGTCGGCAGCGTCAGATGTGTATAAGAGACAG". | String | +| atac_whitelist | Optional file containing the list of valid barcodes for 10x multiome ATAC data; default is "gs://gcp-public-data--broad-references/RNA/resources/arc-v1/737K-arc-v1_atac.txt". | File | +| preindex | Optional boolean for the ATAC workflow; if “true”, the pipeline will run the ATAC workflow with a preindexing task necessary for processing of droplet-based Paired-Tag data where sample barcodes from read2 are combined with cell barcodes into the BB tag of the output BAM file; if “false”, the pipeline will run the ATAC workflow without preindexing and cell barcodes are stored in the CB tag of the output BAM file; default is “true”. | Boolean | + + +## Paired-Tag tasks and tools + +The Paired-Tag workflow calls two WARP subworkflows and an additional task which are described briefly in the table below. For more details on each subworkflow and task, see the documentation and WDL scripts linked in the table. + +| Subworkflow/Task | Software | Description | +| ----------- | -------- | ----------- | +| Optimus ([WDL](https://github.com/broadinstitute/warp/blob/develop/pipelines/skylab/optimus/Optimus.wdl) and [documentation](../Optimus_Pipeline/README)) | fastqprocess, STARsolo, Emptydrops | Workflow used to analyze 10x single-cell GEX data. | +| PairedTagDemultiplex as demultiplex ([WDL](https://github.com/broadinstitute/warp/blob/develop/tasks/skylab/PairedTagUtils.wdl)) | UPStools | Task used to check the length of the read2 FASTQ (should be either 27 or 24 bp). If `preindex` is set to true, the task will perform demultiplexing of the 3-bp sample barcode from the read2 ATAC fastq files and store it in the read name. It will then perform barcode orientation checking. The ATAC workflow will then add a combined 3 bp sample barcode and cellular barcode to the BB tag of the BAM. If `preindex` is false and the length is 27 bp, the task will perform trimming and subsequent barcode orientation checking. | +| ATAC ([WDL](https://github.com/broadinstitute/warp/blob/develop/pipelines/skylab/multiome/atac.wdl) and [documentation](../ATAC/README)) | fastqprocess, bwa-mem, SnapATAC2 | Workflow used to analyze single-nucleus paired-tag DNA (histone modifications) data. | + + +## Outputs + +| Output variable name | Filename, if applicable | Output format and description | +|--- | --- | --- | +| pairedtag_pipeline_version_out | N.A. | String describing the version of the Paired-Tag pipeline used. | +| bam_aligned_output_atac | `_atac.bam` | BAM file containing aligned reads from ATAC workflow; contains sample and cell barcodes stored in the BB tag if `preindex` is “true”. | +| fragment_file_atac | `_atac.fragments.tsv` or, if preindexing = true, `_atac.fragments.BB.tsv` | TSV file containing fragment start and stop coordinates per barcode. The columns are "Chromosome", "Start", "Stop", "Barcode", and "Number of reads".
| +| snap_metrics_atac | `_atac.metrics.h5ad` | h5ad (Anndata) file containing per-barcode metrics from SnapATAC2. See the [ATAC Count Matrix Overview](../ATAC/count-matrix-overview.md) for more details. | +| genomic_reference_version_gex | `.txt` | File containing the Genome build, source and GTF annotation version. | +| bam_gex | `_gex.bam` | BAM file containing aligned reads from Optimus workflow. | +| matrix_gex | `_gex_sparse_counts.npz` | NPZ file containing raw gene by cell counts. | +| matrix_row_index_gex | `_gex_sparse_counts_row_index.npy` | NPY file containing the row indices. | +| matrix_col_index_gex | `_gex_sparse_counts_col_index.npy` | NPY file containing the column indices. | +| cell_metrics_gex | `_gex.cell_metrics.csv.gz` | CSV file containing the per-cell (barcode) metrics. | +| gene_metrics_gex | `_gex.gene_metrics.csv.gz` | CSV file containing the per-gene metrics. | +| cell_calls_gex | `_gex.emptyDrops` | TSV file containing the EmptyDrops results when the Optimus workflow is run in sc_rna mode. | +| h5ad_output_file_gex | `_gex.h5ad` | h5ad (Anndata) file containing the raw cell-by-gene count matrix, gene metrics, cell metrics, and global attributes. See the [Optimus Count Matrix Overview](../Optimus_Pipeline/Loom_schema.md) for more details. | + + +## Versioning and testing + +All Paired-Tag pipeline releases are documented in the [Paired-Tag changelog](https://github.com/broadinstitute/warp/blob/develop/pipelines/skylab/paired_tag/PairedTag.changelog.md) and tested using [plumbing and scientific test data](https://github.com/broadinstitute/warp/tree/develop/pipelines/skylab/paired_tag/test_inputs). To learn more about WARP pipeline testing, see [Testing Pipelines](https://broadinstitute.github.io/warp/docs/About_WARP/TestingPipelines). Note that paired-tag tests are still in development. + + +## Consortia support +This pipeline is supported by the [BRAIN Initiative](https://braininitiative.nih.gov/) (BICCN and BICAN). + +If your organization also uses this pipeline, we would like to list you! Please reach out to us by contacting the [WARP Pipeline Development team](mailto:warp-pipelines-help@broadinstitute.org). + + +## Acknowledgements +We are immensely grateful to the members of the BRAIN Initiative (BICAN Sequencing Working Group) and SCORCH for their invaluable and exceptional contributions to this pipeline. Our heartfelt appreciation goes to Dr. Bing Ren's lab, Yang Xie, and Lei Chang for their unwavering dedication and remarkable efforts. + + +## Feedback + +Please help us make our tools better by contacting the [WARP Pipelines Team](mailto:warp-pipelines-help@broadinstitute.org) for pipeline-related suggestions or questions.
\ No newline at end of file diff --git a/website/docs/Pipelines/PairedTag_Pipeline/_category_.json b/website/docs/Pipelines/PairedTag_Pipeline/_category_.json new file mode 100644 index 0000000000..d7305fba0f --- /dev/null +++ b/website/docs/Pipelines/PairedTag_Pipeline/_category_.json @@ -0,0 +1,4 @@ +{ + "label": "Paired-Tag", + "position": 10 +} diff --git a/website/docs/Pipelines/RNA_with_UMIs_Pipeline/README.md b/website/docs/Pipelines/RNA_with_UMIs_Pipeline/README.md index 404860c4e3..c407efd7f4 100644 --- a/website/docs/Pipelines/RNA_with_UMIs_Pipeline/README.md +++ b/website/docs/Pipelines/RNA_with_UMIs_Pipeline/README.md @@ -7,7 +7,7 @@ slug: /Pipelines/RNA_with_UMIs_Pipeline/README | Pipeline Version | Date Updated | Documentation Authors | Questions or Feedback | | :----: | :---: | :----: | :--------------: | -| [RNAWithUMIsPipeline_v1.0.6](https://github.com/broadinstitute/warp/releases?q=RNAwithUMIs&expanded=true) | April, 2022 | [Elizabeth Kiernan](mailto:ekiernan@broadinstitute.org) & [Kaylee Mathews](mailto:kmathews@broadinstitute.org)| Please file GitHub issues in warp or contact [the WARP team](mailto:warp-pipelines-help@broadinstitute.org) | +| [RNAWithUMIsPipeline_v1.0.15](https://github.com/broadinstitute/warp/releases?q=RNAwithUMIs&expanded=true) | December, 2023 | [Elizabeth Kiernan](mailto:ekiernan@broadinstitute.org) & [Kaylee Mathews](mailto:kmathews@broadinstitute.org)| Please file GitHub issues in warp or contact [the WARP team](mailto:warp-pipelines-help@broadinstitute.org) | ![RNAWithUMIs_diagram](rna-with-umis_diagram.png) @@ -235,8 +235,7 @@ Workflow outputs are described in the table below. | Output variable name | Description | Type | | ------ | ------ | ------ | | sample_name | Sample name extracted from the input unmapped BAM file header. | String -| transcriptome_bam | Duplicate-marked BAM file containing alignments from STAR translated into transcriptome coordinates. | BAM | -| transcriptome_bam_index | Index file for the transcriptome_bam output. | BAM Index | +| transcriptome_bam | Duplicate-marked BAM file containing alignments from STAR translated into transcriptome coordinates and postprocessed for RSEM. | BAM | | transcriptome_duplicate_metrics | File containing duplication metrics. | TXT | | output_bam | Duplicate-marked BAM file containing alignments from STAR translated into genome coordinates. | BAM | | output_bam_index | Index file for the output_bam output. | BAM Index | diff --git a/website/docs/Pipelines/RNA_with_UMIs_Pipeline/_category_.json b/website/docs/Pipelines/RNA_with_UMIs_Pipeline/_category_.json index e8ac72d9f6..d8cf127ef8 100644 --- a/website/docs/Pipelines/RNA_with_UMIs_Pipeline/_category_.json +++ b/website/docs/Pipelines/RNA_with_UMIs_Pipeline/_category_.json @@ -1,4 +1,4 @@ { "label": "RNA with UMIs", - "position": 9 + "position": 11 } diff --git a/website/docs/Pipelines/RNA_with_UMIs_Pipeline/rna-with-umis.methods.md b/website/docs/Pipelines/RNA_with_UMIs_Pipeline/rna-with-umis.methods.md index df211e00f3..7d29774ea1 100644 --- a/website/docs/Pipelines/RNA_with_UMIs_Pipeline/rna-with-umis.methods.md +++ b/website/docs/Pipelines/RNA_with_UMIs_Pipeline/rna-with-umis.methods.md @@ -2,24 +2,24 @@ sidebar_position: 2 --- -# RNA with UMIs v1.0.6 Methods +# RNA with UMIs v1.0.15 Methods Below we provide an example methods section for publications using the RNA with UMIs pipeline. For the complete pipeline documentation, see the [RNA with UMIs Overview](./README.md). 
## Methods -Data preprocessing, gene counting, and metric calculation were performed using the RNA with UMIs v1.0.6 pipeline, which uses Picard v2.26.11, fgbio v1.4.0, fastp v0.20.1, FastQC v0.11.9, STAR v2.7.10a, Samtools v1.11, UMI-tools v1.1.1, GATK, and RNA-SeQC v2.4.2 with default tool parameters unless otherwise specified. Reference files are publicly available in the [Broad References](https://console.cloud.google.com/storage/browser/gcp-public-data--broad-references;tab=objects?pageState=(%22StorageObjectListTable%22:(%22f%22:%22%255B%255D%22))&prefix=&forceOnObjectsSortingFiltering=false) Google Bucket and are also listed in [example configuration files](https://github.com/broadinstitute/warp/tree/develop/pipelines/broad/rna_seq/test_inputs) in the in the WARP repository. +Data preprocessing, gene counting, and metric calculation were performed using the RNA with UMIs v1.0.15 pipeline, which uses Picard, fgbio v1.4.0, fastp v0.20.1, FastQC v0.11.9, STAR v2.7.10a, Samtools v1.11, UMI-tools v1.1.1, GATK, and RNA-SeQC v2.4.2 with default tool parameters unless otherwise specified. Reference files are publicly available in the [Broad References](https://console.cloud.google.com/storage/browser/gcp-public-data--broad-references;tab=objects?pageState=(%22StorageObjectListTable%22:(%22f%22:%22%255B%255D%22))&prefix=&forceOnObjectsSortingFiltering=false) Google Bucket and are also listed in [example configuration files](https://github.com/broadinstitute/warp/tree/develop/pipelines/broad/rna_seq/test_inputs) in the WARP repository. -Paired-end FASTQ files were first converted to an unmapped BAM (uBAM) using Picard's FastqToSam tool with SORT_ORDER = unsorted. (If a read group unmapped BAM file is used as input for the pipeline, this step is skipped.) Unique molecular identifiers (UMIs) were extracted from the uBAM using fgbio's ExtractUmisFromBam and stored in the RX read tag. +Paired-end FASTQ files were first converted to an unmapped BAM (uBAM) using Picard's (v3.0.0) FastqToSam tool with SORT_ORDER = unsorted. (If a read group unmapped BAM file is used as input for the pipeline, this step is skipped.) Unique molecular identifiers (UMIs) were extracted from the uBAM using fgbio's ExtractUmisFromBam and stored in the RX read tag. After the extraction of UMIs, reads that failed quality control checks performed by the sequencing platform were filtered and the uBAM was converted to FASTQ files using Picard's SamToFastq tool. Illumina TruSeq adapter and poly(A) sequences were clipped from the reads using fastp. Picard's FastqToSam tool was again used to convert the FASTQ files back to a uBAM. This uBAM was used to calculate quality control metrics using FastQC. Reads were aligned using STAR to the GRCh38 (hg38) reference with HLA, ALT, and decoy contigs removed with gene annotations from GENCODE v34 (or GRCh37 [hg19] with gene annotations from GENCODE v19). The --readFilesType and --readFilesCommand parameters were set to "SAM PE" and "samtools view -h", respectively, to indicate that the input was a BAM file. To specify that the output was an unsorted BAM that included unmapped reads, --outSAMtype was set to "BAM Unsorted" and --outSAMunmapped was set to "Within". A transcriptome-aligned BAM was also output with --quantMode = TranscriptomeSAM.
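Assembled as a single command, the STAR options named above might look like the following minimal sketch; the input BAM, index directory, and thread count are placeholder assumptions, and the ENCODE-matching parameters described next would be appended to the same call:

```bash
# Minimal sketch of the STAR alignment call described above;
# paths and thread count are placeholder assumptions.
STAR \
  --runThreadN 8 \
  --genomeDir star_index/ \
  --readFilesIn input.bam \
  --readFilesType SAM PE \
  --readFilesCommand samtools view -h \
  --outSAMtype BAM Unsorted \
  --outSAMunmapped Within \
  --quantMode TranscriptomeSAM
```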
To match [ENCODE bulk RNA-seq data standards](https://www.encodeproject.org/data-standards/rna-seq/long-rnas/), the alignment was performed with parameters --outFilterType = BySJout, --outFilterMultimapNmax = 20, --outFilterMismatchNmax = 999, --alignIntronMin = 20, --alignIntronMax = 1000000, --alignMatesGapMax = 1000000, --alignSJoverhangMin = 8, and --alignSJDBoverhangMin = 1. The fraction of reads required to match the reference was set with --outFilterMatchNminOverLread = 0.33 and the fraction of allowable mismatches to read length was set with --outFilterMismatchNoverLmax = 0.1. Chimeric alignments were included with --chimSegmentMin = 15, where 15 was the minimum length of each segment, and --chimMainSegmentMultNmax = 1 to prevent main chimeric segments from mapping to multiple sites. To output chimeric segments with soft-clipping in the aligned BAM, --chimOutType was set to "WithinBAM SoftClip". A maximum of 20 protruding bases at the ends of alignments was allowed with --alignEndsProtrude set to "20 ConcordantPair" to prevent reads from small cDNA fragments that were sequenced into adapters from being dropped. -Following alignment, both BAM files were sorted by coordinate with Picard's SortSam tool. UMI-tools was then used to further divide putative duplicates into subgroups based on UMI and sequencing errors in UMIs were corrected. To specify the tag where the UMIs were stored, --extract-umi-method was set to "tag" and --umi-tag was set to "RX". Unmapped reads were included in the output file with --unmapped-reads = use. Tagged BAM files were output using the option --output-bam. SortSam was used again to sort the BAM files by queryname for Picard's MarkDuplicates tool. MarkDuplicates was used to mark PCR duplicates and calculate duplicate metrics. After duplicate marking, BAM files were sorted by coordiante using SortSam to facilitate downstream analysis. The transcriptome-aligned, duplicate-marked BAM was sorted using GATK's (v4.2.6.0) PostProcessReadsForRSEM tool for compatability with RSEM. +Following alignment, both BAM files were sorted by coordinate with Picard's (v2.26.11) SortSam tool. UMI-tools was then used to further divide putative duplicates into subgroups based on UMI and sequencing errors in UMIs were corrected. To specify the tag where the UMIs were stored, --extract-umi-method was set to "tag" and --umi-tag was set to "RX". Unmapped reads were included in the output file with --unmapped-reads = use. Tagged BAM files were output using the option --output-bam. SortSam was used again to sort the BAM files by queryname for Picard's (v2.26.11) MarkDuplicates tool. MarkDuplicates was used to mark PCR duplicates and calculate duplicate metrics. After duplicate marking, BAM files were sorted by coordinate using SortSam to facilitate downstream analysis. The transcriptome-aligned, duplicate-marked BAM was sorted and postprocessed using GATK's (v4.2.6.0) PostProcessReadsForRSEM tool for compatibility with RSEM. -The genome-aligned, duplicate-marked BAM file was then used to calculate summary metrics using RNASeQC, Picard's CollectRNASeqMetrics and CollectMultipleMetrics tools, and GATK's (v4.2.6.1) GetPileupSummaries and CalculateContamination tools. CollectMultipleMetrics was used with the programs “CollectInsertSizeMetrics” and “CollectAlignmentSummaryMetrics”. GetPileupSummaries was run with the read filters, "WellformedReadFilter" and "MappingQualityAvailableReadFilter" disabled.
+The genome-aligned, duplicate-marked BAM file was then used to calculate summary metrics using RNASeQC, Picard's (v2.26.11) CollectRNASeqMetrics and (v3.0.0) CollectMultipleMetrics tools, and GATK's (v4.3.0.0) GetPileupSummaries and CalculateContamination tools. CollectMultipleMetrics was used with the programs “CollectInsertSizeMetrics” and “CollectAlignmentSummaryMetrics”. GetPileupSummaries was run with the read filters, "WellformedReadFilter" and "MappingQualityAvailableReadFilter" disabled. -The final outputs of the RNA with UMIs pipeline included metrics generated before alignment with FastQC, a transcriptome-aligned, duplicate-marked BAM file with corresponding index and duplication metrics, and a genome-aligned, duplicate-marked BAM file with corresponding index, duplication metrics, and metrics generated with RNASeQC, Picard, and GATK tools. +The final outputs of the RNA with UMIs pipeline included metrics generated before alignment with FastQC, a transcriptome-aligned, duplicate-marked BAM file with duplication metrics, and a genome-aligned, duplicate-marked BAM file with corresponding index, duplication metrics, and metrics generated with RNASeQC, Picard, and GATK tools. diff --git a/website/docs/Pipelines/Single_Cell_ATAC_Seq_Pipeline/_category_.json b/website/docs/Pipelines/Single_Cell_ATAC_Seq_Pipeline/_category_.json index 46c3145c00..2145d730e7 100644 --- a/website/docs/Pipelines/Single_Cell_ATAC_Seq_Pipeline/_category_.json +++ b/website/docs/Pipelines/Single_Cell_ATAC_Seq_Pipeline/_category_.json @@ -1,4 +1,4 @@ { "label": "Single Cell ATAC", - "position": 10 + "position": 12 } diff --git a/website/docs/Pipelines/SlideSeq_Pipeline/_category_.json b/website/docs/Pipelines/SlideSeq_Pipeline/_category_.json index ee0c22df80..a658fab6e4 100644 --- a/website/docs/Pipelines/SlideSeq_Pipeline/_category_.json +++ b/website/docs/Pipelines/SlideSeq_Pipeline/_category_.json @@ -1,4 +1,4 @@ { "label": "Slide-seq", - "position": 11 + "position": 13 } \ No newline at end of file diff --git a/website/docs/Pipelines/Smart-seq2_Multi_Sample_Pipeline/README.md b/website/docs/Pipelines/Smart-seq2_Multi_Sample_Pipeline/README.md index 38f4bca0f1..1f069a419d 100644 --- a/website/docs/Pipelines/Smart-seq2_Multi_Sample_Pipeline/README.md +++ b/website/docs/Pipelines/Smart-seq2_Multi_Sample_Pipeline/README.md @@ -7,7 +7,7 @@ slug: /Pipelines/Smart-seq2_Multi_Sample_Pipeline/README | Pipeline Version | Date Updated | Documentation Author | Questions or Feedback | | :----: | :---: | :----: | :--------------: | -| [MultiSampleSmartSeq2_v2.2.1](https://github.com/broadinstitute/warp/releases) | May, 2021 | [Elizabeth Kiernan](mailto:ekiernan@broadinstitute.org) | Please file GitHub issues in WARP or contact [the WARP team](mailto:warp-pipelines-help@broadinstitute.org) | +| [MultiSampleSmartSeq2_v2.2.21](https://github.com/broadinstitute/warp/releases) | December, 2023 | [Elizabeth Kiernan](mailto:ekiernan@broadinstitute.org) | Please file GitHub issues in WARP or contact [the WARP team](mailto:warp-pipelines-help@broadinstitute.org) | ## Introduction diff --git a/website/docs/Pipelines/Smart-seq2_Multi_Sample_Pipeline/_category_.json b/website/docs/Pipelines/Smart-seq2_Multi_Sample_Pipeline/_category_.json index d5322b1e81..a14bbb52df 100644 --- a/website/docs/Pipelines/Smart-seq2_Multi_Sample_Pipeline/_category_.json +++ b/website/docs/Pipelines/Smart-seq2_Multi_Sample_Pipeline/_category_.json @@ -1,4 +1,4 @@ { "label": "Smart-seq2 Multi-Sample", - "position": 13 + "position": 15 } diff --git 
a/website/docs/Pipelines/Smart-seq2_Multi_Sample_Pipeline/smart-seq2.methods.md b/website/docs/Pipelines/Smart-seq2_Multi_Sample_Pipeline/smart-seq2.methods.md index e443bffe76..9e669ad57e 100644 --- a/website/docs/Pipelines/Smart-seq2_Multi_Sample_Pipeline/smart-seq2.methods.md +++ b/website/docs/Pipelines/Smart-seq2_Multi_Sample_Pipeline/smart-seq2.methods.md @@ -2,12 +2,12 @@ sidebar_position: 2 --- -# Smart-seq2 Multi-Sample v2.2.1 Publication Methods +# Smart-seq2 Multi-Sample v2.2.21 Publication Methods Below we provide an example methods section for a publication. For the complete pipeline documentation, see the [Smart-seq2 Multi-Sample Overview](./README.md). ## Methods -Data preprocessing and count matrix construction for a sample batch (or plate) were performed using the Smart-seq2 Multi-Sample v2.2.0 Pipeline (RRID:SCR_018920). For each cell in the batch, paired- or single-end FASTQ files were first processed with the Smart-seq2 Single Sample v5.1.1 Pipeline (RRID:SCR_021228). Reads were aligned to the GENCODE mouse (M21) or human (V27) reference genome using HISAT2 v2.1.0 with default parameters in addition to `--k 10` options. Metrics were collected and duplicate reads marked using the Picard v.2.10.10 `CollectMultipleMetrics` and `CollectRnaSeqMetrics`, and MarkDuplicates functions with validation_stringency=silent. For transcriptome quantification, reads were aligned to the GENCODE transcriptome using HISAT2 v2.1.0 with `--k 10 --no-mixed --no-softclip --no-discordant --rdg 99999999,99999999 --rfg 99999999,99999999 --no-spliced-alignment` options. Gene expression was calculated using RSEM v1.3.0’s `rsem-calculate-expression --calc-pme --single-cell-prior`. QC metrics, RSEM TPMs and RSEM estimated counts were exported to a single Loom file for each cell. All individual Loom files for the entire batch were aggregated into a single Loom file for downstream processing. The final output included the unfiltered Loom and the tagged, unfiltered individual BAM files. +Data preprocessing and count matrix construction for a sample batch (or plate) were performed using the Smart-seq2 Multi-Sample v2.2.21 Pipeline (RRID:SCR_018920). For each cell in the batch, paired- or single-end FASTQ files were first processed with the Smart-seq2 Single Sample v5.1.20 Pipeline (RRID:SCR_021228). Reads were aligned to the GENCODE mouse (M21) or human (V27) reference genome using HISAT2 v2.1.0 with default parameters in addition to `--k 10` options. Metrics were collected and duplicate reads marked using the Picard v.2.26.10 `CollectMultipleMetrics` and `CollectRnaSeqMetrics`, and MarkDuplicates functions with validation_stringency=silent. For transcriptome quantification, reads were aligned to the GENCODE transcriptome using HISAT2 v2.1.0 with `--k 10 --no-mixed --no-softclip --no-discordant --rdg 99999999,99999999 --rfg 99999999,99999999 --no-spliced-alignment` options. Gene expression was calculated using RSEM v1.3.0’s `rsem-calculate-expression --calc-pme --single-cell-prior`. QC metrics, RSEM TPMs and RSEM estimated counts were exported to a single Loom file for each cell. All individual Loom files for the entire batch were aggregated into a single Loom file for downstream processing. The final output included the unfiltered Loom and the tagged, unfiltered individual BAM files. 
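As a rough illustration of the transcriptome alignment and quantification steps above, the commands might be assembled as follows; FASTQ names, index basenames, and the sample prefix are placeholder assumptions, and note that HISAT2 itself spells the multi-alignment option `-k`:

```bash
# Minimal sketch of the per-cell transcriptome alignment and RSEM
# quantification described above; file and index names are placeholders.
hisat2 -k 10 --no-mixed --no-softclip --no-discordant \
  --rdg 99999999,99999999 --rfg 99999999,99999999 \
  --no-spliced-alignment \
  -x gencode_transcriptome_index \
  -1 cell_R1.fastq.gz -2 cell_R2.fastq.gz \
  | samtools view -b -o cell.transcriptome.bam -

rsem-calculate-expression --bam --paired-end --calc-pme --single-cell-prior \
  cell.transcriptome.bam gencode_rsem_reference cell_rsem
```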
An example of the pipeline and outputs can be found in [Terra](https://app.terra.bio/#workspaces/featured-workspaces-hca/HCA%20Smart-seq2%20Multi%20Sample%20Pipeline) and additional documentation can be found in the [Smart-seq2 Multi-Sample Overview](./README.md). Examples of genomic references, whitelists, and other inputs are available in the warp repository (see the *_example.json files at [human_single_example](https://github.com/broadinstitute/warp/blob/master/pipelines/skylab/smartseq2_multisample/human_single_example.json)). diff --git a/website/docs/Pipelines/Smart-seq2_Single_Nucleus_Multi_Sample_Pipeline/README.md b/website/docs/Pipelines/Smart-seq2_Single_Nucleus_Multi_Sample_Pipeline/README.md index ed704b44d5..e21fe808ee 100644 --- a/website/docs/Pipelines/Smart-seq2_Single_Nucleus_Multi_Sample_Pipeline/README.md +++ b/website/docs/Pipelines/Smart-seq2_Single_Nucleus_Multi_Sample_Pipeline/README.md @@ -7,7 +7,7 @@ slug: /Pipelines/Smart-seq2_Single_Nucleus_Multi_Sample_Pipeline/README | Pipeline Version | Date Updated | Documentation Author | Questions or Feedback | | :----: | :---: | :----: | :--------------: | -| [MultiSampleSmartSeq2SingleNuclei_v1.2.14](https://github.com/broadinstitute/warp/releases) | November, 2022 | [Elizabeth Kiernan](mailto:ekiernan@broadinstitute.org) | Please file GitHub issues in WARP or contact [the WARP team](mailto:warp-pipelines-help@broadinstitute.org) | +| [MultiSampleSmartSeq2SingleNuclei_v1.2.28](https://github.com/broadinstitute/warp/releases) | January, 2024 | [Elizabeth Kiernan](mailto:ekiernan@broadinstitute.org) | Please file GitHub issues in WARP or contact [the WARP team](mailto:warp-pipelines-help@broadinstitute.org) | ![](./snSS2.png) @@ -102,6 +102,7 @@ To see specific tool parameters, select the task WDL link in the table; then vie | Task name and WDL link | Tool | Software | Description | | --- | --- | --- | --- | | [CheckInputs.checkInputArrays](https://github.com/broadinstitute/warp/blob/master/tasks/skylab/CheckInputs.wdl) | --- | Bash | Checks the inputs and initiates the per cell processing. | +| [StarAlign.STARGenomeRefVersion](https://github.com/broadinstitute/warp/tree/master/tasks/skylab/StarAlign.wdl) | --- | Bash | Reads the `tar_star_reference` file to obtain the genomic reference source, build version, and annotation version. | | [TrimAdapters.TrimAdapters](https://github.com/broadinstitute/warp/tree/master/tasks/skylab/TrimAdapters.wdl) | [fastq-mcf](https://github.com/ExpressionAnalysis/ea-utils/tree/master/clipper) | [ea-utils](https://github.com/ExpressionAnalysis/ea-utils) | Trims adapter sequences from the FASTQ inputs | [StarAlign.StarAlignFastqMultisample](https://github.com/broadinstitute/warp/tree/master/tasks/skylab/StarAlign.wdl) | STAR | [STAR](https://github.com/alexdobin/STAR) | Aligns reads to the genome. | | [Picard.RemoveDuplicatesFromBam](https://github.com/broadinstitute/warp/tree/master/tasks/skylab/Picard.wdl) | MarkDuplicates, AddOrReplaceReadGroups | [Picard](https://broadinstitute.github.io/picard/) | Removes duplicate reads, producing a new BAM output; adds read groups to the deduplicated BAM.
| diff --git a/website/docs/Pipelines/Smart-seq2_Single_Nucleus_Multi_Sample_Pipeline/_category_.json b/website/docs/Pipelines/Smart-seq2_Single_Nucleus_Multi_Sample_Pipeline/_category_.json index 11bae335db..19995a09ed 100644 --- a/website/docs/Pipelines/Smart-seq2_Single_Nucleus_Multi_Sample_Pipeline/_category_.json +++ b/website/docs/Pipelines/Smart-seq2_Single_Nucleus_Multi_Sample_Pipeline/_category_.json @@ -1,4 +1,4 @@ { "label": "Smart-seq2 Single Nucleus Multi-Sample", - "position": 12 + "position": 14 } diff --git a/website/docs/Pipelines/Smart-seq2_Single_Nucleus_Multi_Sample_Pipeline/multi_snss2.methods.md b/website/docs/Pipelines/Smart-seq2_Single_Nucleus_Multi_Sample_Pipeline/multi_snss2.methods.md index 5a243797d8..77dedddb0e 100644 --- a/website/docs/Pipelines/Smart-seq2_Single_Nucleus_Multi_Sample_Pipeline/multi_snss2.methods.md +++ b/website/docs/Pipelines/Smart-seq2_Single_Nucleus_Multi_Sample_Pipeline/multi_snss2.methods.md @@ -2,13 +2,13 @@ sidebar_position: 2 --- -# Smart-seq2 Single Nucleus Multi-Sample v1.2.0 Publication Methods +# Smart-seq2 Single Nucleus Multi-Sample v1.2.26 Publication Methods Below we provide an example methods section for a publication. For the complete pipeline documentation, see the [Smart-seq2 Single Nucleus Multi-Sample Overview](./README.md). ## Methods -Data preprocessing and count matrix construction for a batch (or plate) were performed using the Smart-seq2 Single Nucleus Multi-Sample v1.2.0 Pipeline (RRID:SCR_021312) as well as Picard v.2.25.5 with default tool parameters unless otherwise specified. Genomic references are publicly available in the [Broad References](https://console.cloud.google.com/storage/browser/gcp-public-data--broad-references/mm10/v0/single_nucleus?pageState=(%22StorageObjectListTable%22:(%22f%22:%22%255B%255D%22))&prefix=&forceOnObjectsSortingFiltering=false) Google Bucket and are also listed in the [example workflow configuration](https://github.com/broadinstitute/warp/blob/master/pipelines/skylab/smartseq2_single_nucleus_multisample/mouse_example.json) in GitHub. +Data preprocessing and count matrix construction for a batch (or plate) were performed using the Smart-seq2 Single Nucleus Multi-Sample v1.2.26 Pipeline (RRID:SCR_021312) as well as Picard v.2.26.10 with default tool parameters unless otherwise specified. Genomic references are publicly available in the [Broad References](https://console.cloud.google.com/storage/browser/gcp-public-data--broad-references/mm10/v0/single_nucleus?pageState=(%22StorageObjectListTable%22:(%22f%22:%22%255B%255D%22))&prefix=&forceOnObjectsSortingFiltering=false) Google Bucket and are also listed in the [example workflow configuration](https://github.com/broadinstitute/warp/blob/master/pipelines/skylab/smartseq2_single_nucleus_multisample/mouse_example.json) in GitHub. For each nucleus in the batch, paired-end FASTQ files were first trimmed to remove adapters using the fastq-mcf tool with a subsampling parameter of 200,000 reads. The trimmed FASTQ files were then aligned to the GENCODE GRCm38 mouse genome using STAR v.2.7.10a. To count the number of reads per gene, but not isoforms, the quantMode parameter was set to GeneCounts. Multi-mapped reads, and optical and PCR duplicates, were removed from the resulting aligned BAM using the Picard MarkDuplicates tool with REMOVE_DUPLICATES = true. Metrics were collected on the deduplicated BAM using Picard CollectMultipleMetrics with VALIDATION_STRINGENCY =SILENT. 
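As a rough sketch, the duplicate-removal and metrics steps described above might be invoked as follows, using Picard's legacy argument style; file names and the reference FASTA are placeholder assumptions:

```bash
# Remove duplicates outright (not just flag them), as described above.
java -jar picard.jar MarkDuplicates \
  I=aligned.bam O=deduplicated.bam M=duplicate_metrics.txt \
  REMOVE_DUPLICATES=true

# Collect metrics on the deduplicated BAM with silent validation.
java -jar picard.jar CollectMultipleMetrics \
  I=deduplicated.bam O=multiple_metrics R=GRCm38.fasta \
  VALIDATION_STRINGENCY=SILENT
```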
diff --git a/website/docs/Pipelines/Smart-seq2_Single_Sample_Pipeline/README.md b/website/docs/Pipelines/Smart-seq2_Single_Sample_Pipeline/README.md index 6b6ec22e4b..080b79071f 100644 --- a/website/docs/Pipelines/Smart-seq2_Single_Sample_Pipeline/README.md +++ b/website/docs/Pipelines/Smart-seq2_Single_Sample_Pipeline/README.md @@ -210,11 +210,11 @@ The SS2 pipeline has been validated for processing human and mouse, stranded or ## Versioning All SS2 release notes are documented in the [Smartseq2 Single Sample changelog](https://github.com/broadinstitute/warp/blob/develop/pipelines/skylab/smartseq2_single_sample/SmartSeq2SingleSample.changelog.md). -## ## Citing the Smart-seq2 Single Sample Pipeline +## Citing the Smart-seq2 Single Sample Pipeline Please identify the SS2 pipeline in your methods section using the Smart-seq2 Single Sample Pipeline's [SciCrunch resource identifier](https://scicrunch.org/browse/resourcedashboard). * Ex: *Smart-seq2 Single Sample Pipeline (RRID:SCR_021228)* -## Consortia Support +## Consortia Support This pipeline is supported and used by the [Human Cell Atlas](https://www.humancellatlas.org/) (HCA) project. If your organization also uses this pipeline, we would love to list you! Please reach out to us by contacting [the WARP team](mailto:warp-pipelines-help@broadinstitute.org). diff --git a/website/docs/Pipelines/Smart-seq2_Single_Sample_Pipeline/_category_.json b/website/docs/Pipelines/Smart-seq2_Single_Sample_Pipeline/_category_.json index cd7f299ad4..75e33d384f 100644 --- a/website/docs/Pipelines/Smart-seq2_Single_Sample_Pipeline/_category_.json +++ b/website/docs/Pipelines/Smart-seq2_Single_Sample_Pipeline/_category_.json @@ -1,4 +1,4 @@ { "label": "Smart-seq2 Single Sample", - "position": 14 + "position": 16 } diff --git a/website/docs/Pipelines/Ultima_Genomics_Whole_Genome_Germline_Pipeline/_category_.json b/website/docs/Pipelines/Ultima_Genomics_Whole_Genome_Germline_Pipeline/_category_.json index eee4ec1955..010f0be5a3 100644 --- a/website/docs/Pipelines/Ultima_Genomics_Whole_Genome_Germline_Pipeline/_category_.json +++ b/website/docs/Pipelines/Ultima_Genomics_Whole_Genome_Germline_Pipeline/_category_.json @@ -1,4 +1,4 @@ { "label": "Ultima Genomics Whole Genome Germline", - "position": 16 + "position": 18 } diff --git a/website/docs/Pipelines/Whole_Genome_Germline_Single_Sample_Pipeline/_category_.json b/website/docs/Pipelines/Whole_Genome_Germline_Single_Sample_Pipeline/_category_.json index b834861b50..7fd28c7d80 100644 --- a/website/docs/Pipelines/Whole_Genome_Germline_Single_Sample_Pipeline/_category_.json +++ b/website/docs/Pipelines/Whole_Genome_Germline_Single_Sample_Pipeline/_category_.json @@ -1,4 +1,4 @@ { "label": "Whole Genome Germline Single Sample", - "position": 16 + "position": 19 } diff --git a/website/docs/Pipelines/snM3C/README.md b/website/docs/Pipelines/snM3C/README.md index 492080d734..397cada01b 100644 --- a/website/docs/Pipelines/snM3C/README.md +++ b/website/docs/Pipelines/snM3C/README.md @@ -6,87 +6,182 @@ slug: /Pipelines/snM3C/README | Pipeline Version | Date Updated | Documentation Authors | Questions or Feedback | | :----: | :---: | :----: | :--------------: | -| [snM3C_v1.0.0](https://github.com/broadinstitute/warp/releases) | August, 2023 | [Kaylee Mathews](mailto:warp-pipelines-help@broadinsitute.org) | Please file GitHub issues in the [WARP repository](https://github.com/broadinstitute/warp/issues) | +| [snM3C_v1.0.1](https://github.com/broadinstitute/warp/releases) | February, 2024 | [Kaylee 
Mathews](mailto:warp-pipelines-help@broadinstitute.org) | Please file GitHub issues in the [WARP repository](https://github.com/broadinstitute/warp/issues) | ## Introduction to snM3C -The Single Nucleus Methly-Seq and Chromatin Capture (snM3C) workflow is a cloud-based computational workflow for processing single-nucleus methylome and chromatin contact (snM3C) sequencing data. The workflow is designed to demultiplex raw sequencing reads, align them, call chromatin contacts, and generate summary metrics. It is developed in collaboration Hanqing Liu and the laboratory of Joseph Ecker. For more information about the snM3C tools and analysis, please see the [YAP documentation](https://hq-1.gitbook.io/mc/) or the [cemba_data](https://github.com/lhqing/cemba_data) GitHub repository created by Hanqing Liu. +The Single Nucleus Methyl-Seq and Chromatin Capture (snM3C) workflow is an open-source, cloud-optimized computational workflow for processing single-nucleus methylome and chromatin contact (snM3C) sequencing data. The workflow is designed to demultiplex and align raw sequencing reads, call chromatin contacts, and generate summary metrics. -## Set-up +The workflow is developed in collaboration with Hanqing Liu and the laboratory of Joseph Ecker. For more information about the snM3C tools and analysis, please see the [YAP documentation](https://hq-1.gitbook.io/mc/) or the [cemba_data](https://github.com/lhqing/cemba_data) GitHub repository created by Hanqing Liu. + +## Quickstart table +The following table provides a quick glance at the snM3C pipeline features: + +| Pipeline features | Description | Source | +|--- | --- | --- | +| Assay type | single-nucleus methylome and chromatin contact (snM3C) sequencing data | [Lee et al. 2019](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6765423/) | +| Overall workflow | Read alignment and chromatin contact calling | Code available from [GitHub](https://github.com/broadinstitute/warp/blob/develop/pipelines/skylab/snM3C/snM3C.wdl) | +| Workflow language | WDL 1.0 | [openWDL](https://github.com/openwdl/wdl) | +| Genomic Reference Sequence | GRCh38 human genome primary sequence | GENCODE [human reference files](https://www.gencodegenes.org/human/release_43.html)| +| Aligner | HISAT-3N | [Zhang et al. 2021](https://genome.cshlp.org/content/31/7/1290) | +| Data input file format | File format in which sequencing data is provided | [FASTQ](https://academic.oup.com/nar/article/38/6/1767/3112533) | +| Data output file format | File formats in which snM3C output is provided | TSV, [FASTQ](https://academic.oup.com/nar/article/38/6/1767/3112533), [BAM](http://samtools.github.io/hts-specs/), and [ALLC](https://lhqing.github.io/ALLCools/intro.html) | -### Installation -To use the latest release of the snM3C pipeline, visit the [WARP releases page](https://github.com/broadinstitute/warp/releases) and download the desired version. +## Set-up - +### snM3C installation -### Running the Workflow +To download the latest snM3C release, see the release tags prefixed with "snM3C" on the WARP [releases page](https://github.com/broadinstitute/warp/releases). All snM3C pipeline releases are documented in the [snM3C changelog](https://github.com/broadinstitute/warp/blob/develop/pipelines/skylab/snM3C/snM3C.changelog.md). -To download the latest release of the snM3C pipeline, see the release tags prefixed with "snM3C" on the WARP [releases page](https://github.com/broadinstitute/warp/releases). All releases of the snM3C pipeline are documented in the [snM3C changelog](https://github.com/broadinstitute/warp/blob/develop/pipelines/skylab/snM3C/snM3C.changelog.md).
+To discover and search releases, use the WARP command-line tool [Wreleaser](https://github.com/broadinstitute/warp/tree/master/wreleaser). -To search releases of this and other pipelines, use the WARP command-line tool [Wreleaser](https://github.com/broadinstitute/warp/tree/develop/wreleaser). +If you’re running a version of the snM3C workflow prior to the latest release, the accompanying documentation for that release may be downloaded with the source code on the WARP [releases page](https://github.com/broadinstitute/warp/releases) (see the source code folder `website/docs/Pipelines/snM3C`). - +The snM3C workflow can be deployed using [Cromwell](https://cromwell.readthedocs.io/en/stable/), a GA4GH-compliant, flexible workflow management system that supports multiple computing platforms. The workflow can also be run in [Terra](https://app.terra.bio), a cloud-based analysis platform. -The snM3C pipeline can be deployed using [Cromwell](https://cromwell.readthedocs.io/en/stable/), a GA4GH compliant, flexible workflow management system that supports multiple computing platforms. The workflow can also be run in [Terra](https://app.terra.bio), a cloud-based analysis platform. ### Inputs -The snM3C workflow requires a JSON configuration file specifying the input files and parameters for the analysis. Example configuration files can be found in the [snM3C `test_inputs` directory](https://github.com/broadinstitute/warp/tree/develop/pipelines/skylab/snM3C/test_inputs) in the WARP repository. +The snM3C workflow requires a JSON configuration file specifying the input files and parameters for the analysis. Example configuration files can be found in the snM3C [`test_inputs`](https://github.com/broadinstitute/warp/tree/develop/pipelines/skylab/snM3C/test_inputs) directory in the WARP repository. -The main input files and parameters include: +#### Input descriptions | Parameter | Description | | ---| --- | -| fastq_input_read1 | Array of multiplexed FASTQ files for read 1 | -| fastq_input_read2 | Array of multiplexed FASTQ files for read 2 | -| random_primer_indexes | File containing random primer indexes | -| plate_id | String specifying the plate ID | -| output_basename | String specifying a basename to be used for naming files | -| tarred_index_files | File containing tarred index files for hisat-3 mapping | -| mapping_yaml | File containing YAML configuration for mapping steps with snakemake | -| snakefile | File containing the snakefile for mapping | -| chromosome_sizes | File containing chromosome sizes information | -| genome_fa | File containing the reference genome in FASTA format | - - -## Tasks and Tools -The workflow contains two tasks described below. The parameters and more details about these tools can be found in the [YAP documentation](https://hq-1.gitbook.io/mc/). +| fastq_input_read1 | Array of multiplexed FASTQ files for read 1. | +| fastq_input_read2 | Array of multiplexed FASTQ files for read 2. | +| random_primer_indexes | File containing random primer indexes. | +| plate_id | String specifying the plate ID. | +| tarred_index_files | File containing tarred index files for HISAT-3N mapping. | +| genome_fa | File containing the reference genome in FASTA format. | +| chromosome_sizes | File containing the genome chromosome sizes. | +| r1_adapter | Optional string describing the adapter sequence for read 1 paired-end reads to be used during adapter trimming with Cutadapt; default is "AGATCGGAAGAGCACACGTCTGAAC". 
| +| r2_adapter | Optional string describing the adapter sequence for read 2 paired-end reads to be used during adapter trimming with Cutadapt; default is "AGATCGGAAGAGCGTCGTGTAGGGA". | +| r1_left_cut | Optional integer describing the number of bases to be trimmed from the beginning of read 1 with Cutadapt; default is 10. | +| r1_right_cut | Optional integer describing the number of bases to be trimmed from the end of read 1 with Cutadapt; default is 10. | +| r2_left_cut | Optional integer describing the number of bases to be trimmed from the beginning of read 2 with Cutadapt; default is 10. | +| r2_right_cut | Optional integer describing the number of bases to be trimmed from the end of read 2 with Cutadapt; default is 10. | +| min_read_length | Optional integer; if a read length is smaller than `min_read_length`, both paired-end reads will be discarded; default is 30. | +| num_upstr_bases | Optional integer describing the number of bases upstream of the C base to include in ALLC file context column created using ALLCools; default is 0. | +| num_downstr_bases | Optional integer describing the number of bases downstream of the C base to include in ALLC file context column created using ALLCools; default is 2. | +| compress_level | Optional integer describing the compression level for the output ALLC file; default is 5. | + + +## snM3C tasks and tools +The workflow contains several tasks described below. + +Overall, the snM3C workflow: + +1. Demultiplexes, sorts, and trims reads. +2. Aligns paired-end reads. +3. Separates unmapped, uniquely aligned, and multi-aligned reads. +4. Splits unmapped reads by enzyme cut sites. +5. Aligns unmapped, single-end reads. +6. Removes overlapping reads. +7. Merges mapped reads from single- and paired-end alignments. +8. Calls chromatin contacts. +9. Removes duplicate reads. +10. Creates ALLC file. +11. Creates summary output file. + +The tools each snM3C task employs are detailed in the table below. + +To see specific tool parameters, select the [workflow WDL link](https://github.com/broadinstitute/warp/blob/develop/pipelines/skylab/snM3C/snM3C.wdl); then find the task and view the `command {}` section of the task in the WDL script. To view or use the exact tool software, see the task's Docker image, which is specified in the task WDL `# runtime values` section as `docker: `. More details about these tools and parameters can be found in the [YAP documentation](https://hq-1.gitbook.io/mc/). | Task name | Tool | Software | Description | | --- | --- | --- | --- | -| Demultiplexing | cutadapt | cutadapt | Performs demultiplexing to cell-level FASTQ files | -| Mapping | hisat-3 | hisat-3 | Performs trimming, alignment and calling chromatin contacts with a [custom snakemake](https://github.com/broadinstitute/warp/blob/develop/pipelines/skylab/snM3C/Config%20files/Snakemake-file/Snakefile) file developed by Hanqing Liu. | +| Demultiplexing | Cutadapt | [Cutadapt](https://cutadapt.readthedocs.io/en/stable/) | Performs demultiplexing to cell-level FASTQ files based on random primer indices. | +| Sort_and_trim_r1_and_r2 | Cutadapt | [Cutadapt](https://cutadapt.readthedocs.io/en/stable/) | Sorts, filters, and trims reads using the `r1_adapter`, `r2_adapter`, `r1_left_cut`, `r1_right_cut`, `r2_left_cut`, and `r2_right_cut` input parameters. | +| Hisat_3n_pair_end_mapping_dna_mode | HISAT-3N | [HISAT-3N](https://daehwankimlab.github.io/hisat2/hisat-3n/) | Performs paired-end read alignment. 
| +| Separate_unmapped_reads | [hisat3n_general.py](https://github.com/lhqing/cemba_data/blob/788e83cd66f3b556bdfacf3485bed9500d381f23/cemba_data/hisat3n/hisat3n_general.py) | python3 | Imports a custom python3 script developed by Hanqing Liu and calls the `separate_unique_and_multi_align_reads()` function to separate unmapped, uniquely aligned, and multi-aligned reads from the HISAT-3N BAM file; unmapped reads are stored in an unmapped FASTQ file and uniquely and multi-aligned reads are stored in separate BAM files. | +| Split_unmapped_reads | [hisat3n_m3c.py](https://github.com/lhqing/cemba_data/blob/bf6248239074d0423d45a67d83da99250a43e50c/cemba_data/hisat3n/hisat3n_m3c.py) | python3 | Imports a custom python3 script developed by Hanqing Liu and calls the `split_hisat3n_unmapped_reads()` function to split the unmapped reads FASTQ file by all possible enzyme cut sites and output new R1 and R2 FASTQ files. | +| Hisat_single_end_r1_r2_mapping_dna_mode_and_merge_sort_split_reads_by_name | HISAT-3N | [HISAT-3N](https://daehwankimlab.github.io/hisat2/hisat-3n/) | Performs single-end alignment of unmapped reads to maximize read mapping. | +| remove_overlap_read_parts | [hisat3n_m3c.py](https://github.com/lhqing/cemba_data/blob/bf6248239074d0423d45a67d83da99250a43e50c/cemba_data/hisat3n/hisat3n_m3c.py) | python3 | Imports a custom python3 script developed by Hanqing Liu and calls the `remove_overlap_read_parts()` function to remove overlapping reads from the split alignment BAM file produced during single-end alignment. | +| merge_original_and_split_bam_and_sort_all_reads_by_name_and_position | merge, sort | [samtools](https://www.htslib.org/) | Merges and sorts all mapped reads from the paired-end and single-end alignments; creates a position-sorted BAM file and a name-sorted BAM file. | +| call_chromatin_contacts | [hisat3n_m3c.py](https://github.com/lhqing/cemba_data/blob/bf6248239074d0423d45a67d83da99250a43e50c/cemba_data/hisat3n/hisat3n_m3c.py) | python3 | Imports a custom python3 script developed by Hanqing Liu and calls the `call_chromatin_contacts()` function to call chromatin contacts from the name-sorted, merged BAM file; reads are considered chromatin contacts if they are greater than 2,500 base pairs apart. | +| dedup_unique_bam_and_index_unique_bam | MarkDuplicates | [Picard](https://broadinstitute.github.io/picard/) | Removes duplicate reads from the position-sorted, merged BAM file. | +| unique_reads_allc | bam-to-allc | [ALLCools](https://lhqing.github.io/ALLCools/intro.html) | Creates an ALLC file with a list of methylation points. | +| unique_reads_cgn_extraction | extract-allc | [ALLCools](https://lhqing.github.io/ALLCools/intro.html) | Creates an ALLC file containing methylation contexts. | +| summary | [summary.py](https://github.com/lhqing/cemba_data/blob/788e83cd66f3b556bdfacf3485bed9500d381f23/cemba_data/hisat3n/summary.py) | python3 | Imports a custom python3 script developed by Hanqing Liu and calls the `snm3c_summary()` function to generate a single summary file for the pipeline in TSV format that contains trimming, mapping, deduplication, chromatin contact, and ALLC site statistics. | -## Outputs +#### 1. Demultiplexes, sorts, and trims reads +In the first step of the pipeline (`Demultiplexing`), raw sequencing reads are demultiplexed by random primer index into cell-level FASTQ files using [Cutadapt](https://cutadapt.readthedocs.io/en/stable/).
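The task's exact invocation is defined in the snM3C WDL's `command {}` block; as a rough sketch of the general approach, Cutadapt's named-adapter demultiplexing mode looks something like the following, where `random_primers.fasta` is a hypothetical FASTA of named random primer indexes:

```bash
# Sketch only, not the task's literal command; each FASTA record name
# fills the {name} template, producing one FASTQ pair per index.
cutadapt \
  -g file:random_primers.fasta \
  -o "{name}.R1.fastq.gz" \
  -p "{name}.R2.fastq.gz" \
  plate_R1.fastq.gz plate_R2.fastq.gz
```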
For more information on barcoding, see the [YAP documentation](https://hq-1.gitbook.io/mc/tech-background/barcoding#two-round-of-barcoding). -The snM3C workflow produces the following main outputs: +After demultiplexing, the pipeline uses [Cutadapt](https://cutadapt.readthedocs.io/en/stable/) to sort, filter, and trim reads in the `Sort_and_trim_r1_and_r2` task. The R1 and R2 adapter sequences are removed, along with the number of bases specified by the `r1_left_cut`, `r1_right_cut`, `r2_left_cut`, and `r2_right_cut` input parameters. Any reads shorter than the specified `min_read_length` are filtered out in this step. -| Output | Description | -| ---| --- | -| mappingSummary | Mapping summary file in CSV format | -| allcFiles | Tarred file containing allc files | -| allc_CGNFiles| Tarred file containing CGN context-specific allc files | -| bamFiles | Tarred file containing cell-level aligned BAM files | -| detail_statsFiles | Tarred file containing detail stats files | -| hicFiles | Tarred file containing Hi-C files | +#### 2. Aligns paired-end reads +In the next step of the pipeline, the `Hisat_3n_pair_end_mapping_dna_mode` task uses [HISAT-3N](https://daehwankimlab.github.io/hisat2/hisat-3n/) to perform paired-end read alignment to a reference genome FASTA file (`genome_fa`) and outputs an aligned BAM file. Additionally, the task outputs a stats file and a text file containing the genomic reference version used. +#### 3. Separates unmapped, uniquely aligned, and multi-aligned reads +After paired-end alignment, the pipeline calls the `Separate_unmapped_reads` task, which imports a custom python3 script ([hisat3n_general.py](https://github.com/lhqing/cemba_data/blob/788e83cd66f3b556bdfacf3485bed9500d381f23/cemba_data/hisat3n/hisat3n_general.py)) developed by Hanqing Liu. The task calls the script's `separate_unique_and_multi_align_reads()` function to separate unmapped, uniquely aligned, and multi-aligned reads from the HISAT-3N BAM file. Three new files are output from this step of the pipeline: -## Versioning +1. A FASTQ file that contains the unmapped reads (`unmapped_fastq_tar`) +2. A BAM file that contains the uniquely aligned reads (`unique_bam_tar`) +3. A BAM file that contains the multi-aligned reads (`multi_bam_tar`) -All snM3C pipeline releases are documented in the [pipeline changelog](https://github.com/broadinstitute/warp/blob/develop/pipelines/skylab/snM3C/snM3C.changelog.md). +#### 4. Splits unmapped reads by enzyme cut sites +The `Split_unmapped_reads` task imports a custom python3 script ([hisat3n_m3c.py](https://github.com/lhqing/cemba_data/blob/bf6248239074d0423d45a67d83da99250a43e50c/cemba_data/hisat3n/hisat3n_m3c.py)) developed by Hanqing Liu and calls the script's `split_hisat3n_unmapped_reads()` function. This splits the FASTQ file containing the unmapped reads by all possible enzyme cut sites and outputs new R1 and R2 files. - +#### 5. Aligns unmapped, single-end reads +In the next step of the pipeline, the `Hisat_single_end_r1_r2_mapping_dna_mode_and_merge_sort_split_reads_by_name` task uses [HISAT-3N](https://daehwankimlab.github.io/hisat2/hisat-3n/) to perform single-end read alignment of the previously unmapped reads to maximize read mapping and outputs a single aligned BAM file.
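For orientation, a generic single-end HISAT-3N call has roughly the shape below. The flags shown are standard HISAT-3N options (`--base-change C,T` handles bisulfite-converted bases), but the index and FASTQ names are placeholders and the task's actual arguments are set in the snM3C WDL, so treat this as a hedged sketch rather than the pipeline's command:

```bash
# Generic single-end HISAT-3N alignment sketch; paths are placeholders.
hisat-3n \
  -x /path/to/genome_index \
  -U split_unmapped_reads.fastq.gz \
  --base-change C,T \
  -S realigned_single_end.sam
```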
-## Feedback +#### 6. Removes overlapping reads +After the second alignment step, the pipeline calls the `remove_overlap_read_parts` task, which imports a custom python3 script ([hisat3n_m3c.py](https://github.com/lhqing/cemba_data/blob/bf6248239074d0423d45a67d83da99250a43e50c/cemba_data/hisat3n/hisat3n_m3c.py)) developed by Hanqing Liu. The task calls the script's `remove_overlap_read_parts()` function to remove overlapping reads from the BAM file produced during single-end alignment and output another BAM file. -For questions, suggestions, or feedback related to the snM3C pipeline, please contact [the WARP team](mailto:warp-pipelines-help@broadinstitute.org). Your feedback is valuable for improving the pipeline and addressing any issues that may arise during its usage. +#### 7. Merges mapped reads from single- and paired-end alignments +The `merge_original_and_split_bam_and_sort_all_reads_by_name_and_position` task uses [samtools](https://www.htslib.org/) to merge and sort all of the mapped reads from the paired-end and single-end alignments into a single BAM file. The BAM file is output as both a position-sorted and a name-sorted BAM file. - +#### 8. Calls chromatin contacts +In the `call_chromatin_contacts` task, the pipeline imports a custom python3 script ([hisat3n_m3c.py](https://github.com/lhqing/cemba_data/blob/bf6248239074d0423d45a67d83da99250a43e50c/cemba_data/hisat3n/hisat3n_m3c.py)) developed by Hanqing Liu. The task calls the script's `call_chromatin_contacts()` function to call chromatin contacts from the name-sorted, merged BAM file. If reads are greater than 2,500 base pairs apart, they are considered chromatin contacts. If reads are less than 2,500 base pairs apart, they are considered the same fragment. +#### 9. Removes duplicate reads +After calling chromatin contacts, the `dedup_unique_bam_and_index_unique_bam` task uses Picard's MarkDuplicates tool to remove duplicate reads from the position-sorted, merged BAM file and output a deduplicated BAM file. +#### 10. Creates ALLC file +The `unique_reads_allc` task uses the [ALLCools](https://lhqing.github.io/ALLCools/intro.html) `bam-to-allc` function to create an ALLC file from the deduplicated BAM file that contains a list of methylation points. The `num_upstr_bases` and `num_downstr_bases` input parameters are used to define the number of bases upstream and downstream of the C base to include in the ALLC context column. - +Next, the `unique_reads_cgn_extraction` task uses the [ALLCools](https://lhqing.github.io/ALLCools/intro.html) `extract-allc` function to extract methylation contexts from the input ALLC file and output a second ALLC file that can be used to generate an [MCDS file](https://github.com/lhqing/allcools_doc/blob/master/tech-background/file-formats.md#mcds-file). +#### 11. Creates summary output file +In the last step of the pipeline, the `summary` task imports a custom python3 script ([summary.py](https://github.com/lhqing/cemba_data/blob/788e83cd66f3b556bdfacf3485bed9500d381f23/cemba_data/hisat3n/summary.py)) developed by Hanqing Liu. The task calls the script's `snm3c_summary()` function to generate a single summary file for the pipeline in TSV format that contains trimming, mapping, deduplication, chromatin contact, and ALLC site statistics. This is the main output of the pipeline.
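In spirit, the merge-and-sort logic of step 7 reduces to a few samtools calls like those below; the file names are placeholders and the task's real command, including threading and naming, lives in the WDL.

```bash
# Hedged sketch of step 7: merge paired- and single-end alignments,
# then produce both sort orders used downstream.
samtools merge -f all_reads.bam paired_end_unique.bam single_end_split.bam
samtools sort -n -o all_reads.name_sort.bam all_reads.bam  # name-sorted, used for contact calling
samtools sort -o all_reads.pos_sort.bam all_reads.bam      # position-sorted, used for deduplication
```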
+ +## Outputs + +The following table lists the output variables and files produced by the pipeline. + +| Output name | Filename, if applicable | Output format and description | +| ------ | ------ | ------ | +| MappingSummary | `<plate_id>_MappingSummary.csv.gz` | Mapping summary file in CSV format. | +| trimmed_stats | `<plate_id>.trimmed_stats_files.tar.gz` | Array of tarred files containing trimming stats files; for more information, see the [Cutadapt documentation](https://cutadapt.readthedocs.io/en/stable/guide.html#reporting). | +| r1_trimmed_fq | `<plate_id>.R1_trimmed_files.tar.gz` | Array of tarred files containing trimmed R1 FASTQ files. | +| r2_trimmed_fq | `<plate_id>.R2_trimmed_files.tar.gz` | Array of tarred files containing trimmed R2 FASTQ files. | +| hisat3n_stats_tar | `<plate_id>.hisat3n_paired_end_stats_files.tar.gz` | Array of tarred files containing paired-end alignment summary files; see the [HISAT2 alignment summary documentation](https://daehwankimlab.github.io/hisat2/manual/) for more information. | +| hisat3n_bam_tar | `<plate_id>.hisat3n_paired_end_bam_files.tar.gz` | Array of tarred files containing BAM files from paired-end alignment. | +| unique_bam_tar | `<plate_id>.hisat3n_paired_end_unique_bam_files.tar.gz` | Array of tarred files containing BAM files with uniquely aligned reads from paired-end alignment. | +| multi_bam_tar | `<plate_id>.hisat3n_paired_end_multi_bam_files.tar.gz` | Array of tarred files containing BAM files with multi-aligned reads from paired-end alignment. | +| unmapped_fastq_tar | `<plate_id>.hisat3n_paired_end_unmapped_fastq_files.tar.gz` | Array of tarred files containing FASTQ files with unmapped reads from paired-end alignment. | +| split_fq_tar | `<plate_id>.hisat3n_paired_end_split_fastq_files.tar.gz` | Array of tarred files containing FASTQ files with unmapped reads split by possible enzyme cut sites. | +| merge_sorted_bam_tar | `<plate_id>.hisat3n_dna.split_reads.name_sort.bam.tar.gz` | Array of tarred files containing BAM files from single-end alignment. | +| name_sorted_bams | `<plate_id>.hisat3n_dna.all_reads.name_sort.tar.gz` | Array of tarred files containing name-sorted, merged BAM files. | +| pos_sorted_bams | `<plate_id>.hisat3n_dna.all_reads.pos_sort.tar.gz` | Array of tarred files containing position-sorted, merged BAM files. | +| remove_overlap_read_parts_bam_tar | `<plate_id>.remove_overlap_read_parts.tar.gz` | Array of tarred files containing BAM files from single-end alignment with overlapping reads removed. | +| dedup_unique_bam_and_index_unique_bam_tar | `<plate_id>.dedup_unique_bam_and_index_unique_bam.tar.gz` | Array of tarred files containing deduplicated, position-sorted BAM files. | +| unique_reads_cgn_extraction_allc | `<plate_id>.output_allc_tar.tar.gz` | Array of tarred files containing CGN context-specific ALLC files that can be used to generate an [MCDS file](https://github.com/lhqing/allcools_doc/blob/master/tech-background/file-formats.md#mcds-file). | +| unique_reads_cgn_extraction_tbi | `<plate_id>.output_tbi_tar.tar.gz` | Array of tarred files containing ALLC index files. | +| chromatin_contact_stats | `<plate_id>.chromatin_contact_stats.tar.gz` | Array of tarred files containing chromatin contact files. | +| reference_version | `<plate_id>.reference_version.txt` | Array of tarred files containing the genomic reference version used. | + +## Versioning + +All snM3C pipeline releases are documented in the [pipeline changelog](https://github.com/broadinstitute/warp/blob/develop/pipelines/skylab/snM3C/snM3C.changelog.md). + +## Consortia support +This pipeline is supported by the [BRAIN Initiative](https://braininitiative.nih.gov/) (BICCN and BICAN). + +If your organization also uses this pipeline, we would like to list you!
Please reach out to us by contacting the [WARP Pipeline Development team](mailto:warp-pipelines-help@broadinstitute.org). + +## Feedback +For questions, suggestions, or feedback related to the snM3C pipeline, please contact [the WARP team](mailto:warp-pipelines-help@broadinstitute.org). Your feedback is valuable for improving the pipeline and addressing any issues that may arise during its usage. \ No newline at end of file diff --git a/website/docs/Pipelines/snM3C/_category_.json b/website/docs/Pipelines/snM3C/_category_.json index a5593278cd..0aed70ddcf 100644 --- a/website/docs/Pipelines/snM3C/_category_.json +++ b/website/docs/Pipelines/snM3C/_category_.json @@ -1,4 +1,4 @@ { "label": "Single Nucleus Methyl-Seq and Chromatin Capture", - "position": 15 + "position": 17 } diff --git a/website/docs/contribution/contribute_to_warp_docs/doc_style.md b/website/docs/contribution/contribute_to_warp_docs/doc_style.md index cf6975fa66..045e4b414f 100644 --- a/website/docs/contribution/contribute_to_warp_docs/doc_style.md +++ b/website/docs/contribution/contribute_to_warp_docs/doc_style.md @@ -4,7 +4,7 @@ sidebar_position: 2 # Documentation Style Guide -This guide provides some examples about how to add new documentation that can be properly rendered on this website. Please note most of the Github flavored [Markdown](https://github.github.com/gfm/) syntax should work natrually, this guide just tries to elaboratethe extension syntax to it. +This guide provides some examples about how to add new documentation that can be properly rendered on this website. Please note that most of the GitHub-flavored [Markdown](https://github.github.com/gfm/) syntax should work naturally; this guide just elaborates the extension syntax on top of it. ## 1. Insert code blocks From 9f333d8745b550fce26baf66766f13c1a57414ba Mon Sep 17 00:00:00 2001 From: rsc3 Date: Tue, 6 Feb 2024 11:04:16 -0500 Subject: [PATCH 02/68] remove optional from merge files, remove double umipercell loop --- tasks/skylab/StarAlign.wdl | 14 ++++---------- 1 file changed, 4 insertions(+), 10 deletions(-) diff --git a/tasks/skylab/StarAlign.wdl b/tasks/skylab/StarAlign.wdl index 8ab0c8d615..a288613831 100644 --- a/tasks/skylab/StarAlign.wdl +++ b/tasks/skylab/StarAlign.wdl @@ -471,10 +471,10 @@ task MergeStarOutput { Array[File] barcodes Array[File] features Array[File] matrix - Array[File]? cell_reads - Array[File]? summary - Array[File]? align_features - Array[File]?
umipercell + Array[File] cell_reads + Array[File] summary + Array[File] align_features + Array[File] umipercell String input_id @@ -525,12 +525,6 @@ fi done - for umipercell in "${umipercell_files[@]}"; do - if [ -f "$umipercell" ]; then - cat "$umipercell" >> "~{input_id}_umipercell.txt" - fi - done - for umipercell in "${umipercell_files[@]}"; do if [ -f "$umipercell" ]; then cat "$umipercell" >> "~{input_id}_umipercell.txt" From 41824a9528f20c5b7758a9be892592730b4bc407 Mon Sep 17 00:00:00 2001 From: rsc3 Date: Tue, 6 Feb 2024 11:22:45 -0500 Subject: [PATCH 03/68] make merge files optional again --- tasks/skylab/StarAlign.wdl | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/tasks/skylab/StarAlign.wdl b/tasks/skylab/StarAlign.wdl index a288613831..91f2f3985a 100644 --- a/tasks/skylab/StarAlign.wdl +++ b/tasks/skylab/StarAlign.wdl @@ -471,10 +471,10 @@ task MergeStarOutput { Array[File] barcodes Array[File] features Array[File] matrix - Array[File] cell_reads - Array[File] summary - Array[File] align_features - Array[File] umipercell + Array[File]? cell_reads + Array[File]? summary + Array[File]? align_features + Array[File]? umipercell String input_id From a3de732eecd26c1bb994d352cc5ba6233f1c5c4f Mon Sep 17 00:00:00 2001 From: meganshand Date: Wed, 7 Feb 2024 11:17:14 -0500 Subject: [PATCH 04/68] Update UltimaJointGenotyping to use GATK 4.5.0.0 for filtering (#1151) --- ...UltimaGenomicsJointGenotyping.changelog.md | 5 ++ .../UltimaGenomicsJointGenotyping.wdl | 53 +++++++++++++------ .../test_inputs/Plumbing/plumbing.inputs.json | 8 +-- .../Scientific/scientific.inputs.json | 9 ++-- .../TestUltimaGenomicsJointGenotyping.wdl | 6 +-- 5 files changed, 52 insertions(+), 29 deletions(-) diff --git a/pipelines/broad/dna_seq/germline/joint_genotyping/UltimaGenomics/UltimaGenomicsJointGenotyping.changelog.md b/pipelines/broad/dna_seq/germline/joint_genotyping/UltimaGenomics/UltimaGenomicsJointGenotyping.changelog.md index a6c388e712..b3e7a610e9 100644 --- a/pipelines/broad/dna_seq/germline/joint_genotyping/UltimaGenomics/UltimaGenomicsJointGenotyping.changelog.md +++ b/pipelines/broad/dna_seq/germline/joint_genotyping/UltimaGenomics/UltimaGenomicsJointGenotyping.changelog.md @@ -1,3 +1,8 @@ +# 1.1.6 +2024-02-06 (Date of Last Commit) + +* Updated VETS filtering pipeline to GATK version 4.5.0.0. Does not affect outputs. 
+ # 1.1.5 2023-09-08 (Date of Last Commit) diff --git a/pipelines/broad/dna_seq/germline/joint_genotyping/UltimaGenomics/UltimaGenomicsJointGenotyping.wdl b/pipelines/broad/dna_seq/germline/joint_genotyping/UltimaGenomics/UltimaGenomicsJointGenotyping.wdl index 48e5da0d28..2104739e3d 100644 --- a/pipelines/broad/dna_seq/germline/joint_genotyping/UltimaGenomics/UltimaGenomicsJointGenotyping.wdl +++ b/pipelines/broad/dna_seq/germline/joint_genotyping/UltimaGenomics/UltimaGenomicsJointGenotyping.wdl @@ -1,7 +1,7 @@ version 1.0 import "../../../../../../tasks/broad/JointGenotypingTasks.wdl" as Tasks -import "https://raw.githubusercontent.com/broadinstitute/gatk/4.3.0.0/scripts/vcf_site_level_filtering_wdl/JointVcfFiltering.wdl" as Filtering +import "https://raw.githubusercontent.com/broadinstitute/gatk/4.5.0.0/scripts/vcf_site_level_filtering_wdl/JointVcfFiltering.wdl" as Filtering import "../../../../../../tasks/broad/UltimaGenomicsGermlineFilteringThreshold.wdl" as FilteringThreshold @@ -11,7 +11,7 @@ import "../../../../../../tasks/broad/UltimaGenomicsGermlineFilteringThreshold.w # For choosing a filtering threshold (where on the ROC curve to filter) a sample with truth data is required. workflow UltimaGenomicsJointGenotyping { - String pipeline_version = "1.1.5" + String pipeline_version = "1.1.6" input { File unpadded_intervals_file @@ -51,10 +51,11 @@ workflow UltimaGenomicsJointGenotyping { String flow_order #inputs for training and applying filter model - String snp_annotations - String indel_annotations - Boolean use_allele_specific_annotations + Array[String] snp_annotations + Array[String] indel_annotations String model_backend + String snp_resource_args = "--resource:hapmap,training=true,calibration=true gs://gcp-public-data--broad-references/hg38/v0/hapmap_3.3.hg38.vcf.gz --resource:omni,training=true,calibration=true gs://gcp-public-data--broad-references/hg38/v0/1000G_omni2.5.hg38.vcf.gz --resource:1000G,training=true,calibration=false gs://gcp-public-data--broad-references/hg38/v0/1000G_phase1.snps.high_confidence.hg38.vcf.gz" + String indel_resource_args = "--resource:mills,training=true,calibration=true gs://gcp-public-data--broad-references/hg38/v0/Mills_and_1000G_gold_standard.indels.hg38.vcf.gz" Int? top_level_scatter_count Boolean? 
gather_vcfs @@ -154,24 +155,42 @@ workflow UltimaGenomicsJointGenotyping { disk_size_gb = medium_disk } - call Filtering.JointVcfFiltering as TrainAndApplyFilteringModel { + call Filtering.JointVcfFiltering as TrainAndApplyFilteringModelSNPs { input: - vcf = CalculateAverageAnnotations.output_vcf, - vcf_index = CalculateAverageAnnotations.output_vcf_index, + input_vcfs = CalculateAverageAnnotations.output_vcf, + input_vcf_idxs = CalculateAverageAnnotations.output_vcf_index, sites_only_vcf = SitesOnlyGatherVcf.output_vcf, - sites_only_vcf_index = SitesOnlyGatherVcf.output_vcf_index, - snp_annotations = snp_annotations, - indel_annotations = indel_annotations, + sites_only_vcf_idx = SitesOnlyGatherVcf.output_vcf_index, + annotations = snp_annotations, + resource_args = snp_resource_args, model_backend = model_backend, - use_allele_specific_annotations = use_allele_specific_annotations, - basename = callset_name, - gatk_docker = "us.gcr.io/broad-gatk/gatk:4.3.0.0" + output_prefix = callset_name, + extract_extra_args = "--mode SNP", + train_extra_args = "--mode SNP", + score_extra_args = "--mode SNP", + gatk_docker = "us.gcr.io/broad-gatk/gatk:4.5.0.0" + } + + call Filtering.JointVcfFiltering as TrainAndApplyFilteringModelINDELs { + input: + input_vcfs = TrainAndApplyFilteringModelSNPs.scored_vcfs, + input_vcf_idxs = TrainAndApplyFilteringModelSNPs.scored_vcf_idxs, + sites_only_vcf = SitesOnlyGatherVcf.output_vcf, + sites_only_vcf_idx = SitesOnlyGatherVcf.output_vcf_index, + annotations = indel_annotations, + resource_args = indel_resource_args, + model_backend = model_backend, + output_prefix = callset_name, + extract_extra_args = "--mode INDEL", + train_extra_args = "--mode INDEL", + score_extra_args = "--mode INDEL", + gatk_docker = "us.gcr.io/broad-gatk/gatk:4.5.0.0" } call FilteringThreshold.ExtractOptimizeSingleSample as FindFilteringThresholdAndFilter { input: - input_vcf = TrainAndApplyFilteringModel.variant_scored_vcf, - input_vcf_index = TrainAndApplyFilteringModel.variant_scored_vcf_index, + input_vcf = TrainAndApplyFilteringModelINDELs.scored_vcfs, + input_vcf_index = TrainAndApplyFilteringModelINDELs.scored_vcf_idxs, base_file_name = callset_name, call_sample_name = call_sample_name, truth_vcf = truth_vcf, @@ -188,7 +207,7 @@ workflow UltimaGenomicsJointGenotyping { medium_disk = medium_disk } - scatter (idx in range(length(TrainAndApplyFilteringModel.variant_scored_vcf))) { + scatter (idx in range(length(TrainAndApplyFilteringModelINDELs.scored_vcfs))) { # For large callsets we need to collect metrics from the shards and gather them later. 
if (!is_small_callset) { call Tasks.CollectVariantCallingMetrics as CollectMetricsSharded { diff --git a/pipelines/broad/dna_seq/germline/joint_genotyping/UltimaGenomics/test_inputs/Plumbing/plumbing.inputs.json b/pipelines/broad/dna_seq/germline/joint_genotyping/UltimaGenomics/test_inputs/Plumbing/plumbing.inputs.json index a271e30aa6..3dc1a947c6 100644 --- a/pipelines/broad/dna_seq/germline/joint_genotyping/UltimaGenomics/test_inputs/Plumbing/plumbing.inputs.json +++ b/pipelines/broad/dna_seq/germline/joint_genotyping/UltimaGenomics/test_inputs/Plumbing/plumbing.inputs.json @@ -14,17 +14,17 @@ "UltimaGenomicsJointGenotyping.scatter_cross_check_fingerprints":false, "UltimaGenomicsJointGenotyping.unbounded_scatter_count_scale_factor":2.5, "UltimaGenomicsJointGenotyping.unpadded_intervals_file":"gs://gcp-public-data--broad-references/hg38/v0/hg38.even.handcurated.20k.intervals", -"UltimaGenomicsJointGenotyping.snp_annotations": "-A AS_ReadPosRankSum -A AS_FS -A AS_SOR -A AS_QD -A AVERAGE_TREE_SCORE -A AVERAGE_ASSEMBLED_HAPS -A AVERAGE_FILTERED_HAPS", -"UltimaGenomicsJointGenotyping.indel_annotations": "-A AS_MQRankSum -A AS_ReadPosRankSum -A AS_FS -A AS_SOR -A AS_QD -A AVERAGE_TREE_SCORE", +"UltimaGenomicsJointGenotyping.snp_annotations": ["AS_ReadPosRankSum", "AS_FS", "AS_SOR", "AS_QD", "AVERAGE_TREE_SCORE", "AVERAGE_ASSEMBLED_HAPS", "AVERAGE_FILTERED_HAPS"], +"UltimaGenomicsJointGenotyping.indel_annotations": ["AS_MQRankSum", "AS_ReadPosRankSum", "AS_FS", "AS_SOR", "AS_QD", "AVERAGE_TREE_SCORE"], "UltimaGenomicsJointGenotyping.flow_order": "TGCA", "UltimaGenomicsJointGenotyping.ref_fasta_sdf": "gs://broad-gotc-test-storage/UltimaGenomicsJointGenotyping/wgs/plumbing/reference_sdf.tar", "UltimaGenomicsJointGenotyping.runs_file": "gs://broad-gotc-test-storage/UltimaGenomicsJointGenotyping/wgs/plumbing/runs.conservative.bed", "UltimaGenomicsJointGenotyping.annotation_intervals": ["gs://broad-gotc-test-storage/UltimaGenomicsJointGenotyping/wgs/plumbing/LCR-hs38.bed", "gs://broad-gotc-test-storage/UltimaGenomicsJointGenotyping/wgs/plumbing/mappability.0.bed", "gs://broad-gotc-test-storage/UltimaGenomicsJointGenotyping/wgs/plumbing/exome.twist.bed"], -"UltimaGenomicsJointGenotyping.use_allele_specific_annotations": true, "UltimaGenomicsJointGenotyping.truth_vcf":"gs://broad-gotc-test-storage/UltimaGenomicsJointGenotyping/wgs/plumbing/HG001_GRCh38_GIAB_highconf_CG-IllFB-IllGATKHC-Ion-10X-SOLID_CHROM1-X_v.3.3.2_highconf_PGandRTGphasetransfer.broad-header.vcf.gz", "UltimaGenomicsJointGenotyping.truth_vcf_index":"gs://broad-gotc-test-storage/UltimaGenomicsJointGenotyping/wgs/plumbing/HG001_GRCh38_GIAB_highconf_CG-IllFB-IllGATKHC-Ion-10X-SOLID_CHROM1-X_v.3.3.2_highconf_PGandRTGphasetransfer.broad-header.vcf.gz", "UltimaGenomicsJointGenotyping.truth_highconf_intervals": "gs://broad-gotc-test-storage/UltimaGenomicsJointGenotyping/wgs/plumbing/HG001_GRCh38_GIAB_highconf_CG-IllFB-IllGATKHC-Ion-10X-SOLID_CHROM1-X_v.3.3.2_highconf_nosomaticdel_noCENorHET7.bed", "UltimaGenomicsJointGenotyping.call_sample_name": "NA12878", "UltimaGenomicsJointGenotyping.truth_sample_name": "HG001", -"UltimaGenomicsJointGenotyping.model_backend": "PYTHON_IFOREST" +"UltimaGenomicsJointGenotyping.model_backend": "PYTHON_IFOREST", +"UltimaGenomicsJointGenotyping.TrainAndApplyFilteringModelSNPs.train_runtime_attributes": {"additional_mem_gb":2} } \ No newline at end of file diff --git a/pipelines/broad/dna_seq/germline/joint_genotyping/UltimaGenomics/test_inputs/Scientific/scientific.inputs.json 
b/pipelines/broad/dna_seq/germline/joint_genotyping/UltimaGenomics/test_inputs/Scientific/scientific.inputs.json index a91bede656..9b6270b0b4 100644 --- a/pipelines/broad/dna_seq/germline/joint_genotyping/UltimaGenomics/test_inputs/Scientific/scientific.inputs.json +++ b/pipelines/broad/dna_seq/germline/joint_genotyping/UltimaGenomics/test_inputs/Scientific/scientific.inputs.json @@ -14,17 +14,18 @@ "UltimaGenomicsJointGenotyping.scatter_cross_check_fingerprints":false, "UltimaGenomicsJointGenotyping.unbounded_scatter_count_scale_factor":2.5, "UltimaGenomicsJointGenotyping.unpadded_intervals_file":"gs://gcp-public-data--broad-references/hg38/v0/hg38.even.handcurated.20k.intervals", -"UltimaGenomicsJointGenotyping.snp_annotations": "-A AS_ReadPosRankSum -A AS_FS -A AS_SOR -A AS_QD -A AVERAGE_TREE_SCORE -A AVERAGE_ASSEMBLED_HAPS -A AVERAGE_FILTERED_HAPS", -"UltimaGenomicsJointGenotyping.indel_annotations": "-A AS_MQRankSum -A AS_ReadPosRankSum -A AS_FS -A AS_SOR -A AS_QD -A AVERAGE_TREE_SCORE", +"UltimaGenomicsJointGenotyping.snp_annotations": ["AS_ReadPosRankSum", "AS_FS", "AS_SOR", "AS_QD", "AVERAGE_TREE_SCORE", "AVERAGE_ASSEMBLED_HAPS", "AVERAGE_FILTERED_HAPS"], +"UltimaGenomicsJointGenotyping.indel_annotations": ["AS_MQRankSum", "AS_ReadPosRankSum", "AS_FS", "AS_SOR", "AS_QD", "AVERAGE_TREE_SCORE"], "UltimaGenomicsJointGenotyping.flow_order": "TGCA", "UltimaGenomicsJointGenotyping.ref_fasta_sdf": "gs://broad-gotc-test-storage/UltimaGenomicsJointGenotyping/wgs/scientific/reference_sdf.tar", "UltimaGenomicsJointGenotyping.runs_file": "gs://broad-gotc-test-storage/UltimaGenomicsJointGenotyping/wgs/scientific/runs.conservative.bed", "UltimaGenomicsJointGenotyping.annotation_intervals": ["gs://broad-gotc-test-storage/UltimaGenomicsJointGenotyping/wgs/scientific/LCR-hs38.bed", "gs://broad-gotc-test-storage/UltimaGenomicsJointGenotyping/wgs/scientific/mappability.0.bed", "gs://broad-gotc-test-storage/UltimaGenomicsJointGenotyping/wgs/scientific/exome.twist.bed"], -"UltimaGenomicsJointGenotyping.use_allele_specific_annotations": true, "UltimaGenomicsJointGenotyping.truth_vcf":"gs://broad-gotc-test-storage/UltimaGenomicsJointGenotyping/wgs/scientific/HG001_GRCh38_GIAB_highconf_CG-IllFB-IllGATKHC-Ion-10X-SOLID_CHROM1-X_v.3.3.2_highconf_PGandRTGphasetransfer.broad-header.vcf.gz", "UltimaGenomicsJointGenotyping.truth_vcf_index":"gs://broad-gotc-test-storage/UltimaGenomicsJointGenotyping/wgs/scientific/HG001_GRCh38_GIAB_highconf_CG-IllFB-IllGATKHC-Ion-10X-SOLID_CHROM1-X_v.3.3.2_highconf_PGandRTGphasetransfer.broad-header.vcf.gz", "UltimaGenomicsJointGenotyping.truth_highconf_intervals": "gs://broad-gotc-test-storage/UltimaGenomicsJointGenotyping/wgs/scientific/HG001_GRCh38_GIAB_highconf_CG-IllFB-IllGATKHC-Ion-10X-SOLID_CHROM1-X_v.3.3.2_highconf_nosomaticdel_noCENorHET7.bed", "UltimaGenomicsJointGenotyping.call_sample_name": "NA12878", "UltimaGenomicsJointGenotyping.truth_sample_name": "HG001", -"UltimaGenomicsJointGenotyping.model_backend": "PYTHON_IFOREST" +"UltimaGenomicsJointGenotyping.model_backend": "PYTHON_IFOREST", +"UltimaGenomicsJointGenotyping.TrainAndApplyFilteringModelSNPs.extract_runtime_attributes": {"command_mem_gb":13, "additional_mem_gb":2}, +"UltimaGenomicsJointGenotyping.TrainAndApplyFilteringModelSNPs.train_runtime_attributes": {"command_mem_gb":13, "additional_mem_gb":2} } \ No newline at end of file diff --git a/verification/test-wdls/TestUltimaGenomicsJointGenotyping.wdl b/verification/test-wdls/TestUltimaGenomicsJointGenotyping.wdl index c3138ddb19..de9899439b 100644 --- 
a/verification/test-wdls/TestUltimaGenomicsJointGenotyping.wdl +++ b/verification/test-wdls/TestUltimaGenomicsJointGenotyping.wdl @@ -33,9 +33,8 @@ workflow TestUltimaGenomicsJointGenotyping { File runs_file Array[File] annotation_intervals String flow_order - String snp_annotations - String indel_annotations - Boolean use_allele_specific_annotations + Array[String] snp_annotations + Array[String] indel_annotations String model_backend Int? top_level_scatter_count Boolean? gather_vcfs @@ -83,7 +82,6 @@ workflow TestUltimaGenomicsJointGenotyping { flow_order = flow_order, snp_annotations = snp_annotations, indel_annotations = indel_annotations, - use_allele_specific_annotations = use_allele_specific_annotations, model_backend = model_backend, top_level_scatter_count = top_level_scatter_count, gather_vcfs = gather_vcfs, From 9f56bf17c93145bb0612b6f2d6ef011f8593579e Mon Sep 17 00:00:00 2001 From: Kaylee Mathews <95316074+kayleemathews@users.noreply.github.com> Date: Thu, 8 Feb 2024 09:56:14 -0500 Subject: [PATCH 05/68] Km update Paired-Tag, Optimus, GDC, multi-snSS2 docs (#1187) * update refs in Optimus Overview * update pipeline README docs * Update README.md * fixing link in arrays doc * update paired-tag docs * added more details about h5ad if preindex is used * Update website/docs/Pipelines/PairedTag_Pipeline/README.md Co-authored-by: ekiernan <55763654+ekiernan@users.noreply.github.com> * Update README.md --------- Co-authored-by: ekiernan Co-authored-by: ekiernan <55763654+ekiernan@users.noreply.github.com> --- website/docs/Pipelines/ATAC/README.md | 2 +- .../README.md | 6 +++--- .../Illumina_genotyping_array_spec.md | 2 +- website/docs/Pipelines/Multiome_Pipeline/README.md | 2 +- website/docs/Pipelines/Optimus_Pipeline/README.md | 4 ++-- website/docs/Pipelines/PairedTag_Pipeline/README.md | 11 ++++++----- .../multi_snss2.methods.md | 4 ++-- 7 files changed, 16 insertions(+), 15 deletions(-) diff --git a/website/docs/Pipelines/ATAC/README.md b/website/docs/Pipelines/ATAC/README.md index c5357613c2..8df0e7a187 100644 --- a/website/docs/Pipelines/ATAC/README.md +++ b/website/docs/Pipelines/ATAC/README.md @@ -8,7 +8,7 @@ slug: /Pipelines/ATAC/README | Pipeline Version | Date Updated | Documentation Author | Questions or Feedback | | :----: | :---: | :----: | :--------------: | -| [1.1.6](https://github.com/broadinstitute/warp/releases) | January, 2024 | Kaylee Mathews | Please file GitHub issues in warp or contact [the WARP team](mailto:warp-pipelines-help@broadinstitute.org) | +| [1.1.7](https://github.com/broadinstitute/warp/releases) | February, 2024 | Kaylee Mathews | Please file GitHub issues in warp or contact [the WARP team](mailto:warp-pipelines-help@broadinstitute.org) | ## Introduction to the ATAC workflow ATAC is an open-source, cloud-optimized pipeline developed in collaboration with members of the [BRAIN Initiative](https://braininitiative.nih.gov/) (BICCN and [BICAN](https://brainblog.nih.gov/brain-blog/brain-issues-suite-funding-opportunities-advance-brain-cell-atlases-through-centers) Sequencing Working Group) and [SCORCH](https://nida.nih.gov/about-nida/organization/divisions/division-neuroscience-behavior-dnb/basic-research-hiv-substance-use-disorder/scorch-program) (see [Acknowledgements](#acknowledgements) below). 
It supports the processing of 10x single-nucleus data generated with 10x Multiome [ATAC-seq (Assay for Transposase-Accessible Chromatin)](https://www.10xgenomics.com/products/single-cell-multiome-atac-plus-gene-expression), a technique used in molecular biology to assess genome-wide chromatin accessibility. diff --git a/website/docs/Pipelines/Genomic_Data_Commons_Whole_Genome_Somatic/README.md b/website/docs/Pipelines/Genomic_Data_Commons_Whole_Genome_Somatic/README.md index cef49ec424..cb0ee1be99 100644 --- a/website/docs/Pipelines/Genomic_Data_Commons_Whole_Genome_Somatic/README.md +++ b/website/docs/Pipelines/Genomic_Data_Commons_Whole_Genome_Somatic/README.md @@ -7,7 +7,7 @@ slug: /Pipelines/Genomic_Data_Commons_Whole_Genome_Somatic/README | Pipeline Version | Date Updated | Documentation Author | Questions or Feedback | | :----: | :---: | :----: | :--------------: | -| [GDCWholeGenomeSomaticSingleSample_v1.0.1](https://github.com/broadinstitute/warp/releases) | January, 2021 | [Elizabeth Kiernan](mailto:ekiernan@broadinstitute.org) | Please file GitHub issues in warp or contact [the WARP team](mailto:warp-pipelines-help@broadinstitute.org) | +| [GDCWholeGenomeSomaticSingleSample_v1.3.1](https://github.com/broadinstitute/warp/releases) | January, 2024 | [Elizabeth Kiernan](mailto:ekiernan@broadinstitute.org) | Please file GitHub issues in warp or contact [the WARP team](mailto:warp-pipelines-help@broadinstitute.org) | ## Introduction to the GDC Whole Genome Somatic Single Sample pipeline @@ -29,8 +29,8 @@ For the latest workflow version and release notes, please see the [changelog](ht ### Software version requirements -* GATK 4.0.7 -* Picard 2.18.11 (Custom Docker is used to run software on Cromwell 52) +* GATK 4.5.0.0 +* Picard 2.26.10 * Samtools 1.11 * Python 3.0 * Cromwell version support diff --git a/website/docs/Pipelines/Illumina_Genotyping_Arrays_Pipeline/Illumina_genotyping_array_spec.md b/website/docs/Pipelines/Illumina_Genotyping_Arrays_Pipeline/Illumina_genotyping_array_spec.md index 6d105fe77c..46da522506 100644 --- a/website/docs/Pipelines/Illumina_Genotyping_Arrays_Pipeline/Illumina_genotyping_array_spec.md +++ b/website/docs/Pipelines/Illumina_Genotyping_Arrays_Pipeline/Illumina_genotyping_array_spec.md @@ -6,7 +6,7 @@ sidebar_position: 2 The [Illumina Genotyping Array Pipeline](https://github.com/broadinstitute/warp/blob/develop/pipelines/broad/genotyping/illumina/IlluminaGenotypingArray.wdl) v1.11.0 pipeline produces a VCF (Variant Call Format) output with data processing and sample-specific genotype information. The VCF follows the format listed in the [VCF 4.2 specification](https://samtools.github.io/hts-specs/VCFv4.2.pdf), but additionally contains fields and attributes that are unique to the Arrays pipeline. -This document describes the Array pipeline’s unique VCF fields and attributes that are not listed in the standard VCF specification. To learn more about the pipeline, see the [Illumina Genotyping Array Pipeline Overview](./IlluminaGenotypingArray.documentation.md). +This document describes the Array pipeline’s unique VCF fields and attributes that are not listed in the standard VCF specification. To learn more about the pipeline, see the [Illumina Genotyping Array Pipeline Overview](./README.md). :::tip How do I view a VCF file? 
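The tip above asks how to view a VCF file. A VCF is gzipped plain text, so standard tools are enough to inspect it; the file name below is a placeholder, and bcftools is an assumed install rather than a requirement of the Arrays pipeline.

```bash
# Common ways to inspect a compressed VCF.
zcat genotypes.vcf.gz | less -S        # page through the header and records
bcftools view -h genotypes.vcf.gz      # print only the header
bcftools view genotypes.vcf.gz | head  # peek at the first records
```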
diff --git a/website/docs/Pipelines/Multiome_Pipeline/README.md b/website/docs/Pipelines/Multiome_Pipeline/README.md index 97815b03c8..59b6b5f7ca 100644 --- a/website/docs/Pipelines/Multiome_Pipeline/README.md +++ b/website/docs/Pipelines/Multiome_Pipeline/README.md @@ -8,7 +8,7 @@ slug: /Pipelines/Multiome_Pipeline/README | Pipeline Version | Date Updated | Documentation Author | Questions or Feedback | | :----: | :---: | :----: | :--------------: | -| [Multiome v3.1.1](https://github.com/broadinstitute/warp/releases) | January, 2024 | Kaylee Mathews | Please file GitHub issues in warp or contact the [WARP Pipeline Development team](mailto:warp-pipelines-help@broadinstitute.org) | +| [Multiome v3.1.2](https://github.com/broadinstitute/warp/releases) | February, 2024 | Kaylee Mathews | Please file GitHub issues in warp or contact the [WARP Pipeline Development team](mailto:warp-pipelines-help@broadinstitute.org) | ![Multiome_diagram](./multiome_diagram.png) diff --git a/website/docs/Pipelines/Optimus_Pipeline/README.md b/website/docs/Pipelines/Optimus_Pipeline/README.md index ffe147e4ea..d57e64c815 100644 --- a/website/docs/Pipelines/Optimus_Pipeline/README.md +++ b/website/docs/Pipelines/Optimus_Pipeline/README.md @@ -33,8 +33,8 @@ The following table provides a quick glance at the Optimus pipeline features: | Assay type | 10x single cell or single nucleus expression (v2 and v3) | [10x Genomics](https://www.10xgenomics.com) | Overall workflow | Quality control module and transcriptome quantification module | Code available from [GitHub](https://github.com/broadinstitute/warp/blob/master/pipelines/skylab/optimus/Optimus.wdl) | | Workflow language | WDL 1.0 | [openWDL](https://github.com/openwdl/wdl) | -| Genomic Reference Sequence | GRCh38 human genome primary sequence and M21 (GRCm38.p6) mouse genome primary sequence | GENCODE [human reference files](https://www.gencodegenes.org/human/release_27.html) and [mouse reference files](https://www.gencodegenes.org/mouse/release_M21.html) -| Transcriptomic reference annotation | V27 GENCODE human transcriptome and M21 mouse transcriptome | GENCODE [human GTF](ftp://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_27/gencode.v27.annotation.gtf.gz) and [mouse GTF](ftp://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_mouse/release_M21/gencode.vM21.annotation.gff3.gz) | +| Genomic Reference Sequence | GRCh38.p13 (v43) human genome primary sequence and GRCm39 (M32) mouse genome primary sequence | GENCODE [human reference files](https://www.gencodegenes.org/human/release_43.html) and [mouse reference files](https://www.gencodegenes.org/mouse/release_M32.html) +| Transcriptomic reference annotation | V43 GENCODE human transcriptome and M32 mouse transcriptome | GENCODE [human GTF](https://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_43/gencode.v43.annotation.gtf.gz) and [mouse GTF](https://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_mouse/release_M32/gencode.vM32.annotation.gtf.gz) | | Aligner and transcript quantification | STARsolo | [Dobin, et al.,2021](https://www.biorxiv.org/content/10.1101/2021.05.05.442755v1) | | Data input file format | File format in which sequencing data is provided | [FASTQ](https://academic.oup.com/nar/article/38/6/1767/3112533) | | Data output file format | File formats in which Optimus output is provided | [BAM](http://samtools.github.io/hts-specs/), Python numpy arrays (internal), h5ad | diff --git a/website/docs/Pipelines/PairedTag_Pipeline/README.md 
b/website/docs/Pipelines/PairedTag_Pipeline/README.md index 81c7506f30..3a7c983f79 100644 --- a/website/docs/Pipelines/PairedTag_Pipeline/README.md +++ b/website/docs/Pipelines/PairedTag_Pipeline/README.md @@ -7,7 +7,7 @@ slug: /Pipelines/PairedTag_Pipeline/README | Pipeline Version | Date Updated | Documentation Author | Questions or Feedback | | :----: | :---: | :----: | :--------------: | -| [PairedTag_v0.0.5](https://github.com/broadinstitute/warp/releases) | January, 2024 | Kaylee Mathews | Please file GitHub issues in warp or contact [documentation authors](mailto:warp-pipelines-help@broadinstitute.org) | +| [PairedTag_v0.0.6](https://github.com/broadinstitute/warp/releases) | February, 2024 | Kaylee Mathews | Please file GitHub issues in warp or contact [documentation authors](mailto:warp-pipelines-help@broadinstitute.org) | ## Introduction to the Paired-Tag workflow @@ -95,8 +95,9 @@ The Paired-Tag workflow calls two WARP subworkflows and an additional task which | Subworkflow/Task | Software | Description | | ----------- | -------- | ----------- | | Optimus ([WDL](https://github.com/broadinstitute/warp/blob/develop/pipelines/skylab/optimus/Optimus.wdl) and [documentation](../Optimus_Pipeline/README)) | fastqprocess, STARsolo, Emptydrops | Workflow used to analyze 10x single-cell GEX data. | -| ​​PairedTagDemultiplex as demultiplex ([WDL](https://github.com/broadinstitute/warp/blob/develop/tasks/skylab/PairedTagUtils.wdl)) | UPStools | Task used to check the length of the read2 FASTQ (should be either 27 or 24 bp). If `preindex` is set to true, the task will perform demultiplexing of the 3-bp sample barcode from the read2 ATAC fastq files and stores it in the readname. It will then perform barcode orientation checking. The ATAC workflow will then add a combined 3 bp sample barcode and cellular barcode to the BB tag of the BAM. If `preindex` is false and then length is 27 bp, the task will perform trimming and subsequent barcode orientation checking. | -ATAC ([WDL](https://github.com/broadinstitute/warp/blob/develop/pipelines/skylab/multiome/atac.wdl) and [documentation](../ATAC/README)) | fastqprocess, bwa-mem, SnapATAC2 | Workflow used to analyze single-nucleus paired-tag DNA (histone modifications) data. | +| PairedTagDemultiplex as demultiplex ([WDL](https://github.com/broadinstitute/warp/blob/develop/tasks/skylab/PairedTagUtils.wdl)) | UPStools | Task used to check the length of the read2 FASTQ (should be either 27 or 24 bp). If `preindex` is set to true, the task will perform demultiplexing of the 3-bp sample barcode from the read2 ATAC fastq files and stores it in the readname. It will then perform barcode orientation checking. The ATAC workflow will then add a combined 3 bp sample barcode and cellular barcode to the BB tag of the BAM. If `preindex` is false and the length is 27 bp, the task will perform trimming and subsequent barcode orientation checking. | +| ATAC ([WDL](https://github.com/broadinstitute/warp/blob/develop/pipelines/skylab/multiome/atac.wdl) and [documentation](../ATAC/README)) | fastqprocess, bwa-mem, SnapATAC2 | Workflow used to analyze single-nucleus paired-tag DNA (histone modifications) data. | +| ParseBarcodes as ParseBarcodes ([WDL](https://github.com/broadinstitute/warp/blob/develop/tasks/skylab/PairedTagUtils.wdl)) | python3 | Task used to parse and split the cell barcodes and sample barcodes from the combined index in the h5ad and fragment files when `preindex` is set to true. 
| ## Outputs @@ -105,8 +106,8 @@ ATAC ([WDL](https://github.com/broadinstitute/warp/blob/develop/pipelines/skylab |--- | --- | --- | | pairedtag_pipeline_version_out | N.A. | String describing the version of the Paired-Tag pipeline used. | | bam_aligned_output_atac | `_atac.bam` | BAM file containing aligned reads from ATAC workflow; contains sample and cell barcodes stored in the BB tag if `preindex` is “true”. | -| fragment_file_atac | `_atac.fragments.tsv` or if preindexing = true, `_atac.fragments.BB.tsv | TSV file containing fragment start and stop coordinates per barcode. The columns are "Chromosome", "Start", "Stop", "Barcode", and "Number of reads". | -| snap_metrics_atac | `_atac.metrics.h5ad` | h5ad (Anndata) file containing per-barcode metrics from SnapATAC2. See the [ATAC Count Matrix Overview](../ATAC/count-matrix-overview.md) for more details. | +| fragment_file_atac | `_atac.fragments.tsv` or if preindexing = true, `_atac.fragments.BB.tsv` | TSV file containing fragment start and stop coordinates per barcode. The columns are "Chromosome", "Start", "Stop", "Barcode", and "Number of reads". When preindexing is used, additional columns include "Sample Barcode", "Cell Barcode", and "Duplicates" (which indicates if a cell barcode matches more than one sample barcode). | +| snap_metrics_atac | `_atac.metrics.h5ad` | h5ad (Anndata) file containing per-barcode metrics from SnapATAC2. See the [ATAC Count Matrix Overview](../ATAC/count-matrix-overview.md) for more details. If the preindex option is used, the h5ad.obs will contain 3 extra columns: preindex (the sample barcode), CB (cell barcodes), and duplicates (indicated with a 1 if the cell barcode matches more than one preindex; otherwise 0). | | genomic_reference_version_gex | `.txt` | File containing the Genome build, source and GTF annotation version. | | bam_gex | `_gex.bam` | BAM file containing aligned reads from Optimus workflow. | | matrix_gex | `_gex_sparse_counts.npz` | NPZ file containing raw gene by cell counts. | diff --git a/website/docs/Pipelines/Smart-seq2_Single_Nucleus_Multi_Sample_Pipeline/multi_snss2.methods.md b/website/docs/Pipelines/Smart-seq2_Single_Nucleus_Multi_Sample_Pipeline/multi_snss2.methods.md index 77dedddb0e..5239ba7f97 100644 --- a/website/docs/Pipelines/Smart-seq2_Single_Nucleus_Multi_Sample_Pipeline/multi_snss2.methods.md +++ b/website/docs/Pipelines/Smart-seq2_Single_Nucleus_Multi_Sample_Pipeline/multi_snss2.methods.md @@ -2,13 +2,13 @@ sidebar_position: 2 --- -# Smart-seq2 Single Nucleus Multi-Sample v1.2.26 Publication Methods +# Smart-seq2 Single Nucleus Multi-Sample v1.2.28 Publication Methods Below we provide an example methods section for a publication. For the complete pipeline documentation, see the [Smart-seq2 Single Nucleus Multi-Sample Overview](./README.md). ## Methods -Data preprocessing and count matrix construction for a batch (or plate) were performed using the Smart-seq2 Single Nucleus Multi-Sample v1.2.26 Pipeline (RRID:SCR_021312) as well as Picard v.2.26.10 with default tool parameters unless otherwise specified.
Genomic references are publicly available in the [Broad References](https://console.cloud.google.com/storage/browser/gcp-public-data--broad-references/mm10/v0/single_nucleus?pageState=(%22StorageObjectListTable%22:(%22f%22:%22%255B%255D%22))&prefix=&forceOnObjectsSortingFiltering=false) Google Bucket and are also listed in the [example workflow configuration](https://github.com/broadinstitute/warp/blob/master/pipelines/skylab/smartseq2_single_nucleus_multisample/mouse_example.json) in GitHub. +Data preprocessing and count matrix construction for a batch (or plate) were performed using the Smart-seq2 Single Nucleus Multi-Sample v1.2.28 Pipeline (RRID:SCR_021312) as well as Picard v.2.26.10 with default tool parameters unless otherwise specified. Genomic references are publicly available in the [Broad References](https://console.cloud.google.com/storage/browser/gcp-public-data--broad-references/mm10/v0/single_nucleus?pageState=(%22StorageObjectListTable%22:(%22f%22:%22%255B%255D%22))&prefix=&forceOnObjectsSortingFiltering=false) Google Bucket and are also listed in the [example workflow configuration](https://github.com/broadinstitute/warp/blob/master/pipelines/skylab/smartseq2_single_nucleus_multisample/mouse_example.json) in GitHub. For each nucleus in the batch, paired-end FASTQ files were first trimmed to remove adapters using the fastq-mcf tool with a subsampling parameter of 200,000 reads. The trimmed FASTQ files were then aligned to the GENCODE GRCm38 mouse genome using STAR v.2.7.10a. To count the number of reads per gene, but not isoforms, the quantMode parameter was set to GeneCounts. Multi-mapped reads, and optical and PCR duplicates, were removed from the resulting aligned BAM using the Picard MarkDuplicates tool with REMOVE_DUPLICATES = true. Metrics were collected on the deduplicated BAM using Picard CollectMultipleMetrics with VALIDATION_STRINGENCY = SILENT. From 39c280ba99ad4caecd75417de67cfb647fedec37 Mon Sep 17 00:00:00 2001 From: rsc3 Date: Thu, 8 Feb 2024 10:03:03 -0500 Subject: [PATCH 06/68] Add shardid column to umipercell file --- tasks/skylab/StarAlign.wdl | 3 +++ 1 file changed, 3 insertions(+) diff --git a/tasks/skylab/StarAlign.wdl b/tasks/skylab/StarAlign.wdl index 91f2f3985a..8811468935 100644 --- a/tasks/skylab/StarAlign.wdl +++ b/tasks/skylab/StarAlign.wdl @@ -525,9 +525,12 @@ task MergeStarOutput { fi done + counter=0 for umipercell in "${umipercell_files[@]}"; do if [ -f "$umipercell" ]; then + awk -v var="$counter" '{print $0, var}' "$umipercell" > "$umipercell" cat "$umipercell" >> "~{input_id}_umipercell.txt" + let counter=counter+1 fi done From b18af86016eeb74331c073a6ba9e09145a91cefe Mon Sep 17 00:00:00 2001 From: rsc3 Date: Thu, 8 Feb 2024 10:36:51 -0500 Subject: [PATCH 07/68] Add shardid column to umipercell file --- tasks/skylab/StarAlign.wdl | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/tasks/skylab/StarAlign.wdl b/tasks/skylab/StarAlign.wdl index 8811468935..3a6717c36e 100644 --- a/tasks/skylab/StarAlign.wdl +++ b/tasks/skylab/StarAlign.wdl @@ -525,7 +525,7 @@ task MergeStarOutput { fi done - counter=0 + counter=0 # note that the counter might not correspond to the shard number, it is just the order of files in bash (e.g.
10 before 2) for umipercell in "${umipercell_files[@]}"; do if [ -f "$umipercell" ]; then awk -v var="$counter" '{print $0, var}' "$umipercell" > "$umipercell" From 8e7e100bc4cc44d31bc516f34afebd87edadba6d Mon Sep 17 00:00:00 2001 From: Kaylee Mathews <95316074+kayleemathews@users.noreply.github.com> Date: Fri, 9 Feb 2024 13:56:00 -0500 Subject: [PATCH 08/68] Km create JointGenotyping Overview doc (#1199) * add jg overview doc * Update README.md * Update README.md * Apply suggestions from LK review Co-authored-by: ekiernan <55763654+ekiernan@users.noreply.github.com> --------- Co-authored-by: ekiernan <55763654+ekiernan@users.noreply.github.com> --- .../docs/Pipelines/JointGenotyping/README.md | 241 ++++++++++++++++++ .../Pipelines/JointGenotyping/_category_.json | 4 + .../Multiome_Pipeline/_category_.json | 2 +- .../Optimus_Pipeline/_category_.json | 2 +- .../PairedTag_Pipeline/_category_.json | 2 +- .../RNA_with_UMIs_Pipeline/_category_.json | 2 +- .../_category_.json | 2 +- .../SlideSeq_Pipeline/_category_.json | 2 +- .../_category_.json | 2 +- .../_category_.json | 2 +- .../_category_.json | 2 +- .../_category_.json | 2 +- .../_category_.json | 2 +- website/docs/Pipelines/snM3C/_category_.json | 2 +- 14 files changed, 257 insertions(+), 12 deletions(-) create mode 100644 website/docs/Pipelines/JointGenotyping/README.md create mode 100644 website/docs/Pipelines/JointGenotyping/_category_.json diff --git a/website/docs/Pipelines/JointGenotyping/README.md b/website/docs/Pipelines/JointGenotyping/README.md new file mode 100644 index 0000000000..98eaeaa858 --- /dev/null +++ b/website/docs/Pipelines/JointGenotyping/README.md @@ -0,0 +1,241 @@ +--- +sidebar_position: 1 +slug: /Pipelines/JointGenotyping_Pipeline/README +--- + +# JointGenotyping Overview + +| Pipeline Version | Date Updated | Documentation Author | Questions or Feedback | +| :----: | :---: | :----: | :--------------: | +| [JointGenotyping_v1.6.9](https://github.com/broadinstitute/warp/releases) | February, 2024 | Elizabeth Kiernan & Kaylee Mathews | Please file GitHub issues in WARP or contact [documentation authors](mailto:warp-pipelines-help@broadinstitute.org) | + +## Introduction to the JointGenotyping workflow + +The [JointGenotyping workflow](https://github.com/broadinstitute/warp/blob/master/pipelines/broad/dna_seq/germline/joint_genotyping/JointGenotyping.wdl) is an open-source, cloud-optimized pipeline that implements joint variant calling, filtering, and (optional) fingerprinting. + +The pipeline can be configured to run using one of the following GATK joint genotyping methods: + +* **[GenotypeGVCFs](https://gatk.broadinstitute.org/hc/en-us/articles/21905118377755)** (default method) performs joint genotyping on GVCF files stored in GenomicsDB and pre-called with HaplotypeCaller. +* **[GnarlyGenotyper](https://gatk.broadinstitute.org/hc/en-us/articles/21904951112091)** performs scalable, “quick and dirty” joint genotyping on a set of GVCF files stored in GenomicsDB and pre-called with HaplotypeCaller. + +The pipeline can be configured to run using one of the following GATK variant filtering techniques: + +* **[Variant Quality Score Recalibration (VQSR)](https://gatk.broadinstitute.org/hc/en-us/articles/360035531612)** (default method) uses the VariantRecalibrator and ApplyVQSR tools to filter variants according to [GATK Best Practices](https://gatk.broadinstitute.org/hc/en-us/articles/360035535932). 
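  A rough sketch of the two GATK stages this method wraps is shown below; the file names, resource line, annotations, and sensitivity cutoff are illustrative placeholders, not the pipeline's configured values:

  ```bash
  # All inputs below are illustrative placeholders.
  # Stage 1 (VariantRecalibrator): train a filtering model on variant annotations
  gatk VariantRecalibrator \
      -V cohort.sites_only.vcf.gz \
      -mode SNP \
      --trust-all-polymorphic \
      --resource:hapmap,known=false,training=true,truth=true,prior=15 hapmap.vcf.gz \
      -an QD -an FS -an MQ \
      -O cohort.snps.recal \
      --tranches-file cohort.snps.tranches

  # Stage 2 (ApplyVQSR): annotate the FILTER field at a chosen truth-sensitivity cutoff
  gatk ApplyVQSR \
      -V cohort.sites_only.vcf.gz \
      --recal-file cohort.snps.recal \
      --tranches-file cohort.snps.tranches \
      --truth-sensitivity-filter-level 99.7 \
      -mode SNP \
      -O cohort.snps.vqsr.vcf.gz
  ```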
+* **Variant Extract-Train-Score (VETS)** uses the ExtractVariantAnnotations, TrainVariantAnnotationsModel, and ScoreVariantAnnotations tools called in the [VETS subworkflow](https://github.com/broadinstitute/gatk/blob/master/scripts/vcf_site_level_filtering_wdl/JointVcfFiltering.wdl) to score variant annotations. + +The pipeline takes in a sample map file listing GVCF files produced by HaplotypeCaller in GVCF mode and produces a filtered VCF file (with index) containing genotypes for all samples present in the input VCF files. All sites that are present in the input VCF file are retained. Filtered sites are annotated as such in the FILTER field. If you are new to VCF files, see the [file type specification](https://samtools.github.io/hts-specs/VCFv4.2.pdf). + +## Set-up + +### JointGenotyping Installation and Requirements + +To download the latest JointGenotyping release, see the release tags prefixed with "JointGenotyping" on the WARP [releases page](https://github.com/broadinstitute/warp/releases). All JointGenotyping pipeline releases are documented in the [JointGenotyping changelog](https://github.com/broadinstitute/warp/blob/master/pipelines/broad/dna_seq/germline/joint_genotyping/JointGenotyping.changelog.md). + +To search releases of this and other pipelines, use the WARP command-line tool [Wreleaser](https://github.com/broadinstitute/warp/tree/master/wreleaser). + +If you’re running a JointGenotyping workflow version prior to the latest release, the accompanying documentation for that release may be downloaded with the source code on the WARP [releases page](https://github.com/broadinstitute/warp/releases) (see the folder `website/docs/Pipelines/JointGenotyping`). + +The JointGenotyping pipeline can be deployed using [Cromwell](https://cromwell.readthedocs.io/en/stable/), a GA4GH-compliant, flexible workflow management system that supports multiple computing platforms. The workflow can also be run in [Terra](https://app.terra.bio), a cloud-based analysis platform. The Terra [Whole-Genome-Analysis-Pipeline](https://app.terra.bio/#workspaces/warp-pipelines/Whole-Genome-Analysis-Pipeline) and [Exome-Analysis-Pipeline](https://app.terra.bio/#workspaces/warp-pipelines/Exome-Analysis-Pipeline) workspaces contain the JointGenotyping pipeline, as well as workflows for preprocessing, initial variant calling, and sample map generation, workflow configurations, required reference data and other inputs, and example testing data. + +### Inputs + +The JointGenotyping workflow inputs are specified in JSON configuration files. Example configuration files can be found in the [test_inputs](https://github.com/broadinstitute/warp/tree/master/pipelines/broad/dna_seq/germline/joint_genotyping/test_inputs) folder in the WARP repository. + +#### Default joint calling input descriptions + +The table below describes the pipeline inputs that apply when the pipeline is run with default parameters and uses GenotypeGVCFs for joint calling and VQSR for variant filtering: + +| Parameter name | Description | Type | +| --- | --- | --- | +| unpadded_intervals_file | Describes the intervals for which VCF output will be written; exome data will have different captures/targets. | File | +| callset_name | Identifier for the group of VCF files used for joint calling. | String | +| sample_name_map | Path to file containing the sample names and the cloud location of the individual GVCF files. | String | +| ref_fasta | Reference FASTA file used for joint calling; must agree with reference for `unpadded_intervals_file`. 
| File | +| ref_fasta_index | Index for reference FASTA file used for joint calling; must agree with reference for `unpadded_intervals_file`. | File | +| ref_dict | Reference dictionary file used for joint calling; must agree with reference for `unpadded_intervals_file`. | File | +| dbsnp_vcf | Resource VCF file containing common SNPs and indels used for annotating the VCF file after joint calling. | File | +| dbsnp_vcf_index | Index for `dbsnp_vcf`. | File | +| snp_recalibration_tranche_values | Set of sensitivity levels used when running the pipeline using VQSR; value should match estimated sensitivity of truth resource passed as `hapmap_resource_vcf` to the [SNPsVariantRecalibratorCreateModel](https://github.com/broadinstitute/warp/blob/develop/tasks/broad/JointGenotypingTasks.wdl) and [SNPsVariantRecalibrator as SNPsVariantRecalibratorScattered](https://github.com/broadinstitute/warp/blob/develop/tasks/broad/JointGenotypingTasks.wdl) tasks; filter cutoff based on sensitivity to common variants (more sensitivity = more false positives); required when `run_vets` is “false”. | Array[String] | +| snp_recalibration_annotation_values | Features used for filtering model (annotations in VCF file); all allele-specific versions. | Array[String] | +| indel_recalibration_tranche_values | Set of sensitivity levels used when running the pipeline using VQSR; value should match estimated sensitivity of truth resource passed as `mills_resource_vcf` to the [IndelsVariantRecalibrator](https://github.com/broadinstitute/warp/blob/develop/tasks/broad/JointGenotypingTasks.wdl) task; filter cutoff based on sensitivity to common variants (more sensitivity = more false positives); required when `run_vets` is “false”. | Array[String] | +| indel_recalibration_annotation_values | Features used for filtering model when running the pipeline using VQSR; required when `run_vets` is “false”. | Array[String] | +| eval_interval_list | Subset of the unpadded intervals file used for metrics. | File | +| hapmap_resource_vcf | Used for SNP variant recalibration; see the [GATK Resource Bundle](https://gatk.broadinstitute.org/hc/en-us/articles/360035890811) for more information. | File | +| hapmap_resource_vcf_index | Used for SNP variant recalibration; see the [GATK Resource Bundle](https://gatk.broadinstitute.org/hc/en-us/articles/360035890811) for more information. | File | +| omni_resource_vcf | Used for SNP recalibration; see the [GATK Resource Bundle](https://gatk.broadinstitute.org/hc/en-us/articles/360035890811) for more information. | File | +| omni_resource_vcf_index | Used for SNP recalibration; see the [GATK Resource Bundle](https://gatk.broadinstitute.org/hc/en-us/articles/360035890811) for more information. | File | +| one_thousand_genomes_resource_vcf | Used for SNP recalibration; see the [GATK Resource Bundle](https://gatk.broadinstitute.org/hc/en-us/articles/360035890811) for more information. | File | +| one_thousand_genomes_resource_vcf_index | Used for SNP recalibration; see the [GATK Resource Bundle](https://gatk.broadinstitute.org/hc/en-us/articles/360035890811) for more information. | File | +| mills_resource_vcf | Used for indel variant recalibration; see the [GATK Resource Bundle](https://gatk.broadinstitute.org/hc/en-us/articles/360035890811) for more information. | File | +| mills_resource_vcf_index | Used for indel variant recalibration; see the [GATK Resource Bundle](https://gatk.broadinstitute.org/hc/en-us/articles/360035890811) for more information. 
| File | +| axiomPoly_resource_vcf | Used for indel variant recalibration; see the [GATK Resource Bundle](https://gatk.broadinstitute.org/hc/en-us/articles/360035890811) for more information. | File | +| axiomPoly_resource_vcf_index | Used for indel variant recalibration; see the [GATK Resource Bundle](https://gatk.broadinstitute.org/hc/en-us/articles/360035890811) for more information. | File | +| dbsnp_resource_vcf | Optional file used for SNP/indel variant recalibration; set to `dbsnp_vcf` by default; see the [GATK Resource Bundle](https://gatk.broadinstitute.org/hc/en-us/articles/360035890811) for more information. | File | +| dbsnp_resource_vcf_index | Optional file used for SNP/indel variant recalibration; set to `dbsnp_vcf_index` by default; see the [GATK Resource Bundle](https://gatk.broadinstitute.org/hc/en-us/articles/360035890811) for more information. | File | +| excess_het_threshold | Optional float used for hard filtering joint calls; phred-scaled p-value; set to `54.69` by default to cut off quality scores greater than a z-score of -4.5 (p-value of 3.4e-06). | Float | +| vqsr_snp_filter_level | Used for applying the recalibration model when running the pipeline using VQSR; required when `run_vets` is “false”. | Float | +| vqsr_indel_filter_level | Used for applying the recalibration model when running the pipeline using VQSR; required when `run_vets` is “false”. | Float | +| snp_vqsr_downsampleFactor | The downsample factor used for SNP variant recalibration if the number of GVCF files is greater than the `snps_variant_recalibration_threshold` when running the pipeline using VQSR; required when `run_vets` is “false”. | Int | +| top_level_scatter_count | Optional integer used to determine how many files the input interval list should be split into; default will split the interval list into 2 files. | Int | +| gather_vcfs | Optional boolean; “true” is used for small callsets containing less than 100,000 GVCF files. | Boolean | +| snps_variant_recalibration_threshold | Optional integer that sets the threshold for the number of callset VCF files used to perform recalibration on a single file; if the number of VCF files exceeds the threshold, variants will be downsampled to enable parallelization; default is “500000”. | Int | +| rename_gvcf_samples | Optional boolean describing whether GVCF samples should be renamed; default is “true”. | Boolean | +| unbounded_scatter_count_scale_factor | Optional float used to scale the scatter count when `top_level_scatter_count` is not provided as input; default is “0.15”. | Float | +| use_allele_specific_annotations | Optional boolean used for SNP and indel variant recalibration when running the pipeline using VQSR; set to “true” by default. | Boolean | + + +#### GnarlyGenotyper joint calling input descriptions + +The table below describes the additional pipeline inputs that apply when the pipeline is run with GnarlyGenotyper for joint calling: + +| Parameter name | Description | Type | +| --- | --- | --- | +| gnarly_scatter_count | Optional integer used to determine how many files to split the interval list into when using GnarlyGenotyper; default is “10”. | Int | +| use_gnarly_genotyper | Optional boolean describing whether GnarlyGenotyper should be used; default is “false”.
| Boolean | + + +#### VETS variant filtering input descriptions + +The table below describes the additional pipeline inputs that apply when the pipeline is run with VETS for variant filtering: + +| Parameter name | Description | Type | +| --- | --- | --- | +| targets_interval_list | Describes the intervals for which the filtering model will be trained when running the pipeline using VETS; for more details, see the associated [README](https://storage.googleapis.com/gcp-public-data--broad-references/hg38/v0/bge_exome_calling_regions.v1.1.interval_list.README.md); required when `run_vets` is “true”. | File | +| run_vets | Optional boolean used to describe whether the pipeline will use VQSR (`run_vets = false`) or VETS (`run_vets = true`) to create the filtering model; default is “false”. | Boolean | + + +#### Fingerprinting input descriptions + +The table below describes the pipeline inputs that apply to fingerprinting: + +| Parameter name | Description | Type | +| --- | --- | --- | +| haplotype_database | Haplotype reference used for fingerprinting (see the CrosscheckFingerprints task). | File | +| cross_check_fingerprints | Optional boolean describing whether or not the pipeline should check fingerprints; default is “true”. | Boolean | +| scatter_cross_check_fingerprints | Optional boolean describing whether `CrossCheckFingerprintsScattered` or `CrossCheckFingerprintsSolo` should be run; default is “false” and `CrossCheckFingerprintsSolo` will be run. | Boolean | + +#### Runtime parameter input descriptions + +The table below describes the pipeline inputs used for setting runtime parameters of tasks: + +| Parameter name | Description | Type | +| --- | --- | --- | +| small_disk | Disk size; dependent on cohort size; requires user input; see example JSON configuration files found in the WARP [test_inputs](https://github.com/broadinstitute/warp/tree/master/pipelines/broad/dna_seq/germline/joint_genotyping/test_inputs) folder for recommendations. | Int | +| medium_disk | Disk size; dependent on cohort size; requires user input; see example JSON configuration files found in the WARP [test_inputs](https://github.com/broadinstitute/warp/tree/master/pipelines/broad/dna_seq/germline/joint_genotyping/test_inputs) folder for recommendations. | Int | +| large_disk | Disk size; dependent on cohort size; requires user input; see example JSON configuration files found in the WARP [test_inputs](https://github.com/broadinstitute/warp/tree/master/pipelines/broad/dna_seq/germline/joint_genotyping/test_inputs) folder for recommendations. | Int | +| huge_disk | Disk size; dependent on cohort size; requires user input; see example JSON configuration files found in the WARP [test_inputs](https://github.com/broadinstitute/warp/tree/master/pipelines/broad/dna_seq/germline/joint_genotyping/test_inputs) folder for recommendations. | Int | + + +## JointGenotyping tasks and tools + +The [JointGenotyping workflow](https://github.com/broadinstitute/warp/blob/master/pipelines/broad/dna_seq/germline/joint_genotyping/JointGenotyping.wdl) imports individual "tasks," also written in WDL script, from the WARP [tasks folder](https://github.com/broadinstitute/warp/tree/master/tasks/broad). + +Overall, the JointGenotyping workflow: + +1. Splits the input interval list and imports GVCF files. +1. Performs joint genotyping using GATK GenotypeGVCFs (default) or GnarlyGenotyper. +1. Creates single site-specific VCF and index files. +1. Creates and applies a variant filtering model using GATK VQSR (default) or VETS. +1. 
Collects variant calling metrics. +1. Checks fingerprints (optional). + +The tasks and tools used in the JointGenotyping workflow are detailed in the table below. + +To see specific tool parameters, select the task WDL link in the table; then find the task and view the `command {}` section of the task in the WDL script. To view or use the exact tool software, see the task's Docker image which is specified in the task WDL `# runtime values` section as `String docker =`. + +| Task | Tool | Software | Description | +| --- | --- | --- | --- | +| [CheckSamplesUnique](https://github.com/broadinstitute/warp/blob/develop/tasks/broad/JointGenotypingTasks.wdl) | bash | bash | Checks that there are more than 50 unique samples in `sample_name_map`. | +| [SplitIntervalList](https://github.com/broadinstitute/warp/blob/develop/tasks/broad/JointGenotypingTasks.wdl) | SplitIntervals | [GATK](https://gatk.broadinstitute.org/hc/en-us) | Splits the unpadded interval list for scattering. | +| [ImportGVCFs](https://github.com/broadinstitute/warp/blob/develop/tasks/broad/JointGenotypingTasks.wdl) | GenomicsDBImport | [GATK](https://gatk.broadinstitute.org/hc/en-us) | Imports single-sample GVCF files into GenomicsDB before joint genotyping. | +| [SplitIntervalList as GnarlyIntervalScatterDude](https://github.com/broadinstitute/warp/blob/develop/tasks/broad/JointGenotypingTasks.wdl) | SplitIntervals | [GATK](https://gatk.broadinstitute.org/hc/en-us) | If `use_gnarly_genotyper` is “true” (default is “false”), splits the unpadded interval list for scattering; otherwise, this task is skipped. | +| [GnarlyGenotyper](https://github.com/broadinstitute/warp/blob/develop/tasks/broad/JointGenotypingTasks.wdl) | GnarlyGenotyper | [GATK](https://gatk.broadinstitute.org/hc/en-us) | If `use_gnarly_genotyper` is “true” (default is “false”), performs scalable, “quick and dirty” joint genotyping on a set of GVCF files stored in GenomicsDB; otherwise, this task is skipped. | +| [GatherVcfs as TotallyRadicalGatherVcfs](https://github.com/broadinstitute/warp/blob/develop/tasks/broad/JointGenotypingTasks.wdl) | GatherVcfsCloud | [GATK](https://gatk.broadinstitute.org/hc/en-us) | If `use_gnarly_genotyper` is “true” (default is “false”), compiles the site-specific VCF files generated for each interval into one VCF output and index; otherwise, this task is skipped. | +| [GenotypeGVCFs](https://github.com/broadinstitute/warp/blob/develop/tasks/broad/JointGenotypingTasks.wdl) | GenotypeGVCFs | [GATK](https://gatk.broadinstitute.org/hc/en-us) | If `use_gnarly_genotyper` is “false” (default is “false”), performs joint genotyping on GVCF files stored in GenomicsDB; otherwise this task is skipped. | +| [HardFilterAndMakeSitesOnlyVcf](https://github.com/broadinstitute/warp/blob/develop/tasks/broad/JointGenotypingTasks.wdl) | VariantFiltration, MakeSitesOnlyVcf | [GATK](https://gatk.broadinstitute.org/hc/en-us) | Uses the VCF files to hard filter the variant calls; outputs a VCF file with the site-specific (but not genotype) information. | +| [GatherVcfs as SitesOnlyGatherVcf](https://github.com/broadinstitute/warp/blob/develop/tasks/broad/JointGenotypingTasks.wdl) | GatherVcfsCloud | [GATK](https://gatk.broadinstitute.org/hc/en-us) | Compiles the site-specific VCF files generated for each interval into one VCF output file and index. 
| +| [JointVcfFiltering as TrainAndApplyVETS](https://github.com/broadinstitute/gatk/blob/master/scripts/vcf_site_level_filtering_wdl/JointVcfFiltering.wdl) | ExtractVariantAnnotations, TrainVariantAnnotationsModel, ScoreVariantAnnotations | [GATK](https://gatk.broadinstitute.org/hc/en-us) | If `run_vets` is “true” (default is “false”), calls the `JointVcfFiltering.wdl` subworkflow to extract variant-level annotations, train a model for variant scoring, and score variants; otherwise, this task is skipped. | +| [IndelsVariantRecalibrator](https://github.com/broadinstitute/warp/blob/develop/tasks/broad/JointGenotypingTasks.wdl) | VariantRecalibrator | [GATK](https://gatk.broadinstitute.org/hc/en-us) | If `run_vets` is “false” (default is “false”), uses the compiled VCF file to build a recalibration model to score indel variant quality; produces a recalibration table. | +| [SNPsVariantRecalibratorCreateModel](https://github.com/broadinstitute/warp/blob/develop/tasks/broad/JointGenotypingTasks.wdl) | VariantRecalibrator | [GATK](https://gatk.broadinstitute.org/hc/en-us) | If `run_vets` is “false” (default is “false”) and the number of input GVCF files is greater than `snps_variant_recalibration_threshold`, builds a recalibration model to score variant quality; otherwise this task is skipped. | +| [SNPsVariantRecalibrator as SNPsVariantRecalibratorScattered](https://github.com/broadinstitute/warp/blob/develop/tasks/broad/JointGenotypingTasks.wdl) | VariantRecalibrator | [GATK](https://gatk.broadinstitute.org/hc/en-us) | If `run_vets` is “false” (default is “false”) and the number of input GVCF files is greater than `snps_variant_recalibration_threshold`, builds a scattered recalibration model to score variant quality; otherwise this task is skipped. | +| [Tasks.GatherTranches as SNPGatherTranches](https://github.com/broadinstitute/warp/blob/develop/tasks/broad/JointGenotypingTasks.wdl) | GatherTranches | [GATK](https://gatk.broadinstitute.org/hc/en-us) | If `run_vets` is “false” (default is “false”) and the number of input GVCF files is greater than `snps_variant_recalibration_threshold`, gathers tranches into a single file; otherwise this task is skipped. | +| [SNPsVariantRecalibrator as SNPsVariantRecalibratorClassic](https://github.com/broadinstitute/warp/blob/develop/tasks/broad/JointGenotypingTasks.wdl) | VariantRecalibrator | [GATK](https://gatk.broadinstitute.org/hc/en-us) | If `run_vets` is “false” (default is “false”) and the number of input GVCF files is not greater than `snps_variant_recalibration_threshold`, builds a recalibration model to score variant quality; otherwise this task is skipped. | +| [ApplyRecalibration](https://github.com/broadinstitute/warp/blob/develop/tasks/broad/JointGenotypingTasks.wdl) | ApplyVQSR | [GATK](https://gatk.broadinstitute.org/hc/en-us) | If `run_vets` is “false” (default is “false”), scatters the site-specific VCF file and applies a filtering threshold. | +| [CollectVariantCallingMetrics as CollectMetricsSharded](https://github.com/broadinstitute/warp/blob/develop/tasks/broad/JointGenotypingTasks.wdl) | CollectVariantCallingMetrics | [GATK](https://gatk.broadinstitute.org/hc/en-us) | If the callset has at least 1000 GVCF files, returns detail and summary metrics for each of the scattered VCF files. If the number is smaller, returns metrics for a merged VCF file produced in the `GatherVcfs as FinalGatherVcf` task (listed below).
| +| [GatherVcfs as FinalGatherVcf](https://github.com/broadinstitute/warp/blob/develop/tasks/broad/JointGenotypingTasks.wdl) | GatherVcfsCloud | [GATK](https://gatk.broadinstitute.org/hc/en-us) | If the callset has fewer than 1000 GVCF files, compiles the VCF files prior to collecting metrics in the `CollectVariantCallingMetrics as CollectMetricsOnFullVcf` task (listed below). | +| [CollectVariantCallingMetrics as CollectMetricsOnFullVcf](https://github.com/broadinstitute/warp/blob/develop/tasks/broad/JointGenotypingTasks.wdl) | CollectVariantCallingMetrics | [GATK](https://gatk.broadinstitute.org/hc/en-us) | If the callset has fewer than 1000 GVCF files, returns metrics for the merged VCF file produced in the `GatherVcfs as FinalGatherVcf` task. | +| [GatherVariantCallingMetrics](https://github.com/broadinstitute/warp/blob/develop/tasks/broad/JointGenotypingTasks.wdl) | AccumulateVariantCallingMetrics | [GATK](https://gatk.broadinstitute.org/hc/en-us) | If the callset has at least 1000 GVCF files, gathers metrics produced for each VCF file. | +| [GetFingerprintingIntervalIndices](https://github.com/broadinstitute/warp/blob/develop/tasks/broad/JointGenotypingTasks.wdl) | IntervalListTools | [GATK](https://gatk.broadinstitute.org/hc/en-us) | If `cross_check_fingerprints` is “true” (default is “true”) and `scatter_cross_check_fingerprints` is “true” (default is “false”), gets and sorts indices for fingerprint intervals; otherwise the task is skipped. | +| [GatherVcfs as GatherFingerprintingVcfs](https://github.com/broadinstitute/warp/blob/develop/tasks/broad/JointGenotypingTasks.wdl) | GatherVcfsCloud | [GATK](https://gatk.broadinstitute.org/hc/en-us) | If `cross_check_fingerprints` is “true” (default is “true”) and `scatter_cross_check_fingerprints` is “true” (default is “false”), compiles the fingerprint VCF files; otherwise the task is skipped. | +| [SelectFingerprintSiteVariants](https://github.com/broadinstitute/warp/blob/develop/tasks/broad/JointGenotypingTasks.wdl) | SelectVariants | [GATK](https://gatk.broadinstitute.org/hc/en-us) | If `cross_check_fingerprints` is “true” (default is “true”) and `scatter_cross_check_fingerprints` is “true” (default is “false”), selects variants from the fingerprint VCF file; otherwise the task is skipped. | +| [PartitionSampleNameMap](https://github.com/broadinstitute/warp/blob/develop/tasks/broad/JointGenotypingTasks.wdl) | bash | bash | If `cross_check_fingerprints` is “true” (default is “true”) and `scatter_cross_check_fingerprints` is “true” (default is “false”), partitions the sample name map and files are scattered by the partition; otherwise the task is skipped. | +| [CrossCheckFingerprint as CrossCheckFingerprintsScattered](https://github.com/broadinstitute/warp/blob/develop/tasks/broad/JointGenotypingTasks.wdl) | CrosscheckFingerprints | [GATK](https://gatk.broadinstitute.org/hc/en-us) | If `cross_check_fingerprints` is “true” (default is “true”) and `scatter_cross_check_fingerprints` is “true” (default is “false”), checks fingerprints for the VCFs in the scattered partitions and produces a metrics file; otherwise the task is skipped. | +| [GatherPicardMetrics as GatherFingerprintingMetrics](https://github.com/broadinstitute/warp/blob/develop/tasks/broad/JointGenotypingTasks.wdl) | bash | bash | If `cross_check_fingerprints` is “true” (default is “true”) and `scatter_cross_check_fingerprints` is “true” (default is “false”), combines the fingerprint metrics files into a single metrics file; otherwise the task is skipped.
| +| [CrossCheckFingerprint as CrossCheckFingerprintSolo](https://github.com/broadinstitute/warp/blob/develop/tasks/broad/JointGenotypingTasks.wdl) | CrosscheckFingerprints | [GATK](https://gatk.broadinstitute.org/hc/en-us) | If `cross_check_fingerprints` is “true” (default is “true”) and `scatter_cross_check_fingerprints` is “false” (default is “false”), checks fingerprints for the single VCF file and produces a metrics file; otherwise the task is skipped. | + +#### 1. Splits the input interval list and imports GVCF files + +The [SplitIntervalList](https://github.com/broadinstitute/warp/blob/develop/tasks/broad/JointGenotypingTasks.wdl) task uses GATK’s SplitIntervals tool to split the input interval list into two or more interval files. The number of output interval files can be specified using the `top_level_scatter_count` input parameter or by specifying `unbounded_scatter_count_scale_factor`, which will scale the number of output files based on the number of input GVCF files. + +The [ImportGVCFs](https://github.com/broadinstitute/warp/blob/develop/tasks/broad/JointGenotypingTasks.wdl) task uses GATK’s GenomicsDBImport tool and the input sample map file to import single-sample GVCF files into GenomicsDB before joint genotyping. + +#### 2. Performs joint genotyping using GATK GenotypeGVCFs (default) or GnarlyGenotyper + +**GenotypeGVCFs (default)** + +When `use_gnarly_genotyper` is “false”, the [GenotypeGVCFs](https://github.com/broadinstitute/warp/blob/develop/tasks/broad/JointGenotypingTasks.wdl) task uses GATK’s GenotypeGVCFs tool to perform joint genotyping on GVCF files stored in GenomicsDB that have been pre-called with HaplotypeCaller. + +**GnarlyGenotyper** + +When `use_gnarly_genotyper` is “true”, the [SplitIntervalList as GnarlyIntervalScatterDude](https://github.com/broadinstitute/warp/blob/develop/tasks/broad/JointGenotypingTasks.wdl) task splits the unpadded interval list for scattering using GATK’s SplitIntervals tool. The output is used as input for the [GnarlyGenotyper](https://github.com/broadinstitute/warp/blob/develop/tasks/broad/JointGenotypingTasks.wdl) task which performs joint genotyping on the set of GVCF files and outputs an array of VCF and index files using the GnarlyGenotyper tool. Those VCF and index files are gathered in the next task, [GatherVcfs as TotallyRadicalGatherVcfs](https://github.com/broadinstitute/warp/blob/develop/tasks/broad/JointGenotypingTasks.wdl), which uses the GatherVcfsCloud tool. + +#### 3. Creates single site-specific VCF and index files + +The [HardFilterAndMakeSitesOnlyVcf](https://github.com/broadinstitute/warp/blob/develop/tasks/broad/JointGenotypingTasks.wdl) task takes in the output VCF and index files produced by either GnarlyGenotyper or GenotypeGVCFs. The task uses the `excess_het_threshold` input value to hard filter the variant calls using GATK’s VariantFiltration tool. After filtering, the site-specific VCF files are generated from the filtered VCF files by removing all sample-specific genotype information, leaving only the site-level summary information at each site. + +Next, the site-specific VCF and index files for each interval are gathered into a single site-specific VCF and index file by the [GatherVcfs as SitesOnlyGatherVcf](https://github.com/broadinstitute/warp/blob/develop/tasks/broad/JointGenotypingTasks.wdl) task, which uses the GatherVcfsCloud tool. + +#### 4. 
Creates and applies a variant filtering model using GATK VQSR (default) or VETS + +**VQSR (default)** + +If `run_vets` is “false”, the [IndelsVariantRecalibrator](https://github.com/broadinstitute/warp/blob/develop/tasks/broad/JointGenotypingTasks.wdl) task takes in the site-specific VCF and index files generated in [Step 3](#3-creates-single-site-specific-vcf-and-index-files) and uses GATK’s VariantRecalibrator tool to perform the first step of the Variant Quality Score Recalibration (VQSR) technique of filtering variants. The tool builds a model to be used to score and filter indels and produces a recalibration table as output. + +After building the indel filtering model, the workflow uses the VariantRecalibrator tool to build a model to be used to score and filter SNPs. If the number of input GVCF files is greater than `snps_variant_recalibration_threshold`, the [SNPsVariantRecalibratorCreateModel](https://github.com/broadinstitute/warp/blob/develop/tasks/broad/JointGenotypingTasks.wdl), [SNPsVariantRecalibrator as SNPsVariantRecalibratorScattered](https://github.com/broadinstitute/warp/blob/develop/tasks/broad/JointGenotypingTasks.wdl), and [Tasks.GatherTranches as SNPGatherTranches](https://github.com/broadinstitute/warp/blob/develop/tasks/broad/JointGenotypingTasks.wdl) tasks are called to scatter the site-specific VCF and index files, build the SNP model, and gather scattered tranches into a single file. If the number of input GVCF files is less than `snps_variant_recalibration_threshold`, the [SNPsVariantRecalibrator as SNPsVariantRecalibratorClassic](https://github.com/broadinstitute/warp/blob/develop/tasks/broad/JointGenotypingTasks.wdl) task is called to build the SNP model. + +The [ApplyRecalibration](https://github.com/broadinstitute/warp/blob/develop/tasks/broad/JointGenotypingTasks.wdl) task uses GATK’s ApplyVQSR tool to scatter the site-specific VCF file, apply the indel and SNP filtering models, and output a recalibrated VCF and index file. + +**VETS** + +If `run_vets` is “true”, the [JointVcfFiltering as TrainAndApplyVETS](https://github.com/broadinstitute/gatk/blob/master/scripts/vcf_site_level_filtering_wdl/JointVcfFiltering.wdl) task takes in the hard-filtered, site-specific VCF and index files generated in [Step 3](#3-creates-single-site-specific-vcf-and-index-files) and calls the `JointVcfFiltering.wdl` subworkflow. This workflow uses the Variant Extract-Train-Score (VETS) algorithm to extract variant-level annotations, train a filtering model, and score variants based on the model. The subworkflow uses the GATK ExtractVariantAnnotations, TrainVariantAnnotationsModel, and ScoreVariantAnnotations tools to create extracted and scored VCF and index files. The output VCF and index files are not filtered by the score assigned by the model. The score is included in the output VCF files in the INFO field as an annotation called “SCORE”. + +The VETS algorithm trains the model only over target regions, rather than including exon tails, which can lead to poor-quality data. However, the model is applied everywhere, including the exon tails. + +#### 5. Collects variant calling metrics + +Summary and per-sample metrics are collected using Picard’s CollectVariantCallingMetrics tool.
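As a minimal sketch of the Picard-style call this step wraps (the file names and interval list are illustrative placeholders; the WDL task supplies the actual inputs):

```bash
# Illustrative inputs only; writes <OUTPUT>.variant_calling_detail_metrics
# and <OUTPUT>.variant_calling_summary_metrics
gatk CollectVariantCallingMetrics \
    --INPUT cohort.filtered.vcf.gz \
    --DBSNP dbsnp.vcf.gz \
    --SEQUENCE_DICTIONARY reference.dict \
    --TARGET_INTERVALS eval.interval_list \
    --OUTPUT cohort_metrics
```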
For large callsets (at least 1000 GVCF files), the workflow calls the [CollectVariantCallingMetrics as CollectMetricsSharded](https://github.com/broadinstitute/warp/blob/develop/tasks/broad/JointGenotypingTasks.wdl) followed by the [GatherVariantCallingMetrics](https://github.com/broadinstitute/warp/blob/develop/tasks/broad/JointGenotypingTasks.wdl) task to compute and gather the variant calling metrics into single output files. For small callsets (less than 1000 GVCF files), the workflow calls the [GatherVcfs as FinalGatherVcf](https://github.com/broadinstitute/warp/blob/develop/tasks/broad/JointGenotypingTasks.wdl) task followed by the [CollectVariantCallingMetrics as CollectMetricsOnFullVcf](https://github.com/broadinstitute/warp/blob/develop/tasks/broad/JointGenotypingTasks.wdl) task to first compile the VCF files and then compute the variant calling metrics. Detail and summary metrics files are produced as outputs of these tasks. + +#### 6. Checks fingerprints (optional) + +If `cross_check_fingerprints` is “true”, the workflow will use Picard to determine the likelihood that the input and output data were generated from the same individual to verify that the pipeline didn’t swap any of the samples during processing. The [SelectFingerprintSiteVariants](https://github.com/broadinstitute/warp/blob/develop/tasks/broad/JointGenotypingTasks.wdl) task uses GATK’s SelectVariants tool to select variants in the site-specific VCF file based on the variants present in the `haplotype_database` and outputs a fingerprint VCF and index file. Next, the workflow cross-checks the fingerprints and creates an output metrics file using the CrosscheckFingerprints tool. + +## Outputs + +The following table lists the output variables and files produced by the pipeline. + +| Output name | Filename, if applicable | Output format and description | +| ------ | ------ | ------ | +| detail_metrics_file | `.variant_calling_detail_metrics` | Detail metrics file produced using Picard. | +| summary_metrics_file | `.variant_calling_summary_metrics` | Summary metrics file produced using Picard. | +| output_vcfs | `.vcf.gz` or `.filtered..vcf.gz` | Array of all site-specific output VCF files. | +| output_vcf_indices | `.vcf.gz.tbi` or `.filtered..vcf.gz.tbi` | Array of all output VCF index files. | +| output_intervals | `scatterDir/` | Interval list file produced by the workflow. | +| crosscheck_fingerprint_check | `.fingerprintcheck` | Optional output file containing fingerprint metrics. | + +## Versioning and testing + +All JointGenotyping pipeline releases are documented in the [JointGenotyping changelog](https://github.com/broadinstitute/warp/blob/master/pipelines/broad/dna_seq/germline/joint_genotyping/JointGenotyping.changelog.md) and tested using [plumbing and scientific test data](https://github.com/broadinstitute/warp/blob/master/pipelines/broad/dna_seq/germline/joint_genotyping/test_data_overview.md). To learn more about WARP pipeline testing, see [Testing Pipelines](https://broadinstitute.github.io/warp/docs/About_WARP/TestingPipelines). + +## Feedback + +Please help us make our tools better by contacting the [WARP Pipelines Team](mailto:warp-pipelines-help@broadinstitute.org) for pipeline-related suggestions or questions.
\ No newline at end of file diff --git a/website/docs/Pipelines/JointGenotyping/_category_.json b/website/docs/Pipelines/JointGenotyping/_category_.json new file mode 100644 index 0000000000..8088ecaa3b --- /dev/null +++ b/website/docs/Pipelines/JointGenotyping/_category_.json @@ -0,0 +1,4 @@ +{ + "label": "JointGenotyping", + "position": 8 +} diff --git a/website/docs/Pipelines/Multiome_Pipeline/_category_.json b/website/docs/Pipelines/Multiome_Pipeline/_category_.json index 1ec6f2bad8..fddd703eab 100644 --- a/website/docs/Pipelines/Multiome_Pipeline/_category_.json +++ b/website/docs/Pipelines/Multiome_Pipeline/_category_.json @@ -1,4 +1,4 @@ { "label": "Multiome scATAC and GEX", - "position": 8 + "position": 9 } diff --git a/website/docs/Pipelines/Optimus_Pipeline/_category_.json b/website/docs/Pipelines/Optimus_Pipeline/_category_.json index ebfd0a5ec3..5fa50a9742 100644 --- a/website/docs/Pipelines/Optimus_Pipeline/_category_.json +++ b/website/docs/Pipelines/Optimus_Pipeline/_category_.json @@ -1,4 +1,4 @@ { "label": "Optimus", - "position": 9 + "position": 10 } diff --git a/website/docs/Pipelines/PairedTag_Pipeline/_category_.json b/website/docs/Pipelines/PairedTag_Pipeline/_category_.json index d7305fba0f..94672fe8d3 100644 --- a/website/docs/Pipelines/PairedTag_Pipeline/_category_.json +++ b/website/docs/Pipelines/PairedTag_Pipeline/_category_.json @@ -1,4 +1,4 @@ { "label": "Paired-Tag", - "position": 10 + "position": 11 } diff --git a/website/docs/Pipelines/RNA_with_UMIs_Pipeline/_category_.json b/website/docs/Pipelines/RNA_with_UMIs_Pipeline/_category_.json index d8cf127ef8..d17a4bd158 100644 --- a/website/docs/Pipelines/RNA_with_UMIs_Pipeline/_category_.json +++ b/website/docs/Pipelines/RNA_with_UMIs_Pipeline/_category_.json @@ -1,4 +1,4 @@ { "label": "RNA with UMIs", - "position": 11 + "position": 12 } diff --git a/website/docs/Pipelines/Single_Cell_ATAC_Seq_Pipeline/_category_.json b/website/docs/Pipelines/Single_Cell_ATAC_Seq_Pipeline/_category_.json index 2145d730e7..e6581e6a3b 100644 --- a/website/docs/Pipelines/Single_Cell_ATAC_Seq_Pipeline/_category_.json +++ b/website/docs/Pipelines/Single_Cell_ATAC_Seq_Pipeline/_category_.json @@ -1,4 +1,4 @@ { "label": "Single Cell ATAC", - "position": 12 + "position": 13 } diff --git a/website/docs/Pipelines/SlideSeq_Pipeline/_category_.json b/website/docs/Pipelines/SlideSeq_Pipeline/_category_.json index a658fab6e4..74b2f466a6 100644 --- a/website/docs/Pipelines/SlideSeq_Pipeline/_category_.json +++ b/website/docs/Pipelines/SlideSeq_Pipeline/_category_.json @@ -1,4 +1,4 @@ { "label": "Slide-seq", - "position": 13 + "position": 14 } \ No newline at end of file diff --git a/website/docs/Pipelines/Smart-seq2_Multi_Sample_Pipeline/_category_.json b/website/docs/Pipelines/Smart-seq2_Multi_Sample_Pipeline/_category_.json index a14bbb52df..c6c709a762 100644 --- a/website/docs/Pipelines/Smart-seq2_Multi_Sample_Pipeline/_category_.json +++ b/website/docs/Pipelines/Smart-seq2_Multi_Sample_Pipeline/_category_.json @@ -1,4 +1,4 @@ { "label": "Smart-seq2 Multi-Sample", - "position": 15 + "position": 16 } diff --git a/website/docs/Pipelines/Smart-seq2_Single_Nucleus_Multi_Sample_Pipeline/_category_.json b/website/docs/Pipelines/Smart-seq2_Single_Nucleus_Multi_Sample_Pipeline/_category_.json index 19995a09ed..7b7a9bf0ed 100644 --- a/website/docs/Pipelines/Smart-seq2_Single_Nucleus_Multi_Sample_Pipeline/_category_.json +++ b/website/docs/Pipelines/Smart-seq2_Single_Nucleus_Multi_Sample_Pipeline/_category_.json @@ -1,4 +1,4 @@ { "label": 
"Smart-seq2 Single Nucleus Multi-Sample", - "position": 14 + "position": 15 } diff --git a/website/docs/Pipelines/Smart-seq2_Single_Sample_Pipeline/_category_.json b/website/docs/Pipelines/Smart-seq2_Single_Sample_Pipeline/_category_.json index 75e33d384f..5c2d6a9b2a 100644 --- a/website/docs/Pipelines/Smart-seq2_Single_Sample_Pipeline/_category_.json +++ b/website/docs/Pipelines/Smart-seq2_Single_Sample_Pipeline/_category_.json @@ -1,4 +1,4 @@ { "label": "Smart-seq2 Single Sample", - "position": 16 + "position": 17 } diff --git a/website/docs/Pipelines/Ultima_Genomics_Whole_Genome_Germline_Pipeline/_category_.json b/website/docs/Pipelines/Ultima_Genomics_Whole_Genome_Germline_Pipeline/_category_.json index 010f0be5a3..edca41ff15 100644 --- a/website/docs/Pipelines/Ultima_Genomics_Whole_Genome_Germline_Pipeline/_category_.json +++ b/website/docs/Pipelines/Ultima_Genomics_Whole_Genome_Germline_Pipeline/_category_.json @@ -1,4 +1,4 @@ { "label": "Ultima Genomics Whole Genome Germline", - "position": 18 + "position": 19 } diff --git a/website/docs/Pipelines/Whole_Genome_Germline_Single_Sample_Pipeline/_category_.json b/website/docs/Pipelines/Whole_Genome_Germline_Single_Sample_Pipeline/_category_.json index 7fd28c7d80..d44ed244cb 100644 --- a/website/docs/Pipelines/Whole_Genome_Germline_Single_Sample_Pipeline/_category_.json +++ b/website/docs/Pipelines/Whole_Genome_Germline_Single_Sample_Pipeline/_category_.json @@ -1,4 +1,4 @@ { "label": "Whole Genome Germline Single Sample", - "position": 19 + "position": 20 } diff --git a/website/docs/Pipelines/snM3C/_category_.json b/website/docs/Pipelines/snM3C/_category_.json index 0aed70ddcf..e646087b04 100644 --- a/website/docs/Pipelines/snM3C/_category_.json +++ b/website/docs/Pipelines/snM3C/_category_.json @@ -1,4 +1,4 @@ { "label": "Single Nucleus Methyl-Seq and Chromatin Capture", - "position": 17 + "position": 18 } From e1c6825c91dbfd672500a8a16eaee33a5899593f Mon Sep 17 00:00:00 2001 From: rsc3 Date: Mon, 12 Feb 2024 11:32:26 -0500 Subject: [PATCH 09/68] modify awk WDL to cat directly to output file with >> instead of cat --- tasks/skylab/StarAlign.wdl | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/tasks/skylab/StarAlign.wdl b/tasks/skylab/StarAlign.wdl index 3a6717c36e..8c7fdf7f73 100644 --- a/tasks/skylab/StarAlign.wdl +++ b/tasks/skylab/StarAlign.wdl @@ -524,12 +524,12 @@ task MergeStarOutput { cat "$align_feature" >> "~{input_id}_align_features.txt" fi done - - counter=0 # note that the counter might not correspond to the shard number, it is just the order of files in bash (e.g. 10 before 2) + + # note that the counter might not correspond to the shard number, it is just the order of files in bash (e.g. 
10 before 2) + counter=0 for umipercell in "${umipercell_files[@]}"; do if [ -f "$umipercell" ]; then - awk -v var="$counter" '{print $0, var}' "$umipercell" > "$umipercell" - cat "$umipercell" >> "~{input_id}_umipercell.txt" + awk -v var="$counter" '{print $0, var}' "$umipercell" >> "~{input_id}_umipercell.txt" let counter=counter+1 fi done From 809199538dc48b399aab09ab43a7c012beffb0fd Mon Sep 17 00:00:00 2001 From: phendriksen100 <103142505+phendriksen100@users.noreply.github.com> Date: Mon, 12 Feb 2024 17:53:03 -0500 Subject: [PATCH 10/68] Ph pd 2086 replace loom output (#1204) Updating slide-seq update to h5ad --- .../skylab/slideseq/SlideSeq.changelog.md | 5 +++++ pipelines/skylab/slideseq/SlideSeq.wdl | 20 +++++++++---------- verification/VerifySlideSeq.wdl | 10 +++++----- verification/test-wdls/TestSlideSeq.wdl | 10 +++++----- 4 files changed, 24 insertions(+), 21 deletions(-) diff --git a/pipelines/skylab/slideseq/SlideSeq.changelog.md b/pipelines/skylab/slideseq/SlideSeq.changelog.md index f540bdc710..fde1b8df3d 100644 --- a/pipelines/skylab/slideseq/SlideSeq.changelog.md +++ b/pipelines/skylab/slideseq/SlideSeq.changelog.md @@ -1,3 +1,8 @@ +# 3.0.0 +2024-02-12 (Date of Last Commit) + +* Updated the SlideSeq WDL output to utilize the h5ad format in place of Loom + # 2.1.6 2024-01-30 (Date of Last Commit) diff --git a/pipelines/skylab/slideseq/SlideSeq.wdl b/pipelines/skylab/slideseq/SlideSeq.wdl index c469d7fe56..dd7c3de10f 100644 --- a/pipelines/skylab/slideseq/SlideSeq.wdl +++ b/pipelines/skylab/slideseq/SlideSeq.wdl @@ -3,7 +3,7 @@ version 1.0 import "../../../tasks/skylab/StarAlign.wdl" as StarAlign import "../../../tasks/skylab/FastqProcessing.wdl" as FastqProcessing import "../../../tasks/skylab/Metrics.wdl" as Metrics -import "../../../tasks/skylab/LoomUtils.wdl" as LoomUtils +import "../../../tasks/skylab/H5adUtils.wdl" as H5adUtils import "../../../tasks/skylab/CheckInputs.wdl" as OptimusInputChecks import "../../../tasks/skylab/MergeSortBam.wdl" as Merge @@ -23,7 +23,7 @@ import "../../../tasks/skylab/MergeSortBam.wdl" as Merge workflow SlideSeq { - String pipeline_version = "2.1.6" + String pipeline_version = "3.0.0" input { Array[File] r1_fastq @@ -50,8 +50,8 @@ workflow SlideSeq { } call StarAlign.STARGenomeRefVersion as ReferenceCheck { - input: - tar_star_reference = tar_star_reference + input: + tar_star_reference = tar_star_reference } call Metrics.FastqMetricsSlideSeq as FastqMetrics { @@ -114,7 +114,7 @@ workflow SlideSeq { input_id = input_id } if ( !count_exons ) { - call LoomUtils.OptimusLoomGeneration as SlideseqLoomGeneration{ + call H5adUtils.OptimusH5adGeneration as SlideseqH5adGeneration{ input: input_id = input_id, annotation_file = annotations_gtf, @@ -135,7 +135,7 @@ workflow SlideSeq { matrix = STARsoloFastqSlideSeq.matrix_sn_rna, input_id = input_id } - call LoomUtils.SingleNucleusOptimusLoomOutput as SlideseqLoomGenerationWithExons{ + call H5adUtils.SingleNucleusOptimusH5adOutput as OptimusH5adGenerationWithExons{ input: input_id = input_id, annotation_file = annotations_gtf, @@ -149,10 +149,9 @@ workflow SlideSeq { gene_id_exon = MergeStarOutputsExons.col_index, pipeline_version = "SlideSeq_v~{pipeline_version}" } - } - File final_loom_output = select_first([SlideseqLoomGenerationWithExons.loom_output, SlideseqLoomGeneration.loom_output]) + File final_h5ad_output = select_first([OptimusH5adGenerationWithExons.h5ad_output, SlideseqH5adGeneration.h5ad_output]) output { String pipeline_version_out = pipeline_version @@ -173,8 +172,7 @@ workflow 
SlideSeq { File fastq_reads_per_umi = FastqMetrics.numReads_perUMI - # loom - File? loom_output_file = final_loom_output - + # h5ad + File? h5ad_output_file = final_h5ad_output } } diff --git a/verification/VerifySlideSeq.wdl b/verification/VerifySlideSeq.wdl index e3a8e7e8e4..d20f991d08 100644 --- a/verification/VerifySlideSeq.wdl +++ b/verification/VerifySlideSeq.wdl @@ -5,8 +5,8 @@ import "../verification/VerifyTasks.wdl" as VerifyTasks workflow VerifySlideSeq { input { - File test_loom - File truth_loom + File test_h5ad + File truth_h5ad File test_bam File truth_bam @@ -48,10 +48,10 @@ workflow VerifySlideSeq { truth_zip = truth_umi_metrics } - call VerifyTasks.CompareLooms as CompareLooms{ + call VerifyTasks.CompareH5adFilesGEX as CompareH5adFilesOptimus { input: - test_loom = test_loom, - truth_loom = truth_loom + test_h5ad = test_h5ad, + truth_h5ad = truth_h5ad } } \ No newline at end of file diff --git a/verification/test-wdls/TestSlideSeq.wdl b/verification/test-wdls/TestSlideSeq.wdl index bb92bb610f..b63cd87099 100644 --- a/verification/test-wdls/TestSlideSeq.wdl +++ b/verification/test-wdls/TestSlideSeq.wdl @@ -57,7 +57,7 @@ workflow TestSlideSeq { SlideSeq.bam, ], # File? outputs - select_all([SlideSeq.loom_output_file]), + select_all([SlideSeq.h5ad_output_file]), ]) @@ -94,9 +94,9 @@ workflow TestSlideSeq { # This is achieved by passing each desired file/array[files] to GetValidationInputs if (!update_truth){ - call Utilities.GetValidationInputs as GetLoom { + call Utilities.GetValidationInputs as GetH5adInputs { input: - input_file = SlideSeq.loom_output_file, + input_file = SlideSeq.h5ad_output_file, results_path = results_path, truth_path = truth_path } @@ -127,8 +127,8 @@ workflow TestSlideSeq { call VerifySlideSeq.VerifySlideSeq as Verify { input: - truth_loom = GetLoom.truth_file, - test_loom = GetLoom.results_file, + truth_h5ad = GetH5adInputs.truth_file, + test_h5ad = GetH5adInputs.results_file, truth_bam = GetBam.truth_file, test_bam = GetBam.results_file, truth_gene_metrics = GetGeneMetrics.truth_file, From 103111f739f03cf6f2a57b26c0cb95a30fe80b3b Mon Sep 17 00:00:00 2001 From: rsc3 Date: Mon, 12 Feb 2024 18:09:40 -0500 Subject: [PATCH 11/68] add shard counter to summary file output --- tasks/skylab/StarAlign.wdl | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/tasks/skylab/StarAlign.wdl b/tasks/skylab/StarAlign.wdl index 8c7fdf7f73..42ad5eb46d 100644 --- a/tasks/skylab/StarAlign.wdl +++ b/tasks/skylab/StarAlign.wdl @@ -513,9 +513,11 @@ task MergeStarOutput { fi done + counter=0 for summary in "${summary_files[@]}"; do if [ -f "$summary" ]; then - cat "$summary" >> "~{input_id}_summary.txt" + awk -v var=",$counter" '{print $0 var}' "$summary" >> "~{input_id}_summary.txt" + let counter=counter+1 fi done From c0da1eb21d3802d850e8bb0d0f213de3e369e8ac Mon Sep 17 00:00:00 2001 From: rsc3 Date: Mon, 12 Feb 2024 22:26:01 -0500 Subject: [PATCH 12/68] add shard counter for align_features --- tasks/skylab/StarAlign.wdl | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/tasks/skylab/StarAlign.wdl b/tasks/skylab/StarAlign.wdl index 42ad5eb46d..e67bbf452b 100644 --- a/tasks/skylab/StarAlign.wdl +++ b/tasks/skylab/StarAlign.wdl @@ -521,9 +521,11 @@ task MergeStarOutput { fi done - for align_feature in "${align_features_files[@]}"; do + counter=0 + for align_feature in "${align_features[@]}"; do if [ -f "$align_feature" ]; then - cat "$align_feature" >> "~{input_id}_align_features.txt" + awk -v var="$counter" '{print $0 " " var}' 
"$align_feature" >> "~{input_id}_align_features.txt" + let counter=counter+1 fi done From 45710535b6f1b45876ed1df95a7f05d8163041e5 Mon Sep 17 00:00:00 2001 From: Kaylee Mathews <95316074+kayleemathews@users.noreply.github.com> Date: Tue, 13 Feb 2024 13:44:30 -0500 Subject: [PATCH 13/68] update slide-seq docs (#1205) --- .../Pipelines/Optimus_Pipeline/Loom_schema.md | 4 +- .../Pipelines/SlideSeq_Pipeline/README.md | 42 +++---- .../count-matrix-overview.md | 115 ++++++++++-------- .../SlideSeq_Pipeline/slide-seq_diagram.png | Bin 83392 -> 83512 bytes 4 files changed, 87 insertions(+), 74 deletions(-) diff --git a/website/docs/Pipelines/Optimus_Pipeline/Loom_schema.md b/website/docs/Pipelines/Optimus_Pipeline/Loom_schema.md index 7a76ffe328..5b4a8f44ed 100644 --- a/website/docs/Pipelines/Optimus_Pipeline/Loom_schema.md +++ b/website/docs/Pipelines/Optimus_Pipeline/Loom_schema.md @@ -14,7 +14,7 @@ It contains the raw, but UMI-corrected cell by gene counts, which vary depending You can determine which type of counts are in the h5ad file by looking at the unstructured metadata (the `anndata.uns` property of the matrix) `expression_data_type` key (see [Table 1](#table-1-global-attributes) below). -The matrix also contains multiple metrics for both individual cells (the `anndata.obs` property of the matrix; [Table 2](#table-2-cell-metrics) and individual genes (the `anndata.var` property of the matrix; [Table 3](#table-3-gene-metrics)). +The matrix also contains multiple metrics for both individual cells (the `anndata.obs` property of the matrix; [Table 2](#table-2-cell-metrics)) and individual genes (the `anndata.var` property of the matrix; [Table 3](#table-3-gene-metrics)). :::tip Additional Matrix Processing for Consortia Previous Loom files generated by Optimus for consortia, such as the Human Cell Atlas (HCA) or the BRAIN Initiative Cell Census Network (BICCN), may have additional processing steps. Read the [Consortia Processing Overview](consortia-processing.md#hca-data-coordination-platform-matrix-processing) for details on consortia-specific matrix changes. @@ -80,7 +80,7 @@ The global attributes (unstuctured metadata) in the h5ad apply to the whole file | `emptydrops_PValue` | [dropletUtils](https://bioconductor.org/packages/release/bioc/html/DropletUtils.html) | The Monte Carlo p-value against the null model; single-cell data will read `NA` if task is unable to detect knee point inflection. Column is not included for data run in the `sn_rna` mode | | `emptydrops_Total` | [dropletUtils](https://bioconductor.org/packages/release/bioc/html/DropletUtils.html) | The total read counts for each barcode; single-cell data will read `NA` if task is unable to detect knee point inflection. Column is not included for data run in the `sn_rna` mode. | | `reads_mapped_intergenic` | STARsolo and [TagSort](https://github.com/broadinstitute/warp-tools/tree/develop/tools/TagSort) | The number of reads counted as intergenic; counted when the BAM file's `sF` tag is assigned to a `7` and the `NH:i` tag is `1`. | -| `reads_unmapped` | [TagSort](https://github.com/broadinstitute/warp-tools/tree/develop/tools/TagSort) | The total number of reads that are unmapped; counted when the BAM file's`sF` tag is `0`. | +| `reads_unmapped` | [TagSort](https://github.com/broadinstitute/warp-tools/tree/develop/tools/TagSort) | The total number of reads that are unmapped; counted when the BAM file's `sF` tag is `0`. 
| |`reads_per_molecule`|[TagSort](https://github.com/broadinstitute/warp-tools/tree/develop/tools/TagSort)| The average number of reads associated with each molecule in the cell. | ## Table 3. Gene metrics diff --git a/website/docs/Pipelines/SlideSeq_Pipeline/README.md b/website/docs/Pipelines/SlideSeq_Pipeline/README.md index b4ce9af4e0..106538f0f7 100644 --- a/website/docs/Pipelines/SlideSeq_Pipeline/README.md +++ b/website/docs/Pipelines/SlideSeq_Pipeline/README.md @@ -7,7 +7,7 @@ slug: /Pipelines/SlideSeq_Pipeline/README | Pipeline Version | Date Updated | Documentation Author | Questions or Feedback | | :----: | :---: | :----: | :--------------: | -| [SlideSeq v1.0.1](https://github.com/broadinstitute/warp/releases) | March, 2023 | Elizabeth Kiernan & Kaylee Mathews | Please file GitHub issues in warp or contact [documentation authors](mailto:warp-pipelines-help@broadinstitute.org) | +| [SlideSeq v3.0.0](https://github.com/broadinstitute/warp/releases) | February, 2024 | Elizabeth Kiernan & Kaylee Mathews | Please file GitHub issues in warp or contact [documentation authors](mailto:warp-pipelines-help@broadinstitute.org) | ![SlideSeq_diagram](./slide-seq_diagram.png) @@ -15,7 +15,7 @@ slug: /Pipelines/SlideSeq_Pipeline/README The [Slide-seq workflow](https://github.com/broadinstitute/warp/blob/master/pipelines/skylab/slideseq/SlideSeq.wdl) is an open-source, cloud-optimized pipeline developed in collaboration with the [BRAIN Initiative Cell Census Network](https://biccn.org/) (BICCN) and the BRAIN Initiative Cell Atlas Network (BICAN). It supports the processing of spatial transcriptomic data generated with the [Slide-seq](https://www.science.org/doi/10.1126/science.aaw1219) (commercialized as [Curio Seeker](https://curiobioscience.com/product/)) assay. -Overall, the workflow corrects bead barcodes, aligns reads to the genome, generates a count matrix, calculates summary metrics for genes, barcodes, and UMIs, and returns read outputs in BAM format. +Overall, the workflow corrects bead barcodes, aligns reads to the genome, generates a count matrix, calculates summary metrics for genes, barcodes, and UMIs, returns read outputs in BAM format, and returns counts in numpy matrix and h5ad file formats. Slide-seq has been validated for analyzing mouse datasets generated with the Slide-seq assay. Learn more in the [validation section](#validation-against-on-prem-pipeline). @@ -37,7 +37,7 @@ The following table provides a quick glance at the Slide-seq pipeline features: | Transcriptomic reference annotation | M23 mouse transcriptome built with the [BuildIndices workflow](https://github.com/broadinstitute/warp/blob/master/pipelines/skylab/build_indices/BuildIndices.wdl) | GENCODE [mouse GTF](https://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_mouse/release_M23/gencode.vM23.annotation.gff3.gz); [modified version](https://console.cloud.google.com/storage/browser/_details/gcp-public-data--broad-references/mm10/v0/single_nucleus/star/modified_star_2.7.9a_primary_gencode_mouse_vM23.tar;tab=live_object) available in Broad’s public reference bucket | | Aligner and transcript quantification | STARsolo | [Kaminow et al. 
2021](https://www.biorxiv.org/content/10.1101/2021.05.05.442755v1) | | Data input file format | File format in which sequencing data is provided | [FASTQ](https://academic.oup.com/nar/article/38/6/1767/3112533) | -| Data output file format | File formats in which Slide-seq output is provided | [BAM](http://samtools.github.io/hts-specs/), Python NumPy arrays, and Loom (generated with [Loompy)](http://loompy.org/) | +| Data output file format | File formats in which Slide-seq output is provided | [BAM](http://samtools.github.io/hts-specs/), Python NumPy arrays, and h5ad | ## Set-up @@ -67,7 +67,7 @@ The Slide-seq workflow inputs are specified in JSON configuration files. Example | tar_star_reference | TAR file containing a species-specific reference genome and GTF; generated using the [BuildIndices workflow](https://github.com/broadinstitute/warp/tree/master/pipelines/skylab/build_indices/BuildIndices.wdl). | File | | annotations_gtf | GTF containing gene annotations used for gene tagging (must match GTF in STAR reference). | File | | output_bam_basename | Optional string used for the output BAM file basename. | String | -| count_exons | Optional boolean indicating if the workflow should calculate exon counts; default is set to “true” and produces a Loom file containing both whole-gene counts and exon counts in an additional layer; when set to “false”, a Loom file containing only whole-gene counts is produced. | Boolean | +| count_exons | Optional boolean indicating if the workflow should calculate exon counts; default is set to “true” and produces an h5ad file containing both whole-gene counts and exon counts in an additional layer; when set to “false”, an h5ad file containing only whole-gene counts is produced. | Boolean | | bead_locations | Whitelist TSV file containing bead barcodes and XY coordinates on a single line for each bead; determined by sequencing prior to mRNA transfer and library preparation. | File | #### Pseudogene handling @@ -84,11 +84,11 @@ The [Slide-seq workflow](https://github.com/broadinstitute/warp/blob/master/pipe Overall, the Slide-seq workflow: 1. Calculates prealignment metrics. -1. Uses sctools to filter, trim, and split reads into < 30 GB FASTQs. +1. Filters, trims, and splits reads into < 30 GB FASTQs. 1. Uses STARsolo to correct bead barcodes, align reads, and count genes. 1. Calculates metrics. 1. Merges the STAR outputs into NPY and NPZ arrays. -1. Merges gene counts and metrics into a Loom-formatted matrix. +1. Merges gene counts and metrics into a h5ad-formatted matrix. The tools each Slide-seq task employs are detailed in the table below. @@ -104,12 +104,12 @@ To see specific tool parameters, select the task WDL link in the table; then fin | [Metrics.CalculateUMIsMetrics (alias = UMIsMetrics)](https://github.com/broadinstitute/warp/blob/master/tasks/skylab/Metrics.wdl) | TagSort | [warp-tools](https://github.com/broadinstitute/warp-tools) | Sorts the BAM file by gene using the bead barcode (CB), molecule barcode (UB), and gene ID (GX) tags and computes gene metrics. | | [Metrics.CalculateCellMetrics (alias = CellMetrics)](https://github.com/broadinstitute/warp/blob/master/tasks/skylab/Metrics.wdl) | TagSort | [warp-tools](https://github.com/broadinstitute/warp-tools) | Sorts the BAM file by bead barcode (CB), molecule barcode (UB), and gene ID (GX) tags and computes bead barcode metrics. 
| | [StarAlign.MergeStarOutput (alias = MergeStarOutputsExons)](https://github.com/broadinstitute/warp/blob/master/tasks/skylab/StarAlign.wdl) | create-npz-output.py | [Python 3](https://www.python.org/) | Creates a compressed raw NPY or NPZ file containing the STARsolo output features (NPY), barcodes (NPZ) and counts (NPZ). By default, `count_exons` is true and exon counts are included in output files. When `count_exons` is false, exon counts are excluded. | -| [LoomUtils.SingleNucleusOptimusLoomOutput (alias = SlideseqLoomGenerationWithExons)](https://github.com/broadinstitute/warp/blob/master/tasks/skylab/LoomUtils.wdl) | create_loom_slide_seq.py | [Python 3](https://www.python.org/) | Merges the gene counts, bead barcode metrics, and gene metrics data into a Loom formatted bead-by-gene matrix. By default, the Loom file contains whole-gene counts with exon counts in an additional layer. When `count_exons` is false, the task is run as `SlideseqLoomGeneration` and exon counts are excluded. | +| [H5adUtils.SingleNucleusOptimusH5adOutput (alias = OptimusH5adGenerationWithExons)](https://github.com/broadinstitute/warp/blob/master/tasks/skylab/H5adUtils.wdl) | create_h5ad_optimus.py | [Python 3](https://www.python.org/) | Merges the gene counts, bead barcode metrics, and gene metrics data into an h5ad formatted bead-by-gene matrix. By default, the h5ad file contains whole-gene counts with exon counts in an additional layer. When `count_exons` is false, the task is run as `SlideseqH5adGeneration` and exon counts are excluded. | #### 1. Calculating prealignment metrics The [FastqMetricsSlideSeq](https://github.com/broadinstitute/warp/blob/master/tasks/skylab/FastqProcessing.wdl) task calculates prealignment metrics used for assessing data quality from the input FASTQ files. These metrics include the bead barcode distribution, UMI distribution, number of reads per cell and number of UMIs per cell. These metrics are included in the final outputs of the workflow. -#### 2. Filtering reads, trimming barcodes, and splitting FASTQs with sctools +#### 2. Filtering reads, trimming barcodes, and splitting FASTQs **Read filtering** @@ -121,7 +121,7 @@ Barcodes that are more than one edit distance ([Hamming distance](https://www.nc **Barcode trimming** -The task uses sctools to trim spacer sequences from bead barcodes and UMIs for use by STARsolo, which requires continuous sample barcodes without spacer sequences between them. The input `read_structure` is used to parse the barcodes and remove any bases with tags other than C or M, which represent the bead barcode and UMI, respectively. For example, with a `read_structure` of 8C18X6C9M1X, bases represented by 18X and 1X are removed from the reads and the string of bases is rewritten with the structure 14C9M. Bases represented by tags other than X will also be removed during this step, so long as they are not C or M. +The task uses warp-tools to trim spacer sequences from bead barcodes and UMIs for use by STARsolo, which requires continuous sample barcodes without spacer sequences between them. The input `read_structure` is used to parse the barcodes and remove any bases with tags other than C or M, which represent the bead barcode and UMI, respectively. For example, with a `read_structure` of 8C18X6C9M1X, bases represented by 18X and 1X are removed from the reads and the string of bases is rewritten with the structure 14C9M. Bases represented by tags other than X will also be removed during this step, so long as they are not C or M. 
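To make the `read_structure` convention concrete, here is a minimal Python sketch of the kind of parsing described above (illustrative only; the helper name and logic are assumptions, not the warp-tools implementation):

```python
import re

def trim_read(sequence: str, read_structure: str) -> str:
    """Keep only the C (bead barcode) and M (UMI) segments of a read,
    dropping spacers and all other segments, e.g. 8C18X6C9M1X -> 14C9M."""
    kept = []
    offset = 0
    for length, tag in re.findall(r"(\d+)([A-Z])", read_structure):
        length = int(length)
        if tag in ("C", "M"):    # bead barcode or UMI bases are kept
            kept.append(sequence[offset:offset + length])
        offset += length         # every other tag (e.g. X) is skipped
    return "".join(kept)

# A 42-bp read with structure 8C18X6C9M1X is rewritten as 14C9M (23 bp):
read = "ACGTACGT" + "N" * 18 + "TTGGCC" + "AAACCCGGG" + "T"
assert trim_read(read, "8C18X6C9M1X") == "ACGTACGT" + "TTGGCC" + "AAACCCGGG"
```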
**FASTQ splitting** @@ -151,40 +151,40 @@ The resulting BAM files are merged together into a single BAM using the [MergeSo **STARsolo outputs** -The task’s output includes a coordinate-sorted BAM file containing the bead barcode-corrected reads and SAM attributes UB UR UY CR CB CY NH GX GN. Additionally, after counting, the task outputs three intermediate TSV files (features, barcodes, and matrix) used for downstream Loom matrix generation. +The task’s output includes a coordinate-sorted BAM file containing the bead barcode-corrected reads and SAM attributes UB UR UY CR CB CY NH GX GN. Additionally, after counting, the task outputs three intermediate TSV files (features, barcodes, and matrix) used for downstream h5ad matrix generation. #### 4. Calculating metrics The [CalculateGeneMetrics](https://github.com/broadinstitute/warp/blob/master/tasks/skylab/Metrics.wdl), [CalculateUMIsMetrics](https://github.com/broadinstitute/warp/blob/master/tasks/skylab/Metrics.wdl), and [CalculateCellMetrics](https://github.com/broadinstitute/warp/blob/master/tasks/skylab/Metrics.wdl) tasks use [warp-tools](https://github.com/broadinstitute/warp-tools) to calculate summary metrics that help assess the per-bead and per-UMI quality of the data output each time this pipeline is run. -These metrics output from both tasks are included in the output Loom matrix. A detailed list of these metrics is found in the [Slide-seq Count Matrix Overview](./count-matrix-overview.md). +These metrics output from both tasks are included in the output h5ad matrix. A detailed list of these metrics is found in the [Slide-seq Count Matrix Overview](./count-matrix-overview.md). #### 5. Merging the STAR outputs into NPY and NPZ arrays The STARsolo output includes a features, barcodes, and matrix TSV for each of the partitioned FASTQ input files. The [MergeStarOutput task](https://github.com/broadinstitute/warp/blob/master/tasks/skylab/StarAlign.wdl) merges each respective TSV. It uses a custom python script to convert the merged matrix, features, and barcodes output from STARsolo into an NPY (features and barcodes)- and NPZ (the matrix)-formatted file. -#### 6. Merging counts and metrics data into Loom-formatted matrix +#### 6. Merging counts and metrics data into h5ad-formatted matrix -The [SingleNucleusOptimusLoomOutput](https://github.com/broadinstitute/warp/blob/master/tasks/skylab/LoomUtils.wdl) task uses a custom python script to merge the converted STARsolo count matrix and the cell (bead) and gene metrics into a Loom-formatted bead-by-gene matrix. **These counts are raw and unfiltered.** +The [SingleNucleusOptimusH5adOutput](https://github.com/broadinstitute/warp/blob/master/tasks/skylab/H5adUtils.wdl) task uses a custom python script to merge the converted STARsolo count matrix and the cell (bead) and gene metrics into an h5ad-formatted bead-by-gene matrix. **These counts are raw and unfiltered.** Read full details for all the metrics in the [Slide-seq Count Matrix Overview](./count-matrix-overview.md). **Gene counts** -The type of gene counts in the Loom will vary depending on the value of the Slide-seq workflow input, `count_exons`. By default, `count_exons` is set to true and the output Loom will contain whole-gene counts with exon counts in an additional layer. +The type of gene counts in the h5ad file will vary depending on the value of the Slide-seq workflow input, `count_exons`. By default, `count_exons` is set to true and the output h5ad file will contain whole-gene counts with exon counts in an additional layer. 
-If the workflow is run with `count_exons` set to false, the output Loom file will contain whole-gene counts. Running the workflow in this configuration will cause the Loom matrix to have fewer columns (bead barcodes) due to the difference in STARsolo counting mode.
+If the workflow is run with `count_exons` set to false, the output h5ad file will contain whole-gene counts. Running the workflow in this configuration will cause the h5ad matrix to have fewer columns (bead barcodes) due to the difference in STARsolo counting mode.

-You can determine which type of counts are in the Loom by looking at the global attribute `expression_data_type`.
+You can determine which type of counts are in the h5ad by looking at the global attribute `expression_data_type`.

-After running the pipeline with `count_exons` set to true, you can access whole-gene and exonic counts using Loompy's `layers()` method. For example, `loompy.connect.layers[“”]` will return the whole-gene counts from the output Loom file. Similarly, `loompy.connect.layers[“exon_counts”]` will return the exonic counts from the output Loom.
+After running the pipeline with `count_exons` set to true, you can access whole-gene and exonic counts using the AnnData `layers` attribute. For example, `adata.layers["exon_counts"]` will return the exonic counts from the output h5ad.

#### 7. Outputs

Output files of the pipeline include:

-1. Bead x Gene unnormalized count matrices in Loom format.
+1. Bead x Gene unnormalized count matrices in h5ad format.
2. Unfiltered, sorted BAM file with barcode and downstream analysis tags.
3. Bead metadata, including bead metrics.
4. Gene metadata, including gene metrics.

@@ -206,11 +206,9 @@ The following table lists the output files produced from the pipeline. For sampl
| fastq_umi_distribution | `.barcode_distribution_XM.txt` | Metric file containing the distribution of reads per UMI that were calculated prior to alignment. | TXT |
| fastq_reads_per_cell | `.numReads_perCell_XC.txt` | Metric file containing the number of reads per barcode that were calculated prior to alignment. | TXT |
| fastq_reads_per_umi | `.numReads_perCell_XM.txt` | Metric file containing the number of reads per UMI that were calculated prior to alignment. | TXT |
-| loom_output_file | `.loom` | Loom file containing count data and metadata. | Loom |
+| h5ad_output_file | `.h5ad` | h5ad file containing count data and metadata. | H5AD |

-The Loom matrix is the default output. See the [create_loom_slide_seq.py](https://github.com/broadinstitute/warp-tools/blob/develop/tools/scripts/create_loom_optimus.py) script for the detailed code. This matrix contains the unnormalized (unfiltered) count matrices, as well as the gene and bead barcode metrics detailed in the [Slide-seq Count Matrix Overview](./count-matrix-overview.md).
-The output Loom matrix can be converted to an H5AD file for downstream processing using a [custom script](https://github.com/broadinstitute/warp-tools/blob/develop/tools/scripts/loom_to_h5ad.py) available in the [warp-tools GitHub repository](https://github.com/broadinstitute/warp-tools).
+The h5ad matrix is the default output. This matrix contains the unnormalized (unfiltered) count matrices, as well as the gene and bead barcode metrics detailed in the [Slide-seq Count Matrix Overview](./count-matrix-overview.md). 
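For orientation, the h5ad and NPY/NPZ outputs listed above can be inspected directly in Python. The sketch below is a minimal example; the file names and the exact array-file layout are assumptions for illustration, not pipeline-guaranteed names:

```python
import anndata as ad
import numpy as np
import scipy.sparse as sp

# Final bead-by-gene matrix; counts are raw and unfiltered.
adata = ad.read_h5ad("sample_123.h5ad")            # hypothetical file name

print(adata.uns["expression_data_type"])           # "whole_transcript" or "exonic"
whole_gene_counts = adata.X                        # default counts
if "exon_counts" in adata.layers:                  # present when count_exons is true
    exon_counts = adata.layers["exon_counts"]

bead_metrics = adata.obs                           # per-bead-barcode metrics (Table 2)
gene_metrics = adata.var                           # per-gene metrics (Table 3)

# Intermediate merged STARsolo arrays from step 5 (names are illustrative).
counts = sp.load_npz("sample_123_sparse_counts.npz")
barcodes = np.load("sample_123_sparse_counts_row_index.npy")
genes = np.load("sample_123_sparse_counts_col_index.npy")
print(counts.shape, barcodes.shape, genes.shape)
```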
## Validation against on-prem pipeline

diff --git a/website/docs/Pipelines/SlideSeq_Pipeline/count-matrix-overview.md b/website/docs/Pipelines/SlideSeq_Pipeline/count-matrix-overview.md
index 751ac6f605..32a87f13d2 100644
--- a/website/docs/Pipelines/SlideSeq_Pipeline/count-matrix-overview.md
+++ b/website/docs/Pipelines/SlideSeq_Pipeline/count-matrix-overview.md
@@ -4,28 +4,30 @@ sidebar_position: 2

# Slide-seq Count Matrix Overview

-The Slide-seq pipeline's default count matrix output is a Loom file generated using [Loompy v.3.0.6](http://loompy.org/).
+:::warning
+The Loom matrix is deprecated and the default matrix is now h5ad.
+:::

-It contains the raw bead-by-gene counts, which vary depending on the workflow's `count_exons` parameter. By default, `count_exons` is set to `true` and the output Loom will contain whole-gene counts with exon counts in an additional layer.
+The Slide-seq pipeline's default count matrix output is an h5ad file generated using [AnnData](https://anndata.readthedocs.io/en/latest/index.html).

-If the workflow is run with `count_exons` set to `false`, the output Loom file will contain whole-gene counts. Running the workflow in this configuration will cause the Loom matrix to have fewer columns (bead barcodes) due to the difference in STARsolo counting mode.
+It contains the raw bead-by-gene counts, which vary depending on the workflow's `count_exons` parameter. By default, `count_exons` is set to `true` and the output h5ad file will contain whole-gene counts with exon counts in an additional layer.

-You can determine which type of counts are in the Loom by looking at the global attribute `expression_data_type` (see [Table 1](#table-1-global-attributes) below).
+If the workflow is run with `count_exons` set to `false`, the output h5ad file will contain whole-gene counts. Running the workflow in this configuration will cause the h5ad matrix to have fewer columns (bead barcodes) due to the difference in STARsolo counting mode.

-The matrix also contains multiple metrics for both individual bead barcodes (the columns of the matrix; [Table 2](#table-2-column-attributes-bead-barcode-metrics)) and individual genes (the rows of the matrix; [Table 3](#table-3-row-attributes-gene-metrics)).
+You can determine which type of counts are in the h5ad file by looking at the unstructured metadata (the `anndata.uns` property of the matrix) `expression_data_type` key (see [Table 1](#table-1-global-attributes) below).
+
+The matrix also contains multiple metrics for both individual bead barcodes (the `anndata.obs` property of the matrix; [Table 2](#table-2-column-attributes-bead-barcode-metrics)) and individual genes (the `anndata.var` property of the matrix; [Table 3](#table-3-row-attributes-gene-metrics)).

## Table 1. Global attributes

-The global attributes in the Loom apply to the whole file, not any specific part.
+The global attributes (unstructured metadata) in the h5ad apply to the whole file, not any specific part.

| Attribute | Details |
| :-------- | :------ |
-| `CreationDate` | Date the Loom file was created. |
-| `LOOM_SPEC_VERSION` | Loom file spec version used during creation of the Loom file. |
| `expression_data_type` | String describing if the pipeline counted whole transcript (exonic and intronic) or only exonic reads determined by the value of the `count_exons` parameter. By default, `count_exons` is `true` and `expression_data_type` is `whole_transcript`; if `count_exons` is `false` then `expression_data_type` is `exonic`. 
| | `input_id` | The `input_id` provided to the pipeline as input and listed in the pipeline configuration file. This can be any string, but it's recommended for this to be consistent with any sample metadata. | -| `optimus_output_schema_version` | Loom file spec version used during creation of the Loom file. | -| `pipeline_version` | Version of the Slide-seq pipeline used to generate the Loom file. | +| `optimus_output_schema_version` | h5ad file spec version used during creation of the h5ad file. | +| `pipeline_version` | Version of the Slide-seq pipeline used to generate the h5ad file. | ## Table 2. Column attributes (bead barcode metrics) @@ -33,37 +35,45 @@ The bead barcode metrics below are computed using [TagSort](https://github.com/b | Bead Barcode Metrics | Details | | :------------------- | :------ | +|`cell_names` | The unique identifier for each bead based on bead barcodes; identical to `CellID`. | | `CellID` | The unique identifier for each bead based on bead barcodes; identical to `cell_names`. | +|`n_reads`| The number of reads associated with this entity. n_reads, like all metrics, are calculated from the Slide-Seq output BAM. Prior to alignment with STARsolo, reads are checked against the whitelist (1 hamming distance). These CB-corrected reads are the input to the STAR aligner. Then, the reads also get CB correction during STAR. For this reason, almost all reads in the aligned BAM have a CB tag and UB tag. Therefore, n_reads represents CB corrected reads, not all reads in the input FASTQ files. | +|`noise_reads`| Number of reads that are categorized by 10x Genomics Cell Ranger as "noise". Refers to long polymers, or reads with high numbers of N (ambiguous) nucleotides. | +|`perfect_molecule_barcodes`| The number of reads whose molecule barcodes contain no errors. | +| `reads_mapped_exonic` | The number of unique reads counted as exon; counted when BAM file's `sF` tag is assigned to `1` or `3` and the `NH:i` tag is `1`. | +| `reads_mapped_exonic_as` | The number of reads counted as exon in the antisense direction; counted when the BAM file's `sF` tag is assigned to a `2` or `4` and the `NH:i` tag is `1`. | +| `reads_mapped_intronic` | The number of reads counted as intron; counted when the BAM file's `sF` tag is assigned to a `5` and the `NH:i` tag is `1`. | +| `reads_mapped_intronic_as` | The number of reads counted as intron in the antisense direction; counted when the BAM file's `sF` tag is assigned to a `6` and the `NH:i` tag is `1`. | +|`reads_mapped_uniquely`| The number of reads mapped to a single unambiguous location in the genome. | +|`reads_mapped_multiple`| The number of reads mapped to multiple genomic positions with equal confidence. | +| `duplicate_reads` | The number of duplicate reads. | +|`spliced_reads`| The number of reads that overlap splicing junctions. | |`antisense_reads`| The number of reads that are mapped to the antisense strand instead of the transcribed strand. | -|`cell_barcode_fraction_bases_above_30_mean`| The average fraction of base calls for the bead barcode sequences that are greater than 30, across molecules. | -|`cell_barcode_fraction_bases_above_30_variance`| The variance of the fraction of base calls for the bead barcode sequences that are greater than 30, across molecules. | -|`cell_names` | The unique identifier for each bead based on bead barcodes; identical to `CellID`. | -|`fragments_per_molecule`| The average number of fragments associated with each molecule in this entity. 
| +|`n_molecules`| Number of molecules corresponding to this entity (only reflects reads with CB and UB tags). | +|`n_fragments`| Number of fragments corresponding to this entity. | |`fragments_with_single_read_evidence`| The number of fragments associated with this entity that are observed by only one read. | +|`molecules_with_single_read_evidence`| The number of molecules associated with this entity that are observed by only one read. | +|`perfect_cell_barcodes`| The number of reads whose bead barcodes contain no errors. | +| `reads_mapped_intergenic` | The number of reads counted as intergenic; counted when the BAM file's `sF` tag is assigned to a `7` and the `NH:i` tag is `1`. | +| `reads_unmapped` | The total number of reads that are unmapped; counted when the BAM file's `sF` tag is `0`. | +|`reads_mapped_too_many_loci`| The number of reads that were mapped to too many loci across the genome and as a consequence, are reported unmapped by the aligner. | +| `n_genes` | The number of genes detected by this bead. | | `genes_detected_multiple_observations` | The number of genes that are observed by more than one read in this entity. | -| `genomic_read_quality_mean` | Average quality of base calls in the genomic reads corresponding to this entity. | -| `genomic_read_quality_variance` | Variance in quality of base calls in the genomic reads corresponding to this entity. | -| `genomic_reads_fraction_bases_quality_above_30_mean` | The average fraction of bases in the genomic read that receive quality scores greater than 30 across the reads of this entity. | -| `genomic_reads_fraction_bases_quality_above_30_variance` | The variance in the fraction of bases in the genomic read that receive quality scores greater than 30 across the reads of this entity. | -| `input_id` | The `input_id` provided to the pipeline as input and listed in the pipeline configuration file. This can be any string, but it's recommended for this to be consistent with any sample metadata. | | `molecule_barcode_fraction_bases_above_30_mean` | The average fraction of bases in molecule barcodes that receive quality scores greater than 30 across the reads of this entity. | | `molecule_barcode_fraction_bases_above_30_variance` | The variance in the fraction of bases in molecule barcodes that receive quality scores greater than 30 across the reads of this entity.| -|`molecules_with_single_read_evidence`| The number of molecules associated with this entity that are observed by only one read. | -|`n_fragments`| Number of fragments corresponding to this entity. | -| `n_genes` | The number of genes detected by this bead. | +| `genomic_reads_fraction_bases_quality_above_30_mean` | The average fraction of bases in the genomic read that receive quality scores greater than 30 across the reads of this entity. | +| `genomic_reads_fraction_bases_quality_above_30_variance` | The variance in the fraction of bases in the genomic read that receive quality scores greater than 30 across the reads of this entity. | +| `genomic_read_quality_mean` | Average quality of base calls in the genomic reads corresponding to this entity. | +| `genomic_read_quality_variance` | Variance in quality of base calls in the genomic reads corresponding to this entity. | +|`reads_per_molecule`| The average number of reads associated with each molecule in this entity. | +|`reads_per_fragment`| The average number of reads associated with each fragment in this entity. 
| +|`fragments_per_molecule`| The average number of fragments associated with each molecule in this entity. | +|`cell_barcode_fraction_bases_above_30_mean`| The average fraction of base calls for the bead barcode sequences that are greater than 30, across molecules. | +|`cell_barcode_fraction_bases_above_30_variance`| The variance of the fraction of base calls for the bead barcode sequences that are greater than 30, across molecules. | |`n_mitochondrial_genes`| The number of mitochondrial genes detected by this bead. | |`n_mitochondrial_molecules`| The number of molecules from mitochondrial genes detected for this bead. | -|`n_molecules`| Number of molecules corresponding to this entity (only reflects reads with CB and UB tags). | -|`n_reads`| The number of reads associated with this entity. n_reads, like all metrics, are calculated from the Optimus output BAM. Prior to alignment with STARsolo, reads are checked against the whitelist (1 hamming distance). These CB-corrected reads are the input to the STAR aligner. Then, the reads also get CB correction during STAR. For this reason, almost all reads in the aligned BAM have a CB tag and UB tag. Therefore, n_reads represents CB corrected reads, not all reads in the input FASTQ files. | -|`noise_reads`| Number of reads that are categorized by 10x Genomics Cell Ranger as "noise". Refers to long polymers, or reads with high numbers of N (ambiguous) nucleotides. | |`pct_mitochondrial_molecules`| The percentage of molecules from mitochondrial genes detected for this bead. | -|`perfect_cell_barcodes`| The number of reads whose bead barcodes contain no errors. | -|`perfect_molecule_barcodes`| The number of reads whose molecule barcodes contain no errors. | -|`reads_mapped_multiple`| The number of reads mapped to multiple genomic positions with equal confidence. | -|`reads_mapped_too_many_loci`| The number of reads that were mapped to too many loci across the genome and as a consequence, are reported unmapped by the aligner. | -|`reads_mapped_uniquely`| The number of reads mapped to a single unambiguous location in the genome. | -|`reads_per_fragment`| The average number of reads associated with each fragment in this entity. | -|`spliced_reads`| The number of reads that overlap splicing junctions. | +| `input_id` | The `input_id` provided to the pipeline as input and listed in the pipeline configuration file. This can be any string, but it's recommended for this to be consistent with any sample metadata. | ## Table 3. Row attributes (gene metrics) @@ -72,28 +82,33 @@ The gene metrics below are computed using [TagSort](https://github.com/broadinst | Gene Metrics | Details | | ------------ | ------- | +|`gene_names` | The unique `gene_name` provided in the [GENCODE GTF](https://www.gencodegenes.org/); identical to the `Gene` attribute. | +|`ensembl_ids` | The `gene_id` provided in the [GENCODE GTF](https://www.gencodegenes.org/). | | `Gene` | The unique `gene_name` provided in the [GENCODE GTF](https://www.gencodegenes.org/); identical to the `gene_names` attribute. | +|`n_reads`| The number of reads associated with this entity. n_reads, like all metrics, are calculated from the Slide-Seq output BAM. Prior to alignment with STARsolo, reads are checked against the whitelist (1 hamming distance). These CB-corrected reads are the input to the STAR aligner. Then, the reads also get CB correction during STAR. For this reason, almost all reads in the aligned BAM have a CB tag and UB tag. 
Therefore, n_reads represents CB corrected reads, not all reads in the input FASTQ files. | +|`noise_reads`| The number of reads that are categorized by 10x Genomics Cell Ranger as "noise". Refers to long polymers, or reads with high numbers of N (ambiguous) nucleotides. | +|`perfect_molecule_barcodes`| The number of reads with molecule barcodes that have no errors. | +| `reads_mapped_exonic` | The number of unique reads counted as exon; counted when BAM file's `sF` tag is assigned to `1` or `3` and the `NH:i` tag is `1`. | +| `reads_mapped_exonic_as` | The number of reads counted as exon in the antisense direction; counted when the BAM file's `sF` tag is assigned to a `2` or `4` and the `NH:i` tag is `1`. | +| `reads_mapped_intronic` | The number of reads counted as intron; counted when the BAM file's `sF` tag is assigned to a `5` and the `NH:i` tag is `1`. | +| `reads_mapped_intronic_as` | The number of reads counted as intron in the antisense direction; counted when the BAM file's `sF` tag is assigned to a `6` and the `NH:i` tag is `1`. | +|`reads_mapped_uniquely`| The number of reads mapped to a single unambiguous location in the genome. | +|`reads_mapped_multiple`| The number of reads mapped to multiple genomic positions with equal confidence. | +| `duplicate_reads` | The number of duplicate reads. | +|`spliced_reads`| The number of reads that overlap splicing junctions. | |`antisense_reads`| The number of reads that are mapped to the antisense strand instead of the transcribed strand. | -|`ensembl_ids` | The `gene_id` provided in the [GENCODE GTF](https://www.gencodegenes.org/). | -|`fragments_per_molecule`| The average number of fragments associated with each molecule in this entity. | -|`fragments_with_single_read_evidence`| The number of fragments associated with this entity that are observed by only one read. | -|`gene_names` | The unique `gene_name` provided in the [GENCODE GTF](https://www.gencodegenes.org/); identical to the `Gene` attribute. | -|`genomic_read_quality_mean`| Average quality of base calls in the genomic reads corresponding to this entity. | -|`genomic_read_quality_variance`| Variance in quality of base calls in the genomic reads corresponding to this entity. | -|`genomic_reads_fraction_bases_quality_above_30_mean`| The average fraction of bases in the genomic read that receive quality scores greater than 30 across the reads of this entity. | -|`genomic_reads_fraction_bases_quality_above_30_variance`| The variance in the fraction of bases in the genomic read that receive quality scores greater than 30 across the reads of this entity. | |`molecule_barcode_fraction_bases_above_30_mean`| The average fraction of bases in molecule barcodes that receive quality scores greater than 30 across the reads of this entity. | |`molecule_barcode_fraction_bases_above_30_variance`| The variance in the fraction of bases in molecule barcodes that receive quality scores greater than 30 across the reads of this entity. | -|`molecules_with_single_read_evidence`| The number of molecules associated with this entity that are observed by only one read. | -|`n_fragments`| Number of fragments corresponding to this entity. | +|`genomic_reads_fraction_bases_quality_above_30_mean`| The average fraction of bases in the genomic read that receive quality scores greater than 30 across the reads of this entity. 
| +|`genomic_reads_fraction_bases_quality_above_30_variance`| The variance in the fraction of bases in the genomic read that receive quality scores greater than 30 across the reads of this entity. | +|`genomic_read_quality_mean`| Average quality of base calls in the genomic reads corresponding to this entity. | +|`genomic_read_quality_variance`| Variance in quality of base calls in the genomic reads corresponding to this entity. | |`n_molecules`| Number of molecules corresponding to this entity (only reflects reads with CB and UB tags). | -|`n_reads`| The number of reads associated with this entity. n_reads, like all metrics, are calculated from the Optimus output BAM. Prior to alignment with STARsolo, reads are checked against the whitelist (1 hamming distance). These CB-corrected reads are the input to the STAR aligner. Then, the reads also get CB correction during STAR. For this reason, almost all reads in the aligned BAM have a CB tag and UB tag. Therefore, n_reads represents CB corrected reads, not all reads in the input FASTQ files. | -|`noise_reads`| The number of reads that are categorized by 10x Genomics Cell Ranger as "noise". Refers to long polymers, or reads with high numbers of N (ambiguous) nucleotides. | +|`n_fragments`| Number of fragments corresponding to this entity. | +|`reads_per_molecule`| The average number of reads associated with each molecule in this entity. | +|`reads_per_fragment`|The average number of reads associated with each fragment in this entity. | +|`fragments_per_molecule`| The average number of fragments associated with each molecule in this entity. | +|`fragments_with_single_read_evidence`| The number of fragments associated with this entity that are observed by only one read. | +|`molecules_with_single_read_evidence`| The number of molecules associated with this entity that are observed by only one read. | |`number_cells_detected_multiple`| The number of bead barcodes which observe more than one read of this gene. | |`number_cells_expressing`| The number of bead barcodes that detect this gene. | -|`perfect_molecule_barcodes`| The number of reads with molecule barcodes that have no errors. | -|`reads_mapped_multiple`| The number of reads mapped to multiple genomic positions with equal confidence. | -|`reads_mapped_uniquely`| The number of reads mapped to a single unambiguous location in the genome. | -|`reads_per_fragment`|The average number of reads associated with each fragment in this entity. | -|`reads_per_molecule`| The average number of reads associated with each molecule in this entity. | -|`spliced_reads`| The number of reads that overlap splicing junctions. 
|

diff --git a/website/docs/Pipelines/SlideSeq_Pipeline/slide-seq_diagram.png b/website/docs/Pipelines/SlideSeq_Pipeline/slide-seq_diagram.png
index ce70363d23aa2977fe803f9a691088c9c30ea538..385c653c2091443745cd3815737c09c95a57f388 100644
GIT binary patch
delta 32461
[base85-encoded binary delta data omitted: updated slide-seq_diagram.png]
zuf*Y;`sJP-L3jZMHkr04u~1OFeqG3r!Q~dVoi1o^oWErPuF)}{@{VybpAd-8*d`Rv z&b*5II;p;$XMVS>Aw<~cG0-rO*tZ|pvhdcqx#+%9XKbek^j@T0#rik?-={v3BLPa4 z_OJxRuc&r?fAACu&{45J$nP%0tDiZ8EhzVSIxV^Xx&XS40L@m~oRg0+Gnc8}T`BLx zmfn3Q(j-XTC}?621Les(%v!q914-#0h1|J2EiFck8c{VyJxXsM0@Zj|4&|gk`FYg% z0}JD&z|+U#>g!2CKi@@CLU>94E2aovxzU^5+e63q%)qt0CamF*4r<_{9_dLP4PQ{k zUU+9{yu1d+SQi?C%1alrKg6$X37lG&*YE8Z?=@W6nDsM_0+c#eCmR+@Jlys?X3XXp zOCC(Nf>Me1)>9|C?|Gh?oY_|LK^qM>F1os#dl_3W%{JXB7>>OXnsZT#SSSWvA#=;~ zbZrJd8*kccN!8m&Y{B4dy;^-=raN88ALM$S2UlP(*KzOSG0#y)9foqmw!hRYu7VD@{bihR#)3&9Tyn8$-pa`R=L`r}9B zv{}Kin499QoxC6~ot+_ERG$u>Qj)?V&EE2I9Z~lyG{XDkzgiwirHRR^Y(noncoB)- zqmA&@gEPDYUIKQxD+f;l{%S7etE#}vkZCVXIA}A(9S%94^US!f2BrMhnEDIRvVCd1 zsUT7)*J@0VctZt=PR>%UnKkA zdYlXFb6L_WELwm1R^Z6dTnlc**8h1-5WaI)wBhb$KXV(|&93UM*&HUw?a%*_;}H07 zdk+5>8*`f#?~2s>6TV9Ls=L#@@tA8<83O6=BCASw0uPz|`fTVv-YuLKQ~W?cnC*8C zsYrM1gUlJcNBkT__eLnHXHx+PCZmm3+gz%X>pz|Y>&7)&>fR6SJ01nEVV0B(bd$rv;u2rPYE!OzFX=qoH1c-L=cf&zhk=1(~SDxhN+ zbiY6h!Sg!tUgO~z@EnhGpp)4W?S0_+ms1-5IZ#4;R_#2#TK{l?dq+VV4QNF@d@6S{ z94E5bg?C4D4BH_Uhh_P|>xMqgKTc-3*a)29A8$4PX8I__3e<`H6k;$J6cf#|<5j_~)nfU*73Ue;nM3i#cGM5V9Kg z*-0=SM?AYZRABQeIC6d4cBtj8jp^F$A$rg+&jJtaM{I}(Ci#0nL6`VclV9fU(P|V2 zdmJFV`39_?;+f-vTf9?)*-kGeeGZO)l9$pQd79LSEIvG$Uq^Us4}oqzUWPXQ6Hbrz z;h%_gg#Vvgbk$AhSo&}5p&-f= zB*{?Z)Z@L#U&F_--|^pO4?FE?La~`Y({gGI>aWZYU$sh@;x*vL1=v)Ahk$~c23{BN z=N6x8h7T*s0CojU^hjAb8J68hBJAG1?+Z_P!dU5u+eKy9S$3bE#LkKe`OmKQZ3?sF zhc3w&p(qAz{WWn+<7ZEr%W&?#$rjOCc>ERS2l^#WT#XWQY!5~oZV}1*^~2~zvEn`P zidV()bT?>U2cuCwcQ5}m{SK$>S)2R$BIkYh@E#CC3uE4k3~|TIqJ!^uKl^Y*#mrU` z`GXaHGSVCX+?pfXog}#nZccpg{Y|D=#0CN|(ptwy>Gu^TlSkT09n66(e64Vq;I&;1 zV5rkZah%yc`wCkX4*0!`0B)mlW6xfJ2JMC`&|fd_hkTiIu7=Cf?Rkk8j>RJPB36#g zF_{HTI>pqd*Ua3%%xRsirtdhK15jr)fh5j^xTETktzR!YAIPmyJ=1nji z3mk@u7zwToxB^G_u6@*b`3bl$h#lj)taxu_f3Ch)HEOQs8da#U)yA9n8xG;%0dkOM z>nZA(%={c<0*L*VgU-F8j#HC@kEPVkX1^E}ZP@8G+T{}|z; zJ;f-|`mr;NX5@*0z+UmVT`_z$N0c6r5el|yRFq|^C1BnyZEx@g9uebSy$+&O6MqYf z_=|Z?(c!H5GV)wc*?MB*BFkXEA3ePC6vX;k36ACrRr|Em@NAC;b>ILh&fAyHxI)Z3 zTyf1M;$Y&6Q`8*tm-p77ZurhvLq=FWOUwsYFiY9UP0HngQ{B*e1Eu~%CwMn21aXn( z9r(Uwj>!=`#_w`z0Utt8xWk39VjZL+U+D2R55895?GD_2pv6brNv}1VhtAekXCs?+ z#u^QNV#s$iu3&m4U2zM?a0D4<55Y!OND11HpNg72GuMt$1K`oC#Wu&Xl(wO-lTHti zXQa}>t;h~S2k0$^d)Q$71;~()lZmcVtM-;qVE*&W?jR=rslM;=)9Q?k{Zo0%F&vs1PrK~r z7XPQ5{qUMMx$fpue%O_GuT%JL)fpM|aMp@S>~OB3(t5i^`P$*B`~+Yz76)b<;n&Yf zeZji^&^T&ibGE5ryOzL9JiTP0q7!Lgro`tyd#66`mcN znQT-8ms@gG`EaXj_{Y{FO+1MX5o`*BY>R6bcatuG2p&55#P>rWO%QbzZRM)F H4`2L$Zx7FC delta 32339 zcmce;XIRruvp;M@np7#$r1uU2(ovd7?;VsXU3xL_RSCUIQ;^=f^qy$w(xgUe3`lRG z2MOUHfA@XvbI$X;d7d}_7hJhA*`3+hot@eF>@02Zcwgi2z6jl>o1L9?aB%SQ@`{Lv zSY2Ix_39M_0{IZYN0_R9o1bXoHr0J<2^O9^pBq89#bJPU)}k6rLDALCqu{>Ijg%YD z-uJ4%YNl!N+sQ7jnvUxDP|a3lR?R1TY|V(2VpMpxB@{8-vC^;WI8P!~I}MJvwp_Vd95G5lOcLJ`83yBsj6 zmK}P){j&fdmC>k#zA=q8KUW<3cPk2pMor_)ylA4U7ofrjgnjC_iSoLTEj^>_pvf{5 zYX#|#xE@Mq7Oz_kO*9Wl~YOmZtP=9UssUWhX(cLQ@+Uf6GOTRp8jlBj!2VmHg5 z{B2AL*^L!0I_;Vrvks$+^4mPLz_wz5Lsu<;)* z0RAc7{^SZgvg8VbnKcPf2-z}%vA)O3#t&im)Oaxbi-vC6e8$GZAwv6bw>n&RNBqCU zJ}Da}@&10dq|f(|;oCQm;h=A_9yfn7;JDBdFu(1-iKDzj{X+o$UzEX2P8jq-ZZz$z zljPYU>JAM5h3H@LoG{uR-h1#(Gp~E^1)_SG^ZH8q%#z9kB2802m%Q0iM(6M1=ET;7 z@Y+t~y8uSh5Uorah%KqahzkbNn`wth*^w)h)BNRR46}`T#|tVrF}&gf(pY5+XtOnu zjd2xEaSF`f{w~D$G1#pjP=K5PHbWHi?KLRCdxTp!lAE?{Rl_zuIY64R(U-2}AgBHq z8;^1!cf1hKj^8|O727YG2FAE>VyOO#p+la}p--$JEgXr+sr)l{&=@zqyV&{^8=tSL zH$yFmx~!lc1(ZbqC9Y`hU2xs+=e1Bpy5Xn4Km^F}=YQwu6HSG%t2XS7yKSjZ0m3UyXBgfK6 z#Rtz`+H$}$g(cecU{G?_3;wIHBYW(4iQmQRj!aU3nkwXB2!CP7T(Iu+<$(JgWVq`; zq7#s|C*$S7C<_I)B1m;F1*2m+#rpuU@%b?O<_z<3FO|aeR$ASqo>I~R_cdUMh;C5$Kar?L|; 
z8^vE!!SLI0q6|-4d<@av_n>(+SQTME&W`P3d-M@PTnIDqct1!{zfMWf-t>eFHz|FJ zdsydS?wdwgF37z4;|YDh>siuZKMd-yL91RQUYXATI~W=${gY~+)%AJ5gdN*n>?t$g zJtQ()GQe3BhT8|_G>QS8WJfx;%O6d_m_-7@`ui6z2kKF8?)U2qtZK*<`})D~5Q>iW zFFhq0OD~az)49TdI^DjtE0e#qmV~r=tN+@EaL3YqQZV-|BjD2~8yRkg^SFs*l7w$l ziQKDGaTRI{MMYJj_Lw&mHOTLZ-Jka}g2O_1pFRnB{4gsmE;st0aZn1|_#mN!vefzC zN9Z)2CxQl74@Vxm7R9A&{@sO~>vO5COZ&q>=jD;=GVe`lu3=(5p)p=}@#rXtWeS{u zMmQ%S{DJeHW9RW8&g;EY=imdea=uT?LH;9L+|Kc*EFe7vo86(arH!P#qsg_4+?KOr zdn*1Hk?wVmi&Nsc2RcvXaW?i#qxJG=DNa^aedl~@HI$cm6z|2|t|=O$EN0sPAhhEk zL17{E;%H@Q*z;Uxxt%!nE|`pObk(!9xFD%a80Zu}w@G{V{x;6!k83Z54mfsWotO4g zyQXds{nK!Gf3++1i+FAzVm;@Mo(OJGF5ek9u@#rT%NjTEa8{Fk7lv<)v$_9#mX45Q zlPi40k<-_M&&&Sy4%=23$2r#*O6dGajFlZyaVy|h=N8Wcas{CLks}S3iGW!PXZ6o< zqu|4RR@#x;nS9T%#8GjS%A?1JSHoRX4Ia54d&D+!9_TURMqwPL_1O%MFC7>i_{@uyio|h!t-ab{P#Vd=)!MXs zKkV}ClAC~;DuQ{KVIsy}CVKNmxa|3}UI4J@*%-b!2~F9f1_0aF>NurBkGq;Eavqn4 zN0W-wKTM8L#6Xyqu7T6j@aaES@Kzi15QaLNCCKToi7O7s@kZAz+yq{({}n9rx6j9w z0+GW@L6sv#z_JOaGjCv;w$xVG0#{5Om;Sww3e_L{c%h< zxa=lXnrXxQ)v3()>z#vUKV7rd18sh0oDv(sI3?KrDq+^UcL_Jd+S(p0Uu9nV2rdl+ zdm3xdYn&OiAGS47Fvb2|92)zj5A~(rc4 zQZRjqXCR|F$v5w1E@WSa&hB9wcRcj?u6qIrn5%mdvRhmFIz4JO*K-8-7JF*;7NCc= z9&XxXb^3X?@!y(YqrB)UFFKsZ0N}?SwiCXX>%Rj@+`$DfhQP2k%V+!YKi|_J77=o+ zoxdh}y2xaI5=x!WoiCi1j9s6U(JLdrC0Mjx>efQ#=R~jB!DJ2Io%zpyW4=@z)~E*` z!0bG|+Pw1f&7@9m*W1#N2Ar8E6uiDBIqgKKpquCU>)R37NYsJl@aT%aqe8(=aqCI4Su z3k-W+Z$Q|DGWW)>1Nfp%)Y@fGH9qW?m|N=cI`{m~fJZ=_sL< za84rhCUWBrnCua`!cW@&KSbdFYkyxbAU18~uI)wJgH+NXiCPJN+`WCcg+*~JzgFy~ z!o@b+s+{#PBFR2EeHk=a#ZH9WdypO9I}JWe3L1r4%5~f+(MN78;5r)@U-)2F$UK0A!@;RCz&AC? zVK=h#LO>8>&xIAj&;HL!a|oQpe`_Z;s%CH6wkeO<)@ zYZsQu@>d(%PMkX9y2mz)`Fg1KP&@}YrcEO zia~y1*0kvhzy%VCTGtD(RJ?dgR zXO5%%W!1d+Fy<@vm6ky!OfiF`&eWM!-MJ_C2Zlos^YC;RjH=?ml#L(Hs?-NxoarPG z=lN`tngJuJtzGjd!)51VWaCrxKst5$PL3eu!tp%kuE?KJi`o(}cUQ>ym}g&w=zrWZ=A*GALVlqbP1;ii#{r(OdPI zq%jE!t6?-JL>4+bex*JteqVdr)~Y{86#r^CrJm;0KPlv#`*>iVY2F+%a#^J1hQ zK;ifD9(9PA^Uog&Q)IECzK`0b-W^7{|1NJq*#;oia=QjzzMOK};hYrKqv%ZMdn2O~t%rsfRfZ08TVOf$seAsw@jwh|PsZ{Yi zEYoAs?yC-Unhb=Jp80b_)wk$;f++xigu3CT?s1LTvp`s0i|MZ9Rt^t89WZGLR z;_c=qU*4z?nkl2eyz)R_4dij3P8nG7L6x-H@!e}M>YEh{CSD26VCaox*6u$cKQtq- z3LD}-59JItF3MaI8=w2}Dxw6qP=9*x9DNmb{rmG)^7!2Df)xws%YTj78Xda7Yj49q z3uDen_ylT?qk)7t$35Wpb5Hywuuhvqla1r!$bgzik3UmFziJx21d1j6=wbX}Dk z=QDTo(Q9eMRuQ`!aFjaLq;w9Y0*O$jYZEoT(RPh?4?qk>m>t z^necsrZFzPKkMaIc!}L;C6OHCWNP*MDgMewd8EcXy&vZ@QAzCsNhQ;lcGa!lKuE(P z4>8H)#EWyl<#oYUsP0|D2ox)3O-@2CGAfqFe*M0f>+RAAgp`e4gEjJe)qgVnu*$gK zv`{LZKB7N9YgZl3Ub@`Hvs@G=S%~~KHmx@viSln9vFVa(NC|{FfyV<+4CLKAIg-hc zx7p~lsOYG!b=d3(o=cqdD0k_}asq=Z$CmTR+bgz;$Ye zw9V+4S3zqPFMPg4sGyG@b{Xt^qzF5T9dt~|nIyiB(dT@V8)z3htV>|zu!6@XWBC$- zI!*pm^aKbB(MsR90bWliuHDZJmH+a4X6to%WIW^NT$_aA6Cqu6YvGp%hje3!>O)z& zMpsJ!xkBHMI+f|m=k6zuIQw#vGI;6jRb7+%bgXl8trB0+_Jy5=gi^qeP?A4v(~n~D zXBanQ>?JIm8s=9WnnpNR57TVW5~~RhABgy}j<38sJl4_p790~le>V;++%ppz@-im6 z_1;fjW6;{`p;b-kdpED0Msn1=I+?0ooY75*1A3|#iv%QxGKYE!32BsJwjKn)@_hzdvk~Y|F2I$`;k#ogT7ka(g&+ zQj$fvm&*MpSWf?;q5^%+kNlL-%G#t40XjmM<&PwZ>oX|ca$gyG-9RTWitH12^rpj; zIAB*}nUMGK;TAZ49;}+sQmCGA&n23cn$+QjO@PJ$nQ&5CPQ|n2)PZ7@VQ{!omLhfi z;RK7hcslKd3-bI=WqrPSD?yo|0ghuM%|7ks1r`&=d>clb72( z*DcQHo;={&6f%UjJef)SF!!Tz`MQV0w3>ix6BCdSOp<$vbrq}W?Nb~f%-}dQ+7LL3H}h+J=C)S6La#eFchSVw*0VGk z=ff;jzl{$G|8V)MNl?sF(ff*pXJ4cnVa${Fm937P3x~(d0XYR{Zh1}+Q8#m*{_TCy zybihhjfa+Rk?D-~`2#i6R{ z#6P;30JgUGUNs@VoqV}X+o+uI1dTs-zp26T7jHi>a#zgNw-R$J&8h;7;w=pio8k{n zJY;S>B&BP{#RMHAWwaN?#L!n=;Uk@I?@%i^l;d#_3+8iqBr1>5N@Zg!F7Ip%Xu8(M zo=i1I7GE5CKhzK;a(wVNF2>+aHfKj)*kk}*VBaJva0kM2Hg`NK74ow>$1&nIt^sbb 
zGQ(px<=(Wn-zSaE++q4mJM%rq!@>OUEy~v}2?k`WTBG(DTf5Jic|)%;8+u>Eb43l;`liydSmt&$Hv zNyO{gH>VY_tC_r-ZrM#prkbe0C`s-(np3>PjZhI*<~ldbr!n%IE z!AXuSRYH?v?fzR1IW`hj2j3Xq5%rp@wCbPLD77EI&J_(@E2SxYacV%0#Q`v#cVGxA z0^l}sA;F~g!LgNd!15j;bUy^du3sgp{wET3A*XG>!|zXuJh=03nE=w(7VX~3sFIg9^la!4D%9ee9@4Dcg}V9KVs5$VZMtmF2?uw4LiBM z9dXpPUN^gxcq7z}c50=b(u@us)5DV13G*%?%DDkqAiI9F3$n50Plu=}nn`j)$AR}B zPE{Ac$;D;sIU%!R80PA@OiT0OThulBY85dA3B5ueUT!E731C8s*4h&#oOJ@;+C%F%8A5Eb7FMl+y;TNnRO3^Mk{EKS$G{(N8(MD=Pb&DNuYs= z+kZEZ7egH-to*XuR*b9s9t?VUZrgc(l%GoDL1N&en-IcCD?yxhXA@a=H}KUSK4X^pMB4HR%M=&yfi5&*}YzAY-5Xt02G6S z7O;8klu*ILW=hCRf!V#8u68e`W?IkB)rzC2^9>5$8Z)Q2eq%C0~wrK@Q;lPUEYMI@3bMbGkt`0KDck2Q58bFb-84$RRkK$|D9pI9_N)WX_znvVd$F z9ICwQhMD7@i;YG>p~4G3q_B;dp&wq+8H47lw#w+`t_QbxlV!i=K+{W-6Hh2_W;@^W zS}N~5jbN@Hn;y5?V#@1sg*P$tj3AJcr#0=cI>X3R@Ai-^IooHlMj!kEIjs7i=~V8L z#&?AzFIANk16ZI;CBe}r;iL@Zl9+*3M9R(q>h!o>nT8muW4U(SDq!LJd*UVGYEeT0 zO4X&c{FXNfvejw$;IXNjjIK3%6DRg{Z@<&5)9k}-U)|EuOst#Z-pf4!eh#UFIGL?8 zW#&9+e+%?_b5w9NTs;X0d!~7P3Hy#Jg+ULWxg#xGZcNgH{^b6|M!^puOCcwl6IW>` zP^=9j691`|N>AH^M{sh5>>^vCcRb0d3fbQ4J|pY;NfExa`Ylq(g~nI{Rnj@uRO$d- za464>@y14+!RNkmZgv#t*G~`obpAvo$8+%$lY!_%ZikxrCKPbn*d@u#@}Z#adL}|Nv$+O)M3SC*}f6X zH8Z|vGUg^))%Xo;UB+v`AGE6U!E*S5lvC@)_L-n^f-Ph;5L*hXn|ZjL8Yp6EjNO3x z!U2vK!K%QO=W=dU>K<~F$)la@PEuJ40f%JG@g3PHqHVEUD zr{X1MBST3c2 z&Un6AT1cus`X%Gts>W9_s2;S=TVc^``ZwL`oVBTr zs%z=M^c1`%0tf2OG$XQyg<2WN5R01I7rvjn+EH$^g675Uf@#-VoqXT-6@%IY><+ir zcZzcr%S4tsQI-a@Vy3;u4i(9eazb9D)Q87)kXbo)o*J(eL3W-cJ;Vg3<7oSdpPua< zUbGYakS*Vf8c(7iRpYV=M?urtA7+mU4O;;fbw4kzq1u$Rtt;EnG}OZJ#$(o^g(ounjwpX!{>&>ij?2XNqSyV08lnjW25zr`ME6 z)L>Gv0sCRGbcY{4Bc{*|)|Lm)dJJ!#b#Ra0qsuTI81t5=xhRb2UA4&YzXxBH;t~Y7 z)2o_Q(7B`@`tJOkAe+VGH$rn`J%J|FHF8zTx~2Vv$yNijF9B-1)n@#%t#D`!?DAAmzGA@JibNmgVBT;ONh7Z*-AAH&T~K_;J6=bWpqye1Vm~>V~LxD z#n;+YaDuI%A_YfF?`Wb6>FTH{yjeb}{vC*3O8>)2>^Ond-C77xZchkvU6*mr!-<=7fyJyAP2m4mOLgb$>jZ zaXtqmN++wOC8i^3hmML%!Hy<|8${5(3NZ;9NDn2Wagx-RgggdunAg;31R$tMB+R?) 
z@PQIp6~3dt3?AyZJ@?jq+CY*btnyWBbymHX&1Vvs5#5sdjvoCjBM%nWm@iznLrx4A z#eS3vl85DM>C_7vqCF}f{FoOwOt10Ev#92cu!T!7s(U56l18q(} z{I-tQ?A9zhUE`mNk~gSYn&zdAIqhRDBT@-{hcbQ@$Grc=iO9CPW;R1TaeUE0Q~<)o z3z}R1<{Z5;GqJ@F*u}CvG5YyPP(jMpEr&~WWh6@EI~&Ne6hCfz&dvC(aT|Ll1>WqQ zI^Um^*cLAhG=_A2`F+Tl45q?PI;nxy@x82|~!Cyqho2u^NN4-Px+s1FUQ?HrAdk1;SF}1R8I) zC1HjdwObsr;(?-YT8OhrBJcBWQXE;V7N6YE9y8CEwcw}(%lzrkJAHGdo&q?C;J47u zpwE=J&#cy4&p3SvOu3U?b9PF>4{b2>2-Tj4u#ZDwwwmNm3FF5j6k$uL^>TBSf`+wm z=&bTPBh(6tz#WoyvneV_jZRI#6t(RkN5Jnn`tPY4nfe*EA%m*|zE)D-f69>jG4d>Z z$itI_y1xd6dTN`Qz6Ay;_dm{cOcqvPY6unoUZ}vkA|?WbuJ^)EW5t?J)y2OfT2UK) zJgz#wIz0I)!ls%Qt&nk?AP*7s=`WVGRnT~*-+h6=IfnnRb7!cMcdGT8t#HCfQMCCR zm-RZ0xxSZ)Fyp6lDp!UnePRfIj#}tG)|sWG>FZ_`qI7*IV_wgpALUs3v*8Qj?=J5E zi5W=1q$uck?Auh5Urg_tsdud|NchDrpV)dy>SJ59OsmI$s8gJmCN7`<{H8r9PcG*q zW1kTCHA1$<5hbV-+(9W~i}rZwlReh#M}i1q%Nid!_`~C?Oz}mOU$A4lK*vZ{rx_w& zIeap<Tda&g)rAnKRUs7Q`P!V@~fQtJq+%jl-3k zSC>15FMK{I?2_nH1$Er>cD_Wy!IdB8Ut60-E}U#m$sysZZwf@h(x%H#z1T2kj7R{^ zpgn$EsDdu=9vh{GFtvH7jTjg7wM9F}fyc#Da^3SkKZ-c`X#ZnkM4_ZW2VWyctH8?N z0-b}$eZToYvMi5QI1|;LX(VvHj{s_^uh2W$)iM-oJdPP9qo zB(Dj4()7zt2QM=2hMwWILlsHD1LM-{=jbq3WIC)&sF2C&o_NbUR0U!3{`vxh(fo-@ zrhwE3GVI`J7Kd)htYR$yjoo z03k8_CUBdd6N3gFjml1xzWS+44CNvvO|=eNn`9l`sYo3v5AnN)WZ*X37HSi6W$17_ z^l#W@`QkZH6AyLv$L0av_Uq7XtZ=pmHhtId7EezPhhE zFlhSE; z4djql6qIZ40m`sPmZ#5fRy9Fndwq&evP;GRuP_eTYtjrbfon?HUot zGbws9gzFGNM&+tLe7xMatY3rXoSiJSHgP2oA;$J1(F|50-J~K{USA#+2@x7(oSPG$ zzT#wfd2}3)YG+90QoCYQRzG8CEi^|CvskJ3N(oF3ed8a0aCa~!2(OTRoSVM_ z376>y+6xx$zgIyE8k?y{q8PTGD^X=`cx0S#6l?vchW59{sl9c| zMRm`8>CSw}*qH|77VJ-o4KtS+oI60Q3?%OlVtzgiSPU-agUOPIC46s0(;-U2KVO|% zE9zhNc=#8(zw`^H`+0#1y;kOL3wbQQ?K(RR_}gnHsIv)iXrA%c7Z&%89#JBtPKt6@ z*z{T#{@ncR#Nn!5XrKKvSCGGD`lZ2H_q}jlGa^Nd95(jH@oHg?f{bm(`=s z0kRL!6jaXdArDJAu7c+WEv%rWA4`Xek2ev|!F3(Yv*sNWCd<-4kGv0lUTZT^ruN;c z`>r{|4XfLUIy18-L^!2HV|!cUe5|*Jn#_?qjfQgk>zni8<~za_C^xE6sYrbE-p+1v zf1UyWsspZ(r6tGq{BwtVVYdSDVH&SqRFBK2`Gy z9*1%JtLO5oy)O3s_y^t;>_3&n-c|9}`lqY3t_4=ywf+PvGa zGQUnOWPZQ!uosrG6Z^6syyfCY3)kn@CRQX*xb;L6Uu1K|qssJMkj!r;iJVAadz7v9 zV~5%($J9f-6;^>aW7xW4A9jeBWCm&aMxh9)%nEvlUco;4C(;-x z+)rQF7Q-}veJZ^h#W`T$T2+?6=}uMlR!&!t+O>iK=2fX90|`mgEJ+!PD*tVG^I=Wl z!-Ticku=;fKd z(P{G4NYQsf)~2t+ICMPr&8&*Xk`|`ot|B?2LV<0<3BA@Y^bZp-{nhWCK(#g&4|W&RQ*wy5iLguGwIW7CBY?SVo(c4 zN>fb5X753}GDi>ayDV5`t79{gocN8BmbdkcHER{AO?JP0nIknCRnE12w{1xWe~6CE zYAB)-)|UbKHa167+-(?9P^xL`q2ZJTqkj$v&6C<%O1&AN87}BIo3U)PLn#z7fH9iZ#ed zc`giE8SAl30@gnrVs}2(Pqc7@sJ|mhP=q@(qximYP0Y(9bPOJEyrN6=!U}A>&2(5W zj_#OSR5r%1=<8YGUOqPF@S;u^(g6-10`g>N)h`6ba%Rh*pON#)e(EIftFz&joPYLaY z11rnkq2h4Gnn>Y)pio5;u&n?KhyW1%S9=zW-Oc|1qq?_|w8R8BS@-?|t6}s6P_zGr zNvD+1eYapyDB9b|IudZ+U6?Sg+7%3a@DJenh&-(R?Z0LIPm#c1`1Sv?%>O0w|B{~n zOXmNWTHyaNw?GuEb>72mU9+ENWKT@0M~2R~wz-J(c-EHSzgb{%2oZZCI-9!Eij61_ zc&<1t*w&o|^RGD$JL3w;65q%(dk{-$ zn*Hv$1fYqNpjis)laSu572-}Iz>F6pZ?9RkBCC5K1kOQQS-6J_=b84F?4hozqQZIWYtRm6jdhdE>S7~dEE8ek;tsi zH-`GKp}8RJRV))kt;BvQ1h#_9xc_bfyLMDi#Frsqld`_T6NzkdH1;7FG5><0lM<5m z9bc9*)-Hv!v3AIUCb&u(5X79sKeb`f6sSASi(8~wyXDbC`-b^PCAhbbd&%l&@pabB zim%y-5Cslch9>(qH zJ_F>;YKNh^k?77l1d$4X84)7DIumOowNhIpEGb{K7pXPfzEBHnlx+vM9L%~MdA3cQ zWRWyyb?itu#G4DGfU_?w=C$+cQi1Prjr%PY(=YN{kZmnJ*C`U9PULqEbCT8>s{m14 z@j_0p@k}zLxH`4JA?J_#vw_bO1=6YyIoI%04ZVA#g=Y)rWLAO*kEIi5Ximd?3edLI z--;~XTZAB~-FX0zZDi)QAWbPfX#+7dRU5s1Hx2Q%niVrXpAQhExhns0VYPgy{7~}ISKG_wzooV!sV!MT%AWs7)En1k(11A=fglH?6yvdu5;A!yhnryzsYyQ%k>?Mm2XCy;gU&4pEEpVR=?(7NU-M z5QB!~5hzBl6$$8aYFaLrR?uTZoClm>c`~7!pkIASb#hCEwQ3%Je4e8#=++9>YBx#% z_gOk*=Zc)mkKQI!kEElkN?>+M<+Eve4f%I`bRL>H9iU~I9I=%q@p#eeC#JW>NJ5iUNn%16jG3}+}pCQ7&g!gr( 
z_q7FCIqGh68u^}FjNVX>);4Fgc^scQs(CZedoOGatJ{dXb!3Wm#iH7Iz-dF0-uLp( zN=My33R1`En`h?g1HGs|PfpQ5b2h!3TZ4A&T43AEi|sf5C1-De0OP(% z7Sti{6F`W~ruK?%L2l7eA{mHxz%h546EVPt*RN7 zF+Q?$PQcU%pClzDokk%rXkpFxHJ5x15qIdwNOGby`~4m7m}15}3WdP#KKSO0g*dEC zOn=!mXobpcqEe2JB@LB5g?&87@jcU4~X2?&`f2ItwlzjlI8!p?u+`zD9vj%;PH znRN47y2^8^an`6vk&bn!cb-_)sh(PoNZOCmO>xl0Q2Lt*fPB?#0)$cW`vhxjeN zu-hXbW*T3s(EiQm9`%xZJkwOCi7p)VPGMRf;h*+7+dq!FqPG}SY@UhEMCuMG5-MsF z1Bw#tRdoKhm=MP&M#{vBzb+RY-g(}BJymr!U1qqW1;78ncdY!YS)uMb=8X@-lhhAV zOr)e^>T(>ZZ{QAinh}|DLi6j}Ua@0ne|B=gN5X$-VhJKzW>4ZVVwy|5gN5(F)RV;% zG_qC5w@!{6Xof@BR9;$9>fUo2e8Lu0%}H`?+;JTcSS=A1w*=6 zUVUHY5V9@Xywl_OJuc3vpKC!o;(&9dW*C1+pu$)^?c~s-KMiFlE*8wD@gZ5qP!~WO zplNI^AD9zI)OnFzeDEoH75L}g7zEr65Iuf-euG&wD)7L?T z<<ztM_@LTHFDN(D<$|EC@WulC2F~y!J8PpJ2v?PW*WvBH!KFNh{(&*UuMxs z*MoF8H-6l`vpZ?(F*o4`L8T_^9NlQ53}Z}4lj86%%FUNtpptriT0swCAOA}8Z~sC2 zkT&$U6g|@fe{$|H84o6Zey-d!uv3FpLh*b_QeK}|Skkwj#!Z?AfVOpiO<8-JugaSJ z44i&j!M9UTNJY{6Yy>yv4LsFN84`+|eAn!?QrdMwdr4t~EFJm7vH43yJX~Tnw@7Fl z`h>BRl*!K3sNT`hM!`ZUgL<6E@+BL{Ud?GM(yfX*s<~V^Rq)|v;Wn5&EIVebwF=BU zsQxueP<`h9J>(_e`no88FUFvtNcP!9r4-dw4grD&?=9&4MTfb`&A-wAY0#f~9`Bl% zt*?oBUPQfUt0&q7pRXyU=IO=f*-m&Zy%T^LF~4Ce9EvGmLNp$*sN@Fs)I;EPY^B>L{q$Jcnkcm0)U>=~kxF-g zfBL>P9%z1sE6?2a`5EAtNzh(xaXwAyKapt@RvC7xQ&eyOxsWs<<|eF?%1DCWLkinZ z)%!VwvQpI$6$-W}GpSjt<9nc&taEZZ6a5aErRKlWsfO2t`OH5^82iwCC*-3AF_bs< zG#!>1o)mBrzFLY}bF%PuEWF=}i9W$IDRz^o1mHJkEZ82+3*qB-^F`NByGcWQ6pen8$zmKFr)6{;c10rcNuc<>O7{N$NL~Rv%CM#93WPIxi#o2S$NkZRGAC;i zZ~r5P0QvwH_z_OvV${!KbQ>KvAirh(rIv9lQ?#syk#4eJL)GQ&Z~AWT50M7AM{R$L zfc_?0H<7feXo?;5N&obZq*qkJ+1Go29+5*<8-I{&j&ciB51WRm2)IW+5>W*PhY6*E zdoFMQ{(osk0Of&Y+AGAi)=oX!^@;B9V2n3{kSa8}h2#$@#S>a&)+ID_Tlb;Kl+l{s zWx02D=3DRnTfwlGU}0N&Wr_<2zktJ)KtYi7qvA3IvteNM@6q;7Bn$XJfC55{+_;A` z&o)%zyQ&|&92$Fa5}`KS}VT&1Q!ZWz30>3-qICgqkMn`$>_b3EWJ zE>&vWxA@Lamvpgw{UP)$l*rbe-HtIuPyR#Sais&CqL?oAJ$2gPL z?e~mAgqMH$oC&0#zC}=EECOzO);XS+dq~Q6Zye^P*6b~eUjR2hSuR( zP+`Gz-xBP`05@(t3DZz{VuYkkV76V~TVxDwpGD3WOqN46v;#$elsxSa0{(p7fmLWo zF7>#nS47I@XZvG5w`rmPe@`k++^y@%cU z_D}Qw!rYi~&+k?L3#%j3BGpDi*K;Dfk9N)?nA-wI18!fHiMEq4N*%OV{JXxhwZWn6yKvV zZy|4|_E2KQ50}$gb&tXr4zSDU%g1llao6w*9?iX%Gc~2OA=&`_pt}j$AF3Q~)gpfp za`;vAv&W@PxbX8T0*jXew8BE???W0_@qE0YQ=+!$zLI56e=7(pJXNO5w0i(r!OII0 z{!k=(r&@v?)3WnM3|UJX3<t&U;p}>a&J{&CbcNKihV6^GWeD#qB z8w3bv0Xqs>ppHpw7MXe~4I{fakH5&Mc`CqEO5$<2raZg#BdGt;oc@8xRy5};Y7#D+ z{ha|zKXY~WshZ}sUos|)ZJ($i!pm>Qq#oeX`x26=OcS0`9*tZpD~L}C^#nXja8v0o z6!UKA9U=7Wu@~M!YNqh)P!=6h$~@}#n8#LQqH!Z&aUxSmhn&(P3CT3p1e>|jY$50c z+~Mz&V>ky-NodR;4IR)khKv^oS{0x*+$%*OU;oX+IX$Yfb zXyMfsBEB7#_as97Il9{M*jU~7)|0;&E%QglERy z{25SmAM8L8A+cJPjvqdnkuPexfDS@PXyY;Q>XQE!%%ZFFdDZk7Bwdu71{1Qp@O zSUiX!xjO|j6&^WdvT0GX;#^R~F^?$c6@yxoTWdRU--|uP*9V zBFRw!6X#J1h^9=^M+rd;$GzT_a$O5_t`_dOb{+1?HE!q8kQ$&#cX|!8RM0E3y5ql- zbF6I=>V{2|;u6Ixzvh>$P@^sCf7|K;3n{{ietmqd-v>`_9pAm$8xrjNvs_RMM0mwr z5?MGtN1wJ8PlgM%_V(XHc8U>_W1s)CYn_oSVlEbFUN0F+{+42G|6?UMJVL5MXSrBJ zM+qv;=I|78#Q8Ktw;jTbE8ek#Sa^-5#I=bz2rug3?y_fQ`nJo@>WK0qqb~==z8$IR zCsRT=*(Lka@R7E-e?{?82|$}aj@H6G?;9SBo@x=Krek%oM;9B>ak$h6Jm4H2)(f#* zrhSykX@~3-x86vfF8Lh%E@j)bf?eYO^!DCiQ8i1y=pd3MXGsc@1te!sGAJN9Gb$jG zksJn=l0>3_fRaWfNtPTW=pb1@GDvce9ET(W3^Qk;d%xei@4fpu&;HK&?msYVt&Y{z z)z#Hqzhc`J3b}awP3vnTgxOZh;Tu~c4~0ZE@h~eR-%w&@5Geo@G1Z<_uO+x(McpqZ zrc{wGeJ%sEp?7mB6eLtLedC1FjBOZ!kYF{oX4NH?5&bcg$*J2nvEa4IcZ__swNe&* z86ot3YO}%UvdGCis#f~vg%t&nJL2=hV?uAGv zKjzHLcbj?RR{QF*nkMYzK|8qlAd7Op&D)H8J_pB0<&P2Pg6FQ+PK%xeU;T&t{N1_J zA10p;`JBgUrF7Met#U$!$9c8;?%ig6m_CW&)t4feWE<*s$ybqE7`i@`1pQgXqZqi< z%hc>RCvrDvm092nC^3)cD&87F6qcVUAQSW&4nFp!CtUE*K^s4KJ^kd|p8M${Vi0l5 
zez>{PXFY*le+;w*^Oh%DB?w{DYraxly1)P0!8cpZwwQa}OR)|kI2V@Hvy9vDP}vW? zRiOrO^M0{>#4m1OM3UH-5|oWK-p|};`lo<*L;@gt?K8fSIk@Jga)nTcxPa{G*E_Q! zn?!$P>=n#)Ry7_s0n8^mr+O>4aj&yA)1gucc@059GBV_P(*Sxk$K2Hw9 z2`jc-yvM&l7eUa4{`VZ=Op&5*Ra}wpYy2fhwzVPL!2Cxr+(ZNC?1#AhmE`aB-9lgd zd)_Os3;$RZe-AO065jcl(NmHLcw~5Y-UZTmUukI9U+dg?bBdlI^b>b5YDLmC>S5*f6Avyi*QVRTjA;3T>gEpb=56 zWS$;rko%Y9C8%6LDqf9)V2|y2S^)oB*xRy4@98}>D!=0+GH` znvYD8ZzlARy`|fn(wh*;(R1wlj`h*kW{jWX|08kXwscN_cUfuk#B}*aOhqax(@W`O z^8?@a>ATK4}Zwo&ZX(d^bAK_6L0WE}tc3)<3y8+&iIA0n5 zA%A)e(9Q1iP8vyF3eRv`jRh$A#h_Owq@~hOjG|W%_P`E;`vlXvCPx; zS0w*BZcvICQ~y%*h?k*cfEBi9B}Lg|&uixJ?=#8p7NLG0^B;x-{&4tz!g1G#;eRiU zCPM%3oh3o={+%=zk+gr&8YFcX>N)f(hVC!n_yvdr{aGXy2ddrcPp|E$(UF`J<+n@@ z+i-q2Dj#823&KnMo@F=0*0?Jcl>6kVkN3h(u{Lwh`TkiD(`&W54JF)CPj>O%hT5`Wr*_xdHZz46eK-xHC*-caL(fjWpIaX#N) zXfxl^p#QHTE;5_^yG+1;E7$))rkKLz60SYHJNEzQR;A=!u9q9_8?w zt%5)r`5#_&aM~8uFp0xo7PdX&`CFMpL_ z#f_2KL;-vW{4U`eXK=E=3h6)!SZn}T0#cS!t|aiOp(slmXKoBp1SJi7u1;M>As>D9 zF4;elXi9)?VzS?5#da@|t8{>rCmbB=uFHT^z3=X<=zfm?M>o7HdbLzz)S6=@I|l!U zs93QlxSIUX1pW7JvWWv<<*A9D0MS&OxuoFn`<)!FchwmnV$Oc^h$oeT=AZFX&`@)f z{D)H;=oUuAqYU;eqmT(F%JRF4aJoV2A&Y+RINYmv|L*r7ONG-Vp1=8=dimn;X&%2#9CH9*sj6mp=pWXlZv_uEO%qs)9R;LgG}RPK_Zu+j9XS@24=! z*txycxo}mfscep4{!)rTua4%@7>V0TXJ+OJdK-Zk$F7RsU3kts8#>!s+QO0=vX8K7 z48(1D&2KlB+fPGKky5@pR1Q4OWzb*A+6X(!y*cE@-rZ{e;5c%BoPz?7xDCj=8^3uv z6pH@2aldzyehs^yBeu5h0dskFx#W%fTMziJ6ayfnF!OEmQA6oX^iqU%ok;5A0D@nk zb03?PFv6>SIoCBfPd#7cbKc*o{PpR$Q`(pF%D^gidfim|~%rh_^X{%>dUVaPup0MNDkzn=a_9#Xy{+9n6MNs&gj6#8jc z3Zx_GrUohivOU)@L%`VDtIGY9zJ8G*8WNy8x@YaAQS4rOep@;etxa)h>ZbjY7Iw;R zfw5Dp40Me9osAPSBwK&>0{HOLCDztR0I46eD6!p~}>Bv%66pyPgWn z!=oLj#RXTfMj8amCElO++kz0FkNz16DhIA(a>1>`G) zlB+sU`YsN^A&M;amSRqr3Q!}+>n5TmwzJFePalOKCebQ)5_k`YM6U9{o|CW)HaZo* zK_yP5D%1;K#i)RTm~sgLZ$cAt&h_?><@Vf9{5}<=nJXGuB{E5!5-LzUE1R+)vCE9lNaYbC)^-N_++5 zRb`20G-8}*mQTVRXge}4WV9?GXcy!j7ShwSs@ZpSyy2};rbBRa3^Xl7hG8;sst#Pb zIo~|Qb`m;WrQREx6d8P80elf-TlRfGBhDb|kk|V%qJ$4aCLp6f)j{y5 ziOKwVT$dfR-F?_{N@9fCZj3qHMktTXmp2&R=7ur1E7N{1CD}SPCxy4R#iWj#8QDqQ zW!9ME_`>XGqK({$`3$0J;2w!BOW&mR)w_ib#Q#lD>4m+jC_s$L`W0fuG-bYgwZTFO z*(cNA82!x}gyxG&EhuCnfzHsd^{o!%fBj&~9a(WwWgEleK@$qMvE4;bB18z!J6vK_ z0AAyJW1xdh$-VyaE}GHWmg{)&n-eD{$7kFD+Vxo2zSb6a_$nGH0#mEUAUM`|%Lde_ zwGca%)kd};NKMB%>sYWiXMFHBAlu3w_6_*Tb1*UGd5^S>m!r2yuEJP^ElII(^X0|P z*w3F|#VGP)a!ERzHSC6N7Xo>l@^%@Uy2)YQ60(^gt|b^mp>1i`Lea$}vTHEsi^_^& zD6$I?P(ZKhQrRcP&M3#iQ6$#Tx@2?HSu2~A92q-zZ07i`I?y3PIce*%@#yZX@)GHa z`wl-6I=re%yT@GdP5*>elm)J^@>)&lXYHA2?bHCXX4wTq+ZzT-XY}j18dPG3B4q^h z639!*Zm61S(OTggcy#Y-boq#=udZmMzVOch04n+ZVNrw#)6rVg%EHTw8pD_QkRuYA zSqGypgAmT2(lfeq%HL)Ci+Rj^B_+na=dISyF;qd1l5ikINY7=6oWkCtd&RID6B~@o z8sGVoP=(%)r1XUsDuTjgJiU`7h7*=0CQa4RnkQ3{x=&F4grHzRU3huv2LT`R=KwU2 zkM~DX{sznVA9%2$!uIrbxsVqNV35WB${EHM=&tsko8qcyd3>;DV1f?I7wBc&ir^G9 zp$Q%APLH!A&=TN<{Y1(OX5?i3eN;GPy11a@q_ml^>)&zKX&s2BcK{8~NtpMYX|7bkeyqDQkB;>e09=lx)xy#59nS!UN zHSYk?dlFyD;*ih;r}F!Wa}izxec?S9UZ;ZQ(e; z+gIAM#V%>yCF`*XL5U^cUtA2!Yr zAAsx<++Ns}$<4sWEf@omqZF%Wa*DEw%lmtqmrIWDphI`4u(nT8eLaRq%E%_(2PbjR zq=nkr1r8xhESGEux5WI$(UNcTt347pqmT7nUQRCIK#k|gdUU}$Xj)k~CVRyaXULEp z%U_7HguddF=W@9MB5|#=B{F9&)GO1F(D~@ywIpL@&&=+ zjARMf1ri6>`a{gfOf8j=8`r-vzXf3^%=T@VLU#3652M=E-IA8*T9pzPSFR{!(jYgE z^-^r*@pqN_)#KTyu;*>pj=CJmiP?W@pMxTAhgxS>tItaHiD1NEYsy6dm&tI^qPd33 zZwh*y$)MS~Tw`}LiQ7ChnuEFu#U}BQJK+AOEz6z(`@B4its_c2LPT{U7i)q!4ZGqy z6$w175s8^Ud5+?JP#)V#YH-jQt6)liJTf1)7Xofa0N3?POIn@DF&xqr_rofb%aOwWKF zYJAUSeP_T&-T>5X%q6Mg_0&5lzyp-?1xYeO{}yH+`@DfEaVk&wMudLlMs;=NYK6FL z<7VPi7*9_X8PxvESkY>+Pj6I+5xQANSxGiMJ8Ph#d_aJ0n0od`g&vRbs?CjY4kiR6 
zrFfRX*@I7=j$LOYDssU@04Reth;ggVUmNxl^B2&P&cuLPS?UVi5cO8liiulIe$LV; z`)y7m>K7#9A*03OInp2!%+U{W=^Za0Jh#&xIc%vTaTBz_a)^u|U-5=$=albukVijK5!>+x} zGJvaQMYQRj9!x_gy4mti6PvsMs6+DGs*38N>~Q*4Zof_7l&}e?s9qv^ZE%zvcd{2! z;2H6IxiVao44RGZaLhlQp!B#_0`onMZ`D(VWi@G08lr3EkEHn-Qnz>70v))gu{pJ-{SrPRZ&rFou!yr`i#o0E7>9}xOc zSfFdo9z0Vj&oo$9TE!x&f7Zo>hlmSMQ32;d>~%>wOOm}T4c{# zQYd|_OJ|{SNJV8zdAeZlhiXR^evZVBge$h9(tP-=4;p0 z-~3+uNzesZZwcO8Y4M@?mEW?u=+*CirR{`e74wu2G)o*WP&68%ieh;daiqQYah4@CSBR?^yAE&rN(v`ghw zEV%4DAYgJmO&yJSKQBRaGliKx5x_hOKWipH?obi887jc;S?7}+xv!l}`ks4EUQa|h ztwv8AtvC!**89dSSURSbHl=Bzv*dSCj{Muolbi2o{Z*;3?jVUf<1;IO!aUr`{k@ID z2$AxkDq?tm?}mZp$9v0#@ zxgVt$o`H}(cdlSA{a^eE{C7(mQ|PcDph$&(^w`%!2p`wZEKU6i`~!Tbz5+s0OYBHM zJcUBnu^Ij}em&FMgMqxrPXSXPcq2{Bz%*#(OM!ek{80~}l)=M$w!I6fm6A{UO~a-& zJR5vI4&H`Q;<)A&nEKUWxuZ`Ql8M^Eh4<^{ad5PibQVQCp^&i;#Vk!Ui(K_woq0R{MgX-Z zv>p2Cv$t*G42JdtJ8q?|nh>{+Tld2q1Klo1JM1eVZ~Q63!XZy$_9sy_v-oEO2Lr$6BSI42Ms8v^-2nO>=a$CmD? zf@gdFM;O@K2mC?c)n;0J2;`nWCj2S}C%y9pOgecr3bn6C3V}Sa0k7e&LLfIK(L`!K zV2zw0A~FO*kK~{p!h<|of|w16M(t2 zf)A&X2QWoCX(<6Yq zF~+1n=Eb80*p2BxcywyW6S3X$=`&dnt+i2M#}7Cs82C!O)X>=T5d<<#V9I}8MgsC= z>p}?~1M6kTC%>mAq+}H2kb56L2}BY+BZPdgxtEFvQ^ki&3V+bK|M@K*#Ff!J|HF@O z5J(+asiFD*cQJsU#l6%f7MTR0Ld9iao5=RbP5*>k-Yt7CVb_jz$I2{?kF86-hRIvC z>DqSkFBIWO>lA*0PG9Yd-`=%Lh?}{bUNgN-LiQ{qTq_szN<-lJ^#X8 ze|(!Fpzgc7O-MlOS7UK` zQk%R|wE;Hs4C4E#lGc^)M}Al>k;y+!o_5bnE0qpgGqDO9a}!LZXbEbSCfD3?Z`o=p zo_+G`$xh3p?W|kh{Lhx9OmcPXS^-(5dY1#+Ye2Z<;fByC6tg)~z5o%loJSD; zfW5ee`NZj!ZHw!TL{C8nX0dCoqj8y>oOi7px`)$UE_xC~CfC$gqa zUa_xDbh$Bn4{TEe6E13Aa1FdpN{g)5E-Eif%a5rOTU9{H)4fKnZ%nQ;dinLa>8#{O z4w`K?#;4`D^O9`bO+q*(1c=f-*TDbY3T%A0MBo1c(@E-5(P;z$`3T)!ra4U%MCfy? zuO=V#ppBc4TLpa3z2>8f%7JXlL+N#snz=(a7j<;dFqXG>7`FoW{KdIpid}mk>qA&> zbO}h+jyGvL7?Hpu4@@+*5+~+*a@f*{m0#BG*Oj&;4`npdSZ4~tVtdS!-ZO3hCpS!z zHrQ%aRMA3#h2O3e7EI%3959m1*E*+N`t_dqKu+Gb^2*tR$O9WoG~0Oc#&c&Q#7ZRd zmIjqUmFpCDxBK%*h1Y1FS!DG5g;W|lq^Azrr~Ybv?r4w~@fkYE%7(Y!rV*ZLn9t_m?O(i^t{1wyI||{Idjkhc&1` zB&X_}WO5t7sagMK(Q7Ml1f}yk^R;dv|AI)4A``1xOG|Ve#ZWYPlR#tIWCZC%{p;&O zDj6e2$PyWw*vJa6t6?ZS$mzpYldOVE_v`Jyeb>#K-~l9GN^Lwx;j;mwkNDc;NluTGKtrbQAD`eUvn$QSt*Gas!Y*g&Iv2- zw%m`H7hBS3lj$7(?2`+WJsNN$I8gU)iV~?~e=qq=9AW0-lXOaHU$t zOSQrX5vk>3inoYGwxPL=zH9Xzv)b!CT*fE#slhg_V~6|Jk2OYpBIixWXC$W@p##?$ zVqOlYz6R*l0(LJxZSB+QuRJ)PC_G;)NqYo8xR_2)=wH?)gTvNj6T0F;(co@q?nUi> z!-ZMnvIb+Tdbgk|lk()_v{^cD(J^JEOAYv329pvNbmqR$iwA-Et=z%(q*sSOxtH6G zsaww{k-Q@xp_1~mmW0!^20^~kmaNsda4u0u0-|>-0&Xi=eM(v6v&XTu4&P;9xXPW* z-j_ce+lVM+B?)X|)kzP)Jn6mE8kKP8;TsP9H_r%Hk0Uh{G?@m6Yy-i?`5ykVMLwyZ)xWaxfmF1TjeHx#FQ&O{@tDBRZkBz}%jN!X*G(%fU)5?`%i=qG*`RnzsnnagO*HlX zWr3#;G}{YwG~?!2GAavF`lKzS$8*KP3M-3fIETla)_lfSDT>BSt-REl8iuDO#=`+t zRQ0&KLE*P1*{qyyQ;eMOUHhX-m7L;H&l+z~ zbAfbs9+o~RN}tncG(%m5fft{KFUde8<#MUlR9d*n&x@vi4G#;%!oUt9#lC@uu>y_ zJfNe#qHG&(9ubMBEB!@5C6YWth%SXfT{R5!2Xsz)K~geq!GvQ%0R%0CCRswzi@NsF zk%DRJp;bvY2X}4^=1*NeIq06(^-Sn{y_WEA# zrQ(M9j4+Iq~6b^N}c#EdV4cb?hm$~ss z;PaglkJ62QC{hJe7aN~k!!%VmqYWm1Y&ns+p=~>@>i-3s@=EQK|I220Q1^)A=q;8yJ1TYI2E6mlscv^ z*Px2_N`kJ$IM>c5pHbGL4)u4vim7j^^+;ke>llkAyL~NdYBWa=IgS0IANv)qLgid`hw<@7zKB!Qlp77kp`*$A@C zbsp0h)fs0y;awC+RCBPFv3xzoh)5h=zk`>_QnV)GQh;I^{l0!e7O|5#>&!Zymy@ac zkjMvKWsY_qet6ZL4dLQ@J@Q;=01k>JqbZMk1i!r#zrbX8=xc;FDoz=qEDo2~XpyFg z$7`iiDw1UzCUFijY1Wy4*fVmwx0LNFCZSTTjDbwK<%}8WZ~o|c#2~FwTkbsr8X!^5 zw>4ji3^pavtcAvlJt#jO4V{i5gA_*E%0Zqqn~w2U3_>h?_wQ)@mBoaG+T@a zSleXRIS>djjMc8b*Uwi6yj+dcWk1YlX-T}Jw4ESVv*udMZ*TLIGu}D0=%NmRsgcin zSyD$T2WTTEX!9dD8ck9*uqSQnH_|3}UBZ6r$A*8lCD6irS^t{r>AC7f zI$j&YWhG;wcIfHgkbd%(Z0-92**^ZB^_2@Kq4j{lK?LM11<~OcOZJ-6Usa6jz zOaaz3H5aUWepC!uhu-U0Oa2LGX3=gn_KqH`+mX!9*xCa^szP$y8yzw#xJSOB7H-6e 
zBqNz_%hyCVDrW0S%-wzEI`P;FRzIcdIIHR4;Z-qx1)7;Pi{7>q_3L_TH|hog zx{X*z-vW3LRtS+p>Kobpa-ov7Kx~jl@^`MoPq)Z9Ab08h6N`?aX^G6*(=5uQDsbm= z)AnVrz%42e8CF*2j*c%V(g@xdo363K!21TqAbIaXCcV|MBX^LuNV22d&ElOb}XAAkEeohwsGF=)F^n9$YTVy~=DqWN*S z2Aok8BS-H3t%eU#z5V0MvA5@W`JQ0;*A6whRj^YA9{KAMVbS{E!G+<7370Drv4bj# z(#+ZUU@HFz7;-VAcou$6!7y)6Q|k7lv5w5V>&MBritDWoay%_cj#gVNPU1}@^g_EY zSS3`9JF^=aulxWlUvGY}W|=IW7B0#ccJJlDGSftCT?8Zb!@1@6B3C)h4(j zlO#hN;U9ODJwo=yVoz>Q^~YZ|f@q@Jw$H_cU{)>mHU=KtlGuO!n7kvMck`?U<&yO3 zpyd-o%bt4QHq!GiqJTCR^F_;Axkj5`9ikCCK-C9b@kzJw$|4W_t5Cnz=)Xt-hSltM zyiU6G(f#+ihN~UNgum*b2@OJb7p-G+_em{XBAg zVn5@7Yvh|LlFY4U+BOEF^Hq(2=>Kg*1pcpB5zm#?zKV1EWZ7QbbV!c>el z_w+78-rQ;aVHud_&-saUwRQjQw;*??GOP?$Y8`L zJoSZLdAaL}`%1D92=`AnM!FQ}o{}U4;_!2AFff+<>yrAMBO&BX1O~mZRS`H%o`*i1 zwUa0XMHo%f>f{B`(hF%#uTa8sPj6GglK?_P*LV~Lr8`w(Aic< z5>+Sg!hZlv-qWDngxh#;4#OghB5x{zhQm}%|DD#s>78oKE^>QTI?xfcCcY%Iy^)H| zsh?Xx9d6GC|9KZ=3^^(*I)3WJiR}s)^U}5fPWD;vxqlpnd<`*8GX0hZAOD#33rY zRVg7KGQzvDKUj%YxE-NC#L@|_d@d+PPIqq=aWKBf;ku@s-{XPhUxeotw|jl|RH6DZ znCQMxPqks?cg4r4$uM{}X+NbM5*|DF{5t0sv#zt&CdC(sosN<=d%H@7bK1)DhNJRZ zB9}yfJVfFEI`J&8Acv;a0P=07nN;LPghy%Uds13(XAAMuN9=ZWXt}_h$XNiYjXZG7F8W65uYEnQTvS@BH|>ttc6=$?>@W|#CVT$^rfnv6UnZM z7(kTUH-~&F`Q<29Jl}T>b9D@v*a2Knovm>ph6mZ(R`q}Zb0JS=tJTo$t@mz*xJPxV zQp1>UNXaVMX~Q#35G$zx;~Z5++h^420B({Bd-I^1)o|t93IXv06;Q#Gcj@^QbuKAi>G@!~{dO?Sz}aucXl>ZFY&deMn+%e|Vsx3$ zup5~b1$9U3WBq_t;&#^nZqK(I3?~XsD7=bolEcs$Lq1%Ehk!e<^SM28x_K9+YdwwI zbF2FR#+ZiOV>A~l-@nh&i^J3xPb&x1dR0?$tY%s@D?rCXeSllr+S_EdR-auzTludSD!t^u!0fb92i$%NJR*p6*>eW3H4C z{PbK^zc(@SP0_IxHpuk-`N#X^j7{+RwLE6f?*gobR&#VG^tf&hZ$=-yJOadjJG-uP z48=yAXH#yDk?)=l!9CuUaRiU*v$w1L+pQymwvW!h7^L8t;Fqn5o)^;R5j4oc~Isz&(d zCT`Q#tuRow%6#&2S7<&w8DgE95B3aZtV^HHBX<|S3x2vH?r%HW${o1BJYF{av2M5h z^t>cv{Ehyo*%Q2UJaxQi;NB?)M+EKflfrw`3kqs(A9Zsd`};l)a&}%0;6I4i4KZ<{ z8{$G@H;hGP<-|ngB*g`9+>pC*BdrPrE@OYo;O1fP^f2(>XIR Date: Tue, 13 Feb 2024 18:29:22 -0500 Subject: [PATCH 14/68] rearrange cell reads file with shard number --- tasks/skylab/StarAlign.wdl | 28 ++++++++++++++++++++++++++-- 1 file changed, 26 insertions(+), 2 deletions(-) diff --git a/tasks/skylab/StarAlign.wdl b/tasks/skylab/StarAlign.wdl index e67bbf452b..9763e58cad 100644 --- a/tasks/skylab/StarAlign.wdl +++ b/tasks/skylab/StarAlign.wdl @@ -507,11 +507,35 @@ task MergeStarOutput { declare -a align_features_files=(~{sep=' ' align_features}) declare -a umipercell_files=(~{sep=' ' umipercell}) - for cell_read in "${cell_reads_files[@]}"; do + # for cell_read in "${cell_reads_files[@]}"; do + # if [ -f "$cell_read" ]; then + # cat "$cell_read" >> "~{input_id}_cell_reads.txt" + # fi + # done + # Destination file for cell reads + dest="~{input_id}_cell_reads.txt" + # first create the header from the first file in the list, and add a column header for the shard id + head -n 1 "${cell_reads[0]}" | awk '{print $0 "\tshard_number"}' > "$dest" + # Loop through the array and add the second row with shard number to a temp file notinpasslist.txt + for index in "${!cell_reads[@]}"; do + secondLine=$(sed -n '2p' "${cell_reads[$index]}") + echo -e "$secondLine\t$index" >> "notinpasslist.txt" + done + # add notinpasslist.txt to the destination file and delete the notinpasslist.txt + cat "notinpasslist.txt" >> "$dest" + rm notinpasslist.txt + # now add the shard id to the matrix in a temporary matrix file, and skip the first two lines + counter=0 + for cell_read in "${cell_reads[@]}"; do if [ -f "$cell_read" ]; then - cat "$cell_read" >> "~{input_id}_cell_reads.txt" + awk -v var="$counter" 'NR>2 {print $0 "\t" var}' "$cell_read" >> "matrix.txt" + let 
From 7aa1fc48972dcff43e956bd67975d81c10b24cd0 Mon Sep 17 00:00:00 2001
From: rsc3
Date: Tue, 13 Feb 2024 18:33:26 -0500
Subject: [PATCH 15/68] rearrange cell reads file with shard number

---
 tasks/skylab/StarAlign.wdl | 6 ------
 1 file changed, 6 deletions(-)

diff --git a/tasks/skylab/StarAlign.wdl b/tasks/skylab/StarAlign.wdl
index 9763e58cad..3623a65cb3 100644
--- a/tasks/skylab/StarAlign.wdl
+++ b/tasks/skylab/StarAlign.wdl
@@ -507,11 +507,6 @@ task MergeStarOutput {
     declare -a align_features_files=(~{sep=' ' align_features})
     declare -a umipercell_files=(~{sep=' ' umipercell})

-    # for cell_read in "${cell_reads_files[@]}"; do
-    #   if [ -f "$cell_read" ]; then
-    #     cat "$cell_read" >> "~{input_id}_cell_reads.txt"
-    #   fi
-    # done
     # Destination file for cell reads
     dest="~{input_id}_cell_reads.txt"
     # first create the header from the first file in the list, and add a column header for the shard id
@@ -535,7 +530,6 @@ task MergeStarOutput {
     # add the matrix to the destination file, then delete the matrix file
     cat "matrix.txt" >> "$dest"
     rm "matrix.txt"
-
     counter=0
     for summary in "${summary_files[@]}"; do

From 5c8cf2c58de63cd4f5760079ec286f2d717e4ac3 Mon Sep 17 00:00:00 2001
From: rsc3
Date: Wed, 14 Feb 2024 09:05:17 -0500
Subject: [PATCH 16/68] fix cell_reads_files bug

---
 tasks/skylab/StarAlign.wdl | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/tasks/skylab/StarAlign.wdl b/tasks/skylab/StarAlign.wdl
index 3623a65cb3..b388f3dd0f 100644
--- a/tasks/skylab/StarAlign.wdl
+++ b/tasks/skylab/StarAlign.wdl
@@ -510,10 +510,10 @@ task MergeStarOutput {
     # Destination file for cell reads
     dest="~{input_id}_cell_reads.txt"
     # first create the header from the first file in the list, and add a column header for the shard id
-    head -n 1 "${cell_reads[0]}" | awk '{print $0 "\tshard_number"}' > "$dest"
+    head -n 1 "${cell_reads_files[0]}" | awk '{print $0 "\tshard_number"}' > "$dest"
     # Loop through the array and add the second row with shard number to a temp file notinpasslist.txt
-    for index in "${!cell_reads[@]}"; do
-      secondLine=$(sed -n '2p' "${cell_reads[$index]}")
+    for index in "${!cell_reads_files[@]}"; do
+      secondLine=$(sed -n '2p' "${cell_reads_files[$index]}")
       echo -e "$secondLine\t$index" >> "notinpasslist.txt"
     done
     # add notinpasslist.txt to the destination file and delete the notinpasslist.txt
     cat "notinpasslist.txt" >> "$dest"
@@ -521,7 +521,7 @@ task MergeStarOutput {
     rm notinpasslist.txt
     # now add the shard id to the matrix in a temporary matrix file, and skip the first two lines
     counter=0
-    for cell_read in "${cell_reads[@]}"; do
+    for cell_read in "${cell_reads_files[@]}"; do
       if [ -f "$cell_read" ]; then
        awk -v var="$counter" 'NR>2 {print $0 "\t" var}' "$cell_read" >> "matrix.txt"
        let counter=counter+1
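Note: the cell_reads -> cell_reads_files typo fixed in PATCH 16 is easy to miss in review because, under bash's default options, an unset array simply expands to nothing: the index loops run zero times and head receives an empty filename. A minimal illustration of why nounset would have surfaced it immediately (file names are placeholders):

    #!/usr/bin/env bash
    cell_reads_files=(shard0.txt shard1.txt)

    # default options: the failure is indirect and far from the real mistake
    head -n 1 "${cell_reads[0]}"   # head: cannot open '' for reading: No such file or directory

    # with nounset, the typo is reported by name at the point of use
    set -u
    head -n 1 "${cell_reads[0]}"   # bash: cell_reads[0]: unbound variable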
From b38afe7836bc67453df1c4fc48b35740929c1b9f Mon Sep 17 00:00:00 2001
From: rsc3
Date: Wed, 14 Feb 2024 09:15:03 -0500
Subject: [PATCH 17/68] fix align features files

---
 tasks/skylab/StarAlign.wdl | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tasks/skylab/StarAlign.wdl b/tasks/skylab/StarAlign.wdl
index b388f3dd0f..20302c7101 100644
--- a/tasks/skylab/StarAlign.wdl
+++ b/tasks/skylab/StarAlign.wdl
@@ -540,7 +540,7 @@ task MergeStarOutput {
     done

    counter=0
-    for align_feature in "${align_features[@]}"; do
+    for align_feature in "${align_features_files[@]}"; do
      if [ -f "$align_feature" ]; then
        awk -v var="$counter" '{print $0 " " var}' "$align_feature" >> "~{input_id}_align_features.txt"
        let counter=counter+1

From 69c68278155f358d38fbd104417bd5de6bdba887 Mon Sep 17 00:00:00 2001
From: aawdeh
Date: Wed, 14 Feb 2024 10:16:05 -0500
Subject: [PATCH 18/68] PD-2474: Changed warptools docker for all Metrics tasks (#1177)

---
 .../skylab/multiome/Multiome.changelog.md     |  5 +++
 pipelines/skylab/multiome/Multiome.wdl        |  2 +-
 pipelines/skylab/multiome/atac.changelog.md   |  5 +++
 pipelines/skylab/multiome/atac.wdl            |  2 +-
 pipelines/skylab/optimus/Optimus.changelog.md |  4 ++
 pipelines/skylab/optimus/Optimus.wdl          |  3 +-
 .../skylab/paired_tag/PairedTag.changelog.md  |  5 +++
 pipelines/skylab/paired_tag/PairedTag.wdl     |  2 +-
 .../skylab/slideseq/SlideSeq.changelog.md     |  6 +++
 pipelines/skylab/slideseq/SlideSeq.wdl        |  4 +-
 tasks/skylab/FastqProcessing.wdl              |  4 +-
 tasks/skylab/H5adUtils.wdl                    |  6 +--
 tasks/skylab/Metrics.wdl                      | 41 ++++++++++++++-----
 website/docs/Pipelines/ATAC/README.md         |  4 +-
 .../Pipelines/Multiome_Pipeline/README.md     |  3 +-
 .../Pipelines/Optimus_Pipeline/Loom_schema.md | 24 +++++------
 .../docs/Pipelines/Optimus_Pipeline/README.md |  7 ++--
 .../Pipelines/PairedTag_Pipeline/README.md    |  3 +-
 .../Pipelines/SlideSeq_Pipeline/README.md     |  2 +-
 .../count-matrix-overview.md                  | 27 ++++++------
 20 files changed, 106 insertions(+), 53 deletions(-)

diff --git a/pipelines/skylab/multiome/Multiome.changelog.md b/pipelines/skylab/multiome/Multiome.changelog.md
index ecd2478024..df04ac2f3c 100644
--- a/pipelines/skylab/multiome/Multiome.changelog.md
+++ b/pipelines/skylab/multiome/Multiome.changelog.md
@@ -1,3 +1,8 @@
+# 3.1.3
+2024-02-07 (Date of Last Commit)
+
+* Updated the Metrics tasks to exclude mitochondrial genes from reads_mapped_uniquely, reads_mapped_multiple and reads_mapped_exonic, reads_mapped_exonic_as and reads_mapped_intergenic
+
 # 3.1.2
 2024-02-01 (Date of Last Commit)

diff --git a/pipelines/skylab/multiome/Multiome.wdl b/pipelines/skylab/multiome/Multiome.wdl
index 16113b5e8c..24b3746d1c 100644
--- a/pipelines/skylab/multiome/Multiome.wdl
+++ b/pipelines/skylab/multiome/Multiome.wdl
@@ -6,7 +6,7 @@ import "../../../tasks/skylab/H5adUtils.wdl" as H5adUtils
 import "https://raw.githubusercontent.com/broadinstitute/CellBender/v0.3.0/wdl/cellbender_remove_background.wdl" as CellBender

 workflow Multiome {
-    String pipeline_version = "3.1.2"
+    String pipeline_version = "3.1.3"

     input {
       String input_id

diff --git a/pipelines/skylab/multiome/atac.changelog.md b/pipelines/skylab/multiome/atac.changelog.md
index 13d51a928c..170caa2aed 100644
--- a/pipelines/skylab/multiome/atac.changelog.md
+++ b/pipelines/skylab/multiome/atac.changelog.md
@@ -1,3 +1,8 @@
+# 1.1.8
+2024-02-07 (Date of Last Commit)
+
+* Updated the Metrics tasks to exclude mitochondrial genes from reads_mapped_uniquely, reads_mapped_multiple and reads_mapped_exonic, reads_mapped_exonic_as and reads_mapped_intergenic
+
 # 1.1.7
 2024-02-01 (Date of Last Commit)

diff --git a/pipelines/skylab/multiome/atac.wdl b/pipelines/skylab/multiome/atac.wdl
index 4db04a9968..3dd81d7bf5 100644
--- a/pipelines/skylab/multiome/atac.wdl
+++ b/pipelines/skylab/multiome/atac.wdl
@@ -41,7 +41,7 @@ workflow ATAC {
     String adapter_seq_read3 = "TCGTCGGCAGCGTCAGATGTGTATAAGAGACAG"
   }

-  String pipeline_version = "1.1.7"
+  String pipeline_version = "1.1.8"

   parameter_meta {
     read1_fastq_gzipped: "read 1 FASTQ file as input for the pipeline, contains read 1 of paired reads"
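Note: PATCH 18 bumps several pipeline versions alongside their changelogs. A small reviewer-side consistency check, assuming GNU grep and the repository layout shown in this diff (this script is illustrative and not part of the pipeline):

    #!/usr/bin/env bash
    set -euo pipefail

    # compare the first pipeline_version in each WDL with the top entry of its changelog
    check () {
      local wdl=$1 changelog=$2 wdl_v log_v
      wdl_v=$(grep -m 1 -oP 'pipeline_version = "\K[^"]+' "$wdl")
      log_v=$(grep -m 1 -oP '^# \K\S+' "$changelog")
      [[ "$wdl_v" == "$log_v" ]] || echo "MISMATCH: $wdl has $wdl_v, $changelog has $log_v"
    }

    check pipelines/skylab/multiome/Multiome.wdl    pipelines/skylab/multiome/Multiome.changelog.md
    check pipelines/skylab/multiome/atac.wdl        pipelines/skylab/multiome/atac.changelog.md
    check pipelines/skylab/optimus/Optimus.wdl      pipelines/skylab/optimus/Optimus.changelog.md
    check pipelines/skylab/paired_tag/PairedTag.wdl pipelines/skylab/paired_tag/PairedTag.changelog.md
    check pipelines/skylab/slideseq/SlideSeq.wdl    pipelines/skylab/slideseq/SlideSeq.changelog.md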
diff --git a/pipelines/skylab/optimus/Optimus.changelog.md b/pipelines/skylab/optimus/Optimus.changelog.md
index 9123a32d64..ee841074d4 100644
--- a/pipelines/skylab/optimus/Optimus.changelog.md
+++ b/pipelines/skylab/optimus/Optimus.changelog.md
@@ -1,3 +1,7 @@
+# 6.3.6
+2024-02-07 (Date of Last Commit)
+* Updated the Metrics tasks to exclude mitochondrial genes from reads_mapped_uniquely, reads_mapped_multiple and reads_mapped_exonic, reads_mapped_exonic_as and reads_mapped_intergenic
+
 # 6.3.5
 2024-01-30 (Date of Last Commit)
 * Added task GetNumSplits before FastqProcess ATAC task to determine the number of splits based on the bwa-mem2 machine specs

diff --git a/pipelines/skylab/optimus/Optimus.wdl b/pipelines/skylab/optimus/Optimus.wdl
index f4a07d840f..af73fc415c 100644
--- a/pipelines/skylab/optimus/Optimus.wdl
+++ b/pipelines/skylab/optimus/Optimus.wdl
@@ -65,7 +65,7 @@ workflow Optimus {

   # version of this pipeline
-  String pipeline_version = "6.3.5"
+  String pipeline_version = "6.3.6"

   # this is used to scatter matched [r1_fastq, r2_fastq, i1_fastq] arrays
   Array[Int] indices = range(length(r1_fastq))
@@ -146,6 +146,7 @@ workflow Optimus {
     input:
       bam_input = MergeBam.output_bam,
       mt_genes = mt_genes,
+      original_gtf = annotations_gtf,
       input_id = input_id
   }

diff --git a/pipelines/skylab/paired_tag/PairedTag.changelog.md b/pipelines/skylab/paired_tag/PairedTag.changelog.md
index 06b2ec320b..7ea45992db 100644
--- a/pipelines/skylab/paired_tag/PairedTag.changelog.md
+++ b/pipelines/skylab/paired_tag/PairedTag.changelog.md
@@ -1,3 +1,8 @@
+# 0.0.7
+2024-02-07 (Date of Last Commit)
+
+* Updated the Metrics tasks to exclude mitochondrial genes from reads_mapped_uniquely, reads_mapped_multiple and reads_mapped_exonic, reads_mapped_exonic_as and reads_mapped_intergenic
+
 # 0.0.6
 2024-02-01 (Date of Last Commit)

diff --git a/pipelines/skylab/paired_tag/PairedTag.wdl b/pipelines/skylab/paired_tag/PairedTag.wdl
index bc0e6763f7..5bed110675 100644
--- a/pipelines/skylab/paired_tag/PairedTag.wdl
+++ b/pipelines/skylab/paired_tag/PairedTag.wdl
@@ -5,7 +5,7 @@ import "../../../pipelines/skylab/optimus/Optimus.wdl" as optimus
 import "../../../tasks/skylab/H5adUtils.wdl" as H5adUtils
 import "../../../tasks/skylab/PairedTagUtils.wdl" as Demultiplexing
 workflow PairedTag {
-    String pipeline_version = "0.0.6"
+    String pipeline_version = "0.0.7"

     input {
         String input_id

diff --git a/pipelines/skylab/slideseq/SlideSeq.changelog.md b/pipelines/skylab/slideseq/SlideSeq.changelog.md
index fde1b8df3d..f95357e03c 100644
--- a/pipelines/skylab/slideseq/SlideSeq.changelog.md
+++ b/pipelines/skylab/slideseq/SlideSeq.changelog.md
@@ -1,8 +1,14 @@
+# 3.0.1
+2024-02-13 (Date of Last Commit)
+
+* Updated the Metrics tasks to exclude mitochondrial genes from reads_mapped_uniquely, reads_mapped_multiple and reads_mapped_exonic, reads_mapped_exonic_as and reads_mapped_intergenic; this does affect the SlideSeq workflow
+
 # 3.0.0
 2024-02-12 (Date of Last Commit)

 * Updated the SlideSeq WDL output to utilize the h5ad format in place of Loom

+
 # 2.1.6
 2024-01-30 (Date of Last Commit)

diff --git a/pipelines/skylab/slideseq/SlideSeq.wdl b/pipelines/skylab/slideseq/SlideSeq.wdl
index dd7c3de10f..2471e52310 100644
--- a/pipelines/skylab/slideseq/SlideSeq.wdl
+++ b/pipelines/skylab/slideseq/SlideSeq.wdl
@@ -23,7 +23,7 @@ import "../../../tasks/skylab/MergeSortBam.wdl" as Merge

 workflow SlideSeq {

-    String pipeline_version = "3.0.0"
+    String pipeline_version = "3.0.1"

     input {
       Array[File] r1_fastq
@@ -91,11 +91,13 @@ workflow SlideSeq {
   call Metrics.CalculateGeneMetrics as GeneMetrics {
     input:
       bam_input = MergeBam.output_bam,
+      original_gtf = annotations_gtf,
       input_id = input_id
   }

   call Metrics.CalculateUMIsMetrics as UMIsMetrics {
     input:
       bam_input = MergeBam.output_bam,
+      original_gtf = annotations_gtf,
       input_id = input_id
   }

diff --git a/tasks/skylab/FastqProcessing.wdl b/tasks/skylab/FastqProcessing.wdl
index ac22cc38aa..a4d7a8e615 100644
--- a/tasks/skylab/FastqProcessing.wdl
+++ b/tasks/skylab/FastqProcessing.wdl
@@ -11,7 +11,7 @@ task FastqProcessing {
     String read_struct

     #using the latest build of warp-tools in GCR
-    String docker = "us.gcr.io/broad-gotc-prod/warp-tools:2.0.0"
+    String docker = "us.gcr.io/broad-gotc-prod/warp-tools:2.0.1"
     #runtime values
     Int machine_mem_mb = 40000
     Int cpu = 16
@@ -246,7 +246,7 @@ task FastqProcessATAC {

     # [?] copied from corresponding optimus wdl for fastqprocessing
     # using the latest build of warp-tools in GCR
-    String docker = "us.gcr.io/broad-gotc-prod/warp-tools:2.0.0"
+    String docker = "us.gcr.io/broad-gotc-prod/warp-tools:2.0.1"

     # Runtime attributes [?]
     Int mem_size = 5

diff --git a/tasks/skylab/H5adUtils.wdl b/tasks/skylab/H5adUtils.wdl
index 4279f6ff6c..9816107d92 100644
--- a/tasks/skylab/H5adUtils.wdl
+++ b/tasks/skylab/H5adUtils.wdl
@@ -6,8 +6,7 @@ task OptimusH5adGeneration {

   input {
     #runtime values
-    #String docker = "us.gcr.io/broad-gotc-prod/warp-tools:1.0.6-1692962087"
-    String docker = "us.gcr.io/broad-gotc-prod/warp-tools:2.0.0"
+    String docker = "us.gcr.io/broad-gotc-prod/warp-tools:2.0.1"
     # name of the sample
     String input_id
     # user provided id
@@ -106,8 +105,7 @@ task SingleNucleusOptimusH5adOutput {

     input {
         #runtime values
-        #String docker = "us.gcr.io/broad-gotc-prod/warp-tools:1.0.6-1692962087"
-        String docker = "us.gcr.io/broad-gotc-prod/warp-tools:2.0.0"
+        String docker = "us.gcr.io/broad-gotc-prod/warp-tools:2.0.1"
         # name of the sample
         String input_id
         # user provided id
diff --git a/tasks/skylab/Metrics.wdl b/tasks/skylab/Metrics.wdl
index a1b3f0c74f..fb91283d71 100644
--- a/tasks/skylab/Metrics.wdl
+++ b/tasks/skylab/Metrics.wdl
@@ -8,12 +8,11 @@ task CalculateCellMetrics {
     String input_id

     # runtime values
-    #String docker = "us.gcr.io/broad-gotc-prod/warp-tools:1.0.9-1700252065"
-    String docker = "us.gcr.io/broad-gotc-prod/warp-tools:2.0.0"
+    String docker = "us.gcr.io/broad-gotc-prod/warp-tools:2.0.1"
    Int machine_mem_mb = 8000
    Int cpu = 4
    Int disk = ceil(size(bam_input, "Gi") * 4) + ceil((size(original_gtf, "Gi") * 3))
-    Int preemptible = 3
+    Int preemptible = 1
  }

  meta {
@@ -81,16 +80,16 @@ task CalculateCellMetrics {
 task CalculateGeneMetrics {
  input {
    File bam_input
+    File original_gtf
    File? mt_genes
    String input_id

    # runtime values
-    #String docker = "us.gcr.io/broad-gotc-prod/warp-tools:1.0.9-1700252065"
-    String docker = "us.gcr.io/broad-gotc-prod/warp-tools:2.0.0"
+    String docker = "us.gcr.io/broad-gotc-prod/warp-tools:2.0.1"
    Int machine_mem_mb = 32000
    Int cpu = 4
-    Int disk = ceil(size(bam_input, "Gi") * 4)
-    Int preemptible = 3
+    Int disk = ceil(size(bam_input, "Gi") * 4) + ceil((size(original_gtf, "Gi") * 3))
+    Int preemptible = 1
  }

@@ -109,9 +108,21 @@ task CalculateGeneMetrics {
  command {
    set -e
+
+    # create the tmp folder
    mkdir temp
+
+    # if GTF file in compressed then uncompress
+    if [[ ~{original_gtf} =~ \.gz$ ]]
+    then
+        gunzip -c ~{original_gtf} > annotation.gtf
+    else
+        mv ~{original_gtf} annotation.gtf
+    fi
+
+    # call TagSort with gene as metric type
    TagSort --bam-input ~{bam_input} \
+    --gtf-file annotation.gtf \
    --metric-output "~{input_id}.gene-metrics.csv" \
    --compute-metric \
    --metric-type gene \
@@ -149,11 +160,13 @@ task CalculateUMIsMetrics {
  input {
    File bam_input
+    File original_gtf
    File? mt_genes
    String input_id
+
    # runtime values
    # Did not update docker image as this task uses loom which does not play nice with the changes
-    String docker = "us.gcr.io/broad-gotc-prod/warp-tools:1.0.9-1700252065"
+    String docker = "us.gcr.io/broad-gotc-prod/warp-tools:2.0.1"
    Int machine_mem_mb = 16000
    Int cpu = 8
    Int disk = ceil(size(bam_input, "Gi") * 4)
@@ -179,7 +192,16 @@ task CalculateUMIsMetrics {
    set -e
    mkdir temp

+    # if GTF file in compressed then uncompress
+    if [[ ~{original_gtf} =~ \.gz$ ]]
+    then
+        gunzip -c ~{original_gtf} > annotation.gtf
+    else
+        mv ~{original_gtf} annotation.gtf
+    fi
+
    TagSort --bam-input ~{bam_input} \
+    --gtf-file annotation.gtf \
    --metric-output "~{input_id}.umi-metrics.csv" \
    --compute-metric \
    --metric-type umi \
@@ -219,8 +241,7 @@ task FastqMetricsSlideSeq {

  # Runtime attributes
-  #String docker = "us.gcr.io/broad-gotc-prod/warp-tools:1.0.9-1700252065"
-  String docker = "us.gcr.io/broad-gotc-prod/warp-tools:2.0.0"
+  String docker = "us.gcr.io/broad-gotc-prod/warp-tools:2.0.1"
  Int cpu = 16
  Int machine_mb = 40000
  Int disk = ceil(size(r1_fastq, "GiB")*3) + 50
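Note: the decompress-or-rename branch added to both TagSort tasks above could also be written branch-free. `zcat -f` (equivalently, `gzip -cdf`) decompresses gzip input and passes already-plain text through unchanged, so one line covers both .gtf and .gtf.gz inputs; unlike the `mv` branch, it also leaves the localized input file in place. A possible equivalent, with a placeholder input path ("annotation.gtf" matches the name the tasks pass to TagSort):

    #!/usr/bin/env bash
    original_gtf="inputs/annotation.gtf.gz"   # placeholder; may also be an uncompressed .gtf

    zcat -f "$original_gtf" > annotation.gtf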
diff --git a/website/docs/Pipelines/ATAC/README.md b/website/docs/Pipelines/ATAC/README.md
index 8df0e7a187..e5a780f719 100644
--- a/website/docs/Pipelines/ATAC/README.md
+++ b/website/docs/Pipelines/ATAC/README.md
@@ -8,7 +8,9 @@ slug: /Pipelines/ATAC/README

 | Pipeline Version | Date Updated | Documentation Author | Questions or Feedback |
 | :----: | :---: | :----: | :--------------: |
-| [1.1.7](https://github.com/broadinstitute/warp/releases) | February, 2024 | Kaylee Mathews | Please file GitHub issues in warp or contact [the WARP team](mailto:warp-pipelines-help@broadinstitute.org) |
+
+| [1.1.8](https://github.com/broadinstitute/warp/releases) | January, 2024 | Kaylee Mathews | Please file GitHub issues in warp or contact [the WARP team](mailto:warp-pipelines-help@broadinstitute.org) |
+

 ## Introduction to the ATAC workflow
 ATAC is an open-source, cloud-optimized pipeline developed in collaboration with members of the [BRAIN Initiative](https://braininitiative.nih.gov/) (BICCN and [BICAN](https://brainblog.nih.gov/brain-blog/brain-issues-suite-funding-opportunities-advance-brain-cell-atlases-through-centers) Sequencing Working Group) and [SCORCH](https://nida.nih.gov/about-nida/organization/divisions/division-neuroscience-behavior-dnb/basic-research-hiv-substance-use-disorder/scorch-program) (see [Acknowledgements](#acknowledgements) below). It supports the processing of 10x single-nucleus data generated with 10x Multiome [ATAC-seq (Assay for Transposase-Accessible Chromatin)](https://www.10xgenomics.com/products/single-cell-multiome-atac-plus-gene-expression), a technique used in molecular biology to assess genome-wide chromatin accessibility.

diff --git a/website/docs/Pipelines/Multiome_Pipeline/README.md b/website/docs/Pipelines/Multiome_Pipeline/README.md
index 59b6b5f7ca..cb612549f2 100644
--- a/website/docs/Pipelines/Multiome_Pipeline/README.md
+++ b/website/docs/Pipelines/Multiome_Pipeline/README.md
@@ -8,8 +8,7 @@ slug: /Pipelines/Multiome_Pipeline/README

 | Pipeline Version | Date Updated | Documentation Author | Questions or Feedback |
 | :----: | :---: | :----: | :--------------: |
-| [Multiome v3.1.2](https://github.com/broadinstitute/warp/releases) | February, 2024 | Kaylee Mathews | Please file GitHub issues in warp or contact the [WARP Pipeline Development team](mailto:warp-pipelines-help@broadinstitute.org) |
-
+| [Multiome v3.1.3](https://github.com/broadinstitute/warp/releases) | February, 2024 | Kaylee Mathews | Please file GitHub issues in warp or contact the [WARP Pipeline Development team](mailto:warp-pipelines-help@broadinstitute.org) |

 ![Multiome_diagram](./multiome_diagram.png)
diff --git a/website/docs/Pipelines/Optimus_Pipeline/Loom_schema.md b/website/docs/Pipelines/Optimus_Pipeline/Loom_schema.md
index 5b4a8f44ed..8bf61109e8 100644
--- a/website/docs/Pipelines/Optimus_Pipeline/Loom_schema.md
+++ b/website/docs/Pipelines/Optimus_Pipeline/Loom_schema.md
@@ -43,16 +43,16 @@ The global attributes (unstuctured metadata) in the h5ad apply to the whole file
 |`n_reads`|[TagSort](https://github.com/broadinstitute/warp-tools/tree/develop/tools/TagSort)| The number of reads associated with the cell. Like all metrics, `n_reads` is calculated from the Optimus output BAM file. Prior to alignment, reads are checked against the whitelist and any within one edit distance (Hamming distance) are corrected. These CB-corrected reads are aligned using STARsolo, where they get further CB correction. For this reason, most reads in the aligned BAM file have both `CB` and `UB` tags. Therefore, `n_reads` represents CB-corrected reads, rather than all reads in the input FASTQ files. |
 |`noise_reads`|[TagSort](https://github.com/broadinstitute/warp-tools/tree/develop/tools/TagSort)| Number of reads that are categorized by 10x Genomics Cell Ranger as "noise". Refers to long polymers, or reads with high numbers of N (ambiguous) nucleotides. |
 |`perfect_molecule_barcodes`|[TagSort](https://github.com/broadinstitute/warp-tools/tree/develop/tools/TagSort)| The number of reads with molecule barcodes (sequences used to identify unique transcripts) that have no errors. Learn more about UMIs in the [Definitions](#definitions) section below. |
-| `reads_mapped_exonic` | STARsolo and [TagSort](https://github.com/broadinstitute/warp-tools/tree/develop/tools/TagSort) | The number of unique reads counted as exon; counted when BAM file's `sF` tag is assigned to `1` or `3` and the `NH:i` tag is `1`. |
-| `reads_mapped_exonic_as` | STARsolo and [TagSort](https://github.com/broadinstitute/warp-tools/tree/develop/tools/TagSort) | The number of reads counted as exon in the antisense direction; counted when the BAM's `sF` is assigned to a `2` or `4` and the `NH:i` tag is `1`. |
-| `reads_mapped_intronic` | STARsolo and [TagSort](https://github.com/broadinstitute/warp-tools/tree/develop/tools/TagSort) | The number of unique reads counted as intron; counted when the BAM files's `sF` tag is assigned to a `5` and the `NH:i` tag is `1`. |
-| `reads_mapped_intronic_as` | STARsolo and [TagSort](https://github.com/broadinstitute/warp-tools/tree/develop/tools/TagSort) | The number of unique reads counted as intron in the antisense direction; counted when the BAM file's `sF` tag is assigned to a `6` and the `NH:i` tas is `1`. |
+| `reads_mapped_exonic` | STARsolo and [TagSort](https://github.com/broadinstitute/warp-tools/tree/develop/tools/TagSort) | The number of unique reads counted as exon; counted when BAM file's `sF` tag is assigned to `1` or `3` and the `NH:i` tag is `1`; mitochondrial reads are excluded. |
+| `reads_mapped_exonic_as` | STARsolo and [TagSort](https://github.com/broadinstitute/warp-tools/tree/develop/tools/TagSort) | The number of reads counted as exon in the antisense direction; counted when the BAM's `sF` is assigned to a `2` or `4` and the `NH:i` tag is `1`; mitochondrial reads are excluded. |
+| `reads_mapped_intronic` | STARsolo and [TagSort](https://github.com/broadinstitute/warp-tools/tree/develop/tools/TagSort) | The number of unique reads counted as intron; counted when the BAM files's `sF` tag is assigned to a `5` and the `NH:i` tag is `1`; mitochondrial reads are excluded. |
+| `reads_mapped_intronic_as` | STARsolo and [TagSort](https://github.com/broadinstitute/warp-tools/tree/develop/tools/TagSort) | The number of unique reads counted as intron in the antisense direction; counted when the BAM file's `sF` tag is assigned to a `6` and the `NH:i` tas is `1`; mitochondrial reads are excluded. |
 | `duplicate_reads` | [TagSort](https://github.com/broadinstitute/warp-tools/tree/develop/tools/TagSort) | Not currently calculated for Optimus output; number of duplicate reads. |
 |`n_mitochondrial_genes`| [TagSort](https://github.com/broadinstitute/warp-tools/tree/develop/tools/TagSort)| The number of mitochondrial genes detected by this cell. |
 |`n_mitochondrial_molecules`| [TagSort](https://github.com/broadinstitute/warp-tools/tree/develop/tools/TagSort)| The number of molecules from mitochondrial genes detected for this cell. |
 |`pct_mitochondrial_molecules`| [TagSort](https://github.com/broadinstitute/warp-tools/tree/develop/tools/TagSort)| The percentage of molecules from mitochondrial genes detected for this cell. |
-|`reads_mapped_uniquely`|[TagSort](https://github.com/broadinstitute/warp-tools/tree/develop/tools/TagSort)| The number of reads mapped to a single unambiguous location in the genome. |
-|`reads_mapped_multiple`|[TagSort](https://github.com/broadinstitute/warp-tools/tree/develop/tools/TagSort)|The number of reads mapped to multiple genomic positions with equal confidence. |
+|`reads_mapped_uniquely`|[TagSort](https://github.com/broadinstitute/warp-tools/tree/develop/tools/TagSort)| The number of reads mapped to a single unambiguous location in the genome; mitochondrial reads are excluded. |
+|`reads_mapped_multiple`|[TagSort](https://github.com/broadinstitute/warp-tools/tree/develop/tools/TagSort)|The number of reads mapped to multiple genomic positions with equal confidence; mitochondrial reads are excluded. |
 |`spliced_reads`|[TagSort](https://github.com/broadinstitute/warp-tools/tree/develop/tools/TagSort)| The number of reads that overlap splicing junctions. |
 |`antisense_reads`|[TagSort](https://github.com/broadinstitute/warp-tools/tree/develop/tools/TagSort)| Not calculated for Optimus outputs; see `reads_mapped_exonic_as` or `reads_mapped_intronic_as` for antisense counts. |
 |`molecule_barcode_fraction_bases_above_30_mean`|[TagSort](https://github.com/broadinstitute/warp-tools/tree/develop/tools/TagSort)| The average fraction of bases in molecule barcodes that receive quality scores greater than 30 across the reads of the cell. |
@@ -93,12 +93,12 @@ The global attributes (unstuctured metadata) in the h5ad apply to the whole file
 |`n_reads`|[TagSort](https://github.com/broadinstitute/warp-tools/tree/develop/tools/TagSort)| The number of reads associated with this gene. |
 |`noise_reads`|[TagSort](https://github.com/broadinstitute/warp-tools/tree/develop/tools/TagSort)| Not currently calculated for Optimus output; number of reads that are categorized by 10x Genomics Cell Ranger as "noise"; refers to long polymers, or reads with high numbers of N (ambiguous) nucleotides. |
 |`perfect_molecule_barcodes`|[TagSort](https://github.com/broadinstitute/warp-tools/tree/develop/tools/TagSort)| The number of reads with molecule barcodes (sequences used to identify unique transcripts) that have no errors. Learn more about UMIs in the [Definitions](#definitions) section below. |
-| `reads_mapped_exonic` | STARsolo and [TagSort](https://github.com/broadinstitute/warp-tools/tree/develop/tools/TagSort) | The number of unique reads counted as exon; counted when BAM file's `sF` tag is assigned to `1` or `3` and the `NH:i` tag is `1`. |
-| `reads_mapped_exonic_as` | STARsolo and [TagSort](https://github.com/broadinstitute/warp-tools/tree/develop/tools/TagSort) | The number of reads counted as exon in the antisense direction; counted when the BAM file's `sF` tag is assigned to a `2` or `4` and the `NH:i` tag is `1`. |
-| `reads_mapped_intronic` | STARsolo and [TagSort](https://github.com/broadinstitute/warp-tools/tree/develop/tools/TagSort) | The number of reads counted as intron; counted when the BAM file's `sF` tag is assigned to a `5` and the `NH:i` tag is `1`. |
-| `reads_mapped_intronic_as` | STARsolo and [TagSort](https://github.com/broadinstitute/warp-tools/tree/develop/tools/TagSort) | The number of reads counted as intron in the antisense direction; counted when the BAM file's `sF` tag is assigned to a `6` and the `NH:i` tag is `1`. |
-|`reads_mapped_uniquely`|[TagSort](https://github.com/broadinstitute/warp-tools/tree/develop/tools/TagSort)| The number of reads mapped to a single unambiguous location in the genome. |
-|`reads_mapped_multiple`|[TagSort](https://github.com/broadinstitute/warp-tools/tree/develop/tools/TagSort)|The number of reads mapped to multiple genomic positions with equal confidence. |
+| `reads_mapped_exonic` | STARsolo and [TagSort](https://github.com/broadinstitute/warp-tools/tree/develop/tools/TagSort) | The number of unique reads counted as exon; counted when BAM file's `sF` tag is assigned to `1` or `3` and the `NH:i` tag is `1`; mitochondrial reads are excluded. |
+| `reads_mapped_exonic_as` | STARsolo and [TagSort](https://github.com/broadinstitute/warp-tools/tree/develop/tools/TagSort) | The number of reads counted as exon in the antisense direction; counted when the BAM file's `sF` tag is assigned to a `2` or `4` and the `NH:i` tag is `1`; mitochondrial reads are excluded. |
+| `reads_mapped_intronic` | STARsolo and [TagSort](https://github.com/broadinstitute/warp-tools/tree/develop/tools/TagSort) | The number of reads counted as intron; counted when the BAM file's `sF` tag is assigned to a `5` and the `NH:i` tag is `1`; mitochondrial reads are excluded. |
+| `reads_mapped_intronic_as` | STARsolo and [TagSort](https://github.com/broadinstitute/warp-tools/tree/develop/tools/TagSort) | The number of reads counted as intron in the antisense direction; counted when the BAM file's `sF` tag is assigned to a `6` and the `NH:i` tag is `1`; mitochondrial reads are excluded. |
+|`reads_mapped_uniquely`|[TagSort](https://github.com/broadinstitute/warp-tools/tree/develop/tools/TagSort)| The number of reads mapped to a single unambiguous location in the genome; mitochondrial reads are excluded. |
+|`reads_mapped_multiple`|[TagSort](https://github.com/broadinstitute/warp-tools/tree/develop/tools/TagSort)|The number of reads mapped to multiple genomic positions with equal confidence; mitochondrial reads are excluded. |
 |`spliced_reads`|[TagSort](https://github.com/broadinstitute/warp-tools/tree/develop/tools/TagSort)| The number of reads that overlap splicing junctions. |
 |`antisense_reads`|[TagSort](https://github.com/broadinstitute/warp-tools/tree/develop/tools/TagSort)| The number of reads that are mapped to the antisense strand instead of the transcribed strand. |
 | `duplicate_reads` | [TagSort](https://github.com/broadinstitute/warp-tools/tree/develop/tools/TagSort) | The number of duplicate reads. |
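Note: the `sF`/`NH` rules in the two tables above can be spot-checked directly against a STARsolo BAM. A rough sketch, assuming samtools >= 1.12 (for `-e` filter expressions), an integer-valued `sF` tag as described in the tables, and a mitochondrial contig named "chrM" (use "MT" for Ensembl-style references); the BAM path is a placeholder:

    #!/usr/bin/env bash
    bam=sample_star_aligned.bam

    # approximate reads_mapped_exonic under the new definition:
    # unique (NH == 1), sF of 1 or 3, mitochondrial contig excluded
    samtools view -c \
      -e '[NH] == 1 && ([sF] == 1 || [sF] == 3) && rname != "chrM"' \
      "$bam"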
| +| `reads_mapped_intronic` | STARsolo and [TagSort](https://github.com/broadinstitute/warp-tools/tree/develop/tools/TagSort) | The number of reads counted as intron; counted when the BAM file's `sF` tag is assigned to a `5` and the `NH:i` tag is `1`; mitochondrial reads are excluded. | +| `reads_mapped_intronic_as` | STARsolo and [TagSort](https://github.com/broadinstitute/warp-tools/tree/develop/tools/TagSort) | The number of reads counted as intron in the antisense direction; counted when the BAM file's `sF` tag is assigned to a `6` and the `NH:i` tag is `1`; mitochondrial reads are excluded. | +|`reads_mapped_uniquely`|[TagSort](https://github.com/broadinstitute/warp-tools/tree/develop/tools/TagSort)| The number of reads mapped to a single unambiguous location in the genome; mitochondrial reads are excluded. | +|`reads_mapped_multiple`|[TagSort](https://github.com/broadinstitute/warp-tools/tree/develop/tools/TagSort)|The number of reads mapped to multiple genomic positions with equal confidence; mitochondrial reads are excluded. | |`spliced_reads`|[TagSort](https://github.com/broadinstitute/warp-tools/tree/develop/tools/TagSort)| The number of reads that overlap splicing junctions. | |`antisense_reads`|[TagSort](https://github.com/broadinstitute/warp-tools/tree/develop/tools/TagSort)| The number of reads that are mapped to the antisense strand instead of the transcribed strand. | | `duplicate_reads` | [TagSort](https://github.com/broadinstitute/warp-tools/tree/develop/tools/TagSort) | The number of duplicate reads. | diff --git a/website/docs/Pipelines/Optimus_Pipeline/README.md b/website/docs/Pipelines/Optimus_Pipeline/README.md index d57e64c815..5ef733e855 100644 --- a/website/docs/Pipelines/Optimus_Pipeline/README.md +++ b/website/docs/Pipelines/Optimus_Pipeline/README.md @@ -7,7 +7,7 @@ slug: /Pipelines/Optimus_Pipeline/README | Pipeline Version | Date Updated | Documentation Author | Questions or Feedback | | :----: | :---: | :----: | :--------------: | -| [optimus_v6.3.5](https://github.com/broadinstitute/warp/releases?q=optimus&expanded=true) | January, 2024 | Elizabeth Kiernan | Please file GitHub issues in warp or contact [the WARP team](mailto:warp-pipelines-help@broadinstitute.org) | +| [optimus_v6.3.6](https://github.com/broadinstitute/warp/releases?q=optimus&expanded=true) | February, 2024 | Elizabeth Kiernan | Please file GitHub issues in warp or contact [the WARP team](mailto:warp-pipelines-help@broadinstitute.org) | ![Optimus_diagram](Optimus_diagram.png) @@ -49,12 +49,13 @@ To discover and search releases, use the WARP command-line tool [Wreleaser](http If you’re running an Optimus workflow version prior to the latest release, the accompanying documentation for that release may be downloaded with the source code on the WARP [releases page](https://github.com/broadinstitute/warp/releases) (see the source code folder `website/docs/Pipelines/Optimus_Pipeline`). -Optimus can be deployed using [Cromwell](https://cromwell.readthedocs.io/en/stable/), a GA4GH compliant, flexible workflow management system that supports multiple computing platforms. The workflow can also be run in [Terra](https://app.terra.bio), a cloud-based analysis platform. The Terra [Optimus Featured Workspace](https://app.terra.bio/#workspaces/featured-workspaces-hca/HCA_Optimus_Pipeline) contains the Optimus workflow, workflow configurations, required reference data and other inputs, and example testing data. 
+Optimus can be deployed using [Cromwell](https://cromwell.readthedocs.io/en/stable/), a GA4GH-compliant, flexible workflow management system that supports multiple computing platforms. The workflow can also be run in [Terra](https://app.terra.bio), a cloud-based analysis platform. The Terra [Optimus Featured Workspace](https://app.terra.bio/#workspaces/featured-workspaces-hca/HCA_Optimus_Pipeline) contains the Optimus workflow, workflow configurations, required reference data and other inputs, and example testing data. ### Inputs Optimus pipeline inputs are detailed in JSON format configuration files. There are five downsampled example configuration files available for running the pipeline: + * [human_v2_example](https://github.com/broadinstitute/warp/blob/master/pipelines/skylab/optimus/example_inputs/human_v2_example.json): An example human 10x v2 single-cell dataset * [human_v3_example](https://github.com/broadinstitute/warp/blob/master/pipelines/skylab/optimus/example_inputs/human_v3_example.json): An example human 10x v3 single-cell dataset * [mouse_v2_example](https://github.com/broadinstitute/warp/blob/master/pipelines/skylab/optimus/example_inputs/mouse_v2_example.json): An example mouse 10x v2 single-cell dataset @@ -115,7 +116,7 @@ The Optimus pipeline is currently available on the cloud-based platform Terra. A The [Optimus workflow](https://github.com/broadinstitute/warp/blob/master/pipelines/skylab/optimus/Optimus.wdl) imports individual "tasks," also written in WDL script, from the WARP [tasks folder](https://github.com/broadinstitute/warp/blob/master/tasks/skylab). Overall, the Optimus workflow: -1. Checks inputs +1. Checks inputs. 1. Partitions FASTQs by CB. 1. Corrects CBs, aligns reads, corrects UMIs, and counts genes with STAR. 1. Merges the Star outputs into NPY and NPZ arrays. 
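The README hunk above describes STARsolo-based counting and the merge of the Star outputs into NPY and NPZ arrays. As a quick orientation for reviewers, here is a minimal sketch of how such array outputs can be inspected; the file names (`sparse_counts.npz` and the two index `.npy` files) and the assumption that the matrix was written with `scipy.sparse.save_npz` are illustrative only and are not pinned by this patch.

```python
# Hypothetical sketch: load and sanity-check an Optimus-style sparse count matrix.
# File names and on-disk format are assumptions for illustration only.
import numpy as np
import scipy.sparse as sp

counts = sp.load_npz("sparse_counts.npz")                             # assumed cells x genes matrix
barcodes = np.load("sparse_counts_row_index.npy", allow_pickle=True)  # assumed cell barcode labels
genes = np.load("sparse_counts_col_index.npy", allow_pickle=True)     # assumed gene labels

# The row/column indexes should label the matrix dimensions exactly.
assert counts.shape == (len(barcodes), len(genes))
umis_per_cell = np.asarray(counts.sum(axis=1)).ravel()
print(f"{counts.shape[0]} cells x {counts.shape[1]} genes; "
      f"median UMIs per cell: {np.median(umis_per_cell):.0f}")
```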
diff --git a/website/docs/Pipelines/PairedTag_Pipeline/README.md b/website/docs/Pipelines/PairedTag_Pipeline/README.md
index 3a7c983f79..530cae5b26 100644
--- a/website/docs/Pipelines/PairedTag_Pipeline/README.md
+++ b/website/docs/Pipelines/PairedTag_Pipeline/README.md
@@ -7,7 +7,7 @@ slug: /Pipelines/PairedTag_Pipeline/README
 | Pipeline Version | Date Updated | Documentation Author | Questions or Feedback |
 | :----: | :---: | :----: | :--------------: |
-| [PairedTag_v0.0.6](https://github.com/broadinstitute/warp/releases) | February, 2024 | Kaylee Mathews | Please file GitHub issues in warp or contact [documentation authors](mailto:warp-pipelines-help@broadinstitute.org) |
+| [PairedTag_v0.0.7](https://github.com/broadinstitute/warp/releases) | February, 2024 | Kaylee Mathews | Please file GitHub issues in warp or contact [documentation authors](mailto:warp-pipelines-help@broadinstitute.org) |
 ## Introduction to the Paired-Tag workflow
diff --git a/website/docs/Pipelines/SlideSeq_Pipeline/README.md b/website/docs/Pipelines/SlideSeq_Pipeline/README.md
index 106538f0f7..9ef0004d98 100644
--- a/website/docs/Pipelines/SlideSeq_Pipeline/README.md
+++ b/website/docs/Pipelines/SlideSeq_Pipeline/README.md
@@ -7,7 +7,7 @@ slug: /Pipelines/SlideSeq_Pipeline/README
 | Pipeline Version | Date Updated | Documentation Author | Questions or Feedback |
 | :----: | :---: | :----: | :--------------: |
-| [SlideSeq v3.0.0](https://github.com/broadinstitute/warp/releases) | February, 2024 | Elizabeth Kiernan & Kaylee Mathews | Please file GitHub issues in warp or contact [documentation authors](mailto:warp-pipelines-help@broadinstitute.org) |
+| [SlideSeq v3.0.1](https://github.com/broadinstitute/warp/releases) | February, 2024 | Elizabeth Kiernan & Kaylee Mathews | Please file GitHub issues in warp or contact [documentation authors](mailto:warp-pipelines-help@broadinstitute.org) |
 ![SlideSeq_diagram](./slide-seq_diagram.png)
diff --git a/website/docs/Pipelines/SlideSeq_Pipeline/count-matrix-overview.md b/website/docs/Pipelines/SlideSeq_Pipeline/count-matrix-overview.md
index 32a87f13d2..61f7203887 100644
--- a/website/docs/Pipelines/SlideSeq_Pipeline/count-matrix-overview.md
+++ b/website/docs/Pipelines/SlideSeq_Pipeline/count-matrix-overview.md
@@ -40,12 +40,12 @@ The bead barcode metrics below are computed using [TagSort](https://github.com/b
 |`n_reads`| The number of reads associated with this entity. n_reads, like all metrics, are calculated from the Slide-Seq output BAM. Prior to alignment with STARsolo, reads are checked against the whitelist (1 hamming distance). These CB-corrected reads are the input to the STAR aligner. Then, the reads also get CB correction during STAR. For this reason, almost all reads in the aligned BAM have a CB tag and UB tag. Therefore, n_reads represents CB corrected reads, not all reads in the input FASTQ files. |
 |`noise_reads`| Number of reads that are categorized by 10x Genomics Cell Ranger as "noise". Refers to long polymers, or reads with high numbers of N (ambiguous) nucleotides. |
 |`perfect_molecule_barcodes`| The number of reads whose molecule barcodes contain no errors. |
-| `reads_mapped_exonic` | The number of unique reads counted as exon; counted when BAM file's `sF` tag is assigned to `1` or `3` and the `NH:i` tag is `1`. |
-| `reads_mapped_exonic_as` | The number of reads counted as exon in the antisense direction; counted when the BAM file's `sF` tag is assigned to a `2` or `4` and the `NH:i` tag is `1`.
|
-| `reads_mapped_intronic` | The number of reads counted as intron; counted when the BAM file's `sF` tag is assigned to a `5` and the `NH:i` tag is `1`. |
-| `reads_mapped_intronic_as` | The number of reads counted as intron in the antisense direction; counted when the BAM file's `sF` tag is assigned to a `6` and the `NH:i` tag is `1`. |
-|`reads_mapped_uniquely`| The number of reads mapped to a single unambiguous location in the genome. |
-|`reads_mapped_multiple`| The number of reads mapped to multiple genomic positions with equal confidence. |
+| `reads_mapped_exonic` | The number of unique reads counted as exon; counted when the BAM file's `sF` tag is assigned to `1` or `3` and the `NH:i` tag is `1`; mitochondrial reads are excluded. |
+| `reads_mapped_exonic_as` | The number of reads counted as exon in the antisense direction; counted when the BAM file's `sF` tag is assigned to a `2` or `4` and the `NH:i` tag is `1`; mitochondrial reads are excluded. |
+| `reads_mapped_intronic` | The number of reads counted as intron; counted when the BAM file's `sF` tag is assigned to a `5` and the `NH:i` tag is `1`; mitochondrial reads are excluded. |
+| `reads_mapped_intronic_as` | The number of reads counted as intron in the antisense direction; counted when the BAM file's `sF` tag is assigned to a `6` and the `NH:i` tag is `1`; mitochondrial reads are excluded. |
+|`reads_mapped_uniquely`| The number of reads mapped to a single unambiguous location in the genome; mitochondrial reads are excluded. |
+|`reads_mapped_multiple`| The number of reads mapped to multiple genomic positions with equal confidence; mitochondrial reads are excluded. |
 | `duplicate_reads` | The number of duplicate reads. |
 |`spliced_reads`| The number of reads that overlap splicing junctions. |
 |`antisense_reads`| The number of reads that are mapped to the antisense strand instead of the transcribed strand. |
@@ -88,12 +88,12 @@ The gene metrics below are computed using [TagSort](https://github.com/broadinst
 |`n_reads`| The number of reads associated with this entity. n_reads, like all metrics, are calculated from the Slide-Seq output BAM. Prior to alignment with STARsolo, reads are checked against the whitelist (1 hamming distance). These CB-corrected reads are the input to the STAR aligner. Then, the reads also get CB correction during STAR. For this reason, almost all reads in the aligned BAM have a CB tag and UB tag. Therefore, n_reads represents CB corrected reads, not all reads in the input FASTQ files. |
 |`noise_reads`| The number of reads that are categorized by 10x Genomics Cell Ranger as "noise". Refers to long polymers, or reads with high numbers of N (ambiguous) nucleotides. |
 |`perfect_molecule_barcodes`| The number of reads with molecule barcodes that have no errors. |
-| `reads_mapped_exonic` | The number of unique reads counted as exon; counted when BAM file's `sF` tag is assigned to `1` or `3` and the `NH:i` tag is `1`. |
-| `reads_mapped_exonic_as` | The number of reads counted as exon in the antisense direction; counted when the BAM file's `sF` tag is assigned to a `2` or `4` and the `NH:i` tag is `1`. |
-| `reads_mapped_intronic` | The number of reads counted as intron; counted when the BAM file's `sF` tag is assigned to a `5` and the `NH:i` tag is `1`. |
-| `reads_mapped_intronic_as` | The number of reads counted as intron in the antisense direction; counted when the BAM file's `sF` tag is assigned to a `6` and the `NH:i` tag is `1`.
| -|`reads_mapped_uniquely`| The number of reads mapped to a single unambiguous location in the genome. | -|`reads_mapped_multiple`| The number of reads mapped to multiple genomic positions with equal confidence. | +| `reads_mapped_exonic` | The number of unique reads counted as exon; counted when BAM file's `sF` tag is assigned to `1` or `3` and the `NH:i` tag is `1`; mitochondrial reads are excluded. | +| `reads_mapped_exonic_as` | The number of reads counted as exon in the antisense direction; counted when the BAM file's `sF` tag is assigned to a `2` or `4` and the `NH:i` tag is `1`; mitochondrial reads are excluded. | +| `reads_mapped_intronic` | The number of reads counted as intron; counted when the BAM file's `sF` tag is assigned to a `5` and the `NH:i` tag is `1`; mitochondrial reads are excluded. | +| `reads_mapped_intronic_as` | The number of reads counted as intron in the antisense direction; counted when the BAM file's `sF` tag is assigned to a `6` and the `NH:i` tag is `1`; mitochondrial reads are excluded. | +|`reads_mapped_uniquely`| The number of reads mapped to a single unambiguous location in the genome; mitochondrial reads are excluded. | +|`reads_mapped_multiple`| The number of reads mapped to multiple genomic positions with equal confidence; mitochondrial reads are excluded. | | `duplicate_reads` | The number of duplicate reads. | |`spliced_reads`| The number of reads that overlap splicing junctions. | |`antisense_reads`| The number of reads that are mapped to the antisense strand instead of the transcribed strand. | @@ -112,3 +112,6 @@ The gene metrics below are computed using [TagSort](https://github.com/broadinst |`molecules_with_single_read_evidence`| The number of molecules associated with this entity that are observed by only one read. | |`number_cells_detected_multiple`| The number of bead barcodes which observe more than one read of this gene. | |`number_cells_expressing`| The number of bead barcodes that detect this gene. | + +## Definitions +* Bead Barcode: Short nucleotide sequence used to label and distinguish which reads come from each unique bead, allowing for tracking of many beads simultaneously. 
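The metric tables patched above all encode the same counting rule: a read contributes to an `sF`-based metric only if it aligns uniquely (`NH:i` equal to `1`), its `sF` code falls in the documented bucket (`1`/`3` exonic, `2`/`4` exonic antisense, `5` intronic, `6` intronic antisense), and, per the new clause added in this patch, it is not mitochondrial. The sketch below restates that rule with pysam; it is not the TagSort implementation, and the BAM path, the `chrM`/`MT` contig names, and the assumption that `sF` is stored as a single integer code are all illustrative.

```python
# Hedged sketch of the sF/NH counting rule described in the tables above.
# Assumptions: `sF` holds one integer code; mitochondrial contig is chrM or MT.
import pysam

SF_BUCKETS = {1: "exonic", 3: "exonic", 2: "exonic_as", 4: "exonic_as",
              5: "intronic", 6: "intronic_as"}
counts = {bucket: 0 for bucket in set(SF_BUCKETS.values())}

with pysam.AlignmentFile("aligned.bam", "rb") as bam:  # illustrative path
    for read in bam:
        if read.is_unmapped or read.reference_name in ("chrM", "MT"):
            continue  # mitochondrial reads are excluded from these metrics
        if not read.has_tag("sF") or not read.has_tag("NH") or read.get_tag("NH") != 1:
            continue  # only uniquely mapped reads (NH:i == 1) are counted
        bucket = SF_BUCKETS.get(read.get_tag("sF"))
        if bucket is not None:
            counts[bucket] += 1

print(counts)
```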
From f130cd03c3817235ef48e45e2ce2d8a75093c54b Mon Sep 17 00:00:00 2001 From: Nikelle Petrillo <38223776+nikellepetrillo@users.noreply.github.com> Date: Thu, 15 Feb 2024 10:33:12 -0500 Subject: [PATCH 19/68] Np batch snm3c PD-2482 (#1193) * add summary task * batch tasks in snm3c * linting * merge single end mapping with remove overlaps * merge_sorted_bam_tar remove * remove commented out things * fixing up * fixing up * changelog * removing some outputs * removing some outputs * Update snM3C.changelog.md * altering outputs * Update README.md --------- Co-authored-by: kayleemathews --- pipelines/skylab/snM3C/snM3C.changelog.md | 8 +- pipelines/skylab/snM3C/snM3C.wdl | 458 +++++++--------------- verification/test-wdls/TestsnM3C.wdl | 21 +- website/docs/Pipelines/snM3C/README.md | 87 ++-- 4 files changed, 192 insertions(+), 382 deletions(-) diff --git a/pipelines/skylab/snM3C/snM3C.changelog.md b/pipelines/skylab/snM3C/snM3C.changelog.md index b24145073b..29ba78d160 100644 --- a/pipelines/skylab/snM3C/snM3C.changelog.md +++ b/pipelines/skylab/snM3C/snM3C.changelog.md @@ -1,3 +1,9 @@ +# 2.0.0 +2024-2-13 (Date of Last Commit) + +* Merged several tasks in snM3C.wdl to reduce the cost of running this pipeline +* Removed several final outputs from snM3C.wdl + # 1.0.1 2024-01-31 (Date of Last Commit) @@ -6,4 +12,4 @@ # 1.0.0 2023-08-01 (Date of Last Commit) -* First release of the snM3C workflow \ No newline at end of file +* First release of the snM3C workflow diff --git a/pipelines/skylab/snM3C/snM3C.wdl b/pipelines/skylab/snM3C/snM3C.wdl index 3feefb6787..c48bff7ead 100644 --- a/pipelines/skylab/snM3C/snM3C.wdl +++ b/pipelines/skylab/snM3C/snM3C.wdl @@ -27,7 +27,7 @@ workflow snM3C { } # version of the pipeline - String pipeline_version = "1.0.1" + String pipeline_version = "2.0.0" call Demultiplexing { input: @@ -51,7 +51,7 @@ workflow snM3C { min_read_length = min_read_length, plate_id = plate_id } - + call Hisat_3n_pair_end_mapping_dna_mode { input: r1_trimmed_tar = Sort_and_trim_r1_and_r2.r1_trimmed_fq_tar, @@ -62,69 +62,43 @@ workflow snM3C { plate_id = plate_id } - call Separate_unmapped_reads { + call Separate_and_split_unmapped_reads { input: hisat3n_bam_tar = Hisat_3n_pair_end_mapping_dna_mode.hisat3n_paired_end_bam_tar, min_read_length = min_read_length, - plate_id = plate_id + plate_id = plate_id, } - call Split_unmapped_reads { + call Hisat_single_end_r1_r2_mapping_dna_mode_and_merge_sort_split_reads_by_name_and_remove_overlap { input: - unmapped_fastq_tar = Separate_unmapped_reads.unmapped_fastq_tar, - min_read_length = min_read_length, - plate_id = plate_id - } - - call Hisat_single_end_r1_r2_mapping_dna_mode_and_merge_sort_split_reads_by_name { - input: - split_fq_tar = Split_unmapped_reads.split_fq_tar, + split_fq_tar = Separate_and_split_unmapped_reads.split_fq_tar, tarred_index_files = tarred_index_files, genome_fa = genome_fa, plate_id = plate_id } - call remove_overlap_read_parts { - input: - bam = Hisat_single_end_r1_r2_mapping_dna_mode_and_merge_sort_split_reads_by_name.merge_sorted_bam_tar, - plate_id = plate_id - } - - call merge_original_and_split_bam_and_sort_all_reads_by_name_and_position { + call merge_original_and_split_bam_and_sort_all_reads_by_name_and_position_and_deduplicate { input: - bam = Separate_unmapped_reads.unique_bam_tar, - split_bam = remove_overlap_read_parts.output_bam_tar, + bam = Separate_and_split_unmapped_reads.unique_bam_tar, + split_bam = 
Hisat_single_end_r1_r2_mapping_dna_mode_and_merge_sort_split_reads_by_name_and_remove_overlap.remove_overlaps_output_bam_tar, plate_id = plate_id } call call_chromatin_contacts { input: - name_sorted_bam = merge_original_and_split_bam_and_sort_all_reads_by_name_and_position.name_sorted_bam, + name_sorted_bam = merge_original_and_split_bam_and_sort_all_reads_by_name_and_position_and_deduplicate.name_sorted_bam, plate_id = plate_id } - call dedup_unique_bam_and_index_unique_bam { + call unique_reads_allc_and_cgn_extraction { input: - bam = merge_original_and_split_bam_and_sort_all_reads_by_name_and_position.position_sorted_bam, - plate_id = plate_id - } - - call unique_reads_allc { - input: - bam_and_index_tar = dedup_unique_bam_and_index_unique_bam.output_tar, + bam_and_index_tar = merge_original_and_split_bam_and_sort_all_reads_by_name_and_position_and_deduplicate.dedup_output_bam_tar, genome_fa = genome_fa, num_upstr_bases = num_upstr_bases, num_downstr_bases = num_downstr_bases, compress_level = compress_level, - plate_id = plate_id - } - - call unique_reads_cgn_extraction { - input: - allc_tar = unique_reads_allc.allc, - tbi_tar = unique_reads_allc.tbi, - chrom_size_path = chromosome_sizes, - plate_id = plate_id + plate_id = plate_id, + chromosome_sizes = chromosome_sizes } } @@ -132,35 +106,26 @@ workflow snM3C { input: trimmed_stats = Sort_and_trim_r1_and_r2.trim_stats_tar, hisat3n_stats = Hisat_3n_pair_end_mapping_dna_mode.hisat3n_paired_end_stats_tar, - r1_hisat3n_stats = Hisat_single_end_r1_r2_mapping_dna_mode_and_merge_sort_split_reads_by_name.hisat3n_dna_split_reads_summary_R1_tar, - r2_hisat3n_stats = Hisat_single_end_r1_r2_mapping_dna_mode_and_merge_sort_split_reads_by_name.hisat3n_dna_split_reads_summary_R2_tar, - dedup_stats = dedup_unique_bam_and_index_unique_bam.dedup_stats_tar, + r1_hisat3n_stats = Hisat_single_end_r1_r2_mapping_dna_mode_and_merge_sort_split_reads_by_name_and_remove_overlap.hisat3n_dna_split_reads_summary_R1_tar, + r2_hisat3n_stats = Hisat_single_end_r1_r2_mapping_dna_mode_and_merge_sort_split_reads_by_name_and_remove_overlap.hisat3n_dna_split_reads_summary_R2_tar, + dedup_stats = merge_original_and_split_bam_and_sort_all_reads_by_name_and_position_and_deduplicate.dedup_stats_tar, chromatin_contact_stats = call_chromatin_contacts.chromatin_contact_stats, - allc_uniq_reads_stats = unique_reads_allc.allc_uniq_reads_stats, - unique_reads_cgn_extraction_tbi = unique_reads_cgn_extraction.output_tbi_tar, + allc_uniq_reads_stats = unique_reads_allc_and_cgn_extraction.allc_uniq_reads_stats, + unique_reads_cgn_extraction_tbi = unique_reads_allc_and_cgn_extraction.extract_allc_output_tbi_tar, plate_id = plate_id } output { File MappingSummary = summary.mapping_summary - Array[File] trimmed_stats = Sort_and_trim_r1_and_r2.trim_stats_tar - Array[File] r1_trimmed_fq = Sort_and_trim_r1_and_r2.r1_trimmed_fq_tar - Array[File] r2_trimmed_fq = Sort_and_trim_r1_and_r2.r2_trimmed_fq_tar - Array[File] hisat3n_stats_tar = Hisat_3n_pair_end_mapping_dna_mode.hisat3n_paired_end_stats_tar - Array[File] hisat3n_bam_tar = Hisat_3n_pair_end_mapping_dna_mode.hisat3n_paired_end_bam_tar - Array[File] unique_bam_tar = Separate_unmapped_reads.unique_bam_tar - Array[File] multi_bam_tar = Separate_unmapped_reads.multi_bam_tar - Array[File] unmapped_fastq_tar = Separate_unmapped_reads.unmapped_fastq_tar - Array[File] split_fq_tar = Split_unmapped_reads.split_fq_tar - Array[File] merge_sorted_bam_tar = 
Hisat_single_end_r1_r2_mapping_dna_mode_and_merge_sort_split_reads_by_name.merge_sorted_bam_tar - Array[File] name_sorted_bams = merge_original_and_split_bam_and_sort_all_reads_by_name_and_position.name_sorted_bam - Array[File] pos_sorted_bams = merge_original_and_split_bam_and_sort_all_reads_by_name_and_position.position_sorted_bam - Array[File] remove_overlap_read_parts_bam_tar = remove_overlap_read_parts.output_bam_tar - Array[File] dedup_unique_bam_and_index_unique_bam_tar = dedup_unique_bam_and_index_unique_bam.output_tar - Array[File] unique_reads_cgn_extraction_allc = unique_reads_cgn_extraction.output_allc_tar - Array[File] unique_reads_cgn_extraction_tbi = unique_reads_cgn_extraction.output_tbi_tar - Array[File] chromatin_contact_stats = call_chromatin_contacts.chromatin_contact_stats + Array[File] name_sorted_bams = merge_original_and_split_bam_and_sort_all_reads_by_name_and_position_and_deduplicate.name_sorted_bam + Array[File] unique_reads_cgn_extraction_allc= unique_reads_allc_and_cgn_extraction.allc + Array[File] unique_reads_cgn_extraction_tbi = unique_reads_allc_and_cgn_extraction.tbi + Array[File] unique_reads_cgn_extraction_allc_extract = unique_reads_allc_and_cgn_extraction.extract_allc_output_allc_tar + Array[File] unique_reads_cgn_extraction_tbi_extract = unique_reads_allc_and_cgn_extraction.extract_allc_output_tbi_tar Array[File] reference_version = Hisat_3n_pair_end_mapping_dna_mode.reference_version + Array[File] chromatin_contact_stats = call_chromatin_contacts.chromatin_contact_stats + Array[File] all_reads_dedup_contacts = call_chromatin_contacts.all_reads_dedup_contacts + Array[File] all_reads_3C_contacts = call_chromatin_contacts.all_reads_3C_contacts } } @@ -233,10 +198,10 @@ task Demultiplexing { for i in $(seq 1 "${batch_number}"); do # Use seq for reliable brace expansion mkdir -p "batch${i}" # Combine batch and i, use -p to create parent dirs done - + # Counter for the folder index folder_index=1 - + # Define lists of r1 and r2 fq files R1_files=($(ls | grep "\-R1.fq.gz")) R2_files=($(ls | grep "\-R2.fq.gz")) @@ -256,7 +221,7 @@ task Demultiplexing { done - echo "TAR files created successfully." + echo "TAR files created successfully." >>> runtime { @@ -384,28 +349,19 @@ task Hisat_3n_pair_end_mapping_dna_mode{ echo "The reference is $BASE" > ~{plate_id}.reference_version.txt - mkdir reference/ - mkdir fastq/ - - cp ~{tarred_index_files} reference/ - cp ~{genome_fa} reference/ - cp ~{chromosome_sizes} reference/ - cp ~{r1_trimmed_tar} fastq/ - cp ~{r2_trimmed_tar} fastq/ - # untar the index files - cd reference/ echo "Untarring the index files" tar -zxvf ~{tarred_index_files} rm ~{tarred_index_files} + cp ~{genome_fa} . 
+ #get the basename of the genome_fa file genome_fa_basename=$(basename ~{genome_fa} .fa) echo "samtools faidx $genome_fa_basename.fa" samtools faidx $genome_fa_basename.fa # untar the demultiplexed fastq files - cd ../fastq/ echo "Untarring the fastq files" tar -zxvf ~{r1_trimmed_tar} tar -zxvf ~{r2_trimmed_tar} @@ -418,7 +374,7 @@ task Hisat_3n_pair_end_mapping_dna_mode{ for file in "${R1_files[@]}"; do sample_id=$(basename "$file" "-R1_trimmed.fq.gz") - hisat-3n /cromwell_root/reference/$genome_fa_basename \ + hisat-3n /cromwell_root/$genome_fa_basename \ -q \ -1 ${sample_id}-R1_trimmed.fq.gz \ -2 ${sample_id}-R2_trimmed.fq.gz \ @@ -437,8 +393,6 @@ task Hisat_3n_pair_end_mapping_dna_mode{ tar -zcvf ~{plate_id}.hisat3n_paired_end_bam_files.tar.gz *.bam tar -zcvf ~{plate_id}.hisat3n_paired_end_stats_files.tar.gz *.hisat3n_dna_summary.txt - mv ~{plate_id}.hisat3n_paired_end_bam_files.tar.gz ../ - mv ~{plate_id}.hisat3n_paired_end_stats_files.tar.gz ../ >>> runtime { @@ -455,7 +409,7 @@ task Hisat_3n_pair_end_mapping_dna_mode{ } } -task Separate_unmapped_reads { +task Separate_and_split_unmapped_reads { input { File hisat3n_bam_tar Int min_read_length @@ -509,46 +463,11 @@ task Separate_and_split_unmapped_reads { # tar up the unique bams tar -zcvf ~{plate_id}.hisat3n_paired_end_unique_bam_files.tar.gz *.hisat3n_dna.unique_aligned.bam - # tar up the multi bams - tar -zcvf ~{plate_id}.hisat3n_paired_end_multi_bam_files.tar.gz *.hisat3n_dna.multi_aligned.bam - # tar up the unmapped fastq files tar -zcvf ~{plate_id}.hisat3n_paired_end_unmapped_fastq_files.tar.gz *.hisat3n_dna.unmapped.fastq - >>> - runtime { docker: docker disks: "local-disk ${disk_size} HDD" cpu: cpu memory: "${mem_size} GiB" preemptible: preemptible_tries } - output { - File unique_bam_tar = "~{plate_id}.hisat3n_paired_end_unique_bam_files.tar.gz" - File multi_bam_tar = "~{plate_id}.hisat3n_paired_end_multi_bam_files.tar.gz" - File unmapped_fastq_tar = "~{plate_id}.hisat3n_paired_end_unmapped_fastq_files.tar.gz" - } -} - -task Split_unmapped_reads { - input { - File unmapped_fastq_tar - Int min_read_length - String plate_id - - String docker = "us.gcr.io/broad-gotc-prod/m3c-yap-hisat:1.0.0-2.2.1" - Int disk_size = 50 - Int mem_size = 10 - Int preemptible_tries = 3 - Int cpu = 1 - } - command <<< - - set -euo pipefail - # untar the unmapped fastq files - tar -xf ~{unmapped_fastq_tar} - rm ~{unmapped_fastq_tar} + tar -xf ~{plate_id}.hisat3n_paired_end_unmapped_fastq_files.tar.gz python3 <>> runtime { docker: docker @@ -589,11 +507,12 @@ preemptible: preemptible_tries } output { + File unique_bam_tar = "~{plate_id}.hisat3n_paired_end_unique_bam_files.tar.gz" File split_fq_tar = "~{plate_id}.hisat3n_paired_end_split_fastq_files.tar.gz" } } -task Hisat_single_end_r1_r2_mapping_dna_mode_and_merge_sort_split_reads_by_name { +task Hisat_single_end_r1_r2_mapping_dna_mode_and_merge_sort_split_reads_by_name_and_remove_overlap { input { File split_fq_tar File genome_fa @@ -609,16 +528,13 @@ command <<< set -euo pipefail - mkdir reference/ - - cp ~{tarred_index_files} reference/ - cp ~{genome_fa} reference/ # untar the tarred index files - cd reference/ tar -xvf ~{tarred_index_files} rm ~{tarred_index_files} + cp ~{genome_fa} .
+ #get the basename of the genome_fa file genome_fa_basename=$(basename ~{genome_fa} .fa) samtools faidx $genome_fa_basename.fa @@ -633,7 +549,7 @@ task Hisat_single_end_r1_r2_mapping_dna_mode_and_merge_sort_split_reads_by_name for file in "${R1_files[@]}"; do sample_id=$(basename "$file" ".hisat3n_dna.split_reads.R1.fastq") - hisat-3n /cromwell_root/reference/$genome_fa_basename \ + hisat-3n /cromwell_root/$genome_fa_basename \ -q \ -U ${sample_id}.hisat3n_dna.split_reads.R1.fastq \ --directional-mapping-reverse \ @@ -649,7 +565,7 @@ task Hisat_single_end_r1_r2_mapping_dna_mode_and_merge_sort_split_reads_by_name for file in "${R2_files[@]}"; do sample_id=$(basename "$file" ".hisat3n_dna.split_reads.R2.fastq") - hisat-3n /cromwell_root/reference/$genome_fa_basename \ + hisat-3n /cromwell_root/$genome_fa_basename \ -q \ -U ${sample_id}.hisat3n_dna.split_reads.R2.fastq \ --directional-mapping \ @@ -663,8 +579,8 @@ task Hisat_single_end_r1_r2_mapping_dna_mode_and_merge_sort_split_reads_by_name done # tar up the r1 and r2 stats files - tar -zcvf ../~{plate_id}.hisat3n_dna_split_reads_summary.R1.tar.gz *.hisat3n_dna_split_reads_summary.R1.txt - tar -zcvf ../~{plate_id}.hisat3n_dna_split_reads_summary.R2.tar.gz *.hisat3n_dna_split_reads_summary.R2.txt + tar -zcvf ~{plate_id}.hisat3n_dna_split_reads_summary.R1.tar.gz *.hisat3n_dna_split_reads_summary.R1.txt + tar -zcvf ~{plate_id}.hisat3n_dna_split_reads_summary.R2.tar.gz *.hisat3n_dna_split_reads_summary.R2.txt # define lists of r1 and r2 bam files @@ -684,62 +600,31 @@ task Hisat_single_end_r1_r2_mapping_dna_mode_and_merge_sort_split_reads_by_name done #tar up the merged bam files - tar -zcvf ../~{plate_id}.hisat3n_dna.split_reads.name_sort.bam.tar.gz *.hisat3n_dna.split_reads.name_sort.bam + tar -zcvf ~{plate_id}.hisat3n_dna.split_reads.name_sort.bam.tar.gz *.hisat3n_dna.split_reads.name_sort.bam - >>> - runtime { - docker: docker - disks: "local-disk ${disk_size} HDD" - cpu: cpu - memory: "${mem_size} GiB" - preemptible: preemptible_tries - } - output { - File merge_sorted_bam_tar = "~{plate_id}.hisat3n_dna.split_reads.name_sort.bam.tar.gz" - File hisat3n_dna_split_reads_summary_R1_tar = "~{plate_id}.hisat3n_dna_split_reads_summary.R1.tar.gz" - File hisat3n_dna_split_reads_summary_R2_tar = "~{plate_id}.hisat3n_dna_split_reads_summary.R2.tar.gz" - } -} + # unzip bam file + tar -xf ~{plate_id}.hisat3n_dna.split_reads.name_sort.bam.tar.gz -task remove_overlap_read_parts { - input { - File bam - String plate_id - - String docker = "us.gcr.io/broad-gotc-prod/m3c-yap-hisat:1.0.0-2.2.1" - Int disk_size = 80 - Int mem_size = 20 - Int preemptible_tries = 3 - Int cpu = 1 - } - - command <<< - set -euo pipefail - # unzip bam file - tar -xf ~{bam} - rm ~{bam} - - # create output dir - mkdir /cromwell_root/output_bams - - # get bams - bams=($(ls | grep "sort.bam$")) - - # loop through bams and run python script on each bam - # scatter instead of for loop to optimize - python3 <>> runtime { @@ -750,11 +635,14 @@ task remove_overlap_read_parts { preemptible: preemptible_tries } output { - File output_bam_tar = "~{plate_id}.remove_overlap_read_parts.tar.gz" + #File merge_sorted_bam_tar = "~{plate_id}.hisat3n_dna.split_reads.name_sort.bam.tar.gz" + File hisat3n_dna_split_reads_summary_R1_tar = "~{plate_id}.hisat3n_dna_split_reads_summary.R1.tar.gz" + File hisat3n_dna_split_reads_summary_R2_tar = "~{plate_id}.hisat3n_dna_split_reads_summary.R2.tar.gz" + File remove_overlaps_output_bam_tar = "~{plate_id}.remove_overlap_read_parts.tar.gz" } } -task 
merge_original_and_split_bam_and_sort_all_reads_by_name_and_position { +task merge_original_and_split_bam_and_sort_all_reads_by_name_and_position_and_deduplicate { input { File bam File split_bam @@ -790,6 +678,37 @@ task merge_original_and_split_bam_and_sort_all_reads_by_name_and_position { #tar up the merged bam files tar -zcvf ~{plate_id}.hisat3n_dna.all_reads.pos_sort.tar.gz *.hisat3n_dna.all_reads.pos_sort.bam tar -zcvf ~{plate_id}.hisat3n_dna.all_reads.name_sort.tar.gz *.hisat3n_dna.all_reads.name_sort.bam + + + # unzip files + tar -xf ~{plate_id}.hisat3n_dna.all_reads.pos_sort.tar.gz + + # create output dir + mkdir /cromwell_root/output_bams + mkdir /cromwell_root/temp + + # name : AD3C_BA17_2027_P1-1-B11-G13.hisat3n_dna.all_reads.pos_sort.bam + for file in *.pos_sort.bam + do + name=`echo $file | cut -d. -f1` + name=$name.hisat3n_dna.all_reads.deduped + echo $name + echo "Call Picard" + picard MarkDuplicates I=$file O=/cromwell_root/output_bams/$name.bam \ + M=/cromwell_root/output_bams/$name.matrix.txt \ + REMOVE_DUPLICATES=true TMP_DIR=/cromwell_root/temp + echo "Call samtools index" + samtools index /cromwell_root/output_bams/$name.bam + done + + cd /cromwell_root + + #tar up the output files + tar -zcvf ~{plate_id}.dedup_unique_bam_and_index_unique_bam.tar.gz output_bams + + #tar up the stats files + tar -zcvf ~{plate_id}.dedup_unique_bam_and_index_unique_bam_stats.tar.gz output_bams/*.matrix.txt + >>> runtime { docker: docker @@ -800,7 +719,8 @@ task merge_original_and_split_bam_and_sort_all_reads_by_name_and_position { } output { File name_sorted_bam = "~{plate_id}.hisat3n_dna.all_reads.name_sort.tar.gz" - File position_sorted_bam = "~{plate_id}.hisat3n_dna.all_reads.pos_sort.tar.gz" + File dedup_output_bam_tar = "~{plate_id}.dedup_unique_bam_and_index_unique_bam.tar.gz" + File dedup_stats_tar = "~{plate_id}.dedup_unique_bam_and_index_unique_bam_stats.tar.gz" } } @@ -846,8 +766,12 @@ task call_chromatin_contacts { CODE - #tar up the chromatin contact files + #tar up the all_reads.contact_stats.csv files tar -zcvf ~{plate_id}.chromatin_contact_stats.tar.gz *.hisat3n_dna.all_reads.contact_stats.csv + #tar up the .hisat3n_dna.all_reads.dedup_contacts.tsv files + tar -zcvf ~{plate_id}.hisat3n_dna.all_reads.dedup_contacts.tar.gz *.hisat3n_dna.all_reads.dedup_contacts.tsv.gz + #tar up the .hisat3n_dna.all_reads.3C.contact.tsv.gz files + tar -zcvf ~{plate_id}.hisat3n_dna.all_reads.3C.contact.tar.gz *.hisat3n_dna.all_reads.3C.contact.tsv.gz >>> runtime { docker: docker @@ -858,69 +782,12 @@ task call_chromatin_contacts { } output { File chromatin_contact_stats = "~{plate_id}.chromatin_contact_stats.tar.gz" + File all_reads_dedup_contacts = "~{plate_id}.hisat3n_dna.all_reads.dedup_contacts.tar.gz" + File all_reads_3C_contacts = "~{plate_id}.hisat3n_dna.all_reads.3C.contact.tar.gz" } } -task dedup_unique_bam_and_index_unique_bam { - input { - File bam - String plate_id - - String docker = "us.gcr.io/broad-gotc-prod/m3c-yap-hisat:1.0.0-2.2.1" - Int disk_size = 80 - Int mem_size = 20 - Int preemptible_tries = 3 - Int cpu = 1 - } - - command <<< - set -euo pipefail - - # unzip files - tar -xf ~{bam} - rm ~{bam} - - # create output dir - mkdir /cromwell_root/output_bams - mkdir /cromwell_root/temp - - # name : AD3C_BA17_2027_P1-1-B11-G13.hisat3n_dna.all_reads.pos_sort.bam - for file in *.bam - do - name=`echo $file | cut -d. 
-f1` - name=$name.hisat3n_dna.all_reads.deduped - echo $name - echo "Call Picard" - picard MarkDuplicates I=$file O=/cromwell_root/output_bams/$name.bam \ - M=/cromwell_root/output_bams/$name.matrix.txt \ - REMOVE_DUPLICATES=true TMP_DIR=/cromwell_root/temp - echo "Call samtools index" - samtools index /cromwell_root/output_bams/$name.bam - done - - cd /cromwell_root - - #tar up the output files - tar -zcvf ~{plate_id}.dedup_unique_bam_and_index_unique_bam.tar.gz output_bams - - #tar up the stats files - tar -zcvf ~{plate_id}.dedup_unique_bam_and_index_unique_bam_stats.tar.gz output_bams/*.matrix.txt - - >>> - runtime { - docker: docker - disks: "local-disk ${disk_size} HDD" - cpu: cpu - memory: "${mem_size} GiB" - preemptible: preemptible_tries - } - output { - File output_tar = "~{plate_id}.dedup_unique_bam_and_index_unique_bam.tar.gz" - File dedup_stats_tar = "~{plate_id}.dedup_unique_bam_and_index_unique_bam_stats.tar.gz" - } -} - -task unique_reads_allc { +task unique_reads_allc_and_cgn_extraction { input { File bam_and_index_tar File genome_fa @@ -928,6 +795,7 @@ task unique_reads_allc { Int num_upstr_bases Int num_downstr_bases Int compress_level + File chromosome_sizes Int disk_size = 80 Int mem_size = 20 @@ -971,74 +839,37 @@ task unique_reads_allc { tar -zcvf ../~{plate_id}.allc.tsv.tar.gz *.allc.tsv.gz tar -zcvf ../~{plate_id}.allc.tbi.tar.gz *.allc.tsv.gz.tbi - tar -zcvf ../~{plate_id}.allc.count.tar.gz *.allc.tsv.gz.count.csv - - - >>> - runtime { - docker: docker - disks: "local-disk ${disk_size} HDD" - cpu: cpu - memory: "${mem_size} GiB" - preemptible: preemptible_tries - } - output { - File allc = "~{plate_id}.allc.tsv.tar.gz" - File tbi = "~{plate_id}.allc.tbi.tar.gz" - File allc_uniq_reads_stats = "~{plate_id}.allc.count.tar.gz" - } -} - - -task unique_reads_cgn_extraction { - input { - File allc_tar - File tbi_tar - File chrom_size_path - String plate_id - - String docker = "us.gcr.io/broad-gotc-prod/m3c-yap-hisat:1.0.0-2.2.1" - Int disk_size = 80 - Int mem_size = 20 - Int num_upstr_bases = 0 - Int preemptible_tries = 3 - Int cpu = 1 - } - - command <<< - set -euo pipefail - - tar -xf ~{allc_tar} - rm ~{allc_tar} - - tar -xf ~{tbi_tar} - rm ~{tbi_tar} - - # prefix="allc-{mcg_context}/{cell_id}" - if [ ~{num_upstr_bases} -eq 0 ]; then - mcg_context=CGN - else - mcg_context=HCGN - fi + tar -zcvf ~{plate_id}.allc.count.tar.gz *.allc.tsv.gz.count.csv + + cd ../ + tar -xf ~{plate_id}.allc.tsv.tar.gz + tar -xf ~{plate_id}.allc.tbi.tar.gz + + # prefix="allc-{mcg_context}/{cell_id}" + if [ ~{num_upstr_bases} -eq 0 ]; then + mcg_context=CGN + else + mcg_context=HCGN + fi + # create output dir + mkdir /cromwell_root/allc-${mcg_context} + outputdir=/cromwell_root/allc-${mcg_context} - # create output dir - mkdir /cromwell_root/allc-${mcg_context} - outputdir=/cromwell_root/allc-${mcg_context} - - for gzfile in *.gz - do - name=`echo $gzfile | cut -d. -f1` - echo $name - allcools extract-allc --strandness merge --allc_path $gzfile \ - --output_prefix $outputdir/$name \ - --mc_contexts ${mcg_context} \ - --chrom_size_path ~{chrom_size_path} - done + for gzfile in *.allc.tsv.gz + do + name=`echo $gzfile | cut -d. 
-f1` + echo $name + allcools extract-allc --strandness merge --allc_path $gzfile \ + --output_prefix $outputdir/$name \ + --mc_contexts ${mcg_context} \ + --chrom_size_path ~{chromosome_sizes} + done - cd /cromwell_root + mv output_bams/~{plate_id}.allc.count.tar.gz /cromwell_root - tar -zcvf ~{plate_id}.output_allc_tar.tar.gz $outputdir/*.gz - tar -zcvf ~{plate_id}.output_tbi_tar.tar.gz $outputdir/*.tbi + cd /cromwell_root + tar -zcvf ~{plate_id}.extract-allc.tar.gz $outputdir/*.gz + tar -zcvf ~{plate_id}.extract-allc_tbi.tar.gz $outputdir/*.tbi >>> @@ -1049,14 +880,15 @@ task unique_reads_cgn_extraction { memory: "${mem_size} GiB" preemptible: preemptible_tries } - output { - File output_allc_tar = "~{plate_id}.output_allc_tar.tar.gz" - File output_tbi_tar = "~{plate_id}.output_tbi_tar.tar.gz" + File allc = "~{plate_id}.allc.tsv.tar.gz" + File tbi = "~{plate_id}.allc.tbi.tar.gz" + File allc_uniq_reads_stats = "~{plate_id}.allc.count.tar.gz" + File extract_allc_output_allc_tar = "~{plate_id}.extract-allc.tar.gz" + File extract_allc_output_tbi_tar = "~{plate_id}.extract-allc_tbi.tar.gz" } } - task summary { input { Array[File] trimmed_stats diff --git a/verification/test-wdls/TestsnM3C.wdl b/verification/test-wdls/TestsnM3C.wdl index 3ca01baf74..959aec4bd7 100644 --- a/verification/test-wdls/TestsnM3C.wdl +++ b/verification/test-wdls/TestsnM3C.wdl @@ -72,24 +72,13 @@ workflow TestsnM3C { ], # Array[File] outputs snM3C.reference_version, - snM3C.chromatin_contact_stats, - snM3C.unique_reads_cgn_extraction_tbi, snM3C.unique_reads_cgn_extraction_allc, - snM3C.dedup_unique_bam_and_index_unique_bam_tar, - snM3C.remove_overlap_read_parts_bam_tar, - snM3C.pos_sorted_bams, + snM3C.unique_reads_cgn_extraction_tbi, + snM3C.unique_reads_cgn_extraction_allc_extract, + snM3C.unique_reads_cgn_extraction_tbi_extract, snM3C.name_sorted_bams, - snM3C.merge_sorted_bam_tar, - snM3C.split_fq_tar, - snM3C.unmapped_fastq_tar, - snM3C.multi_bam_tar, - snM3C.unique_bam_tar, - snM3C.hisat3n_bam_tar, - snM3C.hisat3n_stats_tar, - snM3C.r2_trimmed_fq, - snM3C.r1_trimmed_fq, - snM3C.trimmed_stats, - + snM3C.all_reads_dedup_contacts, + snM3C.all_reads_3C_contacts, ]) diff --git a/website/docs/Pipelines/snM3C/README.md b/website/docs/Pipelines/snM3C/README.md index 397cada01b..22486f6cdd 100644 --- a/website/docs/Pipelines/snM3C/README.md +++ b/website/docs/Pipelines/snM3C/README.md @@ -6,7 +6,7 @@ slug: /Pipelines/snM3C/README | Pipeline Version | Date Updated | Documentation Authors | Questions or Feedback | | :----: | :---: | :----: | :--------------: | -| [snM3C_v1.0.1](https://github.com/broadinstitute/warp/releases) | February, 2024 | [Kaylee Mathews](mailto:warp-pipelines-help@broadinsitute.org) | Please file GitHub issues in the [WARP repository](https://github.com/broadinstitute/warp/issues) | +| [snM3C_v2.0.0](https://github.com/broadinstitute/warp/releases) | February, 2024 | [Kaylee Mathews](mailto:warp-pipelines-help@broadinsitute.org) | Please file GitHub issues in the [WARP repository](https://github.com/broadinstitute/warp/issues) | ## Introduction to snM3C @@ -76,15 +76,12 @@ Overall, the snM3C workflow: 1. Demultiplexes, sorts, and trims reads. 2. Aligns paired-end reads. -3. Separates unmapped, uniquely aligned, multi-aligned reads. -4. Splits unmapped reads by enzyme cut sites. -5. Aligns unmapped, single-end reads. -6. Removes overlapping reads. -7. Merges mapped reads from single- and paired-end alignments. -8. Calls chromatin contacts. -9. Removes duplicate reads. -10. Creates ALLC file. -11. 
Creates summary output file. +3. Separates unmapped, uniquely aligned, and multi-aligned reads and splits unmapped reads by enzyme cut site. +4. Aligns unmapped, single-end reads and removes overlapping reads. +5. Merges mapped reads from single- and paired-end alignments and removes duplicate reads. +6. Calls chromatin contacts. +7. Creates ALLC files. +8. Creates summary output file. The tools each snM3C task employs are detailed in the table below. @@ -95,15 +92,11 @@ To see specific tool parameters, select the [workflow WDL link](https://github.c | Demultiplexing | Cutadapt | [Cutadapt](https://cutadapt.readthedocs.io/en/stable/) | Performs demultiplexing to cell-level FASTQ files based on random primer indices. | | Sort_and_trim_r1_and_r2 | Cutadapt | [Cutadapt](https://cutadapt.readthedocs.io/en/stable/) | Sorts, filters, and trims reads using the `r1_adapter`, `r2_adapter`, `r1_left_cut`, `r1_right_cut`, `r2_left_cut`, and `r2_right_cut` input parameters. | | Hisat_3n_pair_end_mapping_dna_mode | HISAT-3N | [HISAT-3N](https://daehwankimlab.github.io/hisat2/hisat-3n/) | Performs paired-end read alignment. | -| Separate_unmapped_reads | [hisat3n_general.py](https://github.com/lhqing/cemba_data/blob/788e83cd66f3b556bdfacf3485bed9500d381f23/cemba_data/hisat3n/hisat3n_general.py) | python3 | Imports a custom python3 script developed by Hanqing Liu and calls the `separate_unique_and_multi_align_reads()` function to separate unmapped, uniquely aligned, multi-aligned reads from HISAT-3N BAM file; unmapped reads are stored in an unmapped FASTQ file and uniquely and multi-aligned reads are stored in separate BAM files. | -| Split_unmapped_reads | [hisat3n_m3c.py](https://github.com/lhqing/cemba_data/blob/bf6248239074d0423d45a67d83da99250a43e50c/cemba_data/hisat3n/hisat3n_m3c.py) | python3 | Imports a custom python3 script developed by Hanqing Liu and calls the `split_hisat3n_unmapped_reads()` function to split the unmapped reads FASTQ file by all possible enzyme cut sites and output new R1 and R2 FASTQ files. | -| Hisat_single_end_r1_r2_mapping_dna_mode_and_merge_sort_split_reads_by_name | HISAT-3N | [HISAT-3N](https://daehwankimlab.github.io/hisat2/hisat-3n/) | Performs single-end alignment of unmapped reads to maximize read mapping. | -| remove_overlap_read_parts | [hisat3n_m3c.py](https://github.com/lhqing/cemba_data/blob/bf6248239074d0423d45a67d83da99250a43e50c/cemba_data/hisat3n/hisat3n_m3c.py) | python3 | Imports a custom python3 script developed by Hanqing Liu and calls the `remove_overlap_read_parts()` function to remove overlapping reads from the split alignment BAM file produced during single-end alignment. | -| merge_original_and_split_bam_and_sort_all_reads_by_name_and_position | merge, sort | [samtools](https://www.htslib.org/) | Merges and sorts all mapped reads from the paired-end and single-end alignments; creates a position-sorted BAM file and a name-sorted BAM file.
| +| Separate_and_split_unmapped_reads | [hisat3n_general.py](https://github.com/lhqing/cemba_data/blob/788e83cd66f3b556bdfacf3485bed9500d381f23/cemba_data/hisat3n/hisat3n_general.py), [hisat3n_m3c.py](https://github.com/lhqing/cemba_data/blob/bf6248239074d0423d45a67d83da99250a43e50c/cemba_data/hisat3n/hisat3n_m3c.py) | python3 | Imports two custom python3 scripts developed by Hanqing Liu and calls the `separate_unique_and_multi_align_reads()` and `split_hisat3n_unmapped_reads()` functions to separate unmapped, uniquely aligned, and multi-aligned reads from the HISAT-3N BAM file, then splits the unmapped reads FASTQ file by all possible enzyme cut sites and outputs new R1 and R2 FASTQ files; unmapped reads are stored in unmapped FASTQ files and uniquely and multi-aligned reads are stored in separate BAM files. | +| Hisat_single_end_r1_r2_mapping_dna_mode_and_merge_sort_split_reads_by_name_and_remove_overlap | HISAT-3N, [hisat3n_m3c.py](https://github.com/lhqing/cemba_data/blob/bf6248239074d0423d45a67d83da99250a43e50c/cemba_data/hisat3n/hisat3n_m3c.py) | [HISAT-3N](https://daehwankimlab.github.io/hisat2/hisat-3n/), python3 | Performs single-end alignment of unmapped reads to maximize read mapping, imports a custom python3 script developed by Hanqing Liu, and calls the `remove_overlap_read_parts()` function to remove overlapping reads from the split alignment BAM file produced during single-end alignment. | +| merge_original_and_split_bam_and_sort_all_reads_by_name_and_position_and_deduplicate | merge, sort, MarkDuplicates | [samtools](https://www.htslib.org/), [Picard](https://broadinstitute.github.io/picard/) | Merges and sorts all mapped reads from the paired-end and single-end alignments; creates a position-sorted BAM file and a name-sorted BAM file; removes duplicate reads from the position-sorted, merged BAM file. | | call_chromatin_contacts | [hisat3n_m3c.py](https://github.com/lhqing/cemba_data/blob/bf6248239074d0423d45a67d83da99250a43e50c/cemba_data/hisat3n/hisat3n_m3c.py) | python3 | Imports a custom python3 script developed by Hanqing Liu and calls the `call_chromatin_contacts()` function to call chromatin contacts from the name-sorted, merged BAM file; reads are considered chromatin contacts if they are greater than 2,500 base pairs apart. | -| dedup_unique_bam_and_index_unique_bam | MarkDuplicates | [Picard](https://broadinstitute.github.io/picard/) | Removes duplicate reads from the position-sorted, merged BAM file. | -| unique_reads_allc | bam-to-allc | [ALLCools](https://lhqing.github.io/ALLCools/intro.html) | Creates an ALLC file with a list of methylation points. | -| unique_reads_cgn_extraction | extract-allc | [ALLCools](https://lhqing.github.io/ALLCools/intro.html) | Creates an ALLC file containing methylation contexts. | +| unique_reads_allc_and_cgn_extraction | bam-to-allc, extract-allc | [ALLCools](https://lhqing.github.io/ALLCools/intro.html) | Creates a first ALLC file with a list of methylation points and a second ALLC file containing methylation contexts. | | summary | [summary.py](https://github.com/lhqing/cemba_data/blob/788e83cd66f3b556bdfacf3485bed9500d381f23/cemba_data/hisat3n/summary.py) | python3 | Imports a custom python3 script developed by Hanqing Liu and calls the `snm3c_summary()` function to generate a single summary file for the pipeline in TSV format that contains trimming, mapping, deduplication, chromatin contact, and AllC site statistics. | #### 1.
Demultiplexes, sorts, and trims reads @@ -114,37 +107,35 @@ After demultiplexing, the pipeline uses [Cutadapt](https://cutadapt.readthedocs. #### 2. Aligns paired-end reads In the next step of the pipeline, the `Hisat_3n_pair_end_mapping_dna_mode` task uses [HISAT-3N](https://daehwankimlab.github.io/hisat2/hisat-3n/) to perform paired-end read alignment to a reference genome FASTA file (`genome_fa`) and outputs an aligned BAM file. Additionally, the task outputs a stats file and a text file containing the genomic reference version used. -#### 3. Separates unmapped, uniquely aligned, multi-aligned reads -After paired-end alignment, the pipeline calls the `Separate_unmapped_reads` task, which imports a custom python3 script ([hisat3n_general.py](https://github.com/lhqing/cemba_data/blob/788e83cd66f3b556bdfacf3485bed9500d381f23/cemba_data/hisat3n/hisat3n_general.py)) developed by Hanqing Liu. The task calls the script's `separate_unique_and_multi_align_reads()` function to separate unmapped, uniquely aligned, and multi-aligned reads from the HISAT-3N BAM file. Three new files are output from this step of the pipeline: +#### 3. Separates unmapped, uniquely aligned, and multi-aligned reads and splits unmapped reads by enzyme cut site + +After paired-end alignment, the pipeline calls the `Separate_and_split_unmapped_reads` task, which imports a custom python3 script ([hisat3n_general.py](https://github.com/lhqing/cemba_data/blob/788e83cd66f3b556bdfacf3485bed9500d381f23/cemba_data/hisat3n/hisat3n_general.py)) developed by Hanqing Liu. The task calls the script's `separate_unique_and_multi_align_reads()` function to separate unmapped, uniquely aligned, and multi-aligned reads from the HISAT-3N BAM file. Three new files are output from this step of the pipeline: 1. A FASTQ file that contains the unmapped reads (`unmapped_fastq_tar`) 2. A BAM file that contains the uniquely aligned reads (`unique_bam_tar`) 3. A BAM file that contains the multi-aligned reads (`multi_bam_tar`) -#### 4. Splits unmapped reads by enzyme cut sites -The `Split_unmapped_reads` task imports a custom python3 script ([hisat3n_m3c.py](https://github.com/lhqing/cemba_data/blob/bf6248239074d0423d45a67d83da99250a43e50c/cemba_data/hisat3n/hisat3n_m3c.py)) developed by Hanqing Liu and calls the script's `split_hisat3n_unmapped_reads()` function. This splits the FASTQ file containing the unmapped reads by all possible enzyme cut sites and outputs new R1 and R2 files. +After separating reads, the task imports a custom python3 script ([hisat3n_m3c.py](https://github.com/lhqing/cemba_data/blob/bf6248239074d0423d45a67d83da99250a43e50c/cemba_data/hisat3n/hisat3n_m3c.py)) developed by Hanqing Liu and calls the script's `split_hisat3n_unmapped_reads()` function. This splits the FASTQ file containing the unmapped reads by all possible enzyme cut sites and outputs new R1 and R2 files. -#### 5. Aligns unmapped, single-end reads -In the next step of the pipeline, the `Hisat_single_end_r1_r2_mapping_dna_mode_and_merge_sort_split_reads_by_name ` task uses [HISAT-3N](https://daehwankimlab.github.io/hisat2/hisat-3n/) to perform single-end read alignment of the previously unmapped reads to maximize read mapping and outputs a single, aligned BAM file. +#### 4.
Aligns unmapped, single-end reads and removes overlapping reads +In the next step of the pipeline, the `Hisat_single_end_r1_r2_mapping_dna_mode_and_merge_sort_split_reads_by_name_and_remove_overlap` task uses [HISAT-3N](https://daehwankimlab.github.io/hisat2/hisat-3n/) to perform single-end read alignment of the previously unmapped reads to maximize read mapping and outputs a single, aligned BAM file. -#### 6. Removes overlapping reads -After the second alignment step, the pipeline calls the `remove_overlap_read_parts ` task, which imports a custom python3 script ([hisat3n_m3c.py](https://github.com/lhqing/cemba_data/blob/bf6248239074d0423d45a67d83da99250a43e50c/cemba_data/hisat3n/hisat3n_m3c.py)) developed by Hanqing Liu. The task calls the script's `remove_overlap_read_parts()` function to remove overlapping reads from the BAM file produced during single-end alignment and output another BAM file. +After the second alignment step, the task imports a custom python3 script ([hisat3n_m3c.py](https://github.com/lhqing/cemba_data/blob/bf6248239074d0423d45a67d83da99250a43e50c/cemba_data/hisat3n/hisat3n_m3c.py)) developed by Hanqing Liu. The task calls the script's `remove_overlap_read_parts()` function to remove overlapping reads from the BAM file produced during single-end alignment and output another BAM file. -#### 7. Merges mapped reads from single- and paired-end alignments -The `merge_original_and_split_bam_and_sort_all_reads_by_name_and_position` task uses [samtools](https://www.htslib.org/) to merge and sort all of the mapped reads from the paired-end and single-end alignments into a single BAM file. The BAM file is output as both a position-sorted and a name-sorted BAM file. +#### 5. Merges mapped reads from single- and paired-end alignments and removes duplicate reads +The `merge_original_and_split_bam_and_sort_all_reads_by_name_and_position_and_deduplicate` task uses [samtools](https://www.htslib.org/) to merge and sort all of the mapped reads from the paired-end and single-end alignments into a single BAM file. The BAM file is output as both a position-sorted and a name-sorted BAM file. -#### 8. Calls chromatin contacts -In the `call_chromatin_contacts` task, the pipeline imports a custom python3 script ([hisat3n_m3c.py](https://github.com/lhqing/cemba_data/blob/bf6248239074d0423d45a67d83da99250a43e50c/cemba_data/hisat3n/hisat3n_m3c.py)) developed by Hanqing Liu. The task calls the script's `call_chromatin_contacts()` function to call chromatin contacts from the name-sorted, merged BAM file. If reads are greater than 2,500 base pairs apart, they are considered chromatin contacts. If reads are less than 2,500 base pairs apart, they are considered the same fragment. +After merging and sorting, the task uses Picard's MarkDuplicates tool to remove duplicate reads from the position-sorted, merged BAM file and output a deduplicated BAM file. -#### 9. Removes duplicate reads -After calling chromatin contacts, the `dedup_unique_bam_and_index_unique_bam` task uses Picard's MarkDuplicates tool to remove duplicate reads from the position-sorted, merged BAM file and output a deduplicated BAM file. +#### 6. Calls chromatin contacts +In the `call_chromatin_contacts` task, the pipeline imports a custom python3 script ([hisat3n_m3c.py](https://github.com/lhqing/cemba_data/blob/bf6248239074d0423d45a67d83da99250a43e50c/cemba_data/hisat3n/hisat3n_m3c.py)) developed by Hanqing Liu.
The task calls the script's `call_chromatin_contacts()` function to call chromatin contacts from the name-sorted, merged BAM file. If reads are greater than 2,500 base pairs apart, they are considered chromatin contacts. If reads are less than 2,500 base pairs apart, they are considered the same fragment. -#### 10. Creates ALLC file -The `unique_reads_allc` task uses the [ALLCools](https://lhqing.github.io/ALLCools/intro.html) `bam-to-allc` function to create an ALLC file from the deduplicated BAM file that contains a list of methylation points. The `num_upstr_bases` and `num_downstr_bases` input parameters are used to define the number of bases upstream and downstream of the C base to include in the ALLC context column. +#### 7. Creates ALLC files +The `unique_reads_allc_and_cgn_extraction` task uses the [ALLCools](https://lhqing.github.io/ALLCools/intro.html) `bam-to-allc` function to create an ALLC file from the deduplicated BAM file that contains a list of methylation points. The `num_upstr_bases` and `num_downstr_bases` input parameters are used to define the number of bases upstream and downstream of the C base to include in the ALLC context column. -Next, the `unique_reads_cgn_extraction` task uses the [ALLCools](https://lhqing.github.io/ALLCools/intro.html) `extract-allc` function to extract methylation contexts from the input ALLC file and output a second ALLC file that can be used to generate an [MCDS file](https://github.com/lhqing/allcools_doc/blob/master/tech-background/file-formats.md#mcds-file). +Next, the task uses the [ALLCools](https://lhqing.github.io/ALLCools/intro.html) `extract-allc` function to extract methylation contexts from the input ALLC file and output a second ALLC file that can be used to generate an [MCDS file](https://github.com/lhqing/allcools_doc/blob/master/tech-background/file-formats.md#mcds-file). -#### 11. Creates summary output file +#### 8. Creates summary output file In the last step of the pipeline, the `summary` task imports a custom python3 script ([summary.py](https://github.com/lhqing/cemba_data/blob/788e83cd66f3b556bdfacf3485bed9500d381f23/cemba_data/hisat3n/summary.py)) developed by Hanqing Liu. The task calls the script's `snm3c_summary()` function to generate a single summary file for the pipeline in TSV format that contains trimming, mapping, deduplication, chromatin contact, and AllC site statistics. This is the main output of the pipeline. ## Outputs @@ -154,24 +145,16 @@ The following table lists the output variables and files produced by the pipelin | Output name | Filename, if applicable | Output format and description | | ------ | ------ | ------ | | MappingSummary | `_MappingSummary.csv.gz` | Mapping summary file in CSV format. | -| trimmed_stats | `.trimmed_stats_files.tar.gz` | Array of tarred files containing trimming stats files; for more information, see the [Cutadapt documentation](https://cutadapt.readthedocs.io/en/stable/guide.html#reporting). | -| r1_trimmed_fq | `.R1_trimmed_files.tar.gz` | Array of tarred files containing trimmed R1 FASTQ files. | -| r2_trimmed_fq | `.R2_trimmed_files.tar.gz` | Array of tarred files containing trimmed R2 FASTQ files. | -| hisat3n_stats_tar | `.hisat3n_paired_end_stats_files.tar.gz` | Array of tarred files containing paired-end alignment summary files; see the [HISAT2 alignment summary documentation](https://daehwankimlab.github.io/hisat2/manual/) for more information.
-#### 11. Creates summary output file +#### 8. Creates summary output file In the last step of the pipeline, the `summary` task imports a custom python3 script ([summary.py](https://github.com/lhqing/cemba_data/blob/788e83cd66f3b556bdfacf3485bed9500d381f23/cemba_data/hisat3n/summary.py)) developed by Hanqing Liu. The task calls the script's `snm3c_summary()` function to generate a single summary file for the pipeline in TSV format; the file contains trimming, mapping, deduplication, chromatin contact, and ALLC site statistics. This is the main output of the pipeline. ## Outputs @@ -154,24 +145,16 @@ The following table lists the output variables and files produced by the pipelin | Output name | Filename, if applicable | Output format and description | | ------ | ------ | ------ | | MappingSummary | `_MappingSummary.csv.gz` | Mapping summary file in CSV format. | -| trimmed_stats | `.trimmed_stats_files.tar.gz` | Array of tarred files containing trimming stats files; for more information, see the [Cutadapt documentation](https://cutadapt.readthedocs.io/en/stable/guide.html#reporting). | -| r1_trimmed_fq | `.R1_trimmed_files.tar.gz` | Array of tarred files containing trimmed R1 FASTQ files. | -| r2_trimmed_fq | `.R2_trimmed_files.tar.gz` | Array of tarred files containing trimmed R2 FASTQ files. | -| hisat3n_stats_tar | `.hisat3n_paired_end_stats_files.tar.gz` | Array of tarred files containing paired-end alignment summary files; see the [HISAT2 alignment summary documentation](https://daehwankimlab.github.io/hisat2/manual/) for more information. | -| hisat3n_bam_tar | `.hisat3n_paired_end_bam_files.tar.gz` | Array of tarred files containing BAM files from paired-end alignment. | -| unique_bam_tar | `.hisat3n_paired_end_unique_bam_files.tar.gz` | Array of tarred files containing BAM files with uniquely aligned reads from paired-end alignment. | -| multi_bam_tar | `.hisat3n_paired_end_multi_bam_files.tar.gz` | Array of tarred files containing BAM files with multi-aligned reads from paired-end alignment. | -| unmapped_fastq_tar | `.hisat3n_paired_end_unmapped_fastq_files.tar.gz` | Array of tarred files containing FASTQ files with unmapped reads from paired-end alignment. | -| split_fq_tar | `.hisat3n_paired_end_split_fastq_files.tar.gz` | Array of tarred files containing FASTQ files with unmapped reads split by possible enzyme cut sites. | -| merge_sorted_bam_tar | `.hisat3n_dna.split_reads.name_sort.bam.tar.gz` | Array of tarred files containing BAM files from single-end alignment. | | name_sorted_bams | `.hisat3n_dna.all_reads.name_sort.tar.gz` | Array of tarred files containing name-sorted, merged BAM files. | -| pos_sorted_bams | `.hisat3n_dna.all_reads.pos_sort.tar.gz` | Array of tarred files containing position-sorted, merged BAM files. | -| remove_overlap_read_parts_bam_tar | `.remove_overlap_read_parts.tar.gz` | Array of tarred files containing BAM files from single-end alignment with overlapping reads removed. | -| dedup_unique_bam_and_index_unique_bam_tar | `.dedup_unique_bam_and_index_unique_bam.tar.gz` | Array of tarred files containing deduplicated, position-sorted BAM files. | -| unique_reads_cgn_extraction_allc | `.output_allc_tar.tar.gz` | Array of tarred files containing CGN context-specific ALLC files that can be used to generate an [MCDS file](https://github.com/lhqing/allcools_doc/blob/master/tech-background/file-formats.md#mcds-file). | -| unique_reads_cgn_extraction_tbi | `.output_tbi_tar.tar.gz` | Array of tarred files containing ALLC index files. | -| chromatin_contact_stats | `.chromatin_contact_stats.tar.gz` | Array of tarred files containing chromatin contact files. | +| unique_reads_cgn_extraction_allc | `.allc.tsv.tar.gz` | Array of tarred files containing a list of methylation points. | +| unique_reads_cgn_extraction_tbi | `.allc.tbi.tar.gz` | Array of tarred files containing ALLC index files. | +| unique_reads_cgn_extraction_allc_extract | `.extract-allc.tar.gz` | Array of tarred files containing CGN context-specific ALLC files that can be used to generate an [MCDS file](https://github.com/lhqing/allcools_doc/blob/master/tech-background/file-formats.md#mcds-file). | +| unique_reads_cgn_extraction_tbi_extract | `.extract-allc_tbi.tar.gz` | Array of tarred files containing index files for the context-specific ALLC files. | | reference_version | `.reference_version.txt` | Array of tarred files containing the genomic reference version used. | +| chromatin_contact_stats | `.chromatin_contact_stats.tar.gz` | Array of tarred files containing chromatin contact statistics. | +| all_reads_dedup_contacts | `.hisat3n_dna.all_reads.dedup_contacts.tar.gz` | Array of tarred TSV files containing deduplicated chromatin contacts. | +| all_reads_3C_contacts | `.hisat3n_dna.all_reads.3C.contact.tar.gz` | Array of tarred TSV files containing chromatin contacts in Hi-C format.
| + ## Versioning From e6b2356c5ccc6729ca748dd1f724a5591d2fefb4 Mon Sep 17 00:00:00 2001 From: ekiernan <55763654+ekiernan@users.noreply.github.com> Date: Thu, 15 Feb 2024 12:17:18 -0500 Subject: [PATCH 20/68] snm3C memory, disk, and CPU updates (#1207) Updated runtime parameters --- pipelines/skylab/snM3C/snM3C.changelog.md | 5 ++++ pipelines/skylab/snM3C/snM3C.wdl | 36 +++++++++++------------ 2 files changed, 23 insertions(+), 18 deletions(-) diff --git a/pipelines/skylab/snM3C/snM3C.changelog.md b/pipelines/skylab/snM3C/snM3C.changelog.md index 29ba78d160..dc90a21239 100644 --- a/pipelines/skylab/snM3C/snM3C.changelog.md +++ b/pipelines/skylab/snM3C/snM3C.changelog.md @@ -1,3 +1,8 @@ +# 2.0.1 +2024-2-15 (Date of Last Commit) + +* Updated the snM3C task memory, disk, and CPUs + # 2.0.0 2024-2-13 (Date of Last Commit) diff --git a/pipelines/skylab/snM3C/snM3C.wdl b/pipelines/skylab/snM3C/snM3C.wdl index c48bff7ead..bcdc71a861 100644 --- a/pipelines/skylab/snM3C/snM3C.wdl +++ b/pipelines/skylab/snM3C/snM3C.wdl @@ -27,7 +27,7 @@ workflow snM3C { } # version of the pipeline - String pipeline_version = "2.0.0" + String pipeline_version = "2.0.1" call Demultiplexing { input: @@ -138,10 +138,10 @@ task Demultiplexing { Int batch_number String docker_image = "us.gcr.io/broad-gotc-prod/m3c-yap-hisat:1.0.0-2.2.1" - Int disk_size = 50 + Int disk_size = 1000 Int mem_size = 10 Int preemptible_tries = 3 - Int cpu = 1 + Int cpu = 8 } command <<< @@ -250,11 +250,11 @@ task Sort_and_trim_r1_and_r2 { Int r2_right_cut Int min_read_length - Int disk_size = 50 - Int mem_size = 10 + Int disk_size = 500 + Int mem_size = 16 String docker = "us.gcr.io/broad-gotc-prod/m3c-yap-hisat:1.0.0-2.2.1" Int preemptible_tries = 3 - Int cpu = 1 + Int cpu = 4 } command <<< @@ -416,10 +416,10 @@ task Separate_and_split_unmapped_reads { String plate_id String docker = "us.gcr.io/broad-gotc-prod/m3c-yap-hisat:1.0.0-2.2.1" - Int disk_size = 50 + Int disk_size = 200 Int mem_size = 10 Int preemptible_tries = 3 - Int cpu = 1 + Int cpu = 8 } command <<< @@ -649,10 +649,10 @@ task merge_original_and_split_bam_and_sort_all_reads_by_name_and_position_and_de String plate_id String docker = "us.gcr.io/broad-gotc-prod/m3c-yap-hisat:1.0.0-2.2.1" - Int disk_size = 80 - Int mem_size = 20 + Int disk_size = 1000 + Int mem_size = 50 Int preemptible_tries = 3 - Int cpu = 1 + Int cpu = 8 } command <<< set -euo pipefail @@ -730,10 +730,10 @@ task call_chromatin_contacts { String plate_id String docker = "us.gcr.io/broad-gotc-prod/m3c-yap-hisat:1.0.0-2.2.1" - Int disk_size = 80 - Int mem_size = 20 + Int disk_size = 500 + Int mem_size = 32 Int preemptible_tries = 3 - Int cpu = 1 + Int cpu = 8 } command <<< set -euo pipefail @@ -797,12 +797,12 @@ task unique_reads_allc_and_cgn_extraction { Int compress_level File chromosome_sizes - Int disk_size = 80 + Int disk_size = 200 Int mem_size = 20 String genome_base = basename(genome_fa) String docker = "us.gcr.io/broad-gotc-prod/m3c-yap-hisat:1.0.0-2.2.1" Int preemptible_tries = 3 - Int cpu = 1 + Int cpu = 8 } command <<< set -euo pipefail @@ -903,9 +903,9 @@ task summary { String docker = "us.gcr.io/broad-gotc-prod/m3c-yap-hisat:1.0.0-2.2.1" Int disk_size = 80 - Int mem_size = 20 + Int mem_size = 5 Int preemptible_tries = 3 - Int cpu = 1 + Int cpu = 4 } command <<< set -euo pipefail From 32bcffefba4f0aa9671a321dbd076fbc5b616dbc Mon Sep 17 00:00:00 2001 From: meganshand Date: Fri, 16 Feb 2024 08:43:08 -0500 Subject: [PATCH 21/68] updating gatk to 4.5.0.0 (#1153) * updating gatk to 4.5.0.0 * changelogs * 
missed one changelog * update pipeline docs * fixing Ultima Cram Only test * fixing dragen scientific tests * versions * more disk in compareVCFs * update docs --------- Co-authored-by: kayleemathews Co-authored-by: Nikelle Petrillo <38223776+nikellepetrillo@users.noreply.github.com> Co-authored-by: Nikelle Petrillo --- .../AnnotationFiltration.changelog.md | 5 +++ .../AnnotationFiltration.wdl | 4 +-- .../test_inputs/Plumbing/hg38.json | 2 +- .../test_inputs/Scientific/hg38.json | 2 +- .../arrays/imputation/Imputation.changelog.md | 5 +++ .../broad/arrays/imputation/Imputation.wdl | 2 +- .../arrays/single_sample/Arrays.changelog.md | 5 +++ .../broad/arrays/single_sample/Arrays.wdl | 2 +- .../validate_chip/ValidateChip.changelog.md | 5 +++ .../arrays/validate_chip/ValidateChip.wdl | 2 +- .../JointGenotyping.changelog.md | 5 +++ .../joint_genotyping/JointGenotyping.wdl | 2 +- ...UltimaGenomicsJointGenotyping.changelog.md | 5 +++ .../UltimaGenomicsJointGenotyping.wdl | 2 +- ...GenotypingByChromosomePartOne.changelog.md | 5 +++ .../JointGenotypingByChromosomePartOne.wdl | 2 +- ...GenotypingByChromosomePartTwo.changelog.md | 5 +++ .../JointGenotypingByChromosomePartTwo.wdl | 2 +- .../reblocking/ReblockGVCF.changelog.md | 6 +++- .../reblocking/ReblockGVCF.wdl | 4 +-- .../ExomeGermlineSingleSample.changelog.md | 6 +++- .../exome/ExomeGermlineSingleSample.wdl | 2 +- ...maGenomicsWholeGenomeGermline.changelog.md | 5 +++ .../UltimaGenomicsWholeGenomeGermline.wdl | 2 +- ...oleGenomeGermlineSingleSample.changelog.md | 5 +++ .../wgs/WholeGenomeGermlineSingleSample.wdl | 2 +- ...4982.NA12878.dragen_mode_best_results.json | 3 +- ...78.dragen_mode_functional_equivalence.json | 3 +- .../VariantCalling.changelog.md | 5 +++ .../variant_calling/VariantCalling.wdl | 4 +-- ...maGenomicsWholeGenomeCramOnly.changelog.md | 5 +++ .../UltimaGenomicsWholeGenomeCramOnly.wdl | 2 +- .../ugwgs/test_inputs/Scientific/HCC1187.json | 3 +- .../IlluminaGenotypingArray.changelog.md | 5 +++ .../illumina/IlluminaGenotypingArray.wdl | 2 +- .../BroadInternalImputation.changelog.md | 5 +++ .../imputation/BroadInternalImputation.wdl | 2 +- .../BroadInternalArrays.changelog.md | 5 +++ .../single_sample/BroadInternalArrays.wdl | 2 +- .../BroadInternalUltimaGenomics.changelog.md | 5 +++ .../BroadInternalUltimaGenomics.wdl | 2 +- .../BroadInternalRNAWithUMIs.changelog.md | 5 +++ .../rna_seq/BroadInternalRNAWithUMIs.wdl | 2 +- .../broad/qc/CheckFingerprint.changelog.md | 5 +++ pipelines/broad/qc/CheckFingerprint.wdl | 2 +- .../exome/ExomeReprocessing.changelog.md | 6 +++- .../reprocessing/exome/ExomeReprocessing.wdl | 2 +- .../ExternalExomeReprocessing.changelog.md | 6 +++- .../exome/ExternalExomeReprocessing.wdl | 2 +- ...ternalWholeGenomeReprocessing.changelog.md | 6 +++- .../wgs/ExternalWholeGenomeReprocessing.wdl | 2 +- .../wgs/WholeGenomeReprocessing.changelog.md | 5 +++ .../wgs/WholeGenomeReprocessing.wdl | 2 +- .../rna_seq/RNAWithUMIsPipeline.changelog.md | 5 +++ .../broad/rna_seq/RNAWithUMIsPipeline.wdl | 2 +- .../cemba/cemba_methylcseq/CEMBA.changelog.md | 5 +++ pipelines/cemba/cemba_methylcseq/CEMBA.wdl | 6 ++-- scripts/BuildAFComparisonTable.wdl | 10 +++--- scripts/RemoveBadSitesByID.wdl | 4 +-- tasks/broad/DragenTasks.wdl | 2 +- tasks/broad/GermlineVariantDiscovery.wdl | 10 +++--- tasks/broad/IlluminaGenotypingArrayTasks.wdl | 8 ++--- tasks/broad/ImputationTasks.wdl | 16 ++++----- tasks/broad/JointGenotypingTasks.wdl | 34 +++++++++---------- tasks/broad/Qc.wdl | 4 +-- tasks/broad/RNAWithUMIsTasks.wdl | 4 +-- 
...timaGenomicsGermlineFilteringThreshold.wdl | 4 +-- ...UltimaGenomicsWholeGenomeGermlineTasks.wdl | 12 +++---- verification/VerifyNA12878.wdl | 2 +- verification/VerifyTasks.wdl | 2 +- .../CEMBA.methods.md | 4 +-- .../CEMBA_MethylC_Seq_Pipeline/README.md | 8 ++--- .../README.md | 4 +-- .../docs/Pipelines/JointGenotyping/README.md | 2 +- .../README.md | 2 +- .../README.md | 2 +- .../wgs.methods.md | 6 ++-- 77 files changed, 239 insertions(+), 111 deletions(-) diff --git a/pipelines/broad/annotation_filtration/AnnotationFiltration.changelog.md b/pipelines/broad/annotation_filtration/AnnotationFiltration.changelog.md index e87877e3db..2e661ad34d 100644 --- a/pipelines/broad/annotation_filtration/AnnotationFiltration.changelog.md +++ b/pipelines/broad/annotation_filtration/AnnotationFiltration.changelog.md @@ -1,3 +1,8 @@ +# 1.2.5 +2023-12-18 (Date of Last Commit) + +* Updated to GATK version 4.5.0.0 + # 1.2.4 2022-11-09 (Date of Last Commit) diff --git a/pipelines/broad/annotation_filtration/AnnotationFiltration.wdl b/pipelines/broad/annotation_filtration/AnnotationFiltration.wdl index e7178ab9bc..af4fa32a27 100644 --- a/pipelines/broad/annotation_filtration/AnnotationFiltration.wdl +++ b/pipelines/broad/annotation_filtration/AnnotationFiltration.wdl @@ -4,7 +4,7 @@ import "../../../tasks/broad/Funcotator.wdl" as Funcotator workflow AnnotationFiltration { - String pipeline_version = "1.2.4" + String pipeline_version = "1.2.5" input { Array[File] vcfs @@ -15,7 +15,7 @@ workflow AnnotationFiltration { File ref_dict File? funcotator_interval_list - String gatk_docker = "us.gcr.io/broad-gatk/gatk:4.3.0.0" + String gatk_docker = "us.gcr.io/broad-gatk/gatk:4.5.0.0" File? custom_data_source_tar_gz } diff --git a/pipelines/broad/annotation_filtration/test_inputs/Plumbing/hg38.json b/pipelines/broad/annotation_filtration/test_inputs/Plumbing/hg38.json index a464cbb5d3..d87a07c1ea 100644 --- a/pipelines/broad/annotation_filtration/test_inputs/Plumbing/hg38.json +++ b/pipelines/broad/annotation_filtration/test_inputs/Plumbing/hg38.json @@ -10,6 +10,6 @@ "AnnotationFiltration.ref_fasta_index": "gs://gcp-public-data--broad-references/hg38/v0/Homo_sapiens_assembly38.fasta.fai", "AnnotationFiltration.ref_dict": "gs://gcp-public-data--broad-references/hg38/v0/Homo_sapiens_assembly38.dict", - "AnnotationFiltration.gatk_docker": "us.gcr.io/broad-gatk/gatk:4.3.0.0", + "AnnotationFiltration.gatk_docker": "us.gcr.io/broad-gatk/gatk:4.5.0.0", "AnnotationFiltration.custom_data_source_tar_gz": "gs://broad-public-datasets/funcotator/funcotator_dataSources.v1.6.20190124g.tar.gz" } diff --git a/pipelines/broad/annotation_filtration/test_inputs/Scientific/hg38.json b/pipelines/broad/annotation_filtration/test_inputs/Scientific/hg38.json index c3e324d5f2..6f36aeaafa 100644 --- a/pipelines/broad/annotation_filtration/test_inputs/Scientific/hg38.json +++ b/pipelines/broad/annotation_filtration/test_inputs/Scientific/hg38.json @@ -8,6 +8,6 @@ "AnnotationFiltration.ref_fasta_index": "gs://gcp-public-data--broad-references/hg38/v0/Homo_sapiens_assembly38.fasta.fai", "AnnotationFiltration.ref_dict": "gs://gcp-public-data--broad-references/hg38/v0/Homo_sapiens_assembly38.dict", - "AnnotationFiltration.gatk_docker": "us.gcr.io/broad-gatk/gatk:4.3.0.0", + "AnnotationFiltration.gatk_docker": "us.gcr.io/broad-gatk/gatk:4.5.0.0", "AnnotationFiltration.custom_data_source_tar_gz": "gs://broad-public-datasets/funcotator/funcotator_dataSources.v1.6.20190124g.tar.gz" } diff --git 
a/pipelines/broad/arrays/imputation/Imputation.changelog.md b/pipelines/broad/arrays/imputation/Imputation.changelog.md index 4f87b2ac76..e96dabb6a6 100644 --- a/pipelines/broad/arrays/imputation/Imputation.changelog.md +++ b/pipelines/broad/arrays/imputation/Imputation.changelog.md @@ -1,3 +1,8 @@ +# 1.1.12 +2023-12-18 (Date of Last Commit) + +* Updated to GATK version 4.5.0.0 + # 1.1.11 2023-08-01 (Date of last Commit) diff --git a/pipelines/broad/arrays/imputation/Imputation.wdl b/pipelines/broad/arrays/imputation/Imputation.wdl index 245ab01455..44d5a93cd0 100644 --- a/pipelines/broad/arrays/imputation/Imputation.wdl +++ b/pipelines/broad/arrays/imputation/Imputation.wdl @@ -6,7 +6,7 @@ import "../../../../tasks/broad/Utilities.wdl" as utils workflow Imputation { - String pipeline_version = "1.1.11" + String pipeline_version = "1.1.12" input { Int chunkLength = 25000000 diff --git a/pipelines/broad/arrays/single_sample/Arrays.changelog.md b/pipelines/broad/arrays/single_sample/Arrays.changelog.md index dc8494544b..1155468625 100644 --- a/pipelines/broad/arrays/single_sample/Arrays.changelog.md +++ b/pipelines/broad/arrays/single_sample/Arrays.changelog.md @@ -1,3 +1,8 @@ +# 2.6.22 +2023-12-18 (Date of Last Commit) + +* Updated to GATK version 4.5.0.0 + # 2.6.21 2023-12-08 (Date of Last Commit) diff --git a/pipelines/broad/arrays/single_sample/Arrays.wdl b/pipelines/broad/arrays/single_sample/Arrays.wdl index e7693b152d..a59633f839 100644 --- a/pipelines/broad/arrays/single_sample/Arrays.wdl +++ b/pipelines/broad/arrays/single_sample/Arrays.wdl @@ -23,7 +23,7 @@ import "../../../../tasks/broad/Utilities.wdl" as utils workflow Arrays { - String pipeline_version = "2.6.21" + String pipeline_version = "2.6.22" input { String chip_well_barcode diff --git a/pipelines/broad/arrays/validate_chip/ValidateChip.changelog.md b/pipelines/broad/arrays/validate_chip/ValidateChip.changelog.md index 0e36caabd0..a55bbaaeea 100644 --- a/pipelines/broad/arrays/validate_chip/ValidateChip.changelog.md +++ b/pipelines/broad/arrays/validate_chip/ValidateChip.changelog.md @@ -1,3 +1,8 @@ +# 1.16.4 +2023-12-18 (Date of Last Commit) + +* Updated to GATK version 4.5.0.0 + # 1.16.3 2023-01-13 (Date of Last Commit) diff --git a/pipelines/broad/arrays/validate_chip/ValidateChip.wdl b/pipelines/broad/arrays/validate_chip/ValidateChip.wdl index a2db629677..a18dffecbb 100644 --- a/pipelines/broad/arrays/validate_chip/ValidateChip.wdl +++ b/pipelines/broad/arrays/validate_chip/ValidateChip.wdl @@ -21,7 +21,7 @@ import "../../../../tasks/broad/InternalArraysTasks.wdl" as InternalTasks workflow ValidateChip { - String pipeline_version = "1.16.3" + String pipeline_version = "1.16.4" input { String sample_alias diff --git a/pipelines/broad/dna_seq/germline/joint_genotyping/JointGenotyping.changelog.md b/pipelines/broad/dna_seq/germline/joint_genotyping/JointGenotyping.changelog.md index 55378fdca6..d97dee1d00 100644 --- a/pipelines/broad/dna_seq/germline/joint_genotyping/JointGenotyping.changelog.md +++ b/pipelines/broad/dna_seq/germline/joint_genotyping/JointGenotyping.changelog.md @@ -1,3 +1,8 @@ +# 1.6.10 +2023-12-18 (Date of Last Commit) + +* Updated to GATK version 4.5.0.0 + # 1.6.9 2023-09-08 (Date of Last Commit) diff --git a/pipelines/broad/dna_seq/germline/joint_genotyping/JointGenotyping.wdl b/pipelines/broad/dna_seq/germline/joint_genotyping/JointGenotyping.wdl index 87832c4c12..097f398553 100644 --- a/pipelines/broad/dna_seq/germline/joint_genotyping/JointGenotyping.wdl +++ 
b/pipelines/broad/dna_seq/germline/joint_genotyping/JointGenotyping.wdl @@ -7,7 +7,7 @@ import "https://raw.githubusercontent.com/broadinstitute/gatk/4.5.0.0/scripts/vc # Joint Genotyping for hg38 Whole Genomes and Exomes (has not been tested on hg19) workflow JointGenotyping { - String pipeline_version = "1.6.9" + String pipeline_version = "1.6.10" input { File unpadded_intervals_file diff --git a/pipelines/broad/dna_seq/germline/joint_genotyping/UltimaGenomics/UltimaGenomicsJointGenotyping.changelog.md b/pipelines/broad/dna_seq/germline/joint_genotyping/UltimaGenomics/UltimaGenomicsJointGenotyping.changelog.md index b3e7a610e9..55087513c4 100644 --- a/pipelines/broad/dna_seq/germline/joint_genotyping/UltimaGenomics/UltimaGenomicsJointGenotyping.changelog.md +++ b/pipelines/broad/dna_seq/germline/joint_genotyping/UltimaGenomics/UltimaGenomicsJointGenotyping.changelog.md @@ -1,3 +1,8 @@ +# 1.1.7 +2023-12-18 (Date of Last Commit) + +* Updated to GATK version 4.5.0.0. + # 1.1.6 2023-02-06 (Date of Last Commit) diff --git a/pipelines/broad/dna_seq/germline/joint_genotyping/UltimaGenomics/UltimaGenomicsJointGenotyping.wdl b/pipelines/broad/dna_seq/germline/joint_genotyping/UltimaGenomics/UltimaGenomicsJointGenotyping.wdl index 2104739e3d..7ad923f4a6 100644 --- a/pipelines/broad/dna_seq/germline/joint_genotyping/UltimaGenomics/UltimaGenomicsJointGenotyping.wdl +++ b/pipelines/broad/dna_seq/germline/joint_genotyping/UltimaGenomics/UltimaGenomicsJointGenotyping.wdl @@ -11,7 +11,7 @@ import "../../../../../../tasks/broad/UltimaGenomicsGermlineFilteringThreshold.w # For choosing a filtering threshold (where on the ROC curve to filter) a sample with truth data is required. workflow UltimaGenomicsJointGenotyping { - String pipeline_version = "1.1.6" + String pipeline_version = "1.1.7" input { File unpadded_intervals_file diff --git a/pipelines/broad/dna_seq/germline/joint_genotyping/by_chromosome/JointGenotypingByChromosomePartOne.changelog.md b/pipelines/broad/dna_seq/germline/joint_genotyping/by_chromosome/JointGenotypingByChromosomePartOne.changelog.md index 61a0a0daed..9054632fd0 100644 --- a/pipelines/broad/dna_seq/germline/joint_genotyping/by_chromosome/JointGenotypingByChromosomePartOne.changelog.md +++ b/pipelines/broad/dna_seq/germline/joint_genotyping/by_chromosome/JointGenotypingByChromosomePartOne.changelog.md @@ -1,3 +1,8 @@ +# 1.4.12 +2023-12-18 (Date of Last Commit) + +* Updated to GATK version 4.5.0.0 + # 1.4.11 2023-09-08 (Date of Last Commit) diff --git a/pipelines/broad/dna_seq/germline/joint_genotyping/by_chromosome/JointGenotypingByChromosomePartOne.wdl b/pipelines/broad/dna_seq/germline/joint_genotyping/by_chromosome/JointGenotypingByChromosomePartOne.wdl index a067eb2882..0cbbabc68d 100644 --- a/pipelines/broad/dna_seq/germline/joint_genotyping/by_chromosome/JointGenotypingByChromosomePartOne.wdl +++ b/pipelines/broad/dna_seq/germline/joint_genotyping/by_chromosome/JointGenotypingByChromosomePartOne.wdl @@ -5,7 +5,7 @@ import "../../../../../../tasks/broad/JointGenotypingTasks.wdl" as Tasks # Joint Genotyping for hg38 Exomes and Whole Genomes (has not been tested on hg19) workflow JointGenotypingByChromosomePartOne { - String pipeline_version = "1.4.11" + String pipeline_version = "1.4.12" input { File unpadded_intervals_file diff --git a/pipelines/broad/dna_seq/germline/joint_genotyping/by_chromosome/JointGenotypingByChromosomePartTwo.changelog.md b/pipelines/broad/dna_seq/germline/joint_genotyping/by_chromosome/JointGenotypingByChromosomePartTwo.changelog.md index 
4d158220e9..74948716f2 100644 --- a/pipelines/broad/dna_seq/germline/joint_genotyping/by_chromosome/JointGenotypingByChromosomePartTwo.changelog.md +++ b/pipelines/broad/dna_seq/germline/joint_genotyping/by_chromosome/JointGenotypingByChromosomePartTwo.changelog.md @@ -1,3 +1,8 @@ +# 1.4.11 +2023-12-18 (Date of Last Commit) + +* Updated to GATK version 4.5.0.0 + # 1.4.10 2023-09-08 (Date of Last Commit) diff --git a/pipelines/broad/dna_seq/germline/joint_genotyping/by_chromosome/JointGenotypingByChromosomePartTwo.wdl b/pipelines/broad/dna_seq/germline/joint_genotyping/by_chromosome/JointGenotypingByChromosomePartTwo.wdl index ccb36af7d0..7e9e2b7bb1 100644 --- a/pipelines/broad/dna_seq/germline/joint_genotyping/by_chromosome/JointGenotypingByChromosomePartTwo.wdl +++ b/pipelines/broad/dna_seq/germline/joint_genotyping/by_chromosome/JointGenotypingByChromosomePartTwo.wdl @@ -5,7 +5,7 @@ import "../../../../../../tasks/broad/JointGenotypingTasks.wdl" as Tasks # Joint Genotyping for hg38 Exomes and Whole Genomes (has not been tested on hg19) workflow JointGenotypingByChromosomePartTwo { - String pipeline_version = "1.4.10" + String pipeline_version = "1.4.11" input { String callset_name diff --git a/pipelines/broad/dna_seq/germline/joint_genotyping/reblocking/ReblockGVCF.changelog.md b/pipelines/broad/dna_seq/germline/joint_genotyping/reblocking/ReblockGVCF.changelog.md index 7f3d1abfc6..58f1643d4e 100644 --- a/pipelines/broad/dna_seq/germline/joint_genotyping/reblocking/ReblockGVCF.changelog.md +++ b/pipelines/broad/dna_seq/germline/joint_genotyping/reblocking/ReblockGVCF.changelog.md @@ -1,10 +1,14 @@ +# 2.1.11 +2023-12-18 (Date of Last Commit) + +* Updated to GATK version 4.5.0.0. Header documentation change for RAW_GT_COUNT annotation. + # 2.1.10 2023-12-14 (Date of Last Commit) * Updated GATK for Reblock task to version 4.5.0.0 * Added options to Reblock task to remove annotations and move filters to genotype level - # 2.1.9 2023-12-08 (Date of Last Commit) diff --git a/pipelines/broad/dna_seq/germline/joint_genotyping/reblocking/ReblockGVCF.wdl b/pipelines/broad/dna_seq/germline/joint_genotyping/reblocking/ReblockGVCF.wdl index 69c0e37591..b27820b937 100644 --- a/pipelines/broad/dna_seq/germline/joint_genotyping/reblocking/ReblockGVCF.wdl +++ b/pipelines/broad/dna_seq/germline/joint_genotyping/reblocking/ReblockGVCF.wdl @@ -5,7 +5,7 @@ import "../../../../../../tasks/broad/Qc.wdl" as QC workflow ReblockGVCF { - String pipeline_version = "2.1.10" + String pipeline_version = "2.1.11" input { @@ -50,7 +50,7 @@ workflow ReblockGVCF { calling_interval_list_index = gvcf_index, is_gvcf = true, extra_args = "--no-overlaps", - gatk_docker = "us.gcr.io/broad-gatk/gatk:4.3.0.0" + gatk_docker = "us.gcr.io/broad-gatk/gatk:4.5.0.0" } output { diff --git a/pipelines/broad/dna_seq/germline/single_sample/exome/ExomeGermlineSingleSample.changelog.md b/pipelines/broad/dna_seq/germline/single_sample/exome/ExomeGermlineSingleSample.changelog.md index 14369eee0e..c167d5d698 100644 --- a/pipelines/broad/dna_seq/germline/single_sample/exome/ExomeGermlineSingleSample.changelog.md +++ b/pipelines/broad/dna_seq/germline/single_sample/exome/ExomeGermlineSingleSample.changelog.md @@ -1,10 +1,14 @@ +# 3.1.18 +2023-12-18 (Date of Last Commit) + +* Updated to GATK version 4.5.0.0. 
+ # 3.1.17 2023-12-14 (Date of Last Commit) * Updated GATK for Reblock task to version 4.5.0.0 * Added options to Reblock task to remove annotations and move filters to genotype level - # 3.1.16 2023-12-08 (Date of Last Commit) diff --git a/pipelines/broad/dna_seq/germline/single_sample/exome/ExomeGermlineSingleSample.wdl b/pipelines/broad/dna_seq/germline/single_sample/exome/ExomeGermlineSingleSample.wdl index 8be4e1a5fc..31900c8c6f 100644 --- a/pipelines/broad/dna_seq/germline/single_sample/exome/ExomeGermlineSingleSample.wdl +++ b/pipelines/broad/dna_seq/germline/single_sample/exome/ExomeGermlineSingleSample.wdl @@ -44,7 +44,7 @@ import "../../../../../../structs/dna_seq/DNASeqStructs.wdl" # WORKFLOW DEFINITION workflow ExomeGermlineSingleSample { - String pipeline_version = "3.1.17" + String pipeline_version = "3.1.18" input { diff --git a/pipelines/broad/dna_seq/germline/single_sample/ugwgs/UltimaGenomicsWholeGenomeGermline.changelog.md b/pipelines/broad/dna_seq/germline/single_sample/ugwgs/UltimaGenomicsWholeGenomeGermline.changelog.md index c011691d1f..bca633e289 100644 --- a/pipelines/broad/dna_seq/germline/single_sample/ugwgs/UltimaGenomicsWholeGenomeGermline.changelog.md +++ b/pipelines/broad/dna_seq/germline/single_sample/ugwgs/UltimaGenomicsWholeGenomeGermline.changelog.md @@ -1,3 +1,8 @@ +# 1.0.15 +2023-12-18 (Date of Last Commit) + +* Updated to GATK version 4.5.0.0. + # 1.0.14 2023-12-14 (Date of Last Commit) diff --git a/pipelines/broad/dna_seq/germline/single_sample/ugwgs/UltimaGenomicsWholeGenomeGermline.wdl b/pipelines/broad/dna_seq/germline/single_sample/ugwgs/UltimaGenomicsWholeGenomeGermline.wdl index 479e6b914a..8548e22478 100644 --- a/pipelines/broad/dna_seq/germline/single_sample/ugwgs/UltimaGenomicsWholeGenomeGermline.wdl +++ b/pipelines/broad/dna_seq/germline/single_sample/ugwgs/UltimaGenomicsWholeGenomeGermline.wdl @@ -50,7 +50,7 @@ workflow UltimaGenomicsWholeGenomeGermline { filtering_model_no_gt_name: "String describing the optional filtering model; default set to rf_model_ignore_gt_incl_hpol_runs" } - String pipeline_version = "1.0.14" + String pipeline_version = "1.0.15" References references = alignment_references.references diff --git a/pipelines/broad/dna_seq/germline/single_sample/wgs/WholeGenomeGermlineSingleSample.changelog.md b/pipelines/broad/dna_seq/germline/single_sample/wgs/WholeGenomeGermlineSingleSample.changelog.md index 052c66c391..38fdadfaa8 100644 --- a/pipelines/broad/dna_seq/germline/single_sample/wgs/WholeGenomeGermlineSingleSample.changelog.md +++ b/pipelines/broad/dna_seq/germline/single_sample/wgs/WholeGenomeGermlineSingleSample.changelog.md @@ -1,3 +1,8 @@ +# 3.1.19 +2023-12-18 (Date of Last Commit) + +* Updated to GATK version 4.5.0.0. 
+ # 3.1.18 2023-12-14 (Date of Last Commit) diff --git a/pipelines/broad/dna_seq/germline/single_sample/wgs/WholeGenomeGermlineSingleSample.wdl b/pipelines/broad/dna_seq/germline/single_sample/wgs/WholeGenomeGermlineSingleSample.wdl index 2b1fad60a3..23e9d76845 100644 --- a/pipelines/broad/dna_seq/germline/single_sample/wgs/WholeGenomeGermlineSingleSample.wdl +++ b/pipelines/broad/dna_seq/germline/single_sample/wgs/WholeGenomeGermlineSingleSample.wdl @@ -40,7 +40,7 @@ import "../../../../../../structs/dna_seq/DNASeqStructs.wdl" workflow WholeGenomeGermlineSingleSample { - String pipeline_version = "3.1.18" + String pipeline_version = "3.1.19" input { diff --git a/pipelines/broad/dna_seq/germline/single_sample/wgs/test_inputs/Scientific/G94982.NA12878.dragen_mode_best_results.json b/pipelines/broad/dna_seq/germline/single_sample/wgs/test_inputs/Scientific/G94982.NA12878.dragen_mode_best_results.json index 5d7841519a..94f90073c8 100644 --- a/pipelines/broad/dna_seq/germline/single_sample/wgs/test_inputs/Scientific/G94982.NA12878.dragen_mode_best_results.json +++ b/pipelines/broad/dna_seq/germline/single_sample/wgs/test_inputs/Scientific/G94982.NA12878.dragen_mode_best_results.json @@ -82,5 +82,6 @@ "agg_preemptible_tries": 3 }, - "WholeGenomeGermlineSingleSample.dragen_maximum_quality_mode": true + "WholeGenomeGermlineSingleSample.dragen_maximum_quality_mode": true, + "WholeGenomeGermlineSingleSample.BamToGvcf.HaplotypeCallerGATK4.memory_multiplier":2 } diff --git a/pipelines/broad/dna_seq/germline/single_sample/wgs/test_inputs/Scientific/G94982.NA12878.dragen_mode_functional_equivalence.json b/pipelines/broad/dna_seq/germline/single_sample/wgs/test_inputs/Scientific/G94982.NA12878.dragen_mode_functional_equivalence.json index d52c139fd2..c4b9608f29 100644 --- a/pipelines/broad/dna_seq/germline/single_sample/wgs/test_inputs/Scientific/G94982.NA12878.dragen_mode_functional_equivalence.json +++ b/pipelines/broad/dna_seq/germline/single_sample/wgs/test_inputs/Scientific/G94982.NA12878.dragen_mode_functional_equivalence.json @@ -81,5 +81,6 @@ "agg_preemptible_tries": 3 }, - "WholeGenomeGermlineSingleSample.dragen_functional_equivalence_mode": true + "WholeGenomeGermlineSingleSample.dragen_functional_equivalence_mode": true, + "WholeGenomeGermlineSingleSample.BamToGvcf.HaplotypeCallerGATK4.memory_multiplier":2 } diff --git a/pipelines/broad/dna_seq/germline/variant_calling/VariantCalling.changelog.md b/pipelines/broad/dna_seq/germline/variant_calling/VariantCalling.changelog.md index 676deb72b6..12af2b9efb 100644 --- a/pipelines/broad/dna_seq/germline/variant_calling/VariantCalling.changelog.md +++ b/pipelines/broad/dna_seq/germline/variant_calling/VariantCalling.changelog.md @@ -1,3 +1,8 @@ +# 2.1.17 +2023-12-18 (Date of Last Commit) + +* Updated to GATK version 4.5.0.0. Header documentation change for RAW_GT_COUNT annotation. 
+ # 2.1.16 2023-12-14 (Date of Last Commit) diff --git a/pipelines/broad/dna_seq/germline/variant_calling/VariantCalling.wdl b/pipelines/broad/dna_seq/germline/variant_calling/VariantCalling.wdl index 90dff51c08..27263d6150 100644 --- a/pipelines/broad/dna_seq/germline/variant_calling/VariantCalling.wdl +++ b/pipelines/broad/dna_seq/germline/variant_calling/VariantCalling.wdl @@ -9,7 +9,7 @@ import "../../../../../tasks/broad/DragenTasks.wdl" as DragenTasks workflow VariantCalling { - String pipeline_version = "2.1.16" + String pipeline_version = "2.1.17" input { @@ -183,7 +183,7 @@ workflow VariantCalling { calling_interval_list = calling_interval_list, is_gvcf = make_gvcf, extra_args = if (skip_reblocking == false) then "--no-overlaps" else "", - gatk_docker = "us.gcr.io/broad-gatk/gatk:4.3.0.0", + gatk_docker = "us.gcr.io/broad-gatk/gatk:4.5.0.0", preemptible_tries = agg_preemptible_tries } diff --git a/pipelines/broad/dna_seq/somatic/single_sample/ugwgs/UltimaGenomicsWholeGenomeCramOnly.changelog.md b/pipelines/broad/dna_seq/somatic/single_sample/ugwgs/UltimaGenomicsWholeGenomeCramOnly.changelog.md index 53166ccfe6..129c68527a 100644 --- a/pipelines/broad/dna_seq/somatic/single_sample/ugwgs/UltimaGenomicsWholeGenomeCramOnly.changelog.md +++ b/pipelines/broad/dna_seq/somatic/single_sample/ugwgs/UltimaGenomicsWholeGenomeCramOnly.changelog.md @@ -1,3 +1,8 @@ +# 1.0.15 +2023-12-18 (Date of Last Commit) + +* Updated to GATK version 4.5.0.0. + # 1.0.14 2023-12-14 (Date of Last Commit) diff --git a/pipelines/broad/dna_seq/somatic/single_sample/ugwgs/UltimaGenomicsWholeGenomeCramOnly.wdl b/pipelines/broad/dna_seq/somatic/single_sample/ugwgs/UltimaGenomicsWholeGenomeCramOnly.wdl index 3946c87545..dd411ed458 100644 --- a/pipelines/broad/dna_seq/somatic/single_sample/ugwgs/UltimaGenomicsWholeGenomeCramOnly.wdl +++ b/pipelines/broad/dna_seq/somatic/single_sample/ugwgs/UltimaGenomicsWholeGenomeCramOnly.wdl @@ -43,7 +43,7 @@ workflow UltimaGenomicsWholeGenomeCramOnly { save_bam_file: "If true, then save intermediate outputs used by germline pipeline (such as the output BAM) otherwise they won't be kept as outputs."
} - String pipeline_version = "1.0.14" + String pipeline_version = "1.0.15" References references = alignment_references.references diff --git a/pipelines/broad/dna_seq/somatic/single_sample/ugwgs/test_inputs/Scientific/HCC1187.json b/pipelines/broad/dna_seq/somatic/single_sample/ugwgs/test_inputs/Scientific/HCC1187.json index 1dbd4f99e3..340a82d66e 100644 --- a/pipelines/broad/dna_seq/somatic/single_sample/ugwgs/test_inputs/Scientific/HCC1187.json +++ b/pipelines/broad/dna_seq/somatic/single_sample/ugwgs/test_inputs/Scientific/HCC1187.json @@ -41,5 +41,6 @@ "ref_dbsnp": "gs://gcp-public-data--broad-references/hg38/v0/Homo_sapiens_assembly38.dbsnp138.vcf", "ref_dbsnp_index": "gs://gcp-public-data--broad-references/hg38/v0/Homo_sapiens_assembly38.dbsnp138.vcf.idx", "wgs_coverage_interval_list": "gs://gcp-public-data--broad-references/hg38/v0/wgs_coverage_regions.hg38.interval_list" - } + }, + "UltimaGenomicsWholeGenomeCramOnly.AlignmentAndMarkDuplicates.MarkDuplicatesSpark.memory_mb":300000 } diff --git a/pipelines/broad/genotyping/illumina/IlluminaGenotypingArray.changelog.md b/pipelines/broad/genotyping/illumina/IlluminaGenotypingArray.changelog.md index ba50f4ee6d..64fd9a9fb1 100644 --- a/pipelines/broad/genotyping/illumina/IlluminaGenotypingArray.changelog.md +++ b/pipelines/broad/genotyping/illumina/IlluminaGenotypingArray.changelog.md @@ -1,3 +1,8 @@ +# 1.12.16 +2023-12-18 (Date of Last Commit) + +* Updated to GATK version 4.5.0.0. + # 1.12.15 2023-12-08 (Date of Last Commit) diff --git a/pipelines/broad/genotyping/illumina/IlluminaGenotypingArray.wdl b/pipelines/broad/genotyping/illumina/IlluminaGenotypingArray.wdl index c7a35494db..af3b1e57ec 100644 --- a/pipelines/broad/genotyping/illumina/IlluminaGenotypingArray.wdl +++ b/pipelines/broad/genotyping/illumina/IlluminaGenotypingArray.wdl @@ -21,7 +21,7 @@ import "../../../../tasks/broad/Qc.wdl" as Qc workflow IlluminaGenotypingArray { - String pipeline_version = "1.12.15" + String pipeline_version = "1.12.16" input { String sample_alias diff --git a/pipelines/broad/internal/arrays/imputation/BroadInternalImputation.changelog.md b/pipelines/broad/internal/arrays/imputation/BroadInternalImputation.changelog.md index 3efa30e876..0ac74c9794 100644 --- a/pipelines/broad/internal/arrays/imputation/BroadInternalImputation.changelog.md +++ b/pipelines/broad/internal/arrays/imputation/BroadInternalImputation.changelog.md @@ -1,3 +1,8 @@ +# 1.1.10 +2023-12-18 (Date of Last Commit) + +* Updated to GATK version 4.5.0.0. + # 1.1.9 2023-08-01 (Date of Last Commit) diff --git a/pipelines/broad/internal/arrays/imputation/BroadInternalImputation.wdl b/pipelines/broad/internal/arrays/imputation/BroadInternalImputation.wdl index 653b8cefad..3021fe6a4c 100644 --- a/pipelines/broad/internal/arrays/imputation/BroadInternalImputation.wdl +++ b/pipelines/broad/internal/arrays/imputation/BroadInternalImputation.wdl @@ -9,7 +9,7 @@ workflow BroadInternalImputation { description: "Push outputs of Imputation.wdl to TDR dataset table ImputationOutputsTable and split out Imputation arrays into ImputationWideOutputsTable." 
allowNestedInputs: true } - String pipeline_version = "1.1.9" + String pipeline_version = "1.1.10" input { # inputs to wrapper task diff --git a/pipelines/broad/internal/arrays/single_sample/BroadInternalArrays.changelog.md b/pipelines/broad/internal/arrays/single_sample/BroadInternalArrays.changelog.md index ba8009535f..f4b807441a 100644 --- a/pipelines/broad/internal/arrays/single_sample/BroadInternalArrays.changelog.md +++ b/pipelines/broad/internal/arrays/single_sample/BroadInternalArrays.changelog.md @@ -1,3 +1,8 @@ +# 1.1.6 +2023-12-18 (Date of Last Commit) + +* Updated to GATK version 4.5.0.0. + # 1.1.5 2023-12-08 (Date of Last Commit) diff --git a/pipelines/broad/internal/arrays/single_sample/BroadInternalArrays.wdl b/pipelines/broad/internal/arrays/single_sample/BroadInternalArrays.wdl index 185ab12d95..762eef0709 100644 --- a/pipelines/broad/internal/arrays/single_sample/BroadInternalArrays.wdl +++ b/pipelines/broad/internal/arrays/single_sample/BroadInternalArrays.wdl @@ -9,7 +9,7 @@ workflow BroadInternalArrays { description: "Push outputs of Arrays.wdl to TDR dataset table ArraysOutputsTable." } - String pipeline_version = "1.1.5" + String pipeline_version = "1.1.6" input { # inputs to wrapper task diff --git a/pipelines/broad/internal/dna_seq/germline/single_sample/UltimaGenomics/BroadInternalUltimaGenomics.changelog.md b/pipelines/broad/internal/dna_seq/germline/single_sample/UltimaGenomics/BroadInternalUltimaGenomics.changelog.md index 69de3d250d..f30781fe04 100644 --- a/pipelines/broad/internal/dna_seq/germline/single_sample/UltimaGenomics/BroadInternalUltimaGenomics.changelog.md +++ b/pipelines/broad/internal/dna_seq/germline/single_sample/UltimaGenomics/BroadInternalUltimaGenomics.changelog.md @@ -1,3 +1,8 @@ +# 1.0.16 +2023-12-18 (Date of Last Commit) + +* Updated to GATK version 4.5.0.0. + # 1.0.15 2023-12-14 (Date of Last Commit) diff --git a/pipelines/broad/internal/dna_seq/germline/single_sample/UltimaGenomics/BroadInternalUltimaGenomics.wdl b/pipelines/broad/internal/dna_seq/germline/single_sample/UltimaGenomics/BroadInternalUltimaGenomics.wdl index fe1761415a..6d5a522cf8 100644 --- a/pipelines/broad/internal/dna_seq/germline/single_sample/UltimaGenomics/BroadInternalUltimaGenomics.wdl +++ b/pipelines/broad/internal/dna_seq/germline/single_sample/UltimaGenomics/BroadInternalUltimaGenomics.wdl @@ -6,7 +6,7 @@ import "../../../../../../../pipelines/broad/qc/CheckFingerprint.wdl" as FP workflow BroadInternalUltimaGenomics { - String pipeline_version = "1.0.15" + String pipeline_version = "1.0.16" input { diff --git a/pipelines/broad/internal/rna_seq/BroadInternalRNAWithUMIs.changelog.md b/pipelines/broad/internal/rna_seq/BroadInternalRNAWithUMIs.changelog.md index 782832746f..45e54d0d46 100644 --- a/pipelines/broad/internal/rna_seq/BroadInternalRNAWithUMIs.changelog.md +++ b/pipelines/broad/internal/rna_seq/BroadInternalRNAWithUMIs.changelog.md @@ -1,3 +1,8 @@ +# 1.0.28 +2023-12-18 (Date of Last Commit) + +* Updated to GATK version 4.5.0.0.
+ # 1.0.27 2023-12-08 (Date of Last Commit) diff --git a/pipelines/broad/internal/rna_seq/BroadInternalRNAWithUMIs.wdl b/pipelines/broad/internal/rna_seq/BroadInternalRNAWithUMIs.wdl index 466ea8c070..293b03a33e 100644 --- a/pipelines/broad/internal/rna_seq/BroadInternalRNAWithUMIs.wdl +++ b/pipelines/broad/internal/rna_seq/BroadInternalRNAWithUMIs.wdl @@ -7,7 +7,7 @@ import "../../../../tasks/broad/Utilities.wdl" as utils workflow BroadInternalRNAWithUMIs { - String pipeline_version = "1.0.27" + String pipeline_version = "1.0.28" input { # input needs to be either "hg19" or "hg38" diff --git a/pipelines/broad/qc/CheckFingerprint.changelog.md b/pipelines/broad/qc/CheckFingerprint.changelog.md index cd3db0e0e3..e47ab3c0f4 100644 --- a/pipelines/broad/qc/CheckFingerprint.changelog.md +++ b/pipelines/broad/qc/CheckFingerprint.changelog.md @@ -1,3 +1,8 @@ +# 1.0.15 +2023-12-18 (Date of Last Commit) + +* Updated to GATK version 4.5.0.0. + # 1.0.14 2023-12-08 (Date of Last Commit) diff --git a/pipelines/broad/qc/CheckFingerprint.wdl b/pipelines/broad/qc/CheckFingerprint.wdl index 24d2c5f3ea..dd8d48b424 100644 --- a/pipelines/broad/qc/CheckFingerprint.wdl +++ b/pipelines/broad/qc/CheckFingerprint.wdl @@ -24,7 +24,7 @@ import "../../../tasks/broad/Qc.wdl" as Qc workflow CheckFingerprint { - String pipeline_version = "1.0.14" + String pipeline_version = "1.0.15" input { File? input_vcf diff --git a/pipelines/broad/reprocessing/exome/ExomeReprocessing.changelog.md b/pipelines/broad/reprocessing/exome/ExomeReprocessing.changelog.md index 7a0b8b9a7d..f6fc478efb 100644 --- a/pipelines/broad/reprocessing/exome/ExomeReprocessing.changelog.md +++ b/pipelines/broad/reprocessing/exome/ExomeReprocessing.changelog.md @@ -1,10 +1,14 @@ +# 3.1.18 +2023-12-18 (Date of Last Commit) + +* Updated to GATK version 4.5.0.0. + # 3.1.17 2023-12-14 (Date of Last Commit) * Updated GATK for Reblock task to version 4.5.0.0 * Added options to Reblock task to remove annotations and move filters to genotype level - # 3.1.16 2023-12-08 (Date of Last Commit) diff --git a/pipelines/broad/reprocessing/exome/ExomeReprocessing.wdl b/pipelines/broad/reprocessing/exome/ExomeReprocessing.wdl index 46607e6499..a26b442550 100644 --- a/pipelines/broad/reprocessing/exome/ExomeReprocessing.wdl +++ b/pipelines/broad/reprocessing/exome/ExomeReprocessing.wdl @@ -7,7 +7,7 @@ import "../../../../structs/dna_seq/DNASeqStructs.wdl" workflow ExomeReprocessing { - String pipeline_version = "3.1.17" + String pipeline_version = "3.1.18" input { File? input_cram diff --git a/pipelines/broad/reprocessing/external/exome/ExternalExomeReprocessing.changelog.md b/pipelines/broad/reprocessing/external/exome/ExternalExomeReprocessing.changelog.md index af5579f916..c33dcb6544 100644 --- a/pipelines/broad/reprocessing/external/exome/ExternalExomeReprocessing.changelog.md +++ b/pipelines/broad/reprocessing/external/exome/ExternalExomeReprocessing.changelog.md @@ -1,10 +1,14 @@ +# 3.1.20 +2023-12-18 (Date of Last Commit) + +* Updated to GATK version 4.5.0.0. 
+ # 3.1.19 2023-12-14 (Date of Last Commit) * Updated GATK for Reblock task to version 4.5.0.0 * Added options to Reblock task to remove annotations and move filters to genotype level - # 3.1.18 2023-12-08 (Date of Last Commit) diff --git a/pipelines/broad/reprocessing/external/exome/ExternalExomeReprocessing.wdl b/pipelines/broad/reprocessing/external/exome/ExternalExomeReprocessing.wdl index be94c054af..391afed2c2 100644 --- a/pipelines/broad/reprocessing/external/exome/ExternalExomeReprocessing.wdl +++ b/pipelines/broad/reprocessing/external/exome/ExternalExomeReprocessing.wdl @@ -5,7 +5,7 @@ import "../../../../../tasks/broad/CopyFilesFromCloudToCloud.wdl" as Copy workflow ExternalExomeReprocessing { - String pipeline_version = "3.1.19" + String pipeline_version = "3.1.20" input { diff --git a/pipelines/broad/reprocessing/external/wgs/ExternalWholeGenomeReprocessing.changelog.md b/pipelines/broad/reprocessing/external/wgs/ExternalWholeGenomeReprocessing.changelog.md index 152a5ce375..21ed6a7962 100644 --- a/pipelines/broad/reprocessing/external/wgs/ExternalWholeGenomeReprocessing.changelog.md +++ b/pipelines/broad/reprocessing/external/wgs/ExternalWholeGenomeReprocessing.changelog.md @@ -1,10 +1,14 @@ +# 2.1.20 +2023-12-18 (Date of Last Commit) + +* Updated to GATK version 4.5.0.0. + # 2.1.19 2023-12-14 (Date of Last Commit) * Updated GATK for Reblock task to version 4.5.0.0 * Added options to Reblock task to remove annotations and move filters to genotype level - # 2.1.18 2023-12-08 (Date of Last Commit) diff --git a/pipelines/broad/reprocessing/external/wgs/ExternalWholeGenomeReprocessing.wdl b/pipelines/broad/reprocessing/external/wgs/ExternalWholeGenomeReprocessing.wdl index a80fde6b94..f4dd1af8d9 100644 --- a/pipelines/broad/reprocessing/external/wgs/ExternalWholeGenomeReprocessing.wdl +++ b/pipelines/broad/reprocessing/external/wgs/ExternalWholeGenomeReprocessing.wdl @@ -6,7 +6,7 @@ import "../../../../../tasks/broad/CopyFilesFromCloudToCloud.wdl" as Copy workflow ExternalWholeGenomeReprocessing { - String pipeline_version = "2.1.19" + String pipeline_version = "2.1.20" input { File? input_cram diff --git a/pipelines/broad/reprocessing/wgs/WholeGenomeReprocessing.changelog.md b/pipelines/broad/reprocessing/wgs/WholeGenomeReprocessing.changelog.md index 83d465aae4..9b7163a4a0 100644 --- a/pipelines/broad/reprocessing/wgs/WholeGenomeReprocessing.changelog.md +++ b/pipelines/broad/reprocessing/wgs/WholeGenomeReprocessing.changelog.md @@ -1,3 +1,8 @@ +# 3.1.19 +2023-12-18 (Date of Last Commit) + +* Updated to GATK version 4.5.0.0. + # 3.1.18 2023-12-14 (Date of Last Commit) diff --git a/pipelines/broad/reprocessing/wgs/WholeGenomeReprocessing.wdl b/pipelines/broad/reprocessing/wgs/WholeGenomeReprocessing.wdl index f60603e7bd..6cfd2bbf39 100644 --- a/pipelines/broad/reprocessing/wgs/WholeGenomeReprocessing.wdl +++ b/pipelines/broad/reprocessing/wgs/WholeGenomeReprocessing.wdl @@ -6,7 +6,7 @@ import "../../../../structs/dna_seq/DNASeqStructs.wdl" workflow WholeGenomeReprocessing { - String pipeline_version = "3.1.18" + String pipeline_version = "3.1.19" input { File? input_cram diff --git a/pipelines/broad/rna_seq/RNAWithUMIsPipeline.changelog.md b/pipelines/broad/rna_seq/RNAWithUMIsPipeline.changelog.md index bb98577d93..3007372a10 100644 --- a/pipelines/broad/rna_seq/RNAWithUMIsPipeline.changelog.md +++ b/pipelines/broad/rna_seq/RNAWithUMIsPipeline.changelog.md @@ -1,3 +1,8 @@ +# 1.0.16 +2023-12-18 (Date of Last Commit) + +* Updated to GATK version 4.5.0.0. 
+ # 1.0.15 2023-07-27 (Date of Last Commit) diff --git a/pipelines/broad/rna_seq/RNAWithUMIsPipeline.wdl b/pipelines/broad/rna_seq/RNAWithUMIsPipeline.wdl index d88fc95fed..9787fa6dcd 100644 --- a/pipelines/broad/rna_seq/RNAWithUMIsPipeline.wdl +++ b/pipelines/broad/rna_seq/RNAWithUMIsPipeline.wdl @@ -20,7 +20,7 @@ import "../../../tasks/broad/RNAWithUMIsTasks.wdl" as tasks workflow RNAWithUMIsPipeline { - String pipeline_version = "1.0.15" + String pipeline_version = "1.0.16" input { File? bam diff --git a/pipelines/cemba/cemba_methylcseq/CEMBA.changelog.md b/pipelines/cemba/cemba_methylcseq/CEMBA.changelog.md index 37c70e3f93..e778cd427c 100644 --- a/pipelines/cemba/cemba_methylcseq/CEMBA.changelog.md +++ b/pipelines/cemba/cemba_methylcseq/CEMBA.changelog.md @@ -1,3 +1,8 @@ +# 1.1.6 +2023-12-18 (Date of Last Commit) + +* Updated to GATK version 4.5.0.0. + # 1.1.5 2023-01-13 (Date of Last Commit) diff --git a/pipelines/cemba/cemba_methylcseq/CEMBA.wdl b/pipelines/cemba/cemba_methylcseq/CEMBA.wdl index 49998776cf..ab2df2802f 100644 --- a/pipelines/cemba/cemba_methylcseq/CEMBA.wdl +++ b/pipelines/cemba/cemba_methylcseq/CEMBA.wdl @@ -57,7 +57,7 @@ workflow CEMBA { } # version of this pipeline - String pipeline_version = "1.1.5" + String pipeline_version = "1.1.6" # trim off hardcoded sequence adapters call Trim as TrimAdapters { @@ -1008,7 +1008,7 @@ task AddReadGroup { >>> runtime { - docker: "us.gcr.io/broad-gatk/gatk:4.3.0.0" + docker: "us.gcr.io/broad-gatk/gatk:4.5.0.0" # if the input size is less than 1 GB adjust to min input size of 1 GB # disks should be set to 2 * input file size disks: "local-disk " + ceil(2 * (if input_size < 1 then 1 else input_size)) + " HDD" @@ -1063,7 +1063,7 @@ task MethylationTypeCaller { >>> runtime { - docker: "us.gcr.io/broad-gatk/gatk:4.3.0.0" + docker: "us.gcr.io/broad-gatk/gatk:4.5.0.0" # if the input size is less than 1 GB adjust to min input size of 1 GB disks: "local-disk " + ceil(4.5 * (if input_size < 1 then 1 else input_size)) + " HDD" cpu: 1 diff --git a/scripts/BuildAFComparisonTable.wdl b/scripts/BuildAFComparisonTable.wdl index 066d5691f8..43e331df4e 100644 --- a/scripts/BuildAFComparisonTable.wdl +++ b/scripts/BuildAFComparisonTable.wdl @@ -117,7 +117,7 @@ task AnnotateWithAF_t { >>> runtime { - docker: "us.gcr.io/broad-gatk/gatk:4.3.0.0" + docker: "us.gcr.io/broad-gatk/gatk:4.5.0.0" disks: "local-disk " + disk_size + " HDD" memory: mem + " GB" # some of the gnomad vcfs are like 38 gigs so maybe need more ? 
} @@ -145,7 +145,7 @@ task GatherVCFsCloud { >>> runtime { - docker: "us.gcr.io/broad-gatk/gatk:4.3.0.0" + docker: "us.gcr.io/broad-gatk/gatk:4.5.0.0" disks: "local-disk " + disk_size + " HDD" memory: "16 GB" } @@ -169,7 +169,7 @@ task MakeSitesOnlyVcf { >>> runtime { - docker: "us.gcr.io/broad-gatk/gatk:4.3.0.0" + docker: "us.gcr.io/broad-gatk/gatk:4.5.0.0" disks: "local-disk " + disk_size + " HDD" memory: "16 GB" } @@ -224,7 +224,7 @@ task VariantsToTable { >>> runtime { - docker: "us.gcr.io/broad-gatk/gatk:4.3.0.0" + docker: "us.gcr.io/broad-gatk/gatk:4.5.0.0" disks: "local-disk " + disk_size + " HDD" memory: "16 GB" } @@ -280,7 +280,7 @@ task RemoveSymbolicAlleles { File output_vcf_index = "~{output_basename}.vcf.gz.tbi" } runtime { - docker: "us.gcr.io/broad-gatk/gatk:4.3.0.0" + docker: "us.gcr.io/broad-gatk/gatk:4.5.0.0" disks: "local-disk " + disk_size + " HDD" memory: "4 GB" } diff --git a/scripts/RemoveBadSitesByID.wdl b/scripts/RemoveBadSitesByID.wdl index 700ee05a4a..bb1bcefd8f 100644 --- a/scripts/RemoveBadSitesByID.wdl +++ b/scripts/RemoveBadSitesByID.wdl @@ -129,7 +129,7 @@ task SplitX { >>> runtime { - docker: "us.gcr.io/broad-gatk/gatk:4.3.0.0" + docker: "us.gcr.io/broad-gatk/gatk:4.5.0.0" disks: "local-disk " + disk_size + " HDD" memory: "16 GB" } @@ -215,7 +215,7 @@ task RemoveBadSitesFromVcf { >>> runtime { - docker: "us.gcr.io/broad-gatk/gatk:4.3.0.0" + docker: "us.gcr.io/broad-gatk/gatk:4.5.0.0" disks: "local-disk 100 HDD" memory: "16 GB" } diff --git a/tasks/broad/DragenTasks.wdl b/tasks/broad/DragenTasks.wdl index ebc4146a7e..149eb5fd12 100644 --- a/tasks/broad/DragenTasks.wdl +++ b/tasks/broad/DragenTasks.wdl @@ -24,7 +24,7 @@ task CalibrateDragstrModel { File str_table_file File alignment ## can handle cram or bam. File alignment_index - String docker = "us.gcr.io/broad-gatk/gatk:4.3.0.0" + String docker = "us.gcr.io/broad-gatk/gatk:4.5.0.0" Int preemptible_tries = 3 Int threads = 4 Int? memory_mb diff --git a/tasks/broad/GermlineVariantDiscovery.wdl b/tasks/broad/GermlineVariantDiscovery.wdl index 0e446a1993..0e3c8f2e6e 100644 --- a/tasks/broad/GermlineVariantDiscovery.wdl +++ b/tasks/broad/GermlineVariantDiscovery.wdl @@ -96,7 +96,7 @@ task HaplotypeCaller_GATK4_VCF { Boolean use_dragen_hard_filtering = false Boolean use_spanning_event_genotyping = true File? 
dragstr_model - String gatk_docker = "us.gcr.io/broad-gatk/gatk:4.3.0.0" + String gatk_docker = "us.gcr.io/broad-gatk/gatk:4.5.0.0" Int memory_multiplier = 1 } @@ -256,7 +256,7 @@ task HardFilterVcf { String vcf_basename File interval_list Int preemptible_tries - String gatk_docker = "us.gcr.io/broad-gatk/gatk:4.3.0.0" + String gatk_docker = "us.gcr.io/broad-gatk/gatk:4.5.0.0" } Int disk_size = ceil(2 * size(input_vcf, "GiB")) + 20 @@ -292,7 +292,7 @@ task DragenHardFilterVcf { Boolean make_gvcf String vcf_basename Int preemptible_tries - String gatk_docker = "us.gcr.io/broad-gatk/gatk:4.3.0.0" + String gatk_docker = "us.gcr.io/broad-gatk/gatk:4.5.0.0" } Int disk_size = ceil(2 * size(input_vcf, "GiB")) + 20 @@ -332,7 +332,7 @@ task CNNScoreVariants { File ref_fasta_index File ref_dict Int preemptible_tries - String gatk_docker = "us.gcr.io/broad-gatk/gatk:4.3.0.0" + String gatk_docker = "us.gcr.io/broad-gatk/gatk:4.5.0.0" } Int disk_size = ceil(size(bamout, "GiB") + size(ref_fasta, "GiB") + (size(input_vcf, "GiB") * 2)) @@ -389,7 +389,7 @@ task FilterVariantTranches { File dbsnp_resource_vcf_index String info_key Int preemptible_tries - String gatk_docker = "us.gcr.io/broad-gatk/gatk:4.3.0.0" + String gatk_docker = "us.gcr.io/broad-gatk/gatk:4.5.0.0" } Int disk_size = ceil(size(hapmap_resource_vcf, "GiB") + diff --git a/tasks/broad/IlluminaGenotypingArrayTasks.wdl b/tasks/broad/IlluminaGenotypingArrayTasks.wdl index dff9fdee6e..2598bed60b 100644 --- a/tasks/broad/IlluminaGenotypingArrayTasks.wdl +++ b/tasks/broad/IlluminaGenotypingArrayTasks.wdl @@ -404,7 +404,7 @@ task SelectVariants { >>> runtime { - docker: "us.gcr.io/broad-gatk/gatk:4.3.0.0" + docker: "us.gcr.io/broad-gatk/gatk:4.5.0.0" bootDiskSizeGb: 15 disks: "local-disk " + disk_size + " HDD" memory: "3500 MiB" @@ -441,7 +441,7 @@ task SelectIndels { >>> runtime { - docker: "us.gcr.io/broad-gatk/gatk:4.3.0.0" + docker: "us.gcr.io/broad-gatk/gatk:4.5.0.0" bootDiskSizeGb: 15 disks: "local-disk " + disk_size + " HDD" memory: "3500 MiB" @@ -577,7 +577,7 @@ task SubsetArrayVCF { } runtime { - docker: "us.gcr.io/broad-gatk/gatk:4.3.0.0" + docker: "us.gcr.io/broad-gatk/gatk:4.5.0.0" bootDiskSizeGb: 15 disks: "local-disk " + disk_size + " HDD" memory: "3500 MiB" @@ -676,7 +676,7 @@ task ValidateVariants { >>> runtime { - docker: "us.gcr.io/broad-gatk/gatk:4.3.0.0" + docker: "us.gcr.io/broad-gatk/gatk:4.5.0.0" bootDiskSizeGb: 15 disks: "local-disk " + disk_size + " HDD" memory: "3500 MiB" diff --git a/tasks/broad/ImputationTasks.wdl b/tasks/broad/ImputationTasks.wdl index 533cdb6dfc..5e575c002a 100644 --- a/tasks/broad/ImputationTasks.wdl +++ b/tasks/broad/ImputationTasks.wdl @@ -65,7 +65,7 @@ task GenerateChunk { Int disk_size_gb = ceil(2*size(vcf, "GiB")) + 50 # not sure how big the disk size needs to be since we aren't downloading the entire VCF here Int cpu = 1 Int memory_mb = 8000 - String gatk_docker = "us.gcr.io/broad-gatk/gatk:4.3.0.0" + String gatk_docker = "us.gcr.io/broad-gatk/gatk:4.5.0.0" } Int command_mem = memory_mb - 1000 Int max_heap = memory_mb - 500 @@ -112,7 +112,7 @@ task CountVariantsInChunks { File panel_vcf File panel_vcf_index - String gatk_docker = "us.gcr.io/broad-gatk/gatk:4.3.0.0" + String gatk_docker = "us.gcr.io/broad-gatk/gatk:4.5.0.0" Int cpu = 1 Int memory_mb = 4000 Int disk_size_gb = 2 * ceil(size([vcf, vcf_index, panel_vcf, panel_vcf_index], "GiB")) + 20 @@ -266,7 +266,7 @@ task GatherVcfs { Array[File] input_vcfs String output_vcf_basename - String gatk_docker = "us.gcr.io/broad-gatk/gatk:4.3.0.0" + 
String gatk_docker = "us.gcr.io/broad-gatk/gatk:4.5.0.0" Int cpu = 1 Int memory_mb = 16000 Int disk_size_gb = ceil(3*size(input_vcfs, "GiB")) @@ -336,7 +336,7 @@ task UpdateHeader { String basename Int disk_size_gb = ceil(4*(size(vcf, "GiB") + size(vcf_index, "GiB"))) + 20 - String gatk_docker = "us.gcr.io/broad-gatk/gatk:4.3.0.0" + String gatk_docker = "us.gcr.io/broad-gatk/gatk:4.5.0.0" Int cpu = 1 Int memory_mb = 8000 } @@ -372,7 +372,7 @@ task RemoveSymbolicAlleles { String output_basename Int disk_size_gb = ceil(3*(size(original_vcf, "GiB") + size(original_vcf_index, "GiB"))) - String gatk_docker = "us.gcr.io/broad-gatk/gatk:4.3.0.0" + String gatk_docker = "us.gcr.io/broad-gatk/gatk:4.5.0.0" Int cpu = 1 Int memory_mb = 4000 } @@ -659,7 +659,7 @@ task SubsetVcfToRegion { Int disk_size_gb = ceil(2*size(vcf, "GiB")) + 50 # not sure how big the disk size needs to be since we aren't downloading the entire VCF here Int cpu = 1 Int memory_mb = 8000 - String gatk_docker = "us.gcr.io/broad-gatk/gatk:4.3.0.0" + String gatk_docker = "us.gcr.io/broad-gatk/gatk:4.5.0.0" } Int command_mem = memory_mb - 1000 Int max_heap = memory_mb - 500 @@ -754,7 +754,7 @@ task SelectVariantsByIds { File ids String basename - String gatk_docker = "us.gcr.io/broad-gatk/gatk:4.3.0.0" + String gatk_docker = "us.gcr.io/broad-gatk/gatk:4.5.0.0" Int cpu = 1 Int memory_mb = 16000 Int disk_size_gb = ceil(1.2*size(vcf, "GiB")) + 100 @@ -820,7 +820,7 @@ task InterleaveVariants { Array[File] vcfs String basename - String gatk_docker = "us.gcr.io/broad-gatk/gatk:4.3.0.0" + String gatk_docker = "us.gcr.io/broad-gatk/gatk:4.5.0.0" Int cpu = 1 Int memory_mb = 16000 Int disk_size_gb = ceil(3.2*size(vcfs, "GiB")) + 100 diff --git a/tasks/broad/JointGenotypingTasks.wdl b/tasks/broad/JointGenotypingTasks.wdl index 41918ed50b..65386b0f06 100644 --- a/tasks/broad/JointGenotypingTasks.wdl +++ b/tasks/broad/JointGenotypingTasks.wdl @@ -51,7 +51,7 @@ task SplitIntervalList { Int machine_mem_mb = 3750 String scatter_mode = "BALANCING_WITHOUT_INTERVAL_SUBDIVISION_WITH_OVERFLOW" String? 
extra_args - String gatk_docker = "us.gcr.io/broad-gatk/gatk:4.3.0.0" + String gatk_docker = "us.gcr.io/broad-gatk/gatk:4.5.0.0" } parameter_meta { @@ -94,7 +94,7 @@ task ImportGVCFs { Int machine_mem_mb = 30000 Int batch_size - String gatk_docker = "us.gcr.io/broad-gatk/gatk:4.3.0.0" + String gatk_docker = "us.gcr.io/broad-gatk/gatk:4.5.0.0" } command <<< @@ -159,7 +159,7 @@ task GenotypeGVCFs { Int machine_mem_mb = 26000 # This is needed for gVCFs generated with GATK3 HaplotypeCaller Boolean allow_old_rms_mapping_quality_annotation_data = false - String gatk_docker = "us.gcr.io/broad-gatk/gatk:4.3.0.0" + String gatk_docker = "us.gcr.io/broad-gatk/gatk:4.5.0.0" } parameter_meta { @@ -216,7 +216,7 @@ task GnarlyGenotyper { String dbsnp_vcf Boolean make_annotation_db = false - String gatk_docker = "us.gcr.io/broad-gatk/gatk:4.3.0.0" + String gatk_docker = "us.gcr.io/broad-gatk/gatk:4.5.0.0" Int machine_mem_mb = 26000 Int disk_size_gb = ceil(size(workspace_tar, "GiB") + size(ref_fasta, "GiB") + size(dbsnp_vcf, "GiB") * 3) } @@ -277,7 +277,7 @@ task HardFilterAndMakeSitesOnlyVcf { Int disk_size_gb Int machine_mem_mb = 3750 - String gatk_docker = "us.gcr.io/broad-gatk/gatk:4.3.0.0" + String gatk_docker = "us.gcr.io/broad-gatk/gatk:4.5.0.0" } command <<< @@ -339,7 +339,7 @@ task IndelsVariantRecalibrator { Int disk_size_gb Int machine_mem_mb = 26000 - String gatk_docker = "us.gcr.io/broad-gatk/gatk:4.3.0.0" + String gatk_docker = "us.gcr.io/broad-gatk/gatk:4.5.0.0" } command <<< @@ -404,7 +404,7 @@ task SNPsVariantRecalibratorCreateModel { Int disk_size_gb Int machine_mem_mb = 104000 - String gatk_docker = "us.gcr.io/broad-gatk/gatk:4.3.0.0" + String gatk_docker = "us.gcr.io/broad-gatk/gatk:4.5.0.0" } command <<< @@ -468,7 +468,7 @@ task SNPsVariantRecalibrator { Int max_gaussians = 6 Int disk_size_gb - String gatk_docker = "us.gcr.io/broad-gatk/gatk:4.3.0.0" + String gatk_docker = "us.gcr.io/broad-gatk/gatk:4.5.0.0" Int? 
machine_mem_mb } @@ -533,7 +533,7 @@ task GatherTranches { String mode Int disk_size_gb Int machine_mem_mb = 7500 - String gatk_docker = "us.gcr.io/broad-gatk/gatk:4.3.0.0" + String gatk_docker = "us.gcr.io/broad-gatk/gatk:4.5.0.0" } parameter_meta { @@ -607,7 +607,7 @@ task ApplyRecalibration { Boolean use_allele_specific_annotations Int disk_size_gb Int machine_mem_mb = 7000 - String gatk_docker = "us.gcr.io/broad-gatk/gatk:4.3.0.0" + String gatk_docker = "us.gcr.io/broad-gatk/gatk:4.5.0.0" } command <<< @@ -658,7 +658,7 @@ task GatherVcfs { String output_vcf_name Int disk_size_gb Int machine_mem_mb = 7000 - String gatk_docker = "us.gcr.io/broad-gatk/gatk:4.3.0.0" + String gatk_docker = "us.gcr.io/broad-gatk/gatk:4.5.0.0" } parameter_meta { @@ -706,7 +706,7 @@ task SelectFingerprintSiteVariants { String base_output_name Int disk_size_gb Int machine_mem_mb = 7500 - String gatk_docker = "us.gcr.io/broad-gatk/gatk:4.3.0.0" + String gatk_docker = "us.gcr.io/broad-gatk/gatk:4.5.0.0" } parameter_meta { @@ -759,7 +759,7 @@ task CollectVariantCallingMetrics { File ref_dict Int disk_size_gb Int machine_mem_mb = 7500 - String gatk_docker = "us.gcr.io/broad-gatk/gatk:4.3.0.0" + String gatk_docker = "us.gcr.io/broad-gatk/gatk:4.5.0.0" } command <<< @@ -798,7 +798,7 @@ task GatherVariantCallingMetrics { String output_prefix Int disk_size_gb Int machine_mem_mb = 3000 - String gatk_docker = "us.gcr.io/broad-gatk/gatk:4.3.0.0" + String gatk_docker = "us.gcr.io/broad-gatk/gatk:4.5.0.0" } parameter_meta { @@ -879,7 +879,7 @@ task CrossCheckFingerprint { String output_base_name Boolean scattered = false Array[String] expected_inconclusive_samples = [] - String gatk_docker = "us.gcr.io/broad-gatk/gatk:4.3.0.0" + String gatk_docker = "us.gcr.io/broad-gatk/gatk:4.5.0.0" Int? machine_mem_mb Int disk = 100 } @@ -1000,7 +1000,7 @@ task GetFingerprintingIntervalIndices { input { Array[File] unpadded_intervals File haplotype_database - String gatk_docker = "us.gcr.io/broad-gatk/gatk:4.3.0.0" + String gatk_docker = "us.gcr.io/broad-gatk/gatk:4.5.0.0" Int disk_size_gb = 10 Int machine_mem_mb = 3750 } @@ -1114,7 +1114,7 @@ task CalculateAverageAnnotations { File vcf Array[String] annotations_to_divide = ["ASSEMBLED_HAPS", "FILTERED_HAPS", "TREE_SCORE"] - String docker = "us.gcr.io/broad-gatk/gatk:4.3.0.0" + String docker = "us.gcr.io/broad-gatk/gatk:4.5.0.0" Int disk_size_gb = ceil(size(vcf, "GB") + 50) Int memory_mb = 12000 Int preemptible = 3 diff --git a/tasks/broad/Qc.wdl b/tasks/broad/Qc.wdl index 3dadae1b72..0ff525b571 100644 --- a/tasks/broad/Qc.wdl +++ b/tasks/broad/Qc.wdl @@ -621,7 +621,7 @@ task ValidateVCF { Int preemptible_tries = 3 Boolean is_gvcf = true String? 
extra_args - String gatk_docker = "us.gcr.io/broad-gatk/gatk:4.3.0.0" + String gatk_docker = "us.gcr.io/broad-gatk/gatk:4.5.0.0" Int machine_mem_mb = 7000 } @@ -629,7 +629,7 @@ task ValidateVCF { String calling_interval_list_basename = basename(calling_interval_list) String calling_interval_list_index_basename = if calling_intervals_is_vcf then basename(select_first([calling_interval_list_index])) else "" - Int command_mem_mb = machine_mem_mb - 1000 + Int command_mem_mb = machine_mem_mb - 2000 Float ref_size = size(ref_fasta, "GiB") + size(ref_fasta_index, "GiB") + size(ref_dict, "GiB") Int disk_size = ceil(size(input_vcf, "GiB") + size(dbsnp_vcf, "GiB") + ref_size) + 20 diff --git a/tasks/broad/RNAWithUMIsTasks.wdl b/tasks/broad/RNAWithUMIsTasks.wdl index 9e5b459f28..0c5ee13362 100644 --- a/tasks/broad/RNAWithUMIsTasks.wdl +++ b/tasks/broad/RNAWithUMIsTasks.wdl @@ -278,7 +278,7 @@ task GetSampleName { input { File bam - String docker = "us.gcr.io/broad-gatk/gatk:4.3.0.0" + String docker = "us.gcr.io/broad-gatk/gatk:4.5.0.0" Int cpu = 1 Int memory_mb = 1000 Int disk_size_gb = ceil(2.0 * size(bam, "GiB")) + 10 @@ -852,7 +852,7 @@ task CalculateContamination { File population_vcf File population_vcf_index # runtime - String docker = "us.gcr.io/broad-gatk/gatk:4.3.0.0" + String docker = "us.gcr.io/broad-gatk/gatk:4.5.0.0" Int cpu = 1 Int memory_mb = 8192 Int disk_size_gb = 256 diff --git a/tasks/broad/UltimaGenomicsGermlineFilteringThreshold.wdl b/tasks/broad/UltimaGenomicsGermlineFilteringThreshold.wdl index e293defa2e..6d1e5999bb 100644 --- a/tasks/broad/UltimaGenomicsGermlineFilteringThreshold.wdl +++ b/tasks/broad/UltimaGenomicsGermlineFilteringThreshold.wdl @@ -326,7 +326,7 @@ task HardThresholdVCF { String output_basename String score_key Int disk_size - String docker = "us.gcr.io/broad-gatk/gatk:4.3.0.0" + String docker = "us.gcr.io/broad-gatk/gatk:4.5.0.0" } command <<< @@ -384,7 +384,7 @@ task AnnotateSampleVCF { File input_vcf_index String output_basename Int disk_size = ceil(size(input_vcf, "GB") * 2) + 50 - String docker = "us.gcr.io/broad-gatk/gatk:4.3.0.0" + String docker = "us.gcr.io/broad-gatk/gatk:4.5.0.0" File ref_fasta File ref_fasta_index File ref_dict diff --git a/tasks/broad/UltimaGenomicsWholeGenomeGermlineTasks.wdl b/tasks/broad/UltimaGenomicsWholeGenomeGermlineTasks.wdl index d1761ba041..4942f79280 100644 --- a/tasks/broad/UltimaGenomicsWholeGenomeGermlineTasks.wdl +++ b/tasks/broad/UltimaGenomicsWholeGenomeGermlineTasks.wdl @@ -56,7 +56,7 @@ task SplitCram { String base_file_name Int reads_per_file - String docker = "us.gcr.io/broad-gatk/gatk:4.3.0.0" + String docker = "us.gcr.io/broad-gatk/gatk:4.5.0.0" Int disk_size_gb = ceil(3 * size(input_cram_bam, "GiB") + 20) Int cpu = 1 Int memory_gb = 10 @@ -269,7 +269,7 @@ task MarkDuplicatesSpark { Array[File] input_bams String output_bam_basename Boolean save_bam_file - String docker = "us.gcr.io/broad-gatk/gatk:4.3.0.0" + String docker = "us.gcr.io/broad-gatk/gatk:4.5.0.0" Int disk_size_gb Int cpu = 32 Int memory_mb = if 4 * ceil(size(input_bams, "MB")) / 4000 > 600000 then 300000 else 208000 @@ -319,7 +319,7 @@ task ExtractSampleNameFlowOrder{ File input_bam References references - String docker = "us.gcr.io/broad-gatk/gatk:4.3.0.0" + String docker = "us.gcr.io/broad-gatk/gatk:4.5.0.0" Int disk_size_gb = ceil(size(input_bam, "GB") + size(references.ref_fasta, "GB") + 20) Int cpu = 1 Int memory_mb = 2000 @@ -480,7 +480,7 @@ task HaplotypeCaller { Boolean native_sw = false String? 
contamination_extra_args - String docker = "us.gcr.io/broad-gatk/gatk:4.3.0.0" + String docker = "us.gcr.io/broad-gatk/gatk:4.5.0.0" Int disk_size_gb = ceil((size(input_bam_list, "GB")) + size(references.ref_fasta, "GB") + size(references.ref_fasta_index, "GB") + size(references.ref_dict, "GB") + 60) Int cpu = 2 Int memory_mb = 12000 @@ -591,7 +591,7 @@ task ConvertGVCFtoVCF { String output_vcf_name References references - String docker = "us.gcr.io/broad-gatk/gatk:4.3.0.0" + String docker = "us.gcr.io/broad-gatk/gatk:4.5.0.0" Int disk_size_gb = ceil(2 * size(input_gvcf, "GB") + size(references.ref_fasta, "GB") + size(input_gvcf_index, "GB") + 20) Int cpu = 1 Int memory_mb = 12000 @@ -947,7 +947,7 @@ task AnnotateVCF { String flow_order String final_vcf_base_name - String docker = "us.gcr.io/broad-gatk/gatk:4.3.0.0" + String docker = "us.gcr.io/broad-gatk/gatk:4.5.0.0" Int disk_size_gb = ceil(2 * size(input_vcf, "GB") + size(references.ref_fasta, "GB") + size(reference_dbsnp, "GB") + 20) Int cpu = 1 Int memory_mb = 15000 diff --git a/verification/VerifyNA12878.wdl b/verification/VerifyNA12878.wdl index 5bee339931..0a8e699ffa 100644 --- a/verification/VerifyNA12878.wdl +++ b/verification/VerifyNA12878.wdl @@ -80,7 +80,7 @@ task RunValidation { } runtime { - docker: "us.gcr.io/broad-gatk/gatk:4.3.0.0" + docker: "us.gcr.io/broad-gatk/gatk:4.5.0.0" memory: machine_mem + " MiB" disks: "local-disk " + select_first([disk_space, 100]) + if use_ssd then " SSD" else " HDD" diff --git a/verification/VerifyTasks.wdl b/verification/VerifyTasks.wdl index 3e8bcdce3e..2772b41aec 100644 --- a/verification/VerifyTasks.wdl +++ b/verification/VerifyTasks.wdl @@ -20,7 +20,7 @@ task CompareVcfs { runtime { docker: "gcr.io/gcp-runtimes/ubuntu_16_0_4:latest" - disks: "local-disk 50 HDD" + disks: "local-disk 70 HDD" memory: "32 GiB" preemptible: 3 } diff --git a/website/docs/Pipelines/CEMBA_MethylC_Seq_Pipeline/CEMBA.methods.md b/website/docs/Pipelines/CEMBA_MethylC_Seq_Pipeline/CEMBA.methods.md index 13ede6b7c9..97c7ef4c94 100644 --- a/website/docs/Pipelines/CEMBA_MethylC_Seq_Pipeline/CEMBA.methods.md +++ b/website/docs/Pipelines/CEMBA_MethylC_Seq_Pipeline/CEMBA.methods.md @@ -2,7 +2,7 @@ sidebar_position: 2 --- -# CEMBA_v1.1.5 Publication Methods +# CEMBA_v1.1.6 Publication Methods Below we provide a sample methods section for a publication. For the complete pipeline documentation, see the [CEMBA README](./README.md). @@ -20,7 +20,7 @@ The trimmed R1 and R2 reads were then aligned to mouse (mm10) or human (hg19) ge After alignment, the output R1 and R2 BAMs were sorted in coordinate order and duplicates removed using the Picard MarkDuplicates REMOVE_DUPLICATE option. Samtools 1.9 was used to further filter BAMs with a minimum map quality of 30 using the parameter `-bhq 30`. -Methylation reports were produced for the filtered BAMs using Bismark. The barcodes from the R1 uBAM were then attached to the aligned, filtered R1 BAM with Picard. The R1 and R2 BAMs were merged with Samtools. Readnames were added to the merged BAM and a methylated VCF created using MethylationTypeCaller in GATK 4.3.0.0. The VCF was then converted to an additional ALLC file using a custom python script. +Methylation reports were produced for the filtered BAMs using Bismark. The barcodes from the R1 uBAM were then attached to the aligned, filtered R1 BAM with Picard. The R1 and R2 BAMs were merged with Samtools. Readnames were added to the merged BAM and a methylated VCF created using MethylationTypeCaller in GATK 4.5.0.0. 
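As a rough illustration of this MethylationTypeCaller step, a minimal sketch follows; the file names are placeholders for the pipeline's actual intermediates, not the real task inputs:

```bash
# Call locus-specific methylation from the merged, read-grouped BAM.
# Paths are hypothetical; the real inputs come from the upstream CEMBA tasks.
gatk --java-options "-Xmx4g" MethylationTypeCaller \
  -R GRCh38.primary_assembly.genome.fa \
  -I sample.merged.with_read_groups.bam \
  -O sample.methylation.vcf
```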
The VCF was then converted to an additional ALLC file using a custom python script. Samtools was then used to calculate coverage depth for sites with coverage greater than 1 and to create BAM index files. The final outputs included the barcoded aligned BAM, BAM index, a VCF with locus-specific methylation information, VCF index, ALLC file, and methylation reports. diff --git a/website/docs/Pipelines/CEMBA_MethylC_Seq_Pipeline/README.md b/website/docs/Pipelines/CEMBA_MethylC_Seq_Pipeline/README.md index d38b188f5c..af51b43992 100644 --- a/website/docs/Pipelines/CEMBA_MethylC_Seq_Pipeline/README.md +++ b/website/docs/Pipelines/CEMBA_MethylC_Seq_Pipeline/README.md @@ -7,7 +7,7 @@ slug: /Pipelines/CEMBA_MethylC_Seq_Pipeline/README | Pipeline Version | Date Updated | Documentation Author | Questions or Feedback | | :----: | :---: | :----: | :--------------: | -| [CEMBA_v1.1.5](https://github.com/broadinstitute/warp/releases) | December, 2023 | [Elizabeth Kiernan](mailto:ekiernan@broadinstitute.org) | Please file GitHub issues in warp or contact [the WARP team](mailto:warp-pipelines-help@broadinstitute.org) | +| [CEMBA_v1.1.6](https://github.com/broadinstitute/warp/releases) | December, 2023 | [Elizabeth Kiernan](mailto:ekiernan@broadinstitute.org) | Please file GitHub issues in warp or contact [the WARP team](mailto:warp-pipelines-help@broadinstitute.org) | ![CEMBA](./CEMBA.png) @@ -28,7 +28,7 @@ Interested in using the pipeline for your publication? See the [“CEMBA publica | Workflow Language | WDL 1.0 | [openWDL](https://github.com/openwdl/wdl) | | Genomic Reference Sequence| GRCH38 and GRCM38 | [GENCODE](https://www.gencodegenes.org/) | | Aligner | BISMARK v0.21.0 with --bowtie2 | [Bismark](https://www.bioinformatics.babraham.ac.uk/projects/bismark/) | -| Variant Caller | GATK 4.3.0.0 | [GATK 4.3.0.0](https://gatk.broadinstitute.org/hc/en-us) +| Variant Caller | GATK 4.5.0.0 | [GATK 4.5.0.0](https://gatk.broadinstitute.org/hc/en-us) | Data Input File Format | File format in which sequencing data is provided | [Zipped FASTQs (.fastq.gz)](https://support.illumina.com/bulletins/2016/04/fastq-files-explained.html) | | Data Output File Format | File formats in which CEMBA output is provided | [BAM](http://samtools.github.io/hts-specs/), [VCF](https://samtools.github.io/hts-specs/VCFv4.2.pdf), [ALLC](https://github.com/yupenghe/methylpy#output-format) | @@ -105,10 +105,10 @@ The table and summary sections below detail the tasks and tools of the CEMBA pip | GetMethylationReport | [Bismark v0.21.0](https://www.bioinformatics.babraham.ac.uk/projects/bismark/) | Produce methylation report for reads above map quality and below map quality | quay.io/broadinstitute/bismark:0.21.0 | | AttachBarcodes | [Picard v2.26.10](https://broadinstitute.github.io/picard/) | Add barcodes from the tagged uBAM to the aligned BAM | us.gcr.io/broad-gotc-prod/picard-cloud:2.26.10 | | MergeBams | [Samtools v.19](http://www.htslib.org/) | Merge R1 and R2 BAM files into single BAM | quay.io/broadinstitute/samtools:1.9 | -| AddReadGroup | [GATK v4.3.0.0](https://gatk.broadinstitute.org/hc/en-us) | Add read groups to the merged BAM | us.gcr.io/broad-gatk/gatk:4.3.0.0 | +| AddReadGroup | [GATK v4.5.0.0](https://gatk.broadinstitute.org/hc/en-us) | Add read groups to the merged BAM | us.gcr.io/broad-gatk/gatk:4.5.0.0 | | Sort | [Picard v2.26.10](https://broadinstitute.github.io/picard/) | Sort in coordinate order after adding read group | us.gcr.io/broad-gotc-prod/picard-cloud:2.26.10 | | IndexBam | [Samtools 
v1.9](http://www.htslib.org/) | Index the output BAM | quay.io/broadinstitute/samtools:1.9 | -| MethylationTypeCaller | [GATK v4.3.0.0](https://gatk.broadinstitute.org/hc/en-us) | Produce a VCF with locus-specific methylation information | us.gcr.io/broad-gatk/gatk:4.3.0.0 | +| MethylationTypeCaller | [GATK v4.5.0.0](https://gatk.broadinstitute.org/hc/en-us) | Produce a VCF with locus-specific methylation information | us.gcr.io/broad-gatk/gatk:4.5.0.0 | | VCFtoALLC | Python | Creates an [ALLC](https://github.com/yupenghe/methylpy#output-format) file from the VCF produced with MethylationTypeCaller | quay.io/cemba/vcftoallc:v0.0.1 | | ComputeCoverageDepth | [Samtools v1.9](http://www.htslib.org/) | Compute number of sites with coverage greater than 1 | quay.io/broadinstitute/samtools:1.9 | diff --git a/website/docs/Pipelines/Exome_Germline_Single_Sample_Pipeline/README.md b/website/docs/Pipelines/Exome_Germline_Single_Sample_Pipeline/README.md index 2165bd249d..5640900abc 100644 --- a/website/docs/Pipelines/Exome_Germline_Single_Sample_Pipeline/README.md +++ b/website/docs/Pipelines/Exome_Germline_Single_Sample_Pipeline/README.md @@ -7,7 +7,7 @@ slug: /Pipelines/Exome_Germline_Single_Sample_Pipeline/README | Pipeline Version | Date Updated | Documentation Author | Questions or Feedback | | :----: | :---: | :----: | :--------------: | -| [ExomeGermlineSingleSample_v3.1.16](https://github.com/broadinstitute/warp/releases?q=ExomeGermlineSingleSample_v3.0.0&expanded=true) | December, 2023 | [Elizabeth Kiernan](mailto:ekiernan@broadinstitute.org) | Please file GitHub issues in WARP or contact [the WARP team](mailto:warp-pipelines-help@broadinstitute.org) | +| [ExomeGermlineSingleSample_v3.1.18](https://github.com/broadinstitute/warp/releases?q=ExomeGermlineSingleSample_v3.0.0&expanded=true) | February, 2024 | [Elizabeth Kiernan](mailto:ekiernan@broadinstitute.org) | Please file GitHub issues in WARP or contact [the WARP team](mailto:warp-pipelines-help@broadinstitute.org) | The Exome Germline Single Sample pipeline implements data pre-processing and initial variant calling according to the GATK Best Practices for germline SNP and Indel discovery in human exome sequencing data. 
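Because this patch series pins the same GATK container across dozens of tasks, a quick sanity check is to confirm the image actually reports the expected tool version; a minimal sketch, assuming Docker is available locally:

```bash
# Pull the pinned GATK image and print the version the launcher reports;
# it should match the 4.5.0.0 tag used throughout these WDL tasks.
docker pull us.gcr.io/broad-gatk/gatk:4.5.0.0
docker run --rm us.gcr.io/broad-gatk/gatk:4.5.0.0 gatk --version
```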
@@ -27,7 +27,7 @@ The Exome Germline Single Sample workflow is written in the Workflow Description ### Software Version Requirements -* [GATK 4.3.0.0](https://github.com/broadinstitute/gatk/releases/tag/4.3.0.0) +* [GATK 4.5.0.0](https://github.com/broadinstitute/gatk/releases/tag/4.5.0.0) * Picard 2.26.10 * Samtools 1.11 * Python 3.0 diff --git a/website/docs/Pipelines/JointGenotyping/README.md b/website/docs/Pipelines/JointGenotyping/README.md index 98eaeaa858..fb9d12cf61 100644 --- a/website/docs/Pipelines/JointGenotyping/README.md +++ b/website/docs/Pipelines/JointGenotyping/README.md @@ -7,7 +7,7 @@ slug: /Pipelines/JointGenotyping_Pipeline/README | Pipeline Version | Date Updated | Documentation Author | Questions or Feedback | | :----: | :---: | :----: | :--------------: | -| [JointGenotyping_v1.6.9](https://github.com/broadinstitute/warp/releases) | February, 2024 | Elizabeth Kiernan & Kaylee Mathews | Please file GitHub issues in WARP or contact [documentation authors](mailto:warp-pipelines-help@broadinstitute.org) | +| [JointGenotyping_v1.6.10](https://github.com/broadinstitute/warp/releases) | February, 2024 | Elizabeth Kiernan & Kaylee Mathews | Please file GitHub issues in WARP or contact [documentation authors](mailto:warp-pipelines-help@broadinstitute.org) | ## Introduction to the JointGenotyping workflow diff --git a/website/docs/Pipelines/Ultima_Genomics_Whole_Genome_Germline_Pipeline/README.md b/website/docs/Pipelines/Ultima_Genomics_Whole_Genome_Germline_Pipeline/README.md index bd97ab96fa..90c81b2b95 100644 --- a/website/docs/Pipelines/Ultima_Genomics_Whole_Genome_Germline_Pipeline/README.md +++ b/website/docs/Pipelines/Ultima_Genomics_Whole_Genome_Germline_Pipeline/README.md @@ -7,7 +7,7 @@ slug: /Pipelines/Ultima_Genomics_Whole_Genome_Germline_Pipeline/README | Pipeline Version | Date Updated | Documentation Authors | Questions or Feedback | | :----: | :---: | :----: | :--------------: | -| [UltimaGenomicsWholeGenomeGermline_v1.0.13](https://github.com/broadinstitute/warp/releases) | December, 2023 | [Elizabeth Kiernan](mailto:ekiernan@broadinstitute.org) & [Kaylee Mathews](mailto:kmathews@broadinstitute.org)| Please file GitHub issues in warp or contact [the wARP team](mailto:warp-pipelines-help@broadinstitute.org) | +| [UltimaGenomicsWholeGenomeGermline_v1.0.15](https://github.com/broadinstitute/warp/releases) | February, 2024 | [Elizabeth Kiernan](mailto:ekiernan@broadinstitute.org) & [Kaylee Mathews](mailto:kmathews@broadinstitute.org)| Please file GitHub issues in warp or contact [the WARP team](mailto:warp-pipelines-help@broadinstitute.org) | ![UG_diagram](ug_diagram.png) diff --git a/website/docs/Pipelines/Whole_Genome_Germline_Single_Sample_Pipeline/README.md b/website/docs/Pipelines/Whole_Genome_Germline_Single_Sample_Pipeline/README.md index 27ad3c9355..f34182b974 100644 --- a/website/docs/Pipelines/Whole_Genome_Germline_Single_Sample_Pipeline/README.md +++ b/website/docs/Pipelines/Whole_Genome_Germline_Single_Sample_Pipeline/README.md @@ -7,7 +7,7 @@ slug: /Pipelines/Whole_Genome_Germline_Single_Sample_Pipeline/README | Pipeline Version | Date Updated | Documentation Author | Questions or Feedback | | :----: | :---: | :----: | :--------------: | -| WholeGenomeGermlineSingleSample_v3.1.17 (see [releases page](https://github.com/broadinstitute/warp/releases)) | December, 2023 | [Elizabeth Kiernan](mailto:ekiernan@broadinstitute.org) | Please file GitHub issues in WARP or contact [the WARP team](mailto:warp-pipelines-help@broadinstitute.org) | +| 
WholeGenomeGermlineSingleSample_v3.1.19 (see [releases page](https://github.com/broadinstitute/warp/releases)) | February, 2024 | [Elizabeth Kiernan](mailto:ekiernan@broadinstitute.org) | Please file GitHub issues in WARP or contact [the WARP team](mailto:warp-pipelines-help@broadinstitute.org) | ## Introduction to the Whole Genome Germline Single Sample Pipeline The Whole Genome Germline Single Sample (WGS) pipeline implements data pre-processing and initial variant calling according to the GATK Best Practices for germline SNP and Indel discovery in human whole-genome sequencing data. It includes the DRAGEN-GATK mode, which makes the pipeline functionally equivalent to DRAGEN’s analysis pipeline (read more in this [DRAGEN-GATK blog](https://gatk.broadinstitute.org/hc/en-us/articles/360039984151)). diff --git a/website/docs/Pipelines/Whole_Genome_Germline_Single_Sample_Pipeline/wgs.methods.md b/website/docs/Pipelines/Whole_Genome_Germline_Single_Sample_Pipeline/wgs.methods.md index 842dd9ec59..01bd7457d0 100644 --- a/website/docs/Pipelines/Whole_Genome_Germline_Single_Sample_Pipeline/wgs.methods.md +++ b/website/docs/Pipelines/Whole_Genome_Germline_Single_Sample_Pipeline/wgs.methods.md @@ -2,13 +2,13 @@ sidebar_position: 2 --- -# Whole Genome Germline Single Sample v3.1.17 Methods (Default workflow) +# Whole Genome Germline Single Sample v3.1.19 Methods (Default workflow) The following contains a detailed methods description outlining the pipeline’s process, software, and tools that can be modified for a publication methods section. ## Detailed methods for the default Whole Genome Germline Single Sample workflow -Preprocessing and variant calling was performed using the WholeGenomeGermlineSingleSample v3.1.17 pipeline using Picard v2.26.10, GATK v4.3.0.0, and Samtools v1.11 with default tool parameters unless otherwise specified. All reference files are available in the public [Broad References Google Bucket](https://console.cloud.google.com/storage/browser/gcp-public-data--broad-references/hg38/v0). The pipeline follows GATK Best Practices as previously described ([Van der Auwera & O'Connor, 2020](https://www.oreilly.com/library/view/genomics-in-the/9781491975183/)) as well as the Functional Equivalence specification ([Regier et al., 2018](https://www.nature.com/articles/s41467-018-06159-4)). +Preprocessing and variant calling was performed using the WholeGenomeGermlineSingleSample v3.1.19 pipeline using Picard v2.26.10, GATK v4.5.0.0, and Samtools v1.11 with default tool parameters unless otherwise specified. All reference files are available in the public [Broad References Google Bucket](https://console.cloud.google.com/storage/browser/gcp-public-data--broad-references/hg38/v0). The pipeline follows GATK Best Practices as previously described ([Van der Auwera & O'Connor, 2020](https://www.oreilly.com/library/view/genomics-in-the/9781491975183/)) as well as the Functional Equivalence specification ([Regier et al., 2018](https://www.nature.com/articles/s41467-018-06159-4)). ### Pre-processing and quality control metrics @@ -34,7 +34,7 @@ The pipeline’s final outputs included metrics, validation reports, an aligned ## Detailed methods for the Functional Equivalence mode of the Whole Genome Germline Single Sample workflow -Preprocessing and variant calling was performed using the WholeGenomeGermlineSingleSample v3.1.17 pipeline using v2.26.10, GATK v4.3.0.0, and Samtools v1.11 with default tool parameters unless otherwise specified. 
All reference files are available in the public [Broad References Google Bucket](https://console.cloud.google.com/storage/browser/gcp-public-data--broad-references/hg38/v0). The pipeline is functionally equivalent (as described in GATK Support: https://gatk.broadinstitute.org/hc/en-us/articles/4410456501915) to DRAGEN v3.4.12. +Preprocessing and variant calling was performed using the WholeGenomeGermlineSingleSample v3.1.19 pipeline using Picard v2.26.10, GATK v4.5.0.0, and Samtools v1.11 with default tool parameters unless otherwise specified. All reference files are available in the public [Broad References Google Bucket](https://console.cloud.google.com/storage/browser/gcp-public-data--broad-references/hg38/v0). The pipeline is functionally equivalent (as described in GATK Support: https://gatk.broadinstitute.org/hc/en-us/articles/4410456501915) to DRAGEN v3.4.12. ### Pre-processing and quality control metrics From 149e04920e8863cd471c2f0dcad47a370a0f7e8c Mon Sep 17 00:00:00 2001 From: Kaylee Mathews <95316074+kayleemathews@users.noreply.github.com> Date: Fri, 16 Feb 2024 12:06:57 -0500 Subject: [PATCH 22/68] Km doc updates (#1206) * add azure JG link * update exome docs * update illumina docs * updated imputation docs * Update website/docs/Pipelines/Exome_Germline_Single_Sample_Pipeline/README.md Co-authored-by: ekiernan <55763654+ekiernan@users.noreply.github.com> --------- Co-authored-by: ekiernan <55763654+ekiernan@users.noreply.github.com> --- .../Exome_Germline_Single_Sample_Pipeline/exome.methods.md | 5 +++-- .../Illumina_genotyping_array_spec.md | 6 +++--- .../Pipelines/Illumina_Genotyping_Arrays_Pipeline/README.md | 6 +++--- website/docs/Pipelines/Imputation_Pipeline/README.md | 4 ++-- website/docs/Pipelines/JointGenotyping/README.md | 2 ++ 5 files changed, 13 insertions(+), 10 deletions(-) diff --git a/website/docs/Pipelines/Exome_Germline_Single_Sample_Pipeline/exome.methods.md b/website/docs/Pipelines/Exome_Germline_Single_Sample_Pipeline/exome.methods.md index a66740f22a..a09c96719b 100644 --- a/website/docs/Pipelines/Exome_Germline_Single_Sample_Pipeline/exome.methods.md +++ b/website/docs/Pipelines/Exome_Germline_Single_Sample_Pipeline/exome.methods.md @@ -2,13 +2,13 @@ sidebar_position: 2 --- -# Exome Germline Single Sample v3.0.0 Methods +# Exome Germline Single Sample v3.1.17 Methods The following contains a detailed methods description outlining the pipeline’s process, software, and tools that can be modified for a publication methods section. ## Detailed Methods -Preprocessing and variant calling was performed using the ExomeGermlineSingleSample 3.0.0 pipeline using Picard 2.23.8, GATK 4.2.2.0, and Samtools 1.11 with default tool parameters unless otherwise specified. All reference files are available in the public [Broad References Google Bucket](https://console.cloud.google.com/storage/browser/gcp-public-data--broad-references/hg38/v0). The pipeline follows GATK Best Practices as previously described ([Van der Auwera & O'Connor, 2020](https://www.oreilly.com/library/view/genomics-in-the/9781491975183/)) as well as the Functional Equivalence specification ([Regier et al., 2018](https://www.nature.com/articles/s41467-018-06159-4)). +Preprocessing and variant calling was performed using the ExomeGermlineSingleSample 3.1.17 pipeline using Picard 2.26.10, GATK 4.5.0.0, and Samtools 1.11 with default tool parameters unless otherwise specified. 
All reference files are available in the public [Broad References Google Bucket](https://console.cloud.google.com/storage/browser/gcp-public-data--broad-references/hg38/v0). The pipeline follows GATK Best Practices as previously described ([Van der Auwera & O'Connor, 2020](https://www.oreilly.com/library/view/genomics-in-the/9781491975183/)) as well as the Functional Equivalence specification ([Regier et al., 2018](https://www.nature.com/articles/s41467-018-06159-4)). ### Pre-processing and QC @@ -31,4 +31,5 @@ Prior to variant calling, the variant calling interval list was split to enable The pipeline’s final outputs included metrics, the ValidateSamFile validation reports, an aligned CRAM with index, and a reblocked GVCF containing variant calls with an accompanying index. ## Previous methods documents +- [ExomeGermlineSingleSample_v3.0.0](https://github.com/broadinstitute/warp/blob/ExomeGermlineSingleSample_v3.0.0/website/docs/Pipelines/Exome_Germline_Single_Sample_Pipeline/exome.methods.md) - [ExomeGermlineSingleSample_v2.4.4](https://github.com/broadinstitute/warp/blob/ExomeGermlineSingleSample_v2.6.0/website/docs/Pipelines/Exome_Germline_Single_Sample_Pipeline/exome.methods.md) \ No newline at end of file diff --git a/website/docs/Pipelines/Illumina_Genotyping_Arrays_Pipeline/Illumina_genotyping_array_spec.md b/website/docs/Pipelines/Illumina_Genotyping_Arrays_Pipeline/Illumina_genotyping_array_spec.md index 46da522506..1462c4fdb7 100644 --- a/website/docs/Pipelines/Illumina_Genotyping_Arrays_Pipeline/Illumina_genotyping_array_spec.md +++ b/website/docs/Pipelines/Illumina_Genotyping_Arrays_Pipeline/Illumina_genotyping_array_spec.md @@ -4,7 +4,7 @@ sidebar_position: 2 # VCF Overview: Illumina Genotyping Array -The [Illumina Genotyping Array Pipeline](https://github.com/broadinstitute/warp/blob/develop/pipelines/broad/genotyping/illumina/IlluminaGenotypingArray.wdl) v1.11.0 pipeline produces a VCF (Variant Call Format) output with data processing and sample-specific genotype information. The VCF follows the format listed in the [VCF 4.2 specification](https://samtools.github.io/hts-specs/VCFv4.2.pdf), but additionally contains fields and attributes that are unique to the Arrays pipeline. +The [Illumina Genotyping Array Pipeline](https://github.com/broadinstitute/warp/blob/develop/pipelines/broad/genotyping/illumina/IlluminaGenotypingArray.wdl) v1.12.15 pipeline produces a VCF (Variant Call Format) output with data processing and sample-specific genotype information. The VCF follows the format listed in the [VCF 4.2 specification](https://samtools.github.io/hts-specs/VCFv4.2.pdf), but additionally contains fields and attributes that are unique to the Arrays pipeline. This document describes the Array pipeline’s unique VCF fields and attributes that are not listed in the standard VCF specification. To learn more about the pipeline, see the [Illumina Genotyping Array Pipeline Overview](./README.md). @@ -26,7 +26,7 @@ Each VCF has meta information fields with attributes that generally describe the - extendedIlluminaManifestVersion - Version of the ‘extended Illumina manifest’ used by the VCF - generation software. 
- extendedManifestFile - File name of the ‘extended Illumina manifest’ used by the VCF generation software - fingerprintGender - Gender (sex) determined using an orthogonal fingerprinting technology, populated by an optional parameter used by the VCF generation software -- gtcCallRate - GTC call rate of the sample processed that is generated by the autocall/gencall software and represents the fraction of callable loci that had valid calls +- gtcCallRate - GTC call rate of the sample processed that is generated by the autocall/gencall software and represents the fraction of callable loci that had valid calls; ignores zeroed-out SNPs - imagingDate - Creation date for the chip well barcode IDATs (raw image scans) - manifestFile - Name of the Illumina manifest (.bpm) file used by the VCF generation software - sampleAlias - Sample name @@ -112,4 +112,4 @@ The remaining attributes describe the cluster definitions provided in the cluste - meanX_BB - Mean of normalized X for BB cluster - meanY_AA - Mean of normalized Y for AA cluster - meanY_AB - Mean of normalized Y for AB cluster -- meanY_BB - Mean of normalized Y for BB cluster +- meanY_BB - Mean of normalized Y for BB cluster \ No newline at end of file diff --git a/website/docs/Pipelines/Illumina_Genotyping_Arrays_Pipeline/README.md b/website/docs/Pipelines/Illumina_Genotyping_Arrays_Pipeline/README.md index 0cebb97fea..8eb9bed3b0 100644 --- a/website/docs/Pipelines/Illumina_Genotyping_Arrays_Pipeline/README.md +++ b/website/docs/Pipelines/Illumina_Genotyping_Arrays_Pipeline/README.md @@ -7,7 +7,7 @@ slug: /Pipelines/Illumina_Genotyping_Arrays_Pipeline/README | Pipeline Version | Date Updated | Documentation Author | Questions or Feedback | | :----: | :---: | :----: | :--------------: | -| [Version 1.11.6](https://github.com/broadinstitute/warp/releases) | October, 2021 | [Elizabeth Kiernan](mailto:ekiernan@broadinstitute.org) | Please file GitHub issues in warp or contact [the WARP team](mailto:warp-pipelines-help@broadinstitute.org) | +| [Version 1.12.15](https://github.com/broadinstitute/warp/releases) | February, 2024 | [Elizabeth Kiernan](mailto:ekiernan@broadinstitute.org) | Please file GitHub issues in warp or contact [the WARP team](mailto:warp-pipelines-help@broadinstitute.org) | ![The Illumina Genotyping Array Pipeline](./IlluminaGenotyping.png) @@ -121,7 +121,7 @@ The following table provides a summary of the WDL tasks and software tools calle | SubsetArrayVCF | [SubsetArrayVCF](https://gatk.broadinstitute.org/hc/en-us/articles/360036362532) | GATK | | CollectArraysVariantCallingMetrics | [CollectArraysVariantCallingMetrics](https://gatk.broadinstitute.org/hc/en-us/articles/360037593871) | Picard | | SelectVariants | [SelectVariants](https://gatk.broadinstitute.org/hc/en-us/articles/360036362532) | GATK | -| CheckFingerprint | [CheckFingerprint](https://gatk.broadinstitute.org/hc/en-us/articles/360036358752) | Picard | +| CheckFingerprintTask | [CheckFingerprint](https://gatk.broadinstitute.org/hc/en-us/articles/360036358752) | Picard | | VcfToIntervalList | [VcfToIntervalList](https://gatk.broadinstitute.org/hc/en-us/articles/360036897672) | Picard | | GenotypeConcordance | [GenotypeConcordance](https://gatk.broadinstitute.org/hc/en-us/articles/360036348932) | Picard | @@ -176,7 +176,7 @@ DNA fingerprinting helps maintain sample identity and avoid sample swaps. The Il #### 6. 
Evaluating an existing fingerprint (optional) -If the genotyping sample already has a corresponding fingerprint VCF file, the workflow can also optionally check the existing fingerprint to confirm sample identity. It uses the CheckFingerPrints task to calculate genotype concordance between the workflow’s genotyping output VCF (final_output_vcf) and the known genotype specified in a fingerprint_genotypes_vcf_file. The workflow returns a boolean for if the sample genotype failed concordance, as well as a Logarithm of Odds (LOD) score for concordance. +If the genotyping sample already has a corresponding fingerprint VCF file, the workflow can also optionally check the existing fingerprint to confirm sample identity. It uses the CheckFingerprintTask task to calculate genotype concordance between the workflow’s genotyping output VCF (final_output_vcf) and the known genotype specified in a fingerprint_genotypes_vcf_file. The workflow returns a boolean indicating whether the sample genotype failed concordance, as well as a Logarithm of Odds (LOD) score for concordance. #### 7. Genotype concordance (optional) diff --git a/website/docs/Pipelines/Imputation_Pipeline/README.md b/website/docs/Pipelines/Imputation_Pipeline/README.md index 8d82efbc58..4743d3c1af 100644 --- a/website/docs/Pipelines/Imputation_Pipeline/README.md +++ b/website/docs/Pipelines/Imputation_Pipeline/README.md @@ -7,7 +7,7 @@ slug: /Pipelines/Imputation_Pipeline/README | Pipeline Version | Date Updated | Documentation Author | Questions or Feedback | | :----: | :---: | :----: | :--------------: | -| [Imputation_v1.0.0](https://github.com/broadinstitute/warp/releases?q=Imputation_v1.0.0&expanded=true) | August, 2021 | [Elizabeth Kiernan](mailto:ekiernan@broadinstitute.org) | Please file GitHub issues in warp or contact [the WARP team](mailto:warp-pipelines-help@broadinstitute.org) | +| [Imputation_v1.1.11](https://github.com/broadinstitute/warp/releases?q=Imputation_v1.0.0&expanded=true) | February, 2024 | [Elizabeth Kiernan](mailto:ekiernan@broadinstitute.org) | Please file GitHub issues in warp or contact [the WARP team](mailto:warp-pipelines-help@broadinstitute.org) | ## Introduction to the Imputation pipeline The Imputation pipeline imputes missing genotypes from either a multi-sample VCF or an array of single sample VCFs using a large genomic reference panel. It is based on the [Michigan Imputation Server pipeline](https://imputationserver.readthedocs.io/en/latest/pipeline/). Overall, the pipeline filters, phases, and performs imputation on a multi-sample VCF. It outputs the imputed VCF along with key imputation metrics. @@ -54,7 +54,7 @@ For examples of how to specify each input in a configuration file, as well as cl | genetics_maps_eagle | Genetic map file for phasing.| File | | output_callset_name | Output callset name. | String | | split_output_to_single_sample | Boolean to split out the final combined VCF to individual sample VCFs; set to false by default. | Boolean | -| merge_ssvcf_mem_gb | Memory allocation for MergeSingleSampleVcfs (in GB). | Int | +| merge_ssvcf_mem_mb | Optional integer specifying memory allocation for MergeSingleSampleVcfs (in MB); default is 3000. | Int | | frac_well_imputed_threshold | Threshold for the fraction of well-imputed sites; default set to 0.9. | Float | | chunks_fail_threshold | Maximum threshold for the number of chunks allowed to fail; default set to 1. | Float | | vcf_suffix | File extension used for the VCF in the reference panel. 
| String | diff --git a/website/docs/Pipelines/JointGenotyping/README.md b/website/docs/Pipelines/JointGenotyping/README.md index fb9d12cf61..27748df1f6 100644 --- a/website/docs/Pipelines/JointGenotyping/README.md +++ b/website/docs/Pipelines/JointGenotyping/README.md @@ -25,6 +25,8 @@ The pipeline can be configured to run using one of the following GATK variant fi The pipeline takes in a sample map file listing GVCF files produced by HaplotypeCaller in GVCF mode and produces a filtered VCF file (with index) containing genotypes for all samples present in the input VCF files. All sites that are present in the input VCF file are retained. Filtered sites are annotated as such in the FILTER field. If you are new to VCF files, see the [file type specification](https://samtools.github.io/hts-specs/VCFv4.2.pdf). +The JointGenotyping pipeline can be adapted to run on Microsoft Azure instead of Google Cloud. For more information, see the [azure-warp-joint-calling GitHub repository](https://github.com/broadinstitute/azure-warp-joint-calling). + ## Set-up ### JointGenotyping Installation and Requirements From 64b433c641906bf8b07758cc9ac359c8377417bd Mon Sep 17 00:00:00 2001 From: Sid Cox Date: Fri, 16 Feb 2024 12:19:48 -0500 Subject: [PATCH 23/68] add check to modify cell_reads only when files exist --- tasks/skylab/StarAlign.wdl | 54 ++++++++++++++++++++++---------------- 1 file changed, 31 insertions(+), 23 deletions(-) diff --git a/tasks/skylab/StarAlign.wdl b/tasks/skylab/StarAlign.wdl index 20302c7101..81f6668c42 100644 --- a/tasks/skylab/StarAlign.wdl +++ b/tasks/skylab/StarAlign.wdl @@ -507,30 +507,38 @@ task MergeStarOutput { declare -a align_features_files=(~{sep=' ' align_features}) declare -a umipercell_files=(~{sep=' ' umipercell}) - # Destination file for cell reads - dest="~{input_id}_cell_reads.txt" - # first create the header from the first file in the list, and add a column header for the shard id - head -n 1 "${cell_reads_files[0]}" | awk '{print $0 "\tshard_number"}' > "$dest" - # Loop through the array and add the second row with shard number to a temp file notinpasslist.txt - for index in "${!cell_reads_files[@]}"; do - secondLine=$(sed -n '2p' "${cell_reads_files[$index]}") - echo -e "$secondLine\t$index" >> "notinpasslist.txt" - done - # add notinpasslist.txt to the destination file and delete the notinpasslist.txt - cat "notinpasslist.txt" >> "$dest" - rm notinpasslist.txt - # now add the shard id to the matrix in a temporary matrix file, and skip the first two lines - counter=0 - for cell_read in "${cell_reads_files[@]}"; do - if [ -f "$cell_read" ]; then - awk -v var="$counter" 'NR>2 {print $0 "\t" var}' "$cell_read" >> "matrix.txt" - let counter=counter+1 - fi - done - # add the matrix to the destination file, then delete the matrix file - cat "matrix.txt" >> "$dest" - rm "matrix.txt" + if [ -f "${cell_reads_files[0]}" ]; then + + # Destination file for cell reads + dest="~{input_id}_cell_reads.txt" + # first create the header from the first file in the list, and add a column header for the shard id + head -n 1 "${cell_reads_files[0]}" | awk '{print $0 "\tshard_number"}' > "$dest" + + # Loop through the array and add the second row with shard number to a temp file notinpasslist.txt + for index in "${!cell_reads_files[@]}"; do + secondLine=$(sed -n '2p' "${cell_reads_files[$index]}") + echo -e "$secondLine\t$index" >> "notinpasslist.txt" + done + + # add notinpasslist.txt to the destination file and delete the notinpasslist.txt + cat "notinpasslist.txt" >> "$dest" + rm 
notinpasslist.txt + + # now add the shard id to the matrix in a temporary matrix file, and skip the first two lines + counter=0 + for cell_read in "${cell_reads_files[@]}"; do + if [ -f "$cell_read" ]; then + awk -v var="$counter" 'NR>2 {print $0 "\t" var}' "$cell_read" >> "matrix.txt" + let counter=counter+1 + fi + done + + # add the matrix to the destination file, then delete the matrix file + cat "matrix.txt" >> "$dest" + rm "matrix.txt" + fi + counter=0 for summary in "${summary_files[@]}"; do if [ -f "$summary" ]; then From 4325a0d2a7eae93e29e547b5c92b9e68b23ed6b7 Mon Sep 17 00:00:00 2001 From: Sid Cox Date: Wed, 21 Feb 2024 14:58:18 -0500 Subject: [PATCH 24/68] update Optimus version --- pipelines/skylab/optimus/Optimus.changelog.md | 4 ++++ pipelines/skylab/optimus/Optimus.wdl | 2 +- 2 files changed, 5 insertions(+), 1 deletion(-) diff --git a/pipelines/skylab/optimus/Optimus.changelog.md b/pipelines/skylab/optimus/Optimus.changelog.md index ee841074d4..6873c2b9f0 100644 --- a/pipelines/skylab/optimus/Optimus.changelog.md +++ b/pipelines/skylab/optimus/Optimus.changelog.md @@ -1,3 +1,7 @@ +# 6.4.0 +2024-02-21 (Date of Last Commit) +* Updated StarAlign.MergeStarOutput to add a shard number to the metrics files + # 6.3.6 2024-02-07 (Date of Last Commit) * Updated the Metrics tasks to exclude mitochondrial genes from reads_mapped_uniquely, reads_mapped_multiple and reads_mapped_exonic, reads_mapped_exonic_as and reads_mapped_intergenic diff --git a/pipelines/skylab/optimus/Optimus.wdl b/pipelines/skylab/optimus/Optimus.wdl index af73fc415c..dac74e4818 100644 --- a/pipelines/skylab/optimus/Optimus.wdl +++ b/pipelines/skylab/optimus/Optimus.wdl @@ -65,7 +65,7 @@ workflow Optimus { # version of this pipeline - String pipeline_version = "6.3.6" + String pipeline_version = "6.4.0" # this is used to scatter matched [r1_fastq, r2_fastq, i1_fastq] arrays Array[Int] indices = range(length(r1_fastq)) From efdd174628be6b6112c24a8b273eae28c7252049 Mon Sep 17 00:00:00 2001 From: kayleemathews Date: Thu, 22 Feb 2024 11:45:30 -0500 Subject: [PATCH 25/68] Update README.md --- website/docs/Pipelines/Optimus_Pipeline/README.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/website/docs/Pipelines/Optimus_Pipeline/README.md b/website/docs/Pipelines/Optimus_Pipeline/README.md index 5ef733e855..9c700439a3 100644 --- a/website/docs/Pipelines/Optimus_Pipeline/README.md +++ b/website/docs/Pipelines/Optimus_Pipeline/README.md @@ -7,7 +7,7 @@ slug: /Pipelines/Optimus_Pipeline/README | Pipeline Version | Date Updated | Documentation Author | Questions or Feedback | | :----: | :---: | :----: | :--------------: | -| [optimus_v6.3.6](https://github.com/broadinstitute/warp/releases?q=optimus&expanded=true) | February, 2024 | Elizabeth Kiernan | Please file GitHub issues in warp or contact [the WARP team](mailto:warp-pipelines-help@broadinstitute.org) | +| [optimus_v6.4.0](https://github.com/broadinstitute/warp/releases?q=optimus&expanded=true) | February, 2024 | Elizabeth Kiernan | Please file GitHub issues in warp or contact [the WARP team](mailto:warp-pipelines-help@broadinstitute.org) | ![Optimus_diagram](Optimus_diagram.png) @@ -253,7 +253,7 @@ The following table lists the output files produced from the pipeline. For sampl | matrix_col_index | `_sparse_counts_col_index.npy` | Index of genes in count matrix. | NPY | | cell_metrics | `.cell-metrics.csv.gz` | Matrix of metrics by cells. | Compressed CSV | | gene_metrics | `.gene-metrics.csv.gz` | Matrix of metrics by genes. 
| Compressed CSV | -| aligner_metrics | `.cell_reads.txt` | Per barcode metrics (CellReads.stats) produced by the STARsolo aligner. | TXT | +| aligner_metrics | `.star_metrics.tar` | Tarred metrics files produced by the STARsolo aligner; contains align features, cell reads, summary, and UMI per cell metrics files. | TAR | | multimappers_EM_matrix | `UniqueAndMult-EM.mtx` | Optional output produced when `soloMultiMappers` is "EM"; see STARsolo [documentation](https://github.com/alexdobin/STAR/blob/master/docs/STARsolo.md#multi-gene-reads) for more information. | MTX | | multimappers_Uniform_matrix | `UniqueAndMult-Uniform.mtx` | Optional output produced when `soloMultiMappers` is "Uniform"; see STARsolo [documentation](https://github.com/alexdobin/STAR/blob/master/docs/STARsolo.md#multi-gene-reads) for more information. | MTX | | multimappers_Rescue_matrix | `UniqueAndMult-Rescue.mtx` | Optional output produced when `soloMultiMappers` is "Rescue"; see STARsolo [documentation](https://github.com/alexdobin/STAR/blob/master/docs/STARsolo.md#multi-gene-reads) for more information. | MTX | From 55edd6d262e316bb1c9663e82d5fe45b221ad1fe Mon Sep 17 00:00:00 2001 From: Sid Cox Date: Thu, 22 Feb 2024 11:26:34 -0500 Subject: [PATCH 26/68] update Multiome version --- pipelines/skylab/multiome/Multiome.changelog.md | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/pipelines/skylab/multiome/Multiome.changelog.md b/pipelines/skylab/multiome/Multiome.changelog.md index df04ac2f3c..cfde164ca6 100644 --- a/pipelines/skylab/multiome/Multiome.changelog.md +++ b/pipelines/skylab/multiome/Multiome.changelog.md @@ -1,3 +1,7 @@ +# 3.2.0 +2024-02-22 (Date of Last Commit) +* Updated StarAlign.MergeStarOutput to add a shard number to the metrics files + # 3.1.3 2024-02-07 (Date of Last Commit) From d527c34a8f39080e292dfb5cbc2b9b88f27596db Mon Sep 17 00:00:00 2001 From: Sid Cox Date: Thu, 22 Feb 2024 11:28:57 -0500 Subject: [PATCH 27/68] update SlideSeq version --- pipelines/skylab/slideseq/SlideSeq.changelog.md | 5 +++++ pipelines/skylab/slideseq/SlideSeq.wdl | 2 +- 2 files changed, 6 insertions(+), 1 deletion(-) diff --git a/pipelines/skylab/slideseq/SlideSeq.changelog.md b/pipelines/skylab/slideseq/SlideSeq.changelog.md index f95357e03c..1746cc4257 100644 --- a/pipelines/skylab/slideseq/SlideSeq.changelog.md +++ b/pipelines/skylab/slideseq/SlideSeq.changelog.md @@ -1,3 +1,8 @@ +# 3.1.0 +2024-02-07 (Date of Last Commit) + +* Updated StarAlign output metrics to include shard ids + # 3.0.1 2024-02-13 (Date of Last Commit) diff --git a/pipelines/skylab/slideseq/SlideSeq.wdl b/pipelines/skylab/slideseq/SlideSeq.wdl index 2471e52310..ce033d33b0 100644 --- a/pipelines/skylab/slideseq/SlideSeq.wdl +++ b/pipelines/skylab/slideseq/SlideSeq.wdl @@ -23,7 +23,7 @@ import "../../../tasks/skylab/MergeSortBam.wdl" as Merge workflow SlideSeq { - String pipeline_version = "3.0.1" + String pipeline_version = "3.1.0" input { Array[File] r1_fastq From 379b506e79e3784f4354b5700389ae58c538b9c0 Mon Sep 17 00:00:00 2001 From: Sid Cox Date: Thu, 22 Feb 2024 12:28:08 -0500 Subject: [PATCH 28/68] update MultiSampleSmartSeq2SingleNucleus version --- .../MultiSampleSmartSeq2SingleNucleus.changelog.md | 7 ++++++- .../MultiSampleSmartSeq2SingleNucleus.wdl | 2 +- 2 files changed, 7 insertions(+), 2 deletions(-) diff --git a/pipelines/skylab/smartseq2_single_nucleus_multisample/MultiSampleSmartSeq2SingleNucleus.changelog.md b/pipelines/skylab/smartseq2_single_nucleus_multisample/MultiSampleSmartSeq2SingleNucleus.changelog.md index 
b0e84df63f..64b516e8b9 100644 --- a/pipelines/skylab/smartseq2_single_nucleus_multisample/MultiSampleSmartSeq2SingleNucleus.changelog.md +++ b/pipelines/skylab/smartseq2_single_nucleus_multisample/MultiSampleSmartSeq2SingleNucleus.changelog.md @@ -1,4 +1,9 @@ -# 1.2.28 +# 1.3.0 +2024-01-22 (Date of Last Commit) + +* Updated StarAlign output metrics to include shard ids + + # 1.2.28 2024-01-11 (Date of Last Commit) * Increased memory for MergeStarOutputs in StarAlign.wdl, RunEmptyDrops in RunEmptyDrops.wdl, OptimusH5ad in H5adUtils.wdl and GeneMetrics in Metrics.wdl diff --git a/pipelines/skylab/smartseq2_single_nucleus_multisample/MultiSampleSmartSeq2SingleNucleus.wdl b/pipelines/skylab/smartseq2_single_nucleus_multisample/MultiSampleSmartSeq2SingleNucleus.wdl index d0bf9dbb2f..7a4c1066f8 100644 --- a/pipelines/skylab/smartseq2_single_nucleus_multisample/MultiSampleSmartSeq2SingleNucleus.wdl +++ b/pipelines/skylab/smartseq2_single_nucleus_multisample/MultiSampleSmartSeq2SingleNucleus.wdl @@ -40,7 +40,7 @@ workflow MultiSampleSmartSeq2SingleNucleus { String? input_id_metadata_field } # Version of this pipeline - String pipeline_version = "1.2.28" + String pipeline_version = "1.3.0" if (false) { String? none = "None" From 4e4e4810d7bb2192c4ba62e6fa72fb755ba18285 Mon Sep 17 00:00:00 2001 From: Sid Cox Date: Thu, 22 Feb 2024 12:51:56 -0500 Subject: [PATCH 29/68] update PairedTag version --- pipelines/skylab/paired_tag/PairedTag.changelog.md | 5 +++++ 1 file changed, 5 insertions(+) diff --git a/pipelines/skylab/paired_tag/PairedTag.changelog.md b/pipelines/skylab/paired_tag/PairedTag.changelog.md index 7ea45992db..9537dfc7ce 100644 --- a/pipelines/skylab/paired_tag/PairedTag.changelog.md +++ b/pipelines/skylab/paired_tag/PairedTag.changelog.md @@ -1,3 +1,8 @@ +# 0.1.0 +2024-02-22 (Date of Last Commit) + +* Updated StarAlign output metrics to include shard ids, which is called by Optimus + # 0.0.7 2024-02-07 (Date of Last Commit) From 232be55dcc929026af220b7e34be812dc7c6fe5c Mon Sep 17 00:00:00 2001 From: Sid Cox Date: Thu, 22 Feb 2024 16:08:08 -0500 Subject: [PATCH 30/68] fix Multiome and PairedTag version numbers --- pipelines/skylab/multiome/Multiome.wdl | 2 +- pipelines/skylab/paired_tag/PairedTag.wdl | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/pipelines/skylab/multiome/Multiome.wdl b/pipelines/skylab/multiome/Multiome.wdl index 24b3746d1c..80dd2d0b75 100644 --- a/pipelines/skylab/multiome/Multiome.wdl +++ b/pipelines/skylab/multiome/Multiome.wdl @@ -6,7 +6,7 @@ import "../../../tasks/skylab/H5adUtils.wdl" as H5adUtils import "https://raw.githubusercontent.com/broadinstitute/CellBender/v0.3.0/wdl/cellbender_remove_background.wdl" as CellBender workflow Multiome { - String pipeline_version = "3.1.3" + String pipeline_version = "3.2.0" input { String input_id diff --git a/pipelines/skylab/paired_tag/PairedTag.wdl b/pipelines/skylab/paired_tag/PairedTag.wdl index 5bed110675..50d7dca498 100644 --- a/pipelines/skylab/paired_tag/PairedTag.wdl +++ b/pipelines/skylab/paired_tag/PairedTag.wdl @@ -5,7 +5,7 @@ import "../../../pipelines/skylab/optimus/Optimus.wdl" as optimus import "../../../tasks/skylab/H5adUtils.wdl" as H5adUtils import "../../../tasks/skylab/PairedTagUtils.wdl" as Demultiplexing workflow PairedTag { - String pipeline_version = "0.0.7" + String pipeline_version = "0.1.0" input { String input_id From 52b6a2b165fbc192b7cf8eda0b324fe423d5062b Mon Sep 17 00:00:00 2001 From: Robert Sidney Cox III Date: Thu, 22 Feb 2024 18:03:48 -0500 Subject: [PATCH 31/68] Rc 
2489 removegenref (#1191) * remove ref_genome_fasta input from Multiome and Optimus WDLs and JSONs * patch version Multiome Optimus * remove input from docs * Update Multiome.wdl update Multiome version * update docs * Update pipelines/skylab/optimus/Optimus.wdl Co-authored-by: Jessica Way * Update pipelines/skylab/optimus/Optimus.changelog.md Co-authored-by: Jessica Way * remove ref_genome_fasta * remove ref_genome_fasta * * Remove ref_genome_fasta from Optimus input for PairedTag.wdl * Update PairedTag docs * remove trailing comma in test input jsons * remove trailing comma in test input jsons --------- Co-authored-by: kayleemathews Co-authored-by: Jessica Way --- pipelines/skylab/multiome/Multiome.changelog.md | 6 ++++++ pipelines/skylab/multiome/Multiome.wdl | 4 +--- .../multiome/test_inputs/Plumbing/10k_pbmc_downsampled.json | 1 - .../skylab/multiome/test_inputs/Scientific/10k_pbmc.json | 1 - pipelines/skylab/optimus/Optimus.changelog.md | 4 ++++ pipelines/skylab/optimus/Optimus.wdl | 4 +--- .../skylab/optimus/example_inputs/human_v2_example.json | 3 +-- .../skylab/optimus/example_inputs/human_v3_example.json | 3 +-- .../skylab/optimus/example_inputs/mouse_v2_example.json | 3 +-- .../optimus/example_inputs/mouse_v2_snRNA_example.json | 1 - .../optimus/test_inputs/Plumbing/human_v3_example.json | 3 +-- .../optimus/test_inputs/Plumbing/mouse_v2_example.json | 3 +-- .../test_inputs/Plumbing/mouse_v2_snRNA_example.json | 1 - .../optimus/test_inputs/Scientific/inputs_8k_pbmc.json | 3 +-- .../test_inputs/Scientific/inputs_8k_pbmc_stranded.json | 3 +-- pipelines/skylab/paired_tag/PairedTag.changelog.md | 5 +++++ pipelines/skylab/paired_tag/PairedTag.wdl | 4 +--- verification/test-wdls/TestMultiome.wdl | 2 -- verification/test-wdls/TestOptimus.wdl | 2 -- website/docs/Pipelines/Multiome_Pipeline/README.md | 3 +-- website/docs/Pipelines/Optimus_Pipeline/README.md | 3 ++- website/docs/Pipelines/PairedTag_Pipeline/README.md | 3 +-- 22 files changed, 29 insertions(+), 36 deletions(-) diff --git a/pipelines/skylab/multiome/Multiome.changelog.md b/pipelines/skylab/multiome/Multiome.changelog.md index df04ac2f3c..de3057b8b7 100644 --- a/pipelines/skylab/multiome/Multiome.changelog.md +++ b/pipelines/skylab/multiome/Multiome.changelog.md @@ -1,3 +1,8 @@ +# 3.2.0 +2024-02-13 (Date of Last Commit) + +* Removed ref_genome_fasta input from Multiome WDL and JSON + # 3.1.3 2024-02-07 (Date of Last Commit) @@ -8,6 +13,7 @@ * Add new paired-tag task to parse sample barcodes from cell barcodes when preindexing is set to true; this does not affect the Multiome pipeline + # 3.1.1 2024-01-30 (Date of Last Commit) diff --git a/pipelines/skylab/multiome/Multiome.wdl b/pipelines/skylab/multiome/Multiome.wdl index 24b3746d1c..8c44dedebc 100644 --- a/pipelines/skylab/multiome/Multiome.wdl +++ b/pipelines/skylab/multiome/Multiome.wdl @@ -6,7 +6,7 @@ import "../../../tasks/skylab/H5adUtils.wdl" as H5adUtils import "https://raw.githubusercontent.com/broadinstitute/CellBender/v0.3.0/wdl/cellbender_remove_background.wdl" as CellBender workflow Multiome { - String pipeline_version = "3.1.3" + String pipeline_version = "3.2.0" input { String input_id @@ -18,7 +18,6 @@ workflow Multiome { Array[File]? gex_i1_fastq File tar_star_reference File annotations_gtf - File ref_genome_fasta File? 
mt_genes Int tenx_chemistry_version = 3 Int emptydrops_lower = 100 @@ -61,7 +60,6 @@ workflow Multiome { output_bam_basename = input_id + "_gex", tar_star_reference = tar_star_reference, annotations_gtf = annotations_gtf, - ref_genome_fasta = ref_genome_fasta, mt_genes = mt_genes, tenx_chemistry_version = tenx_chemistry_version, whitelist = gex_whitelist, diff --git a/pipelines/skylab/multiome/test_inputs/Plumbing/10k_pbmc_downsampled.json b/pipelines/skylab/multiome/test_inputs/Plumbing/10k_pbmc_downsampled.json index 902b564388..7d15111f38 100644 --- a/pipelines/skylab/multiome/test_inputs/Plumbing/10k_pbmc_downsampled.json +++ b/pipelines/skylab/multiome/test_inputs/Plumbing/10k_pbmc_downsampled.json @@ -16,7 +16,6 @@ "Multiome.atac_r3_fastq":[ "gs://broad-gotc-test-storage/Multiome/input/plumbing/fastq_R3_atac.fastq.gz" ], - "Multiome.ref_genome_fasta":"gs://gcp-public-data--broad-references/hg38/v0/GRCh38.primary_assembly.genome.fa", "Multiome.tar_bwa_reference":"gs://gcp-public-data--broad-references/hg38/v0/bwa/v2_2_1/bwa-mem2-2.2.1-Human-GENCODE-build-GRCh38.tar", "Multiome.tar_star_reference":"gs://gcp-public-data--broad-references/hg38/v0/star/v2_7_10a/modified_star2.7.10a-Human-GENCODE-build-GRCh38-43.tar", "Multiome.chrom_sizes":"gs://broad-gotc-test-storage/Multiome/input/hg38.chrom.sizes", diff --git a/pipelines/skylab/multiome/test_inputs/Scientific/10k_pbmc.json b/pipelines/skylab/multiome/test_inputs/Scientific/10k_pbmc.json index 846b91ed2d..a5ddf2c947 100644 --- a/pipelines/skylab/multiome/test_inputs/Scientific/10k_pbmc.json +++ b/pipelines/skylab/multiome/test_inputs/Scientific/10k_pbmc.json @@ -25,7 +25,6 @@ "gs://broad-gotc-test-storage/Multiome/input/scientific/10k_PBMC_Multiome/10k_PBMC_Multiome_nextgem_Chromium_Controller_atac_S1_L001_R3_001.fastq.gz", "gs://broad-gotc-test-storage/Multiome/input/scientific/10k_PBMC_Multiome/10k_PBMC_Multiome_nextgem_Chromium_Controller_atac_S1_L002_R3_001.fastq.gz" ], - "Multiome.ref_genome_fasta":"gs://gcp-public-data--broad-references/hg38/v0/GRCh38.primary_assembly.genome.fa", "Multiome.tar_bwa_reference":"gs://gcp-public-data--broad-references/hg38/v0/bwa/v2_2_1/bwa-mem2-2.2.1-Human-GENCODE-build-GRCh38.tar", "Multiome.tar_star_reference":"gs://gcp-public-data--broad-references/hg38/v0/star/v2_7_10a/modified_star2.7.10a-Human-GENCODE-build-GRCh38-43.tar", "Multiome.chrom_sizes":"gs://broad-gotc-test-storage/Multiome/input/hg38.chrom.sizes", diff --git a/pipelines/skylab/optimus/Optimus.changelog.md b/pipelines/skylab/optimus/Optimus.changelog.md index ee841074d4..b16480696b 100644 --- a/pipelines/skylab/optimus/Optimus.changelog.md +++ b/pipelines/skylab/optimus/Optimus.changelog.md @@ -1,3 +1,7 @@ +# 6.4.0 +2024-02-01 (Date of Last Commit) +* Removed ref_genome_fasta input from Optimus WDL and JSON + # 6.3.6 2024-02-07 (Date of Last Commit) * Updated the Metrics tasks to exclude mitochondrial genes from reads_mapped_uniquely, reads_mapped_multiple and reads_mapped_exonic, reads_mapped_exonic_as and reads_mapped_intergenic diff --git a/pipelines/skylab/optimus/Optimus.wdl b/pipelines/skylab/optimus/Optimus.wdl index af73fc415c..db8a6cef60 100644 --- a/pipelines/skylab/optimus/Optimus.wdl +++ b/pipelines/skylab/optimus/Optimus.wdl @@ -29,7 +29,6 @@ workflow Optimus { # organism reference parameters File tar_star_reference File annotations_gtf - File ref_genome_fasta File? mt_genes String? 
soloMultiMappers @@ -65,7 +64,7 @@ workflow Optimus { # version of this pipeline - String pipeline_version = "6.3.6" + String pipeline_version = "6.4.0" # this is used to scatter matched [r1_fastq, r2_fastq, i1_fastq] arrays Array[Int] indices = range(length(r1_fastq)) @@ -86,7 +85,6 @@ workflow Optimus { input_name_metadata_field: "String that describes the metadata field containing the input_name" tar_star_reference: "star genome reference" annotations_gtf: "gtf containing annotations for gene tagging (must match star reference)" - ref_genome_fasta: "genome fasta file (must match star reference)" whitelist: "10x genomics cell barcode whitelist" tenx_chemistry_version: "10X Genomics v2 (10 bp UMI) or v3 chemistry (12bp UMI)" force_no_check: "Set to true to override input checks and allow pipeline to proceed with invalid input" diff --git a/pipelines/skylab/optimus/example_inputs/human_v2_example.json b/pipelines/skylab/optimus/example_inputs/human_v2_example.json index 04e54e6d80..0b0da39f58 100644 --- a/pipelines/skylab/optimus/example_inputs/human_v2_example.json +++ b/pipelines/skylab/optimus/example_inputs/human_v2_example.json @@ -15,6 +15,5 @@ "Optimus.tar_star_reference": "gs://gcp-public-data--broad-references/hg38/v0/star/star_2.7.9a_primary_gencode_human_v27.tar", "Optimus.input_id": "pbmc4k_human", "Optimus.chemistry": "tenX_v2", - "Optimus.annotations_gtf": "gs://gcp-public-data--broad-references/hg38/v0/gencode.v27.primary_assembly.annotation.gtf", - "Optimus.ref_genome_fasta": "gs://gcp-public-data--broad-references/hg38/v0/GRCh38.primary_assembly.genome.fa" + "Optimus.annotations_gtf": "gs://gcp-public-data--broad-references/hg38/v0/gencode.v27.primary_assembly.annotation.gtf" } diff --git a/pipelines/skylab/optimus/example_inputs/human_v3_example.json b/pipelines/skylab/optimus/example_inputs/human_v3_example.json index 82dd8c219a..6a0e8edf98 100644 --- a/pipelines/skylab/optimus/example_inputs/human_v3_example.json +++ b/pipelines/skylab/optimus/example_inputs/human_v3_example.json @@ -15,6 +15,5 @@ "Optimus.tar_star_reference": "gs://gcp-public-data--broad-references/hg38/v0/star/star_2.7.9a_primary_gencode_human_v27.tar", "Optimus.input_id": "pbmc_human_v3", "Optimus.chemistry": "tenX_v3", - "Optimus.annotations_gtf": "gs://gcp-public-data--broad-references/hg38/v0/gencode.v27.primary_assembly.annotation.gtf", - "Optimus.ref_genome_fasta": "gs://gcp-public-data--broad-references/hg38/v0/GRCh38.primary_assembly.genome.fa" + "Optimus.annotations_gtf": "gs://gcp-public-data--broad-references/hg38/v0/gencode.v27.primary_assembly.annotation.gtf" } diff --git a/pipelines/skylab/optimus/example_inputs/mouse_v2_example.json b/pipelines/skylab/optimus/example_inputs/mouse_v2_example.json index 45981c2ac7..8efad7a498 100644 --- a/pipelines/skylab/optimus/example_inputs/mouse_v2_example.json +++ b/pipelines/skylab/optimus/example_inputs/mouse_v2_example.json @@ -27,6 +27,5 @@ "Optimus.tar_star_reference": "gs://gcp-public-data--broad-references/mm10/v0/star/star_2.7.9a_primary_gencode_mouse_vM21.tar", "Optimus.input_id": "neurons2k_mouse", "Optimus.chemistry": "tenX_v2", - "Optimus.annotations_gtf": "gs://gcp-public-data--broad-references/mm10/v0/gencode.vM21.primary_assembly.annotation.gtf", - "Optimus.ref_genome_fasta": "gs://gcp-public-data--broad-references/mm10/v0/GRCm38.primary_assembly.genome.fa" + "Optimus.annotations_gtf": "gs://gcp-public-data--broad-references/mm10/v0/gencode.vM21.primary_assembly.annotation.gtf" } diff --git 
a/pipelines/skylab/optimus/example_inputs/mouse_v2_snRNA_example.json b/pipelines/skylab/optimus/example_inputs/mouse_v2_snRNA_example.json index 293c9f326f..e3b905f62d 100644 --- a/pipelines/skylab/optimus/example_inputs/mouse_v2_snRNA_example.json +++ b/pipelines/skylab/optimus/example_inputs/mouse_v2_snRNA_example.json @@ -24,7 +24,6 @@ "Optimus.input_id": "nuclei_2k_mouse", "Optimus.chemistry": "tenX_v2", "Optimus.annotations_gtf": "gs://gcp-public-data--broad-references/mm10/v0/gencode.vM21.primary_assembly.annotation.gtf", - "Optimus.ref_genome_fasta": "gs://gcp-public-data--broad-references/mm10/v0/GRCm38.primary_assembly.genome.fa", "Optimus.counting_mode": "sn_rna", "Optimus.count_exons": true } diff --git a/pipelines/skylab/optimus/test_inputs/Plumbing/human_v3_example.json b/pipelines/skylab/optimus/test_inputs/Plumbing/human_v3_example.json index ff5a02caaf..612659d25c 100644 --- a/pipelines/skylab/optimus/test_inputs/Plumbing/human_v3_example.json +++ b/pipelines/skylab/optimus/test_inputs/Plumbing/human_v3_example.json @@ -15,6 +15,5 @@ "Optimus.input_id": "pbmc_human_v3", "Optimus.tenx_chemistry_version": "3", "Optimus.annotations_gtf": "gs://gcp-public-data--broad-references/hg38/v0/star/v2_7_10a/modified_v43.annotation.gtf", - "Optimus.star_strand_mode": "Forward", - "Optimus.ref_genome_fasta": "gs://gcp-public-data--broad-references/hg38/v0/GRCh38.primary_assembly.genome.fa" + "Optimus.star_strand_mode": "Forward" } diff --git a/pipelines/skylab/optimus/test_inputs/Plumbing/mouse_v2_example.json b/pipelines/skylab/optimus/test_inputs/Plumbing/mouse_v2_example.json index bbf625ef27..0dc26af9fd 100644 --- a/pipelines/skylab/optimus/test_inputs/Plumbing/mouse_v2_example.json +++ b/pipelines/skylab/optimus/test_inputs/Plumbing/mouse_v2_example.json @@ -27,6 +27,5 @@ "Optimus.input_id": "neurons2k_mouse", "Optimus.tenx_chemistry_version": "2", "Optimus.star_strand_mode": "Unstranded", - "Optimus.annotations_gtf": "gs://gcp-public-data--broad-references/GRCm39/star/v2_7_10a/modified_vM32.annotation.gtf", - "Optimus.ref_genome_fasta": "gs://gcp-public-data--broad-references/GRCm39/GRCm39.primary_assembly.genome.fa.gz" + "Optimus.annotations_gtf": "gs://gcp-public-data--broad-references/GRCm39/star/v2_7_10a/modified_vM32.annotation.gtf" } diff --git a/pipelines/skylab/optimus/test_inputs/Plumbing/mouse_v2_snRNA_example.json b/pipelines/skylab/optimus/test_inputs/Plumbing/mouse_v2_snRNA_example.json index 239b7d1fcb..787a1a8347 100644 --- a/pipelines/skylab/optimus/test_inputs/Plumbing/mouse_v2_snRNA_example.json +++ b/pipelines/skylab/optimus/test_inputs/Plumbing/mouse_v2_snRNA_example.json @@ -24,7 +24,6 @@ "Optimus.tenx_chemistry_version": "2", "Optimus.star_strand_mode": "Unstranded", "Optimus.annotations_gtf": "gs://gcp-public-data--broad-references/GRCm39/star/v2_7_10a/modified_vM32.annotation.gtf", - "Optimus.ref_genome_fasta": "gs://gcp-public-data--broad-references/GRCm39/GRCm39.primary_assembly.genome.fa.gz", "Optimus.counting_mode": "sn_rna", "Optimus.count_exons": true } diff --git a/pipelines/skylab/optimus/test_inputs/Scientific/inputs_8k_pbmc.json b/pipelines/skylab/optimus/test_inputs/Scientific/inputs_8k_pbmc.json index 0f5ce301f1..773af4f2f4 100644 --- a/pipelines/skylab/optimus/test_inputs/Scientific/inputs_8k_pbmc.json +++ b/pipelines/skylab/optimus/test_inputs/Scientific/inputs_8k_pbmc.json @@ -15,8 +15,7 @@ "Optimus.input_id": "8k_pbmc", "Optimus.tenx_chemistry_version": "2", "Optimus.star_strand_mode": "Unstranded", - "Optimus.annotations_gtf": 
"gs://gcp-public-data--broad-references/hg38/v0/star/v2_7_10a/modified_v43.annotation.gtf", - "Optimus.ref_genome_fasta": "gs://gcp-public-data--broad-references/hg38/v0/GRCh38.primary_assembly.genome.fa" + "Optimus.annotations_gtf": "gs://gcp-public-data--broad-references/hg38/v0/star/v2_7_10a/modified_v43.annotation.gtf" } diff --git a/pipelines/skylab/optimus/test_inputs/Scientific/inputs_8k_pbmc_stranded.json b/pipelines/skylab/optimus/test_inputs/Scientific/inputs_8k_pbmc_stranded.json index 2581f222dc..98c9c9912d 100644 --- a/pipelines/skylab/optimus/test_inputs/Scientific/inputs_8k_pbmc_stranded.json +++ b/pipelines/skylab/optimus/test_inputs/Scientific/inputs_8k_pbmc_stranded.json @@ -15,8 +15,7 @@ "Optimus.input_id": "8k_pbmc", "Optimus.tenx_chemistry_version": "2", "Optimus.star_strand_mode": "Forward", - "Optimus.annotations_gtf": "gs://gcp-public-data--broad-references/hg38/v0/gencode.v27.primary_assembly.annotation.gtf", - "Optimus.ref_genome_fasta": "gs://gcp-public-data--broad-references/hg38/v0/GRCh38.primary_assembly.genome.fa" + "Optimus.annotations_gtf": "gs://gcp-public-data--broad-references/hg38/v0/gencode.v27.primary_assembly.annotation.gtf" } diff --git a/pipelines/skylab/paired_tag/PairedTag.changelog.md b/pipelines/skylab/paired_tag/PairedTag.changelog.md index 7ea45992db..b568562f95 100644 --- a/pipelines/skylab/paired_tag/PairedTag.changelog.md +++ b/pipelines/skylab/paired_tag/PairedTag.changelog.md @@ -1,3 +1,8 @@ +# 0.1.0 +2024-02-21 (Date of Last Commit) + +* Remove ref_genome_fasta from Optimus input + # 0.0.7 2024-02-07 (Date of Last Commit) diff --git a/pipelines/skylab/paired_tag/PairedTag.wdl b/pipelines/skylab/paired_tag/PairedTag.wdl index 5bed110675..ed9821d0dc 100644 --- a/pipelines/skylab/paired_tag/PairedTag.wdl +++ b/pipelines/skylab/paired_tag/PairedTag.wdl @@ -5,7 +5,7 @@ import "../../../pipelines/skylab/optimus/Optimus.wdl" as optimus import "../../../tasks/skylab/H5adUtils.wdl" as H5adUtils import "../../../tasks/skylab/PairedTagUtils.wdl" as Demultiplexing workflow PairedTag { - String pipeline_version = "0.0.7" + String pipeline_version = "0.1.0" input { String input_id @@ -17,7 +17,6 @@ workflow PairedTag { Array[File]? gex_i1_fastq File tar_star_reference File annotations_gtf - File ref_genome_fasta File? mt_genes Int tenx_chemistry_version = 3 Int emptydrops_lower = 100 @@ -55,7 +54,6 @@ workflow PairedTag { output_bam_basename = input_id + "_gex", tar_star_reference = tar_star_reference, annotations_gtf = annotations_gtf, - ref_genome_fasta = ref_genome_fasta, mt_genes = mt_genes, tenx_chemistry_version = tenx_chemistry_version, whitelist = gex_whitelist, diff --git a/verification/test-wdls/TestMultiome.wdl b/verification/test-wdls/TestMultiome.wdl index bb9aff4018..9a4a0ec83a 100644 --- a/verification/test-wdls/TestMultiome.wdl +++ b/verification/test-wdls/TestMultiome.wdl @@ -18,7 +18,6 @@ workflow TestMultiome { Array[File]? gex_i1_fastq File tar_star_reference File annotations_gtf - File ref_genome_fasta File? 
mt_genes Int tenx_chemistry_version = 3 Int emptydrops_lower = 100 @@ -69,7 +68,6 @@ workflow TestMultiome { input_id = input_id, tar_star_reference = tar_star_reference, annotations_gtf = annotations_gtf, - ref_genome_fasta = ref_genome_fasta, mt_genes = mt_genes, tenx_chemistry_version = tenx_chemistry_version, emptydrops_lower = emptydrops_lower, diff --git a/verification/test-wdls/TestOptimus.wdl b/verification/test-wdls/TestOptimus.wdl index 535eb8d530..82bdf03adc 100644 --- a/verification/test-wdls/TestOptimus.wdl +++ b/verification/test-wdls/TestOptimus.wdl @@ -24,7 +24,6 @@ workflow TestOptimus { # organism reference parameters File tar_star_reference File annotations_gtf - File ref_genome_fasta File? mt_genes String? soloMultiMappers @@ -79,7 +78,6 @@ workflow TestOptimus { input_name_metadata_field = input_name_metadata_field, tar_star_reference = tar_star_reference, annotations_gtf = annotations_gtf, - ref_genome_fasta = ref_genome_fasta, tenx_chemistry_version = tenx_chemistry_version, emptydrops_lower = emptydrops_lower, force_no_check = force_no_check, diff --git a/website/docs/Pipelines/Multiome_Pipeline/README.md b/website/docs/Pipelines/Multiome_Pipeline/README.md index cb612549f2..354c951a5e 100644 --- a/website/docs/Pipelines/Multiome_Pipeline/README.md +++ b/website/docs/Pipelines/Multiome_Pipeline/README.md @@ -8,7 +8,7 @@ slug: /Pipelines/Multiome_Pipeline/README | Pipeline Version | Date Updated | Documentation Author | Questions or Feedback | | :----: | :---: | :----: | :--------------: | -| [Multiome v3.1.3](https://github.com/broadinstitute/warp/releases) | February, 2024 | Kaylee Mathews | Please file GitHub issues in warp or contact the [WARP Pipeline Development team](mailto:warp-pipelines-help@broadinstitute.org) | +| [Multiome v3.2.0](https://github.com/broadinstitute/warp/releases) | February, 2024 | Kaylee Mathews | Please file GitHub issues in warp or contact the [WARP Pipeline Development team](mailto:warp-pipelines-help@broadinstitute.org) | ![Multiome_diagram](./multiome_diagram.png) @@ -61,7 +61,6 @@ Multiome can be deployed using [Cromwell](https://cromwell.readthedocs.io/en/sta | gex_r2_fastq | Array of read 2 FASTQ files representing a single GEX 10x library.| Array[File] | | gex_i1_fastq | Optional array of index FASTQ files representing a single GEX 10x library; multiplexed samples are not currently supported, but the file may be passed to the pipeline. | Array[File] | | tar_star_reference | TAR file containing a species-specific reference genome and GTF for Optimus (GEX) pipeline. | File | -| ref_genome_fasta | Genome FASTA file used for building the indices. | File | | mt_genes | Optional file for the Optimus (GEX) pipeline containing mitochondrial gene names used for metric calculation; default assumes 'mt' prefix in GTF (case insensitive). | File | | counting_mode | Optional string that determines whether the Optimus (GEX) pipeline should be run in single-cell mode (sc_rna) or single-nucleus mode (sn_rna); default is "sn_rna". | String | | tenx_chemistry_version | Optional integer for the Optimus (GEX) pipeline specifying the 10x version chemistry the data was generated with; validated by examination of the first read 1 FASTQ file read structure; default is "3". 
| Integer | diff --git a/website/docs/Pipelines/Optimus_Pipeline/README.md b/website/docs/Pipelines/Optimus_Pipeline/README.md index 5ef733e855..2f494c7e06 100644 --- a/website/docs/Pipelines/Optimus_Pipeline/README.md +++ b/website/docs/Pipelines/Optimus_Pipeline/README.md @@ -7,7 +7,8 @@ slug: /Pipelines/Optimus_Pipeline/README | Pipeline Version | Date Updated | Documentation Author | Questions or Feedback | | :----: | :---: | :----: | :--------------: | -| [optimus_v6.3.6](https://github.com/broadinstitute/warp/releases?q=optimus&expanded=true) | February, 2024 | Elizabeth Kiernan | Please file GitHub issues in warp or contact [the WARP team](mailto:warp-pipelines-help@broadinstitute.org) | +| [optimus_v6.4.0](https://github.com/broadinstitute/warp/releases?q=optimus&expanded=true) | February, 2024 | Elizabeth Kiernan | Please file GitHub issues in warp or contact [the WARP team](mailto:warp-pipelines-help@broadinstitute.org) | + ![Optimus_diagram](Optimus_diagram.png) diff --git a/website/docs/Pipelines/PairedTag_Pipeline/README.md b/website/docs/Pipelines/PairedTag_Pipeline/README.md index 530cae5b26..4114516ba3 100644 --- a/website/docs/Pipelines/PairedTag_Pipeline/README.md +++ b/website/docs/Pipelines/PairedTag_Pipeline/README.md @@ -7,8 +7,8 @@ slug: /Pipelines/PairedTag_Pipeline/README | Pipeline Version | Date Updated | Documentation Author | Questions or Feedback | | :----: | :---: | :----: | :--------------: | +| [PairedTag_v0.1.0](https://github.com/broadinstitute/warp/releases) | February, 2024 | Kaylee Mathews | Please file GitHub issues in warp or contact [documentation authors](mailto:warp-pipelines-help@broadinstitute.org) | -| [PairedTag_v0.0.7](https://github.com/broadinstitute/warp/releases) | February, 2024 | Kaylee Mathews | Please file GitHub issues in warp or contact [documentation authors](mailto:warp-pipelines-help@broadinstitute.org) | ## Introduction to the Paired-Tag workflow @@ -69,7 +69,6 @@ The Paired-Tag workflow inputs are specified in JSON configuration files. Exampl | gex_i1_fastq | Optional array of index FASTQ files representing a single GEX 10x library; multiplexed samples are not currently supported, but the file may be passed to the pipeline. | Array[File] | | tar_star_reference | TAR file containing a species-specific reference genome and GTF for Optimus (GEX) pipeline. | File | | annotations_gtf | GTF file containing gene annotations used for GEX cell metric calculation and ATAC fragment metrics; must match the GTF used to build the STAR aligner. | File | -| ref_genome_fasta | Genome FASTA file used for building the indices. | File | | mt_genes | Optional file for the Optimus (GEX) pipeline containing mitochondrial gene names used for metric calculation; default assumes 'mt' prefix in GTF (case insensitive). | File | | tenx_chemistry_version | Optional integer for the Optimus (GEX) pipeline specifying the 10x version chemistry the data was generated with; validated by examination of the first read 1 FASTQ file read structure; default is "3". | Integer | | emptydrops_lower | **Not used for single-nucleus data.** Optional threshold for UMIs for the Optimus (GEX) pipeline that empty drops tool should consider for determining cell; data below threshold is not removed; default is "100". 
| Integer | From a146e6e0c5d3ee6370028c3876b0afbfe908d791 Mon Sep 17 00:00:00 2001 From: kayleemathews Date: Fri, 23 Feb 2024 10:40:40 -0500 Subject: [PATCH 32/68] update pipeline docs --- website/docs/Pipelines/SlideSeq_Pipeline/README.md | 2 +- .../Smart-seq2_Single_Nucleus_Multi_Sample_Pipeline/README.md | 2 +- .../multi_snss2.methods.md | 4 ++-- 3 files changed, 4 insertions(+), 4 deletions(-) diff --git a/website/docs/Pipelines/SlideSeq_Pipeline/README.md b/website/docs/Pipelines/SlideSeq_Pipeline/README.md index 9ef0004d98..f0132571db 100644 --- a/website/docs/Pipelines/SlideSeq_Pipeline/README.md +++ b/website/docs/Pipelines/SlideSeq_Pipeline/README.md @@ -7,7 +7,7 @@ slug: /Pipelines/SlideSeq_Pipeline/README | Pipeline Version | Date Updated | Documentation Author | Questions or Feedback | | :----: | :---: | :----: | :--------------: | -| [SlideSeq v3.0.1](https://github.com/broadinstitute/warp/releases) | February, 2024 | Elizabeth Kiernan & Kaylee Mathews | Please file GitHub issues in warp or contact [documentation authors](mailto:warp-pipelines-help@broadinstitute.org) | +| [SlideSeq v3.1.0](https://github.com/broadinstitute/warp/releases) | February, 2024 | Elizabeth Kiernan & Kaylee Mathews | Please file GitHub issues in warp or contact [documentation authors](mailto:warp-pipelines-help@broadinstitute.org) | ![SlideSeq_diagram](./slide-seq_diagram.png) diff --git a/website/docs/Pipelines/Smart-seq2_Single_Nucleus_Multi_Sample_Pipeline/README.md b/website/docs/Pipelines/Smart-seq2_Single_Nucleus_Multi_Sample_Pipeline/README.md index e21fe808ee..4cb42c4cf6 100644 --- a/website/docs/Pipelines/Smart-seq2_Single_Nucleus_Multi_Sample_Pipeline/README.md +++ b/website/docs/Pipelines/Smart-seq2_Single_Nucleus_Multi_Sample_Pipeline/README.md @@ -7,7 +7,7 @@ slug: /Pipelines/Smart-seq2_Single_Nucleus_Multi_Sample_Pipeline/README | Pipeline Version | Date Updated | Documentation Author | Questions or Feedback | | :----: | :---: | :----: | :--------------: | -| [MultiSampleSmartSeq2SingleNuclei_v1.2.28](https://github.com/broadinstitute/warp/releases) | January, 2024 | [Elizabeth Kiernan](mailto:ekiernan@broadinstitute.org) | Please file GitHub issues in WARP or contact [the WARP team](mailto:warp-pipelines-help@broadinstitute.org) | +| [MultiSampleSmartSeq2SingleNuclei_v1.3.0](https://github.com/broadinstitute/warp/releases) | February, 2024 | [Elizabeth Kiernan](mailto:ekiernan@broadinstitute.org) | Please file GitHub issues in WARP or contact [the WARP team](mailto:warp-pipelines-help@broadinstitute.org) | ![](./snSS2.png) diff --git a/website/docs/Pipelines/Smart-seq2_Single_Nucleus_Multi_Sample_Pipeline/multi_snss2.methods.md b/website/docs/Pipelines/Smart-seq2_Single_Nucleus_Multi_Sample_Pipeline/multi_snss2.methods.md index 5239ba7f97..8ab56b15bd 100644 --- a/website/docs/Pipelines/Smart-seq2_Single_Nucleus_Multi_Sample_Pipeline/multi_snss2.methods.md +++ b/website/docs/Pipelines/Smart-seq2_Single_Nucleus_Multi_Sample_Pipeline/multi_snss2.methods.md @@ -2,13 +2,13 @@ sidebar_position: 2 --- -# Smart-seq2 Single Nucleus Multi-Sample v1.2.28 Publication Methods +# Smart-seq2 Single Nucleus Multi-Sample v1.3.0 Publication Methods Below we provide an example methods section for a publication. For the complete pipeline documentation, see the [Smart-seq2 Single Nucleus Multi-Sample Overview](./README.md). 
## Methods -Data preprocessing and count matrix construction for a batch (or plate) were performed using the Smart-seq2 Single Nucleus Multi-Sample v1.2.28 Pipeline (RRID:SCR_021312) as well as Picard v.2.26.10 with default tool parameters unless otherwise specified. Genomic references are publicly available in the [Broad References](https://console.cloud.google.com/storage/browser/gcp-public-data--broad-references/mm10/v0/single_nucleus?pageState=(%22StorageObjectListTable%22:(%22f%22:%22%255B%255D%22))&prefix=&forceOnObjectsSortingFiltering=false) Google Bucket and are also listed in the [example workflow configuration](https://github.com/broadinstitute/warp/blob/master/pipelines/skylab/smartseq2_single_nucleus_multisample/mouse_example.json) in GitHub. +Data preprocessing and count matrix construction for a batch (or plate) were performed using the Smart-seq2 Single Nucleus Multi-Sample v1.3.0 Pipeline (RRID:SCR_021312) as well as Picard v.2.26.10 with default tool parameters unless otherwise specified. Genomic references are publicly available in the [Broad References](https://console.cloud.google.com/storage/browser/gcp-public-data--broad-references/mm10/v0/single_nucleus?pageState=(%22StorageObjectListTable%22:(%22f%22:%22%255B%255D%22))&prefix=&forceOnObjectsSortingFiltering=false) Google Bucket and are also listed in the [example workflow configuration](https://github.com/broadinstitute/warp/blob/master/pipelines/skylab/smartseq2_single_nucleus_multisample/mouse_example.json) in GitHub. For each nucleus in the batch, paired-end FASTQ files were first trimmed to remove adapters using the fastq-mcf tool with a subsampling parameter of 200,000 reads. The trimmed FASTQ files were then aligned to the GENCODE GRCm38 mouse genome using STAR v.2.7.10a. To count the number of reads per gene, but not isoforms, the quantMode parameter was set to GeneCounts. Multi-mapped reads, and optical and PCR duplicates, were removed from the resulting aligned BAM using the Picard MarkDuplicates tool with REMOVE_DUPLICATES = true. Metrics were collected on the deduplicated BAM using Picard CollectMultipleMetrics with VALIDATION_STRINGENCY =SILENT. From acca1bcc5581b6745307ba0f48069598170d4cd8 Mon Sep 17 00:00:00 2001 From: ekiernan Date: Fri, 23 Feb 2024 20:42:12 -0500 Subject: [PATCH 33/68] Update StarAlign.wdl --- tasks/skylab/StarAlign.wdl | 7 ++++++- 1 file changed, 6 insertions(+), 1 deletion(-) diff --git a/tasks/skylab/StarAlign.wdl b/tasks/skylab/StarAlign.wdl index 81f6668c42..9144a0ecd6 100644 --- a/tasks/skylab/StarAlign.wdl +++ b/tasks/skylab/StarAlign.wdl @@ -475,11 +475,12 @@ task MergeStarOutput { Array[File]? summary Array[File]? align_features Array[File]? 
umipercell + String counting_mode String input_id #runtime values - String docker = "us.gcr.io/broad-gotc-prod/pytools:1.0.0-1661263730" + String docker = "us.gcr.io/broad-gotc-prod/warp-tools:lk-PD-2518-shard-metrics" Int machine_mem_gb = 20 Int cpu = 1 Int disk = ceil(size(matrix, "Gi") * 2) + 10 @@ -564,6 +565,10 @@ task MergeStarOutput { fi done + # Create a single metric file for library-level metrics + python3 /warptools/scripts/combine_shard_metrics.py ~{input_id}_summary.txt ~{input_id}_align_features.txt ~{input_id}_cell_reads.txt ~{counting_mode} + + # If text files are present, create a tar archive with them if ls *.txt 1> /dev/null 2>&1; then tar -zcvf ~{input_id}.star_metrics.tar *.txt From f72dda873a194a5f224f766f4b30ba3818d30a53 Mon Sep 17 00:00:00 2001 From: ekiernan Date: Fri, 23 Feb 2024 20:50:43 -0500 Subject: [PATCH 34/68] updated optimus for mergestar input --- pipelines/skylab/optimus/Optimus.wdl | 6 ++++-- tasks/skylab/StarAlign.wdl | 8 +++----- 2 files changed, 7 insertions(+), 7 deletions(-) diff --git a/pipelines/skylab/optimus/Optimus.wdl b/pipelines/skylab/optimus/Optimus.wdl index db8a6cef60..b10beffe98 100644 --- a/pipelines/skylab/optimus/Optimus.wdl +++ b/pipelines/skylab/optimus/Optimus.wdl @@ -165,7 +165,8 @@ workflow Optimus { summary = STARsoloFastq.summary, align_features = STARsoloFastq.align_features, umipercell = STARsoloFastq.umipercell, - input_id = input_id + input_id = input_id, + counting_mode = counting_mode } if (counting_mode == "sc_rna"){ call RunEmptyDrops.RunEmptyDrops { @@ -202,7 +203,8 @@ workflow Optimus { features = STARsoloFastq.features_sn_rna, matrix = STARsoloFastq.matrix_sn_rna, cell_reads = STARsoloFastq.cell_reads_sn_rna, - input_id = input_id + input_id = input_id, + counting_mode = counting_mode } call H5adUtils.SingleNucleusOptimusH5adOutput as OptimusH5adGenerationWithExons{ input: diff --git a/tasks/skylab/StarAlign.wdl b/tasks/skylab/StarAlign.wdl index 9144a0ecd6..e7580a8f3b 100644 --- a/tasks/skylab/StarAlign.wdl +++ b/tasks/skylab/StarAlign.wdl @@ -565,12 +565,9 @@ task MergeStarOutput { fi done - # Create a single metric file for library-level metrics - python3 /warptools/scripts/combine_shard_metrics.py ~{input_id}_summary.txt ~{input_id}_align_features.txt ~{input_id}_cell_reads.txt ~{counting_mode} - - - # If text files are present, create a tar archive with them + # If text files are present, create a tar archive with them and run python script to combine shard metrics if ls *.txt 1> /dev/null 2>&1; then + python3 /warptools/scripts/combine_shard_metrics.py ~{input_id}_summary.txt ~{input_id}_align_features.txt ~{input_id}_cell_reads.txt ~{counting_mode} ~{input_id} tar -zcvf ~{input_id}.star_metrics.tar *.txt else echo "No text files found in the folder." @@ -598,6 +595,7 @@ task MergeStarOutput { File col_index = "~{input_id}_sparse_counts_col_index.npy" File sparse_counts = "~{input_id}_sparse_counts.npz" File? cell_reads_out = "~{input_id}.star_metrics.tar" + File? 
library_metrics="~{input_id}_library_metrics.csv" } } From 8948f4f5facfcbd2853e26477b7dcdee3ae58294 Mon Sep 17 00:00:00 2001 From: Robert Sidney Cox III Date: Mon, 26 Feb 2024 10:03:05 -0500 Subject: [PATCH 35/68] Rc 2356 shardids (#1203) MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit * remove optional from merge files, remove double umipercellloop * make merge files optional again * Add shardid column to umipercell file * Add shardid column to umipercell file * modify awk WDL to cat directly to output file with >> instead of cat * add shard counter to summary file output * add shard counter for align_features * rearrange cell reads file with shard number * rearrange cell reads file with shard number * fix [200~cell_reads_files~ bug * fix align features files * add check to modify cell_reads only when files exist * update Optimus version * Update README.md * update Multiome version * update SlideSeq version * update MultiSampleSmartSeq2SingleNucleus version * update PairedTag version * fix Multiome and PairedTag version numbers * update pipeline docs --------- Co-authored-by: ekiernan <55763654+ekiernan@users.noreply.github.com> Co-authored-by: Sid Cox Co-authored-by: kayleemathews --- .../skylab/multiome/Multiome.changelog.md | 4 +- pipelines/skylab/optimus/Optimus.changelog.md | 3 +- .../skylab/paired_tag/PairedTag.changelog.md | 3 +- .../skylab/slideseq/SlideSeq.changelog.md | 5 ++ pipelines/skylab/slideseq/SlideSeq.wdl | 2 +- ...iSampleSmartSeq2SingleNucleus.changelog.md | 7 ++- .../MultiSampleSmartSeq2SingleNucleus.wdl | 2 +- tasks/skylab/StarAlign.wdl | 55 ++++++++++++++----- .../docs/Pipelines/Optimus_Pipeline/README.md | 2 +- .../Pipelines/SlideSeq_Pipeline/README.md | 2 +- .../README.md | 2 +- .../multi_snss2.methods.md | 4 +- 12 files changed, 65 insertions(+), 26 deletions(-) diff --git a/pipelines/skylab/multiome/Multiome.changelog.md b/pipelines/skylab/multiome/Multiome.changelog.md index de3057b8b7..9e57b1917f 100644 --- a/pipelines/skylab/multiome/Multiome.changelog.md +++ b/pipelines/skylab/multiome/Multiome.changelog.md @@ -1,6 +1,6 @@ # 3.2.0 -2024-02-13 (Date of Last Commit) - +2024-02-22 (Date of Last Commit) +* Updated StarAlign.MergeStarOutput to add a shard number to the metrics files * Removed ref_genome_fasta input from Multiome WDL and JSON # 3.1.3 diff --git a/pipelines/skylab/optimus/Optimus.changelog.md b/pipelines/skylab/optimus/Optimus.changelog.md index b16480696b..d93c3d3610 100644 --- a/pipelines/skylab/optimus/Optimus.changelog.md +++ b/pipelines/skylab/optimus/Optimus.changelog.md @@ -1,5 +1,6 @@ # 6.4.0 -2024-02-01 (Date of Last Commit) +2024-02-21 (Date of Last Commit) +* Updated StarAlign.MergeStarOutput to add a shard number to the metrics files * Removed ref_genome_fasta input from Optimus WDL and JSON # 6.3.6 diff --git a/pipelines/skylab/paired_tag/PairedTag.changelog.md b/pipelines/skylab/paired_tag/PairedTag.changelog.md index b568562f95..811b073097 100644 --- a/pipelines/skylab/paired_tag/PairedTag.changelog.md +++ b/pipelines/skylab/paired_tag/PairedTag.changelog.md @@ -1,6 +1,7 @@ # 0.1.0 -2024-02-21 (Date of Last Commit) +2024-02-22 (Date of Last Commit) +* Updated StarAlign output metrics to include shard ids, which is called by Optimus * Remove ref_genome_fasta from Optimus input # 0.0.7 diff --git a/pipelines/skylab/slideseq/SlideSeq.changelog.md b/pipelines/skylab/slideseq/SlideSeq.changelog.md index f95357e03c..1746cc4257 100644 --- a/pipelines/skylab/slideseq/SlideSeq.changelog.md +++ 
b/pipelines/skylab/slideseq/SlideSeq.changelog.md @@ -1,3 +1,8 @@ +# 3.1.0 +2024-02-07 (Date of Last Commit) + +* Updated StarAlign output metrics to include shard ids + # 3.0.1 2024-02-13 (Date of Last Commit) diff --git a/pipelines/skylab/slideseq/SlideSeq.wdl b/pipelines/skylab/slideseq/SlideSeq.wdl index 2471e52310..ce033d33b0 100644 --- a/pipelines/skylab/slideseq/SlideSeq.wdl +++ b/pipelines/skylab/slideseq/SlideSeq.wdl @@ -23,7 +23,7 @@ import "../../../tasks/skylab/MergeSortBam.wdl" as Merge workflow SlideSeq { - String pipeline_version = "3.0.1" + String pipeline_version = "3.1.0" input { Array[File] r1_fastq diff --git a/pipelines/skylab/smartseq2_single_nucleus_multisample/MultiSampleSmartSeq2SingleNucleus.changelog.md b/pipelines/skylab/smartseq2_single_nucleus_multisample/MultiSampleSmartSeq2SingleNucleus.changelog.md index b0e84df63f..64b516e8b9 100644 --- a/pipelines/skylab/smartseq2_single_nucleus_multisample/MultiSampleSmartSeq2SingleNucleus.changelog.md +++ b/pipelines/skylab/smartseq2_single_nucleus_multisample/MultiSampleSmartSeq2SingleNucleus.changelog.md @@ -1,4 +1,9 @@ -# 1.2.28 +# 1.3.0 +2024-01-22 (Date of Last Commit) + +* Updated StarAlign output metrics to include shard ids + + # 1.2.28 2024-01-11 (Date of Last Commit) * Increased memory for MergeStarOutputs in StarAlign.wdl, RunEmptyDrops in RunEmptyDrops.wdl, OptimusH5ad in H5adUtils.wdl and GeneMetrics in Metrics.wdl diff --git a/pipelines/skylab/smartseq2_single_nucleus_multisample/MultiSampleSmartSeq2SingleNucleus.wdl b/pipelines/skylab/smartseq2_single_nucleus_multisample/MultiSampleSmartSeq2SingleNucleus.wdl index d0bf9dbb2f..7a4c1066f8 100644 --- a/pipelines/skylab/smartseq2_single_nucleus_multisample/MultiSampleSmartSeq2SingleNucleus.wdl +++ b/pipelines/skylab/smartseq2_single_nucleus_multisample/MultiSampleSmartSeq2SingleNucleus.wdl @@ -40,7 +40,7 @@ workflow MultiSampleSmartSeq2SingleNucleus { String? input_id_metadata_field } # Version of this pipeline - String pipeline_version = "1.2.28" + String pipeline_version = "1.3.0" if (false) { String? 
none = "None" diff --git a/tasks/skylab/StarAlign.wdl b/tasks/skylab/StarAlign.wdl index 8ab0c8d615..81f6668c42 100644 --- a/tasks/skylab/StarAlign.wdl +++ b/tasks/skylab/StarAlign.wdl @@ -507,33 +507,60 @@ task MergeStarOutput { declare -a align_features_files=(~{sep=' ' align_features}) declare -a umipercell_files=(~{sep=' ' umipercell}) - for cell_read in "${cell_reads_files[@]}"; do - if [ -f "$cell_read" ]; then - cat "$cell_read" >> "~{input_id}_cell_reads.txt" - fi - done + if [ -f "${cell_reads_files[0]}" ]; then + + # Destination file for cell reads + dest="~{input_id}_cell_reads.txt" + # first create the header from the first file in the list, and add a column header for the shard id + head -n 1 "${cell_reads_files[0]}" | awk '{print $0 "\tshard_number"}' > "$dest" + + # Loop through the array and add the second row with shard number to a temp file notinpasslist.txt + for index in "${!cell_reads_files[@]}"; do + secondLine=$(sed -n '2p' "${cell_reads_files[$index]}") + echo -e "$secondLine\t$index" >> "notinpasslist.txt" + done + + # add notinpasslist.txt to the destination file and delete the notinpasslist.txt + cat "notinpasslist.txt" >> "$dest" + rm notinpasslist.txt + + # now add the shard id to the matrix in a temporary matrix file, and skip the first two lines + counter=0 + for cell_read in "${cell_reads_files[@]}"; do + if [ -f "$cell_read" ]; then + awk -v var="$counter" 'NR>2 {print $0 "\t" var}' "$cell_read" >> "matrix.txt" + let counter=counter+1 + fi + done + + # add the matrix to the destination file, then delete the matrix file + cat "matrix.txt" >> "$dest" + rm "matrix.txt" + fi + + counter=0 for summary in "${summary_files[@]}"; do if [ -f "$summary" ]; then - cat "$summary" >> "~{input_id}_summary.txt" + awk -v var=",$counter" '{print $0 var}' "$summary" >> "~{input_id}_summary.txt" + let counter=counter+1 fi done + counter=0 for align_feature in "${align_features_files[@]}"; do if [ -f "$align_feature" ]; then - cat "$align_feature" >> "~{input_id}_align_features.txt" - fi - done - - for umipercell in "${umipercell_files[@]}"; do - if [ -f "$umipercell" ]; then - cat "$umipercell" >> "~{input_id}_umipercell.txt" + awk -v var="$counter" '{print $0 " " var}' "$align_feature" >> "~{input_id}_align_features.txt" + let counter=counter+1 fi done + # note that the counter might not correspond to the shard number, it is just the order of files in bash (e.g. 10 before 2) + counter=0 for umipercell in "${umipercell_files[@]}"; do if [ -f "$umipercell" ]; then - cat "$umipercell" >> "~{input_id}_umipercell.txt" + awk -v var="$counter" '{print $0, var}' "$umipercell" >> "~{input_id}_umipercell.txt" + let counter=counter+1 fi done diff --git a/website/docs/Pipelines/Optimus_Pipeline/README.md b/website/docs/Pipelines/Optimus_Pipeline/README.md index 2f494c7e06..54a4cf43fd 100644 --- a/website/docs/Pipelines/Optimus_Pipeline/README.md +++ b/website/docs/Pipelines/Optimus_Pipeline/README.md @@ -254,7 +254,7 @@ The following table lists the output files produced from the pipeline. For sampl | matrix_col_index | `_sparse_counts_col_index.npy` | Index of genes in count matrix. | NPY | | cell_metrics | `.cell-metrics.csv.gz` | Matrix of metrics by cells. | Compressed CSV | | gene_metrics | `.gene-metrics.csv.gz` | Matrix of metrics by genes. | Compressed CSV | -| aligner_metrics | `.cell_reads.txt` | Per barcode metrics (CellReads.stats) produced by the STARsolo aligner. 
| TXT | +| aligner_metrics | `.star_metrics.tar` | Tarred metrics files produced by the STARsolo aligner; contains align features, cell reads, summary, and UMI per cell metrics files. | TXT | | multimappers_EM_matrix | `UniqueAndMult-EM.mtx` | Optional output produced when `soloMultiMappers` is "EM"; see STARsolo [documentation](https://github.com/alexdobin/STAR/blob/master/docs/STARsolo.md#multi-gene-reads) for more information. | MTX | | multimappers_Uniform_matrix | `UniqueAndMult-Uniform.mtx` | Optional output produced when `soloMultiMappers` is "Uniform"; see STARsolo [documentation](https://github.com/alexdobin/STAR/blob/master/docs/STARsolo.md#multi-gene-reads) for more information. | MTX | | multimappers_Rescue_matrix | `UniqueAndMult-Rescue.mtx` | Optional output produced when `soloMultiMappers` is "Rescue"; see STARsolo [documentation](https://github.com/alexdobin/STAR/blob/master/docs/STARsolo.md#multi-gene-reads) for more information. | MTX | diff --git a/website/docs/Pipelines/SlideSeq_Pipeline/README.md b/website/docs/Pipelines/SlideSeq_Pipeline/README.md index 9ef0004d98..f0132571db 100644 --- a/website/docs/Pipelines/SlideSeq_Pipeline/README.md +++ b/website/docs/Pipelines/SlideSeq_Pipeline/README.md @@ -7,7 +7,7 @@ slug: /Pipelines/SlideSeq_Pipeline/README | Pipeline Version | Date Updated | Documentation Author | Questions or Feedback | | :----: | :---: | :----: | :--------------: | -| [SlideSeq v3.0.1](https://github.com/broadinstitute/warp/releases) | February, 2024 | Elizabeth Kiernan & Kaylee Mathews | Please file GitHub issues in warp or contact [documentation authors](mailto:warp-pipelines-help@broadinstitute.org) | +| [SlideSeq v3.1.0](https://github.com/broadinstitute/warp/releases) | February, 2024 | Elizabeth Kiernan & Kaylee Mathews | Please file GitHub issues in warp or contact [documentation authors](mailto:warp-pipelines-help@broadinstitute.org) | ![SlideSeq_diagram](./slide-seq_diagram.png) diff --git a/website/docs/Pipelines/Smart-seq2_Single_Nucleus_Multi_Sample_Pipeline/README.md b/website/docs/Pipelines/Smart-seq2_Single_Nucleus_Multi_Sample_Pipeline/README.md index e21fe808ee..4cb42c4cf6 100644 --- a/website/docs/Pipelines/Smart-seq2_Single_Nucleus_Multi_Sample_Pipeline/README.md +++ b/website/docs/Pipelines/Smart-seq2_Single_Nucleus_Multi_Sample_Pipeline/README.md @@ -7,7 +7,7 @@ slug: /Pipelines/Smart-seq2_Single_Nucleus_Multi_Sample_Pipeline/README | Pipeline Version | Date Updated | Documentation Author | Questions or Feedback | | :----: | :---: | :----: | :--------------: | -| [MultiSampleSmartSeq2SingleNuclei_v1.2.28](https://github.com/broadinstitute/warp/releases) | January, 2024 | [Elizabeth Kiernan](mailto:ekiernan@broadinstitute.org) | Please file GitHub issues in WARP or contact [the WARP team](mailto:warp-pipelines-help@broadinstitute.org) | +| [MultiSampleSmartSeq2SingleNuclei_v1.3.0](https://github.com/broadinstitute/warp/releases) | February, 2024 | [Elizabeth Kiernan](mailto:ekiernan@broadinstitute.org) | Please file GitHub issues in WARP or contact [the WARP team](mailto:warp-pipelines-help@broadinstitute.org) | ![](./snSS2.png) diff --git a/website/docs/Pipelines/Smart-seq2_Single_Nucleus_Multi_Sample_Pipeline/multi_snss2.methods.md b/website/docs/Pipelines/Smart-seq2_Single_Nucleus_Multi_Sample_Pipeline/multi_snss2.methods.md index 5239ba7f97..8ab56b15bd 100644 --- a/website/docs/Pipelines/Smart-seq2_Single_Nucleus_Multi_Sample_Pipeline/multi_snss2.methods.md +++ 
b/website/docs/Pipelines/Smart-seq2_Single_Nucleus_Multi_Sample_Pipeline/multi_snss2.methods.md @@ -2,13 +2,13 @@ sidebar_position: 2 --- -# Smart-seq2 Single Nucleus Multi-Sample v1.2.28 Publication Methods +# Smart-seq2 Single Nucleus Multi-Sample v1.3.0 Publication Methods Below we provide an example methods section for a publication. For the complete pipeline documentation, see the [Smart-seq2 Single Nucleus Multi-Sample Overview](./README.md). ## Methods -Data preprocessing and count matrix construction for a batch (or plate) were performed using the Smart-seq2 Single Nucleus Multi-Sample v1.2.28 Pipeline (RRID:SCR_021312) as well as Picard v.2.26.10 with default tool parameters unless otherwise specified. Genomic references are publicly available in the [Broad References](https://console.cloud.google.com/storage/browser/gcp-public-data--broad-references/mm10/v0/single_nucleus?pageState=(%22StorageObjectListTable%22:(%22f%22:%22%255B%255D%22))&prefix=&forceOnObjectsSortingFiltering=false) Google Bucket and are also listed in the [example workflow configuration](https://github.com/broadinstitute/warp/blob/master/pipelines/skylab/smartseq2_single_nucleus_multisample/mouse_example.json) in GitHub. +Data preprocessing and count matrix construction for a batch (or plate) were performed using the Smart-seq2 Single Nucleus Multi-Sample v1.3.0 Pipeline (RRID:SCR_021312) as well as Picard v.2.26.10 with default tool parameters unless otherwise specified. Genomic references are publicly available in the [Broad References](https://console.cloud.google.com/storage/browser/gcp-public-data--broad-references/mm10/v0/single_nucleus?pageState=(%22StorageObjectListTable%22:(%22f%22:%22%255B%255D%22))&prefix=&forceOnObjectsSortingFiltering=false) Google Bucket and are also listed in the [example workflow configuration](https://github.com/broadinstitute/warp/blob/master/pipelines/skylab/smartseq2_single_nucleus_multisample/mouse_example.json) in GitHub. For each nucleus in the batch, paired-end FASTQ files were first trimmed to remove adapters using the fastq-mcf tool with a subsampling parameter of 200,000 reads. The trimmed FASTQ files were then aligned to the GENCODE GRCm38 mouse genome using STAR v.2.7.10a. To count the number of reads per gene, but not isoforms, the quantMode parameter was set to GeneCounts. Multi-mapped reads, and optical and PCR duplicates, were removed from the resulting aligned BAM using the Picard MarkDuplicates tool with REMOVE_DUPLICATES = true. Metrics were collected on the deduplicated BAM using Picard CollectMultipleMetrics with VALIDATION_STRINGENCY =SILENT. 
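The methods paragraph above is descriptive; for readers who want the equivalent steps in executable form, below is a minimal, hypothetical WDL sketch of the per-nucleus flow it describes (trim, align, deduplicate, collect metrics). The task name, container paths (the Picard jar location, the unpacked STAR index directory), and output file names are illustrative assumptions and are not the pipeline's actual task definitions; only the tool parameters (fastq-mcf subsampling 200,000 reads, STAR --quantMode GeneCounts, MarkDuplicates REMOVE_DUPLICATES=true, CollectMultipleMetrics VALIDATION_STRINGENCY=SILENT) come from the text.

```wdl
version 1.0

# Hypothetical sketch only -- not the pipeline's actual task.
task SnSS2MethodsSketch {
  input {
    File r1_fastq
    File r2_fastq
    File adapter_fasta
    File star_reference_tar  # tarred STAR index (GENCODE GRCm38 in the methods text)
    String base_name
  }
  command <<<
    set -euo pipefail

    # Trim adapters; -C 200000 subsamples 200,000 reads for adapter detection
    fastq-mcf -C 200000 ~{adapter_fasta} \
      ~{r1_fastq} ~{r2_fastq} \
      -o r1.trimmed.fastq.gz -o r2.trimmed.fastq.gz

    # Align with STAR; GeneCounts counts reads per gene, not isoforms
    tar -xf ~{star_reference_tar}  # assumes the index unpacks to ./star_ref
    STAR --runMode alignReads \
      --genomeDir star_ref \
      --readFilesIn r1.trimmed.fastq.gz r2.trimmed.fastq.gz \
      --readFilesCommand zcat \
      --quantMode GeneCounts \
      --outSAMtype BAM SortedByCoordinate

    # Remove duplicates (methods: Picard MarkDuplicates, REMOVE_DUPLICATES=true);
    # /picard.jar is an assumed path
    java -jar /picard.jar MarkDuplicates \
      I=Aligned.sortedByCoord.out.bam \
      O=~{base_name}.dedup.bam \
      M=~{base_name}.duplicate_metrics.txt \
      REMOVE_DUPLICATES=true

    # Collect QC metrics on the deduplicated BAM
    java -jar /picard.jar CollectMultipleMetrics \
      I=~{base_name}.dedup.bam \
      O=~{base_name}.multiple_metrics \
      VALIDATION_STRINGENCY=SILENT
  >>>
  output {
    File dedup_bam = "~{base_name}.dedup.bam"
    File duplicate_metrics = "~{base_name}.duplicate_metrics.txt"
  }
}
```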
From 89fd5ec806fbcd49b261728182938ae9bcbb1c59 Mon Sep 17 00:00:00 2001 From: ekiernan Date: Wed, 28 Feb 2024 09:57:06 -0500 Subject: [PATCH 36/68] changed path for created-merged-npz-output; the pytools docker was previously used for this task --- tasks/skylab/StarAlign.wdl | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/tasks/skylab/StarAlign.wdl b/tasks/skylab/StarAlign.wdl index e7580a8f3b..40354daa3e 100644 --- a/tasks/skylab/StarAlign.wdl +++ b/tasks/skylab/StarAlign.wdl @@ -574,7 +574,7 @@ task MergeStarOutput { fi # create the compressed raw count matrix with the counts, gene names and the barcodes - python3 /usr/gitc/create-merged-npz-output.py \ + python3 /warptools/scripts/create-merged-npz-output.py \ --barcodes ${barcodes_files[@]} \ --features ${features_files[@]} \ --matrix ${matrix_files[@]} \ From 664e47b042285c7cfc9c7283a3cc6377c9c0253d Mon Sep 17 00:00:00 2001 From: ekiernan Date: Wed, 28 Feb 2024 10:21:45 -0500 Subject: [PATCH 37/68] adding monitoring --- tasks/skylab/StarAlign.wdl | 2 ++ 1 file changed, 2 insertions(+) diff --git a/tasks/skylab/StarAlign.wdl b/tasks/skylab/StarAlign.wdl index 40354daa3e..363628cfa8 100644 --- a/tasks/skylab/StarAlign.wdl +++ b/tasks/skylab/StarAlign.wdl @@ -567,6 +567,8 @@ task MergeStarOutput { # If text files are present, create a tar archive with them and run python script to combine shard metrics if ls *.txt 1> /dev/null 2>&1; then + echo "listing files" + ls python3 /warptools/scripts/combine_shard_metrics.py ~{input_id}_summary.txt ~{input_id}_align_features.txt ~{input_id}_cell_reads.txt ~{counting_mode} ~{input_id} tar -zcvf ~{input_id}.star_metrics.tar *.txt else From 1f44294e3822defc5583fb2f1b978a65bea248d3 Mon Sep 17 00:00:00 2001 From: ekiernan Date: Wed, 28 Feb 2024 15:30:24 -0500 Subject: [PATCH 38/68] made counting mode optional because SlideSeq doesn't use it --- tasks/skylab/StarAlign.wdl | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/tasks/skylab/StarAlign.wdl b/tasks/skylab/StarAlign.wdl index 363628cfa8..0375ee0fc1 100644 --- a/tasks/skylab/StarAlign.wdl +++ b/tasks/skylab/StarAlign.wdl @@ -475,7 +475,7 @@ task MergeStarOutput { Array[File]? summary Array[File]? align_features Array[File]? umipercell - String counting_mode + String?
counting_mode String input_id From 1f255ca471140de23d3067b3b727098dccf31d30 Mon Sep 17 00:00:00 2001 From: ekiernan Date: Wed, 28 Feb 2024 15:42:48 -0500 Subject: [PATCH 39/68] wdl versions and changelog entries --- pipelines/skylab/multiome/Multiome.changelog.md | 6 ++++++ pipelines/skylab/multiome/Multiome.wdl | 3 ++- pipelines/skylab/optimus/Optimus.changelog.md | 5 +++++ pipelines/skylab/optimus/Optimus.wdl | 4 +++- pipelines/skylab/slideseq/SlideSeq.changelog.md | 5 +++++ pipelines/skylab/slideseq/SlideSeq.wdl | 2 +- 6 files changed, 22 insertions(+), 3 deletions(-) diff --git a/pipelines/skylab/multiome/Multiome.changelog.md b/pipelines/skylab/multiome/Multiome.changelog.md index 9e57b1917f..a1226bf184 100644 --- a/pipelines/skylab/multiome/Multiome.changelog.md +++ b/pipelines/skylab/multiome/Multiome.changelog.md @@ -1,5 +1,11 @@ +# 3.3.0 +2024-02-28 (Date of Last Commit) + +* Added the gene expression library-level metrics CSV as output of the Multiome pipeline; this produced by the Optimus subworkflow + # 3.2.0 2024-02-22 (Date of Last Commit) + * Updated StarAlign.MergeStarOutput to add a shard number to the metrics files * Removed ref_genome_fasta input from Multiome WDL and JSON diff --git a/pipelines/skylab/multiome/Multiome.wdl b/pipelines/skylab/multiome/Multiome.wdl index 8c44dedebc..c25a291ab8 100644 --- a/pipelines/skylab/multiome/Multiome.wdl +++ b/pipelines/skylab/multiome/Multiome.wdl @@ -6,7 +6,7 @@ import "../../../tasks/skylab/H5adUtils.wdl" as H5adUtils import "https://raw.githubusercontent.com/broadinstitute/CellBender/v0.3.0/wdl/cellbender_remove_background.wdl" as CellBender workflow Multiome { - String pipeline_version = "3.2.0" + String pipeline_version = "3.3.0" input { String input_id @@ -141,6 +141,7 @@ workflow Multiome { Array[File?] multimappers_Rescue_matrix = Optimus.multimappers_Rescue_matrix Array[File?] multimappers_PropUnique_matrix = Optimus.multimappers_PropUnique_matrix File? gex_aligner_metrics = Optimus.aligner_metrics + File? library_metrics = Optimus.library_metrics # cellbender outputs File? cell_barcodes_csv = CellBender.cell_csv diff --git a/pipelines/skylab/optimus/Optimus.changelog.md b/pipelines/skylab/optimus/Optimus.changelog.md index d93c3d3610..1fb04b3ec2 100644 --- a/pipelines/skylab/optimus/Optimus.changelog.md +++ b/pipelines/skylab/optimus/Optimus.changelog.md @@ -1,3 +1,8 @@ +# 6.5.0 +2024-02-28 (Date of Last Commit) + +* Added a library-level metrics CSV as output of the Optimus workflow; this iteration includes read-level metrics + # 6.4.0 2024-02-21 (Date of Last Commit) * Updated StarAlign.MergeStarOutput to add a shard number to the metrics files diff --git a/pipelines/skylab/optimus/Optimus.wdl b/pipelines/skylab/optimus/Optimus.wdl index b10beffe98..ca13fb11aa 100644 --- a/pipelines/skylab/optimus/Optimus.wdl +++ b/pipelines/skylab/optimus/Optimus.wdl @@ -64,7 +64,7 @@ workflow Optimus { # version of this pipeline - String pipeline_version = "6.4.0" + String pipeline_version = "6.5.0" # this is used to scatter matched [r1_fastq, r2_fastq, i1_fastq] arrays Array[Int] indices = range(length(r1_fastq)) @@ -240,10 +240,12 @@ workflow Optimus { File gene_metrics = GeneMetrics.gene_metrics File? cell_calls = RunEmptyDrops.empty_drops_result File? aligner_metrics = MergeStarOutputs.cell_reads_out + File? library_metrics = MergeStarOutputs.library_metrics Array[File?] multimappers_EM_matrix = STARsoloFastq.multimappers_EM_matrix Array[File?] multimappers_Uniform_matrix = STARsoloFastq.multimappers_Uniform_matrix Array[File?] 
multimappers_Rescue_matrix = STARsoloFastq.multimappers_Rescue_matrix Array[File?] multimappers_PropUnique_matrix = STARsoloFastq.multimappers_PropUnique_matrix + # h5ad File h5ad_output_file = final_h5ad_output diff --git a/pipelines/skylab/slideseq/SlideSeq.changelog.md b/pipelines/skylab/slideseq/SlideSeq.changelog.md index 1746cc4257..c556a7c5bd 100644 --- a/pipelines/skylab/slideseq/SlideSeq.changelog.md +++ b/pipelines/skylab/slideseq/SlideSeq.changelog.md @@ -1,3 +1,8 @@ +# 3.1.1 +2024-02-28 (Date of Last Commit) + +* Updated the Optimus workflow to produce a library-level metrics CSV; this does not impact the slide-seq pipeline + # 3.1.0 2024-02-07 (Date of Last Commit) diff --git a/pipelines/skylab/slideseq/SlideSeq.wdl b/pipelines/skylab/slideseq/SlideSeq.wdl index ce033d33b0..66f6001da8 100644 --- a/pipelines/skylab/slideseq/SlideSeq.wdl +++ b/pipelines/skylab/slideseq/SlideSeq.wdl @@ -23,7 +23,7 @@ import "../../../tasks/skylab/MergeSortBam.wdl" as Merge workflow SlideSeq { - String pipeline_version = "3.1.0" + String pipeline_version = "3.1.1" input { Array[File] r1_fastq From d5b9920f9566b69430408f67d048d28c0783b206 Mon Sep 17 00:00:00 2001 From: ekiernan Date: Wed, 28 Feb 2024 15:48:35 -0500 Subject: [PATCH 40/68] more wdl versions and changelogs --- pipelines/skylab/multiome/Multiome.changelog.md | 2 +- pipelines/skylab/paired_tag/PairedTag.changelog.md | 5 +++++ pipelines/skylab/paired_tag/PairedTag.wdl | 3 ++- .../MultiSampleSmartSeq2SingleNucleus.changelog.md | 5 +++++ .../MultiSampleSmartSeq2SingleNucleus.wdl | 2 +- 5 files changed, 14 insertions(+), 3 deletions(-) diff --git a/pipelines/skylab/multiome/Multiome.changelog.md b/pipelines/skylab/multiome/Multiome.changelog.md index a1226bf184..6b24e21fc4 100644 --- a/pipelines/skylab/multiome/Multiome.changelog.md +++ b/pipelines/skylab/multiome/Multiome.changelog.md @@ -1,7 +1,7 @@ # 3.3.0 2024-02-28 (Date of Last Commit) -* Added the gene expression library-level metrics CSV as output of the Multiome pipeline; this produced by the Optimus subworkflow +* Added the gene expression library-level metrics CSV as output of the Multiome pipeline; this is produced by the Optimus subworkflow # 3.2.0 2024-02-22 (Date of Last Commit) diff --git a/pipelines/skylab/paired_tag/PairedTag.changelog.md b/pipelines/skylab/paired_tag/PairedTag.changelog.md index 811b073097..ac73cc370b 100644 --- a/pipelines/skylab/paired_tag/PairedTag.changelog.md +++ b/pipelines/skylab/paired_tag/PairedTag.changelog.md @@ -1,3 +1,8 @@ +# 0.2.0 +2024-02-28 (Date of Last Commit) + +* Added the gene expression library-level metrics CSV as output of the Paired-tag pipeline; this is produced by the Optimus subworkflow + # 0.1.0 2024-02-22 (Date of Last Commit) diff --git a/pipelines/skylab/paired_tag/PairedTag.wdl b/pipelines/skylab/paired_tag/PairedTag.wdl index ed9821d0dc..243168e388 100644 --- a/pipelines/skylab/paired_tag/PairedTag.wdl +++ b/pipelines/skylab/paired_tag/PairedTag.wdl @@ -5,7 +5,7 @@ import "../../../pipelines/skylab/optimus/Optimus.wdl" as optimus import "../../../tasks/skylab/H5adUtils.wdl" as H5adUtils import "../../../tasks/skylab/PairedTagUtils.wdl" as Demultiplexing workflow PairedTag { - String pipeline_version = "0.1.0" + String pipeline_version = "0.2.0" input { String input_id @@ -125,5 +125,6 @@ workflow PairedTag { File gene_metrics_gex = Optimus.gene_metrics File? 
cell_calls_gex = Optimus.cell_calls File h5ad_output_file_gex = Optimus.h5ad_output_file + File library_metrics = Optimus.library_metrics } } diff --git a/pipelines/skylab/smartseq2_single_nucleus_multisample/MultiSampleSmartSeq2SingleNucleus.changelog.md b/pipelines/skylab/smartseq2_single_nucleus_multisample/MultiSampleSmartSeq2SingleNucleus.changelog.md index 64b516e8b9..1d030b4af5 100644 --- a/pipelines/skylab/smartseq2_single_nucleus_multisample/MultiSampleSmartSeq2SingleNucleus.changelog.md +++ b/pipelines/skylab/smartseq2_single_nucleus_multisample/MultiSampleSmartSeq2SingleNucleus.changelog.md @@ -1,3 +1,8 @@ +# 1.3.1 +2024-02-28 (Date of Last Commit) + +* Updated the Optimus workflow to produce a library-level metrics CSV; this does not impact the Single-nucleus Multi Sample Smart-seq2 pipeline + # 1.3.0 2024-01-22 (Date of Last Commit) diff --git a/pipelines/skylab/smartseq2_single_nucleus_multisample/MultiSampleSmartSeq2SingleNucleus.wdl b/pipelines/skylab/smartseq2_single_nucleus_multisample/MultiSampleSmartSeq2SingleNucleus.wdl index 7a4c1066f8..de5824ae13 100644 --- a/pipelines/skylab/smartseq2_single_nucleus_multisample/MultiSampleSmartSeq2SingleNucleus.wdl +++ b/pipelines/skylab/smartseq2_single_nucleus_multisample/MultiSampleSmartSeq2SingleNucleus.wdl @@ -40,7 +40,7 @@ workflow MultiSampleSmartSeq2SingleNucleus { String? input_id_metadata_field } # Version of this pipeline - String pipeline_version = "1.3.0" + String pipeline_version = "1.3.1" if (false) { String? none = "None" From 2d47df72ebf0ed913f4a3c34eac506c9c1fac35e Mon Sep 17 00:00:00 2001 From: ekiernan Date: Wed, 28 Feb 2024 15:58:59 -0500 Subject: [PATCH 41/68] made library_metrics optional for paired-tag --- pipelines/skylab/paired_tag/PairedTag.wdl | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/pipelines/skylab/paired_tag/PairedTag.wdl b/pipelines/skylab/paired_tag/PairedTag.wdl index 243168e388..e6dc936ba7 100644 --- a/pipelines/skylab/paired_tag/PairedTag.wdl +++ b/pipelines/skylab/paired_tag/PairedTag.wdl @@ -125,6 +125,6 @@ workflow PairedTag { File gene_metrics_gex = Optimus.gene_metrics File? cell_calls_gex = Optimus.cell_calls File h5ad_output_file_gex = Optimus.h5ad_output_file - File library_metrics = Optimus.library_metrics + File? 
library_metrics = Optimus.library_metrics } } From 4aa98e8b6daf3dd7073fdc7d184d9e2ea526e2f1 Mon Sep 17 00:00:00 2001 From: ekiernan Date: Wed, 28 Feb 2024 16:55:09 -0500 Subject: [PATCH 42/68] Update Optimus.wdl --- pipelines/skylab/optimus/Optimus.wdl | 5 +++++ 1 file changed, 5 insertions(+) diff --git a/pipelines/skylab/optimus/Optimus.wdl b/pipelines/skylab/optimus/Optimus.wdl index ca13fb11aa..16add1c67d 100644 --- a/pipelines/skylab/optimus/Optimus.wdl +++ b/pipelines/skylab/optimus/Optimus.wdl @@ -205,6 +205,11 @@ workflow Optimus { cell_reads = STARsoloFastq.cell_reads_sn_rna, input_id = input_id, counting_mode = counting_mode + summary = STARsoloFastq.summary_sn_rna, + align_features = STARsoloFastq.align_features_sn_rna, + umipercell = STARsoloFastq.umipercell_sn_rna, + input_id = input_id, + counting_mode = counting_mode } call H5adUtils.SingleNucleusOptimusH5adOutput as OptimusH5adGenerationWithExons{ input: From 6a8aa3e8b5c3b358ac4fa948c24b265be90b7f9b Mon Sep 17 00:00:00 2001 From: ekiernan Date: Wed, 28 Feb 2024 16:55:32 -0500 Subject: [PATCH 43/68] Update Optimus.wdl --- pipelines/skylab/optimus/Optimus.wdl | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-) diff --git a/pipelines/skylab/optimus/Optimus.wdl b/pipelines/skylab/optimus/Optimus.wdl index 16add1c67d..a4e710b28b 100644 --- a/pipelines/skylab/optimus/Optimus.wdl +++ b/pipelines/skylab/optimus/Optimus.wdl @@ -204,12 +204,11 @@ workflow Optimus { matrix = STARsoloFastq.matrix_sn_rna, cell_reads = STARsoloFastq.cell_reads_sn_rna, input_id = input_id, - counting_mode = counting_mode + counting_mode = counting_mode, summary = STARsoloFastq.summary_sn_rna, align_features = STARsoloFastq.align_features_sn_rna, umipercell = STARsoloFastq.umipercell_sn_rna, - input_id = input_id, - counting_mode = counting_mode + input_id = input_id } call H5adUtils.SingleNucleusOptimusH5adOutput as OptimusH5adGenerationWithExons{ input: From 381c9c86e237035a13fe843d3361f4872a229a45 Mon Sep 17 00:00:00 2001 From: ekiernan Date: Thu, 29 Feb 2024 09:05:19 -0500 Subject: [PATCH 44/68] hard coding counting mode if count_exons is true for MergeStarOutputs --- pipelines/skylab/optimus/Optimus.wdl | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/pipelines/skylab/optimus/Optimus.wdl b/pipelines/skylab/optimus/Optimus.wdl index a4e710b28b..3d12f0cabe 100644 --- a/pipelines/skylab/optimus/Optimus.wdl +++ b/pipelines/skylab/optimus/Optimus.wdl @@ -204,7 +204,7 @@ workflow Optimus { matrix = STARsoloFastq.matrix_sn_rna, cell_reads = STARsoloFastq.cell_reads_sn_rna, input_id = input_id, - counting_mode = counting_mode, + counting_mode = "sc_rna", summary = STARsoloFastq.summary_sn_rna, align_features = STARsoloFastq.align_features_sn_rna, umipercell = STARsoloFastq.umipercell_sn_rna, From fa1a2fc5594079c06adc2d57936c6a6d70a0cf09 Mon Sep 17 00:00:00 2001 From: ekiernan Date: Thu, 29 Feb 2024 10:17:31 -0500 Subject: [PATCH 45/68] official docker update --- tasks/skylab/StarAlign.wdl | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/tasks/skylab/StarAlign.wdl b/tasks/skylab/StarAlign.wdl index 0375ee0fc1..2107836c16 100644 --- a/tasks/skylab/StarAlign.wdl +++ b/tasks/skylab/StarAlign.wdl @@ -480,7 +480,7 @@ task MergeStarOutput { String input_id #runtime values - String docker = "us.gcr.io/broad-gotc-prod/warp-tools:lk-PD-2518-shard-metrics" + String docker = "us.gcr.io/broad-gotc-prod/warp-tools:2.0.2-1709218388" Int machine_mem_gb = 20 Int cpu = 1 Int disk = ceil(size(matrix, "Gi") * 2) + 10 From 
e94d450e507e61210764a45127963e4733d56d0c Mon Sep 17 00:00:00 2001 From: ekiernan Date: Thu, 29 Feb 2024 10:25:27 -0500 Subject: [PATCH 46/68] added doc updates for library-level metrics --- website/docs/Pipelines/Multiome_Pipeline/README.md | 3 ++- website/docs/Pipelines/Optimus_Pipeline/README.md | 3 ++- website/docs/Pipelines/PairedTag_Pipeline/README.md | 3 ++- 3 files changed, 6 insertions(+), 3 deletions(-) diff --git a/website/docs/Pipelines/Multiome_Pipeline/README.md b/website/docs/Pipelines/Multiome_Pipeline/README.md index 354c951a5e..2d3434020c 100644 --- a/website/docs/Pipelines/Multiome_Pipeline/README.md +++ b/website/docs/Pipelines/Multiome_Pipeline/README.md @@ -8,7 +8,7 @@ slug: /Pipelines/Multiome_Pipeline/README | Pipeline Version | Date Updated | Documentation Author | Questions or Feedback | | :----: | :---: | :----: | :--------------: | -| [Multiome v3.2.0](https://github.com/broadinstitute/warp/releases) | February, 2024 | Kaylee Mathews | Please file GitHub issues in warp or contact the [WARP Pipeline Development team](mailto:warp-pipelines-help@broadinstitute.org) | +| [Multiome v3.3.0](https://github.com/broadinstitute/warp/releases) | February, 2024 | Kaylee Mathews | Please file GitHub issues in warp or contact the [WARP Pipeline Development team](mailto:warp-pipelines-help@broadinstitute.org) | ![Multiome_diagram](./multiome_diagram.png) @@ -120,6 +120,7 @@ The Multiome workflow calls two WARP subworkflows, one external subworkflow (opt | multimappers_Rescue_matrix | `UniqueAndMult-Rescue.mtx` | Optional output produced when `soloMultiMappers` is "Rescue"; see STARsolo [documentation](https://github.com/alexdobin/STAR/blob/master/docs/STARsolo.md#multi-gene-reads) for more information. | | multimappers_PropUnique_matrix | `UniqueAndMult-PropUnique.mtx` | Optional output produced when `soloMultiMappers` is "PropUnique"; see STARsolo [documentation](https://github.com/alexdobin/STAR/blob/master/docs/STARsolo.md#multi-gene-reads) for more information.| | gex_aligner_metrics | `.star_metrics.tar` | Text file containing per barcode metrics (`CellReads.stats`) produced by the GEX pipeline STARsolo aligner. | +| library_metrics | `_library_metrics.csv` | CSV file with all library-level metrics calculated from STARsolo for gene expression data. | | cell_barcodes_csv | `` | Optional output produced when `run_cellbender` is "true"; see CellBender [documentation](https://cellbender.readthedocs.io/en/latest/usage/index.html) and [GitHub repository](https://github.com/broadinstitute/CellBender/tree/master) for more information.| | checkpoint_file | `` | Optional output produced when `run_cellbender` is "true"; see CellBender [documentation](https://cellbender.readthedocs.io/en/latest/usage/index.html) and [GitHub repository](https://github.com/broadinstitute/CellBender/tree/master) for more information. | | h5_array | `` | Optional output produced when `run_cellbender` is "true"; see CellBender [documentation](https://cellbender.readthedocs.io/en/latest/usage/index.html) and [GitHub repository](https://github.com/broadinstitute/CellBender/tree/master) for more information. 
| diff --git a/website/docs/Pipelines/Optimus_Pipeline/README.md b/website/docs/Pipelines/Optimus_Pipeline/README.md index 54a4cf43fd..50067739ce 100644 --- a/website/docs/Pipelines/Optimus_Pipeline/README.md +++ b/website/docs/Pipelines/Optimus_Pipeline/README.md @@ -7,7 +7,7 @@ slug: /Pipelines/Optimus_Pipeline/README | Pipeline Version | Date Updated | Documentation Author | Questions or Feedback | | :----: | :---: | :----: | :--------------: | -| [optimus_v6.4.0](https://github.com/broadinstitute/warp/releases?q=optimus&expanded=true) | February, 2024 | Elizabeth Kiernan | Please file GitHub issues in warp or contact [the WARP team](mailto:warp-pipelines-help@broadinstitute.org) | +| [optimus_v6.5.0](https://github.com/broadinstitute/warp/releases?q=optimus&expanded=true) | February, 2024 | Elizabeth Kiernan | Please file GitHub issues in warp or contact [the WARP team](mailto:warp-pipelines-help@broadinstitute.org) | ![Optimus_diagram](Optimus_diagram.png) @@ -255,6 +255,7 @@ The following table lists the output files produced from the pipeline. For sampl | cell_metrics | `.cell-metrics.csv.gz` | Matrix of metrics by cells. | Compressed CSV | | gene_metrics | `.gene-metrics.csv.gz` | Matrix of metrics by genes. | Compressed CSV | | aligner_metrics | `.star_metrics.tar` | Tarred metrics files produced by the STARsolo aligner; contains align features, cell reads, summary, and UMI per cell metrics files. | TXT | +| library_metrics | `_library_metrics.csv` | CSV file with all library-level metrics calculated from STARsolo for gene expression data. | CSV | | multimappers_EM_matrix | `UniqueAndMult-EM.mtx` | Optional output produced when `soloMultiMappers` is "EM"; see STARsolo [documentation](https://github.com/alexdobin/STAR/blob/master/docs/STARsolo.md#multi-gene-reads) for more information. | MTX | | multimappers_Uniform_matrix | `UniqueAndMult-Uniform.mtx` | Optional output produced when `soloMultiMappers` is "Uniform"; see STARsolo [documentation](https://github.com/alexdobin/STAR/blob/master/docs/STARsolo.md#multi-gene-reads) for more information. | MTX | | multimappers_Rescue_matrix | `UniqueAndMult-Rescue.mtx` | Optional output produced when `soloMultiMappers` is "Rescue"; see STARsolo [documentation](https://github.com/alexdobin/STAR/blob/master/docs/STARsolo.md#multi-gene-reads) for more information. | MTX | diff --git a/website/docs/Pipelines/PairedTag_Pipeline/README.md b/website/docs/Pipelines/PairedTag_Pipeline/README.md index 4114516ba3..69d1cac9b6 100644 --- a/website/docs/Pipelines/PairedTag_Pipeline/README.md +++ b/website/docs/Pipelines/PairedTag_Pipeline/README.md @@ -7,7 +7,7 @@ slug: /Pipelines/PairedTag_Pipeline/README | Pipeline Version | Date Updated | Documentation Author | Questions or Feedback | | :----: | :---: | :----: | :--------------: | -| [PairedTag_v0.1.0](https://github.com/broadinstitute/warp/releases) | February, 2024 | Kaylee Mathews | Please file GitHub issues in warp or contact [documentation authors](mailto:warp-pipelines-help@broadinstitute.org) | +| [PairedTag_v0.2.0](https://github.com/broadinstitute/warp/releases) | February, 2024 | Kaylee Mathews | Please file GitHub issues in warp or contact [documentation authors](mailto:warp-pipelines-help@broadinstitute.org) | ## Introduction to the Paired-Tag workflow @@ -117,6 +117,7 @@ The Paired-Tag workflow calls two WARP subworkflows and an additional task which | gene_metrics_gex | `_gex.gene_metrics.csv.gz` | CSV file containing the per-gene metrics. 
| | cell_calls_gex | `_gex.emptyDrops` | TSV file containing the EmptyDrops results when the Optimus workflow is run in sc_rna mode. | | h5ad_output_file_gex | `_gex.h5ad` | h5ad (Anndata) file containing the raw cell-by-gene count matrix, gene metrics, cell metrics, and global attributes. See the [Optimus Count Matrix Overview](../Optimus_Pipeline/Loom_schema.md) for more details. | +| library_metrics | `_library_metrics.csv` | CSV file with all library-level metrics calculated from STARsolo for gene expression data. | ## Versioning and testing From 99c5ad6560cfa60f9210d231259dd8279c8a422c Mon Sep 17 00:00:00 2001 From: ekiernan <55763654+ekiernan@users.noreply.github.com> Date: Thu, 29 Feb 2024 11:56:19 -0500 Subject: [PATCH 47/68] joinbarcodes disk and mem inputs --- tasks/skylab/H5adUtils.wdl | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-) diff --git a/tasks/skylab/H5adUtils.wdl b/tasks/skylab/H5adUtils.wdl index 9816107d92..18fed45fc1 100644 --- a/tasks/skylab/H5adUtils.wdl +++ b/tasks/skylab/H5adUtils.wdl @@ -193,14 +193,13 @@ task JoinMultiomeBarcodes { Int nthreads = 1 String cpuPlatform = "Intel Cascade Lake" + Int machine_mem_mb = ceil((size(atac_h5ad, "MiB") + size(gex_h5ad, "MiB") + size(atac_fragment, "MiB")) * 3) + 10000 + Int disk = ceil((size(atac_h5ad, "GiB") + size(gex_h5ad, "GiB") + size(atac_fragment, "GiB")) * 5) + 10 } String gex_base_name = basename(gex_h5ad, ".h5ad") String atac_base_name = basename(atac_h5ad, ".h5ad") String atac_fragment_base = basename(atac_fragment, ".tsv") - Int machine_mem_mb = ceil((size(atac_h5ad, "MiB") + size(gex_h5ad, "MiB") + size(atac_fragment, "MiB")) * 3) + 10000 - Int disk = ceil((size(atac_h5ad, "GiB") + size(gex_h5ad, "GiB") + size(atac_fragment, "GiB")) * 5) + 10 - parameter_meta { atac_h5ad: "The resulting h5ad from the ATAC workflow." atac_fragment: "The resulting fragment TSV from the ATAC workflow." 
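In WDL terms, the change in patch 47 above is small but useful: machine_mem_mb and disk move from task-body declarations into the input block, so the size-derived defaults are kept while a caller can now override them. A minimal, self-contained sketch of the pattern follows; the two Int defaults are copied from the diff, while the task name, command, and docker image are placeholders, not the pipeline's JoinMultiomeBarcodes task.

version 1.0

# Sketch of the resources-as-inputs pattern from the JoinMultiomeBarcodes patch
# above. Only the two Int default expressions come from the diff; the task name,
# command, and docker image are illustrative assumptions.
task JoinBarcodesResourcesSketch {
  input {
    File gex_h5ad
    File atac_h5ad
    File atac_fragment
    # Declared as inputs (not task-body locals), so a caller can override them:
    Int machine_mem_mb = ceil((size(atac_h5ad, "MiB") + size(gex_h5ad, "MiB") + size(atac_fragment, "MiB")) * 3) + 10000
    Int disk = ceil((size(atac_h5ad, "GiB") + size(gex_h5ad, "GiB") + size(atac_fragment, "GiB")) * 5) + 10
  }
  command <<<
    echo "running with ~{machine_mem_mb} MiB memory and ~{disk} GiB disk"
  >>>
  runtime {
    memory: "~{machine_mem_mb} MiB"
    disks: "local-disk ~{disk} HDD"
    docker: "ubuntu:20.04"  # placeholder image
  }
  output {
    String note = read_string(stdout())
  }
}

A workflow can then raise the allocation for an unusually large fragment file at the call site, e.g. call JoinBarcodesResourcesSketch { input: ..., disk = 500 }, without editing the task.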
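Patches 42 through 44 above are also easier to read as their end state: a single MergeStarOutput call in which each STARsolo metrics file is wired exactly once and, because count_exons is true on that branch, counting_mode is pinned to "sc_rna". The sketch below illustrates only that call shape; the task body, docker image, and workflow name are illustrative stand-ins, not the pipeline's StarAlign.MergeStarOutput.

version 1.0

# Stand-in for StarAlign.MergeStarOutput showing the call shape that
# patches 42-44 converge on. Task body and image are assumptions.
task MergeStarOutputSketch {
  input {
    File cell_reads
    File summary
    File align_features
    File umipercell
    String input_id
    String counting_mode
  }
  command <<<
    set -euo pipefail
    # Bundle the per-shard STARsolo metrics files into one tarball (sketch only).
    tar -cf ~{input_id}.star_metrics.tar \
      ~{cell_reads} ~{summary} ~{align_features} ~{umipercell}
    echo "merged metrics with counting_mode=~{counting_mode}"
  >>>
  output {
    File metrics_tar = "~{input_id}.star_metrics.tar"
  }
  runtime {
    docker: "ubuntu:20.04"  # placeholder image
  }
}

workflow MergeExonsExample {
  input {
    File cell_reads_sn_rna
    File summary_sn_rna
    File align_features_sn_rna
    File umipercell_sn_rna
    String input_id
  }
  call MergeStarOutputSketch {
    input:
      cell_reads = cell_reads_sn_rna,
      summary = summary_sn_rna,
      align_features = align_features_sn_rna,
      umipercell = umipercell_sn_rna,
      input_id = input_id,       # each input passed exactly once (patch 43)
      counting_mode = "sc_rna"   # pinned when count_exons is true (patch 44)
  }
  output {
    File merged_metrics = MergeStarOutputSketch.metrics_tar
  }
}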
From aeef6dd4674e2dad30926951cf48b81144596899 Mon Sep 17 00:00:00 2001 From: ekiernan Date: Thu, 29 Feb 2024 11:59:46 -0500 Subject: [PATCH 48/68] Update Multiome.changelog.md --- pipelines/skylab/multiome/Multiome.changelog.md | 5 +++++ 1 file changed, 5 insertions(+) diff --git a/pipelines/skylab/multiome/Multiome.changelog.md b/pipelines/skylab/multiome/Multiome.changelog.md index 9e57b1917f..da8bc38753 100644 --- a/pipelines/skylab/multiome/Multiome.changelog.md +++ b/pipelines/skylab/multiome/Multiome.changelog.md @@ -1,3 +1,8 @@ +# 3.2.1 +2024-02-29 (Date of Last Commit) + +* Moved the disk and mem for the Multiome Join Barcodes task into the task inputs section + # 3.2.0 2024-02-22 (Date of Last Commit) * Updated StarAlign.MergeStarOutput to add a shard number to the metrics files From fe95f704ef551ee5e4a637fb926044fbd08a9a4a Mon Sep 17 00:00:00 2001 From: ekiernan Date: Thu, 29 Feb 2024 13:08:41 -0500 Subject: [PATCH 49/68] Update Multiome.wdl --- pipelines/skylab/multiome/Multiome.wdl | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/pipelines/skylab/multiome/Multiome.wdl b/pipelines/skylab/multiome/Multiome.wdl index 8c44dedebc..1e6bc2edae 100644 --- a/pipelines/skylab/multiome/Multiome.wdl +++ b/pipelines/skylab/multiome/Multiome.wdl @@ -6,7 +6,7 @@ import "../../../tasks/skylab/H5adUtils.wdl" as H5adUtils import "https://raw.githubusercontent.com/broadinstitute/CellBender/v0.3.0/wdl/cellbender_remove_background.wdl" as CellBender workflow Multiome { - String pipeline_version = "3.2.0" + String pipeline_version = "3.2.1" input { String input_id From 4f804b501e70a36c326acb8a2e7a5ef20a602ffa Mon Sep 17 00:00:00 2001 From: ekiernan Date: Thu, 29 Feb 2024 13:18:25 -0500 Subject: [PATCH 50/68] changelog updates --- pipelines/skylab/optimus/Optimus.changelog.md | 4 ++++ pipelines/skylab/optimus/Optimus.wdl | 2 +- pipelines/skylab/paired_tag/PairedTag.changelog.md | 4 ++++ pipelines/skylab/paired_tag/PairedTag.wdl | 2 +- pipelines/skylab/slideseq/SlideSeq.changelog.md | 4 ++++ pipelines/skylab/slideseq/SlideSeq.wdl | 2 +- 6 files changed, 15 insertions(+), 3 deletions(-) diff --git a/pipelines/skylab/optimus/Optimus.changelog.md b/pipelines/skylab/optimus/Optimus.changelog.md index d93c3d3610..23098dd7a0 100644 --- a/pipelines/skylab/optimus/Optimus.changelog.md +++ b/pipelines/skylab/optimus/Optimus.changelog.md @@ -1,3 +1,7 @@ +# 6.4.1 +2024-02-29 (Date of Last Commit) +* Added mem and disk to inputs of Join Barcodes task of Multiome workflow; does not impact the Optimus workflow + # 6.4.0 2024-02-21 (Date of Last Commit) * Updated StarAlign.MergeStarOutput to add a shard number to the metrics files diff --git a/pipelines/skylab/optimus/Optimus.wdl b/pipelines/skylab/optimus/Optimus.wdl index db8a6cef60..159490afbf 100644 --- a/pipelines/skylab/optimus/Optimus.wdl +++ b/pipelines/skylab/optimus/Optimus.wdl @@ -64,7 +64,7 @@ workflow Optimus { # version of this pipeline - String pipeline_version = "6.4.0" + String pipeline_version = "6.4.1" # this is used to scatter matched [r1_fastq, r2_fastq, i1_fastq] arrays Array[Int] indices = range(length(r1_fastq)) diff --git a/pipelines/skylab/paired_tag/PairedTag.changelog.md b/pipelines/skylab/paired_tag/PairedTag.changelog.md index 811b073097..8a7d095a09 100644 --- a/pipelines/skylab/paired_tag/PairedTag.changelog.md +++ b/pipelines/skylab/paired_tag/PairedTag.changelog.md @@ -1,3 +1,7 @@ +# 0.2.1 +2024-02-29 (Date of Last Commit) +* Added mem and disk to inputs of Join Barcodes task of Multiome workflow; does not impact the 
Paired-tag workflow + # 0.1.0 2024-02-22 (Date of Last Commit) diff --git a/pipelines/skylab/paired_tag/PairedTag.wdl b/pipelines/skylab/paired_tag/PairedTag.wdl index ed9821d0dc..eb11e9acc4 100644 --- a/pipelines/skylab/paired_tag/PairedTag.wdl +++ b/pipelines/skylab/paired_tag/PairedTag.wdl @@ -5,7 +5,7 @@ import "../../../pipelines/skylab/optimus/Optimus.wdl" as optimus import "../../../tasks/skylab/H5adUtils.wdl" as H5adUtils import "../../../tasks/skylab/PairedTagUtils.wdl" as Demultiplexing workflow PairedTag { - String pipeline_version = "0.1.0" + String pipeline_version = "0.2.0" input { String input_id diff --git a/pipelines/skylab/slideseq/SlideSeq.changelog.md b/pipelines/skylab/slideseq/SlideSeq.changelog.md index 1746cc4257..e041750353 100644 --- a/pipelines/skylab/slideseq/SlideSeq.changelog.md +++ b/pipelines/skylab/slideseq/SlideSeq.changelog.md @@ -1,3 +1,7 @@ +# 3.1.1 +2024-02-29 (Date of Last Commit) +* Added mem and disk to inputs of Join Barcodes task of Multiome workflow; does not impact the Slideseq workflow + # 3.1.0 2024-02-07 (Date of Last Commit) diff --git a/pipelines/skylab/slideseq/SlideSeq.wdl b/pipelines/skylab/slideseq/SlideSeq.wdl index ce033d33b0..66f6001da8 100644 --- a/pipelines/skylab/slideseq/SlideSeq.wdl +++ b/pipelines/skylab/slideseq/SlideSeq.wdl @@ -23,7 +23,7 @@ import "../../../tasks/skylab/MergeSortBam.wdl" as Merge workflow SlideSeq { - String pipeline_version = "3.1.0" + String pipeline_version = "3.1.1" input { Array[File] r1_fastq From 8d7713497a2ec5a2a1776309547c71db9618fccf Mon Sep 17 00:00:00 2001 From: ekiernan Date: Thu, 29 Feb 2024 13:19:49 -0500 Subject: [PATCH 51/68] Update PairedTag.changelog.md --- pipelines/skylab/paired_tag/PairedTag.changelog.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/pipelines/skylab/paired_tag/PairedTag.changelog.md b/pipelines/skylab/paired_tag/PairedTag.changelog.md index 8a7d095a09..17255ab77f 100644 --- a/pipelines/skylab/paired_tag/PairedTag.changelog.md +++ b/pipelines/skylab/paired_tag/PairedTag.changelog.md @@ -1,4 +1,4 @@ -# 0.2.1 +# 0.2.0 2024-02-29 (Date of Last Commit) * Added mem and disk to inputs of Join Barcodes task of Multiome workflow; does not impact the Paired-tag workflow From b58faa8f7172062cd797c2cf6095c5eceec30485 Mon Sep 17 00:00:00 2001 From: ekiernan <55763654+ekiernan@users.noreply.github.com> Date: Thu, 29 Feb 2024 13:33:00 -0500 Subject: [PATCH 52/68] Update website/docs/Pipelines/Multiome_Pipeline/README.md Co-authored-by: Kaylee Mathews <95316074+kayleemathews@users.noreply.github.com> --- website/docs/Pipelines/Multiome_Pipeline/README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/website/docs/Pipelines/Multiome_Pipeline/README.md b/website/docs/Pipelines/Multiome_Pipeline/README.md index 2d3434020c..f9c860157f 100644 --- a/website/docs/Pipelines/Multiome_Pipeline/README.md +++ b/website/docs/Pipelines/Multiome_Pipeline/README.md @@ -120,7 +120,7 @@ The Multiome workflow calls two WARP subworkflows, one external subworkflow (opt | multimappers_Rescue_matrix | `UniqueAndMult-Rescue.mtx` | Optional output produced when `soloMultiMappers` is "Rescue"; see STARsolo [documentation](https://github.com/alexdobin/STAR/blob/master/docs/STARsolo.md#multi-gene-reads) for more information. 
| | multimappers_PropUnique_matrix | `UniqueAndMult-PropUnique.mtx` | Optional output produced when `soloMultiMappers` is "PropUnique"; see STARsolo [documentation](https://github.com/alexdobin/STAR/blob/master/docs/STARsolo.md#multi-gene-reads) for more information.| | gex_aligner_metrics | `.star_metrics.tar` | Text file containing per barcode metrics (`CellReads.stats`) produced by the GEX pipeline STARsolo aligner. | -| library_metrics | `_library_metrics.csv` | CSV file with all library-level metrics calculated from STARsolo for gene expression data. | +| library_metrics | `_library_metrics.csv` | Optional CSV file containing all library-level metrics calculated with STARsolo for gene expression data. | | cell_barcodes_csv | `` | Optional output produced when `run_cellbender` is "true"; see CellBender [documentation](https://cellbender.readthedocs.io/en/latest/usage/index.html) and [GitHub repository](https://github.com/broadinstitute/CellBender/tree/master) for more information.| | checkpoint_file | `` | Optional output produced when `run_cellbender` is "true"; see CellBender [documentation](https://cellbender.readthedocs.io/en/latest/usage/index.html) and [GitHub repository](https://github.com/broadinstitute/CellBender/tree/master) for more information. | | h5_array | `` | Optional output produced when `run_cellbender` is "true"; see CellBender [documentation](https://cellbender.readthedocs.io/en/latest/usage/index.html) and [GitHub repository](https://github.com/broadinstitute/CellBender/tree/master) for more information. | From 3746f91233befe2ac357ba0d633041a526da4cfc Mon Sep 17 00:00:00 2001 From: ekiernan <55763654+ekiernan@users.noreply.github.com> Date: Thu, 29 Feb 2024 13:33:08 -0500 Subject: [PATCH 53/68] Update website/docs/Pipelines/Optimus_Pipeline/README.md Co-authored-by: Kaylee Mathews <95316074+kayleemathews@users.noreply.github.com> --- website/docs/Pipelines/Optimus_Pipeline/README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/website/docs/Pipelines/Optimus_Pipeline/README.md b/website/docs/Pipelines/Optimus_Pipeline/README.md index 50067739ce..a89debfa34 100644 --- a/website/docs/Pipelines/Optimus_Pipeline/README.md +++ b/website/docs/Pipelines/Optimus_Pipeline/README.md @@ -255,7 +255,7 @@ The following table lists the output files produced from the pipeline. For sampl | cell_metrics | `.cell-metrics.csv.gz` | Matrix of metrics by cells. | Compressed CSV | | gene_metrics | `.gene-metrics.csv.gz` | Matrix of metrics by genes. | Compressed CSV | | aligner_metrics | `.star_metrics.tar` | Tarred metrics files produced by the STARsolo aligner; contains align features, cell reads, summary, and UMI per cell metrics files. | TXT | -| library_metrics | `_library_metrics.csv` | CSV file with all library-level metrics calculated from STARsolo for gene expression data. | CSV | +| library_metrics | `_library_metrics.csv` | Optional CSV file containing all library-level metrics calculated with STARsolo for gene expression data. | CSV | | multimappers_EM_matrix | `UniqueAndMult-EM.mtx` | Optional output produced when `soloMultiMappers` is "EM"; see STARsolo [documentation](https://github.com/alexdobin/STAR/blob/master/docs/STARsolo.md#multi-gene-reads) for more information. | MTX | | multimappers_Uniform_matrix | `UniqueAndMult-Uniform.mtx` | Optional output produced when `soloMultiMappers` is "Uniform"; see STARsolo [documentation](https://github.com/alexdobin/STAR/blob/master/docs/STARsolo.md#multi-gene-reads) for more information. 
| MTX | | multimappers_Rescue_matrix | `UniqueAndMult-Rescue.mtx` | Optional output produced when `soloMultiMappers` is "Rescue"; see STARsolo [documentation](https://github.com/alexdobin/STAR/blob/master/docs/STARsolo.md#multi-gene-reads) for more information. | MTX | From 957c80e8d1c89e023984b2f222fe6010f0eb1bdc Mon Sep 17 00:00:00 2001 From: ekiernan <55763654+ekiernan@users.noreply.github.com> Date: Thu, 29 Feb 2024 13:33:17 -0500 Subject: [PATCH 54/68] Update website/docs/Pipelines/PairedTag_Pipeline/README.md Co-authored-by: Kaylee Mathews <95316074+kayleemathews@users.noreply.github.com> --- website/docs/Pipelines/PairedTag_Pipeline/README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/website/docs/Pipelines/PairedTag_Pipeline/README.md b/website/docs/Pipelines/PairedTag_Pipeline/README.md index 69d1cac9b6..7817ef536b 100644 --- a/website/docs/Pipelines/PairedTag_Pipeline/README.md +++ b/website/docs/Pipelines/PairedTag_Pipeline/README.md @@ -117,7 +117,7 @@ The Paired-Tag workflow calls two WARP subworkflows and an additional task which | gene_metrics_gex | `_gex.gene_metrics.csv.gz` | CSV file containing the per-gene metrics. | | cell_calls_gex | `_gex.emptyDrops` | TSV file containing the EmptyDrops results when the Optimus workflow is run in sc_rna mode. | | h5ad_output_file_gex | `_gex.h5ad` | h5ad (Anndata) file containing the raw cell-by-gene count matrix, gene metrics, cell metrics, and global attributes. See the [Optimus Count Matrix Overview](../Optimus_Pipeline/Loom_schema.md) for more details. | -| library_metrics | `_library_metrics.csv` | CSV file with all library-level metrics calculated from STARsolo for gene expression data. | +| library_metrics | `_library_metrics.csv` | Optional CSV file containing all library-level metrics calculated with STARsolo for gene expression data. 
| ## Versioning and testing From 1040c19d8f8f751357d9584a95a7a2b17d3ab28d Mon Sep 17 00:00:00 2001 From: ekiernan Date: Thu, 29 Feb 2024 13:36:48 -0500 Subject: [PATCH 55/68] doc updates --- website/docs/Pipelines/SlideSeq_Pipeline/README.md | 2 +- .../Smart-seq2_Single_Nucleus_Multi_Sample_Pipeline/README.md | 2 +- .../multi_snss2.methods.md | 4 ++-- 3 files changed, 4 insertions(+), 4 deletions(-) diff --git a/website/docs/Pipelines/SlideSeq_Pipeline/README.md b/website/docs/Pipelines/SlideSeq_Pipeline/README.md index f0132571db..46d1d9a1b4 100644 --- a/website/docs/Pipelines/SlideSeq_Pipeline/README.md +++ b/website/docs/Pipelines/SlideSeq_Pipeline/README.md @@ -7,7 +7,7 @@ slug: /Pipelines/SlideSeq_Pipeline/README | Pipeline Version | Date Updated | Documentation Author | Questions or Feedback | | :----: | :---: | :----: | :--------------: | -| [SlideSeq v3.1.0](https://github.com/broadinstitute/warp/releases) | February, 2024 | Elizabeth Kiernan & Kaylee Mathews | Please file GitHub issues in warp or contact [documentation authors](mailto:warp-pipelines-help@broadinstitute.org) | +| [SlideSeq v3.1.1](https://github.com/broadinstitute/warp/releases) | February, 2024 | Elizabeth Kiernan & Kaylee Mathews | Please file GitHub issues in warp or contact [documentation authors](mailto:warp-pipelines-help@broadinstitute.org) | ![SlideSeq_diagram](./slide-seq_diagram.png) diff --git a/website/docs/Pipelines/Smart-seq2_Single_Nucleus_Multi_Sample_Pipeline/README.md b/website/docs/Pipelines/Smart-seq2_Single_Nucleus_Multi_Sample_Pipeline/README.md index 4cb42c4cf6..f1d35d3611 100644 --- a/website/docs/Pipelines/Smart-seq2_Single_Nucleus_Multi_Sample_Pipeline/README.md +++ b/website/docs/Pipelines/Smart-seq2_Single_Nucleus_Multi_Sample_Pipeline/README.md @@ -7,7 +7,7 @@ slug: /Pipelines/Smart-seq2_Single_Nucleus_Multi_Sample_Pipeline/README | Pipeline Version | Date Updated | Documentation Author | Questions or Feedback | | :----: | :---: | :----: | :--------------: | -| [MultiSampleSmartSeq2SingleNuclei_v1.3.0](https://github.com/broadinstitute/warp/releases) | February, 2024 | [Elizabeth Kiernan](mailto:ekiernan@broadinstitute.org) | Please file GitHub issues in WARP or contact [the WARP team](mailto:warp-pipelines-help@broadinstitute.org) | +| [MultiSampleSmartSeq2SingleNuclei_v1.3.1](https://github.com/broadinstitute/warp/releases) | February, 2024 | [Elizabeth Kiernan](mailto:ekiernan@broadinstitute.org) | Please file GitHub issues in WARP or contact [the WARP team](mailto:warp-pipelines-help@broadinstitute.org) | ![](./snSS2.png) diff --git a/website/docs/Pipelines/Smart-seq2_Single_Nucleus_Multi_Sample_Pipeline/multi_snss2.methods.md b/website/docs/Pipelines/Smart-seq2_Single_Nucleus_Multi_Sample_Pipeline/multi_snss2.methods.md index 8ab56b15bd..a758e085cb 100644 --- a/website/docs/Pipelines/Smart-seq2_Single_Nucleus_Multi_Sample_Pipeline/multi_snss2.methods.md +++ b/website/docs/Pipelines/Smart-seq2_Single_Nucleus_Multi_Sample_Pipeline/multi_snss2.methods.md @@ -2,13 +2,13 @@ sidebar_position: 2 --- -# Smart-seq2 Single Nucleus Multi-Sample v1.3.0 Publication Methods +# Smart-seq2 Single Nucleus Multi-Sample v1.3.1 Publication Methods Below we provide an example methods section for a publication. For the complete pipeline documentation, see the [Smart-seq2 Single Nucleus Multi-Sample Overview](./README.md). 
## Methods -Data preprocessing and count matrix construction for a batch (or plate) were performed using the Smart-seq2 Single Nucleus Multi-Sample v1.3.0 Pipeline (RRID:SCR_021312) as well as Picard v.2.26.10 with default tool parameters unless otherwise specified. Genomic references are publicly available in the [Broad References](https://console.cloud.google.com/storage/browser/gcp-public-data--broad-references/mm10/v0/single_nucleus?pageState=(%22StorageObjectListTable%22:(%22f%22:%22%255B%255D%22))&prefix=&forceOnObjectsSortingFiltering=false) Google Bucket and are also listed in the [example workflow configuration](https://github.com/broadinstitute/warp/blob/master/pipelines/skylab/smartseq2_single_nucleus_multisample/mouse_example.json) in GitHub. +Data preprocessing and count matrix construction for a batch (or plate) were performed using the Smart-seq2 Single Nucleus Multi-Sample v1.3.1 Pipeline (RRID:SCR_021312) as well as Picard v.2.26.10 with default tool parameters unless otherwise specified. Genomic references are publicly available in the [Broad References](https://console.cloud.google.com/storage/browser/gcp-public-data--broad-references/mm10/v0/single_nucleus?pageState=(%22StorageObjectListTable%22:(%22f%22:%22%255B%255D%22))&prefix=&forceOnObjectsSortingFiltering=false) Google Bucket and are also listed in the [example workflow configuration](https://github.com/broadinstitute/warp/blob/master/pipelines/skylab/smartseq2_single_nucleus_multisample/mouse_example.json) in GitHub. For each nucleus in the batch, paired-end FASTQ files were first trimmed to remove adapters using the fastq-mcf tool with a subsampling parameter of 200,000 reads. The trimmed FASTQ files were then aligned to the GENCODE GRCm38 mouse genome using STAR v.2.7.10a. To count the number of reads per gene, but not isoforms, the quantMode parameter was set to GeneCounts. Multi-mapped reads, and optical and PCR duplicates, were removed from the resulting aligned BAM using the Picard MarkDuplicates tool with REMOVE_DUPLICATES = true. Metrics were collected on the deduplicated BAM using Picard CollectMultipleMetrics with VALIDATION_STRINGENCY =SILENT. 
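The alignment step the methods above describe, counting reads per gene but not per isoform, comes down to running STAR with --quantMode GeneCounts. A sketch of such a task follows; only the GeneCounts flag and the STAR version come from the methods text, while the task name, index layout, docker image, and resource values are assumptions.

version 1.0

# Illustrative sketch of the per-nucleus STAR step from the Multi-snSS2 methods
# above. Names, tar layout, image, and resources are assumptions; the STAR
# flags themselves are standard STAR options.
task StarAlignGeneCountsSketch {
  input {
    File trimmed_r1_fastq     # assumed gzipped, as produced by the trimming step
    File trimmed_r2_fastq
    File tar_star_reference   # assumed: a pre-built STAR index, tarred
    String input_id
  }
  command <<<
    set -euo pipefail
    mkdir genome_ref
    tar -xf ~{tar_star_reference} -C genome_ref --strip-components 1
    # GeneCounts emits ReadsPerGene.out.tab: reads per gene, but not isoforms.
    STAR \
      --runMode alignReads \
      --genomeDir genome_ref \
      --readFilesIn ~{trimmed_r1_fastq} ~{trimmed_r2_fastq} \
      --readFilesCommand "gunzip -c" \
      --quantMode GeneCounts \
      --outSAMtype BAM Unsorted \
      --outFileNamePrefix ~{input_id}.
  >>>
  output {
    File bam = "~{input_id}.Aligned.out.bam"
    File gene_counts = "~{input_id}.ReadsPerGene.out.tab"
  }
  runtime {
    docker: "star-placeholder:2.7.10a"  # substitute the pipeline's STAR image
    cpu: 4
    memory: "32 GiB"
    disks: "local-disk 100 HDD"
  }
}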
From d3f01ceee7e0e2852714a12dc6b1df405933f1e2 Mon Sep 17 00:00:00 2001 From: Kaylee Mathews <95316074+kayleemathews@users.noreply.github.com> Date: Thu, 29 Feb 2024 15:29:26 -0500 Subject: [PATCH 56/68] Km doc updates and add preprint (#1219) * update pipeline docs * Update README.md * add citation to welcome page * add citation to pipeline docs * add citation to repo README * Update README.md --- README.md | 6 ++++++ website/docs/Pipelines/ATAC/README.md | 10 ++++++++-- .../Pipelines/BuildIndices_Pipeline/README.md | 6 ++++++ .../CEMBA_MethylC_Seq_Pipeline/README.md | 8 +++++++- .../README.md | 6 ++++++ .../exome.methods.md | 2 +- .../README.md | 6 ++++++ .../Illumina_genotyping_array_spec.md | 2 +- .../README.md | 8 +++++++- .../Pipelines/Imputation_Pipeline/README.md | 8 +++++++- .../docs/Pipelines/JointGenotyping/README.md | 6 ++++++ .../Pipelines/Multiome_Pipeline/README.md | 8 +++++++- .../docs/Pipelines/Optimus_Pipeline/README.md | 9 +++++++-- .../Pipelines/PairedTag_Pipeline/README.md | 6 ++++++ .../RNA_with_UMIs_Pipeline/README.md | 8 ++++++-- .../rna-with-umis.methods.md | 8 ++++---- .../Single_Cell_ATAC_Seq_Pipeline/README.md | 8 +++++++- .../Pipelines/SlideSeq_Pipeline/README.md | 8 +++++++- .../README.md | 8 +++++++- .../README.md | 8 ++++++-- .../README.md | 20 ++++++++++++------- .../README.md | 6 +++++- .../README.md | 6 ++++++ website/docs/Pipelines/snM3C/README.md | 8 +++++++- website/docs/get-started.md | 2 +- 25 files changed, 150 insertions(+), 31 deletions(-) diff --git a/README.md b/README.md index 94329bce04..9f1d5d8091 100644 --- a/README.md +++ b/README.md @@ -17,4 +17,10 @@ Read more about our pipelines and repository on the [WARP documentation site](ht To contribute to WARP, please read the [contribution guidelines](https://broadinstitute.github.io/warp/docs/contribution/README). +### Citing WARP + +When citing WARP, please use the following: + +Degatano, K.; Awdeh, A.; Dingman, W.; Grant, G.; Khajouei, F.; Kiernan, E.; Konwar, K.; Mathews, K.; Palis, K.; Petrillo, N.; Van der Auwera, G.; Wang, C.; Way, J.; Pipelines, W. WDL Analysis Research Pipelines: Cloud-Optimized Workflows for Biological Data Processing and Reproducible Analysis. Preprints 2024, 2024012131. https://doi.org/10.20944/preprints202401.2131.v1 + [![Build Status](https://img.shields.io/github/workflow/status/broadinstitute/warp/Deploy%20WARP%20Website?label=Website&logo=github&style=flat-square)](https://github.com/broadinstitute/warp/actions?query=workflow%3A%22Deploy+WARP+Website%22) diff --git a/website/docs/Pipelines/ATAC/README.md b/website/docs/Pipelines/ATAC/README.md index e5a780f719..4f0750f35d 100644 --- a/website/docs/Pipelines/ATAC/README.md +++ b/website/docs/Pipelines/ATAC/README.md @@ -8,7 +8,6 @@ slug: /Pipelines/ATAC/README | Pipeline Version | Date Updated | Documentation Author | Questions or Feedback | | :----: | :---: | :----: | :--------------: | - | [1.1.8](https://github.com/broadinstitute/warp/releases) | January, 2024 | Kaylee Mathews | Please file GitHub issues in warp or contact [the WARP team](mailto:warp-pipelines-help@broadinstitute.org) | @@ -97,9 +96,16 @@ To see specific tool parameters, select the task WDL link in the table; then vie All ATAC pipeline releases are documented in the [ATAC changelog](https://github.com/broadinstitute/warp/blob/master/pipelines/skylab/multiome/atac.changelog.md) and tested using [plumbing and scientific test data](https://github.com/broadinstitute/warp/tree/master/pipelines/skylab/multiome/test_inputs). 
To learn more about WARP pipeline testing, see [Testing Pipelines](https://broadinstitute.github.io/warp/docs/About_WARP/TestingPipelines). ## Citing the ATAC Pipeline -Please identify the pipeline in your methods section using the ATAC Pipeline's [SciCrunch resource identifier](https://scicrunch.org/resources/data/record/nlx_144509-1/SCR_024656/resolver?q=SCR_024656%2A&l=SCR_024656%2A&i=rrid:scr_024656). + +If you use the ATAC Pipeline in your research, please identify the pipeline in your methods section using the [ATAC SciCrunch resource identifier](https://scicrunch.org/resources/data/record/nlx_144509-1/SCR_024656/resolver?q=SCR_024656%2A&l=SCR_024656%2A&i=rrid:scr_024656). + * Ex: *ATAC Pipeline (RRID:SCR_024656)* +Please also consider citing our preprint: + +Degatano, K.; Awdeh, A.; Dingman, W.; Grant, G.; Khajouei, F.; Kiernan, E.; Konwar, K.; Mathews, K.; Palis, K.; Petrillo, N.; Van der Auwera, G.; Wang, C.; Way, J.; Pipelines, W. WDL Analysis Research Pipelines: Cloud-Optimized Workflows for Biological Data Processing and Reproducible Analysis. Preprints 2024, 2024012131. https://doi.org/10.20944/preprints202401.2131.v1 + + ## Acknowledgements We are immensely grateful to the members of the BRAIN Initiative (BICAN Sequencing Working Group) and SCORCH for their invaluable and exceptional contributions to this pipeline. Our heartfelt appreciation goes to Alex Dobin, Aparna Bhaduri, Alec Wysoker, Anish Chakka, Brian Herb, Daofeng Li, Fenna Krienen, Guo-Long Zuo, Jeff Goldy, Kai Zhang, Khalid Shakir, Bo Li, Mariano Gabitto, Michael DeBerardine, Mengyi Song, Melissa Goldman, Nelson Johansen, James Nemesh, and Theresa Hodges for their unwavering dedication and remarkable efforts. diff --git a/website/docs/Pipelines/BuildIndices_Pipeline/README.md b/website/docs/Pipelines/BuildIndices_Pipeline/README.md index fc328379aa..0d0431edc4 100644 --- a/website/docs/Pipelines/BuildIndices_Pipeline/README.md +++ b/website/docs/Pipelines/BuildIndices_Pipeline/README.md @@ -112,6 +112,12 @@ The following table lists the output variables and files produced by the pipelin All BuildIndices pipeline releases are documented in the [BuildIndices changelog](https://github.com/broadinstitute/warp/blob/master/pipelines/skylab/build_indices/BuildIndices.changelog.md) and tested manually using [reference JSON files](https://github.com/broadinstitute/warp/tree/master/pipelines/skylab/build_indices). +## Citing the BuildIndices Pipeline + +If you use the BuildIndices Pipeline in your research, please consider citing our preprint: + +Degatano, K.; Awdeh, A.; Dingman, W.; Grant, G.; Khajouei, F.; Kiernan, E.; Konwar, K.; Mathews, K.; Palis, K.; Petrillo, N.; Van der Auwera, G.; Wang, C.; Way, J.; Pipelines, W. WDL Analysis Research Pipelines: Cloud-Optimized Workflows for Biological Data Processing and Reproducible Analysis. Preprints 2024, 2024012131. https://doi.org/10.20944/preprints202401.2131.v1 + ## Consortia support This pipeline is supported by the [BRAIN Initiative](https://braininitiative.nih.gov/) (BICCN and BICAN). diff --git a/website/docs/Pipelines/CEMBA_MethylC_Seq_Pipeline/README.md b/website/docs/Pipelines/CEMBA_MethylC_Seq_Pipeline/README.md index af51b43992..5dac529f8d 100644 --- a/website/docs/Pipelines/CEMBA_MethylC_Seq_Pipeline/README.md +++ b/website/docs/Pipelines/CEMBA_MethylC_Seq_Pipeline/README.md @@ -178,9 +178,15 @@ The table below details the pipeline outputs. 
**If using multiplexed samples, th All CEMBA pipeline releases are documented in the [CEMBA changelog](https://github.com/broadinstitute/warp/blob/develop/pipelines/cemba/cemba_methylcseq/CEMBA.changelog.md). ## Citing the CEMBA Pipeline -Please identify the pipeline in your methods section using the CEMBA Pipeline's [SciCrunch resource identifier](https://scicrunch.org/scicrunch/Resources/record/nlx_144509-1/SCR_021219/resolver?q=CEMBA&l=CEMBA). + +If you use the CEMBA Pipeline in your research, please identify the pipeline in your methods section using the [CEMBA SciCrunch resource identifier](https://scicrunch.org/resources/data/record/nlx_144509-1/SCR_021219/resolver?q=SCR_021219&l=SCR_021219&i=rrid:scr_021219). + * Ex: *CEMBA MethylC Seq Pipeline (RRID:SCR_021219)* +Please also consider citing our preprint: + +Degatano, K.; Awdeh, A.; Dingman, W.; Grant, G.; Khajouei, F.; Kiernan, E.; Konwar, K.; Mathews, K.; Palis, K.; Petrillo, N.; Van der Auwera, G.; Wang, C.; Way, J.; Pipelines, W. WDL Analysis Research Pipelines: Cloud-Optimized Workflows for Biological Data Processing and Reproducible Analysis. Preprints 2024, 2024012131. https://doi.org/10.20944/preprints202401.2131.v1 + ## Consortia Support This pipeline is supported and used by the [BRAIN Initiative Cell Census Network](https://biccn.org/) (BICCN). diff --git a/website/docs/Pipelines/Exome_Germline_Single_Sample_Pipeline/README.md b/website/docs/Pipelines/Exome_Germline_Single_Sample_Pipeline/README.md index 5640900abc..10582bdb6d 100644 --- a/website/docs/Pipelines/Exome_Germline_Single_Sample_Pipeline/README.md +++ b/website/docs/Pipelines/Exome_Germline_Single_Sample_Pipeline/README.md @@ -130,6 +130,12 @@ view the following tutorial [(How to) Execute Workflows from the gatk-workflows - Please visit the [GATK Technical Documentation](https://gatk.broadinstitute.org/hc/en-us/categories/360002310591) site for further documentation on our workflows and tools. - You can access relevant reference and resource bundles in the [GATK Resource Bundle](https://gatk.broadinstitute.org/hc/en-us/articles/360035890811). +## Citing the Exome Germline Single Sample Pipeline + +If you use the Exome Germline Single Sample Pipeline in your research, please consider citing our preprint: + +Degatano, K.; Awdeh, A.; Dingman, W.; Grant, G.; Khajouei, F.; Kiernan, E.; Konwar, K.; Mathews, K.; Palis, K.; Petrillo, N.; Van der Auwera, G.; Wang, C.; Way, J.; Pipelines, W. WDL Analysis Research Pipelines: Cloud-Optimized Workflows for Biological Data Processing and Reproducible Analysis. Preprints 2024, 2024012131. https://doi.org/10.20944/preprints202401.2131.v1 + ## Contact Us This material is provided by the Data Science Platform group at the Broad Institute. Please direct any questions or concerns to one of our forum sites : [GATK](https://gatk.broadinstitute.org/hc/en-us/community/topics) or [Terra](https://support.terra.bio/hc/en-us/community/topics/360000500432). 
diff --git a/website/docs/Pipelines/Exome_Germline_Single_Sample_Pipeline/exome.methods.md b/website/docs/Pipelines/Exome_Germline_Single_Sample_Pipeline/exome.methods.md index a09c96719b..28b0ee4fcd 100644 --- a/website/docs/Pipelines/Exome_Germline_Single_Sample_Pipeline/exome.methods.md +++ b/website/docs/Pipelines/Exome_Germline_Single_Sample_Pipeline/exome.methods.md @@ -2,7 +2,7 @@ sidebar_position: 2 --- -# Exome Germline Single Sample v3.1.17 Methods +# Exome Germline Single Sample v3.1.18 Methods The following contains a detailed methods description outlining the pipeline’s process, software, and tools that can be modified for a publication methods section. diff --git a/website/docs/Pipelines/Genomic_Data_Commons_Whole_Genome_Somatic/README.md b/website/docs/Pipelines/Genomic_Data_Commons_Whole_Genome_Somatic/README.md index cb0ee1be99..d3151c4060 100644 --- a/website/docs/Pipelines/Genomic_Data_Commons_Whole_Genome_Somatic/README.md +++ b/website/docs/Pipelines/Genomic_Data_Commons_Whole_Genome_Somatic/README.md @@ -114,6 +114,12 @@ Alternatively, Cromwell allows you to specify an output directory using an optio - Runtime parameters are optimized for Broad's Google Cloud Platform implementation. - Please visit the [GATK Technical Documentation](https://gatk.broadinstitute.org/hc/en-us/categories/360002310591) site for further documentation on GATK-related workflows and tools. +## Citing the GDC Pipeline + +If you use the GDC Pipeline in your research, please consider citing our preprint: + +Degatano, K.; Awdeh, A.; Dingman, W.; Grant, G.; Khajouei, F.; Kiernan, E.; Konwar, K.; Mathews, K.; Palis, K.; Petrillo, N.; Van der Auwera, G.; Wang, C.; Way, J.; Pipelines, W. WDL Analysis Research Pipelines: Cloud-Optimized Workflows for Biological Data Processing and Reproducible Analysis. Preprints 2024, 2024012131. https://doi.org/10.20944/preprints202401.2131.v1 + ## Contact us Please help us make our tools better by contacting [the WARP team](mailto:warp-pipelines-help@broadinstitute.org) for pipeline-related suggestions or questions. diff --git a/website/docs/Pipelines/Illumina_Genotyping_Arrays_Pipeline/Illumina_genotyping_array_spec.md b/website/docs/Pipelines/Illumina_Genotyping_Arrays_Pipeline/Illumina_genotyping_array_spec.md index 1462c4fdb7..566e3722b7 100644 --- a/website/docs/Pipelines/Illumina_Genotyping_Arrays_Pipeline/Illumina_genotyping_array_spec.md +++ b/website/docs/Pipelines/Illumina_Genotyping_Arrays_Pipeline/Illumina_genotyping_array_spec.md @@ -4,7 +4,7 @@ sidebar_position: 2 # VCF Overview: Illumina Genotyping Array -The [Illumina Genotyping Array Pipeline](https://github.com/broadinstitute/warp/blob/develop/pipelines/broad/genotyping/illumina/IlluminaGenotypingArray.wdl) v1.12.15 pipeline produces a VCF (Variant Call Format) output with data processing and sample-specific genotype information. The VCF follows the format listed in the [VCF 4.2 specification](https://samtools.github.io/hts-specs/VCFv4.2.pdf), but additionally contains fields and attributes that are unique to the Arrays pipeline. +The [Illumina Genotyping Array Pipeline](https://github.com/broadinstitute/warp/blob/develop/pipelines/broad/genotyping/illumina/IlluminaGenotypingArray.wdl) v1.12.16 pipeline produces a VCF (Variant Call Format) output with data processing and sample-specific genotype information. 
The VCF follows the format listed in the [VCF 4.2 specification](https://samtools.github.io/hts-specs/VCFv4.2.pdf), but additionally contains fields and attributes that are unique to the Arrays pipeline. This document describes the Array pipeline’s unique VCF fields and attributes that are not listed in the standard VCF specification. To learn more about the pipeline, see the [Illumina Genotyping Array Pipeline Overview](./README.md). diff --git a/website/docs/Pipelines/Illumina_Genotyping_Arrays_Pipeline/README.md b/website/docs/Pipelines/Illumina_Genotyping_Arrays_Pipeline/README.md index 8eb9bed3b0..c5127827d0 100644 --- a/website/docs/Pipelines/Illumina_Genotyping_Arrays_Pipeline/README.md +++ b/website/docs/Pipelines/Illumina_Genotyping_Arrays_Pipeline/README.md @@ -7,7 +7,7 @@ slug: /Pipelines/Illumina_Genotyping_Arrays_Pipeline/README | Pipeline Version | Date Updated | Documentation Author | Questions or Feedback | | :----: | :---: | :----: | :--------------: | -| [Version 1.12.15](https://github.com/broadinstitute/warp/releases) | February, 2024 | [Elizabeth Kiernan](mailto:ekiernan@broadinstitute.org) | Please file GitHub issues in warp or contact [the WARP team](mailto:warp-pipelines-help@broadinstitute.org) | +| [Version 1.12.16](https://github.com/broadinstitute/warp/releases) | February, 2024 | [Elizabeth Kiernan](mailto:ekiernan@broadinstitute.org) | Please file GitHub issues in warp or contact [the WARP team](mailto:warp-pipelines-help@broadinstitute.org) | ![The Illumina Genotyping Array Pipeline](./IlluminaGenotyping.png) @@ -237,6 +237,12 @@ All Illumina Genotyping Array workflow releases are documented in the [workflow The Illumina Genotyping Array Pipeline is available on the cloud-based platform [Terra](https://app.terra.bio). If you have a Terra account, you can access the Featured Workspace using this address: `https://app.terra.bio/#workspaces/warp-pipelines/Illumina-Genotyping-Array`. The workspace is preloaded with instructions and sample data. For more information on using the Terra platform, please view the [Support Center](https://support.terra.bio/hc/en-us). +## Citing the Illumina Genotyping Array Pipeline + +If you use the Illumina Genotyping Array Pipeline in your research, please consider citing our preprint: + +Degatano, K.; Awdeh, A.; Dingman, W.; Grant, G.; Khajouei, F.; Kiernan, E.; Konwar, K.; Mathews, K.; Palis, K.; Petrillo, N.; Van der Auwera, G.; Wang, C.; Way, J.; Pipelines, W. WDL Analysis Research Pipelines: Cloud-Optimized Workflows for Biological Data Processing and Reproducible Analysis. Preprints 2024, 2024012131. https://doi.org/10.20944/preprints202401.2131.v1 + ## Feedback and questions Please help us make our tools better by contacting [the WARP team](mailto:warp-pipelines-help@broadinstitute.org) for pipeline-related suggestions or questions. 
diff --git a/website/docs/Pipelines/Imputation_Pipeline/README.md b/website/docs/Pipelines/Imputation_Pipeline/README.md index 4743d3c1af..4c8faa68cc 100644 --- a/website/docs/Pipelines/Imputation_Pipeline/README.md +++ b/website/docs/Pipelines/Imputation_Pipeline/README.md @@ -7,7 +7,7 @@ slug: /Pipelines/Imputation_Pipeline/README | Pipeline Version | Date Updated | Documentation Author | Questions or Feedback | | :----: | :---: | :----: | :--------------: | -| [Imputation_v1.1.11](https://github.com/broadinstitute/warp/releases?q=Imputation_v1.0.0&expanded=true) | February, 2024 | [Elizabeth Kiernan](mailto:ekiernan@broadinstitute.org) | Please file GitHub issues in warp or contact [the WARP team](mailto:warp-pipelines-help@broadinstitute.org) | +| [Imputation_v1.1.12](https://github.com/broadinstitute/warp/releases?q=Imputation_v1.0.0&expanded=true) | February, 2024 | [Elizabeth Kiernan](mailto:ekiernan@broadinstitute.org) | Please file GitHub issues in warp or contact [the WARP team](mailto:warp-pipelines-help@broadinstitute.org) | ## Introduction to the Imputation pipeline The Imputation pipeline imputes missing genotypes from either a multi-sample VCF or an array of single sample VCFs using a large genomic reference panel. It is based on the [Michigan Imputation Server pipeline](https://imputationserver.readthedocs.io/en/latest/pipeline/). Overall, the pipeline filters, phases, and performs imputation on a multi-sample VCF. It outputs the imputed VCF along with key imputation metrics. @@ -138,6 +138,12 @@ The pipeline is cost-optimized for between 100 and 1,000 samples, where the cost | 100 | 0.11 | | 1000 | 0.024 | | 13.5 K | 0.025 | + +## Citing the Imputation Pipeline + +If you use the Imputation Pipeline in your research, please consider citing our preprint: + +Degatano, K.; Awdeh, A.; Dingman, W.; Grant, G.; Khajouei, F.; Kiernan, E.; Konwar, K.; Mathews, K.; Palis, K.; Petrillo, N.; Van der Auwera, G.; Wang, C.; Way, J.; Pipelines, W. WDL Analysis Research Pipelines: Cloud-Optimized Workflows for Biological Data Processing and Reproducible Analysis. Preprints 2024, 2024012131. https://doi.org/10.20944/preprints202401.2131.v1 ## Contact us diff --git a/website/docs/Pipelines/JointGenotyping/README.md b/website/docs/Pipelines/JointGenotyping/README.md index 27748df1f6..aa9eb7af3b 100644 --- a/website/docs/Pipelines/JointGenotyping/README.md +++ b/website/docs/Pipelines/JointGenotyping/README.md @@ -238,6 +238,12 @@ The following table lists the output variables and files produced by the pipelin All JointGenotyping pipeline releases are documented in the [JointGenotyping changelog](https://github.com/broadinstitute/warp/blob/master/pipelines/broad/dna_seq/germline/joint_genotyping/JointGenotyping.changelog.md) and tested using [plumbing and scientific test data](https://github.com/broadinstitute/warp/blob/master/pipelines/broad/dna_seq/germline/joint_genotyping/test_data_overview.md). To learn more about WARP pipeline testing, see [Testing Pipelines](https://broadinstitute.github.io/warp/docs/About_WARP/TestingPipelines). +## Citing the JointGenotyping Pipeline + +If you use the JointGenotyping Pipeline in your research, please consider citing our preprint: + +Degatano, K.; Awdeh, A.; Dingman, W.; Grant, G.; Khajouei, F.; Kiernan, E.; Konwar, K.; Mathews, K.; Palis, K.; Petrillo, N.; Van der Auwera, G.; Wang, C.; Way, J.; Pipelines, W. WDL Analysis Research Pipelines: Cloud-Optimized Workflows for Biological Data Processing and Reproducible Analysis. Preprints 2024, 2024012131. 
https://doi.org/10.20944/preprints202401.2131.v1 + ## Feedback Please help us make our tools better by contacting the [WARP Pipelines Team](mailto:warp-pipelines-help@broadinstitute.org) for pipeline-related suggestions or questions. \ No newline at end of file diff --git a/website/docs/Pipelines/Multiome_Pipeline/README.md b/website/docs/Pipelines/Multiome_Pipeline/README.md index 354c951a5e..3409347d3f 100644 --- a/website/docs/Pipelines/Multiome_Pipeline/README.md +++ b/website/docs/Pipelines/Multiome_Pipeline/README.md @@ -136,9 +136,15 @@ The Multiome workflow calls two WARP subworkflows, one external subworkflow (opt All Multiome pipeline releases are documented in the [Multiome changelog](https://github.com/broadinstitute/warp/blob/develop/pipelines/skylab/multiome/Multiome.changelog.md) and tested using [plumbing and scientific test data](https://github.com/broadinstitute/warp/tree/master/pipelines/skylab/multiome/test_inputs). To learn more about WARP pipeline testing, see [Testing Pipelines](https://broadinstitute.github.io/warp/docs/About_WARP/TestingPipelines). ## Citing the Multiome Pipeline -Please identify the pipeline in your methods section using the Multiome Pipeline's [SciCrunch resource identifier](https://scicrunch.org/resources/data/record/nlx_144509-1/SCR_024217/resolver?q=SCR_024217&l=SCR_024217&i=rrid:scr_024217). + +If you use the Multiome Pipeline in your research, please identify the pipeline in your methods section using the [Multiome SciCrunch resource identifier](https://scicrunch.org/resources/data/record/nlx_144509-1/SCR_024217/resolver?q=SCR_024217&l=SCR_024217&i=rrid:scr_024217). + * Ex: *Multiome Pipeline (RRID:SCR_024217)* +Please also consider citing our preprint: + +Degatano, K.; Awdeh, A.; Dingman, W.; Grant, G.; Khajouei, F.; Kiernan, E.; Konwar, K.; Mathews, K.; Palis, K.; Petrillo, N.; Van der Auwera, G.; Wang, C.; Way, J.; Pipelines, W. WDL Analysis Research Pipelines: Cloud-Optimized Workflows for Biological Data Processing and Reproducible Analysis. Preprints 2024, 2024012131. https://doi.org/10.20944/preprints202401.2131.v1 + ## Consortia support This pipeline is supported by the [BRAIN Initiative](https://braininitiative.nih.gov/) (BICCN and BICAN). diff --git a/website/docs/Pipelines/Optimus_Pipeline/README.md b/website/docs/Pipelines/Optimus_Pipeline/README.md index 54a4cf43fd..382804e447 100644 --- a/website/docs/Pipelines/Optimus_Pipeline/README.md +++ b/website/docs/Pipelines/Optimus_Pipeline/README.md @@ -284,11 +284,16 @@ Optimus has been validated for processing both human and mouse single-cell and s All Optimus pipeline releases are documented in the [Optimus changelog](https://github.com/broadinstitute/warp/blob/master/pipelines/skylab/optimus/Optimus.changelog.md). +## Citing the Optimus Pipeline + +If you use the Optimus Pipeline in your research, please identify the pipeline in your methods section using the [Optimus SciCrunch resource identifier](https://scicrunch.org/resources/data/record/nlx_144509-1/SCR_018908/resolver?q=SCR_018908&l=SCR_018908&i=rrid:scr_018908). -## Citing the Optimus pipeline -Please identify the pipeline in your methods section using the Optimus Pipeline's [SciCrunch resource identifier](https://scicrunch.org/scicrunch/Resources/record/nlx_144509-1/SCR_018908/resolver?q=SCR_018908&l=SCR_018908). 
* Ex: *Optimus Pipeline (RRID:SCR_018908)* +Please also consider citing our preprint: + +Degatano, K.; Awdeh, A.; Dingman, W.; Grant, G.; Khajouei, F.; Kiernan, E.; Konwar, K.; Mathews, K.; Palis, K.; Petrillo, N.; Van der Auwera, G.; Wang, C.; Way, J.; Pipelines, W. WDL Analysis Research Pipelines: Cloud-Optimized Workflows for Biological Data Processing and Reproducible Analysis. Preprints 2024, 2024012131. https://doi.org/10.20944/preprints202401.2131.v1 + ## Consortia support This pipeline is supported and used by the [Human Cell Atlas](https://www.humancellatlas.org/) (HCA) project and the [BRAIN Initiative Cell Census Network](https://biccn.org/) (BICCN). diff --git a/website/docs/Pipelines/PairedTag_Pipeline/README.md b/website/docs/Pipelines/PairedTag_Pipeline/README.md index 4114516ba3..cc0114a766 100644 --- a/website/docs/Pipelines/PairedTag_Pipeline/README.md +++ b/website/docs/Pipelines/PairedTag_Pipeline/README.md @@ -123,6 +123,12 @@ The Paired-Tag workflow calls two WARP subworkflows and an additional task which All Paired-Tag pipeline releases are documented in the [Paired-Tag changelog](https://github.com/broadinstitute/warp/blob/develop/pipelines/skylab/paired_tag/PairedTag.wdl) and tested using [plumbing and scientific test data](https://github.com/broadinstitute/warp/tree/develop/pipelines/skylab/paired_tag/test_inputs). To learn more about WARP pipeline testing, see [Testing Pipelines](https://broadinstitute.github.io/warp/docs/About_WARP/TestingPipelines). Note that paired-tag tests are still in development. +## Citing the Paired-Tag Pipeline + +If you use the Paired-Tag Pipeline in your research, please consider citing our preprint: + +Degatano, K.; Awdeh, A.; Dingman, W.; Grant, G.; Khajouei, F.; Kiernan, E.; Konwar, K.; Mathews, K.; Palis, K.; Petrillo, N.; Van der Auwera, G.; Wang, C.; Way, J.; Pipelines, W. WDL Analysis Research Pipelines: Cloud-Optimized Workflows for Biological Data Processing and Reproducible Analysis. Preprints 2024, 2024012131. https://doi.org/10.20944/preprints202401.2131.v1 + ## Consortia support This pipeline is supported by the [BRAIN Initiative](https://braininitiative.nih.gov/) (BICCN and BICAN). diff --git a/website/docs/Pipelines/RNA_with_UMIs_Pipeline/README.md b/website/docs/Pipelines/RNA_with_UMIs_Pipeline/README.md index c407efd7f4..2c7b1f08ca 100644 --- a/website/docs/Pipelines/RNA_with_UMIs_Pipeline/README.md +++ b/website/docs/Pipelines/RNA_with_UMIs_Pipeline/README.md @@ -7,7 +7,7 @@ slug: /Pipelines/RNA_with_UMIs_Pipeline/README | Pipeline Version | Date Updated | Documentation Authors | Questions or Feedback | | :----: | :---: | :----: | :--------------: | -| [RNAWithUMIsPipeline_v1.0.15](https://github.com/broadinstitute/warp/releases?q=RNAwithUMIs&expanded=true) | December, 2023 | [Elizabeth Kiernan](mailto:ekiernan@broadinstitute.org) & [Kaylee Mathews](mailto:kmathews@broadinstitute.org)| Please file GitHub issues in warp or contact [the WARP team](mailto:warp-pipelines-help@broadinstitute.org) | +| [RNAWithUMIsPipeline_v1.0.16](https://github.com/broadinstitute/warp/releases?q=RNAwithUMIs&expanded=true) | February, 2024 | [Elizabeth Kiernan](mailto:ekiernan@broadinstitute.org) & [Kaylee Mathews](mailto:kmathews@broadinstitute.org)| Please file GitHub issues in warp or contact [the WARP team](mailto:warp-pipelines-help@broadinstitute.org) | ![RNAWithUMIs_diagram](rna-with-umis_diagram.png) @@ -266,7 +266,11 @@ Workflow outputs are described in the table below. 
All RNA with UMIs pipeline releases are documented in the [pipeline changelog](https://github.com/broadinstitute/warp/blob/master/pipelines/broad/rna_seq/RNAWithUMIsPipeline.changelog.md). - +## Citing the RNA with UMIs Pipeline + +If you use the RNA with UMIs Pipeline in your research, please consider citing our preprint: + +Degatano, K.; Awdeh, A.; Dingman, W.; Grant, G.; Khajouei, F.; Kiernan, E.; Konwar, K.; Mathews, K.; Palis, K.; Petrillo, N.; Van der Auwera, G.; Wang, C.; Way, J.; Pipelines, W. WDL Analysis Research Pipelines: Cloud-Optimized Workflows for Biological Data Processing and Reproducible Analysis. Preprints 2024, 2024012131. https://doi.org/10.20944/preprints202401.2131.v1 ## Feedback diff --git a/website/docs/Pipelines/RNA_with_UMIs_Pipeline/rna-with-umis.methods.md b/website/docs/Pipelines/RNA_with_UMIs_Pipeline/rna-with-umis.methods.md index 7d29774ea1..856690ea74 100644 --- a/website/docs/Pipelines/RNA_with_UMIs_Pipeline/rna-with-umis.methods.md +++ b/website/docs/Pipelines/RNA_with_UMIs_Pipeline/rna-with-umis.methods.md @@ -2,13 +2,13 @@ sidebar_position: 2 --- -# RNA with UMIs v1.0.15 Methods +# RNA with UMIs v1.0.16 Methods Below we provide an example methods section for publications using the RNA with UMIs pipeline. For the complete pipeline documentation, see the [RNA with UMIs Overview](./README.md). ## Methods -Data preprocessing, gene counting, and metric calculation were performed using the RNA with UMIs v1.0.6 pipeline, which uses Picard, fgbio v1.4.0, fastp v0.20.1, FastQC v0.11.9, STAR v2.7.10a, Samtools v1.11, UMI-tools v1.1.1, GATK, and RNA-SeQC v2.4.2 with default tool parameters unless otherwise specified. Reference files are publicly available in the [Broad References](https://console.cloud.google.com/storage/browser/gcp-public-data--broad-references;tab=objects?pageState=(%22StorageObjectListTable%22:(%22f%22:%22%255B%255D%22))&prefix=&forceOnObjectsSortingFiltering=false) Google Bucket and are also listed in [example configuration files](https://github.com/broadinstitute/warp/tree/develop/pipelines/broad/rna_seq/test_inputs) in the in the WARP repository. +Data preprocessing, gene counting, and metric calculation were performed using the RNA with UMIs v1.0.16 pipeline, which uses Picard, fgbio v1.4.0, fastp v0.20.1, FastQC v0.11.9, STAR v2.7.10a, Samtools v1.11, UMI-tools v1.1.1, GATK 4.5.0.0, and RNA-SeQC v2.4.2 with default tool parameters unless otherwise specified. Reference files are publicly available in the [Broad References](https://console.cloud.google.com/storage/browser/gcp-public-data--broad-references;tab=objects?pageState=(%22StorageObjectListTable%22:(%22f%22:%22%255B%255D%22))&prefix=&forceOnObjectsSortingFiltering=false) Google Bucket and are also listed in [example configuration files](https://github.com/broadinstitute/warp/tree/develop/pipelines/broad/rna_seq/test_inputs) in the WARP repository. Paired-end FASTQ files were first converted to an unmapped BAM (uBAM) using Picard's (v3.0.0) FastqToSam tool with SORT_ORDER = unsorted. (If a read group unmapped BAM file is used as input for the pipeline, this step is skipped.) Unique molecular identifiers (UMIs) were extracted from the uBAM using fgbio's ExtractUmisFromBam and stored in the RX read tag. 
@@ -16,9 +16,9 @@ After the extraction of UMIs, reads that failed quality control checks performed Reads were aligned using STAR to the GRCh38 (hg38) reference with HLA, ALT, and decoy contigs removed with gene annotations from GENCODE v34 (or GRCh37 [hg19] with gene annotations from GENCODE v19). The --readFilesType and --readFilesCommand parameters were set to "SAM PE" and "samtools view -h", respectively, to indicate that the input was a BAM file. To specify that the output was an unsorted BAM that included unmapped reads, --outSAMtype was set to "BAM Unsorted" and --outSAMunmapped was set to "Within". A transcriptome-aligned BAM was also output with --quantMode = TranscriptomeSAM. To match [ENCODE bulk RNA-seq data standards](https://www.encodeproject.org/data-standards/rna-seq/long-rnas/), the alignment was performed with parameters --outFilterType = BySJout, --outFilterMultimapNmax = 20, --outFilterMismatchNmax = 999, --alignIntronMin = 20, --alignIntronMax = 1000000, --alignMatesGapMax = 1000000, --alignSJoverhangMin = 8, and --alignSJDBoverhangMin = 1. The fraction of reads required to match the reference was set with --outFilterMatchNminOverLread = 0.33 and the fraction of allowable mismatches to read length was set with --outFilterMismatchNoverLmax = 0.1. Chimeric alignments were included with --chimSegmentMin = 15, where 15 was the minimum length of each segment, and --chimMainSegmentMultNmax = 1 to prevent main chimeric segments from mapping to multiple sites. To output chimeric segments with soft-clipping in the aligned BAM, --chimOutType was set to "WithinBAM SoftClip". A maximum of 20 protruding bases at the ends of alignments was allowed with --alignEndsProtrude set to "20 ConcordantPair" to prevent reads from small cDNA fragments that were sequenced into adapters from being dropped. -Following alignment, both BAM files were sorted by coordinate with Picard's (v2.6.11) SortSam tool. UMI-tools was then used to further divide putative duplicates into subgroups based on UMI and sequencing errors in UMIs were corrected. To specify the tag where the UMIs were stored, --extract-umi-method was set to "tag" and --umi-tag was set to "RX". Unmapped reads were included in the output file with --unmapped-reads = use. Tagged BAM files were output using the option --output-bam. SortSam was used again to sort the BAM files by queryname for Picard's (v2.26.11) MarkDuplicates tool. MarkDuplicates was used to mark PCR duplicates and calculate duplicate metrics. After duplicate marking, BAM files were sorted by coordiante using SortSam to facilitate downstream analysis. The transcriptome-aligned, duplicate-marked BAM was sorted and postprocessed using GATK's (v4.2.6.0) PostProcessReadsForRSEM tool for compatability with RSEM. +Following alignment, both BAM files were sorted by coordinate with Picard's (v2.6.11) SortSam tool. UMI-tools was then used to further divide putative duplicates into subgroups based on UMI and sequencing errors in UMIs were corrected. To specify the tag where the UMIs were stored, --extract-umi-method was set to "tag" and --umi-tag was set to "RX". Unmapped reads were included in the output file with --unmapped-reads = use. Tagged BAM files were output using the option --output-bam. SortSam was used again to sort the BAM files by queryname for Picard's (v2.26.11) MarkDuplicates tool. MarkDuplicates was used to mark PCR duplicates and calculate duplicate metrics. 
After duplicate marking, BAM files were sorted by coordinate using SortSam to facilitate downstream analysis. The transcriptome-aligned, duplicate-marked BAM was sorted and postprocessed using GATK's PostProcessReadsForRSEM tool for compatibility with RSEM. -The genome-aligned, duplicate-marked BAM file was then used to calculate summary metrics using RNASeQC, Picard's (v2.26.11) CollectRNASeqMetrics and (v3.0.0) CollectMultipleMetrics tools, and GATK's (v4.3.0.0) GetPileupSummaries and CalculateContamination tools. CollectMultipleMetrics was used with the programs “CollectInsertSizeMetrics” and “CollectAlignmentSummaryMetrics”. GetPileupSummaries was run with the read filters, "WellformedReadFilter" and "MappingQualityAvailableReadFilter" disabled. +The genome-aligned, duplicate-marked BAM file was then used to calculate summary metrics using RNASeQC, Picard's (v2.26.11) CollectRNASeqMetrics and (v3.0.0) CollectMultipleMetrics tools, and GATK's GetPileupSummaries and CalculateContamination tools. CollectMultipleMetrics was used with the programs “CollectInsertSizeMetrics” and “CollectAlignmentSummaryMetrics”. GetPileupSummaries was run with the read filters, "WellformedReadFilter" and "MappingQualityAvailableReadFilter" disabled. The final outputs of the RNA with UMIs pipeline included metrics generated before alignment with FastQC, a transcriptome-aligned, duplicate-marked BAM file with duplication metrics, and a genome-aligned, duplicate-marked BAM file with corresponding index, duplication metrics, and metrics generated with RNASeQC, Picard, and GATK tools. diff --git a/website/docs/Pipelines/Single_Cell_ATAC_Seq_Pipeline/README.md b/website/docs/Pipelines/Single_Cell_ATAC_Seq_Pipeline/README.md index d61e98fcb6..038463eb60 100644 --- a/website/docs/Pipelines/Single_Cell_ATAC_Seq_Pipeline/README.md +++ b/website/docs/Pipelines/Single_Cell_ATAC_Seq_Pipeline/README.md @@ -157,9 +157,15 @@ The following table details the metrics available in the output_snap_qc file. All scATAC workflow releases are documented in the [scATAC changelog](https://github.com/broadinstitute/warp/blob/develop/pipelines/skylab/scATAC/scATAC.changelog.md). ## Citing the scATAC Pipeline -Please identify the pipeline in your methods section using the scATAC Pipeline's [SciCrunch resource identifier](https://scicrunch.org/scicrunch/Resources/record/nlx_144509-1/SCR_018919/resolver?q=SCR_018919&l=SCR_018919). + +If you use the scATAC Pipeline in your research, please identify the pipeline in your methods section using the [scATAC SciCrunch resource identifier](https://scicrunch.org/resources/data/record/nlx_144509-1/SCR_018919/resolver?q=SCR_018919&l=SCR_018919&i=rrid:scr_018919). + * Ex: *scATAC Pipeline (RRID:SCR_018919)* +Please also consider citing our preprint: + +Degatano, K.; Awdeh, A.; Dingman, W.; Grant, G.; Khajouei, F.; Kiernan, E.; Konwar, K.; Mathews, K.; Palis, K.; Petrillo, N.; Van der Auwera, G.; Wang, C.; Way, J.; Pipelines, W. WDL Analysis Research Pipelines: Cloud-Optimized Workflows for Biological Data Processing and Reproducible Analysis. Preprints 2024, 2024012131. https://doi.org/10.20944/preprints202401.2131.v1 + ## Consortia Support This pipeline is supported and used by the [BRAIN Initiative Cell Census Network](https://biccn.org/) (BICCN). 
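For reference, the UMI-tools options named in the rna-with-umis.methods diff above (--extract-umi-method tag, --umi-tag RX, --unmapped-reads use, --output-bam) belong to the umi_tools group command, and wrap into a task roughly as sketched below; the task name, the --paired flag, the docker image, and the resource values are assumptions rather than the pipeline's own task.

version 1.0

# Sketch of the UMI grouping step from the RNA with UMIs methods above. The
# umi_tools flags are those named in the methods text; everything else here
# (task name, --paired, image, resources) is an illustrative assumption.
task UmiToolsGroupSketch {
  input {
    File coord_sorted_bam   # coordinate-sorted, genome-aligned BAM
    String output_basename
  }
  command <<<
    set -euo pipefail
    cp ~{coord_sorted_bam} input.bam
    samtools index input.bam   # umi_tools group expects an indexed BAM
    umi_tools group \
      -I input.bam \
      --paired \
      --extract-umi-method tag \
      --umi-tag RX \
      --unmapped-reads use \
      --output-bam \
      -S ~{output_basename}.grouped.bam
  >>>
  output {
    File grouped_bam = "~{output_basename}.grouped.bam"
  }
  runtime {
    docker: "umi-tools-placeholder:1.1.1"  # substitute the pipeline's image
    memory: "16 GiB"
    disks: "local-disk 100 HDD"
  }
}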
diff --git a/website/docs/Pipelines/SlideSeq_Pipeline/README.md b/website/docs/Pipelines/SlideSeq_Pipeline/README.md index f0132571db..0b59323acf 100644 --- a/website/docs/Pipelines/SlideSeq_Pipeline/README.md +++ b/website/docs/Pipelines/SlideSeq_Pipeline/README.md @@ -221,9 +221,15 @@ All Slide-seq pipeline releases are documented in the [Slide-seq changelog](http ## Citing the Slide-seq Pipeline -Please identify the pipeline in your methods section using the Slide-seq Pipeline's [SciCrunch resource identifier](https://scicrunch.org/resources/data/record/nlx_144509-1/SCR_023379/resolver?q=%22Slide-seq%22&l=%22Slide-seq%22&i=rrid:scr_023379). + +If you use the Slide-seq Pipeline in your research, please identify the pipeline in your methods section using the [Slide-seq SciCrunch resource identifier](https://scicrunch.org/resources/data/record/nlx_144509-1/SCR_023379/resolver?q=%22Slide-seq%22&l=%22Slide-seq%22&i=rrid:scr_023379). + * Ex: *Slide-seq Pipeline (RRID:SCR_023379)* +Please also consider citing our preprint: + +Degatano, K.; Awdeh, A.; Dingman, W.; Grant, G.; Khajouei, F.; Kiernan, E.; Konwar, K.; Mathews, K.; Palis, K.; Petrillo, N.; Van der Auwera, G.; Wang, C.; Way, J.; Pipelines, W. WDL Analysis Research Pipelines: Cloud-Optimized Workflows for Biological Data Processing and Reproducible Analysis. Preprints 2024, 2024012131. https://doi.org/10.20944/preprints202401.2131.v1 + ## Consortia support This pipeline is supported by the [BRAIN Initiative Cell Census Network](https://biccn.org/) (BICCN) and BRAIN Initiative Cell Atlas Network (BICAN). diff --git a/website/docs/Pipelines/Smart-seq2_Multi_Sample_Pipeline/README.md b/website/docs/Pipelines/Smart-seq2_Multi_Sample_Pipeline/README.md index 1f069a419d..1a6368c014 100644 --- a/website/docs/Pipelines/Smart-seq2_Multi_Sample_Pipeline/README.md +++ b/website/docs/Pipelines/Smart-seq2_Multi_Sample_Pipeline/README.md @@ -98,9 +98,15 @@ The Multi-SS2 Pipeline has been validated for processing human and mouse, strand Release information for the Multi-SS2 Pipeline can be found in the [changelog](https://github.com/broadinstitute/warp/blob/master/pipelines/skylab/smartseq2_multisample/MultiSampleSmartSeq2.changelog.md). Please note that any major changes to the Smart-seq2 pipeline will be documented in the [Smart-seq2 Single Sample changelog](https://github.com/broadinstitute/warp/blob/master/pipelines/skylab/smartseq2_single_sample/SmartSeq2SingleSample.changelog.md). ## Citing the Smart-seq2 Multi-Sample Pipeline -Please identify the pipeline in your methods section using the Smart-seq2 Multi-Sample Pipeline's [SciCrunch resource identifier](https://scicrunch.org/scicrunch/Resources/record/nlx_144509-1/SCR_018920/resolver?q=Smart-seq2&l=Smart-seq2). + +If you use the Smart-seq2 Multi-Sample Pipeline in your research, please identify the pipeline in your methods section using the [Smart-seq2 Multi-Sample SciCrunch resource identifier](https://scicrunch.org/resources/data/record/nlx_144509-1/SCR_018920/resolver?q=SCR_018920&l=SCR_018920&i=rrid:scr_018920). + * Ex: *Smart-seq2 Multi-Sample Pipeline (RRID:SCR_018920)* +Please also consider citing our preprint: + +Degatano, K.; Awdeh, A.; Dingman, W.; Grant, G.; Khajouei, F.; Kiernan, E.; Konwar, K.; Mathews, K.; Palis, K.; Petrillo, N.; Van der Auwera, G.; Wang, C.; Way, J.; Pipelines, W. WDL Analysis Research Pipelines: Cloud-Optimized Workflows for Biological Data Processing and Reproducible Analysis. Preprints 2024, 2024012131. 
https://doi.org/10.20944/preprints202401.2131.v1 + ## Consortia Support This pipeline is supported and used by the [Human Cell Atlas](https://www.humancellatlas.org/) (HCA) project. diff --git a/website/docs/Pipelines/Smart-seq2_Single_Nucleus_Multi_Sample_Pipeline/README.md b/website/docs/Pipelines/Smart-seq2_Single_Nucleus_Multi_Sample_Pipeline/README.md index 4cb42c4cf6..09acab0beb 100644 --- a/website/docs/Pipelines/Smart-seq2_Single_Nucleus_Multi_Sample_Pipeline/README.md +++ b/website/docs/Pipelines/Smart-seq2_Single_Nucleus_Multi_Sample_Pipeline/README.md @@ -177,10 +177,14 @@ The Multi-snSS2 pipeline was scientifically validated by the BRAIN Initiatives C All Multi-snSS2 release notes are documented in the [Multi-snSS2 changelog](https://github.com/broadinstitute/warp/blob/master/pipelines/skylab/smartseq2_single_nucleus_multisample/MultiSampleSmartSeq2SingleNucleus.changelog.md). ## Citing the Multi-snSS2 Pipeline -To cite the Multi-snSS2 pipeline, use the [SciCrunch resource identifier](https://scicrunch.org/scicrunch/Resources/record/nlx_144509-1/SCR_021312/resolver). + +If you use the Multi-snSS2 Pipeline in your research, please identify the pipeline in your methods section using the [Multi-snSS2 SciCrunch resource identifier](https://scicrunch.org/resources/data/record/nlx_144509-1/SCR_021312/resolver?q=SCR_021312&l=SCR_021312&i=rrid:scr_021312). + * Ex: *Smart-seq2 Single Nucleus Multi-Sample Pipeline (RRID:SCR_021312)* -To view an example of this citation as well as a publication-style methods section, see the Multi-snSS2 [Example Methods](./multi_snss2.methods.md). +Please also consider citing our preprint: + +Degatano, K.; Awdeh, A.; Dingman, W.; Grant, G.; Khajouei, F.; Kiernan, E.; Konwar, K.; Mathews, K.; Palis, K.; Petrillo, N.; Van der Auwera, G.; Wang, C.; Way, J.; Pipelines, W. WDL Analysis Research Pipelines: Cloud-Optimized Workflows for Biological Data Processing and Reproducible Analysis. Preprints 2024, 2024012131. https://doi.org/10.20944/preprints202401.2131.v1 ## Consortia Support This pipeline is supported and used by the [BRAIN Initiative Cell Census Network](https://biccn.org/) (BICCN). 
diff --git a/website/docs/Pipelines/Smart-seq2_Single_Sample_Pipeline/README.md b/website/docs/Pipelines/Smart-seq2_Single_Sample_Pipeline/README.md index 080b79071f..214484949e 100644 --- a/website/docs/Pipelines/Smart-seq2_Single_Sample_Pipeline/README.md +++ b/website/docs/Pipelines/Smart-seq2_Single_Sample_Pipeline/README.md @@ -7,7 +7,7 @@ slug: /Pipelines/Smart-seq2_Single_Sample_Pipeline/README | Pipeline Version | Date Updated | Documentation Author | Questions or Feedback | | :----: | :---: | :----: | :--------------: | -| [smartseq2_v5.1.1](https://github.com/broadinstitute/warp/releases) | December, 2020 | [Elizabeth Kiernan](mailto:ekiernan@broadinstitute.org) | Please file GitHub issues in WARP or contact [the WARP team](mailto:warp-pipelines-help@broadinstitute.org) | +| [smartseq2_v5.1.20](https://github.com/broadinstitute/warp/releases) | February, 2024 | [Elizabeth Kiernan](mailto:ekiernan@broadinstitute.org) | Please file GitHub issues in WARP or contact [the WARP team](mailto:warp-pipelines-help@broadinstitute.org) | ![](./smartseq_image.png) @@ -33,7 +33,7 @@ Check out the [Smart-seq2 Publication Methods](../Smart-seq2_Multi_Sample_Pipeli | Genomic Reference Sequence (for validation)| GRCh38 human genome primary sequence and M21 (GRCm38.p6) mouse genome primary sequence | GENCODE [human reference files](https://www.gencodegenes.org/human/release_27.html) and [mouse reference files](https://www.gencodegenes.org/mouse/release_M21.html) | Transcriptomic Reference Annotation (for validation) | V27 GENCODE human transcriptome and M21 mouse transcriptome | GENCODE [human GTF](ftp://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_27/gencode.v27.annotation.gtf.gz) and [mouse GTF](ftp://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_mouse/release_M21/gencode.vM21.annotation.gff3.gz) | | Aligner | HISAT2 (v.2.1.0) | [Kim, et al.,2019](https://www.nature.com/articles/s41587-019-0201-4) | -| QC Metrics | Picard (v.2.10.10) | [Broad Institute](https://broadinstitute.github.io/picard/) | +| QC Metrics | Picard (v.2.26.10) | [Broad Institute](https://broadinstitute.github.io/picard/) | | Transcript Quantification | Utilities for processing large-scale single cell datasets | [RSEM v.1.3.0](https://bmcbioinformatics.biomedcentral.com/articles/10.1186/1471-2105-12-323) | Data Input File Format | File format in which sequencing data is provided | [FASTQ](https://academic.oup.com/nar/article/38/6/1767/3112533) | | Data Output File Formats | File formats in which Smart-seq2 output is provided | [BAM](http://samtools.github.io/hts-specs/), Loom (generated with [Loompy v.3.0.6)](http://loompy.org/), CSV (QC metrics and counts) | @@ -99,7 +99,7 @@ Overall, the workflow is divided into two parts that are completed after an init **Part 1: Quality Control Tasks** 1. Aligns reads to the genome with HISAT2 v.2.1.0 - 2. Calculates summary metrics from an aligned BAM using Picard v.2.10.10 + 2. Calculates summary metrics from an aligned BAM using Picard v.2.26.10 **Part 2: Transcriptome Quantification Tasks** 1. 
Aligns reads to the transcriptome with HISAT v.2.1.0 @@ -133,11 +133,11 @@ HISAT2 is a fast, cost-efficient alignment tool that can determine the presence The [Picard task](https://github.com/broadinstitute/warp/blob/master/tasks/skylab/Picard.wdl) generates QC metrics by using three sub-tasks: -* CollectMultipleMetrics: calls the [CollectMultipleMetrics](https://software.broadinstitute.org/gatk/documentation/tooldocs/4.0.0.0/picard_analysis_CollectMultipleMetrics.php) tool which uses the aligned BAM file and reference genome fasta to collect metrics on [alignment](http://broadinstitute.github.io/picard/picard-metric-definitions.html#AlignmentSummaryMetrics), [insert size](http://broadinstitute.github.io/picard/picard-metric-definitions.html#InsertSizeMetrics), [GC bias](https://broadinstitute.github.io/picard/command-line-overview.html#CollectGcBiasMetrics), [base distribution by cycle](http://broadinstitute.github.io/picard/picard-metric-definitions.html#BaseDistributionByCycleMetrics), [quality score distribution](https://broadinstitute.github.io/picard/command-line-overview.html#QualityScoreDistribution), [quality distribution by cycle](https://broadinstitute.github.io/picard/command-line-overview.html#MeanQualityByCycle), [sequencing artifacts](http://broadinstitute.github.io/picard/picard-metric-definitions.html#ErrorSummaryMetrics), and [quality yield](http://broadinstitute.github.io/picard/picard-metric-definitions.html#CollectQualityYieldMetrics.QualityYieldMetrics). +* CollectMultipleMetrics: calls the [CollectMultipleMetrics](https://software.broadinstitute.org/gatk/documentation/tooldocs/4.2.6.1/picard_analysis_CollectMultipleMetrics.php) tool which uses the aligned BAM file and reference genome fasta to collect metrics on [alignment](http://broadinstitute.github.io/picard/picard-metric-definitions.html#AlignmentSummaryMetrics), [insert size](http://broadinstitute.github.io/picard/picard-metric-definitions.html#InsertSizeMetrics), [GC bias](https://broadinstitute.github.io/picard/command-line-overview.html#CollectGcBiasMetrics), [base distribution by cycle](http://broadinstitute.github.io/picard/picard-metric-definitions.html#BaseDistributionByCycleMetrics), [quality score distribution](https://broadinstitute.github.io/picard/command-line-overview.html#QualityScoreDistribution), [quality distribution by cycle](https://broadinstitute.github.io/picard/command-line-overview.html#MeanQualityByCycle), [sequencing artifacts](http://broadinstitute.github.io/picard/picard-metric-definitions.html#ErrorSummaryMetrics), and [quality yield](http://broadinstitute.github.io/picard/picard-metric-definitions.html#CollectQualityYieldMetrics.QualityYieldMetrics). -* CollectRnaMetrics: calls the [CollectRnaSeqMetrics](https://software.broadinstitute.org/gatk/documentation/tooldocs/4.0.0.0/picard_analysis_CollectRnaSeqMetrics.php) tool which uses the aligned BAM, a RefFlat genome annotation file, and a ribosomal intervals file to produce RNA alignment metrics (metric descriptions are found in the [Picard Metrics Dictionary](http://broadinstitute.github.io/picard/picard-metric-definitions.html#RnaSeqMetrics)). 
+* CollectRnaMetrics: calls the [CollectRnaSeqMetrics](https://software.broadinstitute.org/gatk/documentation/tooldocs/4.2.6.1/picard_analysis_CollectRnaSeqMetrics.php) tool which uses the aligned BAM, a RefFlat genome annotation file, and a ribosomal intervals file to produce RNA alignment metrics (metric descriptions are found in the [Picard Metrics Dictionary](http://broadinstitute.github.io/picard/picard-metric-definitions.html#RnaSeqMetrics)). -* CollectDuplicationMetrics: calls the [MarkDuplicates](https://software.broadinstitute.org/gatk/documentation/tooldocs/4.0.4.0/picard_sam_markduplicates_MarkDuplicates.php) tool which uses the aligned BAM to identify duplicate reads (output metrics are listed in the [Picard Metrics Dictionary](http://broadinstitute.github.io/picard/picard-metric-definitions.html#DuplicationMetrics)). +* CollectDuplicationMetrics: calls the [MarkDuplicates](https://software.broadinstitute.org/gatk/documentation/tooldocs/4.2.6.1/picard_sam_markduplicates_MarkDuplicates.php) tool which uses the aligned BAM to identify duplicate reads (output metrics are listed in the [Picard Metrics Dictionary](http://broadinstitute.github.io/picard/picard-metric-definitions.html#DuplicationMetrics)). #### Part 2: Transcriptome Quantification Tasks @@ -211,9 +211,15 @@ The SS2 pipeline has been validated for processing human and mouse, stranded or All SS2 release notes are documented in the [Smartseq2 Single Sample changelog](https://github.com/broadinstitute/warp/blob/develop/pipelines/skylab/smartseq2_single_sample/SmartSeq2SingleSample.changelog.md). ## Citing the Smart-seq2 Single Sample Pipeline -Please identify the SS2 pipeline in your methods section using the Smart-seq2 Single Sample Pipeline's [SciCrunch resource identifier](https://scicrunch.org/browse/resourcedashboard). + +If you use the Smart-seq2 Single Sample Pipeline in your research, please identify the pipeline in your methods section using the [Smart-seq2 Single Sample SciCrunch resource identifier](https://scicrunch.org/resources/data/record/nlx_144509-1/SCR_021228/resolver?q=SCR_021228&l=SCR_021228&i=rrid:scr_021228). + * Ex: *Smart-seq2 Single Sample Pipeline (RRID:SCR_021228)* +Please also consider citing our preprint: + +Degatano, K.; Awdeh, A.; Dingman, W.; Grant, G.; Khajouei, F.; Kiernan, E.; Konwar, K.; Mathews, K.; Palis, K.; Petrillo, N.; Van der Auwera, G.; Wang, C.; Way, J.; Pipelines, W. WDL Analysis Research Pipelines: Cloud-Optimized Workflows for Biological Data Processing and Reproducible Analysis. Preprints 2024, 2024012131. https://doi.org/10.20944/preprints202401.2131.v1 + ## Consortia Support This pipeline is supported and used by the [Human Cell Atlas](https://www.humancellatlas.org/) (HCA) project. diff --git a/website/docs/Pipelines/Ultima_Genomics_Whole_Genome_Germline_Pipeline/README.md b/website/docs/Pipelines/Ultima_Genomics_Whole_Genome_Germline_Pipeline/README.md index 90c81b2b95..923b419675 100644 --- a/website/docs/Pipelines/Ultima_Genomics_Whole_Genome_Germline_Pipeline/README.md +++ b/website/docs/Pipelines/Ultima_Genomics_Whole_Genome_Germline_Pipeline/README.md @@ -272,7 +272,11 @@ The outputs of the UG_WGS workflow are not yet compatible with the WARP [Ultimat All UG_WGS pipeline releases are documented in the [pipeline changelog](https://github.com/broadinstitute/warp/blob/develop/pipelines/broad/dna_seq/germline/single_sample/ugwgs/UltimaGenomicsWholeGenomeGermline.changelog.md). 
- +## Citing the UG_WGS Pipeline + +If you use the UG_WGS Pipeline in your research, please consider citing our preprint: + +Degatano, K.; Awdeh, A.; Dingman, W.; Grant, G.; Khajouei, F.; Kiernan, E.; Konwar, K.; Mathews, K.; Palis, K.; Petrillo, N.; Van der Auwera, G.; Wang, C.; Way, J.; Pipelines, W. WDL Analysis Research Pipelines: Cloud-Optimized Workflows for Biological Data Processing and Reproducible Analysis. Preprints 2024, 2024012131. https://doi.org/10.20944/preprints202401.2131.v1 ## Feedback diff --git a/website/docs/Pipelines/Whole_Genome_Germline_Single_Sample_Pipeline/README.md b/website/docs/Pipelines/Whole_Genome_Germline_Single_Sample_Pipeline/README.md index f34182b974..bec572c824 100644 --- a/website/docs/Pipelines/Whole_Genome_Germline_Single_Sample_Pipeline/README.md +++ b/website/docs/Pipelines/Whole_Genome_Germline_Single_Sample_Pipeline/README.md @@ -369,6 +369,12 @@ The final CRAM files have base quality scores binned according to the [Functiona - When the pipeline runs in the **dragen_functional_equivalence_mode**, it produces functionally equivalent outputs to the DRAGEN pipeline. - Additional information about the GATK tool parameters and the DRAGEN-GATK best practices pipeline can be found on the [GATK support site](https://gatk.broadinstitute.org/hc/en-us). +## Citing the WGS Pipeline + +If you use the WGS Pipeline in your research, please consider citing our preprint: + +Degatano, K.; Awdeh, A.; Dingman, W.; Grant, G.; Khajouei, F.; Kiernan, E.; Konwar, K.; Mathews, K.; Palis, K.; Petrillo, N.; Van der Auwera, G.; Wang, C.; Way, J.; Pipelines, W. WDL Analysis Research Pipelines: Cloud-Optimized Workflows for Biological Data Processing and Reproducible Analysis. Preprints 2024, 2024012131. https://doi.org/10.20944/preprints202401.2131.v1 + ## Contact us Please help us make our tools better by contacting [the WARP team](mailto:warp-pipelines-help@broadinstitute.org) for pipeline-related suggestions or questions. diff --git a/website/docs/Pipelines/snM3C/README.md b/website/docs/Pipelines/snM3C/README.md index 22486f6cdd..99765c5c2b 100644 --- a/website/docs/Pipelines/snM3C/README.md +++ b/website/docs/Pipelines/snM3C/README.md @@ -6,7 +6,7 @@ slug: /Pipelines/snM3C/README | Pipeline Version | Date Updated | Documentation Authors | Questions or Feedback | | :----: | :---: | :----: | :--------------: | -| [snM3C_v2.0.0](https://github.com/broadinstitute/warp/releases) | February, 2024 | [Kaylee Mathews](mailto:warp-pipelines-help@broadinsitute.org) | Please file GitHub issues in the [WARP repository](https://github.com/broadinstitute/warp/issues) | +| [snM3C_v2.0.1](https://github.com/broadinstitute/warp/releases) | February, 2024 | [Kaylee Mathews](mailto:warp-pipelines-help@broadinstitute.org) | Please file GitHub issues in the [WARP repository](https://github.com/broadinstitute/warp/issues) | ## Introduction to snM3C @@ -160,6 +160,12 @@ The following table lists the output variables and files produced by the pipelin All snM3C pipeline releases are documented in the [pipeline changelog](https://github.com/broadinstitute/warp/blob/develop/pipelines/skylab/snM3C/snM3C.changelog.md). +## Citing the snM3C Pipeline + +If you use the snM3C Pipeline in your research, please consider citing our preprint: + +Degatano, K.; Awdeh, A.; Dingman, W.; Grant, G.; Khajouei, F.; Kiernan, E.; Konwar, K.; Mathews, K.; Palis, K.; Petrillo, N.; Van der Auwera, G.; Wang, C.; Way, J.; Pipelines, W.
WDL Analysis Research Pipelines: Cloud-Optimized Workflows for Biological Data Processing and Reproducible Analysis. Preprints 2024, 2024012131. https://doi.org/10.20944/preprints202401.2131.v1 + ## Consortia support This pipeline is supported by the [BRAIN Initiative](https://braininitiative.nih.gov/) (BICCN and BICAN). diff --git a/website/docs/get-started.md b/website/docs/get-started.md index c42a4a06ae..0c6a77bc70 100755 --- a/website/docs/get-started.md +++ b/website/docs/get-started.md @@ -101,7 +101,7 @@ Our planned upcoming improvements include: ## Citing WARP When citing WARP, please use the following: -Degatano K, Grant G, Khajouei F et al. Introducing WARP: A collection of cloud-optimized workflows for biological data processing and reproducible analysis [version 1; not peer reviewed]. F1000Research 2021, 10(ISCB Comm J):705 (slides) (doi: 10.7490/f1000research.1118678.1) +Degatano, K.; Awdeh, A.; Dingman, W.; Grant, G.; Khajouei, F.; Kiernan, E.; Konwar, K.; Mathews, K.; Palis, K.; Petrillo, N.; Van der Auwera, G.; Wang, C.; Way, J.; Pipelines, W. WDL Analysis Research Pipelines: Cloud-Optimized Workflows for Biological Data Processing and Reproducible Analysis. Preprints 2024, 2024012131. https://doi.org/10.20944/preprints202401.2131.v1 ## Acknowledgements From 5c81acd6ea63f0effa24f27e4975e70d541390a5 Mon Sep 17 00:00:00 2001 From: ekiernan Date: Fri, 1 Mar 2024 11:22:18 -0500 Subject: [PATCH 57/68] new docker for mergestaroutputs --- tasks/skylab/StarAlign.wdl | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/tasks/skylab/StarAlign.wdl b/tasks/skylab/StarAlign.wdl index 2107836c16..ca2c543d4a 100644 --- a/tasks/skylab/StarAlign.wdl +++ b/tasks/skylab/StarAlign.wdl @@ -480,7 +480,7 @@ task MergeStarOutput { String input_id #runtime values - String docker = "us.gcr.io/broad-gotc-prod/warp-tools:2.0.2-1709218388" + String docker = "us.gcr.io/broad-gotc-prod/warp-tools:2.0.2-1709308985" Int machine_mem_gb = 20 Int cpu = 1 Int disk = ceil(size(matrix, "Gi") * 2) + 10 From bfe4052a11cde40c585d71b377fb202ea3c588f8 Mon Sep 17 00:00:00 2001 From: ekiernan Date: Fri, 1 Mar 2024 11:45:40 -0500 Subject: [PATCH 58/68] wdl versions --- pipelines/skylab/paired_tag/PairedTag.wdl | 2 +- pipelines/skylab/slideseq/SlideSeq.wdl | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/pipelines/skylab/paired_tag/PairedTag.wdl b/pipelines/skylab/paired_tag/PairedTag.wdl index e6dc936ba7..05aa4867ee 100644 --- a/pipelines/skylab/paired_tag/PairedTag.wdl +++ b/pipelines/skylab/paired_tag/PairedTag.wdl @@ -5,7 +5,7 @@ import "../../../pipelines/skylab/optimus/Optimus.wdl" as optimus import "../../../tasks/skylab/H5adUtils.wdl" as H5adUtils import "../../../tasks/skylab/PairedTagUtils.wdl" as Demultiplexing workflow PairedTag { - String pipeline_version = "0.2.0" + String pipeline_version = "0.3.0" input { String input_id diff --git a/pipelines/skylab/slideseq/SlideSeq.wdl b/pipelines/skylab/slideseq/SlideSeq.wdl index 66f6001da8..4f241b24d8 100644 --- a/pipelines/skylab/slideseq/SlideSeq.wdl +++ b/pipelines/skylab/slideseq/SlideSeq.wdl @@ -23,7 +23,7 @@ import "../../../tasks/skylab/MergeSortBam.wdl" as Merge workflow SlideSeq { - String pipeline_version = "3.1.1" + String pipeline_version = "3.1.2" input { Array[File] r1_fastq From 74cfd4fb5e81d57a4e8c57e35a30187da8920b7c Mon Sep 17 00:00:00 2001 From: ekiernan <55763654+ekiernan@users.noreply.github.com> Date: Fri, 1 Mar 2024 18:48:00 -0500 Subject: [PATCH 59/68] Update 
website/docs/Pipelines/PairedTag_Pipeline/README.md Co-authored-by: Kaylee Mathews <95316074+kayleemathews@users.noreply.github.com> --- website/docs/Pipelines/PairedTag_Pipeline/README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/website/docs/Pipelines/PairedTag_Pipeline/README.md b/website/docs/Pipelines/PairedTag_Pipeline/README.md index 26f11b45af..9be2233657 100644 --- a/website/docs/Pipelines/PairedTag_Pipeline/README.md +++ b/website/docs/Pipelines/PairedTag_Pipeline/README.md @@ -7,7 +7,7 @@ slug: /Pipelines/PairedTag_Pipeline/README | Pipeline Version | Date Updated | Documentation Author | Questions or Feedback | | :----: | :---: | :----: | :--------------: | -| [PairedTag_v0.2.0](https://github.com/broadinstitute/warp/releases) | February, 2024 | Kaylee Mathews | Please file GitHub issues in warp or contact [documentation authors](mailto:warp-pipelines-help@broadinstitute.org) | +| [PairedTag_v0.3.0](https://github.com/broadinstitute/warp/releases) | February, 2024 | Kaylee Mathews | Please file GitHub issues in warp or contact [documentation authors](mailto:warp-pipelines-help@broadinstitute.org) | ## Introduction to the Paired-Tag workflow From b85db16482d2b33d5b06cf65bd008654aa14444c Mon Sep 17 00:00:00 2001 From: ekiernan <55763654+ekiernan@users.noreply.github.com> Date: Fri, 1 Mar 2024 18:48:16 -0500 Subject: [PATCH 60/68] Update website/docs/Pipelines/SlideSeq_Pipeline/README.md Co-authored-by: Kaylee Mathews <95316074+kayleemathews@users.noreply.github.com> --- website/docs/Pipelines/SlideSeq_Pipeline/README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/website/docs/Pipelines/SlideSeq_Pipeline/README.md b/website/docs/Pipelines/SlideSeq_Pipeline/README.md index f3c45486a0..ffccb6c445 100644 --- a/website/docs/Pipelines/SlideSeq_Pipeline/README.md +++ b/website/docs/Pipelines/SlideSeq_Pipeline/README.md @@ -7,7 +7,7 @@ slug: /Pipelines/SlideSeq_Pipeline/README | Pipeline Version | Date Updated | Documentation Author | Questions or Feedback | | :----: | :---: | :----: | :--------------: | -| [SlideSeq v3.1.1](https://github.com/broadinstitute/warp/releases) | February, 2024 | Elizabeth Kiernan & Kaylee Mathews | Please file GitHub issues in warp or contact [documentation authors](mailto:warp-pipelines-help@broadinstitute.org) | +| [SlideSeq v3.1.2](https://github.com/broadinstitute/warp/releases) | February, 2024 | Elizabeth Kiernan & Kaylee Mathews | Please file GitHub issues in warp or contact [documentation authors](mailto:warp-pipelines-help@broadinstitute.org) | ![SlideSeq_diagram](./slide-seq_diagram.png) From 920bf10f2fc009c3c149c18e5281183c29a3a037 Mon Sep 17 00:00:00 2001 From: Nikelle Petrillo <38223776+nikellepetrillo@users.noreply.github.com> Date: Wed, 6 Mar 2024 15:24:34 -0500 Subject: [PATCH 61/68] Snm3c hackathon branch (#1222) * Lk fix repeat index snm3c (#1217) --attempting to not break the snm3c branch * snm3c hackathon (#1218) added aa's changes * fix summary task and add pairedend optimizations * fix summary task and add pairedend optimizations * Update snM3C.wdl * fix summary task and add pairedend optimizations * fix summary task and add pairedend optimizations * fix summary task and add pairedend optimizations * fix summary task and add pairedend optimizations * fix summary task and add pairedend optimizations * add paired end optimizations to hackathon branch (#1220) * 4 threads * 128 threads * samtoools threads * samtoools threads * update * Update pipelines/skylab/snM3C/snM3C.wdl Co-authored-by: 
ekiernan <55763654+ekiernan@users.noreply.github.com> --------- Co-authored-by: ekiernan <55763654+ekiernan@users.noreply.github.com> * re-wire the outputs * re-wire the outputs * re-wire the outputs * re-wire the outputs * re-wire the outputs * cascade lake * stop using ice lake as default * stop using ice lake as default * stop using ice lake as default * stop using ice lake as default * Update README.md * reverted snm3c hisat to --no-repeat-index * Update snM3C.wdl --------- Co-authored-by: ekiernan <55763654+ekiernan@users.noreply.github.com> Co-authored-by: ekiernan Co-authored-by: kayleemathews --- pipelines/skylab/snM3C/snM3C.changelog.md | 5 + pipelines/skylab/snM3C/snM3C.wdl | 753 +++++++++--------- .../test_inputs/Plumbing/miseq_M16_G13.json | 4 +- verification/test-wdls/TestsnM3C.wdl | 9 +- website/docs/Pipelines/snM3C/README.md | 43 +- 5 files changed, 432 insertions(+), 382 deletions(-) diff --git a/pipelines/skylab/snM3C/snM3C.changelog.md b/pipelines/skylab/snM3C/snM3C.changelog.md index dc90a21239..a7901c38ee 100644 --- a/pipelines/skylab/snM3C/snM3C.changelog.md +++ b/pipelines/skylab/snM3C/snM3C.changelog.md @@ -1,3 +1,8 @@ +# 3.0.0 +2024-02-23 (Date of Last Commit) + +* Updated the snM3C docker to include the latest changes to the CEMBA repository; this impacts the scientific outputs + # 2.0.1 2024-2-15 (Date of Last Commit) diff --git a/pipelines/skylab/snM3C/snM3C.wdl b/pipelines/skylab/snM3C/snM3C.wdl index bcdc71a861..17990ac2ed 100644 --- a/pipelines/skylab/snM3C/snM3C.wdl +++ b/pipelines/skylab/snM3C/snM3C.wdl @@ -23,11 +23,13 @@ workflow snM3C { Int num_downstr_bases = 2 Int compress_level = 5 Int batch_number - + String docker = "us.gcr.io/broad-gotc-prod/m3c-yap-hisat:1.0.0-2.2.1" + String single_end_hisat_cpu_platform = "Intel Ice Lake" + String merge_sort_analyze_cpu_platform = "Intel Ice Lake" } # version of the pipeline - String pipeline_version = "2.0.1" + String pipeline_version = "3.0.0" call Demultiplexing { input: @@ -69,36 +71,28 @@ workflow snM3C { plate_id = plate_id, } - call Hisat_single_end_r1_r2_mapping_dna_mode_and_merge_sort_split_reads_by_name_and_remove_overlap { + call hisat_single_end { input: split_fq_tar = Separate_and_split_unmapped_reads.split_fq_tar, tarred_index_files = tarred_index_files, genome_fa = genome_fa, - plate_id = plate_id - } - - call merge_original_and_split_bam_and_sort_all_reads_by_name_and_position_and_deduplicate { - input: - bam = Separate_and_split_unmapped_reads.unique_bam_tar, - split_bam = Hisat_single_end_r1_r2_mapping_dna_mode_and_merge_sort_split_reads_by_name_and_remove_overlap.remove_overlaps_output_bam_tar, - plate_id = plate_id + plate_id = plate_id, + docker = docker, + single_end_hisat_cpu_platform = single_end_hisat_cpu_platform } - call call_chromatin_contacts { + call merge_sort_analyze { input: - name_sorted_bam = merge_original_and_split_bam_and_sort_all_reads_by_name_and_position_and_deduplicate.name_sorted_bam, - plate_id = plate_id - } - - call unique_reads_allc_and_cgn_extraction { - input: - bam_and_index_tar = merge_original_and_split_bam_and_sort_all_reads_by_name_and_position_and_deduplicate.dedup_output_bam_tar, - genome_fa = genome_fa, - num_upstr_bases = num_upstr_bases, - num_downstr_bases = num_downstr_bases, - compress_level = compress_level, - plate_id = plate_id, - chromosome_sizes = chromosome_sizes + paired_end_unique_tar = Separate_and_split_unmapped_reads.unique_bam_tar, + read_overlap_tar = hisat_single_end.remove_overlaps_output_bam_tar, + genome_fa = genome_fa, +
num_upstr_bases = num_upstr_bases, + num_downstr_bases = num_downstr_bases, + compress_level = compress_level, + chromosome_sizes = chromosome_sizes, + plate_id = plate_id, + docker = docker, + merge_sort_analyze_cpu_platform = merge_sort_analyze_cpu_platform } } @@ -106,26 +100,27 @@ workflow snM3C { input: trimmed_stats = Sort_and_trim_r1_and_r2.trim_stats_tar, hisat3n_stats = Hisat_3n_pair_end_mapping_dna_mode.hisat3n_paired_end_stats_tar, - r1_hisat3n_stats = Hisat_single_end_r1_r2_mapping_dna_mode_and_merge_sort_split_reads_by_name_and_remove_overlap.hisat3n_dna_split_reads_summary_R1_tar, - r2_hisat3n_stats = Hisat_single_end_r1_r2_mapping_dna_mode_and_merge_sort_split_reads_by_name_and_remove_overlap.hisat3n_dna_split_reads_summary_R2_tar, - dedup_stats = merge_original_and_split_bam_and_sort_all_reads_by_name_and_position_and_deduplicate.dedup_stats_tar, - chromatin_contact_stats = call_chromatin_contacts.chromatin_contact_stats, - allc_uniq_reads_stats = unique_reads_allc_and_cgn_extraction.allc_uniq_reads_stats, - unique_reads_cgn_extraction_tbi = unique_reads_allc_and_cgn_extraction.extract_allc_output_tbi_tar, + r1_hisat3n_stats = hisat_single_end.hisat3n_dna_split_reads_summary_R1_tar, + r2_hisat3n_stats = hisat_single_end.hisat3n_dna_split_reads_summary_R2_tar, + dedup_stats = merge_sort_analyze.dedup_stats_tar, + chromatin_contact_stats = merge_sort_analyze.chromatin_contact_stats, + allc_uniq_reads_stats = merge_sort_analyze.allc_uniq_reads_stats, + unique_reads_cgn_extraction_tbi = merge_sort_analyze.extract_allc_output_tbi_tar, plate_id = plate_id } output { File MappingSummary = summary.mapping_summary - Array[File] name_sorted_bams = merge_original_and_split_bam_and_sort_all_reads_by_name_and_position_and_deduplicate.name_sorted_bam - Array[File] unique_reads_cgn_extraction_allc= unique_reads_allc_and_cgn_extraction.allc - Array[File] unique_reads_cgn_extraction_tbi = unique_reads_allc_and_cgn_extraction.tbi - Array[File] unique_reads_cgn_extraction_allc_extract = unique_reads_allc_and_cgn_extraction.extract_allc_output_allc_tar - Array[File] unique_reads_cgn_extraction_tbi_extract = unique_reads_allc_and_cgn_extraction.extract_allc_output_tbi_tar + Array[File] name_sorted_bams = merge_sort_analyze.name_sorted_bam + Array[File] unique_reads_cgn_extraction_allc= merge_sort_analyze.allc + Array[File] unique_reads_cgn_extraction_tbi = merge_sort_analyze.tbi Array[File] reference_version = Hisat_3n_pair_end_mapping_dna_mode.reference_version - Array[File] chromatin_contact_stats = call_chromatin_contacts.chromatin_contact_stats - Array[File] all_reads_dedup_contacts = call_chromatin_contacts.all_reads_dedup_contacts - Array[File] all_reads_3C_contacts = call_chromatin_contacts.all_reads_3C_contacts + Array[File] all_reads_dedup_contacts = merge_sort_analyze.all_reads_dedup_contacts + Array[File] all_reads_3C_contacts = merge_sort_analyze.all_reads_3C_contacts + Array[File] chromatin_contact_stats = merge_sort_analyze.chromatin_contact_stats + Array[File] unique_reads_cgn_extraction_allc_extract = merge_sort_analyze.extract_allc_output_allc_tar + Array[File] unique_reads_cgn_extraction_tbi_extract = merge_sort_analyze.extract_allc_output_tbi_tar + } } @@ -137,7 +132,7 @@ task Demultiplexing { String plate_id Int batch_number - String docker_image = "us.gcr.io/broad-gotc-prod/m3c-yap-hisat:1.0.0-2.2.1" + String docker_image = "us.gcr.io/broad-gotc-prod/hisat3n:2.1.0-2.2.1-1709740155" Int disk_size = 1000 Int mem_size = 10 Int preemptible_tries = 3 @@ -252,7 +247,7 @@ task 
Sort_and_trim_r1_and_r2 { Int disk_size = 500 Int mem_size = 16 - String docker = "us.gcr.io/broad-gotc-prod/m3c-yap-hisat:1.0.0-2.2.1" + String docker = "us.gcr.io/broad-gotc-prod/hisat3n:2.1.0-2.2.1-1709740155" Int preemptible_tries = 3 Int cpu = 4 @@ -334,11 +329,11 @@ task Hisat_3n_pair_end_mapping_dna_mode{ File chromosome_sizes String plate_id - String docker = "us.gcr.io/broad-gotc-prod/m3c-yap-hisat:1.0.0-2.2.1" + String docker = "us.gcr.io/broad-gotc-prod/hisat3n:2.1.0-2.2.1-1709740155" Int disk_size = 1000 Int mem_size = 64 Int preemptible_tries = 3 - Int cpu = 16 + Int cpu = 48 } command <<< set -euo pipefail @@ -372,28 +367,47 @@ task Hisat_3n_pair_end_mapping_dna_mode{ R1_files=($(ls | grep "\-R1_trimmed.fq.gz")) R2_files=($(ls | grep "\-R2_trimmed.fq.gz")) + echo "starting hisat" + + task() { + sample_id=$(basename "$file" "-R1_trimmed.fq.gz") + hisat-3n /cromwell_root/$genome_fa_basename \ + -q \ + -1 ${sample_id}-R1_trimmed.fq.gz \ + -2 ${sample_id}-R2_trimmed.fq.gz \ + --directional-mapping-reverse \ + --base-change C,T \ + --no-repeat-index \ + --no-spliced-alignment \ + --no-temp-splicesite \ + -t \ + --new-summary \ + --summary-file ${sample_id}.hisat3n_dna_summary.txt \ + --threads 8 | samtools view -b -q 0 -o "${sample_id}.hisat3n_dna.unsort.bam" + } + for file in "${R1_files[@]}"; do - sample_id=$(basename "$file" "-R1_trimmed.fq.gz") - hisat-3n /cromwell_root/$genome_fa_basename \ - -q \ - -1 ${sample_id}-R1_trimmed.fq.gz \ - -2 ${sample_id}-R2_trimmed.fq.gz \ - --directional-mapping-reverse \ - --base-change C,T \ - --no-repeat-index \ - --no-spliced-alignment \ - --no-temp-splicesite \ - -t \ - --new-summary \ - --summary-file ${sample_id}.hisat3n_dna_summary.txt \ - --threads 11 | samtools view -b -q 0 -o "${sample_id}.hisat3n_dna.unsort.bam" + ( + echo "starting task $file.." 
+ task "$file" + sleep $(( (RANDOM % 3) + 1)) + ) & + + if [[ $(jobs -r -p | wc -l) -ge 4 ]]; then + wait -n + fi done + # Wait for all background jobs to finish before continuing + wait + + echo "done hisat" + + echo "tarring up the outputs" # tar up the bam files and stats files tar -zcvf ~{plate_id}.hisat3n_paired_end_bam_files.tar.gz *.bam tar -zcvf ~{plate_id}.hisat3n_paired_end_stats_files.tar.gz *.hisat3n_dna_summary.txt - >>> runtime { docker: docker @@ -415,8 +429,8 @@ task Separate_and_split_unmapped_reads { Int min_read_length String plate_id - String docker = "us.gcr.io/broad-gotc-prod/m3c-yap-hisat:1.0.0-2.2.1" - Int disk_size = 200 + String docker = "us.gcr.io/broad-gotc-prod/hisat3n:2.1.0-2.2.1-1709740155" + Int disk_size = 1000 Int mem_size = 10 Int preemptible_tries = 3 Int cpu = 8 @@ -438,7 +452,7 @@ task Separate_and_split_unmapped_reads { pattern = "*.hisat3n_dna.unsort.bam" bam_files = glob.glob(os.path.join('/cromwell_root/', pattern)) - + for file in bam_files: full_filename = os.path.basename(file) @@ -512,381 +526,404 @@ task Separate_and_split_unmapped_reads { } } -task Hisat_single_end_r1_r2_mapping_dna_mode_and_merge_sort_split_reads_by_name_and_remove_overlap { +task hisat_single_end { input { File split_fq_tar File genome_fa File tarred_index_files String plate_id - String docker = "us.gcr.io/broad-gotc-prod/m3c-yap-hisat:1.0.0-2.2.1" - Int disk_size = 500 - Int mem_size = 64 - Int preemptible_tries = 3 - Int cpu = 16 + String single_end_hisat_cpu_platform + Int disk_size = 1000 + Int mem_size = 128 + Int cpu = 32 + Int preemptible_tries = 2 + String docker = "us.gcr.io/broad-gotc-prod/hisat3n:2.1.0-2.2.1-1709740155" } + command <<< set -euo pipefail - - + set -x + lscpu + # untar the tarred index files - tar -xvf ~{tarred_index_files} + echo "Untar tarred_index_files" + start=$(date +%s) + pigz -dc ~{tarred_index_files} | tar -xf - rm ~{tarred_index_files} - + end=$(date +%s) + elapsed=$((end - start)) + echo "Elapsed time to untar tarred_index_files: $elapsed seconds" + cp ~{genome_fa} . 
#get the basename of the genome_fa file + echo "samtools faidx" + start=$(date +%s) genome_fa_basename=$(basename ~{genome_fa} .fa) samtools faidx $genome_fa_basename.fa - + end=$(date +%s) + elapsed=$((end - start)) + echo "Elapsed time to samtools faidx: $elapsed seconds" + # untar the unmapped fastq files - tar -xvf ~{split_fq_tar} + echo "Untar split_fq_tar" + start=$(date +%s) + pigz -dc ~{split_fq_tar} | tar -xf - rm ~{split_fq_tar} - + end=$(date +%s) + elapsed=$((end - start)) + echo "Elapsed time to untar split_fq_tar: $elapsed seconds" + + # make directories + mkdir -p /cromwell_root/merged_sort_bams + mkdir -p /cromwell_root/read_overlap + # define lists of r1 and r2 fq files R1_files=($(ls | grep "\.hisat3n_dna.split_reads.R1.fastq")) R2_files=($(ls | grep "\.hisat3n_dna.split_reads.R2.fastq")) - for file in "${R1_files[@]}"; do - sample_id=$(basename "$file" ".hisat3n_dna.split_reads.R1.fastq") + task() { + BASE=$(basename "$file" ".hisat3n_dna.split_reads.R1.fastq") + echo $BASE + echo "Running hisat on sample_id_R1" $BASE + + echo "Hisat 3n R1" + start=$(date +%s) + + # hisat on R1 single end hisat-3n /cromwell_root/$genome_fa_basename \ -q \ - -U ${sample_id}.hisat3n_dna.split_reads.R1.fastq \ - --directional-mapping-reverse \ - --base-change C,T \ + -U ${BASE}.hisat3n_dna.split_reads.R1.fastq \ + -S ${BASE}.hisat3n_dna.split_reads.R1.sam --directional-mapping-reverse --base-change C,T \ --no-repeat-index \ --no-spliced-alignment \ --no-temp-splicesite \ -t \ --new-summary \ - --summary-file ${sample_id}.hisat3n_dna_split_reads_summary.R1.txt \ - --threads 11 | samtools view -b -q 10 -o "${sample_id}.hisat3n_dna.split_reads.R1.bam" - done - - for file in "${R2_files[@]}"; do - sample_id=$(basename "$file" ".hisat3n_dna.split_reads.R2.fastq") + --summary-file ${BASE}.hisat3n_dna_split_reads_summary.R1.txt \ + --threads 8 + + end=$(date +%s) + elapsed=$((end - start)) + echo "Elapsed time to run $elapsed seconds" + echo "Finish running hisat on sample_id_R1" $BASE + + echo "Hisat 3n R2" + start=$(date +%s) + echo "Running hisat on sample_id_R2" $BASE + + # hisat on R2 single end hisat-3n /cromwell_root/$genome_fa_basename \ -q \ - -U ${sample_id}.hisat3n_dna.split_reads.R2.fastq \ - --directional-mapping \ - --base-change C,T \ + -U ${BASE}.hisat3n_dna.split_reads.R2.fastq \ + -S ${BASE}.hisat3n_dna.split_reads.R2.sam --directional-mapping --base-change C,T \ --no-repeat-index \ --no-spliced-alignment \ --no-temp-splicesite \ -t --new-summary \ - --summary-file ${sample_id}.hisat3n_dna_split_reads_summary.R2.txt \ - --threads 11 | samtools view -b -q 10 -o "${sample_id}.hisat3n_dna.split_reads.R2.bam" - done - - # tar up the r1 and r2 stats files - tar -zcvf ~{plate_id}.hisat3n_dna_split_reads_summary.R1.tar.gz *.hisat3n_dna_split_reads_summary.R1.txt - tar -zcvf ~{plate_id}.hisat3n_dna_split_reads_summary.R2.tar.gz *.hisat3n_dna_split_reads_summary.R2.txt - - - # define lists of r1 and r2 bam files - R1_bams=($(ls | grep "\.hisat3n_dna.split_reads.R1.bam")) - R2_bams=($(ls | grep "\.hisat3n_dna.split_reads.R2.bam")) - - # Loop through the R1 BAM files - for r1_bam in "${R1_bams[@]}"; do - # Extract the corresponding R2 BAM file - r2_bam="${r1_bam/.hisat3n_dna.split_reads.R1.bam/.hisat3n_dna.split_reads.R2.bam}" - - # Define the output BAM file name - output_bam="$(basename ${r1_bam/.hisat3n_dna.split_reads.R1.bam/.hisat3n_dna.split_reads.name_sort.bam})" - - # Perform the samtools merge and sort commands - samtools merge -o - "$r1_bam" "$r2_bam" | samtools sort -n -o 
"$output_bam" - - done - - #tar up the merged bam files - tar -zcvf ~{plate_id}.hisat3n_dna.split_reads.name_sort.bam.tar.gz *.hisat3n_dna.split_reads.name_sort.bam - - # unzip bam file - tar -xf ~{plate_id}.hisat3n_dna.split_reads.name_sort.bam.tar.gz - - # create output dir - mkdir /cromwell_root/output_bams - # get bams - bams=($(ls | grep "sort.bam$")) - - # loop through bams and run python script on each bam - # scatter instead of for loop to optimize - python3 < ~{plate_id}.hisat3n_dna_split_reads_summary.R1.tar.gz + tar -cf - *.hisat3n_dna_split_reads_summary.R2.txt | pigz > ~{plate_id}.hisat3n_dna_split_reads_summary.R2.tar.gz + end=$(date +%s) + elapsed=$((end - start)) + echo "Elapsed time to run tar summary text files $elapsed seconds" + + # tar up read overlap files + echo "Tar up read_overlap bams" + start=$(date +%s) + tar -zcvf ~{plate_id}.remove_overlap_read_parts.tar.gz *read_overlap.bam + end=$(date +%s) + elapsed=$((end - start)) + echo "Elapsed time to tar read_overlap bams $elapsed seconds" >>> + runtime { docker: docker - disks: "local-disk ${disk_size} HDD" + disks: "local-disk ${disk_size} SSD" cpu: cpu memory: "${mem_size} GiB" + cpuPlatform: single_end_hisat_cpu_platform preemptible: preemptible_tries } + output { - #File merge_sorted_bam_tar = "~{plate_id}.hisat3n_dna.split_reads.name_sort.bam.tar.gz" - File hisat3n_dna_split_reads_summary_R1_tar = "~{plate_id}.hisat3n_dna_split_reads_summary.R1.tar.gz" - File hisat3n_dna_split_reads_summary_R2_tar = "~{plate_id}.hisat3n_dna_split_reads_summary.R2.tar.gz" - File remove_overlaps_output_bam_tar = "~{plate_id}.remove_overlap_read_parts.tar.gz" + File hisat3n_dna_split_reads_summary_R1_tar = "~{plate_id}.hisat3n_dna_split_reads_summary.R1.tar.gz" + File hisat3n_dna_split_reads_summary_R2_tar = "~{plate_id}.hisat3n_dna_split_reads_summary.R2.tar.gz" + File remove_overlaps_output_bam_tar = "~{plate_id}.remove_overlap_read_parts.tar.gz" + } } - -task merge_original_and_split_bam_and_sort_all_reads_by_name_and_position_and_deduplicate { + +task merge_sort_analyze { input { - File bam - File split_bam String plate_id + File paired_end_unique_tar + File read_overlap_tar - String docker = "us.gcr.io/broad-gotc-prod/m3c-yap-hisat:1.0.0-2.2.1" + #input for allcools bam-to-allc + File genome_fa + String genome_base = basename(genome_fa) + Int num_upstr_bases + Int num_downstr_bases + Int compress_level + File chromosome_sizes + + String merge_sort_analyze_cpu_platform + String docker = "us.gcr.io/broad-gotc-prod/hisat3n:2.1.0-2.2.1-1709740155" Int disk_size = 1000 - Int mem_size = 50 + Int mem_size = 64 + Int cpu = 16 Int preemptible_tries = 3 - Int cpu = 8 } + command <<< set -euo pipefail - #unzip bam file - tar -xf ~{bam} - tar -xf ~{split_bam} - rm ~{bam} - rm ~{split_bam} + set -x + lscpu + + # unzip tars + echo "Untar paired_end_unique_tar" + start=$(date +%s) + pigz -dc ~{paired_end_unique_tar} | tar -xf - + rm ~{paired_end_unique_tar} + end=$(date +%s) + elapsed=$((end - start)) + echo "Elapsed time to untar paired_end_unique_tar: $elapsed seconds" + + echo "Untar read_overlap_tar" + start=$(date +%s) + pigz -dc ~{read_overlap_tar} | tar -xf - + rm ~{read_overlap_tar} + end=$(date +%s) + elapsed=$((end - start)) + echo "Elapsed time to untar read_overlap_tar: $elapsed seconds" + + # reference and index + start=$(date +%s) + echo "Reference and index fasta" + mkdir reference + cp ~{genome_fa} reference + ls reference + samtools faidx reference/*.fa + end=$(date +%s) + elapsed=$((end - start)) + echo "Elapsed time to 
index fasta $elapsed seconds" - echo "samtools merge and sort" # define lists of r1 and r2 fq files UNIQUE_BAMS=($(ls | grep "\.hisat3n_dna.unique_aligned.bam")) - SPLIT_BAMS=($(ls | grep "\.hisat3n_dna.split_reads.read_overlap.bam")) - - for file in "${UNIQUE_BAMS[@]}"; do - sample_id=$(basename "$file" ".hisat3n_dna.unique_aligned.bam") - samtools merge -f "${sample_id}.hisat3n_dna.all_reads.bam" "${sample_id}.hisat3n_dna.unique_aligned.bam" "${sample_id}.hisat3n_dna.split_reads.read_overlap.bam" - samtools sort -n -o "${sample_id}.hisat3n_dna.all_reads.name_sort.bam" "${sample_id}.hisat3n_dna.all_reads.bam" - samtools sort -O BAM -o "${sample_id}.hisat3n_dna.all_reads.pos_sort.bam" "${sample_id}.hisat3n_dna.all_reads.name_sort.bam" - done - - echo "Zip files" - #tar up the merged bam files - tar -zcvf ~{plate_id}.hisat3n_dna.all_reads.pos_sort.tar.gz *.hisat3n_dna.all_reads.pos_sort.bam - tar -zcvf ~{plate_id}.hisat3n_dna.all_reads.name_sort.tar.gz *.hisat3n_dna.all_reads.name_sort.bam - + SPLIT_BAMS=($(ls | grep "\.read_overlap.bam")) - # unzip files - tar -xf ~{plate_id}.hisat3n_dna.all_reads.pos_sort.tar.gz + # for allcools bam-to-allc + if [ ~{num_upstr_bases} -eq 0 ]; then + mcg_context=CGN + else + mcg_context=HCGN + fi - # create output dir + # make directories mkdir /cromwell_root/output_bams mkdir /cromwell_root/temp - - # name : AD3C_BA17_2027_P1-1-B11-G13.hisat3n_dna.all_reads.pos_sort.bam - for file in *.pos_sort.bam - do - name=`echo $file | cut -d. -f1` - name=$name.hisat3n_dna.all_reads.deduped - echo $name - echo "Call Picard" - picard MarkDuplicates I=$file O=/cromwell_root/output_bams/$name.bam \ - M=/cromwell_root/output_bams/$name.matrix.txt \ + mkdir /cromwell_root/allc-${mcg_context} + + task() { + local file=$1 + sample_id=$(basename "$file" ".hisat3n_dna.unique_aligned.bam") + echo $sample_id + + start=$(date +%s) + echo "Merge all unique_aligned and read_overlap" + samtools merge -f "${sample_id}.hisat3n_dna.all_reads.bam" "${sample_id}.hisat3n_dna.unique_aligned.bam" "${sample_id}.read_overlap.bam" -@4 + end=$(date +%s) + elapsed=$((end - start)) + echo "Elapsed time to run merge $elapsed seconds" + + start=$(date +%s) + echo "Sort all reads by name" + samtools sort -n -@4 -m1g -o "${sample_id}.hisat3n_dna.all_reads.name_sort.bam" "${sample_id}.hisat3n_dna.all_reads.bam" + end=$(date +%s) + elapsed=$((end - start)) + echo "Elapsed time to run sort by name $elapsed seconds" + + start=$(date +%s) + echo "Sort all reads by name" + samtools sort -O BAM -@4 -m1g -o "${sample_id}.hisat3n_dna.all_reads.pos_sort.bam" "${sample_id}.hisat3n_dna.all_reads.name_sort.bam" + end=$(date +%s) + elapsed=$((end - start)) + echo "Elapsed time to run sort by pos $elapsed seconds" + + start=$(date +%s) + echo "Call Picard remove duplicates" + name=${sample_id}.hisat3n_dna.all_reads.deduped + picard MarkDuplicates I=${sample_id}.hisat3n_dna.all_reads.pos_sort.bam O=/cromwell_root/output_bams/${name}.bam \ + M=/cromwell_root/output_bams/${name}.matrix.txt \ REMOVE_DUPLICATES=true TMP_DIR=/cromwell_root/temp + end=$(date +%s) + elapsed=$((end - start)) + echo "Elapsed time to run picard $elapsed seconds" + + start=$(date +%s) echo "Call samtools index" - samtools index /cromwell_root/output_bams/$name.bam + samtools index /cromwell_root/output_bams/${name}.bam + end=$(date +%s) + elapsed=$((end - start)) + echo "Elapsed time to samtools index $elapsed seconds" + + start=$(date +%s) + echo "Call chromatin contacts from name sorted bams" + python3 -c 'from cemba_data.hisat3n import 
*;import os;import glob;call_chromatin_contacts(bam_path="'"$sample_id"'.hisat3n_dna.all_reads.name_sort.bam",contact_prefix="'"$sample_id"'.hisat3n_dna.all_reads",save_raw=False,save_hic_format=True)' + end=$(date +%s) + elapsed=$((end - start)) + echo "Elapsed time to chromatin contacts $elapsed seconds" + + start=$(date +%s) + echo "Call allcools bam-to-allc from deduped.bams" + /opt/conda/bin/allcools bam-to-allc \ + --bam_path /cromwell_root/output_bams/${name}.bam \ + --reference_fasta /cromwell_root/reference/~{genome_base} \ + --output_path "${sample_id}.allc.tsv.gz" \ + --num_upstr_bases ~{num_upstr_bases} \ + --num_downstr_bases ~{num_downstr_bases} \ + --compress_level ~{compress_level} \ + --save_count_df \ + --convert_bam_strandness + end=$(date +%s) + elapsed=$((end - start)) + echo "Elapsed time to allcools bam-to-allc $elapsed seconds" + + start=$(date +%s) + echo "Call allcools extract-all" + allcools extract-allc --strandness merge \ + --allc_path ${sample_id}.allc.tsv.gz \ + --output_prefix /cromwell_root/allc-${mcg_context}/${sample_id} \ + --mc_contexts ${mcg_context} \ + --chrom_size_path ~{chromosome_sizes} + end=$(date +%s) + elapsed=$((end - start)) + echo "Elapsed time to allcools extract-all $elapsed seconds" + + echo "Remove some bams" + rm ${sample_id}.hisat3n_dna.all_reads.bam + rm ${sample_id}.hisat3n_dna.all_reads.pos_sort.bam + rm /cromwell_root/${sample_id}.read_overlap.bam + rm /cromwell_root/${sample_id}.hisat3n_dna.unique_aligned.bam + } + + # run 4 instances of task in parallel + for file in "${UNIQUE_BAMS[@]}"; do + ( + echo "starting task $file.." + task "$file" + sleep $(( (RANDOM % 3) + 1)) + ) & + # allow to execute up to 4 jobs in parallel + if [[ $(jobs -r -p | wc -l) -ge 4 ]]; then + wait -n + fi done - cd /cromwell_root + wait + echo "Tasks all done." + du -h * - #tar up the output files - tar -zcvf ~{plate_id}.dedup_unique_bam_and_index_unique_bam.tar.gz output_bams - - #tar up the stats files + echo "Tar files." 
tar -zcvf ~{plate_id}.dedup_unique_bam_and_index_unique_bam_stats.tar.gz output_bams/*.matrix.txt - - >>> - runtime { - docker: docker - disks: "local-disk ${disk_size} HDD" - cpu: cpu - memory: "${mem_size} GiB" - preemptible: preemptible_tries - } - output { - File name_sorted_bam = "~{plate_id}.hisat3n_dna.all_reads.name_sort.tar.gz" - File dedup_output_bam_tar = "~{plate_id}.dedup_unique_bam_and_index_unique_bam.tar.gz" - File dedup_stats_tar = "~{plate_id}.dedup_unique_bam_and_index_unique_bam_stats.tar.gz" - } -} - -task call_chromatin_contacts { - input { - File name_sorted_bam - String plate_id - - String docker = "us.gcr.io/broad-gotc-prod/m3c-yap-hisat:1.0.0-2.2.1" - Int disk_size = 500 - Int mem_size = 32 - Int preemptible_tries = 3 - Int cpu = 8 - } - command <<< - set -euo pipefail - - # untar the name sorted bam files - tar -xf ~{name_sorted_bam} - rm ~{name_sorted_bam} - - python3 <>> - runtime { - docker: docker - disks: "local-disk ${disk_size} HDD" - cpu: cpu - memory: "${mem_size} GiB" - preemptible: preemptible_tries - } - output { - File chromatin_contact_stats = "~{plate_id}.chromatin_contact_stats.tar.gz" - File all_reads_dedup_contacts = "~{plate_id}.hisat3n_dna.all_reads.dedup_contacts.tar.gz" - File all_reads_3C_contacts = "~{plate_id}.hisat3n_dna.all_reads.3C.contact.tar.gz" - } -} - -task unique_reads_allc_and_cgn_extraction { - input { - File bam_and_index_tar - File genome_fa - String plate_id - Int num_upstr_bases - Int num_downstr_bases - Int compress_level - File chromosome_sizes - - Int disk_size = 200 - Int mem_size = 20 - String genome_base = basename(genome_fa) - String docker = "us.gcr.io/broad-gotc-prod/m3c-yap-hisat:1.0.0-2.2.1" - Int preemptible_tries = 3 - Int cpu = 8 - } - command <<< - set -euo pipefail - - # unzip files - tar -xf ~{bam_and_index_tar} - rm ~{bam_and_index_tar} - - mkdir reference - cp ~{genome_fa} reference - cd reference - - # index the fasta - echo "Indexing FASTA" - samtools faidx *.fa - cd ../output_bams - - echo "Starting allcools" - bam_files=($(ls | grep "\.hisat3n_dna.all_reads.deduped.bam$")) - echo ${bam_files[@]} - for file in "${bam_files[@]}"; do - sample_id=$(basename "$file" ".hisat3n_dna.all_reads.deduped.bam") - /opt/conda/bin/allcools bam-to-allc \ - --bam_path "$file" \ - --reference_fasta /cromwell_root/reference/~{genome_base} \ - --output_path "${sample_id}.allc.tsv.gz" \ - --num_upstr_bases ~{num_upstr_bases} \ - --num_downstr_bases ~{num_downstr_bases} \ - --compress_level ~{compress_level} \ - --save_count_df \ - --convert_bam_strandness - done - echo "Zipping files" - - tar -zcvf ../~{plate_id}.allc.tsv.tar.gz *.allc.tsv.gz - tar -zcvf ../~{plate_id}.allc.tbi.tar.gz *.allc.tsv.gz.tbi - tar -zcvf ~{plate_id}.allc.count.tar.gz *.allc.tsv.gz.count.csv - - cd ../ - tar -xf ~{plate_id}.allc.tsv.tar.gz - tar -xf ~{plate_id}.allc.tbi.tar.gz - - # prefix="allc-{mcg_context}/{cell_id}" - if [ ~{num_upstr_bases} -eq 0 ]; then - mcg_context=CGN - else - mcg_context=HCGN - fi - # create output dir - mkdir /cromwell_root/allc-${mcg_context} - outputdir=/cromwell_root/allc-${mcg_context} - - for gzfile in *.allc.tsv.gz - do - name=`echo $gzfile | cut -d. 
-f1` - echo $name - allcools extract-allc --strandness merge --allc_path $gzfile \ - --output_prefix $outputdir/$name \ - --mc_contexts ${mcg_context} \ - --chrom_size_path ~{chromosome_sizes} - done - - mv output_bams/~{plate_id}.allc.count.tar.gz /cromwell_root - - cd /cromwell_root - tar -zcvf ~{plate_id}.extract-allc.tar.gz $outputdir/*.gz - tar -zcvf ~{plate_id}.extract-allc_tbi.tar.gz $outputdir/*.tbi - + tar -zcvf ~{plate_id}.hisat3n_dna.all_reads.name_sort.tar.gz *.hisat3n_dna.all_reads.name_sort.bam + # tar outputs of call_chromatin_contacts + tar -zcvf ~{plate_id}.hisat3n_dna.all_reads.3C.contact.tar.gz *.hisat3n_dna.all_reads.3C.contact.tsv.gz + tar -zcvf ~{plate_id}.hisat3n_dna.all_reads.dedup_contacts.tar.gz *.hisat3n_dna.all_reads.dedup_contacts.tsv.gz + tar -zcvf ~{plate_id}.chromatin_contact_stats.tar.gz *.hisat3n_dna.all_reads.contact_stats.csv + # tar outputs of allcools + tar -zcvf ~{plate_id}.allc.tsv.tar.gz *.allc.tsv.gz + tar -zcvf ~{plate_id}.allc.tbi.tar.gz *.allc.tsv.gz.tbi + tar -zcvf ~{plate_id}.allc.count.tar.gz *.allc.tsv.gz.count.csv + tar -zcvf ~{plate_id}.extract-allc.tar.gz /cromwell_root/allc-${mcg_context}/*.gz + tar -zcvf ~{plate_id}.extract-allc_tbi.tar.gz /cromwell_root/allc-${mcg_context}/*.tbi >>> runtime { docker: docker - disks: "local-disk ${disk_size} HDD" + disks: "local-disk ${disk_size} SSD" cpu: cpu memory: "${mem_size} GiB" + cpuPlatform: merge_sort_analyze_cpu_platform preemptible: preemptible_tries } - output { + + output { File allc = "~{plate_id}.allc.tsv.tar.gz" File tbi = "~{plate_id}.allc.tbi.tar.gz" + File all_reads_dedup_contacts = "~{plate_id}.hisat3n_dna.all_reads.dedup_contacts.tar.gz" + File all_reads_3C_contacts = "~{plate_id}.hisat3n_dna.all_reads.3C.contact.tar.gz" + File name_sorted_bam = "~{plate_id}.hisat3n_dna.all_reads.name_sort.tar.gz" + File dedup_stats_tar = "~{plate_id}.dedup_unique_bam_and_index_unique_bam_stats.tar.gz" + File chromatin_contact_stats = "~{plate_id}.chromatin_contact_stats.tar.gz" File allc_uniq_reads_stats = "~{plate_id}.allc.count.tar.gz" - File extract_allc_output_allc_tar = "~{plate_id}.extract-allc.tar.gz" File extract_allc_output_tbi_tar = "~{plate_id}.extract-allc_tbi.tar.gz" - } + File extract_allc_output_allc_tar = "~{plate_id}.extract-allc.tar.gz" + File extract_allc_output_tbi_tar = "~{plate_id}.extract-allc_tbi.tar.gz" + } } task summary { @@ -901,7 +938,7 @@ task summary { Array[File] unique_reads_cgn_extraction_tbi String plate_id - String docker = "us.gcr.io/broad-gotc-prod/m3c-yap-hisat:1.0.0-2.2.1" + String docker = "us.gcr.io/broad-gotc-prod/hisat3n:2.1.0-2.2.1-1709740155" Int disk_size = 80 Int mem_size = 5 Int preemptible_tries = 3 @@ -961,4 +998,4 @@ task summary { output { File mapping_summary = "~{plate_id}_MappingSummary.csv.gz" } -} \ No newline at end of file +} diff --git a/pipelines/skylab/snM3C/test_inputs/Plumbing/miseq_M16_G13.json b/pipelines/skylab/snM3C/test_inputs/Plumbing/miseq_M16_G13.json index 8df63dba8b..e7d1cfe078 100644 --- a/pipelines/skylab/snM3C/test_inputs/Plumbing/miseq_M16_G13.json +++ b/pipelines/skylab/snM3C/test_inputs/Plumbing/miseq_M16_G13.json @@ -16,5 +16,7 @@ "snM3C.tarred_index_files":"gs://broad-gotc-test-storage/methylome/input/plumbing/index_files/hg38_index_files.tar.gz", "snM3C.chromosome_sizes": "gs://broad-gotc-test-storage/methylome/input/plumbing/index_files/hg38.chrom.sizes", "snM3C.genome_fa": "gs://broad-gotc-test-storage/methylome/input/plumbing/index_files/hg38.fa", - 
"snM3C.batch_number": 2 + "snM3C.batch_number": 2, + "snM3C.single_end_hisat_cpu_platform": "Intel Cascade Lake", + "snM3C.merge_sort_analyze_cpu_platform": "Intel Cascade Lake" } diff --git a/verification/test-wdls/TestsnM3C.wdl b/verification/test-wdls/TestsnM3C.wdl index 959aec4bd7..bded2b2f8a 100644 --- a/verification/test-wdls/TestsnM3C.wdl +++ b/verification/test-wdls/TestsnM3C.wdl @@ -35,6 +35,10 @@ workflow TestsnM3C { Boolean update_truth String vault_token_path String google_account_vault_path + + String docker = "us.gcr.io/broad-gotc-prod/m3c-yap-hisat:1.0.0-2.2.1" + String single_end_hisat_cpu_platform = "Intel Ice Lake" + String merge_sort_analyze_cpu_platform = "Intel Ice Lake" } meta { @@ -60,7 +64,10 @@ workflow TestsnM3C { num_upstr_bases = num_upstr_bases, num_downstr_bases = num_downstr_bases, compress_level = compress_level, - batch_number = batch_number + batch_number = batch_number, + docker = docker, + single_end_hisat_cpu_platform = single_end_hisat_cpu_platform, + merge_sort_analyze_cpu_platform = merge_sort_analyze_cpu_platform } diff --git a/website/docs/Pipelines/snM3C/README.md b/website/docs/Pipelines/snM3C/README.md index 99765c5c2b..89586b3824 100644 --- a/website/docs/Pipelines/snM3C/README.md +++ b/website/docs/Pipelines/snM3C/README.md @@ -6,7 +6,7 @@ slug: /Pipelines/snM3C/README | Pipeline Version | Date Updated | Documentation Authors | Questions or Feedback | | :----: | :---: | :----: | :--------------: | -| [snM3C_v2.0.1](https://github.com/broadinstitute/warp/releases) | February, 2024 | [Kaylee Mathews](mailto:warp-pipelines-help@broadinsitute.org) | Please file GitHub issues in the [WARP repository](https://github.com/broadinstitute/warp/issues) | +| [snM3C_v3.0.0](https://github.com/broadinstitute/warp/releases) | March, 2024 | [Kaylee Mathews](mailto:warp-pipelines-help@broadinsitute.org) | Please file GitHub issues in the [WARP repository](https://github.com/broadinstitute/warp/issues) | ## Introduction to snM3C @@ -78,10 +78,8 @@ Overall, the snM3C workflow: 2. Aligns paired-end reads. 3. Separates unmapped, uniquely aligned, multi-aligned reads and splits unmapped reads by enzyme cut site. 4. Aligns unmapped, single-end reads and removes overlapping reads. -5. Merges mapped reads from single- and paired-end alignments and removes duplicate reads. -6. Calls chromatin contacts. -7. Creates ALLC files. -8. Creates summary output file. +5. Merges mapped reads, calls chromatin contacts, and creates ALLC files. +6. Creates summary output file. The tools each snM3C task employs are detailed in the table below. @@ -93,12 +91,11 @@ To see specific tool parameters, select the [workflow WDL link](https://github.c | Sort_and_trim_r1_and_r2 | Cutadapt | [Cutadapt](https://cutadapt.readthedocs.io/en/stable/) | Sorts, filters, and trims reads using the `r1_adapter`, `r2_adapter`, `r1_left_cut`, `r1_right_cut`, `r2_left_cut`, and `r2_right_cut` input parameters. | | Hisat_3n_pair_end_mapping_dna_mode | HISAT-3N | [HISAT-3N](https://daehwankimlab.github.io/hisat2/hisat-3n/) | Performs paired-end read alignment. 
 | Separate_and_split_unmapped_reads | [hisat3n_general.py](https://github.com/lhqing/cemba_data/blob/788e83cd66f3b556bdfacf3485bed9500d381f23/cemba_data/hisat3n/hisat3n_general.py), [hisat3n_m3c.py](https://github.com/lhqing/cemba_data/blob/bf6248239074d0423d45a67d83da99250a43e50c/cemba_data/hisat3n/hisat3n_m3c.py) | python3 | Imports 2 custom python3 scripts developed by Hanqing Liu and calls the `separate_unique_and_multi_align_reads()` and `split_hisat3n_unmapped_reads()` functions to separate unmapped, uniquely aligned, and multi-aligned reads from the HISAT-3N BAM file, then splits the unmapped reads FASTQ file by all possible enzyme cut sites and outputs new R1 and R2 FASTQ files; unmapped reads are stored in unmapped FASTQ files and uniquely and multi-aligned reads are stored in separate BAM files. |
-| Hisat_single_end_r1_r2_mapping_dna_mode_and_merge_sort_split_reads_by_name_and_remove_overlap | HISAT-3N, [hisat3n_m3c.py](https://github.com/lhqing/cemba_data/blob/bf6248239074d0423d45a67d83da99250a43e50c/cemba_data/hisat3n/hisat3n_m3c.py) | [HISAT-3N](https://daehwankimlab.github.io/hisat2/hisat-3n/), python3 | Performs single-end alignment of unmapped reads to maximize read mapping, imports a custom python3 script developed by Hanqing Liu, and calls the `remove_overlap_read_parts()` function to remove overlapping reads from the split alignment BAM file produced during single-end alignment. |
-| merge_original_and_split_bam_and_sort_all_reads_by_name_and_position_and_deduplicate | merge, sort, MarkDuplicates | [samtools](https://www.htslib.org/), [Picard](https://broadinstitute.github.io/picard/) | Merges and sorts all mapped reads from the paired-end and single-end alignments; creates a position-sorted BAM file and a name-sorted BAM file; removes duplicate reads from the position-sorted, merged BAM file. |
-| call_chromatin_contacts | [hisat3n_m3c.py](https://github.com/lhqing/cemba_data/blob/bf6248239074d0423d45a67d83da99250a43e50c/cemba_data/hisat3n/hisat3n_m3c.py) | python3 | Imports a custom python3 script developed by Hanqing Liu and calls the `call_chromatin_contacts()` function to call chromatin contacts from the name-sorted, merged BAM file; reads are considered chromatin contacts if they are greater than 2,500 base pairs apart. |
-| unique_reads_allc_and_cgn_extraction | bam-to-allc, extract-allc | [ALLCools](https://lhqing.github.io/ALLCools/intro.html) | Creates a first ALLC file with a list of methylation points and a second ALLC file containing methylation contexts. |
+| hisat_single_end | HISAT-3N, [hisat3n_m3c.py](https://github.com/lhqing/cemba_data/blob/bf6248239074d0423d45a67d83da99250a43e50c/cemba_data/hisat3n/hisat3n_m3c.py) | [HISAT-3N](https://daehwankimlab.github.io/hisat2/hisat-3n/), python3 | Performs single-end alignment of unmapped reads to maximize read mapping, imports a custom python3 script developed by Hanqing Liu, and calls the `remove_overlap_read_parts()` function to remove overlapping reads from the split alignment BAM file produced during single-end alignment. |
+| merge_sort_analyze | merge, sort, MarkDuplicates, [hisat3n_m3c.py](https://github.com/lhqing/cemba_data/blob/bf6248239074d0423d45a67d83da99250a43e50c/cemba_data/hisat3n/hisat3n_m3c.py), bam-to-allc, extract-allc | [samtools](https://www.htslib.org/), [Picard](https://broadinstitute.github.io/picard/), python3, [ALLCools](https://lhqing.github.io/ALLCools/intro.html) | Merges and sorts all mapped reads from the paired-end and single-end alignments; creates a position-sorted BAM file and a name-sorted BAM file; removes duplicate reads from the position-sorted, merged BAM file; imports a custom python3 script developed by Hanqing Liu and calls the `call_chromatin_contacts()` function to call chromatin contacts from the name-sorted, merged BAM file; reads are considered chromatin contacts if they are greater than 2,500 base pairs apart; creates a first ALLC file with a list of methylation points and a second ALLC file containing methylation contexts. |
 | summary | [summary.py](https://github.com/lhqing/cemba_data/blob/788e83cd66f3b556bdfacf3485bed9500d381f23/cemba_data/hisat3n/summary.py) | python3 | Imports a custom python3 script developed by Hanqing Liu and calls the `snm3c_summary()` function to generate a single summary file for the pipeline in TSV format that contains trimming, mapping, deduplication, chromatin contact, and AllC site statistics. |
+

 #### 1. Demultiplexes, sorts, and trims reads
 In the first step of the pipeline (`Demultiplexing`), raw sequencing reads are demultiplexed by random primer index into cell-level FASTQ files using [Cutadapt](https://cutadapt.readthedocs.io/en/stable/). For more information on barcoding, see the [YAP documentation](https://hq-1.gitbook.io/mc/tech-background/barcoding#two-round-of-barcoding).

@@ -118,24 +115,26 @@ After paired-end alignment, the pipeline calls the `Separate_and_split_unmapped_
 After separating reads, the task imports a custom python3 script ([hisat3n_m3c.py](https://github.com/lhqing/cemba_data/blob/bf6248239074d0423d45a67d83da99250a43e50c/cemba_data/hisat3n/hisat3n_m3c.py)) developed by Hanqing Liu and calls the script's `split_hisat3n_unmapped_reads()` function. This splits the FASTQ file containing the unmapped reads by all possible enzyme cut sites and outputs new R1 and R2 files.

 #### 4. Aligns unmapped, single-end reads and removes overlapping reads
-In the next step of the pipeline, the `Hisat_single_end_r1_r2_mapping_dna_mode_and_merge_sort_split_reads_by_name_and_remove_overlap` task uses [HISAT-3N](https://daehwankimlab.github.io/hisat2/hisat-3n/) to perform single-end read alignment of the previously unmapped reads to maximize read mapping and outputs a single, aligned BAM file.
+In the next step of the pipeline, the `hisat_single_end` task uses [HISAT-3N](https://daehwankimlab.github.io/hisat2/hisat-3n/) to perform single-end read alignment of the previously unmapped reads to maximize read mapping and outputs a single, aligned BAM file.

 After the second alignment step, the task imports a custom python3 script ([hisat3n_m3c.py](https://github.com/lhqing/cemba_data/blob/bf6248239074d0423d45a67d83da99250a43e50c/cemba_data/hisat3n/hisat3n_m3c.py)) developed by Hanqing Liu. The task calls the script's `remove_overlap_read_parts()` function to remove overlapping reads from the BAM file produced during single-end alignment and output another BAM file.

-#### 5. Merges mapped reads from single- and paired-end alignments and removes duplicate reads
-The `merge_original_and_split_bam_and_sort_all_reads_by_name_and_position_and_deduplicate` task uses [samtools](https://www.htslib.org/) to merge and sort all of the mapped reads from the paired-end and single-end alignments into a single BAM file. The BAM file is output as both a position-sorted and a name-sorted BAM file.
+#### 5. Merges mapped reads, calls chromatin contacts, and creates ALLC files
+
+**Merges mapped reads**
+The `merge_sort_analyze` task uses [samtools](https://www.htslib.org/) to merge and sort all of the mapped reads from the paired-end and single-end alignments into a single BAM file. The BAM file is output as both a position-sorted and a name-sorted BAM file.

-After calling chromatin contacts, the task uses Picard's MarkDuplicates tool to remove duplicate reads from the position-sorted, merged BAM file and output a deduplicated BAM file.
+After merging, the task uses Picard's MarkDuplicates tool to remove duplicate reads from the position-sorted, merged BAM file and output a deduplicated BAM file.

-#### 6. Calls chromatin contacts
-In the `call_chromatin_contacts` task, the pipeline imports a custom python3 script ([hisat3n_m3c.py](https://github.com/lhqing/cemba_data/blob/bf6248239074d0423d45a67d83da99250a43e50c/cemba_data/hisat3n/hisat3n_m3c.py)) developed by Hanqing Liu. The task calls the script's `call_chromatin_contacts()` function to call chromatin contacts from the name-sorted, merged BAM file. If reads are greater than 2,500 base pairs apart, they are considered chromatin contacts. If reads are less than 2,500 base pairs apart, they are considered the same fragment.
+**Calls chromatin contacts**
+Next, the pipeline imports a custom python3 script ([hisat3n_m3c.py](https://github.com/lhqing/cemba_data/blob/bf6248239074d0423d45a67d83da99250a43e50c/cemba_data/hisat3n/hisat3n_m3c.py)) developed by Hanqing Liu. The task calls the script's `call_chromatin_contacts()` function to call chromatin contacts from the name-sorted, merged BAM file. If reads are greater than 2,500 base pairs apart, they are considered chromatin contacts. If reads are less than 2,500 base pairs apart, they are considered the same fragment.

-#### 7. Creates ALLC files
-The `unique_reads_allc_and_cgn_extraction` task uses the [ALLCools](https://lhqing.github.io/ALLCools/intro.html) `bam-to-allc` function to create an ALLC file from the deduplicated BAM file that contains a list of methylation points. The `num_upstr_bases` and `num_downstr_bases` input parameters are used to define the number of bases upstream and downstream of the C base to include in the ALLC context column.
+**Creates ALLC files**
+After calling chromatin contacts, the task uses the [ALLCools](https://lhqing.github.io/ALLCools/intro.html) `bam-to-allc` function to create an ALLC file from the deduplicated BAM file that contains a list of methylation points. The `num_upstr_bases` and `num_downstr_bases` input parameters are used to define the number of bases upstream and downstream of the C base to include in the ALLC context column.

-Next, the task uses the [ALLCools](https://lhqing.github.io/ALLCools/intro.html) `extract-allc` function to extract methylation contexts from the input ALLC file and output a second ALLC file that can be used to generate an [MCDS file](https://github.com/lhqing/allcools_doc/blob/master/tech-background/file-formats.md#mcds-file).
+Next, the task uses the [ALLCools](https://lhqing.github.io/ALLCools/intro.html) `extract-allc` function to extract methylation contexts from the input ALLC file and output a second ALLC file that can be used to generate an [MCDS file](https://github.com/lhqing/allcools_doc/blob/master/tech-background/file-formats.md#mcds-file).

-#### 8. Creates summary output file
+#### 6. Creates summary output file
 In the last step of the pipeline, the `summary` task imports a custom python3 script ([summary.py](https://github.com/lhqing/cemba_data/blob/788e83cd66f3b556bdfacf3485bed9500d381f23/cemba_data/hisat3n/summary.py)) developed by Hanqing Liu. The task calls the script's `snm3c_summary()` function to generate a single summary file for the pipeline in TSV format that contains trimming, mapping, deduplication, chromatin contact, and AllC site statistics. This is the main output of the pipeline.

 ## Outputs

@@ -148,12 +147,12 @@ The following table lists the output variables and files produced by the pipelin
 | name_sorted_bams | `.hisat3n_dna.all_reads.name_sort.tar.gz` | Array of tarred files containing name-sorted, merged BAM files. |
 | unique_reads_cgn_extraction_allc | `.allc.tsv.tar.gz` | Array of tarred files containing a list of methylation points. |
 | unique_reads_cgn_extraction_tbi | `.allc.tbi.tar.gz` | Array of tarred files containing ALLC index files. |
-| unique_reads_cgn_extraction_allc_extract | `.extract-allc.tar.gz` | Array of tarred files containing CGN context-specific ALLC files that can be used to generate an [MCDS file](https://github.com/lhqing/allcools_doc/blob/master/tech-background/file-formats.md#mcds-file). |
-| unique_reads_cgn_extraction_tbi_extract | `.extract-allc_tbi.tar.gz` | Array of tarred files containing ALLC index files. |
 | reference_version | `.reference_version.txt` | Array of tarred files containing the genomic reference version used. |
-| chromatin_contact_stats | `.chromatin_contact_stats.tar.gz` | Array of tarred files containing chromatin contact statistics. |
 | all_reads_dedup_contacts | `.hisat3n_dna.all_reads.dedup_contacts.tar.gz` | Array of tarred TSV files containing deduplicated chromatin contacts. |
 | all_reads_3C_contacts | `.hisat3n_dna.all_reads.3C.contact.tar.gz` | Array of tarred TSV files containing chromatin contacts in Hi-C format. |
+| chromatin_contact_stats | `.chromatin_contact_stats.tar.gz` | Array of tarred files containing chromatin contact statistics. |
+| unique_reads_cgn_extraction_allc_extract | `.extract-allc.tar.gz` | Array of tarred files containing CGN context-specific ALLC files that can be used to generate an [MCDS file](https://github.com/lhqing/allcools_doc/blob/master/tech-background/file-formats.md#mcds-file). |
+| unique_reads_cgn_extraction_tbi_extract | `.extract-allc_tbi.tar.gz` | Array of tarred files containing ALLC index files. |
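The ALLC outputs in the table above are shipped as tar archives of bgzipped, tabix-indexed TSV files, so they can be unpacked and queried with standard tooling. A minimal sketch follows; `plate1`, the cell file name, and the region are placeholder values, not pipeline outputs for any particular run:

```bash
# Unpack the per-plate ALLC tarballs; "plate1" is a placeholder plate_id.
tar -zxvf plate1.allc.tsv.tar.gz
tar -zxvf plate1.allc.tbi.tar.gz
# Each ALLC file is a bgzipped TSV with a matching .tbi index, so tabix can
# pull a region; the cell name and coordinates below are illustrative only.
tabix cell_001.allc.tsv.gz chr1:1000000-1005000
```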
 ## Versioning

From 94df11d3215085c4266b23e3263cf26cd99cad35 Mon Sep 17 00:00:00 2001
From: Kaylee Mathews <95316074+kayleemathews@users.noreply.github.com>
Date: Thu, 7 Mar 2024 10:15:04 -0500
Subject: [PATCH 62/68] Km add Paired-Tag and snM3C RRIDs (#1233)

* fix overview table
* fix broken bookmarks
* add Paired-Tag and snM3C RRIDs
* Update README.md

---------

Co-authored-by: ekiernan <55763654+ekiernan@users.noreply.github.com>
Co-authored-by: ekiernan
---
 .../docs/Pipelines/JointGenotyping/README.md   | 4 ++--
 .../docs/Pipelines/Multiome_Pipeline/README.md | 1 -
 .../Pipelines/PairedTag_Pipeline/README.md     | 6 +++++-
 .../SlideSeq_Pipeline/count-matrix-overview.md | 2 +-
 website/docs/Pipelines/snM3C/README.md         | 18 ++++++++++++++++--
 5 files changed, 24 insertions(+), 7 deletions(-)

diff --git a/website/docs/Pipelines/JointGenotyping/README.md b/website/docs/Pipelines/JointGenotyping/README.md
index aa9eb7af3b..6b8aa181ed 100644
--- a/website/docs/Pipelines/JointGenotyping/README.md
+++ b/website/docs/Pipelines/JointGenotyping/README.md
@@ -201,7 +201,7 @@ Next, the site-specific VCF and index files for each interval are gathered into

 **VQSR (default)**

-If `run_vets` is “false”, the [IndelsVariantRecalibrator](https://github.com/broadinstitute/warp/blob/develop/tasks/broad/JointGenotypingTasks.wdl) task takes in the site-specific VCF and index files generated in [Step 3](#3-creates-single-site-specific-VCF-and-index-files) and uses GATK’s VariantRecalibrator tool to perform the first step of the Variant Quality Score Recalibration (VQSR) technique of filtering variants. The tool builds a model to be used to score and filter indels and produces a recalibration table as output.
+If `run_vets` is “false”, the [IndelsVariantRecalibrator](https://github.com/broadinstitute/warp/blob/develop/tasks/broad/JointGenotypingTasks.wdl) task takes in the site-specific VCF and index files generated in [Step 3](#3-creates-single-site-specific-vcf-and-index-files) and uses GATK’s VariantRecalibrator tool to perform the first step of the Variant Quality Score Recalibration (VQSR) technique of filtering variants. The tool builds a model to be used to score and filter indels and produces a recalibration table as output.

 After building the indel filtering model, the workflow uses the VariantRecalibrator tool to build a model to be used to score and filter SNPs. If the number of input GVCF files is greater than `snps_variant_recalibration_threshold`, the [SNPsVariantRecalibratorCreateModel](https://github.com/broadinstitute/warp/blob/develop/tasks/broad/JointGenotypingTasks.wdl), [SNPsVariantRecalibrator as SNPsVariantRecalibratorScattered](https://github.com/broadinstitute/warp/blob/develop/tasks/broad/JointGenotypingTasks.wdl), and [Tasks.GatherTranches as SNPGatherTranches](https://github.com/broadinstitute/warp/blob/develop/tasks/broad/JointGenotypingTasks.wdl) tasks are called to scatter the site-specific VCF and index files, build the SNP model, and gather scattered tranches into a single file. If the number of input GVCF files is less than `snps_variant_recalibration_threshold`, the [SNPsVariantRecalibrator as SNPsVariantRecalibratorClassic](https://github.com/broadinstitute/warp/blob/develop/tasks/broad/JointGenotypingTasks.wdl) task is called to build the SNP model.
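To make the VQSR step above concrete, a minimal sketch of the indel pass follows. The file names, resource bundle, annotation list, and prior values are assumptions for illustration, not the exact arguments used by the JointGenotypingTasks.wdl tasks:

```bash
# Illustrative VariantRecalibrator invocation for the indel model; every path,
# resource, and annotation here is a placeholder, not the task's exact command.
gatk VariantRecalibrator \
  -V sites_only.vcf.gz \
  --trust-all-polymorphic \
  -mode INDEL \
  -an FS -an ReadPosRankSum -an MQRankSum -an QD -an SOR -an DP \
  --resource:mills,known=false,training=true,truth=true,prior=12 mills_indels.vcf.gz \
  --resource:dbsnp,known=true,training=false,truth=false,prior=2 dbsnp.vcf.gz \
  -O indels.recal \
  --tranches-file indels.tranches
```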
@@ -209,7 +209,7 @@ The [ApplyRecalibration](https://github.com/broadinstitute/warp/blob/develop/tas

 **VETS**

-If `run_vets` is “true”, the [JointVcfFiltering as TrainAndApplyVETS](https://github.com/broadinstitute/gatk/blob/master/scripts/vcf_site_level_filtering_wdl/JointVcfFiltering.wdl) task takes in the hard filtered and site-specific VCF and index files generated in [Step 3](#3-creates-single-site-specific-VCF-and-index-files) and calls the `JointVcfFiltering.wdl` subworkflow. This workflow uses the Variant Extract-Train-Score (VETS) algorithm to extract variant-level annotations, train a filtering model, and score variants based on the model. The subworkflow uses the GATK ExtractVariantAnnotations, TrainVariantAnnotationsModel, and ScoreVariantAnnotations tools to create extracted and scored VCF and index files. The output VCF and index files are not filtered by the score assigned by the model. The score is included in the output VCF files in the INFO field as an annotation called “SCORE”.
+If `run_vets` is “true”, the [JointVcfFiltering as TrainAndApplyVETS](https://github.com/broadinstitute/gatk/blob/master/scripts/vcf_site_level_filtering_wdl/JointVcfFiltering.wdl) task takes in the hard filtered and site-specific VCF and index files generated in [Step 3](#3-creates-single-site-specific-vcf-and-index-files) and calls the `JointVcfFiltering.wdl` subworkflow. This workflow uses the Variant Extract-Train-Score (VETS) algorithm to extract variant-level annotations, train a filtering model, and score variants based on the model. The subworkflow uses the GATK ExtractVariantAnnotations, TrainVariantAnnotationsModel, and ScoreVariantAnnotations tools to create extracted and scored VCF and index files. The output VCF and index files are not filtered by the score assigned by the model. The score is included in the output VCF files in the INFO field as an annotation called “SCORE”.

 The VETS algorithm trains the model only over target regions, rather than including exon tails, which can lead to poor-quality data. However, the model is applied everywhere, including the exon tails.
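Because VETS only annotates SCORE and never removes records, any hard filtering is left to downstream analysis. A hedged sketch of what that might look like, with an invented 0.9 cutoff and placeholder file names:

```bash
# Hypothetical downstream filter on the VETS SCORE INFO annotation; the
# pipeline ships the scored VCF unfiltered, and the 0.9 threshold is invented.
bcftools view -i 'INFO/SCORE >= 0.9' scored.vcf.gz -Oz -o score_filtered.vcf.gz
bcftools index -t score_filtered.vcf.gz
```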
diff --git a/website/docs/Pipelines/Multiome_Pipeline/README.md b/website/docs/Pipelines/Multiome_Pipeline/README.md
index f9a94aba9d..fbd2802544 100644
--- a/website/docs/Pipelines/Multiome_Pipeline/README.md
+++ b/website/docs/Pipelines/Multiome_Pipeline/README.md
@@ -7,7 +7,6 @@ slug: /Pipelines/Multiome_Pipeline/README

 | Pipeline Version | Date Updated | Documentation Author | Questions or Feedback |
 | :----: | :---: | :----: | :--------------: |
-
 | [Multiome v3.3.0](https://github.com/broadinstitute/warp/releases) | February, 2024 | Kaylee Mathews | Please file GitHub issues in warp or contact the [WARP Pipeline Development team](mailto:warp-pipelines-help@broadinstitute.org) |

 ![Multiome_diagram](./multiome_diagram.png)

diff --git a/website/docs/Pipelines/PairedTag_Pipeline/README.md b/website/docs/Pipelines/PairedTag_Pipeline/README.md
index 9be2233657..67a76f522a 100644
--- a/website/docs/Pipelines/PairedTag_Pipeline/README.md
+++ b/website/docs/Pipelines/PairedTag_Pipeline/README.md
@@ -126,7 +126,11 @@ All Paired-Tag pipeline releases are documented in the [Paired-Tag changelog](ht

 ## Citing the Paired-Tag Pipeline

-If you use the Paired-Tag Pipeline in your research, please consider citing our preprint:
+If you use the Paired-Tag Pipeline in your research, please identify the pipeline in your methods section using the [Paired-Tag SciCrunch resource identifier](https://scicrunch.org/resources/data/record/nlx_144509-1/SCR_025042/resolver?q=paired_tag&l=paired_tag&i=rrid:scr_025042).
+
+* Ex: *Paired-Tag Pipeline (RRID:SCR_025042)*
+
+Please also consider citing our preprint:

 Degatano, K.; Awdeh, A.; Dingman, W.; Grant, G.; Khajouei, F.; Kiernan, E.; Konwar, K.; Mathews, K.; Palis, K.; Petrillo, N.; Van der Auwera, G.; Wang, C.; Way, J.; Pipelines, W. WDL Analysis Research Pipelines: Cloud-Optimized Workflows for Biological Data Processing and Reproducible Analysis. Preprints 2024, 2024012131. https://doi.org/10.20944/preprints202401.2131.v1

diff --git a/website/docs/Pipelines/SlideSeq_Pipeline/count-matrix-overview.md b/website/docs/Pipelines/SlideSeq_Pipeline/count-matrix-overview.md
index 61f7203887..8ca9dd3aac 100644
--- a/website/docs/Pipelines/SlideSeq_Pipeline/count-matrix-overview.md
+++ b/website/docs/Pipelines/SlideSeq_Pipeline/count-matrix-overview.md
@@ -16,7 +16,7 @@ If the workflow is run with `count_exons` set to `false`, the output h5ad file w

 You can determine which type of counts are in the h5ad file by looking at the unstructured metadata (the `anndata.uns` property of the matrix) `expression_data_type` key (see [Table 1](#table-1-global-attributes) below).

-The matrix also contains multiple metrics for both individual bead barcodes (the `anndata.obs` property of the matrix; [Table 2](#table-2-cell-metrics)) and individual genes (the `anndata.var` property of the matrix; [Table 3](#table-3-gene-metrics))
+The matrix also contains multiple metrics for both individual bead barcodes (the `anndata.obs` property of the matrix; [Table 2](#table-2-column-attributes-bead-barcode-metrics)) and individual genes (the `anndata.var` property of the matrix; [Table 3](#table-3-row-attributes-gene-metrics)).

 Table 3. Row attributes (gene metrics)

 ## Table 1. Global attributes
diff --git a/website/docs/Pipelines/snM3C/README.md b/website/docs/Pipelines/snM3C/README.md
index 89586b3824..d0606addb9 100644
--- a/website/docs/Pipelines/snM3C/README.md
+++ b/website/docs/Pipelines/snM3C/README.md
@@ -13,7 +13,9 @@ slug: /Pipelines/snM3C/README

 The Single Nucleus Methyl-Seq and Chromatin Capture (snM3C) workflow is an open-source, cloud-optimized computational workflow for processing single-nucleus methylome and chromatin contact (snM3C) sequencing data. The workflow is designed to demultiplex and align raw sequencing reads, call chromatin contacts, and generate summary metrics.

-The workflow is developed in collaboration with Hanqing Liu and the laboratory of Joseph Ecker. For more information about the snM3C tools and analysis, please see the [YAP documentation](https://hq-1.gitbook.io/mc/) or the [cemba_data](https://github.com/lhqing/cemba_data) GitHub repository created by Hanqing Liu.
+The workflow is developed in collaboration with Hanqing Liu, Wei Tian, Wubin Ding, Huaming Chen, Chongyuan Luo, and the entire laboratory of Joseph Ecker.
+
+For more information about the snM3C tools and analysis, please see the [YAP documentation](https://hq-1.gitbook.io/mc/) or the [cemba_data](https://github.com/lhqing/cemba_data) GitHub repository created by Hanqing Liu.

 ## Quickstart table
 The following table provides a quick glance at the snM3C pipeline features:

@@ -161,15 +163,27 @@ All snM3C pipeline releases are documented in the [pipeline changelog](https://g

 ## Citing the snM3C Pipeline

-If you use the snM3C Pipeline in your research, please consider citing our preprint:
+If you use the snM3C Pipeline in your research, please identify the pipeline in your methods section using the [snM3C SciCrunch resource identifier](https://scicrunch.org/resources/data/record/nlx_144509-1/SCR_025041/resolver?q=SCR_025041&l=SCR_025041&i=rrid:scr_025041).
+
+* Ex: *snM3C Pipeline (RRID:SCR_025041)*
+
+Please cite the following publication for the snM3C pipeline:
+
+Lee, DS., Luo, C., Zhou, J. et al. Simultaneous profiling of 3D genome structure and DNA methylation in single human cells. Nat Methods 16, 999–1006 (2019). https://doi.org/10.1038/s41592-019-0547-z
+
+Please also consider citing our preprint:

 Degatano, K.; Awdeh, A.; Dingman, W.; Grant, G.; Khajouei, F.; Kiernan, E.; Konwar, K.; Mathews, K.; Palis, K.; Petrillo, N.; Van der Auwera, G.; Wang, C.; Way, J.; Pipelines, W. WDL Analysis Research Pipelines: Cloud-Optimized Workflows for Biological Data Processing and Reproducible Analysis. Preprints 2024, 2024012131. https://doi.org/10.20944/preprints202401.2131.v1

+
 ## Consortia support
 This pipeline is supported by the [BRAIN Initiative](https://braininitiative.nih.gov/) (BICCN and BICAN).

 If your organization also uses this pipeline, we would like to list you! Please reach out to us by contacting the [WARP Pipeline Development team](mailto:warp-pipelines-help@broadinstitute.org).

+## Acknowledgements
+We are immensely grateful to the members of the BRAIN Initiative ([BICAN](https://brainblog.nih.gov/brain-blog/brain-issues-suite-funding-opportunities-advance-brain-cell-atlases-through-centers) Sequencing Working Group) and [SCORCH](https://nida.nih.gov/about-nida/organization/divisions/division-neuroscience-behavior-dnb/basic-research-hiv-substance-use-disorder/scorch-program) for their invaluable and exceptional contributions to this pipeline.
Our heartfelt appreciation goes to our collaborators and the developers of these tools, Hanqing Liu, Wei Tian, Wubin Ding, Huaming Chen, Chongyuan Luo, and the entire laboratory of Joseph Ecker. + ## Feedback For questions, suggestions, or feedback related to the snM3C pipeline, please contact [the WARP team](mailto:warp-pipelines-help@broadinstitute.org). Your feedback is valuable for improving the pipeline and addressing any issues that may arise during its usage. \ No newline at end of file From cd74dcea8ff19fd3def1688325c26b50df52f38e Mon Sep 17 00:00:00 2001 From: ekiernan <55763654+ekiernan@users.noreply.github.com> Date: Tue, 12 Mar 2024 15:46:10 -0400 Subject: [PATCH 63/68] Update snM3C.wdl --- pipelines/skylab/snM3C/snM3C.wdl | 7 ++++--- 1 file changed, 4 insertions(+), 3 deletions(-) diff --git a/pipelines/skylab/snM3C/snM3C.wdl b/pipelines/skylab/snM3C/snM3C.wdl index 17990ac2ed..6639f11994 100644 --- a/pipelines/skylab/snM3C/snM3C.wdl +++ b/pipelines/skylab/snM3C/snM3C.wdl @@ -131,8 +131,9 @@ task Demultiplexing { File random_primer_indexes String plate_id Int batch_number + String docker + - String docker_image = "us.gcr.io/broad-gotc-prod/hisat3n:2.1.0-2.2.1-1709740155" Int disk_size = 1000 Int mem_size = 10 Int preemptible_tries = 3 @@ -220,7 +221,7 @@ task Demultiplexing { >>> runtime { - docker: docker_image + docker: docker disks: "local-disk ${disk_size} HDD" cpu: cpu memory: "${mem_size} GiB" @@ -247,7 +248,7 @@ task Sort_and_trim_r1_and_r2 { Int disk_size = 500 Int mem_size = 16 - String docker = "us.gcr.io/broad-gotc-prod/hisat3n:2.1.0-2.2.1-1709740155" + String docker Int preemptible_tries = 3 Int cpu = 4 From f562e39f102f6bd803c17a6b84c73269c355d5e8 Mon Sep 17 00:00:00 2001 From: ekiernan Date: Tue, 12 Mar 2024 15:54:20 -0400 Subject: [PATCH 64/68] updated docker --- pipelines/skylab/snM3C/snM3C.wdl | 19 ++++++++++++------- 1 file changed, 12 insertions(+), 7 deletions(-) diff --git a/pipelines/skylab/snM3C/snM3C.wdl b/pipelines/skylab/snM3C/snM3C.wdl index 6639f11994..0413581aae 100644 --- a/pipelines/skylab/snM3C/snM3C.wdl +++ b/pipelines/skylab/snM3C/snM3C.wdl @@ -23,7 +23,7 @@ workflow snM3C { Int num_downstr_bases = 2 Int compress_level = 5 Int batch_number - String docker = "us.gcr.io/broad-gotc-prod/m3c-yap-hisat:1.0.0-2.2.1" + String docker = "us.gcr.io/broad-gotc-prod/m3c-yap-hisat:2.3" String single_end_hisat_cpu_platform = "Intel Ice Lake" String merge_sort_analyze_cpu_platform = "Intel Ice Lake" } @@ -37,6 +37,7 @@ workflow snM3C { fastq_input_read2 = fastq_input_read2, random_primer_indexes = random_primer_indexes, plate_id = plate_id, + docker = docker, batch_number = batch_number } @@ -51,6 +52,7 @@ workflow snM3C { r2_left_cut = r2_left_cut, r2_right_cut = r2_right_cut, min_read_length = min_read_length, + docker = docker, plate_id = plate_id } @@ -61,6 +63,7 @@ workflow snM3C { tarred_index_files = tarred_index_files, genome_fa = genome_fa, chromosome_sizes = chromosome_sizes, + docker = docker, plate_id = plate_id } @@ -69,6 +72,7 @@ workflow snM3C { hisat3n_bam_tar = Hisat_3n_pair_end_mapping_dna_mode.hisat3n_paired_end_bam_tar, min_read_length = min_read_length, plate_id = plate_id, + docker = docker } call hisat_single_end { @@ -106,7 +110,8 @@ workflow snM3C { chromatin_contact_stats = merge_sort_analyze.chromatin_contact_stats, allc_uniq_reads_stats = merge_sort_analyze.allc_uniq_reads_stats, unique_reads_cgn_extraction_tbi = merge_sort_analyze.extract_allc_output_tbi_tar, - plate_id = plate_id + plate_id = plate_id, + docker = docker } output 
{ @@ -330,7 +335,7 @@ task Hisat_3n_pair_end_mapping_dna_mode{ File chromosome_sizes String plate_id - String docker = "us.gcr.io/broad-gotc-prod/hisat3n:2.1.0-2.2.1-1709740155" + String docker Int disk_size = 1000 Int mem_size = 64 Int preemptible_tries = 3 @@ -430,7 +435,7 @@ task Separate_and_split_unmapped_reads { Int min_read_length String plate_id - String docker = "us.gcr.io/broad-gotc-prod/hisat3n:2.1.0-2.2.1-1709740155" + String docker Int disk_size = 1000 Int mem_size = 10 Int preemptible_tries = 3 @@ -539,7 +544,7 @@ task hisat_single_end { Int mem_size = 128 Int cpu = 32 Int preemptible_tries = 2 - String docker = "us.gcr.io/broad-gotc-prod/hisat3n:2.1.0-2.2.1-1709740155" + String docker } command <<< @@ -731,7 +736,7 @@ task merge_sort_analyze { File chromosome_sizes String merge_sort_analyze_cpu_platform - String docker = "us.gcr.io/broad-gotc-prod/hisat3n:2.1.0-2.2.1-1709740155" + String docker Int disk_size = 1000 Int mem_size = 64 Int cpu = 16 @@ -939,7 +944,7 @@ task summary { Array[File] unique_reads_cgn_extraction_tbi String plate_id - String docker = "us.gcr.io/broad-gotc-prod/hisat3n:2.1.0-2.2.1-1709740155" + String docker Int disk_size = 80 Int mem_size = 5 Int preemptible_tries = 3 From b7678395a709ec0b45cc9d919428570606e843fb Mon Sep 17 00:00:00 2001 From: ekiernan Date: Wed, 13 Mar 2024 09:36:12 -0400 Subject: [PATCH 65/68] testing older 1.7.9 cemba_data docker --- pipelines/skylab/snM3C/snM3C.wdl | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/pipelines/skylab/snM3C/snM3C.wdl b/pipelines/skylab/snM3C/snM3C.wdl index 0413581aae..2698fbc934 100644 --- a/pipelines/skylab/snM3C/snM3C.wdl +++ b/pipelines/skylab/snM3C/snM3C.wdl @@ -23,7 +23,7 @@ workflow snM3C { Int num_downstr_bases = 2 Int compress_level = 5 Int batch_number - String docker = "us.gcr.io/broad-gotc-prod/m3c-yap-hisat:2.3" + String docker = "docker pull us.gcr.io/broad-gotc-prod/hisat3n:2.1.0-2.2.1-1709740155" String single_end_hisat_cpu_platform = "Intel Ice Lake" String merge_sort_analyze_cpu_platform = "Intel Ice Lake" } From d7336ca675a3024a9e2aa2973eda62ff4bfbfe81 Mon Sep 17 00:00:00 2001 From: ekiernan Date: Wed, 13 Mar 2024 09:46:45 -0400 Subject: [PATCH 66/68] revert to latest docker --- pipelines/skylab/snM3C/snM3C.wdl | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/pipelines/skylab/snM3C/snM3C.wdl b/pipelines/skylab/snM3C/snM3C.wdl index 2698fbc934..0413581aae 100644 --- a/pipelines/skylab/snM3C/snM3C.wdl +++ b/pipelines/skylab/snM3C/snM3C.wdl @@ -23,7 +23,7 @@ workflow snM3C { Int num_downstr_bases = 2 Int compress_level = 5 Int batch_number - String docker = "docker pull us.gcr.io/broad-gotc-prod/hisat3n:2.1.0-2.2.1-1709740155" + String docker = "us.gcr.io/broad-gotc-prod/m3c-yap-hisat:2.3" String single_end_hisat_cpu_platform = "Intel Ice Lake" String merge_sort_analyze_cpu_platform = "Intel Ice Lake" } From 7fa3bb86236cc4a394a7e7ef90a328e6e2897e91 Mon Sep 17 00:00:00 2001 From: ekiernan Date: Wed, 13 Mar 2024 09:47:42 -0400 Subject: [PATCH 67/68] updated TestsnM3C docker to latest cemba_data 2.7.9 --- verification/test-wdls/TestsnM3C.wdl | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/verification/test-wdls/TestsnM3C.wdl b/verification/test-wdls/TestsnM3C.wdl index bded2b2f8a..de2d5dab4b 100644 --- a/verification/test-wdls/TestsnM3C.wdl +++ b/verification/test-wdls/TestsnM3C.wdl @@ -36,7 +36,7 @@ workflow TestsnM3C { String vault_token_path String google_account_vault_path - String docker = 
"us.gcr.io/broad-gotc-prod/m3c-yap-hisat:1.0.0-2.2.1" + String docker = "us.gcr.io/broad-gotc-prod/m3c-yap-hisat:2.3" String single_end_hisat_cpu_platform = "Intel Ice Lake" String merge_sort_analyze_cpu_platform = "Intel Ice Lake" } From 01242650b03d4b347c1b117be1f01252e22d7db2 Mon Sep 17 00:00:00 2001 From: ekiernan Date: Wed, 13 Mar 2024 16:53:56 -0400 Subject: [PATCH 68/68] Update snM3C.changelog.md --- pipelines/skylab/snM3C/snM3C.changelog.md | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/pipelines/skylab/snM3C/snM3C.changelog.md b/pipelines/skylab/snM3C/snM3C.changelog.md index a7901c38ee..9a77ee7eff 100644 --- a/pipelines/skylab/snM3C/snM3C.changelog.md +++ b/pipelines/skylab/snM3C/snM3C.changelog.md @@ -1,7 +1,9 @@ # 3.0.0 2024-02-23 (Date of Last Commit) -* Updated the snM3C docker to include the latest changes to the CEMBA repostiory; this impacts the scientific outputs +* Updated the snM3C docker to include the latest changes to the CEMBA repository; this impacts the scientific outputs +* Added docker as a workflow-level input +* Reverted the Hisat alignments to use the --no-repeat-index parameter # 2.0.1 2024-2-15 (Date of Last Commit)