change input type for finish_asm #9

Open
wants to merge 93 commits into
base: werkflow
846ebf0
change input type for finish_asm
scanon Dec 7, 2022
95ae1f9
add make_info_file task
Jan 28, 2023
2e334d2
udpate README
Jan 28, 2023
2729674
fix missing make_info file direct
Jan 28, 2023
d1fb88e
Merge pull request #11 from microbiomedata/make_info
scanon Jun 10, 2023
38eeedd
added release workflow for assembly
Jul 3, 2023
354e297
switched out uses for run
Jul 3, 2023
20ffb00
removed v from version bump
Jul 3, 2023
1cedae9
removed unused variables
Jul 6, 2023
7f665b2
added version.txt and upated create_release.yml to source from versio…
Jul 10, 2023
57f3391
remove unused input_file variable from finish_asm
Jul 14, 2023
484a137
add jgi_meta_wdl submodules
Oct 17, 2023
c2dc346
changed wdl files to version 1.0
kaijli Oct 19, 2023
ae28597
update jgi_meta_wdl submodules
Nov 3, 2023
468b75b
quick script to choose between long and short reads
kaijli Nov 7, 2023
bcdc0e8
take one input either to long reads workflow or to short reads workflow
Nov 9, 2023
24e91ca
working on input dependencies
kaijli Nov 12, 2023
a784baa
merged to include interleave
kaijli Nov 12, 2023
72e12ce
having trouble importing jgi long reads
kaijli Nov 12, 2023
47b758a
quick json to test choice wdl
kaijli Nov 12, 2023
713cf33
update jgi_meta_wdl submodules
Nov 13, 2023
b40bae5
move make_interleaved_reads into shortReads condition, and add conta…
Nov 13, 2023
fd3c145
working on file localization errors
kaijli Nov 14, 2023
696579e
added imports.zip for cromwell runs
kaijli Nov 14, 2023
24db934
Merge branch 'longreads' of https://github.com/microbiomedata/metaAss…
kaijli Nov 14, 2023
f4e6db8
merged import lines
kaijli Nov 14, 2023
4b46283
added test file
kaijli Nov 14, 2023
94da6eb
changed folder paths for nmdc edge server
kaijli Nov 14, 2023
1df1c4c
Merge branch 'longreads' of https://github.com/microbiomedata/metaAss…
kaijli Nov 14, 2023
7c7c4f8
container issue in interleaved fixed
kaijli Nov 14, 2023
789c936
update jgi_meta_wdl submodules
Nov 14, 2023
ee2b151
testing long reads
kaijli Nov 14, 2023
b51a91e
added makeinfo task. testing
kaijli Mar 8, 2024
fd537b1
added finishasm for long reads
kaijli Apr 10, 2024
4b71a72
small change to input_file name
kaijli Apr 11, 2024
dcf76e3
added pipefail and symbolic link to all commands
kaijli Apr 16, 2024
9de6fa5
changes for alicia
kaijli Apr 18, 2024
b5e12dc
added set -euo pipefail to test_assembly
kaijli Apr 18, 2024
c4ae29d
Merge pull request #17 from microbiomedata/longreads
kaijli Apr 18, 2024
edfc408
Update version.txt
kaijli Apr 25, 2024
9901d31
fixed typo in release creation
kaijli Apr 25, 2024
baefce9
renamed files for release and consistency when running
kaijli Apr 25, 2024
02560ef
Update create_release.yml
kaijli Apr 26, 2024
def4ae5
Update version.txt
kaijli Apr 26, 2024
e037c72
Update version.txt
kaijli Apr 26, 2024
0f9debf
Update version.txt
kaijli Apr 26, 2024
e96543f
Added potential fix for issue #18, needs testing
May 9, 2024
91d0215
forgot to ass short reads bbcms results to main assembly file
kaijli May 9, 2024
e6ae99e
updated task finish_asm to accomodate issue #18
vlilanl Jun 4, 2024
2e72e3e
Changed a couple parameters to run on jaws and added hard and soft li…
vlilanl Jun 11, 2024
295fe28
deleted sed command for bbcms.fastq.gz
vlilanl Jun 11, 2024
5a56519
Merge pull request #19 from microbiomedata/bbcmsouts
aclum Jun 11, 2024
58c8ad7
Update version.txt
vlilanl Jun 11, 2024
8d0c90a
removed unused git submodule, updated spades container
kaijli Oct 15, 2024
547f3a8
bump version
kaijli Oct 15, 2024
6c98933
add runtime mem for interleave
kaijli Oct 15, 2024
c8283f1
remove extra comma
kaijli Oct 16, 2024
8d7df73
Merge pull request #24 from microbiomedata/23-update-to-spades-version-4
kaijli Oct 20, 2024
79635f1
fixed formatting and added in asmstats to metaflye.wdl
vlilanl Nov 6, 2024
64f8467
added stats.json as an output for task finish_lrasm
vlilanl Nov 7, 2024
7766db8
worflowmeta_contiainer -> workflowmeta_container
vlilanl Nov 7, 2024
d8b10a9
Updated bbtools container
vlilanl Nov 7, 2024
f8dddd6
reverted bbtools container for shortreads
vlilanl Nov 8, 2024
ab0e5cc
Merge branch 'master' into 21-stats-file
vlilanl Nov 8, 2024
061e646
formatting issues
vlilanl Nov 8, 2024
859c061
Update index.rst
vlilanl Nov 12, 2024
df5b86a
Updated image for documentation
vlilanl Nov 12, 2024
f47f965
Update version.txt
vlilanl Nov 12, 2024
86de934
Update index.rst
vlilanl Nov 12, 2024
0f47549
Update index.rst
vlilanl Nov 12, 2024
4676bf3
Update index.rst
vlilanl Nov 12, 2024
34e83d1
Revert index.rst
vlilanl Nov 14, 2024
1274bc9
Merge pull request #25 from microbiomedata/21-stats-file
vlilanl Nov 14, 2024
b2cf317
Update index.rst
vlilanl Nov 15, 2024
dbbfc10
Merge branch 'master' into documentation
vlilanl Nov 15, 2024
fdfb07d
Revert "Merge branch 'master' into documentation"
vlilanl Nov 15, 2024
7ffb200
Update index.rst
vlilanl Nov 15, 2024
99241fb
Update README.md
vlilanl Nov 15, 2024
28540ed
Update index.rst
vlilanl Nov 15, 2024
85c2421
Update bbtools container for short reads
vlilanl Nov 18, 2024
19d8b75
Update version.txt
vlilanl Nov 18, 2024
62593b2
Update workflow diagram
vlilanl Nov 18, 2024
76e2ad9
Update version.txt
vlilanl Nov 18, 2024
db80b0a
Merge pull request #26 from microbiomedata/documentation
vlilanl Nov 18, 2024
6e35537
Updated Workflow figure
vlilanl Nov 19, 2024
8fd6e2a
Merge pull request #27 from microbiomedata/documentation
vlilanl Nov 19, 2024
238e47a
update test data links
kaijli Dec 3, 2024
e353ad9
add svg
kaijli Dec 5, 2024
2f18562
Update index.rst with svg
kaijli Dec 5, 2024
ee23b71
Merge pull request #29 from microbiomedata/28-generate-svg-version-of…
kaijli Dec 5, 2024
74f7f6a
remove image scale
kaijli Dec 5, 2024
31eec93
Enable "Edit on GitHub" link by defining `:github_url:` in `index.rst`
eecavanna Dec 14, 2024
111379b
Merge pull request #31 from microbiomedata/30-configure-edit-on-githu…
eecavanna Dec 14, 2024
30 changes: 30 additions & 0 deletions .github/workflows/create_release.yml
@@ -0,0 +1,30 @@
name: Create Release

on:
  push:
    branches:
      - master
    paths:
      - 'version.txt'

jobs:
  release:
    runs-on: ubuntu-latest

    steps:
      - name: Checkout code
        uses: actions/checkout@v2

      - name: Read version
        id: get_version
        run: |
          VERSION=$(cat version.txt)
          echo "VERSION=${VERSION}" >> $GITHUB_ENV

      - name: Create bundle zip
        run: zip -r bundle.zip *.wdl

      - name: Create Release
        run: gh release create ${{ env.VERSION }} jgi_assembly.wdl bundle.zip
        env:
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
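
The "Read version" step can be tried locally before pushing a change to `version.txt`. The following is a minimal sketch, not part of the workflow file: `read_version` is a hypothetical helper, the `1.2.3` value is made up for the demo, and stripping all whitespace (rather than only the trailing newline that `$(cat ...)` removes) is an assumption about what a safe tag name should look like. The release command is only printed, since `gh` needs a token to run.

```shell
set -eu

# Hypothetical helper: read a version string from a file and reject empty values.
read_version() {
  version_file="${1:-version.txt}"
  # Strip all whitespace, including the trailing newline.
  version_value="$(tr -d '[:space:]' < "$version_file")"
  if [ -z "$version_value" ]; then
    echo "error: ${version_file} is empty" >&2
    return 1
  fi
  printf '%s\n' "$version_value"
}

# Demo in a scratch directory so no real files are touched.
workdir="$(mktemp -d)"
printf '1.2.3\n' > "${workdir}/version.txt"
VERSION="$(read_version "${workdir}/version.txt")"

# Print the release command instead of executing it.
echo "gh release create ${VERSION} jgi_assembly.wdl bundle.zip"
```

Failing early on an empty `version.txt` avoids calling `gh release create` with a missing tag argument.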
Empty file added .gitmodules
Empty file.
97 changes: 44 additions & 53 deletions README.md
@@ -1,51 +1,40 @@
# The Metagenome Assembly Pipeline

## Summary
This workflow was developed by Brian Foster at JGI and originates from his [repo](https://gitlab.com/bfoster1/wf_templates/tree/master/templates). It takes paired-end reads and runs error correction with bbcms (BBTools). The cleaned reads are assembled by metaSPAdes. After assembly, the reads are mapped back to contigs by bbmap (BBTools) for coverage information.
This workflow was developed by Brian Foster at JGI and originates from his [repo](https://gitlab.com/bfoster1/wf_templates/tree/master/templates). It takes paired-end Illumina short reads or PacBio long reads as input.

## Running Workflow in Cromwell
For short reads, the workflow reformats the interleaved file into two FASTQ files for downstream tasks using bbcms (BBTools). The corrected reads are assembled using metaSPAdes. After assembly, the reads are mapped back to contigs by bbmap (BBTools) for coverage information. The `.wdl` (Workflow Description Language) file includes five tasks: *bbcms*, *assy*, *create_agp*, *read_mapping_pairs*, and *make_output*.

For long reads, the workflow uses Flye for assembly, pbmm2 for alignment, Racon for polishing, and minimap2 for read mapping and coverage analysis. The `.wdl` (Workflow Description Language) file includes six tasks: *combine_fastq*, *assy*, *racon*, *format_assembly*, *map*, and *make_info_file*.

Description of the files:
- `.wdl` file: the WDL file for workflow definition
- `.json` file: the example input for the workflow
- `.conf` file: the conf file for running Cromwell.
- `.sh` file: the shell script for running the example workflow

## The Docker image and Dockerfile can be found here

[microbiomedata/bbtools:38.94](https://hub.docker.com/r/microbiomedata/bbtools)
[microbiomedata/bbtools:39.03](https://hub.docker.com/r/microbiomedata/bbtools)

[microbiomedata/spades:3.15.0](https://hub.docker.com/r/microbiomedata/spades)
[microbiomedata/spades:4.0.0](https://hub.docker.com/r/microbiomedata/spades)


## Input files

1. fastq (illumina paired-end interleaved fastq)
1. The path to the input FASTQ file (Illumina paired-end interleaved FASTQ or PacBio paired-end interleaved FASTQ) (recommended: output of the Reads QC workflow).

2. contig prefix for fasta header
2. Project name: nmdc:XXXXXX

3. output path

4. input_interleaved (boolean)

5. forwards reads fastq file (required value when input_interleaved is false, otherwise use [] )
3. Memory (optional) e.g., `"jgi_metaAssembly.memory": "105G"`

6. reverse reads fastq file (required value when input_interleaved is false, otherwise use [] )
4. Threads (optional) e.g., `"jgi_metaAssembly.threads": "16"`

7. memory (optional) ex: "jgi_metaASM.memory": "105G"
5. Whether the input is short reads (boolean)

8. threads (optional) ex: "jgi_metaASM.threads": "16"

```
{
"jgi_metaASM.input_file":["/global/cfs/projectdirs/m3408/ficus/11809.7.220839.TCCTGAG-ACTGCAT.fastq.gz"],
"jgi_metaASM.rename_contig_prefix":"503125_160870",
"jgi_metaASM.outdir":"/global/cfs/projectdirs/m3408/aim2/metagenome/assembly/ficus/503125_160870",
"jgi_metaASM.input_interleaved":true,
"jgi_metaASM.input_fq1":[],
"jgi_metaASM.input_fq2":[],
"jgi_metaASM.memory": "105G",
"jgi_metaASM.threads": "16"
"jgi_metaAssembly.input_files": ["https://portal.nersc.gov/project/m3408/test_data/smalltest.int.fastq.gz"],
"jgi_metaAssembly.proj": "nmdc:XXXXXX",
"jgi_metaAssembly.memory": "105G",
"jgi_metaAssembly.threads": "16",
"jgi_metaAssembly.shortRead": true
}
```
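
Before submitting a run, it can help to confirm that the inputs file names the expected keys. The sketch below writes the `jgi_metaAssembly` example inputs to a temporary file and checks for the three inputs that are not marked optional; treating those three as required is an assumption, and the commented Cromwell command uses a hypothetical `cromwell.jar` path.

```shell
set -eu

# Write the example inputs to a scratch file.
inputs="$(mktemp)"
cat > "$inputs" <<'EOF'
{
  "jgi_metaAssembly.input_files": ["https://portal.nersc.gov/project/m3408/test_data/smalltest.int.fastq.gz"],
  "jgi_metaAssembly.proj": "nmdc:XXXXXX",
  "jgi_metaAssembly.memory": "105G",
  "jgi_metaAssembly.threads": "16",
  "jgi_metaAssembly.shortRead": true
}
EOF

# Count inputs that are absent (assumed required: input_files, proj, shortRead).
missing=0
for key in input_files proj shortRead; do
  if ! grep -qF "\"jgi_metaAssembly.${key}\"" "$inputs"; then
    echo "missing input: jgi_metaAssembly.${key}" >&2
    missing=$((missing + 1))
  fi
done
echo "missing keys: ${missing}"

# Assumed invocation; adjust the jar and WDL paths to the local setup:
# java -jar cromwell.jar run jgi_assembly.wdl --inputs "$inputs"
```

This is a plain text match on key names, so it does not validate JSON syntax or value types.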

@@ -54,31 +43,33 @@ Description of the files:
Below is a partial list of all output files. The main assembly contigs output is in final_assembly/assembly.contigs.fasta.

```
├── bbcms
│   ├── berkeleylab-jgi-meta-60ade422cd4e
│   ├── counts.metadata.json
│   ├── input.corr.fastq.gz
│   ├── input.corr.left.fastq.gz
│   ├── input.corr.right.fastq.gz
│   ├── readlen.txt
│   └── unique31mer.txt
├── final_assembly
│   ├── assembly.agp
│   ├── assembly_contigs.fna
│   ├── assembly_scaffolds.fna
│   └── assembly_scaffolds.legend
├── mapping
│   ├── covstats.txt (mapping_stats.txt)
│   ├── pairedMapped.bam
│   ├── pairedMapped.sam.gz
│   ├── pairedMapped_sorted.bam
│   └── pairedMapped_sorted.bam.bai
└── spades3
    ├── assembly_graph.fastg
    ├── assembly_graph_with_scaffolds.gfa
    ├── contigs.fasta
    ├── contigs.paths
    ├── scaffolds.fasta
    └── scaffolds.paths
# Short Reads
output/
├── nmdc_XXXXXX_metaAsm.info
├── nmdc_XXXXXX_covstats.txt
├── nmdc_XXXXXX_contigs.fna
├── nmdc_XXXXXX_bbcms.fastq.gz
├── nmdc_XXXXXX_scaffolds.fna
├── nmdc_XXXXXX_assembly.agp
├── stats.json
├── nmdc_XXXXXX_pairedMapped.sam.gz
└── nmdc_XXXXXX_pairedMapped_sorted.bam
# Long Reads
output/
├── nmdc_XXXXXX_assembly.legend
├── nmdc_XXXXXX_contigs.fna
├── nmdc_XXXXXX_pairedMapped_sorted.bam
├── nmdc_XXXXXX_read_count_report.txt
├── nmdc_XXXXXX_metaAsm.info
├── nmdc_XXXXXX_summary.stats
├── nmdc_XXXXXX_scaffolds.fna
├── nmdc_XXXXXX_pairedMapped.sam.gz
├── stats.json
├── nmdc_XXXXXX_contigs.sam.stats
├── nmdc_XXXXXX_contigs.sorted.bam.pileup.basecov
├── nmdc_XXXXXX_assembly.agp
└── nmdc_XXXXXX_contigs.sorted.bam.pileup.out
```
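
A finished short-read run can be checked against the listing above with a small script. This is a sketch under assumptions: `short_read_outputs` and `check_short_read_outputs` are hypothetical helpers, the file names are copied from the short-read tree (where the project name `nmdc:XXXXXX` appears as the prefix `nmdc_XXXXXX`), and the demo creates empty placeholder files in a scratch directory rather than reading real results.

```shell
set -eu

# Print the short-read output names from the listing, one per line.
short_read_outputs() {
  prefix="$1"
  for suffix in metaAsm.info covstats.txt contigs.fna bbcms.fastq.gz \
                scaffolds.fna assembly.agp pairedMapped.sam.gz \
                pairedMapped_sorted.bam; do
    printf '%s_%s\n' "$prefix" "$suffix"
  done
  # stats.json is the only listed file without the project prefix.
  printf 'stats.json\n'
}

# Report every expected file that is absent; succeed only if none are missing.
check_short_read_outputs() {
  out_dir="$1"
  missing=0
  for name in $(short_read_outputs "$2"); do
    if [ ! -e "${out_dir}/${name}" ]; then
      echo "missing: ${name}" >&2
      missing=$((missing + 1))
    fi
  done
  [ "$missing" -eq 0 ]
}

# Demo: populate a scratch directory with empty placeholders, then check it.
outdir="$(mktemp -d)"
for name in $(short_read_outputs "nmdc_XXXXXX"); do
  : > "${outdir}/${name}"
done
check_short_read_outputs "$outdir" "nmdc_XXXXXX" && echo "all short-read outputs present"
```

The same pattern would work for the long-read listing by swapping in its file names.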
## Link to Doc Site
Please refer [here](https://nmdc-workflow-documentation.readthedocs.io/en/latest/chapters/3_MetaGAssemly_index.html) for more information.
