[Fetch_SRR_Accession] New wf to retrieve SRR after Terra2NCBI wf (#668)
* initial commit part 1 retrieve srr from Biosample
* update task and wf names and meta
* dockstore add
* Documentation and update column name
* update dockstore name
* Remove unnecessary blank lines in fetch_srr_metadata WDL task
* Update SRR metadata workflow and documentation for clarity and accuracy
* Remove redundant docker input from wf_update_srr_metadata workflow
* update
* update dockstore
* initial updates
* handle multiple SRR accessions as string version outputs
* update task path
* forgot to import task versioning
* update dockstore yml
* comma sep output as string instead of array
* update wf name
* test local worked
* set euo pipefail
* more explicit fail invalid biosample
* update logic failure
* logic handling valid biosample or SRA
* enhance error handling and logging for biosample ID or SRA fetching
* Update logic for no SRR accessions and invalid samples
* update docs version in table
* add sample level to docs
* update input and output tables
1 parent
24b6abe
commit b4aad55
Showing 8 changed files with 149 additions and 0 deletions.
@@ -0,0 +1,52 @@
# Fetch SRR Accession Workflow

## Quick Facts

| **Workflow Type** | **Applicable Kingdom** | **Last Known Changes** | **Command-line Compatibility** | **Workflow Level** |
|---|---|---|---|---|
| [Public Data Sharing](../../workflows_overview/workflows_type.md/#public-data-sharing) | [Any Taxa](../../workflows_overview/workflows_kingdom.md/#any-taxa) | PHB v2.3.0 | Yes | Sample-level |

## Fetch SRR Accession

This workflow retrieves the Sequence Read Archive (SRA) run accession(s) (SRR) associated with a given sample accession. The primary inputs are BioSample IDs (e.g., SAMN00000000) or SRA Experiment IDs (e.g., SRX000000), which link to sequencing data in the SRA repository.

The workflow uses the fastq-dl tool to fetch metadata from SRA, parses that metadata to extract the associated SRR accession(s), and outputs them as a single string (comma-separated when more than one run is found).

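The parsing step described above can be sketched in isolation. This is a minimal illustration, assuming the metadata file is tab-separated with the run accession in the first column; the helper name `extract_srr_accessions`, the column headers, and the sample rows are made up for the example and do not come from fastq-dl itself.

```shell
# Hypothetical helper: print the first column of a tab-separated file
# (skipping the header row) as one comma-separated line.
extract_srr_accessions() {
  awk -F'\t' 'NR>1 {print $1}' "$1" | paste -sd ',' -
}

# Made-up sample data standing in for the metadata file written by fastq-dl.
sample_tsv="$(mktemp)"
printf 'run_accession\tsample_accession\n' > "$sample_tsv"
printf 'SRR0000001\tSAMN00000000\n' >> "$sample_tsv"
printf 'SRR0000002\tSAMN00000000\n' >> "$sample_tsv"

# Prints the two made-up run accessions joined by a comma.
extract_srr_accessions "$sample_tsv"
```

Joining the accessions into one string keeps the output a plain `String`, which is why a sample with several runs yields a single comma-separated value.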
### Inputs

| **Terra Task Name** | **Variable** | **Type** | **Description** | **Default Value** | **Terra Status** |
| --- | --- | --- | --- | --- | --- |
| fetch_srr_metadata | **sample_accession** | String | SRA-compatible accession, such as a **BioSample ID** (e.g., "SAMN00000000") or **SRA Experiment ID** (e.g., "SRX000000"), used to retrieve SRR metadata. | | Required |
| fetch_srr_metadata | **cpu** | Int | Number of CPUs allocated for the task. | 2 | Optional |
| fetch_srr_metadata | **disk_size** | Int | Disk space in GB allocated for the task. | 10 | Optional |
| fetch_srr_metadata | **docker** | String | Docker image for metadata retrieval. | `us-docker.pkg.dev/general-theiagen/biocontainers/fastq-dl:2.0.4--pyhdfd78af_0` | Optional |
| fetch_srr_metadata | **memory** | Int | Memory in GB allocated for the task. | 8 | Optional |
| version_capture | **docker** | String | The Docker container to use for the task | "us-docker.pkg.dev/general-theiagen/theiagen/alpine-plus-bash:3.20.0" | Optional |
| version_capture | **timezone** | String | Set the time zone to get an accurate date of analysis (uses UTC by default) | | Optional |

### Workflow Tasks

This workflow has a single task that performs metadata retrieval for the specified sample accession.

??? task "`fastq-dl`: Fetches SRR metadata for sample accession"
    When provided a BioSample accession or SRA experiment ID, `fastq-dl` collects metadata and returns the appropriate SRR accession.

    !!! techdetails "fastq-dl Technical Details"
        |  | Links |
        | --- | --- |
        | Task | [Task on GitHub](https://github.com/theiagen-org/phb-workflows/blob/main/tasks/utilities/data_handling/task_fetch_srr_metadata.wdl) |
        | Software Source Code | [fastq-dl Source](https://github.com/rvalieris/fastq-dl) |
        | Software Documentation | [fastq-dl Documentation](https://github.com/rvalieris/fastq-dl#documentation) |
        | Original Publication | [fastq-dl: A fast and reliable tool for downloading SRA metadata](https://doi.org/10.1186/s12859-021-04346-3) |

### Outputs

| **Variable** | **Type** | **Description** |
|---|---|---|
| srr_accession | String | The SRR accession(s) associated with the input sample accession, comma-separated when more than one run is found. Set to `No SRR accession found` when no run is associated with the accession. |
| fetch_srr_accession_version | String | The version of the fetch_srr_accession workflow. |
| fetch_srr_accession_analysis_date | String | The date the fetch_srr_accession analysis was run. |

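Because `srr_accession` is a single string, downstream code has to split it and handle the placeholder value. The sketch below shows one possible way to do that; the helper name `split_srr_accessions` and the sample accessions are hypothetical and not part of the workflow.

```shell
# Hypothetical helper: print one accession per line, or nothing when the
# value is the "No SRR accession found" placeholder written by the task.
split_srr_accessions() {
  if [ "$1" = "No SRR accession found" ]; then
    return 0
  fi
  printf '%s\n' "$1" | tr ',' '\n'
}

# Made-up example value with two runs; prints each accession on its own line.
split_srr_accessions "SRR0000001,SRR0000002"
```

Checking for the placeholder first avoids treating the message text as if it were an accession.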
## References

> Valieris, R. et al., "fastq-dl: A fast and reliable tool for downloading SRA metadata." Bioinformatics, 2021.
tasks/utilities/data_handling/task_fetch_srr_accession.wdl
62 changes: 62 additions & 0 deletions
@@ -0,0 +1,62 @@
version 1.0

task fetch_srr_accession {
  input {
    String sample_accession
    String docker = "us-docker.pkg.dev/general-theiagen/biocontainers/fastq-dl:2.0.4--pyhdfd78af_0"
    Int disk_size = 10
    Int cpu = 2
    Int memory = 8
  }
  meta {
    volatile: true
  }
  command <<<
    set -euo pipefail

    # Output the current date and fastq-dl version for debugging
    date -u | tee DATE
    fastq-dl --version | tee VERSION

    echo "Fetching metadata for accession: ~{sample_accession}"

    # Run fastq-dl and capture stderr
    fastq-dl --accession ~{sample_accession} --only-download-metadata -m 2 --verbose 2> stderr.log || true

    # Handle whether the ID/accession is valid and contains SRR metadata based on stderr
    if grep -q "No results found for" stderr.log; then
      echo "No SRR accession found" > srr_accession.txt
      echo "No SRR accession found for accession: ~{sample_accession}"
    elif grep -q "received an empty response" stderr.log; then
      echo "No SRR accession found" > srr_accession.txt
      echo "No SRR accession found for accession: ~{sample_accession}"
    elif grep -q "is not a Study, Sample, Experiment, or Run accession" stderr.log; then
      echo "Invalid accession: ~{sample_accession}" >&2
      exit 1
    elif [[ ! -f fastq-run-info.tsv ]]; then
      echo "No metadata file found for accession: ~{sample_accession}" >&2
      exit 1
    else
      # Extract SRR accessions from the TSV file if it exists
      SRR_accessions=$(awk -F'\t' 'NR>1 {print $1}' fastq-run-info.tsv | paste -sd ',' -)
      if [[ -z "${SRR_accessions}" ]]; then
        echo "No SRR accession found" > srr_accession.txt
      else
        echo "Extracted SRR accessions: ${SRR_accessions}"
        echo "${SRR_accessions}" > srr_accession.txt
      fi
    fi
  >>>
  output {
    String srr_accession = read_string("srr_accession.txt")
    String fastq_dl_version = read_string("VERSION")
  }
  runtime {
    docker: docker
    memory: "~{memory} GB"
    cpu: cpu
    disks: "local-disk " + disk_size + " SSD"
    disk: disk_size + " GB"
    preemptible: 1
  }
}
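The stderr-based branching in the task can be exercised without network access. The following sketch mirrors the order of the first three `grep` checks against fixture logs; the function name `classify_stderr`, its return labels, and the log lines are invented for illustration and only contain the substrings the task searches for. It omits the task's check for a missing `fastq-run-info.tsv`.

```shell
# Hypothetical helper mirroring the task's branch order: prints "none" when
# the log reports no results, "invalid" for an unrecognized accession, and
# "ok" otherwise. The missing-metadata-file branch is not modeled here.
classify_stderr() {
  if grep -q "No results found for" "$1"; then
    echo "none"
  elif grep -q "received an empty response" "$1"; then
    echo "none"
  elif grep -q "is not a Study, Sample, Experiment, or Run accession" "$1"; then
    echo "invalid"
  else
    echo "ok"
  fi
}

# Made-up fixture logs; real fastq-dl messages may carry extra text.
log="$(mktemp)"
echo "No results found for SAMN00000000" > "$log"
classify_stderr "$log"

echo "XYZ123 is not a Study, Sample, Experiment, or Run accession" > "$log"
classify_stderr "$log"
```

Matching on substrings of the tool's log output is brittle if the messages change between fastq-dl versions, so a check like this is most useful when the Docker image tag is updated.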
workflows/utilities/data_import/wf_fetch_srr_accession.wdl
26 changes: 26 additions & 0 deletions
@@ -0,0 +1,26 @@
version 1.0

import "../../../tasks/utilities/data_handling/task_fetch_srr_accession.wdl" as srr_task
import "../../../tasks/task_versioning.wdl" as versioning_task

workflow fetch_srr_accession {
  meta {
    description: "This workflow retrieves the Sequence Read Archive (SRA) accession (SRR) associated with a given sample accession. It uses the fastq-dl tool to fetch metadata from SRA and outputs the SRR accession."
  }
  input {
    String sample_accession
  }
  call versioning_task.version_capture {
    input:
  }
  call srr_task.fetch_srr_accession as fetch_srr {
    input:
      sample_accession = sample_accession
  }
  output {
    String srr_accession = fetch_srr.srr_accession
    # Version Captures
    String fetch_srr_accession_version = version_capture.phb_version
    String fetch_srr_accession_analysis_date = version_capture.date
  }
}
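For a command-line run, WDL engines typically take an inputs JSON whose keys are prefixed with the workflow name. The sketch below writes a minimal inputs file for this workflow to a temporary path; the accession is the placeholder ID used in the documentation, not a real sample, and the variable name `inputs_json` is arbitrary.

```shell
# Write a minimal inputs file for the fetch_srr_accession workflow.
# SAMN00000000 is a placeholder accession, not a real BioSample.
inputs_json="$(mktemp)"
cat > "$inputs_json" <<'EOF'
{
  "fetch_srr_accession.sample_accession": "SAMN00000000"
}
EOF

# Show the generated file.
cat "$inputs_json"
```

Only `sample_accession` is required; the task-level resource settings keep their defaults unless they are overridden.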