Readme and Annotation Updates -- WIP #697
base: main
## Applications

This parallel download workflow is essential for:
- Retrieving large datasets from public repositories
- Creating reproducible analysis collections from published studies
- Building reference datasets for benchmarking
- Efficient acquisition of data for meta-analyses
- Classroom and workshop preparation involving multiple datasets
- Any project requiring retrieval of multiple sequencing run accessions

## Performance Advantages

The parallel architecture provides significant advantages over sequential approaches:
- Downloads multiple accessions simultaneously rather than one-by-one
- Continues overall progress even if individual downloads fail
- Scales efficiently with available computational resources
- Reduces total download time by orders of magnitude for large collections
- Improves robustness through isolated job execution
That stuff is redundant and overly optimistic / "sales" oriented in a way. Overview and Workflow Process are nice subdivisions though, and I think the name changes are pretty good.
Yeah, it really took the original blurb and ran with it here 😆.
Creates one job per listed run accession, and is therefore much faster and more robust to errors when many accessions need to be downloaded.
We can definitely tone this one down.
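For reference, the "one job per accession" claim in the original blurb can be illustrated with a minimal Python sketch. This is not how the Galaxy workflow is implemented; `fetch` is a hypothetical stand-in for a single download job, and the thread pool only mimics the idea that each accession runs in isolation, so one failure does not stop the rest.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def fetch(accession):
    """Hypothetical stand-in for one download job (no network access)."""
    if not accession.startswith(("SRR", "ERR", "DRR")):
        raise ValueError(f"not a run accession: {accession}")
    return f"{accession}.fastq"


def download_all(accessions, max_workers=4):
    """Run one isolated job per accession and keep successes and failures apart."""
    done, failed = {}, {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(fetch, acc): acc for acc in accessions}
        for future in as_completed(futures):
            acc = futures[future]
            try:
                done[acc] = future.result()
            except Exception as exc:
                # A failed job is recorded; the other jobs keep running.
                failed[acc] = str(exc)
    return done, failed
```

The useful property is the separate `failed` mapping: a toned-down README can simply state that a failed accession does not block the others.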
@@ -1,6 +1,6 @@
 {
 "a_galaxy_workflow": "true",
-"annotation": "This workflow take as input a collection of paired fastq. It uses HiCUP to go from fastq to validPair file. The pairs are filtered for MAPQ and for the region captured. Then, they are sorted by cooler to generate a tabix dataset. Cooler is used to generate a balanced cool file to the desired resolution.",
+"annotation": "This workflow processes paired fastq files with HiCUP to create validPair files. It filters pairs by MAPQ and captured region, then sorts them with cooler to generate tabix datasets and balanced cool files at desired resolution.",
It looks like some of the annotations were not updated, but this update looks good to me. Considering that's what we pull into BRC, maybe we should start with updating the name and annotation fields before we tweak the README?
Testing an approach for standardizing the structure, content, and tone of READMEs, as well as creating and aligning annotations. Please don't merge as-is. I can break this into individual PRs (and add the rest as needed) if this looks like a good pursuit.
FOR CONTRIBUTOR:
FOR REVIEWERS:
This workflow does/runs/performs … xyz … to generate/analyze/etc …
name field should be human readable (spaces are fine, no underscore, dash only where spelling dictates it), no abbreviation unless generally understood
Prefer dash (-) over underscore (_), prefer all lowercase. Folder becomes repository in iwc-workflows organization and is included in TRS id
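The naming and annotation points in the reviewer checklist could be checked mechanically before touching the READMEs. Below is a minimal sketch under stated assumptions: `check_workflow` is a hypothetical helper (not part of any existing IWC tooling), it takes the already-parsed workflow JSON as a dict plus the folder name, and it only covers the three rules quoted above.

```python
def check_workflow(workflow, folder):
    """Return a list of checklist problems for one workflow (hypothetical helper)."""
    problems = []

    # Annotation should follow "This workflow does/runs/performs ...".
    annotation = workflow.get("annotation", "")
    if not annotation.startswith("This workflow"):
        problems.append("annotation should start with 'This workflow'")

    # Name should be human readable: spaces are fine, underscores are not.
    name = workflow.get("name", "")
    if not name or "_" in name:
        problems.append("name should be human readable, without underscores")

    # Folder: prefer dash over underscore, prefer all lowercase.
    if "_" in folder or folder != folder.lower():
        problems.append("folder should use dashes and be all lowercase")

    return problems
```

In practice the dict would come from `json.load` on each workflow file; an empty list means these three checks passed, and says nothing about the rest of the checklist.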