Readme and Annotation Updates -- WIP #697

Draft
wants to merge 2 commits into
base: main
from

Conversation

dannon
Member

@dannon dannon commented Feb 27, 2025

Testing an approach for standardizing the structure, content, and tone of READMEs, as well as creating and aligning annotations. Please don't merge as-is. I can break this into individual PRs (and add the rest as needed) if this looks like a good pursuit.

FOR CONTRIBUTOR:

  • I have read the Adding workflows guidelines
  • License permits unrestricted use (educational + commercial)
  • Please also take note of the reviewer guidelines below to facilitate a smooth review process.

FOR REVIEWERS:

  • .dockstore.yml: file is present and aligned with creator metadata in workflow. ORCID identifiers are strongly encouraged in creator metadata. The .dockstore.yml file is required to run tests
  • Workflow is sufficiently generic to be used with lab data and does not hardcode sample names, reference data and can be run without reading an accompanying tutorial.
  • In workflow: annotation field contains short description of what the workflow does. Should start with This workflow does/runs/performs … xyz … to generate/analyze/etc …
  • In workflow: workflow inputs and outputs have human readable names (spaces are fine, no underscore, dash only where spelling dictates it), no abbreviation unless it is generally understood. Altering input or output labels requires adjusting these labels in the workflow-tests.yml file as well
  • In workflow: name field should be human readable (spaces are fine, no underscore, dash only where spelling dictates it), no abbreviation unless generally understood
  • Workflow folder: prefer dash (-) over underscore (_), prefer all lowercase. Folder becomes repository in iwc-workflows organization and is included in TRS id
  • Readme explains what workflow does, what are valid inputs and what outputs users can expect. If a tutorial or other resources exist they can be linked. If a similar workflow exists in IWC readme should explain differences with existing workflow and when one might prefer one workflow over another
  • Changelog contains appropriate entries
  • Large files (> 100 KB) are uploaded to zenodo and location urls are used in test file
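
As a rough illustration, a few of the naming checks in the reviewer list above could be scripted. The sketch below is not part of IWC tooling: the function names `check_workflow` and `check_folder_name` are hypothetical, and it assumes the workflow has been loaded from a `.ga` JSON file with top-level `name`, `annotation`, and `steps` keys, where input steps carry a `type` and a `label`.

```python
# Hypothetical helper functions; not an existing IWC or Galaxy API.

# Assumption: the checklist's "This workflow does/runs/performs ..." opener
# is approximated here by a simple prefix test.
ANNOTATION_PREFIX = "This workflow "

# Assumption: these step types mark workflow inputs in a .ga file.
INPUT_STEP_TYPES = {"data_input", "data_collection_input", "parameter_input"}


def check_workflow(workflow: dict) -> list[str]:
    """Return human-readable problems found in a parsed .ga workflow dict."""
    problems = []

    annotation = workflow.get("annotation") or ""
    if not annotation.startswith(ANNOTATION_PREFIX):
        problems.append("annotation should start with 'This workflow ...'")

    name = workflow.get("name") or ""
    if "_" in name:
        problems.append(f"workflow name contains an underscore: {name!r}")

    # Steps are assumed to be a dict keyed by step index.
    for step in workflow.get("steps", {}).values():
        if step.get("type") not in INPUT_STEP_TYPES:
            continue
        label = step.get("label") or ""
        if not label:
            problems.append("input step has no label")
        elif "_" in label:
            problems.append(f"input label contains an underscore: {label!r}")

    return problems


def check_folder_name(folder: str) -> list[str]:
    """Flag folder names that ignore the dash/lowercase preference."""
    problems = []
    if folder != folder.lower():
        problems.append(f"folder name is not all lowercase: {folder!r}")
    if "_" in folder:
        problems.append(f"folder name uses an underscore: {folder!r}")
    return problems
```

Because the folder rules are stated as preferences, the results are probably best treated as warnings for a reviewer to weigh rather than hard failures.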

Comment on lines +48 to +65
## Applications

This parallel download workflow is essential for:
- Retrieving large datasets from public repositories
- Creating reproducible analysis collections from published studies
- Building reference datasets for benchmarking
- Efficient acquisition of data for meta-analyses
- Classroom and workshop preparation involving multiple datasets
- Any project requiring retrieval of multiple sequencing run accessions

## Performance Advantages

The parallel architecture provides significant advantages over sequential approaches:
- Downloads multiple accessions simultaneously rather than one-by-one
- Continues overall progress even if individual downloads fail
- Scales efficiently with available computational resources
- Reduces total download time by orders of magnitude for large collections
- Improves robustness through isolated job execution
Member

That stuff is redundant and overly optimistic / "sales" oriented in a way. Overview and Workflow Process are nice subdivisions though, and I think the name changes are pretty good.

Member Author

Yeah, it really took the original blurb and ran with it here 😆.

Creates one job per listed run accession, and is therefore much faster and more robust to errors when many accessions need to be downloaded.

We can definitely tone this one down.

@@ -1,6 +1,6 @@
 {
 "a_galaxy_workflow": "true",
-"annotation": "This workflow take as input a collection of paired fastq. It uses HiCUP to go from fastq to validPair file. The pairs are filtered for MAPQ and for the region captured. Then, they are sorted by cooler to generate a tabix dataset. Cooler is used to generate a balanced cool file to the desired resolution.",
+"annotation": "This workflow processes paired fastq files with HiCUP to create validPair files. It filters pairs by MAPQ and captured region, then sorts them with cooler to generate tabix datasets and balanced cool files at desired resolution.",
Member

It looks like some of the annotations were not updated, but this update looks good to me. Considering that's what we pull into BRC, maybe we should start with updating the name and annotation fields before we tweak the readme?

2 participants