Readme and Annotation Updates -- WIP #697

dannon · 2025-02-27T04:06:47Z

Testing an approach for standardizing the structure, content, and tone of READMEs, as well as creating/aligning annotations. Please don't merge as-is. I can break this into individual PRs (and add the rest as needed) if this looks like a good pursuit.

FOR CONTRIBUTOR:

I have read the Adding workflows guidelines
License permits unrestricted use (educational + commercial)
Please also take note of the reviewer guidelines below to facilitate a smooth review process.

FOR REVIEWERS:

.dockstore.yml: file is present and aligned with creator metadata in workflow. ORCID identifiers are strongly encouraged in creator metadata. The .dockstore.yml file is required to run tests
Workflow is sufficiently generic to be used with lab data, does not hardcode sample names or reference data, and can be run without reading an accompanying tutorial.
In workflow: annotation field contains short description of what the workflow does. Should start with This workflow does/runs/performs … xyz … to generate/analyze/etc …
In workflow: workflow inputs and outputs have human readable names (spaces are fine, no underscore, dash only where spelling dictates it), no abbreviation unless it is generally understood. Altering input or output labels requires adjusting these labels in the workflow-tests.yml file as well
In workflow: name field should be human readable (spaces are fine, no underscore, dash only where spelling dictates it), no abbreviation unless generally understood
Workflow folder: prefer dash (-) over underscore (_), prefer all lowercase. Folder becomes repository in iwc-workflows organization and is included in TRS id
Readme explains what the workflow does, what the valid inputs are, and what outputs users can expect. If a tutorial or other resources exist, they can be linked. If a similar workflow exists in IWC, the readme should explain how it differs from the existing workflow and when one might prefer one over the other
Changelog contains appropriate entries
Large files (> 100 KB) are uploaded to Zenodo and location URLs are used in test file
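
Some of the reviewer checks above are mechanical enough to script. The following is a minimal sketch, not an existing IWC tool: the function names are hypothetical, "100 KB" is read as 100 * 1024 bytes (an assumption), and the results are hints for a human reviewer rather than hard failures, since the guidelines state preferences ("prefer dash", "prefer all lowercase").

```python
from pathlib import Path

# Assumption: "> 100 KB" in the checklist is read as 100 * 1024 bytes.
MAX_INLINE_BYTES = 100 * 1024


def label_problems(label: str) -> list[str]:
    """Flag convention issues in a workflow name or an input/output label."""
    problems = []
    if not label.strip():
        problems.append("is empty")
    if "_" in label:
        problems.append("contains underscore")
    return problems


def folder_problems(folder: str) -> list[str]:
    """Flag convention issues in a workflow folder name."""
    problems = []
    if "_" in folder:
        problems.append("prefer dash (-) over underscore (_)")
    if folder != folder.lower():
        problems.append("prefer all lowercase")
    return problems


def oversized_files(root: Path) -> list[Path]:
    """List files under root that exceed the size threshold."""
    return sorted(
        path
        for path in root.rglob("*")
        if path.is_file() and path.stat().st_size > MAX_INLINE_BYTES
    )
```

Whether a dash or an abbreviation is justified ("only where spelling dictates it", "generally understood") still needs a reviewer's judgment, so the sketch deliberately leaves those checks out.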

mvdbeek · 2025-02-27T14:56:55Z

workflows/data-fetching/parallel-accession-download/README.md

+## Applications
+
+This parallel download workflow is essential for:
+- Retrieving large datasets from public repositories
+- Creating reproducible analysis collections from published studies
+- Building reference datasets for benchmarking
+- Efficient acquisition of data for meta-analyses
+- Classroom and workshop preparation involving multiple datasets
+- Any project requiring retrieval of multiple sequencing run accessions
+
+## Performance Advantages
+
+The parallel architecture provides significant advantages over sequential approaches:
+- Downloads multiple accessions simultaneously rather than one-by-one
+- Continues overall progress even if individual downloads fail
+- Scales efficiently with available computational resources
+- Reduces total download time by orders of magnitude for large collections
+- Improves robustness through isolated job execution


That stuff is redundant and overly optimistic / "sales" oriented in a way. Overview and Workflow Process are nice subdivisions though, and I think the name changes are pretty good.

dannon · 2025-02-27

Yeah, it really took the original blurb and ran with it here 😆.

> Creates one job per listed run accession, and is therefore much faster and more robust to errors when many accessions need to be downloaded.

We can definitely tone this one down.
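
For readers unfamiliar with the pattern the original blurb describes, here is a minimal sketch of "one job per accession" in plain Python. It is not how the Galaxy workflow is implemented (there the scheduling is done by Galaxy itself); `download_all` and the `fetch` callable are hypothetical names used only for illustration.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Iterable


def download_all(
    accessions: Iterable[str],
    fetch: Callable[[str], str],
    max_workers: int = 4,
) -> tuple[dict[str, str], dict[str, str]]:
    """Run one independent job per accession and collect failures separately.

    `fetch` is a hypothetical callable that downloads a single accession
    and returns something describing the result (for example a file path).
    """
    results: dict[str, str] = {}
    failures: dict[str, str] = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Submit every accession as its own job.
        futures = {pool.submit(fetch, acc): acc for acc in accessions}
        for future in as_completed(futures):
            acc = futures[future]
            try:
                results[acc] = future.result()
            except Exception as exc:
                # A failed accession is recorded; the other jobs keep going.
                failures[acc] = str(exc)
    return results, failures
```

The point is only that a failure in one job does not abort the others, which is the robustness claim the blurb makes; how much faster this is depends on the data source and available resources.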

mvdbeek · 2025-02-27T14:59:10Z

workflows/epigenetics/hic-hicup-cooler/chic-fastq-to-cool-hicup-cooler.ga

@@ -1,6 +1,6 @@
 {
    "a_galaxy_workflow": "true",
-    "annotation": "This workflow take as input a collection of paired fastq. It uses HiCUP to go from fastq to validPair file. The pairs are filtered for MAPQ and for the region captured. Then, they are sorted by cooler to generate a tabix dataset. Cooler is used to generate a balanced cool file to the desired resolution.",
+    "annotation": "This workflow processes paired fastq files with HiCUP to create validPair files. It filters pairs by MAPQ and captured region, then sorts them with cooler to generate tabix datasets and balanced cool files at desired resolution.",


It looks like some of the annotations were not updated, but this update looks good to me. Considering that's what we pull into BRC, maybe we should start with updating the name and annotation fields before we tweak the README?
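
To review name and annotation fields across many workflows before touching the READMEs, something like the following could be used. This is a sketch under assumptions: it treats `name` and `annotation` as top-level keys of the `.ga` JSON (the diff above shows `annotation` at that level; the position of `name` is assumed), and `summarize_workflow` is a hypothetical helper, not part of any existing tooling.

```python
import json


def summarize_workflow(ga_text: str) -> dict:
    """Extract the fields a reviewer would check from .ga workflow JSON text."""
    workflow = json.loads(ga_text)
    annotation = workflow.get("annotation", "")
    return {
        "name": workflow.get("name", ""),
        "annotation": annotation,
        # The reviewer guidelines ask annotations to start with "This workflow".
        "starts_with_this_workflow": annotation.startswith("This workflow"),
    }
```

Running this over every `.ga` file in the repository would give a table of names and annotations that can be checked in one pass.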

dannon added 2 commits February 26, 2025 23:03

Readme enhancement partial pass -- needs verification.

b8a562e

Annotation update pass -- please verify these are sane changes.

27f97a3

mvdbeek reviewed Feb 27, 2025

mvdbeek added the website label Feb 28, 2025
