Ross edited this page Dec 3, 2023 · 2 revisions

Decomposing the TreeVal NF workflow for translation into Galaxy.

Progress report December 2

The former content of this document (background, resources and development notes) has been moved to the treeval_gal repository.

Anna and Bjoern will join in.

Work plan:

Planning will start for the Galaxy components that the VGP really needs. Having working tools and subworkflow examples to show will help focus those conversations. The current proposal is to prepare some demonstrations of the rapid workflow logic in Galaxy, while gathering and documenting requirements from the VGP at the same time.

  1. Deconstruct and document the actual NF steps
  2. Create or reuse suitable Galaxy components.
  3. Make initial subworkflows
  4. Test and test again. A testing plan will be needed. Default test data is available.
  5. A documentation/training plan will be needed.
  6. Publish something about converting a complex NF workflow into Galaxy
Notes:
  1. Nadolina has suggested:

    Regardless, the main things curators need are:

    1. a brief pipeline that prepares the decontaminated draft assemblies for dual curation:
       • some text manipulation so we don't have duplicate names
       • concatenating the haplotypes into a single fasta
    2. a coverage track in bedgraph/bigwig format, derived from an alignment of HiFi reads to the diploid assembly
    3. gap track, which can be derived from the HiFi alignment, or from gfastats [fasta] -b gaps, or a couple of other tools I am sure
    4. telomere track, also in bedgraph or bigwig
    5. close comparator alignment (i.e. with mashmap)
  2. All dependencies will eventually go into Conda and tools to IUC. Development will necessarily be in a working environment and may temporarily involve non-conda dependencies for demonstration purposes.
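
Item 1 in Nadolina's list above (unique names, then one combined fasta) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the treeval or VGP method: the function names, the `H1`/`H2` labels and the `label_name` renaming scheme are all hypothetical.

```python
from typing import Dict, Iterable, Iterator, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def concat_haplotypes(haplotypes: Dict[str, str]) -> str:
    """Merge haplotype FASTA texts into one, prefixing names with a label.

    `haplotypes` maps a hypothetical label (e.g. "H1") to FASTA text.
    Prefixing keeps names unique when both haplotypes use the same
    scaffold names.
    """
    seen = set()
    records = []
    for label, text in haplotypes.items():
        for header, seq in read_fasta(text.splitlines()):
            fields = header.split()
            name = fields[0] if fields else "unnamed"
            new_name = f"{label}_{name}"
            if new_name in seen:
                raise ValueError(f"duplicate sequence name: {new_name}")
            seen.add(new_name)
            records.append(f">{new_name}\n{seq}")
    return "\n".join(records) + "\n"


hap1 = ">scaffold_1\nACGT\nAC\n>scaffold_2\nGG\n"
hap2 = ">scaffold_1\nTTTT\n"
merged = concat_haplotypes({"H1": hap1, "H2": hap2})
```

In Galaxy the same effect could probably be reached with existing text-manipulation tools; the sketch is only meant to pin down what "no duplicate names" would mean in practice.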

Proposed action:
  • Ross will continue to slog through the DDL to document the rest of the rapid workflow.
  • Ross has made a new repository whose structure reflects the treeval subworkflows. It contains the documentation for each and will eventually hold the tools and subworkflows for testing. The material formerly in this document has been moved there.
  • Bjoern and Anna, please plan to figure out what VGP really needs.
    • Initially, having treeval rapid available, even if not optimal, will provide a working model for users to identify improvements.
    • Maybe walk them through the three things that might meet some of Nadolina’s listed needs (gap_finder, longread_coverage and telo_finder) to see if they would be a useful start and artifacts to test, while discussions continue.
    • Not sure if #1 is done anywhere; otherwise, a new project?
    • Telo_finder is possibly a separate project. In the short term we could just use what Sanger uses, at least for testing.
    • If this is the mashup Nadolina wants for #5, it is matlab scripts. Not in the toolshed. New project perhaps.
    • Important to make sure Nadolina has seen the outputs page, as there might be other useful subworkflows. There is a synteny subworkflow, but it is not used in the rapid one. It uses minimap2 and needs a class-specific set of synteny genomes, so a data manager is probably needed 🙁
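
As a talking point for the gap track (#3 in Nadolina's list), a gap track can be approximated by reporting runs of N in each sequence as BED intervals. The sketch below is a stand-in for illustration only. It is not the treeval gap_finder implementation and not gfastats, and the function names and the `min_len` parameter are assumptions.

```python
import re
from typing import Iterable, List, Tuple

# A gap is taken here to be any run of N or n characters.
GAP_RE = re.compile(r"[Nn]+")

Interval = Tuple[str, int, int]


def find_gaps(name: str, seq: str, min_len: int = 1) -> List[Interval]:
    """Return (name, start, end) for each run of N at least min_len long.

    Coordinates are 0-based and half-open, as in BED.
    """
    return [
        (name, m.start(), m.end())
        for m in GAP_RE.finditer(seq)
        if m.end() - m.start() >= min_len
    ]


def to_bed(intervals: Iterable[Interval]) -> str:
    """Format intervals as tab-separated BED lines."""
    return "".join(f"{n}\t{s}\t{e}\n" for n, s, e in intervals)


gaps = find_gaps("scaffold_1", "ACGTNNNNACGTnnA")
bed_text = to_bed(gaps)
```

Passing a larger `min_len` would drop short runs of N, which may or may not be wanted for curation; that choice is one of the requirements to confirm with the curators.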

Please see the treeval_gal repository for details. This document is now a work planning resource for work on that repository.