treeval_gal/busco_annotation at main · fubar2/treeval_gal

#3 busco annotation

See https://busco.ezlab.org/busco_userguide.html for BUSCO documentation

(from https://github.com/sanger-tol/treeval/blob/dev/docs/output.md#busco-analysis)

This has some odd DSL syntax, but we can probably figure it out with help from a content expert who can explain, for example, why _lepidoptera_ are a special case.

workflow BUSCO_ANNOTATION {
    take:
    dot_genome          // channel: [val(meta), [ datafile ]]
    reference_tuple     // channel: [val(meta), [ datafile ]]
    assembly_classT     // channel: val(class)
    lineageinfo         // channel: val(lineage_db)
    lineagespath        // channel: val(/path/to/buscoDB)
    buscogene_as        // channel: val(dot_as location)
    ancestral_table     // channel: val(ancestral_table location)
    main:
    ch_versions                 = Channel.empty()
    //
    // MODULE: RUN BUSCO TO OBTAIN FULL_TABLE.CSV
    //      EMITS FULL_TABLE.CSV
    //
    BUSCO (
        reference_tuple,
        lineageinfo,
        lineagespath,
        []
    )
    ch_versions                 = ch_versions.mix( BUSCO.out.versions.first() )
    ch_grab                     = GrabFiles( BUSCO.out.busco_dir )
    //
    // MODULE: EXTRACT THE BUSCO GENES FOUND IN REFERENCE
    //
    EXTRACT_BUSCOGENE (
        ch_grab
    )
    ch_versions                 = ch_versions.mix( EXTRACT_BUSCOGENE.out.versions )
    //
    // MODULE: SORT THE EXTRACTED BUSCO GENE
    //
    BEDTOOLS_SORT(
        EXTRACT_BUSCOGENE.out.genefile,
        []
    )
    ch_versions                 = ch_versions.mix( BEDTOOLS_SORT.out.versions )
    //
    // MODULE: CONVERT THE BED TO BIGBED
    //
    UCSC_BEDTOBIGBED(
        BEDTOOLS_SORT.out.sorted,
        dot_genome.map{it[1]},      // Gets file from tuple (meta, file)
        buscogene_as
    )
    ch_versions                 = ch_versions.mix( UCSC_BEDTOBIGBED.out.versions )
    //
    // LOGIC: AGGREGATE DATA AND SORT BRANCH ON CLASS
    //
    lineageinfo
        .combine( BUSCO.out.busco_dir )
        .combine( ancestral_table )
        .branch {
            lep:    it[0].split('_')[0] == "lepidoptera"
            general: it[0].split('_')[0] != "lepidoptera"
        }
        .set{ ch_busco_data }
    //
    // LOGIC: BUILD NEW INPUT CHANNEL FOR ANCESTRAL ID
    //
    ch_busco_data
            .lep
            .multiMap { lineage, meta, busco_dir, ancestral_table ->
                busco_dir:  tuple( meta, busco_dir )
                atable:     ancestral_table
            }
            .set{ ch_busco_lep_data }
    //
    // SUBWORKFLOW: RUN ANCESTRAL BUSCO ID (ONLY AVAILABLE FOR LEPIDOPTERA)
    //
    ANCESTRAL_GENE (
        ch_busco_lep_data.busco_dir,
        dot_genome,
        buscogene_as,
        ch_busco_lep_data.atable
    )
    ch_versions                 = ch_versions.mix( ANCESTRAL_GENE.out.versions )
    emit:
    ch_buscogene_bigbed         = UCSC_BEDTOBIGBED.out.bigbed
    ch_ancestral_bigbed         = ANCESTRAL_GENE.out.ch_ancestral_bigbed
    versions                    = ch_versions

}

The first step runs the Nextflow BUSCO module; an equivalent BUSCO tool is available from the IUC in the Galaxy Toolshed.

Next is EXTRACT_BUSCOGENE, which runs another of the scripts in the /tree/bin directory:

get_busco_gene.sh $fulltable > ${prefix}_buscogene.csv

so a new tool would be needed to run that. The script converts the BUSCO table into BED to be consumed by JBrowse. It's a simple awk command, so I think we can convert this without a new tool.
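To make the conversion concrete, here is a minimal sketch of what such an awk step might look like. It is not the actual `get_busco_gene.sh`; the function name `busco_table_to_bed` is hypothetical, and the column order (BUSCO id, status, sequence, gene start, gene end, strand, score, length) is an assumption about the BUSCO `full_table.tsv` layout that should be checked against real output.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper, NOT the real get_busco_gene.sh.
# Assumed input columns (tab-separated):
#   1 busco id, 2 status, 3 sequence, 4 gene start, 5 gene end,
#   6 strand, 7 score, 8 length
# Reads the file given as argument, or stdin when none is given.
busco_table_to_bed() {
    awk -F'\t' 'BEGIN { OFS = "\t" }
        /^#/            { next }   # skip header/comment lines
        $2 == "Missing" { next }   # missing BUSCOs carry no coordinates
        # BED-like order: chrom, start, end, name, score, strand.
        # Coordinates are copied unchanged; whether an off-by-one
        # adjustment is needed for BED is not checked here.
        { print $3, $4, $5, $1, $7, $6 }' "$@"
}
```

A call such as `busco_table_to_bed full_table.tsv > sample_buscogene.csv` would then stand in for the script invocation above, with the file names here being placeholders.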

Bedtools sort and ucsc_bedtobigbed are both used again so they will already be available.

Then there is some odd syntax (the `combine`, `branch` and `multiMap` calls) involving an input database from the #1 yaml subworkflow. It looks like it is setting up the input channels for the #2 ANCESTRAL_GENE subworkflow described above. Hooboy. We will need a content expert to make sure the description given here makes sense and that the data and parameters can be obtained correctly from the user.
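As far as the code can be read without expert input, the `branch` step only looks at the text before the first underscore in the lineage name and routes the data to `lep` when that text is `lepidoptera`, otherwise to `general`. The sketch below mimics that test in shell. The function name `lineage_branch` is hypothetical, and the example lineage names are assumptions about what the user would supply.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical stand-in for the Nextflow expression
#   it[0].split('_')[0] == "lepidoptera"
# Prints "lep" or "general" for a given lineage database name.
lineage_branch() {
    local lineage="$1"
    # Strip everything from the first underscore onwards.
    local prefix="${lineage%%_*}"
    if [ "$prefix" = "lepidoptera" ]; then
        echo "lep"
    else
        echo "general"
    fi
}
```

In the workflow only the `lep` branch feeds ANCESTRAL_GENE, which matches the code comment that ancestral BUSCO ID is only available for lepidoptera. A Galaxy version might expose the same decision as a conditional on the lineage parameter.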
