Commit

Merge pull request #129 from Plant-Food-Research-Open/dev
Release candidate for 0.6.0
GallVp authored Dec 20, 2024
2 parents ee702d7 + d069633 commit 023488c
Showing 44 changed files with 2,767 additions and 131 deletions.
16 changes: 8 additions & 8 deletions .github/workflows/branch.yml
@@ -1,15 +1,15 @@
name: nf-core branch protection
# This workflow is triggered on PRs to master branch on the repository
# It fails when someone tries to make a PR against the nf-core `master` branch instead of `dev`
# This workflow is triggered on PRs to main branch on the repository
# It fails when someone tries to make a PR against the Plant-Food-Research-Open `main` branch instead of `dev`
on:
pull_request_target:
branches: [master]
branches: [main]

jobs:
test:
runs-on: ubuntu-latest
steps:
# PRs to the nf-core repo master branch are only ok if coming from the nf-core repo `dev` or any `patch` branches
# PRs to the nf-core repo main branch are only ok if coming from the nf-core repo `dev` or any `patch` branches
- name: Check PRs
if: github.repository == 'Plant-Food-Research-Open/genepal'
run: |
@@ -22,7 +22,7 @@ jobs:
uses: mshick/add-pr-comment@b8f338c590a895d50bcbfa6c5859251edc8952fc # v2
with:
message: |
## This PR is against the `master` branch :x:
## This PR is against the `main` branch :x:
* Do not close this PR
* Click _Edit_ and change the `base` to `dev`
@@ -32,9 +32,9 @@ jobs:
Hi @${{ github.event.pull_request.user.login }},
It looks like this pull-request is has been made against the [${{github.event.pull_request.head.repo.full_name }}](https://github.com/${{github.event.pull_request.head.repo.full_name }}) `master` branch.
The `master` branch on nf-core repositories should always contain code from the latest release.
Because of this, PRs to `master` are only allowed if they come from the [${{github.event.pull_request.head.repo.full_name }}](https://github.com/${{github.event.pull_request.head.repo.full_name }}) `dev` branch.
It looks like this pull-request is has been made against the [${{github.event.pull_request.head.repo.full_name }}](https://github.com/${{github.event.pull_request.head.repo.full_name }}) `main` branch.
The `main` branch should always contain code from the latest release.
Because of this, PRs to `main` are only allowed if they come from the [${{github.event.pull_request.head.repo.full_name }}](https://github.com/${{github.event.pull_request.head.repo.full_name }}) `dev` branch.
You do not need to close this PR, you can change the target branch to `dev` by clicking the _"Edit"_ button at the top of this page.
Note that even after this, the test will continue to show as failing until you push a new commit.
2 changes: 1 addition & 1 deletion .github/workflows/ci.yml
@@ -55,7 +55,7 @@ jobs:
uses: actions/[email protected]

- name: Install Nextflow
uses: nf-core/setup-nextflow@v2
uses: nf-core/setup-nextflow@v2.0.0
with:
version: "${{ matrix.NXF_VER }}"

8 changes: 4 additions & 4 deletions .github/workflows/download_pipeline.yml
@@ -2,7 +2,7 @@ name: Test successful pipeline download with 'nf-core pipelines download'

# Run the workflow when:
# - dispatched manually
# - when a PR is opened or reopened to master branch
# - when a PR is opened or reopened to main branch
# - the head branch of the pull request is updated, i.e. if fixes for a release are pushed last minute to dev.
on:
workflow_dispatch:
@@ -17,10 +17,10 @@ on:
- edited
- synchronize
branches:
- master
- main
pull_request_target:
branches:
- master
- main

env:
NXF_ANSI_LOG: false
@@ -30,7 +30,7 @@ jobs:
runs-on: ubuntu-latest
steps:
- name: Install Nextflow
uses: nf-core/setup-nextflow@v2
uses: nf-core/setup-nextflow@v2.0.0

- name: Disk space cleanup
uses: jlumbroso/free-disk-space@54081f138730dfa15788a46383842cd2f914a1be # v1.3.1
2 changes: 1 addition & 1 deletion .github/workflows/linting.yml
@@ -34,7 +34,7 @@ jobs:
uses: actions/checkout@0ad4b8fadaa221de15dcec353f45205ec38ea70b # v4

- name: Install Nextflow
uses: nf-core/setup-nextflow@v2
uses: nf-core/setup-nextflow@v2.0.0

- uses: actions/setup-python@82c7e631bb3cdc910f68e0081d67478d79c6982d # v5
with:
2 changes: 1 addition & 1 deletion .nf-core.yml
@@ -30,5 +30,5 @@ template:
outdir: .
skip_features:
- igenomes
version: 0.5.0
version: 0.6.0
update: null
26 changes: 26 additions & 0 deletions CHANGELOG.md
@@ -3,6 +3,32 @@
The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/)
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

## v0.6.0 - [20-Dec-2024]

### `Added`

1. Added cDNA and CDS outputs to <OUTPUT_DIR>/annotations/<SAMPLE> directory [#118](https://github.com/Plant-Food-Research-Open/genepal/issues/118)
2. Added parameter `add_attrs_to_proteins_cds_fastas`
3. Added parameter `filter_genes_by_aa_length` with default set to `24` which allows removal of genes with ORFs shorter than 24 [#125](https://github.com/Plant-Food-Research-Open/genepal/issues/125)

### `Fixed`

1. Fixed an issue where TSEBRA failed because LIFTOFF lifted non-protein coding genes [#121](https://github.com/Plant-Food-Research-Open/genepal/issues/121)
2. Switched branch name from `master` to `main` in the GHA CIs
3. Fixed an issue in `genepal_report.Rmd` which caused the pangene matrix plot to fail when the number of clusters exceeded 65536 [#124](https://github.com/Plant-Food-Research-Open/genepal/issues/124)
4. Fixed an issue where `GENEPALREPORT` process failed due to OOM kill signal from SLURM [#123](https://github.com/Plant-Food-Research-Open/genepal/issues/123)
5. Fixed an issue where Gff merge after liftoff failed when one of the Gff files did not contain any genes
6. Fixed an issue where `gxf_fasta_agat_spaddintrons_spextractsequences` crashed due to short introns [#89](https://github.com/Plant-Food-Research-Open/genepal/issues/89)

### `Dependencies`

1. Nextflow!>=24.04.2
2. [email protected]

### `Deprecated`

1. Removed parameter `add_attrs_to_proteins_fasta`

## v0.5.0 - [21-Nov-2024]

### `Added`
2 changes: 1 addition & 1 deletion CITATION.cff
@@ -31,7 +31,7 @@ authors:
- family-names: "Thomson"
given-names: "Susan"
title: "genepal: A Nextflow pipeline for genome and pan-genome annotation"
version: 0.5.0
version: 0.6.0
date-released: 2024-11-21
url: "https://github.com/Plant-Food-Research-Open/genepal"
doi: 10.5281/zenodo.14195006
13 changes: 10 additions & 3 deletions README.md
@@ -35,14 +35,16 @@
- Merge multi-reference liftoffs
- Remove liftoff transcripts marked by _valid_ORF=False_
- Remove liftoff genes with any intron shorter than 10 bp
- Remove rRNA and tRNA from liftoff
- Remove rRNA, tRNA and other non-protein coding models from liftoff
- Optionally, allow or remove iso-forms
- Remove BRAKER models from Liftoff loci
- Merge Liftoff and BRAKER models
- Optionally, remove models without any EggNOG-mapper hits
- [EggNOG-mapper](https://github.com/eggnogdb/eggnog-mapper): Add functional annotation to gff
- [GenomeTools](https://github.com/genometools/genometools): GFF format validation
- [GffRead](https://github.com/gpertea/gffread): Extraction of protein sequences
- [GffRead](https://github.com/gpertea/gffread)
- Extraction of protein sequences
- Optionally, remove models with ORFs shorter than `N` amino acids
- [OrthoFinder](https://github.com/davidemms/OrthoFinder): Perform phylogenetic orthology inference across genomes
- [GffCompare](https://github.com/gpertea/gffcompare): Compare and benchmark against an existing annotation
- [BUSCO](https://gitlab.com/ezlab/busco): Completeness statistics for genome and annotation through proteins
@@ -97,7 +99,7 @@ sbatch ./pfr_genepal

plant-food-research-open/genepal workflows were originally scripted by Jason Shiller ([@jasonshiller](https://github.com/jasonshiller)). Usman Rashid ([@gallvp](https://github.com/gallvp)) wrote the Nextflow pipeline.

We thank the following people for their extensive assistance in the development of this pipeline:
We thank the following people for extensive assistance in the development of the pipeline,

- Cecilia Deng [@CeciliaDeng](https://github.com/CeciliaDeng)
- Charles David [@charlesdavid](https://github.com/charlesdavid)
@@ -107,6 +109,10 @@ We thank the following people for their extensive assistance in the development
- Susan Thomson [@cflsjt](https://github.com/cflsjt)
- Ting-Hsuan Chen [@ting-hsuan-chen](https://github.com/ting-hsuan-chen)

and for contributions to the codebase,

- Liam Le Lievre [@liamlelievre](https://github.com/liamlelievre)

The pipeline uses nf-core modules contributed by following authors:

<a href="https://github.com/gallvp"><img src="https://github.com/gallvp.png" width="50" height="50"></a>
@@ -139,6 +145,7 @@ The pipeline uses nf-core modules contributed by following authors:
<a href="https://github.com/charles-plessy"><img src="https://github.com/charles-plessy.png" width="50" height="50"></a>
<a href="https://github.com/bunop"><img src="https://github.com/bunop.png" width="50" height="50"></a>
<a href="https://github.com/abhi18av"><img src="https://github.com/abhi18av.png" width="50" height="50"></a>
<a href="https://github.com/liamlelievre"><img src="https://github.com/liamlelievre.png" width="50" height="50"></a>

## Contributions and Support

2 changes: 1 addition & 1 deletion assets/multiqc_config.yml
@@ -1,7 +1,7 @@
report_comment: >
This report has been generated by the <a href="https://github.com/plant-food-research-open/genepal" target="_blank">plant-food-research-open/genepal</a>
analysis pipeline. For information about how to interpret these results, please see the
<a href="https://github.com/plant-food-research-open/genepal/blob/0.5.0/docs/usage.md" target="_blank">documentation</a>.
<a href="https://github.com/plant-food-research-open/genepal/blob/0.6.0/docs/usage.md" target="_blank">documentation</a>.
report_section_order:
"plant-food-research-open-genepal-methods-description":
29 changes: 25 additions & 4 deletions bin/genepal_report.Rmd
@@ -190,22 +190,43 @@ cat("<br>")


```{r pheatmap, eval=(exists("n0_df") && !is.null(n0_df$heatmap)), results='hide', fig.align='center', fig.cap="Heatmap showing number of proteins present in each orthocluster (clusters where all individuals have 1 copy are excluded). Columns = Orthologue cluster, Row = Individual", fig.width=7, fig.height=7, dpi=150, warning=FALSE}
pheatmap(n0_df$heatmap,
# Max 65536 allowed
# https://github.com/Plant-Food-Research-Open/genepal/issues/124
n_cols <- ncol(n0_df$heatmap)
max_cols_allowed <- min(n_cols, 5000)
# Approach 1: Random selection of columns
# selected_cols <- sample(n_cols, max_cols_allowed)
# Approach 2: First N largest clusters
selected_cols <- order(colSums(n0_df$heatmap), decreasing = TRUE)[seq(1, max_cols_allowed)]
prefix_text <- ""
if ( n_cols != max_cols_allowed ) {
prefix_text <- paste0("Top ", max_cols_allowed, " ")
}
pheatmap(n0_df$heatmap[, selected_cols],
show_colnames = FALSE,
main = "Orthologue clusters containing accessory proteins",
main = paste0(prefix_text, "Orthologue clusters"),
legend = TRUE,
legend_labels = TRUE,
border_color = "white"
)
pheatmap(n0_df$heatmap,
pheatmap(n0_df$heatmap[, selected_cols],
filename = file.path(outputs_folder, "pangene.matrix.heatmap.pdf"),
show_colnames = FALSE,
main = "Orthologue clusters containing accessory proteins",
main = paste0(prefix_text, "Orthologue clusters"),
legend = TRUE,
legend_labels = TRUE,
border_color = "white"
)
write.csv(x = transform_hogs(n0o), file = file.path(outputs_folder, "pangenome.matrix.csv"), row.names = FALSE)
```
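The column selection introduced in this chunk (cap the heatmap at 5000 columns and keep the clusters with the largest column sums) can be illustrated outside R. The following is a minimal Python sketch, not code from the pipeline: `select_top_columns` and `title_prefix` are hypothetical names, the matrix is assumed to be a plain list of rows, and the returned indices are 0-based, whereas R's `order` returns 1-based positions.

```python
def select_top_columns(matrix, max_cols=5000):
    """Return indices of at most `max_cols` columns, largest column sum first.

    Hypothetical helper mirroring
    order(colSums(m), decreasing = TRUE)[seq(1, max_cols_allowed)].
    """
    if not matrix:
        return []
    n_cols = len(matrix[0])
    # Column sums, one per orthologue cluster
    col_sums = [sum(row[j] for row in matrix) for j in range(n_cols)]
    # Never request more columns than exist
    limit = min(n_cols, max_cols)
    ranked = sorted(range(n_cols), key=lambda j: col_sums[j], reverse=True)
    return ranked[:limit]


def title_prefix(n_cols, limit):
    """Prefix the plot title only when columns were actually dropped."""
    return "" if n_cols == limit else f"Top {limit} "


# Two individuals (rows), three clusters (columns); column sums are 1, 8, 6
counts = [[1, 5, 2],
          [0, 3, 4]]
selected = select_top_columns(counts, max_cols=2)
title = title_prefix(3, len(selected)) + "Orthologue clusters"
```

Ranking by column sum is a deterministic alternative to the random sampling that the chunk leaves commented out as "Approach 1", so repeated runs of the report plot the same clusters.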


3 changes: 3 additions & 0 deletions conf/base.config
@@ -74,4 +74,7 @@ process {
cpus = { 8 * task.attempt }
time = { 7.days * task.attempt }
}
withName:GENEPALREPORT {
memory = { 20.GB * task.attempt }
}
}
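The closure `{ 20.GB * task.attempt }` ties the memory request for `GENEPALREPORT` to the attempt number. A rough Python sketch of the resulting values follows. It assumes the process is actually retried after a failure, which depends on the error strategy and retry limit configured elsewhere and not shown in this diff; `requested_memory_gb` is a hypothetical name.

```python
def requested_memory_gb(attempt, base_gb=20):
    """Memory requested on a given attempt, mirroring 20.GB * task.attempt."""
    if attempt < 1:
        raise ValueError("attempt numbers start at 1")
    return base_gb * attempt


# First three attempts, assuming retries are enabled in the wider config
schedule = [requested_memory_gb(attempt) for attempt in (1, 2, 3)]
```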
31 changes: 28 additions & 3 deletions conf/modules.config
@@ -199,7 +199,7 @@ process { // SUBWORKFLOW: FASTA_LIFTOFF
}

withName: '.*:FASTA_LIFTOFF:GFFREAD_BEFORE_LIFTOFF' {
ext.args = '--no-pseudo --keep-genes'
ext.args = '--no-pseudo --keep-genes -C'
}

withName: '.*:FASTA_LIFTOFF:MERGE_LIFTOFF_ANNOTATIONS' {
@@ -212,7 +212,7 @@ process { // SUBWORKFLOW: FASTA_LIFTOFF

withName: '.*:FASTA_LIFTOFF:GFFREAD_AFTER_LIFTOFF' {
ext.prefix = { "${meta.id}.liftoff" }
ext.args = '--keep-genes'
ext.args = '--no-pseudo --keep-genes -C'
}

withName: '.*:FASTA_LIFTOFF:GFF_TSEBRA_SPFILTERFEATUREFROMKILLLIST:AGAT_CONVERTSPGFF2GTF' {
@@ -240,6 +240,10 @@ process { // SUBWORKFLOW: GFF_MERGE_CLEANUP
ext.prefix = { "${meta.id}.liftoff.braker" }
}

withName: '.*:GFF_MERGE_CLEANUP:FILTER_BY_ORF_SIZE' {
ext.args = params.filter_genes_by_aa_length ? "--no-pseudo --keep-genes -C -l ${ ( params.filter_genes_by_aa_length + 1 ) * 3 }" : ''
}

withName: '.*:GFF_MERGE_CLEANUP:GT_GFF3' {
ext.args = '-tidy -retainids -sort'
}
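The `FILTER_BY_ORF_SIZE` arguments convert the amino-acid threshold into a length passed to `-l` as `(aa + 1) * 3`. Read alongside the parameter description ("excluding the stop codon"), the extra codon appears to account for the stop codon, so the default of 24 amino acids becomes 75. The Python sketch below reproduces only that arithmetic and the string assembly; `min_length_for` and `orf_filter_args` are hypothetical names, and the reading of the `+ 1` is an assumption rather than something the diff states.

```python
def min_length_for(min_aa_length):
    """Length threshold for `min_aa_length` codons plus one more codon.

    Mirrors (params.filter_genes_by_aa_length + 1) * 3 from the config.
    """
    return (min_aa_length + 1) * 3


def orf_filter_args(min_aa_length):
    """Build the argument string; an unset (or zero) threshold yields ''."""
    if not min_aa_length:
        # Same truthiness test as the Groovy ternary: null skips the filter
        return ""
    return f"--no-pseudo --keep-genes -C -l {min_length_for(min_aa_length)}"


default_args = orf_filter_args(24)
skipped_args = orf_filter_args(None)
```

With the default value, `default_args` is `--no-pseudo --keep-genes -C -l 75`; setting the parameter to `null` leaves the argument string empty.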
@@ -286,7 +290,7 @@ process { // SUBWORKFLOW: GFF_STORE
}

withName: '.*:GFF_STORE:EXTRACT_PROTEINS' {
ext.args = params.add_attrs_to_proteins_fasta ? '-F -D -y' : '-y'
ext.args = params.add_attrs_to_proteins_cds_fastas ? '-F -D -y' : '-y'
ext.prefix = { "${meta.id}.pep" }

publishDir = [
@@ -295,6 +299,27 @@ process { // SUBWORKFLOW: GFF_STORE
saveAs: { filename -> filename.equals('versions.yml') ? null : filename },
]
}

withName: '.*:GFF_STORE:EXTRACT_CDS' {
ext.args = params.add_attrs_to_proteins_cds_fastas ? '-F -D -x' : '-x'
ext.prefix = { "${meta.id}.cds" }

publishDir = [
path: { "${params.outdir}/annotations/$meta.id" },
mode: params.publish_dir_mode,
saveAs: { filename -> filename.equals('versions.yml') ? null : filename },
]
}
withName: '.*:GFF_STORE:EXTRACT_CDNA' {
ext.args = params.add_attrs_to_proteins_cds_fastas ? '-F -D -w' : '-w'
ext.prefix = { "${meta.id}.cdna" }

publishDir = [
path: { "${params.outdir}/annotations/$meta.id" },
mode: params.publish_dir_mode,
saveAs: { filename -> filename.equals('versions.yml') ? null : filename },
]
}
}

process { // SUBWORKFLOW: FASTA_ORTHOFINDER
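The three `GFF_STORE` extraction processes share one pattern: a single output flag (`-y`, `-x` or `-w`, taken from the config above) that is prefixed with `-F -D` when `add_attrs_to_proteins_cds_fastas` is set. A small Python sketch of that mapping follows; `OUTPUT_FLAGS` and `extract_args` are hypothetical names used only for illustration.

```python
# Output flag per file prefix, as configured for EXTRACT_PROTEINS,
# EXTRACT_CDS and EXTRACT_CDNA in the diff above
OUTPUT_FLAGS = {"pep": "-y", "cds": "-x", "cdna": "-w"}


def extract_args(kind, add_attrs):
    """Return the argument string for one of the three FASTA outputs."""
    flag = OUTPUT_FLAGS[kind]
    return f"-F -D {flag}" if add_attrs else flag


plain = {kind: extract_args(kind, False) for kind in OUTPUT_FLAGS}
with_attrs = {kind: extract_args(kind, True) for kind in OUTPUT_FLAGS}
```

Because one parameter drives all three processes, the protein, CDS and cDNA FASTA headers stay consistent with each other for a given run.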
2 changes: 2 additions & 0 deletions docs/output.md
@@ -169,6 +169,8 @@ If more than one genome is included in the pipeline, [ORTHOFINDER](https://githu
- `Y/`
- `Y.gt.gff3`: Final annotation file for genome `Y` which contains gene models and their functional annotations
- `Y.pep.fasta`: Protein sequences for the gene models
- `Y.cdna.fasta`: cDNA sequences for the gene models
- `Y.cds.fasta`: Coding sequences for the gene models

</details>

21 changes: 11 additions & 10 deletions docs/parameters.md
@@ -59,19 +59,20 @@ A Nextflow pipeline for consensus, phased and pan-genome annotation.

## Post-annotation filtering options

| Parameter | Description | Type | Default | Required | Hidden |
| ----------------------------- | ----------------------------------------------------------------- | --------- | ------- | -------- | ------ |
| `allow_isoforms` | Allow multiple isoforms for gene models | `boolean` | True | | |
| `enforce_full_intron_support` | Require every model to have external evidence for all its introns | `boolean` | True | | |
| `filter_liftoff_by_hints` | Use BRAKER hints to filter Liftoff models | `boolean` | True | | |
| `eggnogmapper_purge_nohits` | Purge transcripts which do not have a hit against eggnog | `boolean` | | | |
| Parameter | Description | Type | Default | Required | Hidden |
| ----------------------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------- | --------- | ------- | -------- | ------ |
| `allow_isoforms` | Allow multiple isoforms for gene models | `boolean` | True | | |
| `enforce_full_intron_support` | Require every model to have external evidence for all its introns | `boolean` | True | | |
| `filter_liftoff_by_hints` | Use BRAKER hints to filter Liftoff models | `boolean` | True | | |
| `eggnogmapper_purge_nohits` | Purge transcripts which do not have a hit against eggnog | `boolean` | | | |
| `filter_genes_by_aa_length` | Filter genes with open reading frames shorter than the specified number of amino acids excluding the stop codon. If set to `null`, this filter step is skipped. | `integer` | 24 | | |

## Annotation output options

| Parameter | Description | Type | Default | Required | Hidden |
| ----------------------------- | ------------------------------------ | --------- | ------- | -------- | ------ |
| `braker_save_outputs` | Save BRAKER files | `boolean` | | | |
| `add_attrs_to_proteins_fasta` | Add gff attributes to proteins fasta | `boolean` | | | |
| Parameter | Description | Type | Default | Required | Hidden |
| ---------------------------------- | --------------------------------------------- | --------- | ------- | -------- | ------ |
| `braker_save_outputs` | Save BRAKER files | `boolean` | | | |
| `add_attrs_to_proteins_cds_fastas` | Add gff attributes to proteins/cDNA/CDS fasta | `boolean` | | | |

## Evaluation options

2 changes: 1 addition & 1 deletion modules.json
@@ -111,7 +111,7 @@
},
"gxf_fasta_agat_spaddintrons_spextractsequences": {
"branch": "main",
"git_sha": "7bf6fbca23edc94490ffa6709f52b2f71c6fb130",
"git_sha": "ed4146008dbdcfd4823252b456de32059e2d07f4",
"installed_by": ["subworkflows"]
}
}