Releases: labgem/PPanGGOLiN

PPanGGOLiN 2.2.1

03 Dec 08:43
f49f497

Minor Changes

  • RGP Score in RGP Output: The RGP score has been added to the RGP output, contributed by @LCoffion in #303.
  • RGP Score in Pangenome File: The RGP score is now stored in the pangenome file, allowing it to be included in subsequent outputs. Contributed by @JeanMainguy in #304.
  • Improved Spot Reading Performance: The process for reading spots has been optimized for better performance. Contributed by @jpjarnoux in #302.

Bug Fixes

  • Fixed Contig Length Detection in GFF Parsing: Addressed an issue where contig lengths were not properly set if the optional ##sequence-region directive was missing. In this case, contig length is now calculated using the associated FASTA sequence, resolving #299. Implemented by @JeanMainguy in #301.
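
The fallback described in this fix can be sketched as follows. This is a minimal illustration, not PPanGGOLiN's actual parser: the function name `contig_lengths_from_gff` is hypothetical, and it assumes the FASTA sequences are embedded in the GFF file after a `##FASTA` directive.

```python
def contig_lengths_from_gff(gff_text: str) -> dict:
    """Return contig lengths, preferring ##sequence-region over FASTA.

    Hypothetical helper for illustration only.
    """
    declared = {}    # lengths taken from ##sequence-region directives
    from_fasta = {}  # lengths counted from the embedded FASTA section
    in_fasta = False
    current = None
    for raw_line in gff_text.splitlines():
        line = raw_line.strip()
        if not line:
            continue
        if line.startswith("##FASTA"):
            in_fasta = True
        elif in_fasta:
            if line.startswith(">"):
                # Header line: the contig name is the first token after '>'.
                current = line[1:].split()[0]
                from_fasta[current] = 0
            elif current is not None:
                from_fasta[current] += len(line)
        elif line.startswith("##sequence-region"):
            # Directive format: ##sequence-region <seqid> <start> <end>
            _, seqid, start, end = line.split()
            declared[seqid] = int(end) - int(start) + 1
    # Declared lengths take precedence; FASTA-derived lengths fill the gaps.
    return {**from_fasta, **declared}
```

With this approach, a contig that lacks a `##sequence-region` line still gets a length as long as its sequence is present in the file.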

New Contributors

  • A big thank you to @LCoffion for making their first contribution in #303! 🎉

Full Changelog: 2.2.0...2.2.1

PPanGGOLiN 2.2.0

29 Oct 16:40
8cacb1b

Major Changes

  • Improved Handling of Partial Genes: Partial genes are now processed correctly and are no longer treated as pseudogenes (PR #290). This update affects results, as genes that were previously excluded may now be included. ⚠️

  • Filtered Non-ASCII Characters: Non-ASCII characters are now filtered out to resolve issues for users with genomes annotated by Bakta (PR #291).
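
As a rough illustration of the non-ASCII filtering, the sketch below drops every character outside the ASCII range. The function name `strip_non_ascii` is hypothetical, and whether PPanGGOLiN removes or replaces such characters internally is an assumption here.

```python
def strip_non_ascii(text: str) -> str:
    """Drop every character outside the 7-bit ASCII range.

    Hypothetical helper; not taken from the PPanGGOLiN code base.
    """
    # Characters that cannot be encoded as ASCII are silently skipped.
    return text.encode("ascii", errors="ignore").decode("ascii")
```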

Minor Changes

  • Enhanced Memory Efficiency for ppanggolin fasta: Optimized memory usage for handling large pangenome files (PR #283).
  • Added Black Linter: Integrated Black linter to maintain consistent code formatting across the project (PR #295).
  • Adjusted Log Level for Missing Translation Warnings: Changed log level to DEBUG to prevent log clutter when translation information is absent in input genomes (PR #296).

Bug Fixes

  • Corrected Chao Calculation Formula: Fixed a formula error in the Chao metric displayed on the u-curve (PR #284, Issue #281).
  • Resolved Empty Metadata Tag in Annotation Files: Fixed issue with empty metadata tags in annotation files (PR #287, Issue #285).
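
The release note does not spell out the corrected formula. For reference, the classical Chao lower-bound estimator is the observed count plus f1² / (2 · f2), where f1 and f2 are the numbers of items seen exactly once and exactly twice. The sketch below implements that textbook form; mapping it to gene families present in one or two genomes, and assuming the u-curve uses exactly this variant, are both assumptions.

```python
from typing import Optional


def chao_estimate(observed: int, singletons: int, doubletons: int) -> Optional[float]:
    """Classical Chao lower-bound estimate of total richness.

    observed:   number of distinct items seen (here, assumed gene families)
    singletons: items seen exactly once (f1)
    doubletons: items seen exactly twice (f2)
    Returns None when f2 is zero, where this form is undefined.
    """
    if doubletons == 0:
        return None
    # Note the exponent operator: in Python, ** is power, while ^ is XOR.
    return observed + singletons ** 2 / (2 * doubletons)
```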

Full Changelog: 2.1.2...2.2.0

PPanGGOLiN 2.1.2

16 Sep 08:23
bc50066

Bug Fixes

  • Improved tile plot with gene names and metadata in hover text, optional x-axis dendrogram, updated color bar for discrete values, and a new partition legend by @JeanMainguy. (Issues #81, #251, PR #277)

  • Fixed an issue in the partition module where the random.sample function caused errors in Python 3.11 and 3.12, a bug that was missed when Python 3.12 support was added, by @JeanMainguy. (Issues #268 & #280, PR #278)

  • Fixed issues with the cluster command when using an external cluster file by @JeanMainguy. (Issue #279, PR #278)

  • Fixed ruff warnings related to UP, PERF, and C4 by @fchapoton. (PR #274)
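
For context on the random.sample fix: since Python 3.11, `random.sample` requires a sequence and raises a `TypeError` when given a set. The release note does not state that this was the exact cause, so treat the sketch below as one plausible illustration; the helper name `sample_from_set` is hypothetical.

```python
import random
from typing import Optional


def sample_from_set(items: set, k: int, seed: Optional[int] = None) -> list:
    """Draw k distinct elements from a set in a version-independent way.

    Hypothetical helper; not taken from the PPanGGOLiN code base.
    """
    rng = random.Random(seed)
    # sorted() turns the set into a list with a stable order, which
    # random.sample accepts on every Python version and which keeps
    # seeded runs reproducible.
    return rng.sample(sorted(items), k)
```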

Full Changelog: 2.1.1...2.1.2

PPanGGOLiN 2.1.1

22 Aug 11:35
83c7d75

Bug Fixes

  • Added support for Python 3.11 and 3.12 by @JeanMainguy (issues #253 and #268, PR #255).
  • Fixed handling of Aragorn genes that exceed contig length by @JeanMainguy (issue #254, PR #256).
  • Fixed output configuration in workflow commands when set in the config file by @JeanMainguy (PR #261).
  • Sort gene families in the TSV file by cluster size and alphabetically by gene to ensure consistent output across runs by @jpjarnoux (issue #263, PR #265).
  • Fixed issue in projection when using a spotless pangenome by @JeanMainguy (issue #264, PR #266).
  • Added a warning log for partition failures to improve error visibility, as a first step towards better handling of partitioning issues by @JeanMainguy (issue #262, PR #269).
  • Minor code improvements and typo corrections in the documentation and source by @fchapoton (#257, #258, and #259).
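
The deterministic ordering of gene families mentioned above could look like the following sketch. The function name `sort_families`, the largest-first direction, and the tie-break on family name are assumptions; the release note only says that families are sorted by cluster size and alphabetically.

```python
def sort_families(families: dict) -> list:
    """Order families by decreasing size, then by name; sort genes inside each.

    families maps a family name to the list of its gene IDs.
    Hypothetical helper for illustration only.
    """
    return sorted(
        ((name, sorted(genes)) for name, genes in families.items()),
        # Negative size puts larger families first; the name breaks ties.
        key=lambda item: (-len(item[1]), item[0]),
    )
```

Because every level of the ordering is fully determined by the data, two runs on the same input produce identical output.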

Full Changelog: 2.1.0...2.1.1

PPanGGOLiN 2.1.0

10 Jul 14:54
d49dd5d

New Features

  • Write the translated sequence of genes using MMSeqs2 with the --proteins option (documentation), which works like the other options in the ppanggolin fasta command (added in PR #205).
  • Some information about contigs and genomes, such as organism name, strain, and dbx_ref information, is now extracted from annotation files (GBFF & GFF) and added to the pangenome as metadata (added in PR #227).
  • The command write_metadata has been added to allow exporting metadata to TSV files. Check out the documentation for more details (added in PR #227).
  • Add the infer_singleton option to the workflow (added in PR #239).
  • When a clustering is provided, the representative gene of each cluster can now be specified (added in PR #242).

Major Change

  • Genes with joined coordinates (for example, frameshifts) in input annotation files (GFF or GBFF) are now handled. Previously, such annotations were disregarded in GBFF files and improperly managed in GFF files. This changes how gene sequences are written and, consequently, the clustering and all downstream pangenome results: graph, partition, RGP, spots, and modules. The effect was measured and reported in PR #206. It is not large on pangenomes, but it should be kept in mind when comparing results across versions. See also PR #240 and #249.

Minor Changes

  • Order genes in the whole-genome MSA file (added in PR #200).
  • Replace the return in the try block with an else statement to return the value found in try (added in PR #204).
  • When writing MSAs, partial genes are handled by removing the last one or two nucleotides before translation (added in PR #205).
  • Change how the get_genes method handles the end position (added in PR #212).
  • Improve GitHub CI workflow (added in PR #216, #220, #224, #225).
  • PPanGGOLiN now supports using the soft-link option when building the MMSeqs2 database via subprocess, reducing temporary directory size (added in PR #214 and #229).
  • Report subprocess (MMSeqs2, MAFFT, etc.) error message if it crashes (reported in issue #210, added in PR #229).
  • When parsing annotation files, CDS are translated using the translation table code specified by the transl_table tag. If this tag is missing, the translation_table argument is now used, with a default value of 11 (reported in #226 and added in PR #230).
  • Added an identifier to metadata in object and HDF5. This helps to identify the right metadata in a cross-reference (added in PR #235).
  • Make subprocess logging more detailed with info and error messages (added in PR #237).
  • Add the protein sequence to the gene family when reading clustering (added in PR #238).
  • Add gene information in RGP output (added in PR #239).
  • Improve metadata management in commands projection and rgp_cluster (added in PR #244).
  • Some developments for the PANORAMA project 🤫 (added in PR #248).
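
Two of the changes above lend themselves to a short sketch: trimming a partial gene to a whole number of codons before translation, and falling back to a default translation table when the transl_table tag is missing. Both function names and the dictionary-based qualifier layout are hypothetical and are not taken from the PPanGGOLiN code base.

```python
def trim_to_codons(nucleotides: str) -> str:
    """Remove the trailing one or two nucleotides that do not form a codon."""
    remainder = len(nucleotides) % 3
    return nucleotides[: len(nucleotides) - remainder]


def resolve_translation_table(qualifiers: dict, default: int = 11) -> int:
    """Use the CDS transl_table tag when present, else the default table.

    qualifiers is an assumed mapping of tag names to string values.
    """
    value = qualifiers.get("transl_table")
    return int(value) if value is not None else default
```

For example, `trim_to_codons("ATGAAAT")` keeps the two complete codons and drops the trailing `T`.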

Bug Fixes

  • Fix the last genome missing in the whole genome MSA file (fixed in PR #200).
  • Write only genes associated with the RGP when writing FASTA sequences for RGP (reported in issue #122, fixed in PR #202).
  • Ensure proper handling of circular RGPs, addressing issues observed in the spot plot (reported in issue #124, fixed in PR #206).
  • Fix gene ID mismatch in projection command with GBFF files as input genome (reported in issue #207, fixed in PR #208).
  • Fix spot prediction in projection command (fixed in PR #209).
  • Fix multiple spots per RGP handling in projection command (fixed in PR #211).
  • Handle trailing whitespace at the end of GBFF file (reported in issue #203, fixed in PR #213).
  • Correctly read "is_circular" from GFF files (fixed in PR #215).
  • Fix RGP "looping" around circular contigs (fixed in PR #215).
  • Write the gene name instead of the coordinates in RGP output files (reported in issue #218, fixed in PR #219).
  • Write only the genes of the input genome in gene_to_gene_family.tsv file from projection (reported in issue #221, fixed in PR #228).
  • Fix dup_margin default value (reported in issue #223 and fixed in PR #234).
  • Fix missing translation_table handling (reported in issue #226 and fixed in PR #230).
  • Fix the spots-to-modules output file always being empty (fixed in PR #236).
  • Handle chevrons in GFF start and stop positions (fixed in PR #241).
  • Ignore weird tRNA from Aragorn (fixed in PR #245).
  • Fix module display on Proksee when a gene overlaps the contig (fixed in PR #246).
  • Fix metadata-related issues (fixed in PR #247).

New Contributor

We thank @ktmeaton, who made their first contribution in #200. 🎉

PPanGGOLiN 2.0.5

21 Mar 13:28
f3ba6a1

Bug Fixes

  • Resolved dead links in documentation (reported in issue #189, fixed in PR #190).
  • Addressed missing metadata separation when utilizing metadata in 'proksee' output (PR #188).
  • Added missing documentation for the ppanggolin fasta command (reported in issue #191, fixed in PR #192).
  • Fixed error occurring in ppanggolin msa command when using all genes (PR #196, reported in #198).

Full Changelog: 2.0.4...2.0.5

PPanGGOLiN 2.0.4

07 Mar 15:40
ed7bbfe

Bug Fixes

  • Fixed division by zero issue when no module is predicted. (Pull Request #183)
  • Improved error messages during input file parsing to help users troubleshoot (see issue #185). This update also adds more flexibility when scanning the first line of input files to identify the GFF file format. Details can be found in Pull Request #186.

Full Changelog: 2.0.3...2.0.4

PPanGGOLiN 2.0.3

22 Feb 11:00
9d0821e

This release addresses several minor bugs identified in the previous version (v2.0.2) of PPanGGOLiN.

Bug Fixes

  • Fixes Pyrodigal meta mode and improves training: Resolved an issue related to Pyrodigal meta mode and introduced enhancements in the training process. #177

  • Fix ppanggolin fasta Command: Addressed multiple issues associated with the ppanggolin fasta command (refer to Issue #179). #180

  • Handling cases where two Genes share the same stop: Implemented a solution to manage scenarios where two genes share a common stop position, preventing errors in the gene addition process. #181

  • Unique tmpdir name in clustering step: The tmpdir name generated during the clustering step is now truly unique, preventing any potential conflicts. #178

  • Fix for HTML spot plot radio buttons: Resolved an issue with radio buttons in the HTML spot plot that had become non-functional since bokeh v3. #176
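
A unique temporary directory, as in the clustering fix above, can be obtained with the standard library. This is a generic sketch rather than the project's implementation; the function name and the `clustering_` prefix are hypothetical.

```python
import tempfile
from pathlib import Path


def make_clustering_tmpdir(parent: str) -> Path:
    """Create a uniquely named temporary directory inside parent.

    tempfile.mkdtemp creates the directory itself and picks a name that
    does not already exist, so concurrent runs sharing the same parent
    directory do not collide.
    """
    return Path(tempfile.mkdtemp(prefix="clustering_", dir=parent))
```

The caller remains responsible for deleting the directory once the clustering step is finished.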

Full Changelog: 2.0.2...2.0.3

PPanGGOLiN 2.0.2

29 Jan 08:38
91d0e39

Bug Fixes

  • Fix use of non-unique gene IDs when writing sequences in PR #173.
    This PR fixes a bug where the 'all' command fails due to non-unique gene IDs in the input genome annotation files. In this case, PPanGGOLiN now uses custom gene IDs to ensure their uniqueness. This PR should fix issue #172.

  • Minor documentation update in #170

  • Fix workflow that checks the bioconda recipe in #171
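
The non-unique gene ID fix above can be illustrated with a small sketch: keep the original IDs when they are all distinct, and switch to generated IDs otherwise. The function name and the `prefix_index` naming scheme are hypothetical; the release note only states that custom gene IDs are used to ensure uniqueness.

```python
from collections import Counter


def ensure_unique_gene_ids(gene_ids: list, prefix: str = "gene") -> list:
    """Return gene_ids unchanged if unique, else positional custom IDs."""
    counts = Counter(gene_ids)
    if all(count == 1 for count in counts.values()):
        return list(gene_ids)
    # At least one ID is duplicated: replace every ID with a generated one
    # so that the whole set follows a single, collision-free scheme.
    return [f"{prefix}_{index}" for index, _ in enumerate(gene_ids)]
```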

Full changelog: 2.0.1...2.0.2

PPanGGOLiN 2.0.1

18 Jan 14:16
a42f3e5

Bug Fixes

Made minor patches to ensure the compilation of bioconda recipe on macOS. Version 2.0.0 faced issues on macOS when compiling C code with Clang. This has been resolved by adding a flag in setup.py (#169).

Full Changelog: 2.0.0 to 2.0.1