Releases: labgem/PPanGGOLiN

PPanGGOLiN 2.0.0

15 Jan 15:51
2f76ba2

New commands

  • projection: to annotate external genomes using an existing pre-computed reference pangenome (#119, see doc).

  • rgp_cluster: to cluster RGP based on their gene family content (#117, see doc).

  • metadata: add metadata linked to various pangenome elements using simple TSV files (#111, see doc).

  • the write command is split in two commands (#140):

    • write_pangenome: write outputs at the pangenome level (see doc)
    • write_genomes: write genome outputs with pangenome annotation. Three formats are available for outputs: table, GFF and JSON Proksee (#139, see doc).
  • utils: a small side command to generate a default configuration file for any commands (#112, see doc).

New features

  • New, improved documentation hosted on Read the Docs, replacing the GitHub wiki.
  • GFF export of genomes with pangenome annotation (#139, see doc).
  • JSON Map for Proksee to visualize interactively each genome and their pangenome annotation (#139, see doc).
  • Configuration file can now be used to set all or some parameters of PPanGGOLiN commands (#112, see doc).

Major change

BREAKING: New structure of the pangenome file to make it much lighter and faster to read (#110). ⚠️ This breaks compatibility with PPanGGOLiN v1 pangenome HDF5 files.

Minor change

  • Replaced Prodigal with pyrodigal for the annotation command (#138).
  • The context command has a window parameter to define the number of neighboring genes that are considered on each side of a gene of interest when searching for contexts (#137, see doc).
  • In draw --spots, the 'synteny' keyword replaces the former 'all' keyword to draw spots with different RGP syntenies. The 'all' keyword now draws all pangenome spots (#129).
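
To illustrate the window parameter of the context command described above, here is a minimal sketch of selecting the neighboring genes on each side of a gene of interest. The function name and the list-of-genes representation are hypothetical, and the sketch assumes a linear contig where the window is simply clipped at the contig ends; it is not PPanGGOLiN's implementation.

```python
def context_window(genes, index, window):
    """Return the gene at `index` plus up to `window` neighbors on each side.

    `genes` is an ordered list of gene identifiers on one contig
    (hypothetical representation). The contig is treated as linear,
    so the window is clipped at both ends.
    """
    if window < 0:
        raise ValueError("window must be non-negative")
    if not 0 <= index < len(genes):
        raise IndexError("index is outside the contig")
    start = max(0, index - window)
    end = min(len(genes), index + window + 1)
    return genes[start:end]


contig = ["g1", "g2", "g3", "g4", "g5", "g6", "g7"]
print(context_window(contig, 3, 2))  # two neighbors on each side of g4
print(context_window(contig, 0, 2))  # clipped at the start of the contig
```

A larger window pulls more neighbors into each context, at the cost of also including genes that may be unrelated to the gene of interest.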

Bug Fixes

  • Writing out only the RGP and spot of the gene with --projection (#130). Please note that, in version 2, the --projection parameter in the write command has been renamed to --table and now belongs to the write_genomes command (check the documentation of the write_genomes command for more details).
  • Make clustering deterministic (#116)

PPanGGOLiN 1.2.105

19 Feb 17:31
161035e

A lot of the code was rewritten, but this should be largely transparent to users.

Bugfixes

  • Shell subpartitions are properly saved in the HDF5 file and used in the different figures
  • --meta option for annotate to annotate genomes using the metagenome mode of prodigal
  • --single_copy for msa to compute MSA using 'mostly single copy' persistent genes only
  • Cope with drawing more than 2000 identical spots in the same figure

PPanGGOLiN 1.2.74

28 Feb 15:25

New commands

  • metrics to compute a list of metrics about the pangenome.

New features

  • The projection option of the write subcommand can now give information about RGPs, spots and modules.
  • With the new metrics command, it is now possible to compute the genomic fluidity of the pangenome.
  • With the new metrics command, it is now possible to get more information about modules. This information is computed and shown as statistics on the families and partitions of modules.
  • All metrics are saved in the pangenome file and can be printed with the info subcommand.
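
As a rough illustration of the genomic fluidity mentioned above, the sketch below uses the usual pairwise definition: for each pair of genomes, the number of gene families found in only one of the two is divided by the total number of families in both, and the ratios are averaged over all pairs. The function name and the dictionary input are hypothetical, and the metrics command may compute or report the value differently.

```python
from itertools import combinations


def genomic_fluidity(genomes):
    """Average, over all genome pairs, of unique families / total families.

    `genomes` maps a genome name to the set of gene family identifiers
    it contains (hypothetical input format).
    """
    if len(genomes) < 2:
        raise ValueError("at least two genomes are required")
    ratios = []
    for fams_a, fams_b in combinations(genomes.values(), 2):
        total = len(fams_a) + len(fams_b)
        if total == 0:
            raise ValueError("cannot compare two genomes without gene families")
        unique = len(fams_a ^ fams_b)  # families present in exactly one genome
        ratios.append(unique / total)
    return sum(ratios) / len(ratios)


example = {
    "A": {"f1", "f2", "f3"},
    "B": {"f1", "f2", "f4"},
    "C": {"f1", "f5"},
}
print(round(genomic_fluidity(example), 3))
```

Under this definition, a value of 0 means all genomes share exactly the same families, and a value of 1 means no two genomes share any family.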

Bug fixes

  • fix erroneous cluster assignment in the clustering step
  • fix align when using a pangenome constructed from user-provided
    annotations with prokka-like identifiers
  • fix option checks on a few subcommands to give more helpful information to users

PPanGGOLiN 1.2.63

22 Dec 10:28
  • identifiers used in provided annotation files (gff or gbff, through --anno) will be used by default, unless they are not unique within the pangenome
  • additional column in context output indicating family partition
  • Always save the gene sequences when building a pangenome (which gives more flexibility when doing additional analysis with ppanggolin)
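
The identifier rule described above (keep the identifiers from the annotation files unless they are not unique within the pangenome) can be sketched as follows. The function name and the fallback naming scheme are hypothetical and only show the decision; they do not reflect how PPanGGOLiN names genes internally.

```python
from collections import Counter


def choose_identifiers(annotation_ids, prefix="gene"):
    """Keep annotation identifiers if they are all unique, else generate new ones.

    `annotation_ids` is the list of gene identifiers read from all
    annotation files of the pangenome. The `prefix_index` fallback
    scheme is an assumption made for this example.
    """
    ids = list(annotation_ids)
    duplicated = [name for name, count in Counter(ids).items() if count > 1]
    if not duplicated:
        return ids
    # At least one identifier is shared by several genes: replace them all
    # so that every gene gets an unambiguous name.
    return [f"{prefix}_{index}" for index in range(len(ids))]


print(choose_identifiers(["dnaA_1", "dnaN_1", "recF_1"]))
print(choose_identifiers(["dnaA_1", "dnaA_1", "recF_1"]))
```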

Bugfixes

  • fix a bug preventing you from doing a new clustering if partitions were not computed
  • fix genome sizes drawn with draw --spots

PPanGGOLiN 1.2.46

30 Nov 10:11

New commands

  • module to predict conserved modules in variable parts of a pangenome
  • context to find which gene families are conserved in the same genomic context as sequences of interest
  • all to run all possible analyses with PPanGGOLiN.
  • panmodule to run the panModule workflow

Bug fixes

  • improved pseudogene reading and gff/gbff parsing
  • fixed gff parser to cope with bakta gff files (reported in #66)
  • fixed gexf formatting in the rare case of having '&' in the 'product' field of gene annotations (reported in #61)
  • fixed rare crash happening when a partition has only 1 gene family (see #64)
  • fixed compilation issue with gcc 10.* and above (reported in #69)
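
The gexf fix above concerns a general XML rule: a raw '&' is not allowed in attribute values and must be written as '&amp;'. A minimal sketch using Python's standard library is shown below; the product string is made up for the example, and this is not the code used by PPanGGOLiN.

```python
from xml.sax.saxutils import escape, quoteattr

# Hypothetical 'product' annotation containing an ampersand.
product = "ABC transporter & permease"

# escape() replaces &, < and > so the text is safe inside XML content.
print(escape(product))

# quoteattr() escapes the value and wraps it in quotes, ready to be
# used as an XML attribute such as a GEXF attvalue.
print(f'<attvalue for="product" value={quoteattr(product)}/>')
```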

Other:

  • Allow computing K=2 if forced by the user in partition or rarefaction (by default, K is still picked between 3 and 20) (see #65)
  • removed R, rpy2 and genoPlot-R dependencies (#47 shall never be a problem anymore)
  • added a new bokeh dependency
  • removed spot --draw_hotspots and related options. To achieve the same result, use draw --spots once the spots have been computed.
  • added a --spots option to draw to have interactive figures for spots of interest, replacing the former figures drawn with R.
  • align can compare a set of sequences of interest to a pangenome, and draw related elements, but cannot compare a genome to a pangenome anymore

PPanGGOLiN 1.1.136

24 Feb 09:36

Bug fixes:

  • cope with gff3 files without '##sequence-region' (such as those from Anvi'o and JGI) (reported in both #48 and #56)
  • Do not create a new organism when reading fasta files for sequences when the organism was not in the previously read annotations (reported in #48)
  • For the 'msa' command, the --phylo option now actually works

PPanGGOLiN 1.1.131

15 Feb 08:50
  • New 'fasta' command to more easily write fasta files of parts (or all) of the pangenome (making #38, along with other requests, possible)
  • New 'msa' command to compute MSA from specific families (such as the core genome for additional phylogenetic analysis) (first suggested in #28). It uses only genes that are present in a single copy in each organism for each family.
  • Switched from lz4 to zstd for better compression of .h5 files (about 30% less disk space used with equivalent i/o speed)
  • More unit tests (thanks to @sletort)
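
The single-copy selection described for the 'msa' command above can be sketched as follows. This is one possible reading of the rule, where organisms carrying several copies of a family are left out of that family's alignment. The function name and input format are hypothetical, and the actual command may handle multi-copy families differently.

```python
from collections import defaultdict


def single_copy_genes(family_genes):
    """Return the genes of one family that are single-copy in their organism.

    `family_genes` is a list of (gene_id, organism) pairs for a single
    gene family (hypothetical input format). Organisms with more than
    one gene in the family are skipped.
    """
    genes_by_organism = defaultdict(list)
    for gene_id, organism in family_genes:
        genes_by_organism[organism].append(gene_id)
    return [genes[0] for genes in genes_by_organism.values() if len(genes) == 1]


family = [("a1", "orgA"), ("b1", "orgB"), ("b2", "orgB"), ("c1", "orgC")]
print(single_copy_genes(family))  # orgB has two copies and is skipped
```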

Bug Fixes

  • Cope with RAST-style gene identifiers (RAST gene identifiers were not used previously) (noticed in #44)
  • Compute spots properly when the contig is circular (spots were not computed in circular contigs previously) (may fix #43)
  • Proper 'softcore' filter behavior when writing the gexf files (filter was too restrictive and did not include all softcore families)

PPanGGOLiN 1.1.96

04 Sep 14:18
  • can customize the clustering mode of mmseqs2
  • add the possibility to read pseudogenes in the 'align' submodule
  • defrag with connected component clustering is now the default clustering strategy
    • due to this, there is a new option --no-defrag to use the previous clustering strategy
  • improved option checking

Bug fixes:

  • cope with contigs having identical identifiers in different genomes (see #34)
  • can change the duplication margin in 'rgp' without crashing
  • other minor technical bugs in 'align'

PPanGGOLiN 1.1.85

14 May 12:01
a8fbc98
  • allows the extraction of spot borders
  • The gene_families.tsv file now includes a third column with an 'F' if the gene is considered to be a fragment of the gene family (--families_tsv option in the 'write' module)

Bug fixes:

  • fragment information is actually saved in the HDF5 file now (when using --defrag option with 'cluster', 'workflow' or 'panrgp')
  • cope with features not following the 'tag=value' format when reading a gff3 file. They will now just be ignored. An error will be raised if a gff3 object does not have the required attributes. (see #29)
  • In case of an error due to file formatting when reading a cluster file (--clusters with 'cluster', 'workflow' or 'panrgp'), the error will now include the line number that caused the error (see #30)
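
The gff3 behavior described above (ignore fields that do not follow 'tag=value', but raise an error when a required attribute is missing) can be sketched as follows. The function name is hypothetical, and treating 'ID' as the only required attribute is an assumption for this example; the release notes do not list which attributes PPanGGOLiN requires.

```python
def parse_gff3_attributes(column, required=("ID",)):
    """Parse the attributes column of a gff3 line into a dictionary.

    Fields that are not of the form 'tag=value' are silently ignored.
    A ValueError is raised if any attribute in `required` is missing
    ('ID' as the default is an assumption made for this sketch).
    """
    attributes = {}
    for field in column.strip().split(";"):
        field = field.strip()
        if not field:
            continue
        tag, separator, value = field.partition("=")
        if not separator or not tag:
            continue  # malformed field: ignored rather than fatal
        attributes[tag] = value
    missing = [tag for tag in required if tag not in attributes]
    if missing:
        raise ValueError(f"missing required attribute(s): {', '.join(missing)}")
    return attributes


print(parse_gff3_attributes("ID=gene1;Name=dnaA;orphan_field;"))
```

Ignoring malformed fields keeps files from less strict producers readable, while the check on required attributes still stops early when a feature cannot be identified.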

PPanGGOLiN release 1.1.72

25 Mar 10:24
8db3411
  • added the 'rgp' subcommand to predict regions of genomic plasticity
  • added the 'spot' subcommand to predict spots of insertion in the pangenome
  • added a few new output files in 'write' related to the two previous commands
  • the 'write' subcommand has been improved
  • added the 'panrgp' workflow. It will probably replace the main 'workflow' command in the future but for the next few versions, they will live side by side.
  • improved logging and added timing at the end of the 'panrgp' workflow.

Pangenomes that were computed with previous versions should be compatible with the new commands.