Releases: labgem/PPanGGOLiN
PPanGGOLiN 2.0.0
New commands
- projection: to annotate external genomes using an existing pre-computed reference pangenome (#119, see doc).
- rgp_cluster: to cluster RGPs based on their gene family content (#117, see doc).
- metadata: to add metadata linked to various pangenome elements using simple TSV files (#111, see doc).
- the write command is split into two commands (#140):
- utils: a small side command to generate a default configuration file for any command (#112, see doc).
New features
- New, improved documentation hosted on Read the Docs, replacing the GitHub wiki.
- GFF export of genomes with pangenome annotation (#139, see doc).
- JSON map for Proksee to interactively visualize each genome and its pangenome annotation (#139, see doc).
- Configuration file can now be used to set all or some parameters of PPanGGOLiN commands (#112, see doc).
Major change
BREAKING: New structure of the pangenome file to make it much lighter and faster to read (#110).
Minor change
- Replaced Prodigal with pyrodigal for the annotation command (#138).
- The context command has a window parameter to define the number of neighboring genes that are considered on each side of a gene of interest when searching for contexts (#137, see doc).
- Replaced the 'all' option keyword with the 'synteny' option keyword for draw --spots to draw spots with different RGP syntenies. The 'all' keyword now draws all pangenome spots (#129).
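The window parameter of the context command mentioned above can be pictured with a minimal sketch. It assumes a contig is simply an ordered list of gene identifiers and ignores circular contigs; the function name `neighbors_in_window` and the sample data are hypothetical and are not part of PPanGGOLiN's API.

```python
def neighbors_in_window(contig, index, window):
    """Return up to `window` genes on each side of contig[index].

    Hypothetical helper for illustration only: the contig is treated as
    linear, so fewer genes are returned near either end.
    """
    start = max(0, index - window)
    left = contig[start:index]                      # genes before the gene of interest
    right = contig[index + 1:index + 1 + window]    # genes after the gene of interest
    return left, right


contig = ["g1", "g2", "g3", "g4", "g5", "g6"]
left, right = neighbors_in_window(contig, index=3, window=2)
print(left, right)
```

A larger window pulls in more distant neighbors, which widens the genomic context examined around each gene of interest.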
Bug Fixes
- Writing out only the RGP and spot of the gene with --projection (#130). Please note that, in version 2, the --projection parameter in the write command has been renamed to --table and now belongs to the write_genomes command (check the documentation of the write_genomes command for more details).
- Make clustering deterministic (#116)
PPanGGOLiN 1.2.105
A lot of the code was rewritten, but that should be relatively transparent for users
Bugfixes
- Shell subpartitions are properly saved in the HDF5 file and used in the different figures
- --meta option for annotate to annotate genomes using the metagenome mode of prodigal
- --single_copy for msa to compute MSA using 'mostly single copy' persistent genes only
- Cope with drawing more than 2000 identical spots in the same figure
PPanGGOLiN 1.2.74
New commands
- metrics: to compute a list of metrics about the pangenome.
New features
- The projection option of the write subcommand can now give information about RGPs, spots and modules
- With the new metrics command, it is now possible to compute the genomic fluidity of the pangenome.
- With the new metrics command, it is now possible to get more information about the modules. This information is computed and shown as statistics on families and partitions of modules.
- All the metrics are saved in the pangenome file and can be printed with the info subcommand.
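As a rough illustration of the genomic fluidity metric mentioned above, here is a minimal sketch of the commonly used definition: the average, over all pairs of genomes, of the number of gene families found in only one genome of the pair divided by the total number of families in both. It assumes each genome is given as a non-empty set of gene family identifiers; the function name and input layout are hypothetical, and this is not necessarily how the metrics command computes it.

```python
from itertools import combinations


def genomic_fluidity(genomes):
    """Compute genomic fluidity from a dict {genome name: set of family ids}.

    Illustrative sketch only; assumes every genome has at least one family.
    """
    families = list(genomes.values())
    n = len(families)
    if n < 2:
        raise ValueError("at least two genomes are required")
    total = 0.0
    for fam_k, fam_l in combinations(families, 2):
        # Families present in exactly one of the two genomes.
        unique = len(fam_k ^ fam_l)
        total += unique / (len(fam_k) + len(fam_l))
    # Average over the n * (n - 1) / 2 genome pairs.
    return 2.0 * total / (n * (n - 1))


example = {
    "genome_A": {"fam1", "fam2", "fam3"},
    "genome_B": {"fam2", "fam3", "fam4"},
    "genome_C": {"fam1", "fam2", "fam3"},
}
print(genomic_fluidity(example))
```

Values close to 0 indicate genomes that share almost all their gene families, while values close to 1 indicate genomes with very little in common.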
Bug fixes
- fix crazy cluster assignment in clustering step
- fix align when using a pangenome constructed from user-provided annotations with prokka-like identifiers
- fix information checks on a few subcommand options to help users
PPanGGOLiN 1.2.63
- identifiers used in provided annotation files (gff or gbff, through --anno) will be used by default, unless they are not unique within the pangenome
- additional column in context output indicating family partition
- Always save the gene sequences when building a pangenome (which gives more flexibility when doing additional analysis with ppanggolin)
Bugfixes
- fix a bug preventing you from doing a new clustering if partitions were not computed
- fix genome sizes drawn with draw --spots
PPanGGOLiN 1.2.46
New commands
- module: to predict conserved modules in variable parts of a pangenome
- context: to find which gene families are conserved in the same genomic context as sequences of interest
- all: to run all possible analyses with PPanGGOLiN.
- panmodule: to run the panModule workflow
Bug fixes
- improved pseudogene reading and gff/gbff parsing
- fixed gff parser to cope with bakta gff files (reported in #66)
- fixed gexf formatting in the rare case of having '&' in the 'product' field of gene annotations (reported in #61)
- fixed rare crash happening when a partition has only 1 gene family (see #64)
- fixed compilation issue with gcc 10.* and above (reported in #69)
Other:
- Allow computing K=2 if forced by the user in partition or rarefaction (by default, K is still picked between 3 and 20). (see #65)
- removed R, rpy2 and genoPlot-R dependencies (#47 shall never be a problem anymore)
- added a new bokeh dependency
- removed spot --draw_hotspots and related options. To achieve the same thing, use draw --spots once the spots have been computed.
- added a --spots option to draw to have interactive figures for spots of interest, replacing the former figures drawn with R.
- align can compare a set of sequences of interest to a pangenome, and draw related elements, but cannot compare a genome to a pangenome anymore
PPanGGOLiN 1.1.136
Bug fixes:
- cope with gff3 files without '##sequence-region' (such as Anvi'o's and JGI's) (reported both in #48 and #56)
- Do not create a new organism when reading fasta files for sequences when the organism was not in the previously read annotations (reported in #48)
- For the 'msa' command, have the --phylo option actually working
PPanGGOLiN 1.1.131
- New 'fasta' command to write more easily fasta files of parts (or all) of the pangenome (making #38, along with other demands, possible)
- New 'msa' command to compute MSA from specific families (such as the core genome for additional phylogenetic analysis) (firstly suggested by #28). Will do so using only genes that are present in only one copy in each organism for each family.
- Switching from lz4 to zstd for better compression for .h5 files (about 30% less disk space used with equivalent i/o speed)
- More unit tests (Thanks to @sletort )
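The single-copy rule stated above for the 'msa' command can be sketched as follows. This is one reading of that rule, under the assumption that a gene family is available as a list of (organism, gene identifier) pairs; the function name `single_copy_genes` and the sample data are hypothetical and do not reflect PPanGGOLiN's internal data structures.

```python
from collections import Counter


def single_copy_genes(family):
    """Keep only genes from organisms carrying exactly one copy of the family.

    `family` is a list of (organism, gene_id) pairs for one gene family.
    Illustrative sketch only.
    """
    copies = Counter(org for org, _ in family)
    return [(org, gene) for org, gene in family if copies[org] == 1]


family = [
    ("orgA", "a1"),
    ("orgB", "b1"),
    ("orgB", "b2"),  # orgB has two copies, so both are dropped
    ("orgC", "c1"),
]
print(single_copy_genes(family))
```

Dropping multi-copy organisms avoids having to choose between paralogs when building the alignment.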
Bug Fixes
- Cope with RAST-style gene identifiers (RAST gene identifiers were not used previously) (noticed in #44)
- Compute spots properly when the contig is circular (spots were not computed in circular contigs previously) (maybe fixes #43?)
- Proper 'softcore' filter behavior when writing the gexf files (filter was too restrictive and did not include all softcore families)
PPanGGOLiN 1.1.96
- can customize the clustering mode of mmseqs2
- add the possibility to read pseudogenes in the 'align' submodule
- defrag with connected component clustering is now the default clustering strategy
- due to this, there is a new option --no-defrag to use the previous clustering strategy
- improved option checking
Bug fixes:
- cope with contigs having identical identifiers in different genomes (see #34)
- can change the duplication margin in 'rgp' without crashing
- other minor technical bugs in 'align'
PPanGGOLiN 1.1.85
- allows the extraction of spot borders
- The gene_families.tsv now includes a third column with an 'F' if the gene is considered to be a fragment of the gene family (--families_tsv option in the 'write' module)
Bug fixes:
- fragment information is actually saved in the HDF5 file now (when using --defrag option with 'cluster', 'workflow' or 'panrgp')
- cope with features not following the 'tag=value' when reading a gff3 file. They will now just be ignored. An error will be raised if a gff3 object does not have the required attributes. (see #29)
- In case of an error due to file formatting when reading a cluster file (--clusters with 'cluster', 'workflow' or 'panrgp'), the error will now include the line number that caused the error (see #30)
PPanGGOLiN release 1.1.72
- added the 'rgp' subcommand to predict regions of genomic plasticity
- added the 'spot' subcommand to predict spots of insertion in the pangenome
- added a few new output files in 'write' related to the two previous commands
- the 'write' subcommand has been improved
- added the 'panrgp' workflow. It will probably replace the main 'workflow' command in the future but for the next few versions, they will live side by side.
- improved logging and added timing at the end of the 'panrgp' workflow.
pangenomes that were computed with previous versions should be compatible with the new commands.