kurimu

Kurīmu (meaning “cream” in Japanese) is a highly curated pangenome data collection, with metrics.

Kurīmu クリーム

[Figure: two-parts-v1]

About the Database

クリーム

Kurīmu (meaning "cream" in Japanese) is a publicly accessible, highly curated pangenome data collection. It provides the consistent, validated information needed to document published reports for a wide variety of pangenomes at various taxonomic levels. This information was obtained from hundreds of research papers published in peer-reviewed scientific journals and incorporated into a database for consistency and reproducibility of the reported and/or adapted results.

Information Provided

The Kurīmu data collection offers organized, ready-to-use information for hundreds of pangenomes, corresponding to over 20,000 individual genome sequences. The collection can serve as an entry point for looking up any pangenome property by taxon, pangenome size, NCBI taxonomy identifier, or literature citation. Information has been harvested for pangenomes at all taxonomic levels. The fields are as follows:

-Pangenome: The official scientific name of the taxon; note that some entries are phenotypic (polyphyletic) groups, such as photosynthetic prokaryotes

-Unique_ID: A unique identifier for the entry, constructed with MD5 from the Pangenome (name), Effective (size) and Reference (first author name and year of publication) fields

-NCBI_txid: The taxonomy identifier provided by the NCBI database, linking the entry to other data resources

-Effective: The ‘true’ number of genomes used for the calculation of pangenome parameters

-Level: The taxonomic level of the entry

-Pan: The pangenome size of the entry

-Core: The number of core genes of the entry

-Peripheral: The number of peripheral genes of the entry

-Unique: The number of unique genes of the entry

-Core_pan: The percentage of the pangenome belonging to the core set (an index of coherence: the higher, the tighter the pangenome)

-Shell_eff: The ratio of unique genes per genome (an index of ‘uniqueness’/dispersion: the lower, the tighter the pangenome)

-Reference: First author surname and year of publication for the published report

-DOI: The digital object identifier of the corresponding publication

-Gene_cluster: Indicates whether the pangenome partitioning refers to traditional protein family clusters (C, red in figure above) or to the more recent use of the term for gene-level variation (G, green in figure above)
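The derived fields above can be sketched in a few lines of Python. This is only an illustration, not the code used to build Kurīmu: the separator and field order used for the MD5 input are assumptions (the README only says which fields are used), `Shell_eff` is assumed to be Unique divided by Effective, and the example values are made up rather than taken from the database.

```python
import hashlib


def unique_id(pangenome: str, effective: int, reference: str, sep: str = "|") -> str:
    """Hypothetical Unique_ID recipe: MD5 over the three fields joined by `sep`.

    The actual concatenation scheme used by Kurīmu is not documented here,
    so IDs produced this way may not match the published ones.
    """
    key = sep.join([pangenome, str(effective), reference])
    return hashlib.md5(key.encode("utf-8")).hexdigest()


def core_pan(core: int, pan: int) -> float:
    """Core_pan: percentage of the pangenome belonging to the core set."""
    return 100.0 * core / pan


def shell_eff(unique: int, effective: int) -> float:
    """Shell_eff: unique genes per genome (assumed to be Unique / Effective)."""
    return unique / effective


# Made-up entry for illustration only; not a real Kurīmu record.
entry = {
    "Pangenome": "Example taxon",
    "Effective": 50,
    "Reference": "Doe 2020",
    "Pan": 10000,
    "Core": 2000,
    "Unique": 3000,
}

print(unique_id(entry["Pangenome"], entry["Effective"], entry["Reference"]))
print(core_pan(entry["Core"], entry["Pan"]))
print(shell_eff(entry["Unique"], entry["Effective"]))
```

A higher `Core_pan` and a lower `Shell_eff` both point to a tighter pangenome, as described in the field list.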

The boolean fields DS1-DS4 mark subsets of the collection (see Table 3, original publication) and are provided in lieu of Data Supplements: DS1 for all pangenomes; DS2 for gene-level pangenomes and missing values for family clusters; DS3 for entries where C+P+U=T (see Box 2, original publication); DS4 for duplicate entries with variable counts for pangenome sets.
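As a rough illustration of the DS3 criterion, the check below tests whether the three partitions add up to the total. It assumes that C, P, U and T correspond to the Core, Peripheral, Unique and Pan fields, and that missing values are represented as `None`; both are assumptions, and the function name is hypothetical.

```python
from typing import Mapping, Optional


def partitions_sum_to_total(entry: Mapping[str, Optional[int]]) -> bool:
    """Return True when Core + Peripheral + Unique equals Pan (C+P+U=T).

    Entries with any missing count are reported as False, since the
    identity cannot be verified for them.
    """
    fields = ("Core", "Peripheral", "Unique", "Pan")
    values = [entry.get(name) for name in fields]
    if any(v is None for v in values):
        return False
    core, peripheral, unique, pan = values
    return core + peripheral + unique == pan


# Made-up entries for illustration only; not real Kurīmu records.
consistent = {"Core": 2000, "Peripheral": 5000, "Unique": 3000, "Pan": 10000}
incomplete = {"Core": 2000, "Peripheral": None, "Unique": 3000, "Pan": 10000}

print(partitions_sum_to_total(consistent))
print(partitions_sum_to_total(incomplete))
```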

How to access

Hit the link to start browsing Kurīmu.

List of pangenome analysis methods

| | | |
|---|---|---|
| A | K | panX |
| AGAPE | KinFin | PGAdb-builder |
| ... | M | PanTools |
| B | MCL | Phandango |
| BPGA | Mugsy-A | PanWeb † |
| Bloom FT | MetaRef † | PanViz |
| BGDMdocker | micropan | Panaconda |
| ... | MSPminer | PanGeT |
| C | ... | PanACEA |
| CAMBer | N | PanGeneHome |
| ... | NGSPanPipe | Piggy |
| E | ... | PanVC |
| EDGAR | P | PGAP-X |
| eCAMBer | PanCGH † | ... |
| EUPAN | progMauve | R |
| ... | Panseq † | Roary |
| G | PGAT | RPAN |
| get_phylomarkers | PanOCT | ... |
| get_homologues | PGAP | S |
| ... | PanCake | SOP (pg) † |
| H | PanFunPro | SplitMEM |
| Hierarchicalsets | Pannotator † | seqana |
| Harvest | PanGP † | Scoary |
| ... | PanTetris † | seq-seq-pan |
| I | PanFP | ... |
| ITEP | Prokka | V |
| ... | PanCoreGen | ‘VarDetPGI’ |