Similarity annotations

fbastian edited this page Jan 7, 2015 · 5 revisions

The similarity annotation file is used to define evolutionary relations between anatomical entities described in the Uberon ontology.

The annotations currently focus on the concept of historical homology, meaning that they try to capture which structures are believed to derive from a common ancestral structure.

This annotation file could cover any transitive similarity relation (such as "functional equivalence"), but currently covers only the concept of historical homology (a homology that is defined by common descent).

Annotation file fields

The format of this annotation file is inspired by the GO annotation file format, and the procedure for providing supporting information is inspired by the guide to GO evidence codes.

| Column | Content | Required? | Cardinality | Example | Comment |
| --- | --- | --- | --- | --- | --- |
| 1 | HOM ID | required | 1 | HOM:0000007 | |
| 2 | HOM name | optional | 0 or 1 | historical homology | |
| 3 | entity ID (\|entity ID) | required | 1 or greater | UBERON:0000019 | |
| 4 | entity name (\|entity name) | optional | 0 or greater | camera-type eye | |
| 5 | qualifier | optional | 0 or 1 | NOT | |
| 6 | taxon ID | required | 1 | 7742 | cardinality might change in the future to handle other relations |
| 7 | taxon name | optional | 0 or 1 | Vertebrata | cardinality might change in the future to handle other relations |
| 8 | line type | required | 1 | RAW | possible values are only `RAW` and `SUMMARY` |
| 9 | ECO ID | special | 0 or 1 | ECO:0000067 | required for `RAW` line type, empty for `SUMMARY` line type |
| 10 | ECO name | optional | 0 or 1 | developmental similarity evidence | |
| 11 | CIO ID | required | 1 | CIO:0000003 | |
| 12 | CIO name | optional | 0 or 1 | high confidence assertion from single evidence | |
| 13 | reference ID | special | 0 or 1 | ISBN:978-0030223693 | required for `RAW` line type, empty for `SUMMARY` line type |
| 14 | reference title | special | 0 or 1 | Liem KF, Bemis WE, Walker WF, Grande L, Functional Anatomy of the Vertebrates: An Evolutionary Perspective (2001) p.429 | required for `RAW` line type, empty for `SUMMARY` line type |
| 15 | supporting text | special | 0 or 1 | ...The eye initially develops as a single median evagination of the diencephalon... | required for `RAW` line type, empty for `SUMMARY` line type |
| 16 | assigned by | required | 1 | Bgee | |
| 17 | curator | optional | 0 or 1 | ANN | required for `RAW` line type, empty for `SUMMARY` line type |
| 18 | date | special | 0 or 1 | 2013-11-29 | required for `RAW` line type, empty for `SUMMARY` line type |
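
As an illustration of the column layout, here is a minimal Python sketch that turns one annotation line into a dictionary keyed by column. It assumes the file is tab-separated, which is not stated on this page; the `COLUMNS` identifiers and the `parse_line` helper are hypothetical names chosen for this example, and the sample line is simply assembled from the example values in the field table.

```python
from typing import Dict, List

# Our own labels for the 18 columns, in the order of the field table.
COLUMNS: List[str] = [
    "hom_id", "hom_name", "entity_ids", "entity_names", "qualifier",
    "taxon_id", "taxon_name", "line_type", "eco_id", "eco_name",
    "cio_id", "cio_name", "reference_id", "reference_title",
    "supporting_text", "assigned_by", "curator", "date",
]


def parse_line(line: str, sep: str = "\t") -> Dict[str, str]:
    """Split one annotation line into a dict keyed by column label.

    The tab separator is an assumption, not something this page specifies.
    """
    fields = line.rstrip("\r\n").split(sep)
    if len(fields) != len(COLUMNS):
        raise ValueError(f"expected {len(COLUMNS)} columns, got {len(fields)}")
    return dict(zip(COLUMNS, fields))


# Sample line built from the table's example values (qualifier left empty).
example_line = "\t".join([
    "HOM:0000007", "historical homology", "UBERON:0000019", "camera-type eye",
    "", "7742", "Vertebrata", "RAW", "ECO:0000067",
    "developmental similarity evidence", "CIO:0000003",
    "high confidence assertion from single evidence", "ISBN:978-0030223693",
    "Liem KF, Bemis WE, Walker WF, Grande L, Functional Anatomy of the "
    "Vertebrates: An Evolutionary Perspective (2001) p.429",
    "...The eye initially develops as a single median evagination of the "
    "diencephalon...",
    "Bgee", "ANN", "2013-11-29",
])

record = parse_line(example_line)
print(record["entity_ids"], record["taxon_id"], record["line_type"])
```

Empty optional fields come back as empty strings, so a caller can test them with a plain truthiness check.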

Definitions and requirements for field contents

HOM ID (column 1)

Unique identifier of the similarity concept targeting the entity (column 3), in the provided taxon (column 6). The relations come from the ontology of homology and related concepts (see also the related publication in Trends Genet, and the project home).

See columns 3 and 6 for more details. Required field, cardinality 1.

HOM name (column 2)

Name of the similarity concept defined by HOM ID (column 1).

Optional field, cardinality 0 or 1.

Entity (column 3)

Unique identifier(s) of the entity(ies) targeted by the HOM ID relation (column 1), for the provided taxon (column 6).

For instance, for the following values:

column 1: HOM:0000007 ("historical homology")
column 3: UBERON:0000019 ("camera-type eye")
column 6: 7742 ("Vertebrata")

The meaning of this annotation is that the structure UBERON:0000019 "camera-type eye" is believed to originate from a common structure, present in the least common ancestor of vertebrates, thus being homologous in the vertebrate clade.

In some cases it is necessary to provide several entity identifiers, because a structure present in the common ancestor of a clade can have evolved into different, yet homologous, structures. This is for instance the case of the lung (UBERON:0002048) and the swim bladder (UBERON:0006860), which are believed to originate from a common ancestral structure present in the ancestor of the Euteleostomi (see the annotation file for more details). This ancestral structure is not believed to still exist as such in extant species, however, and no term in the Uberon ontology describes it. To capture the fact that the lung and the swim bladder are homologous, both of their identifiers are used, separated by a pipe (|), in the form:

UBERON:0002048|UBERON:0006860

It could be argued that the proper way of providing such an annotation would be to create a new term in Uberon, describing this putative ancestral structure. While we agree with this principle, we use the "several identifiers" approach to avoid depending on the modifications of Uberon that would be required, and to avoid "polluting" Uberon with many structures that do not exist in extant species.

Please note that Uberon sometimes uses the same term to describe structures that are analogous, rather than homologous, in some lineages. This is for instance the case of the structure UBERON:0000988 "pons": while this structure is present in both Aves and Mammalia, it is thought to have appeared independently in each of these lineages, notably because it does not exist in other Amniota.

This evolutionary path is captured through 3 different assertions.

First assertion:

column 1: HOM:0000007 ("historical homology")
column 3: UBERON:0000988 ("pons")
column 6: 8782 ("Aves")

This assertion means that the structure UBERON:0000988 "pons", as it appears in Aves, is believed to originate from a common structure, present in the least common ancestor of Aves, thus being homologous in the Aves clade. It does not mean that the pons of every clade derives from a structure present in the least common ancestor of Aves.

Second assertion:

column 1: HOM:0000007 ("historical homology")
column 3: UBERON:0000988 ("pons")
column 6: 40674 ("Mammalia")

This assertion means that the structure UBERON:0000988 "pons", as it appears in Mammalia, is believed to originate from a common structure, present in the least common ancestor of Mammalia, thus being homologous in the Mammalia clade. It does not mean that the pons of every clade derives from a structure present in the least common ancestor of Mammalia.

Third assertion (negative):

column 1: HOM:0000007 ("historical homology")
column 3: UBERON:0000988 ("pons")
column 5: NOT
column 6: 32524 ("Amniota")

This assertion formally states that the structure described by the generic term UBERON:0000988 "pons" is not homologous across Amniota; see qualifier (column 5) for more details.

This field is required, cardinality 1 or greater.

Entity name (column 4)

Name(s) of the entity(ies) defined by entity (column 3). If several entities are provided in column 3, the pipe (|) separator is used, for instance:

lung|swim bladder

Optional field, cardinality 0 or greater.
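
The pipe-separated fields in columns 3 and 4 can be unpacked as in the following sketch. It assumes that names in column 4, when present, are listed in the same order as the identifiers in column 3, as the lung and swim bladder example suggests; `split_entities` is a hypothetical helper name.

```python
from typing import List, Optional, Tuple


def split_entities(
    entity_ids: str, entity_names: str = ""
) -> List[Tuple[str, Optional[str]]]:
    """Pair each entity ID with its name, or with None if names are absent.

    Assumes IDs and names appear in the same order when both are given.
    """
    ids = entity_ids.split("|")
    names = entity_names.split("|") if entity_names else []
    if names and len(names) != len(ids):
        raise ValueError("entity names do not line up with entity IDs")
    return [(id_, names[i] if names else None) for i, id_ in enumerate(ids)]


pairs = split_entities("UBERON:0002048|UBERON:0006860", "lung|swim bladder")
print(pairs)
```

Because column 4 is optional, the helper accepts an empty name field and returns `None` for each name in that case.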

Qualifier (column 5)

Flag used to negate the interpretation of an annotation. If provided, the only accepted value is NOT. It is used to capture information rejecting a putative relation between structures that could otherwise seem plausible.

For instance, this qualifier is used to capture annotations stating that hindgut (UBERON:0001046) is not believed to be homologous among Bilateria. Another annotation then states that hindgut is believed to be homologous among Vertebrata.

Optional field, cardinality 0 or 1. If cardinality 1, the only accepted value is NOT.
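
To show how the qualifier changes the reading of an annotation, the sketch below encodes the three "pons" assertions from the entity section and looks up what is asserted for a given taxon. The `Assertion` type and `homology_status` function are hypothetical, and the lookup assumes at most one assertion per entity and taxon pair, which a real file with several RAW lines per pair would not guarantee.

```python
from typing import List, NamedTuple, Optional


class Assertion(NamedTuple):
    hom_id: str      # column 1
    entity_ids: str  # column 3
    qualifier: str   # column 5: "" or "NOT"
    taxon_id: int    # column 6


def homology_status(
    assertions: List[Assertion], entity_ids: str, taxon_id: int
) -> Optional[bool]:
    """True if asserted, False if negated with NOT, None if not annotated."""
    for a in assertions:
        if a.entity_ids == entity_ids and a.taxon_id == taxon_id:
            return a.qualifier != "NOT"
    return None


pons = [
    Assertion("HOM:0000007", "UBERON:0000988", "", 8782),      # Aves
    Assertion("HOM:0000007", "UBERON:0000988", "", 40674),     # Mammalia
    Assertion("HOM:0000007", "UBERON:0000988", "NOT", 32524),  # Amniota
]

print(homology_status(pons, "UBERON:0000988", 8782))
print(homology_status(pons, "UBERON:0000988", 32524))
```

Returning `None` for an unannotated pair keeps "no annotation" distinct from an explicit negative assertion.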

Taxon (column 6)

The unique identifier of the taxon targeted by the HOM ID relation (column 1), for the provided entity (column 3). These identifiers are integers linking to the NCBI taxonomy.

See definition of column 3 for an example of use of taxon.

Required field, cardinality 1. Note that this cardinality could evolve in the future, to allow the use of other types of similarity relations (for instance, to define in which taxa a structure is functionally equivalent, as this type of relation would not originate from any common ancestor).

Taxon name (column 7)

Name of taxon, defined in column 6.

Optional field, cardinality 0 or 1. Note that this cardinality could evolve in the future, to allow the use of other types of similarity relations (for instance, to define in which taxa a structure is functionally equivalent, as this type of relation would not originate from any common ancestor).

Line type (column 8)

Two values are possible for this column: RAW, and SUMMARY.

If the value is RAW, the line corresponds to a single annotation based on a single piece of evidence, in line with the GO guidelines for capturing sources of annotations. Such lines thus correspond to "standard" annotations.

If the value is SUMMARY, the line corresponds to an automatically generated grouping of several RAW annotations targeting the same HOM ID (column 1), entity (column 3), and taxon (column 6). The aim is to provide a global confidence code based on multiple pieces of evidence; see CIO ID (column 11) for more details. You can discard these SUMMARY annotations if you wish.

Required field. Cardinality 1.
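
Discarding the SUMMARY lines can be sketched as follows, assuming each line has already been parsed into a dictionary with a `line_type` key (a hypothetical representation, as is the `raw_annotations` name).

```python
from typing import Dict, Iterable, List

LINE_TYPES = {"RAW", "SUMMARY"}


def raw_annotations(records: Iterable[Dict[str, str]]) -> List[Dict[str, str]]:
    """Keep only RAW lines and reject any line type other than RAW or SUMMARY."""
    kept = []
    for record in records:
        line_type = record["line_type"]
        if line_type not in LINE_TYPES:
            raise ValueError(f"unknown line type: {line_type!r}")
        if line_type == "RAW":
            kept.append(record)
    return kept


records = [
    {"entity_ids": "UBERON:0000019", "line_type": "RAW"},
    {"entity_ids": "UBERON:0000019", "line_type": "SUMMARY"},
]
print(len(raw_annotations(records)))
```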

ECO ID (column 9)

Unique identifier from the Evidence Ontology, capturing how the annotation is supported. See the GO evidence code guide for more information.

If line type (column 8) is equal to RAW, this field is required, cardinality 1. If line type is equal to SUMMARY, this field is not provided, cardinality 0 (as the aim of such lines is to summarize several pieces of evidence, which can be retrieved from the individual RAW annotations).

ECO name (column 10)

Name of ECO ID, defined in column 9.

Optional field, cardinality 0 or 1. If line type (column 8) is equal to SUMMARY, this field is not provided, see column 9 description for more details.

CIO ID (column 11)

Unique identifier from the experimental confidence information ontology. This experimental ontology is an attempt to provide a means of capturing information about the confidence in an assertion. See project home for more details.

If the value of line type (column 8) is RAW, then this confidence code can only belong to the "confidence from single evidence" branch. Possible values are then CIO:0000003 ("high confidence from single evidence"), CIO:0000004 ("medium confidence from single evidence"), and CIO:0000005 ("low confidence from single evidence").

If the value of line type (column 8) is SUMMARY, then this confidence code can only belong to the "confidence from multiple evidences" branch, as the aim of such lines is to summarize several individual annotations, each based on a single piece of evidence. For SUMMARY lines, this confidence code is assigned automatically, using the "single evidence" confidences provided by curators, and using the Evidence Ontology to try to determine whether the pieces of evidence used are of the same experimental type or of different experimental types (the latter providing stronger support for the assertion; see the confidence information ontology for more details).
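
The constraint between line type and CIO ID can be checked roughly as below. Only the three "single evidence" identifiers are listed on this page, so for SUMMARY lines the sketch merely verifies that the code is not one of them; it does not attempt to enumerate the "multiple evidences" branch. `cio_matches_line_type` is a hypothetical name.

```python
# The three single-evidence codes listed in the CIO ID description.
SINGLE_EVIDENCE_CIO = {"CIO:0000003", "CIO:0000004", "CIO:0000005"}


def cio_matches_line_type(line_type: str, cio_id: str) -> bool:
    """Check that a CIO ID is plausible for the given line type."""
    if line_type == "RAW":
        return cio_id in SINGLE_EVIDENCE_CIO
    if line_type == "SUMMARY":
        # Multiple-evidence IDs are not listed here, so only exclude the
        # single-evidence ones rather than guessing at the valid set.
        return cio_id not in SINGLE_EVIDENCE_CIO
    raise ValueError(f"unknown line type: {line_type!r}")


print(cio_matches_line_type("RAW", "CIO:0000003"))
print(cio_matches_line_type("SUMMARY", "CIO:0000003"))
```

A stricter check would load the confidence information ontology and test branch membership directly.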

CIO name (column 12)

Name of CIO ID, defined in column 11.

Optional field, cardinality 0 or 1.

Reference ID (column 13)

Unique identifier of a single source, cited as an authority for asserting the relation. Note that only one reference can be cited on a single line in the annotation file (and only one piece of evidence from this reference can be cited on a single line).

If line type (column 8) is equal to RAW, this field is required, cardinality 1. If line type is equal to SUMMARY, this field is not provided, cardinality 0 (as the aim of such lines is to summarize several pieces of evidence, potentially from several references, which can be retrieved from the individual RAW annotations).

Reference title (column 14)

Title of the reference defined in reference ID (column 13).

Optional field, cardinality 0 or 1.

Supporting text (column 15)

A quote from the reference defined in column 13, supporting the annotation. If possible, it should also support the choice of the ECO ID (column 9).

If line type (column 8) is equal to RAW, this field is required, cardinality 1. If line type is equal to SUMMARY, this field is not provided.

Assigned by (column 16)

The database which made the annotation. Used for tracking the source of an individual annotation.

Required field, cardinality 1.

Curator (column 17)

A code identifying the curator who made the annotation, from the database defined in column 16.

Optional field, cardinality 0 or 1.

Date (column 18)

Date on which the annotation was made. Format is yyyy-MM-dd.

If line type (column 8) is equal to RAW, this field is required, cardinality 1. If line type is equal to SUMMARY, this field is not provided, as such lines are generated automatically at each release.
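
The RAW and SUMMARY rules stated in the field definitions can be combined into a small consistency check. This sketch covers only the four fields whose definitions explicitly say "required for RAW, not provided for SUMMARY" (ECO ID, reference ID, supporting text, date), plus the date format; the dictionary keys and the `check_record` helper are hypothetical.

```python
from datetime import datetime
from typing import Dict, List

# Fields described as required on RAW lines and not provided on SUMMARY lines.
RAW_ONLY_FIELDS = ["eco_id", "reference_id", "supporting_text", "date"]


def is_valid_date(value: str) -> bool:
    """Accept only zero-padded yyyy-MM-dd dates such as 2013-11-29."""
    if len(value) != 10:
        return False
    try:
        datetime.strptime(value, "%Y-%m-%d")
    except ValueError:
        return False
    return True


def check_record(record: Dict[str, str]) -> List[str]:
    """Return a list of human-readable problems; empty if none were found."""
    problems: List[str] = []
    line_type = record.get("line_type", "")
    for field in RAW_ONLY_FIELDS:
        value = record.get(field, "")
        if line_type == "RAW" and not value:
            problems.append(f"{field} is required on RAW lines")
        elif line_type == "SUMMARY" and value:
            problems.append(f"{field} must be empty on SUMMARY lines")
    date = record.get("date", "")
    if date and not is_valid_date(date):
        problems.append(f"date {date!r} is not in yyyy-MM-dd format")
    return problems


raw = {
    "line_type": "RAW",
    "eco_id": "ECO:0000067",
    "reference_id": "ISBN:978-0030223693",
    "supporting_text": "...The eye initially develops...",
    "date": "2013-11-29",
}
print(check_record(raw))
```

Returning a list of problems rather than raising on the first one makes it easier to report every issue found on a line at once.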

Relation between developmental structures

Distinctions based on the developmental state of the same organ can be irrelevant when considering similarity annotations. For instance, terms such as ‘future brain’ and ‘brain’, while relevant when considering the developmental lineage of a structure, correspond to the same common ancestral structure.

The annotations provided here always target entities describing the fully-formed structures, unless a distinction between developmental structures has to be made. Related developmental structures can be retrieved in Uberon using the relations 'transformation_of', and 'immediate_transformation_of'.

When using this annotation file, it is recommended to always group each entity listed in the entity ID field (column 3) with any non-annotated entity related to it by a 'transformation_of' or 'immediate_transformation_of' relation.
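
One possible way to perform this grouping is to treat the transformation relations as edges of a graph and collect everything reachable from an annotated entity. The sketch below does this with a breadth-first search. It assumes the relations have already been extracted from Uberon as pairs, follows them in both directions and transitively (a design choice, not something this page prescribes), and uses term names instead of Uberon identifiers purely for readability; the edge list is illustrative, not taken from Uberon.

```python
from collections import defaultdict, deque
from typing import Dict, Iterable, Set, Tuple


def transformation_group(
    entity: str, edges: Iterable[Tuple[str, str]]
) -> Set[str]:
    """Return the entity plus everything linked to it by transformation edges.

    Each edge is a (structure, earlier structure) pair standing for a
    'transformation_of' or 'immediate_transformation_of' relation.
    """
    neighbours: Dict[str, Set[str]] = defaultdict(set)
    for later, earlier in edges:
        neighbours[later].add(earlier)
        neighbours[earlier].add(later)

    seen = {entity}
    queue = deque([entity])
    while queue:
        current = queue.popleft()
        for other in neighbours.get(current, ()):
            if other not in seen:
                seen.add(other)
                queue.append(other)
    return seen


# Illustrative edge only; real edges would come from the Uberon ontology.
edges = [("brain", "future brain")]
print(sorted(transformation_group("brain", edges)))
```

An entity with no transformation relation simply forms a group of its own.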

Status of integration in Bgee

As of January 2015, Bgee has released new data based on Uberon, which represents an essential step towards the integration of the similarity annotations. New data making use of these annotations, and allowing automatic comparison of gene expression patterns between species, are expected to be released in early 2015.