Similarity annotations
The similarity annotation file is used to define evolutionary relations between anatomical entities described in the Uberon ontology.
The annotations currently focus on the concept of historical homology: they try to capture which structures are believed to derive from a common ancestral structure.
This annotation file could cover any transitive similarity relation (such as "functional equivalence"), but it currently covers only historical homology (homology defined by common descent).
Table of contents
- Annotation file fields
- Definitions and requirements for field contents
- Relation between developmental structures
- Status of integration in Bgee
Annotation file fields
The format of this annotation file is inspired by the GO annotation file format, and the way supporting information is provided is inspired by the guide to GO evidence codes.
Column | Content | Required? | Cardinality | Example | Comment |
---|---|---|---|---|---|
1 | HOM ID | required | 1 | HOM:0000007 | |
2 | HOM name | optional | 0 or 1 | historical homology | |
3 | entity ID (\|entity ID) | required | 1 or greater | UBERON:0000019 | |
4 | entity name (\|entity name) | optional | 0 or greater | camera-type eye | |
5 | qualifier | optional | 0 or 1 | NOT | |
6 | taxon ID | required | 1 | 7742 | cardinality might change in the future to handle other relations |
7 | taxon name | optional | 0 or 1 | Vertebrata | cardinality might change in the future to handle other relations |
8 | line type | required | 1 | RAW | possible values are only `RAW` and `SUMMARY` |
9 | ECO ID | special | 0 or 1 | ECO:0000067 | required for `RAW` line type, empty for `SUMMARY` line type |
10 | ECO name | optional | 0 or 1 | developmental similarity evidence | |
11 | CIO ID | required | 1 | CIO:0000003 | |
12 | CIO name | optional | 0 or 1 | high confidence assertion from single evidence | |
13 | reference ID | special | 0 or 1 | ISBN:978-0030223693 | required for `RAW` line type, empty for `SUMMARY` line type |
14 | reference title | special | 0 or 1 | Liem KF, Bemis WE, Walker WF, Grande L, Functional Anatomy of the Vertebrates: An Evolutionary Perspective (2001) p.429 | required for `RAW` line type, empty for `SUMMARY` line type |
15 | supporting text | special | 0 or 1 | ...The eye initially develops as a single median evagination of the diencephalon... | required for `RAW` line type, empty for `SUMMARY` line type |
16 | assigned by | required | 1 | Bgee | |
17 | curator | optional | 0 or 1 | ANN | required for `RAW` line type, empty for `SUMMARY` line type |
18 | date | special | 0 or 1 | 2013-11-29 | required for `RAW` line type, empty for `SUMMARY` line type |
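As a minimal sketch of reading one line of the file, the function below maps the fields onto the 18 columns of the table above. It assumes the file is tab-separated like the GO annotation format it is inspired by; this section does not state the delimiter. The short column keys and the name `parse_line` are hypothetical.

```python
# Hypothetical short keys for the 18 columns, in the order of the table above.
COLUMNS = [
    "hom_id", "hom_name", "entity_ids", "entity_names", "qualifier",
    "taxon_id", "taxon_name", "line_type", "eco_id", "eco_name",
    "cio_id", "cio_name", "reference_id", "reference_title",
    "supporting_text", "assigned_by", "curator", "date",
]


def parse_line(line: str) -> dict:
    """Split one annotation line (assumed tab-separated) into a dict keyed by column."""
    fields = line.rstrip("\r\n").split("\t")
    if len(fields) != len(COLUMNS):
        raise ValueError(f"expected {len(COLUMNS)} columns, got {len(fields)}")
    return dict(zip(COLUMNS, fields))


# The example values from the table, joined with tabs (column 5 is empty).
example = "\t".join([
    "HOM:0000007", "historical homology", "UBERON:0000019", "camera-type eye",
    "", "7742", "Vertebrata", "RAW", "ECO:0000067",
    "developmental similarity evidence", "CIO:0000003",
    "high confidence assertion from single evidence", "ISBN:978-0030223693",
    "Liem KF, Bemis WE, Walker WF, Grande L, Functional Anatomy of the "
    "Vertebrates: An Evolutionary Perspective (2001) p.429",
    "...The eye initially develops as a single median evagination of the diencephalon...",
    "Bgee", "ANN", "2013-11-29",
])
record = parse_line(example)
print(record["taxon_name"], record["line_type"])  # Vertebrata RAW
```

Empty optional fields are kept as empty strings, so every record has all 18 keys regardless of line type.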
Definitions and requirements for field contents
HOM ID (column 1)
Unique identifier of the similarity concept applied to the entity (column 3) in the provided taxon (column 6). The relations come from the ontology of homology and related concepts (see also the related publication in Trends Genet, and the project home). See columns 3 and 6 for more details.
Required field, cardinality 1.
HOM name (column 2)
Name of the similarity concept defined by HOM ID (column 1).
Optional field, cardinality 0 or 1.
entity ID (column 3)
Unique identifier(s) of the entity(ies) targeted by the HOM ID relation (column 1), for the provided taxon (column 6).
For instance, for the following values:
- column 1: HOM:0000007 ("historical homology")
- column 3: UBERON:0000019 ("camera-type eye")
- column 6: 7742 ("Vertebrata")
The meaning of this annotation is that the structure UBERON:0000019 "camera-type eye" is believed to originate from a common structure present in the least common ancestor of vertebrates, and is thus homologous across the vertebrate clade.
In some cases it is necessary to provide several entity identifiers, because a structure present in the common ancestor of a clade may have evolved into different, yet homologous, structures. This is for instance the case of the lung (UBERON:0002048) and the swim bladder (UBERON:0006860), which are believed to originate from a common ancestral structure present in the ancestor of the Euteleostomi (see the annotation file for more details).
This ancestral structure is not believed to still exist as such in extant species, and no term describes it in the Uberon ontology. In such cases, to capture the fact that the lung and the swim bladder are homologous, both identifiers are provided, separated by a pipe (`|`), in the form:
`UBERON:0002048|UBERON:0006860`
It could be argued that the proper way to provide such an annotation would be to create a new Uberon term describing this putative ancestral structure. While we agree with this principle, we use the "several identifiers" approach to avoid depending on the changes to Uberon that this would require, and to avoid "polluting" Uberon with many structures that do not exist in extant species.
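A minimal sketch of reading such a multi-entity field: split on the pipe and keep the resulting identifiers together, since they form a single annotation. Pairing identifiers with the names of column 4 assumes both columns list entities in the same order, as in the lung/swim bladder example; this section does not state that rule. The helper name `split_multi` is hypothetical.

```python
def split_multi(field: str) -> list[str]:
    """Split a pipe-separated field (columns 3 and 4) into its individual values."""
    return [value for value in field.split("|") if value]


ids = split_multi("UBERON:0002048|UBERON:0006860")
names = split_multi("lung|swim bladder")
# Pair each identifier with its name, assuming both columns use the same order.
labels = dict(zip(ids, names))
print(labels)  # {'UBERON:0002048': 'lung', 'UBERON:0006860': 'swim bladder'}
```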
Please note that Uberon sometimes uses the same term to describe structures that are not homologous in some lineages, but rather analogous. This is for instance the case of the structure UBERON:0000988 "pons": while this structure is present in both Aves and Mammalia, it is thought to have appeared independently in each of these lineages, notably because it does not exist in other Amniota.
This evolutionary history is captured through three assertions:
First assertion:
- column 1: HOM:0000007 ("historical homology")
- column 3: UBERON:0000988 ("pons")
- column 6: 8782 ("Aves")
=> The meaning of this assertion is that the structure UBERON:0000988 "pons", as it appears in Aves, is believed to originate from a common structure present in the least common ancestor of Aves, and is thus homologous within the Aves clade. This assertion does not mean that the pons of every clade derives from a structure present in the least common ancestor of Aves.
Second assertion:
- column 1: HOM:0000007 ("historical homology")
- column 3: UBERON:0000988 ("pons")
- column 6: 40674 ("Mammalia")
=> The meaning of this assertion is that the structure UBERON:0000988 "pons", as it appears in Mammalia, is believed to originate from a common structure present in the least common ancestor of Mammalia, and is thus homologous within the Mammalia clade. This assertion does not mean that the pons of every clade derives from a structure present in the least common ancestor of Mammalia.
Third assertion (negative):
- column 1: HOM:0000007 ("historical homology")
- column 3: UBERON:0000988 ("pons")
- column 5: `NOT`
- column 6: 32524 ("Amniota")
=> formally states that the generic term UBERON:0000988 "pons" is not homologous in Amniota; see column 5 (qualifier) for more details.
Required field, cardinality 1 or greater.
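Each assertion is scoped to its own taxon, so a positive assertion for Aves says nothing about Amniota. The sketch below encodes the three "pons" assertions and keeps positive and negated assertions apart instead of merging them. The `Assertion` type and helper names are hypothetical, and the HOM ID is omitted because all three assertions use HOM:0000007.

```python
from typing import NamedTuple


class Assertion(NamedTuple):
    entity: str    # column 3
    taxon_id: int  # column 6 (NCBI taxonomy)
    negated: bool  # True when column 5 holds NOT


# The three "pons" (UBERON:0000988) assertions described above.
PONS = [
    Assertion("UBERON:0000988", 8782, False),   # Aves
    Assertion("UBERON:0000988", 40674, False),  # Mammalia
    Assertion("UBERON:0000988", 32524, True),   # Amniota, NOT
]


def homologous_in(assertions, entity):
    """Taxa in which the entity is asserted to be homologous."""
    return {a.taxon_id for a in assertions if a.entity == entity and not a.negated}


def not_homologous_in(assertions, entity):
    """Taxa in which homology of the entity is explicitly rejected."""
    return {a.taxon_id for a in assertions if a.entity == entity and a.negated}


print(sorted(homologous_in(PONS, "UBERON:0000988")))      # [8782, 40674]
print(sorted(not_homologous_in(PONS, "UBERON:0000988")))  # [32524]
```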
entity name (column 4)
Name(s) of the entity(ies) identified in entity ID (column 3). If several entities are provided in column 3, the pipe (`|`) separator is used, for instance:
`lung|swim bladder`
Optional field, cardinality 0 or greater.
qualifier (column 5)
Flag used to negate the interpretation of an annotation. If provided, the only accepted value is `NOT`. It is used to capture information that rejects a putative relation between structures that could otherwise seem plausible.
For instance, this qualifier is used in an annotation stating that the hindgut (UBERON:0001046) is not believed to be homologous among Bilateria. Another annotation then states that the hindgut is believed to be homologous among Vertebrata.
Optional field, cardinality 0 or 1. If cardinality 1, the only accepted value is `NOT`.
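A minimal sketch of interpreting column 5: an empty field is a positive assertion, `NOT` negates it, and anything else is rejected. Matching is case-sensitive here, which is an assumption since the text only gives the value `NOT`; the function name `is_negated` is hypothetical.

```python
def is_negated(qualifier: str) -> bool:
    """Interpret column 5: empty means a positive assertion, NOT negates it."""
    if qualifier == "":
        return False
    if qualifier == "NOT":
        return True
    # Any other value is rejected; matching is case-sensitive (an assumption).
    raise ValueError(f"invalid qualifier {qualifier!r}: only NOT is accepted")


print(is_negated(""))     # False
print(is_negated("NOT"))  # True
```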
taxon ID (column 6)
The unique identifier of the taxon targeted by the HOM ID relation (column 1), for the provided entity (column 3). These identifiers are integers from the NCBI taxonomy. See the definition of column 3 for examples of use.
Required field, cardinality 1. Note that this cardinality could change in the future to allow other types of similarity relations (for instance, to define in which taxa a structure is functionally equivalent, as this type of relation would not originate from a common ancestor).
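Column 6 holds a bare integer rather than a prefixed identifier, so tools that work with OBO ontologies may need to add a prefix. The sketch below validates the integer and builds a CURIE; the `NCBITaxon:` prefix follows the OBO NCBITaxon ontology convention and is not part of this file format, and the name `taxon_curie` is hypothetical.

```python
def taxon_curie(taxon_id: str) -> str:
    """Turn the integer NCBI taxonomy ID of column 6 into an NCBITaxon CURIE."""
    value = int(taxon_id)  # raises ValueError if the field is not an integer
    if value <= 0:
        raise ValueError(f"invalid NCBI taxon ID: {taxon_id!r}")
    return f"NCBITaxon:{value}"


print(taxon_curie("7742"))  # NCBITaxon:7742
```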
taxon name (column 7)
Name of the taxon defined in column 6.
Optional field, cardinality 0 or 1. As for column 6, this cardinality could change in the future to allow other types of similarity relations, such as functional equivalence.
line type (column 8)
Two values are possible for this column: `RAW` and `SUMMARY`.
If the value is `RAW`, the line corresponds to a single annotation based on a single evidence, following the GO guidelines for capturing sources of annotations. Such lines thus correspond to "standard" annotations.
If the value is `SUMMARY`, the line corresponds to an automatically generated grouping of several `RAW` annotations targeting the same HOM ID (column 1), entity (column 3) and taxon (column 6). The aim is to provide a global confidence code based on several pieces of evidence; see CIO ID (column 11) for more details. You can discard these `SUMMARY` annotations if you wish.
Required field, cardinality 1.
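The sketch below groups `RAW` lines by the (HOM ID, entity, taxon) triple that a `SUMMARY` line covers, skipping `SUMMARY` lines as the text allows. Records are assumed to be dicts with hypothetical short keys; the entity field is used as-is, so two lines listing the same pipe-separated IDs in a different order would form different groups. The reference values `ref-1` and `ref-2` are placeholders.

```python
from collections import defaultdict


def group_raw(records: list[dict]) -> dict[tuple, list[dict]]:
    """Group RAW records by the (HOM ID, entity IDs, taxon ID) triple of a SUMMARY line."""
    groups = defaultdict(list)
    for rec in records:
        if rec["line_type"] != "RAW":
            continue  # SUMMARY lines are skipped; they may be discarded
        key = (rec["hom_id"], rec["entity_ids"], rec["taxon_id"])
        groups[key].append(rec)
    return dict(groups)


records = [
    # Two RAW annotations for the same triple; "ref-1" and "ref-2" are placeholders.
    {"hom_id": "HOM:0000007", "entity_ids": "UBERON:0000019", "taxon_id": "7742",
     "line_type": "RAW", "reference_id": "ref-1"},
    {"hom_id": "HOM:0000007", "entity_ids": "UBERON:0000019", "taxon_id": "7742",
     "line_type": "RAW", "reference_id": "ref-2"},
    {"hom_id": "HOM:0000007", "entity_ids": "UBERON:0000019", "taxon_id": "7742",
     "line_type": "SUMMARY", "reference_id": ""},
]
groups = group_raw(records)
print(len(groups[("HOM:0000007", "UBERON:0000019", "7742")]))  # 2
```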
ECO ID (column 9)
Unique identifier from the Evidence Ontology, capturing how the annotation is supported. See the GO evidence code guide for more information.
If line type (column 8) is `RAW`, this field is required, cardinality 1. If line type is `SUMMARY`, this field is not provided, cardinality 0 (as the aim of such lines is to summarize several pieces of evidence, which can be retrieved from the individual `RAW` annotations).
ECO name (column 10)
Name of the ECO ID defined in column 9.
Optional field, cardinality 0 or 1. If line type (column 8) is `SUMMARY`, this field is not provided; see the column 9 description for more details.
CIO ID (column 11)
Unique identifier from the confidence information ontology, an experimental ontology that attempts to provide a means of capturing the confidence in an assertion. See the project home for more details.
If line type (column 8) is `RAW`, this confidence code can only belong to the "confidence from single evidence" branch. The possible values are then CIO:0000003 ("high confidence from single evidence"), CIO:0000004 ("medium confidence from single evidence") and CIO:0000005 ("low confidence from single evidence").
If line type (column 8) is `SUMMARY`, this confidence code can only belong to the "confidence from multiple evidences" branch, as the aim of such lines is to summarize several individual annotations, each based on a single evidence.
For `SUMMARY` lines, this confidence code is assigned automatically from the single-evidence confidences provided by curators, using the Evidence Ontology to try to determine whether the evidences used are of the same experimental type or of different experimental types (the latter provides stronger support for the assertion; see the confidence information ontology for more details).
CIO name (column 12)
Name of the CIO ID defined in column 11.
Optional field, cardinality 0 or 1.
reference ID (column 13)
Unique identifier of a single source, cited as an authority for asserting the relation. Only one reference can be cited on a single line of the annotation file, and only one evidence from this reference can be cited on a single line.
If line type (column 8) is `RAW`, this field is required, cardinality 1. If line type is `SUMMARY`, this field is not provided, cardinality 0 (as the aim of such lines is to summarize several pieces of evidence, potentially from several references, which can be retrieved from the individual `RAW` annotations).
reference title (column 14)
Title of the reference defined in reference ID (column 13).
Optional field, cardinality 0 or 1.
supporting text (column 15)
A quote from the reference defined in column 13 that supports the annotation. If possible, it should also support the choice of the ECO ID (column 9).
If line type (column 8) is `RAW`, this field is required, cardinality 1. If line type is `SUMMARY`, this field is not provided.
assigned by (column 16)
The database that made the annotation, used to track the source of an individual annotation.
Required field, cardinality 1.
curator (column 17)
A code identifying the curator who made the annotation, within the database defined in column 16.
Optional field, cardinality 0 or 1.
date (column 18)
Date on which the annotation was made, in the format `yyyy-MM-dd`.
If line type (column 8) is `RAW`, this field is required, cardinality 1. If line type is `SUMMARY`, this field is not provided, as such lines are generated automatically at each release.
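The conditional requirements above can be checked in one pass. This sketch treats ECO ID, reference ID, supporting text and date as required on `RAW` lines and empty on `SUMMARY` lines, following the field descriptions rather than the table comments where the two differ (reference title is left out). Records are assumed to be dicts with hypothetical short keys, and `line_type_problems` is a hypothetical name.

```python
import re
from datetime import datetime

# Columns required on RAW lines and empty on SUMMARY lines (columns 9, 13, 15, 18).
RAW_ONLY_COLUMNS = ["eco_id", "reference_id", "supporting_text", "date"]


def line_type_problems(rec: dict) -> list[str]:
    """Return the problems found in one parsed record; an empty list means none."""
    line_type = rec.get("line_type", "")
    if line_type not in ("RAW", "SUMMARY"):
        return [f"unknown line type {line_type!r}"]
    problems = []
    for column in RAW_ONLY_COLUMNS:
        value = rec.get(column, "")
        if line_type == "RAW" and not value:
            problems.append(f"{column} is required on RAW lines")
        elif line_type == "SUMMARY" and value:
            problems.append(f"{column} must be empty on SUMMARY lines")
    date = rec.get("date", "")
    if line_type == "RAW" and date:
        # yyyy-MM-dd: check the shape first, then that it is a real calendar date.
        if not re.fullmatch(r"[0-9]{4}-[0-9]{2}-[0-9]{2}", date):
            problems.append(f"date {date!r} is not in yyyy-MM-dd format")
        else:
            try:
                datetime.strptime(date, "%Y-%m-%d")
            except ValueError:
                problems.append(f"date {date!r} is not a valid calendar date")
    return problems


raw = {"line_type": "RAW", "eco_id": "ECO:0000067", "reference_id": "ISBN:978-0030223693",
       "supporting_text": "...", "date": "2013-11-29"}
print(line_type_problems(raw))  # []
print(line_type_problems({"line_type": "SUMMARY", "eco_id": "ECO:0000067"}))
# ['eco_id must be empty on SUMMARY lines']
```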
Relation between developmental structures
Distinctions based on the developmental stage of the same organ can be irrelevant for similarity annotations. For instance, terms such as ‘future brain’ and ‘brain’, while relevant when considering the developmental lineage of a structure, correspond to the same common ancestral structure.
The annotations provided here always target entities describing the fully formed structures, unless a distinction between developmental structures has to be made. Related developmental structures can be retrieved in Uberon using the relations 'transformation_of' and 'immediate_transformation_of'.
When using this annotation file, it is recommended to always group the entities listed in the entity ID field (column 3) together with any non-annotated entity related to them by a 'transformation_of' or 'immediate_transformation_of' relation.
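A minimal sketch of this grouping: treat the 'transformation_of' and 'immediate_transformation_of' edges as one undirected graph and collect everything reachable from an annotated entity. The edge map is a hypothetical, hand-written excerpt using labels instead of Uberon IDs for readability; in practice it would be extracted from Uberon, and `developmental_group` is a hypothetical name.

```python
from collections import deque

# Hypothetical excerpt of transformation_of edges (child -> parents), labels only.
TRANSFORMATION_OF = {
    "brain": ["future brain"],
}


def developmental_group(entity: str, edges: dict[str, list[str]]) -> set[str]:
    """All entities linked to `entity` by transformation_of edges, in either direction."""
    neighbours: dict[str, set[str]] = {}
    for child, parents in edges.items():
        for parent in parents:
            neighbours.setdefault(child, set()).add(parent)
            neighbours.setdefault(parent, set()).add(child)
    seen = {entity}
    queue = deque([entity])
    while queue:  # breadth-first traversal of the undirected graph
        current = queue.popleft()
        for nxt in neighbours.get(current, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


print(sorted(developmental_group("brain", TRANSFORMATION_OF)))  # ['brain', 'future brain']
```

Because the traversal follows edges in both directions, it also pulls in structures that share a precursor with the annotated one; restrict it to one direction if that is not wanted.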
Status of integration in Bgee
In January 2015, Bgee released new data based on Uberon, an essential step towards the integration of the similarity annotations. New data making use of these annotations, and allowing automatic comparison of gene expression patterns between species, are expected to be released in early 2015.