A fast and safe crate for working with biomedical ontologies.
We provide examples of loading ontology and its subsequent usage in applications.
ontolius
can load ontology from Obographs JSON file.
For the sake of this example, we use
flate2
to decompress a gzipped JSON on the fly.
We can load a toy version of HPO from a JSON file as follows:
use std::fs::File;
use std::io::BufReader;
use flate2::bufread::GzDecoder;
use ontolius::io::OntologyLoaderBuilder;
use ontolius::ontology::csr::MinimalCsrOntology;
// Load a toy Obographs file from the repo
let path = "resources/hp.small.json.gz";
// Configure the loader to parse the input as an Obographs file
let loader = OntologyLoaderBuilder::new()
.obographs_parser()
.build();
let reader = GzDecoder::new(BufReader::new(File::open(path).unwrap()));
let hpo: MinimalCsrOntology = loader.load_from_read(reader)
.expect("HPO should be loaded");
We loaded HPO from a toy JSON file into [crate::ontology::csr::MinimalCsrOntology
].
The loading includes parsing terms and edges from the Obographs file
and construction of the ontology graph.
In case of MinimalCsrOntology
,
the graph is backed by a compressed sparse row (CSR) adjacency matrix.
See [crate::io::OntologyLoader
] for more info on loading.
In the previous section, we loaded an ontology from Obographs JSON file.
Now we have an instance of [crate::ontology::csr::MinimalCsrOntology
] that can
be used for various tasks.
MinimalCsrOntology
implements [crate::ontology::OntologyTerms
] trait,
to support retrieval of specific terms by its index or TermId
, and to iterate
over all terms and TermId
s.
We can get a term by its TermId
:
# use std::fs::File;
# use std::io::BufReader;
# use flate2::bufread::GzDecoder;
# use ontolius::io::OntologyLoaderBuilder;
# use ontolius::ontology::csr::MinimalCsrOntology;
# let loader = OntologyLoaderBuilder::new().obographs_parser().build();
# let reader = GzDecoder::new(BufReader::new(File::open("resources/hp.small.json.gz").unwrap()));
# let hpo: MinimalCsrOntology = loader.load_from_read(reader)
# .expect("HPO should be loaded");
#
use ontolius::TermId;
use ontolius::term::MinimalTerm;
use ontolius::ontology::OntologyTerms;
// `HP:0001250` corresponds to `Arachnodactyly``
let term_id: TermId = "HP:0001166".parse().unwrap();
// Get the term by its term ID ...
let term = hpo.term_by_id(&term_id);
assert!(term.is_some());
/// ... and check its name.
let term = term.unwrap();
assert_eq!(term.name(), "Arachnodactyly");
or iterate over the all ontology terms or their corresponding term IDs:
# use std::fs::File;
# use std::io::BufReader;
# use flate2::bufread::GzDecoder;
# use ontolius::io::OntologyLoaderBuilder;
# use ontolius::ontology::csr::MinimalCsrOntology;
# let loader = OntologyLoaderBuilder::new().obographs_parser().build();
# let reader = GzDecoder::new(BufReader::new(File::open("resources/hp.small.json.gz").unwrap()));
# let hpo: MinimalCsrOntology = loader.load_from_read(reader)
# .expect("HPO should be loaded");
#
use ontolius::ontology::OntologyTerms;
// The toy HPO contains 614 terms and primary term ids,
let terms: Vec<_> = hpo.iter_terms().collect();
assert_eq!(terms.len(), 614);
assert_eq!(hpo.iter_term_ids().count(), 614);
// and the total of 1121 term ids (primary + obsolete)
assert_eq!(hpo.iter_all_term_ids().count(), 1121);
See [crate::ontology::OntologyTerms
] trait for more details.
ontolius
enables to leverage the ontology hierarchy
via several traits. This typically includes iteration over term's parents, ancestors, children, or descendants.
The [crate::ontology::HierarchyWalks
] trait supports iterating through [TermId
]s whereas [crate::ontology::HierarchyTraversals
] enables iteration over ontology graph indices. Iterating over indices is slightly faster and can be useful if we do not really care about the actual term IDs (e.g. to test if term a
is an ancestor of term b
).
The [crate::ontology::HierarchyQueries
] simplifies testing if term a
is a parent, child, ancestor, or descendant of term b
.
In all cases, the hierarchy is represented as a directed acyclic graph that is built from is_a
relationships.
Let's see how to use the ontology hierarchy. For instance, we can use [crate::ontology::HierarchyWalks::iter_parent_ids
] to get parent ids of a term:
# use std::fs::File;
# use std::io::BufReader;
# use flate2::bufread::GzDecoder;
# use ontolius::io::OntologyLoaderBuilder;
# use ontolius::ontology::csr::MinimalCsrOntology;
# use ontolius::TermId;
# use ontolius::term::MinimalTerm;
# let loader = OntologyLoaderBuilder::new().obographs_parser().build();
# let reader = GzDecoder::new(BufReader::new(File::open("resources/hp.small.json.gz").unwrap()));
# let hpo: MinimalCsrOntology = loader.load_from_read(reader)
# .expect("HPO should be loaded");
use ontolius::ontology::{HierarchyWalks, OntologyTerms};
let arachnodactyly: TermId = "HP:0001166".parse()
.expect("CURIE should be valid");
let parent_names: Vec<_> = hpo.iter_parent_ids(&arachnodactyly)
.map(|idx| hpo.term_by_id(idx).expect("A term for a term ID obtained from ontology should always be present"))
.map(MinimalTerm::name)
.collect();
assert_eq!(vec!["Slender finger", "Long fingers"], parent_names);
We first create the TermId
that corresponds to Arachnodactyly and then we query hpo
for its parents by calling iter_parent_ids
. We retrieve the term that corresponds to term id, extract its name, and collect the names into a vector.
Similar methods exist for getting term IDs of ancestors, children, and descendants of a term. See [crate::ontology::HierarchyWalks
] for more info.
At this time, support for the following ontologies is tested:
- Human Phenotype Ontology (HPO)
- Gene Ontology (GO)
- Medical Action Ontology (MAxO)
Other ontologies are very likely to work too. In case of any problems, please let us know on our Issue tracker.
Ontolius includes several features, with the features marked by (*)
being enabled
by default:
csr
(*)
- include [crate::ontology::csr
] module with implementation of ontology with graph backed by a CSR adjacency matrixobographs
(*)
- support loading Ontology from Obographs JSON filepyo3
- include [crate::py
] module with PyO3 bindings to selected data structs to support using from Python
The tests can be run by invoking:
cargo test
We use criterion
for crate benchmarks.
Run the following to run the bench suite:
cargo bench
The benchmark report will be written into the target/criterion/report
directory.