Rosetta is the knowledge map and service invocation tier of the Gamma reasoner.
Rosetta coordinates semantically annotated data sources into a metadata graph. That graph can be queried to generate programs that perform complex data retrieval tasks. (In the graph visualization, blue nodes are semantic types from the biolink-model.)
Install Docker if it is not already installed on your computer.
Make a <workspace> directory.
$ mkdir <workspace>
$ cd <workspace>
We will set up our environment using these environment settings. Copy and save them to <workspace>/shared/robokop.env.
If you run the Neo4j and Redis cache instances on the same host as the app, you can use these values as-is. If Neo4j and Redis are hosted on a different host, change these values accordingly.
# neo4j host name
NEO4J_HOST=neo4j
# cache host name
CACHE_HOST=request_cache
If you wish to test a robokopkg instance on a smaller machine, you can modify the following values to fit your hardware; the defaults are the values used on RobokopKG.
NEO4J_HEAP_MEMORY
NEO4J_HEAP_MEMORY_INIT
NEO4J_CACHE_MEMORY
And finally, set the password variables found on the bottom section of the file.
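Putting these together, a minimal robokop.env might look like the following. The memory sizes, cache port, and passwords here are placeholder values for illustration, not the project's defaults; substitute your own.

```
# neo4j host name
NEO4J_HOST=neo4j
# cache host name
CACHE_HOST=request_cache
# Neo4j memory settings -- placeholder sizes; tune for your hardware
NEO4J_HEAP_MEMORY=8G
NEO4J_HEAP_MEMORY_INIT=8G
NEO4J_CACHE_MEMORY=4G
# cache port and passwords -- placeholder values; set your own
CACHE_PORT=6379
NEO4J_PASSWORD=changeme
CACHE_PASSWORD=changeme
```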
Run the following to make sure that your terminal is set up with the environment variables before running Docker commands.
$ set -a
$ source <workspace>/shared/robokop.env
From the <workspace> directory, clone the repository.
$ git clone https://github.com/NCATS-Gamma/robokop-interfaces.git
$ cd robokop-interfaces
The graph and concept map will be stored in a Neo4j server instance. Start the Neo4j instance with:
[robokop-interfaces/] $ docker-compose -f deploy/graph/docker-compose.yml up -d
Optionally, you can load our latest build of the knowledge graph, available at RobokopKG. Once you have downloaded the dump file best suited to your needs, run the following commands:
[robokop-interfaces/] $ cp <dump_file> <workspace>/neo4j_data/
[robokop-interfaces/] $ cd deploy/graph
[robokop-interfaces/deploy/graph] $ ../robokopkg/scripts/reload.sh -f <dump_file_name_only> -c ../robokopkg/scripts/docker-compose-backup.yml
[robokop-interfaces/deploy/graph] $ cd <workspace>/robokop-interfaces/
NOTE: After building or loading the graph, run:
$ docker exec -it interfaces python robokop-interfaces/scripts/setup_neo4j_index.py
Start the Redis container.
[robokop-interfaces/] $ docker-compose -f deploy/cache/docker-compose.yml up -d
Now that the backend for the App is up we can start the app containers.
We build the container with the current user and group IDs so that ownership of the log files and the code directory is not elevated to root.
[robokop-interfaces/] $ cd deploy
[robokop-interfaces/deploy] $ docker build --build-arg UID=$(id -u) --build-arg GID=$(id -g) -t robokop_interfaces .
[robokop-interfaces/deploy] $ docker-compose up -d
If you have not imported a database dump into your Neo4j instance, you will need to run the following command to initialize the type graph. It imports the graph of Translator services, overlays local service configurations, and imports locally defined services, configuring all of these according to the biolink-model.
$ docker exec $(docker ps -f name=interfaces -q) bash -c "source robokop-interfaces/deploy/setenv.sh && robokop-interfaces/initialize_type_graph.sh"
Via the Neo4j interface at http://localhost:7474/browser/, query the entire type graph:
MATCH (m)--(n) RETURN *
Query a particular path:
MATCH (n:Concept{name:'named_thing'})-[a]->(d:Concept{name:'disease'})-[b]->(g:Concept{name:'gene'}) RETURN *
In the returned graph, nodes are biolink-model concepts and edges contain attributes indicating the service to invoke.
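For example, a query along these lines (a sketch; it assumes the op edge attribute referenced elsewhere in this document) lists the service operation stored on each edge:

```cypher
MATCH (a:Concept)-[r]->(b:Concept)
RETURN a.name, type(r), r.op
LIMIT 10
```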
The web API presents two endpoints:
Given a drug name and a disease name, it returns a knowledge graph of the clinical outcome pathway.
Each edge includes:
- subj : A subject
- pred : A predicate indicating the relation of the subject to the object.
- obj : An object of the relation.
- pmids : One or more PubMed identifiers relevant to the statement.
Each node includes:
- id : A numeric identifier used as a link to edges in the same graph.
- identifier : A curie identifying an instance in an ontology.
- type : A biolink-model type for the object.
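As a hypothetical illustration of these fields (the specific values below are invented for the example, not real API output):

```python
# Hypothetical edge and node records matching the fields described above.
# The identifier and PMID values are illustrative only.
edge = {
    "subj": 0,                    # id of the subject node
    "pred": "treats",             # relation of the subject to the object
    "obj": 1,                     # id of the object node
    "pmids": ["PMID:12345678"],   # supporting PubMed identifiers
}
node = {
    "id": 0,                      # numeric id linking to edges in the same graph
    "identifier": "DOID:2841",    # CURIE identifying an ontology instance
    "type": "disease",            # biolink-model type
}
```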
Given inputs and a Cypher query representing a shortest path between two concepts, generate a graph of items. More complex graphs can be composed by iteratively invoking this endpoint.
- inputs : A key-value pair where the key is a biolink-model concept and the value is a comma-separated list of CURIEs, e.g., concept=curie:id[,curie:id]
- query : A cypher query returning a path.
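A minimal sketch of parsing the inputs parameter format concept=curie:id[,curie:id] into a dictionary; the helper name is ours for illustration, not part of the API:

```python
def parse_inputs(inputs_param):
    """Parse 'concept=curie:id[,curie:id]' into {concept: [curies]}."""
    concept, _, curies = inputs_param.partition("=")
    return {concept: curies.split(",")}

# e.g., a disease concept with two CURIEs:
print(parse_inputs("disease=DOID:2841,DOID:1234"))
# → {'disease': ['DOID:2841', 'DOID:1234']}
```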
This simple snippet demonstrates usage via the Python API:
from greent.rosetta import Rosetta

rosetta = Rosetta()
knowledge_graph = rosetta.construct_knowledge_graph(**{
    "inputs": {
        "disease": [
            "DOID:2841"
        ]
    },
    "query": """MATCH (a:disease), (b:gene), p = allShortestPaths((a)-[*]->(b))
                WHERE NONE (r IN relationships(p) WHERE type(r) = 'UNKNOWN' OR r.op is null)
                RETURN p"""
})
We cache in Redis. Objects are serialized using Python's pickle scheme.
To find out which operation(id) combinations are cached:
$ docker exec $(docker ps -f name=request_cache -q) redis-cli -p $CACHE_PORT -a $CACHE_PASSWORD --raw keys '*'
To delete specific keys or patterns of keys from the cache:
$ docker exec $(docker ps -f name=request_cache -q) redis-cli -p $CACHE_PORT -a $CACHE_PASSWORD --raw keys '*' | \
xargs docker exec $(docker ps -f name=request_cache -q) redis-cli -p $CACHE_PORT -a $CACHE_PASSWORD --raw del
To add a data source to the knowledge map:
- Reuse or develop a smartAPI interface to your data.
- Publish a public network endpoint to the API if none exists.
- Register your smartAPI at the Translator Registry.
- For now, build a Python stub to your service. Soon, we hope to derive this information from the registry to invoke services programmatically. For an example, see the CTD service, which is a stub for the corresponding smartAPI endpoint.
- Add your service endpoint URL to the configuration files following the CTD pattern.
- Add it to greent.conf, used for local development.
- And to greent-dev.conf, used in the continuous integration environment.
Instantiate your service, following the lazy-loading pattern, in core.py.
The rosetta.yml file links types in the biolink-model. Each link includes a predicate and the name of an operation. Operations are named <objectName>.<methodName>, where objectName is a member of core.py, the central service manager.
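For instance, splitting an operation name into its parts (ctd.drug_to_gene is a hypothetical operation name modeled on the CTD service mentioned above):

```python
# An operation name is '<objectName>.<methodName>', where objectName is an
# attribute of the core.py service manager. The name below is hypothetical.
op_name = "ctd.drug_to_gene"
object_name, method_name = op_name.split(".")
print(object_name, method_name)  # → ctd drug_to_gene
```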
- Find the @operators tag in the configuration file.
- Find the biolink-model element for the source type of your service.
- Follow the pattern in the configuration to enter your predicate (link) and operator (op).
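As an illustration only (check the exact schema against the rosetta.yml shipped with the repository), an entry under @operators might look like:

```yaml
'@operators':
  disease:                       # source biolink-model type
    gene:                        # target biolink-model type
      - link: disease_to_gene    # hypothetical predicate name
        op: ctd.disease_to_gene  # hypothetical <objectName>.<methodName> operation
```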
$ PYTHONPATH=$PWD/.. rosetta.py --delete-type-graph --initialize-type-graph --debug
You should now be able to write Cypher queries for Rosetta using the biolink-model names, specified in the rosetta.yml config file, that are connected by your new service.
Please note that this project is released with a Contributor Code of Conduct. By participating in this project you agree to abide by its terms.