SLIME (Soil LIfe MEtaweb) is a knowledge graph on the trophic ecology of soil organisms that integrates several open databases covering major taxonomic groups across multiple trophic levels. SLIME allows users to find relevant information across integrated datasets and facilitates the reconstruction of local food webs based on local co-occurrence or co-abundance data.
(For more details on the content of these files, see https://nleguillarme.github.io/inteGraph/manual.html#create-a-new-project)
- graph.cfg: an INI file used to configure the URI of the knowledge graph, the path to the directory containing the data source configuration files, the triplestore connection and the ontologies that will be used to annotate the integrated data.
- connections.json: a JSON configuration file used for storing credentials and other information necessary for connecting to external services (e.g. the GloBI API).
- config-morph.ini: an INI file used to configure the RDF materialization process.
- sources: a directory containing the configuration and mapping files for the different data sources.
- graphdb: a directory containing a Makefile to help you set up an instance of the GraphDB Free triplestore.
- LICENSE: a file containing the licence text.
- README.md: this file.
Clone this repository using the following command:
$ git clone https://github.com/nleguillarme/SLIME.git
inteGraph is a toolbox that helps you build and execute biodiversity data integration pipelines to create RDF knowledge graphs from multiple data sources. inteGraph pipelines are defined in configuration files. We provide one such configuration file per data source in the sources directory of this repository.
To install inteGraph:
- Clone the project repository
$ git clone https://github.com/nleguillarme/inteGraph.git
- Run install.sh
$ cd inteGraph ; sh install.sh
Some data sources do not provide an API or URL for downloading datasets programmatically. You will need to download these datasets manually.
Dataset | URL | Copy data file to |
---|---|---|
BETSI | Download link | SLIME/sources/betsi/data |
FungalTraits | Download link | SLIME/sources/fungaltraits/data |
GlobalAnts | Download link | SLIME/sources/global_ants/data |
After downloading the datasets, ensure that the correct file path is configured for each source (check the [extract.file] section in the source.cfg file for each source):
[extract.file]
file_path=<path-to-the-data-file>
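As a quick sanity check before running the pipelines, the configured paths can be verified from the shell. The sketch below is illustrative and is not shipped with SLIME or inteGraph: the function name check_source_files is hypothetical, and it assumes that each source directory holds a source.cfg with at most one file_path= line whose value is either an absolute path or a path relative to the current directory.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with SLIME): report whether the data file
# referenced by file_path in each <sources_dir>/<source>/source.cfg exists.
check_source_files() {
    sources_dir=$1
    missing=0
    for cfg in "$sources_dir"/*/source.cfg; do
        [ -f "$cfg" ] || continue
        # Take the value of the first "file_path=" line, trimming spaces around "=".
        path=$(sed -n 's/^[[:space:]]*file_path[[:space:]]*=[[:space:]]*//p' "$cfg" | head -n 1)
        # Sources without a file_path line are skipped (assumed to need no manual download).
        [ -n "$path" ] || continue
        if [ -f "$path" ]; then
            echo "OK      $cfg"
        else
            echo "MISSING $cfg -> $path"
            missing=1
        fi
    done
    # Non-zero exit status if at least one configured file is missing.
    return "$missing"
}

# Example:
#   check_source_files SLIME/sources
```

The function only reads the configuration files, so it can be run repeatedly while the downloaded datasets are being put in place.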
We provide a Makefile to help you set up an instance of GraphDB Free in a docker container. You will need docker, docker-compose and make installed on your machine.
- Move to the graphdb directory
$ cd graphdb
- Run the following command to build a docker image for GraphDB Free
$ make build
- Run the following command to load the ontology into a new repository called slime (N.B. this may take some time)
$ make load
- Start GraphDB by running the following command
$ make start
The GraphDB Workbench is accessible at http://localhost:7200/.
Configure the connection to the repository in the [load] section of graph.cfg:
[load]
id=graphdb
conn_type=http
host=172.17.0.1
port=7200
user=<user-login-if-any>
password=<user-password-if-any>
repository=slime
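Before launching the pipelines, it can be useful to confirm that the repository answers SPARQL queries. The following sketch is not part of SLIME or inteGraph: the function names repo_url and check_repo are hypothetical, and it assumes that curl is installed and that the repository accepts unauthenticated requests. It sends a trivial ASK query as a GET request, following the SPARQL 1.1 Protocol. Note that 172.17.0.1 is typically the address of the default Docker bridge gateway on Linux, so when testing from the host itself, localhost is usually the address to use.

```shell
#!/bin/sh
# Hypothetical helpers (not part of SLIME or inteGraph).

# Build the SPARQL endpoint URL of a GraphDB repository.
repo_url() {
    host=$1 port=$2 repository=$3
    printf 'http://%s:%s/repositories/%s\n' "$host" "$port" "$repository"
}

# Send a trivial ASK query; curl exits with a non-zero status if the
# connection fails or the server answers with an HTTP error (--fail).
check_repo() {
    curl --silent --show-error --fail --get \
         --header "Accept: application/sparql-results+json" \
         --data-urlencode "query=ASK { ?s ?p ?o }" \
         "$(repo_url "$@")"
}

# Example (run from the host once GraphDB is started):
#   check_repo localhost 7200 slime && echo "repository is reachable"
```

A successful call prints the JSON answer to the ASK query before the confirmation message.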
To run inteGraph, execute the following commands:
$ cd inteGraph
$ export INTEGRAPH__CONFIG__HOST_CONFIG_DIR=<path-to-this-repository> ; make up
Make sure you replace <path-to-this-repository> in the command with the path to your local copy of this repository.
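A wrong path is an easy mistake to make at this step. The sketch below checks that a directory contains the configuration files listed at the top of this README before it is exported. It is only an illustration: check_config_dir is a hypothetical helper, not an inteGraph command, and $HOME/SLIME in the usage comment is a placeholder for wherever you cloned this repository.

```shell
#!/bin/sh
# Hypothetical helper (not part of inteGraph): verify that a directory looks
# like a local copy of the SLIME repository before exporting it.
check_config_dir() {
    dir=$1
    # Configuration files expected at the root of the repository.
    for f in graph.cfg connections.json config-morph.ini; do
        if [ ! -f "$dir/$f" ]; then
            echo "missing file: $dir/$f" >&2
            return 1
        fi
    done
    # Directory holding the per-source configuration and mapping files.
    if [ ! -d "$dir/sources" ]; then
        echo "missing directory: $dir/sources" >&2
        return 1
    fi
}

# Example ($HOME/SLIME is a placeholder path):
#   check_config_dir "$HOME/SLIME" && export INTEGRAPH__CONFIG__HOST_CONFIG_DIR="$HOME/SLIME"
```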
The make up command starts an instance of Apache Airflow, whose web interface is available at http://localhost:8080/home.
The DAG tab lists all the pipelines generated from the configuration files.
To execute a pipeline, click on the Pause/Unpause DAG button on the left-hand side. Then click on the pipeline name to monitor its execution.
Once triggered, the pipeline starts running and its current state is represented by colors.
A failed task appears in red in the interface. Tasks can fail for many reasons (e.g. an external service is down, network connectivity issues). In this situation, you can restart the pipeline from the point of failure by clicking on the failed task and then clicking on the Clear Task button in the top right-hand corner.
If the task keeps failing, you may want to examine the problem in more detail. You can access the task logs by clicking on the failed task and opening the Logs tab.
Once all the pipelines have been run successfully, you can stop inteGraph with the following command:
$ make down
You can use SPARQL queries to retrieve information from SLIME. There are three ways to do this:
Access the GraphDB Workbench at http://localhost:7200/. Choose SPARQL from the navigation bar, enter your query and hit Run.
Write your SPARQL query in a file (e.g. query.rq) and submit it to the SPARQL endpoint using curl:
$ curl -H "Accept: text/csv" --data-urlencode "query@query.rq" http://0.0.0.0:7200/repositories/slime
dietName
detritivorous
fungivorous
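The contents of query.rq are not shown here. As an illustration, the sketch below writes a schema-agnostic query to a file and submits it with the same curl options. The query is a generic SPARQL example that returns ten arbitrary triples; it is not the query that produced the output above, and the function names write_sample_query and run_query are hypothetical.

```shell
#!/bin/sh
# Hypothetical helpers (not part of SLIME).

# Write a generic query that returns ten arbitrary triples from the graph.
write_sample_query() {
    cat > "$1" <<'EOF'
SELECT ?s ?p ?o
WHERE { ?s ?p ?o }
LIMIT 10
EOF
}

# Submit a query file to a SPARQL endpoint and print the result as CSV.
# "query@FILE" makes curl read FILE, URL-encode it and send it as the
# "query" form parameter of a POST request.
run_query() {
    query_file=$1 endpoint=$2
    curl -H "Accept: text/csv" --data-urlencode "query@$query_file" "$endpoint"
}

# Example:
#   write_sample_query query.rq
#   run_query query.rq http://localhost:7200/repositories/slime
```

Replacing the body of the sample query with your own SELECT query leaves the curl invocation unchanged.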
We have developed SLIMER, an R package that provides a set of functions for retrieving/updating trophic information from SLIME, and that requires no prior knowledge of the SPARQL query language. Each function in SLIMER wraps a SPARQL query template that allows users to retrieve specific information (e.g. diet, guild, interaction) for a taxon, using either its name or a taxonomic identifier.
> install.packages("https://github.com/nleguillarme/SLIME/releases/download/v1.0.0/SLIMER_1.0.0.tar.gz", repos = NULL, type="source")
> library(SLIMER)
> endpoint_url <- "http://0.0.0.0:7200/repositories/slime"
> get.diets(sciName = "Nothrus", endpoint = endpoint_url)
# A tibble: 9 × 9
queryName queryId matchName matchId dietId dietName reference source inferred
<chr> <lgl> <chr> <list> <chr> <chr> <list> <chr> <chr>
1 Nothrus NA Nothrus silvestris <chr [4]> SFWO:0000479 detritivorous <chr [2]> slime:betsi false
2 Nothrus NA Nothrus silvestris <chr [4]> SFWO:0000483 fungivorous <chr [2]> slime:betsi false
3 Nothrus NA Nothrus silvestris <chr [4]> SFWO:0000479 detritivorous <chr [1]> slime:betsi true
4 Nothrus NA Nothrus silvestris <chr [4]> SFWO:0000483 fungivorous <chr [1]> slime:betsi true
5 Nothrus NA Nothrus silvestris <chr [4]> SFWO:0000485 microbivorous <chr [1]> slime:betsi true
6 Nothrus NA Nothrus silvestris <chr [4]> SFWO:0000513 microphytophagous <chr [1]> slime:betsi true
See the package vignette for more examples on how to use SLIMER.
Coming soon.