
SLIME: a semantic soil-life metaweb

SLIME (Soil LIfe MEtaweb) is a knowledge graph on the trophic ecology of soil organisms that integrates several open databases covering major taxonomic groups across multiple trophic levels. SLIME allows users to find relevant information across integrated datasets and facilitates the reconstruction of local food webs based on local co-occurrence or co-abundance data.

What does this repository contain?

(For more details on the content of these files, see https://nleguillarme.github.io/inteGraph/manual.html#create-a-new-project)

  • graph.cfg: an INI file used to configure the URI of the knowledge graph, the path to the directory containing the data source configuration files, the triplestore connection, and the ontologies used to annotate the integrated data.
  • connections.json: a JSON configuration file used for storing credentials and other information necessary for connecting to external services (e.g. the GloBI API).
  • config-morph.ini: an INI file used to configure the RDF materialization process.
  • sources: a directory containing the configuration and mapping files for the different data sources.
  • graphdb: a directory containing a Makefile to help you set up an instance of the GraphDB Free triplestore.
  • LICENSE: a file containing the license text.
  • README.md: this file.
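
At a glance, the repository is laid out as follows:

SLIME/
├── graph.cfg
├── connections.json
├── config-morph.ini
├── sources/
├── graphdb/
├── LICENSE
└── README.md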

How to build a local copy of SLIME?

1. Clone this repository

Clone this repository using the following command:

$ git clone https://github.com/nleguillarme/SLIME.git

2. Install inteGraph

inteGraph is a toolbox that helps you build and execute biodiversity data integration pipelines to create RDF knowledge graphs from multiple data sources. inteGraph pipelines are defined in configuration files. We provide one such configuration file per data source in the sources directory of this repository.
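
Each data source has its own subdirectory under sources. For the three sources that require a manual download (see step 3 below), the layout looks roughly like this (a partial sketch; each source directory also contains mapping files, and the contents of data/ depend on what you download):

sources/
├── betsi/
│   ├── source.cfg
│   └── data/
├── fungaltraits/
│   ├── source.cfg
│   └── data/
└── global_ants/
    ├── source.cfg
    └── data/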

To install inteGraph:

  1. Clone the project repository
$ git clone https://github.com/nleguillarme/inteGraph.git
  2. Run install.sh
$ cd inteGraph ; sh install.sh

3. Download missing datasets

Some data sources do not provide an API or URL for downloading datasets programmatically. You will need to download these datasets manually.

Dataset       URL             Copy data file to
BETSI         Download link   SLIME/sources/betsi/data
FungalTraits  Download link   SLIME/sources/fungaltraits/data
GlobalAnts    Download link   SLIME/sources/global_ants/data

After downloading the datasets, make sure the correct file path is configured for each source by checking the [extract.file] section of its source.cfg file:

[extract.file]
file_path=<path-to-the-data-file>
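
For example, for BETSI (the file name below is hypothetical, and the path may need to be absolute depending on your setup; use the actual name of the file you downloaded):

[extract.file]
file_path=sources/betsi/data/betsi_data.csv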

4. Set up your triplestore

We provide a Makefile to help you set up an instance of GraphDB Free in a Docker container. You will need docker, docker-compose, and make installed on your machine.

  1. Move to the graphdb directory
$ cd graphdb
  2. Run the following command to build a Docker image for GraphDB Free
$ make build
  3. Run the following command to load the ontology into a new repository called slime (N.B. this may take some time)
$ make load
  4. Start GraphDB by running the following command
$ make start

The GraphDB Workbench is accessible at http://localhost:7200/.
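
You can also check from the command line that GraphDB is running and that the slime repository exists, using GraphDB's REST API (the /rest/repositories endpoint lists the repositories known to the Workbench):

$ curl -s http://localhost:7200/rest/repositories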

Configure the connection to the repository in the [load] section of graph.cfg. The host is set to 172.17.0.1, the default address of the Docker bridge network on Linux, so that inteGraph can reach GraphDB from inside its own container:

[load]
id=graphdb
conn_type=http
host=172.17.0.1
port=7200
user=<user-login-if-any>
password=<user-password-if-any>
repository=slime

5. Run inteGraph

To run inteGraph, execute the following commands:

$ cd inteGraph
$ export INTEGRAPH__CONFIG__HOST_CONFIG_DIR=<path-to-this-repository> ; make up

Make sure you replace <path-to-this-repository> in the command with the path to your local copy of this repository.
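
For example, if you cloned SLIME into your home directory (the path below is only illustrative):

$ cd inteGraph
$ export INTEGRAPH__CONFIG__HOST_CONFIG_DIR=$HOME/SLIME ; make up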

This will start an instance of Apache Airflow, which can be found at http://localhost:8080/home.

The DAGs tab lists all the pipelines generated from the configuration files:

[Screenshot: Airflow DAG list]

6. Run the pipelines

To execute a pipeline, click on the Pause/Unpause DAG button on the left-hand side. Then click on the pipeline name to monitor its execution.

Once triggered, the pipeline starts running and the current state of each task is indicated by a color code.

[Screenshot: Pipeline running]

A failed task appears in red in the interface. It is not uncommon for tasks to fail, for any of a number of reasons (e.g. an external service is down, or there are network connectivity issues). In this situation, you can restart the pipeline from the point of failure by clicking on the failed task and then clicking the Clear Task button in the top right-hand corner.

If the task keeps failing, you may want to examine the problem in more detail. You can access the task logs by clicking on the failed task and opening the Logs tab.

7. Stop inteGraph

Once all the pipelines have been run successfully, you can stop inteGraph with the following command:

$ make down

How to retrieve information from SLIME?

You can use SPARQL queries to retrieve information from SLIME. There are three ways to do this:

1. Using the GraphDB Workbench

Access the GraphDB Workbench at http://localhost:7200/. Choose SPARQL from the navigation bar, enter your query and hit Run, as shown in this example:

[Screenshot: SPARQL Query and Update]

2. Over HTTP in the REST style

Write your SPARQL query in a file (e.g. query.rq) and submit it to the SPARQL endpoint using curl:

$ curl -H "Accept: text/csv" --data-urlencode "[email protected]" http://0.0.0.0:7200/repositories/slime
dietName
detritivorous
fungivorous
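
The exact shape of the query depends on the SLIME vocabulary, which this README does not spell out. As a starting point, here is a minimal sketch of a query.rq that lists distinct labels in the graph; restricting the pattern to diet concepts specifically requires knowing how diets are modeled in SLIME:

# Generic sketch: list distinct rdfs:label values in the repository.
# Adapt the triple pattern to select diet terms once you know the schema.
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT DISTINCT ?label WHERE {
  ?s rdfs:label ?label .
}
LIMIT 100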

3. Using the SLIMER package

We have developed SLIMER, an R package that provides a set of functions for retrieving/updating trophic information from SLIME, and that requires no prior knowledge of the SPARQL query language. Each function in SLIMER wraps a SPARQL query template that allows users to retrieve specific information (e.g. diet, guild, interaction) for a taxon, using either its name or a taxonomic identifier.

> install.packages("https://github.com/nleguillarme/SLIME/releases/download/v1.0.0/SLIMER_1.0.0.tar.gz", repos = NULL, type="source")
> library(SLIMER)
> endpoint_url <- "http://0.0.0.0:7200/repositories/slime"
> get.diets(sciName = "Nothrus", endpoint = endpoint_url)
# A tibble: 9 × 9
  queryName queryId matchName          matchId   dietId       dietName          reference source                inferred
  <chr>     <lgl>   <chr>              <list>    <chr>        <chr>             <list>    <chr>                 <chr>   
1 Nothrus   NA      Nothrus silvestris <chr [4]> SFWO:0000479 detritivorous     <chr [2]> slime:betsi           false   
2 Nothrus   NA      Nothrus silvestris <chr [4]> SFWO:0000483 fungivorous       <chr [2]> slime:betsi           false   
3 Nothrus   NA      Nothrus silvestris <chr [4]> SFWO:0000479 detritivorous     <chr [1]> slime:betsi           true    
4 Nothrus   NA      Nothrus silvestris <chr [4]> SFWO:0000483 fungivorous       <chr [1]> slime:betsi           true    
5 Nothrus   NA      Nothrus silvestris <chr [4]> SFWO:0000485 microbivorous     <chr [1]> slime:betsi           true    
6 Nothrus   NA      Nothrus silvestris <chr [4]> SFWO:0000513 microphytophagous <chr [1]> slime:betsi           true

See the package vignette for more examples on how to use SLIMER.

How to cite SLIME?

Coming soon.

How to ask for help?

Coming soon.