Commit 15a93f9: doc: refactor and update documentation
Elodie Thiéblin committed Aug 17, 2023 (1 parent: 27abf1c)
Showing 8 changed files with 174 additions and 188 deletions.
192 changes: 64 additions & 128 deletions README.md
# SparTDD

A Python and SPARQL based Thing Description Directory API compliant to:
https://www.w3.org/TR/wot-discovery/

To learn more about the routes and functions of this server, see
the [API documentation](doc/api.md).

## Configuration

The TDD API can be configured in two ways: by editing the
`config.toml` file or by setting environment variables. The two
methods can be mixed, with environment variables taking priority. For
each variable, the TDD API first looks for an environment variable; if it is
not defined, it falls back to the `config.toml` value; and if the variable is
defined in neither place, the default value is used.

The configuration variables are the same for both methods, except that
the environment variables must be prefixed with `TDD__` to avoid conflicts.
The `config.toml` file can also be used to define Flask server configuration (cf.
[documentation](https://flask.palletsprojects.com/en/2.1.x/config/#builtin-configuration-values)).
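The lookup order above can be sketched as follows. This is an illustrative sketch, not the actual SparTDD code; the `resolve` function and `DEFAULTS` dict are hypothetical names:

```python
import os

# Sketch (not the actual SparTDD code) of the priority rule described above:
# environment variable (with TDD__ prefix) > config.toml value > built-in default.
DEFAULTS = {"SPARQLENDPOINT_URL": "http://localhost:3030/things"}

def resolve(name, toml_values):
    env_value = os.environ.get("TDD__" + name)   # 1. environment variable
    if env_value is not None:
        return env_value
    if name in toml_values:                      # 2. config.toml value
        return toml_values[name]
    return DEFAULTS.get(name)                    # 3. built-in default
```

For instance, with `TDD__SPARQLENDPOINT_URL` unset, `resolve("SPARQLENDPOINT_URL", {})` falls through to the default value.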

### Configuration variables

| Variable name             | Default value                             | Description                                                                                         |
| ------------------------- | ----------------------------------------- | --------------------------------------------------------------------------------------------------- |
| [TDD__]TD_REPO_URL        | http://localhost:5000                     | The URL to access the TDD API server                                                                 |
| [TDD__]SPARQLENDPOINT_URL | http://localhost:3030/things              | The SPARQL endpoint URL                                                                              |
| [TDD__]TD_JSONSCHEMA      | ./tdd/data/td-json-schema-validation.json | The path to the JSON Schema file used to validate the TDs                                            |
| [TDD__]CHECK_JSON_SCHEMA  | False                                     | Whether the TDD API checks the TDs against the `TD_JSONSCHEMA` schema and SHACL shapes               |
| [TDD__]MAX_TTL            | None                                      | Integer, maximum time-to-live (in seconds) that a TD will be kept on the server (unlimited if None)  |
| [TDD__]MANDATE_TTL        | False                                     | Boolean; if True, only TDs with a time-to-live (ttl) value are uploaded, and the server sends a 400 HTTP code if a TD does not contain one |
| [TDD__]LIMIT_BATCH_TDS    | 25                                        | Default number of TDs returned per batch                                                             |
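For example, a `config.toml` overriding some of these defaults might look like the following. This is an illustrative sketch that assumes top-level keys matching the table above; the values are examples only:

```toml
# Illustrative config.toml sketch; keys follow the variable names above.
TD_REPO_URL = "http://localhost:5000"
SPARQLENDPOINT_URL = "http://localhost:3030/things"
CHECK_JSON_SCHEMA = true
LIMIT_BATCH_TDS = 50
```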

## Deploy to develop on the API

Install the JavaScript dependencies (the project relies on jsonld.js for JSON-LD):

```bash
npm ci
```

### Deploy a SPARQL endpoint

SparTDD relies on a SPARQL endpoint as its database.
You need to set one up before you run the project.

The [SPARQL endpoint documentation](doc/sparql-endpoints/README.md) provides
guidelines on how to set up your SPARQL endpoint.
### Run the flask server

First, set up your configuration (see [configuration](#configuration))
if your SPARQL endpoint URL is not the default http://localhost:3030/things.

Then run the flask server at the root of this project, in your python virtual environment:

```bash
export TDD__SPARQLENDPOINT_URL=<sparql endpoint url>
export TDD__TD_JSONSCHEMA=tdd/data/td-json-schema-validation.json
flask run
```

You can edit the `config.toml` file to change the configuration instead of using
environment variables if you prefer.

## Import data using script

To import the TDs from a directory to your SPARQL endpoint using the proxy api, run:
```bash
python scripts/import_all_plugfest.py /path/to/TDs/directory <WOT API URL>/things
```

To skip the JSON-Schema validation, add `check-schema=false` to the route as
follows:

```bash
python scripts/import_all_plugfest.py /path/to/TDs/directory <WOT API URL>/things?check-schema=false
```

To import a snapshot bundle (discovery data), use the dedicated script as follows:

```bash
python scripts/import_snapshot.py /path/to/snapshots.json <WOT API URL>/things
```

The `check-schema` param also works on this route.

## Deploy locally with docker-compose

For a quick launch of the SPARQL endpoint and TDD API with docker-compose:

```bash
chmod a+rwx fuseki-docker/configuration
chmod a+rwx fuseki-docker/databases
docker-compose build # builds api and sparqlendpoint
docker-compose up # runs api and sparqlendpoint
```

If you want to deploy only the TDD API using docker-compose and use an
existing SPARQL endpoint, edit the `config.toml` file with the
appropriate `SPARQLENDPOINT_URL` value (see [configuration](#configuration)).
Then run only the api image.
If the api image is already built you do not have to rebuild; relaunching it
will use the new config.

```bash
docker-compose build api # builds the api image
docker-compose run api # runs the api
```

## Deploy production

To deploy in production without docker or docker-compose, use
the following commands:

```bash
pip install .[prod]
gunicorn -b 0.0.0.0:5000 app:app
```

You can change the `-b` parameter to restrict access to localhost,
allow public access, or change the deployment port.

In these examples the configuration is set using environment variables, but you
can edit the `config.toml` file instead if you prefer.

## Code quality

16 changes: 13 additions & 3 deletions doc/api.md
## Flask server configuration

The following environment variable is mandatory:

- **TDD\_\_SPARQLENDPOINT_URL** the URI of your SPARQL endpoint service (e.g., http://localhost:3030/things)
  (or edit the `config.toml` file)

## Testing script

We created a small script, `scripts/import_all_plugfest.py`, to import a TD or a folder of TDs.
This script takes two arguments:

- the path towards the TD file or TD folder
- the tdd-api import route (e.g., http://localhost:5000/things)

It implements the [WoT-Discovery Exploration Mechanisms](https://w3c.github.io/w

Its compliance has been tested in PlugFest/TestFest events.
The results are listed here:

- 2022.03: https://github.com/w3c/wot-testing/blob/main/events/2022.03.Online/Discovery/Results/logilabtdd.csv
- TODO
- TODO

## Routes and schemas

The list of routes and how they were implemented can be viewed in [routes-diagrams.odg](routes-diagrams.odg).

Some schemas describing how JSON and RDF are dealt with in SparTDD
can be found in [schemas.odg](schemas.odg).
Binary file modified doc/schemas.odg
42 changes: 42 additions & 0 deletions doc/sparql-endpoints/README.md
# Configuring SparTDD for different SPARQL endpoints

## General requirements

You can either use a remote SPARQL server or run a SPARQL server locally.

The SPARQL endpoint you configure must:

- Allow SPARQL UPDATE queries
- Allow named graphs
- Be configured so that the default graph is the union of the named graphs
- Allow CORS
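A quick way to probe the first requirement is to send a small SPARQL UPDATE over the SPARQL 1.1 protocol and check the status code. This is a hypothetical smoke test, not part of SparTDD; the graph and triple names are illustrative:

```python
import urllib.error
import urllib.parse
import urllib.request

def accepts_update(endpoint_url):
    """Return True if the endpoint accepts a SPARQL UPDATE request (sketch)."""
    update = 'INSERT DATA { GRAPH <urn:example:probe> { <urn:s> <urn:p> "probe" } }'
    body = urllib.parse.urlencode({"update": update}).encode()
    req = urllib.request.Request(
        endpoint_url,
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )
    try:
        with urllib.request.urlopen(req) as resp:
            return 200 <= resp.status < 300
    except urllib.error.URLError:
        # Connection refused, or the server rejected the UPDATE.
        return False
```

Note that this writes a probe triple into a named graph, so only run it against a test dataset.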

## Using Apache Jena Fuseki

We propose to use Apache Jena Fuseki, which has a nice administration interface.
Download the Fuseki project (apache-jena-fuseki-X.Y.Z.zip) from
https://jena.apache.org/download/index.cgi

Then unzip the downloaded archive.
To launch the server, in the apache-jena-fuseki-X.Y.Z folder, run

```bash
./fuseki-server
```

The server will run on http://localhost:3030.
If you want to create the dataset with the right configuration, you can copy-paste
`fuseki-docker/configuration/things.ttl` into `apache-jena-fuseki-X.Y.Z/run/configuration`

```bash
cp fuseki-docker/configuration/things.ttl path/to/apache-jena-fuseki-X.Y.Z/run/configuration
```

More documentation on Fuseki in this project is available in [fuseki.md](fuseki.md)
(for further configuration or docker configuration).

## Using Virtuoso

More documentation on Virtuoso in this project is available in [virtuoso.md](virtuoso.md).

## Using GraphDB
20 changes: 11 additions & 9 deletions doc/fuseki.md → doc/sparql-endpoints/fuseki.md
The port of the image is 3030.
## Fuseki docker image environment variables

We have set the following variables for the fuseki docker image :

- **ENABLE_UPLOAD**: "true" -- to allow file upload (needed to import all triples)
- **ASSEMBLER**: "/fuseki-base/configuration/things.ttl" -- this file, which in
  this repository is under `fuseki-docker/configuration/things.ttl`, will create
  a `things` service with a TDB dataset in the fuseki endpoint at launch time.
- **ADMIN_PASSWORD**: _your desired password_

We have set three shared volumes on the image:

- **/fuseki-base/configuration** folder where the configurations of the services
  are read and stored by the fuseki endpoint
- **/fuseki-base/databases** folder where the TDB files (persistent RDF databases)
  are read and stored by the fuseki endpoint
- **/fuseki-base/config.ttl** the configuration file for the whole endpoint. This
  file will only be read by the fuseki server, as no modification of this file
  is possible at runtime.

## Fuseki Service Configuration

We propose a default configuration for a `/things` service on the fuseki sparql
endpoint. This configuration file is in `fuseki-docker/configuration/things.ttl`.

Two points are important in this configuration:

- The dataset must be persistent (TDB or TDB2) so that the data is not lost on restart
- The default graph must be the union of all graphs (`unionDefaultGraph` option)
  so that all named graphs can be queried without adding a GRAPH keyword everywhere.

For a TDB Dataset :

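The repository's snippet is collapsed in this view; as an illustration only, a Fuseki assembler fragment enabling both points above typically looks like the following (resource names and paths are examples; `fuseki-docker/configuration/things.ttl` is the authoritative version):

```turtle
@prefix fuseki: <http://jena.apache.org/fuseki#> .
@prefix tdb:    <http://jena.hpl.hp.com/2008/tdb#> .

# Illustrative sketch only; see fuseki-docker/configuration/things.ttl
# for the configuration actually shipped with this repository.
<#service> a fuseki:Service ;
    fuseki:name          "things" ;   # served at /things
    fuseki:serviceQuery  "sparql" ;
    fuseki:serviceUpdate "update" ;   # SPARQL UPDATE enabled
    fuseki:dataset       <#dataset> .

<#dataset> a tdb:DatasetTDB ;
    tdb:location          "/fuseki-base/databases/things" ;  # persistent TDB
    tdb:unionDefaultGraph true .   # default graph = union of named graphs
```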
