Add instructions for using with QLever #1

Merged: 3 commits, merged on Mar 25, 2024
README.md: 54 additions, 0 deletions

9 intersects 1
[...]
```

## Use with QLever and osm2rdf

One use case of `spatialjoin` is to add triples for the relations `contains` and
`intersects` to an RDF dataset that contains WKT literals. The following example
walks through this process for the OSM data of Germany.
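Before starting, it can help to check that all programs used in the steps below are on the `PATH`. This is a minimal sketch: `missing_tools` is a hypothetical helper name, and the list of programs is simply taken from the commands in this section.

```shell
#!/usr/bin/env bash
# missing_tools (hypothetical helper): print every argument that is not
# found as a command on the current PATH, one per line.
missing_tools() {
  local cmd
  for cmd in "$@"; do
    command -v "$cmd" >/dev/null 2>&1 || printf '%s\n' "$cmd"
  done
}

# Programs used in Steps 1 to 4.
missing=$(missing_tools wget osm2rdf bzcat IndexBuilderMain ServerMain curl sed spatialjoin)
if [ -n "$missing" ]; then
  printf 'Missing on PATH:\n%s\n' "$missing"
fi
```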

### Step 1: Download PBF from Geofabrik and convert to RDF

```
NAME=osm-germany
wget -O ${NAME}.pbf https://download.geofabrik.de/europe/germany-latest.osm.pbf
osm2rdf ${NAME}.pbf -o ${NAME}.ttl --simplify-wkt 0 --write-ogc-geo-triples none
```

Note: By default, `osm2rdf` computes and outputs the predicates `ogc:sfContains`
and `ogc:sfIntersects`; the `--write-ogc-geo-triples none` option disables
this. To get both the `osm2rdf` predicates *and* the `spatialjoin` predicates
(for comparison or debugging), omit the option.

### Step 2: Build a QLever instance, start it, and download the geometries

```
PORT=7008
echo '{ "languages-internal": [], "prefixes-external": [""], "ascii-prefixes-only": false, "num-triples-per-batch": 1000000 }' > ${NAME}.settings.json
ulimit -Sn 1048576; bzcat ${NAME}.ttl.bz2 | IndexBuilderMain -F ttl -f - -i ${NAME} -s ${NAME}.settings.json --stxxl-memory 10G | tee ${NAME}.index-log.txt
ServerMain -i ${NAME} -j 8 -p ${PORT} -m 20G -c 10G -e 3G -k 200 -s 300s &
# Run the query below only after the server has finished loading the index
curl -s localhost:${PORT} -H "Accept: text/tab-separated-values" -H "Content-type: application/sparql-query" --data "PREFIX geo: <http://www.opengis.net/ont/geosparql#> SELECT ?osm_id ?geometry WHERE { ?osm_id geo:hasGeometry/geo:asWKT ?geometry }" | sed -E 's#<https://www.openstreetmap.org/(rel|way|node)(ation)?/([0-9]+)>\t"(.+)"\^\^<http:.*wktLiteral>#osm\1:\3\t\4#g' | sed 1d > spatialjoin.input.tsv
```
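When `ServerMain` runs in the background, the `curl` query can be sent before the server is ready. A minimal retry sketch follows; `wait_until` is a hypothetical helper (not part of QLever), and using it as shown assumes the server only accepts connections on `${PORT}` once the index has been loaded.

```shell
#!/usr/bin/env bash
# wait_until TRIES DELAY CMD [ARGS...] (hypothetical helper):
# run CMD up to TRIES times, sleeping DELAY seconds between failed attempts.
# Returns 0 as soon as CMD succeeds, 1 if every attempt fails.
wait_until() {
  local tries="$1" delay="$2"
  shift 2
  local i
  for ((i = 1; i <= tries; i++)); do
    if "$@"; then
      return 0
    fi
    # No need to sleep after the final failed attempt.
    if (( i < tries )); then
      sleep "$delay"
    fi
  done
  return 1
}

# Example usage (hypothetical): curl without -f exits 0 on any HTTP response.
#   wait_until 60 5 curl -s -o /dev/null "localhost:${PORT}" || echo "QLever did not start" >&2
```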

Note: The first `sed` command replaces the full IRIs with shorter prefixed names
(for example, `osmway:123`), and `sed 1d` removes the header line of the TSV
result. Also note that we only fetch the WKT literals reached via
`geo:hasGeometry/geo:asWKT` here. It would be nicer to fetch all WKT literals in
the dataset, regardless of the predicate they belong to (for example, the
predicates `osm2rdfgeom:envelope` and `osm2rdfgeom:convex_hull` also have WKT
literals as objects).
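To see what the two `sed` commands do, the same pipeline can be run on a few rows shaped like QLever's TSV output. The IDs and geometries below are made up for illustration, and the sketch assumes GNU `sed`, which understands `\t` in `-E` patterns and replacements (BSD `sed` does not).

```shell
#!/usr/bin/env bash
# Made-up sample rows in the shape of QLever's TSV result (header line first).
sample_tsv() {
  local wkt='^^<http://www.opengis.net/ont/geosparql#wktLiteral>'
  printf '%s\t%s\n' '?osm_id' '?geometry'
  printf '%s\t%s\n' '<https://www.openstreetmap.org/relation/51477>' "\"POLYGON((0 0,1 0,1 1,0 0))\"$wkt"
  printf '%s\t%s\n' '<https://www.openstreetmap.org/way/123>' "\"LINESTRING(0 0,1 1)\"$wkt"
  printf '%s\t%s\n' '<https://www.openstreetmap.org/node/42>' "\"POINT(13.4 52.5)\"$wkt"
}

# The same two sed commands as in Step 2: shorten the IRIs, strip the
# literal's quotes and datatype, then drop the header line.
shorten_iris() {
  sed -E 's#<https://www.openstreetmap.org/(rel|way|node)(ation)?/([0-9]+)>\t"(.+)"\^\^<http:.*wktLiteral>#osm\1:\3\t\4#g' | sed 1d
}

sample_tsv | shorten_iris
```

The output has one `id<TAB>WKT` line per geometry: `osmrel:51477`, `osmway:123`, and `osmnode:42`, each followed by its plain WKT string. Note how `(rel|way|node)(ation)?` maps `relation` to the `osmrel:` prefix.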

### Step 3: Compute the spatial relations

```
cat spatialjoin.input.tsv | spatialjoin --suffix $' .\n'
```
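The `$' .\n'` argument uses Bash ANSI-C quoting, so `\n` becomes a real newline character. The short sketch below only inspects that string; the assumption, based on how the output is later loaded as Turtle in Step 4, is that `spatialjoin` writes this suffix after each relation, so every line ends with the ` .` statement terminator followed by a line break.

```shell
#!/usr/bin/env bash
# Bash ANSI-C quoting: the \n escape inside $'...' is an actual newline.
SUFFIX=$' .\n'

# Dump the characters: a space, a dot, and a newline (three in total).
printf '%s' "$SUFFIX" | od -An -c
```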

Alternatively, the geometries can be piped directly into `spatialjoin`, without writing the intermediate file:

```
curl -s localhost:${PORT} -H "Accept: text/tab-separated-values" -H "Content-type: application/sparql-query" --data "PREFIX geo: <http://www.opengis.net/ont/geosparql#> SELECT ?osm_id ?geometry WHERE { ?osm_id geo:hasGeometry/geo:asWKT ?geometry }" | sed -E 's#<https://www.openstreetmap.org/(rel|way|node)(ation)?/([0-9]+)>\t"(.+)"\^\^<http:.*wktLiteral>#osm\1:\3\t\4#g' | sed 1d | spatialjoin --suffix $' .\n'
```

### Step 4: Rebuild the QLever index with the added triples

```
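# First stop the ServerMain process from Step 2: the rebuild overwrites its index files, and the new server uses the same port
ulimit -Sn 1048576; bzcat ${NAME}.ttl.bz2 rels.out.bz2 | IndexBuilderMain -F ttl -f - -i ${NAME} -s ${NAME}.settings.json --stxxl-memory 10G | tee ${NAME}.index-log.txt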
ServerMain -i ${NAME} -j 8 -p ${PORT} -m 20G -c 10G -e 3G -k 200 -s 300s
```