Skip to content

PostgreSQL Foreign Data Wrapper for RDF Triplestores

License

Notifications You must be signed in to change notification settings

jimjonesbr/rdf_fdw

Repository files navigation


RDF Triplestore Foreign Data Wrapper for PostgreSQL (rdf_fdw)

The rdf_fdw is a PostgreSQL Foreign Data Wrapper to easily access RDF triplestores via SPARQL endpoints, including pushdown of several SQL Query clauses.

CI

Index

In an Ubuntu environment you can install all dependencies with the following command:

apt install -y make gcc postgresql-server-dev-16 libxml2-dev libcurl4-openssl-dev

Note

postgresql-server-dev-16 only installs the libraries for PostgreSQL 16. Change it if you're using another PostgreSQL version.

To compile the source code you need to ensure the pg_config executable is properly set when you run make - this executable is typically in your PostgreSQL installation's bin directory. After that, just run make in the root directory:

$ cd rdf_fdw
$ make

After compilation, just run make install to install the Foreign Data Wrapper:

$ make install

After building and installing the extension you're ready to create the extension in a PostgreSQL database with CREATE EXTENSION:

CREATE EXTENSION rdf_fdw;

To install an specific version add the full version number in the WITH VERSION clause

CREATE EXTENSION rdf_fdw WITH VERSION '1.3';

To run the predefined regression tests run make installcheck with the user postgres:

$ make PGUSER=postgres installcheck

Note

In order for rdf_fdw to convert the data retrieved from the RDF triplestore into a format that can be interpreted by PostgreSQL, it must first load all the retrieved data into memory. Therefore, if you expect to retrieve large volumes of data with a single query, make sure PostgreSQL has access to sufficient memory. Another option is to retrieve the data in chuncks by using rdf_fdw_clone_table or a customised script.

To update the extension's version you must first build and install the binaries and then run ALTER EXTENSION:

ALTER EXTENSION rdf_fdw UPDATE;

To update to an specific version use UPDATE TO and the full version number

ALTER EXTENSION rdf_fdw UPDATE TO '1.3';

To use the rdf_fdw you must first create a SERVER to connect to a SPARQL endpoint. After that, you have to create the FOREIGN TABLEs, which will contain the SPARQL instructions of what to retrieve from the endpoint.

The SQL command CREATE SERVER defines a new foreign server. The user who defines the server becomes its owner. A SERVER requires an endpoint, so that rdf_fdw knows where to sent the SPARQL queries.

The following example creates a SERVER that connects to the DBpedia SPARQL Endpoint:

  CREATE SERVER dbpedia
  FOREIGN DATA WRAPPER rdf_fdw 
  OPTIONS (endpoint 'https://dbpedia.org/sparql');

Server Options

Server Option Type Description
endpoint required URL address of the SPARQL Endpoint.
enable_pushdown optional Globally enables or disables pushdown of SQL clauses into SPARQL (default true)
format optional The rdf_fdw expects the result sets encoded in the SPARQL Query Results XML Format, which can be normally enforced by setting the MIME type application/sparql-results+xml in the Accept HTTP request header. However, there are some products that deviate from the standard and expect a differently value, e.g. xml, rdf-xml. In case the expected parameter differs from the official MIME type, it should be set in this parameter (default application/sparql-results+xml).
http_proxy optional Proxy for HTTP requests.
proxy_user optional User for proxy server authentication.
proxy_user_password optional Password for proxy server authentication.
connect_timeout optional Connection timeout for HTTP requests in seconds (default 300 seconds).
connect_retry optional Number of attempts to retry a request in case of failure (default 3 times).
request_redirect optional Enables URL redirect issued by the server (default false).
request_max_redirect optional Limit of how many times the URL redirection may occur. If that many redirections have been followed, the next redirect will cause an error. Not setting this parameter or setting it to 0 will allow an infinite number of redirects.
custom optional One or more parameters expected by the configured RDF triplestore. Multiple parameters separated by &, e.g. signal_void=on&signal_unconnected=on. Custom parameters are appended to the request URL.
query_param optional The request parameter where the SPARQL endpoint expects the query in a HTTP request. Most SPARQL endpoints expects the query to be in the parameter query - and this is the rdf_fdw default value. So, chances are you'll never need to touch this server option (default query)

Note

To visualise the foreign server's options use the psql meta-command \des[+]

Availability: 1.1.0

CREATE USER MAPPING defines a mapping of a PostgreSQL user to an user in the target triplestore. For instance, to map the PostgreSQL user postgres to the user admin in the SERVER named graphdb:

CREATE SERVER graphdb
FOREIGN DATA WRAPPER rdf_fdw
OPTIONS (endpoint 'http://192.168.178.27:7200/repositories/myrepo');

CREATE USER MAPPING FOR postgres
SERVER graphdb OPTIONS (user 'admin', password 'secret');
Option Type Description
user required name of the user for authentication
password optional password of the user set in the option user

The rdf_fdw will try to authenticate the given user using HTTP Basic Authentication - no other authentication method is currently supported. This feature can be ignored if the triplestore does not require user authentication.

Note

To visualise created user mappings use the psql meta-command \deu[+]

Foreign Tables from the rdf_fdw work as a proxy between PostgreSQL clients and RDF Triplestores. Each FOREIGN TABLE column must be mapped to a SPARQL variable, so that PostgreSQL knows where to display each node retrieved from the SPARQL queries. Optionally, it is possible to add an expression to the column, so that function calls can be used to retrieve or format the data.

Table Options

Option Type Description
sparql required The raw SPARQL query to be executed
log_sparql optional Logs the exact SPARQL query executed. This option is useful to check for any modification to the configured SPARQL query due to pushdown
enable_pushdown optional Enables or disables pushdown of SQL clauses into SPARQL for a specific foreign table. This overrides the SERVER option enable_pushdown

Column Options

Option Type Description
variable required This option maps the table column to a SPARQL variable used in the table option sparql. A variable must start with either ? or $ (? or $ are not part of the variable name!), and the name must be a string with the following characters: [a-z], [A-Z],[0-9]
expression optional Similar to variable, but instead of a SPARQL variable it can handle expressions, such as function calls. All expressions supported by the data source can be used in this option.
language optional RDF language tag, e.g. en,de,pt,es,pl, etc. This option is necessary for the pushdown feature to properly set the literal language tag in FILTER expressions. Set it to * to make FILTER espressions ignore language tags when comparing literals.
literaltype optional Data type for typed literals , e.g. xsd:string, xsd:date, xsd:boolean. This option is necessary for the pushdown feature to properly set the literal type of expressions from SQL WHERE conditions. Set it to * to make FILTER espressions ignore data types when comparing literals.
nodetype optional Type of the RDF node. Expected values are literal or iri. This option helps the query planner to optimize SPARQL FILTER expressions when the WHERE conditions are pushed down (default literal)

The following example creates a FOREIGN TABLE connected to the server dbpedia. SELECT queries executed against this table will execute the SPARQL query set in the OPTION sparql, and its result sets are mapped to each column of the table via the column OPTION variable.

CREATE FOREIGN TABLE film (
  film_id text    OPTIONS (variable '?film',     nodetype 'iri'),
  name text       OPTIONS (variable '?name',     nodetype 'literal', literaltype 'xsd:string'),
  released date   OPTIONS (variable '?released', nodetype 'literal', literaltype 'xsd:date'),
  runtime int     OPTIONS (variable '?runtime',  nodetype 'literal', literaltype 'xsd:integer'),
  abstract text   OPTIONS (variable '?abstract', nodetype 'literal', literaltype 'xsd:string')
)
SERVER dbpedia OPTIONS (
  sparql '
    PREFIX dbr: <http://dbpedia.org/resource/>
    PREFIX dbp: <http://dbpedia.org/property/>
    PREFIX dbo: <http://dbpedia.org/ontology/>

    SELECT DISTINCT ?film ?name ?released ?runtime ?abstract
    WHERE
    {
      ?film a dbo:Film ;
            rdfs:comment ?abstract ;
            dbp:name ?name ;
            dbp:released ?released ;
            dbp:runtime ?runtime .
      FILTER (LANG ( ?abstract ) = "en")
      FILTER (datatype(?released) = xsd:date)
      FILTER (datatype(?runtime) = xsd:integer)
     }
'); 

Note

To visualise the foreign table's columns and options use the psql meta-commands \d[+] or \det[+]

All options and parameters set to a FOREIGN TABLE or SERVER can be changed, dropped, and new ones can be added using the ALTER FOREIGN TABLE and ALTER SERVER commands.

Adding options

ALTER FOREIGN TABLE film OPTIONS (ADD enable_pushdown 'false',
                                  ADD log_sparql 'true');

ALTER SERVER dbpedia OPTIONS (ADD enable_pushdown 'false');

Changing previously configured options

ALTER FOREIGN TABLE film OPTIONS (SET enable_pushdown 'false');

ALTER SERVER dbpedia OPTIONS (SET enable_pushdown 'true');

Dropping options

ALTER FOREIGN TABLE film OPTIONS (DROP enable_pushdown,
                                  DROP log_sparql);

ALTER SERVER dbpedia OPTIONS (DROP enable_pushdown);

Synopsis

text rdf_fdw_version();

Availability: 1.0.0

Description

Shows the version of the installed rdf_fdw and its main libraries.


Usage

SELECT * FROM rdf_fdw_version();
                                                                                             rdf_fdw_version
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 rdf_fdw = 1.0.0, libxml/2.9.14 libcurl/7.88.1 OpenSSL/3.0.11 zlib/1.2.13 brotli/1.0.9 zstd/1.5.4 libidn2/2.3.3 libpsl/0.21.2 (+libidn2/2.3.3) libssh2/1.10.0 nghttp2/1.52.0 librtmp/2.3 OpenLDAP/2.5.13
(1 row)

Synopsis

void rdf_fdw_clone_table(
  foreign_table text,
    target_table text,
    begin_offset int,
    fetch_size int,
    max_records int,
    orderby_column text,
    sort_order text,
    create_table boolean,
    verbose boolean,
    commit_page boolean
)

Availability: 1.0.0 (PostgreSQL 11+ only)

Description

This procedure is designed to copy data from a FOREIGN TABLE to an ordinary TABLE. It provides the possibility to retrieve the data set in batches, so that issues related to triplestore limits and client's memory don't bother too much. Parameters

foreign_table (required): FOREIGN TABLE from where the data has to be copied.

target_table (required): heap TABLE where the data from the FOREIGN TABLE is copied to.

begin_offset: starting point in the SPARQL query pagination. Default 0.

fetch_size: size of the page fetched from the triplestore. Default is the value set at fetch_size in either SERVER or FOREIGN TABLE. In case SERVER and FOREIGN TABLE do not set fetch_size, the default will be set to 100.

max_records: maximum number of records that should be retrieved from the FOREIGN TABLE. Default 0, which means no limit.

orderby_column: ordering column used for the pagination - just like in SQL ORDER BY. Default '', which means that the function will chose a column to use in the ORDER BY clause on its own. That is, the procedure will try to set the first column with the option nodetype set to iri. If the table has no iri typed nodetype, the first column will be chosen as ordering column. If you do not wish to have an ORDER BY clause at al, set this parameter to NULL.

sort_order: ASC or DESC to sort the data returned in ascending or descending order, respectivelly. Default ASC.

create_table: creates the table set in target_table before harvesting the FOREIGN TABLE. Default false.

verbose: prints debugging messages in the standard output. Default false.

commit_page: commits the inserted records immediatelly or only after the transaction is finished. Useful for those who want records to be discarded in case of an error - following the principle of everyhing or nothing. Default true, which means that all inserts will me committed immediatelly.


Usage Example

CREATE SERVER dbpedia
FOREIGN DATA WRAPPER rdf_fdw 
OPTIONS (endpoint 'https://dbpedia.org/sparql');

CREATE FOREIGN TABLE public.dbpedia_cities (
  uri text           OPTIONS (variable '?city', nodetype 'iri'),
  city_name text     OPTIONS (variable '?name', nodetype 'literal', literaltype 'xsd:string'),
  elevation numeric  OPTIONS (variable '?elevation', nodetype 'literal', literaltype 'xsd:integer')
)
SERVER dbpedia OPTIONS (
  sparql '
    PREFIX dbo:  <http://dbpedia.org/ontology/>
    PREFIX foaf: <http://xmlns.com/foaf/0.1/>
    PREFIX dbr:  <http://dbpedia.org/resource/>
    SELECT *
    {?city a dbo:City ;
      foaf:name ?name ;
      dbo:federalState dbr:North_Rhine-Westphalia ;
      dbo:elevation ?elevation
    }
');

/*
 * Materilize all records from the FOREIGN TABLE 'public.dbpedia_cities' in 
 * the table 'public.t1_local'. 
 */ 
CALL rdf_fdw_clone_table(
      foreign_table => 'dbpedia_cities',
      target_table  => 't1_local',
      create_table => true);

SELECT * FROM t1_local;
                            uri                            |      city_name      | elevation 
-----------------------------------------------------------+---------------------+-----------
 http://dbpedia.org/resource/Aachen                        | Aachen              |     173.0
 http://dbpedia.org/resource/Bielefeld                     | Bielefeld           |     118.0
 http://dbpedia.org/resource/Dortmund                      | Dortmund            |      86.0
 http://dbpedia.org/resource/Düsseldorf                    | Düsseldorf          |      38.0
 http://dbpedia.org/resource/Gelsenkirchen                 | Gelsenkirchen       |      60.0
 http://dbpedia.org/resource/Hagen                         | Hagen               |     106.0
 http://dbpedia.org/resource/Hamm                          | Hamm                |      37.7
 http://dbpedia.org/resource/Herne,_North_Rhine-Westphalia | Herne               |      65.0
 http://dbpedia.org/resource/Krefeld                       | Krefeld             |      39.0
 http://dbpedia.org/resource/Mönchengladbach               | Mönchengladbach     |      70.0
 http://dbpedia.org/resource/Mülheim                       | Mülheim an der Ruhr |      26.0
 http://dbpedia.org/resource/Münster                       | Münster             |      60.0
 http://dbpedia.org/resource/Remscheid                     | Remscheid           |     365.0
(13 rows)

A pushdown is the ability to translate SQL queries in such a way that operations are performed in the data source rather than in PostgreSQL, e.g. sorting, formatting, filtering. This feature can significantly reduce the number of records retrieved from the data source. For instance, a SQL LIMIT not pushed down means that the target system will perform a full scan in the data source, prepare everything for data transfer, return it to PostgreSQL via network, and then PostgreSQL will only locally discard the unnecessary data, which depending on the total number of records can be extremely inefficient. The rdf_fdw tries to translate SQL into SPARQL queries, which due to their conceptual differences is not always an easy task, so it is often worth considering to keep SQL queries involving foreign tables as simple as possible - or just to stick to the features, data types and operators described in this section.

LIMIT

LIMIT clauses are pushed down only if the SQL query does not contain aggregates and when all conditions in the WHERE clause can be translated to SPARQL.

SQL SPARQL
LIMIT x LIMIT x
FETCH FIRST x ROWS LIMIT x
FETCH FIRST ROW ONLY LIMIT 1

Pushdown of OFFSET clauses is not supported, which means that OFFSET filters will be applied locally in the client. Take a look at rdf_fdw_clone_table if you wish to retrieve records in batches.

ORDER BY

ORDER BY can be pushed down if the data types can be translated into SPARQL.

SQL SPARQL
ORDER BY x ASC, ORDER BY x ORDER BY ASC(x)
ORDER BY x DESC ORDER BY DESC(x)

DISTINCT

DISTINCT is pushed down to the SPARQL SELECT statement just like in SQL. In case that the configured SPARQL query already contains a DISTINCT or a REDUCED modifier, the SQL DISTINCT won't be pushed down. There is no equivalent for DISTINCT ON, so it cannot be pushed down either.

WHERE

The rdf_fdw will attempt to translate RDF literals to the data type of the mapped column, and this can be quite tricky! RDF literals can be pretty much everything, as often they have no explicit data type declarations, e.g. "wwu" and "wwu"^^xsd:string are equivalent. The contents of literals are often also not validated by the RDF triplestores, but PostgreSQL will validate them in query time. So, if a retrieved literal cannot be translated to declared column data type, the query will be interrupted. SQL WHERE conditions are translated into SPARQL FILTER expressions, as long as the involved columns data types and operators are supported.

Supported Data Types and Operators

Data type Operators
text, char, varchar, name =, <>, !=, ~~, !~~, ~~*, !~~*
date, timestamp, timestamp with time zone =, <>, !=, >, >=, <, <=
smallint, int, bigint, numeric, double precision =, <>, !=, >, >=, <, <=
boolean IS, IS NOT

IN and ANY constructs

SQL IN and ANY constructs are translated into the SPARQL IN operator, which will be placed in a FILTER evaluation.

Pattern matching operators LIKE and ILIKE

Availability: 1.1.0

Expressions using LIKE and ILIKE - or their equivalent operators ~~ and ~~* - are converted to REGEX filters in SPARQL. It is important to notice that pattern matching operations using LIKE/ILIKE only support the wildcards % and _, and therefore only these characters will be translated to their REGEX equivalents. Any other character that might be potentially used as a wildcard in REGEX, such as ^, | or $, will be escaped.

String Functions

Availability: 1.2.0

The following string functions are pushed down with their correspondent SPARQL FILTER expressions:

SQL SPARQL
LENGTH() STRLEN()
STARTS_WITH() STRSTARTS()
SUBSTRING() SUBSTR()
UPPER() UCASE()
LOWER() LCASE()
MD5() MD5()

Mathematical Functions

Availability: 1.2.0

The following mathematical functions are pushed down with their correspondent SPARQL FILTER expressions:

SQL SPARQL
ABS() ABS()
CEIL() CEIL()
FLOOR() FLOOR()
ROUND() ROUND()

Date Time Functions

Availability: 1.2.0

The following date/time functions are pushed down with their correspondent SPARQL FILTER expressions:

SQL SPARQL
EXTRACT(YEAR FROM x), DATE_PART('year', x) YEAR(x)
EXTRACT(MONTH FROM x), DATE_PART('month', x) MONTH(x)
EXTRACT(DAY FROM x), DATE_PART('day', x) DAY(x)
EXTRACT(HOUR FROM x), DATE_PART('hour', x) HOURS(x)
EXTRACT(MINUTE FROM x), DATE_PART('minute', x) MINUTES(x)
EXTRACT(SECOND FROM x), DATE_PART('second', x) SECONDS(x)

Pushdown Examples

Foreign table columns with the option literaltype

PostgreSQL Type Literal Type SQL WHERE Condition SPARQL FILTER (pushdown)
text xsd:string name = 'foo' FILTER(?s = "foo"^^xsd:string)
text * name <> 'foo' FILTER(STR(?s) != "foo")
text * name ILIKE '%Jon_s', name ~~* '%Jon_s' FILTER(REGEX(?name,".*Jon.s$","i"))
text * name LIKE '%foo%', name ~~ '%foo%'; FILTER(REGEX(?name,".*foo.*"))
text - upper(val) = 'FOO' FILTER(UCASE(STR(?var)) = "FOO")
text - lower(val) = 'foo' FILTER(LCASE(STR(?var)) = "foo")
text - md5(name) = 'dd16aacc7f77cec7ed83139f81704577' FILTER(MD5(STR(?personname)) = "dd16aacc7f77cec7ed83139f81704577")
text - substring(name,1,4) = 'foo' FILTER(SUBSTR(STR(?personname), 1, 4) = "foo")
text - starts_with(name,'foo') FILTER(STRSTARTS(STR(?partyname), "foo"))
text - length(val) = 42 FILTER(STRLEN(STR(?var)) = 42)
int xsd:integer runtime > 42 FILTER(?runtime > 42)
int xsd:integer runtime > 40+2 FILTER(?runtime > 42)
numeric - abs(val) <> 42.73 FILTER(ABS(?var) != 42.73)
numeric - ceil(val) = 42 FILTER(CEIL(?var) = 42)
numeric - floor(val) = 42 FILTER(FLOOR(?var) = 42)
numeric - round(val) = 42 FILTER(ROUND(?var) = 42)
date xsd:date extract(year FROM birthdate) = 1970 FILTER(YEAR(?birthdate) = 1970)
date xsd:date extract(month FROM birthdate) = 4 FILTER(MONTH(?birthdate) = 4)
date xsd:date extract(day FROM birthdate) = 8 FILTER(DAY(?birthdate) = 8)
timestamp xsd:dateTime extract(hour FROM ts) = 14 FILTER(HOURS(?ts) = 14)
timestamp xsd:dateTime extract(minute FROM ts) = 33 FILTER(MINUTES(?ts) = 33)
timestamp xsd:dateTime extract(second FROM ts) = 42 FILTER(SECONDS(?ts) = 42)
numeric - val >= 42.73 FILTER(?val >= 42.73)
date xsd:date released BETWEEN '2021-04-01' AND '2021-04-30' FILTER(?released >= "2021-04-01"^^xsd:date) FILTER(?released <= "2021-04-30"^^xsd:date)
timestamp xsd:dateTime modified > '2021-04-06 14:07:00.26' FILTER(?modified > "2021-04-06T14:07:00.260000"^^xsd:dateTime)
timestamp xsd:dateTime modified < '2021-04-06 14:07:00.26' FILTER(?modified < "2021-04-06T14:07:00.260000"^^xsd:dateTime)
text xsd:string country IN ('Germany','France','Portugal') FILTER(?country IN ("Germany"^^xsd:string, "France"^^xsd:string, "Portugal"^^xsd:string))
varchar - country NOT IN ('Germany','France','Portugal') FILTER(?country NOT IN ("Germany", "France", "Portugal"))
name - country = ANY(ARRAY['Germany','France','Portugal']) FILTER(?country IN ("Germany", "France", "Portugal"))
boolean xsd:boolean bnode IS TRUE FILTER(?node = "true"^^xsd:boolean)
boolean xsd:boolean bnode IS NOT TRUE FILTER(?node != "true"^^xsd:boolean)
boolean xsd:boolean bnode IS FALSE FILTER(?node = "false"^^xsd:boolean)
boolean xsd:boolean bnode IS NOT FALSE FILTER(?node != "false"^^xsd:boolean)

Foreign table columns with the option language

PostgreSQL Type Language Tag SQL WHERE Condition SPARQL (pushdown)
text en name = 'foo' FILTER(?s = "foo"@en)
name de name <> 'foo' FILTER(?s != "foo"@de)
varchar en country NOT IN ('Germany','France','Portugal') FILTER(?country NOT IN ("Germany"@en, "France"@en, "Portugal"@en))
text * name = 'foo' FILTER(STR(?s) = "foo")

Foreign table columns with the option expression

PostgreSQL Type Expression Literal Type SQL WHERE Condition SPARQL (pushdown)
boolean STRSTARTS(STR(?country),"https") xsd:boolean bnode IS TRUE FILTER(STRSTARTS(STR(?country),"http") = "true"^^xsd:boolean)
int STRLEN(?variable) xsd:integer strlen > 10 FILTER(STRLEN(?variable) > 10)
text UCASE(?variable) - uname = 'FOO' FILTER(UCASE(?variable) = "FOO")

These and other examples can be found here

Create a SERVER and FOREIGN TABLE to query the DBpedia SPARQL Endpoint (Politicians):

CREATE SERVER dbpedia
FOREIGN DATA WRAPPER rdf_fdw 
OPTIONS (endpoint 'https://dbpedia.org/sparql');

CREATE FOREIGN TABLE politicians (
  uri text        OPTIONS (variable '?person',     nodetype 'iri'),
  name text       OPTIONS (variable '?personname', nodetype 'literal', literaltype 'xsd:string'),
  birthdate date  OPTIONS (variable '?birthdate',  nodetype 'literal', literaltype 'xsd:date'),
  party text      OPTIONS (variable '?partyname',  nodetype 'literal', literaltype 'xsd:string'),
  country text    OPTIONS (variable '?country',    nodetype 'literal', language 'en')
)
SERVER dbpedia OPTIONS (
  log_sparql 'true',
  sparql '
    PREFIX dbp: <http://dbpedia.org/property/>
    PREFIX dbo: <http://dbpedia.org/ontology/>

    SELECT *
    WHERE {
      ?person 
          a dbo:Politician;
          dbo:birthDate ?birthdate;
          dbp:name ?personname;
          dbo:party ?party .       
        ?party 
          dbp:country ?country;
          rdfs:label ?partyname .
        FILTER NOT EXISTS {?person dbo:deathDate ?died}
        FILTER(LANG(?partyname) = "de")
      } 
');

In the following SQL query we can observe that:

  • the executed SPARQL query was logged.
  • the SPARQL SELECT was modified to retireve only the columns used in the SQL SELECT and WHERE clauses.
  • the conditions in the SQL WHERE clause were pushed down as SPARQL FILTER conditions.
  • the SQL ORDER BY clause was pushed down as SPARQL ORDER BY.
  • the FETCH FIRST ... ROWS ONLY was pushed down as SPARQL LIMIT
  • the column country has a language option, and its value is used as a language tag in the SPARQL expression: FILTER(?country IN ("Germany"@en, "France"@en))
SELECT name, birthdate, party
FROM politicians
WHERE 
  country IN ('Germany','France') AND 
  birthdate > '1995-12-31' AND
  party <> ''
ORDER BY birthdate DESC, party ASC
FETCH FIRST 5 ROWS ONLY;
NOTICE:  SPARQL query sent to 'https://dbpedia.org/sparql':

 PREFIX dbp: <http://dbpedia.org/property/>
 PREFIX dbo: <http://dbpedia.org/ontology/>

SELECT ?personname ?birthdate ?partyname ?country 
{
      ?person 
          a dbo:Politician;
          dbo:birthDate ?birthdate;
          dbp:name ?personname;
          dbo:party ?party .       
        ?party 
          dbp:country ?country;
          rdfs:label ?partyname .
        FILTER NOT EXISTS {?person dbo:deathDate ?died}
        FILTER(LANG(?partyname) = "de")
       FILTER(?country IN ("Germany"@en, "France"@en))
 FILTER(?birthdate > "1995-12-31"^^xsd:date)
 FILTER(?partyname != ""^^xsd:string)
}
ORDER BY  DESC (?birthdate)  ASC (?partyname)
LIMIT 5

        name        | birthdate  |                  party                  
--------------------+------------+-----------------------------------------
 Louis Boyard       | 2000-08-26 | La France insoumise
 Klara Schedlich    | 2000-01-04 | Bündnis 90/Die Grünen
 Pierrick Berteloot | 1999-01-11 | Rassemblement National
 Niklas Wagener     | 1998-04-16 | Bündnis 90/Die Grünen
 Jakob Blankenburg  | 1997-08-05 | Sozialdemokratische Partei Deutschlands
(5 rows)

Create a SERVER and FOREIGN TABLE to query the Getty Thesaurus SPARQL endpoint Non-Italians Who Worked in Italy:

Find non-Italians who worked in Italy and lived during a given time range

  • Having event that took place in tgn:1000080 Italy or any of its descendants
  • Birth date between 1250 and 1780
  • Just for variety, we look for artists as descendants of facets ulan:500000003 "Corporate bodies" or ulan:500000002 "Persons, Artists", rather than having type "artist" as we did in previous queries. In the previous query we used values{..} but we here use filter(in(..)).
  • Not having nationality aat:300111198 Italian or any of its descendants
CREATE SERVER getty
FOREIGN DATA WRAPPER rdf_fdw 
OPTIONS (
  endpoint 'http://vocab.getty.edu/sparql.xml',
  format 'application/sparql-results+xml'
);

CREATE FOREIGN TABLE getty_non_italians (
  uri text   OPTIONS (variable '?x'),
  name text  OPTIONS (variable '?name'),
  bio text   OPTIONS (variable '?bio'),
  birth int  OPTIONS (variable '?birth')
)
SERVER getty OPTIONS (
  log_sparql 'true',
  sparql '
  PREFIX ontogeo: <http://www.ontotext.com/owlim/geo#>
  PREFIX geo: <http://www.w3.org/2003/01/geo/wgs84_pos#>
  PREFIX gvp: <http://vocab.getty.edu/ontology#>
  PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
  PREFIX schema: <http://schema.org/>
  PREFIX foaf: <http://xmlns.com/foaf/0.1/>

   SELECT ?x ?name ?bio ?birth {
  {SELECT DISTINCT ?x
    {?x foaf:focus/bio:event/(schema:location|(schema:location/gvp:broaderExtended)) tgn:1000080-place}}
 	 ?x gvp:prefLabelGVP/xl:literalForm ?name;
	    foaf:focus/gvp:biographyPreferred [
	    schema:description ?bio;
       	gvp:estStart ?birth].

  FILTER ("1250"^^xsd:gYear <= ?birth && ?birth <= "1780"^^xsd:gYear)
  FILTER EXISTS {?x gvp:broaderExtended ?facet.
  FILTER(?facet in (ulan:500000003, ulan:500000002))}
  FILTER NOT EXISTS {?x foaf:focus/(schema:nationality|(schema:nationality/gvp:broaderExtended)) aat:300111198}}
  '); 

In the following SQL query we can observe that:

  • the executed SPARQL query was logged.
  • all conditions were applied locally (rdf_fdw currently does not support sub selects).
  • the columns of the FOREIGN TABLE have only the required option variable, as the other options are only necessary for the pushdown feature.
SELECT name, bio, birth
FROM getty_non_italians
WHERE bio ~~* '%artist%'
ORDER BY birth 
LIMIT 10;

NOTICE:  SPARQL query sent to 'http://vocab.getty.edu/sparql.xml':

  PREFIX ontogeo: <http://www.ontotext.com/owlim/geo#>
  PREFIX geo: <http://www.w3.org/2003/01/geo/wgs84_pos#>
  PREFIX gvp: <http://vocab.getty.edu/ontology#>
  PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
  PREFIX schema: <http://schema.org/>
  PREFIX foaf: <http://xmlns.com/foaf/0.1/>

   SELECT ?x ?name ?bio ?birth {
  {SELECT DISTINCT ?x
    {?x foaf:focus/bio:event/(schema:location|(schema:location/gvp:broaderExtended)) tgn:1000080-place}}
 	 ?x gvp:prefLabelGVP/xl:literalForm ?name;
	    foaf:focus/gvp:biographyPreferred [
	    schema:description ?bio;
       	gvp:estStart ?birth].

  FILTER ("1250"^^xsd:gYear <= ?birth && ?birth <= "1780"^^xsd:gYear)
  FILTER EXISTS {?x gvp:broaderExtended ?facet.
  FILTER(?facet in (ulan:500000003, ulan:500000002))}
  FILTER NOT EXISTS {?x foaf:focus/(schema:nationality|(schema:nationality/gvp:broaderExtended)) aat:300111198}}
  

                name                 |                                  bio                                  | birth 
-------------------------------------+-----------------------------------------------------------------------+-------
 Juán de España                      | Spanish artist and goldsmith, active 1455                             |  1415
 Coecke van Aelst, Pieter, the elder | Flemish artist, architect, and author, 1502-1550                      |  1502
 Worst, Jan                          | Dutch artist, active ca. 1645-1655                                    |  1605
 Mander, Karel van, III              | Dutch portraitist and decorative artist, 1608-1670, active in Denmark |  1608
 Ulft, Jacob van der                 | Dutch artist, 1627-1689                                               |  1627
 Fiammingo, Giacomo                  | Flemish artist, fl. 1655                                              |  1635
 Marotte, Charles                    | French artist, fl. ca.1719-1743                                       |  1699
 Troll, Johann Heinrich              | Swiss artist, 1756-1824                                               |  1756
 Beys, G.                            | French artist, fl. ca.1786-1800                                       |  1766
 Vaucher, Gabriel Constant           | Swiss artist, 1768-1814                                               |  1768
(10 rows)

Create a SERVER and FOREIGN TABLE to query the BBC Programmes and Music SPARQL endpoint (authors and their work)

CREATE SERVER bbc
FOREIGN DATA WRAPPER rdf_fdw 
OPTIONS (
  endpoint 'https://lod.openlinksw.com/sparql',
  format 'application/sparql-results+xml'
);


CREATE FOREIGN TABLE artists (
  id text          OPTIONS (variable '?person',  nodetype 'iri'),
  name text        OPTIONS (variable '?name',    nodetype 'literal'),
  itemid text      OPTIONS (variable '?created', nodetype 'iri'),
  title text       OPTIONS (variable '?title',   nodetype 'literal'),
  description text OPTIONS (variable '?descr',   nodetype 'literal')
)
SERVER bbc OPTIONS (
  log_sparql 'true',
  sparql '
  PREFIX foaf:    <http://xmlns.com/foaf/0.1/>
  PREFIX blterms: <http://www.bl.uk/schemas/bibliographic/blterms#>
  PREFIX dcterms: <http://purl.org/dc/terms/>
  PREFIX bibo:    <http://purl.org/ontology/bibo/>
  PREFIX xsd:     <http://www.w3.org/2001/XMLSchema#>

  SELECT *
  {
    ?person a foaf:Person ;
      foaf:name ?name ;
      blterms:hasCreated ?created .
    ?created a bibo:Book ;
      dcterms:title ?title ;
    dcterms:description ?descr
  } 
'); 

In the following SQL query we can observe that:

  • the executed SPARQL query was logged.
  • the SPARQL SELECT clause was reduced to the columns used in the SQL query
  • DISTINCT and WHERE and clauses were pushed down.
  • an ORDER BY was automatically pushded down due the use of DISTINCT
SELECT DISTINCT title, description 
FROM artists
WHERE name = 'John Lennon';

NOTICE:  SPARQL query sent to 'https://lod.openlinksw.com/sparql':

 PREFIX foaf:    <http://xmlns.com/foaf/0.1/>
 PREFIX blterms: <http://www.bl.uk/schemas/bibliographic/blterms#>
 PREFIX dcterms: <http://purl.org/dc/terms/>
 PREFIX bibo:    <http://purl.org/ontology/bibo/>
 PREFIX xsd:     <http://www.w3.org/2001/XMLSchema#>

SELECT DISTINCT ?name ?title ?descr 
{
    ?person a foaf:Person ;
      foaf:name ?name ;
      blterms:hasCreated ?created .
    ?created a bibo:Book ;
      dcterms:title ?title ;
    dcterms:description ?descr
   FILTER(?name = "John Lennon")
}
ORDER BY  ASC (?title)  ASC (?descr)

                            title                             |                                                                      description                                
                                      
--------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------
--------------------------------------
 Sometime in New York City                                    | Limited ed. of 3500 copies.
 All you need is love                                         | Originally published: 2019.
 The John Lennon letters                                      | Originally published: London: Weidenfeld &amp; Nicolson, 2012.
 Imagine John Yoko                                            | In slip case housed in box (37 x 29 x 8 cm).
 Imagine                                                      | Originally published: 2017.
 More Beatles hits arranged for ukulele                       | Words and music by John Lennon and Paul McCartney except Something, words and music by George Harrison.
 The Lennon play : In his own write the Lennon play           | 'Adapted from John Lennon's best-selling books "In his own write" and "A Spaniard in the works".' - Book jacket.
 More Beatles hits arranged for ukulele                       | Publishers no.: NO91245.
 The John Lennon letters                                      | Originally published: 2012.
 Imagine John Yoko                                            | Includes numbered officially stamped giclée print in clothbound portfolio case.
 Last interview : all we are saying, John Lennon and Yoko Ono | Previous ed.: published as The Playboy interviews with John Lennon and Yoko Ono. New York: Playboy Press, 1981; 
Sevenoaks: New English Library, 1982.
 John Lennon : drawings, performances, films                  | Published in conjunction with the exhibition "The art of John Lennon: drawings, performances, films", Kunsthalle
 Bremen, 21 May to 13 August 1995.
 John Lennon in his own write                                 | Originally published in Great Britain in1964 by Johnathan Cape.
 Sometime in New York City                                    | In box.
 Imagine John Yoko                                            | "This edition is limited to 2,000 copies worldwide, numbered 1-2,000, plus 10 copies retained by the artist, ins
cribed i-x"--Container.
 Last interview : all we are saying, John Lennon and Yoko Ono | This ed. originally published: London: Sidgwick &amp; Jackson, 2000.
 Imagine John Yoko                                            | Includes index.
 Sometime in New York City                                    | Includes index.
 A Spaniard in the works                                      | Originally published: Great Britain : Johnathan Cape, 1965.
 All you need is love                                         | Board book.
 The Playboy interviews with John Lennon and Yoko Ono         | Originally published: New York : Playboy Press, c1981.
 Skywriting by word of mouth                                  | Originally published: New York: HarperCollins; London: Jonathan Cape, 1986.
 John Lennon in his own write ; and a Spaniard in the works   | Originally published: 1964.
 John Lennon in his own write ; and a Spaniard in the works   | Originally published: 1965.
 Lennon on Lennon : conversations with John Lennon            | "This edition published by arrangement with Chicago Review Press"--Title page verso.
(25 rows)

Create a SERVER and FOREIGN TABLE to query the Wikidata SPARQL endpoint (Places that are below 10 meters above sea level and their geo coordinates)

CREATE SERVER wikidata
FOREIGN DATA WRAPPER rdf_fdw 
OPTIONS (
  endpoint 'https://query.wikidata.org/sparql');

CREATE FOREIGN TABLE places_below_sea_level (
  wikidata_id text         OPTIONS (variable '?place'),
  label text               OPTIONS (variable '?label'),
  wkt geometry(point,4326) OPTIONS (variable '?location'),
  elevation numeric        OPTIONS (variable '?elev')
)
SERVER wikidata OPTIONS (
  log_sparql 'true',
  sparql '
  SELECT *
  WHERE
  {
    ?place rdfs:label ?label .
    ?place p:P2044/psv:P2044 ?placeElev.
    ?placeElev wikibase:quantityAmount ?elev.
    ?placeElev wikibase:quantityUnit ?unit.
    bind(0.01 as ?km).
    FILTER( (?elev < ?km*1000 && ?unit = wd:Q11573)
        || (?elev < ?km*3281 && ?unit = wd:Q3710)
        || (?elev < ?km      && ?unit = wd:Q828224) ).
    ?place wdt:P625 ?location.    
    FILTER(LANG(?label)="en")
    SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en" }
  }
');   

In the following SQL query we can observe that:

  • the executed SPARQL query was logged.
  • the SPARQL SELECT clause was reduced to the columns used in the SQL query
  • the FETCH FIRST 10 ROWS ONLY was pushded down in a SPARQL LIMIT
SELECT wikidata_id, label, wkt
FROM places_below_sea_level
FETCH FIRST 10 ROWS ONLY;

INFO:  SPARQL query sent to 'https://query.wikidata.org/sparql':

SELECT ?place ?label ?location 
{
    ?place rdfs:label ?label .
    ?place p:P2044/psv:P2044 ?placeElev.
    ?placeElev wikibase:quantityAmount ?elev.
    ?placeElev wikibase:quantityUnit ?unit.
    bind(0.01 as ?km).
    FILTER( (?elev < ?km*1000 && ?unit = wd:Q11573)
        || (?elev < ?km*3281 && ?unit = wd:Q3710)
        || (?elev < ?km      && ?unit = wd:Q828224) ).
    ?place wdt:P625 ?location.    
    FILTER(LANG(?label)="en")
    SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en" }
  }
LIMIT 10

               wikidata_id                |      label      |                    wkt                     
------------------------------------------+-----------------+--------------------------------------------
 http://www.wikidata.org/entity/Q61308849 | Tuktoyaktuk A   | 0101000000295C8FC2F5A060C0EC51B81E855B5140
 http://www.wikidata.org/entity/Q403083   | Ahyi            | 010100000041E3101111216240A9CDA7AAAA6A3440
 http://www.wikidata.org/entity/Q31796625 | Ad Duyūk        | 01010000003AE97DE36BB7414065A54929E8DE3F40
 http://www.wikidata.org/entity/Q54888910 | Lydd Library    | 010100000074B7EBA52902ED3F0C3B8C497F794940
 http://www.wikidata.org/entity/Q27745421 | Écluse de Malon | 0101000000E8F9D346757AFDBFB9FB1C1F2DE64740
 http://www.wikidata.org/entity/Q14204611 | Bilad el-Rum    | 0101000000D578E9263168394021C059B2793A3D40
 http://www.wikidata.org/entity/Q2888647  | Petza'el        | 0101000000F886DEBC9AB841408D05D940A7054040
 http://www.wikidata.org/entity/Q2888816  | Gilgal          | 0101000000272E0948E2B84140539C1F56EAFF3F40
 http://www.wikidata.org/entity/Q4518111  | Chupícuaro      | 0101000000F10DBD79356559C05C30283B4CAD3340
 http://www.wikidata.org/entity/Q2889475  | Na'aran         | 010100000069A1D4C627BA41409EE61CF084F73F40
(10 rows)

Create a SERVER and a FOREIGN TABLE to query the DBpedia SPARQL Geographic Information Systems

The rdf_fdw can also be used as a bridge between GIS (Geographic Information Systems) and RDF Triplestores. This example shows how to retrieve geographic coordinates of all German public universities from DBpedia, create a WKT (Well Known Text) representation of them, and finally import this data into QGIS to plot a map.

Note

This example requires the extension PostGIS.

CREATE SERVER dbpedia
FOREIGN DATA WRAPPER rdf_fdw 
OPTIONS (endpoint 'https://dbpedia.org/sparql');

CREATE FOREIGN TABLE german_public_universities (
  id text                   OPTIONS (variable '?uri', nodetype 'iri'),
  name text                 OPTIONS (variable '?name',nodetype 'literal'),
  lon numeric               OPTIONS (variable '?lon', nodetype 'literal'),
  lat numeric               OPTIONS (variable '?lat', nodetype 'literal'),
  geom geometry(point,4326) OPTIONS (variable '?wkt', nodetype 'literal',
                                    expression 'CONCAT("POINT(",?lon," ",?lat,")") AS ?wkt')
) SERVER dbpedia OPTIONS (
  sparql '
    PREFIX geo: <http://www.w3.org/2003/01/geo/wgs84_pos#>
    PREFIX dbp: <http://dbpedia.org/property/>
    PREFIX dbo: <http://dbpedia.org/ontology/>
    PREFIX dbr:  <http://dbpedia.org/resource/>
    SELECT ?uri ?name ?lon ?lat
    WHERE {
      ?uri dbo:type dbr:Public_university ;
        dbp:name ?name;
        geo:lat ?lat; 
        geo:long ?lon; 
        dbp:country dbr:Germany
      }
  ');

Now that we have our FOREIGN TABLE in place, we just need to create a New PostGIS Connection in QGIS and go to Database > DB Manager ..., select the table we just created and query the data using SQL:

SELECT id, name, geom
FROM german_public_universities

unis

Finally give the layer a name, select the geometry column and press Load.

unis

Just like with an ordinary TABLE in PostgreSQL, it is possible to create and publish FOREIGN TABLES as WFS layers in GeoServer.

First create the FOREIGN TABLE:

CREATE SERVER wikidata
FOREIGN DATA WRAPPER rdf_fdw 
OPTIONS (endpoint 'https://query.wikidata.org/sparql');

CREATE FOREIGN TABLE museums_brittany (
  id text                   OPTIONS (variable '?villeId', nodetype 'iri'),
  label text                OPTIONS (variable '?museumLabel',nodetype 'literal'),
  ville text                OPTIONS (variable '?villeIdLabel', nodetype 'literal'),
  geom geometry(point,4326) OPTIONS (variable '?coord', nodetype 'literal')
) SERVER wikidata OPTIONS (
  sparql '
    SELECT DISTINCT ?museumLabel ?villeId ?villeIdLabel ?coord
    WHERE
    {
      ?museum wdt:P539 ?museofile. # french museofile Id
      ?museum wdt:P131* wd:Q12130. # in Brittany
      ?museum wdt:P131 ?villeId.   # city of the museum
      ?museum wdt:P625 ?coord .    # wkt literal      
      SERVICE wikibase:label { bd:serviceParam wikibase:language "fr". } # french label
    }
  ');

Then set up the Workspace and Store, go to Layers -> Add a new layer, select the proper workspace and go to Configure new SQL view... to create a layer create a layer with a native SQL statement:

SELECT id, label, ville, geom
FROM museums_brittany

geoserver

Afer that set the geometery column and identifier, and hit Save. Finally, fill in the remaining layer attributes, such as Style, Bounding Boxes and Spatial Reference System, and click Save - see this instructions for more details. After that you'll be able to reach your layer from a standard OGC WFS client, e.g. using QGIS.

geoserver-wfs

geoserver-wfs-map

To deploy rdf_fdw with docker just pick one of the supported PostgreSQL versions, install the requirements and compile the source code. For instance, a rdf_fdw Dockerfile for PostgreSQL 15 should look like this (minimal example):

FROM postgres:15

RUN apt-get update && \
    apt-get install -y make gcc postgresql-server-dev-15 libxml2-dev libcurl4-openssl-dev

RUN mkdir /extensions
COPY ./rdf_fdw-1.0.0.tar.gz /extensions/
WORKDIR /extensions

RUN tar xvzf rdf_fdw-1.0.0.tar.gz && \
    cd rdf_fdw-1.0.0 && \
    make -j && \
    make install

To build the image save it in a Dockerfile and run the following command in the root directory - this will create an image called rdf_fdw_image.:

 $ docker build -t rdf_fdw_image .

After successfully building the image you're ready to run or create the container ..

$ docker run --name my_container -e POSTGRES_HOST_AUTH_METHOD=trust rdf_fdw_image

.. and then finally you're able to create and use the extension!

$ docker exec -u postgres my_container psql -d mydatabase -c "CREATE EXTENSION rdf_fdw;"

If you're cool enough, try out the latest commits:

Dockerfile

FROM postgres:15

RUN apt-get update && \
    apt-get install -y git make gcc postgresql-server-dev-15 libxml2-dev libcurl4-openssl-dev

WORKDIR /

RUN git clone https://github.com/jimjonesbr/rdf_fdw.git && \
    cd rdf_fdw && \
    make -j && \
    make install

Deployment

 $ docker build -t rdf_fdw_image .
 $ docker run --name my_container -e POSTGRES_HOST_AUTH_METHOD=trust rdf_fdw_image
 $ docker exec -u postgres my_container psql -d mydatabase -c "CREATE EXTENSION rdf_fdw;"

About

PostgreSQL Foreign Data Wrapper for RDF Triplestores

Resources

License

Stars

Watchers

Forks

Packages

No packages published

Languages