Maven Projects Dependencies

Morph-xR2RML consists of the following projects. Dependencies are shown top-down, i.e. the project at the top is the root on which all others depend.

                morph-core
                    |
             morph-xr2rml-lang
                    |
                morph-base
                  /    \
  morph-xr2rml-mongo   morph-xr2rml-rdb
               |         |
               \         /
            morph-xr2rml-dist

morph-core: provides the main global definitions: constants, utility classes (properties, exceptions, RDF and XML manipulation and serialization), mixed-syntax path utilities, and SQL machinery (which would be better off in the morph-xr2rml-rdb project, but this is legacy).

morph-xr2rml-lang: the model representing the xR2RML elements (extended from R2RML).

morph-base: abstract classes for major functions of the translation engine: runner factory, source reader, data translator, query translator etc., + various utility classes.

morph-xr2rml-mongo: implementation of the materialization and query rewriting engine for MongoDB.

morph-xr2rml-rdb: implementation of the materialization and query rewriting engine for SQL databases.

morph-xr2rml-dist: bundles the MongoDB and RDB engines into a single jar, along with a main class (MorphRunner) and example databases, mappings and engine configuration files.
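The dependency diagram can also be read as a build order. The minimal Python sketch below transcribes the edges of the diagram and derives one valid order with the standard library's `graphlib` module. It is only an illustration and is not part of the Maven build.

```python
from graphlib import TopologicalSorter

# Each project maps to the projects it depends on directly,
# transcribed from the dependency diagram.
DEPENDS_ON = {
    "morph-core": [],
    "morph-xr2rml-lang": ["morph-core"],
    "morph-base": ["morph-xr2rml-lang"],
    "morph-xr2rml-mongo": ["morph-base"],
    "morph-xr2rml-rdb": ["morph-base"],
    "morph-xr2rml-dist": ["morph-xr2rml-mongo", "morph-xr2rml-rdb"],
}


def build_order(graph):
    """Return the projects in an order where dependencies come first."""
    return list(TopologicalSorter(graph).static_order())


order = build_order(DEPENDS_ON)
print(order)
```

The relative order of morph-xr2rml-mongo and morph-xr2rml-rdb is not constrained, since neither depends on the other.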

Main objects

Configuration

Several configuration files are provided in project morph-xr2rml-dist for both the materialization and the query rewriting modes, and for both MySQL and MongoDB databases. The configuration file defaults to example_mysql/morph.properties.

Main objects factory

es.upm.fi.dia.oeg.morph.base.engine.IMorphFactory (project morph-base) serves as the provider of all main objects needed during the xR2RML processing, either in materialization or in query rewriting modes.

All those objects have an abstract version in project morph-base (package es.upm.fi.dia.oeg.morph.base) and a concrete implementation in projects morph-xr2rml-rdb and morph-xr2rml-mongo.

The es.upm.fi.dia.oeg.morph.base.engine.MorphBaseRunnerFactory (project morph-base) is an abstract implementation of IMorphFactory: it builds all main objects relying on database-specific implementations in MorphRDBRunnerFactory and MorphMongoRunnerFactory:

  • A properties object: es.upm.fi.dia.oeg.morph.base.MorphProperties holds members for each property definable in the configuration file.
  • A mapping document (R2RMLMappingDocument): the factory loads the xR2RML mapping file and creates an R2RMLMappingDocument that consists of a set of triples maps (R2RMLTriplesMap).
  • A database connection embedded into a GenericConnection object.
  • An unfolder (MorphBaseUnfolder) creates a database query from a triples map (see MorphRDBUnfolder and MorphMongoUnfolder).
  • A data source reader (MorphBaseDataSourceReader) provides methods to open, configure and close the database connection, run queries against the connection, and possibly manage cache strategies.
  • A data translator (MorphBaseDataTranslator) provides utility methods to translate database query results into RDF terms and RDF triples. In the data materialization case, it uses the data source reader to execute queries created by the unfolder.
  • A data materializer (MorphBaseMaterializer, in project morph-core) consists of a properly initialized Jena model (namespaces, etc.), either in memory or persisted. The model stores statements created from subjects, predicates and objects. It can serialize triples in different syntaxes (RDF/XML, Turtle, N-Triples, etc.).
  • A query translator (MorphBaseQueryTranslator) rewrites a SPARQL query into the target database query language.
  • A query result processor (MorphBaseQueryResultProcessor) runs the query, generated by the query translator, against the database connection and translates the results into a SPARQL response.
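The split between an abstract factory and database-specific factories can be sketched as follows. This is a minimal Python illustration of the pattern, not the project's code: the class names `BaseRunnerFactory`, `RdbRunnerFactory` and `MongoRunnerFactory`, the method names and the placeholder return values are all hypothetical.

```python
from abc import ABC, abstractmethod


class BaseRunnerFactory(ABC):
    """Plays the role of the abstract factory: it wires the shared objects
    and delegates the database-specific ones to subclasses."""

    def __init__(self, properties):
        self.properties = properties

    @abstractmethod
    def create_connection(self):
        """Return a database-specific connection object."""

    @abstractmethod
    def create_unfolder(self):
        """Return a database-specific unfolder."""

    def create_runner(self):
        # Shared wiring lives here; only the two abstract calls vary.
        return {
            "properties": self.properties,
            "connection": self.create_connection(),
            "unfolder": self.create_unfolder(),
        }


class RdbRunnerFactory(BaseRunnerFactory):
    def create_connection(self):
        return "sql-connection"    # placeholder for a real connection

    def create_unfolder(self):
        return "sql-unfolder"      # placeholder for an SQL unfolder


class MongoRunnerFactory(BaseRunnerFactory):
    def create_connection(self):
        return "mongo-connection"  # placeholder for a real connection

    def create_unfolder(self):
        return "mongo-unfolder"    # placeholder for a MongoDB unfolder


runner = MongoRunnerFactory({"mode": "materialization"}).create_runner()
print(runner["unfolder"])
```

Callers only depend on the base class, so supporting another database would mean adding one more subclass without touching the shared wiring.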

Execution entry point

fr.unice.i3s.morph.xr2rml.engine.MorphRunner (project morph-xr2rml-dist) provides the main class to run the process:

  • Load the configuration file and create the MorphProperties object,
  • Create the concrete instance of MorphBaseRunnerFactory whose fully qualified class name is provided in the configuration file (property runner_factory.class.name),
  • Get a MorphBaseRunner from the factory and run it.
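The three steps above can be sketched in Python as follows. Only the property name `runner_factory.class.name` comes from this document; the parser, the `Demo*` classes, the `demo.DemoRunnerFactory` value and the `FACTORIES` registry (which stands in for loading a class by name on the JVM) are assumptions made for illustration.

```python
def load_properties(text):
    """Parse 'key = value' lines, ignoring blank lines and '#' comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


class DemoRunner:
    """Hypothetical stand-in for a MorphBaseRunner."""

    def __init__(self, props):
        self.props = props

    def run(self):
        return "ran with " + self.props["runner_factory.class.name"]


class DemoRunnerFactory:
    """Hypothetical stand-in for a concrete MorphBaseRunnerFactory."""

    def __init__(self, props):
        self.props = props

    def create_runner(self):
        return DemoRunner(self.props)


# Registry standing in for class loading by name (an assumption).
FACTORIES = {"demo.DemoRunnerFactory": DemoRunnerFactory}


def main(config_text):
    props = load_properties(config_text)                          # step 1
    factory_cls = FACTORIES[props["runner_factory.class.name"]]   # step 2
    runner = factory_cls(props).create_runner()                   # step 3
    return runner.run()


CONFIG = """
# hypothetical configuration content
runner_factory.class.name = demo.DemoRunnerFactory
"""
print(main(CONFIG))
```

Because the factory is selected by name in the configuration, switching between database back ends requires no change to the entry point itself.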

Processing steps

Below we describe the class architecture for the processing of relational databases (RDBs). It is easily adapted to the case of MongoDB.

Materialization process

The description below refers to SQL queries, but the same process applies to MongoDB.

For each triples map (R2RMLTriplesMap) of the mapping document (R2RMLMappingDocument):

  • Unfold the triples map (MorphRDBUnfolder.unfoldTriplesMap): unfolding means progressively building an SQL query by accumulating pieces from different components of the triples map:
    • create the FROM clause with the logical table
    • for each column in the subject, predicate and object maps, add items to the SELECT clause
    • for each column in the parent triples map of each referencing object map, add items to the SELECT clause
    • for each join condition, add an SQL WHERE condition and an alias in the FROM clause for the parent table
    • for each column of each join condition, add items to the SELECT clause
  • Then the data translator (MorphRDBDataTranslator, a subclass of MorphBaseDataTranslator) runs the query against the database and builds triples from the results. For each row of the result set:
    • Create a subject resource, and optionally a graph resource if the subject map contains a rr:graph/rr:graphMap property.
    • Loop on each predicate-object map: create a list of resources for the predicates, a list of resources for the objects, a list of resources from the subject map of a parent object map in case there are referencing object maps, and a list of resources representing target graphs mentioned in the predicate-object map.
    • Finally, combine all subject, graph, predicate and object resources to generate triples. Once all triples have been created in the model, the materializer writes them to the output file. The materializer was initially implemented to write triples to the file as they were generated, thus avoiding memory issues. However, in the case of xR2RML, RDF lists and containers make it necessary to wait until all triples have been generated before serializing the data to a file.
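The unfold-then-translate loop described above can be sketched as follows. This is a simplified Python illustration, not the logic of MorphRDBUnfolder or MorphRDBDataTranslator: the dictionary layout of the triples map, the `child`/`parentN` aliases, the example tables and the `http://example.org/` IRIs are all assumptions. Graph maps are left out, and objects are plain values rather than RDF terms.

```python
import sqlite3


def unfold(tm):
    """Build an SQL query from a simplified triples map (a plain dict)."""
    select, from_items, where = [], [f"{tm['table']} AS child"], []

    def add(item):
        if item not in select:  # avoid duplicate SELECT items
            select.append(item)

    for col in tm["subject_columns"]:
        add(f"child.{col}")
    for i, pom in enumerate(tm["predicate_object_maps"]):
        if "object_column" in pom:
            add(f"child.{pom['object_column']}")
        if "parent" in pom:  # referencing object map: join with the parent
            parent, alias = pom["parent"], f"parent{i}"
            from_items.append(f"{parent['table']} AS {alias}")
            for col in parent["subject_columns"]:
                add(f"{alias}.{col}")
            child_col, parent_col = parent["join"]
            where.append(f"child.{child_col} = {alias}.{parent_col}")
            add(f"child.{child_col}")
            add(f"{alias}.{parent_col}")
    sql = f"SELECT {', '.join(select)} FROM {', '.join(from_items)}"
    if where:
        sql += " WHERE " + " AND ".join(where)
    return sql, select


def translate(tm, connection):
    """Run the unfolded query and build (subject, predicate, object) triples."""
    sql, select = unfold(tm)
    triples = []
    for row in connection.execute(sql):
        values = dict(zip(select, row))  # SELECT item -> value in this row
        subject = tm["subject_template"].format(
            **{c: values[f"child.{c}"] for c in tm["subject_columns"]})
        for i, pom in enumerate(tm["predicate_object_maps"]):
            if "object_column" in pom:
                obj = values[f"child.{pom['object_column']}"]
            else:  # object is the subject of the parent triples map
                parent = pom["parent"]
                obj = parent["subject_template"].format(
                    **{c: values[f"parent{i}.{c}"]
                       for c in parent["subject_columns"]})
            triples.append((subject, pom["predicate"], obj))
    return triples


TRIPLES_MAP = {
    "table": "student",
    "subject_template": "http://example.org/student/{id}",
    "subject_columns": ["id"],
    "predicate_object_maps": [
        {"predicate": "http://example.org/name", "object_column": "name"},
        {"predicate": "http://example.org/studies",
         "parent": {"table": "course",
                    "subject_template": "http://example.org/course/{id}",
                    "subject_columns": ["id"],
                    "join": ("course_id", "id")}},
    ],
}

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE course (id INTEGER, title TEXT);
    CREATE TABLE student (id INTEGER, name TEXT, course_id INTEGER);
    INSERT INTO course VALUES (10, 'Databases');
    INSERT INTO student VALUES (1, 'Ada', 10);
""")
triples = translate(TRIPLES_MAP, conn)
print(triples)
```

The sketch keeps all triples in a list before printing them, which mirrors the point made above: serialization happens only once every triple has been generated.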

Query rewriting approach

The SPARQL-to-SQL query rewriting is that of Morph-RDB; we do not describe it here. Below we focus on the SPARQL-to-MongoDB method.

The MorphBaseRunner orchestrates the process:

Query translation

MorphBaseQueryTranslator.translate() is the entry point of the process.

  1. The MorphBaseTriplePatternBinder.bindm method parses the SPARQL query and figures out the candidate triples maps for each of the triple patterns.
  2. The MorphMongoQueryTranslator translates the query into an AbstractQuery using helper methods of MorphBaseQueryTranslator: transTPm, genProjection, genCond etc. The abstract query is optimized by eliminating self-joins and self-unions and by propagating filters.
  3. Each atomic abstract query (fr.unice.i3s.morph.xr2rml.mongo.abstractquery.AbstractAtomicQuery) is translated into one or several MongoDB queries. If several queries are produced, their results must be UNIONed by the query evaluation engine (next step).
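To give an intuition of step 1, the Python sketch below binds each triple pattern to candidate triples maps by looking only at the predicate. This is a deliberate simplification and an assumption about the general idea, not the logic of MorphBaseTriplePatternBinder.bindm. The triples map names and the `ex:` predicates are hypothetical.

```python
def is_variable(term):
    """SPARQL variables are written with a leading '?' in this sketch."""
    return term.startswith("?")


def bind(triple_patterns, triples_maps):
    """Map each triple pattern to the names of its candidate triples maps.

    A triples map is a candidate when the pattern's predicate is a variable
    or is one of the constant predicates the triples map can produce.
    """
    bindings = {}
    for pattern in triple_patterns:
        _, predicate, _ = pattern
        bindings[pattern] = [
            name for name, predicates in triples_maps.items()
            if is_variable(predicate) or predicate in predicates
        ]
    return bindings


# Hypothetical triples maps, reduced to the predicates they generate.
TRIPLES_MAPS = {
    "TM_Student": {"ex:name", "ex:studies"},
    "TM_Course": {"ex:title"},
}

PATTERNS = [("?s", "ex:name", "?n"), ("?s", "?p", "?o")]
bindings = bind(PATTERNS, TRIPLES_MAPS)
print(bindings)
```

A pattern with a variable predicate is bound to every triples map, which is why narrowing the bindings early keeps the later abstract query small.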

Query evaluation

  1. The MorphMongoQueryResultProcessor runs the MongoDB query/queries against the database.
  2. The data translator (MorphBaseDataTranslator/MorphMongoDataTranslator) translates the results into triples according to the triples maps, and performs the operations not supported by MongoDB: joins and unions. This produces a temporary result graph.
  3. The MorphMongoQueryResultProcessor evaluates the SPARQL query against the result graph and serializes the result into the output file.
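Since joins and unions are computed outside MongoDB, step 2 amounts to combining result sets in memory. The Python sketch below shows one way to do this on lists of dictionaries standing in for query results. The function names, the field names and the choice of a hash join are assumptions made for illustration, not the engine's implementation.

```python
def union(*result_sets):
    """Concatenate result sets. Duplicates are kept (bag semantics); whether
    the engine removes duplicates is not stated in this document."""
    return [doc for results in result_sets for doc in results]


def join(left, right, left_key, right_key):
    """Hash join of two lists of dicts on left[left_key] == right[right_key]."""
    index = {}
    for doc in right:
        index.setdefault(doc[right_key], []).append(doc)
    # On a field name clash, the value from the right side wins.
    return [{**l, **r} for l in left for r in index.get(l[left_key], [])]


# Results of two hypothetical MongoDB queries for the same atomic query.
students_a = [{"name": "Ada", "course_id": 10}]
students_b = [{"name": "Bob", "course_id": 20}]
# Results of a hypothetical query for the other side of the join.
courses = [{"id": 10, "title": "Databases"}]

joined = join(union(students_a, students_b), courses, "course_id", "id")
print(joined)
```

Only Ada's document survives the join here, because no course with id 20 was returned.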