roxmer edited this page Jul 20, 2017 · 20 revisions

Welcome to the Mapper wiki!

Introduction

The Mapper is an online service for mapping datasets to a predefined standard. The standard should be represented as a star schema with a single fact table.

The aim of this service is to help harmonize data from different sources.

Input:

  • Standard files for entities-attributes and for attribute values
  • Datasets to be harmonized (csv files with header included)

Output:

  • A file listing the columns of each mapped file
  • A JSON file containing all the mapped entities and fields
  • A JSON file containing all the mapped field values
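The input datasets must include a header row, and the first output lists the columns of each mapped file. A minimal Python sketch of that header-reading step follows, assuming each file is available as CSV text. The function name `list_columns` and its dict-of-lists result are hypothetical and do not describe the Mapper's actual output format.

```python
import csv
import io
import json
from typing import Dict, List


def list_columns(files: Dict[str, str]) -> Dict[str, List[str]]:
    """Return the header columns of each CSV file, keyed by file name.

    `files` maps a file name to its CSV text. The first row is read as the
    header, matching the requirement that input files include a header.
    (Hypothetical helper for illustration, not part of the Mapper.)
    """
    columns: Dict[str, List[str]] = {}
    for name, text in files.items():
        reader = csv.reader(io.StringIO(text))
        header = next(reader, [])  # empty list if the file has no rows
        columns[name] = [col.strip() for col in header]
    return columns


if __name__ == "__main__":
    # Illustrative biobank export; the column names are made up.
    samples = "SampleID,Tissue,Vol_ml\nS1,blood,2.5\n"
    print(json.dumps(list_columns({"samples.csv": samples}), indent=2))
```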

The Mapper can be used as part of the software federation framework to federate data from heterogeneous data sources. The Mapper source code can be downloaded here.

Example Use Case

A biobank federation project will make available for research all the samples that biobanks in a given region can share. Each biobank in the region has its own idiosyncratic semantics and its own data management system for resources and data. To share their available samples, the biobanks are asked to provide three csv files: one with sample data, one with sample collection information, and one with contact person information for the sample collections. Each biobank can provide the requested files, but the column names and values are those generated by its local data management system. As a result, datasets from different biobanks can differ considerably from one another in column order, column names, number of columns, and values. To deal with this problem, the federation project defines the semantics of the federation and provides a standard in two csv files: one for entities and attributes, and one for attribute values.

Each biobank uses the Mapper service and provides the federation framework with the results of the mapping. The software federation framework will use the Mapper results to harmonize the datasets from each biobank and will provide a unified query web interface to search for samples across all the biobanks in the federation.
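The harmonization step described above can be sketched in Python. This is a minimal sketch, not the federation framework's implementation: it assumes the mapping results have already been loaded into two dictionaries, a hypothetical `field_map` (local column name to standard field) and `value_map` (standard field to a table of local value to standard value). The actual layout of the Mapper's JSON output files is not described on this page.

```python
import csv
import io
from typing import Dict, List


def harmonize(
    csv_text: str,
    field_map: Dict[str, str],
    value_map: Dict[str, Dict[str, str]],
) -> List[Dict[str, str]]:
    """Rename local columns to standard fields and translate their values.

    field_map: local column name -> standard field name (hypothetical layout).
    value_map: standard field name -> {local value -> standard value}.
    Columns missing from field_map are dropped; values with no entry in
    value_map are passed through unchanged.
    """
    rows: List[Dict[str, str]] = []
    for record in csv.DictReader(io.StringIO(csv_text)):
        out: Dict[str, str] = {}
        for local_col, value in record.items():
            std_field = field_map.get(local_col)
            if std_field is None:
                continue  # column is not mapped to the standard
            # Translate the value if a mapping exists, otherwise keep it.
            out[std_field] = value_map.get(std_field, {}).get(value, value)
        rows.append(out)
    return rows


if __name__ == "__main__":
    # Illustrative local export and mapping; all names are made up.
    local = "ID,Type\nB1-001,Blut\nB1-002,Serum\n"
    fields = {"ID": "Sample.id", "Type": "Sample.material_type"}
    values = {"Sample.material_type": {"Blut": "blood", "Serum": "serum"}}
    for row in harmonize(local, fields, values):
        print(row)
```

Passing unmapped values through unchanged keeps the sketch lossless. A stricter federation framework might instead reject or flag values that have no counterpart in the standard.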

Future improvements:

  • Add a tab to select the standard, so that a single central Mapper can serve several defined standards.
  • Add a tab for data transformation, for instance to convert values from one unit to another (volume, speed, etc.).

Please take a look at the User Guide to learn how to map data to a given standard:

Mapper User Guide
