roxmer edited this page Mar 13, 2017 · 20 revisions

Welcome to the Mapper wiki!

Introduction

The Mapper is an online service for mapping datasets to a predefined standard. The standard follows a star schema model with a single fact table. The service aims to help harmonize data from different sources. The Mapper outputs a set of files:

  • One file with the list of columns for each file mapped
  • A JSON file containing all the mapped entities and fields
  • A JSON file containing all the mapped field values
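To make the star schema idea concrete, here is a minimal Python sketch of how a standard with a single fact table and a few dimension tables could be represented. The entity names (`sample`, `collection`, `contact`) and all column names are hypothetical, chosen only for illustration; they are not the Mapper's actual standard.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Table:
    """One entity of the standard: a name plus its column names."""
    name: str
    columns: List[str]


@dataclass
class StarSchema:
    """A star schema: exactly one fact table plus its dimension tables."""
    fact: Table
    dimensions: List[Table] = field(default_factory=list)

    def all_fields(self) -> Set[str]:
        """Return every 'entity.column' pair defined by the standard."""
        tables = [self.fact, *self.dimensions]
        return {f"{t.name}.{c}" for t in tables for c in t.columns}


# Hypothetical standard: the fact table references each dimension directly.
standard = StarSchema(
    fact=Table("sample", ["sample_id", "collection_id", "contact_id", "material_type"]),
    dimensions=[
        Table("collection", ["collection_id", "name"]),
        Table("contact", ["contact_id", "email"]),
    ],
)
```

The set returned by `standard.all_fields()` is the kind of target list that source columns would be mapped onto.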

Example Use Case

A biobank federation project aims to make available for research all the samples that the biobanks in a given region can share. Each biobank in the region has its own idiosyncratic semantics and its own data management system for managing resources and data. To share their available samples, the biobanks are asked to provide CSV files: one file with sample data, one file with sample collection information, and one file with contact person information for the sample collections. Each biobank can provide the requested files, but the column names and values are those generated by its local data management system. As a result, datasets from different biobanks can differ considerably from one another. To deal with this problem, the federation project defines the semantics of the federation and provides a standard in two CSV files.

Each biobank uses the Mapper service and provides the federation framework with the results of the mapping. The federation's ETL tool will use the Mapper results to harmonize the datasets from each biobank and will provide a unified query tool for searching for samples across all the biobanks in the federation.
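The harmonization step can be sketched in Python as follows. This is only an illustration under stated assumptions: the JSON layouts, the column names (`SampleNo`, `Type`, `Freezer`), the target field names, and the `harmonize` function are all hypothetical, and the real Mapper output files may be structured differently.

```python
import csv
import io
import json

# Hypothetical Mapper outputs; the real JSON layout may differ.
FIELD_MAPPING_JSON = '{"SampleNo": "sample.sample_id", "Type": "sample.material_type"}'
VALUE_MAPPING_JSON = '{"sample.material_type": {"bl": "blood", "ser": "serum"}}'


def harmonize(csv_text, field_map, value_map):
    """Rename mapped columns and translate mapped values; drop unmapped columns."""
    rows = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        out = {}
        for column, value in row.items():
            target = field_map.get(column)
            if target is None:
                continue  # column was not mapped to the standard
            # Fall back to the raw value when no value mapping exists.
            out[target] = value_map.get(target, {}).get(value, value)
        rows.append(out)
    return rows


field_map = json.loads(FIELD_MAPPING_JSON)
value_map = json.loads(VALUE_MAPPING_JSON)

# A biobank's local export, using its own column names and codes.
biobank_csv = "SampleNo,Type,Freezer\n17,bl,F3\n18,plasma,F1\n"
harmonized = harmonize(biobank_csv, field_map, value_map)
```

In this sketch, unmapped columns are dropped and values without a mapping pass through unchanged. A real ETL tool might instead reject or flag such rows.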
