Home
The Mapper is an online service for mapping datasets to a predefined standard. The standard must be expressed as a star schema with a single fact table.
The aim of this service is to help harmonize data from different sources. The output of the Mapper is a set of files:
- One file with the list of columns for each file mapped
- A JSON file containing all the mapped entities and fields
- A JSON file containing all the mapped field values
The Mapper can be used as part of the software federation framework to federate data from heterogeneous data sources. The Mapper source code can be downloaded [here](software federation framework).
Consider a biobank federation project that makes available for research all the samples that can be shared by the biobanks in a given region. Each biobank in the region has its own idiosyncratic semantics and its own data management system for resources and data. To share the available samples, the biobanks are asked to provide CSV files: one file with sample data, one file with sample collection information, and one file with contact person information for the sample collections. Each biobank can provide the requested files, but the column names and values are those generated by its local data management system. As a result, datasets from different biobanks can differ considerably from each other in column order, column names, number of columns, and values. To deal with this problem, the federation project defines the semantics of the federation and provides a standard in two CSV files:
Each biobank uses the Mapper service and provides the federation framework with the results of the mapping. The software federation framework uses the Mapper results to harmonize the datasets from each biobank and provides a unified query tool to search for samples across all the biobanks in the federation.
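
The harmonization step described above can be illustrated with a minimal sketch. The mapping layout used here is an assumption made only for illustration: the actual structure of the JSON files produced by the Mapper is not documented on this page and may differ. The column names, codes, and the `harmonize` helper are all hypothetical.

```python
import csv
import io

# Hypothetical field mapping: local column name -> standard field name.
# The real Mapper JSON layout may differ; this is only an assumed shape.
field_mapping = {
    "SampleID": "sample_id",
    "Sex": "gender",
    "Material": "sample_type",
}

# Hypothetical value mapping: standard field -> {local value -> standard value}.
value_mapping = {
    "gender": {"M": "male", "F": "female"},
    "sample_type": {"SER": "serum", "PLA": "plasma"},
}


def harmonize(rows, field_mapping, value_mapping):
    """Rename mapped columns and translate mapped values; drop unmapped columns."""
    harmonized = []
    for row in rows:
        out = {}
        for local_name, value in row.items():
            standard_name = field_mapping.get(local_name)
            if standard_name is None:
                continue  # column has no counterpart in the standard
            # Values without an explicit mapping are passed through unchanged.
            translations = value_mapping.get(standard_name, {})
            out[standard_name] = translations.get(value, value)
        harmonized.append(out)
    return harmonized


# A small CSV export as one biobank's local system might produce it.
local_csv = (
    "SampleID,Sex,Material,Freezer\n"
    "B1-001,F,SER,F12\n"
    "B1-002,M,PLA,F07\n"
)

rows = list(csv.DictReader(io.StringIO(local_csv)))
result = harmonize(rows, field_mapping, value_mapping)
for record in result:
    print(record)
```

In this sketch the local `Freezer` column is dropped because it has no counterpart in the assumed standard, and unmapped values pass through unchanged. A real federation would need to decide how to handle both cases.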