RKI Metadata Exchange | Collection of ETL-pipelines to get metadata from various sources, transform it into a common format and load it into configurable sinks.

robert-koch-institut/mex-extractors

MEx extractors

ETL pipelines for the RKI Metadata Exchange.

Project

The Metadata Exchange (MEx) project is committed to improving the retrieval of RKI research data and projects. How? By focusing on metadata: instead of providing the actual research data directly, the MEx metadata catalog captures descriptive information about research data and activities. On this basis, we want to make the data FAIR¹ so that it can be shared with others.

Via MEx, metadata will be made findable, accessible and shareable, as well as available for further research. The goal is to get an overview of what research data is available, understand its context, and know what needs to be considered for subsequent use.

RKI cooperated with D4L data4life gGmbH for a pilot phase where the vision of a FAIR metadata catalog was explored and concepts and prototypes were developed. The partnership has ended with the successful conclusion of the pilot phase.

After an internal launch, the metadata will also be made publicly available, so that external researchers and the interested (professional) public can find research data from the RKI as well.

For further details, please consult our project page.

Contact
For more information, please feel free to email us at [email protected].

Publisher

Robert Koch-Institut
Nordufer 20
13353 Berlin
Germany

Package

The mex-extractors package implements a variety of ETL pipelines that extract metadata from primary data sources using a range of different technologies and protocols. Then, we transform the metadata into a standardized format using models provided by mex-common. The last step in this process is to load the harmonized metadata into a sink (file output, API upload, etc.).
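
The three ETL stages described above can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the package's actual implementation: `SourceRecord`, `HarmonizedResource`, `extract`, `transform` and `load` are hypothetical names, the harmonized dataclass merely stands in for the mex-common models, and the sink is assumed to be a newline-delimited JSON file.

```python
import json
import tempfile
from dataclasses import asdict, dataclass
from pathlib import Path
from typing import Dict, Iterable, Iterator, List


# Hypothetical raw record, as a primary source (Excel, JSON, API) might deliver it
@dataclass
class SourceRecord:
    identifier: str
    title: str
    contact: str


# Hypothetical stand-in for a harmonized model; the real models come from mex-common
@dataclass
class HarmonizedResource:
    identifier_in_primary_source: str
    title: str
    contact: List[str]


def extract(rows: Iterable[Dict[str, str]]) -> Iterator[SourceRecord]:
    """Extract: turn raw source rows into typed, whitespace-trimmed records."""
    for row in rows:
        yield SourceRecord(
            identifier=row["id"].strip(),
            title=row["title"].strip(),
            contact=row.get("contact", ""),
        )


def transform(records: Iterable[SourceRecord]) -> Iterator[HarmonizedResource]:
    """Transform: map source-specific fields onto the common target format."""
    for record in records:
        # the source is assumed to store several contacts in one ";"-separated cell
        contacts = [part.strip() for part in record.contact.split(";") if part.strip()]
        yield HarmonizedResource(
            identifier_in_primary_source=record.identifier,
            title=record.title,
            contact=contacts,
        )


def load(items: Iterable[HarmonizedResource], path: Path) -> int:
    """Load: write each item as one JSON line to a file sink and return the count."""
    count = 0
    with path.open("w", encoding="utf-8") as handle:
        for item in items:
            handle.write(json.dumps(asdict(item)) + "\n")
            count += 1
    return count


# Example run: an in-memory "source" and a temporary output file as the sink
raw_rows = [{"id": " p-1 ", "title": "Survey A", "contact": "a@example.org; b@example.org"}]
with tempfile.TemporaryDirectory() as tmp_dir:
    out_path = Path(tmp_dir) / "items.ndjson"
    written = load(transform(extract(raw_rows)), out_path)
    first_item = json.loads(out_path.read_text(encoding="utf-8").splitlines()[0])
```

Because `extract` and `transform` are generators, records stream from source to sink without being held in memory all at once, and swapping the file sink for an API upload would leave the first two stages untouched.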

License

This package is licensed under the MIT license. All other software components of the MEx project are open-sourced under the same license as well.

Development

Installation

Linting and testing

  • run all linters with pdm lint
  • run only unit tests with pdm unit
  • run unit and integration tests with pdm test

Updating dependencies

  • update boilerplate files with cruft update
  • update global requirements in requirements.txt manually
  • update git hooks with pre-commit autoupdate
  • update package dependencies using pdm update-all
  • update GitHub Actions in .github/workflows/*.yml manually

Creating release

  • run pdm release RULE to release a new version, where RULE is one of major, minor or patch and determines which part of the version is updated.
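
Assuming the version follows semantic versioning (MAJOR.MINOR.PATCH), which the three rule names suggest but the README does not state, the effect of each RULE can be sketched as below. `bump_version` is a hypothetical helper for illustration, not the code that `pdm release` actually runs.

```python
def bump_version(version: str, rule: str) -> str:
    """Hypothetical semver bump: increment one part and reset the parts after it."""
    # unpacking into exactly three names raises ValueError for malformed versions
    major, minor, patch = (int(part) for part in version.split("."))
    if rule == "major":
        return f"{major + 1}.0.0"
    if rule == "minor":
        return f"{major}.{minor + 1}.0"
    if rule == "patch":
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"RULE must be one of major, minor, patch; got {rule!r}")


examples = {rule: bump_version("1.2.3", rule) for rule in ("major", "minor", "patch")}
# → {'major': '2.0.0', 'minor': '1.3.0', 'patch': '1.2.4'}
```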

Container workflow

  • build the image with make image
  • run it directly with Docker using make run
  • start it with Docker Compose using make start

Commands

  • run pdm run {command} --help to print instructions
  • run pdm run {command} --debug for interactive debugging

dagster

  • pdm run dagster dev to launch a local Dagster UI

all extractors

  • pdm run all-extractors executes all extractors
  • execute only in a local or dev environment
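
The restriction to local or dev environments could be enforced with a guard that runs before any extractor starts. The sketch below assumes a hypothetical `MEX_ENV` environment variable and a hypothetical registry of extractor callables; it is not how `pdm run all-extractors` is actually wired.

```python
import os
from typing import Callable, Dict, List, Optional

# Hypothetical set of environments in which running everything is allowed
ALLOWED_ENVIRONMENTS = frozenset({"local", "dev"})


def current_environment(env: Optional[str] = None) -> str:
    """Use an explicit value if given, else the hypothetical MEX_ENV variable.

    Defaulting to "prod" when nothing is set makes the guard fail safe.
    """
    value = env if env is not None else os.environ.get("MEX_ENV", "prod")
    return value.strip().lower()


def run_all_extractors(
    extractors: Dict[str, Callable[[], int]], env: Optional[str] = None
) -> List[str]:
    """Run each registered extractor in registration order, local/dev only."""
    environment = current_environment(env)
    if environment not in ALLOWED_ENVIRONMENTS:
        raise RuntimeError(f"refusing to run all extractors in {environment!r}")
    summary = []
    for name, extractor in extractors.items():
        loaded = extractor()  # each hypothetical extractor returns its item count
        summary.append(f"{name}: {loaded}")
    return summary


# Hypothetical registry with dummy extractors standing in for the real pipelines
registry: Dict[str, Callable[[], int]] = {
    "artificial": lambda: 3,
    "organigram": lambda: 5,
}
dev_summary = run_all_extractors(registry, env="dev")
```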

artificial extractor

  • pdm run artificial creates deterministic artificial sample data
  • execute only in a local or dev environment
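
Deterministic artificial data is commonly produced by drawing from a pseudo-random generator with a fixed seed, so that every run yields identical output. The following sketch assumes that approach; `ArtificialPerson`, `generate_people` and the name lists are hypothetical and unrelated to the actual models the artificial extractor produces.

```python
import random
from dataclasses import dataclass
from typing import List


# Hypothetical sample model; the real extractor emits mex-common models instead
@dataclass(frozen=True)
class ArtificialPerson:
    identifier: str
    given_name: str
    family_name: str


GIVEN_NAMES = ["Anna", "Ben", "Clara", "David"]
FAMILY_NAMES = ["Becker", "Fischer", "Schmidt", "Wagner"]


def generate_people(count: int, seed: int = 42) -> List[ArtificialPerson]:
    """Create `count` fake people; the same seed always yields the same list."""
    # a private Random instance keeps the sequence independent of global random state
    rng = random.Random(seed)
    return [
        ArtificialPerson(
            identifier=f"person-{index:04d}",
            given_name=rng.choice(GIVEN_NAMES),
            family_name=rng.choice(FAMILY_NAMES),
        )
        for index in range(count)
    ]


first_run = generate_people(5)
second_run = generate_people(5)
```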

biospecimen extractor

  • pdm run biospecimen extracts sources from the Biospecimen Excel files

blueant extractor

  • pdm run blueant extracts sources from the Blue Ant project management software

confluence-vvt extractor

  • pdm run confluence-vvt extracts sources from the VVT Confluence page

datscha-web extractor

  • pdm run datscha-web extracts sources from the datscha web app

ff-projects extractor

  • pdm run ff-projects extracts sources from the FF Projects Excel file

ifsg extractor

  • pdm run ifsg extracts sources from the ifsg database

international-projects extractor

  • pdm run international-projects extracts sources from the international projects Excel file

grippeweb extractor

  • pdm run grippeweb extracts grippeweb metadata from the grippeweb database

odk extractor

  • pdm run odk extracts ODK survey data from Excel files

organigram extractor

  • pdm run organigram extracts organizational units from a JSON file
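
One way to picture the organigram extraction is to read a list of units from JSON and resolve each unit's chain of parents. The JSON layout here (objects with `id`, `name` and an optional `parent`) and the names `OrganizationalUnit`, `parse_units` and `unit_path` are invented for illustration; the actual organigram file and the mex-common models may look quite different.

```python
import json
from dataclasses import dataclass
from typing import Dict, List, Optional


# Hypothetical unit record; "parent" holds the id of the enclosing unit, if any
@dataclass(frozen=True)
class OrganizationalUnit:
    identifier: str
    name: str
    parent: Optional[str]


def parse_units(raw_json: str) -> Dict[str, OrganizationalUnit]:
    """Parse a JSON list of unit objects into a lookup table keyed by id."""
    units: Dict[str, OrganizationalUnit] = {}
    for entry in json.loads(raw_json):
        unit = OrganizationalUnit(entry["id"], entry["name"], entry.get("parent"))
        units[unit.identifier] = unit
    return units


def unit_path(units: Dict[str, OrganizationalUnit], identifier: str) -> List[str]:
    """Return unit names from the top of the hierarchy down to `identifier`."""
    path: List[str] = []
    seen = set()
    current = units.get(identifier)
    while current is not None:
        # guard against malformed input where units point at each other
        if current.identifier in seen:
            raise ValueError(f"cycle in organigram at {current.identifier!r}")
        seen.add(current.identifier)
        path.append(current.name)
        current = units.get(current.parent) if current.parent else None
    return list(reversed(path))


sample = json.dumps([
    {"id": "rki", "name": "RKI"},
    {"id": "dept-1", "name": "Department 1", "parent": "rki"},
    {"id": "unit-11", "name": "Unit 11", "parent": "dept-1"},
])
units = parse_units(sample)
path = unit_path(units, "unit-11")
```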

rdmo extractor

  • pdm run rdmo extracts sources from RDMO using its REST API

seq-repo extractor

  • pdm run seq-repo extracts sources from the seq-repo JSON file

sumo extractor

  • pdm run sumo extracts sumo data from xlsx files

synopse extractor

  • pdm run synopse extracts synopse data from report-server exports

voxco extractor

  • pdm run voxco extracts voxco data from voxco JSON files

Footnotes

  1. FAIR refers to the so-called FAIR data principles, guidelines to make data Findable, Accessible, Interoperable and Reusable.
