Opal-DataSHIELD ecosystem deployment and usage for the MDR-RA project - External Partners Server Setup
This document outlines the deployment process, configuration, initialization, and usage of the OBiBa Opal-DataSHIELD ecosystem for external partners, as required by the MDR-RA project. The deployment relies on Docker, a containerization platform, so that the ecosystem is portable, reproducible, and easy to set up across environments.
Overview of the Opal-DataSHIELD Architecture
The Opal-DataSHIELD ecosystem is designed as a federated system, where a client communicates with multiple Opal servers to enable secure and privacy-preserving analysis of sensitive or distributed data. This architecture ensures that individual-level data remains at its source, mitigating privacy risks. Instead of sharing raw data, only aggregate or non-disclosive information is exchanged between the servers and the client, as illustrated in the figure below.
For detailed documentation, refer to the appropriate sections of this document.
0 - Quick Deployment
1 - System deployment
  1.1 - Minimum Hardware Specifications
  1.2 - Prerequisites
    1.2.1 - Operating System Requirements
    1.2.2 - Docker engine deployment
    1.2.3 - Make installation
  1.3 - Deploying Opal-DataSHIELD ecosystem
    1.3.1 - Downloading and cloning the repository
    1.3.2 - Prerequisites for ecosystem deployment
    1.3.3 - Ecosystem deployment
2 - Quick Opal-DataSHIELD ecosystem test
3 - Working with Opal-DataSHIELD ecosystem
  3.1 - Log In and Test the Opal-DataSHIELD Web Interface
  3.2 - Create a New User Profile (Administrator-Only Task)
  3.3 - Enable Two-Factor Authentication (2FA)
  3.4 - Change User Permissions
    3.4.1 - Change System Permissions (Administrator Only)
    3.4.2 - Change Project Permissions
    3.4.3 - Change Table Permissions
    3.4.4 - Change DataSHIELD Permissions (Administrator Only)
  3.5 - Manage Folders and Files
  3.6 - Create and Manage Projects
  3.7 - Basic DataSHIELD Client Authentication and Data Access
4 - System Security Management
5 - Support
6 - Credits
Before proceeding with the ecosystem deployment, ensure that you meet all the hardware requirements and prerequisites. If you're uncertain whether your setup complies with these recommendations, review the respective sections to verify and make any necessary adjustments before continuing.
Step 1: Clone the repository
Begin by downloading the repository to access the necessary deployment scripts and resources. Use the following command to clone the repository:
git clone https://github.com/InfOmics/MDR-RA-Opal-DataSHIELD-documentation.git
Step 2: Navigate to the Repository Directory
Once cloned, navigate to the directory containing the repository files:
cd MDR-RA-Opal-DataSHIELD-documentation
Step 3: Obtain SSL certificates
Move or copy the obtained SSL certificates to the certs directory within your project folder:
cp mycert.pem certs/ # or mv mycert.pem certs/
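Before copying, it can help to confirm that the files are PEM-encoded, as the .pem file names used in this guide suggest. The sketch below only looks for the standard PEM header lines; it does not validate the certificate itself. The helper names is_pem_certificate and is_pem_private_key are hypothetical and are not part of the repository.

```shell
# Hypothetical helpers (not part of the repository).
# Each one succeeds if FILE contains the expected PEM header line.
is_pem_certificate() {
  grep -q -- '-----BEGIN CERTIFICATE-----' "$1"
}

is_pem_private_key() {
  # Matches "BEGIN PRIVATE KEY", "BEGIN RSA PRIVATE KEY", "BEGIN EC PRIVATE KEY", ...
  grep -Eq -- '-----BEGIN ([A-Z]+ )?PRIVATE KEY-----' "$1"
}
```

For example, `is_pem_certificate mycert.pem && cp mycert.pem certs/` copies the file only when the header check passes.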
Step 4: Configure Deployment Environment Variables
Navigate to the project folder and set up the environment variables required for deployment. Open MDR_RA.env and modify the following line, replacing the placeholder myDomainAddress with your actual domain name (e.g. my.domain.com):
IP_DOMAIN=my.domain.com # replaces IP_DOMAIN=myDomainAddress
Save and close the file.
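If you prefer to make this change from the terminal rather than in an editor, the edit can be scripted. This is a minimal sketch, assuming MDR_RA.env uses plain KEY=value lines as in the example above; set_env_value is a hypothetical helper name, and the rewritten line loses any trailing comment.

```shell
# Hypothetical helper (not part of the repository).
# Usage: set_env_value FILE KEY VALUE
# Replaces an existing KEY=... line, or appends KEY=VALUE if the key is absent.
# Assumes VALUE contains no '|', '&' or '\' characters (true for domain names).
set_env_value() {
  local file="$1" key="$2" value="$3"
  if grep -q "^${key}=" "$file"; then
    # GNU sed in-place edit, as available on Ubuntu
    sed -i "s|^${key}=.*|${key}=${value}|" "$file"
  else
    printf '%s=%s\n' "$key" "$value" >> "$file"
  fi
}
```

For example, `set_env_value MDR_RA.env IP_DOMAIN my.domain.com` sets the domain in one step.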
Step 5: Deploy the ecosystem
With the SSL certificates in place and environment variables configured, you're ready to deploy. Use the provided make command:
make deploy
This command will:
- Validate the environment setup.
- Configure the services to use the SSL certificates.
- Start the deployment process.
For a quick post-deployment verification, see section 2.
This section provides a comprehensive guide for deploying the OBiBa Opal-DataSHIELD ecosystem on a Linux environment. The deployment process will be discussed in detail, including prerequisites such as the required operating system (OS) and recommended version, as well as the installation of necessary external dependencies. For a smooth setup, Docker and Docker Compose are essential components, enabling efficient container management and orchestration for DataSHIELD services. In the sections below, we will cover each prerequisite and guide you through the installation and configuration steps needed to prepare your environment for deploying the Opal-DataSHIELD ecosystem.
To ensure optimal performance and stability, the following minimum hardware specifications are recommended:
- Memory (RAM): 32 GB or higher
- Storage: 500 GB HDD or SSD (SSD preferred for faster read/write operations)
- CPU: 16 cores or more
- GPU (Optional): While not required, the system may benefit from a compatible GPU for accelerated processing, especially for model training and other compute-intensive tasks.
NOTE-1: These specifications are intended to handle large datasets and intensive computational processes. Lower configurations may experience reduced performance or stability issues.
NOTE-2: These specifications apply to both host and tenant machines, though each has specific needs:
- Host Machine: As the main data storage location, the host should prioritize having ample storage resources to manage extensive datasets effectively.
- Tenant Machine: Primarily responsible for sending instructions and processing tasks, the tenant should focus on higher computational resources to manage potentially high volumes of concurrent requests to the host.
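The minimums above can be compared against a machine from the terminal. The following is a rough sketch, assuming a Linux host with GNU coreutils; check_minimum and report_hardware are hypothetical helper names. The figures are approximations: the kernel reports slightly less memory than is installed (so RAM is rounded up), and the storage check only looks at the size of the root filesystem, which is smaller than the raw disk capacity.

```shell
# Hypothetical helpers (not part of the repository).
# Usage: check_minimum LABEL ACTUAL REQUIRED
check_minimum() {
  if [ "$2" -ge "$3" ]; then
    echo "OK: $1 = $2 (minimum $3)"
  else
    echo "LOW: $1 = $2 (minimum $3)"
    return 1
  fi
}

report_hardware() {
  local cores mem_gb disk_gb status=0
  cores=$(nproc)
  # MemTotal is in kB; convert to GiB and round up
  mem_gb=$(awk '/^MemTotal:/ { g = $2 / 1048576; printf "%d", (g == int(g)) ? g : int(g) + 1 }' /proc/meminfo)
  # Total size of the root filesystem in GiB
  disk_gb=$(df -BG --output=size / | tail -n 1 | tr -dc '0-9')
  check_minimum "CPU cores" "$cores" 16 || status=1
  check_minimum "RAM (GB)" "$mem_gb" 32 || status=1
  check_minimum "Root filesystem (GB)" "$disk_gb" 500 || status=1
  return "$status"
}
```

Running `report_hardware` prints one line per resource and returns a non-zero status if any value is below the recommended minimum.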
Before proceeding with the deployment of the OBiBa Opal-DataSHIELD ecosystem, ensure that your system meets the following prerequisites. Below, we describe each prerequisite and provide guidance on how to install them or troubleshoot any missing components:
- Ubuntu >= 22.04.3 LTS (Jammy Jellyfish)
- Ubuntu Terminal
- Docker >= 27.1.1
- Docker Compose
- make
To ensure optimal performance and compatibility, the OBiBa Opal-DataSHIELD ecosystem requires Ubuntu 22.04.3 LTS (Jammy Jellyfish) or a later version to operate correctly. This specific version of Ubuntu is recommended due to its long-term support and stability (LTS), providing a robust foundation for deploying the necessary services. Additionally, it is crucial to have access to an Ubuntu terminal, as this will be the primary interface for executing commands throughout the deployment process.
For users who need to obtain Ubuntu 22.04.3 LTS, the operating system can be downloaded from the official Ubuntu website. Follow the instructions to create a bootable USB drive or DVD, or consider installing it in a virtual machine if you prefer to run it alongside your existing operating system.
Once Ubuntu is installed, you can access the terminal by searching for "Terminal" in the applications menu or by using the keyboard shortcut Ctrl + Alt + T. The terminal will be the primary tool for executing the commands required for the deployment of the OBiBa Opal-DataSHIELD ecosystem.
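Once in the terminal, the installed release can be compared against the required 22.04.3. This is a minimal sketch, assuming the VERSION field of /etc/os-release starts with the point release (for example "22.04.3 LTS (Jammy Jellyfish)") and that GNU sort is available; version_ge and ubuntu_version are hypothetical helper names.

```shell
# Hypothetical helpers (not part of the repository).
# Usage: version_ge ACTUAL REQUIRED
# Succeeds if ACTUAL >= REQUIRED, using GNU sort's version ordering.
version_ge() {
  [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# Prints the first word of VERSION from /etc/os-release, e.g. "22.04.3".
ubuntu_version() {
  ( . /etc/os-release && printf '%s\n' "${VERSION%% *}" )
}
```

For example, `version_ge "$(ubuntu_version)" 22.04.3 && echo "OS version OK"` prints the message only on a sufficiently recent release.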
Docker is an open-source platform that enables developers to automate the deployment, scaling, and management of applications within lightweight, portable containers. Containers package an application and its dependencies, ensuring consistent performance across various environments, whether on local machines, virtual machines, or cloud infrastructure. By leveraging Docker, users can streamline development workflows, enhance collaboration, and maintain system integrity while deploying complex applications like the OBiBa Opal-DataSHIELD ecosystem. Its robust ecosystem also includes tools like Docker Compose, which simplifies the management of multi-container applications through a user-friendly configuration format.
Docker Compose is a powerful tool designed to simplify the management of multi-container Docker applications. It allows users to define and run multiple containers as a single application using a simple YAML configuration file, known as docker-compose.yml. This file specifies the services, networks, and volumes required for the application, along with their respective configurations and dependencies.
With Docker Compose, users can easily start, stop, and manage all containers with a single command, streamlining the development and deployment process. It also facilitates environment consistency, making it easier to replicate complex setups across different stages of development, testing, and production. This makes Docker Compose particularly useful for applications that require multiple interconnected services, such as databases, web servers, and application servers, enabling a seamless orchestration of the entire application stack.
Below is a description of the steps required to install Docker and Docker Compose on the machine designated for deploying the OBiBa Opal-DataSHIELD ecosystem. These instructions will guide you through the installation process, ensuring that your environment is correctly set up for optimal performance and functionality.
The deployment of the Opal-DataSHIELD ecosystem requires the installation of the Docker Engine on the host machine. For detailed instructions on installing the Docker Engine, you can refer to the official documentation at Docker Engine Installation.
NOTE-1: The following steps assume that the user has sudo privileges to execute the necessary commands for the installation process.
Before installing Docker, it is essential to remove any conflicting packages that might interfere with the installation. To uninstall any potential conflicts, run the following command:
for pkg in docker.io docker-doc docker-compose docker-compose-v2 podman-docker containerd runc; do sudo apt-get remove $pkg; done
NOTE-2: The apt-get command may report that none of these packages are installed.
NOTE-3: Older images, containers, volumes, and networks are stored in /var/lib/docker/ and are not automatically removed when uninstalling Docker. To start with a clean installation, consider cleaning up existing data by following the guidelines provided here.
Before proceeding with the installation of Docker Engine, you must set up the Docker repository. This allows you to install and update Docker directly from the repository.
To set up Docker's apt repository, execute the following commands:
# Add Docker's official GPG key:
sudo apt-get update
sudo apt-get install ca-certificates curl
sudo install -m 0755 -d /etc/apt/keyrings
sudo curl -fsSL https://download.docker.com/linux/ubuntu/gpg -o /etc/apt/keyrings/docker.asc
sudo chmod a+r /etc/apt/keyrings/docker.asc
# Add the repository to Apt sources:
echo \
"deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.asc] https://download.docker.com/linux/ubuntu \
$(. /etc/os-release && echo "$VERSION_CODENAME") stable" | \
sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
sudo apt-get update
Once the repository is set up, you can install the Docker packages using the following command:
sudo apt-get install docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin
To verify that Docker has been installed successfully, you can run the hello-world image with the following command:
sudo docker run hello-world
This command downloads a test image and executes it in a container. Upon running, the container will print a confirmation message indicating that the installation was successful, and then it will exit.
After successfully installing Docker, it is important to add your user to the Docker group. This allows the user to run Docker commands without needing to prefix them with sudo, enhancing convenience and streamlining workflow.
Use the following command to add your user to the Docker group, replacing <username> with your actual username:
sudo usermod -aG docker <username>
After executing the command, you need to log out and log back in for the changes to take effect. Alternatively, you can also restart your terminal session.
To verify that the user has been added to the Docker group, you can run:
groups <username>
This command will list the groups associated with your user account, and you should see "docker" included in the output.
By adding your user to the Docker group, you can now run Docker commands directly without needing elevated privileges, simplifying your interactions with the Docker ecosystem.
To verify that Docker Compose is installed and available on your system, run the following command:
docker compose version
The docker-compose-plugin package installed above provides Compose as a Docker subcommand (docker compose), not as the standalone docker-compose binary, which was removed together with the conflicting packages. This command will display the installed version of Docker Compose if it is correctly installed. If you receive an error or no version information, you may need to check your installation steps.
The make tool, commonly used in automation and build processes, is highly valuable for deploying Docker images, especially in systems like OBiBa Opal-DataSHIELD. In this setup, make simplifies deployment by handling repetitive commands through a concise set of instructions within a Makefile.
Three key make commands are defined for Docker deployment in Opal-DataSHIELD:
- make deploy: This command streamlines the initial deployment process, executing a series of Docker commands to build and run containers as specified in the Docker configuration.
- make up: This command brings the server online, handling container startup and initialization processes to ensure that the DataSHIELD server is ready for operations.
- make stop: This command halts the server, stopping all relevant containers and freeing resources.
By organizing these operations within make, deployment becomes simpler, faster, and more reliable, reducing manual command entry and mitigating risks of human error in deployment steps.
Below is a description of the steps required to install make on the machine designated for deploying the OBiBa Opal-DataSHIELD ecosystem. These instructions will guide you through the installation process, ensuring that your environment is correctly set up for optimal performance and functionality.
The deployment of the Opal-DataSHIELD ecosystem requires the installation of make on the host machine.
NOTE-1: The following steps assume that the user has sudo privileges to execute the necessary commands for the installation process.
Before installing make, refresh the package index with the following command:
sudo apt update
The make package is often included by default in Ubuntu distributions, so check whether it is already installed before proceeding:
make --version
If an error message is displayed, the package is not installed on the machine. To install make, type the following command:
sudo apt install make
To verify the installation, perform two checks. First, verify the make binary location by typing:
ls /usr/bin/make
If no error message is displayed, proceed with the second check:
make --version
If no error message is displayed, make has been correctly installed on your machine. Otherwise, you may need to check your installation steps.
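The individual checks in this prerequisites section can be summarized in one pass. The sketch below reports which of the command-line tools this guide relies on (git, make, docker) are found on the PATH; have_command is a hypothetical helper name, and the check says nothing about whether the installed versions are recent enough.

```shell
# Hypothetical helper (not part of the repository).
# Succeeds if the named command is available on the PATH.
have_command() {
  command -v "$1" >/dev/null 2>&1
}

# Report each tool used in this guide
for tool in git make docker; do
  if have_command "$tool"; then
    echo "found: $tool"
  else
    echo "missing: $tool"
  fi
done
```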
After successfully installing all the required dependencies for the Opal-DataSHIELD ecosystem, you are ready to begin the system deployment process. This section provides a comprehensive, step-by-step guide to deploying the ecosystem, ensuring a smooth and efficient setup.
Each step is designed to help you configure the necessary components, initialize the environment, and verify that the ecosystem is fully operational. By following these instructions, you will be able to deploy the Opal-DataSHIELD ecosystem effectively, enabling secure and privacy-preserving data analysis across distributed environments.
The first step in deploying the Opal-DataSHIELD ecosystem is to download and clone the repository containing all the necessary files and resources required for the setup. This repository includes configuration files, deployment scripts, documentation, and other essential components that will guide you through the deployment process.
Cloning the repository ensures that you have the latest version of the files, along with the proper directory structure, to streamline the setup. Ensure that Git is installed on your system before proceeding. If Git is not installed, refer to the official installation guide for the Ubuntu OS.
To download the repository, follow these steps:
- Open a terminal window.
  - On Ubuntu systems, you can find the terminal by searching for "Terminal" in the application menu.
  - Alternatively, use the keyboard shortcut Ctrl + Alt + T on Ubuntu to open the terminal directly.
- In the terminal, enter the following command to clone the repository:
git clone https://github.com/InfOmics/MDR-RA-Opal-DataSHIELD-documentation.git
This command will download the entire repository to your local machine.
- Once the cloning process is complete, navigate to the downloaded directory to view its contents using the following command:
cd MDR-RA-Opal-DataSHIELD-documentation
The downloaded repository will be organized as follows:
- certs/: Contains the SSL/TLS certificates required to enable HTTPS for secure communication between components.
- src/: Includes scripts and utilities essential for deploying and managing the ecosystem. These scripts automate various setup and maintenance tasks, ensuring a streamlined deployment process.
- Makefile: A Makefile defines the rules and targets for deploying, configuring, and managing the system. Using this file, you can execute deployment steps with simple commands, such as make deploy, make up or make stop.
- docker-compose.yml: The Docker Compose configuration file specifies the services, networks, and volumes required to deploy the Opal-DataSHIELD ecosystem. It orchestrates the setup of containers, ensuring they are properly connected and configured.
- traefik.yml: Configuration file for Traefik, the reverse proxy and load balancer used in the system. This file defines routing rules, SSL termination, and other settings to manage external access to the deployed services.
- MDR_RA.env: A .env file containing environment variables needed for deployment. These variables define configuration options such as service ports, authentication credentials, and paths to resources.
- LICENSE: Provides the licensing terms under which the repository can be used, modified, and distributed.
Before proceeding with the deployment of the Opal-DataSHIELD ecosystem, certain configurations must be in place to ensure the system is accessible and secure:
- Obtain SSL certificates
  To enable secure access to the ecosystem through the HTTPS protocol, you must first obtain valid SSL certificates. These certificates ensure encrypted communication between users and the ecosystem, safeguarding sensitive data. Be sure that certificates match the port and domain name designated for the ecosystem deployment.
  - If you do not already have SSL certificates, you can generate them or purchase them from a trusted certificate authority (CA).
  - Once obtained, place the certificates in the certs/ directory of the cloned repository.
    - Ensure that the files are named appropriately and match the configuration requirements outlined in Section 1.3.1 of the documentation.
    - Verify that the certificates are accessible and have the necessary file permissions to be used during deployment.
- Set Ports and IP address/domain
  The next step is to configure the ports and the IP address or domain name that the ecosystem will use to expose its services to the web.
  - Open the MDR_RA.env file located in the root directory of the repository.
  - Locate the section defining the Opal administrator password and replace the placeholder with the desired password. For example:
    OPAL_ADMINISTRATOR_PASSWORD=administrator # replace default value
  - Locate the sections defining the ports and replace the placeholders with the desired port numbers. For example:
    HTTP_PORT=80 # replace default value
    HTTPS_PORT=443 # replace default value
    TRAEFIK_PORT=8080 # replace default value
  - Locate the section defining the domain name and replace the placeholder with the desired domain name. This is crucial for generating URLs and ensuring services are reachable. For example:
    IP_DOMAIN=my.domain.com # replace default value IP_DOMAIN=myDomainAddress
  - Locate the section defining the file names of certificates and replace the placeholder with the desired file names. For example:
    CERT_FILE=fullchain.pem # replace default value
    KEY_FILE=privkey.pem # replace default value
  Refer to Section 3 in the Advanced Topics Documentation for detailed instructions on editing docker-compose.yml.
- Verify configuration
  - Ensure that the selected ports are open and not being used by other applications.
  - Verify that the IP address or domain name is correctly configured in your DNS settings, if applicable.
With the SSL certificates in place and the network settings configured, you are ready to proceed to the deployment phase. Proper setup of these elements is essential for ensuring a secure, accessible, and functional ecosystem.
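Part of this configuration can be checked from the terminal before deploying. The sketch below reads CERT_FILE and KEY_FILE from the environment file and confirms that the named files are readable in the certificates directory. It assumes the file uses KEY=value lines with optional trailing "# comment" text, as in the examples above; env_value and check_certs are hypothetical helper names, and the check does not inspect the certificates themselves.

```shell
# Hypothetical helpers (not part of the repository).
# Usage: env_value FILE KEY
# Prints the value of KEY, without trailing "# comment" text or whitespace.
env_value() {
  grep -E "^$2=" "$1" | tail -n 1 | cut -d= -f2- \
    | sed -e 's/[[:space:]]*#.*$//' -e 's/[[:space:]]*$//'
}

# Usage: check_certs ENV_FILE CERTS_DIR
# Succeeds only if both files named by CERT_FILE and KEY_FILE are readable.
check_certs() {
  local env_file="$1" certs_dir="$2" key name missing=0
  for key in CERT_FILE KEY_FILE; do
    name=$(env_value "$env_file" "$key") || name=""
    if [ -n "$name" ] && [ -r "$certs_dir/$name" ]; then
      echo "ok: $key -> $certs_dir/$name"
    else
      echo "missing: $key -> $certs_dir/$name"
      missing=1
    fi
  done
  return "$missing"
}
```

From the repository root, `check_certs MDR_RA.env certs` prints one line per file and returns a non-zero status if either is missing.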
Once all configurations have been properly set up, including SSL certificates, ports, and IP address/domain settings, you can proceed with the deployment of the Opal-DataSHIELD ecosystem. The deployment process is streamlined using the Makefile provided in the repository, which automates the required steps.
To start the deployment, open a terminal, navigate to the repository directory, and type the following command:
make deploy
The make deploy command initiates a sequence of operations that automate the deployment process:
- Execution of Deployment Script:
  - A script from the src/ directory is executed to prepare and configure the environment.
  - This script handles tasks such as setting up directories, configuring services, and verifying dependencies.
- Docker Image Creation:
  - The command builds the Docker images defined in the docker-compose.yml file.
  - Settings from the MDR_RA.env file are used to configure these images, ensuring the ecosystem is tailored to your specifications.
- Service Startup:
  - The Opal server is launched, making it accessible to external users through the configured HTTPS protocol.
- Integration of R Server (ROCK):
  - The ROCK Docker image, which provides an R environment for executing DataSHIELD analysis, is added to the ecosystem.
After deploying the system, you can verify the following to quickly ensure successful deployment:
- Opal Server Accessibility:
  - Open a web browser and navigate to the configured domain or IP address using HTTPS (e.g., https://your-domain.com).
  - Confirm that the Opal login page is visible and accessible.
- Logs and Errors:
  - Monitor the terminal output for any error messages during deployment.
  - If any issues occur, consult the logs/ directory or the documentation for troubleshooting steps.
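The accessibility check can also be run from the terminal, which is convenient right after deployment when the services may still be starting. The sketch below is a generic retry loop; retry is a hypothetical helper name, and the attempt count and delay are arbitrary choices rather than values taken from the repository.

```shell
# Hypothetical helper (not part of the repository).
# Usage: retry ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Runs COMMAND until it succeeds or ATTEMPTS is reached.
retry() {
  local attempts="$1" delay="$2" i=1
  shift 2
  while [ "$i" -le "$attempts" ]; do
    if "$@"; then
      return 0
    fi
    # Wait before the next attempt, but not after the last one
    if [ "$i" -lt "$attempts" ]; then
      sleep "$delay"
    fi
    i=$((i + 1))
  done
  return 1
}
```

For example, `retry 30 5 curl -fsS -o /dev/null https://your-domain.com && echo "Opal is reachable"` polls the server for up to about 150 seconds. If the deployment uses a self-signed certificate, curl needs the additional -k flag, which disables certificate verification and is suitable for testing only.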
With the ecosystem now deployed, you are ready to begin using Opal and DataSHIELD for secure and collaborative data analysis. For additional information on configuring users, datasets, or DataSHIELD analyses, refer to Section 3 in the documentation.
This section provides an introduction to the essential operations for managing the Opal-DataSHIELD ecosystem. It covers fundamental tasks such as logging in, creating and managing users, enabling two-factor authentication (2FA) for added security, and configuring user permissions.
To access the Opal web interface, open a browser and navigate to:
https://your-domain.com
Log in using the appropriate user credentials.
NOTE-1: For the initial deployment, log in as the administrator using the default username administrator and the password set in the MDR_RA.env configuration file during setup.
This step ensures the Opal web interface is functioning correctly and ready for further configuration.
To create a new user profile, follow these steps:
- Access the Administration page by clicking on the "Administration" tab in the Opal web interface.
- Navigate to the "Users and Groups" section.
- Click the "+ ADD" button to open the user creation form.
- Enter the required details for the new user, including a username and password.
- Save the changes to complete the user creation process.
This functionality is restricted to system administrators and ensures secure and controlled access to the Opal-DataSHIELD ecosystem.
To enhance your account's security by enabling Two-Factor Authentication (2FA), follow these steps:
- Log in to the Opal web interface using your credentials.
- Click on "My Profile" in the top-right menu to access your account settings.
- Scroll down to the "Two-Factor Authentication" section.
- Click the "Enable 2FA" button to start the setup process.
- Follow the on-screen instructions to complete the configuration, which may include scanning a QR code with an authenticator app and verifying the generated code.
Once enabled, 2FA adds an extra layer of protection, requiring a one-time code in addition to your password for future logins.
Managing user permissions in the Opal-DataSHIELD ecosystem allows administrators to control access to system features, projects, tables, and DataSHIELD operations. The following sections outline how to modify permissions at different levels.
3.4.1 Change System Permissions (Administrator Only)
- Access the Administration page.
- Navigate to the General Settings section.
- Click on the Permissions tab and then the + ADD button to assign new permissions.
- Enter the user's name in the input field.
- Select the desired permission level from the available options (e.g., read, write, admin).
- Click Submit to save changes.
3.4.2 Change Project Permissions
- Go to the Projects page and select the project for which permissions need to be modified.
- Open the Permissions tab within the selected project.
- Click the + ADD button to add a new permission.
- Enter the user's name and choose the appropriate permission level.
- Submit the changes to update the permissions.
3.4.3 Change Table Permissions
- Navigate to the Projects page and select the desired project.
- Choose the table within the project for which permissions need to be modified.
- Open the Permissions tab under the table settings.
- Click the + ADD button to assign permissions.
- Enter the user's name, select the permission type, and submit your changes.
3.4.4 Change DataSHIELD Permissions (Administrator Only)
- Access the Administration page and go to the DataSHIELD section.
- Open the Permissions tab.
- Click the + ADD button to assign DataSHIELD-specific permissions.
- Enter the user's name in the input field.
- Select the permission level (e.g., allow access to specific DataSHIELD operations).
- Save the changes by clicking Submit.
Efficiently organize and upload data into the user file system by creating folders and uploading files. Follow the steps below:
- Create a New Folder
  - Navigate to the Files page within the Opal interface.
  - Click the + ADD FOLDER button.
  - Enter a name for the folder and confirm to create it.
- Upload Files
  - Open the newly created folder or any existing folder where the file should be uploaded.
  - Click the UPLOAD button.
  - Select the file from your local machine and confirm the upload.
NOTE-2: Ensure the uploaded file is in a compatible tabular format for proper processing.
Projects in the Opal-DataSHIELD ecosystem serve as containers for data tables and related configurations. Follow these steps to create and populate projects:
- Create a New Project
  - Navigate to the Projects page by selecting the Projects tab.
  - Click the + ADD button to initiate project creation.
  - Enter the project Name (required) in the input field.
  - Optionally, provide a Title and Description to give context to the project.
  - Confirm to create the project.
- Import Data Tables
  - Open the newly created project by clicking on its name in the Projects list.
  - Expand the Tables (views) section by clicking the arrow next to it.
  - Click the IMPORT button and choose Import from file.
  - Select the desired data format (e.g., CSV, Excel) and proceed.
  - Click Select, navigate to the file's location in the system, and upload it.
  - Follow the prompts to configure and finalize the import process.
NOTE-3: Ensure the data file is clean and formatted correctly to avoid issues during import.
To verify the functionality of the DataSHIELD client, connect to your Opal server and retrieve basic data statistics. Open R and run:
# DataSHIELD user
# load libraries
library(DSI)
library(DSOpal)
library(dsBaseClient)
# connection builder
builder <- DSI::newDSLoginBuilder()
# NOTE: ssl_verifyhost=0 and ssl_verifypeer=0 disable TLS certificate checks
builder$append(server = "server1", url = "https://your-domain.com",
               user = "user", password = "password",
               driver = "OpalDriver",
               options = 'list(ssl_verifyhost=0, ssl_verifypeer=0)')
logindata <- builder$build()
# log in
connections <- DSI::datashield.login(logins = logindata, assign = TRUE)
# data access
# NOTE:
# - "TEST" is the project's name
# - "airway" is the table's name
DSI::datashield.assign.table(conns = connections, symbol = "Example",
                             table = c("TEST.airway"))
# table dimensions
ds.dim(x = 'Example')
# table column names
ds.colnames(x = 'Example')
# log out
DSI::datashield.logout(connections)
The security of the Opal-DataSHIELD ecosystem used in the context of the MDR-RA project is tested and maintained by the project partner, Pluribus One, a provider of advanced cybersecurity solutions. Their expertise ensures that our systems remain robust and resilient against potential threats.
For support or security-related inquiries, please feel free to reach out to the dedicated referents:
- Fabio Roli
Email: [email protected]
Pluribus One is committed to upholding the highest standards of security, and we encourage you to contact them for any concerns or assistance.
For support, please contact:
- Email: [email protected]
  For any issues, questions, or further assistance, reach out to our dedicated support team via email.
- Referents:
  - Rosalba Giugno: Principal Investigator
  - Manuel Tognon: Post-Doctoral Researcher
- Project Lead:
  - Prof. Rosalba Giugno [email protected]
- Development Team:
  - Simone Avesani, PhD [email protected]
  - Gospel Ozioma Nnadi [email protected]
- Documentation:
  - Manuel Tognon, PhD [email protected]
  - Eva Viesi [email protected]