Kubernetes Deployment

This document introduces the architecture, use cases and usage of KubeFATE. If you only care about how to use it, jump to Usage.

We recommend using Kubernetes as the underlying infrastructure to create and manage FATE clusters in a production environment. KubeFATE supports deploying multiple FATE clusters in a single Kubernetes instance, each in its own namespace, for development, testing and production. Because IT designs and standards differ between companies, the actual deployment should be customized; KubeFATE keeps the FATE configuration flexible for that purpose.

High-level architecture of multiple federated learning parties

The high-level architecture of a multi-party federated learning deployment (e.g. two parties) is shown as follows:

  • KubeFATE: Orchestrates a FATE cluster of a party. It offers APIs for FATE-Cloud Manager and other management portals.
  • Harbor (Optional): Manages versioned FATE deployments and images.
  • Kubernetes: Container orchestration engine.

KubeFATE is responsible for:

  • Day 1 initialization: Provisioning a FATE cluster on Kubernetes
  • Day 2 operations: Providing RESTful APIs to manage FATE clusters

High-level architecture of KubeFATE

The high-level architecture of KubeFATE is shown as follows:

The numbers depicted in the diagram:

  1. Accepting external API calls with authentication and authorization
  2. Rendering templates via Helm
  3. Storing jobs and configuration of a FATE deployment
  4. Running KubeFATE as a service in Kubernetes

There are two parts of KubeFATE:

  • The KubeFATE CLI. The KubeFATE CLI is an executable that helps to quickly initialize and manage a FATE cluster in an interactive mode. It does not rely on Kubernetes. Ultimately, the KubeFATE CLI calls the KubeFATE service to perform operations, using a KubeFATE user token.
  • The KubeFATE Service. The KubeFATE service provides RESTful APIs for managing FATE clusters. It is deployed in Kubernetes and exposes its APIs via Ingress. For authentication and authorization, the KubeFATE service implements JWT and is neutral to other security solutions that can be added to the Kubernetes ingress.

KubeFATE is designed to handle different versions of FATE. Normally, the KubeFATE CLI and the KubeFATE service can work with several FATE releases.

Real-world story

Suppose an organization has two roles:

  • System Admin: who is responsible for infrastructure management as well as Kubernetes administration
  • ML Infrastructure Operators: who are responsible for managing machine learning clusters such as FATE

Usage

Prerequisite

Recommended version of dependent software:

Kubernetes: v1.23.5

Ingress-nginx: v1.1.3

Creating role, namespace and other resources in Kubernetes

The example YAML can be found in rbac-config.yaml. In this example, we create a kube-fate namespace for the KubeFATE service. Resource constraints can be applied to the kube-fate namespace; refer to Kubernetes Namespace and Configure Memory and CPU Quotas for Namespace.

Run the following command to create the namespace:

kubectl apply -f ./rbac-config.yaml

Note that the default username and password of the KubeFATE service can be set in rbac-config.yaml under Secret->kubefate-secret->stringData:

stringData:
  kubefateUsername: admin
  kubefatePassword: admin
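
Kubernetes stores values supplied under stringData as base64-encoded entries in the Secret's data field, so reading kubefate-secret back from the cluster shows encoded strings rather than the plain text. The following is a minimal sketch of that encoding; the helper names are hypothetical and are not part of KubeFATE or kubectl.

```python
import base64


def encode_secret_value(plain: str) -> str:
    """Encode a plain value the way it appears under a Secret's `data` field."""
    return base64.b64encode(plain.encode("utf-8")).decode("ascii")


def decode_secret_value(encoded: str) -> str:
    """Decode a base64 value read from a Secret's `data` field."""
    return base64.b64decode(encoded).decode("utf-8")


if __name__ == "__main__":
    # "admin" is the example default credential from rbac-config.yaml above.
    encoded = encode_secret_value("admin")
    print(encoded, "->", decode_secret_value(encoded))
```

Since base64 is an encoding and not encryption, changing the defaults before deploying to a shared cluster is worth considering.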

Preparing domain name and deploying KubeFATE in Kubernetes

Because the KubeFATE service exposes RESTful APIs for external access, the system admin needs to prepare a domain name for it. In our example, the domain name is example.com. The system admin should also create a namespace (e.g. fate-9999) for the FATE deployment.

kubectl apply -f ./kubefate.yaml
kubectl create namespace fate-9999

For more about the configuration of the KubeFATE service, please refer to: KubeFATE service Configuration Guide.

Preparing cluster configuration and deploying FATE

After the system admin has deployed the KubeFATE service and prepared the namespace for FATE, the ML Infrastructure Operator can start the deployment of FATE. A config.yaml for the kubefate CLI is required. It contains the username and password for KubeFATE access, and the KubeFATE service URL:

log:
  level: info
user:
  username: admin
  password: admin

serviceurl: example.com
safeconnect: false

| Name | Type | Description |
| --- | --- | --- |
| log | scalars | The log level of the command line. |
| user | mappings | User name and password for logging in to the KubeFATE service. |
| serviceurl | scalars | The KubeFATE service's ingress domain name, defined in kubefate.yaml. |
| safeconnect | scalars | Whether to use HTTPS to connect to the KubeFATE service URL. You can refer to: kubefate_service_tls_enable |
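
To make the fields concrete, here is a minimal sketch that checks a parsed config.yaml and derives the URL scheme from safeconnect. It works on a plain dict (as a YAML parser would produce) to stay dependency-free. The function names, and the choice to treat every field as required, are assumptions made for illustration; the kubefate CLI may validate differently.

```python
from typing import Any, Dict, List

# The example config.yaml from this section, as a parsed dict.
EXAMPLE_CONFIG: Dict[str, Any] = {
    "log": {"level": "info"},
    "user": {"username": "admin", "password": "admin"},
    "serviceurl": "example.com",
    "safeconnect": False,
}


def check_cli_config(cfg: Dict[str, Any]) -> List[str]:
    """Return a list of problems found in a parsed config.yaml (empty if none)."""
    problems: List[str] = []
    log = cfg.get("log")
    if not isinstance(log, dict) or "level" not in log:
        problems.append("log.level is missing")
    user = cfg.get("user")
    if not isinstance(user, dict):
        problems.append("user must be a mapping")
    else:
        for key in ("username", "password"):
            if not user.get(key):
                problems.append(f"user.{key} is missing")
    serviceurl = cfg.get("serviceurl")
    if not isinstance(serviceurl, str) or not serviceurl:
        problems.append("serviceurl must be a non-empty string")
    if not isinstance(cfg.get("safeconnect"), bool):
        problems.append("safeconnect must be true or false")
    return problems


def service_base_url(cfg: Dict[str, Any]) -> str:
    """Build the service URL: HTTPS when safeconnect is true, HTTP otherwise."""
    scheme = "https" if cfg["safeconnect"] else "http"
    return f"{scheme}://{cfg['serviceurl']}"


if __name__ == "__main__":
    print(check_cli_config(EXAMPLE_CONFIG))
    print(service_base_url(EXAMPLE_CONFIG))
```

Running a check like this before calling the CLI can catch a missing password or a non-boolean safeconnect early.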

Create a cluster.yaml for FATE cluster configuration. The details of configuration can be found here: FATE Cluster Configuration Guide.

NOTE: For users in China, specifying a local image registry in cluster.yaml can speed up image downloads. For example:

registry: "hub.c.163.com/federatedai"

Next, install the FATE cluster:

$ kubefate cluster install -f ./cluster.yaml
create job success, job id=d92d7a56-7002-46a4-9363-da9c7346e05a
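
The job id printed by the install command is needed for the status check that follows. When scripting the workflow, it can be pulled out of the output line with a regular expression. This is a minimal sketch that assumes the output keeps the "job id=<UUID>" shape shown above; extract_job_id is a hypothetical helper, not part of the kubefate CLI.

```python
import re
from typing import Optional

# Matches the 36-character UUID that follows "job id=" in the install output.
_JOB_ID = re.compile(r"job id=([0-9a-fA-F-]{36})")


def extract_job_id(install_output: str) -> Optional[str]:
    """Return the job id from `kubefate cluster install` output, or None."""
    match = _JOB_ID.search(install_output)
    return match.group(1) if match else None


if __name__ == "__main__":
    line = "create job success, job id=d92d7a56-7002-46a4-9363-da9c7346e05a"
    print(extract_job_id(line))
```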

NOTE: If you want to deploy FATE on Spark, you can use cluster-spark.yaml.

KubeFATE can deploy FATE with many different engine combinations. For more details on the different types of FATE, see: Introduction to FATE Engine Architecture.

If the components have resource requirements (CPU and memory), check the example and search for "resources" to see how to define them.

We support such definition for:

  1. Eggroll components: cluster manager, node manager and rollsite.
  2. Spark components: master and worker.
  3. RabbitMQ.
  4. Pulsar.

Checking the status of "Installing Cluster" job

The command above creates a job that installs the FATE cluster. Run kubefate job describe to check the status of the job until its "Status" becomes Success.

$ kubefate job describe d92d7a56-7002-46a4-9363-da9c7346e05a
UUID      d92d7a56-7002-46a4-9363-da9c7346e05a
StartTime 2022-04-12 07:34:09
EndTime   2022-04-12 07:48:14
Duration  14m
Status    Success
Creator   admin
ClusterId 24bb75ff-f636-4c64-8c04-1b9073f89a2f
States    - update job status to Running
          - create Cluster in DB Success
          - helm install Success
          - checkout Cluster status [794]
          - job run Success

SubJobs   nodemanager-0        ModuleStatus: Available, SubJobStatus: Success, Duration:    13m, StartTime:
          2022-04-12 07:34:09, EndTime: 2022-04-12 07:47:26
          nodemanager-1        ModuleStatus: Available, SubJobStatus: Success, Duration:    13m, StartTime:
          2022-04-12 07:34:09, EndTime: 2022-04-12 07:47:18
          python               ModuleStatus: Available, SubJobStatus: Success, Duration:    14m, StartTime:
          2022-04-12 07:34:09, EndTime: 2022-04-12 07:48:14
          rollsite             ModuleStatus: Available, SubJobStatus: Success, Duration:    13m, StartTime:
          2022-04-12 07:34:09, EndTime: 2022-04-12 07:47:24
          client               ModuleStatus: Available, SubJobStatus: Success, Duration:    11m, StartTime:
          2022-04-12 07:34:09, EndTime: 2022-04-12 07:45:22
          clustermanager       ModuleStatus: Available, SubJobStatus: Success, Duration:    13m, StartTime:
          2022-04-12 07:34:09, EndTime: 2022-04-12 07:47:11
          mysql                ModuleStatus: Available, SubJobStatus: Success, Duration:    13m, StartTime:
          2022-04-12 07:34:09, EndTime: 2022-04-12 07:47:11
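
Checking the job repeatedly by hand can be replaced by a small polling loop. The sketch below reads the top-level "Status" row from the kubefate job describe output shown above and waits until it becomes Success. The function names, polling interval and timeout are assumptions for illustration. Only Success is treated as a terminal state, because this document does not list the other status values.

```python
import subprocess
import time
from typing import Callable, Optional


def parse_job_status(describe_output: str) -> Optional[str]:
    """Return the value of the top-level "Status" row, or None if absent."""
    for line in describe_output.splitlines():
        fields = line.split(None, 1)
        # Only a row whose first column is exactly "Status" counts; this skips
        # "ModuleStatus:" and "SubJobStatus:" inside the SubJobs rows.
        if len(fields) == 2 and fields[0] == "Status":
            return fields[1].strip()
    return None


def run_job_describe(job_id: str) -> str:
    """Run `kubefate job describe <job_id>` and return its standard output."""
    result = subprocess.run(
        ["kubefate", "job", "describe", job_id],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def wait_for_success(
    job_id: str,
    describe: Callable[[str], str] = run_job_describe,
    interval: float = 10.0,   # seconds between checks (assumed value)
    timeout: float = 1800.0,  # give up after this many seconds (assumed value)
) -> None:
    """Poll the job until its Status is Success; raise TimeoutError otherwise."""
    deadline = time.monotonic() + timeout
    while True:
        status = parse_job_status(describe(job_id))
        if status == "Success":
            return
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job {job_id} still has status {status!r}")
        time.sleep(interval)
```

Passing describe as a parameter keeps the loop testable without a running KubeFATE service; in normal use, wait_for_success("<job id>") calls the CLI directly.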

Describing the cluster and finding FATE access information

After the cluster installation job has succeeded, use kubefate cluster describe to check the FATE access information:

$ kubefate cluster describe 24bb75ff-f636-4c64-8c04-1b9073f89a2f
UUID         24bb75ff-f636-4c64-8c04-1b9073f89a2f
Name         fate-9999                           
NameSpace    fate-9999                           
ChartName    fate                                
ChartVersion v1.10.0                              
Revision     1                                   
Age          15m                                 
Status       Running                             
Spec         algorithm: Basic                    
             chartName: fate                     
             chartVersion: v1.10.0                
             computing: Eggroll                  
             device: CPU                         
             federation: Eggroll                 
             imagePullSecrets:                   
             - name: myregistrykey               
             ingressClassName: nginx             
             istio:                              
               enabled: false                    
             modules:                            
             - rollsite                          
             - clustermanager                    
             - nodemanager                       
             - mysql                             
             - python                            
             - fateboard                         
             - client                            
             name: fate-9999                     
             namespace: fate-9999                
             partyId: 9999                       
             persistence: false                  
             podSecurityPolicy:                  
               enabled: false                    
             pullPolicy: null                    
             registry: ""                        
             storage: Eggroll                    
                                                 
Info         dashboard:                          
             - party9999.notebook.example.com              
             - party9999.fateboard.example.com             
             ip: 192.168.9.1                    
             status:                             
               containers:                       
                 client: Running                 
                 clustermanager: Running         
                 fateboard: Running              
                 fateflow: Running               
                 mysql: Running                  
                 nodemanager: Running            
                 nodemanager-eggrollpair: Running
                 rollsite: Running               
               deployments:                      
                 clustermanager: Available       
                 rollsite: Available

Access the UI of FATEBoard and Notebook

If the fateboard and client components are installed, you can use the host names party9999.fateboard.example.com and party9999.notebook.example.com obtained in the previous step to access the FATEBoard and Notebook UIs. Once name resolution is configured for these two domain names, they can be opened in a browser.
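
One simple way to configure name resolution on a workstation is a pair of hosts-file entries. The sketch below reads the dashboard host names and the ip value from the Info section of the kubefate cluster describe output and formats them as hosts-file lines. It assumes the reported ip is the address at which the ingress is reachable from your machine; if it is not, substitute the correct ingress address. The function names are hypothetical.

```python
from typing import List, Optional, Tuple


def parse_access_info(describe_output: str) -> Tuple[Optional[str], List[str]]:
    """Return (ip, dashboard host names) from `kubefate cluster describe` output."""
    hosts: List[str] = []
    ip: Optional[str] = None
    in_dashboard = False
    for raw in describe_output.splitlines():
        line = raw.strip()
        if line.endswith("dashboard:"):
            # The list items that follow are the dashboard host names.
            in_dashboard = True
        elif in_dashboard and line.startswith("- "):
            hosts.append(line[2:].strip())
        else:
            # Any other row ends the dashboard list.
            in_dashboard = False
            if ip is None and line.startswith("ip:"):
                ip = line[3:].strip()
    return ip, hosts


def hosts_entries(ip: str, hosts: List[str]) -> List[str]:
    """Format one "<ip> <hostname>" hosts-file line per host name."""
    return [f"{ip} {host}" for host in hosts]
```

The resulting lines can be appended to the hosts file of the machine running the browser (for example /etc/hosts on Linux); a DNS record for the same names works equally well.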

FATEBoard

http://party9999.fateboard.example.com

Access to the FATEBoard UI requires a login user name and password, which are set in cluster.yaml; see the fateboard mappings in the FATE cluster configuration guide (../docs/configurations/FATE_cluster_configuration.md).

Notebook

http://party9999.notebook.example.com

Other user scenarios

KubeFATE service RESTful APIs reference

API Reference