Slurm Bare Metal Arkouda Kubernetes Integration

hokiegeek2 edited this page Apr 17, 2023 · 7 revisions

Background

The external-systems and k8s-enterprise branches include work geared towards integrating bare-metal or Slurm Arkouda deployments with service discovery frameworks such as Kubernetes, Consul, Istio, and Eureka. The external-systems branch is for Arkouda instances deployed on bare metal or Slurm, whereas the k8s-enterprise branch is for Arkouda instances deployed on Kubernetes.

Kubernetes Integration

Kubernetes Service for Access to External Services

Integrating Arkouda deployed on bare metal or on Slurm with Kubernetes-hosted applications such as JupyterLab consists of creating a Kubernetes Service and a corresponding Endpoints object to enable access from within Kubernetes. As described in the Kubernetes documentation on Services without selectors, this process involves creating a Kubernetes Service without a selector, that is, without a pointer to a Kubernetes deployment. Instead, the k8s Service is mapped to an Endpoints record with the same name, as shown below:

apiVersion: v1
kind: Service
metadata:
  name: arkouda
spec:
  ports:
    - protocol: TCP
      port: 5555
      targetPort: 5555

---

apiVersion: v1
kind: Endpoints
metadata:
  name: arkouda
subsets:
  - addresses:
      - ip: <ip address of Arkouda bare-metal or Slurm master process>
    ports:
      - port: 5555
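
The same pair of objects can also be generated programmatically. The following is a minimal Python sketch, not part of Arkouda: the function name external_service_manifests is hypothetical, and the IP address in the usage lines is only a placeholder. It builds the selector-less Service and the matching Endpoints as plain dictionaries and prints them as JSON, which kubectl accepts as well as YAML.

```python
import json


def external_service_manifests(name, ip, port=5555):
    """Build a selector-less Service and its matching Endpoints (hypothetical helper)."""
    service = {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {"name": name},
        # No "selector" key: Kubernetes will not manage Endpoints for this Service.
        "spec": {"ports": [{"protocol": "TCP", "port": port, "targetPort": port}]},
    }
    endpoints = {
        "apiVersion": "v1",
        "kind": "Endpoints",
        # The Endpoints name must equal the Service name for the mapping to work.
        "metadata": {"name": name},
        "subsets": [{"addresses": [{"ip": ip}], "ports": [{"port": port}]}],
    }
    return service, endpoints


if __name__ == "__main__":
    # Placeholder address standing in for the Arkouda master process host.
    svc, eps = external_service_manifests("arkouda", "192.168.1.24")
    print(json.dumps(svc, indent=2))
    print(json.dumps(eps, indent=2))
```

Keeping the name in a single argument avoids the most common mistake with this pattern, a Service and an Endpoints object whose names differ.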

Arkouda ExternalSystem Module: Integrating Arkouda with Kubernetes

The external-systems and k8s-enterprise branches include the ExternalSystem module, which contains procedures that register Arkouda with, and deregister Arkouda from, Kubernetes.

Configuring Registration Logic: SystemType and ServiceType Enums

The ExternalSystem SystemType enum indicates the external system Arkouda is to be registered with. If SystemType.KUBERNETES is specified, Arkouda is registered with Kubernetes.

The ExternalSystem ServiceType enum specifies whether Arkouda is registering with Kubernetes as an app deployed within Kubernetes (ServiceType.INTERNAL) or externally via a bare-metal or Slurm deployment (ServiceType.EXTERNAL).

Registering Arkouda with Kubernetes

The registerWithKubernetes procedure encapsulates the Curl-based logic that registers Arkouda as a Kubernetes service, thereby enabling service discovery by applications such as JupyterHub deployed on Kubernetes. There are two possible types of services: ServiceType.INTERNAL and ServiceType.EXTERNAL.

If the ExternalSystem ServiceType is set to ServiceType.INTERNAL, this means that Arkouda is deployed within Kubernetes as a Kubernetes app via the k8s-enterprise arkouda-locale and arkouda-server Helm charts. Accordingly, the Kubernetes service is mapped to a Kubernetes app via the JSON submitted by the registerWithKubernetes procedure:

{"apiVersion": "v1","kind": "Service","metadata": {"name": "arkouda"},"spec": {"ports": [{"port": 5555,"protocol": "TCP","targetPort": 5555}],"selector": {"app":"arkouda-server"}}}

If the ExternalSystem ServiceType is set to ServiceType.EXTERNAL, Arkouda is deployed outside Kubernetes, either on bare metal or via Slurm. Accordingly, the registerWithKubernetes procedure deploys both a Kubernetes Service and an Endpoints object, which point the k8s service to the host and port of the externally deployed Arkouda instance:

{"apiVersion": "v1","kind": "Service","metadata": {"name": "arkouda"},"spec": {"ports": [{"port": 5555,"protocol": "TCP","targetPort": 5555}]}}

{"kind": "Endpoints","apiVersion": "v1","metadata": {"name": "arkouda"},"subsets": [{"addresses": [{"ip": "192.168.1.24"}],"ports": [{"port": 5555,"protocol": "TCP"}]}]}
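
The difference between the two service types can be sketched in Python as follows. This is an illustration under stated assumptions, not the Chapel implementation: the names ServiceType (mirroring the enum above) and registration_requests are hypothetical, and the URLs are the standard Kubernetes core v1 REST collection paths for Services and Endpoints, which the ExternalSystem module is assumed, not confirmed, to target.

```python
import json
from enum import Enum


class ServiceType(Enum):
    # Mirrors the ExternalSystem ServiceType enum for illustration only.
    INTERNAL = "internal"
    EXTERNAL = "external"


def registration_requests(k8s_host, namespace, service_type, name,
                          port, target_port, ip=None, app="arkouda-server"):
    """Return (url, json_payload) pairs to POST for one registration (hypothetical helper)."""
    base = f"{k8s_host.rstrip('/')}/api/v1/namespaces/{namespace}"
    spec = {"ports": [{"port": port, "protocol": "TCP", "targetPort": target_port}]}
    if service_type is ServiceType.INTERNAL:
        # INTERNAL: the Service selects the Arkouda pods by label.
        spec["selector"] = {"app": app}
    service = {"apiVersion": "v1", "kind": "Service",
               "metadata": {"name": name}, "spec": spec}
    requests = [(f"{base}/services", json.dumps(service))]
    if service_type is ServiceType.EXTERNAL:
        if ip is None:
            raise ValueError("an IP address is required for ServiceType.EXTERNAL")
        # EXTERNAL: no selector, so an Endpoints object supplies the address.
        # Using target_port for the Endpoints port is an assumption of this sketch.
        endpoints = {"kind": "Endpoints", "apiVersion": "v1",
                     "metadata": {"name": name},
                     "subsets": [{"addresses": [{"ip": ip}],
                                  "ports": [{"port": target_port, "protocol": "TCP"}]}]}
        requests.append((f"{base}/endpoints", json.dumps(endpoints)))
    return requests


if __name__ == "__main__":
    for url, payload in registration_requests(
            "https://localhost:8443", "arkouda", ServiceType.EXTERNAL,
            "arkouda", 5555, 5555, ip="192.168.1.24"):
        print(url)
        print(payload)
```

An INTERNAL registration yields one request whose payload carries a selector; an EXTERNAL registration yields two requests and no selector.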

Deregistering Arkouda from Kubernetes

The deregisterFromKubernetes procedure deletes the Kubernetes Service (both ServiceType.INTERNAL and ServiceType.EXTERNAL) and Endpoints (only ServiceType.EXTERNAL) upon Arkouda server shutdown.

Important note: the Arkouda server must be shut down via the arkouda client shutdown command. A shutdown initiated by Kubernetes, Slurm, or bare-metal process termination sends a SIGTERM signal to the Arkouda Chapel process, for which there is no handling logic within Chapel. In that case the deregistration logic never runs, and the Arkouda k8s Service and, if applicable, Endpoints will be orphaned.
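
To make the deregistration targets concrete, the sketch below lists the resource URLs that would receive an HTTP DELETE. It is an assumption-laden illustration: deregistration_urls is a hypothetical helper, and the paths are the standard Kubernetes core v1 REST paths for a named Service and a named Endpoints object rather than something confirmed from the ExternalSystem source. The same list is useful for cleaning up orphaned objects by hand after a SIGTERM shutdown.

```python
def deregistration_urls(k8s_host, namespace, name, external):
    """Return the URLs to DELETE when Arkouda shuts down (hypothetical helper)."""
    base = f"{k8s_host.rstrip('/')}/api/v1/namespaces/{namespace}"
    # The Service is removed for both INTERNAL and EXTERNAL registrations.
    urls = [f"{base}/services/{name}"]
    if external:
        # Only EXTERNAL registrations created an Endpoints object.
        urls.append(f"{base}/endpoints/{name}")
    return urls


if __name__ == "__main__":
    for url in deregistration_urls("https://localhost:8443", "arkouda",
                                   "arkouda", external=True):
        print("DELETE", url)
```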

Building Arkouda with ExternalSystem Support

Since the ExternalSystem module delegates registration logic to Curl, building Arkouda with ExternalSystem support requires the libcurl4-openssl-dev library to be installed. On Debian-based Linux distributions, the install command is as follows:

sudo apt-get install libcurl4-openssl-dev

Required Files for Registering/Deregistering with Kubernetes

The Chapel Curl logic must use HTTPS to register/deregister with Kubernetes via the Kubernetes REST API. Accordingly, SSL .crt and .key files signed with the Kubernetes certificate authority (CA) file must be deployed to all bare-metal/Slurm nodes along with the Kubernetes CA file.

An example of generating the required files is as follows:

# Generate base key file
openssl genrsa -out arkouda.key 2048

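# Generate the signing request; the user (the certificate Common Name, which Kubernetes treats as the user name) and password are set at the prompts in this step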
openssl req -new -key arkouda.key -out arkouda.csr

# Sign with the k8s CA
sudo openssl x509 -req -in arkouda.csr -CA /etc/kubernetes/ssl/kube-ca.pem -CAkey /etc/kubernetes/ssl/kube-ca-key.pem -CAcreateserial -out arkouda.crt -days 730

Creating the Kubernetes User with arkouda.crt and arkouda.key Files

kubectl config set-credentials arkouda --client-certificate=arkouda.crt --client-key=arkouda.key

Setting ClusterRoleBinding to Authorize read/write Access to Kubernetes Client API

With the arkouda user credentials (arkouda.key and arkouda.crt) in place, create the ClusterRoleBinding needed to grant the arkouda user read/write access to the Kubernetes Client API.

kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: arkouda-rbac
subjects:
- kind: User
  name: arkouda
  apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: ClusterRole #this must be Role or ClusterRole
  name: cluster-admin # must match the name of the Role
  apiGroup: rbac.authorization.k8s.io

Save the manifest as arkouda-rbac.yaml and apply it:

kubectl apply -f arkouda-rbac.yaml

Env Variables for Registering with Kubernetes

Registering bare-metal or Slurm Arkouda with Kubernetes involves creating a Service and Endpoints. The required environment variables for the Slurm job or the bare-metal GASNet S (SSH) spawner deployment are as follows:

K8S_HOST=https://localhost:8443
NAMESPACE=arkouda-namespace
EXTERNAL_SERVICE_NAME=arkouda-service-name
EXTERNAL_SERVICE_PORT=arkouda-port
EXTERNAL_SERVICE_TARGET_PORT=arkouda-port-by-default
ENDPOINT_NAME=arkouda-service-name
ENDPOINT_PORT=arkouda-port

# Required HTTPS params
export KEY_FILE=path-to-key-file-on-slurm-hosts
export CERT_FILE=path-to-crt-file-on-slurm-hosts
export CACERT_FILE=path-to-kubernetes-ca.pem-file-on-slurm-hosts

# Set if the key has a password
export KEY_PASSWD=password
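
Because a missing or mismatched variable only surfaces as a Curl error at registration time, it can help to check the environment before launching the server. The following Python sketch is not part of Arkouda: check_registration_env and REQUIRED_VARS are hypothetical names, the variable list is taken from the listing above, and the name-match rule reflects the requirement that ENDPOINT_NAME equal EXTERNAL_SERVICE_NAME.

```python
import os

# Variables from the listing above; KEY_PASSWD is optional and therefore omitted.
REQUIRED_VARS = (
    "K8S_HOST", "NAMESPACE",
    "EXTERNAL_SERVICE_NAME", "EXTERNAL_SERVICE_PORT", "EXTERNAL_SERVICE_TARGET_PORT",
    "ENDPOINT_NAME", "ENDPOINT_PORT",
    "KEY_FILE", "CERT_FILE", "CACERT_FILE",
)
PORT_VARS = ("EXTERNAL_SERVICE_PORT", "EXTERNAL_SERVICE_TARGET_PORT", "ENDPOINT_PORT")


def check_registration_env(env):
    """Return a list of human-readable problems found in env (hypothetical helper)."""
    problems = [f"{key} is not set" for key in REQUIRED_VARS if not env.get(key)]
    host = env.get("K8S_HOST")
    if host and not host.startswith("https://"):
        problems.append("K8S_HOST must be an https:// URL")
    for key in PORT_VARS:
        value = env.get(key)
        if value and not value.isdigit():
            problems.append(f"{key} must be a numeric port, got {value!r}")
    svc, eps = env.get("EXTERNAL_SERVICE_NAME"), env.get("ENDPOINT_NAME")
    if svc and eps and svc != eps:
        problems.append("ENDPOINT_NAME must match EXTERNAL_SERVICE_NAME")
    return problems


if __name__ == "__main__":
    for problem in check_registration_env(os.environ):
        print(problem)
```

Running the script inside the Slurm job or bare-metal launch script, just before arkouda_server starts, prints one line per problem and nothing when the environment looks complete.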

Slurm sbatch File for Arkouda Deployments

An example Slurm batch file with the required env variables is shown below.

#!/bin/bash
#
#SBATCH --job-name=arkouda-3-node
#SBATCH --output=/tmp/arkouda.out
#SBATCH --mem=4096
#SBATCH --ntasks=3
#SBATCH --nodes=3
 
export CHPL_COMM_SUBSTRATE=udp
export GASNET_MASTERIP='ace'
export SSH_SERVERS='ace finkel einhorn'
export GASNET_SPAWNFN=S

export NAMESPACE=arkouda
export ENDPOINT_NAME=arkouda-on-slurm # ENDPOINT_NAME must match EXTERNAL_SERVICE_NAME
export ENDPOINT_PORT=5555
export K8S_HOST=https://ace:6443 #result from kubectl cluster-info command
export EXTERNAL_SERVICE_NAME=arkouda-on-slurm # EXTERNAL_SERVICE_NAME must match ENDPOINT_NAME
export EXTERNAL_SERVICE_PORT=5555
export EXTERNAL_SERVICE_TARGET_PORT=5555
export METRICS_SERVICE_NAME=arkouda-on-slurm-metrics
export METRICS_SERVICE_PORT=5556
export METRICS_SERVICE_TARGET_PORT=5556
export KEY_FILE=/opt/arkouda.key #on slurm hosts
export CERT_FILE=/opt/arkouda.crt #on slurm hosts
export CHPL_RT_NUM_THREADS_PER_LOCALE=4

/opt/arkouda-2023.03.01/arkouda_server -nl 3 --ExternalIntegration.systemType=SystemType.KUBERNETES \
                   --ServerDaemon.daemonTypes=ServerDaemonType.INTEGRATION,ServerDaemonType.METRICS \
                   --memTrack=true --memMax=4000000000 --logLevel=LogLevel.DEBUG

As detailed above, SystemType.KUBERNETES directs Arkouda to register with Kubernetes. Registration is done within the arkouda_server main loop, which delegates to the ExternalService registerAsInternalService procedure. Likewise, ServiceType.EXTERNAL directs Arkouda to register as an external service with Kubernetes, which is done within the ExternalService registerAsExternalService procedure.

Bare-Metal Arkouda Deployment Script

#!/bin/bash

export GASNET_MASTERIP='ace'
export SSH_SERVERS='ace finkel einhorn'
export NAMESPACE=arkouda
export ENDPOINT_NAME=arkouda-external # ENDPOINT_NAME must match EXTERNAL_SERVICE_NAME
export ENDPOINT_PORT=5555
export EXTERNAL_SERVICE_NAME=arkouda-external # EXTERNAL_SERVICE_NAME must match ENDPOINT_NAME
export EXTERNAL_SERVICE_PORT=5555
export EXTERNAL_SERVICE_TARGET_PORT=5555
export METRICS_SERVICE_NAME=arkouda-external-metrics
export METRICS_SERVICE_PORT=5556
export METRICS_SERVICE_TARGET_PORT=5556

export K8S_HOST=https://ace:6443 #result from kubectl cluster-info command
export KEY_FILE=/opt/arkouda.key #on bare-metal hosts
export CERT_FILE=/opt/arkouda.crt #on bare-metal hosts
export CHPL_RT_NUM_THREADS_PER_LOCALE=4

/opt/arkouda-2023.03.01/arkouda_server -nl 3 --ExternalIntegration.systemType=SystemType.KUBERNETES \
                   --ServerDaemon.daemonTypes=ServerDaemonType.INTEGRATION,ServerDaemonType.METRICS \
                   --memTrack=true --memMax=4000000000 --logLevel=LogLevel.DEBUG

Troubleshooting External System Registration Error Codes

The Curl commands executed in the ExternalSystem module return distinctive error codes:

  • 7: the K8S_HOST env variable does not point to a valid Kubernetes client API URL
  • 22: a service with the name matching EXTERNAL_SERVICE_NAME already exists
  • 36: there is a problem with the SSL .crt or .key file
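
When scanning server logs, the codes listed above can be turned into hints with a small lookup. This Python sketch is a convenience only: REGISTRATION_ERROR_HINTS and explain_registration_error are hypothetical names, and the table contains just the three codes documented on this page.

```python
# Codes and meanings as documented above; any other code is reported as unknown.
REGISTRATION_ERROR_HINTS = {
    7: "K8S_HOST does not point to a valid Kubernetes client API URL",
    22: "a service with the name matching EXTERNAL_SERVICE_NAME already exists",
    36: "there is a problem with the SSL .crt or .key file",
}


def explain_registration_error(code):
    """Map an ExternalSystem Curl return code to a troubleshooting hint."""
    hint = REGISTRATION_ERROR_HINTS.get(code)
    if hint is None:
        return f"code {code}: not covered by this troubleshooting list"
    return f"code {code}: {hint}"


if __name__ == "__main__":
    for code in (7, 22, 36, 99):
        print(explain_registration_error(code))
```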