Slurm Bare Metal Arkouda Kubernetes Integration
The external-systems and k8s-enterprise branches include work geared towards integrating bare-metal or Slurm Arkouda deployments with service discovery frameworks such as Kubernetes, Consul, Istio, and Eureka. The external-systems branch is for Arkouda instances deployed on bare metal or Slurm, whereas the k8s-enterprise branch is for Arkouda instances deployed on Kubernetes.
Integrating Arkouda deployed on bare metal or on Slurm with Kubernetes-hosted applications such as JupyterLab consists of creating a Kubernetes Service and a corresponding Endpoints object to enable access from within Kubernetes. This process involves creating a Kubernetes Service without a selector, i.e., a Service that does not point to the pods of a Kubernetes deployment. Instead, the Service is mapped to an Endpoints object with the same name, as shown below:
apiVersion: v1
kind: Service
metadata:
  name: arkouda
spec:
  ports:
    - protocol: TCP
      port: 5555
      targetPort: 5555
---
apiVersion: v1
kind: Endpoints
metadata:
  name: arkouda
subsets:
  - addresses:
      - ip: <ip address of Arkouda bare-metal or Slurm master process>
    ports:
      - port: 5555
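The pairing above only works if the two objects agree with each other. The following minimal sketch checks the rules that matter for a selector-less Service: matching names, no selector, and a targetPort that the Endpoints object actually exposes. The function name `check_external_service` is hypothetical and the manifests are plain Python dicts mirroring the YAML; 192.0.2.10 is a documentation placeholder IP.

```python
from typing import Any, Dict, List


def check_external_service(service: Dict[str, Any], endpoints: Dict[str, Any]) -> List[str]:
    """Hypothetical helper: return a list of problems (empty means the pair looks consistent)."""
    problems = []
    if service.get("kind") != "Service" or endpoints.get("kind") != "Endpoints":
        problems.append("expected one Service and one Endpoints object")
    # Kubernetes associates an Endpoints object with the Service of the same name.
    if service["metadata"]["name"] != endpoints["metadata"]["name"]:
        problems.append("Service and Endpoints names differ")
    # With a selector, the endpoints controller would manage the Endpoints itself.
    if "selector" in service.get("spec", {}):
        problems.append("Service must not define a selector")
    # targetPort defaults to port when omitted.
    target_ports = {p.get("targetPort", p["port"]) for p in service["spec"]["ports"]}
    endpoint_ports = {p["port"] for s in endpoints.get("subsets", []) for p in s.get("ports", [])}
    if not target_ports <= endpoint_ports:
        problems.append("Service targetPort not exposed by Endpoints")
    return problems


service = {"apiVersion": "v1", "kind": "Service", "metadata": {"name": "arkouda"},
           "spec": {"ports": [{"protocol": "TCP", "port": 5555, "targetPort": 5555}]}}
endpoints = {"apiVersion": "v1", "kind": "Endpoints", "metadata": {"name": "arkouda"},
             "subsets": [{"addresses": [{"ip": "192.0.2.10"}], "ports": [{"port": 5555}]}]}
print(check_external_service(service, endpoints))  # prints []
```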
The external-systems and k8s-enterprise branches contain the ExternalSystem module, which provides procedures that register Arkouda with, and deregister it from, Kubernetes.
The ExternalSystem SystemType enum indicates the external system Arkouda is to be registered with. If SystemType.KUBERNETES is specified, Arkouda is registered with Kubernetes.
The ExternalSystem ServiceType enum specifies whether Arkouda registers with Kubernetes as an app deployed within Kubernetes (ServiceType.INTERNAL) or as an external bare-metal or Slurm deployment (ServiceType.EXTERNAL).
The registerWithKubernetes procedure encapsulates the Curl-based logic that registers Arkouda as a Kubernetes Service, thereby enabling service discovery by applications deployed on Kubernetes such as JupyterHub. The Service can be one of two types: ServiceType.INTERNAL or ServiceType.EXTERNAL.
If the ExternalSystem ServiceType is set to ServiceType.INTERNAL, Arkouda is deployed within Kubernetes as a Kubernetes app via the k8s-enterprise arkouda-locale and arkouda-server Helm charts. Accordingly, the JSON submitted by the registerWithKubernetes procedure maps the Kubernetes Service to the arkouda-server app via a selector:
{"apiVersion": "v1","kind": "Service","metadata": {"name": "arkouda"},"spec": {"ports": [{"port": 5555,"protocol": "TCP","targetPort": 5555}],"selector": {"app":"arkouda-server"}}}
If the ExternalSystem ServiceType is set to ServiceType.EXTERNAL, Arkouda is deployed externally, on bare metal or via Slurm. Accordingly, the registerWithKubernetes procedure creates both a Kubernetes Service and an Endpoints object, pointing the Service to the host and port of the externally deployed Arkouda instance:
{"apiVersion": "v1","kind": "Service","metadata": {"name": "arkouda"},"spec": {"ports": [{"port": 5555,"protocol": "TCP","targetPort": 5555}]}}
{"kind": "Endpoints","apiVersion": "v1", "metadata": {"name": "arkouda"}, "subsets": [{"addresses": [{"ip": "192.168.1.24"}],"ports": [{"port": 5555, "protocol": "TCP"}]}]}
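The difference between the two service types comes down to which JSON documents are submitted. The sketch below builds them in Python purely for illustration; it is not the Chapel implementation, and the function name `build_payloads` and its parameters are hypothetical. INTERNAL yields one Service with the `app: arkouda-server` selector; EXTERNAL yields a selector-less Service plus an Endpoints object pointing at the Arkouda host.

```python
import json
from enum import Enum
from typing import Any, Dict, List


class ServiceType(Enum):
    INTERNAL = "INTERNAL"
    EXTERNAL = "EXTERNAL"


def build_payloads(service_type: ServiceType, name: str, port: int,
                   target_port: int, host_ip: str = "") -> List[Dict[str, Any]]:
    """Hypothetical helper returning the objects to POST for each service type."""
    service: Dict[str, Any] = {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {"name": name},
        "spec": {"ports": [{"port": port, "protocol": "TCP", "targetPort": target_port}]},
    }
    if service_type is ServiceType.INTERNAL:
        # Kubernetes resolves the pods itself through the selector.
        service["spec"]["selector"] = {"app": "arkouda-server"}
        return [service]
    if not host_ip:
        raise ValueError("EXTERNAL registration needs the Arkouda host IP")
    # Same name as the Service, so Kubernetes links the two objects.
    endpoints = {
        "apiVersion": "v1",
        "kind": "Endpoints",
        "metadata": {"name": name},
        "subsets": [{"addresses": [{"ip": host_ip}],
                     "ports": [{"port": target_port, "protocol": "TCP"}]}],
    }
    return [service, endpoints]


for payload in build_payloads(ServiceType.EXTERNAL, "arkouda", 5555, 5555, "192.168.1.24"):
    print(json.dumps(payload))
```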
The deregisterFromKubernetes procedure deletes the Kubernetes Service (for both ServiceType.INTERNAL and ServiceType.EXTERNAL) and the Endpoints object (for ServiceType.EXTERNAL only) upon Arkouda server shutdown.
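In terms of the Kubernetes REST API, deregistration amounts to DELETE requests against the namespaced `services` and `endpoints` collections. This sketch only lists the requests it assumes deregisterFromKubernetes issues; the helper name `deregistration_requests` is hypothetical and nothing is sent over the network.

```python
from typing import List, Tuple


def deregistration_requests(k8s_host: str, namespace: str, name: str,
                            external: bool) -> List[Tuple[str, str]]:
    """Hypothetical helper: (method, URL) pairs for removing Arkouda's objects."""
    base = f"{k8s_host.rstrip('/')}/api/v1/namespaces/{namespace}"
    # The Service exists for both INTERNAL and EXTERNAL registrations.
    requests = [("DELETE", f"{base}/services/{name}")]
    # Only EXTERNAL registrations created an Endpoints object.
    if external:
        requests.append(("DELETE", f"{base}/endpoints/{name}"))
    return requests


for method, url in deregistration_requests("https://ace:6443", "arkouda",
                                           "arkouda-on-slurm", external=True):
    print(method, url)
```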
Important note: the Arkouda server must be shut down via the Arkouda client shutdown command. Shutting Arkouda down through Kubernetes, Slurm, or bare-metal process management sends a SIGTERM signal to the Arkouda Chapel process, and Chapel has no logic for handling it; in that case the Arkouda Kubernetes Service and, if applicable, the Endpoints object will be orphaned.
Since the ExternalSystem module delegates registration logic to Curl, building Arkouda with ExternalSystem requires the libcurl4-openssl-dev package to be installed. On Debian-based Linux distributions, the install command is:
sudo apt-get install libcurl4-openssl-dev
The Chapel Curl logic must use HTTPS to register with and deregister from Kubernetes via the Kubernetes REST API. Accordingly, a client certificate (.crt) signed by the Kubernetes certificate authority (CA), together with its private key (.key), must be deployed to all bare-metal/Slurm nodes along with the Kubernetes CA certificate file.
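For readers who want to test the same mutual-TLS setup outside Chapel, the sketch below uses Python's standard `ssl` and `urllib` modules with the three files described above. It is an illustration under assumptions, not the ExternalSystem code: the helper names are hypothetical, the file paths in the usage comments are placeholders, and the live call is commented out because it needs a reachable cluster.

```python
import json
import ssl
import urllib.request
from typing import Any, Dict, Optional


def make_tls_context(cert_file: str, key_file: str, ca_file: str,
                     key_password: Optional[str] = None) -> ssl.SSLContext:
    """Hypothetical helper: verify the API server with the k8s CA, present the client cert."""
    context = ssl.create_default_context(cafile=ca_file)
    context.load_cert_chain(certfile=cert_file, keyfile=key_file, password=key_password)
    return context


def build_create_request(k8s_host: str, namespace: str, collection: str,
                         payload: Dict[str, Any]) -> urllib.request.Request:
    """Hypothetical helper: POST a JSON object to e.g. .../namespaces/<ns>/services."""
    url = f"{k8s_host.rstrip('/')}/api/v1/namespaces/{namespace}/{collection}"
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Usage against a live cluster (placeholder paths, not executed here):
# ctx = make_tls_context("/opt/arkouda.crt", "/opt/arkouda.key", "/path/to/kube-ca.pem")
# req = build_create_request("https://ace:6443", "arkouda", "services", service_payload)
# with urllib.request.urlopen(req, context=ctx) as resp:
#     print(resp.status)
```

A rejected request (for example, when the Service already exists) surfaces here as `urllib.error.HTTPError`, which plays the role that a non-zero Curl return code plays in the Chapel module.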
An example of generating the required files is as follows:
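# Generate the base private key
openssl genrsa -out arkouda.key 2048
# Create the certificate signing request; the Common Name (CN) entered at the prompt becomes the Kubernetes user name (arkouda), and an optional challenge password may also be set
openssl req -new -key arkouda.key -out arkouda.csr
# Sign the CSR with the Kubernetes CA
sudo openssl x509 -req -in arkouda.csr -CA /etc/kubernetes/ssl/kube-ca.pem -CAkey /etc/kubernetes/ssl/kube-ca-key.pem -CAcreateserial -out arkouda.crt -days 730
# Register the client certificate and key as kubectl credentials for the arkouda user
kubectl config set-credentials arkouda --client-certificate=arkouda.crt --client-key=arkouda.key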
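Because Kubernetes takes the user name from the certificate's CN, it can help to script these steps so the CN always matches the RBAC subject. The sketch below only builds the command lines; the helper name `cert_commands` is hypothetical, and it swaps the interactive prompt for OpenSSL's `-subj` option. Each list can be run with `subprocess.run(cmd, check=True)`, adding `sudo` for the signing step if the CA key requires it, as in the commands above.

```python
import shlex
from typing import List


def cert_commands(user: str, ca_cert: str, ca_key: str, days: int = 730) -> List[List[str]]:
    """Hypothetical helper: argv lists mirroring the manual certificate steps."""
    return [
        ["openssl", "genrsa", "-out", f"{user}.key", "2048"],
        # -subj skips the interactive prompt; the CN becomes the Kubernetes user name.
        ["openssl", "req", "-new", "-key", f"{user}.key", "-out", f"{user}.csr",
         "-subj", f"/CN={user}"],
        ["openssl", "x509", "-req", "-in", f"{user}.csr", "-CA", ca_cert, "-CAkey", ca_key,
         "-CAcreateserial", "-out", f"{user}.crt", "-days", str(days)],
        ["kubectl", "config", "set-credentials", user,
         f"--client-certificate={user}.crt", f"--client-key={user}.key"],
    ]


for cmd in cert_commands("arkouda", "/etc/kubernetes/ssl/kube-ca.pem",
                         "/etc/kubernetes/ssl/kube-ca-key.pem"):
    print(shlex.join(cmd))
```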
With the arkouda.key and arkouda.crt credentials in place, create the ClusterRoleBinding (arkouda-rbac.yaml) that authorizes the arkouda user to read and write via the Kubernetes API. Note that binding to the cluster-admin ClusterRole grants full access to the cluster.
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: arkouda-rbac
subjects:
  - kind: User
    name: arkouda  # must match the CN of arkouda.crt
    apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: ClusterRole  # a ClusterRoleBinding must reference a ClusterRole
  name: cluster-admin  # must match the name of an existing ClusterRole
  apiGroup: rbac.authorization.k8s.io
kubectl apply -f arkouda-rbac.yaml
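Since `kubectl apply -f` also accepts JSON, the same binding can be generated programmatically, which keeps the subject name tied to the certificate user in one place. This is a sketch; the function name `cluster_role_binding` and its defaults are hypothetical and simply reproduce the YAML above.

```python
import json
from typing import Any, Dict

RBAC_GROUP = "rbac.authorization.k8s.io"


def cluster_role_binding(user: str, role: str = "cluster-admin",
                         binding_name: str = "arkouda-rbac") -> Dict[str, Any]:
    """Hypothetical helper mirroring arkouda-rbac.yaml."""
    return {
        "apiVersion": f"{RBAC_GROUP}/v1",
        "kind": "ClusterRoleBinding",
        "metadata": {"name": binding_name},
        # The User subject must equal the CN of the client certificate.
        "subjects": [{"kind": "User", "name": user, "apiGroup": RBAC_GROUP}],
        # A ClusterRoleBinding can only reference a ClusterRole.
        "roleRef": {"kind": "ClusterRole", "name": role, "apiGroup": RBAC_GROUP},
    }


# Redirect to a file and apply with: kubectl apply -f arkouda-rbac.json
print(json.dumps(cluster_role_binding("arkouda"), indent=2))
```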
Registering bare-metal or Slurm Arkouda with Kubernetes involves creating a Service and an Endpoints object. The Slurm job or the bare-metal GASNet SSH spawner (GASNET_SPAWNFN=S) deployment requires the following environment variables:
export K8S_HOST=https://localhost:8443
export NAMESPACE=arkouda-namespace
export EXTERNAL_SERVICE_NAME=arkouda-service-name
export EXTERNAL_SERVICE_PORT=arkouda-port
export EXTERNAL_SERVICE_TARGET_PORT=arkouda-port-by-default
export ENDPOINT_NAME=arkouda-service-name
export ENDPOINT_PORT=arkouda-port
# Required HTTPS params
export KEY_FILE=path-to-key-file-on-slurm-hosts
export CERT_FILE=path-to-crt-file-on-slurm-hosts
export CACERT_FILE=path-to-kubernetes-ca.pem-file-on-slurm-hosts
# Set if the key has a password
export KEY_PASSWD=password
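Misconfigured variables tend to show up only later as opaque Curl error codes, so a pre-flight check before launching the server can save a round trip. The sketch below validates the variables listed above; the function name `validate_env` is hypothetical, and the specific checks (numeric ports, `https://` host, matching names) are assumptions drawn from the comments in this page rather than rules enforced by Arkouda itself. KEY_PASSWD is treated as optional.

```python
import os
from typing import List, Mapping

REQUIRED = ["K8S_HOST", "NAMESPACE", "EXTERNAL_SERVICE_NAME", "EXTERNAL_SERVICE_PORT",
            "EXTERNAL_SERVICE_TARGET_PORT", "ENDPOINT_NAME", "ENDPOINT_PORT",
            "KEY_FILE", "CERT_FILE", "CACERT_FILE"]
PORTS = ["EXTERNAL_SERVICE_PORT", "EXTERNAL_SERVICE_TARGET_PORT", "ENDPOINT_PORT"]


def validate_env(env: Mapping[str, str]) -> List[str]:
    """Hypothetical pre-flight check; returns human-readable problems."""
    errors = [f"{name} is not set" for name in REQUIRED if not env.get(name)]
    for name in PORTS:
        value = env.get(name, "")
        if value and not value.isdigit():
            errors.append(f"{name} must be a port number, got {value!r}")
    # The Endpoints object only attaches to a Service with the same name.
    if env.get("ENDPOINT_NAME") != env.get("EXTERNAL_SERVICE_NAME"):
        errors.append("ENDPOINT_NAME must match EXTERNAL_SERVICE_NAME")
    # Registration goes through the HTTPS Kubernetes REST API.
    if env.get("K8S_HOST") and not env["K8S_HOST"].startswith("https://"):
        errors.append("K8S_HOST must be an https:// URL")
    return errors


if __name__ == "__main__":
    for problem in validate_env(os.environ):
        print("config error:", problem)
```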
An example Slurm batch file with the required environment variables is shown below.
#!/bin/bash
#
#SBATCH --job-name=arkouda-3-node
#SBATCH --output=/tmp/arkouda.out
#SBATCH --mem=4096
#SBATCH --ntasks=3
#SBATCH --nodes=3
export CHPL_COMM_SUBSTRATE=udp
export GASNET_MASTERIP='ace'
export SSH_SERVERS='ace finkel einhorn'
export GASNET_SPAWNFN=S
export NAMESPACE=arkouda
export ENDPOINT_NAME=arkouda-on-slurm # ENDPOINT_NAME must match EXTERNAL_SERVICE_NAME
export ENDPOINT_PORT=5555
export K8S_HOST=https://ace:6443 #result from kubectl cluster-info command
export EXTERNAL_SERVICE_NAME=arkouda-on-slurm # EXTERNAL_SERVICE_NAME must match ENDPOINT_NAME
export EXTERNAL_SERVICE_PORT=5555
export EXTERNAL_SERVICE_TARGET_PORT=5555
export METRICS_SERVICE_NAME=arkouda-on-slurm-metrics
export METRICS_SERVICE_PORT=5556
export METRICS_SERVICE_TARGET_PORT=5556
export KEY_FILE=/opt/arkouda.key #on slurm hosts
export CERT_FILE=/opt/arkouda.crt #on slurm hosts
export CHPL_RT_NUM_THREADS_PER_LOCALE=4
/opt/arkouda-2023.03.01/arkouda_server -nl 3 --ExternalIntegration.systemType=SystemType.KUBERNETES \
--ServerDaemon.daemonTypes=ServerDaemonType.INTEGRATION,ServerDaemonType.METRICS \
--memTrack=true --memMax=4000000000 --logLevel=LogLevel.DEBUG
As detailed above, SystemType.KUBERNETES directs Arkouda to register with Kubernetes; this is done within the arkouda_server main loop, which delegates to the ExternalService registerAsInternalService procedure. Likewise, ServiceType.EXTERNAL directs Arkouda to register with Kubernetes as an external service, which is done within the ExternalService registerAsExternalService procedure. A bare-metal launch script, which omits the Slurm directives, is shown below.
#!/bin/bash
export GASNET_MASTERIP='ace'
export SSH_SERVERS='ace finkel einhorn'
export NAMESPACE=arkouda
export ENDPOINT_NAME=arkouda-external # ENDPOINT_NAME must match EXTERNAL_SERVICE_NAME
export ENDPOINT_PORT=5555
export EXTERNAL_SERVICE_NAME=arkouda-external # EXTERNAL_SERVICE_NAME must match ENDPOINT_NAME
export EXTERNAL_SERVICE_PORT=5555
export EXTERNAL_SERVICE_TARGET_PORT=5555
export METRICS_SERVICE_NAME=arkouda-external-metrics
export METRICS_SERVICE_PORT=5556
export METRICS_SERVICE_TARGET_PORT=5556
export K8S_HOST=https://ace:6443 #result from kubectl cluster-info command
export KEY_FILE=/opt/arkouda.key #on slurm hosts
export CERT_FILE=/opt/arkouda.crt #on slurm hosts
export CHPL_RT_NUM_THREADS_PER_LOCALE=4
/opt/arkouda-2023.03.01/arkouda_server -nl 3 --ExternalIntegration.systemType=SystemType.KUBERNETES \
--ServerDaemon.daemonTypes=ServerDaemonType.INTEGRATION,ServerDaemonType.METRICS \
--memTrack=true --memMax=4000000000 --logLevel=LogLevel.DEBUG
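The dispatch described before the bare-metal script (SystemType chooses the external system, ServiceType chooses internal versus external registration) can be mirrored in a few lines of Python. This is a hypothetical model for reasoning about the flags, not Arkouda's Chapel code: `SystemType.NONE`, `parse_flag_value`, and `registration_step` are invented for illustration, while the procedure names come from the text above.

```python
from enum import Enum
from typing import Type, TypeVar

E = TypeVar("E", bound=Enum)


class SystemType(Enum):
    NONE = "NONE"  # hypothetical: no external registration
    KUBERNETES = "KUBERNETES"


class ServiceType(Enum):
    INTERNAL = "INTERNAL"
    EXTERNAL = "EXTERNAL"


def parse_flag_value(value: str, enum_cls: Type[E]) -> E:
    """Parse a Chapel-style config value such as 'SystemType.KUBERNETES'."""
    prefix = enum_cls.__name__ + "."
    if not value.startswith(prefix):
        raise ValueError(f"expected {prefix}<NAME>, got {value!r}")
    return enum_cls[value[len(prefix):]]


def registration_step(system: SystemType, service: ServiceType) -> str:
    """Name the procedure the text says handles this combination."""
    if system is not SystemType.KUBERNETES:
        return "none"
    if service is ServiceType.INTERNAL:
        return "registerAsInternalService"
    return "registerAsExternalService"


system = parse_flag_value("SystemType.KUBERNETES", SystemType)
print(registration_step(system, ServiceType.EXTERNAL))  # prints registerAsExternalService
```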
The Curl commands executed in the ExternalSystem module return distinctive error codes:
- 7: the K8S_HOST environment variable does not point to a valid Kubernetes API URL (Curl could not connect)
- 22: the Kubernetes API returned an HTTP error; typically a Service whose name matches EXTERNAL_SERVICE_NAME already exists
- 36: there is a problem with the SSL .crt or .key file
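When wrapping the server launch in a script, the table above can be turned into a lookup so that failures print an actionable hint. The sketch below encodes exactly the three codes documented on this page; the names `CURL_DIAGNOSTICS` and `explain_curl_code` are hypothetical, and any other code is reported as unrecognized rather than guessed at.

```python
from typing import Dict

CURL_DIAGNOSTICS: Dict[int, str] = {
    7: "could not connect: check that K8S_HOST points to a valid Kubernetes API URL",
    22: "HTTP error from the API server: a Service named EXTERNAL_SERVICE_NAME may already exist",
    36: "problem with the SSL .crt or .key file",
}


def explain_curl_code(code: int) -> str:
    """Hypothetical helper mapping a Curl return code to a hint."""
    if code == 0:
        return "ok"
    return CURL_DIAGNOSTICS.get(code, f"unrecognized curl error code {code}")


for code in (0, 7, 22, 36):
    print(code, explain_curl_code(code))
```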