feat: add cyclecloud
Denise Perez authored and assafgi committed Nov 29, 2024
1 parent c11b3b2 commit 81624b0
Showing 2 changed files with 153 additions and 0 deletions.
52 changes: 52 additions & 0 deletions azure/cyclecloud/README.md
## Integrate Azure CycleCloud with WEKA

### Prerequisites
1. Download the Azure CycleCloud / WEKA CycleCloud template.
2. Configure the network parameters to enable DPDK on the Azure CycleCloud nodes.
3. Create and deploy the cluster-init module on the Azure CycleCloud nodes.
4. Configure the WEKA settings (`Weka Cluster Info`) in the CycleCloud / WEKA template imported in step 1.
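Step 2 depends on Azure Accelerated Networking. On an accelerated NIC, Azure exposes a virtual function (VF) that shares the synthetic interface's MAC address, and `weka_client_install.sh` looks up the device that shares `eth1`'s MAC to build the mount's `net=` option. A minimal sketch for checking this on a running node; `find_vf_for` and its optional sysfs-root argument (added so it can be tried against a fake directory) are hypothetical:

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the interface that shares a MAC address with the
# given NIC (on Azure, the accelerated-networking VF behind a synthetic NIC).
# The optional second argument points at an alternative sysfs root for testing.
find_vf_for() {
    local nic="$1"
    local sysfs="${2:-/sys/class/net}"
    local mac dev name

    mac=$(cat "$sysfs/$nic/address" 2>/dev/null) || return 1
    for dev in "$sysfs"/*; do
        name=$(basename "$dev")
        [ "$name" = "$nic" ] && continue
        if [ "$(cat "$dev/address" 2>/dev/null)" = "$mac" ]; then
            echo "$name"
            return 0
        fi
    done
    return 1   # no VF found: Accelerated Networking is probably not active on this NIC
}
```

On a node, `find_vf_for eth1` should print a device name; an empty result with a non-zero exit status suggests Accelerated Networking is not active on that NIC.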


### CycleCloud - Weka template
- On your CycleCloud VM, clone the repository `https://github.com/themorey/cyclecloud-weka`:
```bash
git clone https://github.com/themorey/cyclecloud-weka.git
```
- Import the `slurm-weka` template (the path below assumes the repository was cloned into `/home/weka`; adjust it to match your clone):
```bash
cyclecloud import_template -f /home/weka/cyclecloud-weka/templates/slurm-weka.txt
```
- Once the import succeeds, the template appears in your CycleCloud GUI.
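Because the import path above is hard-coded to `/home/weka`, it can help to clone and import from one place. A minimal sketch, assuming the `git` and `cyclecloud` CLIs are on the `PATH` of the CycleCloud VM; `import_weka_template` is a hypothetical helper, not part of the repository:

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of cyclecloud-weka): clone the repository if it
# is not already present, then import the slurm-weka template from that clone.
import_weka_template() {
    local dest="${1:-$HOME/cyclecloud-weka}"
    local template="$dest/templates/slurm-weka.txt"

    # Clone only when the destination is not already a git checkout
    if [ ! -d "$dest/.git" ]; then
        git clone https://github.com/themorey/cyclecloud-weka.git "$dest" || return 1
    fi

    if [ ! -f "$template" ]; then
        echo "template not found: $template" >&2
        return 1
    fi

    cyclecloud import_template -f "$template"
}
```

For example, `import_weka_template` clones into `$HOME/cyclecloud-weka`, while `import_weka_template /opt/cyclecloud-weka` uses a different directory.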

### Azure CycleCloud VM
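- Open the CycleCloud / WEKA template cloned in the previous step (`templates/slurm-weka.txt`).
- Scroll to the `[[nodearraybase]]` section and add the following network interface definitions. The second NIC (`eth1`) is the one the install script uses for the WEKA mount: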
```ini
[[[network-interface eth0]]]
AssociatePublicIpAddress = $ExecuteNodesPublic
SubnetId = $SubnetId
AcceleratedNetworking = true

[[[network-interface eth1]]]
SubnetId = $SubnetId
```
- Copy the `weka_client_install.sh` script to the CycleCloud VM, into the `cluster-init/scripts` directory of the spec used by the node array, for example `~/specs/htc/cluster-init/scripts`. Depending on your configuration, you may create a separate CycleCloud spec for each node array, each with its own cluster-init script.
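Staging the same script for several node arrays is mechanical. A minimal sketch, assuming a CycleCloud project directory laid out as `specs/<spec>/cluster-init/scripts` (the layout shown in the path above); `stage_cluster_init` and the spec names `htc`/`hpc` are illustrative:

```bash
#!/usr/bin/env bash
# Hypothetical helper: copy one cluster-init script into the scripts directory
# of every listed spec, e.g. specs/htc/cluster-init/scripts/.
# Usage: stage_cluster_init <project_dir> <script> <spec>...
stage_cluster_init() {
    local project="$1" script="$2"
    shift 2
    local spec dest

    [ -f "$script" ] || { echo "script not found: $script" >&2; return 1; }
    for spec in "$@"; do
        dest="$project/specs/$spec/cluster-init/scripts"
        mkdir -p "$dest"
        cp "$script" "$dest/"
        # Make sure the copied script is executable
        chmod +x "$dest/$(basename "$script")"
    done
}
```

For example, `stage_cluster_init ~ weka_client_install.sh htc hpc` produces `~/specs/htc/cluster-init/scripts/weka_client_install.sh` and the matching `hpc` copy. The project still has to be uploaded to your CycleCloud locker (for example with `cyclecloud project upload <locker>`) before the GUI can see it.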

### CycleCloud GUI
- In the CycleCloud GUI, click `Edit` on your cluster.
- Click on `Advanced Settings` and scroll to the Cluster Init section near the bottom.
- Click `Browse` and navigate to the cluster-init spec for the desired node array. The same cluster-init script can be deployed to more than one node array, for example to both the HTC and HPC nodes.
- Click `Weka Cluster Info` and fill in the parameters. The WEKA addresses are the private IPs of your WEKA backend instances, separated by commas. You can choose any mount point, and the file system is the WEKA file system you want to mount.
- Click `Save`.
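Because the install script splits the WEKA addresses on commas, a stray space or typo only shows up later as a failed mount. A minimal sketch of a pre-check you could run before pasting the list into the GUI, assuming IPv4 addresses; `validate_weka_ips` is hypothetical and not part of the template:

```bash
#!/usr/bin/env bash
# Hypothetical pre-check: accept only a comma-separated list of IPv4
# addresses with no spaces, e.g. "10.0.0.4,10.0.0.5,10.0.0.6".
validate_weka_ips() {
    local list="$1" ip octet
    local re='^[0-9]{1,3}(\.[0-9]{1,3}){3}$'
    local -a ips octets

    [ -n "$list" ] || { echo "empty address list" >&2; return 1; }
    # Reject spaces and empty entries (leading, trailing or doubled commas)
    case "$list" in
        *" "*|,*|*,|*,,*) echo "remove spaces and empty entries: '$list'" >&2; return 1 ;;
    esac

    IFS=',' read -r -a ips <<< "$list"
    for ip in "${ips[@]}"; do
        if ! [[ $ip =~ $re ]]; then
            echo "not an IPv4 address: '$ip'" >&2
            return 1
        fi
        # Each octet must be at most 255 (10# avoids octal parsing of "08")
        IFS='.' read -r -a octets <<< "$ip"
        for octet in "${octets[@]}"; do
            if [ "$((10#$octet))" -gt 255 ]; then
                echo "octet out of range in: $ip" >&2
                return 1
            fi
        done
    done
    return 0
}
```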

### Scheduler VM
- Log into the scheduler VM
- Submit a Slurm job. This example runs a batch job on 3 nodes of the `htc` partition:
```bash
sbatch -p htc -N3 --wrap /bin/hostname
```
CycleCloud then starts the HTC nodes for the job, and each node runs the cluster-init script, which mounts WEKA.
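To confirm that a node which ran the job actually mounted WEKA, you can look for a `wekafs` entry at the configured mount point in the node's mount table. A minimal sketch; `weka_is_mounted` and its optional mounts-file argument (used for testing) are hypothetical, and `/mnt/weka` below stands in for whatever mount point you entered in `Weka Cluster Info`:

```bash
#!/usr/bin/env bash
# Hypothetical check: succeed if <mount_point> appears in the mount table
# (default /proc/mounts) with file system type wekafs.
# /proc/mounts fields: device mount_point fstype options dump pass
weka_is_mounted() {
    local mount_point="${1%/}"          # drop a trailing slash, if any
    local mounts="${2:-/proc/mounts}"
    awk -v mp="$mount_point" '$2 == mp && $3 == "wekafs" { found = 1 } END { exit !found }' "$mounts"
}
```

On an HTC node: `weka_is_mounted /mnt/weka && echo mounted`.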
For debugging:
- Log in to an HTC node and find the output log of the cluster-init script.
- Run `tail -f <log file>` to follow the node as it works through the steps to mount WEKA.
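On CycleCloud nodes, Jetpack typically writes the output of each cluster-init script under `/opt/cycle/jetpack/logs/cluster-init/`; treat that location as an assumption and check it on your image. A minimal sketch that locates the newest log for this script; `find_init_log` and its optional root argument are hypothetical:

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the newest log file whose name starts with the
# given script name under the cluster-init log root (assumed default below).
find_init_log() {
    local script="${1:-weka_client_install.sh}"
    local root="${2:-/opt/cycle/jetpack/logs/cluster-init}"
    local newest="" f

    while IFS= read -r -d '' f; do
        # Keep the most recently modified match
        if [ -z "$newest" ] || [ "$f" -nt "$newest" ]; then
            newest="$f"
        fi
    done < <(find "$root" -type f -name "${script}*" -print0 2>/dev/null)

    [ -n "$newest" ] || { echo "no log found for $script under $root" >&2; return 1; }
    echo "$newest"
}
```

For example: `tail -f "$(find_init_log)"`.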
101 changes: 101 additions & 0 deletions azure/cyclecloud/weka_client_install.sh
#!/bin/bash
set -ex

# Read the CycleCloud credentials and WEKA settings from the node's Jetpack config
ccuser=$(jetpack config cyclecloud.config.username)
ccpass=$(jetpack config cyclecloud.config.password)
ccurl=$(jetpack config cyclecloud.config.web_server)
mount_point=$(jetpack config weka.mount_point)
fs=$(jetpack config weka.fs)


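# Install jq (used below to parse the CycleCloud API response) with whichever
# package manager the image provides; EPEL supplies jq on older RHEL/CentOS images
if command -v yum >/dev/null 2>&1; then
    yum install -y epel-release || true
    yum install -y jq
elif command -v apt-get >/dev/null 2>&1; then
    apt-get update
    apt-get install -y jq
fi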

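# Ubuntu (netplan): enable DHCP on the second NIC, once.
# The inserted lines assume the 4-space indentation cloud-init uses in this file.
if [ -e "/etc/netplan/50-cloud-init.yaml" ] && ! grep -q "eth1:" /etc/netplan/50-cloud-init.yaml; then
    cat <<EOF | sed -i "/ ethernets:/r /dev/stdin" /etc/netplan/50-cloud-init.yaml
        eth1:
            dhcp4: true
EOF
    netplan apply
fi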

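# RHEL/CentOS: clone eth0's interface config for eth1, dropping the
# eth0-specific hardware address and UUID so the two profiles do not clash
if [ -e "/etc/sysconfig/network-scripts/ifcfg-eth0" ]; then
    cp /etc/sysconfig/network-scripts/ifcfg-eth0 /etc/sysconfig/network-scripts/ifcfg-eth1
    sed -i -e "s/eth0/eth1/g" -e "/^HWADDR=/d" -e "/^UUID=/d" /etc/sysconfig/network-scripts/ifcfg-eth1
    systemctl restart NetworkManager
fi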

# Use the WEKA backend IPs from CycleCloud if it deployed the WEKA cluster;
# otherwise use the addresses entered manually
if [ "$(jetpack config weka.cycle)" == "True" ]; then
    cluster_name=$(jetpack config weka.cluster_name)
    # Get the list of WEKA cluster IPs from the CycleCloud API
    IPS=$(curl -s -k --user "${ccuser}:${ccpass}" "${ccurl}/clusters/${cluster_name}/nodes" \
        | jq -r '.nodes[] | .PrivateIp' | xargs | sed -e 's/ /,/g')
else
    IPS=$(jetpack config weka.cluster_address)
fi

# Pick a random WEKA backend from the comma-separated list of IPs
IPS=$(echo "$IPS" | tr -d '[:space:]')   # tolerate "ip1, ip2" input
num_nodes=$(( $(echo "$IPS" | tr -cd , | wc -c) + 1 ))
weka_address=$(echo "$IPS" | cut -d ',' -f $(( RANDOM % num_nodes + 1 )))


# Create the mount point
mkdir -p "${mount_point}"

# Work from a scratch directory, removed again at the end of the script
INSTALLATION_PATH="/tmp/weka"
rm -rf "$INSTALLATION_PATH"
mkdir -p "$INSTALLATION_PATH"
cd "$INSTALLATION_PATH"

# Install the WEKA agent on the client machine from the selected backend
echo "$(date -u): before weka agent installation"
curl -fsS "http://${weka_address}:14000/dist/v1/install" | sh

# Number of cores dedicated to the WEKA client (frontend) container
FRONTEND_CONTAINER_CORES_NUM=1
# IPv4 address of eth0 (used as the management IP) and address/prefix of eth1
eth0=$(ip -4 -o addr show dev eth0 | awk '{print $4}' | cut -d/ -f1)
eth1_ip=$(ip -4 -o addr show dev eth1 | awk '{print $4}' | cut -d/ -f1)
eth1_mask="/$(ip -4 -o addr show dev eth1 | awk '{print $4}' | cut -d/ -f2)"

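# Find the accelerated-networking device (VF) that shares eth1's MAC address;
# the WEKA mount's net= option is built from it
eth1mac=$(cat /sys/class/net/eth1/address)
mntdev=""
for dev in /sys/class/net/*; do
    [ "$(basename "$dev")" = "eth1" ] && continue
    if [ "$(cat "$dev/address" 2>/dev/null)" = "$eth1mac" ]; then
        mntdev=$(basename "$dev")
        break
    fi
done
if [ -z "$mntdev" ]; then
    echo "$(date -u): no device shares eth1's MAC ${eth1mac}; is Accelerated Networking enabled on eth1?"
    exit 1
fi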



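# Run a command up to <retry_max> times, sleeping <retry_sleep> seconds between attempts
function retry {
    local retry_max=$1
    local retry_sleep=$2
    shift 2
    local count=$retry_max
    while [ "$count" -gt 0 ]; do
        "$@" && break
        count=$((count - 1))
        # No point sleeping after the final failed attempt
        [ "$count" -gt 0 ] || break
        echo "Retrying $* in $retry_sleep seconds..."
        sleep "$retry_sleep"
    done
    [ "$count" -eq 0 ] && {
        echo "Retry failed [$retry_max]: $*"
        echo "$(date -u): retry failed"
        return 1
    }
    return 0
}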

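# Mount WEKA: backend/filesystem, client cores, data NIC (device/IP/prefix), management IP.
# An array keeps each argument intact instead of relying on word splitting.
mount_command=(mount -t wekafs "${weka_address}/${fs}"
    -o "num_cores=${FRONTEND_CONTAINER_CORES_NUM}"
    -o "net=${mntdev}/${eth1_ip}${eth1_mask}"
    -o "mgmt_ip=${eth0}"
    "${mount_point}")

# Up to 60 attempts, 45 seconds apart
retry 60 45 "${mount_command[@]}"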

rm -rf $INSTALLATION_PATH

echo "$(date -u): wekafs mount complete"
