Showing 2 changed files with 153 additions and 0 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,52 @@ | ||
## Integrate Azure CycleCloud with WEKA | ||
|
||
### Pre-requisites | ||
1. Download the Azure CycleCloud / WEKA CycleCloud Template | ||
2. Configure the network parameters to enable DPDK on the Azure CycleCloud Nodes | ||
3. Create and deploy the cluster initialization module on the Azure CycleCloud Nodes | ||
4. Configure the WEKA blade on the CycleCloud / WEKA template from step 1. | ||
|
||
|
||
### CycleCloud - Weka template | ||
- On your CycleCloud VM, clone the repository `https://github.com/themorey/cyclecloud-weka`: | ||
```bash | ||
git clone https://github.com/themorey/cyclecloud-weka.git | ||
``` | ||
- Import the template entitled “slurm-weka” | ||
```bash | ||
cyclecloud import_template -f /home/weka/cyclecloud-weka/templates/slurm-weka.txt | ||
``` | ||
- Once successful, you will see the template in your CycleCloud GUI | ||
|
||
### Azure CycleCloud VM | ||
- Navigate to the CycleCloud / WEKA template that was downloaded in the previous step. | ||
- Scroll to the section called `[[nodearraybase]]` and add the following settings: | ||
```ini | ||
[[[network-interface eth0]]] | ||
AssociatePublicIpAddress = $ExecuteNodesPublic | ||
SubnetId = $SubnetId | ||
AcceleratedNetworking = true | ||
|
||
[[[network-interface eth1]]] | ||
SubnetId = $SubnetId | ||
``` | ||
- Copy the `weka_client_install.sh` script to the VM, under `~/specs/htc/cluster-init/scripts`. Depending on your configuration, you may create separate Azure CycleCloud specs for each node array, each with its own cluster-init script. | ||
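
The copy step above can be sketched as a small helper. This is a minimal sketch, assuming the script sits somewhere on the VM and that the spec layout matches the path given above; `stage_cluster_init` is a hypothetical name, not part of CycleCloud, and marking the script executable is a precaution rather than a documented requirement.

```bash
# Hypothetical helper (not part of CycleCloud): copy a cluster-init script
# into a spec's scripts directory, creating the directory if needed.
stage_cluster_init() {
    local script="$1"       # e.g. weka_client_install.sh
    local scripts_dir="$2"  # e.g. ~/specs/htc/cluster-init/scripts

    # Fail early if the source script does not exist.
    [ -f "$script" ] || { echo "missing script: $script" >&2; return 1; }

    mkdir -p "$scripts_dir"
    cp "$script" "$scripts_dir/"
    # Precaution: make sure the staged copy is executable.
    chmod +x "$scripts_dir/$(basename "$script")"
}

# Example call (paths taken from the step above):
#   stage_cluster_init weka_client_install.sh ~/specs/htc/cluster-init/scripts
```

Calling the helper once per node array, with a different target directory each time, would cover the case where each array has its own spec.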
|
||
### CycleCloud GUI | ||
- In the CycleCloud GUI, click `Edit`. | ||
- Click `Advanced Settings` and scroll to the Cluster Init section near the bottom. | ||
- Click `Browse` and navigate to the cluster init for the desired node array. In this example, the same cluster-init script is deployed for the HTC and HPC nodes. | ||
- Click `Weka Cluster Info` and fill in the parameters. The Weka addresses are the ones from step 1 above, and the WEKA filesystem is the one you chose in step 1; the mount point can be any path you like. Make sure the WEKA backend IPs are separated by commas. | ||
- Save your changes. | ||
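
Because the install script splits the address list on commas, a stray space or doubled comma in the Weka addresses field can break node selection. The following is a minimal sketch of a pre-flight check, assuming the backends are given as IPv4 addresses; `valid_weka_addresses` is a hypothetical helper, not part of WEKA or CycleCloud.

```bash
# Hypothetical helper: succeed only if the argument is a comma-separated
# list of IPv4 addresses with no spaces and no empty entries.
valid_weka_addresses() {
    local list="$1" ip octet
    local -a ips octets

    [ -n "$list" ] || return 1

    # Reject leading, trailing, or doubled commas and any whitespace.
    case "$list" in
        ,*|*,|*,,*|*[[:space:]]*) return 1 ;;
    esac

    IFS=',' read -r -a ips <<< "$list"
    for ip in "${ips[@]}"; do
        # Four dot-separated groups of one to three digits.
        [[ "$ip" =~ ^[0-9]{1,3}(\.[0-9]{1,3}){3}$ ]] || return 1
        IFS='.' read -r -a octets <<< "$ip"
        for octet in "${octets[@]}"; do
            # 10# forces base 10 so values like 08 are not read as octal.
            (( 10#$octet <= 255 )) || return 1
        done
    done
    return 0
}

# Example:
#   valid_weka_addresses "10.0.0.4,10.0.0.5,10.0.0.6" && echo "list looks OK"
```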
|
||
### Scheduler VM | ||
- Log into the scheduler VM | ||
- Run a Slurm job. For this example, we run a batch HTC job on 3 nodes: | ||
```bash | ||
sbatch -p htc -N3 --wrap /bin/hostname | ||
``` | ||
The HTC nodes are then activated via CycleCloud. | ||
For debugging: | ||
- Log into an HTC node and find the cluster-init script file | ||
- Run `tail -f <script name>` to watch the VM go through the steps to mount WEKA |
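
Once the script has finished, it can also help to confirm that the filesystem is actually mounted. This is a minimal sketch, assuming the mount appears in `/proc/mounts` with type `wekafs` (as the `mount -t wekafs` call in the install script suggests); `is_weka_mounted` is a hypothetical helper, and its optional second argument exists only so the check can be pointed at a different mounts file.

```bash
# Hypothetical helper: succeed if the given mount point is listed with
# filesystem type "wekafs" in the mounts file (default: /proc/mounts).
is_weka_mounted() {
    local mount_point="${1%/}"              # drop a trailing slash, if any
    local mounts_file="${2:-/proc/mounts}"

    # Fields in the mounts file: device, mount point, fs type, options, ...
    awk -v mp="$mount_point" \
        '$2 == mp && $3 == "wekafs" { found = 1 } END { exit !found }' \
        "$mounts_file"
}

# Example (the mount point is whatever was entered in the CycleCloud GUI):
#   is_weka_mounted /mnt/weka && echo "WEKA is mounted"
```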
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,101 @@ | ||
#!/bin/bash | ||
set -ex | ||
|
||
# setup CycleCloud variables to find cluster IPs | ||
ccuser=$(jetpack config cyclecloud.config.username) | ||
ccpass=$(jetpack config cyclecloud.config.password) | ||
ccurl=$(jetpack config cyclecloud.config.web_server) | ||
mount_point=$(jetpack config weka.mount_point) | ||
fs=$(jetpack config weka.fs) | ||
|
||
|
||
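# Try to enable EPEL with whichever package manager is present; failures are ignored so the script continues on other distros | ||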
yum install -y epel-release || true | ||
apt install -y epel-release || true | ||
|
||
if [ -e "/etc/netplan/50-cloud-init.yaml" ]; then | ||
# Indentation of the inserted block assumes cloud-init's default 4-space netplan layout | ||
cat <<-EOF | sed -i "/ ethernets:/r /dev/stdin" /etc/netplan/50-cloud-init.yaml | ||
        eth1: | ||
            dhcp4: true | ||
EOF | ||
netplan apply | ||
fi | ||
|
||
if [ -e "/etc/sysconfig/network-scripts/ifcfg-eth0" ]; then | ||
cp /etc/sysconfig/network-scripts/ifcfg-eth0 /etc/sysconfig/network-scripts/ifcfg-eth1 | ||
sed -i "s/eth0/eth1/g" /etc/sysconfig/network-scripts/ifcfg-eth1 | ||
systemctl restart NetworkManager | ||
fi | ||
|
||
# Find the mount addresses if deployed by CycleCloud...otherwise use manual entries | ||
if [ "$(jetpack config weka.cycle)" == "True" ]; then | ||
cluster_name=$(jetpack config weka.cluster_name) | ||
# Get the list of Weka cluster IPs from CycleCloud | ||
IPS=$(curl -s -k --user "${ccuser}:${ccpass}" "${ccurl}/clusters/${cluster_name}/nodes" \ | ||
| jq -r '.nodes[] | .PrivateIp' | xargs | sed -e 's/ /,/g') | ||
else | ||
IPS=$(jetpack config weka.cluster_address) | ||
fi | ||
|
||
# Pick a random Weka node from the list of IPs | ||
num_commas=$(echo $IPS | tr -cd , | wc -c ) | ||
num_nodes=$(echo "$((num_commas + 1))") | ||
weka_address=$(echo $IPS | cut -d ',' -f $(( ( RANDOM % ${num_nodes} ) + 1 ))) | ||
|
||
|
||
# Create a mount point | ||
mkdir -p ${mount_point} | ||
|
||
# Install the WEKA agent on the client machine: | ||
curl http://${weka_address}:14000/dist/v1/install | sh | ||
|
||
|
||
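# Define the working directory before cleaning it (the variable was previously used before assignment) | ||
INSTALLATION_PATH="/tmp/weka" | ||
rm -rf "$INSTALLATION_PATH" | ||
|
||
echo "$(date -u): before weka agent installation" | ||
|
||
mkdir -p "$INSTALLATION_PATH" | ||
cd "$INSTALLATION_PATH" | ||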
|
||
gateways="${all_gateways}" | ||
FRONTEND_CONTAINER_CORES_NUM=1 | ||
NICS_NUM=2 | ||
eth0=$(ifconfig | grep eth0 -C2 | grep 'inet ' | awk '{print $2}') | ||
eth1_ip=$(ifconfig | grep eth1 -C2 | grep 'inet ' | awk '{print $2}') | ||
eth1_mask=$(echo -n /;ip -4 addr | awk '/eth1/ { getline; {print $2} }' | cut -f2 -d/) | ||
|
||
#### new additions to establish network interface for weka mount | ||
eth1mac=$(ifconfig eth1|grep ether|awk '{print $2}') | ||
mntdev=$(ifconfig|grep $eth1mac -B2|grep mtu|grep -v eth1|awk '{print $1}'| cut -d':' -f1) | ||
|
||
|
||
|
||
function retry { | ||
local retry_max=$1 | ||
local retry_sleep=$2 | ||
shift 2 | ||
local count=$retry_max | ||
while [ $count -gt 0 ]; do | ||
"$@" && break | ||
count=$(($count - 1)) | ||
echo "Retrying $* in $retry_sleep seconds..." | ||
sleep $retry_sleep | ||
done | ||
[ $count -eq 0 ] && { | ||
echo "Retry failed [$retry_max]: $*" | ||
echo "$(date -u): retry failed" | ||
return 1 | ||
} | ||
return 0 | ||
} | ||
|
||
mount_command="mount -t wekafs ${weka_address}/${fs} -o num_cores=$FRONTEND_CONTAINER_CORES_NUM -o net=${mntdev}/${eth1_ip}${eth1_mask} -o mgmt_ip=$eth0 $mount_point" | ||
|
||
|
||
retry 60 45 $mount_command | ||
|
||
rm -rf $INSTALLATION_PATH | ||
|
||
echo "$(date -u): wekafs mount complete" | ||
|
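
The backend selection in the script above counts commas and then uses `cut` to pick a field. An equivalent approach with a bash array is sketched below; this is a minimal sketch under the assumption that the list is comma-separated with no spaces, and `pick_random_address` is a hypothetical name that does not appear in the script.

```bash
# Hypothetical helper: print one randomly chosen entry from a
# comma-separated list, or fail if the list is empty.
pick_random_address() {
    local -a ips
    IFS=',' read -r -a ips <<< "$1"

    [ "${#ips[@]}" -gt 0 ] || return 1

    # RANDOM is a bash built-in; the modulo maps it onto a valid array index.
    echo "${ips[RANDOM % ${#ips[@]}]}"
}

# Example:
#   weka_address=$(pick_random_address "$IPS")
```

Splitting into an array avoids the separate comma count and makes the empty-list case explicit, at the cost of requiring bash rather than a plain POSIX shell.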