Commit

Changed the folder name to be plural and added a new node monitor to test whether a node is stuck in the "NotReady" state.
EliseCastle23 committed Oct 25, 2023
1 parent 8cc37a3 commit e0dc474
Showing 4 changed files with 43 additions and 0 deletions.
File renamed without changes.
File renamed without changes.
File renamed without changes.
43 changes: 43 additions & 0 deletions kube/services/node-monitors/node-not-ready.yaml
@@ -0,0 +1,43 @@
apiVersion: batch/v1
kind: CronJob
metadata:
  name: node-not-ready-cron
  namespace: default
spec:
  schedule: "*/5 * * * *"
  jobTemplate:
    spec:
      template:
        metadata:
          labels:
            app: gen3job
        spec:
          serviceAccountName: node-monitor
          containers:
          - name: kubectl
            image: quay.io/cdis/awshelper
            env:
              - name: SLACK_WEBHOOK_URL
                valueFrom:
                  configMapKeyRef:
                    name: global
                    key: slack_webhook

            command: ["/bin/bash"]
            args:
              - "-c"
              - |
                #!/bin/sh
                # Get nodes whose Ready condition is "Unknown" (e.g. "NodeStatusNeverUpdated")
                NODES=$(kubectl get nodes -o json | jq -r '.items[] | select(.status.conditions[] | select(.type == "Ready" and .status == "Unknown")) | .metadata.name')
                if [ -n "$NODES" ]; then
                  echo "Nodes reporting 'NodeStatusNeverUpdated', sending an alert:"
                  echo "$NODES"
                  # Send alert to Slack
                  curl -X POST -H 'Content-type: application/json' --data "{\"text\":\"WARNING: Node \`${NODES}\` is stuck in 'NotReady'!\"}" "$SLACK_WEBHOOK_URL"
                else
                  echo "No nodes reporting 'NodeStatusNeverUpdated'"
                fi
          restartPolicy: OnFailure
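
Note on the alert payload: the script above interpolates `$NODES` directly into the JSON body. When more than one node matches, that variable holds one name per line, and a raw newline inside a JSON string is not valid JSON. A minimal sketch of one way to handle this is shown below. It assumes node names are plain DNS-style names (so no further JSON escaping is needed), and `build_payload` is a hypothetical helper that is not part of this commit.

```shell
#!/bin/sh
# Sketch only: build a single-line Slack payload from newline-separated node names.
# build_payload is a hypothetical helper, not part of the commit.
build_payload() {
  # Join the names with ", " so the JSON string contains no raw newlines.
  nodes=$(printf '%s\n' "$1" | paste -sd, - | sed 's/,/, /g')
  # \\" in the printf format emits \" so the inner quotes stay escaped in JSON.
  printf '{"text":"WARNING: Node(s) `%s` stuck in \\"NotReady\\"!"}' "$nodes"
}

# Example input: two sample node names, one per line (made-up values).
NODES=$(printf 'ip-10-0-0-1.ec2.internal\nip-10-0-0-2.ec2.internal')
build_payload "$NODES"
```

In the CronJob, the result could then be passed to curl as `--data "$(build_payload "$NODES")"`, keeping the rest of the script unchanged.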
