An administrator can request CKE to gracefully reboot a set of nodes via `ckecli`.
The requests are appended to the reboot queue.
Each request entry corresponds to a list of nodes.
CKE watches the reboot queue and handles the reboot requests. CKE processes reboot requests in the following manner:
- cordons the nodes to mark them as unschedulable.
- checks for Job-managed Pods on the nodes. If such Pods exist, uncordons the nodes immediately and processes them again later.
- evicts (and/or deletes) non-DaemonSet-managed Pods on the nodes.
- reboots the nodes by running the hardware reboot command for each node.
- waits for the nodes to boot by running the boot check command for each node.
- uncordons the nodes and recovers them.
The behavior of the reboot functionality is configurable through the cluster configuration.

Each reboot queue entry has the following fields:

| Name | Type | Description |
| --- | --- | --- |
| `index` | string | Index number of the entry, formatted as a string. |
| `node` | string | The address of a node to reboot. |
| `status` | string | One of `queued`, `draining`, `rebooting`, `cancelled`. |
| `last_transition_time` | time.Time | The time of the last status transition. |
| `drain_backoff_count` | int | The number of drain backoffs so far. |
| `drain_backoff_expire` | time.Time | The time at which the drain backoff expires. |

An administrator issues a reboot request using `ckecli reboot-queue add`.
The command writes reboot queue entry(s) and increments `reboots/write-index` atomically.
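
The "write entries and bump the index together" idea can be sketched with an in-memory stand-in. The `memQueue` type below is hypothetical and uses a mutex where the real system would rely on its datastore; it only shows that the entries and the write index change as one unit.

```go
package main

import (
	"fmt"
	"sync"
)

// memQueue is a hypothetical in-memory stand-in for the reboot queue.
type memQueue struct {
	mu         sync.Mutex
	writeIndex int64            // plays the role of reboots/write-index
	entries    map[int64]string // entry index -> node address
}

func newMemQueue() *memQueue {
	return &memQueue{entries: make(map[int64]string)}
}

// Add writes one entry per node and advances writeIndex inside a single
// critical section, so both changes become visible together to anyone
// who also takes the lock.
func (q *memQueue) Add(nodes []string) []int64 {
	q.mu.Lock()
	defer q.mu.Unlock()

	indices := make([]int64, 0, len(nodes))
	for _, n := range nodes {
		q.entries[q.writeIndex] = n
		indices = append(indices, q.writeIndex)
		q.writeIndex++
	}
	return indices
}

func main() {
	q := newMemQueue()
	fmt.Println(q.Add([]string{"10.0.0.1", "10.0.0.2"}))
	fmt.Println(q.Add([]string{"10.0.0.3"}))
}
```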
The queue is processed by CKE as follows:
- If `reboots/disabled` is `true`, it doesn't process the queue.
- Check the reboot queue to find an entry.
    - If the number of nodes under processing is less than the maximum concurrent reboots and the number of unreachable nodes that are not under this reboot process is not more than `maximum-unreachable-nodes-for-reboot` in the constraints, pick several nodes from the front of the queue and start draining them.
        - Cordon the node.
        - If there are Job-managed Pods, back off the draining, i.e.:
            - update the entry status back to `queued`.
            - mark the entry not to be drained again immediately.
        - Evict non-DaemonSet-managed Pods. If the eviction fails due to PDBs and the namespace of the Pod is not protected by `.reboot.protected_namespaces`, delete the Pods. If the deletion also fails, back off the draining.
    - If draining times out, back off the draining.
    - If draining is completed, run the hardware reboot command specified by `.reboot.reboot_command` for the node and update the entry status to `rebooting`.
    - Remove entries if:
        - the node is confirmed booted by the boot check command specified by `.reboot.boot_check_command`, or
        - the entry status is `cancelled`.
    - If a node is cordoned by the reboot operation and its entry status is not `draining` or `rebooting`, uncordon it.
There are several rules for API server nodes.
- API servers are processed one by one.
    - Multiple API servers are never processed simultaneously.
- API servers are processed with lower priority.
    - If API servers and non-API servers are in reboot queue, non-API servers are processed first.
    - API servers are not processed simultaneously with non-API servers.
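
One way to read the API server rules is as a selection function over the queue. The sketch below is an assumption-laden illustration, not CKE's actual logic: `candidate`, `pickNext`, and the `slots` parameter (remaining concurrency) are all hypothetical names.

```go
package main

import "fmt"

// candidate is a queued or in-progress node (hypothetical type).
type candidate struct {
	Node        string
	IsAPIServer bool
}

// pickNext chooses which queued nodes to start draining.
// queued is in queue order, processing lists nodes already being drained or
// rebooted, and slots is the remaining concurrency budget.
func pickNext(queued, processing []candidate, slots int) []candidate {
	if slots <= 0 {
		return nil
	}
	apiBusy, otherBusy := false, false
	for _, p := range processing {
		if p.IsAPIServer {
			apiBusy = true
		} else {
			otherBusy = true
		}
	}
	// Nothing else starts while an API server is being processed.
	if apiBusy {
		return nil
	}
	// Non-API servers take priority over API servers.
	var picked []candidate
	for _, c := range queued {
		if !c.IsAPIServer && len(picked) < slots {
			picked = append(picked, c)
		}
	}
	// API servers wait while any non-API server is picked or in progress.
	if len(picked) > 0 || otherBusy {
		return picked
	}
	// Only API servers remain and nothing is in progress: take exactly one.
	for _, c := range queued {
		if c.IsAPIServer {
			return []candidate{c}
		}
	}
	return nil
}

func main() {
	queue := []candidate{{"cp-1", true}, {"w-1", false}, {"cp-2", true}, {"w-2", false}}
	fmt.Println(pickNext(queue, nil, 2))
}
```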