You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
I launched a cluster on my local Kind and the cluster is cleaned after I restarted my docker daemon:
$ sky status
Clusters
NAME LAUNCHED RESOURCES STATUS AUTOSTOP COMMAND
aylei3 1 min ago 1x Kubernetes(1CPU--1GB, cpus=1) UP - sky launch -c aylei3 --cpus...
kg po
NAME READY STATUS RESTARTS AGE
aylei3-57339c81-head 0/1 Unknown 0 15m
Of course the Pod is down since the kind node (runs as container in docker daemon VM) is restarted. The sky cluster aylei3 also disappeared after a while. It turns out there is a daemon that refresh the cluster status every 60s, the daemon log explains why:
$ sky api logs skypilot-status-refresh-daemon
....
Cluster terminated on the cloud: aylei3
Failed to refresh status for 1 cluster:
test: 'test' (Kubernetes) is owned by account ['gke_sky-dev-465_us-central1-c_skypilotalpha_gke_sky-dev-465_us-central1-c_skypilotalpha_skypilot'], but available identities are [['kind-skypilot_kind-skypilot_default'], ['kind-kind_kind-kind_default'], ['kind-skypilot_kind-skypilot_default'], ['temp_temp_default']]. Check your kubeconfig file and make sure the correct context is available.
I 02-27 19:07:50 requests.py:311] Status refreshed. Sleeping 60 seconds for the next refresh...
I 02-27 19:07:50 requests.py:311]
I 02-27 19:08:50 requests.py:306] === Refreshing cluster status ===
It is really confusing since my cluster is cleaned silently. It would be helpful if we:
notify users that some actions have been taken by the refresh daemon in the next sky CLI invocation
or (better), prompt user that the daemon discovers that some actions should be made, and ask user's confirmation to perform these actions
The text was updated successfully, but these errors were encountered:
aylei
changed the title
Feature request: sky status should show recent refresh events
Feature request: make the action taken by skypilot-status-refresh-daemon user aware
Feb 27, 2025
I launched a cluster on my local Kind and the cluster is cleaned after I restarted my docker daemon:
Of course the Pod is down since the kind node (runs as container in docker daemon VM) is restarted. The sky cluster
aylei3
also disappeared after a while. It turns out there is a daemon that refresh the cluster status every 60s, the daemon log explains why:It is really confusing since my cluster is cleaned silently. It would be helpful if we:
sky
CLI invocationThe text was updated successfully, but these errors were encountered: