Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature request: make the action taken by skypilot-status-refresh-daemon user aware #4837

Open
aylei opened this issue Feb 27, 2025 · 1 comment
Labels

Comments

@aylei
Copy link
Collaborator

aylei commented Feb 27, 2025

I launched a cluster on my local Kind and the cluster is cleaned after I restarted my docker daemon:

$ sky status
Clusters
NAME                          LAUNCHED    RESOURCES                         STATUS   AUTOSTOP  COMMAND
aylei3                        1 min ago   1x Kubernetes(1CPU--1GB, cpus=1)  UP       -         sky launch -c aylei3 --cpus...
kg po
NAME                   READY   STATUS    RESTARTS   AGE
aylei3-57339c81-head   0/1     Unknown   0          15m

Of course the Pod is down since the kind node (runs as container in docker daemon VM) is restarted. The sky cluster aylei3 also disappeared after a while. It turns out there is a daemon that refresh the cluster status every 60s, the daemon log explains why:

$ sky api logs skypilot-status-refresh-daemon
....
Cluster terminated on the cloud: aylei3
Failed to refresh status for 1 cluster:
  test: 'test' (Kubernetes) is owned by account ['gke_sky-dev-465_us-central1-c_skypilotalpha_gke_sky-dev-465_us-central1-c_skypilotalpha_skypilot'], but available identities are [['kind-skypilot_kind-skypilot_default'], ['kind-kind_kind-kind_default'], ['kind-skypilot_kind-skypilot_default'], ['temp_temp_default']]. Check your kubeconfig file and make sure the correct context is available.
I 02-27 19:07:50 requests.py:311] Status refreshed. Sleeping 60 seconds for the next refresh...
I 02-27 19:07:50 requests.py:311]
I 02-27 19:08:50 requests.py:306] === Refreshing cluster status ===

It is really confusing since my cluster is cleaned silently. It would be helpful if we:

  • notify users that some actions have been taken by the refresh daemon in the next sky CLI invocation
  • or (better), prompt user that the daemon discovers that some actions should be made, and ask user's confirmation to perform these actions
@aylei aylei changed the title Feature request: sky status should show recent refresh events Feature request: make the action taken by skypilot-status-refresh-daemon user aware Feb 27, 2025
@concretevitamin
Copy link
Member

+1. Before apiserver there's a hint printed on next command. Now with the refresh daemon it's hard to notice.

@aylei aylei added interface/ux User experiences api server labels Mar 5, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

2 participants