Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix ping-and-up test for den launched clusters #1684

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

BelSasha
Copy link
Contributor

@BelSasha BelSasha commented Jan 15, 2025

fixing the logic of cluster._ping(retry=True) for den_launched clusters.

Copy link
Contributor Author

BelSasha commented Jan 15, 2025

This stack of pull requests is managed by Graphite. Learn more about stacking.

@BelSasha BelSasha changed the title fix ping-and-up test for den launched clusters [WIP] fix ping-and-up test for den launched clusters Jan 15, 2025
@BelSasha BelSasha force-pushed the sb/fix-ping-and-up-test branch from 1597c5d to c75846e Compare January 15, 2025 10:28
@BelSasha BelSasha changed the title [WIP] fix ping-and-up test for den launched clusters Fix ping-and-up test for den launched clusters Jan 15, 2025
@BelSasha BelSasha marked this pull request as ready for review January 15, 2025 10:43
@BelSasha BelSasha force-pushed the sb/fix-ping-and-up-test branch 5 times, most recently from e31d961 to 6524d80 Compare January 15, 2025 11:50
@BelSasha BelSasha requested a review from jlewitt1 January 15, 2025 11:51
@BelSasha BelSasha force-pushed the sb/fix-ping-and-up-test branch 2 times, most recently from c20cd82 to 4013fe4 Compare January 15, 2025 13:08
cluster_den_status: list = read_resp_data(resp)

if not cluster_den_status:
logger.warning("Failed to update cluster info")
Copy link
Contributor Author

@BelSasha BelSasha Jan 15, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Decided to go with "Failed to update cluster info" since we are updating cluster info (such as compute properties etc), and not just cluster status.
@jlewitt1

@BelSasha BelSasha force-pushed the sb/fix-ping-and-up-test branch 2 times, most recently from 0074691 to 97d564c Compare January 16, 2025 17:36
cluster_dict = self._sky_status(refresh=not dryrun)
self._populate_connection_from_status_dict(cluster_dict)
if self.launcher == LauncherType.DEN:
cluster_uri = rns_client.format_rns_address(self.rns_address or self.name)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if we're using the den launcher it always have an rns address, is there a reason we have the or self.name here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, the first time we call self._update_from_status(dryrun=True) is during the cluster construction, before we save the cluster (relevant for both den and local launched clusters), therefore we don't have rns_address yet.
(Code reference)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so in that case isn't it enough to rely on the RNS address, and if it's none not hit den?

Copy link
Contributor Author

@BelSasha BelSasha Jan 21, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, good point, agreed, pushed an update

@BelSasha BelSasha force-pushed the sb/fix-ping-and-up-test branch from 97d564c to ee1469a Compare January 21, 2025 10:49
@BelSasha BelSasha requested a review from jlewitt1 January 21, 2025 10:49
@BelSasha BelSasha force-pushed the sb/fix-ping-and-up-test branch 2 times, most recently from 6cbbf68 to 768ea7a Compare January 21, 2025 12:29
@BelSasha BelSasha changed the base branch from main to handling-shared-cluster-creds January 21, 2025 12:29
@BelSasha BelSasha force-pushed the sb/fix-ping-and-up-test branch 2 times, most recently from 1edaa75 to 53ef430 Compare January 21, 2025 12:58
@jlewitt1 jlewitt1 force-pushed the handling-shared-cluster-creds branch from 1f9481c to 6c9cdd9 Compare January 21, 2025 13:20
@BelSasha BelSasha force-pushed the sb/fix-ping-and-up-test branch from 53ef430 to a2f72d9 Compare January 21, 2025 13:30
@jlewitt1 jlewitt1 force-pushed the handling-shared-cluster-creds branch from 6c9cdd9 to 9573009 Compare January 21, 2025 13:37
@jlewitt1 jlewitt1 force-pushed the handling-shared-cluster-creds branch 3 times, most recently from c0be80e to 6f580a5 Compare January 23, 2025 21:30
@BelSasha BelSasha force-pushed the handling-shared-cluster-creds branch from 6f580a5 to 814246b Compare January 26, 2025 09:47
@BelSasha BelSasha force-pushed the sb/fix-ping-and-up-test branch from 14753f6 to b82ab35 Compare January 26, 2025 09:47
@jlewitt1 jlewitt1 force-pushed the handling-shared-cluster-creds branch 6 times, most recently from 2280a44 to 6718847 Compare January 26, 2025 13:51
@BelSasha BelSasha force-pushed the handling-shared-cluster-creds branch from 6718847 to 8789a16 Compare January 26, 2025 14:43
@BelSasha BelSasha force-pushed the sb/fix-ping-and-up-test branch from b82ab35 to dd92066 Compare January 26, 2025 14:43
@BelSasha BelSasha force-pushed the sb/fix-ping-and-up-test branch from dd92066 to c5ef2e1 Compare January 26, 2025 14:51
@BelSasha BelSasha requested a review from jlewitt1 January 26, 2025 14:57
@jlewitt1 jlewitt1 force-pushed the handling-shared-cluster-creds branch 3 times, most recently from 304b5ff to 0b66b12 Compare January 28, 2025 09:36
@BelSasha BelSasha force-pushed the handling-shared-cluster-creds branch from 0b66b12 to 65cf6d8 Compare January 28, 2025 10:46
@BelSasha BelSasha force-pushed the sb/fix-ping-and-up-test branch from c5ef2e1 to 493b0d7 Compare January 28, 2025 10:47
@jlewitt1 jlewitt1 force-pushed the handling-shared-cluster-creds branch from 65cf6d8 to 20d6b9f Compare January 28, 2025 17:58
@jlewitt1 jlewitt1 force-pushed the handling-shared-cluster-creds branch 4 times, most recently from 74507f6 to bac7432 Compare January 29, 2025 09:58
Base automatically changed from handling-shared-cluster-creds to main February 6, 2025 10:50
@BelSasha BelSasha force-pushed the sb/fix-ping-and-up-test branch from 493b0d7 to c3c29d2 Compare February 10, 2025 10:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants