Allow scaling events to be logged during cluster validation #16354
Trying this out:
/test pull-kops-e2e-aws-upgrade-k127-ko127-to-klatest-kolatest-many-addons
That said, this job sounds invalid, since a k8s 1.27 cluster can't be upgraded to k8s latest (1.29).
I like the idea!
(Do we know that it can't? It's not considered a supported path, but in practice most skip-upgrades do work.)
Cool idea!
@@ -709,6 +710,17 @@ func (c *instanceGroupManagerClientImpl) List(ctx context.Context, project, zone
	return ms, nil
}

func (c *instanceGroupManagerClientImpl) ListErrors(ctx context.Context, project, zone, name string) ([]*compute.InstanceManagedByIgmError, error) {
Aside: I sort of regret our having this layer in GCE, I'm not sure it adds much!
@@ -1196,13 +1196,29 @@ func awsBuildCloudInstanceGroup(ctx context.Context, c AWSCloud, cluster *kops.C
		return nil, fmt.Errorf("failed to fetch instances: %v", err)
	}

	scalingReq := &autoscaling.DescribeScalingActivitiesInput{
I don't know whether this gets expensive, but I wonder if we should plumb through a flag here as to whether we want this information (or add a method or func to CloudInstanceGroup that could query it on demand).
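To make the on-demand option concrete, here is a minimal sketch, assuming a trimmed-down group type. The `CloudInstanceGroup` and `ScalingEvent` definitions below are simplified stand-ins rather than the real kops types, and the `fetchEvents` field and `ScalingEvents` method are hypothetical names. A real implementation would likely also take a `context.Context`.

```go
package main

import (
	"fmt"
	"time"
)

// ScalingEvent is a simplified stand-in for the ScalingEvent type in the diff.
type ScalingEvent struct {
	Timestamp   time.Time
	Description string
}

// CloudInstanceGroup is a hypothetical, trimmed-down group type. Instead of
// storing events eagerly, it holds a function that fetches them on request.
type CloudInstanceGroup struct {
	Name string
	// fetchEvents is nil for providers that have no implementation yet.
	fetchEvents func() ([]ScalingEvent, error)
}

// ScalingEvents calls the provider only when a caller asks for the events,
// so building the group does not pay for the extra API request.
func (g *CloudInstanceGroup) ScalingEvents() ([]ScalingEvent, error) {
	if g.fetchEvents == nil {
		return nil, nil
	}
	return g.fetchEvents()
}

func main() {
	apiCalls := 0
	group := &CloudInstanceGroup{
		Name: "nodes-us-west-2a",
		fetchEvents: func() ([]ScalingEvent, error) {
			apiCalls++ // stands in for a request to the cloud provider
			return []ScalingEvent{
				{Timestamp: time.Now(), Description: "example: failed to launch instance"},
			}, nil
		},
	}
	fmt.Println("API calls after building the group:", apiCalls)

	events, err := group.ScalingEvents()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("API calls after querying events:", apiCalls, "events:", len(events))
}
```

With this shape, callers that never look at scaling events never trigger the request, which is the cost concern raised above. A boolean flag passed into the builder would achieve the same thing with less indirection.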
		for _, activity := range p.Activities {
			event := cloudinstances.ScalingEvent{
				Timestamp:   aws.TimeValue(activity.StartTime),
				Description: aws.StringValue(activity.Description),
Looks like there are some extra fields; I'm not sure if Description has all/most of the information, but you might consider including the Raw any field (which will then get printed!)
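A rough sketch of what carrying the raw activity along could look like, assuming simplified local types. `Activity` here is a hypothetical stand-in for the AWS SDK's autoscaling activity (the real type uses pointer fields and has more of them), and `toScalingEvents` is an illustrative helper name, not a function from the PR.

```go
package main

import (
	"fmt"
	"time"
)

// Activity is a hypothetical, flattened stand-in for an AWS autoscaling
// activity; only a few of its fields are modeled here.
type Activity struct {
	StartTime     time.Time
	Description   string
	StatusCode    string
	StatusMessage string
}

// ScalingEvent keeps the two summary fields from the diff plus a Raw field
// holding the provider-specific value, so no detail is lost when printing.
type ScalingEvent struct {
	Timestamp   time.Time
	Description string
	Raw         any
}

// toScalingEvents converts provider activities into generic events while
// retaining each original activity in Raw.
func toScalingEvents(activities []Activity) []ScalingEvent {
	events := make([]ScalingEvent, 0, len(activities))
	for _, a := range activities {
		events = append(events, ScalingEvent{
			Timestamp:   a.StartTime,
			Description: a.Description,
			Raw:         a,
		})
	}
	return events
}

func main() {
	events := toScalingEvents([]Activity{{
		StartTime:     time.Now(),
		Description:   "example: launching a new instance",
		StatusCode:    "Failed",
		StatusMessage: "example: instance quota exceeded",
	}})
	for _, e := range events {
		// %+v prints the Raw value with its field names, including the
		// status fields that Description alone may not contain.
		fmt.Printf("%+v\n", e)
	}
}
```

Keeping `Raw` typed as `any` lets each provider attach its own activity type without the generic event struct needing to know about it.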
The Kubernetes project currently lacks enough contributors to adequately respond to all PRs. This bot triages PRs according to their inactivity. Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
/lifecycle rotten
/close
@k8s-triage-robot: Closed this PR.
The e2e upgrade jobs that have been migrated to the new prow cluster are failing to validate mid rolling-update:
I0209 13:36:37.740937 14123 instancegroups.go:560] Cluster did not pass validation within deadline: InstanceGroup "nodes-us-west-2a" did not have enough nodes 1 vs 4.
We can get the scaling activities on the ASG, which should mention whether the AWS autoscaling service is failing to launch nodes for some reason (resource quota, capacity, etc.).
This allows those events to be logged at --v=4 level, and sets that level on the upgrade scripts. I included autoscaling activity for both AWS and GCE. Other providers can add their implementations separately.
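As an illustration of the verbosity gating described here, the following sketch logs scaling events only at level 4 or higher. It uses the standard library logger and a plain integer for the level, which is an assumption made to keep the example self-contained; the PR itself presumably goes through the project's own logging. `logScalingEvents` and the simplified `ScalingEvent` type are hypothetical.

```go
package main

import (
	"log"
	"sort"
	"time"
)

// ScalingEvent is a simplified stand-in for the ScalingEvent type in the diff.
type ScalingEvent struct {
	Timestamp   time.Time
	Description string
}

// logScalingEvents writes events oldest-first through logf, but only when
// verbosity is at least 4. It returns the number of events logged.
func logScalingEvents(logf func(format string, args ...any), verbosity int, group string, events []ScalingEvent) int {
	if verbosity < 4 {
		return 0
	}
	// Sort a copy so the caller's slice is left untouched.
	sorted := append([]ScalingEvent(nil), events...)
	sort.SliceStable(sorted, func(i, j int) bool {
		return sorted[i].Timestamp.Before(sorted[j].Timestamp)
	})
	for _, e := range sorted {
		logf("InstanceGroup %q scaling event at %s: %s",
			group, e.Timestamp.Format(time.RFC3339), e.Description)
	}
	return len(sorted)
}

func main() {
	now := time.Now()
	events := []ScalingEvent{
		{Timestamp: now, Description: "example: launch failed, capacity unavailable"},
		{Timestamp: now.Add(-time.Minute), Description: "example: launching a new instance"},
	}
	// At verbosity 2 nothing is written; at 4 both events are logged.
	logScalingEvents(log.Printf, 2, "nodes-us-west-2a", events)
	logScalingEvents(log.Printf, 4, "nodes-us-west-2a", events)
}
```

Printing the events next to a validation failure such as "did not have enough nodes" would show whether the provider attempted and failed to launch instances, or never attempted a launch at all.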