Investigate AWS 5K test failures that started happening in last 10 days #31755
Comments
There are no sig labels on this issue. Please add an appropriate label by using one of the sig label commands. Please see the group list for a listing of the SIGs, working groups, and committees available. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
The investigation into the test failures for this issue was done here (so tests can be run to RCA the issue). Summary: Prometheus pods were restarting and failing to come up due to the error mentioned in this comment here.
The Prometheus container/pod is failing because of the above error, and that error is caused by this containerd change and the kOps change in this PR here. As of this morning that change was reverted in kOps here, which fixed the problem; I verified it through this one-off test, which includes the reverted commit. I will close the issue once I see a successful run in our periodics tomorrow, which should pull in these changes on the next run.
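For reference, one way to confirm which container runtime build the nodes actually ended up running after the revert is to query the nodes directly. This is a hedged sketch using standard kubectl commands, not a step spelled out in the comment above:

```sh
# Show the container runtime (e.g. containerd://1.x.y) reported by each node;
# useful for confirming that the cluster picked up the reverted kOps change.
kubectl get nodes -o wide

# Or print just the node name and runtime version from the node status.
kubectl get nodes -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.nodeInfo.containerRuntimeVersion}{"\n"}{end}'
```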
Sweet! Thanks @hakuna-matatah
Closing the issue. As expected, the periodics have started succeeding now -
What happened:
Investigate the AWS 5K test failures that started happening in the last 10 days. We need to understand which upstream dependency change has broken the tests. From the last couple of failures, I can see that the prom stack is not coming up.

What you expected to happen:
The tests were succeeding continuously before this and then started failing, so we need to make them succeed again.
How to reproduce it (as minimally and precisely as possible):
The periodics that run here every day are already reproducing it - https://testgrid.k8s.io/sig-scalability-aws#ec2-master-scale-performance
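As a starting point for debugging a failed run, the Prometheus pods can be inspected directly on the test cluster. This is a rough sketch assuming the prom stack lives in a `monitoring` namespace; the actual namespace and pod names in the scale-test setup may differ:

```sh
# List the Prometheus pods and their restart counts (namespace is an assumption).
kubectl get pods -n monitoring

# Describe a restarting pod to see events and the container exit reason.
kubectl describe pod <prometheus-pod-name> -n monitoring

# Check the logs of the previous (crashed) container instance for the startup error.
kubectl logs <prometheus-pod-name> -n monitoring --previous
```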
Please provide links to example occurrences, if any:
https://testgrid.k8s.io/sig-scalability-aws#ec2-master-scale-performance
Anything else we need to know?: