Setup periodic scalability CI tests on AWS #29139
Follows from discussion here - https://groups.google.com/g/kubernetes-sig-release/c/ShwzKuYoRAc/m/t6LvF7BQAgAJ

Let's use this issue to plan and track the tasks we need to get there. For the initial phase, we need to:

cc @kubernetes/sig-scalability @kubernetes/sig-k8s-infra @kubernetes/sig-testing
cc @dims @wojtek-t @BenTheElder
We need guidance/status quo for items 1 and 3 from @kubernetes/sig-testing :)
Hope I can help more here in the future; a bit distracted with the registry redirect.
We usually use a pool of sub-accounts that we rent from boskos as a way to ensure we can clean up after CI runs. I think SIG K8s Infra can help with the accounts, and this pattern should be pretty well established for kOps and CAPA at least. We'd still need K8s Infra to set up some new projects with higher quota in a distinct pool, but the project-pool part should be ready to go; it's mainly just the quota part, I think.
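For illustration, boskos pools are declared in YAML roughly like the minimal sketch below; the pool type name and account IDs are hypothetical placeholders, not the real config.

```yaml
# Hypothetical boskos pool for high-quota AWS scale-test accounts,
# kept distinct from the general e2e account pool.
resources:
  - type: aws-scale-test-account   # placeholder pool type
    state: dirty                   # janitor cleans leased accounts and returns them to "free"
    names:
      - "111111111111"             # placeholder AWS account IDs
      - "222222222222"
```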
I would strongly suggest starting with kOps:
/sig k8s-infra
I'd be happy to work with you @shyamjvs to get scale testing running with kOps + AWS; we have a bunch of tests that run already. It looks like the ones at k8s head broke literally today, so we'll have to dig into why (but that's probably a good reason why you don't want to start your own effort if your focus is on scalability, not fixing random breakages!).

We can simply create a scenario that runs with 100 nodes and see what breaks (e.g. we might have quota already). Which CNI/network configuration would we want to start with? And if you have any ideas on machine sizes, we can plug those in to create a one-off scenario here. If we don't know, it's also fairly easy to iterate.

Then we should absolutely make sure it runs against the new CNCF AWS account, and make sure those accounts have the higher limits etc.

(Good news: the breakage looks like a known regression at head that should be fixed by kubernetes/kubernetes@b83600d.)
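As a sketch of what such a one-off scenario could pin down, node count and machine size live in a kOps InstanceGroup; the cluster name and m5.large instance type below are assumptions for illustration, not decisions from this thread.

```yaml
# Hypothetical node group for a 100-node scale scenario.
apiVersion: kops.k8s.io/v1alpha2
kind: InstanceGroup
metadata:
  name: nodes
  labels:
    kops.k8s.io/cluster: scale-test.k8s.local   # placeholder cluster name
spec:
  role: Node
  machineType: m5.large   # assumed size; tune per scenario
  minSize: 100            # fixed-size group for a 100-node run
  maxSize: 100
```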
Thanks Ben. So to clarify - we need either a new pool of accounts with higher quotas, or to increase the quotas for every single account in the existing pool, before we can run these tests? Would the path of least resistance be to provision a single new account (with higher quotas) and dedicate that to the scale tests instead?
Thanks Justin for offering to help! We're still figuring out a few details, especially around setting custom flags on a bunch of components (typically needed for the scale tests) and the control-plane setup (e.g. co-locating etcd with the apiserver vs. running them separately; the latter being the model EKS uses today, fwiw). We'll need some help figuring out what's possible with kOps vs. not. Is the sig-testing call a good place to discuss this?
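For what it's worth, kOps does expose many component flags as typed fields on the Cluster spec; a minimal sketch, assuming a hypothetical cluster name and example flag values (note kOps runs etcd on the control-plane instance groups, so a fully separate etcd tier is the part that needs verifying):

```yaml
# Hypothetical kOps Cluster spec fragment: custom component flags
# plus the etcd cluster layout.
apiVersion: kops.k8s.io/v1alpha2
kind: Cluster
metadata:
  name: scale-test.k8s.local          # placeholder cluster name
spec:
  kubeAPIServer:
    maxRequestsInflight: 800          # example scale-test tuning values
    maxMutatingRequestsInflight: 400
  etcdClusters:                       # "main" and "events" members run on control-plane nodes
    - name: main
      etcdMembers:
        - name: a
          instanceGroup: master-us-east-1a
    - name: events
      etcdMembers:
        - name: a
          instanceGroup: master-us-east-1a
```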
In the past we've created a separate resource pool for high-quota accounts on GCP, and I think we should do that here too. It makes it easier to track scale-testing-related spend, and we've needed fewer accounts for this purpose than for general e2e testing.
SIG Testing could be a reasonable place for this, as passing config will also need to flow through the CI tooling. In the meantime, re: e.g. custom etcd options:
We discussed the above in the last sig-scale meeting (30th Mar); see the meeting notes. Summary of next steps:
@shyamjvs how many dedicated accounts do you need? (from slack thread https://kubernetes.slack.com/archives/CCK68P2Q2/p1684337286675859)
We just need 1 account to begin with.
Related to: kubernetes/test-infra#29139
Signed-off-by: Arnaud Meukam <[email protected]>
We have 2 accounts from line item #1 with limits updated (details: https://kubernetes.slack.com/archives/CCK68P2Q2/p1684593816977859). Some follow-ups are pending with @ashishranjan738.
Looks like there's a kOps CI job definition here that we may have started looking at for line item 3:
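For context, the kOps CI jobs in this repo are Prow periodics; a stripped-down sketch of the shape such a job definition takes (the job name, image tag, and scenario script path below are illustrative placeholders, not the actual job being referenced):

```yaml
# Hypothetical Prow periodic for a kOps scale scenario.
periodics:
  - name: e2e-kops-aws-scale-100       # placeholder job name
    interval: 24h                      # run once a day
    decorate: true                     # use pod-utils for cloning, uploading artifacts, etc.
    spec:
      containers:
        - image: gcr.io/k8s-staging-test-infra/kubekins-e2e:latest   # placeholder tag
          command:
            - runner.sh                # standard kubekins entrypoint
          args:
            - ./tests/e2e/scenarios/scalability/run-test.sh          # placeholder scenario script
```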
Current status for scalability test runs on AWS:

Next things:
Looks like the 5k-node kOps test has been failing for the past 10 days due to some issue during cluster creation - https://testgrid.k8s.io/kops-misc#ec2-master-scale-performance. @hakuna-matatah or @mengqiy - could one of you open an issue against this repo for tracking the fixes?
@dims: Closing this issue. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.