[AKS VMs Pool] Add VMs implementation to 1.30 #7960
base: cluster-autoscaler-release-1.30
Conversation
Hi @wenxuan0923. Thanks for your PR. I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test. Once the patch is verified, the new status will be reflected by the ok-to-test label. I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
@wenxuan0923 we don't want to commit the vendor/ directory, could you remove that from the changeset? thank you!
This reverts commit 222c8e8.
Yes, removed the vendor directory. Hmm, but now the test fails with package not found...
@jackfrancis this PR targets 1.30 specifically, which still has the vendor folder (right?)
@wenxuan0923 just to confirm: once more testing is done, this will be cherry-picked to all other versions, right?
For the direction of making the VMs pool implementation similar to VMSS, I don't think that is necessary if the differences it introduces are for the better. I think many parts of the VMSS implementation are worth refactoring, and one way to get there is for the VMs pool to set up a possible pattern for that.
Happy to discuss more, and thanks for driving all this 🙂
isVMPool, agentPoolName, sku := m.parseSKUAndVMsAgentpoolNameFromSpecName(s.Name)
if isVMPool {
	return NewVMPool(s, m, agentPoolName, sku)
}

switch m.config.VMType {
To prevent confusion, I think there could be a comment explaining the relationship between the check above and this switch statement, highlighting that from the VMs pool case onwards we support multiple kinds of pools in the same cluster. A sketch of such a comment follows.
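For illustration, the comment could read something like this, placed above the dispatch (the wording is a suggestion, not from the PR):

// VMs pools are detected by spec name before this switch: a single cluster
// can contain a `VirtualMachines`-type agent pool alongside VMSS- or
// availability-set-based pools, so the pool kind cannot be inferred from
// the cluster-wide VMType alone.
switch m.config.VMType {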
SkuName:  *vmss.Sku.Name,
Tags:     vmss.Tags,
Location: *vmss.Location,
Could be a good chance to introduce a pattern of nil checks that return an error (next to NodeTemplate) for this and other functions. Looking at current uses, we could work out which fields are unacceptable to be missing. Thoughts? A sketch of the pattern follows.
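A minimal sketch of that pattern, using trimmed stand-ins for the SDK and template types (the real code would take compute.VirtualMachineScaleSet and return this package's NodeTemplate; field choices here are illustrative):

import "fmt"

// Trimmed stand-ins for the real types, for illustration only.
type scaleSetSKU struct{ Name *string }
type scaleSet struct {
	Sku      *scaleSetSKU
	Location *string
	Tags     map[string]*string
}
type nodeTemplate struct {
	SkuName  string
	Location string
	Tags     map[string]*string
}

// buildNodeTemplateFromScaleSet validates required pointer fields up front
// and returns an error instead of risking a nil-pointer dereference.
func buildNodeTemplateFromScaleSet(vmss scaleSet) (nodeTemplate, error) {
	if vmss.Sku == nil || vmss.Sku.Name == nil {
		return nodeTemplate{}, fmt.Errorf("scale set is missing required field Sku.Name")
	}
	if vmss.Location == nil {
		return nodeTemplate{}, fmt.Errorf("scale set is missing required field Location")
	}
	return nodeTemplate{
		SkuName:  *vmss.Sku.Name,
		Location: *vmss.Location,
		Tags:     vmss.Tags, // Tags may legitimately be nil
	}, nil
}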
SkuName    string
InstanceOS string
Location   string
Zones      *[]string
Why is only Zones a pointer?
	}
}

func buildNodeTemplateFromVMPool(vmsPool armcontainerservice.AgentPool, location string, skuName string) NodeTemplate {
Looks like "VMPool", "VMsPool", and "VMs" are all used to refer to this concept. Making those consistent would make the code easier to understand and navigate. Thoughts?
template compute.VirtualMachineScaleSet, manager *AzureManager, enableDynamicInstanceList bool) (*apiv1.Node, error) {

// VMsNodeTemplate holds properties for node from VMPool
type VMsNodeTemplate struct {
	Taints []string
In scale from zero, CAS with VMSS currently uses inputLabels, the list of labels expected for that pool, to predict what labels nodes will have through the template, as we can see in this file. The assumption is that each node in the pool will have the same set of labels. Without that for the VMs pool, how would it handle scale from zero? And is that assumption even usable here? (A sketch of the VMSS-style approach follows below.)
(also, doesn't look like Taints is being used yet)
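For context, a minimal sketch of the VMSS-style approach referred to above, assuming expected labels arrive as a comma-separated key=value string; the helper name and input format are illustrative, not the PR's code:

import "strings"

// parseInputLabels turns an expected-labels string such as
// "agentpool=mixed,kubernetes.azure.com/mode=user" into the label map
// applied to the template node while the pool is at size zero.
func parseInputLabels(inputLabels string) map[string]string {
	labels := make(map[string]string)
	for _, kv := range strings.Split(inputLabels, ",") {
		parts := strings.SplitN(kv, "=", 2)
		if len(parts) != 2 {
			continue // skip malformed entries
		}
		labels[strings.TrimSpace(parts[0])] = strings.TrimSpace(parts[1])
	}
	return labels
}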
lastSizeRefresh   time.Time
sizeRefreshPeriod time.Duration
Aside from consistency with VMSS, what do we think about the necessity of caching here in the first place? There was an idea that it might have always been overkill and not worth the complexity it adds. Maybe we could try not having it in the VMs pool. Can discuss more. Thoughts?
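For reference, the VMSS-style caching in question boils down to a time-bounded read-through cache; a minimal sketch, with fetch standing in for the real AKS API call (names and signatures here are assumptions):

import (
	"sync"
	"time"
)

// vmPoolSizeCache serves the cached size until sizeRefreshPeriod elapses,
// then re-fetches from the API and resets the refresh timestamp.
type vmPoolSizeCache struct {
	mu                sync.Mutex
	curSize           int64
	lastSizeRefresh   time.Time
	sizeRefreshPeriod time.Duration
	fetch             func() (int64, error) // stand-in for the AKS API call
}

func (c *vmPoolSizeCache) getCurSize() (int64, error) {
	c.mu.Lock()
	defer c.mu.Unlock()
	if time.Since(c.lastSizeRefresh) < c.sizeRefreshPeriod {
		return c.curSize, nil // serve from cache
	}
	size, err := c.fetch()
	if err != nil {
		return 0, err
	}
	c.curSize = size
	c.lastSizeRefresh = time.Now()
	return size, nil
}

Dropping the cache would reduce this to a direct fetch per size query, trading API traffic for simplicity.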
func (agentPool *VMsPool) IncreaseSize(delta int) error {
	// TODO(wenxuan): Implement this method
	return cloudprovider.ErrNotImplemented

func (vmPool *VMPool) getAgentpoolFromAzure() (armcontainerservice.AgentPool, error) {
Maybe move this down to where getAgentPoolFromCache() is?
func (vmPool *VMPool) getVMPoolSize() (int64, error) {
	size, err := vmPool.getCurSize()
It looks like this could have been one method. Thoughts?
}

// buildRequestBodyForScaleUp builds the request body for scale up for self-hosted CAS
func (vmPool *VMPool) buildRequestBodyForScaleUp(versionedAP armcontainerservice.AgentPool, count int64) (armcontainerservice.AgentPool, error) {
This looks more like a utility function than a method IMO. Maybe pass vmPool.sku as a parameter as well? A sketch follows.
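A sketch of that shape, using a trimmed stand-in for armcontainerservice.AgentPool; exactly where sku lands in the request body is left open, since it depends on the API version in use:

// Trimmed stand-in for armcontainerservice.AgentPool, for illustration.
type agentPoolProps struct{ Count *int32 }
type agentPoolStub struct{ Properties *agentPoolProps }

// buildRequestBodyForScaleUp as a free function: everything it needs,
// including the pool's SKU, arrives as a parameter, so no receiver is required.
func buildRequestBodyForScaleUp(versionedAP agentPoolStub, count int64, sku string) (agentPoolStub, error) {
	if versionedAP.Properties == nil {
		versionedAP.Properties = &agentPoolProps{}
	}
	c := int32(count)
	versionedAP.Properties.Count = &c
	_ = sku // the SKU would be applied to the request body here
	return versionedAP, nil
}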
// this is a placeholder for now, no real implementation
type VMsPool struct {

// VMPool represents a pool of standalone VMs with a single SKU.
// It is part of a mixed-SKU agent pool (agent pool with type `VirtualMachines`).
Just making sure I understand it correctly: a mixed SKU pool maps to multiple "VMPool"s here, right?
Yes, that's correct.
Yes, you can see it in my skewer module update PR: https://github.com/kubernetes/autoscaler/pull/7288/files
Force-pushed from 63e478e to 5b36982.
[APPROVALNOTIFIER] This PR is NOT APPROVED. This pull-request has been approved by: wenxuan0923. The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files.
Approvers can indicate their approval by writing /approve in a comment.
What type of PR is this?
/kind feature
What this PR does / why we need it:
Support VMs pool autoscaling in CAS 1.30 release
Which issue(s) this PR fixes:
Special notes for your reviewer:
Does this PR introduce a user-facing change?
Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.: