-
Notifications
You must be signed in to change notification settings - Fork 342
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[BUG] Elasticquota failed to update due to velasticquota verification #2220
Comments
@qinfustu It seems you have modified the quota webhook since the vanilla version does not print the following quantities. Perhaps we should find out the cause first about why the memory.Value() in the example shows an incorrect value, as it is supposed to be calculated by the childMinSum.Memory().Sub(newQuotaInfoMin.Memory()). |
@saintube In order to troubleshoot this problem, new log information has been added. The guess is that there is a problem with the memory.Value() method in scenarios where the numerical value is large and negative. I haven't found the reason after looking at the source code. I need your help. |
@qinfustu How about adding a UT to verify this case? |
@saintube LessThanOrEqualCompletely's UT did not find the problem. func TestLessThanOrEqualCompletely(t *testing.T) {
a := corev1.ResourceList{
corev1.ResourceCPU: *resource.NewQuantity(23838, resource.DecimalSI),
corev1.ResourceMemory: *resource.NewQuantity(87751*1024*1024*1024, resource.BinarySI),
"nvidia.com/gpu": *resource.NewQuantity(724, resource.DecimalSI),
}
b := corev1.ResourceList{
corev1.ResourceCPU: *resource.NewQuantity(500000, resource.DecimalSI),
corev1.ResourceMemory: *resource.NewQuantity(300000*1024*1024*1024, resource.BinarySI),
"nvidia.com/gpu": *resource.NewQuantity(20000, resource.DecimalSI),
}
result := LessThanOrEqualCompletely(a, b)
fmt.Println(result)
assert.Equal(t, true, result)
} But there is a problem with the real scene. apiVersion: scheduling.sigs.k8s.io/v1alpha1
kind: ElasticQuota
metadata:
labels:
quota.scheduling.koordinator.sh/allow-lent-resource: "false"
quota.scheduling.koordinator.sh/is-parent: "true"
quota.scheduling.koordinator.sh/parent: koordinator-root-quota
name: cluster-root-quota-test
namespace: koordinator-system
spec:
max:
cpu: "500000"
memory: 300000Gi
nvidia.com/gpu: "20000"
min:
cpu: "500000"
memory: 300000Gi
nvidia.com/gpu: "20000"
---
apiVersion: scheduling.sigs.k8s.io/v1alpha1
kind: ElasticQuota
metadata:
labels:
quota.scheduling.koordinator.sh/allow-lent-resource: "false"
quota.scheduling.koordinator.sh/is-parent: "false"
quota.scheduling.koordinator.sh/parent: cluster-root-quota-test
name: cluster-root-quota-child-test
namespace: koordinator-system
spec:
max:
cpu: "23838"
memory: 87751Gi
nvidia.com/gpu: "724"
min:
cpu: "23838"
memory: 87751Gi
nvidia.com/gpu: "724" Modify the memory in cluster-root-quota-test to 200000Gi to reproduce the problem. debug information: |
@qinfustu In your example, the root cause is unclear. It seems the |
@qinfustu For more clues, could you please log the details of the childMinSum, newQuotaInfo.CalculatedInfo.Min and the delta when before entering the LessThanOrEqualCompletely and inside the LessThanOrEqualCompletely? BTW, the Quantity does not guarantee the valueS consistent to the latest value. |
@saintube The same problem will occur after adding .DeepCopy(). You can try to reproduce it based on the quota information. code: klog.V(4).Infof("childMinSum: %+v, newQuotaInfoMin: %+v", printResource(childMinSum), printResource(newQuotaInfo.CalculateInfo.Min))
if !util.LessThanOrEqualCompletely(childMinSum.DeepCopy(), newQuotaInfo.CalculateInfo.Min.DeepCopy()) {
return fmt.Errorf("checkMinQuotaSum all children's MinQuota > current MinQuota, current: %v, childMinSum: %+v, newQuotaInfoMin: %+v", newQuotaInfo.Name, printResource(childMinSum), printResource(newQuotaInfo.CalculateInfo.Min))
} func LessThanOrEqualCompletely(a corev1.ResourceList, b corev1.ResourceList) bool {
result := true
delta := quotav1.Subtract(a.DeepCopy(), b.DeepCopy())
klog.V(4).Infof("delta: %+v", printResource(delta))
for k, value := range delta {
klog.V(4).Infof("k: %s, value: %+v, valueS: %s", k, value.Value(), value.String())
if value.Value() > 0 {
// if value.Value() > 0 && !strings.Contains(value.String(), "-") {
result = false
break
}
}
return result
} debug info: quota_topology_check.go:287] allChildQuotaSum: cpu: 23838, memory: 87751Gi, gpu: 724
quota_topology_check.go:252] childMinSum: cpu: 23838, memory: 87751Gi, gpu: 724, newQuotaInfoMin: cpu: 500k, memory: 200000Gi, gpu: 20k
esource.go:273] delta: cpu: -476162, memory: -112249Gi, gpu: -19276
resource.go:275] k: cpu, value: -476162, valueS: -476162
resource.go:275] k: memory, value: 4579775443, valueS: -112249Gi |
@qinfustu Please try to print the actual value of the quantities instead of calling (Quantity).String(). |
quota_topology_check.go:287] allChildQuotaSum: cpu: 23838/23838, memory: 87751Gi/94221918797824, gpu: 724
quota_topology_check.go:252] childMinSum: cpu: 23838/23838, memory: 87751Gi/94221918797824, gpu: 724, newQuotaInfoMin: cpu: 500k/500000, memory: 200000Gi/214748364800000, gpu: 20k
resource.go:273] delta: cpu: -476162/-476162, memory: -112249Gi/4579775443, gpu: -19276
resource.go:275] k: cpu, value: -476162, valueS: -476162
resource.go:275] k: memory, value: 4579775443, valueS: -112249Gi |
This issue has been automatically marked as stale because it has not had recent activity.
|
What happened:
The update of elasticquota failed, and the error reported was denied the request: checkMinQuotaSum all children's MinQuota > current MinQuota. However, in fact all children's MinQuota < current MinQuota. The following is the debug information
error: elasticquotas.scheduling.sigs.k8s.io "cluster-root-quota" could not be patched: admission webhook "velasticquota.koordinator.sh" denied the request: checkMinQuotaSum all children's MinQuota > current MinQuota, current: cluster-root-quota, childMinSum: cpu: 23838, memory: 87751Gi, gpu: 724, newQuotaInfoMin: cpu: 500k, memory: 100000Gi, gpu: 20k
Locate the problem lies in the LessThanOrEqualCompletely method. There is a problem with the value returned by the Value() method of corev1.ResourceList, which is inconsistent with String().
childMinSum: cpu: 23838, memory: 87751Gi, gpu: 724, newQuotaInfoMin: cpu: 500k, memory: 300000Gi, gpu: 20k
resource.go:274] delta: cpu: -476162, memory: -212249Gi, gpu: -19276
resource.go:276] k: memory, value: 8894628506, valueS: -212249Gi
resource.go:276] k: nvidia.com/gpu, value: -19276, valueS: -19276
resource.go:276] k: cpu, value: -476162, valueS: -476162
The current temporary processing strategy is to change value.Value() > 0 to value.Value() > 0 && !strings.Contains(value.String(), "-"). Hope to find the root cause and give the best solution.
Environment:
kubectl version
): 1.20The text was updated successfully, but these errors were encountered: