What happened:
When the estimatedUsedByResource function estimates the resource usage of a pod:
For non-prod pods, if the pod's request is less than its limit, the estimation uses limit * scalingFactor as the predicted value (scalingFactor is a preset value; the default scaling factor for CPU is 85%).
For prod pods, the estimation also uses limit * scalingFactor as the predicted value, even when the pod's request is equal to its limit.
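The two cases above can be sketched in Go as follows. This is a minimal model of the behavior as described in this issue, not the actual Koordinator source: the `Pod` type, its fields, and `estimateReported` are hypothetical names, and only the 85% CPU factor comes from the report.

```go
package main

// Pod is a hypothetical, simplified view of a pod's CPU settings in milli-cores.
type Pod struct {
	Prod         bool  // true for prod priority, false for non-prod
	RequestMilli int64 // CPU request
	LimitMilli   int64 // CPU limit
}

// cpuScalingPercent is the default CPU scaling factor quoted in the issue (85%).
const cpuScalingPercent = 85

// estimateReported models only the two cases described in the issue:
// a non-prod pod with request < limit, and a prod pod with request == limit.
// In both cases the limit is scaled, so the Prod field is deliberately unused.
func estimateReported(p Pod) int64 {
	return p.LimitMilli * cpuScalingPercent / 100
}
```

Under these assumptions, a prod pod with request = limit = 1000m is estimated at 850m, the same value a non-prod pod with a 1000m limit would get.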
This design affects the filter stage in the load_aware process:
Prod Resources
Node Estimation = Resources of the prod pod to be scheduled + Estimated usage of abnormal pods on the node (pods whose metrics cannot be obtained) + Actual metrics that can be retrieved for the node.
When scheduling prod resources, stability is critical, so the estimate should err on the high side. The limit of the prod pod should therefore be used for estimation instead of limit * scalingFactor (default is 0.85).
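To make the effect on the prod formula concrete, here is a small sketch that plugs numbers into it. The function names and all input values are illustrative assumptions, not taken from Koordinator.

```go
package main

// prodNodeEstimate follows the prod formula above: the estimate for the
// incoming prod pod, plus estimates for pods whose metrics are missing,
// plus the usage actually reported for the node (all in milli-cores).
func prodNodeEstimate(incomingMilli, missingMetricsMilli, reportedMilli int64) int64 {
	return incomingMilli + missingMetricsMilli + reportedMilli
}

// scaled applies a percentage scaling factor to a limit in milli-cores.
func scaled(limitMilli, percent int64) int64 {
	return limitMilli * percent / 100
}
```

For an incoming prod pod with a 4000m limit, 1000m estimated for pods without metrics, and 10000m reported usage, `prodNodeEstimate(scaled(4000, 85), 1000, 10000)` gives 14400m, while `prodNodeEstimate(4000, 1000, 10000)` gives 15000m. With the scaled value, the node estimate is 600m lower than the pod's limit would imply.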
Non-Prod Resources
Node Estimation = Actual metrics of the node - Estimated usage of abnormal pods on the node (pods whose metrics cannot be obtained).
When scheduling non-prod resources, the result depends on estimating the resources already in use on the node. To improve accuracy, each pod should be estimated according to its priority: the limit value should be used for prod pods, while limit * scalingFactor (default is 0.85) should be used for non-prod pods.
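One possible reading of the proposed behavior is sketched below. It is an illustration under stated assumptions rather than a patch: the `Pod` type and `estimateProposed` are hypothetical names, and edge cases such as pods without a limit are left out.

```go
package main

// Pod is a hypothetical, simplified view of a pod's CPU settings in milli-cores.
type Pod struct {
	Prod       bool  // true for prod priority, false for non-prod
	LimitMilli int64 // CPU limit
}

// cpuScalingPercent is the default CPU scaling factor quoted in the issue (85%).
const cpuScalingPercent = 85

// estimateProposed sketches the behavior requested in the issue:
// prod pods are estimated at their full limit, while non-prod pods
// are estimated at limit * scalingFactor.
func estimateProposed(p Pod) int64 {
	if p.Prod {
		return p.LimitMilli
	}
	return p.LimitMilli * cpuScalingPercent / 100
}
```

With a 1000m limit, this yields 1000m for a prod pod and 850m for a non-prod pod.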
ditingdapeng changed the title from "[BUG]" to "[BUG] func estimatedUsedByResource calculates the estimated value for prod resources and non-prod resources." on Jan 10, 2025
Environment:
Koordinator version: latest
Kubernetes version (use kubectl version): v1.21.3