I ended up thinking about improving the utilization of user nodes, and came away with some thoughts I figured were worth writing down.
Let's open dedicated issues for related action points we may come up with to better focus on the theory in this issue.
Too high remainder capacity
The "remainder capacity" is the capacity on a node pool's current nodes that no user is scheduled on. A node pool's remainder capacity is likely the biggest cost driver for wasted capacity, and it increases with larger node sizes.
Reducing the remainder capacity can be done by:
using smaller node sizes
decreasing segregation of users (for example, by reducing the number of community-specific node pools)
making users' session durations shorter through more aggressive culling
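To make the node size effect concrete, here is a rough sketch with made-up numbers. The function name `remainder_capacity`, the 16 GB and 64 GB node sizes, and the 4 GB requests are all assumptions for illustration, not values from any real deployment.

```python
def remainder_capacity(node_capacity_gb, scheduled_requests_gb):
    """Return the unscheduled capacity summed over a pool's current nodes.

    scheduled_requests_gb holds one list per node, with the memory requests
    (in GB) of the user servers scheduled on that node.
    """
    return sum(
        node_capacity_gb - sum(requests) for requests in scheduled_requests_gb
    )


# The same five users, each requesting 4 GB, on differently sized nodes.
small_nodes = remainder_capacity(16, [[4, 4, 4, 4], [4]])  # two 16 GB nodes
large_nodes = remainder_capacity(64, [[4, 4, 4, 4, 4]])    # one 64 GB node

print(small_nodes, large_nodes)  # prints: 12 44
```

With the same users, the pool of smaller nodes leaves 12 GB unscheduled while the single larger node leaves 44 GB.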
Badly tuned resource requests/limits
Badly tuned resource requests/limits will drive costs.
Resource requests/limits for memory are the most complicated to tune, because running out of memory on a node leads to termination of the user server that exceeds its memory request by the largest relative amount.
Requesting more memory than is used at any given time
Then the request is trivially too large.
Not oversubscribing well enough
Not oversubscribing well enough, exemplified with memory, means requesting memory too close to the memory limit and too far above the memory used on average. The extreme case is to have the request equal the limit.
To avoid running out of memory on a node at any given time, user servers must request more memory than their average use. Otherwise the sum of the users' average use exceeds the node's memory once the node is fully scheduled based on requests, so the node is mathematically guaranteed to run out of memory.
Requests should be made somewhere between the user server's average use and maximum use. With more users per node, it becomes safer to make requests closer to the average use.
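The trade-off can be sketched with some arithmetic. Everything here is an assumption for illustration: the helper names, the 64 GB node, and a user server that averages 2 GB and peaks at 8 GB.

```python
def users_per_node(node_memory_gb, request_gb):
    """Number of user servers that fit on a node when scheduled by requests."""
    return node_memory_gb // request_gb


def average_use_when_full(node_memory_gb, request_gb, average_use_gb):
    """Total average memory use on a node that is fully scheduled by requests."""
    return users_per_node(node_memory_gb, request_gb) * average_use_gb


NODE_GB = 64      # hypothetical node memory
AVERAGE_GB = 2    # hypothetical average use per user server

# Request below average use: average demand alone exceeds the node.
print(average_use_when_full(NODE_GB, 1, AVERAGE_GB))  # prints: 128

# Request between average and maximum use: oversubscribed, with headroom.
print(average_use_when_full(NODE_GB, 4, AVERAGE_GB))  # prints: 32

# Request equal to maximum use (8 GB): no oversubscription, mostly idle.
print(average_use_when_full(NODE_GB, 8, AVERAGE_GB))  # prints: 16
```

A 1 GB request packs 64 users whose average use alone adds up to 128 GB on a 64 GB node, while an 8 GB request leaves the node using only 16 GB on average.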
Causing a significant remainder of unscheduled capacity
Requests should pack well on nodes, leaving little unscheduled capacity. This fails when requesting, for example, 51% or 26% of an available resource. The node then fits only 1 or 3 users respectively instead of the more appropriate 2 or 4, leaving 49% or 22% of the node's capacity unscheduled.
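The packing numbers above can be checked with a small sketch. The function name `packing` is made up for illustration, and requests are expressed as whole percentages of a node's resource to keep the arithmetic exact.

```python
def packing(request_percent):
    """Return (users that fit, unscheduled percent) for a given request size.

    request_percent is the request as a whole percentage of the node's
    available resource.
    """
    users = 100 // request_percent
    unscheduled = 100 - users * request_percent
    return users, unscheduled


for request in (51, 50, 26, 25):
    users, unscheduled = packing(request)
    print(f"request {request}%: {users} users, {unscheduled}% unscheduled")
```

Shaving a single percentage point off the 51% and 26% requests lets one more user fit and brings the unscheduled share down to zero.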