Elastic Cloud Autoscaler is an autoscaler for Elastic Cloud based on CPU utilization or cron schedules, inspired by es-operator.
Try it with DryRun: true at first.
- Elasticsearch >= 8.x
The autoscaler supports the following auto-scaling methods:
- CPU utilization based auto-scaling.
  - Autoscaler tries to scale out or scale in when average CPU utilization stays higher or lower than the desired CPU utilization throughout the threshold duration.
- Cron schedule based auto-scaling.
  - You can override the min/max node count for a configured duration using a cron-format schedule.
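The "throughout the threshold duration" condition can be sketched as a window check over CPU samples. This is a minimal illustration, not the library's implementation: the `cpuSample` type, `aboveThroughout`, and the helper functions are hypothetical names, and the sketch assumes one averaged CPU reading per minute.

```go
package main

import (
	"fmt"
	"time"
)

// cpuSample is a hypothetical type: one average CPU reading at a point in time.
type cpuSample struct {
	At      time.Time
	Percent float64
}

// aboveThroughout reports whether every sample within [now-threshold, now]
// is above desiredPercent. It returns false when the window has no samples.
func aboveThroughout(samples []cpuSample, desiredPercent float64, threshold time.Duration, now time.Time) bool {
	windowStart := now.Add(-threshold)
	inWindow := 0
	for _, s := range samples {
		if s.At.Before(windowStart) || s.At.After(now) {
			continue // outside the threshold window
		}
		if s.Percent <= desiredPercent {
			return false // utilization dropped to or below the target
		}
		inWindow++
	}
	return inWindow > 0
}

// everyMinute builds one sample per minute starting at start (demo helper).
func everyMinute(start time.Time, percents ...float64) []cpuSample {
	out := make([]cpuSample, 0, len(percents))
	for i, p := range percents {
		out = append(out, cpuSample{At: start.Add(time.Duration(i) * time.Minute), Percent: p})
	}
	return out
}

// minutes converts a count of minutes to a time.Duration (demo helper).
func minutes(n int) time.Duration { return time.Duration(n) * time.Minute }

// base is an arbitrary fixed start time for the demo.
var base = time.Date(2024, 1, 1, 0, 0, 0, 0, time.UTC)

func main() {
	samples := everyMinute(base, 62, 65, 61, 70, 66, 64)
	now := base.Add(minutes(5))
	// Scale-out is only considered when CPU stayed above 50% for the whole 5m window.
	fmt.Println("scale-out candidate:", aboveThroughout(samples, 50, minutes(5), now))
}
```

The scale-in check would be the mirror image (every sample below the desired value for scaleInThresholdDuration).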
Config properties | Type | Required | Description |
---|---|---|---|
index | string | true | Index to update replicas when scaling out/in |
shardsPerNode | int | true | Desired shard count per node. Autoscaler won't scale in or out to a node count that can't meet this ratio. |
defaultMinMemoryGBPerZone | int | true | Default minimum memory size (GB) per zone. Only multiples of 64 are valid (64 GB x N nodes). |
defaultMaxMemoryGBPerZone | int | true | Default maximum memory size (GB) per zone. Only multiples of 64 are valid (64 GB x N nodes). |
autoScaling | object | | |
autoScaling.desiredCPUUtilPercent | int | true (in autoScaling) | Desired CPU utilization percent. Autoscaler adjusts the node count to bring CPU utilization closer to this value. |
autoScaling.scaleOutThresholdDuration | time.Duration | | Threshold duration for scale-out. When CPU utilization is higher than desiredCPUUtilPercent throughout the threshold duration, scale-out may happen. |
autoScaling.scaleOutCoolDownDuration | time.Duration | | Cool-down period for scale-out after the last scaling operation. |
autoScaling.scaleInThresholdDuration | time.Duration | | Threshold duration for scale-in. When CPU utilization is lower than desiredCPUUtilPercent throughout the threshold duration, scale-in may happen. |
autoScaling.scaleInCoolDownDuration | time.Duration | | Cool-down period for scale-in after the last scaling operation. |
scheduledScalings | array of objects | | |
scheduledScalings[i].startCronSchedule | string | true (in scheduledScaling) | Cron-format schedule at which the specified min/max size starts to apply. The default timezone is the machine's local timezone. To specify a timezone, add a TZ= prefix (e.g. TZ=UTC 0 0 * * *). |
scheduledScalings[i].duration | time.Duration | true (in scheduledScaling) | Duration for which the scheduled min/max size applies, starting from startCronSchedule. |
scheduledScalings[i].minMemoryGBPerZone | int | true (in scheduledScaling) | Min memory size during the specified period. |
scheduledScalings[i].maxMemoryGBPerZone | int | true (in scheduledScaling) | Max memory size during the specified period. |
index: test
shardsPerNode: 1
defaultMinMemoryGBPerZone: 64
defaultMaxMemoryGBPerZone: 256
autoScaling:
  desiredCPUUtilPercent: 50
  scaleOutThresholdDuration: 5m
  scaleInThresholdDuration: 10m
scheduledScalings:
  - startCronSchedule: TZ=UTC 0 0 * * *
    duration: 1h
    minMemoryGBPerZone: 128
    maxMemoryGBPerZone: 256
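Because memory sizes are expressed as 64 GB x N nodes, a configured memory value maps directly to a node count per zone. The following sketch shows one way such values could be validated and converted. The function names are hypothetical and are not part of the library's API.

```go
package main

import "fmt"

// nodeMemoryGB is the per-node memory size implied by the "64xN node" note.
const nodeMemoryGB = 64

// nodesPerZone converts a memory size per zone into a node count per zone.
// It rejects values that are not a positive multiple of nodeMemoryGB.
func nodesPerZone(memoryGBPerZone int) (int, error) {
	if memoryGBPerZone <= 0 || memoryGBPerZone%nodeMemoryGB != 0 {
		return 0, fmt.Errorf("memoryGBPerZone must be a positive multiple of %d, got %d",
			nodeMemoryGB, memoryGBPerZone)
	}
	return memoryGBPerZone / nodeMemoryGB, nil
}

// validateRange checks that both bounds are valid sizes and that min <= max.
func validateRange(minGB, maxGB int) error {
	minNodes, err := nodesPerZone(minGB)
	if err != nil {
		return fmt.Errorf("min: %w", err)
	}
	maxNodes, err := nodesPerZone(maxGB)
	if err != nil {
		return fmt.Errorf("max: %w", err)
	}
	if minNodes > maxNodes {
		return fmt.Errorf("min (%d GB) exceeds max (%d GB)", minGB, maxGB)
	}
	return nil
}

func main() {
	// The scheduled window in the sample config: 128 GB to 256 GB per zone.
	minNodes, _ := nodesPerZone(128)
	maxNodes, _ := nodesPerZone(256)
	fmt.Printf("scheduled range: %d to %d nodes per zone\n", minNodes, maxNodes)
	fmt.Println("invalid size:", validateRange(100, 256))
}
```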
Autoscaler scales out or in within the min/max range while keeping the configured shardsPerNode.
If CPU based auto-scaling is configured and CPU utilization stays above or below the target for a certain period of time, Autoscaler increases or decreases the number of nodes and replicas to bring utilization closer to the target.
If the resulting node count can't satisfy shardsPerNode, Autoscaler won't apply the scaling operation.
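A proportional estimate is one way to pick the target node count: scale the current node count by the ratio of observed to desired CPU utilization, then clamp the result to the configured range. The sketch below assumes load spreads evenly across nodes. `targetNodesPerZone` is a hypothetical name, and the library's actual algorithm may differ.

```go
package main

import "fmt"

// targetNodesPerZone estimates the node count per zone that would bring
// average CPU utilization to roughly desiredCPUPercent, then clamps the
// estimate to [minPerZone, maxPerZone].
func targetNodesPerZone(currentPerZone, avgCPUPercent, desiredCPUPercent, minPerZone, maxPerZone int) int {
	// ceil(current * avg / desired) using integer arithmetic.
	target := (currentPerZone*avgCPUPercent + desiredCPUPercent - 1) / desiredCPUPercent
	if target < minPerZone {
		target = minPerZone
	}
	if target > maxPerZone {
		target = maxPerZone
	}
	return target
}

func main() {
	// 6 nodes per zone at 60% CPU with a 45% target, allowed range 6..12.
	fmt.Println(targetNodesPerZone(6, 60, 45, 6, 12))
}
```

With 6 nodes per zone at 60% CPU and a 45% target, the estimate is 6 * 60 / 45 = 8 nodes per zone.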
cluster:
  memoryGBPerZone: 384 (64g * 6)
  zoneCount: 2
  averageCPUUtil: 60 (keeping 60 for 5 minutes)
index:
  numberOfShards: 2
  numberOfReplicas: 5
index: test
shardsPerNode: 1
defaultMinMemoryGBPerZone: 384 (64g * 6)
defaultMaxMemoryGBPerZone: 768 (64g * 12)
autoScaling:
  desiredCPUUtilPercent: 45
  scaleOutThresholdDuration: 5m
  scaleInThresholdDuration: 10m
Autoscaler scales out to 8 nodes per zone to reduce CPU utilization (60% * 12 nodes / 16 nodes => 45%).
cluster:
  memoryGBPerZone: 384 (64g * 6) => 512 (64g * 8)
  zoneCount: 2
  averageCPUUtil: 60 => 45 (estimated)
index:
  numberOfShards: 2
  numberOfReplicas: 5 => 7
cluster:
  memoryGBPerZone: 192 (64g * 3)
  zoneCount: 2
index:
  numberOfShards: 3
  numberOfReplicas: 1
index: test
shardsPerNode: 1
defaultMinMemoryGBPerZone: 256 (64g * 4)
defaultMaxMemoryGBPerZone: 256 (64g * 4)
In the above case, Autoscaler won't scale out to 4 nodes x 2 zones even though defaultMinMemoryGBPerZone corresponds to 4 nodes per zone, because neither 1 replica (6 shards in total) nor 2 replicas (9 shards in total) yields the 8 shards required by shardsPerNode: 1.
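The feasibility rule in this case can be sketched as a divisibility check. This assumes, as the example implies, that the total shard count (primaries plus replicas) must equal exactly the node count times shardsPerNode. `replicasFor` is a hypothetical helper, not a function of the library.

```go
package main

import "fmt"

// replicasFor returns the replica count for which
// numberOfShards * (1 + replicas) == totalNodes * shardsPerNode,
// or ok=false when no such whole number of replicas exists.
func replicasFor(numberOfShards, totalNodes, shardsPerNode int) (replicas int, ok bool) {
	if numberOfShards <= 0 || totalNodes <= 0 || shardsPerNode <= 0 {
		return 0, false
	}
	required := totalNodes * shardsPerNode // total shard copies needed
	if required%numberOfShards != 0 {
		return 0, false // no replica count yields exactly `required` shards
	}
	return required/numberOfShards - 1, true
}

func main() {
	// 3 primary shards on 4 nodes x 2 zones: 8 is not a multiple of 3.
	_, ok := replicasFor(3, 8, 1)
	fmt.Println("3 shards on 8 nodes feasible:", ok)

	// 2 primary shards on 8 nodes x 2 zones.
	r, ok := replicasFor(2, 16, 1)
	fmt.Println("2 shards on 16 nodes:", r, "replicas, feasible:", ok)
}
```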
Elastic Cloud Autoscaler can be used as a library. An example is in ./examples/main.go.
A Docker image is also provided. See kyomo/elastic-cloud-autoscaler for image details.
- Monitoring deployment must be enabled to use this library. https://www.elastic.co/guide/en/cloud/current/ec-enable-logging-and-monitoring.html#ec-enable-logging-and-monitoring-steps
- This library only supports the hot_content topology and memory sizes of 64g or greater for now.
- Scaling out from 5 nodes or fewer to 6 nodes or more is not possible, since dedicated master nodes are required from 6 data nodes.
- auto_expand_replicas won't be used. Autoscaler will manually expand / drop replicas before or after node scaling.
You can easily test this library with the following repository: https://github.com/k-yomo/elastic-cloud-autoscaler-demo