feat: support dynamic scaling of stable ReplicaSet as inverse of canary weight (#1430)

Signed-off-by: Jesse Suen <[email protected]>
jessesuen authored Sep 21, 2021
1 parent bab546d commit c1353e4
Showing 47 changed files with 2,271 additions and 603 deletions.
1 change: 1 addition & 0 deletions .gitignore
@@ -4,6 +4,7 @@
dist/
*.iml
# delve debug binaries
__debug_bin
cmd/**/debug
debug.test
coverage.out
58 changes: 53 additions & 5 deletions docs/features/canary.md
@@ -66,7 +66,7 @@ If no `duration` is specified for a pause step, the rollout will be paused indef
kubectl argo rollouts promote <rollout>
```

## Controlling Canary Scale
## Dynamic Canary Scale (with Traffic Routing)

By default, the rollout controller will scale the canary to match the current trafficWeight of the
current step. For example, if the current weight is 25%, and there are four replicas, then the
@@ -109,11 +109,59 @@ spec:
matchTrafficWeight: true
```

If no `duration` is specified for a pause step, the rollout will be paused indefinitely. To unpause, use the [argo kubectl plugin](kubectl-plugin.md) `promote` command.
When using `setCanaryScale` with explicit values for either replicas or weight, take care when
combining it with the `setWeight` step. If the two are mismatched, the canary may receive a share of
traffic that is out of proportion to its scale. For example, the following set of steps would cause
90% of traffic to be served by only 10% of pods:

```shell
# promote to the next step
kubectl argo rollouts promote <rollout>
```yaml
spec:
replicas: 10
strategy:
canary:
steps:
- setCanaryScale:
weight: 10
- setWeight: 90
- pause: {}
```
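
To make the imbalance concrete, the arithmetic behind the steps above can be sketched in Go. This is illustrative only: `canaryReplicas` and `perPodShare` are hypothetical helpers, not part of Argo Rollouts, and rounding the canary replica count up is an assumption.

```go
package main

import "fmt"

// canaryReplicas is a hypothetical helper: the number of canary pods for a
// given setCanaryScale weight, assuming the count is rounded up.
func canaryReplicas(specReplicas, scaleWeight int) int {
	return (specReplicas*scaleWeight + 99) / 100
}

// perPodShare returns the percentage of total traffic each pod receives when
// trafficWeight percent is spread evenly across pods.
func perPodShare(trafficWeight, pods int) float64 {
	if pods == 0 {
		return 0
	}
	return float64(trafficWeight) / float64(pods)
}

func main() {
	const specReplicas = 10
	canaryPods := canaryReplicas(specReplicas, 10) // setCanaryScale: weight: 10
	canaryShare := perPodShare(90, canaryPods)     // setWeight: 90
	stableShare := perPodShare(10, specReplicas)   // stable assumed fully scaled
	fmt.Printf("canary pods=%d, per-pod traffic: canary=%.0f%% stable=%.0f%%\n",
		canaryPods, canaryShare, stableShare)
}
```

Under these assumptions a single canary pod carries 90% of the traffic, while each of the ten stable pods carries 1%.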

## Dynamic Stable Scale (with Traffic Routing)

!!! important
Available since v1.1

When using traffic routing, by default the stable ReplicaSet is left scaled to 100% during the update.
This has the advantage that if an abort occurs, traffic can be immediately shifted back to the
stable ReplicaSet without delay. However, it has the disadvantage that during the update, double the
number of replica pods will eventually be running (similar to a blue-green deployment),
since the stable ReplicaSet is left scaled up for the full duration of the update.

It is possible to dynamically reduce the scale of the stable ReplicaSet during an update such that
it scales down as the traffic weight shifted to the canary increases. This is desirable in scenarios
where the Rollout has a high replica count and resource cost is a concern, or in bare-metal
environments where it is not possible to create additional node capacity to accommodate double the replicas.

The ability to dynamically scale the stable ReplicaSet can be enabled by setting the
`canary.dynamicStableScale` flag to true:

```yaml
spec:
strategy:
canary:
dynamicStableScale: true
```
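
As a rough illustration of the inverse relationship described above, the following Go sketch computes canary and stable replica counts for a given canary traffic weight. It is not the controller's implementation: `ceilPercent` and `desiredScale` are hypothetical names, and rounding up is an assumption. The real controller may also apply other constraints that this sketch does not model.

```go
package main

import "fmt"

// ceilPercent returns ceil(total * percent / 100) using integer arithmetic.
func ceilPercent(total, percent int) int {
	return (total*percent + 99) / 100
}

// desiredScale is a hypothetical helper. The canary is scaled to match the
// canary traffic weight. The stable ReplicaSet stays at full scale unless
// dynamicStableScale is set, in which case it is scaled to the remaining
// (inverse) weight.
func desiredScale(specReplicas, canaryWeight int, dynamicStableScale bool) (canary, stable int) {
	canary = ceilPercent(specReplicas, canaryWeight)
	stable = specReplicas
	if dynamicStableScale {
		stable = ceilPercent(specReplicas, 100-canaryWeight)
	}
	return canary, stable
}

func main() {
	for _, w := range []int{0, 25, 50, 100} {
		c, s := desiredScale(10, w, true)
		fmt.Printf("weight=%3d%% canary=%2d stable=%2d total=%2d\n", w, c, s, c+s)
	}
}
```

With `dynamicStableScale` disabled, the same helper leaves the stable count at `specReplicas`, so the total approaches double the replica count as the canary weight nears 100%.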

NOTE: if `dynamicStableScale` is set and the rollout is aborted, the canary ReplicaSet will
dynamically scale down as traffic shifts back to stable. If you wish to leave the canary ReplicaSet
scaled up while aborting, an explicit value for `abortScaleDownDelaySeconds` can be set:

```yaml
spec:
strategy:
canary:
dynamicStableScale: true
abortScaleDownDelaySeconds: 600
```


3 changes: 1 addition & 2 deletions go.mod
@@ -6,8 +6,7 @@ require (
github.com/antonmedv/expr v1.8.9
github.com/argoproj/notifications-engine v0.2.1-0.20210525191332-e8e293898477
github.com/argoproj/pkg v0.9.0
github.com/aws/aws-sdk-go-v2/config v1.0.0
github.com/aws/aws-sdk-go-v2/internal/ini v1.2.1 // indirect
github.com/aws/aws-sdk-go-v2/config v1.8.1
github.com/aws/aws-sdk-go-v2/service/cloudwatch v1.5.0
github.com/aws/aws-sdk-go-v2/service/elasticloadbalancingv2 v1.6.1
github.com/blang/semver v3.5.1+incompatible
34 changes: 18 additions & 16 deletions go.sum
@@ -176,30 +176,32 @@ github.com/aws/aws-sdk-go v1.31.13/go.mod h1:5zCpMtNQVjRREroY7sYe8lOMRSxkhG6MZve
github.com/aws/aws-sdk-go v1.33.16/go.mod h1:5zCpMtNQVjRREroY7sYe8lOMRSxkhG6MZveU8YkpAk0=
github.com/aws/aws-sdk-go v1.35.24/go.mod h1:tlPOdRjfxPBpNIwqDj61rmsnA85v9jc0Ps9+muhnW+k=
github.com/aws/aws-sdk-go-v2 v0.18.0/go.mod h1:JWVYvqSMppoMJC0x5wdwiImzgXTI9FuZwxzkQq9wy+g=
github.com/aws/aws-sdk-go-v2 v1.0.0/go.mod h1:smfAbmpW+tcRVuNUjo3MOArSZmW72t62rkCzc2i0TWM=
github.com/aws/aws-sdk-go-v2 v1.7.0/go.mod h1:tb9wi5s61kTDA5qCkcDbt3KRVV74GGslQkl/DRdX/P4=
github.com/aws/aws-sdk-go-v2 v1.8.1 h1:GcFgQl7MsBygmeeqXyV1ivrTEmsVz/rdFJaTcltG9ag=
github.com/aws/aws-sdk-go-v2 v1.8.1/go.mod h1:xEFuWz+3TYdlPRuo+CqATbeDWIWyaT5uAPwPaWtgse0=
github.com/aws/aws-sdk-go-v2/config v1.0.0 h1:x6vSFAwqAvhYPeSu60f0ZUlGHo3PKKmwDOTL8aMXtv4=
github.com/aws/aws-sdk-go-v2/config v1.0.0/go.mod h1:WysE/OpUgE37tjtmtJd8GXgT8s1euilE5XtUkRNUQ1w=
github.com/aws/aws-sdk-go-v2/credentials v1.0.0 h1:0M7netgZ8gCV4v7z1km+Fbl7j6KQYyZL7SS0/l5Jn/4=
github.com/aws/aws-sdk-go-v2/credentials v1.0.0/go.mod h1:/SvsiqBf509hG4Bddigr3NB12MIpfHhZapyBurJe8aY=
github.com/aws/aws-sdk-go-v2/feature/ec2/imds v1.0.0 h1:lO7fH5n7Q1dKcDBpuTmwJylD1bOQiRig8LI6TD9yVQk=
github.com/aws/aws-sdk-go-v2/feature/ec2/imds v1.0.0/go.mod h1:wpMHDCXvOXZxGCRSidyepa8uJHY4vaBGfY2/+oKU/Bc=
github.com/aws/aws-sdk-go-v2/internal/ini v1.2.1 h1:IkqRRUZTKaS16P2vpX+FNc2jq3JWa3c478gykQp4ow4=
github.com/aws/aws-sdk-go-v2/internal/ini v1.2.1/go.mod h1:Pv3WenDjI0v2Jl7UaMFIIbPOBbhn33RmmAmGgkXDoqY=
github.com/aws/aws-sdk-go-v2 v1.9.0 h1:+S+dSqQCN3MSU5vJRu1HqHrq00cJn6heIMU7X9hcsoo=
github.com/aws/aws-sdk-go-v2 v1.9.0/go.mod h1:cK/D0BBs0b/oWPIcX/Z/obahJK1TT7IPVjy53i/mX/4=
github.com/aws/aws-sdk-go-v2/config v1.8.1 h1:AcAenV2NVwOViG+3ts73uT08L1olN4NBNNz7lUlHSUo=
github.com/aws/aws-sdk-go-v2/config v1.8.1/go.mod h1:AQtpYfVYjuuft4Dgh0jGSkPQJ9MvmK9vXfSub7oSXlI=
github.com/aws/aws-sdk-go-v2/credentials v1.4.1 h1:oDiUP50hKRwC6xAgESAj46lgL2prJRZQWnCBzn+TU/c=
github.com/aws/aws-sdk-go-v2/credentials v1.4.1/go.mod h1:dgGR+Qq7Wjcd4AOAW5Rf5Tnv3+x7ed6kETXyS9WCuAY=
github.com/aws/aws-sdk-go-v2/feature/ec2/imds v1.5.0 h1:OxTAgH8Y4BXHD6PGCJ8DHx2kaZPCQfSTqmDsdRZFezE=
github.com/aws/aws-sdk-go-v2/feature/ec2/imds v1.5.0/go.mod h1:CpNzHK9VEFUCknu50kkB8z58AH2B5DvPP7ea1LHve/Y=
github.com/aws/aws-sdk-go-v2/internal/ini v1.2.2 h1:d95cddM3yTm4qffj3P6EnP+TzX1SSkWaQypXSgT/hpA=
github.com/aws/aws-sdk-go-v2/internal/ini v1.2.2/go.mod h1:BQV0agm+JEhqR+2RT5e1XTFIDcAAV0eW6z2trp+iduw=
github.com/aws/aws-sdk-go-v2/service/cloudwatch v1.5.0 h1:XO1uX7dQKWfD0WzycEfz+bL/7rl0SsQ05VJwLPWGzGM=
github.com/aws/aws-sdk-go-v2/service/cloudwatch v1.5.0/go.mod h1:acH3+MQoiMzozT/ivU+DbRg7Ooo2298RdRaWcOv+4vM=
github.com/aws/aws-sdk-go-v2/service/elasticloadbalancingv2 v1.6.1 h1:mGc8UvJS4XJv8Tp7Doxlx2p3vfwPx46K9zg+9s9szPE=
github.com/aws/aws-sdk-go-v2/service/elasticloadbalancingv2 v1.6.1/go.mod h1:lGKz4aJbqGX+pgyXG47ZBAJPjwrlA5+TJsAuJ2+aE2g=
github.com/aws/aws-sdk-go-v2/service/internal/presigned-url v1.0.0 h1:IAutMPSrynpvKOpHG6HyWHmh1xmxWAmYOK84NrQVqVQ=
github.com/aws/aws-sdk-go-v2/service/internal/presigned-url v1.0.0/go.mod h1:3jExOmpbjgPnz2FJaMOfbSk1heTkZ66aD3yNtVhnjvI=
github.com/aws/aws-sdk-go-v2/service/sts v1.0.0 h1:6XCgxNfE4L/Fnq+InhVNd16DKc6Ue1f3dJl3IwwJRUQ=
github.com/aws/aws-sdk-go-v2/service/sts v1.0.0/go.mod h1:5f+cELGATgill5Pu3/vK3Ebuigstc+qYEHW5MvGWZO4=
github.com/aws/smithy-go v1.0.0/go.mod h1:EzMw8dbp/YJL4A5/sbhGddag+NPT7q084agLbB9LgIw=
github.com/aws/aws-sdk-go-v2/service/internal/presigned-url v1.3.0 h1:VNJ5NLBteVXEwE2F1zEXVmyIH58mZ6kIQGJoC7C+vkg=
github.com/aws/aws-sdk-go-v2/service/internal/presigned-url v1.3.0/go.mod h1:R1KK+vY8AfalhG1AOu5e35pOD2SdoPKQCFLTvnxiohk=
github.com/aws/aws-sdk-go-v2/service/sso v1.4.0 h1:sHXMIKYS6YiLPzmKSvDpPmOpJDHxmAUgbiF49YNVztg=
github.com/aws/aws-sdk-go-v2/service/sso v1.4.0/go.mod h1:+1fpWnL96DL23aXPpMGbsmKe8jLTEfbjuQoA4WS1VaA=
github.com/aws/aws-sdk-go-v2/service/sts v1.7.0 h1:1at4e5P+lvHNl2nUktdM2/v+rpICg/QSEr9TO/uW9vU=
github.com/aws/aws-sdk-go-v2/service/sts v1.7.0/go.mod h1:0qcSMCyASQPN2sk/1KQLQ2Fh6yq8wm0HSDAimPhzCoM=
github.com/aws/smithy-go v1.5.0/go.mod h1:SObp3lf9smib00L/v3U2eAKG8FyQ7iLrJnQiAmR5n+E=
github.com/aws/smithy-go v1.7.0 h1:+cLHMRrDZvQ4wk+KuQ9yH6eEg6KZEJ9RI2IkDqnygCg=
github.com/aws/smithy-go v1.7.0/go.mod h1:SObp3lf9smib00L/v3U2eAKG8FyQ7iLrJnQiAmR5n+E=
github.com/aws/smithy-go v1.8.0 h1:AEwwwXQZtUwP5Mz506FeXXrKBe0jA8gVM+1gEcSRooc=
github.com/aws/smithy-go v1.8.0/go.mod h1:SObp3lf9smib00L/v3U2eAKG8FyQ7iLrJnQiAmR5n+E=
github.com/aybabtme/rgbterm v0.0.0-20170906152045-cc83f3b3ce59/go.mod h1:q/89r3U2H7sSsE2t6Kca0lfwTK8JdoNGS/yzM/4iH5I=
github.com/beevik/ntp v0.2.0/go.mod h1:hIHWr+l3+/clUnF44zdK+CWW7fO8dR5cIylAQ76NRpg=
github.com/beorn7/perks v0.0.0-20180321164747-3a771d992973/go.mod h1:Dwedo/Wpr24TaqPxmxbtue+5NUziq4I4S80YR8gNf3Q=
48 changes: 48 additions & 0 deletions manifests/crds/rollout-crd.yaml
@@ -305,6 +305,8 @@ spec:
type: object
canaryService:
type: string
dynamicStableScale:
type: boolean
maxSurge:
anyOf:
- type: integer
@@ -2799,6 +2801,52 @@ spec:
- name
- status
type: object
weights:
properties:
additional:
items:
properties:
podTemplateHash:
type: string
serviceName:
type: string
weight:
format: int32
type: integer
required:
- weight
type: object
type: array
canary:
properties:
podTemplateHash:
type: string
serviceName:
type: string
weight:
format: int32
type: integer
required:
- weight
type: object
stable:
properties:
podTemplateHash:
type: string
serviceName:
type: string
weight:
format: int32
type: integer
required:
- weight
type: object
verified:
type: boolean
required:
- canary
- stable
type: object
type: object
collisionCount:
format: int32
48 changes: 48 additions & 0 deletions manifests/install.yaml
@@ -10183,6 +10183,8 @@ spec:
type: object
canaryService:
type: string
dynamicStableScale:
type: boolean
maxSurge:
anyOf:
- type: integer
@@ -12677,6 +12679,52 @@ spec:
- name
- status
type: object
weights:
properties:
additional:
items:
properties:
podTemplateHash:
type: string
serviceName:
type: string
weight:
format: int32
type: integer
required:
- weight
type: object
type: array
canary:
properties:
podTemplateHash:
type: string
serviceName:
type: string
weight:
format: int32
type: integer
required:
- weight
type: object
stable:
properties:
podTemplateHash:
type: string
serviceName:
type: string
weight:
format: int32
type: integer
required:
- weight
type: object
verified:
type: boolean
required:
- canary
- stable
type: object
type: object
collisionCount:
format: int32
48 changes: 48 additions & 0 deletions manifests/namespace-install.yaml
@@ -10183,6 +10183,8 @@ spec:
type: object
canaryService:
type: string
dynamicStableScale:
type: boolean
maxSurge:
anyOf:
- type: integer
@@ -12677,6 +12679,52 @@ spec:
- name
- status
type: object
weights:
properties:
additional:
items:
properties:
podTemplateHash:
type: string
serviceName:
type: string
weight:
format: int32
type: integer
required:
- weight
type: object
type: array
canary:
properties:
podTemplateHash:
type: string
serviceName:
type: string
weight:
format: int32
type: integer
required:
- weight
type: object
stable:
properties:
podTemplateHash:
type: string
serviceName:
type: string
weight:
format: int32
type: integer
required:
- weight
type: object
verified:
type: boolean
required:
- canary
- stable
type: object
type: object
collisionCount:
format: int32
51 changes: 51 additions & 0 deletions pkg/apiclient/rollout/rollout.swagger.json
@@ -701,6 +701,10 @@
"currentExperiment": {
"type": "string",
"title": "CurrentExperiment indicates the running experiment"
},
"weights": {
"$ref": "#/definitions/github.com.argoproj.argo_rollouts.pkg.apis.rollouts.v1alpha1.TrafficWeights",
"title": "Weights records the weights which have been set on traffic provider. Only valid when using traffic routing"
}
},
"title": "CanaryStatus status fields that only pertain to the canary rollout"
@@ -792,6 +796,10 @@
"type": "integer",
"format": "int32",
"title": "AbortScaleDownDelaySeconds adds a delay in second before scaling down the canary pods when update\nis aborted for canary strategy with traffic routing (not applicable for basic canary).\n0 means canary pods are not scaled down.\nDefault is 30 seconds.\n+optional"
},
"dynamicStableScale": {
"type": "boolean",
"description": "DynamicStableScale is a traffic routing feature which dynamically scales the stable\nReplicaSet to minimize total pods which are running during an update. This is calculated by\nscaling down the stable as traffic is increased to canary. When disabled (the default behavior)\nthe stable ReplicaSet remains fully scaled to support instantaneous aborts."
}
},
"title": "CanaryStrategy defines parameters for a Replica Based Canary"
@@ -1419,6 +1427,49 @@
},
"description": "TLSRoute holds the information on the virtual service's TLS/HTTPS routes that are desired to be matched for changing weights."
},
"github.com.argoproj.argo_rollouts.pkg.apis.rollouts.v1alpha1.TrafficWeights": {
"type": "object",
"properties": {
"canary": {
"$ref": "#/definitions/github.com.argoproj.argo_rollouts.pkg.apis.rollouts.v1alpha1.WeightDestination",
"title": "Canary is the current traffic weight split to canary ReplicaSet"
},
"stable": {
"$ref": "#/definitions/github.com.argoproj.argo_rollouts.pkg.apis.rollouts.v1alpha1.WeightDestination",
"title": "Stable is the current traffic weight split to stable ReplicaSet"
},
"additional": {
"type": "array",
"items": {
"$ref": "#/definitions/github.com.argoproj.argo_rollouts.pkg.apis.rollouts.v1alpha1.WeightDestination"
},
"title": "Additional holds the weights split to additional ReplicaSets such as experiment ReplicaSets"
},
"verified": {
"type": "boolean",
"title": "Verified is an optional indicator that the weight has been verified to have taken effect.\nThis is currently only applicable to ALB traffic router"
}
},
"title": "TrafficWeights describes the current status of how traffic has been split"
},
"github.com.argoproj.argo_rollouts.pkg.apis.rollouts.v1alpha1.WeightDestination": {
"type": "object",
"properties": {
"weight": {
"type": "integer",
"format": "int32",
"title": "Weight is a percentage of traffic being sent to this destination"
},
"serviceName": {
"type": "string",
"title": "ServiceName is the Kubernetes service name traffic is being sent to"
},
"podTemplateHash": {
"type": "string",
"title": "PodTemplateHash is the pod template hash label for this destination"
}
}
},
"google.protobuf.Any": {
"type": "object",
"properties": {