Horizontal Fleet Autoscaling #334
Comments
While I agree with the need, I believe some implementation details need to change. @markmandel will certainly bring more insight than I can, but I'll give it a try. I think we should use the scaling mechanism built into Kubernetes, meaning the Fleet controller should only be responsible for maintaining a number of servers. Fleet autoscaling should be another component, as the HPA is. If we put the code that drives the scaling in the controller, you are stuck with a single way of scaling for all your Fleets. Then, if you want to support multiple scaling algorithms, the code becomes extremely complex. I believe that driving autoscaling is somewhat specific to each user (or company). Just off the top of my head, I can see these algorithms, based only on my own experience:
By using external components, you can use different ones based on your needs. I agree with having a one-to-one relationship between Fleet and HFA though. WDYT?
I think that if you want to do an external component, with custom autoscaling algorithms, you can already do this right now using the Kubernetes API and updating the Fleet's spec replicas.
Actually, the game server sets are responsible for that. Right now the fleets are responsible for creating the initial game server set, applying manual scaling, and doing rolling updates (by creating and scaling game server sets). In other words, the fleet controllers already handle scaling, but support only one policy: manual.
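For illustration, a minimal sketch of that external approach, using client-go's dynamic client to update a Fleet's spec.replicas through the Kubernetes API. It assumes the stable.agones.dev/v1alpha1 Fleet CRD discussed in this thread and a recent client-go; the fleet name, namespace, and target replica count are placeholders.

// Sketch only: scale a Fleet from outside Agones by updating spec.replicas
// via the Kubernetes API. Assumes the stable.agones.dev/v1alpha1 Fleet CRD
// and a recent client-go; "fleet-example" and the count of 10 are placeholders.
package main

import (
	"context"
	"log"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"
	"k8s.io/apimachinery/pkg/runtime/schema"
	"k8s.io/client-go/dynamic"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	cfg, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		log.Fatal(err)
	}
	client, err := dynamic.NewForConfig(cfg)
	if err != nil {
		log.Fatal(err)
	}

	// Fleets are a namespaced custom resource in the stable.agones.dev group.
	fleets := client.Resource(schema.GroupVersionResource{
		Group:    "stable.agones.dev",
		Version:  "v1alpha1",
		Resource: "fleets",
	}).Namespace("default")

	ctx := context.Background()
	fleet, err := fleets.Get(ctx, "fleet-example", metav1.GetOptions{})
	if err != nil {
		log.Fatal(err)
	}

	// Set the desired replica count and write the Fleet back.
	if err := unstructured.SetNestedField(fleet.Object, int64(10), "spec", "replicas"); err != nil {
		log.Fatal(err)
	}
	if _, err := fleets.Update(ctx, fleet, metav1.UpdateOptions{}); err != nil {
		log.Fatal(err)
	}
}

An external autoscaler could run this kind of update on whatever schedule and algorithm it likes, which is the flexibility being argued for above.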
This is the "ready" buffer I mentioned above.
This is not a metric known by Agones... and the "ready" buffer addresses this implicitly.
I don't understand how these would be useful... scale blindly, ignoring actual demand? Or maybe we are talking about different things? :-/ Later edit: HFA is something very quick, as it doesn't involve cluster scaling.
I agree with @EricFortin on this one. I think this should be a separate CRD and controller (inside Agones though) that is applied to a Fleet - and ideally a CRD that allows for expansion to multiple different scaling strategies as we discover and implement more. Basically following the pattern of Deployments and PodAutoscaling - and also continuing with our modular development pattern. (It also becomes a nice separation of concerns in the codebase.) Maybe something like (initial suggestion):

apiVersion: "stable.agones.dev/v1alpha1"
kind: FleetAutoscaler
metadata:
  name: fleet-autoscaler
spec:
  fleetName: "target-fleet-name"
  minReplicas: 1
  maxReplicas: 10
  strategy:
    type: Buffer
    # each type has its own config section with the same name as the type
    buffer:
      # this could be a percentage or int value (maybe)
      replicaBuffer: 5

Then we can add more strategies from here - maybe even a custom metrics strategy that's exposed through the SDK (as a random idea)? As a meta comment - I like to start by thinking about what the config will look like and work from there - it's the most user-facing part. But YMMV. Regarding the point that things could break if scaling and allocating while doing a rolling update - this is a valid point. The same could be said about allocating generally during a rolling update, or scaling down - so my thought is: let's handle that as its own issue. We likely need more testing around this, as race conditions will likely occur - and we need that whole aspect to be as rock solid as possible. Likely a good candidate for some e2e tests to see what we can break. How does that sound?
@victor-prodan also brought up a good point in chat - having these as separate resources also facilitates changing the autoscaling strategies or configuration without having to manage it as a change of the underlying fleet as well - which I think is quite powerful.
Regarding configuration and customization: I think @EricFortin is right when saying that there is a huge playground and each production will want something different, even between fleet types/game modes/stages/environments. I don't think that Agones can implement enough built-in policies and configuration options to satisfy big-scale productions. What I think is that Agones should implement a couple of basic policies and offer the tools to write your own policy. And because there is also the topic of cluster scaling, I wouldn't go further right now than the simple buffer policy until a node scaler solution is added as well. A simple buffer fleet scaler combined with a simple buffer cluster scaler is enough to get small productions started. WDYT?
Two additional items of interest:
100% agree we should start with a simple buffer policy for our first implementation of Fleet Autoscaling. I have thoughts on node autoscaling (I think it should match what is currently done - nodes should scale up and down to match requested resources), for exactly the reasons you describe. Regarding fleet updates and allocation - I think this is a separate issue from this one here. Allocations should "work" regardless of what is happening to a Fleet. But I think there are two potential issues at the moment - which actually aren't autoscaler related:
I feel like #2 is going to happen at some point to your game. So I don't think we can do anything special here. Assuming I can find #1, which I think I can, I don't think we really need an update strategy - at least, not to start. WDYT?
One reason for a separate update strategy: the current strategy is very simple, and a production deployment needs more features and levers, for example automatic rollbacks if the new version is faulty.
The next policy after the simple buffer should, I think, be an adaptive/smart/predictive buffer that gets bigger if there is a ramp-up in allocations and smaller during ramp-downs. In other words, it tries to predict demand in the near future and adapts to it.
More thoughts on HFA config. For the simple buffer policy I think that these settings are enough:
Min/Max are hard caps.
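To make the buffer semantics concrete, here is a rough sketch of the replica calculation such a policy might perform - not the actual Agones implementation; the function and parameter names are illustrative.

// Illustrative sketch of a simple "ready buffer" policy: keep bufferSize Ready
// GameServers on top of the allocated ones, clamped by hard min/max caps.
// This is not the actual Agones code; names are hypothetical.
package buffer

// desiredReplicas returns the target replica count for a Fleet given the
// number of currently allocated GameServers and the policy settings.
func desiredReplicas(allocated, bufferSize, min, max int32) int32 {
	target := allocated + bufferSize
	if target < min {
		target = min
	}
	if target > max {
		target = max
	}
	return target
}

For example, with min 5, max 100, and a buffer of 5, an idle fleet sits at 5 replicas, and 30 allocated servers yield a target of 35.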
Working on a design for webhook-powered logic for the autoscaler - so that people can provide their own logic as necessary for fleet autoscaling (which I think will also close out fleet autoscaling, at least for now?). This deliberately emulates K8s HTTP webhooks - and I also think an HTTP/S webhook is important, as (a) a REST endpoint has lower developer friction for most developers than gRPC, and (b) if we ever want to integrate with, or developers want to use, a FaaS - responding to an HTTP/S request is supported. The configuration I'm proposing would be the following:

apiVersion: "stable.agones.dev/v1alpha1"
kind: FleetAutoscaler
metadata:
  name: fleet-autoscaler-example
spec:
  fleetName: fleet-example
  policy:
    # type of the policy - this example is Webhook
    type: Webhook
    # parameters for the webhook policy - this is a WebhookClientConfig, as per other K8s webhooks
    webhook:
      # use a service, or a url
      service:
        name: fleetautoscaler-service
        namespace: default
        path: /autoscaler
      # URL (optional, instead of service)
      # caBundle (optional, if you want to provide your own ca cert to test against)

This would send a request to the configured webhook endpoint. The proposed data structure that is sent, and then returned back (following the request/response review pattern of other K8s webhooks):

type FleetAutoscaleRequest struct {
	// UID is an identifier for the individual request/response. It allows us to distinguish instances of requests which are
	// otherwise identical (parallel requests, requests when earlier requests did not modify, etc.)
	// The UID is meant to track the round trip (request/response) between the Autoscaler and the WebHook, not the user request.
	// It is suitable for correlating log entries between the webhook and apiserver, for either auditing or debugging.
	UID types.UID `json:"uid"`
	// Name is the name of the Fleet being scaled
	Name string `json:"name"`
	// Namespace is the namespace associated with the request (if any).
	Namespace string `json:"namespace"`
	// The Fleet's status values
	Status v1alpha1.FleetStatus `json:"status"`
}
type FleetAutoscaleResponse struct {
	// UID is an identifier for the individual request/response.
	// This should be copied over from the corresponding FleetAutoscaleRequest.
	UID types.UID `json:"uid"`
	// Set to false if no scaling should occur to the Fleet
	Scale bool `json:"scale"`
	// The targeted replica count
	Replicas int32 `json:"replicas"`
}
// This is passed to the webhook with a populated Request value,
// and then returned with a populated Response.
type FleetAutoscaleReview struct {
	Request  *FleetAutoscaleRequest  `json:"request"`
	Response *FleetAutoscaleResponse `json:"response"`
}

For reference, the FleetStatus looks like this:

// FleetStatus is the status of a Fleet
type FleetStatus struct {
	// Replicas the total number of current GameServer replicas
	Replicas int32 `json:"replicas"`
	// ReadyReplicas are the number of Ready GameServer replicas
	ReadyReplicas int32 `json:"readyReplicas"`
	// AllocatedReplicas are the number of Allocated GameServer replicas
	AllocatedReplicas int32 `json:"allocatedReplicas"`
}

The user would populate the Response section of the review. The Fleet Autoscaler would then scale the Fleet to the returned Replicas value if Scale is true.

Questions
Thoughts?
Looks good in general.
@victor-prodan Interesting point! We could (and probably should) have a timeout on the webhook. It would have to be a max of 30s - which is our sync time as well. So, in that case, I don't think we would get out-of-order responses. That being said - since the request UID is set server side, we can definitely track requests to make sure that there actually aren't any concurrent requests running. Is that what you were thinking about?
The plan in this version is to run the webhook every 30s, like we do for the buffer strategy - should we include an option to run it on every fleet count change? (Then we would have to track out-of-order requests.) This came up in: #423 (comment)
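As an illustration of the timeout and UID round-trip check discussed here, a rough sketch of what the autoscaler-side call might look like - not the actual implementation; the helper name and the abbreviated response type are hypothetical.

// Sketch of the autoscaler-side webhook call: a 30s timeout (matching the sync
// period) plus a check that the webhook echoed back the request UID, so stale
// or out-of-order responses can be dropped. Not the actual Agones code.
package autoscaler

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
	"time"
)

// reviewResponse holds only the part of the FleetAutoscaleReview checked here.
type reviewResponse struct {
	Response struct {
		UID      string `json:"uid"`
		Scale    bool   `json:"scale"`
		Replicas int32  `json:"replicas"`
	} `json:"response"`
}

// callWebhook posts an already-serialised review and verifies the UID echo.
func callWebhook(url, requestUID string, reviewJSON []byte) (*reviewResponse, error) {
	client := &http.Client{Timeout: 30 * time.Second}

	resp, err := client.Post(url, "application/json", bytes.NewReader(reviewJSON))
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()

	var out reviewResponse
	if err := json.NewDecoder(resp.Body).Decode(&out); err != nil {
		return nil, err
	}
	// A response whose UID doesn't match the request we sent is discarded.
	if out.Response.UID != requestUID {
		return nil, fmt.Errorf("response UID %q does not match request UID %q", out.Response.UID, requestUID)
	}
	return &out, nil
}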
Add webhook policy type into the FleetAutoscaler declaration. Provided an example of a webhook pod which receives a FleetAutoscaleReview with a Fleet status and, based on that information, calculates the target replica count. This process is performed on FleetAutoscaler Sync every 30 seconds. Extends the Buffer policy functionality with the Webhook policy proposed in the #334 comments.
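For reference, a minimal sketch of what such a webhook service could look like: an HTTP handler that decodes a FleetAutoscaleReview, derives a target replica count from the Fleet status (here a naive allocated-plus-fixed-buffer rule), and writes the populated Response back. The types mirror the structures proposed above; the buffer size, port, and path are arbitrary, and this is a sketch rather than the shipped example (a real deployment would likely also serve HTTPS to match the caBundle option).

// Minimal sketch of a fleet-autoscaler webhook: it receives a FleetAutoscaleReview,
// decides a target replica count from the Fleet status (allocated + a fixed Ready
// buffer), and returns the review with the Response filled in. Types mirror the
// structures proposed in this thread; buffer size, port, and path are arbitrary.
package main

import (
	"encoding/json"
	"log"
	"net/http"
)

type FleetStatus struct {
	Replicas          int32 `json:"replicas"`
	ReadyReplicas     int32 `json:"readyReplicas"`
	AllocatedReplicas int32 `json:"allocatedReplicas"`
}

type FleetAutoscaleRequest struct {
	UID       string      `json:"uid"`
	Name      string      `json:"name"`
	Namespace string      `json:"namespace"`
	Status    FleetStatus `json:"status"`
}

type FleetAutoscaleResponse struct {
	UID      string `json:"uid"`
	Scale    bool   `json:"scale"`
	Replicas int32  `json:"replicas"`
}

type FleetAutoscaleReview struct {
	Request  *FleetAutoscaleRequest  `json:"request"`
	Response *FleetAutoscaleResponse `json:"response"`
}

const readyBuffer int32 = 5 // keep 5 Ready GameServers on top of the allocated ones

func autoscaler(w http.ResponseWriter, r *http.Request) {
	var review FleetAutoscaleReview
	if err := json.NewDecoder(r.Body).Decode(&review); err != nil || review.Request == nil {
		http.Error(w, "could not decode FleetAutoscaleReview", http.StatusBadRequest)
		return
	}

	target := review.Request.Status.AllocatedReplicas + readyBuffer
	review.Response = &FleetAutoscaleResponse{
		UID:      review.Request.UID, // echo the UID back for correlation
		Scale:    target != review.Request.Status.Replicas,
		Replicas: target,
	}

	w.Header().Set("Content-Type", "application/json")
	if err := json.NewEncoder(w).Encode(review); err != nil {
		log.Printf("error encoding response: %v", err)
	}
}

func main() {
	http.HandleFunc("/autoscaler", autoscaler)
	log.Fatal(http.ListenAndServe(":8000", nil))
}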
I'm going to close this ticket, as with #460 I think we've rounded this out. If we want to add more features, let's start some new tickets. Nice work everyone! 🍰
What
Horizontal Fleet Autoscaling means modifying in real time the number of replicas (game servers) of a fleet, according to demand (i.e. the number of servers allocated) and user-defined settings (e.g. min/max buffer size).
Cluster autoscaling (adding/removing nodes) is a separate issue (#145) which, given the multi-tenancy nature of Kubernetes (a cluster may host multiple fleets), must be addressed at a central, higher layer.
Why
Currently the fleet specification includes a static number of replicas, which includes game servers in all possible states - pending, ready, allocated, terminating, etc. This means, on one hand, that you have a limited capacity that can be used and, on the other hand, that you pay for instances that are not used. Horizontal Fleet Autoscaling addresses these two issues by scaling the number of replicas up and down according to the number of already-allocated replicas - in other words, scaling the fleet according to demand.
For example, instead of saying that the fleet must have 50 replicas, you say that the fleet may have between 5 and 100 total replicas, and keep a buffer of 5 ready game instances. The fleet will start with 5 game instances (the minimum number) and when one is allocated, the autoscaler will automatically add a new replica to the fleet to refill the "ready" buffer.
This HFA system would be useful in itself, even without cluster autoscaling, in the following scenarios:
How
I think that this behavior may be implemented as part of the fleet controller, because:
[Update: this idea is prototyped in #336]
A second solution, which allows a better separation of concerns and more flexibility, is presented by @EricFortin and @markmandel below. In a few words: implement the fleet autoscaler as a separate controller/CRD pair that is attached one-to-one to a fleet, in a way similar to fleet allocations.
[Update: this solution is implemented in #340]
References
The Horizontal Pod Autoscaler automatically scales the number of pods in a replication controller, deployment, or replica set based on observed CPU utilization. This is similar to what we need, but not quite: HPA is suited for a micro-service, distributed architecture in which you automatically deploy more instances of your app based on total resource utilization.
Items of interest
Relation between autoscaling and rolling updates
Should autoscaling run while a rolling update is in progress?
If yes, then we must be very careful about the potential conflicts (as both processes do similar actions on the same resources - game server sets).
If no, then what can happen is that, if demand increases, the "ready" buffer will be depleted and server allocations will fail. This can have a wide range of effects on the matchmaker's back-end, from a temporary increase in matchmaking times to the matchmaking queue filling up, hitting a nasty bug because it's executing a code path that isn't sufficiently tested, and the whole backend going down in flames.