Make driver restarts on kubernetes have zero downtime #1331

fleupold · 2020-08-17T07:45:44Z

We occasionally run into issues submitting solutions (e.g. here). These are likely caused by badly timed restarts: we stop the old pods halfway through processing a batch, so the new ones start with little time remaining in that batch.

Moreover, new instances can take a significant amount of time to set up, e.g. if the event database has to be resynced from scratch.

We should therefore work towards shutting down the previous deployment only once the new deployment is ready, and ideally only at the beginning of a batch (e.g. within its first 30 seconds).
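The shutdown condition described above (new deployment ready, and we are early in a batch) can be sketched as a small check. This is a minimal sketch, assuming batches are aligned to the Unix epoch and last 300 seconds; `batch_duration`, `window`, and both function names are hypothetical and not taken from the driver code.

```python
def seconds_into_batch(now: float, batch_duration: int = 300) -> float:
    """Seconds elapsed since the current batch started.

    Assumption: batches start at multiples of `batch_duration` since the epoch.
    """
    return now % batch_duration


def may_shut_down_old(new_ready: bool, now: float,
                      batch_duration: int = 300, window: int = 30) -> bool:
    """Only allow stopping the old deployment when the new one reports ready
    AND we are still inside the first `window` seconds of a batch."""
    return new_ready and seconds_into_batch(now, batch_duration) < window


def seconds_until_window(now: float, batch_duration: int = 300, window: int = 30) -> float:
    """How long a deployer would have to wait before the next shutdown window opens."""
    elapsed = seconds_into_batch(now, batch_duration)
    return 0.0 if elapsed < window else batch_duration - elapsed
```

A deployer could poll `may_shut_down_old(ready, time.time())` and otherwise sleep for `seconds_until_window(time.time())` before checking again.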

At the same time, we have to make sure we never have two instances of the driver working on the same auction, as this could cause problems with our solver license and nonce conflicts when both try to submit solutions with the same PK.
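One way to guard against two instances solving the same auction is a per-batch claim: whichever instance claims a batch first owns it, and any other instance skips it. The sketch below is purely illustrative and keeps the registry in memory; `AuctionClaims` and its methods are hypothetical. Across pods the same compare-and-set would have to live in shared storage (for example a Kubernetes Lease object or a database row), which this sketch does not implement.

```python
import threading
from typing import Dict, Optional


class AuctionClaims:
    """Hypothetical registry ensuring at most one driver instance solves a batch."""

    def __init__(self) -> None:
        self._owners: Dict[int, str] = {}
        self._lock = threading.Lock()

    def try_claim(self, batch_id: int, instance: str) -> bool:
        """Atomically claim `batch_id`; returns True only for its (first) owner.

        Re-claiming by the same instance is idempotent.
        """
        with self._lock:
            owner = self._owners.setdefault(batch_id, instance)
            return owner == instance

    def owner(self, batch_id: int) -> Optional[str]:
        """Return which instance owns `batch_id`, if any."""
        with self._lock:
            return self._owners.get(batch_id)
```

During a rollover the old pod keeps the batch it already claimed, while the new pod only starts solving from the next unclaimed batch, so the PK is never used by both for the same auction.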

fleupold · 2020-08-28T08:54:22Z

Small update on the progress:

@giacomolicari has been working on making the price-estimation service support rolling restarts (since the ready route is already implemented there). We now have an init container that downloads the latest orderbook from an S3 bucket, and a sidecar container that re-uploads it to S3 on every change.
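The sidecar's "re-upload on every change" behaviour could look roughly like the polling loop below. This is a sketch under assumptions: change detection by content hash, a fixed poll interval, and an injected `upload` callback (all hypothetical names). In a real sidecar the callback might wrap boto3's `S3.Client.upload_file(Filename, Bucket, Key)`; the actual container may work differently.

```python
import hashlib
import os
import time
from typing import Callable, Optional


def file_digest(path: str) -> Optional[str]:
    """SHA-256 of the file contents, or None if the file does not exist yet."""
    if not os.path.exists(path):
        return None
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def sync_once(path: str, last_digest: Optional[str],
              upload: Callable[[str], None]) -> Optional[str]:
    """Upload `path` if its contents changed since `last_digest`.

    Returns the digest to remember for the next round; a missing file
    keeps the previous digest so identical content is not re-uploaded.
    """
    digest = file_digest(path)
    if digest is None:
        return last_digest
    if digest != last_digest:
        upload(path)
    return digest


def watch(path: str, upload: Callable[[str], None],
          interval: float = 5.0, iterations: Optional[int] = None) -> None:
    """Poll forever (or `iterations` times), uploading on every content change."""
    last: Optional[str] = None
    n = 0
    while iterations is None or n < iterations:
        last = sync_once(path, last, upload)
        n += 1
        time.sleep(interval)
```

Hashing contents rather than comparing mtimes avoids re-uploading when the file is rewritten with identical data. If the orderbook is written non-atomically, the writer should ideally write to a temp file and rename it, so the sidecar never uploads a half-written file.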

He is currently working on changing the auto-deployer to perform a rolling restart instead of a forced delete and restart.
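A rolling restart can be triggered by patching the Deployment rather than deleting pods. The sketch below builds such a patch body; it assumes the auto-deployer talks to the Kubernetes API and that a one-at-a-time rollout is wanted. The field names (`strategy.rollingUpdate.maxUnavailable`, `maxSurge`) are standard Deployment spec fields, and the `kubectl.kubernetes.io/restartedAt` annotation is what `kubectl rollout restart` sets; the function name is hypothetical.

```python
from datetime import datetime, timezone
from typing import Any, Dict


def rolling_restart_patch(now: datetime) -> Dict[str, Any]:
    """Build a Deployment patch that restarts pods without downtime.

    maxUnavailable=0 keeps old pods running until a new pod passes its
    readiness probe (the "ready route"); maxSurge=1 starts one new pod at a time.
    Changing the pod-template annotation forces Kubernetes to roll the pods.
    """
    return {
        "spec": {
            "strategy": {
                "type": "RollingUpdate",
                "rollingUpdate": {"maxUnavailable": 0, "maxSurge": 1},
            },
            "template": {
                "metadata": {
                    "annotations": {
                        "kubectl.kubernetes.io/restartedAt":
                            now.astimezone(timezone.utc).isoformat(),
                    }
                }
            },
        }
    }
```

With the official Python client this body could be sent via `AppsV1Api().patch_namespaced_deployment(name=..., namespace=..., body=patch)`. Note that this strategy on its own lets old and new pods overlap briefly, which is exactly the situation the solver license and nonce concern above is about, so the overlap still needs to be handled on the driver side.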

Once #1373 is merged, we can apply the same concept to the solver containers. There, we might not have to upload any data, since a single uploading host (the price estimator) should be enough.

fleupold added the enhancement (New feature or request) and [Driver] (tasks that relate to the driver subsystem) labels on Aug 17, 2020.
