Ian is planning to put in a service that monitors Teamware for liveness and readiness. Further discussion is needed on what we should put into Teamware in order to support this.
The liveness test should not require any modification, as it only checks whether index.html is served.
The readiness test should check that the database (and what else?) is working, and return a 5xx error if not.
It might be useful to return a JSON response if there are any stats worth reporting; these can be logged as part of monitoring.
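As a rough illustration of the JSON idea, the sketch below runs a set of named checks and builds a status code plus a JSON body. Everything in it is an assumption for discussion, not existing Teamware code: the function name `build_readiness_report`, the field names `ready`, `checks`, `ok`, `error` and `duration_ms`, and the choice of 503 as the 5xx code are all hypothetical.

```python
import json
import time
from typing import Callable, Dict, Tuple


def build_readiness_report(checks: Dict[str, Callable[[], None]]) -> Tuple[int, str]:
    """Run each named check and return (HTTP status code, JSON body).

    A check is any zero-argument callable that raises on failure.
    The names and JSON fields here are hypothetical.
    """
    results = {}
    all_ok = True
    for name, check in checks.items():
        started = time.monotonic()
        try:
            check()
            entry = {"ok": True}
        except Exception as exc:
            # Report only the exception type, to avoid leaking details
            # such as connection strings into the response.
            all_ok = False
            entry = {"ok": False, "error": type(exc).__name__}
        entry["duration_ms"] = round((time.monotonic() - started) * 1000, 2)
        results[name] = entry

    # 200 when every check passed, 503 (a 5xx) otherwise.
    status = 200 if all_ok else 503
    body = json.dumps({"ready": all_ok, "checks": results})
    return status, body
```

The per-check timing is one example of a stat that could be logged by the monitoring service; other fields could be added the same way once we agree on what is worth reporting.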
To clarify: when running under Kubernetes there are two types of health check probe, both configured in the Helm chart for the Django backend container:
"liveness" probe, which determines whether the container is running. Repeated failures of the liveness probe cause the container to be killed and restarted. The assumption is that these problems are terminal and we need to reset the container to a known good state.
"readiness" probe, which determines whether this container is able to serve incoming HTTP requests. Failures of the readiness probe cause the pod to be removed from the set of active endpoints but (importantly) do not cause the container to be restarted. The assumption is that these are transient problems that can be solved by just waiting.
At present these both check the same thing: whether a GET of the root path returns a successful response code (200 to 399). This makes sense for the liveness check, but for readiness we should add a separate endpoint that checks that the backend can query the database. If the database is down the backend cannot serve requests, but it should not be killed; it should simply wait until the database comes back. To be compatible with the k8s probe system this should be a simple GET endpoint that indicates success with a 200 response code and failure with a 5xx.
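A minimal sketch of the database check behind such an endpoint, assuming a DB-API style connection and using 503 as the failure code. The function names are hypothetical, and `sqlite3` stands in for the real database only so the example runs on its own. `probe_succeeds` mirrors the rule the HTTP probe applies to the response code.

```python
import sqlite3


def database_is_reachable(conn) -> bool:
    """Return True if a trivial query succeeds on a DB-API connection."""
    try:
        cursor = conn.cursor()
        cursor.execute("SELECT 1")
        cursor.fetchone()
        return True
    except Exception:
        # Any failure means "not ready"; the caller decides what to do next.
        return False


def readiness_status_code(conn) -> int:
    """Map the database check to 200 (ready) or 503 (not ready)."""
    return 200 if database_is_reachable(conn) else 503


def probe_succeeds(status_code: int) -> bool:
    """An HTTP probe counts any code from 200 up to 399 as success."""
    return 200 <= status_code < 400


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")  # stand-in for the real database
    print(readiness_status_code(conn))  # connection open
    conn.close()
    print(readiness_status_code(conn))  # connection closed
```

In the Django backend the connection would presumably be `django.db.connection`, and the view would return an `HttpResponse` with this status code, but that wiring is an assumption and still needs to be agreed.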