The agent should more clearly indicate when it or its sub-processes have been OOM killed on Kubernetes #3641
Comments
Pinging @elastic/elastic-agent (Team:Elastic-Agent)
I think we will need to experiment with a few different scenarios to test this properly.
Just so we don't forget. If the ExitCode is ...
The reporting we get from k8s when a pod is OOMKilled differs based on the Kubernetes version. Starting from Kubernetes 1.28, on cgroup v2 the kubelet sets `memory.oom.group`, so an OOM kill of any process in a container kills the whole container and the container is reported as OOMKilled. Prior versions only kill the offending process, so a sub-process can be OOM killed without the container's last state showing OOMKilled. Kubernetes change log for reference: https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.28.md
We need to make it easier to detect inadequate memory limits on Kubernetes, which are extremely common.
The agent should detect when its last state was OOM killed and report its status as degraded. Detecting that an agent has been OOMKilled from diagnostics alone is not easy; it must be inferred from process restarts appearing in the agent diagnostics with no other plausible explanation.
Today the primary way for us to detect this is to instruct users to run `kubectl describe pod` and look for an `OOMKilled` reason in the container's last state (illustrative output below). We should automate this process and have the agent read the last state and reason for itself and report it in the agent status report.
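As a rough illustration (the timestamps and restart count here are made up, and the field layout varies slightly between kubectl versions), the relevant section of the `kubectl describe pod` output looks something like this:

```
    Last State:     Terminated
      Reason:       OOMKilled
      Exit Code:    137
      Started:      Mon, 06 Nov 2023 10:12:03 +0000
      Finished:     Mon, 06 Nov 2023 10:14:27 +0000
    Restart Count:  3
```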
We have also seen cases where the agent sub-processes are killed and restarted without the agent process itself being OOMKilled (because the sub-processes use more memory). We should double check that the OOMKilled reason appears on the pod when this happens.
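A minimal sketch of how the agent could read this for itself via the Kubernetes API, assuming client-go is available and that `POD_NAME`/`POD_NAMESPACE` are injected through the downward API; the function name and the way the result would feed into the status report are illustrative, not existing elastic-agent APIs:

```go
package main

import (
	"context"
	"fmt"
	"os"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
)

// lastOOMKills returns a message for every container in the agent's own pod
// whose last terminated state was OOMKilled.
func lastOOMKills(ctx context.Context) ([]string, error) {
	cfg, err := rest.InClusterConfig()
	if err != nil {
		return nil, err
	}
	client, err := kubernetes.NewForConfig(cfg)
	if err != nil {
		return nil, err
	}

	// POD_NAMESPACE and POD_NAME are assumed to be set via the downward API.
	pod, err := client.CoreV1().Pods(os.Getenv("POD_NAMESPACE")).
		Get(ctx, os.Getenv("POD_NAME"), metav1.GetOptions{})
	if err != nil {
		return nil, err
	}

	var msgs []string
	for _, cs := range pod.Status.ContainerStatuses {
		term := cs.LastTerminationState.Terminated
		if term != nil && term.Reason == "OOMKilled" {
			msgs = append(msgs, fmt.Sprintf(
				"container %q was OOMKilled at %s (exit code %d, restart count %d)",
				cs.Name, term.FinishedAt.Time, term.ExitCode, cs.RestartCount))
		}
	}
	return msgs, nil
}

func main() {
	msgs, err := lastOOMKills(context.Background())
	if err != nil {
		fmt.Fprintln(os.Stderr, "checking pod status:", err)
		os.Exit(1)
	}
	// In the agent this would feed into the status report as a degraded reason.
	for _, m := range msgs {
		fmt.Println(m)
	}
}
```

Running this in a pod whose container was previously OOM killed would print one line per affected container; the same check would also tell us whether a sub-process OOM kill surfaces at the pod level on a given Kubernetes version.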
The OOM kill event also appears in the node kernel logs if we end up needing to look there.
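For reference, and with the caveat that the exact wording varies by kernel and cgroup version (the PID, process name, and sizes below are invented), the kernel log entry usually looks roughly like:

```
Memory cgroup out of memory: Killed process 1234 (elastic-agent) total-vm:2097152kB, anon-rss:1048576kB, file-rss:0kB, shmem-rss:0kB
```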