safeguard against pod eviction and OOM kill #334

Open
paulomach opened this issue Nov 8, 2023 · 1 comment
Labels
enhancement New feature or request

Comments

@paulomach
Contributor

Guaranteed QoS Class requirements (docs):

  • Every Container in the Pod must have a memory limit and a memory request.
  • For every Container in the Pod, the memory limit must equal the memory request.
  • Every Container in the Pod must have a CPU limit and a CPU request.
  • For every Container in the Pod, the CPU limit must equal the CPU request.

BestEffort (the class the pods currently get) is the first class to be evicted, followed by Burstable and then Guaranteed.
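To make the three classes concrete, here is a minimal Python sketch of the classification rules listed above. The function name `qos_class` and the plain-dict input shape are hypothetical, chosen only for illustration. It compares quantities as strings, so `"1"` and `"1000m"` would count as different values, which the real classifier does not do.

```python
from typing import Dict, List

# One entry per container: {"requests": {...}, "limits": {...}} (hypothetical shape).
Resources = Dict[str, Dict[str, str]]


def qos_class(containers: List[Resources]) -> str:
    """Classify a pod from the resources of all its containers (sketch)."""
    any_set = False
    guaranteed = True
    for res in containers:
        requests = res.get("requests", {})
        limits = res.get("limits", {})
        if requests or limits:
            any_set = True
        for name in ("cpu", "memory"):
            limit = limits.get(name)
            # A missing request defaults to the limit when a limit is set.
            request = requests.get(name, limit)
            if limit is None or request != limit:
                guaranteed = False
    if not any_set:
        return "BestEffort"
    return "Guaranteed" if guaranteed else "Burstable"
```

A pod with no requests or limits at all is BestEffort, a pod where every container has equal CPU and memory requests and limits is Guaranteed, and everything in between is Burstable.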

Method:

  1. Patch the StatefulSet at charm initialization
    1. Allow a rolling update of the pods before any initialization
    2. Allow setting other desirable but unrelated StatefulSet values, e.g.:
      1. terminationGracePeriodSeconds to 24 h
    3. Allow testing k8s cluster access ASAP (trust enabled or not)
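As a rough illustration of what such a patch could contain, the sketch below builds a strategic-merge-style patch body as a plain dict. The helper name `build_statefulset_patch` and the container name `mysql` are assumptions, not taken from the charm; actually sending the patch to the cluster (with whichever Kubernetes client the charm uses) is left out.

```python
from typing import Optional

GRACE_PERIOD_SECONDS = 24 * 60 * 60  # 24 h, as proposed in the method above


def build_statefulset_patch(container_name: str, resources: Optional[dict] = None) -> dict:
    """Return a patch body for the StatefulSet pod template (sketch)."""
    pod_spec: dict = {"terminationGracePeriodSeconds": GRACE_PERIOD_SECONDS}
    if resources:
        # With a strategic merge patch, containers are matched by name.
        pod_spec["containers"] = [{"name": container_name, "resources": resources}]
    return {"spec": {"template": {"spec": pod_spec}}}


# Hypothetical usage: the container name "mysql" is an assumption.
patch = build_statefulset_patch("mysql", {"requests": {"memory": "1Gi"}})
```

Because the patch changes the pod template, applying it triggers the rolling update mentioned in the first sub-item.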

Problem:

  • Setting requests may prevent pods from being scheduled when the cluster lacks the requested resources

Proposal:

  • Only set Guaranteed values when both CPU and memory constraints are explicitly configured by the user on deployment, i.e. juju deploy mysql-k8s --constraints 'mem=4G cpu-power=2000'
    • cpu-power is expressed in millis
    • we have no knowledge of concurrent workloads or cluster topology
  • Otherwise, patch the QoS requirements on startup to match the Burstable criteria (doc)
    • always patching a request on the initContainer is enough
    • still use the method above to patch some unrelated StatefulSet values and to validate cluster access
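The decision in the proposal could be sketched as follows. Both helper names are hypothetical, and the unit handling is an assumption: `mem` is treated as mebibytes by default with `G` meaning 1024 of them, and `cpu-power` is passed through as Kubernetes millicores.

```python
from typing import Dict, Optional

# Assumed multipliers relative to mebibytes.
_MEM_UNITS_MIB = {"M": 1, "G": 1024, "T": 1024**2, "P": 1024**3}


def parse_constraints(raw: str) -> Dict[str, str]:
    """Split a constraints string such as 'mem=4G cpu-power=2000' into a dict."""
    return dict(item.split("=", 1) for item in raw.split())


def guaranteed_resources(constraints: Dict[str, str]) -> Optional[dict]:
    """Return equal requests and limits, or None unless both constraints are set."""
    mem = constraints.get("mem")
    cpu = constraints.get("cpu-power")
    if not mem or not cpu:
        return None  # caller falls back to the Burstable patch
    unit = mem[-1].upper()
    if unit in _MEM_UNITS_MIB:
        mib = int(mem[:-1]) * _MEM_UNITS_MIB[unit]
    else:
        mib = int(mem)  # bare number: assumed to be mebibytes
    values = {"cpu": f"{int(cpu)}m", "memory": f"{mib}Mi"}
    return {"requests": dict(values), "limits": dict(values)}
```

Returning `None` when either constraint is missing keeps the "only when both are explicitly configured" rule in one place, so the caller only has to choose between the Guaranteed values and the Burstable fallback.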

Catches:

  • Juju rewrites the StatefulSet on charm refresh, so the patch has to be reapplied afterwards
  • Extra controls are needed to ensure a rolling update is not interpreted as a juju refresh
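One way to handle the first catch is to check, on each relevant event, whether the live StatefulSet still contains the desired values and reapply the patch only when it does not. The helper below is a sketch under that assumption: the name `is_applied` is hypothetical, and both arguments are plain dicts (for example a patch body and the live object converted to a dict).

```python
from typing import Any


def is_applied(wanted: Any, live: Any) -> bool:
    """Return True if every value in `wanted` is already present in `live`."""
    if isinstance(wanted, dict):
        return isinstance(live, dict) and all(
            key in live and is_applied(value, live[key])
            for key, value in wanted.items()
        )
    if isinstance(wanted, list):
        if not isinstance(live, list):
            return False
        # List items such as containers are matched by their "name" field.
        by_name = {item.get("name"): item for item in live if isinstance(item, dict)}
        return all(
            isinstance(item, dict) and is_applied(item, by_name.get(item.get("name")))
            for item in wanted
        )
    return wanted == live
```

If `is_applied(patch, live_statefulset)` is false after a refresh, Juju has rewritten the object and the patch can be sent again. This does not by itself distinguish a rolling update from a refresh, which still needs the extra controls mentioned above.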
paulomach added the bug label Nov 8, 2023
Contributor

github-actions bot commented Nov 8, 2023

paulomach added the enhancement label and removed the bug label Nov 9, 2023