[BUG] Pods are deleted when vscale CPU/memory limits exceed the namespace quota #5366

Open
ahjing99 opened this issue Oct 10, 2023 · 2 comments
Labels
bug kind/bug Something isn't working
ahjing99 commented Oct 10, 2023

➜ ~ kbcli version
Kubernetes: v1.27.3-gke.100
KubeBlocks: 0.6.3-beta.3
kbcli: 0.6.3-beta.3

When a vscale (vertical scaling) request sets CPU/memory limits that exceed the namespace quota, the existing pods are deleted and cannot be recreated. The ops request should be blocked up front when it would exceed the quota.
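
As a rough illustration of the kind of up-front check requested here, the sketch below compares the requested limits against the quota before any pod is touched. It is only a sketch under stated assumptions: the helper names (`cpu_to_millicores`, `mem_to_bytes`, `fits_quota`) are hypothetical and are not part of KubeBlocks, kbcli, or kubectl, and the quantity parsing handles only whole-number values with an optional `m`, `Ki`, `Mi`, or `Gi` suffix. The figures are the ones used in the reproduction steps of this report.

```bash
#!/usr/bin/env bash
# Hypothetical helpers for illustration; not part of KubeBlocks, kbcli or kubectl.

# Convert a CPU quantity ("4" or "100m") to millicores.
# Assumption: whole numbers only, fractional values such as "0.5" are not handled.
cpu_to_millicores() {
  case "$1" in
    *m) echo "${1%m}" ;;
    *)  echo $(( $1 * 1000 )) ;;
  esac
}

# Convert a memory quantity with an optional Ki/Mi/Gi suffix to bytes.
mem_to_bytes() {
  case "$1" in
    *Ki) echo $(( ${1%Ki} * 1024 )) ;;
    *Mi) echo $(( ${1%Mi} * 1024 * 1024 )) ;;
    *Gi) echo $(( ${1%Gi} * 1024 * 1024 * 1024 )) ;;
    *)   echo "$1" ;;
  esac
}

# fits_quota HARD USED PER_POD REPLICAS
# Succeeds only if USED + PER_POD * REPLICAS <= HARD (all integers, same unit).
fits_quota() {
  [ $(( $2 + $3 * $4 )) -le "$1" ]
}

# Quota: 2 CPU / 2Gi. Requested limits: 4 CPU / 4Gi, 1 replica.
# "used" is 0 because the old pod is deleted before the new one is created.
if fits_quota "$(cpu_to_millicores 2)" 0 "$(cpu_to_millicores 4)" 1 &&
   fits_quota "$(mem_to_bytes 2Gi)" 0 "$(mem_to_bytes 4Gi)" 1; then
  echo "within quota"
else
  echo "exceeds quota: reject the OpsRequest before touching pods"
fi
```

With these figures the check takes the `else` branch, which is the point at which the ops request could be rejected instead of deleting the running pod.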

  1. Create ns with quota
kubectl apply -f -<<EOF
apiVersion: v1
items:
- apiVersion: v1
  kind: ResourceQuota
  metadata:
    name: quota-ns-ukltji
    namespace: ns-ukltji
  spec:
    hard:
      limits.cpu: "2"
      limits.ephemeral-storage: 10Gi
      limits.memory: 2Gi
      requests.storage: 10Gi
  status:
    hard:
      limits.cpu: "2"
      limits.ephemeral-storage: 10Gi
      limits.memory: 2Gi
      requests.storage: 10Gi
    used:
      limits.cpu: "0"
      limits.ephemeral-storage: "0"
      limits.memory: "0"
      requests.storage: "0"
kind: List
metadata:
  resourceVersion: ""
---
apiVersion: v1
kind: LimitRange
metadata:
  name: range-ns-ukltji
  namespace: ns-ukltji
spec:
  limits:
  - default:
      cpu: 100m
      memory: 100Mi
    type: Container
EOF
  2. Create role
kubectl apply -f -<<EOF
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  labels:
    app.kubernetes.io/instance: dbname
    app.kubernetes.io/managed-by: kbcli
  name: dbname
  namespace: ns-ukltji
rules:
  - apiGroups:
      - ''
    resources:
      - events
    verbs:
      - create
  - apiGroups:
      - ''
    resources:
      - configmaps
    verbs:
      - create
      - get
      - list
      - patch
      - update
      - watch
      - delete
  - apiGroups:
      - ''
    resources:
      - endpoints
    verbs:
      - create
      - get
      - list
      - patch
      - update
      - watch
      - delete
  - apiGroups:
      - ''
    resources:
      - pods
    verbs:
      - get
      - list
      - patch
      - update
      - watch
EOF
  3. Create the ServiceAccount and RoleBinding
kubectl apply -f -<<EOF
apiVersion: v1
kind: ServiceAccount
metadata:
  labels:
    app.kubernetes.io/instance: dbname
    app.kubernetes.io/managed-by: kbcli
  name: dbname
  namespace: ns-ukltji
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  labels:
    app.kubernetes.io/instance: dbname
    app.kubernetes.io/managed-by: kbcli
  name: dbname
  namespace: ns-ukltji
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: dbname
subjects:
  - kind: ServiceAccount
    name: dbname
    namespace: ns-ukltji
EOF
  4. Create cluster
kubectl create -f -<<EOF
apiVersion: apps.kubeblocks.io/v1alpha1
kind: Cluster
metadata:
  labels:
    clusterdefinition.kubeblocks.io/name: mongodb
    clusterversion.kubeblocks.io/name: mongodb-5.0
  generateName: mongo-
  namespace: ns-ukltji
spec:
  affinity:
    nodeLabels: {}
    podAntiAffinity: Preferred
    tenancy: SharedNode
    topologyKeys: []
  clusterDefinitionRef: mongodb
  clusterVersionRef: mongodb-5.0
  componentSpecs:
  - componentDefRef: mongodb
    monitor: true
    name: mongodb
    replicas: 1
    resources:
      limits:
        cpu: 1000m
        memory: 1024Mi
      requests:
        cpu: 100m
        memory: 102Mi
    serviceAccountName: dbname
    volumeClaimTemplates:
    - name: data
      spec:
        storageClassName: standard-rwo
        accessModes:
        - ReadWriteOnce
        resources:
          requests:
            storage: 5Gi
  terminationPolicy: WipeOut
  tolerations: []
EOF


➜  ~ kbcli cluster describe -n ns-ukltji mongo-ggbgx
Name: mongo-ggbgx         Created Time: Oct 10,2023 11:25 UTC+0800
NAMESPACE   CLUSTER-DEFINITION   VERSION       STATUS    TERMINATION-POLICY
ns-ukltji   mongodb              mongodb-5.0   Running   WipeOut

Endpoints:
COMPONENT   MODE        INTERNAL                                                EXTERNAL
mongodb     ReadWrite   mongo-ggbgx-mongodb.ns-ukltji.svc.cluster.local:27017   <none>

Topology:
COMPONENT   INSTANCE                ROLE      STATUS    AZ              NODE                                                CREATED-TIME
mongodb     mongo-ggbgx-mongodb-0   primary   Running   us-central1-c   gke-yjtest-default-pool-c51609d3-ss98/10.128.0.46   Oct 10,2023 11:25 UTC+0800

Resources Allocation:
COMPONENT   DEDICATED   CPU(REQUEST/LIMIT)   MEMORY(REQUEST/LIMIT)   STORAGE-SIZE   STORAGE-CLASS
mongodb     false       100m / 1             102Mi / 1Gi             data:5Gi       standard-rwo

Images:
COMPONENT   TYPE      IMAGE
mongodb     mongodb   registry.cn-hangzhou.aliyuncs.com/apecloud/mongo:5.0.14

Data Protection:
AUTO-BACKUP   BACKUP-SCHEDULE   TYPE     BACKUP-TTL   LAST-SCHEDULE   RECOVERABLE-TIME
Disabled      <none>            <none>   7d           <none>          <none>

Show cluster events: kbcli cluster list-events -n ns-ukltji mongo-ggbgx

  5. Vscale
kubectl create -f -<<EOF
apiVersion: apps.kubeblocks.io/v1alpha1
kind: OpsRequest
metadata:
  generateName: ops-verticalscaling-2c4g-
  namespace: ns-ukltji
spec:
  clusterRef: mongo-ggbgx
  type: VerticalScaling
  verticalScaling:
  - componentName: mongodb
    requests:
      cpu: "4"
      memory: "4Gi"
    limits:
      cpu: "4"
      memory: "4Gi"
EOF
  6. Pods are deleted and cannot be recreated
➜  ~ k describe cluster mongo-ggbgx  -n ns-ukltji
Name:         mongo-ggbgx
Namespace:    ns-ukltji
Labels:       clusterdefinition.kubeblocks.io/name=mongodb
              clusterversion.kubeblocks.io/name=mongodb-5.0
Annotations:  kubeblocks.io/ops-request: [{"name":"ops-verticalscaling-2c4g-v77p7","type":"VerticalScaling"}]
              kubeblocks.io/reconcile: 2023-10-10T03:43:56.541431103Z
API Version:  apps.kubeblocks.io/v1alpha1
Kind:         Cluster
Metadata:
  Creation Timestamp:  2023-10-10T03:25:43Z
  Finalizers:
    cluster.kubeblocks.io/finalizer
  Generate Name:  mongo-
  Generation:     3
  Managed Fields:
    API Version:  apps.kubeblocks.io/v1alpha1
    Fields Type:  FieldsV1
    fieldsV1:
      f:metadata:
        f:generateName:
        f:labels:
          .:
          f:clusterdefinition.kubeblocks.io/name:
          f:clusterversion.kubeblocks.io/name:
      f:spec:
        .:
        f:affinity:
          .:
          f:podAntiAffinity:
          f:tenancy:
        f:clusterDefinitionRef:
        f:clusterVersionRef:
        f:componentSpecs:
          .:
          k:{"name":"mongodb"}:
            .:
            f:componentDefRef:
            f:monitor:
            f:name:
            f:noCreatePDB:
            f:replicas:
            f:resources:
              .:
              f:limits:
              f:requests:
            f:serviceAccountName:
            f:volumeClaimTemplates:
        f:terminationPolicy:
    Manager:      kubectl-create
    Operation:    Update
    Time:         2023-10-10T03:25:43Z
    API Version:  apps.kubeblocks.io/v1alpha1
    Fields Type:  FieldsV1
    fieldsV1:
      f:status:
        .:
        f:clusterDefGeneration:
        f:components:
          .:
          f:mongodb:
            .:
            f:consensusSetStatus:
              .:
              f:leader:
                .:
                f:accessMode:
                f:name:
                f:pod:
            f:phase:
            f:podsReady:
        f:conditions:
        f:observedGeneration:
        f:phase:
    Manager:      manager
    Operation:    Update
    Subresource:  status
    Time:         2023-10-10T03:41:11Z
    API Version:  apps.kubeblocks.io/v1alpha1
    Fields Type:  FieldsV1
    fieldsV1:
      f:metadata:
        f:annotations:
          .:
          f:kubeblocks.io/ops-request:
          f:kubeblocks.io/reconcile:
        f:finalizers:
          .:
          v:"cluster.kubeblocks.io/finalizer":
      f:spec:
        f:componentSpecs:
          k:{"name":"mongodb"}:
            f:classDefRef:
              .:
              f:class:
            f:resources:
              f:limits:
                f:cpu:
                f:memory:
              f:requests:
                f:cpu:
                f:memory:
        f:monitor:
        f:resources:
          .:
          f:cpu:
          f:memory:
        f:storage:
          .:
          f:size:
    Manager:         manager
    Operation:       Update
    Time:            2023-10-10T03:43:56Z
  Resource Version:  1151548
  UID:               c9b3441e-793f-4848-8568-4619bc6620a9
Spec:
  Affinity:
    Pod Anti Affinity:     Preferred
    Tenancy:               SharedNode
  Cluster Definition Ref:  mongodb
  Cluster Version Ref:     mongodb-5.0
  Component Specs:
    Class Def Ref:
      Class:
    Component Def Ref:  mongodb
    Monitor:            true
    Name:               mongodb
    No Create PDB:      false
    Replicas:           1
    Resources:
      Limits:
        Cpu:     4
        Memory:  4Gi
      Requests:
        Cpu:               4
        Memory:            4Gi
    Service Account Name:  dbname
    Volume Claim Templates:
      Name:  data
      Spec:
        Access Modes:
          ReadWriteOnce
        Resources:
          Requests:
            Storage:         5Gi
        Storage Class Name:  standard-rwo
  Monitor:
  Resources:
    Cpu:     0
    Memory:  0
  Storage:
    Size:              0
  Termination Policy:  WipeOut
Status:
  Cluster Def Generation:  2
  Components:
    Mongodb:
      Consensus Set Status:
        Leader:
          Access Mode:  None
          Name:
          Pod:          Unknown
      Phase:            Updating
      Pods Ready:       false
  Conditions:
    Last Transition Time:  2023-10-10T03:41:06Z
    Message:               VerticalScaling opsRequest: ops-verticalscaling-2c4g-v77p7 is processing
    Reason:                VerticalScaling
    Status:                False
    Type:                  LatestOpsRequestProcessed
    Last Transition Time:  2023-10-10T03:25:43Z
    Message:               The operator has started the provisioning of Cluster: mongo-ggbgx
    Observed Generation:   3
    Reason:                PreCheckSucceed
    Status:                True
    Type:                  ProvisioningStarted
    Last Transition Time:  2023-10-10T03:25:44Z
    Message:               Successfully applied for resources
    Observed Generation:   3
    Reason:                ApplyResourcesSucceed
    Status:                True
    Type:                  ApplyResources
    Last Transition Time:  2023-10-10T03:41:11Z
    Message:               pods are not ready in Components: [mongodb], refer to related component message in Cluster.status.components
    Reason:                ReplicasNotReady
    Status:                False
    Type:                  ReplicasReady
    Last Transition Time:  2023-10-10T03:41:11Z
    Message:               pods are unavailable in Components: [mongodb], refer to related component message in Cluster.status.components
    Reason:                ComponentsNotReady
    Status:                False
    Type:                  Ready
  Observed Generation:     3
  Phase:                   Updating
Events:
  Type     Reason                    Age                    From                    Message
  ----     ------                    ----                   ----                    -------
  Normal   ComponentPhaseTransition  18m                    cluster-controller      Create a new component
  Normal   AllReplicasReady          18m                    cluster-controller      all pods of components are ready, waiting for the probe detection successful
  Normal   ClusterReady              18m                    cluster-controller      Cluster: mongo-ggbgx is ready, current phase is Running
  Normal   ComponentPhaseTransition  18m                    cluster-controller      Running: true, PodsReady: true, PodsTimedout: false
  Normal   Running                   18m                    cluster-controller      Cluster: mongo-ggbgx is ready, current phase is Running
  Normal   ApplyResourcesSucceed     3m23s (x2 over 18m)    cluster-controller      Successfully applied for resources
  Normal   PreCheckSucceed           3m23s (x2 over 18m)    cluster-controller      The operator has started the provisioning of Cluster: mongo-ggbgx
  Normal   VerticalScaling           3m23s                  ops-request-controller  Start to process the VerticalScaling opsRequest "ops-verticalscaling-2c4g-v77p7" in Cluster: mongo-ggbgx
  Normal   ComponentPhaseTransition  3m23s                  cluster-controller      Component workload updated
  Normal   WaitingForProbeSuccess    3m23s (x3 over 3m23s)  cluster-controller      Waiting for probe success
  Warning  ReplicasNotReady          3m18s                  cluster-controller      pods are not ready in Components: [mongodb], refer to related component message in Cluster.status.components
  Warning  ComponentsNotReady        3m18s                  cluster-controller      pods are unavailable in Components: [mongodb], refer to related component message in Cluster.status.components
  Warning  FailedCreate              33s (x3 over 2m36s)    event-controller        create Pod mongo-ggbgx-mongodb-0 in StatefulSet mongo-ggbgx-mongodb failed error: pods "mongo-ggbgx-mongodb-0" is forbidden: exceeded quota: quota-ns-ukltji, requested: limits.cpu=4,limits.memory=4Gi, used: limits.cpu=0,limits.memory=0, limited: limits.cpu=2,limits.memory=2Gi
➜  ~ k get pod -n ns-ukltji
No resources found in ns-ukltji namespace.
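
The `FailedCreate` event above carries the numbers needed to diagnose the failure. As a small sketch, the helper below pulls the `requested`, `used`, and `limited` sections out of such a message. The function name `quota_section` is hypothetical (it is not a kubectl or kbcli command), and it assumes the message keeps the `name: key=value,key=value` layout shown in this report.

```bash
#!/usr/bin/env bash
# Hypothetical helper for illustration; not part of kubectl or kbcli.

# quota_section NAME MESSAGE
# Print the comma-separated resource list that follows "NAME: " in MESSAGE.
quota_section() {
  # The first sed captures everything up to the next space;
  # the second drops the trailing comma left by the separator, if any.
  printf '%s\n' "$2" | sed -n "s/.*$1: \([^ ]*\).*/\1/p" | sed 's/,$//'
}

# Message copied from the FailedCreate event in this report.
msg='pods "mongo-ggbgx-mongodb-0" is forbidden: exceeded quota: quota-ns-ukltji, requested: limits.cpu=4,limits.memory=4Gi, used: limits.cpu=0,limits.memory=0, limited: limits.cpu=2,limits.memory=2Gi'

echo "requested: $(quota_section requested "$msg")"
echo "used:      $(quota_section used "$msg")"
echo "limited:   $(quota_section limited "$msg")"
```

Here the requested limits (4 CPU, 4Gi) are double the limited values (2 CPU, 2Gi) while `used` is already 0, which confirms that the old pod was removed before the quota rejection occurred.
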
@ahjing99 ahjing99 added the kind/bug Something isn't working label Oct 10, 2023
@ahjing99 ahjing99 added this to the Release 0.7.0 milestone Oct 10, 2023
@leon-inf
Contributor

Duplicate of #5375.

@leon-inf leon-inf closed this as not planned Oct 11, 2023
@leon-inf
Contributor

Reopening this issue to track the namespace resource quota problem.

@leon-inf leon-inf reopened this Oct 12, 2023
@leon-inf leon-inf removed this from the Release 0.7.0 milestone Oct 17, 2023
@nayutah nayutah added this to the Release 0.9.0 milestone Jun 18, 2024
@ahjing99 ahjing99 modified the milestones: Release 0.9.0, Release 0.9.1 Jul 8, 2024
@github-actions github-actions bot modified the milestones: Release 0.9.1, Release 0.9.2 Aug 8, 2024
@github-actions github-actions bot modified the milestones: Release 0.9.2, Release 0.8.5 Oct 8, 2024
@github-actions github-actions bot modified the milestones: Release 0.9.2, Release 0.9.3 Jan 14, 2025