[BUG] oceanbase cluster stop to start failed: Readiness probe failed: cat: /tmp/ready #5464

Closed
JashBook opened this issue Oct 16, 2023 · 3 comments

@JashBook
Collaborator

Describe the bug
Stopping and then starting an OceanBase cluster fails; the restarted pods never become ready:
Readiness probe failed: cat: /tmp/ready: No such file or directory

kbcli version
Kubernetes: v1.26.7-gke.500
KubeBlocks: 0.7.0-beta.4
kbcli: 0.7.0-beta.4

To Reproduce
Steps to reproduce the behavior:

  1. Create the cluster:
kubectl apply -f -<<EOF
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: ob-cluster-observer-sa
  namespace: default
  labels:
    app.kubernetes.io/name: oceanbase-cluster
    app.kubernetes.io/instance: oceanbase
    app.kubernetes.io/version: "4.2.0.0-100010032023083021"
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: ob-cluster-statefulset-reader
  namespace: default
  labels:
    app.kubernetes.io/name: oceanbase-cluster
    app.kubernetes.io/instance: oceanbase
    app.kubernetes.io/version: "4.2.0.0-100010032023083021"
rules:
- apiGroups: ["apps"] # "" indicates the core API group
  resources: ["statefulsets"]
  verbs: ["get", "watch", "list"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: ob-cluster-read-statefulsets
  namespace: default
  labels:
    app.kubernetes.io/name: oceanbase-cluster
    app.kubernetes.io/instance: oceanbase
    app.kubernetes.io/version: "4.2.0.0-100010032023083021"
subjects:
- kind: ServiceAccount
  name: ob-cluster-observer-sa
  namespace: default
- kind: ServiceAccount
  name: kb-ob-cluster
  namespace: default
roleRef:
  kind: Role
  name: ob-cluster-statefulset-reader
  apiGroup: rbac.authorization.k8s.io
---
apiVersion: apps.kubeblocks.io/v1alpha1
kind: Cluster
metadata:
  name: ob-cluster
  namespace: default
spec:
  clusterDefinitionRef: oceanbase
  clusterVersionRef: oceanbase-4.2.0.0-100010032023083021
  terminationPolicy: WipeOut
  componentSpecs:
    - name: ob-bundle
      componentDefRef: ob-bundle
      serviceAccountName: ob-cluster-observer-sa
      replicas: 3
      resources:
        requests:
          cpu: 2000m
          memory: 8Gi
        limits:
          cpu: 2000m
          memory: 8Gi
      volumeClaimTemplates:
        - name: data-file
          spec:
            accessModes:
              - ReadWriteOnce
            resources:
              requests:
                storage: "50Gi"
        - name: data-log
          spec:
            accessModes:
              - ReadWriteOnce
            resources:
              requests:
                storage: "50Gi"
        - name: log
          spec:
            accessModes:
              - ReadWriteOnce
            resources:
              requests:
                storage: "20Gi"
EOF
  2. Stop the cluster, then start it:
kbcli cluster stop ob-cluster --auto-approve

kbcli cluster start ob-cluster
  3. Observe the error:
➜  ~ kubectl get pod 
NAME                                            READY   STATUS    RESTARTS   AGE
ob-cluster-ob-bundle-0                          0/1     Running   0          3m17s

➜  ~ kubectl get cluster
NAME            CLUSTER-DEFINITION   VERSION                                TERMINATION-POLICY   STATUS     AGE
ob-cluster      oceanbase            oceanbase-4.2.0.0-100010032023083021   WipeOut              Updating   8m1s

➜  ~ kubectl get ops    
NAME                        TYPE    CLUSTER         STATUS    PROGRESS   AGE
ob-cluster-start-7mf76      Start   ob-cluster      Running   0/3        3m29s
ob-cluster-stop-c55hb       Stop    ob-cluster      Succeed   3/3        5m4s
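
The Start OpsRequest sits at 0/3. The commands below are a quick way to dig into why it is stuck (standard kubectl calls; the jsonpath assumes the v1alpha1 Cluster status layout with a components map):

# Inspect the stuck Start OpsRequest and the component phase reported on the Cluster
kubectl describe ops ob-cluster-start-7mf76
kubectl get cluster ob-cluster -o jsonpath='{.status.components.ob-bundle.phase}'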

Describe the pod:

kubectl describe pod ob-cluster-ob-bundle-0
Name:             ob-cluster-ob-bundle-0
Namespace:        default
Priority:         0
Service Account:  ob-cluster-observer-sa
Node:             gke-cicd-gke-test-cicd-gke-test-6c648ea8-63vk/10.128.0.32
Start Time:       Mon, 16 Oct 2023 16:34:43 +0800
Labels:           app.kubernetes.io/component=ob-bundle
                  app.kubernetes.io/instance=ob-cluster
                  app.kubernetes.io/managed-by=kubeblocks
                  app.kubernetes.io/name=oceanbase
                  app.kubernetes.io/version=oceanbase-4.2.0.0-100010032023083021
                  apps.kubeblocks.io/component-name=ob-bundle
                  apps.kubeblocks.io/workload-type=Stateful
                  controller-revision-hash=ob-cluster-ob-bundle-9787c8d4
                  statefulset.kubernetes.io/pod-name=ob-cluster-ob-bundle-0
Annotations:      apps.kubeblocks.io/component-replicas: 3
Status:           Running
IP:               10.0.33.19
IPs:
  IP:           10.0.33.19
Controlled By:  StatefulSet/ob-cluster-ob-bundle
Containers:
  observer-container:
    Container ID:  containerd://77c8faa60918fa0a74d650644b630da4b2c281a3e122429be5f21cd41da06307
    Image:         oceanbasedev/oceanbase-chart:4.2.0.0-100010032023083021
    Image ID:      docker.io/oceanbasedev/oceanbase-chart@sha256:bdb0feec2ce5ff0929285fd8991db96a22bf50a81d79100066aee4715be878e6
    Ports:         2881/TCP, 2882/TCP
    Host Ports:    0/TCP, 0/TCP
    Command:
      bash
      -c
      ./scripts/entrypoint.sh
    State:          Running
      Started:      Mon, 16 Oct 2023 16:34:50 +0800
    Ready:          False
    Restart Count:  0
    Limits:
      cpu:     2
      memory:  8Gi
    Requests:
      cpu:      2
      memory:   8Gi
    Readiness:  exec [cat /tmp/ready] delay=10s timeout=5s period=10s #success=1 #failure=10
    Environment Variables from:
      ob-cluster-ob-bundle-env      ConfigMap  Optional: false
      ob-cluster-ob-bundle-rsm-env  ConfigMap  Optional: false
    Environment:
      KB_POD_NAME:               ob-cluster-ob-bundle-0 (v1:metadata.name)
      KB_POD_UID:                 (v1:metadata.uid)
      KB_NAMESPACE:              default (v1:metadata.namespace)
      KB_SA_NAME:                 (v1:spec.serviceAccountName)
      KB_NODENAME:                (v1:spec.nodeName)
      KB_HOST_IP:                 (v1:status.hostIP)
      KB_POD_IP:                  (v1:status.podIP)
      KB_POD_IPS:                 (v1:status.podIPs)
      KB_HOSTIP:                  (v1:status.hostIP)
      KB_PODIP:                   (v1:status.podIP)
      KB_PODIPS:                  (v1:status.podIPs)
      KB_CLUSTER_NAME:           ob-cluster
      KB_COMP_NAME:              ob-bundle
      KB_CLUSTER_COMP_NAME:      ob-cluster-ob-bundle
      KB_CLUSTER_UID_POSTFIX_8:  8ee36a59
      KB_POD_FQDN:               $(KB_POD_NAME).$(KB_CLUSTER_COMP_NAME)-headless.$(KB_NAMESPACE).svc
      LD_LIBRARY_PATH:           /home/admin/oceanbase/lib
      ZONE_COUNT:                3
      CLUSTER_NAME:              $(KB_CLUSTER_COMP_NAME)
      POD_IP:                     (v1:status.podIP)
      DB_ROOT_PASSWORD:          <set to the key 'password' in secret 'ob-cluster-conn-credential'>  Optional: false
    Mounts:
      /home/admin/data-file from data-file (rw)
      /home/admin/data-log from data-log (rw)
      /home/admin/log from log (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-8dcc5 (ro)
Conditions:
  Type              Status
  Initialized       True 
  Ready             False 
  ContainersReady   False 
  PodScheduled      True 
Volumes:
  data-file:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  data-file-ob-cluster-ob-bundle-0
    ReadOnly:   false
  data-log:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  data-log-ob-cluster-ob-bundle-0
    ReadOnly:   false
  log:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  log-ob-cluster-ob-bundle-0
    ReadOnly:   false
  kube-api-access-8dcc5:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   Guaranteed
Node-Selectors:              <none>
Tolerations:                 kb-data=true:NoSchedule
                             node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason     Age                   From               Message
  ----     ------     ----                  ----               -------
  Normal   Scheduled  4m18s                 default-scheduler  Successfully assigned default/ob-cluster-ob-bundle-0 to gke-cicd-gke-test-cicd-gke-test-6c648ea8-63vk
  Normal   Pulled     4m11s                 kubelet            Container image "oceanbasedev/oceanbase-chart:4.2.0.0-100010032023083021" already present on machine
  Normal   Created    4m11s                 kubelet            Created container observer-container
  Normal   Started    4m11s                 kubelet            Started container observer-container
  Warning  Unhealthy  42s (x22 over 3m52s)  kubelet            Readiness probe failed: cat: /tmp/ready: No such file or directory
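
The readiness probe is just exec cat /tmp/ready, so the failure can be confirmed by checking for the flag file directly inside the container (container name taken from the describe output above; diagnostic sketch only):

# Check whether the readiness flag file was ever created after the restart
kubectl exec ob-cluster-ob-bundle-0 -c observer-container -- ls -l /tmp/ready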

Logs from the failing pod:

➜  ~ kubectl logs ob-cluster-ob-bundle-0  
Getting dynamic replica ips
REPLICA_NUM: 3
Recovering: True
cat: /home/admin/oceanbase/store/etc/observer.conf.bin: No such file or directory
IP changed, need to rejoin the cluster
Prepare config folders
Start server
Start observer process as normal server...
/home/admin/oceanbase/bin/observer --appname obcluster --cluster_id 1 --zone zone0 --devname eth0 -p 2881 -P 2882 -d /home/admin/oceanbase/store/ -l info -o config_additional_dir=/home/admin/oceanbase/store/etc,cpu_count=16,memory_limit=8G,system_memory=1G,__min_full_resource_pool_memory=1073741824,datafile_size=40G,log_disk_size=40G,net_thread_count=2,stack_size=512K,cache_wash_threshold=1G,schema_history_expire_time=1d,enable_separate_sys_clog=false,enable_merge_by_turn=false,enable_syslog_recycle=true,enable_syslog_wf=false,max_syslog_file_count=4
appname: obcluster
cluster id: 1
zone: zone0
devname: eth0
mysql port: 2881
rpc port: 2882
data_dir: /home/admin/oceanbase/store/
log level: info
optstr: config_additional_dir=/home/admin/oceanbase/store/etc,cpu_count=16,memory_limit=8G,system_memory=1G,__min_full_resource_pool_memory=1073741824,datafile_size=40G,log_disk_size=40G,net_thread_count=2,stack_size=512K,cache_wash_threshold=1G,schema_history_expire_time=1d,enable_separate_sys_clog=false,enable_merge_by_turn=false,enable_syslog_recycle=true,enable_syslog_wf=false,max_syslog_file_count=4
observer on this node is not ready, wait for a moment...
Resolving other servers' IPs
nslookup ob-cluster-ob-bundle-1.ob-cluster-ob-bundle-headless
ob-cluster-ob-bundle-1.ob-cluster-ob-bundle-headless is not ready yet
nslookup ob-cluster-ob-bundle-1.ob-cluster-ob-bundle-headless
ob-cluster-ob-bundle-1.ob-cluster-ob-bundle-headless is not ready yet
nslookup ob-cluster-ob-bundle-1.ob-cluster-ob-bundle-headless
ob-cluster-ob-bundle-1.ob-cluster-ob-bundle-headless is not ready yet
nslookup ob-cluster-ob-bundle-1.ob-cluster-ob-bundle-headless
ob-cluster-ob-bundle-1.ob-cluster-ob-bundle-headless is not ready yet
nslookup ob-cluster-ob-bundle-1.ob-cluster-ob-bundle-headless
ob-cluster-ob-bundle-1.ob-cluster-ob-bundle-headless is not ready yet
nslookup ob-cluster-ob-bundle-1.ob-cluster-ob-bundle-headless
ob-cluster-ob-bundle-1.ob-cluster-ob-bundle-headless is not ready yet
nslookup ob-cluster-ob-bundle-1.ob-cluster-ob-bundle-headless
ob-cluster-ob-bundle-1.ob-cluster-ob-bundle-headless is not ready yet
nslookup ob-cluster-ob-bundle-1.ob-cluster-ob-bundle-headless
ob-cluster-ob-bundle-1.ob-cluster-ob-bundle-headless is not ready yet
nslookup ob-cluster-ob-bundle-1.ob-cluster-ob-bundle-headless
...
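
The entrypoint reports that observer.conf.bin is missing and therefore treats the restart as an IP change that requires rejoining the cluster. A quick way to see what actually survived on the persistent volumes is to list the store config directory (path taken from the log above; diagnostic sketch only):

# On a healthy restart this directory should still contain observer.conf.bin
kubectl exec ob-cluster-ob-bundle-0 -c observer-container -- ls -l /home/admin/oceanbase/store/etc/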

Expected behavior
The OceanBase cluster starts successfully after being stopped.

@JashBook added the kind/bug (Something isn't working) and severity/major (Great chance user will encounter the same problem) labels on Oct 16, 2023
@JashBook added this to the Release 0.7.0 milestone on Oct 16, 2023
@ahjing99
Collaborator

The vscale (vertical scaling) operation also ran into the same error:

Events:
  Type     Reason     Age                    From               Message
  ----     ------     ----                   ----               -------
  Normal   Scheduled  19m                    default-scheduler  Successfully assigned default/ocbase-iuefny-ob-bundle-2 to gke-yjtest-default-pool-ff74a9b4-w1bp
  Normal   Pulled     19m                    kubelet            Container image "oceanbasedev/oceanbase-chart:4.2.0.0-100010032023083021" already present on machine
  Normal   Created    19m                    kubelet            Created container observer-container
  Normal   Started    19m                    kubelet            Started container observer-container
  Warning  Unhealthy  4m39s (x101 over 19m)  kubelet            Readiness probe failed: cat: /tmp/ready: No such file or directory

@powerfooI
Contributor

The OceanBase cluster addon for KubeBlocks does not support stop and start, which scale replicas to 0. The cluster becomes inactive (effectively dead) once more than half of the observers are shut down at the same time.
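
For reference, the stop operation scales the component to zero replicas, which is what breaks the OceanBase quorum described above. This can be checked on the Cluster object right after kbcli cluster stop (the jsonpath assumes the single ob-bundle component from the repro manifest):

# After a stop, the component replica count in the Cluster spec drops to 0
kubectl get cluster ob-cluster -o jsonpath='{.spec.componentSpecs[0].replicas}'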

@ahjing99
Collaborator

Closing as this is not supported yet
