[BUG]ob primary/secondary switchover failed #7029

Closed
ahjing99 opened this issue Apr 11, 2024 · 3 comments

➜ ~ kbcli version
Kubernetes: v1.27.8-gke.1067004
KubeBlocks: 0.9.0-beta.5
kbcli: 0.9.0-beta.1

# Add Helm repo 
helm repo add kubeblocks-addons https://apecloud.github.io/helm-charts
# If github is not accessible or very slow for you, please use following repo instead
helm repo add kubeblocks-addons https://jihulab.com/api/v4/projects/150246/packages/helm/stable
# Update helm repo
helm repo update

# Enable oceanbase 
helm upgrade -i oceanbase-ce kubeblocks-addons/oceanbase-ce --version 0.9.0 -n kb-system  
  1. Create cluster
    k apply -f cluster.yaml
apiVersion: apps.kubeblocks.io/v1alpha1
kind: Cluster
metadata:
  name: oceanbase-cluster1
  namespace: default
  annotations:
    # Specifies how many OceanBase clusters to create within one cluster; set to 2 when creating a primary and a secondary cluster
    "kubeblocks.io/extra-env": "{\"TENANT_NAME\":\"obtenant\",\"ZONE_COUNT\":\"1\",\"OB_CLUSTERS_COUNT\":\"2\",\"TENANT_CPU\":\"2\",\"TENANT_MEMORY\":\"2G\",\"TENANT_DISK\":\"5G\"}"
spec:
  # Specifies the cluster termination policy.
  # - DoNotTerminate will block delete operation.
  # - Halt will delete workload resources such as statefulset, deployment workloads but keep PVCs.
  # - Delete is based on Halt and deletes PVCs.
  # - WipeOut is based on Delete and wipes out all volume snapshots and snapshot data from backup storage location.
  terminationPolicy: Delete
  # The cluster-level configuration is used as the default configuration of all components;
  # if the affinity and tolerations exist in a component, the component-level configuration
  # will take effect and cover the default cluster-level configuration
  affinity:
    # Specifies the anti-affinity level of pods within a component.
    # - Preferred
    # - Required
    podAntiAffinity: Preferred
    # Represents the key of node labels.
    topologyKeys:
      - kubernetes.io/hostname
    # Defines how pods are distributed across nodes.
    # - SharedNode
    # - DedicatedNode
    tenancy: SharedNode
  # Attached to tolerate any taint that matches the triple `key,value,effect` using the matching operator `operator`.
  tolerations:
    - key: kb-data
      operator: Equal
      value: "true"
      effect: NoSchedule
  # List of componentSpec used to define the components that make up a cluster.
  # ComponentSpecs and ShardingSpecs cannot both be empty at the same time.
  # ClusterComponentSpec defines the specifications for a cluster component.
  componentSpecs:
      # Specifies the name of the cluster's component.
      # This name is also part of the Service DNS name and must comply with the IANA Service Naming rule.
    - name: ob-ce-0
      # References the componentDef defined in the ClusterDefinition spec. Must comply with the IANA Service Naming rule.
      # - ob-ce-repl, will use container network
      # - ob-ce-repl-host, will use host network
      componentDef: ob-ce-repl
      # Specifies the number of component replicas.
      replicas: 1
      # Specifies the resources requests and limits of the workload.
      resources:
        limits:
          cpu: "3"
          memory: "8Gi"
        requests:
          cpu: "3"
          memory: "8Gi"
      # Provides information for statefulset.spec.volumeClaimTemplates.
      volumeClaimTemplates:
        # Refers to `clusterDefinition.spec.componentDefs.containers.volumeMounts.name`.
        - name: data-file
          spec:
            # Contains the desired access modes the volume should have.
            accessModes:
              - ReadWriteOnce
            # Represents the minimum resources the volume should have.
            resources:
              requests:
                storage: 50Gi
        - name: data-log
          spec:
            # Contains the desired access modes the volume should have.
            accessModes:
              - ReadWriteOnce
            # Represents the minimum resources the volume should have.
            resources:
              requests:
                storage: 50Gi
        - name: log
          spec:
            # Contains the desired access modes the volume should have.
            accessModes:
              - ReadWriteOnce
            # Represents the minimum resources the volume should have.
            resources:
              requests:
                storage: 10Gi
        - name: workdir
          spec:
            # Contains the desired access modes the volume should have.
            accessModes:
              - ReadWriteOnce
            # Represents the minimum resources the volume should have.
            resources:
              requests:
                storage: 20Gi
    - name: ob-ce-1
      # References the componentDef defined in the ClusterDefinition spec. Must comply with the IANA Service Naming rule.
      # - ob-ce-repl, will use container network
      # - ob-ce-repl-host, will use host network
      componentDef: ob-ce-repl
      # Specifies the number of component replicas.
      replicas: 1
      # Specifies the resources requests and limits of the workload.
      resources:
        limits:
          cpu: "3"
          memory: "8Gi"
        requests:
          cpu: "3"
          memory: "8Gi"
      # Provides information for statefulset.spec.volumeClaimTemplates.
      volumeClaimTemplates:
        # Refers to `clusterDefinition.spec.componentDefs.containers.volumeMounts.name`.
        - name: data-file
          spec:
            # Contains the desired access modes the volume should have.
            accessModes:
              - ReadWriteOnce
            # Represents the minimum resources the volume should have.
            resources:
              requests:
                storage: 50Gi
        - name: data-log
          spec:
            # Contains the desired access modes the volume should have.
            accessModes:
              - ReadWriteOnce
            # Represents the minimum resources the volume should have.
            resources:
              requests:
                storage: 50Gi
        - name: log
          spec:
            # Contains the desired access modes the volume should have.
            accessModes:
              - ReadWriteOnce
            # Represents the minimum resources the volume should have.
            resources:
              requests:
                storage: 10Gi
        - name: workdir
          spec:
            # Contains the desired access modes the volume should have.
            accessModes:
              - ReadWriteOnce
            # Represents the minimum resources the volume should have.
            resources:
              requests:
                storage: 20Gi
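
The `kubeblocks.io/extra-env` annotation in the manifest above stores a JSON object as an escaped string, which is easy to get wrong when edited by hand. The following is a minimal Python sketch that decodes the value and reads `OB_CLUSTERS_COUNT`. `parse_extra_env` is a hypothetical helper, and the check that every value is a string is an assumption based on how the manifest quotes its values, not a documented KubeBlocks requirement.

```python
import json

# The annotation value from cluster.yaml, after YAML unescaping.
EXTRA_ENV = (
    '{"TENANT_NAME":"obtenant","ZONE_COUNT":"1","OB_CLUSTERS_COUNT":"2",'
    '"TENANT_CPU":"2","TENANT_MEMORY":"2G","TENANT_DISK":"5G"}'
)


def parse_extra_env(raw):
    """Decode the JSON object stored in the extra-env annotation (hypothetical helper)."""
    env = json.loads(raw)
    if not isinstance(env, dict):
        raise ValueError("extra-env must be a JSON object")
    # Assumption: values are strings, as they are in the manifest above.
    non_strings = [key for key, value in env.items() if not isinstance(value, str)]
    if non_strings:
        raise ValueError(f"non-string values for keys: {non_strings}")
    return env


env = parse_extra_env(EXTRA_ENV)
ob_clusters = int(env["OB_CLUSTERS_COUNT"])
print(ob_clusters)  # prints 2, matching the two components ob-ce-0 and ob-ce-1
```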
  2. Before switchover
➜  ~ kbcli cluster describe oceanbase-cluster1
Name: oceanbase-cluster1	 Created Time: Apr 11,2024 18:27 UTC+0800
NAMESPACE   CLUSTER-DEFINITION   VERSION   STATUS    TERMINATION-POLICY
default                                    Running   Delete

Endpoints:
COMPONENT   MODE        INTERNAL                                                              EXTERNAL
ob-ce-0     ReadWrite   oceanbase-cluster1-ob-ce-0-oceanbase.default.svc.cluster.local:2881   <none>
                        oceanbase-cluster1-ob-ce-0-oceanbase.default.svc.cluster.local:2882
ob-ce-1     ReadWrite   oceanbase-cluster1-ob-ce-1-oceanbase.default.svc.cluster.local:2881   <none>
                        oceanbase-cluster1-ob-ce-1-oceanbase.default.svc.cluster.local:2882

Topology:
COMPONENT   INSTANCE                       ROLE      STATUS    AZ              NODE                                                  CREATED-TIME
ob-ce-0     oceanbase-cluster1-ob-ce-0-0   primary   Running   us-central1-c   gke-yijing-default-pool-ea930834-23wf/10.128.0.6      Apr 11,2024 18:27 UTC+0800
ob-ce-1     oceanbase-cluster1-ob-ce-1-0   standby   Running   us-central1-c   gke-yijing-default-pool-ea930834-hq9p/10.128.15.238   Apr 11,2024 18:27 UTC+0800

Resources Allocation:
COMPONENT   DEDICATED   CPU(REQUEST/LIMIT)   MEMORY(REQUEST/LIMIT)   STORAGE-SIZE     STORAGE-CLASS
ob-ce-0     false       3 / 3                8Gi / 8Gi               data-file:50Gi   kb-default-sc
                                                                     data-log:50Gi    kb-default-sc
                                                                     log:10Gi         kb-default-sc
                                                                     workdir:20Gi     kb-default-sc
ob-ce-1     false       3 / 3                8Gi / 8Gi               data-file:50Gi   kb-default-sc
                                                                     data-log:50Gi    kb-default-sc
                                                                     log:10Gi         kb-default-sc
                                                                     workdir:20Gi     kb-default-sc

Images:
COMPONENT   TYPE   IMAGE
ob-ce-0            docker.io/apecloud/oceanbase:4.2.0.0-100010032023083021
ob-ce-1            docker.io/apecloud/oceanbase:4.2.0.0-100010032023083021

Show cluster events: kbcli cluster list-events -n default oceanbase-cluster1
  3. Switchover
    k apply -f switchover.yaml
apiVersion: apps.kubeblocks.io/v1alpha1
kind: OpsRequest
metadata:
  name: oceanbase-switchover
spec:
  # References the cluster object.
  clusterRef: oceanbase-cluster1
  # Defines the operation type.
  type: Switchover
  # Switches over the specified components.
  switchover:
    # Specifies the name of the cluster component.
  - componentName: ob-ce-1
    # If assigned "*", it signifies that no specific primary or leader is designated for the switchover
    instanceName: '*'
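
To repeat the request against a different component (for example `ob-ce-0`), the same OpsRequest can be generated instead of edited by hand. This is a minimal sketch, not part of the original report: `switchover_ops` is a hypothetical helper, and its fields are copied from switchover.yaml above.

```python
import json


def switchover_ops(name, cluster, component, instance="*"):
    """Build a Switchover OpsRequest with the same fields as switchover.yaml."""
    return {
        "apiVersion": "apps.kubeblocks.io/v1alpha1",
        "kind": "OpsRequest",
        "metadata": {"name": name},
        "spec": {
            "clusterRef": cluster,
            "type": "Switchover",
            "switchover": [
                {"componentName": component, "instanceName": instance},
            ],
        },
    }


manifest = switchover_ops("oceanbase-switchover", "oceanbase-cluster1", "ob-ce-1")
print(json.dumps(manifest, indent=2))
```

Because kubectl accepts JSON as well as YAML, the printed manifest can be piped to `kubectl apply -f -`.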
  4. The ops failed
➜  ~ k describe ops oceanbase-switchover
Name:         oceanbase-switchover
Namespace:    default
Labels:       app.kubernetes.io/instance=oceanbase-cluster1
              ops.kubeblocks.io/ops-type=Switchover
Annotations:  <none>
API Version:  apps.kubeblocks.io/v1alpha1
Kind:         OpsRequest
Metadata:
  Creation Timestamp:  2024-04-11T10:32:35Z
  Finalizers:
    opsrequest.kubeblocks.io/finalizer
  Generation:  1
  Managed Fields:
    API Version:  apps.kubeblocks.io/v1alpha1
    Fields Type:  FieldsV1
    fieldsV1:
      f:metadata:
        f:annotations:
          .:
          f:kubectl.kubernetes.io/last-applied-configuration:
      f:spec:
        .:
        f:clusterRef:
        f:switchover:
          .:
          k:{"componentName":"ob-ce-1"}:
            .:
            f:componentName:
            f:instanceName:
        f:ttlSecondsBeforeAbort:
        f:type:
    Manager:      kubectl-client-side-apply
    Operation:    Update
    Time:         2024-04-11T10:32:35Z
    API Version:  apps.kubeblocks.io/v1alpha1
    Fields Type:  FieldsV1
    fieldsV1:
      f:metadata:
        f:finalizers:
          .:
          v:"opsrequest.kubeblocks.io/finalizer":
        f:labels:
          .:
          f:app.kubernetes.io/instance:
          f:ops.kubeblocks.io/ops-type:
        f:ownerReferences:
          .:
          k:{"uid":"981e35db-5cd0-4828-bae3-3b9efb075aac"}:
    Manager:      manager
    Operation:    Update
    Time:         2024-04-11T10:32:35Z
    API Version:  apps.kubeblocks.io/v1alpha1
    Fields Type:  FieldsV1
    fieldsV1:
      f:status:
        .:
        f:completionTimestamp:
        f:conditions:
          .:
          k:{"type":"Validated"}:
            .:
            f:lastTransitionTime:
            f:message:
            f:reason:
            f:status:
            f:type:
          k:{"type":"WaitForProgressing"}:
            .:
            f:lastTransitionTime:
            f:message:
            f:reason:
            f:status:
            f:type:
        f:phase:
        f:progress:
    Manager:      manager
    Operation:    Update
    Subresource:  status
    Time:         2024-04-11T10:32:35Z
  Owner References:
    API Version:     apps.kubeblocks.io/v1alpha1
    Kind:            Cluster
    Name:            oceanbase-cluster1
    UID:             981e35db-5cd0-4828-bae3-3b9efb075aac
  Resource Version:  2807965
  UID:               357aaab0-5f6f-4d9b-9165-00fa7786f503
Spec:
  Cluster Ref:  oceanbase-cluster1
  Switchover:
    Component Name:          ob-ce-1
    Instance Name:           *
  Ttl Seconds Before Abort:  0
  Type:                      Switchover
Status:
  Completion Timestamp:  2024-04-11T10:32:35Z
  Conditions:
    Last Transition Time:  2024-04-11T10:32:35Z
    Message:               wait for the controller to process the OpsRequest: oceanbase-switchover in Cluster: oceanbase-cluster1
    Reason:                WaitForProgressing
    Status:                True
    Type:                  WaitForProgressing
    Last Transition Time:  2024-04-11T10:32:35Z
    Message:               this cluster component ob-ce-1 does not support switchover
    Reason:                ValidateFailed
    Status:                False
    Type:                  Validated
  Phase:                   Failed
  Progress:                -/-
Events:
  Type     Reason              Age    From                    Message
  ----     ------              ----   ----                    -------
  Normal   WaitForProgressing  4m52s  ops-request-controller  wait for the controller to process the OpsRequest: oceanbase-switchover in Cluster: oceanbase-cluster1
  Warning  ValidateFailed      4m52s  ops-request-controller  this cluster component ob-ce-1 does not support switchover
➜  ~
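
The relevant part of the describe output is the single condition with `Status: False`. As a minimal sketch, the same information can be pulled out of the OpsRequest status programmatically. The sample dictionary below is transcribed from the output above using the JSON field names listed under Managed Fields (`phase`, `conditions`, `type`, `status`, `reason`, `message`). `failed_conditions` is a hypothetical helper, not a KubeBlocks API.

```python
# Status of the OpsRequest, transcribed from the describe output above.
OPS = {
    "status": {
        "phase": "Failed",
        "conditions": [
            {
                "type": "WaitForProgressing",
                "status": "True",
                "reason": "WaitForProgressing",
                "message": "wait for the controller to process the OpsRequest: "
                           "oceanbase-switchover in Cluster: oceanbase-cluster1",
            },
            {
                "type": "Validated",
                "status": "False",
                "reason": "ValidateFailed",
                "message": "this cluster component ob-ce-1 does not support switchover",
            },
        ],
    }
}


def failed_conditions(ops):
    """Return (reason, message) for every condition whose status is "False"."""
    conditions = ops.get("status", {}).get("conditions", [])
    return [
        (cond.get("reason", ""), cond.get("message", ""))
        for cond in conditions
        if cond.get("status") == "False"
    ]


failed = failed_conditions(OPS)
for reason, message in failed:
    print(f"{reason}: {message}")
```

With real data, the input could come from `kubectl get ops oceanbase-switchover -o json` parsed with `json.loads`.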
ahjing99 added the kind/bug and severity/major labels Apr 11, 2024
ahjing99 added this to the Release 0.9.0 milestone Apr 11, 2024
ahjing99 commented

Still fails on:
Kubernetes: v1.28.7-gke.1026000
KubeBlocks: 0.9.0-beta.17
kbcli: 0.9.0-beta.4

➜ ~ helm list -Aa | grep oceanbase
oceanbase-ce kb-system 1 2024-04-30 10:50:15.621908 +0800 CST deployed oceanbase-ce-0.9.0 4.2.0.0-100010032023083021

github-actions bot commented Jun 3, 2024

This issue has been marked as stale because it has been open for 30 days with no activity

github-actions bot added the Stale label Jun 3, 2024

ahjing99 commented Jun 24, 2024

Cannot reproduce with 0.9.0-beta.36; closing.
