
vtctldclient backup not working with defined VitessBackupStorages in cluster #472

Open
voarsh2 opened this issue Sep 14, 2023 · 5 comments


voarsh2 commented Sep 14, 2023

I run

./vtctldclient --server 192.168.100.103:31487 Backup --allow-primary zone1-30573399

I get:

E0914 01:23:16.047192 3349604 main.go:56] rpc error: code = Unknown desc = TabletManager.Backup on zone1-0030573399 error: unable to get backup storage: no registered implementation of BackupStorage: unable to get backup storage: no registered implementation of BackupStorage

However, in the cluster spec I have defined a hostPath for the backups, and I can see it show up as a VitessBackupStorage:

apiVersion: planetscale.com/v2
kind: VitessBackupStorage
metadata:
  creationTimestamp: '2023-09-14T00:52:24Z'
  generation: 1
  labels:
    backup.planetscale.com/location: ''
    planetscale.com/cluster: example
  managedFields:
    - apiVersion: planetscale.com/v2
      fieldsType: FieldsV1
      fieldsV1:
        f:metadata:
          f:labels:
            .: {}
            f:backup.planetscale.com/location: {}
            f:planetscale.com/cluster: {}
          f:ownerReferences:
            .: {}
            k:{"uid":"272e7a69-a91f-4196-ad2d-8930c88c2715"}: {}
        f:spec:
          .: {}
          f:location:
            .: {}
            f:volume:
              .: {}
              f:hostPath:
                .: {}
                f:path: {}
                f:type: {}
      manager: vitess-operator
      operation: Update
      time: '2023-09-14T00:52:24Z'
    - apiVersion: planetscale.com/v2
      fieldsType: FieldsV1
      fieldsV1:
        f:status:
          .: {}
          f:observedGeneration: {}
      manager: vitess-operator
      operation: Update
      subresource: status
      time: '2023-09-14T00:53:06Z'
  name: example-90089e05
  namespace: vitess
  ownerReferences:
    - apiVersion: planetscale.com/v2
      blockOwnerDeletion: true
      controller: true
      kind: VitessCluster
      name: example
      uid: 272e7a69-a91f-4196-ad2d-8930c88c2715
  resourceVersion: '233908598'
  uid: 24068990-be71-49b5-ad17-773f581170a9
spec:
  location:
    volume:
      hostPath:
        path: /mnt/minio-store/vitess-backups
        type: Directory
status:
  observedGeneration: 1

voarsh2 commented Sep 14, 2023

I restarted the primary and saw that --file_backup_storage_root=/vt/backups/example was added to the flags, but this is not the path I specified in the cluster spec.

mattlord (Collaborator) commented:

Hi @voarsh2,

This is not the best place to get support. You should instead use the Vitess Slack, since this kind of thing requires a lot of back and forth: https://vitess.io/community/. There are also many people from the community there who run the operator in production.

I don't know anything about your setup (Kubernetes version, Vitess Operator version, etc.), what you've done (e.g. the VitessCluster CRD definition you used), or what you want to do (how you want the backups to be performed).

It's clear that something isn't quite right, but without any details I cannot say what.

In the meantime you can find the CRD/API reference here: https://github.com/planetscale/vitess-operator/blob/main/docs/api.md

You can see some example walkthroughs here: https://github.com/planetscale/vitess-operator/tree/main/docs

And a blog post: https://vitess.io/blog/2020-11-09-vitess-operator-for-kubernetes/

And the Vitess backup docs: https://vitess.io/docs/17.0/user-guides/operating-vitess/backup-and-restore/

The backups are very configurable, and again I have no idea what you've specified. At the Vitess level, the error you shared is Vitess telling you that the component (vtctld, vttablet, vtbackup) has no value for its --backup_storage_implementation flag. What backup implementation are you trying to use, e.g. file, s3, or ceph? See: https://github.com/planetscale/vitess-operator/tree/main/pkg/operator/vitessbackup

Between Kubernetes (each install is a snowflake), Vitess, and the Vitess Operator, this gets complicated, which is why Slack is easier for things like this. I know it's complicated for you as well, and the docs for the operator are largely non-existent, but we'd need much more detail in order to help.

I can only guess that perhaps you specified something like this in your CRD:

spec:
  backup:
    engine: xtrabackup
    locations:
    - volume:
        hostPath:
          path: /backup
          type: Directory

But guessing doesn't help. 🙂 Once we know the actual CRD definition, we'd also have to look at the pod definitions, logs, etc.

Best Regards


voarsh2 commented Sep 14, 2023

Howdy @mattlord

> This is not the best place to try and get support/help. You should instead use the Vitess slack as this kind of thing requires a lot of back and forth: https://vitess.io/community/ There are also many people there from the community that are using the operator in production.

I'll try Slack next time.

As you pointed out:

spec:
  backup:
    engine: xtrabackup
    locations:
    - volume:
        hostPath:
          path: /mnt/minio-store/vitess-backups
          type: Directory

This is what I used for the Vitess Cluster config.

I've read most of those links.
The problem now is that, despite the hostPath, the DB pods have --file_backup_storage_root=/vt/backups/example in their command args, not the path I specified.

So, when running ./vtctldclient --server 192.168.100.103:31487 BackupShard commerce/- I get:

rpc error: code = Unknown desc = TabletManager.Backup on zone1-2469782763 error: StartBackup failed: mkdir /vt/backups/example: permission denied: StartBackup failed: mkdir /vt/backups/example: permission denied

Notice it's not using the hostPath I specified in the cluster configuration.

VitessBackupStorage

apiVersion: planetscale.com/v2
kind: VitessBackupStorage
metadata:
  labels:
    backup.planetscale.com/location: ""
    planetscale.com/cluster: example
  name: example-90089e05
  namespace: vitess
spec:
  location:
    volume:
      hostPath:
        path: /mnt/minio-store/vitess-backups
        type: Directory

VitessCluster: example

apiVersion: planetscale.com/v2
kind: VitessCluster
metadata:
  annotations:
    objectset.rio.cattle.io/applied: H4sIAAAAAAAA/3zPwY6sIBCF4Xeptdoq0gjb+w69L4oizR0EIzWdSTq++8SZ/SzPv/iS8wbc04OPlmoBB3vGwtIIMw9Ut9trhg4+Ugng4JGEW/uXP5vwAR1sLBhQENwbsJQqKKmWds3q/zNJYxmOVAdCkcxDqrd0OQFjGLUx/R1Z9cuiYo8jh54mY+JiVz8vFs4OMnrOf3JPbE9wgGQU0TytaraavPKaFCkzTlGrqNXk7ervs50utODG4IC/cNszw29oO9JVXz8P4Ty/AwAA//+KvyL+FgEAAA
    objectset.rio.cattle.io/id: dafd0577-6ae3-443f-a0ed-c177f498b249
  labels:
    objectset.rio.cattle.io/hash: ac73cc2183295cb3b5c3c3701f53f531b98b6291
  name: example
  namespace: vitess
spec:
  backup:
    engine: xtrabackup
    locations:
    - volume:
        hostPath:
          path: /mnt/minio-store/vitess-backups
          type: Directory
  cells:
  - gateway:
      authentication:
        static:
          secret:
            key: users.json
            name: example-cluster-config
      replicas: 3
      resources:
        requests:
          cpu: 100m
          memory: 256Mi
    name: zone1
  images:
    mysqld:
      mysql80Compatible: vitess/lite:latest
    mysqldExporter: prom/mysqld-exporter:v0.11.0
    vtadmin: vitess/vtadmin:latest
    vtbackup: vitess/lite:latest
    vtctld: vitess/lite:latest
    vtgate: vitess/lite:latest
    vtorc: vitess/lite:latest
    vttablet: vitess/lite:latest
  keyspaces:
  - durabilityPolicy: semi_sync
    name: commerce
    partitionings:
    - equal:
        parts: 1
        shardTemplate:
          databaseInitScriptSecret:
            key: init_db.sql
            name: example-cluster-config
          tabletPools:
          - cell: zone1
            dataVolumeClaimTemplate:
              accessModes:
              - ReadWriteOnce
              resources:
                requests:
                  storage: 10Gi
            mysqld:
              resources:
                requests:
                  cpu: 100m
                  memory: 512Mi
            replicas: 3
            type: replica
            vttablet:
              extraFlags:
                db_charset: utf8mb4
                disable_active_reparents: "true"
              resources:
                requests:
                  cpu: 100m
                  memory: 256Mi
  - durabilityPolicy: semi_sync
    name: betawonder3
    partitionings:
    - equal:
        parts: 1
        shardTemplate:
          databaseInitScriptSecret:
            key: init_db.sql
            name: example-cluster-config
          tabletPools:
          - cell: zone1
            dataVolumeClaimTemplate:
              accessModes:
              - ReadWriteOnce
              resources:
                requests:
                  storage: 10Gi
            mysqld:
              resources:
                requests:
                  cpu: 500m
                  memory: 512Mi
            replicas: 1
            type: replica
            vttablet:
              extraFlags:
                db_charset: utf8mb4
                disable_active_reparents: "true"
              resources:
                requests:
                  cpu: 100m
                  memory: 256Mi
    turndownPolicy: Immediate
  updateStrategy:
    type: Immediate
  vitessDashboard:
    cells:
    - zone1
    extraFlags:
      security_policy: read-only
    replicas: 1
    resources:
      limits:
        memory: 128Mi
      requests:
        cpu: 100m
        memory: 128Mi
  vtadmin:
    apiAddresses:
    - http://192.168.100.103:31252
    apiResources:
      limits:
        memory: 128Mi
      requests:
        cpu: 100m
        memory: 128Mi
    cells:
    - zone1
    rbac:
      key: rbac.yaml
      name: example-cluster-config
    readOnly: false
    replicas: 1
    webResources:
      limits:
        memory: 128Mi
      requests:
        cpu: 100m
        memory: 128Mi

https://github.com/planetscale/vitess-operator/blob/main/docs/api.md#planetscale.com/v2.VitessBackup - this doesn't show hostPath as a valid option, but I saw it in the operator's sample YAML. In any case, I can't see any obvious reason why this isn't working, and I'm not sure why the DB pods use /vt/backups when I specify a different path. I might give S3 a try next.
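For what it's worth, here is a minimal sketch of what an S3 location might look like, with field names taken from the operator's api.md; the region, bucket, endpoint, and secret below are placeholders I'd have to fill in, not values from my cluster:

```yaml
spec:
  backup:
    engine: xtrabackup
    locations:
    - s3:
        region: us-east-1                     # placeholder region
        bucket: my-vitess-backups             # placeholder bucket name
        endpoint: http://minio.example:9000   # optional; for S3-compatible stores such as MinIO
        keyPrefix: example                    # optional prefix inside the bucket
        authSecret:
          name: s3-auth                       # placeholder Secret holding an AWS-style credentials file
          key: auth
```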

GuptaManan100 (Collaborator) commented:

I looked at the code, and here is what I found.
The operator uses the volume configuration provided in the YAML to create a volume called vitess-backups:

func fileBackupVolumes(volume *corev1.VolumeSource) []corev1.Volume {
	return []corev1.Volume{
		{
			Name:         fileBackupStorageVolumeName,
			VolumeSource: *volume,
		},
	}
}

Next, the operator mounts this volume at a fixed, hardcoded path in the vtbackup and vtctld pods: /vt/backups.

func fileBackupVolumeMounts(subPath string) []corev1.VolumeMount {
	return []corev1.VolumeMount{
		{
			Name:      fileBackupStorageVolumeName,
			MountPath: fileBackupStorageMountPath,
			SubPath:   subPath,
		},
	}
}

Since the volume is mounted at /vt/backups, that is the value used in the flags for vtctld and vtbackup:

func fileBackupFlags(clusterName string) vitess.Flags {
	return vitess.Flags{
		"backup_storage_implementation": fileBackupStorageImplementationName,
		"file_backup_storage_root":      rootKeyPrefix(fileBackupStorageMountPath, clusterName),
	}
}

So while taking a backup, vtbackup will try to create a directory named after the cluster (in your case, example) under that path, and then take the backup there.

This explains why you are seeing /vt/backups in the error messages: the vtctld and vtbackup binaries have the volume mounted at that directory.
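The net effect of those three functions can be sketched as follows. Note that the constant value and the join logic below are assumptions inferred from the flag seen on the pods, not copied from the operator source:

```go
package main

import (
	"fmt"
	"path"
)

// Assumed value, inferred from the flag observed on the pods
// (--file_backup_storage_root=/vt/backups/example).
const fileBackupStorageMountPath = "/vt/backups"

// rootKeyPrefix (sketch): the storage root is the fixed mount path
// joined with the cluster name.
func rootKeyPrefix(mountPath, clusterName string) string {
	return path.Join(mountPath, clusterName)
}

func main() {
	// For the cluster named "example" this yields /vt/backups/example,
	// which is exactly the directory vtbackup tries to mkdir.
	fmt.Println(rootKeyPrefix(fileBackupStorageMountPath, "example"))
}
```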

Unfortunately, I don't know why the volume mount is inaccessible: rpc error: code = Unknown desc = TabletManager.Backup on zone1-2469782763 error: StartBackup failed: mkdir /vt/backups/example: permission denied: StartBackup failed: mkdir /vt/backups/example: permission denied.
One possible reason is that the directory /mnt/minio-store/vitess-backups doesn't allow all users to create a directory inside it 🤷‍♂️ and only the root user is permitted to. Could you try changing the permissions on it, or use a different directory that doesn't have this problem? Even in the e2e tests that Vitess runs to verify that backups work properly, we have to run mkdir -p -m 777 ./vtdataroot/backup to change the permissions on the backup directory we mount, so that all users can create directories inside it.
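A quick way to sanity-check this, sketched on a scratch directory (substitute the real hostPath, /mnt/minio-store/vitess-backups, on each node that hosts the volume; the mktemp path here is only for illustration):

```shell
# Stand-in for the real hostPath on the node.
BACKUP_ROOT="$(mktemp -d)/vitess-backups"

# Create the backup root world-writable so any container user can mkdir inside it.
mkdir -p -m 777 "$BACKUP_ROOT"

# Simulate what vtbackup does: create the per-cluster subdirectory.
mkdir "$BACKUP_ROOT/example"

# Confirm the mode on the root (Linux stat syntax).
stat -c '%a' "$BACKUP_ROOT"   # prints 777
```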

frouioui (Member) commented Dec 2, 2024

@voarsh2 any update on this?
