estimatedQueuePosition counter only gets reduced for successful velero backups #197

Open
PrasadJoshi12 opened this issue Feb 18, 2025 · 4 comments

Comments

@PrasadJoshi12

I noticed an issue while testing NonAdminBackup: the queue position is only reduced when a backup succeeds; for failed backups it stays the same. The queue count should also be reduced for failed Velero backups, since a failed Velero backup cannot be continued.

Steps to reproduce:

  1. Create a DPA with a non-existing bucket
$ oc get bsl ts-dpa-1 -o yaml
apiVersion: velero.io/v1
kind: BackupStorageLocation
metadata:
  creationTimestamp: "2025-02-18T07:26:06Z"
  generation: 17
  labels:
    app.kubernetes.io/component: bsl
    app.kubernetes.io/instance: ts-dpa-1
    app.kubernetes.io/managed-by: oadp-operator
    app.kubernetes.io/name: oadp-operator-velero
    openshift.io/oadp: "True"
    openshift.io/oadp-registry: "True"
  name: ts-dpa-1
  namespace: openshift-adp
  ownerReferences:
  - apiVersion: oadp.openshift.io/v1alpha1
    blockOwnerDeletion: true
    controller: true
    kind: DataProtectionApplication
    name: ts-dpa
    uid: 35a0800f-a794-44c9-84ce-4f0339ec0417
  resourceVersion: "51517"
  uid: af377f1e-85b2-4f6d-916d-e849ccb9bdad
spec:
  credential:
    key: cloud
    name: cloud-credentials-gcp
  default: true
  objectStorage:
    bucket: oadp111171f49z9
    prefix: velero
  provider: gcp
status:
  lastValidationTime: "2025-02-18T07:41:18Z"
  message: 'BackupStorageLocation "ts-dpa-1" is unavailable: rpc error: code = Unknown
    desc = storage: bucket doesn''t exist'
  phase: Unavailable
  2. Create a NonAdminBackup
$ oc get nab test-nab2 -o yaml
apiVersion: oadp.openshift.io/v1alpha1
kind: NonAdminBackup
metadata:
  creationTimestamp: "2025-02-18T07:44:11Z"
  finalizers:
  - nonadminbackup.oadp.openshift.io/finalizer
  generation: 2
  name: test-nab2
  namespace: test-nac
  resourceVersion: "52187"
  uid: a104a215-5e32-49a5-b14b-264470246ed0
spec:
  backupSpec:
    csiSnapshotTimeout: 0s
    hooks: {}
    itemOperationTimeout: 0s
    metadata: {}
    ttl: 0s
status:
  conditions:
  - lastTransitionTime: "2025-02-18T07:44:11Z"
    message: backup accepted
    reason: BackupAccepted
    status: "True"
    type: Accepted
  - lastTransitionTime: "2025-02-18T07:44:11Z"
    message: Created Velero Backup object
    reason: BackupScheduled
    status: "True"
    type: Queued
  phase: Created
  queueInfo:
    estimatedQueuePosition: 1
  veleroBackup:
    nacuuid: test-nac-test-nab2-908dfca5-473f-464e-8597-9f074259fc4f
    name: test-nac-test-nab2-908dfca5-473f-464e-8597-9f074259fc4f
    namespace: openshift-adp
    status:
      expiration: "2025-03-20T07:44:11Z"
      failureReason: 'rpc error: code = Unknown desc = googleapi: Error 404: The specified
        bucket does not exist., notFound'
      formatVersion: 1.1.0
      hookStatus: {}
      phase: Failed
      progress:
        itemsBackedUp: 21
        totalItems: 21
      startTimestamp: "2025-02-18T07:44:11Z"
      version: 1

Actual result:

The queue count does not change for failed Velero backups: estimatedQueuePosition stays at 1 even though the Velero backup phase is Failed.
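
The mismatch in the status above can be expressed as a small check. This is only an illustrative sketch, not code from the project: the function name `has_stale_queue_position` and the set of phases treated as terminal are assumptions, and the dictionary is a trimmed copy of the reported NonAdminBackup status.

```python
# Assumption: phases after which Velero is not expected to work on a backup again.
TERMINAL_PHASES = {"Completed", "PartiallyFailed", "Failed", "FailedValidation"}


def has_stale_queue_position(nab_status: dict) -> bool:
    """Return True if a finished Velero backup still reports a queue position.

    Hypothetical helper for illustration; it only reads the fields shown
    in the NonAdminBackup status from this report.
    """
    velero_phase = nab_status.get("veleroBackup", {}).get("status", {}).get("phase")
    position = nab_status.get("queueInfo", {}).get("estimatedQueuePosition", 0)
    return velero_phase in TERMINAL_PHASES and position > 0


# Trimmed copy of the status reported in this issue.
reported_status = {
    "phase": "Created",
    "queueInfo": {"estimatedQueuePosition": 1},
    "veleroBackup": {"status": {"phase": "Failed"}},
}

print(has_stale_queue_position(reported_status))
```

For the reported status the check flags the object, because the phase is Failed while estimatedQueuePosition is still 1.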

@PrasadJoshi12 PrasadJoshi12 changed the title estimatedQueuePosition counter only gets reduced for successful backups estimatedQueuePosition counter only gets reduced for successful velero backups Feb 18, 2025
@weshayutin
Contributor

@PrasadJoshi12 the queue is a list of backups waiting to be executed. Why would a backup in a terminal state and failed be added to the queue?

@kaovilai
Member

kaovilai commented Feb 18, 2025

> the queue is a list of backups waiting to be executed.

exactly.

Why would a backup in a terminal state and failed not be removed from the queue?
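
To make the "list of backups waiting to be executed" reading concrete, here is a minimal sketch of a queue position that only counts backups that are not yet finished. It is an assumption about how such an estimate could work, not the controller's actual logic; the `Backup` class, `estimated_queue_position`, the terminal phase set, and the choice of 0 for "no longer queued" are all hypothetical.

```python
from dataclasses import dataclass

# Assumption: phases after which Velero is not expected to work on a backup again.
TERMINAL_PHASES = {"Completed", "PartiallyFailed", "Failed", "FailedValidation"}


@dataclass
class Backup:
    name: str
    created: str  # UTC timestamp in the same format as creationTimestamp; sorts as text
    phase: str


def estimated_queue_position(target: Backup, backups: list) -> int:
    """Return a 1-based position among unfinished backups, or 0 once finished."""
    if target.phase in TERMINAL_PHASES:
        return 0  # a finished backup is no longer waiting, so it has no position
    ahead = [
        b for b in backups
        if b.phase not in TERMINAL_PHASES and b.created < target.created
    ]
    return len(ahead) + 1


# Hypothetical queue: one failed backup followed by two unfinished ones.
queue = [
    Backup("first", "2025-02-18T07:40:00Z", "Failed"),
    Backup("second", "2025-02-18T07:42:00Z", "InProgress"),
    Backup("third", "2025-02-18T07:44:11Z", "New"),
]
positions = {b.name: estimated_queue_position(b, queue) for b in queue}
print(positions)
```

Under these assumptions the failed backup neither holds a position itself nor pushes the backups behind it further back.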

@mpryc
Collaborator

mpryc commented Feb 18, 2025

Is this a minor usability bug?

The estimated queue position is the position of the NAB object in the queue of objects waiting to be handled by Velero.

It went down to 1, meaning it is now being worked on.

The work has finished with fail state, we update state, but we are not removing the queue number.

From the user's perspective, I would look at the failure first and ignore the queue number, as I don't expect something in a final Failed state to ever be worked on again.

@kaovilai
Member

> The work has finished with fail state, we update state, but we are not removing the queue number.

Why not remove the queue number when updating to Failed?
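
A rough sketch of that suggestion: when the Velero phase is copied into the NAB status, drop the queue information in the same update if the phase is terminal. The function name `apply_velero_phase` and the terminal phase set are assumptions for illustration, and whether to remove `queueInfo` or set the position to 0 is a design choice this thread has not settled.

```python
import copy

# Assumption: phases after which Velero is not expected to work on a backup again.
TERMINAL_PHASES = {"Completed", "PartiallyFailed", "Failed", "FailedValidation"}


def apply_velero_phase(nab_status: dict, velero_phase: str) -> dict:
    """Return a copy of the status with the new Velero phase applied.

    Hypothetical helper: when the phase is terminal, queueInfo is removed
    in the same step, so a finished backup never reports a position.
    """
    updated = copy.deepcopy(nab_status)
    velero_status = updated.setdefault("veleroBackup", {}).setdefault("status", {})
    velero_status["phase"] = velero_phase
    if velero_phase in TERMINAL_PHASES:
        updated.pop("queueInfo", None)
    return updated


before = {
    "queueInfo": {"estimatedQueuePosition": 1},
    "veleroBackup": {"status": {"phase": "InProgress"}},
}
after = apply_velero_phase(before, "Failed")
print(after)
```

Working on a copy keeps the phase update and the queue cleanup together as one change, which avoids the intermediate state reported in this issue.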
