Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Delete Velero Backup when user sets the deleteVeleroBackup #67

Closed
wants to merge 2 commits into from
Closed
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
4 changes: 4 additions & 0 deletions api/v1alpha1/nonadminbackup_types.go
Original file line number Diff line number Diff line change
Expand Up @@ -39,6 +39,10 @@ type NonAdminBackupSpec struct {
// BackupSpec defines the specification for a Velero backup.
BackupSpec *velerov1api.BackupSpec `json:"backupSpec,omitempty"`

// DeleteVeleroBackup tells the controller to remove created Velero Backup.
// +optional
DeleteVeleroBackup *bool `json:"deleteVeleroBackup,omitempty"`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this also deletes NAB itself

Would be better to inform this in doc comment and change field name to simply delete?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Delete isn't enough imo, we need to be more precise here, however I agree that current name isn't ideal as we also remove NAB so we need to find out better one.

How about DeleteBackup ?

Copy link
Member

@shubham-pampattiwar shubham-pampattiwar May 21, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah DeleteBackup sounds good !


// NonAdminBackup log level (use debug for the most logging, leave unset for default)
// +optional
// +kubebuilder:validation:Enum=trace;debug;info;warning;error;fatal;panic
Expand Down
1 change: 1 addition & 0 deletions api/v1alpha1/nonadmincontroller_types.go
Original file line number Diff line number Diff line change
Expand Up @@ -27,4 +27,5 @@ type NonAdminCondition string
const (
NonAdminConditionAccepted NonAdminCondition = "Accepted"
NonAdminConditionQueued NonAdminCondition = "Queued"
NonAdminConditionDeletion NonAdminCondition = "Deletion"
)
5 changes: 5 additions & 0 deletions api/v1alpha1/zz_generated.deepcopy.go

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

4 changes: 4 additions & 0 deletions config/crd/bases/nac.oadp.openshift.io_nonadminbackups.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -505,6 +505,10 @@ spec:
type: string
type: array
type: object
deleteVeleroBackup:
description: DeleteVeleroBackup tells the controller to remove created
Velero Backup.
type: boolean
logLevel:
description: NonAdminBackup log level (use debug for the most logging,
leave unset for default)
Expand Down
39 changes: 39 additions & 0 deletions internal/common/function/function.go
Original file line number Diff line number Diff line change
Expand Up @@ -27,6 +27,7 @@ import (

"github.com/go-logr/logr"
velerov1api "github.com/vmware-tanzu/velero/pkg/apis/velero/v1"
apierrors "k8s.io/apimachinery/pkg/api/errors"
apimeta "k8s.io/apimachinery/pkg/api/meta"
metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
"k8s.io/apimachinery/pkg/types"
Expand Down Expand Up @@ -280,6 +281,44 @@ func CheckVeleroBackupLabels(backup *velerov1api.Backup) bool {
return exists && value == constant.ManagedByLabelValue
}

// DeleteVeleroBackup deletes Velero Backup object based on the NonAdminBackup object name and namespace
func DeleteVeleroBackup(ctx context.Context, r client.Client, logger logr.Logger, nab *nacv1alpha1.NonAdminBackup) (bool, error) {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't the Deletion logic flawed here ? Why are we using GenerateVeleroBackupName function to fetch the backup to be deleted ? Isnt there a high possibility that the Generated Name won't match the actual VeleroBackup Name that we want to delete ? Shouldn't we List the Velero backups and then check the annotations of these Velero backup objects and match the nab name with the annotation value of the key "openshift.io/oadp-nab-origin-name" to get the VeleroBackup to be deleted ? Let me know if I am missing something ?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Our velero backup name is always generated based on https://github.com/migtools/oadp-non-admin/blob/master/internal/common/function/function.go#L115-L145

fmt.Sprintf("%s-%s-%s", constant.VeleroBackupNamePrefix, namespace, nameHash)

So we have always nab-NAMESPACE-HASH where the HASH is sha256 from NonAdminBackup name.

VeleroBackup could get deleted only if one matches that entire 3 parts.

If we want to add extra protection we could, but then I would argue the UUID of the NonAdminBackup needs to be used from the VeleroBackup to match the UUID of the requestor.

It is currently possible, because we store this annotation within VeleroBackup:

openshift.io/oadp-nab-origin-uuid=<UUID_OF_NAB>

Let me know if I should add that protection on top of current name match.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why not to just use

VeleroBackupName string `json:"veleroBackupName,omitempty"`

?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought about that, but with the current implementation I can:

  1. Create NAB
  2. VeleroBackup will get created
  3. Remove NAB

At this stage I have still can remove VeleroBackup from 2 if I create new NAB with the same name as the previous one with flag deleteVeleroBackup: true.

Using status will disallow me to do so, because 2 will never get created as this object will be there forever until sysadmin removes it. It's because Status can only be updated by controller when 2 is created.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't want to have corner case where for some reason NAB did not got proper update of the Status.

To be honest we may want also add some safety feature to only delete VeleroBackup if it has the UID or other linkage to VeleroBackup from NAB object, this imo is separate design ?

There needs to be one source of truth here and I've chosen our function, where it could also be Status, so either way. Function is less check if status exists as we have name and only check if VeleroBackup exists.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The NAB will not have a status only when the NAB controller is not running right ? or if the NAB controller is running and the status update call fails ? I think this makes it even more prudent that we check for the status of NAB first:

  • check if NAB has status
  • lookup of VeleroBackup object from NAB.status.VeleroBackupName
  • VeleroBackup is in terminal state
  • then delete the VeleroBackup object and then subsequently NAB
    Picking up the VeleroBackupName from NAB status at the very least makes sure that VeleroBackup exists for that NAB object and Delete Call makes sense from user perspective.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a time period when the NAB was created, but Backup was not yet created. If we are in that time window there is a problem with getting proper velero backup name from status. If that happens we won't reconcile again and that object will be out of sync. That is the only additional use case which I can think of when it comes to inconsistency.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See thats the thing, we are not allowing Delete for such cases, grabbing the name from status would not allow a delete for such cases(because there is no VeleroBackup object in terminal phase)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, more then half of myself tend to agree with you now :-) Thanks.

veleroBackupName := GenerateVeleroBackupName(nab.Namespace, nab.Name)

logger.V(1).Info(fmt.Sprintf("Attempting to delete VeleroBackup: %s", veleroBackupName))
veleroBackup := velerov1api.Backup{}

err := r.Get(ctx, types.NamespacedName{Name: veleroBackupName, Namespace: constant.OadpNamespace}, &veleroBackup)

if err == nil {
err = r.Delete(ctx, &veleroBackup)
if err != nil {
logger.V(1).Info(fmt.Sprintf("Error deleting VeleroBackup: %s", veleroBackupName))
return false, err
}
} else if apierrors.IsNotFound(err) {
return false, nil
} else {
logger.V(1).Info(fmt.Sprintf("Error while requeting VeleroBackup: %s from Namespace: %s", veleroBackupName, constant.OadpNamespace))
return false, err
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think implementation is flawed here

we ignore case that err is not nil and different than apierrors.IsNotFound, not?

that means that Velero Backup was not even retrieved, but we say it got deleted

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a suggestion is to always check if err != nil when dealing with errors to avoid forgetting anything

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

100% agree on that, will add that case.


logger.V(1).Info(fmt.Sprintf("Deleted VeleroBackup: %s", veleroBackupName))
return true, nil
}

// DeleteNonAdminBackup deletes Non Admin Backup object
func DeleteNonAdminBackup(ctx context.Context, r client.Client, logger logr.Logger, nab *nacv1alpha1.NonAdminBackup) error {
err := r.Delete(ctx, nab)
if err != nil {
logger.V(1).Info(fmt.Sprintf("Error deleting NonAdminBackup: %s", nab.Name))
return err
}

logger.V(1).Info(fmt.Sprintf("Deleted NonAdminBackup: %s", nab.Name))
return nil
}

// TODO not used

// GetNonAdminBackupFromVeleroBackup return referenced NonAdminBackup object from Velero Backup object, if no error occurs
Expand Down
56 changes: 55 additions & 1 deletion internal/controller/nonadminbackup_controller.go
Original file line number Diff line number Diff line change
Expand Up @@ -20,6 +20,7 @@ package controller
import (
"context"
"errors"
"fmt"
"time"

"github.com/go-logr/logr"
Expand Down Expand Up @@ -87,6 +88,13 @@ func (r *NonAdminBackupReconciler) Reconcile(ctx context.Context, req ctrl.Reque
return ctrl.Result{}, err
}

reconcileExit, reconcileErr := r.DeleteVeleroBackup(ctx, rLog, &nab)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Won't we have a case to requeue here ?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lint job complained that the requeue was not used in the r.DeleteVeleroBackup function, so I removed it from the logic.

if reconcileExit && reconcileErr != nil {
return ctrl.Result{}, reconcile.TerminalError(reconcileErr)
} else if reconcileExit {
return ctrl.Result{}, nil
}

reconcileExit, reconcileRequeue, reconcileErr := r.InitNonAdminBackup(ctx, rLog, &nab)
if reconcileRequeue {
return ctrl.Result{Requeue: true, RequeueAfter: requeueTimeSeconds * time.Second}, reconcileErr
Expand Down Expand Up @@ -118,6 +126,52 @@ func (r *NonAdminBackupReconciler) Reconcile(ctx context.Context, req ctrl.Reque
return ctrl.Result{}, nil
}

// DeleteVeleroBackup deletes original Velero Backup object when the deleteVeleroBackup Spec is set to true
//
// Parameters:
//
// ctx: Context for the request.
// logrLogger: Logger instance for logging messages.
// nab: Pointer to the NonAdminBackup object.
func (r *NonAdminBackupReconciler) DeleteVeleroBackup(ctx context.Context, logrLogger logr.Logger, nab *nacv1alpha1.NonAdminBackup) (exitReconcile bool, errorReconcile error) {
rLog := log.FromContext(ctx)
logger := logrLogger.WithValues("DeleteVeleroBackup", nab.Namespace)

// Delete Velero Backup if the deleteVeleroBackup was set to true
if nab.Spec.DeleteVeleroBackup != nil && *nab.Spec.DeleteVeleroBackup {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we check here NAB status?

for example, I create a NAB with delete: true, it will be immediately deleted, right? should we only delete finished NABs?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think so, if someone want to abuse spec then it's free to remove object that has been just created. Similarly to:

mkdir /tmp/test && rmdir /tmp/test

Nobody stops anyone from doing above

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with @mateusoliveira43 we should only delete VeleroBackup that are in terminal phase. Currently, I don't think we can delete an in-progress VeleroBackup

deleted, deleteErr := function.DeleteVeleroBackup(ctx, r.Client, rLog, nab)
if !deleted {
// The Velero Object was not found
// Remove NonAdminBackup
// There is potential that Velero Sync controller will recreate Velero Backup
// if one exists in the s3 storage, and this will as well cause NonAdminBackup
// to be recreated by the NAC sync controller, but at this stage we have current
// proper state reflected.
logger.V(1).Info(fmt.Sprintf("Velero Backup not deleted for NAB: %s", nab.Name))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would be interesting to add reason here?

something like "Velero Backup %s, referenced in for %s, was not found in %s namespace and was not deleted"

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so this will cause NAB to be in phase Backing Off, and all the info about the referenced velero backup is in the status of that NAB, do we want to duplicate this info within debug logs?

}

if deleteErr == nil {
// Delete NonAdminBackup object
deleteErrNab := function.DeleteNonAdminBackup(ctx, r.Client, rLog, nab)
return true, deleteErrNab
}
// There was an error while trying to delete Velero Backup
// Set the Phase to BackingOff and Deletion condition to False
_, errUpdateCondition := function.UpdateNonAdminBackupCondition(ctx, r.Client, logger, nab, nacv1alpha1.NonAdminConditionDeletion, metav1.ConditionFalse, "BackupDeleted", "Unable to delete origin Velero Backup")
if errUpdateCondition != nil {
logger.Error(errUpdateCondition, "Unable to set Deletion Condition: True", nameField, nab.Name, constant.NameSpaceString, nab.Namespace)
}
_, errUpdatePhase := function.UpdateNonAdminPhase(ctx, r.Client, logger, nab, nacv1alpha1.NonAdminBackupPhaseBackingOff)

if errUpdatePhase != nil {
logger.Error(errUpdatePhase, "Unable to set NonAdminBackup Phase: BackingOff", nameField, nab.Name, constant.NameSpaceString, nab.Namespace)
}
return true, deleteErr
}

return false, nil
}

// InitNonAdminBackup sets the New Phase on a NonAdminBackup object if it is not already set.
//
// Parameters:
Expand Down Expand Up @@ -201,7 +255,7 @@ func (r *NonAdminBackupReconciler) ValidateVeleroBackupSpec(ctx context.Context,
return true, false, err
}

updatedStatus, errUpdateStatus := function.UpdateNonAdminBackupCondition(ctx, r.Client, logger, nab, nacv1alpha1.NonAdminConditionAccepted, metav1.ConditionTrue, "BackupAccepted", "backup accepted")
updatedStatus, errUpdateStatus := function.UpdateNonAdminBackupCondition(ctx, r.Client, logger, nab, nacv1alpha1.NonAdminConditionAccepted, metav1.ConditionTrue, "BackupAccepted", "Backup accepted")
if errUpdateStatus != nil {
logger.Error(errUpdateStatus, "Unable to set BackupAccepted Condition: True", nameField, nab.Name, constant.NameSpaceString, nab.Namespace)
return true, false, errUpdateStatus
Expand Down
Loading