Adjust the Status of NonAdminBackup #13

Merged: 4 commits, merged Apr 25, 2024


@mpryc (Collaborator) commented Mar 6, 2024

Moves Status outside of Spec and adjusts it to reflect the
Velero Backup status, as well as an additional Status when the
Spec within NonAdminBackup is not defined.

To test:
Scenario 1:

  1. Create a NonAdminBackup without a BackupSpec, for example:
apiVersion: nac.oadp.openshift.io/v1alpha1
kind: NonAdminBackup
metadata:
  name: example
  namespace: nacproject
spec: {}
  2. Check that the Status contains the expected fields:
apiVersion: nac.oadp.openshift.io/v1alpha1
kind: NonAdminBackup
metadata:
  creationTimestamp: '2024-03-06T12:53:27Z'
  generation: 1
  managedFields:
    - apiVersion: nac.oadp.openshift.io/v1alpha1
      fieldsType: FieldsV1
      fieldsV1:
        'f:spec': {}
      manager: Mozilla
      operation: Update
      time: '2024-03-06T12:53:27Z'
    - apiVersion: nac.oadp.openshift.io/v1alpha1
      fieldsType: FieldsV1
      fieldsV1:
        'f:status':
          .: {}
          'f:failureReason': {}
          'f:phase': {}
      manager: main
      operation: Update
      subresource: status
      time: '2024-03-06T12:53:44Z'
  name: example
  namespace: nacproject
  resourceVersion: '37684386'
  uid: bd21d62c-3e86-49de-a2e4-23d42db94ddd
spec: {}
status:
  failureReason: NonAdminBackup CR does not contain valid VeleroBackupSpec
  phase: FailedValidation
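Scenario 1 expects the controller to reject a NonAdminBackup whose spec has no backupSpec. A minimal, self-contained sketch of that validation step follows; NonAdminBackup, BackupSpec and validateSpec are simplified, hypothetical stand-ins for the real API types and controller code, not the implementation in this PR.

```go
package main

// BackupSpec is a hypothetical stand-in for velerov1api.BackupSpec.
type BackupSpec struct {
	StorageLocation string
}

// NonAdminBackup is a trimmed-down, hypothetical model of the CR.
type NonAdminBackup struct {
	Spec struct {
		BackupSpec *BackupSpec
	}
	Status struct {
		Phase         string
		FailureReason string
	}
}

// validateSpec marks the CR as FailedValidation when backupSpec is absent
// and reports whether the spec can be used to create a Velero Backup.
func validateSpec(nab *NonAdminBackup) bool {
	if nab.Spec.BackupSpec == nil {
		nab.Status.Phase = "FailedValidation"
		nab.Status.FailureReason = "NonAdminBackup CR does not contain valid VeleroBackupSpec"
		return false
	}
	return true
}
```

In Scenario 2 below, backupSpec: {} decodes to a non-nil (empty) pointer, so a check like this one would pass.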

Scenario 2:

  1. Make sure to include the backupSpec:
apiVersion: nac.oadp.openshift.io/v1alpha1
kind: NonAdminBackup
metadata:
  name: example
  namespace: nacproject
spec:
  backupSpec: {}
  2. Save the object and check that the Status was populated from the Velero Backup status. The phase may be Failed, but it should reflect what the original Velero Backup object contains:
apiVersion: nac.oadp.openshift.io/v1alpha1
kind: NonAdminBackup
metadata:
  creationTimestamp: '2024-03-06T12:55:55Z'
  generation: 2
  managedFields:
    - apiVersion: nac.oadp.openshift.io/v1alpha1
      fieldsType: FieldsV1
      fieldsV1:
        'f:spec':
          .: {}
          'f:backupSpec': {}
      manager: Mozilla
      operation: Update
      time: '2024-03-06T12:55:55Z'
    - apiVersion: nac.oadp.openshift.io/v1alpha1
      fieldsType: FieldsV1
      fieldsV1:
        'f:status':
          .: {}
          'f:expiration': {}
          'f:failureReason': {}
          'f:formatVersion': {}
          'f:phase': {}
          'f:startTimestamp': {}
          'f:version': {}
      manager: main
      operation: Update
      subresource: status
      time: '2024-03-06T12:55:55Z'
    - apiVersion: nac.oadp.openshift.io/v1alpha1
      fieldsType: FieldsV1
      fieldsV1:
        'f:spec':
          'f:backupSpec':
            'f:itemOperationTimeout': {}
            'f:metadata': {}
            'f:volumeSnapshotLocations': {}
            'f:snapshotMoveData': {}
            'f:defaultVolumesToFsBackup': {}
            'f:ttl': {}
            'f:csiSnapshotTimeout': {}
            'f:storageLocation': {}
            'f:hooks': {}
      manager: manager
      operation: Update
      time: '2024-03-06T12:55:55Z'
  name: example
  namespace: nacproject
  resourceVersion: '37685280'
  uid: 20b1d693-98e0-4adb-bc76-48a531daa421
spec:
  backupSpec:
    volumeSnapshotLocations:
      - velero-sample-1
    defaultVolumesToFsBackup: false
    csiSnapshotTimeout: 10m0s
    ttl: 720h0m0s
    itemOperationTimeout: 4h0m0s
    metadata: {}
    storageLocation: velero-sample-1
    hooks: {}
    snapshotMoveData: false
status:
  expiration: '2024-04-05T12:55:55Z'
  failureReason: >-
    unable to get credentials: unable to get key for secret: Secret
    "cloud-credentials" not found
  formatVersion: 1.1.0
  phase: Failed
  startTimestamp: '2024-03-06T12:55:55Z'
  version: 1

@mateusoliveira43 (Contributor):

This one has a lot of duplication with #12.

Let's merge that one, then the linters one, then this one.

@mpryc (Collaborator Author) commented Mar 6, 2024

> This one has a lot of duplication with #12.
>
> Let's merge that one, then the linters one, then this one.

Yes, it does. Since #12 was not merged yet, I based this PR on that branch to reduce conflicts.

@mpryc force-pushed the move_status_info branch 2 times, most recently from 70434b7 to 963db7d on March 6, 2024 13:24
@@ -32,14 +32,12 @@ type NonAdminBackupSpec struct {

// BackupSpec defines the specification for a Velero backup.
BackupSpec *velerov1api.BackupSpec `json:"backupSpec,omitempty"`

// BackupStatus captures the current status of a Velero backup.
BackupStatus *velerov1api.BackupStatus `json:"backupStatus,omitempty"`
}

// NonAdminBackupStatus defines the observed state of NonAdminBackup
type NonAdminBackupStatus struct {
Member:

The non-admin backup status can carry more detail than the Velero backup status, e.g. the validation status or the reconciliation status of the non-admin backup CR itself.

Collaborator Author:

Modified. We now have (two examples from this PR):

spec: {}
status:
  failureReason: NonAdminBackup CR does not contain valid VeleroBackupSpec
  phase: FailedValidation

spec:
  backupSpec:
    volumeSnapshotLocations:
      - velero-sample-1
    defaultVolumesToFsBackup: false
    csiSnapshotTimeout: 10m0s
    ttl: 720h0m0s
    itemOperationTimeout: 4h0m0s
    metadata: {}
    storageLocation: velero-sample-1
    hooks: {}
    snapshotMoveData: false
status:
  backupStatus:
    expiration: '2024-04-05T16:46:15Z'
    failureReason: >-
      unable to get credentials: unable to get key for secret: Secret
      "cloud-credentials" not found
    formatVersion: 1.1.0
    phase: Failed
    startTimestamp: '2024-03-06T16:46:15Z'
    version: 1
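The reworked layout keeps the NonAdminBackup's own fields at the top of status and nests a copy of the Velero Backup status under backupStatus. A minimal sketch of that shape follows; VeleroBackupStatus is a hypothetical subset of velerov1api.BackupStatus, mirrorVeleroStatus is an illustrative helper, and encoding/json is used only to show the resulting field names.

```go
package main

import "encoding/json"

// VeleroBackupStatus is a hypothetical subset of velerov1api.BackupStatus.
type VeleroBackupStatus struct {
	Phase         string `json:"phase,omitempty"`
	FailureReason string `json:"failureReason,omitempty"`
	Version       int    `json:"version,omitempty"`
}

// NonAdminBackupStatus holds the NonAdminBackup's own state plus
// a copy of the Velero Backup status it mirrors.
type NonAdminBackupStatus struct {
	Phase         string              `json:"phase,omitempty"`
	FailureReason string              `json:"failureReason,omitempty"`
	BackupStatus  *VeleroBackupStatus `json:"backupStatus,omitempty"`
}

// mirrorVeleroStatus stores the Velero status under backupStatus.
// src is passed by value, so the stored pointer refers to an independent copy.
func mirrorVeleroStatus(dst *NonAdminBackupStatus, src VeleroBackupStatus) {
	dst.BackupStatus = &src
}

// statusJSON renders the status the way it would appear in the CR.
func statusJSON(s NonAdminBackupStatus) string {
	b, err := json.Marshal(s)
	if err != nil {
		return ""
	}
	return string(b)
}
```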

@mpryc (Collaborator Author) commented Mar 6, 2024

/hold Need to move the Velero Backup Status update into its own section.

@mpryc force-pushed the move_status_info branch from 963db7d to 56dc88b on March 6, 2024 19:39

@mpryc (Collaborator Author) commented Mar 6, 2024

/unhold

// BackupStatus captures the current status of a Velero backup.

// +optional
Phase velerov1api.BackupPhase `json:"phase,omitempty"`
Member:

This phase should be the Non-Admin Backup CR reconciliation phase, right?

Member:

Or would a better option be to use conditions, just like we do for the DPA CR status?

Contributor:

Yeah, this seems like duplicated information. What is your goal here?

Collaborator Author:

Yes; however, for this implementation I've used the same phases as Velero: https://github.com/vmware-tanzu/velero/blob/4d548612d45d5eb0c61b2473286127dfa6e3c39d/pkg/apis/velero/v1/backup_types.go#L290-L348

I think velerov1api.BackupPhase is enough for us at the moment. If we decide we need more states, we can always create our own Non Admin Backup Phase.
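If more states are needed later, a dedicated phase type is a small change. A hypothetical sketch follows: the values New, BackingOff and Created are taken from the phases mentioned later in this review, and isKnownPhase is an illustrative helper, not code from the PR. The kubebuilder enum marker only takes effect if a CRD field uses this type.

```go
package main

// NonAdminBackupPhase is the NonAdminBackup's own phase,
// independent of velerov1api.BackupPhase.
// +kubebuilder:validation:Enum=New;BackingOff;Created
type NonAdminBackupPhase string

const (
	NonAdminBackupPhaseNew        NonAdminBackupPhase = "New"
	NonAdminBackupPhaseBackingOff NonAdminBackupPhase = "BackingOff"
	NonAdminBackupPhaseCreated    NonAdminBackupPhase = "Created"
)

// isKnownPhase reports whether p is one of the declared phases
// (hypothetical helper, e.g. for defensive checks in the controller).
func isKnownPhase(p NonAdminBackupPhase) bool {
	switch p {
	case NonAdminBackupPhaseNew, NonAdminBackupPhaseBackingOff, NonAdminBackupPhaseCreated:
		return true
	}
	return false
}
```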

Collaborator Author:

@mateusoliveira43 The goal was to give the Non Admin Backup its own Phase, separate from the BackupStatus Phase. I've used velerov1api.BackupPhase, but we can change it to conditions or create our own NonAdminPhase type for the Non Admin Backup.

Collaborator Author:

@shubham-pampattiwar I 100% agree with that, but isn't the current list of phases from velerov1api.BackupPhase enough? The idea was to use one of those as the overall status, but we can also have our own. The Velero phases are:

	BackupPhaseNew BackupPhase = "New"
	BackupPhaseFailedValidation BackupPhase = "FailedValidation"
	BackupPhaseInProgress BackupPhase = "InProgress"
	BackupPhaseWaitingForPluginOperations BackupPhase = "WaitingForPluginOperations"
	BackupPhaseWaitingForPluginOperationsPartiallyFailed BackupPhase = "WaitingForPluginOperationsPartiallyFailed"
	BackupPhaseFinalizing BackupPhase = "Finalizing"
	BackupPhaseFinalizingPartiallyFailed BackupPhase = "FinalizingPartiallyFailed"
	BackupPhaseCompleted BackupPhase = "Completed"
	BackupPhasePartiallyFailed BackupPhase = "PartiallyFailed"
	BackupPhaseFailed BackupPhase = "Failed"
	BackupPhaseDeleting BackupPhase = "Deleting"

Collaborator Author:

Anyway, I will change it :)

Collaborator Author:

Done.

Member:

@mpryc Let me elaborate and give you an example of why we need independent conditions for the non-admin backup CR:

The Velero BackupStatus tells us about the reconciliation status of the Velero Backup CR, but how would we know about the reconciliation of the non-admin backup CR? This is where independent conditions on the non-admin backup CR come into the picture. For example, if the non-admin backup controller failed to create the Velero backup and got an error, how would the user know that the request failed? In that scenario, the non-admin backup controller should add an error condition to the non-admin backup CR status, with the error details, so that the user has transparency into what is actually happening with the non-admin backup CR. This is in general how controllers work: the spec is what the user wants, and the status reports how the CR is doing, whether it is performing the intended tasks, and whether the actions requested via the spec are done.
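The reviewer's point can be sketched as conditions kept on the NonAdminBackup itself. Real code would use metav1.Condition together with apimeta.SetStatusCondition from k8s.io/apimachinery; to keep this self-contained, Condition and setCondition below are simplified stand-ins that mimic the upsert-by-Type behavior (they omit LastTransitionTime and ObservedGeneration), and the reason strings are hypothetical.

```go
package main

// Condition is a simplified stand-in for metav1.Condition.
type Condition struct {
	Type    string // e.g. "Accepted" or "Queued"
	Status  string // "True", "False" or "Unknown"
	Reason  string // machine-readable CamelCase reason
	Message string // human-readable detail, e.g. the error text
}

// setCondition inserts c, or replaces the existing condition with the
// same Type, so each Type appears at most once in the slice.
func setCondition(conds *[]Condition, c Condition) {
	for i := range *conds {
		if (*conds)[i].Type == c.Type {
			(*conds)[i] = c
			return
		}
	}
	*conds = append(*conds, c)
}
```

With this shape, a failure to create the Velero Backup could be reported as, for example, Type "Queued" with Status "False" and the error text in Message.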

Collaborator Author:

Added Conditions as well.

@mpryc force-pushed the move_status_info branch 3 times, most recently from 3d67dff to 068dbc6 on March 7, 2024 07:23
}

// NonAdminBackupStatus defines the observed state of NonAdminBackup
type NonAdminBackupStatus struct {
Conditions []metav1.Condition `json:"conditions,omitempty"`

// +optional
Phase NonAdminBackupPhase `json:"phase,omitempty"`
Member:

The Phase and FailureReason might be redundant here, because conditions consist of similar fields: Condition.Type, Condition.Reason and Condition.Message.
Reference: https://pkg.go.dev/k8s.io/apimachinery/pkg/apis/meta/v1#Condition

@mpryc (Collaborator Author) commented Apr 15, 2024

Rebased. It now follows the design from the second scenario described in #23.

@mpryc force-pushed the move_status_info branch 4 times, most recently from cedc613 to 8073768 on April 15, 2024 17:04
@mpryc self-assigned this Apr 15, 2024

@mateusoliveira43 (Contributor):

Is this PR blocked by #23?


// NonAdminCondition values provide more detailed information supporting the NonAdminBackupPhase state.
// +kubebuilder:validation:Enum=BackupAccepted;BackupQueued
type NonAdminCondition string
Contributor:

I do not like declaring things in a file where they are not used. I would move this to the file that uses it.

The kubebuilder annotation here does not do anything, right?

Collaborator Author:

I'm not sure what this comment is about; could you give me more information?

Contributor:

Check the CRD YAML: it does not mention NonAdminCondition, right? Then // +kubebuilder:validation:Enum=BackupAccepted;BackupQueued does not do anything.

This type is declared here but never used here, only by other files. If it is used by more than one file, I would move it to the constants file; otherwise, to the file that uses this type.

Collaborator Author:

Ah, right. I moved it from spec to status, and now it's not used by kubebuilder.

Contributor:

This is still never used in the file where it was declared, right ❓

Collaborator Author:

It is used in this file a few lines below, as the type of these constants:

const (
	NonAdminConditionAccepted NonAdminCondition = "Accepted"
	NonAdminConditionQueued   NonAdminCondition = "Queued"
)

Contributor:

The NonAdminCondition type is not used in this file (it is only declared, together with its allowed values); it should be moved to another file.

Collaborator Author:

I think it is pretty standard to keep a type together with the constants of that type; see, for example, Velero:

https://github.com/vmware-tanzu/velero/blob/main/pkg/apis/velero/v1/backupstoragelocation_types.go#L168

Where would you move it?

Contributor:

It is used by the functions and the NAB controller, so it belongs in the common folder.

We can add it to constants directly, or create a types file in there first.

Collaborator Author:

Here you go.

// No change, no need to update
logger.V(1).Info("NonAdminBackup status and spec is already up to date")
return nil
logger.V(1).Info("NonAdminBackup Phase is already up to date")

Review comment:

NIT: This should not be an Info log; it can be a debug log.

currentCondition := apimeta.FindStatusCondition(nab.Status.Conditions, string(condition))
if currentCondition != nil && currentCondition.Status == conditionStatus {
// Condition is already set to the desired status, no need to update
logger.V(1).Info(fmt.Sprintf("NonAdminBackup Condition is already set to: %s", condition))

Review comment:

NIT: Unsure if this should be Info; it might flood the logs during reconcile cycles.
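One way to keep the "already set" path quiet is to make the condition update report whether anything changed, so the caller only issues a status update (and an Info-level log) when it returns true, and logs the no-op case at debug verbosity. A self-contained sketch: Condition is a simplified stand-in for metav1.Condition, findCondition mimics apimeta.FindStatusCondition, and ensureCondition and the reason strings are hypothetical.

```go
package main

// Condition is a simplified stand-in for metav1.Condition.
type Condition struct {
	Type   string
	Status string
	Reason string
}

// findCondition mimics apimeta.FindStatusCondition: it returns a pointer
// to the condition with the given Type, or nil if there is none.
func findCondition(conds []Condition, condType string) *Condition {
	for i := range conds {
		if conds[i].Type == condType {
			return &conds[i]
		}
	}
	return nil
}

// ensureCondition sets the condition only when needed and reports whether
// the slice changed. The pointer from findCondition aliases the same backing
// array as *conds, so updating it modifies the caller's slice in place.
func ensureCondition(conds *[]Condition, condType, status, reason string) bool {
	if cur := findCondition(*conds, condType); cur != nil {
		if cur.Status == status && cur.Reason == reason {
			return false // nothing to do; log at debug level at most
		}
		cur.Status, cur.Reason = status, reason
		return true
	}
	*conds = append(*conds, Condition{Type: condType, Status: status, Reason: reason})
	return true
}
```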

},
)

logger.V(1).Info(fmt.Sprintf("NonAdminBackup Condition to: %s", condition))

Review comment:

NIT: Again, I think this should not be Info level.

}

// Check if BackupSpec needs to be updated
if !reflect.DeepEqual(nab.Spec.BackupSpec, &veleroBackup.Spec) {

Review comment:

I am not sure about updating the NAB backup spec from the Velero backup spec; it seems opposite to what we want to do in our NAC feature.
The NAC spec should be cascaded to the Velero backup spec, not the other way round.
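The direction the reviewer describes, with the NonAdminBackup spec as the source of truth, can be sketched as follows. BackupSpec is a hypothetical subset of velerov1api.BackupSpec and syncVeleroSpec is an illustrative helper; it only ever writes to the Velero side. Real code would use the generated DeepCopy methods instead of the manual slice clone.

```go
package main

import "reflect"

// BackupSpec is a hypothetical subset of velerov1api.BackupSpec.
type BackupSpec struct {
	StorageLocation         string
	VolumeSnapshotLocations []string
	SnapshotMoveData        bool
}

// syncVeleroSpec copies the user-owned NonAdminBackup spec onto the Velero
// Backup spec and reports whether the Velero object needs an update.
// The NonAdminBackup spec is never modified.
func syncVeleroSpec(nabSpec *BackupSpec, veleroSpec *BackupSpec) bool {
	if nabSpec == nil || reflect.DeepEqual(*nabSpec, *veleroSpec) {
		return false
	}
	updated := *nabSpec
	// Clone the slice (keeping nil vs. empty) so the two specs never share
	// a backing array.
	updated.VolumeSnapshotLocations = append(nabSpec.VolumeSnapshotLocations[:0:0], nabSpec.VolumeSnapshotLocations...)
	*veleroSpec = updated
	return true
}
```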

@@ -69,44 +72,98 @@ const (
func (r *NonAdminBackupReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
r.Log = log.FromContext(ctx)
logger := r.Log.WithValues("NonAdminBackup", req.NamespacedName)
logger.V(1).Info(">>> Reconcile NonAdminBackup - loop start")

Review comment:

NIT: Remove this log.


// Resource version

Review comment:

Can we remove these comments?

// Bail out when the Non Admin Backup reconcile was triggered, when the NAB got deleted
// Reconcile loop was triggered when Velero Backup object got updated and NAB isn't there
if err != nil && apierrors.IsNotFound(err) {
logger.V(1).Info("Deleted NonAdminBackup CR", nameField, req.Name, constant.NameSpaceString, req.Namespace)

Review comment:

Unsure about this "Deleted" log. It implies that the CR triggered the reconcile but was deleted between the reconcile trigger and the CR Get call. Maybe we should just say the CR was not found, rather than deleted.

if err != nil {
logger.Error(err, "Error while performing NonAdminBackup reconcile")
errMsg := "NonAdminBackup CR does not contain valid BackupSpec"

Review comment:

This message string should come from the err object itself.


// Phase: BackingOff
// BackupAccepted: False
// BackupQueued: False # already set

Review comment:

Where are you setting BackupQueued to False? Maybe I am missing something :(

@@ -48,7 +50,8 @@ type NonAdminBackupReconciler struct {
}

const (
nameField = "Name"
nameField = "Name"

Review comment:

Why are we using this nameField everywhere in the logs?

logger.Error(errUpdate, "Unable to set NonAdminBackup Phase: BackingOff", nameField, req.Name, constant.NameSpaceString, req.Namespace)
return ctrl.Result{}, errUpdate
} else if updatedStatus {
// We do not requeue as the State is BackingOff

Review comment:

++ (very very important)

// We do not requeue as the State is BackingOff
return ctrl.Result{}, nil
}

Review comment:

Add a comment saying that you are setting BackupAccepted: False.

if err != nil {
logger.Error(err, "Error while performing NonAdminBackup reconcile")
errMsg := "NonAdminBackup CR does not contain valid BackupSpec"

Review comment:

The main tasks of this whole code block seem to be:

  • Set Phase: BackingOff
  • Set Condition BackupAccepted: False
  • Set Condition BackupQueued: False (seems to be missing)

So why not combine these tasks into one function, like SetPhaseAndCondition? This would make the main reconciler more readable and easier to debug. WDYT?
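The proposal can be sketched as a single helper that applies the phase and all conditions in one step and reports whether anything changed, so the reconciler issues at most one status update per transition. setPhaseAndConditions, the reduced Status and Condition types, and the reason strings are hypothetical; real code would use metav1.Condition and apimeta.SetStatusCondition.

```go
package main

// Condition is a simplified stand-in for metav1.Condition.
type Condition struct {
	Type   string
	Status string
	Reason string
}

// Status is a reduced model of NonAdminBackupStatus.
type Status struct {
	Phase      string
	Conditions []Condition
}

// setPhaseAndConditions sets the phase and upserts every given condition
// by Type, returning true if the status was modified.
func setPhaseAndConditions(s *Status, phase string, conds ...Condition) bool {
	changed := false
	if s.Phase != phase {
		s.Phase = phase
		changed = true
	}
	for _, c := range conds {
		found := false
		for i := range s.Conditions {
			if s.Conditions[i].Type == c.Type {
				found = true
				if s.Conditions[i] != c {
					s.Conditions[i] = c
					changed = true
				}
				break
			}
		}
		if !found {
			s.Conditions = append(s.Conditions, c)
			changed = true
		}
	}
	return changed
}
```

For the BackingOff branch discussed above, a single call would set Phase BackingOff together with Accepted and Queued both False.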

@@ -48,7 +50,8 @@ type NonAdminBackupReconciler struct {
}

const (
nameField = "Name"
nameField = "Name"
requeueTimeSeconds = 10

Review comment:

Isn't 10 seconds a lot? Maybe 5 seconds? WDYT?


veleroBackupName := function.GenerateVeleroBackupName(nab.Namespace, nab.Name)

if veleroBackupName == constant.EmptyString {

Review comment:

Lines 154 to 214 seem like a task we want the controller to perform only if NAB.Status has the New phase and the Backup Accepted condition. Shouldn't we add conditionals on these two things before performing the task? Also, IMO, extracting this whole Create Velero Backup task into a function would be better.
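A guard like the one suggested could look as follows. shouldCreateVeleroBackup, isConditionTrue and the reduced types are hypothetical; isConditionTrue mimics apimeta.IsStatusConditionTrue, and the phase and condition names follow the ones used in this review.

```go
package main

// Condition is a simplified stand-in for metav1.Condition.
type Condition struct {
	Type   string
	Status string
}

// Status is a reduced model of NonAdminBackupStatus.
type Status struct {
	Phase      string
	Conditions []Condition
}

// isConditionTrue reports whether the condition of the given Type exists
// and has Status "True".
func isConditionTrue(conds []Condition, condType string) bool {
	for _, c := range conds {
		if c.Type == condType {
			return c.Status == "True"
		}
	}
	return false
}

// shouldCreateVeleroBackup gates the "create Velero Backup" step: only a
// NonAdminBackup in the New phase that has been accepted proceeds.
func shouldCreateVeleroBackup(s Status) bool {
	return s.Phase == "New" && isConditionTrue(s.Conditions, "Accepted")
}
```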

logger.Info("Backup successfully created", nameField, veleroBackupName)

logger.Info("NonAdminBackup Reconcile loop end")
logger.V(1).Info(">>> Reconcile NonAdminBackup - loop end")

Review comment:

NIT: Remove this log, or maybe make it a debug log.

@@ -143,10 +211,28 @@ func (r *NonAdminBackupReconciler) Reconcile(ctx context.Context, req ctrl.Reque
logger.Error(err, "Failed to create backup", nameField, veleroBackupName)
return ctrl.Result{}, err
}
logger.Info("VeleroBackup successfully created", nameField, veleroBackupName)

// Phase: Created

Review comment:

Similar comment to this one: https://github.com/migtools/oadp-non-admin/pull/13/files#r1576897507
IMHO, adopting a ReconcileBatch pattern, as in the OADP Operator, would be pretty helpful here. YMMV though.
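A ReconcileBatch-style reconciler runs a list of steps in order and stops at the first step that asks to stop or fails. The sketch below is written in that spirit but is not the OADP Operator's code: reconcileStep and reconcileBatch are hypothetical names, and the logger argument the operator passes to each step is omitted to keep it self-contained.

```go
package main

// reconcileStep is a hypothetical step signature: it returns whether the
// reconciler should continue with the next step, and any error.
type reconcileStep func() (bool, error)

// reconcileBatch runs steps in order and stops at the first step that
// returns false or an error, propagating that result to the caller.
func reconcileBatch(steps ...reconcileStep) (bool, error) {
	for _, step := range steps {
		if cont, err := step(); !cont || err != nil {
			return cont, err
		}
	}
	return true, nil
}
```

In the NonAdminBackup controller, the steps could be, for example, validating the spec, setting the phase and conditions, and creating the Velero Backup, each as its own small function.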

@mateusoliveira43 (Contributor) left a comment:

Do not change the OADP checks.

Contributor:

The api folder files are created by kubebuilder; I think it makes more sense for this to be in internal/common/type/type.go.

// NonAdminBackupSpec defines the desired state of NonAdminBackup
type NonAdminBackupSpec struct {
// https://github.com/vmware-tanzu/velero/blob/main/pkg/apis/velero/v1/backup_types.go

// BackupSpec defines the specification for a Velero backup.
Contributor:

I think I saw a check in the code for whether this is nil.

Would adding // +kubebuilder:validation:Required be valid here ❓

@shubham-pampattiwar (Member) left a comment:

Acking this one. As discussed, a follow-up PR will be posted to resolve some of these points. @mpryc Thank you for working on this one!

openshift-ci bot commented Apr 25, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: mpryc, shubham-pampattiwar

Needs approval from an approver in each of these files:
  • OWNERS [mpryc,shubham-pampattiwar]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@shubham-pampattiwar (Member):

Follow-up PR reference issue: #52

@shubham-pampattiwar (Member):

/lgtm

openshift-ci bot added the lgtm label Apr 25, 2024
openshift-merge-bot merged commit c1983f9 into migtools:master Apr 25, 2024
10 checks passed
Project status: Merged / Ready for build

5 participants