Improve DPA reconcile error messages #1536

Open
mateusoliveira43 opened this issue Sep 30, 2024 · 2 comments
Labels
lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness.

Comments

@mateusoliveira43
Contributor

Problem

It is difficult for users (and developers) to debug OADP errors without gathering more context first (studying the DPA, for example).

Example

Some months ago, a new user was unable to set a custom credential name on a BSL (managed by the DPA), because the DPA would fail to reconcile with the generic error

Secret "cloud-credentials" not found

After some hours, the OADP team asked for the user's DPA, and the problem was found. A VSL was also defined in the DPA (probably copied and pasted from an example DPA)

snapshotLocations:
  - velero:
      provider: aws
      config:
        region: us-west-2
        profile: "default"

This problem could have been resolved immediately if the error message had been more informative. For example

VolumeSnapshotLocation <VSL-name> defined in DPA spec.snapshotLocations[<index>] is invalid: 
   create Secret "cloud-credentials" in namespace <namespace-name> or change Secret name in DPA spec.snapshotLocations[<index>].velero.credential.name
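A message in that shape could be produced by a small helper. The following is only a sketch: missingSecretError and its parameters are hypothetical names invented for illustration, not existing OADP code, and the namespace and VSL name used in main are placeholder values.

```go
package main

import "fmt"

// missingSecretError is a hypothetical helper (not part of OADP) that builds
// the more informative message proposed above. It names the offending
// snapshotLocations entry and tells the user how to fix it.
func missingSecretError(vslName string, index int, secretName, namespace string) error {
	return fmt.Errorf(
		"VolumeSnapshotLocation %s defined in DPA spec.snapshotLocations[%d] is invalid: "+
			"create Secret %q in namespace %s or change Secret name in DPA spec.snapshotLocations[%d].velero.credential.name",
		vslName, index, secretName, namespace, index,
	)
}

func main() {
	// Placeholder values, chosen only to show the resulting message.
	fmt.Println(missingSecretError("example-vsl", 0, "cloud-credentials", "openshift-adp"))
}
```

The point of the sketch is that the message carries the resource kind, its position in the DPA spec, and a concrete remediation, so the user does not need to share the whole DPA to find the cause.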

Solution

The solution would be to rewrite OADP error messages to be more informative.

A suggested first step is to assert on error messages in the tests that are already written BUT do NOT check error messages.

Example

This test

func TestDPAReconciler_ValidateBackupStorageLocations(t *testing.T) {

only checks whether an error occurred, without testing the error message.

This is very WRONG. Some of these tests are broken, because their error messages do not relate to what is being tested. But they are not failing in CI, as we only check whether an error occurred, and not which error occurred.

Maybe this command can help find other tests like this

grep -Iinr 'wantErr:' . --include=\*_test.go
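To illustrate the kind of check being asked for, here is a minimal sketch of a table-driven case that compares the error message as well as the wantErr flag. Everything in it is assumed for illustration: validate stands in for the real reconciler validation, and testCase, errorMessage and errorMismatch are hypothetical names, not the existing OADP test code.

```go
package main

import (
	"errors"
	"fmt"
)

// testCase mirrors a table-driven test entry. errorMessage is a hypothetical
// field that stores the expected message next to wantErr.
type testCase struct {
	name         string
	secretExists bool
	wantErr      bool
	errorMessage string
}

// validate is a stand-in for the validation under test (assumption).
func validate(secretExists bool) error {
	if !secretExists {
		return errors.New(`Secret "cloud-credentials" not found`)
	}
	return nil
}

// errorMismatch returns a description of the problem, or "" when the error
// matches the expectation, including its message.
func errorMismatch(err error, wantErr bool, wantMessage string) string {
	switch {
	case err == nil && wantErr:
		return fmt.Sprintf("expected error %q, got nil", wantMessage)
	case err != nil && !wantErr:
		return fmt.Sprintf("unexpected error: %v", err)
	case err != nil && err.Error() != wantMessage:
		return fmt.Sprintf("error = %q, want %q", err.Error(), wantMessage)
	}
	return ""
}

func main() {
	tests := []testCase{
		{name: "secret exists", secretExists: true},
		{name: "secret missing", wantErr: true, errorMessage: `Secret "cloud-credentials" not found`},
	}
	for _, tt := range tests {
		if msg := errorMismatch(validate(tt.secretExists), tt.wantErr, tt.errorMessage); msg != "" {
			fmt.Printf("%s: FAIL: %s\n", tt.name, msg)
			continue
		}
		fmt.Printf("%s: ok\n", tt.name)
	}
}
```

In a real _test.go file the loop body would live inside t.Run and report a mismatch with t.Error. A test whose error is unrelated to what it claims to cover would then fail in CI instead of passing silently.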
@openshift-bot

Issues go stale after 90d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle stale

@openshift-ci openshift-ci bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Dec 30, 2024
@kaovilai
Member

kaovilai commented Jan 9, 2025

/lifecycle frozen

@openshift-ci openshift-ci bot added lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Jan 9, 2025