This repository has been archived by the owner on Dec 3, 2024. It is now read-only.

[sidecar] Failure Events are still present after status of Bucket Ready: true during Bucket creation - Title #156

Open
asundriya opened this issue Oct 25, 2024 · 3 comments

Comments


asundriya commented Oct 25, 2024

What happened:
Warning events raised for a Bucket creation failure are not cleared after Bucket creation later succeeds.
What you expected to happen:
Failure events should be cleared once the bucket creation issue is rectified.

How to reproduce this bug (as minimally and precisely as possible):

  1. Induce an error during bucket creation.
  2. Bucket creation fails with status **Bucket Ready: False**, and a warning event is generated:
    

kubectl describe bucket bc1a2a39683-f400-4e1d-9a7a-05e3ec85abc0
Name: bc1a2a39683-f400-4e1d-9a7a-05e3ec85abc0
Namespace:
Labels:
Annotations:
API Version: objectstorage.k8s.io/v1alpha1
Kind: Bucket
Metadata:
  Creation Timestamp: 2024-10-25T08:48:31Z
  Generation: 1
  Resource Version: 509042
  UID: 6ae2c16d-2b67-45b5-bcd9-bd9585d5f63b
Spec:
  Bucket Claim:
    Name: arvclaim
    Namespace: default
    UID: a2a39683-f400-4e1d-9a7a-05e3ec85abc0
  Bucket Class Name: bc1
  Deletion Policy: Delete
  Driver Name: cosi.XXXX.com
  Parameters:
    Bucket Tags: key1=value1, key2=, key3=value3,
    Cosi User Secret Name: cosi-user-secret-hfjyjf112o
    Cosi User Secret Namespace: default
  Protocols:
    s3l
Events:
  Type     Reason              Age                  From  Message
  ----     ------              ----                 ----  -------
  Warning  FailedCreateBucket  1s (x13 over 2m26s)  cosi  failed to create bucket: rpc error: code = Internal desc = failed to create bucket due to an internal error

  3. Resolve the issue. The status now shows Bucket Ready: True, but the earlier warning event is still listed:
    

kubectl describe bucket bc1a2a39683-f400-4e1d-9a7a-05e3ec85abc0
Name: bc1a2a39683-f400-4e1d-9a7a-05e3ec85abc0
Namespace:
Labels:
Annotations:
API Version: objectstorage.k8s.io/v1alpha1
Kind: Bucket
Metadata:
  Creation Timestamp: 2024-10-25T08:48:31Z
  Finalizers:
    cosi.objectstorage.k8s.io/bucket-protection
  Generation: 1
  Resource Version: 510601
  UID: 6ae2c16d-2b67-45b5-bcd9-bd9585d5f63b
Spec:
  Bucket Claim:
    Name: arvclaim
    Namespace: default
    UID: a2a39683-f400-4e1d-9a7a-05e3ec85abc0
  Bucket Class Name: bc1
  Deletion Policy: Delete
  Driver Name: cosi.XXXX.com
  Parameters:
    Bucket Tags: key1=value1, key2=, key3=value3,
    Cosi User Secret Name: cosi-user-secret-hfjyjf112o
    Cosi User Secret Namespace: default
  Protocols:
    s3
Status:
  Bucket ID: bc1a2a39683-f400-4e1d-9a7a-05e3ec85abc0
  Bucket Ready: true
Events:
  Type     Reason              Age                 From  Message
  ----     ------              ----                ----  -------
  Warning  FailedCreateBucket  108s (x38 over 16m)  cosi  failed to create bucket: rpc error: code = Internal desc = failed to create bucket due to an internal error

  4. The problems are:
    a. The failure events remain visible for one hour, which is misleading.
    b. If a failure event is shown, a success event should be shown as well, so that the user is assured that the workflow passed.

The same issue is seen with BucketAccess.

Issue is:
Event handling for the COSI APIs is done by the sidecar (https://github.com/kubernetes-sigs/container-object-storage-interface-provisioner-sidecar). If the driver returns an error to the sidecar, the sidecar creates an event on the related custom resource. However, the sidecar does not currently delete that event when a subsequent retry of the same operation reconciles successfully.
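The clean-up step that is missing could look roughly like the sketch below. It is not the sidecar's code: `Event` is a simplified, hypothetical stand-in for a Kubernetes Event, the event list is held in memory, and `pruneFailureEvents` is an illustrative helper name. In a real sidecar the same filter would have to be applied by listing and deleting Event objects through the Kubernetes API, which is not shown here.

```go
package main

import "fmt"

// Event is a simplified, hypothetical stand-in for a Kubernetes Event.
type Event struct {
	InvolvedUID string // UID of the object the event refers to
	Type        string // "Normal" or "Warning"
	Reason      string
	Message     string
}

// pruneFailureEvents returns the events that should be kept once the object
// identified by uid has reconciled successfully, together with the number of
// events that were dropped. Only Warning events with the given reason that
// belong to that object are dropped; everything else is left untouched.
func pruneFailureEvents(events []Event, uid, reason string) ([]Event, int) {
	kept := make([]Event, 0, len(events))
	removed := 0
	for _, e := range events {
		if e.InvolvedUID == uid && e.Type == "Warning" && e.Reason == reason {
			removed++
			continue
		}
		kept = append(kept, e)
	}
	return kept, removed
}

func main() {
	// Illustrative data only; the UIDs and the second event are made up.
	events := []Event{
		{InvolvedUID: "uid-a", Type: "Warning", Reason: "FailedCreateBucket", Message: "failed to create bucket"},
		{InvolvedUID: "uid-b", Type: "Warning", Reason: "FailedCreateBucket", Message: "failed to create bucket"},
	}
	kept, removed := pruneFailureEvents(events, "uid-a", "FailedCreateBucket")
	fmt.Println("removed:", removed, "kept:", len(kept))
}
```

Matching on the object UID as well as the reason keeps the clean-up from touching failure events that belong to other Buckets that are still failing.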

Environment:
• Kubernetes version (use kubectl version), please list client and server:
Client Version: v1.30.3
Kustomize Version: v5.0.4-0.20230601165947-6ce0bf390ce3
Server Version: v1.30.0
• Controller version (provide the release tag or commit hash):
gcr.io/k8s-staging-sig-storage/objectstorage-controller:v20221027-v0.1.1-8-g300019f
• Provisioner name and version (provide the release tag or commit hash):
gcr.io/k8s-staging-sig-storage/objectstorage-sidecar:latest
• Cloud provider or hardware configuration:
• OS (e.g: cat /etc/os-release):
PRETTY_NAME="Ubuntu 22.04.4 LTS"
NAME="Ubuntu"
VERSION_ID="22.04"
VERSION="22.04.4 LTS (Jammy Jellyfish)"
VERSION_CODENAME=jammy
ID=ubuntu
ID_LIKE=debian
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
UBUNTU_CODENAME=jammy
• Kernel (e.g. uname -a):
Linux tnh-cosi-3 5.15.0-46-generic #49-Ubuntu SMP Thu Aug 4 18:03:25 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
• Install tools:
• Network plugin and version (if this is a network-related bug):
• Others:

@gauriKrishnan

Hi @BlaineEXE

Adding to the observations of @asundriya, the COSI sidecar records an event after receiving an error from the driver's DriverCreateBucket and DriverGrantBucketAccess functions. The event persists even if the failure has been resolved in a subsequent retry, as indicated by the Status showing Bucket Ready: True.

https://github.com/kubernetes-sigs/container-object-storage-interface-provisioner-sidecar/blob/80979e8992a6a2b2166f3ff1e7d39b4ab03f045c/pkg/bucket/bucket_controller.go#L131

https://github.com/kubernetes-sigs/container-object-storage-interface-provisioner-sidecar/blob/80979e8992a6a2b2166f3ff1e7d39b4ab03f045c/pkg/bucketaccess/bucketaccess_controller.go#L177

Displaying a warning event with an error while the Status shows True makes the Bucket or BucketAccess description look ambiguous during the 1 hour that this Kubernetes event persists.

I suggest either deleting the warning event or creating a normal event when the operation has succeeded. I have highlighted the locations in the code where these changes could be made (after the Status has been updated to True):

https://github.com/kubernetes-sigs/container-object-storage-interface-provisioner-sidecar/blob/80979e8992a6a2b2166f3ff1e7d39b4ab03f045c/pkg/bucket/bucket_controller.go#L169

https://github.com/kubernetes-sigs/container-object-storage-interface-provisioner-sidecar/blob/80979e8992a6a2b2166f3ff1e7d39b4ab03f045c/pkg/bucketaccess/bucketaccess_controller.go#L300
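The second option, recording a normal event once the Status has been updated to True, might look something like this minimal sketch. Everything in it is an assumption made for illustration: `Recorder` is an in-memory stand-in shaped loosely after client-go's `record.EventRecorder`, `reconcileBucket` and `flakyCreate` are hypothetical names, and the reason string `SuccessfulCreateBucket` is invented. It is not the sidecar's actual controller code.

```go
package main

import (
	"errors"
	"fmt"
)

// Event type strings, matching the "Normal" and "Warning" values Kubernetes uses.
const (
	EventTypeNormal  = "Normal"
	EventTypeWarning = "Warning"
)

// Event is a simplified, hypothetical stand-in for a Kubernetes Event.
type Event struct {
	Type, Reason, Message string
}

// Recorder is a hypothetical in-memory event recorder.
type Recorder struct {
	Events []Event
}

// Event appends one event to the in-memory list.
func (r *Recorder) Event(eventType, reason, message string) {
	r.Events = append(r.Events, Event{Type: eventType, Reason: reason, Message: message})
}

// Bucket holds only the status field relevant to this sketch.
type Bucket struct {
	Name  string
	Ready bool
}

// reconcileBucket calls create (standing in for the driver call) and records
// a Warning event on failure. On the first success it marks the bucket ready
// and records a Normal event; later successful reconciles record nothing.
func reconcileBucket(b *Bucket, rec *Recorder, create func(name string) error) error {
	if err := create(b.Name); err != nil {
		rec.Event(EventTypeWarning, "FailedCreateBucket",
			fmt.Sprintf("failed to create bucket: %v", err))
		return err
	}
	if !b.Ready {
		b.Ready = true
		// "SuccessfulCreateBucket" is an assumed reason string.
		rec.Event(EventTypeNormal, "SuccessfulCreateBucket", "bucket created successfully")
	}
	return nil
}

// flakyCreate returns a create function that fails its first n calls.
func flakyCreate(n int) func(string) error {
	calls := 0
	return func(string) error {
		calls++
		if calls <= n {
			return errors.New("internal error")
		}
		return nil
	}
}

func main() {
	rec := &Recorder{}
	b := &Bucket{Name: "example-bucket"}
	create := flakyCreate(1)
	_ = reconcileBucket(b, rec, create) // first attempt fails
	_ = reconcileBucket(b, rec, create) // retry succeeds
	for _, e := range rec.Events {
		fmt.Printf("%s\t%s\t%s\n", e.Type, e.Reason, e.Message)
	}
}
```

With this approach the earlier warning stays in the event list until it expires, but the most recent event reflects the successful state, which addresses the ambiguity without deleting anything.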

Please let us know if you have any questions. Thanks!
CC: @narayviv @asundriya

@BlaineEXE
Contributor

This repo is going to be archived and deprecated in the next month. Please duplicate this issue to https://github.com/kubernetes-sigs/container-object-storage-interface-api. From there, I can add it to our Kanban board.

@gauriKrishnan

Thanks @BlaineEXE. Sure, we'll duplicate the issue.
