Bottlerocket AMI mounting fail event in pod #168

Open
hanselblack opened this issue Mar 14, 2024 · 6 comments
Labels
bug Something isn't working pending customer info

Comments

@hanselblack

hanselblack commented Mar 14, 2024

/kind bug
What happened?
When using the Bottlerocket AMI with a Karpenter NodeClass, describing the pod shows the following event:

 Warning  FailedMount       3m54s (x7 over 4m26s)  kubelet            MountVolume.MountDevice failed for volume "3416296-pv" : kubernetes.io/csi: attacher.MountDevice failed to create newCsiDriverClient: driver name s3.csi.aws.com not found in the list of registered CSI drivers
kubectl describe csidrivers.storage.k8s.io/s3.csi.aws.com

Name:         s3.csi.aws.com
Namespace:    
Labels:       app.kubernetes.io/component=csi-driver
              app.kubernetes.io/instance=aws-mountpoint-s3-csi-driver
              app.kubernetes.io/managed-by=EKS
              app.kubernetes.io/name=aws-mountpoint-s3-csi-driver
Annotations:  <none>
API Version:  storage.k8s.io/v1
Kind:         CSIDriver
Metadata:
  Creation Timestamp:  2024-02-07T02:01:48Z
  Resource Version:    5363335
  UID:                 c7037a7c-edc6-473b-bcab-4c9443cdef7f
Spec:
  Attach Required:     false
  Fs Group Policy:     ReadWriteOnceWithFSType
  Pod Info On Mount:   false
  Requires Republish:  false
  Se Linux Mount:      false
  Storage Capacity:    false
  Volume Lifecycle Modes:
    Persistent
Events:  <none>

This error does not appear when using the AL2 AMI.
However, even with the warning, I am still able to read data from the S3 mountpoint.

What you expected to happen?
No warning messages.

How to reproduce it (as minimally and precisely as possible)?

Anything else we need to know?:

Environment

  • Kubernetes version (use kubectl version): v1.28
  • Driver version: v1.4.0-eksbuild.1
@jjkr
Contributor

jjkr commented Mar 20, 2024

I am not able to reproduce this with basic mounting on Bottlerocket. Any more logs or information about your configuration would be helpful. I'm interested in how you are actually deploying this and in the timing between events. Given that the mount does succeed and is functional, this could just be a timing issue where the PV tries to mount while the driver is still coming up, but that is speculation.
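
One way to test the timing theory: the CSIDriver object shown above is cluster-scoped, while the kubelet error is about per-node registration, which is reflected in the node's CSINode object. Below is a minimal sketch of a per-node check, assuming kubectl access to the cluster. `has_driver` and `node_has_driver` are hypothetical helper names, not part of kubectl or the driver.

```shell
#!/bin/sh
# Sketch: check whether a CSI driver is registered on a given node.
# Both helpers are illustrative and not shipped with any tool.

# has_driver DRIVER "LIST"
# Succeeds if DRIVER appears in the space-separated LIST of driver names.
has_driver() {
  case " $2 " in
    *" $1 "*) return 0 ;;
    *) return 1 ;;
  esac
}

# node_has_driver NODE DRIVER
# Reads the node's CSINode object, which lists the CSI drivers registered
# on that node, and checks it for DRIVER.
# Returns 2 if kubectl itself fails (for example, the CSINode does not exist yet).
node_has_driver() {
  drivers=$(kubectl get csinode "$1" -o jsonpath='{.spec.drivers[*].name}') || return 2
  has_driver "$2" "$drivers"
}
```

Running `node_has_driver <node-name> s3.csi.aws.com` right after a Karpenter node joins, and again a minute later, would show whether registration simply lags behind pod scheduling.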

@hanselblack
Author

apiVersion: v1
kind: PersistentVolume
metadata:
  name: xxx-pv
  namespace: default
spec:
  capacity:
    storage: 1200Gi
  accessModes:
    - ReadWriteMany
  mountOptions:
    - allow-overwrite
    - region ap-southeast-1
    - max-threads 16
  csi:
    driver: s3.csi.aws.com
    volumeHandle: s3-csi-driver-volume-output
    volumeAttributes:
      bucketName: xxx
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: xxx-pvc
  namespace: default
spec:
  accessModes:
    - ReadWriteMany
  storageClassName: ""
  resources:
    requests:
      storage: 1200Gi
  volumeName: xxx-pv
---
apiVersion: batch/v1
kind: Job
metadata:
  name: xxx-job
  namespace: default
spec:
  template:
    metadata:
      labels:
        app: xxx-job
    spec:
      nodeSelector:
        type: gpu
      containers:
        - name: xxx
          image: # AWS ECR image URI
          imagePullPolicy: Always
          command: ["/bin/sh", "-c"]
          args:
            - cp -r /tmp/mount/xxx /usr/src/app/;
          resources:
            limits:
              memory: 10000Mi
              nvidia.com/gpu: 1
            requests:
              memory: 10000Mi
              cpu: 4000m
              nvidia.com/gpu: 1
          volumeMounts:
            - name: persistent-storage-data
              mountPath: /tmp/mount
      volumes:
        - name: persistent-storage-data
          persistentVolumeClaim:
            claimName: xxx-pvc

The above is the manifest I am deploying.
The nodes are scaled up through Karpenter with spec.amiFamily set to Bottlerocket, and they run with GPUs.
The driver is installed as an EKS add-on, and the kube-system namespace is on a Fargate profile.

Yeah, it could be a timing issue. Oddly, I didn't have this issue on AL2.

@muddyfish
Contributor

Hi, sorry for the delay in processing this issue, are you still facing this problem in v1.7.0?

@ajnozari

> Hi, sorry for the delay in processing this issue, are you still facing this problem in v1.7.0?

Yes, this issue is still happening in v1.7.0. However, for me it's happening without Bottlerocket.

@unexge
Contributor

unexge commented Aug 30, 2024

This seems to be the same issue as #107.

Setting node.tolerateAllTaints to true or node.tolerations to an array of tolerations should fix the problem. For example:

$ aws eks create-addon --cluster-name ... \
    --addon-name aws-mountpoint-s3-csi-driver \
    --service-account-role-arn ... \
    --configuration-values '{"node":{"tolerateAllTaints":true}}' 

Could you please try upgrading to v1.8.0 with a toleration config to see if that solves the problem?
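
If tolerating every taint is too broad, the `node.tolerations` form might look like the sketch below. It assumes the array takes standard Kubernetes toleration objects. The taint key `example.com/dedicated`, the `CLUSTER_NAME` variable, and both helper names are placeholders for illustration, not values from this issue.

```shell
#!/bin/sh
# Sketch: build add-on configuration with an explicit toleration list
# instead of tolerateAllTaints. Helper names are hypothetical.

# tolerations_config TAINT_KEY
# Prints a configuration-values JSON document that tolerates a NoSchedule
# taint with the given key, whatever its value (operator Exists).
tolerations_config() {
  printf '{"node":{"tolerations":[{"key":"%s","operator":"Exists","effect":"NoSchedule"}]}}' "$1"
}

# apply_tolerations TAINT_KEY
# Updates an already installed add-on. CLUSTER_NAME must be set by the
# caller; it is an assumed variable, not something the driver defines.
apply_tolerations() {
  aws eks update-addon --cluster-name "$CLUSTER_NAME" \
      --addon-name aws-mountpoint-s3-csi-driver \
      --configuration-values "$(tolerations_config "$1")"
}
```

Substitute the taint key with whatever taints the affected nodes actually carry. `kubectl describe node <node-name>` lists them under Taints.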

@artamokhin

Hey @unexge
I also encountered a similar problem on EKS spot instances.
I updated the plugin to v1.8.0 and installed it with node.tolerateAllTaints set to true, but on some nodes I still get an error in ArgoCD:

MountVolume.MountDevice failed for volume "playing-albatross-notebooks-storage" : kubernetes.io/csi: attacher.MountDevice failed to create newCsiDriverClient: driver name s3.csi.aws.com not found in the list of registered CSI drivers
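
To narrow down which nodes are affected, one option is to compare every node's CSINode object against the driver name. This is a rough sketch, assuming kubectl access and a POSIX awk. `nodes_missing_driver` and `list_node_drivers` are hypothetical helper names.

```shell
#!/bin/sh
# Sketch: list nodes whose CSINode object does not include a given driver.
# Helper names are illustrative only.

# list_node_drivers
# Prints one line per node: "<node-name><TAB><space-separated driver names>".
list_node_drivers() {
  kubectl get csinode \
    -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.drivers[*].name}{"\n"}{end}'
}

# nodes_missing_driver DRIVER
# Reads list_node_drivers output on stdin and prints the names of nodes
# where DRIVER is absent from the driver list.
nodes_missing_driver() {
  awk -F '\t' -v d="$1" '{
    n = split($2, names, " ")
    found = 0
    for (i = 1; i <= n; i++) if (names[i] == d) found = 1
    if (!found) print $1
  }'
}
```

`list_node_drivers | nodes_missing_driver s3.csi.aws.com` prints the nodes to inspect. Checking whether the driver's node pod is scheduled and running on those nodes would be a natural next step.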
