
Had to update the Storage class manually to #832

Open
KK-LTIM opened this issue Mar 2, 2025 · 0 comments
KK-LTIM commented Mar 2, 2025

make deploy-kubeflow INSTALLATION_OPTION=helm DEPLOYMENT_OPTION=vanilla
test tagcluster || (echo Please export CLUSTER_NAME variable ; exit 1)
test us-east-1 || (echo Please export CLUSTER_REGION variable ; exit 1)
aws eks update-kubeconfig --name tagcluster --region us-east-1
Updated context arn:aws:eks:us-east-1:xxxxxxxxxxxxx:cluster/tagcluster in /home/karthik/.kube/config
yq e '.cluster.name=env(CLUSTER_NAME)' -i tests/e2e/utils/ack_sm_controller_bootstrap/config.yaml
yq e '.cluster.region=env(CLUSTER_REGION)' -i tests/e2e/utils/ack_sm_controller_bootstrap/config.yaml
cd tests/e2e && PYTHONPATH=.. python3.8 utils/ack_sm_controller_bootstrap/setup_sm_controller_req.py

                      Reading Config                         

=================================================================

    Create OIDC IAM role for ACK SageMaker Controller        

=================================================================
INFO:botocore.credentials:Found credentials in shared credentials file: ~/.aws/credentials
An error occurred (EntityAlreadyExists) when calling the CreateRole operation: Role with name kf-ack-sm-controller-role-tagcluster already exists.
Try running cleanup_sm_controller_req.py

     Writing params.env for ACK SageMaker Controller         

=================================================================
Params file written to : ../../awsconfigs/common/ack-sagemaker-controller/params.env
Editing ./utils/ack_sm_controller_bootstrap/config.yaml with appropriate values...
Config file written to : ./utils/ack_sm_controller_bootstrap/config.yaml

                         SUCCESS                             

=================================================================
cd tests/e2e && PYTHONPATH=.. python3.8 utils/kubeflow_installation.py --deployment_option vanilla --installation_option helm --pipeline_s3_credential_option irsa --cluster_name tagcluster
tagcluster

Installing kubeflow vanilla deployment with helm with irsa   

=================================================================
==========Installing cert-manager==========
"jetstack" has been added to your repositories
Hang tight while we grab the latest from your chart repositories...
...Successfully got an update from the "eks" chart repository
...Successfully got an update from the "jetstack" chart repository
Update Complete. ⎈Happy Helming!⎈
Release "cert-manager" does not exist. Installing it now.
NAME: cert-manager
LAST DEPLOYED: Mon Mar 3 00:41:09 2025
NAMESPACE: cert-manager
STATUS: deployed
REVISION: 1
TEST SUITE: None
NOTES:
cert-manager v1.10.1 has been deployed successfully!

In order to begin issuing certificates, you will need to set up a ClusterIssuer
or Issuer resource (for example, by creating a 'letsencrypt-staging' issuer).

More information on the different types of issuers and how to configure them
can be found in our documentation:

https://cert-manager.io/docs/configuration/

For information on how to configure cert-manager to automatically provision
Certificates for Ingress resources, take a look at the ingress-shim
documentation:

https://cert-manager.io/docs/usage/ingress/
Waiting for cert-manager pods to be ready ...
running command: kubectl wait --for=condition=ready pod -l 'app.kubernetes.io/instance in (cert-manager)' --timeout=240s -n cert-manager
pod/cert-manager-5f58985b79-nzp9g condition met
pod/cert-manager-cainjector-5cdbcddbc8-7frkq condition met
pod/cert-manager-webhook-5788d8d7c6-gl5mg condition met
All cert-manager pods are running!
==========Installing istio==========
Release "istio" does not exist. Installing it now.
NAME: istio
LAST DEPLOYED: Mon Mar 3 00:41:31 2025
NAMESPACE: default
STATUS: deployed
REVISION: 1
TEST SUITE: None
Waiting for istio pods to be ready ...
running command: kubectl wait --for=condition=ready pod -l 'app in (istio-ingressgateway, istiod)' --timeout=240s -n istio-system
pod/istio-ingressgateway-799fbddb8c-z4fvl condition met
pod/istiod-684894b77-wmkk8 condition met
All istio pods are running!
==========Installing dex==========
Release "dex" does not exist. Installing it now.
NAME: dex
LAST DEPLOYED: Mon Mar 3 00:41:48 2025
NAMESPACE: default
STATUS: deployed
REVISION: 1
TEST SUITE: None
Waiting for dex pods to be ready ...
running command: kubectl wait --for=condition=ready pod -l 'app in (dex)' --timeout=240s -n auth
pod/dex-77bbb9b76c-hxjpb condition met
All dex pods are running!
==========Installing oidc-authservice==========
Release "oidc-authservice" does not exist. Installing it now.
NAME: oidc-authservice
LAST DEPLOYED: Mon Mar 3 00:41:52 2025
NAMESPACE: default
STATUS: deployed
REVISION: 1
TEST SUITE: None
Waiting for oidc-authservice pods to be ready ...
running command: kubectl wait --for=condition=ready pod -l 'app in (authservice)' --timeout=240s -n istio-system
error: timed out waiting for the condition on pods/authservice-0
error: unknown flag: --timeout
See 'kubectl describe --help' for usage.
Waiting for oidc-authservice pods to be ready ...
running command: kubectl wait --for=condition=ready pod -l 'app in (authservice)' --timeout=240s -n istio-system
error: timed out waiting for the condition on pods/authservice-0
error: unknown flag: --timeout
See 'kubectl describe --help' for usage.
Waiting for oidc-authservice pods to be ready ...
running command: kubectl wait --for=condition=ready pod -l 'app in (authservice)' --timeout=240s -n istio-system
error: timed out waiting for the condition on pods/authservice-0
error: unknown flag: --timeout
See 'kubectl describe --help' for usage.
Traceback (most recent call last):
  File "/home/karthik/kubeflow/kubeflow-manifests/tests/e2e/utils/kubeflow_installation.py", line 324, in <module>
    install_kubeflow(
  File "/home/karthik/kubeflow/kubeflow-manifests/tests/e2e/utils/kubeflow_installation.py", line 101, in install_kubeflow
    install_component(
  File "/home/karthik/kubeflow/kubeflow-manifests/tests/e2e/utils/kubeflow_installation.py", line 180, in install_component
    validate_component_installation(installation_config, component_name)
  File "/home/karthik/.local/lib/python3.10/site-packages/retrying.py", line 56, in wrapped_f
    return Retrying(*dargs, **dkw).call(f, *args, **kw)
  File "/home/karthik/.local/lib/python3.10/site-packages/retrying.py", line 266, in call
    raise attempt.get()
  File "/home/karthik/.local/lib/python3.10/site-packages/retrying.py", line 301, in get
    six.reraise(self.value[0], self.value[1], self.value[2])
  File "/usr/lib/python3/dist-packages/six.py", line 719, in reraise
    raise value
  File "/home/karthik/.local/lib/python3.10/site-packages/retrying.py", line 251, in call
    attempt = Attempt(fn(*args, **kwargs), attempt_number, False)
  File "/home/karthik/kubeflow/kubeflow-manifests/tests/e2e/utils/kubeflow_installation.py", line 192, in validate_component_installation
    kubectl_wait_pods(value, namespace, key)
  File "/home/karthik/kubeflow/kubeflow-manifests/tests/e2e/utils/utils.py", line 275, in kubectl_wait_pods
    raise Exception("Timeout/error waiting for pod condition")
Exception: Timeout/error waiting for pod condition
make: *** [Makefile:104: deploy-kubeflow] Error 1

Checked the pods and found that the authservice PVC was not bound. The PVC has no storage class set:
karthik@U-1BB4IDIUHCE8V:~/kubeflow/kubeflow-manifests/awsconfigs$ kubectl get storageclass
NAME   PROVISIONER             RECLAIMPOLICY   VOLUMEBINDINGMODE      ALLOWVOLUMEEXPANSION   AGE
gp2    kubernetes.io/aws-ebs   Delete          WaitForFirstConsumer   false                  10d
karthik@U-1BB4IDIUHCE8V:~/kubeflow/kubeflow-manifests/awsconfigs$ kubectl get pvc -n istio-system
NAME              STATUS    VOLUME   CAPACITY   ACCESS MODES   STORAGECLASS   VOLUMEATTRIBUTESCLASS   AGE
authservice-pvc   Pending                                                                             16m
karthik@U-1BB4IDIUHCE8V:~/kubeflow/kubeflow-manifests/awsconfigs$ kubectl get configmap -n istio-system

kubectl describe pvc -n istio-system

Name: authservice-pvc
Namespace: istio-system
StorageClass:
Status: Pending
Volume:
Labels: app.kubernetes.io/managed-by=Helm
Annotations: meta.helm.sh/release-name: oidc-authservice
meta.helm.sh/release-namespace: default
Finalizers: [kubernetes.io/pvc-protection]
Capacity:
Access Modes:
VolumeMode: Filesystem
Used By: authservice-0
Events:
Type    Reason         Age                  From                          Message
----    ------         ---                  ----                          -------
Normal  FailedBinding  32s (x82 over 20m)   persistentvolume-controller   no persistent volumes available for this claim and no storage class is set

Ran the command below to work around the error:

kubectl patch pvc authservice-pvc -n istio-system -p '{"spec":{"storageClassName":"gp2"}}'
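A broader workaround (a sketch, assuming gp2 is the class you want every unqualified PVC to use) is to mark gp2 as the cluster's default StorageClass, so any PVC created without an explicit storageClassName is bound automatically instead of needing a per-PVC patch:

```shell
# Mark gp2 as the default StorageClass; the admission controller will then
# assign gp2 to any new PVC that does not set spec.storageClassName.
kubectl patch storageclass gp2 -p \
  '{"metadata":{"annotations":{"storageclass.kubernetes.io/is-default-class":"true"}}}'

# Verify: gp2 should now show "(default)" next to its name.
kubectl get storageclass
```

Note this only affects PVCs created after the annotation is applied; an already-Pending PVC such as authservice-pvc still needs the patch above.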

To fix this properly, the installer should detect the cluster's default storage class and set it on the PVC before waiting for the pods.
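The proposed fix could be sketched as follows. This is a hypothetical helper, not code from the repo: `pick_default_storage_class` takes StorageClass objects in the shape returned by `kubectl get storageclass -o json` (`items`), honors the standard default-class annotations, and falls back to the sole class when only one exists (the gp2-only EKS situation in this report).

```python
# Hypothetical sketch of how the installer could choose a storage class
# for authservice-pvc. Input dicts mirror `kubectl get storageclass -o json`.

DEFAULT_ANNOTATIONS = (
    "storageclass.kubernetes.io/is-default-class",
    "storageclass.beta.kubernetes.io/is-default-class",  # legacy key
)

def pick_default_storage_class(storage_classes):
    """Return the name of the default StorageClass.

    Falls back to the only available class if none is annotated as default
    (e.g. an EKS cluster that only ships gp2); returns None if the choice
    is ambiguous or no classes exist.
    """
    for sc in storage_classes:
        annotations = sc.get("metadata", {}).get("annotations") or {}
        if any(annotations.get(key) == "true" for key in DEFAULT_ANNOTATIONS):
            return sc["metadata"]["name"]
    if len(storage_classes) == 1:
        return storage_classes[0]["metadata"]["name"]
    return None
```

The installer could feed this from the Kubernetes API and then render the result into the PVC's `spec.storageClassName`, failing early with a clear message when it returns None instead of timing out on `kubectl wait`.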
