Provisioning of OpenShift on vSphere fails #781

Closed
fketelaars opened this issue Sep 10, 2024 · 2 comments

Comments

@fketelaars
Collaborator

Describe the bug
When running deployer to provision an OpenShift cluster on vSphere, the following error occurs:

TASK [provision-ipi : Make sure the specified VM folder exists] ****************
Tuesday 10 September 2024  05:22:38 +0000 (0:00:00.026)       0:01:08.235 ***** 
fatal: [localhost]: FAILED! => {"msg": "Could not find imported module support code for ansible_collections.community.vmware.plugins.modules.vcenter_folder.  Looked for (['ansible.module_utils.compat.version.StrictVersion', 'ansible.module_utils.compat.version'])"}

PLAY RECAP *********************************************************************
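
Note (not part of the original report): this kind of import failure usually points to a mismatch between the ansible-core release running the deployer and the installed community.vmware collection, since ansible.module_utils.compat.version only exists in more recent ansible-core versions. A quick way to confirm both versions before changing anything:

# Diagnostic sketch; run on the deployer host or inside the deployer container.
ansible --version                                  # ansible-core version in use
ansible-galaxy collection list community.vmware    # installed collection version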

Solution
Remove the dependency on the community.vmware Galaxy collection.
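
One possible shape for that change (a sketch only, not the actual deployer code; the datacenter and folder names are placeholders): create the VM folder with the govc CLI instead of the community.vmware.vcenter_folder module.

# Assumes govc is installed and the GOVC_* environment variables point at the target vCenter.
export GOVC_URL='https://vcenter.example.com' GOVC_USERNAME='admin@vsphere.local' GOVC_PASSWORD='***' GOVC_INSECURE=1
govc folder.info "/<datacenter>/vm/<vm-folder>" || govc folder.create "/<datacenter>/vm/<vm-folder>"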

@fketelaars
Collaborator Author

Commenting out the vcenter_folder task and pre-creating the folder in vCenter got past the error. Now hitting an issue creating the OpenShift cluster with openshift-install. OpenShift installer log file:

level=info msg=Not all ingress controllers are available.
level=error msg=Cluster operator ingress Degraded is True with IngressDegraded: The "default" ingress controller reports Degraded=True: DegradedConditions: One or more other status conditions indicate a degraded state: DeploymentAvailable=False (DeploymentUnavailable: The deployment has Available status condition set to False (reason: MinimumReplicasUnavailable) with message: Deployment does not have minimum availability.), DeploymentReplicasMinAvailable=False (DeploymentMinimumReplicasNotMet: 0/2 of replicas are available, max unavailable is 1: Some pods are not scheduled: Pod "router-default-7dff78bcd6-5k82m" cannot be scheduled: 0/2 nodes are available: 2 node(s) had untolerated taint {node-role.kubernetes.io/master: }. preemption: 0/2 nodes are available: 2 Preemption is not helpful for scheduling.. Pod "router-default-7dff78bcd6-9r8zm" cannot be scheduled: 0/2 nodes are available: 2 node(s) had untolerated taint {node-role.kubernetes.io/master: }. preemption: 0/2 nodes are available: 2 Preemption is not helpful for scheduling.. Make sure you have sufficient worker nodes.), CanaryChecksSucceeding=Unknown (CanaryRouteNotAdmitted: Canary route is not admitted by the default ingress controller)
level=info msg=Cluster operator ingress EvaluationConditionsDetected is False with AsExpected: 
level=info msg=Cluster operator insights ClusterTransferAvailable is Unknown with : 
level=info msg=Cluster operator insights Disabled is False with AsExpected: 
level=info msg=Cluster operator insights SCAAvailable is Unknown with : 
level=error msg=Cluster operator kube-apiserver Degraded is True with GuardController_SyncError::NodeController_MasterNodesReady: GuardControllerDegraded: Missing operand on node arrow-cluster-nng8p-master-1
level=error msg=NodeControllerDegraded: The master nodes not ready: node "arrow-cluster-nng8p-master-0" not ready since 2024-11-05 15:36:45 +0000 UTC because NodeStatusUnknown (Kubelet stopped posting node status.)
level=info msg=Cluster operator kube-apiserver Progressing is True with NodeInstaller: NodeInstallerProgressing: 2 nodes are at revision 0; 0 nodes have achieved new revision 5
level=error msg=Cluster operator kube-apiserver Available is False with StaticPods_ZeroNodesActive: StaticPodsAvailable: 0 nodes are active; 2 nodes are at revision 0; 0 nodes have achieved new revision 5
level=info msg=Cluster operator kube-apiserver EvaluationConditionsDetected is False with AsExpected: All is well
level=error msg=Cluster operator kube-controller-manager Degraded is True with GuardController_SyncError::NodeController_MasterNodesReady::StaticPods_Error: GuardControllerDegraded: Missing operand on node arrow-cluster-nng8p-master-1
level=error msg=NodeControllerDegraded: The master nodes not ready: node "arrow-cluster-nng8p-master-0" not ready since 2024-11-05 15:36:45 +0000 UTC because NodeStatusUnknown (Kubelet stopped posting node status.)
level=error msg=StaticPodsDegraded: pod/kube-controller-manager-arrow-cluster-nng8p-master-0 container "cluster-policy-controller" is waiting: ContainerCreating: 
level=error msg=StaticPodsDegraded: pod/kube-controller-manager-arrow-cluster-nng8p-master-0 container "kube-controller-manager" is waiting: ContainerCreating: 
level=error msg=StaticPodsDegraded: pod/kube-controller-manager-arrow-cluster-nng8p-master-0 container "kube-controller-manager-cert-syncer" is waiting: ContainerCreating: 
level=error msg=StaticPodsDegraded: pod/kube-controller-manager-arrow-cluster-nng8p-master-0 container "kube-controller-manager-recovery-controller" is waiting: ContainerCreating: 
level=info msg=Cluster operator kube-controller-manager Progressing is True with NodeInstaller: NodeInstallerProgressing: 2 nodes are at revision 0; 0 nodes have achieved new revision 7
level=error msg=Cluster operator kube-controller-manager Available is False with StaticPods_ZeroNodesActive: StaticPodsAvailable: 0 nodes are active; 2 nodes are at revision 0; 0 nodes have achieved new revision 7
level=info msg=Cluster operator kube-controller-manager EvaluationConditionsDetected is Unknown with NoData: 
level=error msg=Cluster operator kube-scheduler Degraded is True with GuardController_SyncError::NodeController_MasterNodesReady: GuardControllerDegraded: Missing operand on node arrow-cluster-nng8p-master-1
level=error msg=NodeControllerDegraded: The master nodes not ready: node "arrow-cluster-nng8p-master-0" not ready since 2024-11-05 15:36:45 +0000 UTC because NodeStatusUnknown (Kubelet stopped posting node status.)
level=info msg=Cluster operator kube-scheduler Progressing is True with NodeInstaller: NodeInstallerProgressing: 2 nodes are at revision 0; 0 nodes have achieved new revision 7
level=error msg=Cluster operator kube-scheduler Available is False with StaticPods_ZeroNodesActive: StaticPodsAvailable: 0 nodes are active; 2 nodes are at revision 0; 0 nodes have achieved new revision 7
level=info msg=Cluster operator kube-scheduler EvaluationConditionsDetected is Unknown with NoData: 
level=info msg=Cluster operator machine-api Progressing is True with SyncingResources: Progressing towards operator: 4.15.37
level=error msg=Cluster operator machine-api Degraded is True with SyncingFailed: Failed when progressing towards operator: 4.15.37 because error syncing machine-api-controller: Internal error occurred: admission plugin "image.openshift.io/ImagePolicy" failed to complete mutation in 13s
level=error msg=Cluster operator machine-api Available is False with Initializing: Operator is initializing
level=error msg=Cluster operator machine-config Degraded is True with MachineConfigDaemonFailed: Failed to resync 4.15.37 because: failed to apply machine config daemon manifests: error during waitForDaemonsetRollout: [context deadline exceeded, daemonset machine-config-daemon is not ready. status: (desired: 2, updated: 2, ready: 1, unavailable: 1)]
level=error msg=Cluster operator machine-config Available is False with MachineConfigDaemonFailed: Cluster not available for [{operator 4.15.37}]: failed to apply machine config daemon manifests: error during waitForDaemonsetRollout: [context deadline exceeded, daemonset machine-config-daemon is not ready. status: (desired: 2, updated: 2, ready: 1, unavailable: 1)]
level=info msg=Cluster operator machine-config EvaluationConditionsDetected is False with AsExpected: 
level=error msg=Cluster operator monitoring Available is False with UpdatingPrometheusOperatorFailed: UpdatingPrometheusOperator: reconciling Prometheus Operator Admission Webhook Deployment failed: updating Deployment object failed: waiting for DeploymentRollout of openshift-monitoring/prometheus-operator-admission-webhook: context deadline exceeded
level=error msg=Cluster operator monitoring Degraded is True with UpdatingPrometheusOperatorFailed: UpdatingPrometheusOperator: reconciling Prometheus Operator Admission Webhook Deployment failed: updating Deployment object failed: waiting for DeploymentRollout of openshift-monitoring/prometheus-operator-admission-webhook: context deadline exceeded
level=info msg=Cluster operator monitoring Progressing is True with RollOutInProgress: Rolling out the stack.
level=info msg=Cluster operator network ManagementStateDegraded is False with : 
level=info msg=Cluster operator network Progressing is True with Deploying: DaemonSet "/openshift-network-diagnostics/network-check-target" is not available (awaiting 1 nodes)
level=info msg=DaemonSet "/openshift-network-node-identity/network-node-identity" is not available (awaiting 1 nodes)
level=info msg=DaemonSet "/openshift-ovn-kubernetes/ovnkube-node" is not available (awaiting 1 nodes)
level=info msg=DaemonSet "/openshift-multus/multus" is not available (awaiting 1 nodes)
level=info msg=DaemonSet "/openshift-multus/multus-additional-cni-plugins" is not available (awaiting 1 nodes)
level=info msg=DaemonSet "/openshift-multus/network-metrics-daemon" is not available (awaiting 1 nodes)
level=info msg=Deployment "/openshift-network-diagnostics/network-check-source" is waiting for other operators to become ready
level=info msg=Deployment "/openshift-ovn-kubernetes/ovnkube-control-plane" is not available (awaiting 1 nodes)
level=info msg=Cluster operator node-tuning Progressing is True with ProfileProgressing: Waiting for 1/2 Profiles to be applied
level=info msg=Cluster operator openshift-apiserver Progressing is True with APIServerDeployment_PodsUpdating: APIServerDeploymentProgressing: deployment/apiserver.openshift-apiserver: 1/2 pods have been updated to the latest generation
level=info msg=Cluster operator openshift-controller-manager Progressing is True with _DesiredStateNotYetAchieved: Progressing: deployment/controller-manager: updated replicas is 1, desired replicas is 2
level=info msg=Progressing: deployment/route-controller-manager: updated replicas is 1, desired replicas is 2
level=error msg=Cluster operator operator-lifecycle-manager-packageserver Available is False with ClusterServiceVersionNotSucceeded: ClusterServiceVersion openshift-operator-lifecycle-manager/packageserver observed in phase Failed with reason: InstallCheckFailed, message: install failed: deployment packageserver not ready before timeout: deployment "packageserver" exceeded its progress deadline
level=info msg=Cluster operator storage Progressing is True with VSphereCSIDriverOperatorCR_VMwareVSphereDriverNodeServiceController_Deploying: VSphereCSIDriverOperatorCRProgressing: VMwareVSphereDriverNodeServiceControllerProgressing: Waiting for DaemonSet to deploy node pods
level=error msg=Bootstrap failed to complete: timed out waiting for the condition
level=error msg=Failed to wait for bootstrapping to complete. This error usually happens when there is a problem with control plane hosts that prevents the control plane operators from creating the control plane.
level=warning msg=The bootstrap machine is unable to resolve API and/or API-Int Server URLs
level=info msg=    root : PWD=/var/opt/openshift ; USER=root ; ENV=KUBECONFIG=/opt/openshift/auth/kubeconfig COMMAND=/bin/oc --request-timeout=5s get events --all-namespaces -o json
level=info msg=    root : PWD=/var/opt/openshift ; USER=root ; ENV=KUBECONFIG=/opt/openshift/auth/kubeconfig COMMAND=/bin/oc --request-timeout=5s get machineconfigs -o json
level=info msg=    root : PWD=/var/opt/openshift ; USER=root ; ENV=KUBECONFIG=/opt/openshift/auth/kubeconfig COMMAND=/bin/oc --request-timeout=5s get nodes -o json
level=info msg=Bootstrap gather logs captured here "/root/cpd-status/vsphere-ipi/arrow-cluster/log-bundle-20241105155443.tar.gz"
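
For anyone hitting the same symptom, a diagnostic sketch (not from the original report): the scheduling errors above indicate that no worker nodes were available for the router pods and that master-0 stopped posting status, so the first checks with the generated kubeconfig would be node and Machine state:

export KUBECONFIG=/opt/openshift/auth/kubeconfig
oc get nodes -o wide                        # are worker nodes registered and Ready?
oc get machines -n openshift-machine-api    # were worker Machines provisioned on vSphere?
oc get clusteroperators                     # which operators remain Degraded/Unavailable?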

@fketelaars
Collaborator Author

Issue fixed. The deployer has since been run successfully several times on internal infrastructure.
