Provisioning of OpenShift on vSphere fails #781

Closed
fketelaars opened this issue Sep 10, 2024 · 2 comments

Comments

@fketelaars
Collaborator

Describe the bug
When running deployer to provision an OpenShift cluster on vSphere, the following error occurs:

TASK [provision-ipi : Make sure the specified VM folder exists] ****************
Tuesday 10 September 2024  05:22:38 +0000 (0:00:00.026)       0:01:08.235 ***** 
fatal: [localhost]: FAILED! => {"msg": "Could not find imported module support code for ansible_collections.community.vmware.plugins.modules.vcenter_folder.  Looked for (['ansible.module_utils.compat.version.StrictVersion', 'ansible.module_utils.compat.version'])"}

PLAY RECAP *********************************************************************
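
Note (not part of the original report): this kind of import failure usually points to a mismatch between the ansible-core release running the deployer and the installed community.vmware collection, since ansible.module_utils.compat.version only exists in more recent ansible-core versions. A quick way to confirm both versions before changing anything:

# Diagnostic sketch; run on the deployer host or inside the deployer container.
ansible --version                                  # ansible-core version in use
ansible-galaxy collection list community.vmware    # installed collection version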

Solution
Remove the dependency on the community.vmware Galaxy collection.
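
One possible shape for that change (a sketch only, not the actual deployer code; the datacenter and folder names are placeholders): create the VM folder with the govc CLI instead of the community.vmware.vcenter_folder module.

# Assumes govc is installed and the GOVC_* environment variables point at the target vCenter.
export GOVC_URL='https://vcenter.example.com' GOVC_USERNAME='admin@vsphere.local' GOVC_PASSWORD='***' GOVC_INSECURE=1
govc folder.info "/<datacenter>/vm/<vm-folder>" || govc folder.create "/<datacenter>/vm/<vm-folder>"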

@fketelaars
Collaborator Author

Commenting out the vcenter_folder task and pre-creating the folder in vCenter got past the error. Now hitting an issue creating the OpenShift cluster with openshift-install. OpenShift installer log file:

level=info msg=Not all ingress controllers are available.
level=error msg=Cluster operator ingress Degraded is True with IngressDegraded: The "default" ingress controller reports Degraded=True: DegradedConditions: One or more other status conditions indicate a degraded state: DeploymentAvailable=False (DeploymentUnavailable: The deployment has Available status condition set to False (reason: MinimumReplicasUnavailable) with message: Deployment does not have minimum availability.), DeploymentReplicasMinAvailable=False (DeploymentMinimumReplicasNotMet: 0/2 of replicas are available, max unavailable is 1: Some pods are not scheduled: Pod "router-default-7dff78bcd6-5k82m" cannot be scheduled: 0/2 nodes are available: 2 node(s) had untolerated taint {node-role.kubernetes.io/master: }. preemption: 0/2 nodes are available: 2 Preemption is not helpful for scheduling.. Pod "router-default-7dff78bcd6-9r8zm" cannot be scheduled: 0/2 nodes are available: 2 node(s) had untolerated taint {node-role.kubernetes.io/master: }. preemption: 0/2 nodes are available: 2 Preemption is not helpful for scheduling.. Make sure you have sufficient worker nodes.), CanaryChecksSucceeding=Unknown (CanaryRouteNotAdmitted: Canary route is not admitted by the default ingress controller)
level=info msg=Cluster operator ingress EvaluationConditionsDetected is False with AsExpected: 
level=info msg=Cluster operator insights ClusterTransferAvailable is Unknown with : 
level=info msg=Cluster operator insights Disabled is False with AsExpected: 
level=info msg=Cluster operator insights SCAAvailable is Unknown with : 
level=error msg=Cluster operator kube-apiserver Degraded is True with GuardController_SyncError::NodeController_MasterNodesReady: GuardControllerDegraded: Missing operand on node arrow-cluster-nng8p-master-1
level=error msg=NodeControllerDegraded: The master nodes not ready: node "arrow-cluster-nng8p-master-0" not ready since 2024-11-05 15:36:45 +0000 UTC because NodeStatusUnknown (Kubelet stopped posting node status.)
level=info msg=Cluster operator kube-apiserver Progressing is True with NodeInstaller: NodeInstallerProgressing: 2 nodes are at revision 0; 0 nodes have achieved new revision 5
level=error msg=Cluster operator kube-apiserver Available is False with StaticPods_ZeroNodesActive: StaticPodsAvailable: 0 nodes are active; 2 nodes are at revision 0; 0 nodes have achieved new revision 5
level=info msg=Cluster operator kube-apiserver EvaluationConditionsDetected is False with AsExpected: All is well
level=error msg=Cluster operator kube-controller-manager Degraded is True with GuardController_SyncError::NodeController_MasterNodesReady::StaticPods_Error: GuardControllerDegraded: Missing operand on node arrow-cluster-nng8p-master-1
level=error msg=NodeControllerDegraded: The master nodes not ready: node "arrow-cluster-nng8p-master-0" not ready since 2024-11-05 15:36:45 +0000 UTC because NodeStatusUnknown (Kubelet stopped posting node status.)
level=error msg=StaticPodsDegraded: pod/kube-controller-manager-arrow-cluster-nng8p-master-0 container "cluster-policy-controller" is waiting: ContainerCreating: 
level=error msg=StaticPodsDegraded: pod/kube-controller-manager-arrow-cluster-nng8p-master-0 container "kube-controller-manager" is waiting: ContainerCreating: 
level=error msg=StaticPodsDegraded: pod/kube-controller-manager-arrow-cluster-nng8p-master-0 container "kube-controller-manager-cert-syncer" is waiting: ContainerCreating: 
level=error msg=StaticPodsDegraded: pod/kube-controller-manager-arrow-cluster-nng8p-master-0 container "kube-controller-manager-recovery-controller" is waiting: ContainerCreating: 
level=info msg=Cluster operator kube-controller-manager Progressing is True with NodeInstaller: NodeInstallerProgressing: 2 nodes are at revision 0; 0 nodes have achieved new revision 7
level=error msg=Cluster operator kube-controller-manager Available is False with StaticPods_ZeroNodesActive: StaticPodsAvailable: 0 nodes are active; 2 nodes are at revision 0; 0 nodes have achieved new revision 7
level=info msg=Cluster operator kube-controller-manager EvaluationConditionsDetected is Unknown with NoData: 
level=error msg=Cluster operator kube-scheduler Degraded is True with GuardController_SyncError::NodeController_MasterNodesReady: GuardControllerDegraded: Missing operand on node arrow-cluster-nng8p-master-1
level=error msg=NodeControllerDegraded: The master nodes not ready: node "arrow-cluster-nng8p-master-0" not ready since 2024-11-05 15:36:45 +0000 UTC because NodeStatusUnknown (Kubelet stopped posting node status.)
level=info msg=Cluster operator kube-scheduler Progressing is True with NodeInstaller: NodeInstallerProgressing: 2 nodes are at revision 0; 0 nodes have achieved new revision 7
level=error msg=Cluster operator kube-scheduler Available is False with StaticPods_ZeroNodesActive: StaticPodsAvailable: 0 nodes are active; 2 nodes are at revision 0; 0 nodes have achieved new revision 7
level=info msg=Cluster operator kube-scheduler EvaluationConditionsDetected is Unknown with NoData: 
level=info msg=Cluster operator machine-api Progressing is True with SyncingResources: Progressing towards operator: 4.15.37
level=error msg=Cluster operator machine-api Degraded is True with SyncingFailed: Failed when progressing towards operator: 4.15.37 because error syncing machine-api-controller: Internal error occurred: admission plugin "image.openshift.io/ImagePolicy" failed to complete mutation in 13s
level=error msg=Cluster operator machine-api Available is False with Initializing: Operator is initializing
level=error msg=Cluster operator machine-config Degraded is True with MachineConfigDaemonFailed: Failed to resync 4.15.37 because: failed to apply machine config daemon manifests: error during waitForDaemonsetRollout: [context deadline exceeded, daemonset machine-config-daemon is not ready. status: (desired: 2, updated: 2, ready: 1, unavailable: 1)]
level=error msg=Cluster operator machine-config Available is False with MachineConfigDaemonFailed: Cluster not available for [{operator 4.15.37}]: failed to apply machine config daemon manifests: error during waitForDaemonsetRollout: [context deadline exceeded, daemonset machine-config-daemon is not ready. status: (desired: 2, updated: 2, ready: 1, unavailable: 1)]
level=info msg=Cluster operator machine-config EvaluationConditionsDetected is False with AsExpected: 
level=error msg=Cluster operator monitoring Available is False with UpdatingPrometheusOperatorFailed: UpdatingPrometheusOperator: reconciling Prometheus Operator Admission Webhook Deployment failed: updating Deployment object failed: waiting for DeploymentRollout of openshift-monitoring/prometheus-operator-admission-webhook: context deadline exceeded
level=error msg=Cluster operator monitoring Degraded is True with UpdatingPrometheusOperatorFailed: UpdatingPrometheusOperator: reconciling Prometheus Operator Admission Webhook Deployment failed: updating Deployment object failed: waiting for DeploymentRollout of openshift-monitoring/prometheus-operator-admission-webhook: context deadline exceeded
level=info msg=Cluster operator monitoring Progressing is True with RollOutInProgress: Rolling out the stack.
level=info msg=Cluster operator network ManagementStateDegraded is False with : 
level=info msg=Cluster operator network Progressing is True with Deploying: DaemonSet "/openshift-network-diagnostics/network-check-target" is not available (awaiting 1 nodes)
level=info msg=DaemonSet "/openshift-network-node-identity/network-node-identity" is not available (awaiting 1 nodes)
level=info msg=DaemonSet "/openshift-ovn-kubernetes/ovnkube-node" is not available (awaiting 1 nodes)
level=info msg=DaemonSet "/openshift-multus/multus" is not available (awaiting 1 nodes)
level=info msg=DaemonSet "/openshift-multus/multus-additional-cni-plugins" is not available (awaiting 1 nodes)
level=info msg=DaemonSet "/openshift-multus/network-metrics-daemon" is not available (awaiting 1 nodes)
level=info msg=Deployment "/openshift-network-diagnostics/network-check-source" is waiting for other operators to become ready
level=info msg=Deployment "/openshift-ovn-kubernetes/ovnkube-control-plane" is not available (awaiting 1 nodes)
level=info msg=Cluster operator node-tuning Progressing is True with ProfileProgressing: Waiting for 1/2 Profiles to be applied
level=info msg=Cluster operator openshift-apiserver Progressing is True with APIServerDeployment_PodsUpdating: APIServerDeploymentProgressing: deployment/apiserver.openshift-apiserver: 1/2 pods have been updated to the latest generation
level=info msg=Cluster operator openshift-controller-manager Progressing is True with _DesiredStateNotYetAchieved: Progressing: deployment/controller-manager: updated replicas is 1, desired replicas is 2
level=info msg=Progressing: deployment/route-controller-manager: updated replicas is 1, desired replicas is 2
level=error msg=Cluster operator operator-lifecycle-manager-packageserver Available is False with ClusterServiceVersionNotSucceeded: ClusterServiceVersion openshift-operator-lifecycle-manager/packageserver observed in phase Failed with reason: InstallCheckFailed, message: install failed: deployment packageserver not ready before timeout: deployment "packageserver" exceeded its progress deadline
level=info msg=Cluster operator storage Progressing is True with VSphereCSIDriverOperatorCR_VMwareVSphereDriverNodeServiceController_Deploying: VSphereCSIDriverOperatorCRProgressing: VMwareVSphereDriverNodeServiceControllerProgressing: Waiting for DaemonSet to deploy node pods
level=error msg=Bootstrap failed to complete: timed out waiting for the condition
level=error msg=Failed to wait for bootstrapping to complete. This error usually happens when there is a problem with control plane hosts that prevents the control plane operators from creating the control plane.
level=warning msg=The bootstrap machine is unable to resolve API and/or API-Int Server URLs
level=info msg=    root : PWD=/var/opt/openshift ; USER=root ; ENV=KUBECONFIG=/opt/openshift/auth/kubeconfig COMMAND=/bin/oc --request-timeout=5s get events --all-namespaces -o json
level=info msg=    root : PWD=/var/opt/openshift ; USER=root ; ENV=KUBECONFIG=/opt/openshift/auth/kubeconfig COMMAND=/bin/oc --request-timeout=5s get machineconfigs -o json
level=info msg=    root : PWD=/var/opt/openshift ; USER=root ; ENV=KUBECONFIG=/opt/openshift/auth/kubeconfig COMMAND=/bin/oc --request-timeout=5s get nodes -o json
level=info msg=Bootstrap gather logs captured here "/root/cpd-status/vsphere-ipi/arrow-cluster/log-bundle-20241105155443.tar.gz"
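
For anyone hitting the same symptom, a diagnostic sketch (not from the original report): the scheduling errors above indicate that no worker nodes were available for the router pods and that master-0 stopped posting status, so the first checks with the generated kubeconfig would be node and Machine state:

export KUBECONFIG=/opt/openshift/auth/kubeconfig
oc get nodes -o wide                        # are worker nodes registered and Ready?
oc get machines -n openshift-machine-api    # were worker Machines provisioned on vSphere?
oc get clusteroperators                     # which operators remain Degraded/Unavailable?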

@fketelaars
Collaborator Author

Issue fixed. The deployer has since been run successfully several times on internal infrastructure.
