cp-deployer failed with multiple issues that went away after re-running the scripts #831

Open
barochiarg opened this issue Nov 8, 2024 · 2 comments

Comments

@barochiarg

Describe the bug
Multiple minor issues have been observed.

Issue 1: After the ROSA cluster was created, cp-deployer failed to log in to the cluster and used up all of its retries. Afterwards, I was able to log in to the ROSA cluster. It would be good to increase the number of retries.

FAILED - RETRYING: Login to OpenShift ROSA cluster (2 retries left).
FAILED - RETRYING: Login to OpenShift ROSA cluster (1 retries left).

Issue 2: cp-deployer failed with the error message below. The error went away after re-running the script.

TASK [odf-operator : Retrieve default channel for ocs-operator manifest] *******
task path: /cloud-pak-deployer/automation-roles/40-configure-infra/odf-operator/tasks/main.yml:26
Friday 08 November 2024 08:51:25 +0000 (0:00:00.497) 0:56:40.263 *******
fatal: [localhost]: FAILED! => changed=true
  cmd: oc get packagemanifest ocs-operator -o jsonpath='{.status.defaultChannel}'
  delta: '0:00:00.222956'
  end: '2024-11-08 08:51:25.780681'
  msg: non-zero return code
  rc: 1
  start: '2024-11-08 08:51:25.557725'
  stderr: 'Error from server (NotFound): packagemanifests.packages.operators.coreos.com "ocs-operator" not found'
  stderr_lines:
  stdout: ''
  stdout_lines:

PLAY RECAP *********************************************************************
localhost : ok=590 changed=88 unreachable=0 failed=1 skipped=265 rescued=0 ignored=0

Friday 08 November 2024 08:51:25 +0000 (0:00:00.544) 0:56:40.807 *******

provision-aws : Waiting for cluster creation to complete, logs are in /home/ec2-user/cpd-status/log/r-wa-d01-create-cluster.log 2821.15s
/cloud-pak-deployer/automation-roles/30-provision-infra/provision-aws/tasks/provision-rosa.yml:37
openshift-login : Login to OpenShift ROSA cluster --------------------- 281.22s
/cloud-pak-deployer/automation-roles/99-generic/openshift/openshift-login/tasks/aws-login-rosa-ocp.yml:39
provision-aws : Create ROSA cluster, logs can be found in /home/ec2-user/cpd-status/log/r-wa-d01-create-cluster.log -- 33.82s
/cloud-pak-deployer/automation-roles/30-provision-infra/provision-aws/tasks/provision-rosa.yml:27
openshift-download-installer : Unpack OpenShift installer -------------- 17.52s
/cloud-pak-deployer/automation-roles/99-generic/openshift/openshift-download-installer/tasks/main.yml:36
aws-download-cli : Unpack aws-cli client installer --------------------- 17.45s
/cloud-pak-deployer/automation-roles/99-generic/aws/aws-download-cli/tasks/main.yml:33
nfs-storage-class : Wait 15 seconds for the dynamic NFS client to deploy -- 15.04s
/cloud-pak-deployer/automation-roles/40-configure-infra/nfs-storage-class/tasks/create-nfs-storage-class.yml:52
openshift-download-installer : Download OpenShift installer "https://mirror.openshift.com/pub/openshift-v4/clients/ocp/latest-4.15/openshift-install-linux.tar.gz" -- 14.60s
/cloud-pak-deployer/automation-roles/99-generic/openshift/openshift-download-installer/tasks/main.yml:24
cpd-cli-download : Unpack cpd-cli from /home/ec2-user/cpd-status/downloads/cpd-cli-linux-amd64.tar.gz -- 13.03s
/cloud-pak-deployer/automation-roles/99-generic/cpd-cli/cpd-cli-download/tasks/main.yml:23
aws-download-cli : Install aws client ----------------------------------- 5.77s
/cloud-pak-deployer/automation-roles/99-generic/aws/aws-download-cli/tasks/main.yml:39
openshift-download-client : Unpack OpenShift client from /home/ec2-user/cpd-status/downloads/openshift-client-linux.tar.gz-4.15 --- 4.94s
/cloud-pak-deployer/automation-roles/99-generic/openshift/openshift-download-client/tasks/main.yml:38

====================================================================================
Deployer FAILED. Check previous messages. If command line is not returned, press

As discussed on Slack, here is the output of the following commands, run after the failure:

oc get packagemanifest ocs-operator

NAME           CATALOG             AGE
ocs-operator   Red Hat Operators   35m

oc get pods -n openshift-marketplace

NAME                                    READY   STATUS    RESTARTS   AGE
certified-operators-k2rkd               1/1     Running   0          5m48s
community-operators-4b9w4               1/1     Running   0          5m49s
marketplace-operator-6c6dccc45d-nflmh   1/1     Running   0          2m2s
redhat-marketplace-rsfzc                1/1     Running   0          6m8s
redhat-operators-v5svc                  1/1     Running   0          5m45s

@fketelaars
Collaborator

I was able to reproduce issue #2 and added a retry to the task.

TASK [odf-operator : Retrieve default channel for ocs-operator manifest] *******
Sunday 10 November 2024  07:51:56 +0000 (0:00:00.434)       0:48:32.472 *******
FAILED - RETRYING: Retrieve default channel for ocs-operator manifest (30 retries left).
FAILED - RETRYING: Retrieve default channel for ocs-operator manifest (29 retries left).
FAILED - RETRYING: Retrieve default channel for ocs-operator manifest (28 retries left).
FAILED - RETRYING: Retrieve default channel for ocs-operator manifest (27 retries left).
FAILED - RETRYING: Retrieve default channel for ocs-operator manifest (26 retries left).
FAILED - RETRYING: Retrieve default channel for ocs-operator manifest (25 retries left).
FAILED - RETRYING: Retrieve default channel for ocs-operator manifest (24 retries left).
FAILED - RETRYING: Retrieve default channel for ocs-operator manifest (23 retries left).
FAILED - RETRYING: Retrieve default channel for ocs-operator manifest (22 retries left).
changed: [localhost]

For issue #1, I doubled the number of retries based on input from @barochiarg.
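
For anyone who hits the same timing problem outside the deployer, the retry behaviour described here can be approximated with a small shell wrapper. This is only a minimal sketch and not the deployer's actual Ansible change: the function name `retry_until_output`, and the attempt count and delay in the example call, are assumptions chosen for illustration. Only the `oc get packagemanifest` command itself is taken from the log in this issue.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of cloud-pak-deployer).
# Usage: retry_until_output MAX_ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Re-runs COMMAND until it exits 0 and prints non-empty output,
# or until MAX_ATTEMPTS is reached.
retry_until_output() {
  local max_attempts="$1" delay="$2"
  shift 2
  local attempt output
  for ((attempt = 1; attempt <= max_attempts; attempt++)); do
    # Capture stdout only; stderr (for example "NotFound") stays visible.
    if output="$("$@")" && [ -n "$output" ]; then
      printf '%s\n' "$output"
      return 0
    fi
    echo "attempt ${attempt}/${max_attempts} failed" >&2
    # Wait before the next attempt, but not after the last one.
    if ((attempt < max_attempts)); then
      sleep "$delay"
    fi
  done
  return 1
}

# Example call (attempt count and delay are illustrative values):
# retry_until_output 30 10 \
#   oc get packagemanifest ocs-operator -o jsonpath='{.status.defaultChannel}'
```

Requiring non-empty output as well as a zero exit code is a deliberate choice in this sketch, so that an empty channel value is also treated as "not ready yet".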

fketelaars added a commit that referenced this issue Nov 11, 2024
#831 Increase wait time for  OCP login and ocs-operator channel
@fketelaars
Collaborator

Issue fixed

@fketelaars fketelaars reopened this Nov 12, 2024
fketelaars added a commit that referenced this issue Nov 12, 2024
fketelaars added a commit that referenced this issue Nov 12, 2024
#831 Fix remaining packagemanifests