
feat: crmsh workflow and SUSE support #186

Merged
merged 6 commits on Feb 9, 2024

Conversation

marcelmamula
Contributor

Enhancement: This PR adds support for the crmsh shell and covers most of its functionality, with a few exceptions.
List of changes:

  1. Added a crmsh workflow, reusing existing pcs steps where possible and creating the rest from scratch, with a Jinja2 template for each crmsh command.
  2. Adjusted main.yml to make loading shell-specific files easier, by removing pcs-specific names and simplifying.
  3. Renamed a few pcs shell files to go along with point 2.
  4. Changed "Create and push CIB" to run_once: true, because running CIB configuration steps on multiple nodes can produce corrupted and misaligned CIB files. This happens every time with crmsh, and it would be best if the pcs workflow were also cautious about it.
  5. Added corosync.j2 for creating the corosync configuration.
  6. The crmsh steps put the cluster in maintenance mode to freeze the configuration, avoiding versioning errors caused by cluster resource status changes.
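Points 4 and 6 can be sketched roughly as follows. This is a minimal illustration only, with hypothetical task names and a hypothetical `__cib_file` variable, not the role's actual tasks:

```yaml
# Sketch: freeze the cluster, then build and push the CIB from one node only.
- name: Put cluster in maintenance mode (sketch)
  ansible.builtin.command:
    cmd: crm configure property maintenance-mode=true
  run_once: true  # avoid concurrent CIB edits corrupting the configuration
  changed_when: true

- name: Push CIB from a single node (sketch)
  ansible.builtin.command:
    cmd: crm configure load replace {{ __cib_file }}  # __cib_file is hypothetical
  run_once: true
  changed_when: true

- name: Leave maintenance mode (sketch)
  ansible.builtin.command:
    cmd: crm configure property maintenance-mode=false
  run_once: true
  changed_when: true
```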

The following functionality was not part of the current scope:

  • qnet/qdevice
  • resource bundles

Reason: Support SUSE systems and allow wider use, including by community.sap_install, which is being enabled for SUSE as well.

Result: Testing was completed successfully on RHEL 9.2 and SLES for SAP 15 SP5 on AWS. It included:

  • Running on fresh servers
  • Destroying clusters
  • Re-executing and checking idempotency of the results
  • Consuming the role via community.sap_install
  • Local ansible-lint checks throughout development

Issue Tracker Tickets (Jira or BZ if any): #175


codecov bot commented Feb 7, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Comparison is base (2450454) 0.00% compared to head (8eab942) 50.54%.
Report is 3 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff            @@
##           main     #186       +/-   ##
=========================================
+ Coverage      0   50.54%   +50.54%     
=========================================
  Files         0        1        +1     
  Lines         0       91       +91     
  Branches      0       12       +12     
=========================================
+ Hits          0       46       +46     
- Misses        0       45       +45     
Flag Coverage Δ
sanity 50.54% <ø> (?)

Flags with carried forward coverage won't be shown.


@marcelmamula changed the title from "crmsh workflow and SUSE support" to "feat: crmsh workflow and SUSE support" on Feb 8, 2024
Member

@tomjelinek tomjelinek left a comment


I'm getting a lot of errors from the linters in CI. Please fix these:

yamllint:

./vars/Suse.yml
  19:21     warning  too few spaces before comment  (comments)
  23:81     error    line too long (82 > 80 characters)  (line-length)
./tasks/main.yml
  95:81     error    line too long (87 > 80 characters)  (line-length)
./tasks/shell_crmsh/crm-cib-constraint-location.yml
  7:81      error    line too long (99 > 80 characters)  (line-length)
  8:81      error    line too long (104 > 80 characters)  (line-length)
  13:81     error    line too long (93 > 80 characters)  (line-length)
  18:81     error    line too long (85 > 80 characters)  (line-length)
  21:81     error    line too long (95 > 80 characters)  (line-length)
  29:81     error    line too long (94 > 80 characters)  (line-length)
./tasks/shell_crmsh/create-and-push-cib.yml
  237:81    error    line too long (81 > 80 characters)  (line-length)
./tasks/shell_crmsh/crm-cib-constraint-ticket.yml
  12:81     error    line too long (93 > 80 characters)  (line-length)
  17:81     error    line too long (85 > 80 characters)  (line-length)
  20:81     error    line too long (95 > 80 characters)  (line-length)
  28:81     error    line too long (96 > 80 characters)  (line-length)
./tasks/shell_crmsh/crm-cib-constraint-set.yml
  3:88      warning  too few spaces before comment  (comments)
  3:81      error    line too long (108 > 80 characters)  (line-length)
  12:81     error    line too long (93 > 80 characters)  (line-length)
  17:81     error    line too long (85 > 80 characters)  (line-length)
  20:81     error    line too long (95 > 80 characters)  (line-length)
  25:89     warning  too few spaces before comment  (comments)
  25:81     error    line too long (109 > 80 characters)  (line-length)
  33:81     error    line too long (81 > 80 characters)  (line-length)
  39:81     error    line too long (82 > 80 characters)  (line-length)
./tasks/shell_crmsh/crm-cib-resource-primitive.yml
  11:81     error    line too long (83 > 80 characters)  (line-length)
  20:28     warning  too few spaces before comment  (comments)
./tasks/shell_crmsh/crm-cib-stonith-level.yml
  12:81     error    line too long (88 > 80 characters)  (line-length)
./tasks/shell_crmsh/crm-cluster-properties.yml
  6:81      error    line too long (110 > 80 characters)  (line-length)
./tasks/shell_crmsh/crm-cib-constraint-order.yml
  12:81     error    line too long (93 > 80 characters)  (line-length)
  17:81     error    line too long (85 > 80 characters)  (line-length)
  20:81     error    line too long (95 > 80 characters)  (line-length)
  28:81     error    line too long (99 > 80 characters)  (line-length)
  29:81     error    line too long (95 > 80 characters)  (line-length)
  30:81     error    line too long (93 > 80 characters)  (line-length)
./tasks/shell_crmsh/sbd.yml
  15:81     error    line too long (82 > 80 characters)  (line-length)
  16:81     error    line too long (92 > 80 characters)  (line-length)
  17:81     error    line too long (81 > 80 characters)  (line-length)
  18:81     error    line too long (93 > 80 characters)  (line-length)
./tasks/shell_crmsh/crm-cib-resource-group.yml
  6:81      error    line too long (84 > 80 characters)  (line-length)
  11:81     error    line too long (83 > 80 characters)  (line-length)
  14:81     error    line too long (86 > 80 characters)  (line-length)
./tasks/shell_crmsh/cluster-start-and-reload.yml
  64:81     error    line too long (85 > 80 characters)  (line-length)
  98:81     error    line too long (85 > 80 characters)  (line-length)
  103:81    error    line too long (86 > 80 characters)  (line-length)
./tasks/shell_crmsh/crm-cib-resource-clone.yml
  3:81      error    line too long (113 > 80 characters)  (line-length)
  6:81      error    line too long (106 > 80 characters)  (line-length)
  11:81     error    line too long (91 > 80 characters)  (line-length)
  16:81     error    line too long (83 > 80 characters)  (line-length)
  19:81     error    line too long (93 > 80 characters)  (line-length)
  27:81     error    line too long (89 > 80 characters)  (line-length)
./tasks/shell_crmsh/crm-cib-constraint-colocation.yml
  3:81      error    line too long (81 > 80 characters)  (line-length)
  12:81     error    line too long (93 > 80 characters)  (line-length)
  17:81     error    line too long (85 > 80 characters)  (line-length)
  20:81     error    line too long (95 > 80 characters)  (line-length)
  28:81     error    line too long (96 > 80 characters)  (line-length)
  34:81     error    line too long (97 > 80 characters)  (line-length)

ansible-lint:

yaml[line-length]: Line too long (87 > 80 characters)
tasks/main.yml:95
yaml[line-length]: Line too long (85 > 80 characters)
tasks/shell_crmsh/cluster-start-and-reload.yml:64
yaml[line-length]: Line too long (85 > 80 characters)
tasks/shell_crmsh/cluster-start-and-reload.yml:98
yaml[line-length]: Line too long (86 > 80 characters)
tasks/shell_crmsh/cluster-start-and-reload.yml:103
yaml[line-length]: Line too long (81 > 80 characters)
tasks/shell_crmsh/create-and-push-cib.yml:237
yaml[line-length]: Line too long (81 > 80 characters)
tasks/shell_crmsh/crm-cib-constraint-colocation.yml:3
yaml[line-length]: Line too long (93 > 80 characters)
tasks/shell_crmsh/crm-cib-constraint-colocation.yml:12
yaml[line-length]: Line too long (85 > 80 characters)
tasks/shell_crmsh/crm-cib-constraint-colocation.yml:17
yaml[line-length]: Line too long (95 > 80 characters)
tasks/shell_crmsh/crm-cib-constraint-colocation.yml:20
yaml[line-length]: Line too long (96 > 80 characters)
tasks/shell_crmsh/crm-cib-constraint-colocation.yml:28
yaml[line-length]: Line too long (97 > 80 characters)
tasks/shell_crmsh/crm-cib-constraint-colocation.yml:34
yaml[line-length]: Line too long (99 > 80 characters)
tasks/shell_crmsh/crm-cib-constraint-location.yml:7
yaml[line-length]: Line too long (104 > 80 characters)
tasks/shell_crmsh/crm-cib-constraint-location.yml:8
yaml[line-length]: Line too long (93 > 80 characters)
tasks/shell_crmsh/crm-cib-constraint-location.yml:13
yaml[line-length]: Line too long (85 > 80 characters)
tasks/shell_crmsh/crm-cib-constraint-location.yml:18
yaml[line-length]: Line too long (95 > 80 characters)
tasks/shell_crmsh/crm-cib-constraint-location.yml:21
yaml[line-length]: Line too long (94 > 80 characters)
tasks/shell_crmsh/crm-cib-constraint-location.yml:29
yaml[line-length]: Line too long (93 > 80 characters)
tasks/shell_crmsh/crm-cib-constraint-order.yml:12
yaml[line-length]: Line too long (85 > 80 characters)
tasks/shell_crmsh/crm-cib-constraint-order.yml:17
yaml[line-length]: Line too long (95 > 80 characters)
tasks/shell_crmsh/crm-cib-constraint-order.yml:20
yaml[line-length]: Line too long (99 > 80 characters)
tasks/shell_crmsh/crm-cib-constraint-order.yml:28
yaml[line-length]: Line too long (95 > 80 characters)
tasks/shell_crmsh/crm-cib-constraint-order.yml:29
yaml[line-length]: Line too long (93 > 80 characters)
tasks/shell_crmsh/crm-cib-constraint-order.yml:30
yaml[line-length]: Line too long (108 > 80 characters)
tasks/shell_crmsh/crm-cib-constraint-set.yml:3
yaml[comments]: Too few spaces before comment
tasks/shell_crmsh/crm-cib-constraint-set.yml:3
yaml[line-length]: Line too long (93 > 80 characters)
tasks/shell_crmsh/crm-cib-constraint-set.yml:12
yaml[line-length]: Line too long (85 > 80 characters)
tasks/shell_crmsh/crm-cib-constraint-set.yml:17
yaml[line-length]: Line too long (95 > 80 characters)
tasks/shell_crmsh/crm-cib-constraint-set.yml:20
yaml[line-length]: Line too long (109 > 80 characters)
tasks/shell_crmsh/crm-cib-constraint-set.yml:25
yaml[comments]: Too few spaces before comment
tasks/shell_crmsh/crm-cib-constraint-set.yml:25
yaml[line-length]: Line too long (81 > 80 characters)
tasks/shell_crmsh/crm-cib-constraint-set.yml:33
yaml[line-length]: Line too long (82 > 80 characters)
tasks/shell_crmsh/crm-cib-constraint-set.yml:39
yaml[line-length]: Line too long (93 > 80 characters)
tasks/shell_crmsh/crm-cib-constraint-ticket.yml:12
yaml[line-length]: Line too long (85 > 80 characters)
tasks/shell_crmsh/crm-cib-constraint-ticket.yml:17
yaml[line-length]: Line too long (95 > 80 characters)
tasks/shell_crmsh/crm-cib-constraint-ticket.yml:20
yaml[line-length]: Line too long (96 > 80 characters)
tasks/shell_crmsh/crm-cib-constraint-ticket.yml:28
yaml[line-length]: Line too long (113 > 80 characters)
tasks/shell_crmsh/crm-cib-resource-clone.yml:3
yaml[line-length]: Line too long (106 > 80 characters)
tasks/shell_crmsh/crm-cib-resource-clone.yml:6
yaml[line-length]: Line too long (91 > 80 characters)
tasks/shell_crmsh/crm-cib-resource-clone.yml:11
yaml[line-length]: Line too long (83 > 80 characters)
tasks/shell_crmsh/crm-cib-resource-clone.yml:16
yaml[line-length]: Line too long (93 > 80 characters)
tasks/shell_crmsh/crm-cib-resource-clone.yml:19
yaml[line-length]: Line too long (89 > 80 characters)
tasks/shell_crmsh/crm-cib-resource-clone.yml:27
yaml[line-length]: Line too long (84 > 80 characters)
tasks/shell_crmsh/crm-cib-resource-group.yml:6
yaml[line-length]: Line too long (83 > 80 characters)
tasks/shell_crmsh/crm-cib-resource-group.yml:11
yaml[line-length]: Line too long (86 > 80 characters)
tasks/shell_crmsh/crm-cib-resource-group.yml:14
yaml[line-length]: Line too long (83 > 80 characters)
tasks/shell_crmsh/crm-cib-resource-primitive.yml:11
yaml[comments]: Too few spaces before comment
tasks/shell_crmsh/crm-cib-resource-primitive.yml:20
yaml[line-length]: Line too long (88 > 80 characters)
tasks/shell_crmsh/crm-cib-stonith-level.yml:12
yaml[line-length]: Line too long (110 > 80 characters)
tasks/shell_crmsh/crm-cluster-properties.yml:6
yaml[line-length]: Line too long (82 > 80 characters)
tasks/shell_crmsh/sbd.yml:15
yaml[line-length]: Line too long (92 > 80 characters)
tasks/shell_crmsh/sbd.yml:16
yaml[line-length]: Line too long (81 > 80 characters)
tasks/shell_crmsh/sbd.yml:17
yaml[line-length]: Line too long (93 > 80 characters)
tasks/shell_crmsh/sbd.yml:18
yaml[comments]: Too few spaces before comment
vars/Suse.yml:19
yaml[line-length]: Line too long (82 > 80 characters)
vars/Suse.yml:23

@richm
Contributor

richm commented Feb 8, 2024

If you need help with line wrapping, see https://linux-system-roles.github.io/documentation/howto/working-with-ansible-jinja-code-and-filters.html "How to solve some common ansible-lint issues"
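For illustration, one common way to satisfy yamllint's 80-character limit is a folded block scalar (`>-`), which joins the wrapped lines with spaces at runtime. This is a hypothetical task with made-up variable names, not one of the files flagged above:

```yaml
- name: Configure location constraint (hypothetical example)
  ansible.builtin.command:
    cmd: >-
      crm configure location {{ constraint_id }}
      {{ constraint_resource }} {{ constraint_score }}:
      {{ constraint_node }}
  changed_when: true
```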

@marcelmamula
Contributor Author

> If you need help with line wrapping, see https://linux-system-roles.github.io/documentation/howto/working-with-ansible-jinja-code-and-filters.html "How to solve some common ansible-lint issues"

My ansible-lint is not showing line-length warnings, but I am fixing them now based on Tomas's list.

@richm
Contributor

richm commented Feb 8, 2024

> If you need help with line wrapping, see https://linux-system-roles.github.io/documentation/howto/working-with-ansible-jinja-code-and-filters.html "How to solve some common ansible-lint issues"

> My ansible lint is not showing line length warnings, but I am updating them now based on Tomas's list.

How are you running ansible-lint?

@marcelmamula
Contributor Author

> If you need help with line wrapping, see https://linux-system-roles.github.io/documentation/howto/working-with-ansible-jinja-code-and-filters.html "How to solve some common ansible-lint issues"

> My ansible lint is not showing line length warnings, but I am updating them now based on Tomas's list.

> How are you running ansible-lint?

On a Linux bastion machine, with a simple config that does not skip any line-length checks:
skip_list:

  • no-tabs
  • run-once[task]
  • risky-shell-pipe
  • ignore-errors
  • role-name[path]
  • fqcn[keyword]
  • name[template]

@richm
Contributor

richm commented Feb 8, 2024

This repo CI uses the .ansible-lint in the ha_cluster root directory. https://github.com/linux-system-roles/ha_cluster/blob/main/.ansible-lint

skip_list:
  - fqcn-builtins
  - var-naming[no-role-prefix]
  - sanity[cannot-ignore]  # wokeignore:rule=sanity

The reason why we don't see line length errors in the github action CI is because the latest ansible-lint github action only works on collections - so we have to convert the role to collection format before running ansible-lint - https://github.com/linux-system-roles/ha_cluster/blob/main/.github/workflows/ansible-lint.yml#L37 - when we convert to collection format, we cannot preserve line wrapping (due to ruamel roundtrip), so we disable line-length checking.

@tomjelinek
Member

> This repo CI uses the .ansible-lint in the ha_cluster root directory. https://github.com/linux-system-roles/ha_cluster/blob/main/.ansible-lint
>
> skip_list:
>   - fqcn-builtins
>   - var-naming[no-role-prefix]
>   - sanity[cannot-ignore]  # wokeignore:rule=sanity
>
> The reason why we don't see line length errors in the github action CI is because the latest ansible-lint github action only works on collections - so we have to convert the role to collection format before running ansible-lint - https://github.com/linux-system-roles/ha_cluster/blob/main/.github/workflows/ansible-lint.yml#L37 - when we convert to collection format, we cannot preserve line wrapping (due to ruamel roundtrip), so we disable line-length checking.

@richm Does that mean we don't care about line length anymore? Should I disable those checks in my tests? I still run ansible-lint and yamllint on the original source files.

@richm
Contributor

richm commented Feb 8, 2024

> This repo CI uses the .ansible-lint in the ha_cluster root directory. https://github.com/linux-system-roles/ha_cluster/blob/main/.ansible-lint
>
> skip_list:
>   - fqcn-builtins
>   - var-naming[no-role-prefix]
>   - sanity[cannot-ignore]  # wokeignore:rule=sanity
>
> The reason why we don't see line length errors in the github action CI is because the latest ansible-lint github action only works on collections - so we have to convert the role to collection format before running ansible-lint - https://github.com/linux-system-roles/ha_cluster/blob/main/.github/workflows/ansible-lint.yml#L37 - when we convert to collection format, we cannot preserve line wrapping (due to ruamel roundtrip), so we disable line-length checking.

> @richm Does that mean we don't care about line length anymore?

I think we should care about line length - not sure if Galaxy role gating checks. I'm considering putting in a separate github action check to run ansible-lint against the role.

> Should I disable those checks in my tests?

No, I think we should still care about line length.

> I still run ansible-lint and yamllint on the original source files.

ansible-lint now runs yamllint, so you shouldn't need a separate check (and yes, ansible-lint will use .yamllint.yml if you have one).

@richm
Contributor

richm commented Feb 8, 2024

[citest]
Just want to make sure there are no regressions in the RedHat os_family pcs tests.

+
(['fence-virt'] if ansible_architecture == 'x86_64' else [])
}}"
ha_cluster_fence_agent_packages: []
Contributor


Suggested change:
- ha_cluster_fence_agent_packages: []
+ ha_cluster_fence_agent_packages: "{{ __ha_cluster_fence_agent_packages }}"

Contributor Author


I don't think that is the correct place, because it would expose a private variable to potential user input.

I added a conditional into main.yml yesterday to do the switch, depending on whether users added anything to it.

+
      ha_cluster_fence_agent_packages
        if ha_cluster_fence_agent_packages | length > 0
        else __ha_cluster_fence_agent_packages }}"

Contributor


> I don't think that is correct place, because it would expose private variable to potential user inputs.

I'm not sure what you mean. I believe my proposal encapsulates the desired behavior - putting this in defaults/main.yml:

ha_cluster_fence_agent_packages: "{{ __ha_cluster_fence_agent_packages }}"

This allows users to provide their own list of packages for ha_cluster_fence_agent_packages (which is the desired behavior - @tomjelinek please correct me if I'm wrong), and if the user does not specify ha_cluster_fence_agent_packages, it will be set to the default value __ha_cluster_fence_agent_packages. And, since __ha_cluster_fence_agent_packages is defined for different values depending on the platform/version, we get the correct value for ha_cluster_fence_agent_packages for all platforms/versions.
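The pattern described here can be sketched as follows. The file contents are abbreviated and partly hypothetical (the real vars files define more variables and conditional package lists):

```yaml
# vars/RedHat.yml (loaded via include_vars, per platform; sketch only)
__ha_cluster_fence_agent_packages:
  - fence-agents-all

# defaults/main.yml (lowest precedence, so any user-supplied value wins)
ha_cluster_fence_agent_packages: "{{ __ha_cluster_fence_agent_packages }}"
```

Because role defaults are evaluated lazily, the Jinja2 reference in defaults/main.yml resolves to whichever platform-specific value include_vars loaded.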

> I have added conditional into main yesterday to do switch, depending if users added something to it.

> +
>       ha_cluster_fence_agent_packages
>         if ha_cluster_fence_agent_packages | length > 0
>         else __ha_cluster_fence_agent_packages }}"

I think this is not the "Ansible way" to do this. I believe the correct way to do this is the way I outlined this above.

Contributor Author


@richm I have changed it in the latest commit, after a conversation with @tomjelinek.

Contributor

@richm richm Feb 9, 2024


Looks like the default variable was renamed to __ha_cluster_fence_agent_packages_default, so this should be

ha_cluster_fence_agent_packages: "{{ __ha_cluster_fence_agent_packages_default }}"

Contributor Author


@richm Changes are completed and pushed.

Contributor


@richm Can this be marked as resolved?

@marcelmamula
Contributor Author

I can see the workflow tests failed on firewall and qdevice; are those CI environments in good shape? The PR does not touch firewall or qdevice.

@tomjelinek
Member

> I can see Workflow test failed on firewall and qdevice, are those CI environments in good shape? PR does not touch firewall or qdevices.

Tests are failing with the same errors on my systems as well. This needs to be investigated.

@marcelmamula
Contributor Author

> I can see Workflow test failed on firewall and qdevice, are those CI environments in good shape? PR does not touch firewall or qdevices.

> Tests are failing with the same errors on my systems as well. This needs to be investigated.

I tried running this PR and the existing collection (without the PR) on CentOS 9, but it fails in a different place, which looks like a wrong order of steps in pcs, so I cannot re-test this with my current setup. Let me know if you identify something, since it seems you have more variables defined to overcome the error below.

The issue I see on my side comes from pcs attempting a property change (empty value) on a nonexistent CIB:
fatal: [centos0]: FAILED! => {"changed": true, "cmd": ["pcs", "--force", "-f", "/var/lib/pacemaker/cib/cib.xml", "--", "property", "set", "stonith-watchdog-timeout=0"], "delta": "0:00:00.992301", "end": "2024-02-09 09:22:47.144951", "msg": "non-zero return code", "rc": 1, "start": "2024-02-09 09:22:46.152650", "stderr": "Error: unable to get cib", "stderr_lines": ["Error: unable to get cib"], "stdout": "", "stdout_lines": []}

@tomjelinek
Member

tomjelinek commented Feb 9, 2024

> I can see Workflow test failed on firewall and qdevice, are those CI environments in good shape? PR does not touch firewall or qdevices.

The firewall failures are caused by not enabling port 1229/tcp. This port is required for fence-virt. This got broken by changes related to ha_cluster_fence_agent_packages variable. Check tasks/firewall.yml - ha_cluster_fence_agent_packages is empty there, thus fence-virt port is not enabled. The ha_cluster_fence_agent_packages variable gets populated by __ha_cluster_fence_agent_packages in tasks/main.yml after that. That's the root cause.

@marcelmamula
Contributor Author

> I can see Workflow test failed on firewall and qdevice, are those CI environments in good shape? PR does not touch firewall or qdevices.

> The firewall failures are caused by not enabling port 1229/tcp. This port is required for fence-virt. This got broken by changes related to ha_cluster_fence_agent_packages variable. Check tasks/firewall.yml - ha_cluster_fence_agent_packages is empty there, thus fence-virt port is not enabled. The ha_cluster_fence_agent_packages variable gets populated by __ha_cluster_fence_agent_packages in tasks/main.yml after that. That's the root cause.

I see. I tried to add it to include_vars/os_family yesterday, but it resulted in critical nesting, so I moved it into the "Install cluster packages" task in main.yml as

+
          ha_cluster_fence_agent_packages
            if ha_cluster_fence_agent_packages | length > 0
            else __ha_cluster_fence_agent_packages }}"

But there are other places that use it before that point - firewall and selinux. Where would be the best place to do this assignment, if include_vars cannot be used with the same syntax?

The nesting came from this in include_vars:

ha_cluster_fence_agent_packages: "{{ ha_cluster_fence_agent_packages if
 ha_cluster_fence_agent_packages | length > 0
 else __ha_cluster_fence_agent_packages }}"

@marcelmamula
Contributor Author

> I can see Workflow test failed on firewall and qdevice, are those CI environments in good shape? PR does not touch firewall or qdevices.
>
> The firewall failures are caused by not enabling port 1229/tcp. This port is required for fence-virt. This got broken by changes related to ha_cluster_fence_agent_packages variable. Check tasks/firewall.yml - ha_cluster_fence_agent_packages is empty there, thus fence-virt port is not enabled. The ha_cluster_fence_agent_packages variable gets populated by __ha_cluster_fence_agent_packages in tasks/main.yml after that. That's the root cause.
>
> I see, I have tried to add it to include_vars/os_family yesterday but it resulted in critical nesting, so I moved it to main.yml into "Install cluster packages" as
>
> +
>           ha_cluster_fence_agent_packages
>             if ha_cluster_fence_agent_packages | length > 0
>             else __ha_cluster_fence_agent_packages }}"
>
> But there are other places that use it before that - firewall and selinux. Where would be best place to do this assignment then if include_vars cannot be used with same syntax?
>
> Nesting came from this in include_vars
>
> ha_cluster_fence_agent_packages: "{{ ha_cluster_fence_agent_packages if
>  ha_cluster_fence_agent_packages | length > 0
>  else __ha_cluster_fence_agent_packages }}"

@tomjelinek We can simplify it with a pre-step in main.yml, which avoids nesting and still leaves user input as an option:

- name: Set ha_cluster_fence_agent_packages fact
  ansible.builtin.set_fact:
    ha_cluster_fence_agent_packages: "{{ __ha_cluster_fence_agent_packages }}"
  when: ha_cluster_fence_agent_packages | length == 0

There is also a second approach: always add the os_family fence agents on top of the user input with | unique, which would be more manageable.

@marcelmamula
Contributor Author

@tomjelinek I have updated the code for the 2nd approach and it works well, combining both lists and then running unique on top.
Let me know if you are fine with this approach and I will commit it for CI testing.

selinux:

(
            'fence-virt' in ha_cluster_fence_agent_packages
            or
            'fence-virt' in __ha_cluster_fence_agent_packages
            or
            'fence-virt' in ha_cluster_extra_packages
            or
            'fence-agents-all' in ha_cluster_fence_agent_packages
            or
            'fence-agents-all' in __ha_cluster_fence_agent_packages
            or
            'fence-agents-all' in ha_cluster_extra_packages
          )

firewall:

    __use_fence_fw_port: "{{ __arch == 'x86_64' and
      ('fence-virt' in ha_cluster_fence_agent_packages
       or 'fence-virt' in __ha_cluster_fence_agent_packages
       or 'fence-virt' in ha_cluster_extra_packages
       or 'fence-agents-all' in ha_cluster_fence_agent_packages
       or 'fence-agents-all' in __ha_cluster_fence_agent_packages
       or 'fence-agents-all' in ha_cluster_extra_packages) }}"

main:

    - name: Install cluster packages
      package:
        name: "{{(
          __ha_cluster_fullstack_node_packages
          +
          __ha_cluster_qdevice_in_use |
            ternary(__ha_cluster_qdevice_node_packages, [])
          +
          ha_cluster_sbd_enabled | ternary(__ha_cluster_sbd_packages, [])
          +
          ha_cluster_fence_agent_packages
          +
          __ha_cluster_fence_agent_packages ) | unique }}"
        state: present
        use: "{{ (__ha_cluster_is_ostree | d(false)) |
                 ternary('ansible.posix.rhel_rpm_ostree', omit) }}"

@tomjelinek
Member

tomjelinek commented Feb 9, 2024

> @tomjelinek I have updated code for 2nd approach and it works well by combining both and then running unique on top.

@marcelmamula That is not going to work. Originally, users could redefine ha_cluster_fence_agent_packages value to install only those fence-agents they were interested in. If you combine ha_cluster_fence_agent_packages and __ha_cluster_fence_agent_packages together, then all fence-agents will be installed every time, as __ha_cluster_fence_agent_packages contains fence-agents-all.

@tomjelinek
Member

> I can see Workflow test failed on firewall and qdevice, are those CI environments in good shape? PR does not touch firewall or qdevices.

Qdevice tests are failing due to corosync-qdevice package not being installed by the role. This is caused by changes in "Install cluster packages" task in main.yml. I explained this in a comment above.

@tomjelinek
Member

@marcelmamula re: ha_cluster_fence_agent_packages

I would try to resolve this by setting fact __ha_cluster_fence_agent_packages in check-and-prepare-role-variables.yml depending on ha_cluster_fence_agent_packages and then using __ha_cluster_fence_agent_packages instead of ha_cluster_fence_agent_packages throughout the role.

@marcelmamula
Contributor Author

> @marcelmamula re: ha_cluster_fence_agent_packages
>
> I would try to resolve this by setting fact __ha_cluster_fence_agent_packages in check-and-prepare-role-variables.yml depending on ha_cluster_fence_agent_packages and then using __ha_cluster_fence_agent_packages instead of ha_cluster_fence_agent_packages throughout the role.

@tomjelinek I have retested using ternary: ha_cluster_fence_agent_packages: "{{ ha_cluster_fence_agent_packages | length == 0 | ternary(__ha_cluster_fence_agent_packages, ha_cluster_fence_agent_packages) }}", and it ends with endless nesting, like an if/else.

It might be simplest to do a set_fact before the firewall/selinux block:

- name: Set ha_cluster_fence_agent_packages fact
  ansible.builtin.set_fact:
    ha_cluster_fence_agent_packages: "{{ __ha_cluster_fence_agent_packages }}"
  when: ha_cluster_fence_agent_packages | length == 0

@tomjelinek
Member

> I have retested using ternary ha_cluster_fence_agent_packages: "{{ ha_cluster_fence_agent_packages | length == 0 | ternary(__ha_cluster_fence_agent_packages, ha_cluster_fence_agent_packages) }}" and it ends with endless nesting like if/else.
>
> It might be simplest to do set_facts before firewall/selinux block:
>
> - name: Set ha_cluster_fence_agent_packages fact
>   ansible.builtin.set_fact:
>     ha_cluster_fence_agent_packages: "{{ __ha_cluster_fence_agent_packages }}"
>   when: ha_cluster_fence_agent_packages | length == 0

This would work, but it modifies user-defined ha_cluster_fence_agent_packages variable, and the modification will be in effect even after the role finishes. Users won't expect this to happen and it may cause issues. I would rather set a "private" variable. If there is endless nesting, perhaps we need a new variable name? __ha_cluster_fence_agent_packages_final: "{{ ha_cluster_fence_agent_packages | length == 0 | ternary(__ha_cluster_fence_agent_packages_default, ha_cluster_fence_agent_packages) }}"

Also, I would put this task into check-and-prepare-role-variables.yml

@marcelmamula
Contributor Author

@tomjelinek Thank you for the suggestion. I have implemented it and retested on RHEL, CentOS and SLES for SAP, and it works correctly. A new commit was pushed for retesting.

@sean-freeman
Contributor

sean-freeman commented Feb 9, 2024

[citest]
Triggers tests

@richm
Contributor

richm commented Feb 9, 2024

> > I have retested using ternary ha_cluster_fence_agent_packages: "{{ ha_cluster_fence_agent_packages | length == 0 | ternary(__ha_cluster_fence_agent_packages, ha_cluster_fence_agent_packages) }}" and it ends with endless nesting like if/else.
> >
> > It might be simplest to do set_facts before firewall/selinux block:
> >
> >     - name: Set ha_cluster_fence_agent_packages fact
> >       ansible.builtin.set_fact:
> >         ha_cluster_fence_agent_packages: "{{ __ha_cluster_fence_agent_packages }}"
> >       when: ha_cluster_fence_agent_packages | length == 0
>
> This would work, but it modifies user-defined ha_cluster_fence_agent_packages variable, and the modification will be in effect even after the role finishes. Users won't expect this to happen and it may cause issues. I would rather set a "private" variable. If there is endless nesting, perhaps we need a new variable name? __ha_cluster_fence_agent_packages_final: "{{ ha_cluster_fence_agent_packages | length == 0 | ternary(__ha_cluster_fence_agent_packages_default, ha_cluster_fence_agent_packages) }}"
>
> Also, I would put this task into check-and-prepare-role-variables.yml

I think my proposal #186 (comment) will solve this problem.

__ha_cluster_fence_agent_packages_final:
  "{{ (ha_cluster_fence_agent_packages | length == 0) |
      ternary(__ha_cluster_fence_agent_packages_default,
              ha_cluster_fence_agent_packages) }}"
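A side note on why the parentheses around the comparison matter: in Jinja2 the filter pipe `|` binds tighter than `==`, so without parentheses the ternary is applied to the literal 0 rather than to the comparison result. A minimal stand-alone sketch (variable names invented; the `ternary` stand-in mimics Ansible's filter, which plain Jinja2 does not ship):

```python
# Demonstrates Jinja2 operator precedence with a stand-in for Ansible's
# `ternary` filter; not code from the role itself.
from jinja2 import Environment

def ternary(value, true_val, false_val):
    # Mimics Ansible's ternary filter: pick based on truthiness of `value`
    return true_val if value else false_val

env = Environment()
env.filters["ternary"] = ternary

ctx = {"user_pkgs": [], "default_pkgs": ["fence-agents-all"]}

# Without parentheses, `|` binds tighter than `==`, so this parses as
# (user_pkgs | length) == (0 | ternary(default_pkgs, user_pkgs))
# and renders a boolean instead of a package list.
broken = env.from_string(
    "{{ user_pkgs | length == 0 | ternary(default_pkgs, user_pkgs) }}"
).render(ctx)

# With parentheses, the comparison result feeds ternary as intended.
fixed = env.from_string(
    "{{ (user_pkgs | length == 0) | ternary(default_pkgs, user_pkgs) }}"
).render(ctx)

print(broken)  # False  (a boolean, not a list)
print(fixed)   # ['fence-agents-all']
```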
Contributor

This is exactly what my proposal does, without having to resort to set_fact and defining yet another variable.

Member

Oh, yes. Somehow, I missed your comment about this. @marcelmamula, can you try implementing @richm's proposal regarding this?

Contributor Author

defaults/main.yml does not contain any __private variables. Do you want to expose them there now?
The latest commit gives users the option to define fence agents themselves and then decides based on that, without having to go and replace this assignment in defaults.

Contributor

> defaults/main.yml does not contain any __private variables. Do you want to expose them there now?

Yes. This is quite a common idiom in system roles.

> Latest commit has option for users to define fence agents and then decide based on that decision, without having to go and replace this assignment in defaults.

Not sure what you mean. With my proposal:

  • if the user defines ha_cluster_fence_agent_packages then the role will use that definition
  • if the user does not define ha_cluster_fence_agent_packages then ha_cluster_fence_agent_packages will be set to the default value __ha_cluster_fence_agent_packages_default and the default value will be used
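The two bullets above can be sketched as follows (file names and the fence-agents-all default value are assumptions for illustration, not the exact merged code):

```yaml
# defaults/main.yml -- user-overridable; defaults to the "private" value.
# If the user sets this variable anywhere with higher precedence than role
# defaults, the template below is never rendered and their value wins.
ha_cluster_fence_agent_packages: "{{ __ha_cluster_fence_agent_packages_default }}"

# vars/main.yml (or a platform-specific vars file) -- assumed location of
# the default, which users are not expected to override
__ha_cluster_fence_agent_packages_default:
  - fence-agents-all
```

This leans on Ansible variable precedence (role defaults lose to inventory and play vars), which is why no set_fact task or extra `_final` variable is needed.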

Contributor

Also note - in Ansible there is no way to "protect" variables - we use the convention that any variable that starts with __ha_cluster_... should not be set by users, and the role is "unsupported" if such variables are set by users.

Contributor Author

@richm Both solutions work, but I will update it so we can move past this.

Contributor Author

@marcelmamula marcelmamula Feb 9, 2024

[citest]
Changes are pushed

Contributor

@richm can this be marked as resolved?

Contributor

@sean-freeman sean-freeman left a comment

lgtm 👍

@richm
Contributor

richm commented Feb 9, 2024

[citest] is only allowed by role maintainers or linux-system-roles org admins, for security purposes

Contributor

@richm richm left a comment

thanks for being so patient! hopefully the next PR won't require 134 gh pr comments...

@richm richm merged commit b8de4f5 into linux-system-roles:main Feb 9, 2024
33 checks passed