Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

test: Rewrite tests_bond_options.yml in new testing format #671

Merged
merged 4 commits into from
Feb 15, 2024

Conversation

liangwen12year
Copy link
Collaborator

Enhancement:

Reason:

Result:

Issue Tracker Tickets (Jira or BZ if any):

Copy link

codecov bot commented Jan 30, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Comparison is base (6c74dee) 20.50% compared to head (9c23735) 20.50%.

Additional details and impacted files
@@           Coverage Diff           @@
##             main     #671   +/-   ##
=======================================
  Coverage   20.50%   20.50%           
=======================================
  Files          10       10           
  Lines        1478     1478           
  Branches      433      433           
=======================================
  Hits          303      303           
  Misses       1174     1174           
  Partials        1        1           
Flag Coverage Δ
sanity 20.50% <ø> (ø)

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@liangwen12year
Copy link
Collaborator Author

[citest]

1 similar comment
@liangwen12year
Copy link
Collaborator Author

[citest]

@liangwen12year
Copy link
Collaborator Author

[citest]

@richm
Copy link
Contributor

richm commented Feb 1, 2024

well, that didn't help

TASK [Install dnsmasq] *********************************************************
task path: /WORKDIR/git-bond_options_rewritem7luhouy/tests/playbooks/tasks/create_test_interfaces_with_dhcp.yml:3
Wednesday 31 January 2024  23:34:50 +0000 (0:00:00.034)       0:01:33.565 ***** 
FAILED - RETRYING: [sut]: Install dnsmasq (6 retries left).
FAILED - RETRYING: [sut]: Install dnsmasq (5 retries left).
FAILED - RETRYING: [sut]: Install dnsmasq (4 retries left).
FAILED - RETRYING: [sut]: Install dnsmasq (3 retries left).
FAILED - RETRYING: [sut]: Install dnsmasq (2 retries left).
FAILED - RETRYING: [sut]: Install dnsmasq (1 retries left).
fatal: [sut]: FAILED! => {
    "attempts": 6,
    "changed": false,
    "rc": 1,
    "results": []
}

MSG:

Failed to download metadata for repo 'epel': Cannot prepare internal mirrorlist: Curl error (6): Couldn't resolve host name for https://mirrors.fedoraproject.org/metalink?repo=epel-8&arch=x86_64&infra=stock&content=centos [Could not resolve host: mirrors.fedoraproject.org]

The test prior to this is tests_bond_nm.yml - https://dl.fedoraproject.org/pub/alt/linuxsystemroles/logs/lsr-citool_network-671-2123ffc_CentOS-Stream-8_20240201-000417/artifacts/tests_bond_nm-PASSED.log

TASK [Delete the device 'nm-bond'] *********************************************
task path: /WORKDIR/git-bond_options_rewritem7luhouy/tests/playbooks/tests_bond.yml:114
Wednesday 31 January 2024  23:33:36 +0000 (0:00:00.258)       0:00:26.636 ***** 
ok: [sut] => {
    "changed": false,
    "cmd": [
        "ip",
        "link",
        "del",
        "nm-bond"
    ],
    "delta": "0:00:00.006632",
    "end": "2024-01-31 23:33:36.426429",
    "failed_when_result": false,
    "rc": 1,
    "start": "2024-01-31 23:33:36.419797"
}

STDERR:

Cannot find device "nm-bond"

Not sure if this is a problem - perhaps the other cleanup removes this interface (which is why this task has ignore_errors: true)

It would be useful to add to the cleanup section something which verifies DNS lookups are still working (maybe add an explicit getent hosts mirrors.fedoraproject.org and getent hosts mirrors.centos.org - something like this)

- name: Verify DNS and ping still work
  shell: |
    set -euxo pipefail
    for host in mirrors.fedoraproject.org mirrors.centos.org; do
      if ! getent hosts "$host"; then
        # show whatever diagnostic information you need to determine why DNS isn't working
        # not sure what that would be
        # see if dnsmasq is still running and intercepting DNS requests?
        # and maybe ip a and ip route?  what other information do you need here for debugging?
        exit 1
      fi
      if ! ping "$host"; then
        # show whatever diagnostic information you need to determine why ping isn't working
        # not sure what that would be
        # and maybe ip a and ip route?  what other information do you need here for debugging?
        exit 1
      fi
    done
  when: ansible_facts["distribution"] == "CentOS"
  changed_when: false

@liangwen12year
Copy link
Collaborator Author

[citest]

@liangwen12year
Copy link
Collaborator Author

[citest]

changed_when: false
- name: Import the task 'remove_test_interfaces_with_dhcp.yml'
import_tasks: tasks/remove_test_interfaces_with_dhcp.yml
lsr_description: I can reconfigure the bond options
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wrote the description in user story format before I knew about acceptance criteria. Probably, it makes more sense to write this as an acceptance criteria.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wrote the lsr_description as acceptance criteria.

Comment on lines 9 to 24
- {key: 'mode', value: '802.3ad'}
- {key: 'ad_actor_sys_prio', value: '65535'}
- {key: 'ad_actor_system', value: '00:00:5e:00:53:5d'}
- {key: 'ad_select', value: 'stable'}
- {key: 'ad_user_port_key', value: '1023'}
# wokeignore:rule=slave
- {key: 'all_slaves_active', value: '1'}
- {key: 'downdelay', value: '0'}
- {key: 'lacp_rate', value: 'slow'}
- {key: 'lp_interval', value: '128'}
- {key: 'miimon', value: '110'}
- {key: 'num_grat_arp', value: '64'}
- {key: 'resend_igmp', value: '225'}
- {key: 'updelay', value: '0'}
- {key: 'use_carrier', value: '1'}
- {key: 'xmit_hash_policy', value: 'encap2+3'}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible to assign this as a variable in tests/playbooks/tests_bond_options.yml?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is possible.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Then please do so. It seems to me, that it will help making the playbooks easier to understand. What do you think? It should also reduce the amount of needed task files.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, we do not need to keep some redundant task files.

@liangwen12year
Copy link
Collaborator Author

[citest]

@liangwen12year
Copy link
Collaborator Author

[citest]

@liangwen12year
Copy link
Collaborator Author

[citest]

Comment on lines 45 to 48
lsr_description: Given the system administrators using the network
role, when they reconfigure the bond options as specified in
`create_bond_profile.yml`, then I can verify that the
bond options are configured correctly in sysfs filesystem.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, the acceptance criteria should describe the steps in lsr_setup, lsr_test and lsr_assert:

A start would be:

Given two DHCP-enabled network interfaces,
When creating a bond profile with them,
Then the controller device and bond port profiles are present and the specified bond options are set for the controller device.

@liangwen12year
Copy link
Collaborator Author

[citest]

@liangwen12year
Copy link
Collaborator Author

[citest]

Comment on lines +14 to +30
mode: 802.3ad
ad_actor_sys_prio: 65535
ad_actor_system: 00:00:5e:00:53:5d
ad_select: stable
ad_user_port_key: 1023
all_ports_active: true
downdelay: 0
lacp_rate: slow
lp_interval: 128
miimon: 110
min_links: 0
num_grat_arp: 64
primary_reselect: better
resend_igmp: 225
updelay: 0
use_carrier: true
xmit_hash_policy: encap2+3
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure if it was clear, these options also belong in a variable in the playbook.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Aha, thanks for making it clearer, I will use a variable to hold the bond options. Bonus from this, I think that we can have fewer task files.

Comment on lines 16 to 28
- name: Verify DNS and ping still work
shell: |
set -euxo pipefail
for host in mirrors.fedoraproject.org mirrors.centos.org; do
if ! getent hosts "$host"; then
exit 1
fi
if ! ping -c5 "$host"; then
exit 1
fi
done
when: ansible_facts["distribution"] == "CentOS"
changed_when: false
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you intend to keep this code in, then it should be its own task file.

Copy link
Collaborator Author

@liangwen12year liangwen12year Feb 6, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not planning to keep this code in, but only for temporary debugging purposes.

@liangwen12year
Copy link
Collaborator Author

[citest]

@liangwen12year
Copy link
Collaborator Author

[citest]

richm
richm previously approved these changes Feb 13, 2024
@liangwen12year
Copy link
Collaborator Author

[citest]

@liangwen12year
Copy link
Collaborator Author

[citest]

@liangwen12year
Copy link
Collaborator Author

[citest]

liangwen12year and others added 4 commits February 14, 2024 15:42
The new testing format is more concise and easier to debug when test
failure happens.

Signed-off-by: Wen Liang <[email protected]>
Sometimes the rpm download returns a 403, which is likely caused by
too many parallel jobs attempt the download from the same controller in
too short a period of time, so the epel server throttles additional
downloads - use a retry here to mitigate.

Signed-off-by: Wen Liang <[email protected]>
In order to guarantee each test is cleaned up properly in the end, it
is important to add a post-test check to each test checking that:

- Routes and DNS are restored.
- Network connectivity to certain hosts are preserved.

Signed-off-by: Wen Liang <[email protected]>
Without purging the DNS testing config at the end
`tests_network_state.yml`, the managed hosts can not properly resolve
certain hosts (e.g. mirrors.fedoraproject.org, mirrors.centos.org )
in the package installation task of other tests.

Signed-off-by: Wen Liang <[email protected]>
@liangwen12year
Copy link
Collaborator Author

[citest]

@liangwen12year
Copy link
Collaborator Author

@richm, thanks for the hint and contribution to this PR.

@richm
Copy link
Contributor

richm commented Feb 15, 2024

There were 3 tests that hung:

network 671 CentOS-8/ansible-2.9 tests_bond_deprecated_nm.yml UNDEFINED 2024-02-14T14:03:44
network 671 Fedora-39/ansible-2.16 tests_ethernet_nm.yml UNDEFINED 2024-02-14T13:59:44
network 671 Fedora-38/ansible-2.16 tests_bond_removal_nm.yml UNDEFINED 2024-02-14T09:49:21

the time now is 2024-02-14T20:18:44, so you can see that those tests are quite hung. I'm going to merge this PR, and we will see if those 3 tests are flakes or are persistent.

@richm richm merged commit c6be8df into linux-system-roles:main Feb 15, 2024
31 of 36 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants