test_put_multi_fetch_page fails frequently against Ubuntu 24.04 / MySQL 8.4 #279

Open
2 tasks
korydraughn opened this issue Aug 14, 2024 · 1 comment
@korydraughn
Collaborator

  • main
  • 4-3-stable

Bug Report

Encountered during testing of what will be iRODS 4.3.3.
Platform is Ubuntu 24.04.
Database is MySQL 8.4.

The test fails due to the following assertion.

delay_assert(lambda: admin_session.assert_icommand_fail(['ils', '-l', dirname], 'STDOUT_SINGLELINE', 'ufs0'))

The test creates (256 * 2) + 1 = 513 data objects and then waits for them to be moved to another tier. Most of the movement succeeds, but close observation shows that some movements fail, which leaves stale replicas on the original tier. I've seen at least 3 replicas in this state after the test completes. The test fails because it still finds replicas on the original tier, even though it moved 400+ replicas.

Decreasing the number of replicas involved (by 100 or so) resulted in the test passing. However, reducing the number of replicas isn't a real fix. We need to figure out WHY data movement fails for some replicas.

Below is the output of the failed test.

        <testcase classname="irods.test.test_plugin_unified_storage_tiering.TestStorageTieringContinueInxMigration" name="test_put_multi_fetch_page" time="287.555" timestamp="2024-08-13T21:11:55" file="scripts/irods/test/test_plugin_unified_storage_tiering.py" line="1503">
                <failure type="AssertionError" message=""><![CDATA[Traceback (most recent call last):
  File "/var/lib/irods/scripts/irods/test/test_plugin_unified_storage_tiering.py", line 180, in delay_assert
    out, err, rc = function()
                   ^^^^^^^^^^
  File "/var/lib/irods/scripts/irods/test/test_plugin_unified_storage_tiering.py", line 1520, in <lambda>
    delay_assert(lambda: admin_session.assert_icommand_fail(['ils', '-l', dirname], 'STDOUT_SINGLELINE', 'ufs0'))
                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/var/lib/irods/scripts/irods/test/session.py", line 166, in assert_icommand_fail
    return assert_command_fail(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/var/lib/irods/scripts/irods/test/command.py", line 79, in assert_command_fail
    return _assert_helper(*args, should_fail=True, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/var/lib/irods/scripts/irods/test/command.py", line 104, in _assert_helper
    assert result
           ^^^^^^
AssertionError

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/var/lib/irods/scripts/irods/test/test_plugin_unified_storage_tiering.py", line 1520, in test_put_multi_fetch_page
    delay_assert(lambda: admin_session.assert_icommand_fail(['ils', '-l', dirname], 'STDOUT_SINGLELINE', 'ufs0'))
  File "/var/lib/irods/scripts/irods/test/test_plugin_unified_storage_tiering.py", line 187, in delay_assert
    assert(False)
           ^^^^^
AssertionError

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/var/lib/irods/scripts/irods/test/test_plugin_unified_storage_tiering.py", line 1526, in test_put_multi_fetch_page
    admin_session.assert_icommand('irm -r ' + dirname)
  File "/var/lib/irods/scripts/irods/test/session.py", line 162, in assert_icommand
    return assert_command(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/var/lib/irods/scripts/irods/test/command.py", line 76, in assert_command
    return _assert_helper(*args, should_fail=False, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/var/lib/irods/scripts/irods/test/command.py", line 104, in _assert_helper
    assert result
           ^^^^^^
AssertionError
]]></failure>

Here is the final listing before the assertion fails.

id     name
11110 {"delay_conditions":"<INST_NAME>irods_rule_engine_plugin-unified_storage_tiering-instance</INST_NAME><EF>60s REPEAT UNTIL SUCCESS OR 5 TIMES</EF><PLUSET>1s</PLUSET>","destination-resource":"ufs1","group-name":"example_group","md5":"ca0e41ba44e21e0c2e4eb7d9064d0caf","object-path":"/tempZone/home/rods/test_put_multi_fetch_page/junk0365","preserve-replicas":false,"rule-engine-instance-name":"irods_rule_engine_plugin-unified_storage_tiering-instance","rule-engine-operation":"irods_policy_data_movement","source-replica-number":"0","source-resource":"ufs0","user-name":"rods","user-zone":"tempZone","verification-type":"catalog"}
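
For anyone triaging this, the queued rule above is plain JSON, so the pending movement can be summarized with a few lines of Python. This is only a sketch: `summarize_movement` is a hypothetical helper, not part of the iRODS test suite, and the string below is an abbreviated copy of the entry above containing only the fields it reads.

```python
import json

# Abbreviated copy of the queued rule shown above (only the fields used here).
queued_rule = (
    '{"destination-resource":"ufs1",'
    '"object-path":"/tempZone/home/rods/test_put_multi_fetch_page/junk0365",'
    '"rule-engine-operation":"irods_policy_data_movement",'
    '"source-replica-number":"0",'
    '"source-resource":"ufs0"}'
)


def summarize_movement(rule_text):
    """Hypothetical helper: pull out which replica is still waiting to move."""
    rule = json.loads(rule_text)
    return {
        "object": rule["object-path"].rsplit("/", 1)[-1],
        "source": rule["source-resource"],
        "destination": rule["destination-resource"],
        "replica": int(rule["source-replica-number"]),
    }


summary = summarize_movement(queued_rule)
print(summary)
```

The object named in the remaining rule (junk0365) is the same one that shows up with a ufs0 replica in the listing that follows.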

 --- IrodsSession: icommand executed by [rods#tempZone] [ils -l test_put_multi_fetch_page] ---
Assert FAIL Command: ils -l test_put_multi_fetch_page
Expecting STDOUT_SINGLELINE: ['ufs0']
  stdout:
    | /tempZone/home/rods/test_put_multi_fetch_page:
<snip>
    |   rods              1 ufs1            1 2024-08-14.15:32 & junk0360
    |   rods              1 ufs1            1 2024-08-14.15:32 & junk0361
    |   rods              1 ufs1            1 2024-08-14.15:32 & junk0362
    |   rods              1 ufs1            1 2024-08-14.15:32 & junk0363
    |   rods              1 ufs1            1 2024-08-14.15:32 & junk0364
    |   rods              0 ufs0            1 2024-08-14.15:30 & junk0365  <== Should not see ufs0.
    |   rods              1 ufs1            1 2024-08-14.15:32 X junk0365  <== Or a stale replica.
    |   rods              1 ufs1            1 2024-08-14.15:32 & junk0366
    |   rods              1 ufs1            1 2024-08-14.15:32 & junk0367
    |   rods              1 ufs1            1 2024-08-14.15:32 & junk0368
    |   rods              1 ufs1            1 2024-08-14.15:32 & junk0369
<snip>
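
With 500+ objects in the listing, spotting the offenders by eye is tedious. Below is a rough sketch of a filter for captured `ils -l` output. `find_unmigrated` is a hypothetical helper, and the column order (owner, replica number, resource, size, timestamp, status, name) is an assumption taken from the listing above; it treats any replica on the source resource, or any replica whose status is not `&`, as a problem.

```python
from collections import defaultdict


def find_unmigrated(listing, source_resource="ufs0"):
    """Hypothetical helper: map object name -> list of suspicious replicas.

    Assumes each replica line has the columns
    owner, replica number, resource, size, timestamp, status, name.
    """
    problems = defaultdict(list)
    for raw in listing.splitlines():
        # Drop the "|" prefix added by the test framework, then tokenize.
        fields = raw.strip().lstrip("|").split()
        if len(fields) < 7 or not fields[1].isdigit():
            continue  # collection header, "<snip>", or other non-replica line
        _owner, replica, resource, _size, _mtime, status, name = fields[:7]
        if resource == source_resource or status != "&":
            problems[name].append((int(replica), resource, status))
    return dict(problems)


sample = """\
    | /tempZone/home/rods/test_put_multi_fetch_page:
    |   rods              1 ufs1            1 2024-08-14.15:32 & junk0364
    |   rods              0 ufs0            1 2024-08-14.15:30 & junk0365
    |   rods              1 ufs1            1 2024-08-14.15:32 X junk0365
    |   rods              1 ufs1            1 2024-08-14.15:32 & junk0366
"""

report = find_unmigrated(sample)
print(report)
```

Running something like this against the full listing after a failed run would give the exact set of objects to cross-reference against the server logs and any remaining delay-queue entries.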
@alanking alanking added this to the 4.3.3.1 milestone Dec 12, 2024
@alanking alanking self-assigned this Dec 12, 2024
@alanking
Contributor

While this is not the same failure as #246, it is in the same test class as that failure, so I'm going to link them as possibly related...
