tests: event_loop: increase epsilon on linux. #8246

pwhelan · 2023-12-04T17:10:48Z

Summary

The event loop has multiple time epsilons defined, one for each operating system. Each epsilon defines the longest time delta that is acceptable for a wait or timeout when running timers in the event loop. This delta can easily be exceeded when running on over-provisioned machines, when other tasks are running on the same hardware, and so on.

This pull request increases this epsilon from 20ms to 50ms for Linux. I arrived at this number by running the tests in 1024 parallel instances using GNU parallel while running sysbench on the same AMD Ryzen 9 5900X 12-Core Processor.
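To make the effect of the change concrete, here is a minimal sketch of an epsilon check. It is not the project's test code (the real test is written in C): `within_epsilon` is a hypothetical helper, the 100 ms timeout and 135 ms measurement are made-up numbers, and treating early and late wake-ups the same way is an assumption.

```shell
#!/bin/bash

# Hypothetical helper, not the project's code: succeed when the measured
# duration is within epsilon_ms of the expected duration.
within_epsilon() {
    local expected_ms=$1 elapsed_ms=$2 epsilon_ms=$3
    local delta=$(( elapsed_ms - expected_ms ))
    # Take the absolute value so early and late wake-ups are treated alike
    # (an assumption made for this sketch).
    if (( delta < 0 )); then
        delta=$(( -delta ))
    fi
    (( delta <= epsilon_ms ))
}

# A made-up 100 ms timeout that was observed after 135 ms, i.e. 35 ms late.
within_epsilon 100 135 20 || echo "35 ms late: rejected with a 20 ms epsilon"
within_epsilon 100 135 50 && echo "35 ms late: accepted with a 50 ms epsilon"
```

A delay of this size is plausible on a loaded machine, which is why the smaller epsilon can turn scheduling noise into a test failure.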

The script I used to run the parallel instances of the test:

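#!/bin/bash

# Number of parallel test instances.
NUM=1024
parallel --joblog job.log -j "$NUM" ./test.sh {1} ::: $(seq "$NUM")
# Column 7 of the job log is Exitval: skip the header and fail on any non-zero value.
if awk 'NR > 1 && $7 != 0 {print $7}' job.log | grep .; then
	exit 1
fi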

#!/bin/bash

./bin/flb-it-flb_event_loop test_non_blocking_and_blocking_timeout

I used two scripts simply to mask the fact that GNU parallel needs to pass different arguments to each instance.

I ran tests.sh in a while loop while running sysbench in another terminal:

sysbench cpu run --cpu-max-prime=100000000 --threads=24
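The repeat loop is not shown in this description. A minimal sketch of one way to write it follows, assuming a bounded number of iterations instead of an open-ended `while` loop; `run_until_failure` is a hypothetical helper, and `true` stands in for the real test script so the sketch runs anywhere.

```shell
#!/bin/bash

# Hypothetical helper: rerun a command up to max_runs times and stop at
# the first failing run.
run_until_failure() {
    local max_runs=$1
    shift
    local i
    for (( i = 1; i <= max_runs; i++ )); do
        if ! "$@"; then
            echo "failed on run $i" >&2
            return 1
        fi
    done
    echo "all $max_runs runs passed"
}

# Stand-in command; for the real stress test this would be the parallel
# runner script, for example: run_until_failure 100 ./tests.sh
run_until_failure 3 true
```

Stopping at the first failure keeps job.log from the failing iteration available for inspection.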

I was able to run the test script successfully multiple times for the entire duration of the sysbench test without any failures. My machine might be overpowered compared to the actual runners at GitHub, which might justify raising the epsilon even more.

Fluent Bit is licensed under Apache 2.0, by submitting this pull request I understand that this code will be released under the terms of that license.

edsiper · 2023-12-04T18:29:39Z

@pwhelan thanks for submitting this PR. some comments:

there is a commit subject called fleet, that one needs to be adjusted.

The whole PR description is about fixing the test framework, but there are commits for the in_calyptia_fleet plugin; please submit those in a different PR.

pwhelan · 2023-12-04T18:34:26Z

> @pwhelan thanks for submitting this PR. some comments:
>
> there is a commit subject called fleet, that one needs to be adjusted.
>
> The whole PR description is about fixing the test framework but there are commits for in_calyptia_fleet plugin, please submit those in a different PR

Sorry, that must have slipped in. I'll rebase the change on top of master, correctly.

@edsiper I accidentally based the branch off of #8102 at first, since this change was meant as a fix for a test that is failing there.

pwhelan · 2023-12-07T14:46:16Z

There is another failure on macOS for flb-rt-dummy that I plan to tackle in another PR: https://github.com/fluent/fluent-bit/actions/runs/7092496098/job/19414115930?pr=8246#step:4:4669

[2023/12/07 14:35:32] [ info] [input] pausing dummy.0
[ FAILED ]
  in_dummy.c:538: Check records->num_records >= 20... failed
FAILED: 1 of 1 unit tests has failed.

…ghbours. Increase the epsilon for timed tests on linux to account for noisy neighbours and other factors when running tests. Without this increase the event_loop tests are prone to random failures, especially when the same machine is being used for other tasks.

Signed-off-by: Phillip Whelan <[email protected]>

edsiper · 2023-12-20T20:29:04Z

note: prefix must be tests: internal: event_loop: ...

pwhelan requested review from edsiper, leonardo-albertovich, fujimotos and koleini as code owners December 4, 2023 17:10

github-actions bot added the docs-required label Dec 4, 2023

pwhelan force-pushed the pwhelan-event-loop-fix-linux-epsilon branch from f9f0cdb to 2427770 Compare December 4, 2023 21:01

pwhelan force-pushed the pwhelan-event-loop-fix-linux-epsilon branch from 2427770 to 4597b42 Compare December 12, 2023 18:22

edsiper merged commit bcd3281 into master Dec 20, 2023
44 checks passed

edsiper deleted the pwhelan-event-loop-fix-linux-epsilon branch December 20, 2023 20:28

BrewTestBot mentioned this pull request Dec 22, 2023

fluent-bit 2.2.1 Homebrew/homebrew-core#158060

Merged
