Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Cases] Improve unit test flakiness #212489

Open
wants to merge 23 commits into
base: main
Choose a base branch
from

Conversation

cnasikas
Copy link
Member

@cnasikas cnasikas commented Feb 26, 2025

Summary

This PR attempts (I have lost count of how many there have been so far) to stabilize the flakiness of cases tests.

Approach

Based on our investigations, I believe a common factor that causes all tests to time out is how we wrap the tests with the needed providers. Instead of figuring out why they time out (which is very difficult), I tried a different approach. I rewrote how we initialize the testing dependencies, mocks, and providers on tests. To test my theory, I created a VM instance in GCloud with the same configuration as the one running in the CI, specifically the n2-standard-4 (4 vCPUs, 16 GB Memory) machine type. I ran the tests 100 times, which took almost two days. In 10 of the runs, a random test was timeouted. I noticed that the machine was stressed while running the tests, and the CPU frequently spiked to 100%, especially at the beginning of each test. Then, I increased the timeout for all cases tests to 10 seconds and ran the tests again 100 times. No timeouts occurred. Lastly, I created a VM instance in GCloud with the same configuration as the one running in the CI, specifically the n2-standard-8 (8 vCPUs, 32 GB Memory) machine type. I ran the tests again 100 times. In 1 of the runs, a random test was timeouted. The machine on the CI cannot handle the cases tests. I believe the work in this PR is a step in the right direction either way, and we will benefit from it. I also believe increasing the timeout is a good decision as we need it based on the experiments and the research we have done in the last months.

CPU usage on n2-standard-4

perf.mp4

CPU usage on n2-standard-8

perf-2.mp4

RLT eslint rules

I enabled the RLT eslint rules for Cases and resolved any eslint errors. The process revealed small bugs in the tests, which I fixed them.

Testing utils

I removed the appMockRender and any usage in favor of the new renderWithTestingProviders utility function and the improved TestProviders component. The renderWithTestingProviders follows the principals suggested by the RLT team. Specifically:

  • The renderWithTestingProviders is a wrapper of the render function of the RTL library.
  • The renderWithTestingProviders does not create the services or any component inside it.
  • The renderWithTestingProviders cannot be used in beforeEach functions. It should be called separately on each test.
  • The renderWithTestingProviders accepts props to override the default mocks.
  • The renderWithTestingProviders passes the TestProviders in the wrapper argument of the RLT render function.
  • The TestProviders component initializes and memoizes all services and dependencies. It accepts props to override the default mocks.
  • Mock overrides (like core services) should be created and passed to renderWithTestingProviders on each test, even if it means duplication. We favor test isolation.

Checklist

Check the PR satisfies the following conditions.

Reviewers should verify this PR satisfies this list as well.

Issues

List

Fixes: #207712
Fixes: #192739
Fixes: #174682
Fixes: #206366
Fixes: #207427
Fixes: #175239
Fixes: #177334
Fixes: #208443
Fixes: #187526
Fixes: #208310
Fixes: #192640
Fixes: #207077
Fixes: #197304
Fixes: #207249
Fixes: #202761
Fixes: #202115
Fixes: #193026
Fixes: #177304
Fixes: #208415
Fixes: #174661
Fixes: #201611
Fixes: #182364
Fixes: #175841
Fixes: #207907
Fixes: #171177
Fixes: #196628
Fixes: #194703
Fixes: #207241
Fixes: #206056
Fixes: #207328
Fixes: #205953
Fixes: #176524
Fixes: #176335
Fixes: #207404
Fixes: #207384
Fixes: #208380
Fixes: #207248
Fixes: #207444
Fixes: #175240
Fixes: #178001

@cnasikas
Copy link
Member Author

/ci

@cnasikas
Copy link
Member Author

cnasikas commented Mar 1, 2025

/ci

kibanamachine and others added 3 commits March 1, 2025 12:40
…t --include-path /api/status --include-path /api/alerting/rule/ --include-path /api/alerting/rules --include-path /api/actions --include-path /api/security/role --include-path /api/spaces --include-path /api/fleet --include-path /api/dashboards --update'
@cnasikas
Copy link
Member Author

cnasikas commented Mar 1, 2025

/ci

@cnasikas
Copy link
Member Author

cnasikas commented Mar 2, 2025

/ci

@cnasikas
Copy link
Member Author

cnasikas commented Mar 3, 2025

/ci

@@ -270,24 +270,6 @@ describe('schema', () => {
]
`);
});

it.skip('fails when page number is negative', () => {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Never implemented.

@cnasikas
Copy link
Member Author

cnasikas commented Mar 6, 2025

/ci

@cnasikas cnasikas added Team:ResponseOps Label for the ResponseOps team (formerly the Cases and Alerting teams) Feature:Cases Cases feature backport:version Backport to applied version labels labels Mar 6, 2025
@cnasikas cnasikas changed the title [Cases] Unskip all cases flaky tests [Cases] Improve unit test flakiness Mar 8, 2025
@cnasikas cnasikas marked this pull request as ready for review March 8, 2025 12:05
@cnasikas cnasikas requested review from a team as code owners March 8, 2025 12:05
@elasticmachine
Copy link
Contributor

Pinging @elastic/response-ops (Team:ResponseOps)

@elasticmachine
Copy link
Contributor

Pinging @elastic/response-ops-cases (Feature:Cases)

@cnasikas cnasikas added the release_note:skip Skip the PR/issue when compiling release notes label Mar 8, 2025
Copy link
Member

@jbudz jbudz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

.buildkite/ftr_platform_stateful_configs.yml / .eslintrc.js LGTM

@elasticmachine
Copy link
Contributor

💚 Build Succeeded

Metrics [docs]

Async chunks

Total size of all lazy-loaded chunks that will be downloaded as the user navigates the app

id before after diff
cases 1.3MB 1.3MB -38.0B
Unknown metric groups

ESLint disabled line counts

id before after diff
cases 65 68 +3

Total ESLint disabled count

id before after diff
cases 83 86 +3

History

cc @cnasikas

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport:version Backport to applied version labels Feature:Cases Cases feature release_note:skip Skip the PR/issue when compiling release notes Team:ResponseOps Label for the ResponseOps team (formerly the Cases and Alerting teams) v8.19.0 v9.1.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Failing test: Jest Tests.x-pack/plugins/cases/public/components/all_cases - SolutionFilter when the owner is a single solution should call onChange with selected solution id when no option selected yet Failing test: Jest Tests.x-pack/plugins/cases/public/components/all_cases - SolutionFilter when the owner is a single solution renders options correctly Failing test: Jest Tests.x-pack/plugins/cases/public/components/all_cases - ColumnsPopover renders correctly a list of selected columns Failing test: Jest Tests.x-pack/plugins/cases/public/components/files - FileAttachmentEvent renders clickable name Failing test: Jest Tests.x-pack/plugins/cases/public/components/markdown_editor - EditableMarkdown Save button click calls onSaveContent and onChangeEditable when text area value changed
4 participants