From 46d1a8a353129fc3443ed095a1ca7030221be5dd Mon Sep 17 00:00:00 2001
From: Patrick Ohly
Date: Wed, 27 Nov 2024 09:43:31 +0100
Subject: [PATCH] testing strategy: add policy for non-blocking jobs by path

This was motivated in part by
https://github.com/kubernetes/test-infra/pull/33463#issuecomment-2348289131
and is part of an effort to document best practices.
---
 .../devel/sig-testing/testing-strategy.md | 38 +++++++++++++++++++
 1 file changed, 38 insertions(+)

diff --git a/contributors/devel/sig-testing/testing-strategy.md b/contributors/devel/sig-testing/testing-strategy.md
index 82f50f51f4f..32585276e2e 100644
--- a/contributors/devel/sig-testing/testing-strategy.md
+++ b/contributors/devel/sig-testing/testing-strategy.md
@@ -26,6 +26,44 @@ The Kubernetes job uses [prow](https://prow.k8s.io) to implement the CI system.
 - **Postsubmit:** Runs after code is merged. Useful for building artifacts.
 - **Periodic:** Runs at scheduled intervals. Ideal for monitoring trends and catching regressions.
 
+#### Non-blocking jobs triggered by path
+
+Usually, blocking presubmit jobs run by default and non-blocking jobs don't: they have
+to be started explicitly with the `/test` command. It is possible to configure such
+jobs so that they [run automatically when certain paths are modified](https://github.com/kubernetes/test-infra/blob/ee70308f09c10f7cd933c26c98acc7ebf785d436/config/jobs/kubernetes/sig-node/sig-node-presubmit.yaml#L3201-L3202).
+
+Path-triggered non-blocking jobs cannot detect all regressions. A flaky test might
+pass when it runs only once during presubmit. When defining the path trigger, it is
+impossible to list every change that might require running the tests
+(e.g. tool changes or updates in packages that a feature depends on). Therefore,
+a periodic job which runs the same tests regularly is also required.
+
+The advantage of also having a non-blocking job that gets triggered automatically is
+that reviewers don't need to remember to run it and that problems get
+discovered sooner. Without it, maintainers have to diagnose regressions
+found by the periodic job and then ping the contributor who caused them.
+If that contributor is unresponsive, maintainers may have to fix the problem
+themselves.
+
+With an automatically triggered job, the burden is instead on the contributor
+whose pull request fails the tests. If they are unresponsive, their change
+doesn't get merged and no regression is introduced.
+
+> [!CAUTION]
+> A failing non-blocking job confuses other contributors who are
+> not familiar with the job or its failures. If it runs too often, it
+> wastes CI resources.
+
+To avoid those negative consequences for the project, the guidelines for
+setting up such a job are:
+
+* The job owners must be responsive and react to problems with the job.
+* The job must have a low failure rate to avoid confusion in drive-by pull requests.
+* The importance of the feature must justify the extra CI resources (which
+  depends on how often the job gets triggered).
+* The `run_if_changed` regular expression must be narrow enough that
+  the job doesn't run for unrelated changes.
+
 #### SIG Release Blocking and Informing jobs
 
 SIG Release maintains two sets of jobs that decide whether the release is
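The `run_if_changed` guideline is easiest to check by trying a candidate pattern against some file paths before proposing a job. Below is a minimal Python sketch. It assumes that prow triggers the job when the regular expression matches anywhere in at least one changed file path (an unanchored search, as with Go's `regexp.MatchString`). Python's `re` only approximates Go's RE2 syntax, so the final pattern should still be verified in the real job configuration. The helper names `triggering_paths` and `would_trigger`, the `pkg/dra/` directory and the file paths are all hypothetical.

```python
import re
from typing import Iterable, List


def triggering_paths(run_if_changed: str, changed_paths: Iterable[str]) -> List[str]:
    """Return the changed paths that the pattern matches.

    Assumption: like prow, the pattern is searched anywhere in each path
    (unanchored); Python's re is used here as an approximation of Go's RE2.
    """
    pattern = re.compile(run_if_changed)
    return [path for path in changed_paths if pattern.search(path)]


def would_trigger(run_if_changed: str, changed_paths: Iterable[str]) -> bool:
    """A path-triggered job runs if at least one changed path matches."""
    return bool(triggering_paths(run_if_changed, changed_paths))


if __name__ == "__main__":
    # Hypothetical files touched by a pull request.
    changes = ["pkg/dra/manager.go", "docs/drafts/notes.md", "README.md"]

    narrow = r"^pkg/dra/"  # anchored to the (hypothetical) feature directory
    broad = r"dra"         # unanchored substring: too broad

    print(triggering_paths(narrow, changes))  # ['pkg/dra/manager.go']
    print(triggering_paths(broad, changes))   # ['pkg/dra/manager.go', 'docs/drafts/notes.md']
    print(would_trigger(narrow, ["README.md"]))  # False
```

The unanchored `dra` pattern also picks up `docs/drafts/notes.md`, which is exactly the kind of unrelated change the guideline warns about; anchoring the pattern to a directory prefix keeps the trigger narrow.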