
Test karg drift #3115

Closed
wants to merge 5 commits

Conversation

cgwalters
Member

Add an e2e for #3105

@openshift-ci
Contributor

openshift-ci bot commented Apr 25, 2022

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: cgwalters

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Apr 25, 2022
I'm debugging https://bugzilla.redhat.com/show_bug.cgi?id=2075126
and while I haven't verified this is the case, as far as I can tell
from looking through the code and thinking about things, if we somehow
fail to apply the expected kernel arguments (which can occur if
`ostree-finalize-staged` fails), then on the next boot we will
drop into `validateOnDiskState()`, which has for a long time
checked that all the expected *files* exist and marked the update as
complete.  But we never checked the kernel arguments.

That can then cause later problems because in trying to apply further
updates we'll ask rpm-ostree to try to remove kernel arguments that
aren't actually present.
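
To make the gap concrete, here's a minimal Go sketch of the kind of check that's missing. This is not the actual MCO code; the function name and the `expectedKargs` parameter are illustrative, and in the real operator the expected arguments would come from the rendered MachineConfig:

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// validateKernelArguments compares the kernel arguments the node actually
// booted with (from /proc/cmdline) against the set we expected to apply.
// expectedKargs is hypothetical here; in the MCO it would come from the
// rendered MachineConfig's kernelArguments.
func validateKernelArguments(expectedKargs []string) error {
	rawCmdline, err := os.ReadFile("/proc/cmdline")
	if err != nil {
		return fmt.Errorf("reading /proc/cmdline: %w", err)
	}
	booted := make(map[string]bool)
	for _, arg := range strings.Fields(string(rawCmdline)) {
		booted[arg] = true
	}
	for _, expected := range expectedKargs {
		if !booted[expected] {
			// Returning an error here would degrade the node,
			// mirroring what already happens on a file conflict,
			// instead of silently marking the update complete.
			return fmt.Errorf("missing expected kernel argument: %q", expected)
		}
	}
	return nil
}
```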

Worse, these kernel arguments are often actually *quite important*
and may even have security-relevant properties (e.g. `nosmt`).

Now...I am actually increasingly convinced that we *really* need
to move opinionated kernel argument handling into ostree (and rpm-ostree).

There's ye olde ostreedev/ostree#2217
and the solution may look something like that.  Particularly now,
with the layering philosophy, it makes sense to support
e.g. customizations dropping content in `/usr/lib` and such.

For now though, detecting that we didn't get the expected kargs
should make the node go Degraded, the same as if there were a file conflict.
And *that* in turn should make it easier to debug failures.

As of right now, it will appear that updates are complete, and then
we'll only find out much later that the kargs are actually missing.
And in turn, because kubelet spams the journal, any error messages
from e.g. `ostree-finalize-staged.service` may be lost.

Putting this in a separate commit/PR because I think we also need
to bump our e2e times to ship this.
@openshift-ci
Contributor

openshift-ci bot commented Apr 27, 2022

@cgwalters: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name | Commit | Details | Required | Rerun command
--- | --- | --- | --- | ---
ci/prow/e2e-agnostic-upgrade | faa0cc8 | link | true | /test e2e-agnostic-upgrade
ci/prow/e2e-gcp-op | faa0cc8 | link | true | /test e2e-gcp-op

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@cgwalters
Member Author

OK so I ran into some tech debt working on this around the infra pool. The mess here is that today the KernelArguments test calls

helpers.CreateMCP(t, cs, "infra")

And...it's the only one to do so, even though the other tests rely on a created infra pool! So there's some implicit "state sharing" between these tests.

The next thing I noticed is that that function actually returns a "cleanup function" that should be called...but isn't, and that's exactly how the other tests are able to use the pool!

I started looking at cleaning this up...I think what we actually want is something like:

helpers.RunWithInfraPool(func() {
  ... test code here ...
})

But then I got distracted looking at layering branch stuff. Not too sure about reworking the tests right now.
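
For reference, a minimal sketch of what that wrapper could look like, assuming `CreateMCP` really does return the cleanup function as described above; the import path and the exact signature (including `framework.ClientSet`) are assumptions based on how the existing tests pass `cs`:

```go
package helpers

import (
	"testing"

	"github.com/openshift/machine-config-operator/test/framework"
)

// RunWithInfraPool is a sketch of the proposed helper: it creates the
// "infra" MachineConfigPool, runs the test body, and then always calls
// the cleanup function that CreateMCP returns, so no test depends on a
// pool implicitly leaked by an earlier one.
func RunWithInfraPool(t *testing.T, cs *framework.ClientSet, body func()) {
	cleanup := CreateMCP(t, cs, "infra")
	defer cleanup()
	body()
}
```

The deferred cleanup runs even if the body fails or panics, which would remove the implicit state sharing between the tests.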

@openshift-bot
Contributor

Issues go stale after 90d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle stale

@openshift-ci openshift-ci bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jul 28, 2022
@openshift-bot
Contributor

Stale issues rot after 30d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle rotten
/remove-lifecycle stale

@openshift-ci openshift-ci bot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Aug 27, 2022
@sinnykumari
Contributor

Is this still being worked on, or should we close it?

@openshift-bot
Contributor

Rotten issues close after 30d of inactivity.

Reopen the issue by commenting /reopen.
Mark the issue as fresh by commenting /remove-lifecycle rotten.
Exclude this issue from closing again by commenting /lifecycle frozen.

/close

@openshift-ci openshift-ci bot closed this Sep 29, 2022
@openshift-ci
Contributor

openshift-ci bot commented Sep 29, 2022

@openshift-bot: Closed this PR.

In response to this:

Rotten issues close after 30d of inactivity.

Reopen the issue by commenting /reopen.
Mark the issue as fresh by commenting /remove-lifecycle rotten.
Exclude this issue from closing again by commenting /lifecycle frozen.

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
