
Test karg drift #3115

Closed
wants to merge 5 commits

Conversation

cgwalters
Member

Add an e2e for #3105

@openshift-ci
Contributor

openshift-ci bot commented Apr 25, 2022

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: cgwalters

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Apr 25, 2022
I'm debugging https://bugzilla.redhat.com/show_bug.cgi?id=2075126
and while I haven't verified this is the case, as far as I can tell
from looking through the code and thinking about things, if we somehow
fail to apply the expected kernel arguments (which can occur if
`ostree-finalize-staged` fails), then on the next boot we will
drop into `validateOnDiskState()`, which has for a long time
checked that all the expected *files* exist and marked the update as
complete.  But we never checked the kernel arguments.

That can then cause later problems because in trying to apply further
updates we'll ask rpm-ostree to try to remove kernel arguments that
aren't actually present.
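
To make the gap concrete, here's a minimal Go sketch of the kind of check that's missing. This is not the actual MCO code; the function name and the `expectedKargs` parameter are illustrative, and in the real operator the expected arguments would come from the rendered MachineConfig:

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// validateKernelArguments compares the kernel arguments the node actually
// booted with (from /proc/cmdline) against the set we expected to apply.
// expectedKargs is hypothetical here; in the MCO it would come from the
// rendered MachineConfig's kernelArguments.
func validateKernelArguments(expectedKargs []string) error {
	rawCmdline, err := os.ReadFile("/proc/cmdline")
	if err != nil {
		return fmt.Errorf("reading /proc/cmdline: %w", err)
	}
	booted := make(map[string]bool)
	for _, arg := range strings.Fields(string(rawCmdline)) {
		booted[arg] = true
	}
	for _, expected := range expectedKargs {
		if !booted[expected] {
			// Returning an error here would degrade the node,
			// mirroring what already happens on a file conflict,
			// instead of silently marking the update complete.
			return fmt.Errorf("missing expected kernel argument: %q", expected)
		}
	}
	return nil
}
```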

Worse, these kernel arguments are often actually *quite important*
and may even have security-relevant properties (e.g. `nosmt`).

Now...I am actually increasingly convinced that we *really* need
to move opinionated kernel argument handling into ostree (and rpm-ostree).

There's ye olde ostreedev/ostree#2217
and the solution may look something like that.  Particularly now,
with the layering philosophy, it makes sense to support
e.g. customizations dropping content in `/usr/lib` and such.

For now though, detecting that we didn't get the expected kargs
should make the node go Degraded, the same as if there were a file conflict.
And *that* in turn should make it easier to debug failures.

As of right now, it will appear that updates are complete, and then
we'll only find out much later that the kargs are actually missing.
And in turn, because kubelet spams the journal, any error messages
from e.g. `ostree-finalize-staged.service` may be lost.

Putting this in a separate commit/PR because I think we also need
to bump our e2e times to ship this.
@openshift-ci
Contributor

openshift-ci bot commented Apr 27, 2022

@cgwalters: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name | Commit | Details | Required | Rerun command
--- | --- | --- | --- | ---
ci/prow/e2e-agnostic-upgrade | faa0cc8 | link | true | /test e2e-agnostic-upgrade
ci/prow/e2e-gcp-op | faa0cc8 | link | true | /test e2e-gcp-op

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@cgwalters
Member Author

OK so I ran into some tech debt working on this around the infra pool. The mess here is that today the KernelArguments test calls

helpers.CreateMCP(t, cs, "infra")

And...it's the only one to do so, even though the other tests rely on a created infra pool! So there's some implicit "state sharing" between these tests.

The next thing I noticed is that that function actually returns a "cleanup function" that should be called...but isn't, and that's exactly how the other tests are able to use the pool!

I started looking at cleaning this up...I think what we actually want is something like:

helpers.RunWithInfraPool(func() {
  ... test code here ...
})

But then I got distracted looking at layering branch stuff. Not too sure about reworking the tests right now.
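
For reference, a minimal sketch of what that wrapper could look like, assuming `CreateMCP` really does return the cleanup function as described above; the import path and the exact signature (including `framework.ClientSet`) are assumptions based on how the existing tests pass `cs`:

```go
package helpers

import (
	"testing"

	"github.com/openshift/machine-config-operator/test/framework"
)

// RunWithInfraPool is a sketch of the proposed helper: it creates the
// "infra" MachineConfigPool, runs the test body, and then always calls
// the cleanup function that CreateMCP returns, so no test depends on a
// pool implicitly leaked by an earlier one.
func RunWithInfraPool(t *testing.T, cs *framework.ClientSet, body func()) {
	cleanup := CreateMCP(t, cs, "infra")
	defer cleanup()
	body()
}
```

The deferred cleanup runs even if the body fails or panics, which would remove the implicit state sharing between the tests.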

@openshift-bot
Contributor

Issues go stale after 90d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle stale

@openshift-ci openshift-ci bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jul 28, 2022
@openshift-bot
Contributor

Stale issues rot after 30d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle rotten
/remove-lifecycle stale

@openshift-ci openshift-ci bot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Aug 27, 2022
@sinnykumari
Contributor

Is this still being worked on, or should we close it?

@openshift-bot
Contributor

Rotten issues close after 30d of inactivity.

Reopen the issue by commenting /reopen.
Mark the issue as fresh by commenting /remove-lifecycle rotten.
Exclude this issue from closing again by commenting /lifecycle frozen.

/close

@openshift-ci openshift-ci bot closed this Sep 29, 2022
@openshift-ci
Contributor

openshift-ci bot commented Sep 29, 2022

@openshift-bot: Closed this PR.

In response to this:

Rotten issues close after 30d of inactivity.

Reopen the issue by commenting /reopen.
Mark the issue as fresh by commenting /remove-lifecycle rotten.
Exclude this issue from closing again by commenting /lifecycle frozen.

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
