Add Tolerations to Build and BuildRun objects #1711

Status: Open (wants to merge 4 commits into base: main)


dorzel (Contributor) commented Oct 30, 2024

Changes

Fixes #1636

Submitter Checklist

  • Includes tests if functionality changed/was added
  • Includes docs if changes are user-facing
  • Set a kind label on this PR
  • Release notes block has been filled in, or marked NONE

See the contributor guide for details on coding conventions, GitHub and Prow interactions, and the code review process.

Release Notes

Add Tolerations to Build and BuildRun objects

@openshift-ci openshift-ci bot added the release-note label Oct 30, 2024
@pull-request-size pull-request-size bot added the size/L label Oct 30, 2024
@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress label Oct 30, 2024
@dorzel dorzel force-pushed the MULTIARCH-5036 branch 8 times, most recently from 872db31 to 462d9bb November 6, 2024 20:22
@dorzel dorzel force-pushed the MULTIARCH-5036 branch 3 times, most recently from 4ecfc21 to dfe25d5 November 13, 2024 18:59
@dorzel dorzel force-pushed the MULTIARCH-5036 branch 3 times, most recently from e449fbd to 03b3b21 November 19, 2024 18:13
@pull-request-size pull-request-size bot added the size/XL label and removed the size/L label Nov 19, 2024
@dorzel dorzel force-pushed the MULTIARCH-5036 branch 8 times, most recently from 3e66b55 to 43382a6 November 20, 2024 22:06
@dorzel dorzel marked this pull request as ready for review November 21, 2024 17:28
@dorzel dorzel changed the title from "WIP Add Tolerations to Build and BuildRun objects" to "Add Tolerations to Build and BuildRun objects" Nov 21, 2024
@openshift-ci openshift-ci bot removed the do-not-merge/work-in-progress label Nov 21, 2024
dorzel (Contributor Author) commented Nov 21, 2024

/kind feature

@openshift-ci openshift-ci bot added the kind/feature label Nov 21, 2024
dorzel (Contributor Author) commented Nov 21, 2024

Ok, this is ready for review/discussion.

I was unsure what the implementation should look like for the strategic merge JSON patch mentioned in https://github.com/shipwright-io/community/blob/main/ships/0039-build-scheduler-opts.md#tolerations

This currently works and patches the BuildRun/Build object with a JSON merge patch (--type=merge):

kubectl patch BuildRun <buildrun-name> --type=merge -p '{"spec":{"tolerations":[{"key":"test-key-patch","operator":"Equal","value":"test-value-patch"}]}}' -n test-ns

But I didn't know whether that is all that's required, or what it should ideally look like.
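The kubectl patch command above sends a JSON merge patch (RFC 7386, selected by --type=merge). Under those rules a list such as tolerations is replaced as a whole rather than merged entry by entry, so any tolerations already on the object are dropped. The minimal Go sketch below reproduces those semantics using only the standard library; the helpers mergePatch and applyJSON and the sample spec are hypothetical and are not Shipwright or kubectl code.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// mergePatch applies patch to target following JSON Merge Patch (RFC 7386):
// objects are merged key by key, a null value deletes a key, and any
// non-object value (including an array such as tolerations) replaces the
// target value wholesale.
func mergePatch(target, patch interface{}) interface{} {
	patchObj, ok := patch.(map[string]interface{})
	if !ok {
		// Arrays, strings, numbers and null replace the target outright.
		return patch
	}
	targetObj, ok := target.(map[string]interface{})
	if !ok {
		// A non-object target is treated as an empty object.
		targetObj = map[string]interface{}{}
	}
	for key, value := range patchObj {
		if value == nil {
			delete(targetObj, key)
			continue
		}
		targetObj[key] = mergePatch(targetObj[key], value)
	}
	return targetObj
}

// applyJSON is a hypothetical convenience wrapper working on JSON strings.
func applyJSON(targetJSON, patchJSON string) (string, error) {
	var target, patch interface{}
	if err := json.Unmarshal([]byte(targetJSON), &target); err != nil {
		return "", err
	}
	if err := json.Unmarshal([]byte(patchJSON), &patch); err != nil {
		return "", err
	}
	out, err := json.Marshal(mergePatch(target, patch))
	if err != nil {
		return "", err
	}
	return string(out), nil
}

func main() {
	// Hypothetical existing spec that already carries one toleration.
	spec := `{"spec":{"tolerations":[{"key":"existing-key","operator":"Exists"}]}}`
	// Same patch body as the kubectl command above.
	patch := `{"spec":{"tolerations":[{"key":"test-key-patch","operator":"Equal","value":"test-value-patch"}]}}`
	result, err := applyJSON(spec, patch)
	if err != nil {
		panic(err)
	}
	fmt.Println(result)
}
```

Running it prints only the patched toleration; existing-key is gone. Kubernetes does not support strategic merge patch for custom resources, so for Build and BuildRun objects kubectl patch offers merge and JSON (RFC 6902) patches.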

I'm also looking for input on how the supported subset of toleration functionality should ideally be implemented, or whether the current implementation is OK (failing validation if unsupported fields are specified, and always setting the effect to "NoSchedule").

I've also limited the validations on the Shipwright side to enforcing these simpler requirements, and left the rest of the toleration validation to Kubernetes and its full ruleset.
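The validation approach described above (reject unsupported fields, always use NoSchedule, leave the remaining checks to Kubernetes) could look roughly like the Go sketch below. The Toleration struct is a hypothetical, trimmed-down stand-in for the core/v1 type, and treating a non-NoSchedule effect and tolerationSeconds as the unsupported fields is an assumption for illustration, not the PR's actual code.

```go
package main

import "fmt"

// Toleration is a hypothetical, trimmed-down mirror of the Kubernetes
// core/v1 Toleration fields; it is NOT the Shipwright API type.
type Toleration struct {
	Key               string
	Operator          string // "Equal" or "Exists"; checked by Kubernetes, not here
	Value             string
	Effect            string // assumed: only empty or "NoSchedule" accepted
	TolerationSeconds *int64 // assumed: unsupported in the Shipwright subset
}

const effectNoSchedule = "NoSchedule"

// validateAndDefault rejects fields outside the assumed supported subset and
// defaults every toleration's effect to NoSchedule. Operator, key and value
// rules are deliberately left to Kubernetes' own validation.
func validateAndDefault(tols []Toleration) error {
	for i := range tols {
		t := &tols[i]
		if t.Effect != "" && t.Effect != effectNoSchedule {
			return fmt.Errorf("toleration %d: only effect %q is supported", i, effectNoSchedule)
		}
		if t.TolerationSeconds != nil {
			return fmt.Errorf("toleration %d: tolerationSeconds is not supported", i)
		}
		// Always set the effect, so an empty value never reaches the TaskRun.
		t.Effect = effectNoSchedule
	}
	return nil
}

func main() {
	tols := []Toleration{{Key: "test-key", Operator: "Equal", Value: "test-value"}}
	if err := validateAndDefault(tols); err != nil {
		fmt.Println("invalid:", err)
		return
	}
	fmt.Println(tols[0].Effect)
}
```

Defaulting in place through the slice index (tols[i]) rather than a range copy matters here: assigning to a loop copy would silently leave the effect unset.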

pkg/reconciler/buildrun/resources/taskrun.go (outdated, resolved)
.github/workflows/ci.yml (outdated, resolved)
@dorzel dorzel force-pushed the MULTIARCH-5036 branch 7 times, most recently from 5dbdbba to 30677bc December 6, 2024 17:19
dorzel (Contributor Author) commented Dec 9, 2024

I'm a bit stumped by the failing integration tests: after several revisions I still can't find why the taint effect isn't getting set. It also looks like the e2e tests are running out of disk space.

pkg/validate/tolerations.go (outdated, resolved)
test/e2e/v1beta1/e2e_test.go (outdated, resolved)
openshift-ci bot commented Dec 16, 2024

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign heavywombat for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Comment on lines 754 to 758
validateBuildRunToFail(testBuild, buildRun)
buildRun, err = testBuild.LookupBuildRun(types.NamespacedName{Name: buildRun.Name, Namespace: testBuild.Namespace})

Expect(buildRun.Status.FailureDetails.Message).To(Equal(shpgit.AuthPrompted.ToMessage()))
Expect(buildRun.Status.FailureDetails.Reason).To(Equal(shpgit.AuthPrompted.String()))
Review comment (Member):

Can you explain this? There is a BuildRun that cannot start because its tolerations do not match any node. Shouldn't the result be that the Pod is stuck in Pending, and that the TaskRun and BuildRun eventually time out? Why would there be a failure coming from the Git source step?
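The reviewer's expectation follows from how the scheduler matches tolerations against node taints. The Go sketch below mirrors the matching rules of core/v1 Toleration.ToleratesTaint and a simplified version of the scheduler's TaintToleration filter, which rejects a node if any of its NoSchedule or NoExecute taints is not tolerated. The types and the two-node cluster are hypothetical, and a real scheduler applies many other filters as well.

```go
package main

import "fmt"

// Taint and Toleration are hypothetical, trimmed-down stand-ins for core/v1.
type Taint struct {
	Key, Value, Effect string
}

type Toleration struct {
	Key, Operator, Value, Effect string
}

// tolerates mirrors the matching rules of core/v1 Toleration.ToleratesTaint.
func tolerates(t Toleration, taint Taint) bool {
	if t.Effect != "" && t.Effect != taint.Effect {
		return false // an empty toleration effect matches every effect
	}
	if t.Key != "" && t.Key != taint.Key {
		return false // an empty key (with Exists) matches every key
	}
	switch t.Operator {
	case "", "Equal": // an empty operator means Equal
		return t.Value == taint.Value
	case "Exists":
		return true
	default:
		return false
	}
}

// nodeAdmits reports whether every NoSchedule/NoExecute taint on a node is
// tolerated; PreferNoSchedule only affects scoring, so it is skipped here.
func nodeAdmits(taints []Taint, tols []Toleration) bool {
	for _, taint := range taints {
		if taint.Effect != "NoSchedule" && taint.Effect != "NoExecute" {
			continue
		}
		tolerated := false
		for _, tol := range tols {
			if tolerates(tol, taint) {
				tolerated = true
				break
			}
		}
		if !tolerated {
			return false
		}
	}
	return true
}

func main() {
	// Hypothetical cluster where every node has a taint the BuildRun does not tolerate.
	nodes := map[string][]Taint{
		"node-a": {{Key: "dedicated", Value: "gpu", Effect: "NoSchedule"}},
		"node-b": {{Key: "dedicated", Value: "arm64", Effect: "NoSchedule"}},
	}
	tols := []Toleration{{Key: "test-key", Operator: "Equal", Value: "test-value", Effect: "NoSchedule"}}

	schedulable := false
	for name, taints := range nodes {
		if nodeAdmits(taints, tols) {
			fmt.Println("fits on", name)
			schedulable = true
		}
	}
	if !schedulable {
		fmt.Println("no node admits the Pod, so it would stay Pending")
	}
}
```

Because no container ever starts in this situation, no step (including the Git source step) gets a chance to fail, which is why Pending followed by a timeout is the expected outcome.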

Reply (Contributor Author):

Ah, this was an oversight on my part. I would expect the Pending/timeout you mentioned instead. I'll change this.

SaschaSchwarze0 (Member) commented Dec 16, 2024:

Okay, you may want to create the BuildRun with a shorter timeout. Otherwise I guess it would wait for the default of ten minutes.

dorzel (Contributor Author) commented Jan 8, 2025

An update: I'm still looking into why TaintEffect isn't getting set in the integration tests. The integration tests are at least all failing in the same way.
The e2e tests are all over the place, with unrelated tests failing and 2 of the 4 nearly passing. I'm wondering whether some of these failures are due to the new three-node configuration and the disk space issues.

@dorzel dorzel force-pushed the MULTIARCH-5036 branch 2 times, most recently from 065ba79 to 6038e88 January 14, 2025 19:31
Signed-off-by: Dylan Orzel <[email protected]>
Labels: kind/feature, release-note, size/XL
Development

Successfully merging this pull request may close these issues.

SHIP-0039: Allow tolerations on Build and BuildRun to be set