
sync: add support for moving fixed bugzilla bugs to verified #161

Merged

Conversation

AlexNPavel
Contributor

This PR adds a new step that gets run when a release gets accepted.

With the addition of the /bugzilla cc-qa command to the bugzilla hook plugin for prow, developers can request a review from the QA contact specified in Bugzilla. The QA contact can review the code and verify that a new test exists in the bugfix PR that fails if the bug was not properly fixed. In this case, the QA contact can comment /lgtm or leave an approving review on a repo that had ReviewActsAsLGTM set up for the lgtm plugin.

This allows us to automatically verify bugs in the release-controller. When a release gets accepted, the release-controller can generate a list of bugs that were fixed in the release. It can then find the GitHub PR associated with the bug and, if a QA contact has LGTM'd the PR, the bug can be updated to the VERIFIED state.

This PR does all the steps up to actually updating the bug state. For now, it simply prints whether the bug would be updated to VERIFIED. This lets us check that it is working properly before configuring it to make actual changes in Bugzilla.

/cc @smarterclayton @stevekuznetsov

@openshift-ci-robot added the do-not-merge/work-in-progress label Apr 3, 2020
Contributor

@stevekuznetsov left a comment

/assign @smarterclayton

Contributor

@stevekuznetsov left a comment

What is the expected frequency of execution and total throughput on the GitHub calls?

@AlexNPavel
Contributor Author

What is the expected frequency of execution and total throughput on the GitHub calls?

The frequency is 1 comment list request and 1 review list request per bug fixed in new releases (unless a PR has more than 100 comments or 100 reviews; then the ghClient paginates every 100 comments/reviews). Most nightlies seem to have ~1 fixed bug. Full releases may have more (4.4.0-rc4 had 19 fixed bugs from 4.4.0-rc3; 4.4.0-rc6 had 24 fixed bugs from 4.4.0-rc4). Overall, I think we'll probably make <100 requests per hour.

I'm not sure about throughput. Since the frequency of these calls will be quite low, I don't expect us to get throttled, and getting comments and reviews should be quick because of that. The main throughput issue we tend to face with GitHub is rate-limit abuse throttling for calls that are frequent and need heavy pagination (like ListCollaborators in hook), which won't be an issue here.

@@ -547,6 +547,26 @@ func (c *Controller) syncAccepted(release *Release) error {
}
}
}
// If bugzilla bugs were fixed by a PR approved by their QA contact, move bug to verified
if c.bugzillaVerifier != nil {
Contributor

This may have gotten lost in the previous PR, but this can't stay here and doesn't belong here. You need to move this to a side sync loop like audit (which is just a separate main controller loop at the top level) that gets notifications of changes to image streams but has a completely different work queue and lifecycle.

With the current code structure, if any of this fails it will never get run again, and the goroutines here actually make that worse because we could run syncAccepted fifty times in a second.

Your new loop at the top level is "ensure that every new accepted release has successfully marked all bugs fixed between those two levels as verified at least once, and then in the future it can be a no-op". The two parts that are still missing are the "no-op" part - if you've completed successfully you need some indicator that is so (probably for now we'll add that as a verification field on the actual release tag) - and the control flow that doesn't ever run two coroutines at the same time.

While audit.go is more structured around running jobs, you can start with its flow by running the "verify bugs" function in its place, and if everything completes successfully annotating the release tag with an agreed on value (for now you can test with an arbitrary name and then when I see the code we can agree on what the actual value and name will be). Then you skip any release that has that "already verified successfully", and retry any that don't have it.

Also, it looks like there's a logical gap here that you need to only be running this on image streams that are nightlies (CIs are considered insufficient to close bugs), so you'll need to add an attribute/field/struct on the Release configuration that allows only those inputs to be selected. That's probably going to be adding a new struct to "ReleasePublish" which is an optional *VerifyBugs that will hold any configuration we need. The presence of the struct will be required in your audit-like-loop to execute this code.

If you'd like to set up time on Monday to discuss this, poke me in slack and I'll clear some space on the calendar.

@AlexNPavel
Contributor Author

Updated. I am slightly unsure of when it makes sense to add to the bugzillaQueueKeys, as we only need to run when a release is accepted. I currently only added it in the if check for t.Annotations[releaseAnnotationConfig].

One more thing that I am unsure of is whether I am getting the proper current and previous tags for a release correctly. I repurposed some of the code used for the web UI for this, but there may be a better and more correct way to handle that.

@AlexNPavel marked this pull request as ready for review April 7, 2020 22:20
@openshift-ci-robot removed the do-not-merge/work-in-progress label Apr 7, 2020
}
var errs []error
// get latest accepted tag
tag := acceptedTags[0]
Contributor

So in a disaster recovery scenario you're going to verify the bugs against a newer release than the one they really went into. Are you sure you don't want to scan the list to find the last one you verified (if any) and verify one release at a time (i.e. find oldest N, then diff N and N+1, and then next time around scan the next, etc.)? And if you find no N, start at the oldest and walk forward? Knowing exactly what release a fix went into is probably valuable.

It would be good to restructure the "find both versions" into a function that takes a release stream as input and returns the two and then add a unit test for those cases.

Contributor Author

I'm not sure why we need a function to "find both versions". Shouldn't it be fine to just loop through acceptedTags until we find the last tag that we verified?

Contributor

@smarterclayton left a comment

Structure looks OK; I want to see a better algorithm for walking the accepted tags so that we pin the correct release that the code changed in.

@AlexNPavel force-pushed the bugzilla-qe-verify branch 2 times, most recently from a5521f4 to f95ec70 April 17, 2020 00:29
@smarterclayton
Contributor

Accepted tags walking looks good, a few more things and this should be ready to try out.

@smarterclayton
Contributor

Ok, this is ready for a spin. Please squash and I'll tag. How will you be testing the rollout?

This commit simply prints what would be changed in the logs; support
for actual changes to bugzilla states will be added once this is
deemed to work correctly in cluster.
@AlexNPavel force-pushed the bugzilla-qe-verify branch from 77aac5a to 496fb91 May 14, 2020 19:02
@smarterclayton
Contributor

/lgtm

@openshift-ci-robot added the lgtm label May 15, 2020
@openshift-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: AlexNPavel, smarterclayton

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci-robot added the approved label May 15, 2020
@openshift-merge-robot merged commit bc55053 into openshift:master May 15, 2020
@AlexNPavel deleted the bugzilla-qe-verify branch May 15, 2020 17:16
hongkailiu pushed a commit to hongkailiu/release-controller that referenced this pull request Sep 28, 2023
Updating the default value appropriately