Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

mergeatron should throttle multiple jobs #39

Open
mikesherov opened this issue Dec 7, 2012 · 10 comments
Open

mergeatron should throttle multiple jobs #39

mikesherov opened this issue Dec 7, 2012 · 10 comments

Comments

@mikesherov
Copy link
Contributor

If I submit a pull, then I push some commits, then some more... if the job for this pull hasn't started, don't queue up a new one.

@Krinkle
Copy link

Krinkle commented Dec 7, 2012

Implementation details, but may be better to do queue them all and instead effectively do the throttle from the job execution.

So when the job is popped (when the job gets an executor slot) check if this pull has newer commits and abort early.

That way instead of letting old queued jobs finish in favour of queueing more, the old queued jobs get cancelled early and the latest one runs instead of the old one.

@steves
Copy link
Contributor

steves commented Dec 7, 2012

To do that we'll have to start saving the Jenkins build id in mergeatron, or at least fetching it. Jenkins doesn't provide us this in the response when we trigger a build but since we're polling Jenkins regularly we should be able to get it from there easily enough.

@Krinkle
Copy link

Krinkle commented Dec 7, 2012

@steves "To do that" , do what? To prevent new builds for a PR that already has a build, or to detect from within a build execution to cancel early if the build is for a PR that has newer commits?

They are different approaches of which I think only one needs what you describe.

@mikesherov
Copy link
Contributor Author

@Krinkle, mergeatron immediately queues the job. Also, the mergeatron/Jenkins integration works on branches, not commits. So lets say you open a pull request, and then immediately push up 3 commits separately. Mergeatron would queue 4 jobs, but when the first one starts, it's already working on the latest commit because when the branch is checked out, it's got all the commits.

All mergeatron needs to do is check it's database to see if it has a job with status "new" for the pull already, and if so, not queue up a new job.

Or at least I think that makes sense!

@Krinkle
Copy link

Krinkle commented Dec 8, 2012

@mikesherov I didn't know that. In that case it makes sense to prevent the additional jobs from being created indeed.

May I ask, why was this design chosen for mergeatron? I can understand "Skipping non-latest commits between intervals and load queue" as an option (to optimise load), but not as the default way or as the only way.

In general one would want a build for every commit (especially if there have been multiple commits in a short time) so that each commit has a commit status (the GitHub API also works per commit, not per branch or per pull request).

For pull requests it might be useful to skip non-latest commits, though even then it could be useful to simply build each commit still (the API supports that) in order to easily figure out where a regression occurred.

Now that we're only using it for PRs this is unlikely to be a problem (chances of someething breaking between 2 versions of a pull aren't that big, though not impossible), but if/when we use it for all builds (which I am hoping we will one day) then this would become a problem. We'd want a complete history, not one scattered based on load and random intervals.

Travis-CI also runs per-commit.

@mikesherov
Copy link
Contributor Author

Sure, those are good questions. However, there simply doesn't seem to be a need to attach a status to each commit. In fact, running a build for each commit, in practice, is equivalent to running per pull request, and when it isn't, I'm not sure why I'd want to run 5 builds if I push up 5 commits. The default assumption that the HEAD commit is stable is a good one, IMHO.

@mikesherov
Copy link
Contributor Author

The regression use case is nice, but that doesn't seem to jive with pull requests. That's what testswarm's job is... to show where a regression in the authoritative history is. Mergeatron is a guardian, and finding where a regression is in a single PR doesn't seem to be its thing.

@Krinkle
Copy link

Krinkle commented Dec 9, 2012

I think you're missing the point. TestSwarm doesn't fetch or run anything. It distributes what it is given, and it is given jobs by Jenkins, which currently gets some from the "Git+GitHub Trigger" plugin (for pushed commits) and from Mergeatron (for PRs).

As I said before, if we are going to switch to only using Mergeatron (so that it would trigger a Jenkins job for commits to branches in addition to PRs), then it is crucial that it does so for the actual commit, and not just queueing a generic job for the branch name and running on whatever the latest commit is when the job is popped from the queue - as that would be guaranteed to miss commits if either someone pushes multiple commits at once and/or if multiple pushes occur between the job queuing and the execution. This isn't TestSwarm's job (garbage in, garbage out). It ought to be obvious that that isn't debatable, either Mergeatron gets this or we can't use it. That wouldn't be a problem per se, Jenkins is running fine with the Git+GitHub Trigger plugin.

@mikesherov
Copy link
Contributor Author

@Krinkle, I think we should take this convo off of this issue and to the mailing list, because I think we're at cross purposes here.

@mikesherov
Copy link
Contributor Author

@steves, for now proceed as expected. At some point we may want to have "test each commit" mode.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants