Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Document errors returned by Execution/WaitExecution #225

Open
wants to merge 2 commits into
base: main
Choose a base branch
from
Open
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
18 changes: 17 additions & 1 deletion build/bazel/remote/execution/v2/remote_execution.proto
Original file line number Diff line number Diff line change
Expand Up @@ -80,7 +80,8 @@ service Execution {
//
// Errors discovered during creation of the `Operation` will be reported
// as gRPC Status errors, while errors that occurred while running the
// action will be reported in the `status` field of the `ExecuteResponse`. The
// action will be reported in the `status` field of the
// [ExecuteResponse][build.bazel.remote.execution.v2.ExecuteResponse]. The
// server MUST NOT set the `error` field of the `Operation` proto.
// The possible errors include:
//
Expand Down Expand Up @@ -120,6 +121,21 @@ service Execution {
// operation completes, and then respond with the completed operation. The
// server MAY choose to stream additional updates as execution progresses,
// such as to provide an update as to the state of the execution.
//
// Similarly to [Execute][build.bazel.remote.execution.v2.Execution.Execute],
// errors that occured while running the action will be reported in the
// `status` field of
// [ExecuteResponse][build.bazel.remote.execution.v2.ExecuteResponse].
// The possible errors include:
//
// * `NOT_FOUND`: The client requested an
// [Operation][google.longrunning.Operation] that is not known by the server
// (e.g., because
// [Execute][build.bazel.remote.execution.v2.Execution.Execute] never
// returned that value, or because the action was already executed and
// returned the result).
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Buildbarn automatically terminates execution of actions in case it has not observed any clients during the last minute that have called Execute() or WaitExecution() on it. This means that a client could also get NOT_FOUND in that case.

Maybe worth calling it out? Or is that too oddly specific?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that the case of cancelled actions should be explicitly documented here.

I'd like some clarification around separating the Operation error code from the Execution error code. If a client asks for an operation that was completed (or cancelled) in the past, the server may respond with NOT_FOUND if it has already cleared the state related to that Operation, but it is also allowed to respond with the completed Operation.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the super long delay here. PTAL

// * All errors returned by
// [Execute][build.bazel.remote.execution.v2.Execution.Execute].
rpc WaitExecution(WaitExecutionRequest) returns (stream google.longrunning.Operation) {
option (google.api.http) = { post: "/v2/{name=operations/**}:waitExecution" body: "*" };
}
Expand Down