-
Notifications
You must be signed in to change notification settings - Fork 0
Enable timeout propagation for remote actions #14
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
When building using remote execution, action timeouts sometimes need to be adjusted per action to ensure reasonable timeouts are set. This change will allow better control over how actions behave and encourage users to optimize their actions since the global default timeout does not have to be set to the absolute largest action timeout in the graph. Change-Id: I0ea63787ff7769e25dd597575cbfd7fd1050797e
Timeout may either be specified as a tag on format tags = ["tag:<int-seconds>"] or as execution_requirements on any action on dict format execution_requirements = { "timeout": "<int-seconds>" } Change-Id: Ida3bf173a291e7d949046bc4df3d1d4a02d8328d
Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA). View this failed invocation of the CLA check for more information. For the most up to date status, view the checks section at the bottom of the pull request. |
Hi @quic-sbjorkle, Could you please sign the CLA. |
@sgowroji Should be signed now if you retrigger the check. |
@tjgq Any ETA for when you can take a look at this? Thanks |
@bazel-io flag |
I'm supportive of this PR in principle, but I need to do some due diligence to verify that reinterpreting an existing |
@bazel-io fork 8.0.0 |
WalkthroughThe changes introduce enhanced support for timeout specifications in execution requirements and spawn actions. The Changes
Sequence Diagram(s)sequenceDiagram
participant User
participant Spawn
participant Spawns
participant ExecutionRequirements
User->>Spawn: Provide execution info (may include timeout tags)
Spawn->>Spawns: Call getTimeout(executionInfo, defaultTimeout)
Spawns->>ExecutionRequirements: Validate and parse timeout tags
ExecutionRequirements-->>Spawns: Return validated timeout or error
Spawns-->>User: Return timeout Duration or throw ExecException
Poem
Warning There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure. 🔧 ast-grep (0.31.1)src/test/java/com/google/devtools/build/lib/starlark/StarlarkRuleImplementationFunctionsTest.java✨ Finishing Touches
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
For triaging purposes -- is this fixing a regression introduced in an earlier 8.0.0 RC? If not, this need not be release-blocking, but we can certainly cherry-pick to 8.1.0 at least. |
@Wyverald It's a feature request, no need to hold up the release. |
@tjgq Are there any more changes needed from my side, or is it good to go now? |
Sorry, I'm about to be out of office, so I won't have time to look into this until January. I would also like @zhengwei143 to do a review of his own. |
@zhengwei143 @tjgq Friendly ping |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hello @arvi18, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!
Summary of Changes
This pull request enables timeout propagation for remote actions. The goal is to allow better control over how actions behave and encourage users to optimize their actions, since the global default timeout does not have to be set to the absolute largest action timeout in the graph. This is achieved by allowing the timeout to be specified either as a tag or as a key-value pair in the execution requirements. The code changes modify how timeouts are parsed and validated, and add new failure detail codes for duplicate or invalid timeout tags.
Highlights
- Timeout Propagation: The PR enables the propagation of timeouts for remote actions, allowing for more granular control over action timeouts.
- Timeout Specification: Timeouts can now be specified either as a tag (timeout:) or as a key-value pair in the execution requirements.
- Validation: The PR includes validation to ensure that timeout values are properly formatted and within acceptable ranges.
- Error Handling: New failure detail codes have been added to handle cases where there are duplicate or invalid timeout tags.
Changelog
Click here to see the changelog
- src/main/java/com/google/devtools/build/lib/actions/ExecutionRequirements.java
- Introduces
ParseableRequirement
for timeout to allow validation and parsing of timeout values. - Adds validation logic to ensure timeout values are positive integers in canonical format.
- Introduces
- src/main/java/com/google/devtools/build/lib/actions/Spawns.java
- Modifies
getTimeout
to retrieve timeout values from either a 'timeout' key or a 'timeout:' tag in the execution info. - Adds error handling for multiple timeout values and invalid timeout tags, including new
FailureDetail
codes. - Introduces helper methods
getTimeoutValueFromExecutionInfoKeyOrValue
andparseTimeoutValue
to handle timeout parsing and retrieval.
- Modifies
- src/main/java/com/google/devtools/build/lib/exec/StandaloneTestStrategy.java
- Updates the way the timeout is added to the execution info, using the key "timeout" instead of
ExecutionRequirements.TIMEOUT
.
- Updates the way the timeout is added to the execution info, using the key "timeout" instead of
- src/main/java/com/google/devtools/build/lib/packages/TargetUtils.java
- Adds 'timeout' to the list of legal execution info keys.
- src/main/protobuf/failure_details.proto
- Adds new failure detail codes
DUPLICATE_TIMEOUT_TAGS
andINVALID_TIMEOUT_TAG
.
- Adds new failure detail codes
- src/test/java/com/google/devtools/build/lib/packages/TargetUtilsTest.java
- Adds a test case
testExecutionInfo_withPrefixTimeout
to verify that timeout tags are correctly parsed.
- Adds a test case
- src/test/java/com/google/devtools/build/lib/starlark/StarlarkRuleImplementationFunctionsTest.java
- Removes timeout from the default execution requirements in
testCreateSpawnActionEnvAndExecInfo
. - Adds a new test case
testCreateSpawnActionEnvAndExecInfo_withTimeout
to verify that timeout values are correctly passed to theSpawnAction
.
- Removes timeout from the default execution requirements in
Using Gemini Code Assist
The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.
Invoking Gemini
You can request assistance from Gemini at any point in your pull request via creating an issue comment (i.e. comment on the pull request page) using either /gemini <command>
or @gemini-code-assist <command>
. Below is a summary of the supported commands.
Feature | Command | Description |
---|---|---|
Code Review | /gemini review |
Performs a code review for the current pull request in its current state. |
Pull Request Summary | /gemini summary |
Provides a summary of the current pull request in its current state. |
Comment | @gemini-code-assist | Responds in comments when explicitly tagged, both in issue comments and review comments. |
Help | /gemini help |
Displays a list of available commands. |
Customization
To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/
folder in the base of the repository. Detailed instructions can be found here.
Limitations & Feedback
Gemini Code Assist is currently in preview and may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments to provide feedback.
You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.
A timed action's plight,
Needs a timeout, day and night,
No longer wait,
Propagate,
And set things right.
Footnotes
-
Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution. ↩
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request introduces timeout propagation for remote actions, allowing for better control over action behavior and encouraging users to optimize their actions. The changes involve modifying the ExecutionRequirements
class to use a ParseableRequirement
for timeouts, updating Spawns
to handle timeout values from execution info, and adjusting StandaloneTestStrategy
and TargetUtils
to accommodate the new timeout implementation. Overall, the changes seem well-structured and address the intended goal.
Summary of Findings
- Timeout Value Parsing: The code introduces a new way to parse timeout values, allowing them to be specified either as a tag or as a key-value pair. It's important to ensure that this new parsing logic is robust and handles various edge cases, such as invalid timeout formats or multiple timeout values.
- Error Handling: The code includes error handling for invalid timeout values and duplicate timeout tags. It's crucial to ensure that these error messages are clear and informative, providing users with actionable guidance on how to resolve the issues.
- Code Duplication: The code introduces new methods for retrieving and parsing timeout values. Consider refactoring the code to reduce duplication and improve maintainability.
Merge Readiness
The pull request introduces important functionality for timeout propagation. While the changes appear to be well-structured, there are some areas that could benefit from further refinement, particularly in error handling and code duplication. I recommend addressing the review comments before merging. I am unable to directly approve this pull request, and other reviewers should also review this code before merging.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
🧹 Nitpick comments (2)
src/main/java/com/google/devtools/build/lib/actions/Spawns.java (2)
158-176
: Duplicate-tag scan can short-circuit after first hitOnce a valid timeout value is discovered you continue iterating over the remaining
executionInfo
entries and only fail when a second timeout is met.
A quickbreak
after settingtimeout
would:
- Avoid needless work on large tag maps.
- Prevent accidentally accepting an earlier malformed tag that appears after
the first valid one (today it is silently ignored).- timeout = parseTimeoutValue(value); + timeout = parseTimeoutValue(value); + break; // first match wins – avoids O(n) scan & inconsistent state
165-168
: Error message could include offending tags for easier debuggingWhen duplicate timeouts are detected, consider including the full set of
conflicting tags/keys encountered in the exception message so that users can
quickly locate and fix them.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (7)
src/main/java/com/google/devtools/build/lib/actions/ExecutionRequirements.java
(1 hunks)src/main/java/com/google/devtools/build/lib/actions/Spawns.java
(2 hunks)src/main/java/com/google/devtools/build/lib/exec/StandaloneTestStrategy.java
(1 hunks)src/main/java/com/google/devtools/build/lib/packages/TargetUtils.java
(1 hunks)src/main/protobuf/failure_details.proto
(1 hunks)src/test/java/com/google/devtools/build/lib/packages/TargetUtilsTest.java
(1 hunks)src/test/java/com/google/devtools/build/lib/starlark/StarlarkRuleImplementationFunctionsTest.java
(2 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (1)
src/main/java/com/google/devtools/build/lib/actions/Spawns.java (1)
src/main/java/com/google/devtools/build/lib/actions/ExecutionRequirements.java (2)
ExecutionRequirements
(40-335)ValidationException
(55-77)
🔇 Additional comments (7)
src/main/java/com/google/devtools/build/lib/actions/ExecutionRequirements.java (1)
121-145
: Well-designed timeout validation implementationThe new
ParseableRequirement
for timeout correctly validates that:
- The value is a parseable integer
- The value is in canonical format (no leading zeros or plus signs)
- The value is positive (greater than zero)
- Provides clear error messages for each validation failure
This implementation aligns well with the PR's goal of improving timeout management by enabling per-action timeout propagation.
src/main/java/com/google/devtools/build/lib/packages/TargetUtils.java (1)
53-64
: Good addition for timeout tag supportAdding
tag.startsWith("timeout")
tolegalExecInfoKeys
properly enables propagation of timeout-related tags from targets to execution requirements.This change complements the ExecutionRequirements refactoring and ensures that users can specify timeout tags that will be correctly propagated to spawned actions.
src/main/java/com/google/devtools/build/lib/exec/StandaloneTestStrategy.java (1)
122-122
: Correctly updated timeout key usageThe change from
ExecutionRequirements.TIMEOUT
to the literal string"timeout"
properly accommodates the refactoring ofTIMEOUT
from a string constant to aParseableRequirement
object, ensuring that the execution info map continues to work with the new implementation.src/main/protobuf/failure_details.proto (1)
225-226
: Good error handling for timeout issuesThe new error codes
DUPLICATE_TIMEOUT_TAGS
andINVALID_TIMEOUT_TAG
provide clear, specific failure reporting for timeout-related issues, improving the user experience when timeouts are misconfigured.This complements the validation logic added to ExecutionRequirements and enables better error reporting when users specify invalid timeout values.
src/test/java/com/google/devtools/build/lib/packages/TargetUtilsTest.java (1)
226-237
: Nice addition of test for timeout prefix.The new test correctly verifies that execution info tags prefixed with
timeout:
are properly extracted into the execution info map. This test aligns well with the other prefix tests in the file and properly supports the enhanced timeout handling functionality.src/test/java/com/google/devtools/build/lib/starlark/StarlarkRuleImplementationFunctionsTest.java (2)
419-419
: Appropriately removed timeout from combined test.Good cleanup by removing the
timeout
key from this test's execution requirements. This makes sense since you've now created a dedicated test for timeout handling.
431-451
: Good addition of dedicated timeout test.This new test properly validates that the timeout execution requirement is correctly handled in spawn actions. Isolating the timeout test ensures clean validation of this specific feature and aligns with the PR's objective of enhancing timeout propagation for remote actions.
try { | ||
return Duration.ofSeconds(Integer.parseInt(timeoutStr)); | ||
return Duration.ofSeconds(Integer.parseInt(value)); | ||
} catch (NumberFormatException e) { | ||
throw new UserExecException( | ||
e, | ||
FailureDetail.newBuilder() | ||
.setMessage("could not parse timeout") | ||
.setSpawn(FailureDetails.Spawn.newBuilder().setCode(Code.INVALID_TIMEOUT)) | ||
.build()); | ||
throw new UserExecException( | ||
e, | ||
FailureDetail.newBuilder() | ||
.setMessage("could not parse timeout") | ||
.setSpawn(FailureDetails.Spawn.newBuilder().setCode(Code.INVALID_TIMEOUT)) | ||
.build()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Negative or zero seconds still accepted – explicitly reject them
Duration.ofSeconds(Integer.parseInt(value))
will happily create negative or
zero durations. The validator above (tag path) already forbids those, so the
value-based path should be equally strict.
try {
- return Duration.ofSeconds(Integer.parseInt(value));
+ int seconds = Integer.parseInt(value);
+ if (seconds < 1) {
+ throw new UserExecException(
+ FailureDetail.newBuilder()
+ .setMessage("timeout must be a positive integer")
+ .setSpawn(FailureDetails.Spawn.newBuilder().setCode(Code.INVALID_TIMEOUT_TAG))
+ .build());
+ }
+ return Duration.ofSeconds(seconds);
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
try { | |
return Duration.ofSeconds(Integer.parseInt(timeoutStr)); | |
return Duration.ofSeconds(Integer.parseInt(value)); | |
} catch (NumberFormatException e) { | |
throw new UserExecException( | |
e, | |
FailureDetail.newBuilder() | |
.setMessage("could not parse timeout") | |
.setSpawn(FailureDetails.Spawn.newBuilder().setCode(Code.INVALID_TIMEOUT)) | |
.build()); | |
throw new UserExecException( | |
e, | |
FailureDetail.newBuilder() | |
.setMessage("could not parse timeout") | |
.setSpawn(FailureDetails.Spawn.newBuilder().setCode(Code.INVALID_TIMEOUT)) | |
.build()); | |
try { | |
int seconds = Integer.parseInt(value); | |
if (seconds < 1) { | |
throw new UserExecException( | |
FailureDetail.newBuilder() | |
.setMessage("timeout must be a positive integer") | |
.setSpawn(FailureDetails.Spawn.newBuilder().setCode(Code.INVALID_TIMEOUT_TAG)) | |
.build()); | |
} | |
return Duration.ofSeconds(seconds); | |
} catch (NumberFormatException e) { | |
throw new UserExecException( | |
e, | |
FailureDetail.newBuilder() | |
.setMessage("could not parse timeout") | |
.setSpawn(FailureDetails.Spawn.newBuilder().setCode(Code.INVALID_TIMEOUT)) | |
.build()); | |
} |
if (entry.getKey().equals("timeout")) { | ||
value = entry.getValue(); | ||
} else { | ||
try { | ||
value = requirement.parseIfMatches(entry.getKey()); | ||
} catch (ValidationException e) { | ||
throw new UserExecException( | ||
e, | ||
FailureDetail.newBuilder() | ||
.setMessage( | ||
String.format( | ||
"%s has a '%s' tag, but its value '%s' didn't pass validation: %s", | ||
spawn.getTargetLabel(), requirement.userFriendlyName(), e.getTagValue(), e.getMessage())) | ||
.setSpawn(FailureDetails.Spawn.newBuilder().setCode(Code.INVALID_TIMEOUT_TAG)) | ||
.build()); | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Validation is inconsistent between timeout
key and timeout:<int>
tag
The "timeout"
key path bypasses the ExecutionRequirements.TIMEOUT
validator,
so values like "0"
, "-3"
, "+04"
or " 5 "
silently pass, even though the
same values are rejected when expressed as a tag (timeout:…
).
This breaks the “canonical, positive integer” contract and can surface as
negative‐duration Duration
s later in the pipeline.
- if (entry.getKey().equals("timeout")) {
- value = entry.getValue();
+ if (entry.getKey().equals("timeout")) {
+ value = entry.getValue();
+ // Re-use the canonical validator for consistency.
+ String error = requirement.validator().apply(value);
+ if (error != null) {
+ throw new UserExecException(
+ FailureDetail.newBuilder()
+ .setMessage(
+ String.format(
+ "%s has a 'timeout' key, but its value '%s' %s",
+ spawn.getTargetLabel(), value, error))
+ .setSpawn(FailureDetails.Spawn.newBuilder().setCode(Code.INVALID_TIMEOUT_TAG))
+ .build());
+ }
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
if (entry.getKey().equals("timeout")) { | |
value = entry.getValue(); | |
} else { | |
try { | |
value = requirement.parseIfMatches(entry.getKey()); | |
} catch (ValidationException e) { | |
throw new UserExecException( | |
e, | |
FailureDetail.newBuilder() | |
.setMessage( | |
String.format( | |
"%s has a '%s' tag, but its value '%s' didn't pass validation: %s", | |
spawn.getTargetLabel(), requirement.userFriendlyName(), e.getTagValue(), e.getMessage())) | |
.setSpawn(FailureDetails.Spawn.newBuilder().setCode(Code.INVALID_TIMEOUT_TAG)) | |
.build()); | |
} | |
} | |
if (entry.getKey().equals("timeout")) { | |
value = entry.getValue(); | |
// Re-use the canonical validator for consistency. | |
String error = requirement.validator().apply(value); | |
if (error != null) { | |
throw new UserExecException( | |
FailureDetail.newBuilder() | |
.setMessage( | |
String.format( | |
"%s has a 'timeout' key, but its value '%s' %s", | |
spawn.getTargetLabel(), value, error)) | |
.setSpawn(FailureDetails.Spawn.newBuilder().setCode(Code.INVALID_TIMEOUT_TAG)) | |
.build()); | |
} | |
} else { | |
try { | |
value = requirement.parseIfMatches(entry.getKey()); | |
} catch (ValidationException e) { | |
throw new UserExecException( | |
e, | |
FailureDetail.newBuilder() | |
.setMessage( | |
String.format( | |
"%s has a '%s' tag, but its value '%s' didn't pass validation: %s", | |
spawn.getTargetLabel(), | |
requirement.userFriendlyName(), | |
e.getTagValue(), | |
e.getMessage())) | |
.setSpawn(FailureDetails.Spawn.newBuilder().setCode(Code.INVALID_TIMEOUT_TAG)) | |
.build()); | |
} | |
} |
When building using remote execution,
action timeouts sometimes need to be adjusted
per action to ensure reasonable timeouts are set.
This change will allow better control over how
actions behave and encourage users to
optimize their actions since the global default timeout does not have to be set to the absolute largest
action timeout in the graph.
Summary by CodeRabbit
New Features
Bug Fixes
Tests
Documentation