-
-
Notifications
You must be signed in to change notification settings - Fork 644
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Cache read errors not recorded, and no retry for eager fetch #12544
Labels
Comments
First attempt at trying to use Doesn't compile, will need to discuss the types with someone. |
Apparently tonic does not work with tower::retry::Retry: hyperium/tonic#733 |
stuhood
added a commit
to stuhood/pants
that referenced
this issue
Jun 3, 2022
The TODO referenced from pantsbuild#12544 was even more wrong than initially suspected. All of the relevant `remote::Store` accessing methods have internal retry, so the only issue was that we were not recording errors for requests which failed to fetch process outputs. Fixes pantsbuild#12544. [ci skip-build-wheels]
stuhood
added a commit
that referenced
this issue
Jun 4, 2022
…15747) The TODO referenced from #12544 was even more wrong than initially suspected. All of the relevant `remote::Store` accessing methods have internal retry, so the only issue was that we were not recording errors for requests which failed to fetch process outputs. Fixes #12544. [ci skip-build-wheels]
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
As noticed previously (and then forgotten: oops): the remote cache does not record read errors when populating stdout/stderr on the response, or when eager fetching the content:
pants/src/rust/engine/process_execution/src/remote.rs
Lines 1393 to 1394 in 064036b
The comment isn't quite right though: we can't trivially move that block inside of the
retry_call
without refactoring the relevant methods fromString
errors to something classifiable (which might be useful, but isn't necessarily the right scope to retry at, since local errors won't be retriable). Fixing the metric only involves moving all of the steps involved in the hit under anasync
block so that we don't call the hit "Ok" until eager_fetch has succeeded.But instead, we should likely move to using
tower
's retry withtonic
. This would have the advantage of applying to all gRPC methods in one fell swoop. That would remove localretry_call
blocks and ease adding additional nesting here.The text was updated successfully, but these errors were encountered: