Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cache read errors not recorded, and no retry for eager fetch #12544

Closed
stuhood opened this issue Aug 11, 2021 · 2 comments · Fixed by #15744
Closed

Cache read errors not recorded, and no retry for eager fetch #12544

stuhood opened this issue Aug 11, 2021 · 2 comments · Fixed by #15744
Assignees
Labels

Comments

@stuhood
Copy link
Member

stuhood commented Aug 11, 2021

As noticed previously (and then forgotten: oops): the remote cache does not record read errors when populating stdout/stderr on the response, or when eager fetching the content:

// TODO: This should move inside the retry_call above, both in order to be retried, and
// to ensure that we increment a miss if we fail to eagerly fetch.

The comment isn't quite right though: we can't trivially move that block inside of the retry_call without refactoring the relevant methods from String errors to something classifiable (which might be useful, but isn't necessarily the right scope to retry at, since local errors won't be retriable). Fixing the metric only involves moving all of the steps involved in the hit under an async block so that we don't call the hit "Ok" until eager_fetch has succeeded.

But instead, we should likely move to using tower's retry with tonic. This would have the advantage of applying to all gRPC methods in one fell swoop. That would remove local retry_call blocks and ease adding additional nesting here.

@tdyas
Copy link
Contributor

tdyas commented Aug 20, 2021

First attempt at trying to use tower::retry::Retry: #12614

Doesn't compile, will need to discuss the types with someone.

@tdyas
Copy link
Contributor

tdyas commented Aug 20, 2021

Apparently tonic does not work with tower::retry::Retry: hyperium/tonic#733

@stuhood stuhood assigned stuhood and unassigned tdyas Jun 3, 2022
@stuhood stuhood added this to the 2.12.x milestone Jun 3, 2022
@stuhood stuhood removed this from the 2.12.x milestone Jun 3, 2022
stuhood added a commit that referenced this issue Jun 3, 2022
The TODO referenced from #12544 was even more wrong than initially suspected. All of the relevant `remote::Store` accessing methods have internal retry, so the only issue was that we were not recording errors for requests which failed to fetch process outputs.

Fixes #12544.

[ci skip-build-wheels]
stuhood added a commit to stuhood/pants that referenced this issue Jun 3, 2022
The TODO referenced from pantsbuild#12544 was even more wrong than initially suspected. All of the relevant `remote::Store` accessing methods have internal retry, so the only issue was that we were not recording errors for requests which failed to fetch process outputs.

Fixes pantsbuild#12544.

[ci skip-build-wheels]
stuhood added a commit that referenced this issue Jun 4, 2022
…15747)

The TODO referenced from #12544 was even more wrong than initially suspected. All of the relevant `remote::Store` accessing methods have internal retry, so the only issue was that we were not recording errors for requests which failed to fetch process outputs.

Fixes #12544.

[ci skip-build-wheels]
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants