-
Notifications
You must be signed in to change notification settings - Fork 352
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Read from Storage in file source #3869
Conversation
@fmassot (cc @trinity-1686a) commit 9a74f5a contains the scaffolding for the streaming storage. The central point is that
|
I read the comment for the |
Noted, I'll rephrase it if we move forward in this direction
That's one of the questions (I actually made a typo in my previous comment, sorry!). |
The trade-off is adding a method that will remain unimplemented for some storage implementations vs adding two new traits, a new method to @fulmicoton, you want to weigh in? |
We can add a
We can have a "best-effort" default impl that calls
ReadableStorage/ WritableStorage is an interesting idea. The same idea exists on the Directory side. ({Read/Write}Directory. |
Ok, thanks for your insight! I'll go with a |
6186933
to
62836f6
Compare
67961a2
to
0981d3f
Compare
This is taking shape. Before moving this PR out of draft, I have a few questions:
I also still need to look into the tests with external systems (Azure and S3) |
ee47ce8
to
ae417ff
Compare
Cache implementation: I added a naive implementation that uses get_slice and wraps it into a cursor. This implementation is actually never used. Should we rather leave it un-implemented and let it to the person who will first use it to come up with the implementation they want? (if we leave this dummy implementation, I'll obviously add a unit test for it) Yes Retry on Azure: the Azure SDK opens a fully lazy stream, so nothing happens until the stream is first polled. This means that we cannot use the usual mechanism here. One workaround to enable the retry mechanism would be to peak into the stream and wrap that into a retry. Do you think it is worth it?
Buffered reader: The S3 SDK documentation seems to suggest that the AsyncRead should be wrapped into a BufReader. The existing file source implementation was also performing this wrapping around the tokio::fs::File. Currently, it is still the file source that performs the wrapping, but my feeling is that it should be the Storage implementation that takes care of this Which documentation? Debouncer: I'm not sure I fully understand the role of the ins an outs of the debouncer. I added a dummy implementation of get_slice_stream that bypasses it. Does that seem sound? Yes that's ok. The debouncer detects that several queries currently in flight are actually fetching the same piece of data, and makes sure we only emit one GET request. |
Cache implementationok, switching to unimplemented Buffered reader
Very sorry for this, I had the link ready but forgot to paste it in: But now that I think about it, it's probably just for the convenience of having I took a deeper look at the code, and the Retry on Azure
I feel it would be relevant to ensure that the stream is polled at least once in the body, as a sort of equivalent to "opening the file". This should rarely be useful in practice as connectivity checks seem to be usually performed beforehand, but it would make the behavior more consistent with with the other Storage implementations. To be consistent with the approach I proposed above regarding buffered readers and avoid wrapping the stream in multiple layers of Debouncer👍 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think there is a bug in the local file storage. See comment inline. The other comments are nitpicks.
Codecov ReportAttention:
❗ Your organization needs to install the Codecov GitHub app to enable full functionality. Additional details and impacted files@@ Coverage Diff @@
## main #3869 +/- ##
==========================================
- Coverage 81.83% 81.83% -0.01%
==========================================
Files 401 401
Lines 100563 100736 +173
==========================================
+ Hits 82300 82433 +133
- Misses 18263 18303 +40
☔ View full report in Codecov by Sentry. |
Description
Closes #3868
How was this PR tested?
Running full test suite: