-
Notifications
You must be signed in to change notification settings - Fork 897
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[ObjectStore] Add SpawnService for running requests on different tokio runtime/Handle
#7253
base: main
Are you sure you want to change the base?
Conversation
@@ -195,6 +195,18 @@ impl HttpResponseBody { | |||
} | |||
} | |||
|
|||
impl Body for HttpResponseBody { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a drive-by improvement, and allows using HttpResponseBody with http_body_util::BodyExt
.
It technically isn't necessary, as object_store ignores any non-data frames and therefore HttpResponseBody::bytes
is sufficient, but this is cleaner
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That is probably good context to add as an inline comment for future readers
// specific language governing permissions and limitations | ||
// under the License. | ||
|
||
//! Generic HTTP client abstraction |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It was a little confusing having the public HTTP client abstraction mixed in with the client utilities for implementing object_store - so split it out into a separate module
} | ||
|
||
#[async_trait] | ||
impl<T: HttpService + Clone> HttpService for SpawnService<T> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's a bit ouroborosy but I wonder if we should implement HttpService
for HttpClient
so that you can also wrap an HttpClient
with this
@@ -195,6 +195,18 @@ impl HttpResponseBody { | |||
} | |||
} | |||
|
|||
impl Body for HttpResponseBody { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That is probably good context to add as an inline comment for future readers
|
||
/// Wraps a provided [`HttpService`] and runs it on a separate tokio runtime | ||
#[derive(Debug)] | ||
pub struct SpawnService<T: HttpService + Clone> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I always find templated code like this harder to use and understand than just dyn ...
When it is important to make type specialized code for performance I understand the need
But when the usecase is to make a function call for an http request I think the mental overhead is unecessary
I realize this is a style opinion / point of view and that others may not feel the same way
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So I originally actually had this wrapping HttpClient, but it ended up a bit weird as you ended up doing
let client = HttpClient::new(SpawnService::new(HttpClient::new(reqwest::Client::new())))
As opposed to composing the services to yield the final client
let client = HttpClient::new(SpawnService::new(reqwest::Client::new()))
How would one configure a store with a custom HttpClient? I don't see anything that resembles with_http_client() on the stores |
0.12 (when it is released) adds |
Is the idea that this would be able poll the response stream on a different runtime without all the high level wrapping needed if we did it in ObjectStore? If so that sounds like a good ideae I am less clear if other layers (like chunked requests) for example make sense to do at this layer (as I think they also need to take into account things like verifing that the etag is the same for all responses, etc) |
Yes, similar story for metrics, tracing, etc... Caching could also be implemented at this layer quite easily, either by caching anything with an ETag header or by respecting the Cache-Control header, or some other mechanism.
Chunking can be done at this layer, however, it has the exact same restrictions as implementing it at the ObjectStore level:
It is unclear if these restrictions are too onerous for the feature to be useful. |
@tustvold do you think this would still land in a 0.12.x? I would like to do a deltalake 1.0 soon but this is kind of a must to have included since it's a better implementation then the DeltaStorageBackend we have now. |
I suspect the biggest potential blocker to that would be getting DataFusion updated to the 0.12 line, which in turn is waiting on the next breaking arrow release scheduled for the end of April.
FWIW given that arrow and DataFusion are not "stable", i.e. they make periodic breaking releases, I would have thought releasing deltalake 1.0 would quickly end up needing a deltalake 2.0, etc... IMO neither arrow nor datafusion should be 1.0, but there is a historical reason for it. |
That's fine! I will probably start with some RCs first, I was mainly curious that it gets published in the 0.12.x versioning.
I am mainly talking about the python API, which has been quite stable. Breaking changes generally effect our internal api's, and DF44 has resolved most of our issues. There are some minor issues still in datafusion but tbh these are edge cases so that's fine for now |
I will try and review this a bit more carefully tomorrow I think it could do with some more documentation / examples (which I can help with) I also think the name |
I filed apache/arrow-rs-object-store#13 and linked it to this ticket's close |
Handle
Thank you for this PR. We are in the process of moving the object_store code to its own repository. Would it be possible for you to create a PR in that repository instead?
(we will handle moving all existing issues to the new repository) |
Which issue does this PR close?
Rationale for this change
This allows spawning IO onto a separate tokio runtime
What changes are included in this PR?
Are there any user-facing changes?