-
Notifications
You must be signed in to change notification settings - Fork 19
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Streaming proposals #53
base: master
Are you sure you want to change the base?
Conversation
5160f7f
to
3992c88
Compare
…g the whole body in memory
Hi @mendess I think this is it, tests are green locally, I think I am happy with the results, would you be open to discuss these changes? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall this seems like a very good idea. Overall I would just make things more async friendly
impl<A> Iterator for ItemIterator<A> { | ||
type Item = Result<A, Error>; | ||
|
||
fn next(&mut self) -> Option<Self::Item> { | ||
self.receiver.recv().ok() | ||
} | ||
} | ||
|
||
Box::new(ItemIterator { receiver }) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if you use a tokio channel here you can probably implement stream instead of iterator to keep things async. You might even be able to implement both, depending on what the tokio docs say
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See my question here: #53 (comment)
error: e.into(), | ||
url: self.download_uri.inner().clone(), | ||
})?; | ||
writer.write_all(&chunk)?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you use a tokio::fs::File here instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As far as I can see tokio
is not available only with the bin
feature. Would it be ok to make tokio a non optional dependency?
src/bulk.rs
Outdated
async fn get_reader(&self) -> crate::Result<BufReader<impl std::io::Read + Send>> { | ||
|
||
let response = self.download_uri.fetch_raw_blocking()?; | ||
let body = response.into_body(); | ||
Ok(BufReader::new(body.into_reader())) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't like the idea of issuing a blocking request inside an async fn, this will break a lot of code. Can we keep using reqwest async and change the return type here to be something that impls AsyncRead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, yes, the difficulty (at least for me) is to translate between non-async (serde traits) and async (rest of the code) context. I am still looking into this, but I have a few ideas already...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, so the blocking code is definitely a problem, but, what would we gain with the result having an async interface? In the end this reader will be used by serde_json::Deserializer
which can only work with non async readers anyway.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I found another instance of a blocking call in an async context, so I need to change the strategy here... I'll get back to you with a solution probably next week.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be good to know here as well if tokio is allowed as a dependency (see other comments) as it can affect the solution.
@@ -237,7 +237,7 @@ pub struct ImageUris { | |||
/// | |||
/// For more details, see the [official documentation](https://scryfall.com/docs/api/cards). | |||
#[derive(Serialize, Deserialize, Clone, PartialEq, Debug)] | |||
#[cfg_attr(test, serde(deny_unknown_fields))] | |||
// #[cfg_attr(test, serde(deny_unknown_fields))] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This needs to be restored. Tests were failing because of some new attributes in the scryfall API, this needs to be fixed separately.
Hi @mendess there are 2 proposals / ideas in this PR that I'd like to discuss with you.
stream_iterator
module.Unfortunately I don't remember the exact source of this solution, maybe it was one of the issues in the
serde
repo. This variant uses a Visitor and channels to emit each item in the top level array. To me this seems like a cleaner implementation, and (according to my tests) it is sligtly faster then the current one.Here the idea is to not load the full response body into the memory but write the byte stream directly into a file. This might be a bit slower (maybe just a temprary blip in my connection), but at least it's only using around 10M peak memory.
The non cached version would require some more work, and I think I have an idea how to solve that too, but I wanted to wait for you to give the green light.Update: the non caching version has been reimplemented in the same memory efficient way, although it is using another http request library
ureq
for its non async interface (required byserde)
.What do you think, would you be interested in either of these ideas? If yes I am happy to clean up to code based on your guidelines and open 2 different PRs for the different features.
This draft also contains #52 just to make
all_cards
parsing work, please diregard that part here.