Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Streaming proposals #53

Draft
wants to merge 7 commits into
base: master
Choose a base branch
from

Conversation

voidcontext
Copy link
Contributor

@voidcontext voidcontext commented Jan 28, 2025

Hi @mendess there are 2 proposals / ideas in this PR that I'd like to discuss with you.

  1. stream_iterator module.

Unfortunately I don't remember the exact source of this solution, maybe it was one of the issues in the serde repo. This variant uses a Visitor and channels to emit each item in the top level array. To me this seems like a cleaner implementation, and (according to my tests) it is sligtly faster then the current one.

  1. streaming download

Here the idea is to not load the full response body into the memory but write the byte stream directly into a file. This might be a bit slower (maybe just a temprary blip in my connection), but at least it's only using around 10M peak memory. The non cached version would require some more work, and I think I have an idea how to solve that too, but I wanted to wait for you to give the green light.

Update: the non caching version has been reimplemented in the same memory efficient way, although it is using another http request library ureq for its non async interface (required by serde).

What do you think, would you be interested in either of these ideas? If yes I am happy to clean up to code based on your guidelines and open 2 different PRs for the different features.

This draft also contains #52 just to make all_cards parsing work, please diregard that part here.

@voidcontext
Copy link
Contributor Author

Hi @mendess I think this is it, tests are green locally, I think I am happy with the results, would you be open to discuss these changes?

Copy link
Owner

@mendess mendess left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall this seems like a very good idea. Overall I would just make things more async friendly

Comment on lines +61 to +69
impl<A> Iterator for ItemIterator<A> {
type Item = Result<A, Error>;

fn next(&mut self) -> Option<Self::Item> {
self.receiver.recv().ok()
}
}

Box::new(ItemIterator { receiver })
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if you use a tokio channel here you can probably implement stream instead of iterator to keep things async. You might even be able to implement both, depending on what the tokio docs say

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See my question here: #53 (comment)

error: e.into(),
url: self.download_uri.inner().clone(),
})?;
writer.write_all(&chunk)?;
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you use a tokio::fs::File here instead?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As far as I can see tokio is not available only with the bin feature. Would it be ok to make tokio a non optional dependency?

src/bulk.rs Outdated
Comment on lines 124 to 128
async fn get_reader(&self) -> crate::Result<BufReader<impl std::io::Read + Send>> {

let response = self.download_uri.fetch_raw_blocking()?;
let body = response.into_body();
Ok(BufReader::new(body.into_reader()))
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't like the idea of issuing a blocking request inside an async fn, this will break a lot of code. Can we keep using reqwest async and change the return type here to be something that impls AsyncRead?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, yes, the difficulty (at least for me) is to translate between non-async (serde traits) and async (rest of the code) context. I am still looking into this, but I have a few ideas already...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, so the blocking code is definitely a problem, but, what would we gain with the result having an async interface? In the end this reader will be used by serde_json::Deserializer which can only work with non async readers anyway.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I found another instance of a blocking call in an async context, so I need to change the strategy here... I'll get back to you with a solution probably next week.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be good to know here as well if tokio is allowed as a dependency (see other comments) as it can affect the solution.

@@ -237,7 +237,7 @@ pub struct ImageUris {
///
/// For more details, see the [official documentation](https://scryfall.com/docs/api/cards).
#[derive(Serialize, Deserialize, Clone, PartialEq, Debug)]
#[cfg_attr(test, serde(deny_unknown_fields))]
// #[cfg_attr(test, serde(deny_unknown_fields))]
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This needs to be restored. Tests were failing because of some new attributes in the scryfall API, this needs to be fixed separately.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants