Merge the small read io #3072
Comments
See also:
Hello, I'm new to the project and would like to contribute, though I'm not sure whether this is a good first issue. As I understand it, you're concurrently fetching data from Parquet files using a function that receives a vector of ranges, and your concern is that too many small requests can lead to expensive bills. So an optimization would be to merge ranges that are close together before fetching data, is that right? In that case, what would be a reasonable distance between the ranges? I ask because I'm not familiar with the kind of data being fetched.
Thanks @L-Fiori. My bad; my colleague @QuenKar is working on this. I forgot to update this issue.
The merge distance is the key question for this issue, and my colleague is doing some benchmarking to figure it out. We will use the benchmark results to select an optimized range distance (the results may be posted in related PRs).
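For anyone following along, here is a minimal sketch of the range-merging idea discussed above. It is not the project's implementation: the function name `merge_ranges`, the `max_gap` parameter, and the half-open `(start, end)` byte-range representation are all assumptions made for illustration, and no particular gap value is implied since that is what the benchmarks are meant to determine.

```python
from typing import List, Tuple

# Hypothetical representation: a half-open byte range [start, end).
Range = Tuple[int, int]


def merge_ranges(ranges: List[Range], max_gap: int) -> List[Range]:
    """Coalesce ranges that overlap or sit at most `max_gap` bytes apart.

    `max_gap` is an illustrative tuning knob, not a value from the project.
    """
    merged: List[Range] = []
    # Sort by start offset so only the most recent merged range needs checking.
    for start, end in sorted(ranges):
        if merged and start - merged[-1][1] <= max_gap:
            # Close enough (or overlapping): extend the previous range.
            prev_start, prev_end = merged[-1]
            merged[-1] = (prev_start, max(prev_end, end))
        else:
            # Too far away: start a new range, i.e. a separate request.
            merged.append((start, end))
    return merged


# Three small reads become two requests when gaps of up to 4 bytes are allowed.
requests = merge_ranges([(100, 110), (0, 10), (12, 20)], max_gap=4)
```

The trade-off is that a larger `max_gap` means fewer requests but more unneeded bytes fetched in the gaps, which is why the right value depends on measurement.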
Interesting! I'll stay tuned for other issues I might want to tackle, thanks for the reply ;)
We still send a lot of small IO requests.
Originally posted by @WenyXu in #2959 (comment)