Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: support fetch ranges in concurrent #2959

Merged
merged 4 commits into from
Dec 22, 2023

Conversation

QuenKar
Copy link
Contributor

@QuenKar QuenKar commented Dec 19, 2023

I hereby agree to the terms of the GreptimeDB CLA

What's changed and what's your intention?

support concurrently fetch page ranges.
now the policy is that if object store support blocking, use blocking read priorly,
if not, use concurrently read.

Benchmark

Env

Local PC: mac m2 16G RAM 512 SSD

Config

[storage]
type = S3
region = minipc-0
root = "test-concurrent-read"
bucket = "test-bucket-0"

Result

fetch parquet according to ranges in sequence use develop branch, and this branch is concurrent.

Query Use Sequence Blocking Read(mean) Use Concurrent Read(mean)
cpu-max-all-1 1089.73ms 446.90ms
cpu-max-all-8 6287.86ms 1792.10ms
double-groupby-5 4230.23ms 3121.35ms
double-groupby-1 5581.11ms 4187.89ms
double-groupby-all 7625.86ms 6191.14ms
groupby-orderby-limit 2432.57ms 2731.98ms
high-cpu-1 345.14ms 229.95ms
high-cpu-all 7983.64ms 8101.72ms
lastpoint 12468.17ms 12795.19ms
single-groupby-1-1-1 134.69ms 126.53ms
single-groupby-1-1-12 160.07ms 157.99ms
single-groupby-1-8-1 371.19ms 397.88ms
single-groupby-5-1-1 152.95ms 222.45ms
single-groupby-5-1-12 183.41ms 222.23ms
single-groupby-5-8-1 554.36ms 528.04ms

conclusion

It's still slightly improved until lastpoint query
because the lastpoint query basically reads the whole table, and all the pages are cached in memory, and there is not much difference in the following queries.

Checklist

  • I have written the necessary rustdoc comments.
  • I have added the necessary unit tests and integration tests.

Refer to a related PR or issue link (optional)

Closes #2808

@QuenKar QuenKar changed the title feat: concurrent fetch ranges feat: support fetch ranges in concurrent Dec 19, 2023
@QuenKar QuenKar marked this pull request as ready for review December 20, 2023 02:27
Copy link

codecov bot commented Dec 20, 2023

Codecov Report

Attention: 27 lines in your changes are missing coverage. Please review.

Comparison is base (6221e5b) 85.90% compared to head (a6265e4) 85.39%.
Report is 1 commits behind head on develop.

Additional details and impacted files
@@             Coverage Diff             @@
##           develop    #2959      +/-   ##
===========================================
- Coverage    85.90%   85.39%   -0.51%     
===========================================
  Files          780      780              
  Lines       125900   125980      +80     
===========================================
- Hits        108155   107585     -570     
- Misses       17745    18395     +650     

src/mito2/src/sst/parquet/row_group.rs Outdated Show resolved Hide resolved
src/mito2/src/sst/parquet/row_group.rs Outdated Show resolved Hide resolved
src/mito2/src/sst/parquet/row_group.rs Show resolved Hide resolved
@WenyXu WenyXu requested review from evenyag and zhongzc December 20, 2023 04:24
src/mito2/src/sst/parquet/row_group.rs Show resolved Hide resolved
src/mito2/src/sst/parquet/row_group.rs Outdated Show resolved Hide resolved
src/mito2/src/sst/parquet/row_group.rs Outdated Show resolved Hide resolved
@QuenKar QuenKar force-pushed the feat/concurrent-fetch-parquet branch from d90146c to a70eebf Compare December 21, 2023 06:55
@killme2008 killme2008 added the C-performance Category Performance label Dec 22, 2023
@killme2008
Copy link
Contributor

A great job! But I think we can do a benchmark for this feature. We don't know if this feature brings the performance upgrading or downgrading.

@QuenKar QuenKar added the docs-not-required This change does not impact docs. label Dec 22, 2023
@evenyag
Copy link
Contributor

evenyag commented Dec 22, 2023

A great job! But I think we can do a benchmark for this feature. We don't know if this feature brings the performance upgrading or downgrading.

@QuenKar Could you please add the benchmark result to the description?

Copy link
Contributor

@evenyag evenyag left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@zhongzc zhongzc added this pull request to the merge queue Dec 22, 2023
@zhongzc zhongzc removed this pull request from the merge queue due to a manual request Dec 22, 2023
@zhongzc zhongzc added this pull request to the merge queue Dec 22, 2023
@QuenKar QuenKar removed this pull request from the merge queue due to a manual request Dec 22, 2023
@QuenKar QuenKar force-pushed the feat/concurrent-fetch-parquet branch from a70eebf to a6265e4 Compare December 22, 2023 08:35
@evenyag evenyag added this pull request to the merge queue Dec 22, 2023
Merged via the queue into GreptimeTeam:develop with commit a7349b5 Dec 22, 2023
15 checks passed
@WenyXu
Copy link
Member

WenyXu commented Jan 1, 2024

Similar idea: apache/opendal#3675

@WenyXu WenyXu mentioned this pull request Jan 2, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C-performance Category Performance docs-not-required This change does not impact docs.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Fetch pages concurrently from object store
5 participants