perf(mito): scan SSTs and memtables in parallel #2852

Merged
merged 33 commits into from
Dec 11, 2023

Conversation

evenyag
Contributor

@evenyag evenyag commented Nov 30, 2023

I hereby agree to the terms of the GreptimeDB CLA

What's changed and what's your intention?

This PR scans SSTs and memtables in parallel. Scanning each file or memtable is still sequential.

The implementation of parallel scan is straightforward: it spawns a scan task for each data source, and each task sends the batches it scans to the merge reader through a channel.
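The task-per-source pattern described above can be sketched with only the standard library. This is a minimal illustration, not the code in this PR: it uses OS threads and `std::sync::mpsc::sync_channel` instead of async tasks, gives each source its own bounded channel (an assumption), and the names `Batch`, `spawn_scans`, and `merge_all` are hypothetical.

```rust
use std::sync::mpsc::{sync_channel, Receiver};
use std::thread;

/// Hypothetical stand-in for a batch of rows; here just a list of timestamps.
type Batch = Vec<i64>;

/// Spawns one scan thread per source. Each thread sends its batches through
/// its own bounded channel; the receiving ends are returned to the caller.
fn spawn_scans(sources: Vec<Vec<Batch>>, channel_size: usize) -> Vec<Receiver<Batch>> {
    sources
        .into_iter()
        .map(|source| {
            let (tx, rx) = sync_channel(channel_size);
            thread::spawn(move || {
                // Scanning a single source stays sequential.
                for batch in source {
                    // A send error means the reader was dropped; stop scanning.
                    if tx.send(batch).is_err() {
                        break;
                    }
                }
            });
            rx
        })
        .collect()
}

/// Simplified merge reader: drains every channel, then sorts the rows.
/// A real merge reader would perform a streaming k-way merge instead.
fn merge_all(receivers: Vec<Receiver<Batch>>) -> Vec<i64> {
    let mut rows: Vec<i64> = receivers
        .into_iter()
        .flat_map(|rx| rx.into_iter().flatten().collect::<Vec<_>>())
        .collect();
    rows.sort_unstable();
    rows
}
```

Calling `merge_all(spawn_scans(sources, 32))` scans all sources concurrently while the bounded channels keep a fast scanner from buffering an unbounded number of batches ahead of the reader.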

There are two new options for parallel scan.

  • scan_parallelism: defaults to 1/4 of the number of CPU cores
  • parallel_scan_channel_size: defaults to 32
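As a rough sketch of how these defaults could be computed, assuming the core count comes from `std::thread::available_parallelism` and that the result is rounded down but never below 1 (the PR description does not state the rounding). The struct `ParallelScanOptions` and the function `default_scan_parallelism` are hypothetical names, not the project's actual config types.

```rust
use std::thread;

/// Hypothetical holder for the two options named above.
#[derive(Debug, Clone, PartialEq)]
struct ParallelScanOptions {
    scan_parallelism: usize,
    parallel_scan_channel_size: usize,
}

/// A quarter of the cores, rounded down, with a floor of 1 (the floor is an assumption).
fn default_scan_parallelism(cpu_cores: usize) -> usize {
    (cpu_cores / 4).max(1)
}

impl Default for ParallelScanOptions {
    fn default() -> Self {
        // Falls back to a single core if the count cannot be determined.
        let cores = thread::available_parallelism()
            .map(|n| n.get())
            .unwrap_or(1);
        Self {
            scan_parallelism: default_scan_parallelism(cores),
            parallel_scan_channel_size: 32,
        }
    }
}
```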

TSBS results

| query type | parallelism 1 avg (ms) | parallelism 4 avg (ms) |
|---|---:|---:|
| cpu-max-all-1 | 55.80 | 27.89 |
| cpu-max-all-8 | 202.82 | 114.03 |
| double-groupby-1 | 1144.48 | 469.67 |
| double-groupby-5 | 2650.60 | 883.85 |
| double-groupby-all | 4411.23 | 1589.75 |
| groupby-orderby-limit | 1507.01 | 502.63 |
| high-cpu-1 | 52.65 | 24.80 |
| high-cpu-all | 8058.48 | 5484.44 |
| lastpoint | 14225.00 | 7393.31 |
| single-groupby-1-1-1 | 20.81 | 12.80 |
| single-groupby-1-1-12 | 34.29 | 13.47 |
| single-groupby-1-8-1 | 68.47 | 42.45 |
| single-groupby-5-1-1 | 28.22 | 16.57 |
| single-groupby-5-1-12 | 42.64 | 19.21 |
| single-groupby-5-8-1 | 94.53 | 61.04 |

A larger parallelism does not always help, because the total number of SSTs to scan can be smaller than the configured parallelism.
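The cap can be expressed in one line. This is an illustrative helper under the assumption that each scan task handles exactly one source at a time; `effective_parallelism` is a hypothetical name, not a function from this PR.

```rust
/// Number of scan tasks that can actually run at once:
/// the configured parallelism, capped by the number of sources to scan.
fn effective_parallelism(scan_parallelism: usize, num_sources: usize) -> usize {
    scan_parallelism.min(num_sources)
}
```

With a parallelism of 8 and only 3 sources, at most 3 tasks have work, so raising the option further changes nothing for that query.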

Checklist

  • I have written the necessary rustdoc comments.
  • I have added the necessary unit tests and integration tests.

Refer to a related PR or issue link (optional)

@evenyag evenyag marked this pull request as ready for review December 5, 2023 12:41

codecov bot commented Dec 6, 2023

Codecov Report

Merging #2852 (f1210b8) into develop (f9e7762) will decrease coverage by 0.30%.
Report is 10 commits behind head on develop.
The diff coverage is 95.41%.

Additional details and impacted files
@@             Coverage Diff             @@
##           develop    #2852      +/-   ##
===========================================
- Coverage    84.73%   84.44%   -0.30%     
===========================================
  Files          749      754       +5     
  Lines       117900   118785     +885     
===========================================
+ Hits         99900   100303     +403     
- Misses       18000    18482     +482     

config/datanode.example.toml (review thread resolved)
src/mito2/src/read/seq_scan.rs (review thread resolved)
@evenyag evenyag requested review from fengjiachun and WenyXu December 8, 2023 07:57
Collaborator

@fengjiachun fengjiachun left a comment

LGTM

@fengjiachun fengjiachun added this pull request to the merge queue Dec 11, 2023
Merged via the queue into GreptimeTeam:develop with commit 6a57f49 Dec 11, 2023
19 checks passed