perf: Optimizing pipeline performance #4390
Conversation
Force-pushed from 4098a98 to cf299cb
Please add comments describing the purpose of the newly added fields and methods, such as:
processors.output_keys();
processors.required_keys();
processors.required_original_keys();
transforms.required_keys();
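The documentation being requested could look like the following sketch. The trait name, signatures, and doc wording are illustrative assumptions, not the PR's actual code:

```rust
// Hypothetical trait grouping the methods named in the review; names,
// signatures, and doc text are illustrative, not taken from the PR.
pub trait KeyInfo {
    /// Keys this component writes into the shared intermediate state.
    fn output_keys(&self) -> Vec<String>;
    /// Keys this component reads from the intermediate state.
    fn required_keys(&self) -> Vec<String>;
    /// Keys that must be present in the original input payload itself.
    fn required_original_keys(&self) -> Vec<String>;
}

// Minimal illustrative implementation for a field-renaming processor.
struct RenameProcessor;

impl KeyInfo for RenameProcessor {
    fn output_keys(&self) -> Vec<String> {
        vec!["renamed_field".to_string()]
    }
    fn required_keys(&self) -> Vec<String> {
        vec!["source_field".to_string()]
    }
    fn required_original_keys(&self) -> Vec<String> {
        vec!["source_field".to_string()]
    }
}

fn main() {
    let p = RenameProcessor;
    println!("outputs: {:?}, requires: {:?}", p.output_keys(), p.required_keys());
}
```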
Codecov Report
Attention: Patch coverage is
Additional details and impacted files

@@ Coverage Diff @@
##             main    #4390      +/-   ##
==========================================
- Coverage   84.95%   84.64%    -0.31%
==========================================
  Files        1091     1091
  Lines      194496   195587     +1091
==========================================
+ Hits       165229   165559      +330
- Misses      29267    30028      +761
I think we must deliver this PR in v0.9.1, so please take a look @sunng87 @shuiyisong @zhongzc
Force-pushed from b38a888 to c34c92f
* chore: improve pipeline performance
* chore: use arc to improve time type
* chore: improve pipeline coerce
* chore: add vec refactor
* chore: add vec pp
* chore: improve pipeline
* inprocess
* chore: set log ingester use new pipeline
* chore: fix some error by pr comment
* chore: fix typo
* chore: use enum_dispatch to simplify code
* chore: some minor fix
* chore: format code
* chore: update by pr comment
* chore: fix typo
* chore: make clippy happy
* chore: fix by pr comment
* chore: remove epoch and date process add new timestamp process
* chore: add more test for pipeline
* chore: restore epoch and date processor
* chore: compatibility issue
* chore: fix by pr comment
* chore: move the evaluation out of the loop
* chore: fix by pr comment
* chore: fix dissect output key filter
* chore: fix transform output greptime value has order error
* chore: keep pipeline transform output order
* chore: revert tests
* chore: simplify pipeline prepare implementation
* chore: add test for timestamp pipeline processor
* chore: make clippy happy
* chore: replace is_some check to match

Co-authored-by: shuiyisong <[email protected]>
I hereby agree to the terms of the GreptimeDB CLA.
Refer to a related PR or issue link (optional)
What's changed and what's your intention?
Over a thousand of the changed lines are test data and benchmarks and can be ignored: src/pipeline/benches/data.log is test data, and src/pipeline/benches/processor.rs is the benchmark.

Both timeindex and timestamp are now supported for specifying the time index in the transform.

Original pipeline workflow

Current pipeline workflow
The original approach performed many allocations and drops, plus many hash-map get operations. Converting the intermediate data into a vector avoids the hash lookups, and the same batch of data shares a common intermediate state. In testing, the same pipeline processing the same data runs more than 30% faster.
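The optimization described above can be sketched as follows. This is a minimal illustration of the technique (resolving field names to vector slots once, so the per-row hot loop does only indexed access), not the PR's actual code; all names and types here are assumptions:

```rust
use std::collections::HashMap;

// Stand-in for the pipeline's value type.
type Value = i64;

// One-time schema resolution: map each field name to a vector slot.
// The old flow paid a hash lookup per field per row; here hashing
// happens only once, before the batch is processed.
fn build_index(fields: &[&str]) -> HashMap<String, usize> {
    fields
        .iter()
        .enumerate()
        .map(|(i, name)| (name.to_string(), i))
        .collect()
}

// Per-row processing: a plain indexed write, no hashing in the hot loop.
fn process_row(intermediate: &mut [Option<Value>], slot: usize, v: Value) {
    intermediate[slot] = Some(v);
}

fn main() {
    let index = build_index(&["host", "latency_ms"]);
    // One Vec is reused as the shared intermediate state for the batch,
    // avoiding a fresh HashMap allocation (and drop) per row.
    let mut intermediate: Vec<Option<Value>> = vec![None; index.len()];

    let latency_slot = index["latency_ms"]; // resolved once, outside the loop
    for v in [12, 7, 30] {
        process_row(&mut intermediate, latency_slot, v);
    }
    println!("latency slot holds {:?}", intermediate[latency_slot]);
}
```

The key design point is that name resolution moves out of the per-row path entirely, which also lets one batch of rows share a single preallocated intermediate buffer.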
Checklist