test: Add query benchmarking suite for pg_analytics #131
Conversation
- Profiled query performance on a foreign table with and without the DuckDB metadata cache enabled
- Tested on Hive-style partitioned data in S3 to simulate real-world scenarios

Signed-off-by: shamb0 <[email protected]>
…d testcontainers version

- Merged changes from [PR#30](paradedb#30).
- Integrated benchmarking for the Hive-style partitioned Parquet file source.
- Applied a patched version to address an async container cleanup issue.

Signed-off-by: shamb0 <[email protected]>
Verified:

- Test harness: pass
- Integration test: pass
- Benchmarking: pass

Signed-off-by: shamb0 <[email protected]>
…ity and consistency in tests.

- Adjusted module imports accordingly.

Signed-off-by: shamb0 <[email protected]>
Thank you for submitting the PR to paradedb/paradedb for the analytics work. We'll review that and get it merged there. If you want to adjust this PR so it no longer contains that work, we can review this one too.
…benchmarks

Signed-off-by: shamb0 <[email protected]>
Hi @shamb0, these PRs have grown extremely large and are basically impossible to review properly.
Could we scope this work better so it's easier to merge? They've been blocked for a long time because of how large they are.
- cargo-paradedb should host all benchmarking code
- It should not depend on Postgres; it should simply take a PG DB URL
- The new test fixtures are nice, but perhaps we can PR them separately with a rationale for why we need new test fixtures?

This will help us bring this work in.
Hi @philippemnoel, haha, I definitely understand the challenge of reviewing large PRs! You're right: this PR combines changes from multiple sources, which has made it more complex, and breaking them apart will significantly improve the review process. Based on your suggestions, here's my proposed approach to restructuring the work:
This breakdown should make each PR more manageable and easier to review. I'll start implementing these changes right away. Do you think this approach addresses your concerns?
Sounds promising!
Going to close this until a more scoped PR is raised.
Ticket(s) Closed

- pg_analytics #57

This PR is part of a pair; please consider both for review and merge:

- paradedb/paradedb#1703
- #131
What

This PR implements benchmarking functionality to analyze query performance under different caching conditions across the various data sources supported by pg_analytics.

Why

To evaluate how different cache configurations impact query performance, ensuring that the system optimally handles various data sources and caching scenarios.
How
The test function follows a structured flow built on the criterion benchmarking framework.

The SQL command below is used to toggle Parquet metadata caching (in-memory):
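The exact statement lives in the test code; a plausible sketch, assuming the extension forwards settings to DuckDB via a `duckdb_execute` helper and that the cache is controlled by DuckDB's `enable_object_cache` option (both names are assumptions here, not confirmed by this PR), would look like:

```sql
-- Hypothetical sketch: toggle DuckDB's in-memory Parquet metadata cache.
-- cache_setting is substituted with 'true' or 'false' per test scenario.
SELECT duckdb_execute($$SET enable_object_cache = cache_setting$$);
```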
Here cache_setting can be either "true" or "false", depending on the test scenario.

Benchmarking
To run the benchmarking, use the following commands:

```shell
cd ./cargo-paradedb
RUST_LOG=info cargo run -- paradedb pga-bench parquet-run-all
```
Integration Notes
The diagram below outlines key components and their interactions, providing a high-level overview of the prototype design: