-
-
Notifications
You must be signed in to change notification settings - Fork 188
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
test: Add query benchmarking suite for pg_analytics #1703
test: Add query benchmarking suite for pg_analytics #1703
Conversation
…lytics Signed-off-by: shamb0 <[email protected]>
Signed-off-by: shamb0 <[email protected]>
…update-demo-multi-level-partition-table-dset-auto-sales
Signed-off-by: shamb0 <[email protected]>
…update-demo-multi-level-partition-table-dset-auto-sales
- Introduced new benchmarking modules: - Renamed to for clarity - Integrated as a submodule - Added fixtures module to support testing Signed-off-by: shamb0 <[email protected]>
Signed-off-by: shamb0 <[email protected]>
Signed-off-by: shamb0 <[email protected]>
.gitmodules
Outdated
@@ -0,0 +1,4 @@ | |||
[submodule "pg_analytics/downloads"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we do this without a submodule please? The code should just be in cargo-paradedb
. We plan to eventually move cargo-paradedb
to a separate repository
cargo-paradedb/Cargo.toml
Outdated
@@ -4,14 +4,22 @@ version = "0.10.1" | |||
edition = "2021" | |||
|
|||
# See more keys and their definitions at https://doc.rust-lang.org/cargo/reference/manifest.html | |||
[features] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@neilyio should review here. I believe he intentionally made cargo-paradedb
be Postgres-agnostic. We may want to preserve that if we can.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's this strange folder called /downloads
in pg_analytics
. Can we remove that?
Hi @philippemnoel, Thank you for your feedback on the PR. I've made some changes to address the issues you raised. In the initial implementation, the To resolve this, I've implemented a solution that decouples the %% Top-to-Bottom Layout
flowchart TB
subgraph paradedb["paradedb"]
direction TB
package11["cargo-paradedb<br>(rs)<br>Common Benchmarking<br>Orchestrator"]
end
subgraph pg_analytics["pg_analytics"]
direction TB
package21["tests<br>(rs)<br>Integration Test<br>"]
package22["pga_fixtures"]
end
subgraph Postgres["Postgres"]
direction TB
package31["pg_search<br>(rs)<br>Extension<br>"]
package32["pg_analytics<br>(rs)<br>Extension<br>"]
end
package21 -->|Uses| package22
package11 -->|Uses<br>| package22
package11 -->|Query| package31
package22 -->|Query| package32
I believe this updated patch meets the requirements we discussed. I'm eager to hear your thoughts on the refactored code and am open to any further suggestions or adjustments you might have. Please let me know if you need any additional information or clarification on the changes I've made. |
Signed-off-by: shamb0 <[email protected]>
@@ -16,3 +16,6 @@ inherits = "release" | |||
debug = true | |||
lto = "thin" | |||
codegen-units = 32 | |||
|
|||
[patch.crates-io] | |||
testcontainers = { package = "testcontainers", git = "https://github.com/shamb0/testcontainers-rs.git", rev = "b05c13d" } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can this be upstreamed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we remove this file?
tracing-subscriber = { version = "0.3.18", features = ["env-filter", "time"] } | ||
datafusion = { version = "37.1.0" } | ||
tokio = { version = "1.0", features = ["full"] } | ||
bytes = { version = "1.7.1" } | ||
prettytable = { version = "0.10.0" } | ||
time = { version = "0.3.34", features = ["serde", "macros", "local-offset"] } | ||
rand = { version = "0.8.5" } | ||
approx = { version = "0.5.1" } | ||
bigdecimal = { version = "0.3.1", features = ["serde"] } | ||
soa_derive = { version = "0.13.0" } | ||
aws-config = { version = "1.5.6" } | ||
aws-sdk-s3 = { version = "1.49.0" } | ||
serde_arrow = { version = "0.11.7", features = ["arrow-51"] } | ||
testcontainers = { version = "0.22.0" } | ||
testcontainers-modules = { version = "0.10.0", features = ["localstack"] } | ||
rstest = { version = "0.19.0" } | ||
duckdb = { git = "https://github.com/paradedb/duckdb-rs.git", features = [ | ||
"bundled", | ||
"extensions-full", | ||
], rev = "e532dd6" } | ||
cargo_metadata = { version = "0.18.0" } | ||
camino = { version = "1.0.7", features = ["serde1"] } | ||
pga_fixtures = { package = "pga_fixtures", git = "https://github.com/shamb0/pg_analytics", rev = "4511eba" } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we really need all of these new dev dependencies?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why the rename?
@@ -0,0 +1,59 @@ | |||
--- | |||
title: Multilevel Partition Tables |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Multi-Level
@@ -15,7 +15,7 @@ | |||
// You should have received a copy of the GNU Affero General Public License | |||
// along with this program. If not, see <http://www.gnu.org/licenses/>. | |||
|
|||
use crate::benchmark::Benchmark; | |||
use crate::benchmark::{pga_bench_parquet, pgs_bench_sanity}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why sanity?
This is still too complicated for my liking. The current state of In that world, we wouldn't need this complex setup. We will eventually move |
Going to close this until a more scoped PR is raised. |
Ticket(s) Closed
Sink
#57This PR is part of a pair; please consider both for review and merge.
#1703
paradedb/pg_analytics#131
What
This PR implements benchmarking functionality to analyze query performance under different caching conditions across various data sources supported by
pg_analytics
.Why
To evaluate how different cache configurations impact query performance, ensuring that the system optimally handles various data sources and caching scenarios.
How
The test function follows a structured flow:
criterion
framework.criterion
.The SQL command below is used to toggle Parquet metadata caching (In-memory):
Where
cache_setting
can be either "true" or "false", depending on the test scenario.Benchmarking
To run the benchmarking, use the following command:
cd ./cargo-paradedb RUST_LOG=info cargo run -- paradedb pga-bench parquet-run-all
Integration Notes
The diagram below outlines key components and their interactions, providing a high-level overview of the prototype design: