Run super command perf tests in vector runtime that couldn't before #5552

philrz · 2024-12-28T00:38:54Z

What's Changing

Query performance results are added to the super command doc for tests that couldn't be run previously. This is possible thanks to recently merged PRs that enabled new functionality in the vector runtime.

Why

This lets us replace some cells in the results summary that previously held disclaimers saying we couldn't run these tests yet.

Details

When making these updates I hoped to show vector CSUP performance with these queries as well, but I bumped into new issue #5550. Therefore I've left the relevant parts of the scripts commented out for now, but will revisit once #5550 is addressed.

philrz · 2024-12-28T00:44:05Z

docs/commands/super.md

+This code path in `super` is not multi-threaded so not particularly performant,
+but on our test machine it runs a bit faster than both the `duckdb` method of
+creating a schema-fused table or loading the data to the `clickhouse` beta JSON type.


Even though I used the same DuckDB release version and AWS instance type as the last time I ran these tests, DuckDB ran this "table creation" step substantially faster this time (328 seconds now vs. 513 seconds last time). I can't explain it offhand and I'm not inclined to take a detour to study it deeply, but in the interest of science I've corrected the summary here to reflect that. I'll keep an eye on it in future runs; maybe the operation just turns out to have high variance.
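
As a rough sanity check on the size of that difference, the two timings quoted above can be compared with a few lines of Python. This is only an illustrative sketch: the helper names `speedup` and `percent_reduction` are hypothetical and are not part of the perf scripts in this PR.

```python
def speedup(old_seconds: float, new_seconds: float) -> float:
    """Return how many times faster the new run is than the old run."""
    return old_seconds / new_seconds


def percent_reduction(old_seconds: float, new_seconds: float) -> float:
    """Return the drop in elapsed time as a percentage of the old run."""
    return 100.0 * (old_seconds - new_seconds) / old_seconds


# Timings for the DuckDB "table creation" step, taken from the comment above.
old_run = 513  # seconds, previous test run
new_run = 328  # seconds, this test run

print(f"{speedup(old_run, new_run):.2f}x faster")
print(f"{percent_reduction(old_run, new_run):.1f}% less elapsed time")
```

With these two numbers the step comes out at roughly 1.56x faster, or about 36% less elapsed time, which is large enough that it seemed worth correcting the summary rather than treating it as noise.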

philrz · 2024-12-28T00:46:38Z

docs/commands/super.md

 ClickHouse's beta JSON type_

 Since DuckDB with its native format could successfully run all queries with
 decent performance, we used it as the baseline for all of the speed-up factors.

 To summarize,
 `super` with Super Binary is substantially faster than multiple relational systems for
-the search use cases and performs on par with the others for traditional OLAP queries,
+the search use cases, and with Parquet performs on par with the others for traditional OLAP queries,


Especially now that we have across-the-board Parquet results, I figured I might as well clarify that those are the ones where `super` performance comes closest on OLAP queries. Looking forward to seeing CSUP show the best performance. 🤞

philrz added 4 commits December 27, 2024 15:41

Update super command perf scripts to include more vector tests

7f8e288

Update super command perf doc with new results

0e18582

Adjust summary of load times

dd35ab9

Don't clone this branch

7c324a1

philrz requested a review from a team December 28, 2024 00:38

philrz self-assigned this Dec 28, 2024

Add clickhouse-client raw results I missed

b189af0

philrz changed the title ~~Super cmd perf dec2024~~ Run super command perf tests in vector runtime that couldn't before Dec 28, 2024

philrz mentioned this pull request Dec 28, 2024

Vector CSUP search query consumes 75+ GB memory #5550

Open

mattnibs approved these changes Jan 6, 2025

mccanne approved these changes Jan 7, 2025

philrz merged commit a070a07 into main Jan 7, 2025
5 checks passed

philrz deleted the super-cmd-perf-dec2024 branch January 7, 2025 20:04
