(Slightly) Slower reported performance on ClickBench benchmarks in DataFusion 34.0.0 than DataFusion 33.0.0 #8836

Open
alamb opened this issue Jan 11, 2024 · 3 comments
Labels
bug (Something isn't working), performance (Make DataFusion faster)

Comments

@alamb
Contributor

alamb commented Jan 11, 2024

Describe the bug

As part of #8789, @kmitchener ran the ClickBench benchmarks using DataFusion 34.0.0. Compared to DataFusion 33.0.0, the queries appear to run slightly slower.

I would like to know why the benchmark shows 34.0.0 running slightly slower.

To Reproduce

He ran the v33 benchmarks on the same instance and modified the benchmark page to display both 33 and 34 at the same time, so the runs can be compared:
image

You can grab that from https://github.com/kmitchener/ClickBench/blob/new-run-of-datafusion-33/index.html

Expected behavior

Each release should be as good as or better than the last.

Additional context

No response

alamb added the bug and performance labels Jan 11, 2024
@Dandandan
Contributor

I wonder if this is really slower or it is just noise.

Note that the benchmark runs on c6a.4xlarge and EBS (gp2), which contribute to variations in performance (i.e. load from other users).

@alamb
Contributor Author

alamb commented Jan 12, 2024

> I wonder if this is really slower or it is just noise.
>
> Note that the benchmark runs on c6a.4xlarge and EBS (gp2), which contribute to variations in performance (i.e. load from other users).

I wondered the same thing, but @kmitchener seems to have been able to reproduce the difference reliably: #8789 (comment) 🤔
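
For anyone who wants to check the noise question on their own runs, here is a rough sketch of one way to do it. It is not part of ClickBench or DataFusion: the function names are hypothetical, and the rule (median change must exceed the run-to-run spread) is a simple heuristic rather than a proper statistical test. It assumes you have several timings, in seconds, for the same query on each version.

```python
from statistics import median


def relative_change(old_runs, new_runs):
    """Relative change of the median runtime, new vs. old (positive = slower)."""
    old_med = median(old_runs)
    new_med = median(new_runs)
    return (new_med - old_med) / old_med


def spread(runs):
    """Run-to-run spread of one version, as (max - min) / median."""
    return (max(runs) - min(runs)) / median(runs)


def looks_like_regression(old_runs, new_runs):
    """Heuristic (an assumption, not a statistical test): call it a
    regression only if the median slowdown is larger than the worst
    run-to-run spread seen within either version."""
    change = relative_change(old_runs, new_runs)
    noise = max(spread(old_runs), spread(new_runs))
    return change > noise
```

Applied per query, this would flag only the queries whose slowdown is larger than the variation between repeated runs on the same instance, which is the situation a reliably reproducible difference suggests.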

@alamb
Contributor Author

alamb commented Mar 9, 2024

An update: we see the same small slowdown in version 36.

I was thinking perhaps it could be due to the overhead of reading/parsing per-file metadata. More details here: #9404 (comment)
