-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: explain with statistics #7459
Conversation
77741ab
to
66a417c
Compare
--TableScan: simple_explain_test projection=[a, b, c], fetch=10 | ||
physical_plan | ||
GlobalLimitExec: skip=0, fetch=10, statistics=[rows=10, bytes=None, exact=false] | ||
--CsvExec: file_groups={1 group: [[WORKSPACE_ROOT/datafusion/core/tests/data/example.csv]]}, projection=[a, b, c], limit=10, has_header=true, statistics=[] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it possible to add a test which prints non-empty statistics
? I think it will be more useful to see what it looks like and verify it is correct.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added test for parquet with collected statistics (however at this moment bytes
are not collected for parquet files -- only rowcounts are available). Also, statistics is defined for in-memory tables and VALUES
function output, but I believe their inner record batches could possibly have different memory size, depending on environment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like a nice improvement to me -- thank you @korowa
I agree it would be nice to include these statistics everywhere but that that would make the PR massive and distruptive
Perhaps we can file a follow on ticket to enable the setting by default?
Which issue does this PR close?
Closes #7254.
Rationale for this change
Show physical plan operators statistics in
explain [analyze] [verbose]
output.What changes are included in this PR?
DisplayableExecutionPlan
now is able to display statistics for both ident (default text) and graphviz formats -- behaviour is controlled byshow statistics
field, which is implicitly set in explain handler and explain analyze operatordatafusion.explain.show_statistics
setting, disabled by default -- controls display statistics behaviourNew setting is sort of questionable , but as for now, execution plan comparisons are widely used while testing across multiple DF packages, and possible alternatives could be to keep plans as they are with statistics (which could lead to frequent adjustments of expected plan values in case of any statistics-related modifications) or building common plan normalizer for replacing actual statistics with a placeholders (seems to be unnecessary dependency for tests). So, any suggestions are welcome here!
Are these changes tested?
Covered by sqllogictests for explain & unit tests for
explain analyze
and graphviz format.Are there any user-facing changes?
Availability of statistics in
explain
output