Collect relation-level stats during compression #7520

Draft: wants to merge 1 commit into main from compression-minmax-columnstats

erimatnor (Contributor) commented Dec 5, 2024

During compression, column min/max stats are collected on a per-segment basis for orderby columns and for columns that have indexes.

This change reuses the same mechanism to also collect relation-level min/max stats, which are used by chunk skipping. In the worst case, this avoids an extra full table scan to gather these chunk column stats.
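To illustrate why no extra scan is needed, here is a minimal sketch, assuming a single int64 column whose rows are already grouped into segments. It derives the relation-level range by folding each finished segment's min/max into a running total. MinMaxAcc, minmax_update, minmax_merge, and collect_stats are hypothetical names for illustration, not TimescaleDB's actual segment_meta or chunk_column_stats code, and the real change may update the relation-level stats per value rather than per segment.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical running min/max for one int64 column (not TimescaleDB's API). */
typedef struct MinMaxAcc
{
	bool has_value; /* false until the first value is seen */
	int64_t min;
	int64_t max;
} MinMaxAcc;

static void
minmax_reset(MinMaxAcc *acc)
{
	acc->has_value = false;
	acc->min = 0;
	acc->max = 0;
}

/* Costs at most two value comparisons per call. */
static void
minmax_update(MinMaxAcc *acc, int64_t value)
{
	if (!acc->has_value)
	{
		acc->min = value;
		acc->max = value;
		acc->has_value = true;
		return;
	}
	if (value < acc->min)
		acc->min = value;
	else if (value > acc->max) /* min <= max, so both cannot hold */
		acc->max = value;
}

/* Fold a finished segment's range into the relation-level range. */
static void
minmax_merge(MinMaxAcc *into, const MinMaxAcc *from)
{
	if (!from->has_value)
		return; /* an empty segment contributes nothing */
	minmax_update(into, from->min);
	minmax_update(into, from->max);
}

/*
 * Hypothetical driver: segment_ends[s] is the exclusive end index of
 * segment s in values[]. Fills segment_stats[s] for every segment and
 * returns the relation-level stats, all in a single pass over values[].
 */
static MinMaxAcc
collect_stats(const int64_t *values, const size_t *segment_ends,
			  size_t nsegments, MinMaxAcc *segment_stats)
{
	MinMaxAcc relation;
	size_t start = 0;

	minmax_reset(&relation);
	for (size_t s = 0; s < nsegments; s++)
	{
		assert(segment_ends[s] >= start);
		minmax_reset(&segment_stats[s]);
		for (size_t i = start; i < segment_ends[s]; i++)
			minmax_update(&segment_stats[s], values[i]);
		/* Relation-level stats piggyback on the per-segment ones. */
		minmax_merge(&relation, &segment_stats[s]);
		start = segment_ends[s];
	}
	return relation;
}
```

For example, two segments {5, -3, 9} and {42, 7, 0} yield segment ranges [-3, 9] and [0, 42] and a relation-level range of [-3, 42], without re-reading any rows.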

For simplicity, stats gathering is enabled for all columns that support it, even if a column uses neither segment-level stats nor relation-level (chunk column) stats. The overhead of collecting min/max values should be negligible, since maintaining a running min and max costs at most two comparisons per value.
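On the consumer side, chunk skipping uses a chunk's relation-level range to decide whether the chunk can possibly match a predicate. A minimal sketch, assuming an inclusive range predicate lo <= col <= hi on an int64 column; ChunkRange and chunk_can_skip are hypothetical names, not TimescaleDB's planner code. The check is plain interval-overlap logic: the chunk can be skipped only when its [min, max] range lies entirely outside [lo, hi].

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical relation-level (chunk column) range for one column. */
typedef struct ChunkRange
{
	bool valid; /* false if stats were never collected for this chunk */
	int64_t min;
	int64_t max;
} ChunkRange;

/*
 * Returns true if no row in the chunk can satisfy lo <= col <= hi,
 * i.e. the chunk's [min, max] range does not overlap [lo, hi].
 */
static bool
chunk_can_skip(const ChunkRange *range, int64_t lo, int64_t hi)
{
	assert(lo <= hi);
	if (!range->valid)
		return false; /* no stats: the chunk must be scanned */
	return range->max < lo || range->min > hi;
}
```

A chunk whose range is [10, 20] is skipped for col BETWEEN 21 AND 30 but must be scanned for col BETWEEN 15 AND 25. Treating missing stats as "cannot skip" keeps the check conservative.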

Disable-check: force-changelog-file

erimatnor force-pushed the compression-minmax-columnstats branch from 0200340 to 1111d62 on December 5, 2024 at 16:54
erimatnor force-pushed the compression-minmax-columnstats branch 3 times, most recently from d31700e to 6e50217 on December 5, 2024 at 16:59
erimatnor force-pushed the compression-minmax-columnstats branch from 6e50217 to 25d61ff on December 5, 2024 at 17:01

codecov bot commented Dec 5, 2024

Codecov Report

Attention: Patch coverage is 89.36170% with 10 lines in your changes missing coverage. Please review.

Project coverage is 82.17%. Comparing base (59f50f2) to head (25d61ff).
Report is 641 commits behind head on main.

Files with missing lines               Patch %   Lines
src/ts_catalog/chunk_column_stats.c    66.66%    1 missing, 3 partials
tsl/src/compression/compression.c      87.50%    0 missing, 4 partials
tsl/src/compression/api.c              75.00%    0 missing, 1 partial
tsl/src/compression/segment_meta.c     97.82%    0 missing, 1 partial
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #7520      +/-   ##
==========================================
+ Coverage   80.06%   82.17%   +2.11%     
==========================================
  Files         190      230      +40     
  Lines       37181    43183    +6002     
  Branches     9450    10854    +1404     
==========================================
+ Hits        29770    35487    +5717     
- Misses       2997     3369     +372     
+ Partials     4414     4327      -87     

