-
Notifications
You must be signed in to change notification settings - Fork 891
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support for partial aggregations at chunk level #5596
Support for partial aggregations at chunk level #5596
Conversation
be248e9
to
e1ea7ce
Compare
c835fa8
to
7bde4fb
Compare
7454af4
to
a1a17ef
Compare
8487617
to
ec264b3
Compare
3b63e30
to
7bd1a7c
Compare
Codecov Report
@@ Coverage Diff @@
## main #5596 +/- ##
==========================================
+ Coverage 81.44% 81.51% +0.07%
==========================================
Files 246 246
Lines 56197 56531 +334
Branches 12460 12516 +56
==========================================
+ Hits 45767 46080 +313
+ Misses 8097 8086 -11
- Partials 2333 2365 +32
... and 23 files with indirect coverage changes 📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more |
09d0bb4
to
a0db2f3
Compare
@erimatnor I addressed the comments/questions you added. Could you do a re-review of the PR? |
So far, we have created fake partitioning info for hypertables if the PostgreSQL setting 'enable_partitionwise_aggregate' is set. This causes PostgreSQL to push down partial aggregations to the chunk level. However, the PostgreSQL code has some drawbacks because the query is replanned and optimizations like ChunkAppend are lost. Since timescale#5596 we have implemented our own code to push down partial aggregations. Therefore, we can ignore the PostgreSQL setting from now on.
So far, we have created fake partitioning info for hypertables if the PostgreSQL setting 'enable_partitionwise_aggregate' is set. This causes PostgreSQL to push down partial aggregations to the chunk level. However, the PostgreSQL code has some drawbacks because the query is replanned and optimizations like ChunkAppend are lost. Since timescale#5596 we have implemented our own code to push down partial aggregations. Therefore, we can ignore the PostgreSQL setting from now on.
54e57ce
to
d77f95a
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Approving. The patch looks fine and seems to do the right thing based on test output.
However, there are some things I am not sure I understand still, which boils down to some opaqueness in the code I think. Some of the opaqueness is probably inherited from PostgreSQL, but I don't think that is a good reason why our code should also be hard to understand.
I put some discussion comments inline. I am mostly looking for clarification and validation that my intuition is correct and that the code does what I think it is doing. But ideally, I should not have to "guess" to understand the code.
* Get all subpaths from a Append, MergeAppend, or ChunkAppend path | ||
*/ | ||
static List * | ||
get_subpaths_from_append_path(Path *path, bool handle_gather_path) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a suggestion: Maybe it is more intuitive to understand this if it is generalized to something like a max_recurse
integer that you decrement every recursive call and stop recursion when hitting 0?
You might also want to expand the comment to explain special handling of GatherPath.
With regards to reason (1) above, I am wondering why there are gather nodes in non-partial paths? Or is this just a check for an error condition?
{ | ||
List *subpaths = get_subpaths_from_append_path(path, true); | ||
|
||
Ensure(subpaths != NIL, "Unable to determine aggregation type"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ensure(subpaths != NIL, "Unable to determine aggregation type"); | |
Ensure(subpaths != NIL, "unable to determine aggregation type"); |
if (!is_sorted) | ||
{ | ||
path = (Path *) create_sort_path(root, path->parent, path, root->group_pathkeys, -1.0); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I hear you and I've seen that too.
Sorry to belabor the point, however, but to clarify: I was not correct using the term "shadowing" since a new variable with same name is not defined, so I wasn't clear in my comment.
What I meant was that it might be easier to follow what the code does with aptly named variables instead of reusing the generically named path
variable. The intention of the code here is to ensure that the Path is returning ordered data, which can be achieved by the path itself or by adding a sort path on top. Reusing "path" makes it more opaque what path node we are dealing with and what the code did in the previous step.
Further below, and elsewhere in the patch, you don't reuse path
but instead use a new variable name to indicate similar intention, e.g., via a new sorted_agg_path
. I am simply arguing for the same intention-by-name convention here to more easily understand what is going on in the code. In particular, in big functions it becomes increasingly hard to understand what type of path a path
variable is at a specific line.
As a reviewer, I don't see (or even know) that this code originates from PostgreSQL, so I make suggestions I think will improve the understandability and maintainability of the code irrespective of code origin.
Thus, IMHO, let's not make PostgreSQL and (maybe) future copy-pasting an argument for not making changes to our code based on reviewer suggestions.
Code might change upstream as well, making future code adoption hard in other ways. Our outset should be that we own and understand what this code does, allowing us to make the changes we need.
|
||
if (sorted_subpaths != NIL) | ||
{ | ||
add_path(partially_grouped_rel, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure I follow. Doesn't all partial paths require finalization with a Gather node, regardless of whether they are parallel or not?
I am just trying to understand whether we are adding paths to the right path "list" as there are two "pathlists" in a RelOptInfo. One rel->pathlist
and one rel->partial_pathlist
. I don't recall the difference between these two so I am asking to ensure we are doing the right thing here.
src/planner/partialize.c
Outdated
} | ||
|
||
/* | ||
* Generate a total aggregation path for partial aggregations |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is a "total aggregation" path? Is it one that includes the Gather node to make it "total" or is it a non-partial path?
d_num_groups, | ||
extra_data); | ||
|
||
/* The same as above but for partial paths */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I find this confusing: We just generated push-down paths above, which in my understanding are also partial paths. (Please correct me if I am wrong.) But here it sounds like the partial paths are generated here, implying it was not done in the previous step.
I think what is happening is that we are conflating partial and parallel. (And maybe PostgreSQL does too.)
So, in my understanding, we can have partial paths that are both parallel and non-parallel. Thus:
- We use non-parallel partials to push down aggregates to individual chunk relations ("partitionwise" agg).
- We can also generate parallel execution paths, which also require partials since they execute in different workers ("parallel" agg).
Thus, I think in the previous step we generated partial push-down paths that are non-parallel. Here we generate partial paths to be executed in parallel (given the consider_parallel
check).
Unless I am wrong, I think PostgreSQL uses rel->pathlist
to store non-parallel paths and rel->partial_pathlist
to store parallel paths although both can contain partials if we do partitionwise aggregation (which makes the naming super-confusing). I guess PostgreSQL's naming convention here is confusing and legacy and probably a result of initially only using partial paths in the parallel case.
Currently, I think it is hard to understand the difference between what generate_agg_pushdown_path
does above compared to generate_partial_agg_pushdown_path
below. And I am not even sure my understanding is correct here. Is there a way to clarify in function names and comments?
I am wondering if it would not be more clear to say that above we generate "partitionwise" paths, while here we generate the corresponding "parallel" paths?
cf24afe
to
2608e04
Compare
Improved code comments and discussed the open questions with @erimatnor. We agreed that the PR could be merged now. |
This patch adds support for partial aggregations at the chunk level. The aggregation is replanned in the create_upper_paths_hook of PostgreSQL. The AggPath is split up into multiple AGGSPLIT_INITIAL_SERIAL operations (one on top of each chunk), which create partials, and one AGGSPLIT_FINAL_DESERIAL operation, which finalizes the aggregation.
2608e04
to
844412c
Compare
So far, we have created fake partitioning info for hypertables if the PostgreSQL setting 'enable_partitionwise_aggregate' is set. This causes PostgreSQL to push down partial aggregations to the chunk level. However, the PostgreSQL code has some drawbacks because the query is replanned and optimizations like ChunkAppend are lost. Since timescale#5596 we have implemented our own code to push down partial aggregations. Therefore, we can ignore the PostgreSQL setting from now on.
Please look at the following query, that triggers an assertion failure for me since ba9b818: CREATE TABLE t(time timestamptz NOT NULL, device_id int); ===
|
So far, we have created fake partitioning info for hypertables if the PostgreSQL setting 'enable_partitionwise_aggregate' is set. This causes PostgreSQL to push down partial aggregations to the chunk level. However, the PostgreSQL code has some drawbacks because the query is replanned and optimizations like ChunkAppend are lost. Since timescale#5596 we have implemented our own code to push down partial aggregations. Therefore, we can ignore the PostgreSQL setting from now on.
So far, we have created fake partitioning info for hypertables if the PostgreSQL setting 'enable_partitionwise_aggregate' is set. This causes PostgreSQL to push down partial aggregations to the chunk level. However, the PostgreSQL code has some drawbacks because the query is replanned and optimizations like ChunkAppend are lost. Since timescale#5596 we have implemented our own code to push down partial aggregations. Therefore, we can ignore the PostgreSQL setting from now on.
So far, we have created fake partitioning info for hypertables if the PostgreSQL setting 'enable_partitionwise_aggregate' is set. This causes PostgreSQL to push down partial aggregations to the chunk level. However, the PostgreSQL code has some drawbacks because the query is replanned and optimizations like ChunkAppend are lost. Since #5596 we have implemented our own code to push down partial aggregations. Therefore, we can ignore the PostgreSQL setting from now on.
This release contains performance improvements for compressed hypertables and continuous aggregates and bug fixes since the 2.11.2 release. We recommend that you upgrade at the next available opportunity. This release moves all internal functions from the _timescaleb_internal schema into the _timescaledb_functions schema. This separates code from internal data objects and improves security by allowing more restrictive permissions for the code schema. If you are calling any of those internal functions you should adjust your code as soon as possible. This version also includes a compatibility layer that allows calling them in the old location but that layer will be removed in 2.14.0. **PostgreSQL 12 support removal announcement** Following the deprecation announcement for PostgreSQL 12 in TimescaleDB 2.10, PostgreSQL 12 is not supported starting with TimescaleDB 2.12. Currently supported PostgreSQL major versions are 13, 14 and 15. PostgreSQL 16 support will be added with a following TimescaleDB release. **Features** * timescale#5137 Insert into index during chunk compression * timescale#5150 MERGE support on hypertables * timescale#5515 Make hypertables support replica identity * timescale#5586 Index scan support during UPDATE/DELETE on compressed hypertables * timescale#5596 Support for partial aggregations at chunk level * timescale#5599 Enable ChunkAppend for partially compressed chunks * timescale#5655 Improve the number of parallel workers for decompression * timescale#5758 Enable altering job schedule type through `alter_job` * timescale#5805 Make logrepl markers for (partial) decompressions * timescale#5809 Relax invalidation threshold table-level lock to row-level when refreshing a Continuous Aggregate * timescale#5839 Support CAgg names in chunk_detailed_size * timescale#5852 Make set_chunk_time_interval CAggs aware * timescale#5868 Allow ALTER TABLE ... REPLICA IDENTITY (FULL|INDEX) on materialized hypertables (continuous aggregates) * timescale#5875 Add job exit status and runtime to log * timescale#5909 CREATE INDEX ONLY ON hypertable creates index on chunks **Bugfixes** * timescale#5860 Fix interval calculation for hierarchical CAggs * timescale#5894 Check unique indexes when enabling compression * timescale#5951 _timescaledb_internal.create_compressed_chunk doesn't account for existing uncompressed rows * timescale#5988 Move functions to _timescaledb_functions schema * timescale#5788 Chunk_create must add an existing table or fail * timescale#5872 Fix duplicates on partially compressed chunk reads * timescale#5918 Fix crash in COPY from program returning error * timescale#5990 Place data in first/last function in correct mctx * timescale#5991 Call eq_func correctly in time_bucket_gapfill * timescale#6015 Correct row count in EXPLAIN ANALYZE INSERT .. ON CONFLICT output * timescale#6035 Fix server crash on UPDATE of compressed chunk * timescale#6044 Fix server crash when using duplicate segmentby column * timescale#6045 Fix segfault in set_integer_now_func * timescale#6053 Fix approximate_row_count for CAggs * timescale#6081 Improve compressed DML datatype handling * timescale#6084 Propagate parameter changes to decompress child nodes **Thanks** * @ajcanterbury for reporting a problem with lateral joins on compressed chunks * @alexanderlaw for reporting multiple server crashes * @lukaskirner for reporting a bug with monthly continuous aggregates * @mrksngl for reporting a bug with unusual user names * @willsbit for reporting a crash in time_bucket_gapfill
This release contains performance improvements for compressed hypertables and continuous aggregates and bug fixes since the 2.11.2 release. We recommend that you upgrade at the next available opportunity. This release moves all internal functions from the _timescaleb_internal schema into the _timescaledb_functions schema. This separates code from internal data objects and improves security by allowing more restrictive permissions for the code schema. If you are calling any of those internal functions you should adjust your code as soon as possible. This version also includes a compatibility layer that allows calling them in the old location but that layer will be removed in 2.14.0. **PostgreSQL 12 support removal announcement** Following the deprecation announcement for PostgreSQL 12 in TimescaleDB 2.10, PostgreSQL 12 is not supported starting with TimescaleDB 2.12. Currently supported PostgreSQL major versions are 13, 14 and 15. PostgreSQL 16 support will be added with a following TimescaleDB release. **Features** * #5137 Insert into index during chunk compression * #5150 MERGE support on hypertables * #5515 Make hypertables support replica identity * #5586 Index scan support during UPDATE/DELETE on compressed hypertables * #5596 Support for partial aggregations at chunk level * #5599 Enable ChunkAppend for partially compressed chunks * #5655 Improve the number of parallel workers for decompression * #5758 Enable altering job schedule type through `alter_job` * #5805 Make logrepl markers for (partial) decompressions * #5809 Relax invalidation threshold table-level lock to row-level when refreshing a Continuous Aggregate * #5839 Support CAgg names in chunk_detailed_size * #5852 Make set_chunk_time_interval CAggs aware * #5868 Allow ALTER TABLE ... REPLICA IDENTITY (FULL|INDEX) on materialized hypertables (continuous aggregates) * #5875 Add job exit status and runtime to log * #5909 CREATE INDEX ONLY ON hypertable creates index on chunks **Bugfixes** * #5860 Fix interval calculation for hierarchical CAggs * #5894 Check unique indexes when enabling compression * #5951 _timescaledb_internal.create_compressed_chunk doesn't account for existing uncompressed rows * #5988 Move functions to _timescaledb_functions schema * #5788 Chunk_create must add an existing table or fail * #5872 Fix duplicates on partially compressed chunk reads * #5918 Fix crash in COPY from program returning error * #5990 Place data in first/last function in correct mctx * #5991 Call eq_func correctly in time_bucket_gapfill * #6015 Correct row count in EXPLAIN ANALYZE INSERT .. ON CONFLICT output * #6035 Fix server crash on UPDATE of compressed chunk * #6044 Fix server crash when using duplicate segmentby column * #6045 Fix segfault in set_integer_now_func * #6053 Fix approximate_row_count for CAggs * #6081 Improve compressed DML datatype handling * #6084 Propagate parameter changes to decompress child nodes **Thanks** * @ajcanterbury for reporting a problem with lateral joins on compressed chunks * @alexanderlaw for reporting multiple server crashes * @lukaskirner for reporting a bug with monthly continuous aggregates * @mrksngl for reporting a bug with unusual user names * @willsbit for reporting a crash in time_bucket_gapfill
This release contains performance improvements for compressed hypertables and continuous aggregates and bug fixes since the 2.11.2 release. We recommend that you upgrade at the next available opportunity. This release moves all internal functions from the _timescaleb_internal schema into the _timescaledb_functions schema. This separates code from internal data objects and improves security by allowing more restrictive permissions for the code schema. If you are calling any of those internal functions you should adjust your code as soon as possible. This version also includes a compatibility layer that allows calling them in the old location but that layer will be removed in 2.14.0. **PostgreSQL 12 support removal announcement** Following the deprecation announcement for PostgreSQL 12 in TimescaleDB 2.10, PostgreSQL 12 is not supported starting with TimescaleDB 2.12. Currently supported PostgreSQL major versions are 13, 14 and 15. PostgreSQL 16 support will be added with a following TimescaleDB release. **Features** * #5137 Insert into index during chunk compression * #5150 MERGE support on hypertables * #5515 Make hypertables support replica identity * #5586 Index scan support during UPDATE/DELETE on compressed hypertables * #5596 Support for partial aggregations at chunk level * #5599 Enable ChunkAppend for partially compressed chunks * #5655 Improve the number of parallel workers for decompression * #5758 Enable altering job schedule type through `alter_job` * #5805 Make logrepl markers for (partial) decompressions * #5809 Relax invalidation threshold table-level lock to row-level when refreshing a Continuous Aggregate * #5839 Support CAgg names in chunk_detailed_size * #5852 Make set_chunk_time_interval CAggs aware * #5868 Allow ALTER TABLE ... REPLICA IDENTITY (FULL|INDEX) on materialized hypertables (continuous aggregates) * #5875 Add job exit status and runtime to log * #5909 CREATE INDEX ONLY ON hypertable creates index on chunks **Bugfixes** * #5860 Fix interval calculation for hierarchical CAggs * #5894 Check unique indexes when enabling compression * #5951 _timescaledb_internal.create_compressed_chunk doesn't account for existing uncompressed rows * #5988 Move functions to _timescaledb_functions schema * #5788 Chunk_create must add an existing table or fail * #5872 Fix duplicates on partially compressed chunk reads * #5918 Fix crash in COPY from program returning error * #5990 Place data in first/last function in correct mctx * #5991 Call eq_func correctly in time_bucket_gapfill * #6015 Correct row count in EXPLAIN ANALYZE INSERT .. ON CONFLICT output * #6035 Fix server crash on UPDATE of compressed chunk * #6044 Fix server crash when using duplicate segmentby column * #6045 Fix segfault in set_integer_now_func * #6053 Fix approximate_row_count for CAggs * #6081 Improve compressed DML datatype handling * #6084 Propagate parameter changes to decompress child nodes **Thanks** * @ajcanterbury for reporting a problem with lateral joins on compressed chunks * @alexanderlaw for reporting multiple server crashes * @lukaskirner for reporting a bug with monthly continuous aggregates * @mrksngl for reporting a bug with unusual user names * @willsbit for reporting a crash in time_bucket_gapfill
This release contains performance improvements for compressed hypertables and continuous aggregates and bug fixes since the 2.11.2 release. We recommend that you upgrade at the next available opportunity. This release moves all internal functions from the _timescaleb_internal schema into the _timescaledb_functions schema. This separates code from internal data objects and improves security by allowing more restrictive permissions for the code schema. If you are calling any of those internal functions you should adjust your code as soon as possible. This version also includes a compatibility layer that allows calling them in the old location but that layer will be removed in 2.14.0. **PostgreSQL 12 support removal announcement** Following the deprecation announcement for PostgreSQL 12 in TimescaleDB 2.10, PostgreSQL 12 is not supported starting with TimescaleDB 2.12. Currently supported PostgreSQL major versions are 13, 14 and 15. PostgreSQL 16 support will be added with a following TimescaleDB release. **Features** * #5137 Insert into index during chunk compression * #5150 MERGE support on hypertables * #5515 Make hypertables support replica identity * #5586 Index scan support during UPDATE/DELETE on compressed hypertables * #5596 Support for partial aggregations at chunk level * #5599 Enable ChunkAppend for partially compressed chunks * #5655 Improve the number of parallel workers for decompression * #5758 Enable altering job schedule type through `alter_job` * #5805 Make logrepl markers for (partial) decompressions * #5809 Relax invalidation threshold table-level lock to row-level when refreshing a Continuous Aggregate * #5839 Support CAgg names in chunk_detailed_size * #5852 Make set_chunk_time_interval CAggs aware * #5868 Allow ALTER TABLE ... REPLICA IDENTITY (FULL|INDEX) on materialized hypertables (continuous aggregates) * #5875 Add job exit status and runtime to log * #5909 CREATE INDEX ONLY ON hypertable creates index on chunks **Bugfixes** * #5860 Fix interval calculation for hierarchical CAggs * #5894 Check unique indexes when enabling compression * #5951 _timescaledb_internal.create_compressed_chunk doesn't account for existing uncompressed rows * #5988 Move functions to _timescaledb_functions schema * #5788 Chunk_create must add an existing table or fail * #5872 Fix duplicates on partially compressed chunk reads * #5918 Fix crash in COPY from program returning error * #5990 Place data in first/last function in correct mctx * #5991 Call eq_func correctly in time_bucket_gapfill * #6015 Correct row count in EXPLAIN ANALYZE INSERT .. ON CONFLICT output * #6035 Fix server crash on UPDATE of compressed chunk * #6044 Fix server crash when using duplicate segmentby column * #6045 Fix segfault in set_integer_now_func * #6053 Fix approximate_row_count for CAggs * #6081 Improve compressed DML datatype handling * #6084 Propagate parameter changes to decompress child nodes **Thanks** * @ajcanterbury for reporting a problem with lateral joins on compressed chunks * @alexanderlaw for reporting multiple server crashes * @lukaskirner for reporting a bug with monthly continuous aggregates * @mrksngl for reporting a bug with unusual user names * @willsbit for reporting a crash in time_bucket_gapfill
This release contains performance improvements for compressed hypertables and continuous aggregates and bug fixes since the 2.11.2 release. We recommend that you upgrade at the next available opportunity. This release moves all internal functions from the _timescaleb_internal schema into the _timescaledb_functions schema. This separates code from internal data objects and improves security by allowing more restrictive permissions for the code schema. If you are calling any of those internal functions you should adjust your code as soon as possible. This version also includes a compatibility layer that allows calling them in the old location but that layer will be removed in 2.14.0. **PostgreSQL 12 support removal announcement** Following the deprecation announcement for PostgreSQL 12 in TimescaleDB 2.10, PostgreSQL 12 is not supported starting with TimescaleDB 2.12. Currently supported PostgreSQL major versions are 13, 14 and 15. PostgreSQL 16 support will be added with a following TimescaleDB release. **Features** * #5137 Insert into index during chunk compression * #5150 MERGE support on hypertables * #5515 Make hypertables support replica identity * #5586 Index scan support during UPDATE/DELETE on compressed hypertables * #5596 Support for partial aggregations at chunk level * #5599 Enable ChunkAppend for partially compressed chunks * #5655 Improve the number of parallel workers for decompression * #5758 Enable altering job schedule type through `alter_job` * #5805 Make logrepl markers for (partial) decompressions * #5809 Relax invalidation threshold table-level lock to row-level when refreshing a Continuous Aggregate * #5839 Support CAgg names in chunk_detailed_size * #5852 Make set_chunk_time_interval CAggs aware * #5868 Allow ALTER TABLE ... REPLICA IDENTITY (FULL|INDEX) on materialized hypertables (continuous aggregates) * #5875 Add job exit status and runtime to log * #5909 CREATE INDEX ONLY ON hypertable creates index on chunks **Bugfixes** * #5860 Fix interval calculation for hierarchical CAggs * #5894 Check unique indexes when enabling compression * #5951 _timescaledb_internal.create_compressed_chunk doesn't account for existing uncompressed rows * #5988 Move functions to _timescaledb_functions schema * #5788 Chunk_create must add an existing table or fail * #5872 Fix duplicates on partially compressed chunk reads * #5918 Fix crash in COPY from program returning error * #5990 Place data in first/last function in correct mctx * #5991 Call eq_func correctly in time_bucket_gapfill * #6015 Correct row count in EXPLAIN ANALYZE INSERT .. ON CONFLICT output * #6035 Fix server crash on UPDATE of compressed chunk * #6044 Fix server crash when using duplicate segmentby column * #6045 Fix segfault in set_integer_now_func * #6053 Fix approximate_row_count for CAggs * #6081 Improve compressed DML datatype handling * #6084 Propagate parameter changes to decompress child nodes * #6102 Schedule compression policy more often **Thanks** * @ajcanterbury for reporting a problem with lateral joins on compressed chunks * @alexanderlaw for reporting multiple server crashes * @lukaskirner for reporting a bug with monthly continuous aggregates * @mrksngl for reporting a bug with unusual user names * @willsbit for reporting a crash in time_bucket_gapfill
This release contains performance improvements for compressed hypertables and continuous aggregates and bug fixes since the 2.11.2 release. We recommend that you upgrade at the next available opportunity. This release moves all internal functions from the _timescaleb_internal schema into the _timescaledb_functions schema. This separates code from internal data objects and improves security by allowing more restrictive permissions for the code schema. If you are calling any of those internal functions you should adjust your code as soon as possible. This version also includes a compatibility layer that allows calling them in the old location but that layer will be removed in 2.14.0. **PostgreSQL 12 support removal announcement** Following the deprecation announcement for PostgreSQL 12 in TimescaleDB 2.10, PostgreSQL 12 is not supported starting with TimescaleDB 2.12. Currently supported PostgreSQL major versions are 13, 14 and 15. PostgreSQL 16 support will be added with a following TimescaleDB release. **Features** * #5137 Insert into index during chunk compression * #5150 MERGE support on hypertables * #5515 Make hypertables support replica identity * #5586 Index scan support during UPDATE/DELETE on compressed hypertables * #5596 Support for partial aggregations at chunk level * #5599 Enable ChunkAppend for partially compressed chunks * #5655 Improve the number of parallel workers for decompression * #5758 Enable altering job schedule type through `alter_job` * #5805 Make logrepl markers for (partial) decompressions * #5809 Relax invalidation threshold table-level lock to row-level when refreshing a Continuous Aggregate * #5839 Support CAgg names in chunk_detailed_size * #5852 Make set_chunk_time_interval CAggs aware * #5868 Allow ALTER TABLE ... REPLICA IDENTITY (FULL|INDEX) on materialized hypertables (continuous aggregates) * #5875 Add job exit status and runtime to log * #5909 CREATE INDEX ONLY ON hypertable creates index on chunks **Bugfixes** * #5860 Fix interval calculation for hierarchical CAggs * #5894 Check unique indexes when enabling compression * #5951 _timescaledb_internal.create_compressed_chunk doesn't account for existing uncompressed rows * #5988 Move functions to _timescaledb_functions schema * #5788 Chunk_create must add an existing table or fail * #5872 Fix duplicates on partially compressed chunk reads * #5918 Fix crash in COPY from program returning error * #5990 Place data in first/last function in correct mctx * #5991 Call eq_func correctly in time_bucket_gapfill * #6015 Correct row count in EXPLAIN ANALYZE INSERT .. ON CONFLICT output * #6035 Fix server crash on UPDATE of compressed chunk * #6044 Fix server crash when using duplicate segmentby column * #6045 Fix segfault in set_integer_now_func * #6053 Fix approximate_row_count for CAggs * #6081 Improve compressed DML datatype handling * #6084 Propagate parameter changes to decompress child nodes * #6102 Schedule compression policy more often **Thanks** * @ajcanterbury for reporting a problem with lateral joins on compressed chunks * @alexanderlaw for reporting multiple server crashes * @lukaskirner for reporting a bug with monthly continuous aggregates * @mrksngl for reporting a bug with unusual user names * @willsbit for reporting a crash in time_bucket_gapfill
This patch adds support for partial aggregations at the chunk level. The aggregation is replanned in the
create_upper_paths_hook
of PostgreSQL. The AggPath is split up into multipleAGGSPLIT_INITIAL_SERIAL
operations (one on top of each chunk), which create partials, and oneAGGSPLIT_FINAL_DESERIAL
operation, which finalizes the aggregation.Benchmark
https://grafana.ops.savannah-dev.timescale.com/d/NdmLnOk4z/compare-benchmark-runs?orgId=1&var-branch=main&var-run1=2699&var-run2=2700&var-threshold=0.02
Speedup of the aggregations ~ 10-15%. The result shows one regression, but the regression is flaky and previous runs show the same execution time.