Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve compression datatype handling #6081

Merged
merged 1 commit into from
Sep 18, 2023

Conversation

svenklemm
Copy link
Member

@svenklemm svenklemm commented Sep 16, 2023

Fall back to btree operator input type when it is binary compatible with the column type and no operator for column type could be found. This should improve performance when using column types like char or varchar instead of text.

Fixes: #6076

@github-actions
Copy link

@konskov, @mkindahl: please review this pull request.

Powered by pull-review

@codecov
Copy link

codecov bot commented Sep 16, 2023

Codecov Report

Merging #6081 (24edddf) into main (24e4972) will increase coverage by 7.46%.
The diff coverage is 25.00%.

@@            Coverage Diff             @@
##             main    #6081      +/-   ##
==========================================
+ Coverage   74.03%   81.49%   +7.46%     
==========================================
  Files         246      246              
  Lines       49808    56665    +6857     
  Branches    12511    12552      +41     
==========================================
+ Hits        36874    46179    +9305     
- Misses       7100     8082     +982     
+ Partials     5834     2404    -3430     
Files Changed Coverage Δ
tsl/src/compression/compression.c 91.45% <25.00%> (+7.69%) ⬆️

... and 225 files with indirect coverage changes

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

@svenklemm svenklemm force-pushed the compression_scankey_type branch from 5cef66f to f4237fe Compare September 16, 2023 11:31
@svenklemm svenklemm added this to the TimescaleDB 2.12 milestone Sep 16, 2023
time TIMESTAMPTZ NOT NULL,
source_id varchar(64) NOT NULL,
label varchar NOT NULL,
data jsonb
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit

Suggested change
data jsonb
data jsonb

*/
if (!OidIsValid(opr) && IsBinaryCoercible(atttypid, tce->btree_opintype))
{
opr =
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Codecov complains about missing coverage. However, the test case seems to cover this part of the code and removing these lines triggers a test failure. So, it seems to be tested.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Codecov has been all over the place lately, not sure whats going on with it. Reporting random lines being untested which are in the same branch where all other lines have been tested.

get_opfamily_member(tce->btree_opf, tce->btree_opintype, tce->btree_opintype, strategy);
}

/* No operator could be found so we can't create the scankey. */
if (!OidIsValid(opr))
return num_scankeys;
Copy link
Contributor

@jnidzwetzki jnidzwetzki Sep 17, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: Might be out of the scope of this PR. But I think we should also remove the following Assert in Line 1971/1982 (Assert(OidIsValid(opr));).

Again, we have an Assert followed by an if/else on the same condition. The assert prevents proper testing of this branch in a debug build.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense to me.

@svenklemm svenklemm force-pushed the compression_scankey_type branch 2 times, most recently from 2e794a8 to a7d742b Compare September 18, 2023 06:50
Fall back to btree operator input type when it is binary compatible with
the column type and no operator for column type could be found. This
should improve performance when using column types like char or varchar
instead of text.
@svenklemm svenklemm force-pushed the compression_scankey_type branch from a7d742b to 24edddf Compare September 18, 2023 08:36
@svenklemm svenklemm enabled auto-merge (rebase) September 18, 2023 08:42
@svenklemm svenklemm merged commit a7e7e67 into timescale:main Sep 18, 2023
33 of 35 checks passed
svenklemm added a commit to svenklemm/timescaledb that referenced this pull request Sep 20, 2023
This release contains performance improvements for compressed hypertables
and continuous aggregates and bug fixes since the 2.11.2 release.
We recommend that you upgrade at the next available opportunity.

This release moves all internal functions from the _timescaleb_internal
schema into the _timescaledb_functions schema. This separates code from
internal data objects and improves security by allowing more restrictive
permissions for the code schema. If you are calling any of those internal
functions you should adjust your code as soon as possible. This version
also includes a compatibility layer that allows calling them in the old
location but that layer will be removed in 2.14.0.

**PostgreSQL 12 support removal announcement**
Following the deprecation announcement for PostgreSQL 12 in TimescaleDB 2.10,
PostgreSQL 12 is not supported starting with TimescaleDB 2.12.
Currently supported PostgreSQL major versions are 13, 14 and 15.
PostgreSQL 16 support will be added with a following TimescaleDB release.

**Features**
* timescale#5137 Insert into index during chunk compression
* timescale#5150 MERGE support on hypertables
* timescale#5515 Make hypertables support replica identity
* timescale#5586 Index scan support during UPDATE/DELETE on compressed hypertables
* timescale#5596 Support for partial aggregations at chunk level
* timescale#5599 Enable ChunkAppend for partially compressed chunks
* timescale#5655 Improve the number of parallel workers for decompression
* timescale#5758 Enable altering job schedule type through `alter_job`
* timescale#5805 Make logrepl markers for (partial) decompressions
* timescale#5809 Relax invalidation threshold table-level lock to row-level when refreshing a Continuous Aggregate
* timescale#5839 Support CAgg names in chunk_detailed_size
* timescale#5852 Make set_chunk_time_interval CAggs aware
* timescale#5868 Allow ALTER TABLE ... REPLICA IDENTITY (FULL|INDEX) on materialized hypertables (continuous aggregates)
* timescale#5875 Add job exit status and runtime to log
* timescale#5909 CREATE INDEX ONLY ON hypertable creates index on chunks

**Bugfixes**
* timescale#5860 Fix interval calculation for hierarchical CAggs
* timescale#5894 Check unique indexes when enabling compression
* timescale#5951 _timescaledb_internal.create_compressed_chunk doesn't account for existing uncompressed rows
* timescale#5988 Move functions to _timescaledb_functions schema
* timescale#5788 Chunk_create must add an existing table or fail
* timescale#5872 Fix duplicates on partially compressed chunk reads
* timescale#5918 Fix crash in COPY from program returning error
* timescale#5990 Place data in first/last function in correct mctx
* timescale#5991 Call eq_func correctly in time_bucket_gapfill
* timescale#6015 Correct row count in EXPLAIN ANALYZE INSERT .. ON CONFLICT output
* timescale#6035 Fix server crash on UPDATE of compressed chunk
* timescale#6044 Fix server crash when using duplicate segmentby column
* timescale#6045 Fix segfault in set_integer_now_func
* timescale#6053 Fix approximate_row_count for CAggs
* timescale#6081 Improve compressed DML datatype handling
* timescale#6084 Propagate parameter changes to decompress child nodes

**Thanks**
* @ajcanterbury for reporting a problem with lateral joins on compressed chunks
* @alexanderlaw for reporting multiple server crashes
* @lukaskirner for reporting a bug with monthly continuous aggregates
* @mrksngl for reporting a bug with unusual user names
* @willsbit for reporting a crash in time_bucket_gapfill
@svenklemm svenklemm mentioned this pull request Sep 20, 2023
svenklemm added a commit that referenced this pull request Sep 20, 2023
This release contains performance improvements for compressed hypertables
and continuous aggregates and bug fixes since the 2.11.2 release.
We recommend that you upgrade at the next available opportunity.

This release moves all internal functions from the _timescaleb_internal
schema into the _timescaledb_functions schema. This separates code from
internal data objects and improves security by allowing more restrictive
permissions for the code schema. If you are calling any of those internal
functions you should adjust your code as soon as possible. This version
also includes a compatibility layer that allows calling them in the old
location but that layer will be removed in 2.14.0.

**PostgreSQL 12 support removal announcement**
Following the deprecation announcement for PostgreSQL 12 in TimescaleDB 2.10,
PostgreSQL 12 is not supported starting with TimescaleDB 2.12.
Currently supported PostgreSQL major versions are 13, 14 and 15.
PostgreSQL 16 support will be added with a following TimescaleDB release.

**Features**
* #5137 Insert into index during chunk compression
* #5150 MERGE support on hypertables
* #5515 Make hypertables support replica identity
* #5586 Index scan support during UPDATE/DELETE on compressed hypertables
* #5596 Support for partial aggregations at chunk level
* #5599 Enable ChunkAppend for partially compressed chunks
* #5655 Improve the number of parallel workers for decompression
* #5758 Enable altering job schedule type through `alter_job`
* #5805 Make logrepl markers for (partial) decompressions
* #5809 Relax invalidation threshold table-level lock to row-level when refreshing a Continuous Aggregate
* #5839 Support CAgg names in chunk_detailed_size
* #5852 Make set_chunk_time_interval CAggs aware
* #5868 Allow ALTER TABLE ... REPLICA IDENTITY (FULL|INDEX) on materialized hypertables (continuous aggregates)
* #5875 Add job exit status and runtime to log
* #5909 CREATE INDEX ONLY ON hypertable creates index on chunks

**Bugfixes**
* #5860 Fix interval calculation for hierarchical CAggs
* #5894 Check unique indexes when enabling compression
* #5951 _timescaledb_internal.create_compressed_chunk doesn't account for existing uncompressed rows
* #5988 Move functions to _timescaledb_functions schema
* #5788 Chunk_create must add an existing table or fail
* #5872 Fix duplicates on partially compressed chunk reads
* #5918 Fix crash in COPY from program returning error
* #5990 Place data in first/last function in correct mctx
* #5991 Call eq_func correctly in time_bucket_gapfill
* #6015 Correct row count in EXPLAIN ANALYZE INSERT .. ON CONFLICT output
* #6035 Fix server crash on UPDATE of compressed chunk
* #6044 Fix server crash when using duplicate segmentby column
* #6045 Fix segfault in set_integer_now_func
* #6053 Fix approximate_row_count for CAggs
* #6081 Improve compressed DML datatype handling
* #6084 Propagate parameter changes to decompress child nodes

**Thanks**
* @ajcanterbury for reporting a problem with lateral joins on compressed chunks
* @alexanderlaw for reporting multiple server crashes
* @lukaskirner for reporting a bug with monthly continuous aggregates
* @mrksngl for reporting a bug with unusual user names
* @willsbit for reporting a crash in time_bucket_gapfill
svenklemm added a commit that referenced this pull request Sep 20, 2023
This release contains performance improvements for compressed hypertables
and continuous aggregates and bug fixes since the 2.11.2 release.
We recommend that you upgrade at the next available opportunity.

This release moves all internal functions from the _timescaleb_internal
schema into the _timescaledb_functions schema. This separates code from
internal data objects and improves security by allowing more restrictive
permissions for the code schema. If you are calling any of those internal
functions you should adjust your code as soon as possible. This version
also includes a compatibility layer that allows calling them in the old
location but that layer will be removed in 2.14.0.

**PostgreSQL 12 support removal announcement**
Following the deprecation announcement for PostgreSQL 12 in TimescaleDB 2.10,
PostgreSQL 12 is not supported starting with TimescaleDB 2.12.
Currently supported PostgreSQL major versions are 13, 14 and 15.
PostgreSQL 16 support will be added with a following TimescaleDB release.

**Features**
* #5137 Insert into index during chunk compression
* #5150 MERGE support on hypertables
* #5515 Make hypertables support replica identity
* #5586 Index scan support during UPDATE/DELETE on compressed hypertables
* #5596 Support for partial aggregations at chunk level
* #5599 Enable ChunkAppend for partially compressed chunks
* #5655 Improve the number of parallel workers for decompression
* #5758 Enable altering job schedule type through `alter_job`
* #5805 Make logrepl markers for (partial) decompressions
* #5809 Relax invalidation threshold table-level lock to row-level when refreshing a Continuous Aggregate
* #5839 Support CAgg names in chunk_detailed_size
* #5852 Make set_chunk_time_interval CAggs aware
* #5868 Allow ALTER TABLE ... REPLICA IDENTITY (FULL|INDEX) on materialized hypertables (continuous aggregates)
* #5875 Add job exit status and runtime to log
* #5909 CREATE INDEX ONLY ON hypertable creates index on chunks

**Bugfixes**
* #5860 Fix interval calculation for hierarchical CAggs
* #5894 Check unique indexes when enabling compression
* #5951 _timescaledb_internal.create_compressed_chunk doesn't account for existing uncompressed rows
* #5988 Move functions to _timescaledb_functions schema
* #5788 Chunk_create must add an existing table or fail
* #5872 Fix duplicates on partially compressed chunk reads
* #5918 Fix crash in COPY from program returning error
* #5990 Place data in first/last function in correct mctx
* #5991 Call eq_func correctly in time_bucket_gapfill
* #6015 Correct row count in EXPLAIN ANALYZE INSERT .. ON CONFLICT output
* #6035 Fix server crash on UPDATE of compressed chunk
* #6044 Fix server crash when using duplicate segmentby column
* #6045 Fix segfault in set_integer_now_func
* #6053 Fix approximate_row_count for CAggs
* #6081 Improve compressed DML datatype handling
* #6084 Propagate parameter changes to decompress child nodes

**Thanks**
* @ajcanterbury for reporting a problem with lateral joins on compressed chunks
* @alexanderlaw for reporting multiple server crashes
* @lukaskirner for reporting a bug with monthly continuous aggregates
* @mrksngl for reporting a bug with unusual user names
* @willsbit for reporting a crash in time_bucket_gapfill
svenklemm added a commit that referenced this pull request Sep 20, 2023
This release contains performance improvements for compressed hypertables
and continuous aggregates and bug fixes since the 2.11.2 release.
We recommend that you upgrade at the next available opportunity.

This release moves all internal functions from the _timescaleb_internal
schema into the _timescaledb_functions schema. This separates code from
internal data objects and improves security by allowing more restrictive
permissions for the code schema. If you are calling any of those internal
functions you should adjust your code as soon as possible. This version
also includes a compatibility layer that allows calling them in the old
location but that layer will be removed in 2.14.0.

**PostgreSQL 12 support removal announcement**
Following the deprecation announcement for PostgreSQL 12 in TimescaleDB 2.10,
PostgreSQL 12 is not supported starting with TimescaleDB 2.12.
Currently supported PostgreSQL major versions are 13, 14 and 15.
PostgreSQL 16 support will be added with a following TimescaleDB release.

**Features**
* #5137 Insert into index during chunk compression
* #5150 MERGE support on hypertables
* #5515 Make hypertables support replica identity
* #5586 Index scan support during UPDATE/DELETE on compressed hypertables
* #5596 Support for partial aggregations at chunk level
* #5599 Enable ChunkAppend for partially compressed chunks
* #5655 Improve the number of parallel workers for decompression
* #5758 Enable altering job schedule type through `alter_job`
* #5805 Make logrepl markers for (partial) decompressions
* #5809 Relax invalidation threshold table-level lock to row-level when refreshing a Continuous Aggregate
* #5839 Support CAgg names in chunk_detailed_size
* #5852 Make set_chunk_time_interval CAggs aware
* #5868 Allow ALTER TABLE ... REPLICA IDENTITY (FULL|INDEX) on materialized hypertables (continuous aggregates)
* #5875 Add job exit status and runtime to log
* #5909 CREATE INDEX ONLY ON hypertable creates index on chunks

**Bugfixes**
* #5860 Fix interval calculation for hierarchical CAggs
* #5894 Check unique indexes when enabling compression
* #5951 _timescaledb_internal.create_compressed_chunk doesn't account for existing uncompressed rows
* #5988 Move functions to _timescaledb_functions schema
* #5788 Chunk_create must add an existing table or fail
* #5872 Fix duplicates on partially compressed chunk reads
* #5918 Fix crash in COPY from program returning error
* #5990 Place data in first/last function in correct mctx
* #5991 Call eq_func correctly in time_bucket_gapfill
* #6015 Correct row count in EXPLAIN ANALYZE INSERT .. ON CONFLICT output
* #6035 Fix server crash on UPDATE of compressed chunk
* #6044 Fix server crash when using duplicate segmentby column
* #6045 Fix segfault in set_integer_now_func
* #6053 Fix approximate_row_count for CAggs
* #6081 Improve compressed DML datatype handling
* #6084 Propagate parameter changes to decompress child nodes

**Thanks**
* @ajcanterbury for reporting a problem with lateral joins on compressed chunks
* @alexanderlaw for reporting multiple server crashes
* @lukaskirner for reporting a bug with monthly continuous aggregates
* @mrksngl for reporting a bug with unusual user names
* @willsbit for reporting a crash in time_bucket_gapfill
svenklemm added a commit that referenced this pull request Sep 25, 2023
This release contains performance improvements for compressed hypertables
and continuous aggregates and bug fixes since the 2.11.2 release.
We recommend that you upgrade at the next available opportunity.

This release moves all internal functions from the _timescaleb_internal
schema into the _timescaledb_functions schema. This separates code from
internal data objects and improves security by allowing more restrictive
permissions for the code schema. If you are calling any of those internal
functions you should adjust your code as soon as possible. This version
also includes a compatibility layer that allows calling them in the old
location but that layer will be removed in 2.14.0.

**PostgreSQL 12 support removal announcement**
Following the deprecation announcement for PostgreSQL 12 in TimescaleDB 2.10,
PostgreSQL 12 is not supported starting with TimescaleDB 2.12.
Currently supported PostgreSQL major versions are 13, 14 and 15.
PostgreSQL 16 support will be added with a following TimescaleDB release.

**Features**
* #5137 Insert into index during chunk compression
* #5150 MERGE support on hypertables
* #5515 Make hypertables support replica identity
* #5586 Index scan support during UPDATE/DELETE on compressed hypertables
* #5596 Support for partial aggregations at chunk level
* #5599 Enable ChunkAppend for partially compressed chunks
* #5655 Improve the number of parallel workers for decompression
* #5758 Enable altering job schedule type through `alter_job`
* #5805 Make logrepl markers for (partial) decompressions
* #5809 Relax invalidation threshold table-level lock to row-level when refreshing a Continuous Aggregate
* #5839 Support CAgg names in chunk_detailed_size
* #5852 Make set_chunk_time_interval CAggs aware
* #5868 Allow ALTER TABLE ... REPLICA IDENTITY (FULL|INDEX) on materialized hypertables (continuous aggregates)
* #5875 Add job exit status and runtime to log
* #5909 CREATE INDEX ONLY ON hypertable creates index on chunks

**Bugfixes**
* #5860 Fix interval calculation for hierarchical CAggs
* #5894 Check unique indexes when enabling compression
* #5951 _timescaledb_internal.create_compressed_chunk doesn't account for existing uncompressed rows
* #5988 Move functions to _timescaledb_functions schema
* #5788 Chunk_create must add an existing table or fail
* #5872 Fix duplicates on partially compressed chunk reads
* #5918 Fix crash in COPY from program returning error
* #5990 Place data in first/last function in correct mctx
* #5991 Call eq_func correctly in time_bucket_gapfill
* #6015 Correct row count in EXPLAIN ANALYZE INSERT .. ON CONFLICT output
* #6035 Fix server crash on UPDATE of compressed chunk
* #6044 Fix server crash when using duplicate segmentby column
* #6045 Fix segfault in set_integer_now_func
* #6053 Fix approximate_row_count for CAggs
* #6081 Improve compressed DML datatype handling
* #6084 Propagate parameter changes to decompress child nodes
* #6102 Schedule compression policy more often

**Thanks**
* @ajcanterbury for reporting a problem with lateral joins on compressed chunks
* @alexanderlaw for reporting multiple server crashes
* @lukaskirner for reporting a bug with monthly continuous aggregates
* @mrksngl for reporting a bug with unusual user names
* @willsbit for reporting a crash in time_bucket_gapfill
svenklemm added a commit that referenced this pull request Sep 25, 2023
This release contains performance improvements for compressed hypertables
and continuous aggregates and bug fixes since the 2.11.2 release.
We recommend that you upgrade at the next available opportunity.

This release moves all internal functions from the _timescaleb_internal
schema into the _timescaledb_functions schema. This separates code from
internal data objects and improves security by allowing more restrictive
permissions for the code schema. If you are calling any of those internal
functions you should adjust your code as soon as possible. This version
also includes a compatibility layer that allows calling them in the old
location but that layer will be removed in 2.14.0.

**PostgreSQL 12 support removal announcement**
Following the deprecation announcement for PostgreSQL 12 in TimescaleDB 2.10,
PostgreSQL 12 is not supported starting with TimescaleDB 2.12.
Currently supported PostgreSQL major versions are 13, 14 and 15.
PostgreSQL 16 support will be added with a following TimescaleDB release.

**Features**
* #5137 Insert into index during chunk compression
* #5150 MERGE support on hypertables
* #5515 Make hypertables support replica identity
* #5586 Index scan support during UPDATE/DELETE on compressed hypertables
* #5596 Support for partial aggregations at chunk level
* #5599 Enable ChunkAppend for partially compressed chunks
* #5655 Improve the number of parallel workers for decompression
* #5758 Enable altering job schedule type through `alter_job`
* #5805 Make logrepl markers for (partial) decompressions
* #5809 Relax invalidation threshold table-level lock to row-level when refreshing a Continuous Aggregate
* #5839 Support CAgg names in chunk_detailed_size
* #5852 Make set_chunk_time_interval CAggs aware
* #5868 Allow ALTER TABLE ... REPLICA IDENTITY (FULL|INDEX) on materialized hypertables (continuous aggregates)
* #5875 Add job exit status and runtime to log
* #5909 CREATE INDEX ONLY ON hypertable creates index on chunks

**Bugfixes**
* #5860 Fix interval calculation for hierarchical CAggs
* #5894 Check unique indexes when enabling compression
* #5951 _timescaledb_internal.create_compressed_chunk doesn't account for existing uncompressed rows
* #5988 Move functions to _timescaledb_functions schema
* #5788 Chunk_create must add an existing table or fail
* #5872 Fix duplicates on partially compressed chunk reads
* #5918 Fix crash in COPY from program returning error
* #5990 Place data in first/last function in correct mctx
* #5991 Call eq_func correctly in time_bucket_gapfill
* #6015 Correct row count in EXPLAIN ANALYZE INSERT .. ON CONFLICT output
* #6035 Fix server crash on UPDATE of compressed chunk
* #6044 Fix server crash when using duplicate segmentby column
* #6045 Fix segfault in set_integer_now_func
* #6053 Fix approximate_row_count for CAggs
* #6081 Improve compressed DML datatype handling
* #6084 Propagate parameter changes to decompress child nodes
* #6102 Schedule compression policy more often

**Thanks**
* @ajcanterbury for reporting a problem with lateral joins on compressed chunks
* @alexanderlaw for reporting multiple server crashes
* @lukaskirner for reporting a bug with monthly continuous aggregates
* @mrksngl for reporting a bug with unusual user names
* @willsbit for reporting a crash in time_bucket_gapfill
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Bug]: Crash in ON CONFLICT ... DO UPDATE query
5 participants