[Bug]: chunk size and table size increased after compression #5961
Comments
Added reproduction steps, please check, thanks.
You chose a segmentby column that is unique, which will lead to a very bad compression rate.
Hi @yinan8128, a unique segment-by column leads to the worst-case compression scenario, and this is evident from comparing the number of live tuples in the uncompressed chunk and the compressed chunk. In this case both have exactly the same value, which means zero compression, i.e. the worst case.
Number of tuples before compression
Number of tuples after compression
Best practice:
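For reference, this kind of live-tuple comparison can be done from the statistics views. A minimal sketch follows; the chunk names are hypothetical placeholders, so adjust them to the chunks of your own hypertable (the catalog query later in this thread shows how to find the mapping):

```sql
-- Compare live tuples of an uncompressed chunk and its compressed counterpart.
-- Both relation names below are placeholders, not from the original report.
SELECT schemaname, relname, n_live_tup
FROM   pg_stat_user_tables
WHERE  relname IN ('_hyper_1_1_chunk', 'compress_hyper_2_1_chunk');
```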
Hello, thank you for the information. After removing the unique-valued column from timescaledb.compress_segmentby, the size is reduced, but we have some questions:
The steps we performed and checked (a SQL sketch of these steps follows below):
- decompress chunk
- alter table compress false
- drop primary key
- alter table compress true without timescaledb.compress_segmentby
- before compression hypertable size
- before compression chunk size
- compress chunk
- after compression hypertable size
- after compression chunk size
- chunk information
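A minimal SQL sketch of that sequence, assuming a hypothetical hypertable named metrics with a primary key constraint metrics_pkey (these names are placeholders, not the original schema):

```sql
-- Decompress all compressed chunks of the hypertable.
SELECT decompress_chunk(c, if_compressed => true) FROM show_chunks('metrics') c;

-- Disable compression, drop the primary key, then re-enable compression
-- without a compress_segmentby setting.
ALTER TABLE metrics SET (timescaledb.compress = false);
ALTER TABLE metrics DROP CONSTRAINT metrics_pkey;
ALTER TABLE metrics SET (timescaledb.compress = true);

-- Sizes before compression.
SELECT hypertable_size('metrics');
SELECT chunk_name, total_bytes FROM chunks_detailed_size('metrics');

-- Compress all chunks, then check the sizes again.
SELECT compress_chunk(c, if_not_compressed => true) FROM show_chunks('metrics') c;
SELECT hypertable_size('metrics');
SELECT chunk_name, total_bytes FROM chunks_detailed_size('metrics');
```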
Hi, any update please? Thanks
Hi @yinan8128, please find the answers inline below:
After compression, both the original chunk and the compressed chunk exist, but the rows are not duplicated across them. The rows being compressed are removed from the original chunk and moved to the compressed chunk. This is why storage is reduced even though the original and the compressed chunk coexist.
If you do not specify a compress_segmentby column, the data is still compressed; the rows are simply not grouped into segments.
Yes, as mentioned in point 1, this is expected.
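To verify whether storage actually shrank, the before/after sizes per chunk can be checked with the built-in stats function; the hypertable name metrics below is again a placeholder:

```sql
-- Per-chunk sizes before and after compression.
SELECT chunk_name,
       pg_size_pretty(before_compression_total_bytes) AS before,
       pg_size_pretty(after_compression_total_bytes)  AS after
FROM   chunk_compression_stats('metrics')
WHERE  compression_status = 'Compressed';
```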
Thank you, we will try each non-unique column for compress_segmentby and check the compression ratio.
Hello, our tag column will always have the same value, and when we use it as the compress_segmentby column the compression ratio is 3.16. Is it okay to use a column with a single value as the compress_segmentby column? Will it cause there to be no useful index on the compressed chunk, so that each subsequent operation requires decompressing the whole chunk? Are there any other performance problems? Thanks
Hi @yinan8128, when a compress_segmentby column is specified, an index is created on that column on the compressed chunk. If the tag column always has the same value, then it is like creating an index on a column with only one key, and there is only one tuple in the compressed chunk, which will require decompression on subsequent operations. Multiple columns can be specified as compress_segmentby columns. Specifying a unique column as the segment-by column and a column with the same value for all rows are two extreme cases: in the first case there is no compression, because the compressed table contains the same number of tuples as the original table, whereas in the second case the compressed table has only one tuple. Ideally the selection of segment-by columns should fall somewhere between these two extremes.
However, the date column has relatively distinct values in the data set provided. Could you please try using date as the compress_segmentby column?
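A quick way to compare candidate segment-by columns is to look at how many rows share each value; the table and column names below are placeholders:

```sql
-- Average number of rows per distinct value for a candidate segment-by column.
-- A column where each value covers many rows is a better candidate than a
-- nearly unique one.
SELECT count(*)::numeric / NULLIF(count(DISTINCT date), 0) AS avg_rows_per_value
FROM   metrics;
```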
Hello, I work with @yinan8128 and am trying to figure out which compress_segmentby column is best in terms of compression ratio. Is there a case to be made for the chosen column also being part of a composite primary key? Is this best practice or something to be avoided?
Hi, thank you for the information. If we change the compress_segmentby column, do we have to change the primary key? As in ticket #5892, our existing compress_segmentby column id is part of the primary key (PK: id, date). Do we need to change the primary key to the new compress_segmentby column plus date? Thanks
Hi @jjrodenburg and @yinan8128,
If you change the compress_segmentby column, the existing compressed chunks have to be decompressed and recompressed with the new settings. The following guideline is provided as part of the documentation [1]:
The segmentby columns are useful, but can be overused. If you specify a lot of segmentby columns, the number of items in each compressed column is reduced, and compression is not as effective. A good guide is for each segment to contain at least 100 rows per chunk.
[1] https://docs.timescale.com/use-timescale/latest/compression/about-compression/#segment-by-columns
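To check the 100-rows-per-segment guideline against real data, the distribution can be inspected within one chunk's time range. A sketch, where metrics, the date column, and the one-week range are assumptions:

```sql
-- Rows per candidate segment value within a single chunk's time range.
-- Adjust the table name, segment column, and time range to your setup.
SELECT date AS segment_value, count(*) AS rows_in_segment
FROM   metrics
WHERE  date >= '2023-07-01' AND date < '2023-07-08'
GROUP  BY date
ORDER  BY rows_in_segment ASC
LIMIT  10;
```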
Hello, thank you for the information. Since changing the compress_segmentby column requires decompressing the chunks first, how can we find the compressed chunks that belong to a hypertable and decompress all of them?
Thank you
Hello @yinan8128, this is correct. To change the compress_segmentby setting, the chunks have to be decompressed first.
The relationship between the compressed and uncompressed chunks is stored in a catalog table. You can access this information by executing the following SQL statement:
SELECT c.schema_name, ht.table_name, c.table_name, compressed_chunk_id FROM _timescaledb_catalog.chunk c, _timescaledb_catalog.hypertable ht WHERE c.hypertable_id = ht.id;
However, to decompress all chunks of a hypertable, you can use the following SQL statement:
SELECT decompress_chunk(c, true) FROM show_chunks('tablename') c;
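Putting that together, a sketch of changing the segment-by column and recompressing, again with placeholder names (metrics, date):

```sql
-- 1. Decompress every compressed chunk of the hypertable.
SELECT decompress_chunk(c, if_compressed => true) FROM show_chunks('metrics') c;

-- 2. Point compress_segmentby at the new column.
ALTER TABLE metrics SET (timescaledb.compress,
                         timescaledb.compress_segmentby = 'date');

-- 3. Recompress all chunks with the new settings.
SELECT compress_chunk(c, if_not_compressed => true) FROM show_chunks('metrics') c;
```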
Hello, thank you for the information. So truncating the compressed chunks we no longer need, together with their original chunks, before decompression can reduce the deployment time?
Hello @yinan8128, this is correct. Dropping unneeded chunks will save time when you plan to change the compression settings and recompress an entire hypertable. I recommend using the drop_chunks function to remove chunks that are no longer needed.
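For reference, a sketch of dropping old chunks before the recompression, with a placeholder table name and retention window:

```sql
-- Drop all chunks whose data is entirely older than 90 days.
SELECT drop_chunks('metrics', older_than => INTERVAL '90 days');
```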
Dear Author, This issue has been automatically marked as stale due to lack of activity. With only the issue description that is currently provided, we do not have enough information to take action. If you have or find the answers we would need, please reach out. Otherwise, this issue will be closed in 30 days. Thank you!
After our local test environment switched to the date column, the table size decreased, but after going to production, compression still caused storage to increase. We will check and update here.
@jnidzwetzki Are there any other things we can try? In production, at scale, we see compression still increasing storage space (local testing showed good results, but we cannot test at the same scale as production). We tried VACUUM FULL, but to no avail. The segment-by column is set to date (which is the most non-unique column we have).
We found that the root cause is that the date column values are almost unique in production.
What about the relationship between compression rate, query efficiency, and the data?
Which cases give a good compression rate and good query efficiency? Will case 3 get the best compression rate but the worst query efficiency, since queries need to decompress the compressed chunk first? Thanks
Hi @yinan8128, thank you for finding the root cause. The documentation here gives some guidelines on the data characteristics needed for compression to be effective. Is there anything else to be addressed in this issue?
Dear Author, This issue has been automatically marked as stale due to lack of activity. With only the issue description that is currently provided, we do not have enough information to take action. If you have or find the answers we would need, please reach out. Otherwise, this issue will be closed in 30 days. Thank you!
Dear Author, We are closing this issue due to lack of activity. Feel free to add a comment to this issue if you can provide more information and we will re-open it. Thank you!
What type of bug is this?
Incorrect result
What subsystems and features are affected?
Compression
What happened?
Hello,
No compression job failed, but the chunk and table size increased after compression. We reproduced it locally: manual compression also makes the chunk and table size increase. The behavior is the same in 2.10.1 and 2.11.1. Is it a bug, or is it caused by a wrong table compression configuration? Thanks
Before compression chunk size
After compression chunk size
TimescaleDB version affected
2.10.1, 2.11.1
PostgreSQL version used
14.6
What operating system did you use?
CentOS Linux release 7.9.2009
What installation method did you use?
RPM
What platform did you run on?
On prem/Self-hosted
Relevant log output and stack trace
How can we reproduce the bug?
The reproduction steps (a SQL sketch of these steps follows below):
- hypertable creation
- table structure
- insert data
- before compression hypertable size
- before compression chunk size
- compression
- after compression hypertable size
- after compression chunk size
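Since the original statements were not preserved above, here is a minimal, self-contained sketch of such a reproduction; the table readings, its columns, and the chunk interval are assumptions, not the reporter's actual schema:

```sql
-- Hypothetical table and hypertable creation.
CREATE TABLE readings (
    id   bigint      NOT NULL,
    date timestamptz NOT NULL,
    tag  text,
    val  double precision,
    PRIMARY KEY (id, date)
);
SELECT create_hypertable('readings', 'date', chunk_time_interval => INTERVAL '7 days');

-- Insert some sample data.
INSERT INTO readings (id, date, tag, val)
SELECT g, now() - g * INTERVAL '1 minute', 'sensor', random()
FROM   generate_series(1, 100000) g;

-- Enable compression; segmenting by the unique id column reproduces the
-- "size increases after compression" behaviour discussed in this issue.
ALTER TABLE readings SET (timescaledb.compress,
                          timescaledb.compress_segmentby = 'id');

-- Sizes before compression.
SELECT hypertable_size('readings');
SELECT chunk_name, total_bytes FROM chunks_detailed_size('readings');

-- Compress all chunks and check the sizes again.
SELECT compress_chunk(c, if_not_compressed => true) FROM show_chunks('readings') c;
SELECT hypertable_size('readings');
SELECT chunk_name, total_bytes FROM chunks_detailed_size('readings');
```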