
[Bug]: After table compression database size increased a lot #5850

Closed

yinan8128 opened this issue Jul 3, 2023 · 18 comments

Comments

@yinan8128

yinan8128 commented Jul 3, 2023

What type of bug is this?

Other

What subsystems and features are affected?

Compression, Other

What happened?

In our production environment we encountered an issue where, after table compression, the table size was reduced but the database size increased a lot.

Table and compression definition:

SELECT create_hypertable('tag', 'date', if_not_exists => TRUE);

DO language plpgsql
$$
DECLARE var_sql text;
BEGIN
    IF NOT EXISTS (SELECT * FROM chunk_compression_stats('tag') WHERE compression_status = 'Compressed') THEN
        var_sql := 'ALTER TABLE tag SET (timescaledb.compress, timescaledb.compress_orderby = ''date DESC'', timescaledb.compress_segmentby = ''id'');';
        EXECUTE var_sql;
    END IF;
END;
$$;
SELECT add_retention_policy('tag', INTERVAL '30 days', if_not_exists => TRUE);
SELECT add_compression_policy('tag', INTERVAL '8 days', if_not_exists => TRUE);

\d+ tag;
                                  Table "tag"
     Column      |            Type             | Collation | Nullable | Default | Storage  | Compression | Stats target | Description
-----------------+-----------------------------+-----------+----------+---------+----------+-------------+--------------+-------------
 tenantid        | uuid                        |           | not null |         | plain    |             |              |
 resourceid      | uuid                        |           | not null |         | plain    |             |              |
 value           | numeric                     |           | not null |         | main     |             |              |
 context         | uuid                        |           | not null |         | plain    |             |              |
 topic           | text                        |           | not null |         | extended |             |              |
 tag             | text                        |           | not null |         | extended |             |              |
 id              | uuid                        |           | not null |         | plain    |             |              |
 date            | timestamp without time zone |           | not null |         | plain    |             |              |
 created_at      | timestamp without time zone |           | not null |         | plain    |             |              |
 visibility_list | uuid[]                      |           |          |         | extended |             |              |
Indexes:
    "tag_pkey" PRIMARY KEY, btree (id, date)
    "tag_date_idx" btree (date DESC)
    "index_tag_date" btree (resourceid, date DESC)
Triggers:
    ts_insert_blocker BEFORE INSERT ON timeseries.tag FOR EACH ROW EXECUTE FUNCTION _timescaledb_internal.insert_blocker()
Access method: heap

SELECT h.table_name, c.interval_length
  FROM _timescaledb_catalog.dimension c
  JOIN _timescaledb_catalog.hypertable h
    ON h.id = c.hypertable_id;
               table_name                | interval_length
-----------------------------------------+-----------------
 tag                                |    604800000000

Table size before compression:
3.23 GB

Table size after compression:
847 MB

Total database size before compression:
93.4 GB

Total database size after compression:
101 GB
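
For reference, a minimal sketch of how such numbers can be measured (the exact queries behind the figures above are not shown, so this is an assumption):

-- hypertable size (all chunks plus indexes) and total database size
SELECT pg_size_pretty(hypertable_size('tag'));
SELECT pg_size_pretty(pg_database_size(current_database()));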

There are no other table compressions happening simultaneously.

There is another ticket, #5582, where space was not reclaimed, but in that case the database size did not increase.

Please advise, thanks.

TimescaleDB version affected

2.10.1

PostgreSQL version used

14.6

What operating system did you use?

CentOS Linux release 7.9.2009

What installation method did you use?

RPM

What platform did you run on?

On prem/Self-hosted

Relevant log output and stack trace

No response

How can we reproduce the bug?

N.A
@yinan8128 yinan8128 added the bug label Jul 3, 2023
@lkshminarayanan
Contributor

Hi @yinan8128,

Thank you for filing the issue!

As explained in #5582, this could happen if the compression errored out for some reason before reclaiming the space.

Running vacuum on the hypertable will not work, as it is not propagated to the individual chunks in version 2.10.1. This is now fixed in version 2.11.0. Upgrading to that version and running vacuum on the hypertable should fix your issue.

Alternatively, you can continue using version 2.10.1, but you would have to run the vacuum on all individual chunks manually.

Can you try either of these and let us know if that helps improve your situation?
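
For example, on 2.11.0 or later a single vacuum on the hypertable should be enough (a minimal sketch using the table from this thread; VERBOSE is optional):

VACUUM VERBOSE tag;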

@yinan8128
Author

yinan8128 commented Jul 4, 2023

Hi

Thank you for your reply

vacuum script

select 'vacuum verbose'||' '||chunk_schema||'.'||chunk_name||';' from timescaledb_information.chunks where hypertable_name='tag' order by hypertable_name,chunk_name;
                         ?column?                          
-----------------------------------------------------------
 vacuum verbose _timescaledb_internal._hyper_19_10_chunk;
 vacuum verbose _timescaledb_internal._hyper_19_246_chunk;
 vacuum verbose _timescaledb_internal._hyper_19_258_chunk;
 vacuum verbose _timescaledb_internal._hyper_19_43_chunk;
 vacuum verbose _timescaledb_internal._hyper_19_47_chunk;

vacuum log in ansible

INFO:  vacuuming "_timescaledb_internal._hyper_19_10_chunk"
INFO:  table "_hyper_19_10_chunk": found 0 removable, 0 nonremovable row versions in 0 out of 0 pages
DETAIL:  0 dead row versions cannot be removed yet, oldest xmin: 2242365
Skipped 0 pages due to buffer pins, 0 frozen pages.
CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s.
INFO:  vacuuming "pg_toast.pg_toast_18292"
INFO:  table "pg_toast_18292": found 0 removable, 0 nonremovable row versions in 0 out of 0 pages
DETAIL:  0 dead row versions cannot be removed yet, oldest xmin: 2242365
Skipped 0 pages due to buffer pins, 0 frozen pages.
CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s.
INFO:  vacuuming "_timescaledb_internal._hyper_19_246_chunk"
INFO:  launched 2 parallel vacuum workers for index cleanup (planned: 2)
INFO:  table "_hyper_19_246_chunk": found 0 removable, 227775 nonremovable row versions in 9176 out of 53563 pages
DETAIL:  0 dead row versions cannot be removed yet, oldest xmin: 2242365
Skipped 0 pages due to buffer pins, 0 frozen pages.
CPU: user: 0.17 s, system: 0.14 s, elapsed: 6.33 s.
INFO:  vacuuming "pg_toast.pg_toast_69237"
INFO:  table "pg_toast_69237": found 0 removable, 0 nonremovable row versions in 0 out of 0 pages
DETAIL:  0 dead row versions cannot be removed yet, oldest xmin: 2242373
Skipped 0 pages due to buffer pins, 0 frozen pages.
CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s.
INFO:  vacuuming "_timescaledb_internal._hyper_19_258_chunk"
INFO:  launched 2 parallel vacuum workers for index cleanup (planned: 2)
INFO:  table "_hyper_19_258_chunk": found 0 removable, 251045 nonremovable row versions in 10068 out of 73831 pages
DETAIL:  0 dead row versions cannot be removed yet, oldest xmin: 2242373
Skipped 0 pages due to buffer pins, 0 frozen pages.
CPU: user: 0.08 s, system: 0.08 s, elapsed: 0.18 s.
INFO:  vacuuming "pg_toast.pg_toast_75680"
INFO:  table "pg_toast_75680": found 0 removable, 0 nonremovable row versions in 0 out of 0 pages
DETAIL:  0 dead row versions cannot be removed yet, oldest xmin: 2242373
Skipped 0 pages due to buffer pins, 0 frozen pages.
CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s.
INFO:  vacuuming "_timescaledb_internal._hyper_19_43_chunk"
INFO:  table "_hyper_19_43_chunk": found 0 removable, 0 nonremovable row versions in 0 out of 0 pages
DETAIL:  0 dead row versions cannot be removed yet, oldest xmin: 2242373
Skipped 0 pages due to buffer pins, 0 frozen pages.
CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s.
INFO:  vacuuming "pg_toast.pg_toast_21713"
INFO:  table "pg_toast_21713": found 0 removable, 0 nonremovable row versions in 0 out of 0 pages
DETAIL:  0 dead row versions cannot be removed yet, oldest xmin: 2242373
Skipped 0 pages due to buffer pins, 0 frozen pages.
CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s.
INFO:  vacuuming "_timescaledb_internal._hyper_19_47_chunk"
INFO:  scanned index "47_38_events_monitoring_netitag_pkey" to remove 62 row versions
DETAIL:  CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.03 s
INFO:  scanned index "_hyper_19_47_chunk_events_monitoring_netitag_date_idx" to remove 62 row versions
DETAIL:  CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s
INFO:  scanned index "_hyper_19_47_chunk_index_events_monitoring_netitag_date" to remove 62 row versions
DETAIL:  CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s
INFO:  table "_hyper_19_47_chunk": removed 62 dead item identifiers in 4 pages
DETAIL:  CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s
INFO:  index "47_38_events_monitoring_netitag_pkey" now contains 0 row versions in 2 pages
DETAIL:  62 index row versions were removed.
0 index pages were newly deleted.
0 index pages are currently deleted, of which 0 are currently reusable.
CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s.
INFO:  index "_hyper_19_47_chunk_events_monitoring_netitag_date_idx" now contains 0 row versions in 2 pages
DETAIL:  62 index row versions were removed.
0 index pages were newly deleted.
0 index pages are currently deleted, of which 0 are currently reusable.
CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s.
INFO:  index "_hyper_19_47_chunk_index_events_monitoring_netitag_date" now contains 0 row versions in 2 pages
DETAIL:  62 index row versions were removed.
0 index pages were newly deleted.
0 index pages are currently deleted, of which 0 are currently reusable.
CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s.
INFO:  table "_hyper_19_47_chunk": found 62 removable, 0 nonremovable row versions in 4 out of 4 pages
DETAIL:  0 dead row versions cannot be removed yet, oldest xmin: 2242373
Skipped 0 pages due to buffer pins, 0 frozen pages.
CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.03 s.
INFO:  table "_hyper_19_47_chunk": truncated 4 to 0 pages
DETAIL:  CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.05 s
INFO:  vacuuming "pg_toast.pg_toast_27725"
INFO:  table "pg_toast_27725": found 0 removable, 0 nonremovable row versions in 0 out of 0 pages
DETAIL:  0 dead row versions cannot be removed yet, oldest xmin: 2242375
Skipped 0 pages due to buffer pins, 0 frozen pages.
CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s.

We compared the database size and no space was released. Should we run VACUUM FULL, or is a plain VACUUM enough, please?

@lkshminarayanan
Contributor

Hi @yinan8128,

You can try VACUUM FULL but please note that it requires an ACCESS EXCLUSIVE lock on the table, and therefore cannot be done in parallel with any other READ/WRITE operations on the table.
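
For example, on one of the chunks listed above (a sketch; note that VACUUM FULL rewrites the table, so it also needs free disk space for the temporary copy):

VACUUM FULL VERBOSE _timescaledb_internal._hyper_19_246_chunk;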

@yinan8128
Author

Hi

Thank you for your reply

  1. We checked timescaledb_information.job_errors; there are no error logs for 2023-06-30, but a compression problem still happened on 2023-06-30. Is it possible that timescaledb_information.job_errors does not record all compression job errors?
  2. What could cause a compression failure, and how can we check for it? Thanks

@lkshminarayanan
Contributor

Hi @yinan8128,

Any error in the jobs will be recorded in the job_errors table. So if you don't see anything there, it is highly unlikely that the compression job failed.

But as your database size increased after compression, the most likely explanation is that the old uncompressed rows have not been vacuumed yet. It could be that vacuum has not freed up those rows due to some long-running transaction. You can follow this guide - https://emmer.dev/blog/finding-long-running-queries-in-postgresql/ - to check whether you have any. If you do have such a transaction running, cancel it and then retry the vacuum.
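
A minimal sketch for spotting such transactions (the one-hour threshold is only an illustrative assumption):

SELECT pid, state, xact_start, now() - xact_start AS xact_age, query
  FROM pg_stat_activity
 WHERE xact_start IS NOT NULL
   AND now() - xact_start > interval '1 hour'
 ORDER BY xact_start;
-- if needed, cancel a blocking transaction with SELECT pg_cancel_backend(<pid>);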

As for why this has happened, it could be that you have hit a bug. But without an error log or an entry in job_errors, this might be harder to investigate.

@sb230132
Contributor

Hi @yinan8128
It would be of much help to know what errors occur when the compression policy is triggered in the background. When a compression policy fails, we do not reclaim the space occupied as part of compression. It might be the case that the compression policy failed multiple times and we ended up occupying more space. However, this is just a guess; we need the exact errors about what is happening with the compression policy job.
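
A sketch for checking the compression policy's run history (jobs and job_stats are standard TimescaleDB information views; 'policy_compression' is the expected proc name):

SELECT j.job_id, j.proc_name, s.last_run_status, s.last_run_started_at,
       s.total_runs, s.total_successes, s.total_failures
  FROM timescaledb_information.jobs j
  JOIN timescaledb_information.job_stats s USING (job_id)
 WHERE j.proc_name = 'policy_compression';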

Any chance you could re-run your setup on TimescaleDB 2.11 and see if the issue is resolved?

@yinan8128
Author

Hi

Thank you for the update

We have already disabled compression in that production environment. Let us check another production environment that has the problem and get back to you.

@yinan8128
Author

yinan8128 commented Aug 1, 2023

Hi

In one of our environments there is a database-size-increase problem after compression. Retention reclaims some storage, but the overall storage usage trend is increasing and will sooner or later hit the limit.
[attached image: graph of storage usage over time]

-- but there are no records in timescaledb_information.job_errors:
select * from timescaledb_information.job_errors;
job_id | proc_schema | proc_name | pid | start_time | finish_time | sqlerrcode | err_message
--------+-------------+-----------+-----+------------+-------------+------------+-------------
(0 rows)

Any suggestions for the next step, please? Thanks

@konskov
Contributor

konskov commented Aug 1, 2023

Hello @yinan8128,

Some more questions:

  • Is it only the compression policy that fails to run, or do you have other background jobs/policies which are not running? I’m asking this because it is possible for the background job scheduler to exit, in which case no jobs would run until it is restarted (although it seems from the graph that the retention policy is running normally).
  • Have you tried manually compressing your chunks to see if you get any errors then? Does compression complete normally in that case, and is the space consumption reduced? (See the sketch below for one way to do this.)
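
For instance, a sketch for manually compressing the chunks the policy would target (compress_chunk and show_chunks are standard TimescaleDB APIs; the 8-day cutoff matches the policy from the issue description):

SELECT compress_chunk(c, if_not_compressed => true)
  FROM show_chunks('tag', older_than => INTERVAL '8 days') AS c;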

@yinan8128
Author

Hello

There are both retention jobs and compression jobs, and there are no records in timescaledb_information.job_errors, which should mean there were no job failures?

From the attached image you can see that retention worked (the database size was reduced 4 times), while compression caused the database size to increase 7 times.

@erimatnor
Contributor

@yinan8128 Note that you need to upgrade to 2.11 to allow a VACUUM on the hypertable to release disk space. Your earlier vacuum script won't work because the informational view you used doesn't show the "internal" compressed chunks. You need to query the _timescaledb_catalog.chunk table to get all chunks.
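
A sketch of generating the VACUUM statements from the catalog instead (column names assumed from the 2.10.x catalog layout):

SELECT format('VACUUM VERBOSE %I.%I;', schema_name, table_name)
  FROM _timescaledb_catalog.chunk
 WHERE NOT dropped;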

@yinan8128
Author

Hi

We found that manual compression can also make the chunk and table size increase. We added reproduction steps to ticket #5961; could you please help check?

@erimatnor erimatnor removed their assignment Aug 28, 2023
@pdipesh02 pdipesh02 self-assigned this Aug 28, 2023
@pdipesh02
Contributor

Hi @yinan8128 ,

Replied to questions asked as part of issue #5961. Please let us know if you have any further questions.

@pdipesh02 pdipesh02 removed their assignment Aug 29, 2023

Dear Author,

This issue has been automatically marked as stale due to lack of activity. With only the issue description that is currently provided, we do not have enough information to take action. If you have or find the answers we would need, please reach out. Otherwise, this issue will be closed in 30 days.

Thank you!

@yinan8128
Author

yinan8128 commented Nov 21, 2023

What is the relationship between compression ratio, query efficiency, and the data, please?
For example, with 10000 records:

  1. 1000 unique values, each value with 10 records
  2. 100 unique values, each value with 100 records
  3. 10 unique values, each value with 1000 records

Which case gives a good compression ratio and query efficiency? Will case 3 get the best compression ratio but the worst query efficiency, since queries need to decompress the compressed chunk first? Thanks
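
One way to compare compression ratios empirically for a given data shape is to check the per-chunk statistics (a sketch; chunk_compression_stats is a standard TimescaleDB function):

SELECT chunk_name,
       pg_size_pretty(before_compression_total_bytes) AS before,
       pg_size_pretty(after_compression_total_bytes)  AS after
  FROM chunk_compression_stats('tag')
 WHERE compression_status = 'Compressed';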

@antekresic
Contributor

Hi,

I believe this was answered in your other issue:
#5961 (comment)

Thanks.


Dear Author,

This issue has been automatically marked as stale due to lack of activity. With only the issue description that is currently provided, we do not have enough information to take action. If you have or find the answers we would need, please reach out. Otherwise, this issue will be closed in 30 days.

Thank you!


Dear Author,

We are closing this issue due to lack of activity. Feel free to add a comment to this issue if you can provide more information and we will re-open it.

Thank you!
