[Bug]: Using now() selects way more chunks than using timestamp literals when enable_partitionwise_aggregate=on #6059

Closed
mrksngl opened this issue Sep 11, 2023 · 3 comments


@mrksngl
Contributor

mrksngl commented Sep 11, 2023

What type of bug is this?

Performance issue

What subsystems and features are affected?

Query planner

What happened?

I executed two queries:

  • EXPLAIN … SELECT … WHERE time >= now() - '1 month'::interval
  • EXPLAIN … SELECT … WHERE time >= '2023-08-11'::timestamp

When enable_partitionwise_aggregate=off, both queries behave roughly the same, but with enable_partitionwise_aggregate=on, the first one (using now()) scans chunks that are not applicable to the time range in question.
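For illustration, the comparison boils down to the following minimal sketch (using the tbl hypertable defined in the reproduction section below):

SET enable_partitionwise_aggregate = off;
EXPLAIN SELECT * FROM tbl WHERE time >= now() - '1 month'::interval;
-- plan root is Custom Scan (ChunkAppend); chunks outside the range are
-- excluded at executor startup

SET enable_partitionwise_aggregate = on;
EXPLAIN SELECT * FROM tbl WHERE time >= now() - '1 month'::interval;
-- plan root is a plain Append; every chunk of the hypertable is scanned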

I first encountered this in a multinode setup, but could also reproduce it in a non-distributed setup; see below.

I suspect this causes a performance degradation, since many more (possibly all) chunks are scanned by such queries. It is especially visible in a multinode setup, where EXPLAIN shows a very long array of chunks to be examined.

Extra Info:
This has been the root cause of a lot of strange behavior we have encountered lately:

  • when deleting data using DELETE ... WHERE ... now(), it also scans old, i.e. (in our case) compressed chunks, ultimately failing with the error "transparent decompression only supports tableoid system column", although those chunks should never have been touched given the WHERE clause (a sketch follows this list). Again, it worked when using an absolute timestamp literal in the clause.
  • when dropping older chunks while other processes were running queries (on new chunks only, as we assumed until figuring out this issue), the dropping query was cancelled when it detected deadlocks (I think this is related to [Bug]: Deadlock between drop_chunks and query on hypertable #5996)
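For illustration, the DELETE pattern from the first bullet looked roughly like this (table name and retention window are placeholders, not our actual schema):

-- hypothetical sketch; assumes older chunks of tbl are compressed
DELETE FROM tbl WHERE time < now() - '6 months'::interval;
-- with enable_partitionwise_aggregate=on this also plans scans over the old,
-- compressed chunks and fails with:
--   ERROR:  transparent decompression only supports tableoid system column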

Note again that this only happens when enable_partitionwise_aggregate is on. In a multinode setup it is sufficient to enable it on the access node alone to provoke the misbehavior.
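Until this is fixed, two stop-gap workarounds follow directly from the observations above:

-- workaround 1: disable the GUC for the session issuing such queries
SET enable_partitionwise_aggregate = off;
-- workaround 2: compute the time boundary client-side and pass a literal
SELECT * FROM tbl WHERE time >= '2023-08-11'::timestamp;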

TimescaleDB version affected

2.11.2

PostgreSQL version used

15.4

What operating system did you use?

Official docker image timescale/timescaledb:latest-pg15

What installation method did you use?

Docker

What platform did you run on?

On prem/Self-hosted

Relevant log output and stack trace

No response

How can we reproduce the bug?

-- this is executed in docker run --rm --network host -it -e POSTGRES_HOST_AUTH_METHOD=trust timescale/timescaledb:latest-pg15 -c enable_partitionwise_aggregate=on

CREATE TABLE tbl(time TIMESTAMP);
SELECT create_hypertable('tbl', 'time');
INSERT INTO tbl VALUES ('2023-09-11'::timestamptz), ('2023-01-01'::timestamptz);

-- query 1: timestamp literal
EXPLAIN (VERBOSE,ANALYZE) SELECT * FROM tbl WHERE time >= '2023-08-11'::timestamp;
-- note: does index scan on one chunk only

-- query 2: timestamp from now()
EXPLAIN (VERBOSE,ANALYZE) SELECT * FROM tbl WHERE time >= now()-'1 month'::interval;
-- note: does index scan on both chunks
mrksngl added the bug label Sep 11, 2023
@jnidzwetzki
Contributor

Hello @mrksngl,

Thanks for the detailed steps to reproduce the issue. In a single-node environment, I see the following query plans:

test2=# EXPLAIN (VERBOSE,ANALYZE) SELECT * FROM tbl WHERE time >= now()-'1 month'::interval;
                                                                     QUERY PLAN                                                                      
-----------------------------------------------------------------------------------------------------------------------------------------------------
 Custom Scan (ChunkAppend) on public.tbl  (cost=10.00..66.35 rows=1506 width=8) (actual time=0.042..0.049 rows=1 loops=1)
   Output: tbl."time"
   Startup Exclusion: true
   Runtime Exclusion: false
   Chunks excluded during startup: 1
   ->  Bitmap Heap Scan on _timescaledb_internal._hyper_52_144_chunk  (cost=10.00..33.17 rows=753 width=8) (actual time=0.040..0.043 rows=1 loops=1)
         Output: _hyper_52_144_chunk."time"
         Recheck Cond: (_hyper_52_144_chunk."time" >= (now() - '1 mon'::interval))
         Heap Blocks: exact=1
         ->  Bitmap Index Scan on _hyper_52_144_chunk_tbl_time_idx  (cost=0.00..9.81 rows=753 width=0) (actual time=0.027..0.027 rows=1 loops=1)
               Index Cond: (_hyper_52_144_chunk."time" >= (now() - '1 mon'::interval))
 Planning Time: 1.922 ms
 Execution Time: 0.286 ms
(13 rows)
SET enable_partitionwise_aggregate=on;

test2=# EXPLAIN (VERBOSE,ANALYZE) SELECT * FROM tbl WHERE time >= now()-'1 month'::interval;
                                                                     QUERY PLAN                                                                      
-----------------------------------------------------------------------------------------------------------------------------------------------------
 Append  (cost=10.00..73.88 rows=1506 width=8) (actual time=0.047..0.070 rows=1 loops=1)
   ->  Bitmap Heap Scan on _timescaledb_internal._hyper_52_144_chunk  (cost=10.00..33.17 rows=753 width=8) (actual time=0.045..0.048 rows=1 loops=1)
         Output: _hyper_52_144_chunk."time"
         Recheck Cond: (_hyper_52_144_chunk."time" >= (now() - '1 mon'::interval))
         Heap Blocks: exact=1
         ->  Bitmap Index Scan on _hyper_52_144_chunk_tbl_time_idx  (cost=0.00..9.81 rows=753 width=0) (actual time=0.031..0.031 rows=1 loops=1)
               Index Cond: (_hyper_52_144_chunk."time" >= (now() - '1 mon'::interval))
   ->  Bitmap Heap Scan on _timescaledb_internal._hyper_52_145_chunk  (cost=10.00..33.17 rows=753 width=8) (actual time=0.015..0.016 rows=0 loops=1)
         Output: _hyper_52_145_chunk."time"
         Recheck Cond: (_hyper_52_145_chunk."time" >= (now() - '1 mon'::interval))
         ->  Bitmap Index Scan on _hyper_52_145_chunk_tbl_time_idx  (cost=0.00..9.81 rows=753 width=0) (actual time=0.014..0.014 rows=0 loops=1)
               Index Cond: (_hyper_52_145_chunk."time" >= (now() - '1 mon'::interval))
 Planning Time: 1.279 ms
 Execution Time: 0.227 ms
(14 rows)

As you mentioned, as soon as enable_partitionwise_aggregate is set to on, both chunks are scanned by the query. The reason is that PostgreSQL plans the query differently, and the ChunkAppend node (used in the first plan) cannot be used; ChunkAppend is the node responsible for restricting the number of chunks that are accessed.
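For context, plan-time chunk exclusion cannot help here because now() is only STABLE, not IMMUTABLE, so its value is unknown when the plan is built; ChunkAppend compensates by evaluating the expression at executor startup (the "Startup Exclusion: true" line in the first plan). The volatility is visible in the catalog:

SELECT proname, provolatile FROM pg_proc WHERE proname = 'now';
--  proname | provolatile
-- ---------+-------------
--  now     | s
-- 's' = STABLE: constant within a statement, but only known at execution time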

For single-node environments, we plan to include a new feature in TimescaleDB 2.12 that performs partition-wise aggregation and works together with optimizations like ChunkAppend. So the problems you mention should be solved in a single-node environment. However, this feature does not address them in a multi-node environment.
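If that feature ships behind a GUC, enabling and checking it might look like the sketch below (the GUC name is an assumption on my part; please check the 2.12 release notes for the actual name):

-- assumption: GUC name is guessed, verify against the TimescaleDB 2.12 release notes
SET timescaledb.enable_chunkwise_aggregation = on;
EXPLAIN (VERBOSE)
SELECT time_bucket('1 day', time) AS bucket, count(*)
FROM tbl
WHERE time >= now() - '1 month'::interval
GROUP BY bucket;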

@mrksngl
Contributor Author

mrksngl commented Sep 12, 2023

Ok, so it is also PostgreSQL deciding to replan when now() is used instead of a literal? Because that is actually the point that is not really clear to me here.

In a multinode environment, I have also seen that the list of chunks to be scanned is already embedded in the query sent to the data nodes.
Would it be possible to send the query to the data node without that list and have the data node decide which chunks to actually use? Would that help in this situation?

Just a wild idea; I'm not familiar with that level of detail here ;)

@svenklemm
Member

Fixed by #6065
