
[Bug]: Continuous aggregate performance with time_bucket and timezone #4825

Closed
hardikm10 opened this issue Oct 14, 2022 · 6 comments · Fixed by #6552

Comments

@hardikm10

hardikm10 commented Oct 14, 2022

What type of bug is this?

Performance issue

What subsystems and features are affected?

Continuous aggregate, Query planner

What happened?

While investigating a slow, CPU-heavy refresh of a continuous aggregate (C-Agg), we observed that querying the underlying partial view scans the entire hypertable. This partial view, which lives in the _timescaledb_internal schema, is used to read data from the main hypertable during the C-Agg refresh. We are using the time_bucket() function with a timezone argument. If we remove the timezone argument, the query plan looks robust, but with any timezone the produced plan scans all the chunks. We think this might be the reason for the slowness, but we would really appreciate any thoughts.

TimescaleDB version affected

2.8.1

PostgreSQL version used

14.5

What operating system did you use?

MST

What installation method did you use?

Other

What platform did you run on?

Managed Service for TimescaleDB (MST/Aiven)

Relevant log output and stack trace

How can we reproduce the bug?

Setup

create table sensor_data(
  time timestamptz not null,
  sensor_id integer not null,
  cpu double precision null,
  temperature double precision null
);

select * from create_hypertable('sensor_data','time');

INSERT INTO sensor_data
SELECT
time + (INTERVAL '1 minute' * random()) AS time,
sensor_id,
random() AS cpu,
random()* 100 AS temperature
FROM
generate_series(now() - INTERVAL '1 months', now() - INTERVAL '1 week', INTERVAL '10 minute') AS g1(time),
generate_series(1, 100, 1 ) AS g2(sensor_id)
ORDER BY
time;

Create the C-Agg using timezone information in the time_bucket() function:

create materialized view sensor_summary_1day with (timescaledb.continuous) as
select time_bucket('1 day', time, 'UTC') bucket, sensor_id, avg(cpu) as avg_cpu, count(temperature), sum(temperature)
from sensor_data group by bucket, sensor_id;

select id from _timescaledb_catalog.hypertable 
WHERE table_name=(select materialization_hypertable_name 
from timescaledb_information.continuous_aggregates WHERE view_name='sensor_summary_1day');

-- As the plan below shows, the query scans the entire hypertable
twodmit=> explain analyze select * FROM  _timescaledb_internal._partial_view_10 WHERE bucket>='2022-10-10' and bucket<='2022-10-11';
                                                                                                                                                                                 QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Finalize GroupAggregate  (cost=7382.35..7449.39 rows=200 width=36) (actual time=110.657..114.757 rows=0 loops=1)
   Group Key: (time_bucket('1 day'::interval, _hyper_9_16_chunk."time", 'UTC'::text, NULL::timestamp with time zone, NULL::interval)), _hyper_9_16_chunk.sensor_id
   ->  Gather Merge  (cost=7382.35..7441.39 rows=400 width=60) (actual time=110.656..114.754 rows=0 loops=1)
         Workers Planned: 2
         Workers Launched: 2
         ->  Partial GroupAggregate  (cost=6382.33..6395.19 rows=200 width=60) (actual time=73.721..73.725 rows=0 loops=3)
               Group Key: (time_bucket('1 day'::interval, _hyper_9_16_chunk."time", 'UTC'::text, NULL::timestamp with time zone, NULL::interval)), _hyper_9_16_chunk.sensor_id
               ->  Sort  (cost=6382.33..6384.06 rows=691 width=28) (actual time=73.720..73.723 rows=0 loops=3)
                     Sort Key: (time_bucket('1 day'::interval, _hyper_9_16_chunk."time", 'UTC'::text, NULL::timestamp with time zone, NULL::interval)), _hyper_9_16_chunk.sensor_id
                     Sort Method: quicksort  Memory: 25kB
                     Worker 0:  Sort Method: quicksort  Memory: 25kB
                     Worker 1:  Sort Method: quicksort  Memory: 25kB
                     ->  Result  (cost=0.00..6349.74 rows=691 width=28) (actual time=72.673..72.676 rows=0 loops=3)
                           ->  Parallel Append  (cost=0.00..6341.10 rows=691 width=28) (actual time=72.673..72.675 rows=0 loops=3)
                                 ->  Parallel Seq Scan on _hyper_9_16_chunk  (cost=0.00..1927.88 rows=296 width=28) (actual time=25.620..25.620 rows=0 loops=3)
                                       Filter: ((time_bucket('1 day'::interval, "time", 'UTC'::text, NULL::timestamp with time zone, NULL::interval) >= '2022-10-10 00:00:00+00'::timestamp with time zone) AND (time_bucket('1 day'::interval, "time", 'UTC'::text, NULL::timestamp with time zone, NULL::interval) <= '2022-10-11 00:00:00+00'::timestamp with time zone))
                                       Rows Removed by Filter: 33600
                                 ->  Parallel Seq Scan on _hyper_9_17_chunk  (cost=0.00..1927.88 rows=296 width=28) (actual time=33.357..33.358 rows=0 loops=2)
                                       Filter: ((time_bucket('1 day'::interval, "time", 'UTC'::text, NULL::timestamp with time zone, NULL::interval) >= '2022-10-10 00:00:00+00'::timestamp with time zone) AND (time_bucket('1 day'::interval, "time", 'UTC'::text, NULL::timestamp with time zone, NULL::interval) <= '2022-10-11 00:00:00+00'::timestamp with time zone))
                                       Rows Removed by Filter: 50400
                                 ->  Parallel Seq Scan on _hyper_9_18_chunk  (cost=0.00..1927.88 rows=296 width=28) (actual time=59.164..59.164 rows=0 loops=1)
                                       Filter: ((time_bucket('1 day'::interval, "time", 'UTC'::text, NULL::timestamp with time zone, NULL::interval) >= '2022-10-10 00:00:00+00'::timestamp with time zone) AND (time_bucket('1 day'::interval, "time", 'UTC'::text, NULL::timestamp with time zone, NULL::interval) <= '2022-10-11 00:00:00+00'::timestamp with time zone))
                                       Rows Removed by Filter: 100800
                                 ->  Parallel Seq Scan on _hyper_9_19_chunk  (cost=0.00..411.94 rows=64 width=28) (actual time=13.789..13.789 rows=0 loops=1)
                                       Filter: ((time_bucket('1 day'::interval, "time", 'UTC'::text, NULL::timestamp with time zone, NULL::interval) >= '2022-10-10 00:00:00+00'::timestamp with time zone) AND (time_bucket('1 day'::interval, "time", 'UTC'::text, NULL::timestamp with time zone, NULL::interval) <= '2022-10-11 00:00:00+00'::timestamp with time zone))
                                       Rows Removed by Filter: 21500
                                 ->  Parallel Seq Scan on _hyper_9_15_chunk  (cost=0.00..142.06 rows=22 width=28) (actual time=1.477..1.477 rows=0 loops=1)
                                       Filter: ((time_bucket('1 day'::interval, "time", 'UTC'::text, NULL::timestamp with time zone, NULL::interval) >= '2022-10-10 00:00:00+00'::timestamp with time zone) AND (time_bucket('1 day'::interval, "time", 'UTC'::text, NULL::timestamp with time zone, NULL::interval) <= '2022-10-11 00:00:00+00'::timestamp with time zone))
                                       Rows Removed by Filter: 7400
 Planning Time: 0.438 ms
 Execution Time: 114.825 ms
(31 rows)
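A possible workaround (an illustrative sketch, not something proposed in this thread) is to run the aggregation against the hypertable directly and add a redundant predicate on the raw time column, widened by one bucket width, so the planner can exclude chunks even though the filter on the bucketed expression is not pushed down:

```sql
-- Hypothetical workaround: keep the exact filter on the bucketed
-- expression, and add a conservatively widened filter on the raw "time"
-- column that the planner can use for chunk exclusion.
select time_bucket('1 day', time, 'UTC') as bucket, sensor_id,
       avg(cpu) as avg_cpu, count(temperature), sum(temperature)
from sensor_data
where time_bucket('1 day', time, 'UTC') between '2022-10-10' and '2022-10-11'
  and time >= timestamptz '2022-10-10' - interval '1 day'
  and time <  timestamptz '2022-10-11' + interval '2 days'
group by bucket, sensor_id;
```

The raw-column predicate is a strict superset of the bucket predicate, so the result set is unchanged; it only hands the planner an exclusion-friendly constraint.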

However, if I remove the timezone information from the C-Agg definition:

create materialized view no_tz_sensor_summary_1day with (timescaledb.continuous) as
select time_bucket('1 day', time) bucket, sensor_id, avg(cpu) as avg_cpu, count(temperature), sum(temperature)
from sensor_data group by bucket, sensor_id;

select id from _timescaledb_catalog.hypertable 
WHERE table_name=(select materialization_hypertable_name 
from timescaledb_information.continuous_aggregates WHERE view_name='no_tz_sensor_summary_1day');

twodmit=> explain analyze select * FROM  _timescaledb_internal._partial_view_11 WHERE bucket>='2022-10-10' and bucket<='2022-10-11';

-- It scans only a single chunk, as expected
                                                                                                           QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 GroupAggregate  (cost=2.34..2.37 rows=1 width=36) (actual time=0.008..0.009 rows=0 loops=1)
   Group Key: (time_bucket('1 day'::interval, _hyper_9_19_chunk."time")), _hyper_9_19_chunk.sensor_id
   ->  Sort  (cost=2.34..2.35 rows=1 width=28) (actual time=0.008..0.008 rows=0 loops=1)
         Sort Key: (time_bucket('1 day'::interval, _hyper_9_19_chunk."time")), _hyper_9_19_chunk.sensor_id
         Sort Method: quicksort  Memory: 25kB
         ->  Result  (cost=0.29..2.33 rows=1 width=28) (actual time=0.004..0.004 rows=0 loops=1)
               ->  Index Scan using _hyper_9_19_chunk_sensor_data_time_idx on _hyper_9_19_chunk  (cost=0.29..2.32 rows=1 width=28) (actual time=0.004..0.004 rows=0 loops=1)
                     Index Cond: (("time" >= '2022-10-10 00:00:00+00'::timestamp with time zone) AND ("time" <= '2022-10-12 00:00:00+00'::timestamp with time zone))
                     Filter: ((time_bucket('1 day'::interval, "time") >= '2022-10-10 00:00:00+00'::timestamp with time zone) AND (time_bucket('1 day'::interval, "time") <= '2022-10-11 00:00:00+00'::timestamp with time zone))
 Planning Time: 0.367 ms
 Execution Time: 0.045 ms
@hardikm10 hardikm10 added the bug label Oct 14, 2022
@fabriziomello
Contributor

@hardikm10 currently we support pushdown only for time_bucket(bucket_width, time_dimension), which means the other variations of the time_bucket function will not push down the time filter. I'm looking at our code base to see how difficult it would be to support this, and I can't guarantee a quick fix for it.
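Based on that comment, the distinction between the variants might be sketched like this (signatures per the TimescaleDB time_bucket documentation; the pushdown behavior is the claim from the comment above, as of 2.8.1):

```sql
-- Pushed down (chunk exclusion works):
time_bucket(interval '1 day', time)

-- Not pushed down (every chunk is scanned):
time_bucket(interval '1 day', time, 'Europe/Berlin')           -- timezone
time_bucket(interval '1 day', time, timestamptz '2000-01-03')  -- origin
time_bucket(interval '1 day', time, interval '2 hours')        -- offset
```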

@mkindahl mkindahl self-assigned this Nov 24, 2022
@mkindahl mkindahl removed their assignment Nov 24, 2022
fabriziomello added a commit to fabriziomello/timescaledb that referenced this issue Dec 22, 2022
Added the ability for chunk exclusion to happen when using the
time_bucket(width, ts, tzname, origin, offset) variation.

Fixes timescale#4825
@sebastianvoss

@fabriziomello Thanks for looking into this. We are facing the same issue. Are there any plans to put this on the roadmap? Thanks a lot in advance!

@sebastianvoss

@fabriziomello Sorry to ask again, but is this still being looked at? Thanks for your effort.

@fabifink

fabifink commented Oct 7, 2023

@fabriziomello I wanted to check on this issue as well, as I'm running into the same problem, which makes timezones in caggs unusable at the moment: my cagg policy runs for ~4.5 minutes with only about 4 GB of data in the hypertable.

Is there any way I can support fixing it?

@fabriziomello
Contributor

@fabriziomello I wanted to check on this issue as well, as I'm running into the same problem, which makes timezones in caggs unusable at the moment: my cagg policy runs for ~4.5 minutes with only about 4 GB of data in the hypertable.

Is there any way I can support fixing it?

Hey, there is an ongoing (and somewhat abandoned, due to other priorities) PR to add support for time_bucket with timezone on caggs (#4854), but it still requires some more work: #4854 (review).

I don't have an ETA for it yet due to a lot of internal changes in the company, but this is something that is on my radar.

@fabifink

Hey @fabriziomello, Thanks for the update! I'll keep an eye on the PR then.

@svenklemm svenklemm self-assigned this Jan 19, 2024
svenklemm added a commit to svenklemm/timescaledb that referenced this issue Jan 20, 2024
Since the optional time_bucket arguments like offset, origin and
timezone shift the output by at most one bucket width, we can optimize
these similarly to how we optimize the other time_bucket constraints.

Fixes timescale#4825
svenklemm added a commit that referenced this issue Jan 22, 2024
Since the optional time_bucket arguments like offset, origin and
timezone shift the output by at most one bucket width, we can optimize
these similarly to how we optimize the other time_bucket constraints.

Fixes #4825
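The optimization described in the commit message can be sketched as a predicate rewrite (an illustrative sketch, not the actual planner code):

```sql
-- User-visible filter on the bucketed expression:
select * from sensor_data
where time_bucket('1 day', time, 'UTC') >= '2022-10-10'
  and time_bucket('1 day', time, 'UTC') <= '2022-10-11'
  -- Constraints the planner can now derive on the raw column, widened by
  -- one bucket width to cover any timezone/origin/offset shift; these are
  -- usable for chunk exclusion:
  and time >= timestamptz '2022-10-10' - interval '1 day'
  and time <  timestamptz '2022-10-11' + interval '2 days';
```

The derived range is deliberately wider than the bucket range, so it never filters out a qualifying row; it only lets the planner skip chunks that cannot contain one.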