
[Bug]: Continuous aggregate performance with time_bucket and timezone #4825

Closed
hardikm10 opened this issue Oct 14, 2022 · 6 comments · Fixed by #6552

Comments

@hardikm10

hardikm10 commented Oct 14, 2022

What type of bug is this?

Performance issue

What subsystems and features are affected?

Continuous aggregate, Query planner

What happened?

While investigating a slow, CPU-heavy refresh of a continuous aggregate (C-Agg), we observed that querying the underlying partial view scans the entire hypertable. This partial view, which lives in the _timescaledb_internal schema, is used to read data from the main hypertable during the C-Agg refresh. We are using the time_bucket() function with a timezone argument. If we remove the timezone argument, the query plan looks robust, but with any timezone the produced plan scans all the chunks. We think this might be the reason for the slowness, but we would really appreciate any thoughts.

TimescaleDB version affected

2.8.1

PostgreSQL version used

14.5

What operating system did you use?

MST

What installation method did you use?

Other

What platform did you run on?

Managed Service for TimescaleDB (MST/Aiven)

Relevant log output and stack trace

How can we reproduce the bug?

Setup

create table sensor_data(
  time timestamptz not null,
  sensor_id integer not null,
  cpu double precision null,
  temperature double precision null
);

select * from create_hypertable('sensor_data','time');

INSERT INTO sensor_data
SELECT
time + (INTERVAL '1 minute' * random()) AS time,
sensor_id,
random() AS cpu,
random()* 100 AS temperature
FROM
generate_series(now() - INTERVAL '1 months', now() - INTERVAL '1 week', INTERVAL '10 minute') AS g1(time),
generate_series(1, 100, 1 ) AS g2(sensor_id)
ORDER BY
time;

Create the C-Agg using timezone information in the time_bucket() function:

create materialized view sensor_summary_1day with (timescaledb.continuous) as
select time_bucket('1 day', time, 'UTC') bucket, sensor_id, avg(cpu) as avg_cpu, count(temperature), sum(temperature)
from sensor_data group by bucket, sensor_id;

select id from _timescaledb_catalog.hypertable 
WHERE table_name=(select materialization_hypertable_name 
from timescaledb_information.continuous_aggregates WHERE view_name='sensor_summary_1day');

-- As the plan below shows, the query scans the entire hypertable
twodmit=> explain analyze select * FROM  _timescaledb_internal._partial_view_10 WHERE bucket>='2022-10-10' and bucket<='2022-10-11';
                                                                                                                                                                                 QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Finalize GroupAggregate  (cost=7382.35..7449.39 rows=200 width=36) (actual time=110.657..114.757 rows=0 loops=1)
   Group Key: (time_bucket('1 day'::interval, _hyper_9_16_chunk."time", 'UTC'::text, NULL::timestamp with time zone, NULL::interval)), _hyper_9_16_chunk.sensor_id
   ->  Gather Merge  (cost=7382.35..7441.39 rows=400 width=60) (actual time=110.656..114.754 rows=0 loops=1)
         Workers Planned: 2
         Workers Launched: 2
         ->  Partial GroupAggregate  (cost=6382.33..6395.19 rows=200 width=60) (actual time=73.721..73.725 rows=0 loops=3)
               Group Key: (time_bucket('1 day'::interval, _hyper_9_16_chunk."time", 'UTC'::text, NULL::timestamp with time zone, NULL::interval)), _hyper_9_16_chunk.sensor_id
               ->  Sort  (cost=6382.33..6384.06 rows=691 width=28) (actual time=73.720..73.723 rows=0 loops=3)
                     Sort Key: (time_bucket('1 day'::interval, _hyper_9_16_chunk."time", 'UTC'::text, NULL::timestamp with time zone, NULL::interval)), _hyper_9_16_chunk.sensor_id
                     Sort Method: quicksort  Memory: 25kB
                     Worker 0:  Sort Method: quicksort  Memory: 25kB
                     Worker 1:  Sort Method: quicksort  Memory: 25kB
                     ->  Result  (cost=0.00..6349.74 rows=691 width=28) (actual time=72.673..72.676 rows=0 loops=3)
                           ->  Parallel Append  (cost=0.00..6341.10 rows=691 width=28) (actual time=72.673..72.675 rows=0 loops=3)
                                 ->  Parallel Seq Scan on _hyper_9_16_chunk  (cost=0.00..1927.88 rows=296 width=28) (actual time=25.620..25.620 rows=0 loops=3)
                                       Filter: ((time_bucket('1 day'::interval, "time", 'UTC'::text, NULL::timestamp with time zone, NULL::interval) >= '2022-10-10 00:00:00+00'::timestamp with time zone) AND (time_bucket('1 day'::interval, "time", 'UTC'::text, NULL::timestamp with time zone, NULL::interval) <= '2022-10-11 00:00:00+00'::timestamp with time zone))
                                       Rows Removed by Filter: 33600
                                 ->  Parallel Seq Scan on _hyper_9_17_chunk  (cost=0.00..1927.88 rows=296 width=28) (actual time=33.357..33.358 rows=0 loops=2)
                                       Filter: ((time_bucket('1 day'::interval, "time", 'UTC'::text, NULL::timestamp with time zone, NULL::interval) >= '2022-10-10 00:00:00+00'::timestamp with time zone) AND (time_bucket('1 day'::interval, "time", 'UTC'::text, NULL::timestamp with time zone, NULL::interval) <= '2022-10-11 00:00:00+00'::timestamp with time zone))
                                       Rows Removed by Filter: 50400
                                 ->  Parallel Seq Scan on _hyper_9_18_chunk  (cost=0.00..1927.88 rows=296 width=28) (actual time=59.164..59.164 rows=0 loops=1)
                                       Filter: ((time_bucket('1 day'::interval, "time", 'UTC'::text, NULL::timestamp with time zone, NULL::interval) >= '2022-10-10 00:00:00+00'::timestamp with time zone) AND (time_bucket('1 day'::interval, "time", 'UTC'::text, NULL::timestamp with time zone, NULL::interval) <= '2022-10-11 00:00:00+00'::timestamp with time zone))
                                       Rows Removed by Filter: 100800
                                 ->  Parallel Seq Scan on _hyper_9_19_chunk  (cost=0.00..411.94 rows=64 width=28) (actual time=13.789..13.789 rows=0 loops=1)
                                       Filter: ((time_bucket('1 day'::interval, "time", 'UTC'::text, NULL::timestamp with time zone, NULL::interval) >= '2022-10-10 00:00:00+00'::timestamp with time zone) AND (time_bucket('1 day'::interval, "time", 'UTC'::text, NULL::timestamp with time zone, NULL::interval) <= '2022-10-11 00:00:00+00'::timestamp with time zone))
                                       Rows Removed by Filter: 21500
                                 ->  Parallel Seq Scan on _hyper_9_15_chunk  (cost=0.00..142.06 rows=22 width=28) (actual time=1.477..1.477 rows=0 loops=1)
                                       Filter: ((time_bucket('1 day'::interval, "time", 'UTC'::text, NULL::timestamp with time zone, NULL::interval) >= '2022-10-10 00:00:00+00'::timestamp with time zone) AND (time_bucket('1 day'::interval, "time", 'UTC'::text, NULL::timestamp with time zone, NULL::interval) <= '2022-10-11 00:00:00+00'::timestamp with time zone))
                                       Rows Removed by Filter: 7400
 Planning Time: 0.438 ms
 Execution Time: 114.825 ms
(31 rows)
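A possible workaround (an illustrative sketch, not something proposed in this thread) is to run the aggregation against the hypertable directly and add a redundant predicate on the raw time column, widened by one bucket width, so the planner can exclude chunks even though the filter on the bucketed expression is not pushed down:

```sql
-- Hypothetical workaround: keep the exact filter on the bucketed
-- expression, and add a conservatively widened filter on the raw "time"
-- column that the planner can use for chunk exclusion.
select time_bucket('1 day', time, 'UTC') as bucket, sensor_id,
       avg(cpu) as avg_cpu, count(temperature), sum(temperature)
from sensor_data
where time_bucket('1 day', time, 'UTC') between '2022-10-10' and '2022-10-11'
  and time >= timestamptz '2022-10-10' - interval '1 day'
  and time <  timestamptz '2022-10-11' + interval '2 days'
group by bucket, sensor_id;
```

The raw-column predicate is a strict superset of the bucket predicate, so the result set is unchanged; it only hands the planner an exclusion-friendly constraint.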

However, if I remove the timezone information from the C-Agg definition:

create materialized view no_tz_sensor_summary_1day with (timescaledb.continuous) as
select time_bucket('1 day', time) bucket, sensor_id, avg(cpu) as avg_cpu, count(temperature), sum(temperature)
from sensor_data group by bucket, sensor_id;

select id from _timescaledb_catalog.hypertable 
WHERE table_name=(select materialization_hypertable_name 
from timescaledb_information.continuous_aggregates WHERE view_name='no_tz_sensor_summary_1day');

twodmit=> explain analyze select * FROM  _timescaledb_internal._partial_view_11 WHERE bucket>='2022-10-10' and bucket<='2022-10-11';

-- It scans only a single chunk, as expected
                                                                                                           QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 GroupAggregate  (cost=2.34..2.37 rows=1 width=36) (actual time=0.008..0.009 rows=0 loops=1)
   Group Key: (time_bucket('1 day'::interval, _hyper_9_19_chunk."time")), _hyper_9_19_chunk.sensor_id
   ->  Sort  (cost=2.34..2.35 rows=1 width=28) (actual time=0.008..0.008 rows=0 loops=1)
         Sort Key: (time_bucket('1 day'::interval, _hyper_9_19_chunk."time")), _hyper_9_19_chunk.sensor_id
         Sort Method: quicksort  Memory: 25kB
         ->  Result  (cost=0.29..2.33 rows=1 width=28) (actual time=0.004..0.004 rows=0 loops=1)
               ->  Index Scan using _hyper_9_19_chunk_sensor_data_time_idx on _hyper_9_19_chunk  (cost=0.29..2.32 rows=1 width=28) (actual time=0.004..0.004 rows=0 loops=1)
                     Index Cond: (("time" >= '2022-10-10 00:00:00+00'::timestamp with time zone) AND ("time" <= '2022-10-12 00:00:00+00'::timestamp with time zone))
                     Filter: ((time_bucket('1 day'::interval, "time") >= '2022-10-10 00:00:00+00'::timestamp with time zone) AND (time_bucket('1 day'::interval, "time") <= '2022-10-11 00:00:00+00'::timestamp with time zone))
 Planning Time: 0.367 ms
 Execution Time: 0.045 ms
@hardikm10 hardikm10 added the bug label Oct 14, 2022
@fabriziomello
Contributor

@hardikm10 currently we support pushdown only for time_bucket(bucket_width, time_dimension), which means the other variations of the time_bucket function will not push down the time filter. I'm looking at our code base to see how difficult it would be to support this, and I can't guarantee a quick fix for it.
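Based on that comment, the distinction between the variants might be sketched like this (signatures per the TimescaleDB time_bucket documentation; the pushdown behavior is the claim from the comment above, as of 2.8.1):

```sql
-- Pushed down (chunk exclusion works):
time_bucket(interval '1 day', time)

-- Not pushed down (every chunk is scanned):
time_bucket(interval '1 day', time, 'Europe/Berlin')           -- timezone
time_bucket(interval '1 day', time, timestamptz '2000-01-03')  -- origin
time_bucket(interval '1 day', time, interval '2 hours')        -- offset
```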

@mkindahl mkindahl self-assigned this Nov 24, 2022
@mkindahl mkindahl removed their assignment Nov 24, 2022
fabriziomello added a commit to fabriziomello/timescaledb that referenced this issue Dec 22, 2022
Added the ability for chunk exclusion to happen when using the
time_bucket(width, ts, tzname, origin, offset) variation.

Fixes timescale#4825
@sebastianvoss

@fabriziomello Thanks for looking into this. We are facing the same issue. Are there any plans to put this on the roadmap? Thanks a lot in advance!

@sebastianvoss

@fabriziomello Sorry to ask again, but is this still being looked at? Thanks for your effort.

@fabifink

fabifink commented Oct 7, 2023

@fabriziomello I wanted to check on this issue as well, as I'm running into the same problem, which makes timezones in caggs unusable at the moment: my cagg policy runs for ~4.5 minutes with only about 4 GB of data in the hypertable.

Is there any way I can support fixing it?

@fabriziomello
Contributor

@fabriziomello I wanted to check on this issue as well, as I'm running into the same problem, which makes timezones in caggs unusable at the moment: my cagg policy runs for ~4.5 minutes with only about 4 GB of data in the hypertable.

Is there any way I can support fixing it?

Hey, there is an ongoing (and somewhat abandoned, due to other priorities) PR to add support for time_bucket with timezone on caggs (#4854), but it still requires some more work: #4854 (review).

I don't have an ETA for it yet due to a lot of internal changes in the company, but this is something that is on my radar.

@fabifink

Hey @fabriziomello, Thanks for the update! I'll keep an eye on the PR then.

@svenklemm svenklemm self-assigned this Jan 19, 2024
svenklemm added a commit to svenklemm/timescaledb that referenced this issue Jan 20, 2024
Since the optional time_bucket arguments like offset, origin and
timezone shift the output by at most one bucket width, we can optimize
these similarly to how we optimize the other time_bucket constraints.

Fixes timescale#4825
svenklemm added a commit that referenced this issue Jan 22, 2024
Since the optional time_bucket arguments like offset, origin and
timezone shift the output by at most one bucket width, we can optimize
these similarly to how we optimize the other time_bucket constraints.

Fixes #4825
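The optimization described in the commit message can be sketched as a predicate rewrite (an illustrative sketch, not the actual planner code):

```sql
-- User-visible filter on the bucketed expression:
select * from sensor_data
where time_bucket('1 day', time, 'UTC') >= '2022-10-10'
  and time_bucket('1 day', time, 'UTC') <= '2022-10-11'
  -- Constraints the planner can now derive on the raw column, widened by
  -- one bucket width to cover any timezone/origin/offset shift; these are
  -- usable for chunk exclusion:
  and time >= timestamptz '2022-10-10' - interval '1 day'
  and time <  timestamptz '2022-10-11' + interval '2 days';
```

The derived range is deliberately wider than the bucket range, so it never filters out a qualifying row; it only lets the planner skip chunks that cannot contain one.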