
[Bug]: Duplicate entries for real time group select #7439

Open · TimWeFabricate opened this issue Nov 13, 2024 · 3 comments

@TimWeFabricate
What type of bug is this?

Incorrect result

What subsystems and features are affected?

Query executor, Query planner

What happened?

For the current year, with real time enabled on my materialized view, I get duplicates on the grouping index:

[screenshot of the query result]

The year 2024 is in there twice.

This happened on 2 servers, for only 1 of the 2 types of logs, and only at the year level (month and lower aggregates have no issues).

Code for creating the view:

```sql
CREATE MATERIALIZED VIEW "YearlyEnergyResult"
WITH (timescaledb.continuous,
      timescaledb.create_group_indexes = true,
      timescaledb.materialized_only = false) AS
SELECT "MonthlyEnergyResult"."HeatPumpId",
       time_bucket('1 year'::interval, "MonthlyEnergyResult"."MonthBucket") AS "YearBucket",
       sum("MonthlyEnergyResult"."TotalEInHeating") AS "TotalEInHeating",
       sum("MonthlyEnergyResult"."TotalEInDhw") AS "TotalEInDhw",
       sum("MonthlyEnergyResult"."TotalEInHeatingDefrost") AS "TotalEInHeatingDefrost",
       sum("MonthlyEnergyResult"."TotalEInCooling") AS "TotalEInCooling",
       sum("MonthlyEnergyResult"."TotalEInDhwDefrost") AS "TotalEInDhwDefrost",
       sum("MonthlyEnergyResult"."TotalEOutHeating") AS "TotalEOutHeating",
       sum("MonthlyEnergyResult"."TotalEOutDhw") AS "TotalEOutDhw",
       sum("MonthlyEnergyResult"."TotalEOutHeatingDefrost") AS "TotalEOutHeatingDefrost",
       sum("MonthlyEnergyResult"."TotalEOutDhwDefrost") AS "TotalEOutDhwDefrost",
       sum("MonthlyEnergyResult"."TotalEOutCooling") AS "TotalEOutCooling",
       avg("MonthlyEnergyResult"."AveragePowerEInHeating") AS "AveragePowerEInHeating",
       avg("MonthlyEnergyResult"."AveragePowerEInDhw") AS "AveragePowerEInDhw",
       avg("MonthlyEnergyResult"."AveragePowerEInHeatingDefrost") AS "AveragePowerEInHeatingDefrost",
       avg("MonthlyEnergyResult"."AveragePowerEInCooling") AS "AveragePowerEInCooling",
       avg("MonthlyEnergyResult"."AveragePowerEInDhwDefrost") AS "AveragePowerEInDhwDefrost",
       avg("MonthlyEnergyResult"."AveragePowerEOutHeating") AS "AveragePowerEOutHeating",
       avg("MonthlyEnergyResult"."AveragePowerEOutDhw") AS "AveragePowerEOutDhw",
       avg("MonthlyEnergyResult"."AveragePowerEOutHeatingDefrost") AS "AveragePowerEOutHeatingDefrost",
       avg("MonthlyEnergyResult"."AveragePowerEOutDhwDefrost") AS "AveragePowerEOutDhwDefrost",
       avg("MonthlyEnergyResult"."AveragePowerEOutCooling") AS "AveragePowerEOutCooling"
FROM "MonthlyEnergyResult"
GROUP BY "MonthlyEnergyResult"."HeatPumpId",
         (time_bucket('1 year'::interval, "MonthlyEnergyResult"."MonthBucket"));
```

Notes:

  • The only mitigation I found was turning off real-time querying (the query plan showed me that is where the extra row came from).
  • The values for the extra row are not explainable from the month logs.
  • If I delete the entire view and recreate it, the duplicate persists.
  • If I alter the view to timescaledb.materialized_only = true and then back to false, I get a weird duplicate case:
    it moves to this date; again, I cannot find any duplicates in the hypertable or the month view.
    [screenshot of the duplicate at a specific date]
    Now the issue is specific to this date, which has no significance I could discover with regard to using the aggregate.
  • The other levels of logging are Hour, Day, Week, Month, Year.
  • There is also another aggregate chain, from minutes up to years, that does not have any duplicates. So it is only 1 of the 2 year tables.
  • The issue appears on 2 of our servers.
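The materialized_only toggle mentioned above corresponds to statements like the following (a sketch using standard TimescaleDB syntax; the view name matches the definition above):

```sql
-- Serve only already-materialized data (disables the real-time union branch)
ALTER MATERIALIZED VIEW "YearlyEnergyResult" SET (timescaledb.materialized_only = true);

-- Re-enable real-time aggregation (materialized data plus not-yet-materialized rows)
ALTER MATERIALIZED VIEW "YearlyEnergyResult" SET (timescaledb.materialized_only = false);
```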

I really don't understand how the uniqueness of the GROUP BY can even be violated, hence this bug report. I am really curious what the problem is here.

TimescaleDB version affected

2.10.2

PostgreSQL version used

17.7

What operating system did you use?

Ubuntu 22.04

What installation method did you use?

Docker

What platform did you run on?

Microsoft Azure Cloud

Relevant log output and stack trace

```
Sort  (cost=270.52..270.78 rows=104 width=664) (actual time=4.489..4.500 rows=3 loops=1)
  Sort Key: _materialized_hypertable_44."YearBucket" DESC
  Sort Method: quicksort  Memory: 25kB
  Buffers: shared hit=1405 dirtied=1
  ->  Append  (cost=0.15..267.03 rows=104 width=664) (actual time=0.013..4.493 rows=3 loops=1)
        Buffers: shared hit=1405 dirtied=1
        ->  Custom Scan (ChunkAppend) on _materialized_hypertable_44  (cost=0.15..4.73 rows=2 width=664) (actual time=0.011..0.019 rows=2 loops=1)
              Chunks excluded during startup: 0
              Buffers: shared hit=4
              ->  Index Scan using "_hyper_44_436_chunk__materialized_hypertable_44_HeatPumpId_Year" on _hyper_44_436_chunk  (cost=0.15..2.37 rows=1 width=664) (actual time=0.010..0.011 rows=1 loops=1)
                    Index Cond: (("HeatPumpId" = 'a7ce2f8a-1dcb-4740-b09e-7483072b1f72'::uuid) AND ("YearBucket" < COALESCE(_timescaledb_internal.to_timestamp(_timescaledb_internal.cagg_watermark(44)), '-infinity'::timestamp with time zone)))
                    Buffers: shared hit=2
              ->  Index Scan using "_hyper_44_437_chunk__materialized_hypertable_44_HeatPumpId_Year" on _hyper_44_437_chunk  (cost=0.15..2.37 rows=1 width=664) (actual time=0.004..0.004 rows=1 loops=1)
                    Index Cond: (("HeatPumpId" = 'a7ce2f8a-1dcb-4740-b09e-7483072b1f72'::uuid) AND ("YearBucket" < COALESCE(_timescaledb_internal.to_timestamp(_timescaledb_internal.cagg_watermark(44)), '-infinity'::timestamp with time zone)))
                    Buffers: shared hit=2
        ->  HashAggregate  (cost=254.36..260.74 rows=102 width=664) (actual time=4.463..4.471 rows=1 loops=1)
              Group Key: "*SELECT* 1"."HeatPumpId", time_bucket('1 year'::interval, "*SELECT* 1"."MonthBucket")
              Batches: 1  Memory Usage: 24kB
              Buffers: shared hit=1401 dirtied=1
              ->  Result  (cost=0.15..248.75 rows=102 width=664) (actual time=0.734..4.443 rows=3 loops=1)
                    Buffers: shared hit=1401 dirtied=1
                    ->  Append  (cost=0.15..247.48 rows=102 width=659) (actual time=0.732..4.438 rows=3 loops=1)
                          Buffers: shared hit=1401 dirtied=1
                          ->  Subquery Scan on "*SELECT* 1"  (cost=0.15..21.46 rows=9 width=608) (actual time=0.731..1.435 rows=3 loops=1)
                                Buffers: shared hit=428
                                ->  Custom Scan (ChunkAppend) on _materialized_hypertable_36  (cost=0.15..21.37 rows=9 width=608) (actual time=0.727..1.429 rows=3 loops=1)
                                      Chunks excluded during startup: 7
                                      Buffers: shared hit=428
                                      ->  Index Scan using "_hyper_36_413_chunk__materialized_hypertable_36_HeatPumpId_Mont" on _hyper_36_413_chunk  (cost=0.15..2.38 rows=1 width=664) (actual time=0.726..0.727 rows=1 loops=1)
                                            Index Cond: (("HeatPumpId" = 'a7ce2f8a-1dcb-4740-b09e-7483072b1f72'::uuid) AND ("MonthBucket" < COALESCE(_timescaledb_internal.to_timestamp(_timescaledb_internal.cagg_watermark(36)), '-infinity'::timestamp with time zone)) AND ("MonthBucket" >= COALESCE(_timescaledb_internal.to_timestamp(_timescaledb_internal.cagg_watermark(44)), '-infinity'::timestamp with time zone)))
                                            Buffers: shared hit=214
                                      ->  Index Scan using "_hyper_36_415_chunk__materialized_hypertable_36_HeatPumpId_Mont" on _hyper_36_415_chunk  (cost=0.15..2.38 rows=1 width=664) (actual time=0.697..0.698 rows=2 loops=1)
                                            Index Cond: (("HeatPumpId" = 'a7ce2f8a-1dcb-4740-b09e-7483072b1f72'::uuid) AND ("MonthBucket" < COALESCE(_timescaledb_internal.to_timestamp(_timescaledb_internal.cagg_watermark(36)), '-infinity'::timestamp with time zone)) AND ("MonthBucket" >= COALESCE(_timescaledb_internal.to_timestamp(_timescaledb_internal.cagg_watermark(44)), '-infinity'::timestamp with time zone)))
                                            Buffers: shared hit=214
                          ->  Subquery Scan on "*SELECT* 2"  (cost=218.76..225.50 rows=93 width=664) (actual time=2.995..3.000 rows=0 loops=1)
                                Buffers: shared hit=973 dirtied=1
                                ->  HashAggregate  (cost=218.76..224.57 rows=93 width=664) (actual time=2.995..3.000 rows=0 loops=1)
                                      Group Key: "*SELECT* 1_1"."HeatPumpId", time_bucket('1 mon'::interval, "*SELECT* 1_1"."DayBucket")
                                      Batches: 1  Memory Usage: 24kB
                                      Buffers: shared hit=973 dirtied=1
                                      ->  Result  (cost=0.29..213.65 rows=93 width=620) (actual time=2.994..2.999 rows=0 loops=1)
                                            Buffers: shared hit=973 dirtied=1
                                            ->  Append  (cost=0.29..212.48 rows=93 width=618) (actual time=2.994..2.998 rows=0 loops=1)
                                                  Buffers: shared hit=973 dirtied=1
                                                  ->  Subquery Scan on "*SELECT* 1_1"  (cost=0.29..22.37 rows=9 width=190) (actual time=1.018..1.019 rows=0 loops=1)
                                                        Buffers: shared hit=316
                                                        ->  Custom Scan (ChunkAppend) on _materialized_hypertable_22  (cost=0.29..22.28 rows=9 width=191) (actual time=1.017..1.018 rows=0 loops=1)
                                                              Chunks excluded during startup: 8
                                                              Buffers: shared hit=316
                                                              ->  Index Scan using "_hyper_22_394_chunk__materialized_hypertable_22_DayBucket_idx" on _hyper_22_394_chunk  (cost=0.16..2.39 rows=1 width=137) (actual time=1.016..1.016 rows=0 loops=1)
                                                                    Index Cond: (("DayBucket" < COALESCE(_timescaledb_internal.to_timestamp(_timescaledb_internal.cagg_watermark(22)), '-infinity'::timestamp with time zone)) AND ("DayBucket" >= COALESCE(_timescaledb_internal.to_timestamp(_timescaledb_internal.cagg_watermark(36)), '-infinity'::timestamp with time zone)))
                                                                    Filter: (("HeatPumpId" = 'a7ce2f8a-1dcb-4740-b09e-7483072b1f72'::uuid) AND (time_bucket('1 mon'::interval, "DayBucket") >= COALESCE(_timescaledb_internal.to_timestamp(_timescaledb_internal.cagg_watermark(44)), '-infinity'::timestamp with time zone)))
                                                                    Buffers: shared hit=316
                                                  ->  Subquery Scan on "*SELECT* 2_1"  (cost=183.56..189.65 rows=84 width=664) (actual time=1.974..1.977 rows=0 loops=1)
                                                        Buffers: shared hit=657 dirtied=1
                                                        ->  HashAggregate  (cost=183.56..188.81 rows=84 width=664) (actual time=1.974..1.976 rows=0 loops=1)
                                                              Group Key: "*SELECT* 1_2"."HeatPumpId", time_bucket('1 day'::interval, "*SELECT* 1_2"."HourBucket")
                                                              Batches: 1  Memory Usage: 24kB
                                                              Buffers: shared hit=657 dirtied=1
                                                              ->  Result  (cost=0.30..181.04 rows=84 width=316) (actual time=1.972..1.975 rows=0 loops=1)
                                                                    Buffers: shared hit=657 dirtied=1
                                                                    ->  Append  (cost=0.30..179.99 rows=84 width=317) (actual time=1.972..1.974 rows=0 loops=1)
                                                                          Buffers: shared hit=657 dirtied=1
                                                                          ->  Subquery Scan on "*SELECT* 1_2"  (cost=0.30..29.66 rows=9 width=90) (actual time=1.459..1.460 rows=0 loops=1)
                                                                                Buffers: shared hit=493 dirtied=1
                                                                                ->  Custom Scan (ChunkAppend) on _materialized_hypertable_21  (cost=0.30..29.57 rows=9 width=92) (actual time=1.458..1.458 rows=0 loops=1)
                                                                                      Chunks excluded during startup: 8
                                                                                      Buffers: shared hit=493 dirtied=1
                                                                                      ->  Index Scan using "_hyper_21_392_chunk__materialized_hypertable_21_HeatPumpId_Hour" on _hyper_21_392_chunk  (cost=0.29..9.38 rows=1 width=62) (actual time=1.457..1.457 rows=0 loops=1)
                                                                                            Index Cond: (("HeatPumpId" = 'a7ce2f8a-1dcb-4740-b09e-7483072b1f72'::uuid) AND ("HourBucket" < COALESCE(_timescaledb_internal.to_timestamp(_timescaledb_internal.cagg_watermark(21)), '-infinity'::timestamp with time zone)) AND ("HourBucket" >= COALESCE(_timescaledb_internal.to_timestamp(_timescaledb_internal.cagg_watermark(22)), '-infinity'::timestamp with time zone)))
                                                                                            Filter: ((time_bucket('1 day'::interval, "HourBucket") >= COALESCE(_timescaledb_internal.to_timestamp(_timescaledb_internal.cagg_watermark(36)), '-infinity'::timestamp with time zone)) AND (time_bucket('1 mon'::interval, time_bucket('1 day'::interval, "HourBucket")) >= COALESCE(_timescaledb_internal.to_timestamp(_timescaledb_internal.cagg_watermark(44)), '-infinity'::timestamp with time zone)))
                                                                                            Rows Removed by Filter: 16
                                                                                            Buffers: shared hit=493 dirtied=1
                                                                          ->  Subquery Scan on "*SELECT* 2_2"  (cost=144.47..149.91 rows=75 width=344) (actual time=0.512..0.513 rows=0 loops=1)
                                                                                Buffers: shared hit=164
                                                                                ->  HashAggregate  (cost=144.47..149.16 rows=75 width=344) (actual time=0.512..0.512 rows=0 loops=1)
                                                                                      Group Key: "EnergyLog"."HeatPumpId", time_bucket('01:00:00'::interval, "EnergyLog"."Timestamp")
                                                                                      Batches: 1  Memory Usage: 24kB
                                                                                      Buffers: shared hit=164
                                                                                      ->  Custom Scan (ChunkAppend) on "EnergyLog"  (cost=0.29..142.22 rows=75 width=104) (actual time=0.509..0.509 rows=0 loops=1)
                                                                                            Chunks excluded during startup: 74
                                                                                            Buffers: shared hit=164
                                                                                            ->  Index Scan Backward using _hyper_1_401_chunk_ix_energy_time on _hyper_1_401_chunk  (cost=0.29..2.55 rows=1 width=104) (actual time=0.508..0.508 rows=0 loops=1)
                                                                                                  Index Cond: (("HeatPumpId" = 'a7ce2f8a-1dcb-4740-b09e-7483072b1f72'::uuid) AND ("Timestamp" >= COALESCE(_timescaledb_internal.to_timestamp(_timescaledb_internal.cagg_watermark(21)), '-infinity'::timestamp with time zone)))
                                                                                                  Filter: ((time_bucket('01:00:00'::interval, "Timestamp") >= COALESCE(_timescaledb_internal.to_timestamp(_timescaledb_internal.cagg_watermark(22)), '-infinity'::timestamp with time zone)) AND (time_bucket('1 day'::interval, time_bucket('01:00:00'::interval, "Timestamp")) >= COALESCE(_timescaledb_internal.to_timestamp(_timescaledb_internal.cagg_watermark(36)), '-infinity'::timestamp with time zone)) AND (time_bucket('1 mon'::interval, time_bucket('1 day'::interval, time_bucket('01:00:00'::interval, "Timestamp"))) >= COALESCE(_timescaledb_internal.to_timestamp(_timescaledb_internal.cagg_watermark(44)), '-infinity'::timestamp with time zone)))
                                                                                                  Buffers: shared hit=164
```

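The plan illustrates how real-time aggregation works: the first Append branch reads already-materialized chunks below the continuous aggregate's watermark, while the second branch aggregates newer rows on the fly at or above it, so a duplicate year suggests the two branches overlap. The watermark visible in the plan can be inspected directly (a diagnostic sketch; the `_timescaledb_internal` functions are internal and may change between versions, and 44 is the materialized hypertable id from the plan above):

```sql
-- Watermark of the year-level aggregate (hypertable id 44 in the plan above)
SELECT _timescaledb_internal.to_timestamp(_timescaledb_internal.cagg_watermark(44));
```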
How can we reproduce the bug?

I do not know how to reproduce it. One dataset has it, one doesn't. It's only on 1 server, and I cannot see any duplicates or incorrect ordering in my hypertables. It is also odd that it only happens for the year level with real time enabled.

So:
1. Create the view as stated above
2. Create a continuous aggregate policy for a weekly refresh of the year logs
3. Wait 11 months
4. Duplicate appears
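Step 2 would typically be set up with a refresh policy along these lines (a sketch; the offset and interval values are illustrative, not the exact ones used here):

```sql
SELECT add_continuous_aggregate_policy('YearlyEnergyResult',
    start_offset      => INTERVAL '3 years',
    end_offset        => INTERVAL '1 hour',
    schedule_interval => INTERVAL '1 week');
```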
@fabriziomello (Contributor)

@TimWeFabricate are you sure the Postgres version is 17.7? The latest version today is 17.0, and 17.1 will be released tomorrow.

Another version-related point: TimescaleDB 2.10.2 is quite old (and no longer maintained), so I strongly recommend first updating to 2.17.2 (the latest stable) and trying again.

@kjell-van-straaten

Hey - I am a collaborator on this problem. We are actually using Postgres 14.7. Full details are:
PostgreSQL 14.7 (Ubuntu 14.7-1.pgdg22.04+1) on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 11.3.0-1ubuntu1~22.04) 11.3.0, 64-bit

This also means we are using dockerized postgres on a linux vm :)

@fabriziomello (Contributor)

> Hey - I am a collaborator on this problem. We are actually using Postgres 14.7. Full details are: PostgreSQL 14.7 (Ubuntu 14.7-1.pgdg22.04+1) on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 11.3.0-1ubuntu1~22.04) 11.3.0, 64-bit

No worries, and thanks for reaching out. It was clear that it was a typo, but it is important to know the exact Postgres version; based on your current TSDB version it could be 14.7, 13.7, or 12.7 (https://docs.timescale.com/self-hosted/latest/upgrades/upgrade-pg/) :-)

> This also means we are using dockerized postgres on a linux vm :)

Would it be possible to upgrade your version to 2.17.2 and recheck whether the problem remains?
