Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

partition_data_*() not work when table PK defined with GENERATED ALWAYS #724

Open
waitingsong opened this issue Dec 22, 2024 · 7 comments
Open

Comments

@waitingsong
Copy link

waitingsong commented Dec 22, 2024

I use pgmq with pg_parten,
got error when calling partition_data_time()

cannot insert a non-DEFAULT value into column "msg_id"
DETAIL:  Column "msg_id" is an identity column defined as GENERATED ALWAYS.
HINT:  Use OVERRIDING SYSTEM VALUE to override.
CONTEXT:  SQL statement "WITH partition_data AS (
                DELETE FROM partman_temp_data_storage RETURNING *)
            INSERT INTO pgmq.q_abc0ba_p20241221_153200 (msg_id,read_ct,enqueued_at,vt,message,headers) SELECT msg_id,read_ct,enqueued_at,vt,message,headers FROM partition_data"
PL/pgSQL function partition_data_time(text,integer,interval,numeric,text,boolean,text,text[]) line 245 at EXECUTE

Cause Column "msg_id" is an identity column defined as GENERATED ALWAYS:

CREATE TABLE "pgmq"."Untitled" (
  "msg_id" int8 NOT NULL GENERATED ALWAYS AS IDENTITY , -- <----- here
  "read_ct" int4 NOT NULL DEFAULT 0,
  "enqueued_at" timestamptz(6) NOT NULL DEFAULT now(),
  "vt" timestamptz(6) NOT NULL,
  "message" jsonb,
  "headers" jsonb
)
PARTITION BY RANGE (
  "enqueued_at" "pg_catalog"."timestamptz_ops"
)
;

I committed a PR to handle this situation : #723

@keithf4
Copy link
Collaborator

keithf4 commented Dec 30, 2024

So I'm not able to re-produced this issue when using the p_ignored_columns option. Reading the description of OVERRIDING SYSTEM VALUE, you should only be running into that error when you're actually trying to insert values into that column. The p_ignored_columns option should completely remove that column from any insert statements so that override shouldn't be necessary.
Or are you trying to intentionally override those values? If so, that's a completely different issue and properly handling that would be different than the PR you provided. Your PR makes it so it always overrides no matter what if there's an identity. My assumption when someone has an identity set to GENERATE ALWAYS for any table (not just partitioned) is that no user should be inserting any values into that table without explicitly making that clear. So I would not want that to be the default in partman.

Below is a similar example to what you've given (I just make it all lower case and only set the NOT NULL columns to be the relevant ones to make an easier test case). It works fine ignoring the column and allows the partitioned table to generate the ID columns automatically.

If you can provide me a working example of where you're running into an error, maybe we can figure out a better solution to allow users to override identities if desired.

keith@github724=# CREATE TABLE "pgmq"."untitled" (
  "msg_id" int8 NOT NULL GENERATED ALWAYS AS IDENTITY , -- <----- here
  "read_ct" int4 DEFAULT 0,
  "enqueued_at" timestamptz(6) NOT NULL DEFAULT now(),
  "vt" timestamptz(6),
  "message" jsonb,
  "headers" jsonb
)
PARTITION BY RANGE (
  "enqueued_at" "pg_catalog"."timestamptz_ops"
)
;
CREATE TABLE
Time: 6.201 ms

keith@github724=# create table pgmq.source_untitled (like pgmq.untitled);
CREATE TABLE
Time: 14.238 ms

keith@github724=# alter table pgmq.source_untitled alter msg_id add generated always as identity;
ALTER TABLE
Time: 14.269 ms

keith@github724=# select partman.create_parent('pgmq.untitled', 'enqueued_at', '1 day');
 create_parent 
---------------
 t
(1 row)

Time: 35.883 ms
keith@github724=# \d+ pgmq.untitled
                                                           Partitioned table "pgmq.untitled"
   Column    |            Type             | Collation | Nullable |           Default            | Storage  | Compression | Stats target | Description 
-------------+-----------------------------+-----------+----------+------------------------------+----------+-------------+--------------+-------------
 msg_id      | bigint                      |           | not null | generated always as identity | plain    |             |              | 
 read_ct     | integer                     |           |          | 0                            | plain    |             |              | 
 enqueued_at | timestamp(6) with time zone |           | not null | now()                        | plain    |             |              | 
 vt          | timestamp(6) with time zone |           |          |                              | plain    |             |              | 
 message     | jsonb                       |           |          |                              | extended |             |              | 
 headers     | jsonb                       |           |          |                              | extended |             |              | 
Partition key: RANGE (enqueued_at)
Partitions: pgmq.untitled_p20241226 FOR VALUES FROM ('2024-12-26 00:00:00-05') TO ('2024-12-27 00:00:00-05'),
            pgmq.untitled_p20241227 FOR VALUES FROM ('2024-12-27 00:00:00-05') TO ('2024-12-28 00:00:00-05'),
            pgmq.untitled_p20241228 FOR VALUES FROM ('2024-12-28 00:00:00-05') TO ('2024-12-29 00:00:00-05'),
            pgmq.untitled_p20241229 FOR VALUES FROM ('2024-12-29 00:00:00-05') TO ('2024-12-30 00:00:00-05'),
            pgmq.untitled_p20241230 FOR VALUES FROM ('2024-12-30 00:00:00-05') TO ('2024-12-31 00:00:00-05'),
            pgmq.untitled_p20241231 FOR VALUES FROM ('2024-12-31 00:00:00-05') TO ('2025-01-01 00:00:00-05'),
            pgmq.untitled_p20250101 FOR VALUES FROM ('2025-01-01 00:00:00-05') TO ('2025-01-02 00:00:00-05'),
            pgmq.untitled_p20250102 FOR VALUES FROM ('2025-01-02 00:00:00-05') TO ('2025-01-03 00:00:00-05'),
            pgmq.untitled_p20250103 FOR VALUES FROM ('2025-01-03 00:00:00-05') TO ('2025-01-04 00:00:00-05'),
            pgmq.untitled_default DEFAULT

keith@github724=# insert into pgmq.source_untitled (enqueued_at) values (generate_series('2024-12-26'::timestamptz, '2025-01-02'::timestamptz, '1 hour'::interval));
INSERT 0 169
Time: 18.768 ms

keith@github724=# call partman.partition_data_proc('pgmq.untitled', p_source_table := 'pgmq.source_untitled', p_ignored_columns := ARRAY['msg_id']);              
NOTICE:  Loop: 1, Rows moved: 24
NOTICE:  Loop: 2, Rows moved: 24
NOTICE:  Loop: 3, Rows moved: 24
NOTICE:  Loop: 4, Rows moved: 24
NOTICE:  Loop: 5, Rows moved: 24
NOTICE:  Loop: 6, Rows moved: 24
NOTICE:  Loop: 7, Rows moved: 24
NOTICE:  Loop: 8, Rows moved: 1
NOTICE:  Total rows moved: 169
NOTICE:  Ensure to VACUUM ANALYZE the parent (and source table if used) after partitioning data
CALL
Time: 8118.288 ms (00:08.118)

keith@github724=# select * from pgmq.untitled;
 msg_id | read_ct |      enqueued_at       |   vt   | message | headers 
--------+---------+------------------------+--------+---------+---------
      1 |  «NULL» | 2024-12-26 00:00:00-05 | «NULL» | «NULL»  | «NULL»
      2 |  «NULL» | 2024-12-26 01:00:00-05 | «NULL» | «NULL»  | «NULL»
      3 |  «NULL» | 2024-12-26 02:00:00-05 | «NULL» | «NULL»  | «NULL»
      4 |  «NULL» | 2024-12-26 03:00:00-05 | «NULL» | «NULL»  | «NULL»
      5 |  «NULL» | 2024-12-26 04:00:00-05 | «NULL» | «NULL»  | «NULL»
      6 |  «NULL» | 2024-12-26 05:00:00-05 | «NULL» | «NULL»  | «NULL»
      7 |  «NULL» | 2024-12-26 06:00:00-05 | «NULL» | «NULL»  | «NULL»
      8 |  «NULL» | 2024-12-26 07:00:00-05 | «NULL» | «NULL»  | «NULL»
      9 |  «NULL» | 2024-12-26 08:00:00-05 | «NULL» | «NULL»  | «NULL»
     10 |  «NULL» | 2024-12-26 09:00:00-05 | «NULL» | «NULL»  | «NULL»
[...]

@keithf4
Copy link
Collaborator

keithf4 commented Dec 31, 2024

Created some new unit tests for version 5.2.4 to test the bug I fixed there. I also had those test use GENERATED IDENTITY columns, one even using it for the partition key, and all seems to be working fine as long as I set p_ignored_columns.

#727

So just let me know if you're still having issues even when using the ignored columns options

@keithf4
Copy link
Collaborator

keithf4 commented Dec 31, 2024

Actually, I think I just found the issue. I was testing this in PG17, but apparently the ability for partitioned tables to properly honor IDENTITY was not added until 17!

https://www.postgresql.org/docs/17/release-17.html

Allow partitioned tables to have identity columns (Ashutosh Bapat) 

I got this error when I tested in PG16.

keith@keith=# SELECT partman.partition_data_id('partman_test.id_taptest_table', '20', p_source_table := 'partman_source.id_taptest_table_source', p_ignored_columns := ARRAY['col1']);
ERROR:  null value in column "col1" of relation "id_taptest_table_p0" violates not-null constraint
DETAIL:  Failing row contains (null, stuff, 2024-12-31 17:49:03.486479-05, stuff3000000001).
CONTEXT:  SQL statement "WITH partition_data AS (
            DELETE FROM ONLY partman_source.id_taptest_table_source WHERE col1 >= 0 AND col1 < 10 RETURNING *)
        INSERT INTO partman_test.id_taptest_table_p0 (col2,col3,col4) SELECT col2,col3,col4 FROM partition_data"
PL/pgSQL function partman.partition_data_id(text,integer,bigint,numeric,text,boolean,text,text[]) line 215 at EXECUTE
Time: 15.045 ms

So, this is actually an issue that has been fixed properly in core as of PG17. I don't think I want to allow an override option to get around this for now. The recommended fix would be to upgrade to PostgreSQL 17.

@waitingsong
Copy link
Author

waitingsong commented Jan 6, 2025

I have re producted on quay.io/tembo/pg17-pgmq:v1.5.0 ( pg-partman version 5.1.0)

CREATE TABLE "tb1" (
  "msg_id" int8 NOT NULL GENERATED ALWAYS AS IDENTITY , 
  "read_ct" int4 DEFAULT 0,
  "enqueued_at" timestamptz(6) NOT NULL DEFAULT now()
)
PARTITION BY RANGE (
  "enqueued_at" "pg_catalog"."timestamptz_ops"
);

select create_parent('public.tb1', 'enqueued_at', '1 day');

postgres=#  \d+ public.tb1;
                                                            Partitioned table "public.tb1"
   Column    |            Type             | Collation | Nullable |           Default            | Storage | Compression | Stats target | Description
-------------+-----------------------------+-----------+----------+------------------------------+---------+-------------+--------------+-------------
 msg_id      | bigint                      |           | not null | generated always as identity | plain   |             |              |
 read_ct     | integer                     |           |          | 0                            | plain   |             |              |
 enqueued_at | timestamp(6) with time zone |           | not null | now()                        | plain   |             |              |
Partition key: RANGE (enqueued_at)
Partitions: tb1_p20250102 FOR VALUES FROM ('2025-01-02 00:00:00+00') TO ('2025-01-03 00:00:00+00'),
            tb1_p20250103 FOR VALUES FROM ('2025-01-03 00:00:00+00') TO ('2025-01-04 00:00:00+00'),
            tb1_p20250104 FOR VALUES FROM ('2025-01-04 00:00:00+00') TO ('2025-01-05 00:00:00+00'),
            tb1_p20250105 FOR VALUES FROM ('2025-01-05 00:00:00+00') TO ('2025-01-06 00:00:00+00'),
            tb1_p20250106 FOR VALUES FROM ('2025-01-06 00:00:00+00') TO ('2025-01-07 00:00:00+00'),
            tb1_p20250107 FOR VALUES FROM ('2025-01-07 00:00:00+00') TO ('2025-01-08 00:00:00+00'),
            tb1_p20250108 FOR VALUES FROM ('2025-01-08 00:00:00+00') TO ('2025-01-09 00:00:00+00'),
            tb1_p20250109 FOR VALUES FROM ('2025-01-09 00:00:00+00') TO ('2025-01-10 00:00:00+00'),
            tb1_p20250110 FOR VALUES FROM ('2025-01-10 00:00:00+00') TO ('2025-01-11 00:00:00+00'),
            tb1_default DEFAULT

Insert a row into default part

insert into public.tb1_default (enqueued_at) values ('2025-01-15'::timestamptz);
postgres=# select * from tb1_default;
 msg_id | read_ct |      enqueued_at
--------+---------+------------------------
      1 |       0 | 2025-01-15 00:00:00+00
(1 row)

Merge data:

postgres=# select partition_data_time('public.tb1');
ERROR:  cannot insert a non-DEFAULT value into column "msg_id"
DETAIL:  Column "msg_id" is an identity column defined as GENERATED ALWAYS.
HINT:  Use OVERRIDING SYSTEM VALUE to override.
CONTEXT:  SQL statement "WITH partition_data AS (
                DELETE FROM partman_temp_data_storage RETURNING *)
            INSERT INTO public.tb1_p20250115 (msg_id,read_ct,enqueued_at) SELECT msg_id,read_ct,enqueued_at FROM partition_data"
PL/pgSQL function partition_data_time(text,integer,interval,numeric,text,boolean,text,text[]) line 245 at EXECUTE

@waitingsong
Copy link
Author

using p_ignored_columns

postgres=# select partition_data_time('public.tb1', 1, null, 0, 'ASC', true, NULL, ARRAY['msg_id']);
ERROR:  null value in column "msg_id" of relation "partman_temp_data_storage" violates not-null constraint
DETAIL:  Failing row contains (null, 0, 2025-01-15 00:00:00+00).
CONTEXT:  SQL statement "WITH partition_data AS (
                DELETE FROM public.tb1_default WHERE enqueued_at >= '2025-01-15 00:00:00+00' AND enqueued_at < '2025-01-16 00:00:00+00' RETURNING *)
            INSERT INTO partman_temp_data_storage (read_ct,enqueued_at) SELECT read_ct,enqueued_at FROM partition_data"
PL/pgSQL function partition_data_time(text,integer,interval,numeric,text,boolean,text,text[]) line 232 at EXECUTE

@keithf4
Copy link
Collaborator

keithf4 commented Jan 6, 2025

Ahh I see the issue. I did not test this scenario when dealing with data in the default. The issue is the temp table that's created to hold the data while moving it out of the default then into the proper child (note the table name partman_temp_data_storage).

I'll try and work out a fix for this. In the mean time if you did need to handle this scenario, you could move the data out of the default yourself into another similar table (temporary or not), adjust the premake value so that it creates the necessary future partition in maintenance, then insert that data back in. If the data is in the past, that's a little trickier. You'd have to feed the proper values to the internal create_partition_time() function to have it make the child table you need.

@keithf4
Copy link
Collaborator

keithf4 commented Jan 6, 2025

So I'm going to try and see if I can allow an option for OVERRIDING SYSTEM VALUES. Now that I see where this issue is, I could foresee a scenario where someone may want to preserve the generated value that went into the default or even from an original source table. But others may just want a newly generated value when the data is moved. The latter will be the default action that a user will have to intentionally override.

So, may be a bit before I get this fix in. Thanks again for reporting and helping me narrow down the problem.

@keithf4 keithf4 added this to the 5.3 milestone Jan 6, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants