Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Querying distributed partitined tables & COUNT(DISTINCT) with PG17.2/citus13.0.1 causes "wrong varnullingrels" error #7899

Closed
shimooka opened this issue Feb 13, 2025 · 3 comments · Fixed by #7901

Comments

@shimooka
Copy link

shimooka commented Feb 13, 2025

Running the following SQL query always causes a "wrong varnullingrels" error in Citus 13.0.1 with Postgres 17.2.

Testing environment:

  • Rocky Linux 9
  • Postgres 17.2 (src)
  • Citus 13.0.1
  • 1 coordinator, 6 workers

Tables:

DROP TABLE hashed_partition CASCADE;
CREATE TABLE hashed_partition(
    site_id            INT NOT NULL,
    profile_id         BIGINT      NOT NULL,
    session_id         TEXT        NOT NULL
)
PARTITION BY HASH (
    site_id
);

ALTER TABLE hashed_partition
ADD CONSTRAINT pk_hashed_partition PRIMARY KEY (
    site_id,
    session_id,
    profile_id
);


DROP TABLE ranged_partition CASCADE;
CREATE TABLE ranged_partition(
    site_id INT NOT NULL,
    ts TIMESTAMP NOT NULL,
    action_ts TIMESTAMP NOT NULL,
    session_id TEXT NOT NULL
)
PARTITION BY RANGE (
    action_ts
)
;

CREATE INDEX idx_ranged_partition ON ranged_partition
(
    site_id,
    ts,
    action_ts,
    session_id
);

SELECT create_distributed_table('hashed_partition', 'session_id');
SELECT create_distributed_table('ranged_partition', 'session_id');

Query

SELECT
    COUNT(DISTINCT hashed_partition.profile_id) AS "cnt_landing"
FROM
    ranged_partition
LEFT OUTER JOIN
     hashed_partition
ON (ranged_partition.site_id=hashed_partition.site_id
AND ranged_partition.session_id = hashed_partition.session_id)
WHERE ranged_partition.site_id=1
AND ranged_partition.ts>='2025-01-29 00:00:00'
AND ranged_partition.ts<'2025-02-12 00:00:00'
;

The query executes successfully if:

  • Use INNER JOIN instead of LEFT OUTER JOIN
  • ranged_partition table is not distributed
  • Use PG15 & Citus12.1.6

Logs:

2025-02-13 12:36:10 JST [493436]: [100-1] user=xdata,db=xdata_local_db,app=psql,client=::1 ERROR:  wrong varnullingrels (b 3) (expected (b)) for Var 1/1
2025-02-13 12:36:10 JST [493436]: [101-1] user=xdata,db=xdata_local_db,app=psql,client=::1 STATEMENT:  SELECT
            COUNT(DISTINCT hashed_partition.profile_id) AS "cnt_landing"
        FROM
            ranged_partition
        LEFT OUTER JOIN
             hashed_partition
        ON (ranged_partition.site_id=hashed_partition.site_id
        AND ranged_partition.session_id = hashed_partition.session_id)
        WHERE ranged_partition.site_id=1
        AND ranged_partition.ts>='2025-01-29 00:00:00'
        AND ranged_partition.ts<'2025-02-12 00:00:00'
        ;
2025-02-13 12:36:16 JST [496099]: [1-1] user=[unknown],db=[unknown],app=[unknown],client=::1 LOG:  connection received: host=::1 port=47124
2025-02-13 12:36:16 JST [496099]: [2-1] user=nobody,db=postgres,app=[unknown],client=::1 LOG:  connection authenticated: user="nobody" method=trust (/database/pgdata17/pg_hba.conf:121)
2025-02-13 12:36:16 JST [496099]: [3-1] user=nobody,db=postgres,app=[unknown],client=::1 LOG:  connection authorized: user=nobody database=postgres SSL enabled (protocol=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384, bits=256)
2025-02-13 12:36:16 JST [496099]: [4-1] user=nobody,db=postgres,app=[unknown],client=::1 LOG:  disconnection: session time: 0:00:00.021 user=nobody database=postgres host=::1 port=47124
@shimooka shimooka changed the title Querying distributed HASH&RANGE partitined tables & COUNT(DISTINCT) with PG17.2/citus13.0.1 causes "wrong varnullingrels" error Querying distributed partitined tables & COUNT(DISTINCT) with PG17.2/citus13.0.1 causes "wrong varnullingrels" error Feb 13, 2025
@shimooka
Copy link
Author

This error occurs with PG16.4+citus 12.1.6 too

@colm-mchugh
Copy link
Contributor

Thanks for filing @shimooka . The issue you hit is the same issue as #7787 , we will aim to fix in the next patch release.

@colm-mchugh colm-mchugh linked a pull request Feb 14, 2025 that will close this issue
colm-mchugh added a commit that referenced this issue Feb 18, 2025
DESCRIPTION: Fixes a crash in left outer joins that can happen when
there is an an aggregate on a column from the inner side of the join.

Fix the SEGV seen in #7787 and #7899; it occurs because a column in the
targetlist of a worker subquery can contain a non-empty varnullingrels
field if the column is from the inner side of a left outer join. The
issue can also occur with the columns in the HAVING clause, and this is
also tested in the fix. The issue was triggered by the introduction of
the varnullingrels to Vars in Postgres 16 (2489d76c)

There is a related issue, #7705, where a non-empty varnullingrels was
incorrectly copied into the query tree for the combine query. Here, a
non-empty varnullingrels field of a var is incorrectly copied into the
query tree for a worker subquery.

The regress file from #7705 is used (and renamed) to also test this
(#7787). An alternative test output file is required for Postgres 15
because of an optimization to DISTINCT in Postgres 16 (1349d2790bf).
@colm-mchugh
Copy link
Contributor

colm-mchugh commented Feb 18, 2025

Fixed by PR #7901

colm-mchugh added a commit that referenced this issue Feb 18, 2025
DESCRIPTION: Fixes a crash in left outer joins that can happen when
there is an an aggregate on a column from the inner side of the join.

Fix the SEGV seen in #7787 and #7899; it occurs because a column in the
targetlist of a worker subquery can contain a non-empty varnullingrels
field if the column is from the inner side of a left outer join. The
issue can also occur with the columns in the HAVING clause, and this is
also tested in the fix. The issue was triggered by the introduction of
the varnullingrels to Vars in Postgres 16 (2489d76c)

There is a related issue, #7705, where a non-empty varnullingrels was
incorrectly copied into the query tree for the combine query. Here, a
non-empty varnullingrels field of a var is incorrectly copied into the
query tree for a worker subquery.

The regress file from #7705 is used (and renamed) to also test this
(#7787). An alternative test output file is required for Postgres 15
because of an optimization to DISTINCT in Postgres 16 (1349d2790bf).
colm-mchugh added a commit that referenced this issue Feb 18, 2025
DESCRIPTION: Fixes a crash in left outer joins that can happen when
there is an an aggregate on a column from the inner side of the join.

Fix the SEGV seen in #7787 and #7899; it occurs because a column in the
targetlist of a worker subquery can contain a non-empty varnullingrels
field if the column is from the inner side of a left outer join. The
issue can also occur with the columns in the HAVING clause, and this is
also tested in the fix. The issue was triggered by the introduction of
the varnullingrels to Vars in Postgres 16 (2489d76c)

There is a related issue, #7705, where a non-empty varnullingrels was
incorrectly copied into the query tree for the combine query. Here, a
non-empty varnullingrels field of a var is incorrectly copied into the
query tree for a worker subquery.

The regress file from #7705 is used (and renamed) to also test this
(#7787). An alternative test output file is required for Postgres 15
because of an optimization to DISTINCT in Postgres 16 (1349d2790bf).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants