Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Query bytea type result error #53

Open
1 of 2 tasks
kysshsy opened this issue Jul 28, 2024 · 12 comments
Open
1 of 2 tasks

Query bytea type result error #53

kysshsy opened this issue Jul 28, 2024 · 12 comments
Labels
bug Something isn't working good first issue Good for newcomers priority-high High priority issue user-request This issue was directly requested by a user

Comments

@kysshsy
Copy link
Contributor

kysshsy commented Jul 28, 2024

What happens?

I am developing quals push-down feature and have found that the query result is abnormal

pg_lakehouse=# select binary_col from t1   where  binary_col = '\x68656c6c6f';
 binary_col
------------
(0 rows)

pg_lakehouse=# select binary_col from t1   where  binary_col = E'\\x68656c6c6f';
 binary_col
------------
(0 rows)

pg_lakehouse=# select binary_col from t1;
    binary_col
------------------
 \x68656c6c6f
 \x
 \x70617271756574
(3 rows)

pg_lakehouse=# select binary_col from t1   where  binary_col = E'';
 binary_col
------------
 \x
(1 row)

pg_lakehouse=#

To Reproduce

  1. export parquet file in the code. test_arrow_types.parquet
  2. create foreign table
CREATE FOREIGN DATA WRAPPER parquet_wrapper
HANDLER parquet_fdw_handler
VALIDATOR parquet_fdw_validator;

CREATE SERVER parquet_server
FOREIGN DATA WRAPPER parquet_wrapper;

CREATE FOREIGN TABLE t1 (
    boolean_col       boolean,
	int8_col          smallint,
	int16_col         smallint,
	int32_col         integer,
	int64_col         bigint,
	uint8_col         smallint,
	uint16_col        integer,
	uint32_col        bigint,
	uint64_col        numeric(20),
	float32_col       real,
	float64_col       double precision,
	date32_col        date,
	date64_col        date,
	binary_col        bytea,
	large_binary_col  bytea,
	utf8_col          text,
	large_utf8_col    text
)
SERVER parquet_server
OPTIONS (files '/home/kyss/labs/paradedb/test_arrow_types.parquet');
  1. query
select binary_col from t1   where  binary_col = '\x68656c6c6f';
select binary_col from t1   where  binary_col = E'\\x68656c6c6f';
select binary_col from t1   where  binary_col = E'';
select binary_col from t1;

OS:

x86

ParadeDB Version:

0.8.6.

Are you using ParadeDB Docker, Helm, or the extension(s) standalone?

ParadeDB pg_lakehouse Extension

Full Name:

kyss

Affiliation:

NA

Did you include all relevant data sets for reproducing the issue?

Yes

Did you include the code required to reproduce the issue?

  • Yes, I have

Did you include all relevant configurations (e.g., CPU architecture, PostgreSQL version, Linux distribution) to reproduce the issue?

  • Yes, I have
@kysshsy kysshsy added the bug Something isn't working label Jul 28, 2024
@kysshsy
Copy link
Contributor Author

kysshsy commented Jul 28, 2024

The issue occurs in both the FDW and executor hook. I suspect there's an issue with connection::create_arrow. It can query correctly with E'' but fails with non-zero-length bytes.

@destrex271
Copy link

Hi @philippemnoel I would like to look into this

@philippemnoel
Copy link
Collaborator

Hi @philippemnoel I would like to look into this

All yours :) Thank you for looking into it!

@kysshsy
Copy link
Contributor Author

kysshsy commented Jul 31, 2024

Hi, I think it's about the difference of duckdb and postgres hex representation. https://www.postgresql.org/docs/current/datatype-binary.html
https://duckdb.org/docs/sql/data_types/blob

@destrex271
Copy link

destrex271 commented Jul 31, 2024

Hi, I think it's about the difference of duckdb and postgres hex representation. https://www.postgresql.org/docs/current/datatype-binary.html
https://duckdb.org/docs/sql/data_types/blob

That is a possibility, pg might be using the escape format which is causing all the problem.
We might need to decode the string and then encode it into a format that goes well with Duckdb.

But before that I'd like to finish investigating from my side too

@destrex271
Copy link

Hi @kysshsy can you please attach the paraquet file here, if possible?

@kysshsy
Copy link
Contributor Author

kysshsy commented Jul 31, 2024

test_arrow_types.parquet.zip
@destrex271 Yes, github not support parquet files. I forget to zip it.

@destrex271
Copy link

destrex271 commented Aug 6, 2024

Hi, I think it's about the difference of duckdb and postgres hex representation. https://www.postgresql.org/docs/current/datatype-binary.html https://duckdb.org/docs/sql/data_types/blob

Yep seems like this is the actual problem.
I checked the query execution but its working fine.

Will raise a PR as soon as I come up with a possible fix.

@philippemnoel philippemnoel added the good first issue Good for newcomers label Aug 7, 2024
@philippemnoel
Copy link
Collaborator

Hi, I think it's about the difference of duckdb and postgres hex representation. https://www.postgresql.org/docs/current/datatype-binary.html https://duckdb.org/docs/sql/data_types/blob

Yep seems like this is the actual problem. I checked the query execution but its working fine.

Will raise a PR as soon as I come up with a possible fix.

Thank you :)

@philippemnoel philippemnoel transferred this issue from paradedb/paradedb Aug 8, 2024
@philippemnoel philippemnoel added the priority-medium Medium priority issue label Aug 8, 2024
@philippemnoel
Copy link
Collaborator

Hi, I think it's about the difference of duckdb and postgres hex representation. https://www.postgresql.org/docs/current/datatype-binary.html https://duckdb.org/docs/sql/data_types/blob

Yep seems like this is the actual problem. I checked the query execution but its working fine.

Will raise a PR as soon as I come up with a possible fix.

Hi @destrex271. Any update on this?

@destrex271
Copy link

Nope not any as of now, I think it's better if someone else picks it up for now

@philippemnoel
Copy link
Collaborator

Nope not any as of now, I think it's better if someone else picks it up for now

Thank you!

@philippemnoel philippemnoel added priority-high High priority issue user-request This issue was directly requested by a user and removed priority-medium Medium priority issue labels Oct 20, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working good first issue Good for newcomers priority-high High priority issue user-request This issue was directly requested by a user
Projects
None yet
Development

No branches or pull requests

3 participants