Parquet structured column maps to JSONBOID by default which causes error on scan #178
Open
2 tasks done
Labels
bug
Something isn't working
What happens?
[XX000] ERROR: Column messages has Arrow data type List(Field { name: "l", data_type: Struct([Field { name: "content", data_type: Utf8, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }, Field { name: "role", data_type: Utf8, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }]), nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }) but is mapped to the BuiltIn(JSONBOID) type in Postgres, which are incompatible. If you believe this conversion should be supported, please submit a request at https://github.com/paradedb/paradedb/ issues.
So does ParadeDB actually support
jsonb
for these structured columns, or does it not?To Reproduce
See dataset on Huggingface; it's split into a handful of Parquet files, and I'm not sure if that's exactly relevant, but it may be? We deal with many HF datasets in this manner, and so far had no problems. I was under the assumption that pg_analytics supported JSON natively, and it seems to fail unless the conversion isn't specified exactly:
I wonder if it's possible to similarly override the columns with custom-defined domains? We have a
chat
domain which is ajsonb
with multiple constraints, casts, and helper functions defined over it. However, I'd previously tried to cast to it but pg_analytics couldn't recognise the type:OS:
ppc64el
ParadeDB Version:
v0.2.1
Are you using ParadeDB Docker, Helm, or the extension(s) standalone?
ParadeDB pg_analytics Extension
Full Name:
Ilya Kowalewski
Affiliation:
The Stone Cross Foundation of Ukraine
Did you include all relevant data sets for reproducing the issue?
Yes
Did you include the code required to reproduce the issue?
Did you include all relevant configurations (e.g., CPU architecture, PostgreSQL version, Linux distribution) to reproduce the issue?
The text was updated successfully, but these errors were encountered: