auto_detect maxes out at 24 columns with json? #4055
Comments
Thanks for the report! This is related to the `map_inference_threshold` setting.
@Mytherin thank you! Exactly what I was looking for.
Let's keep the issue open so we can remember to document it :)
Updating docs to include `map_inference_threshold`. PR: duckdb/duckdb#11285. Issue: duckdb#4055. Discord: https://discord.com/channels/909674491309850675/921125471905787944/1305689816020549754
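For reference, a minimal sketch of the workaround discussed above, assuming `read_json` accepts `map_inference_threshold` as a named parameter and that `wide_records.ndjson` (a hypothetical file name) has objects with more than 24 top-level keys; the threshold value shown is illustrative, not a recommended default:

```sql
-- Raising the threshold keeps wide JSON objects from being collapsed
-- during schema auto-detection, so each key becomes its own column.
SELECT *
FROM read_json('wide_records.ndjson', map_inference_threshold = 500);
```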
I have noticed that if I feed a newline-delimited JSON file to DuckDB's `read_json()` data import, where each JSON object has 25 or more unique keys, DuckDB just schematizes it as a single JSON column. It doesn't matter whether `auto_detect` is on; it won't infer the schema for a JSON object with more than 24 keys (even if the keyspace is consistent across all lines in the JSON file). This seems like a strange/undocumented limitation (or I am missing something obvious).
Here's some code which reproduces the behavior consistently. We generated two datasets: one with JSON objects of 24 keys, the other with 25. The one with 24 keys has its schema neatly inferred; the one with 25 always comes back as a single JSON column.
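The original reproduction code did not survive in this copy of the issue, so here is a stand-in sketch. The file names, key names, and row count are illustrative, and the DuckDB queries are shown only as comments since they assume the `duckdb` Python package is installed:

```python
import json

def make_ndjson(path: str, num_keys: int, num_rows: int = 100) -> None:
    """Write newline-delimited JSON where every object has the same keys."""
    with open(path, "w") as f:
        for i in range(num_rows):
            row = {f"col_{k}": i * num_keys + k for k in range(num_keys)}
            f.write(json.dumps(row) + "\n")

make_ndjson("keys_24.ndjson", 24)  # reported: schema inferred as 24 columns
make_ndjson("keys_25.ndjson", 25)  # reported: collapses to one JSON column

# With duckdb installed, the difference would show up like this:
#   import duckdb
#   duckdb.sql("DESCRIBE SELECT * FROM read_json('keys_24.ndjson')").show()
#   duckdb.sql("DESCRIBE SELECT * FROM read_json('keys_25.ndjson')").show()
```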
Is there some way to change how this works, or to force DuckDB to do schema inference? I'd even be happy with a view that has the schema inferred (my data has hundreds of columns and I don't know which ones I need at the time I load the data).