Support quoting columns in inferred schemas #337

Open
wants to merge 1 commit into
base: main


@shaug shaug commented Dec 24, 2024

Description & motivation

If `infer_schema` is set to `quote`, the inferred columns will be quoted. This is useful when, for example, the schemas are generated from nested JSON objects.

Addresses issue #336
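
To illustrate what quoting the inferred columns means, here is a minimal Python sketch. It is not the macro from this PR: the helper names `quote_identifier` and `column_list` and the sample column names are hypothetical, chosen only to show how names containing periods or mixed case could be wrapped in double quotes.

```python
def quote_identifier(name: str) -> str:
    """Wrap a column name in double quotes, doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def column_list(names, quote=False):
    """Render a comma-separated column list, optionally quoting every name."""
    rendered = [quote_identifier(n) if quote else n for n in names]
    return ", ".join(rendered)


# Hypothetical column names, as might be inferred from flattened nested JSON.
inferred = ["id", "address.city", "orderTotal"]

unquoted_sql = column_list(inferred)
quoted_sql = column_list(inferred, quote=True)

print(unquoted_sql)
print(quoted_sql)
```

In the unquoted form, a name such as `address.city` would read as a qualified reference rather than a single column name, which is the kind of case quoting is meant to cover.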

Checklist

  • I have verified that these changes work locally
  • I have updated the README.md (if applicable)
  • I have added an integration test for my fix/feature (if applicable)

If `infer_schema` is set to `quote`, then the columns will be quoted.
This is useful in scenarios where, e.g., the schemas are generated from
nested JSON objects.
@shaug shaug requested a review from jeremyyeo as a code owner December 24, 2024 00:21

shaug commented Jan 21, 2025

Hi @jeremyyeo, does my approach here seem sound? I'd love to address this one way or the other, because the lack of quoting in inferred schemas is a significant shortcoming for our JSON-based schemas.

@bdavis-dfw

We would like to see this or a similar change implemented as well. We have mixed-case columns in Parquet files that the current macro doesn't handle properly. The infer_schema = quote approach is conservative and makes sense, but I believe you could also just change the infer_schema = true scenario to use your logic. We currently have a custom macro set up that way.

If a column in the external file is clean and UPPER case, Snowflake will basically ignore the quotes anyway. If it's mixed case or contains periods, the quotes will take effect. Either way, this solution would work for our use case too. We would like to avoid having to use a custom version of the macro in our code.
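
The point about clean upper-case names can be sketched in Python. This is a simplified model of Snowflake's default identifier resolution (unquoted identifiers fold to upper case, quoted identifiers keep their exact spelling), not code from the macro. The function name `resolve` is hypothetical, and the model assumes default session settings.

```python
def resolve(identifier: str) -> str:
    """Return the name an identifier resolves to under the simplified model.

    Assumption: default settings, where unquoted identifiers are folded to
    upper case and double-quoted identifiers are taken literally.
    """
    is_quoted = (
        len(identifier) >= 2
        and identifier.startswith('"')
        and identifier.endswith('"')
    )
    if is_quoted:
        # Strip the outer quotes and un-double any embedded quotes.
        return identifier[1:-1].replace('""', '"')
    return identifier.upper()


# Clean upper-case name: quoting makes no difference.
same = resolve('"ORDER_ID"') == resolve("ORDER_ID") == resolve("order_id")

# Mixed-case name: only the quoted form keeps the original spelling.
quoted_mixed = resolve('"orderTotal"')
unquoted_mixed = resolve("orderTotal")

print(same, quoted_mixed, unquoted_mixed)
```

Under this model, always quoting would change nothing for clean upper-case columns, while mixed-case columns would resolve correctly only when quoted.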


shaug commented Feb 5, 2025

@bdavis-dfw Yeah, I'm not opposed to making it work with just infer_schema = true, but I didn't want to change existing library behavior for this mode without feedback. Happy to change the implementation to always quote, but I'd love to hear from the actual project owners about their desired approach to this issue.
