Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Don't panic if object_store connector can't read a table #2255

Merged
merged 1 commit into from
Dec 12, 2023
Merged

Don't panic if object_store connector can't read a table #2255

merged 1 commit into from
Dec 12, 2023

Conversation

Jesse-Bakker
Copy link
Contributor

Users will from time to time place files with an inconsistent schema
in their S3 buckets. Instead of panicking when that happens, log an
error.

Users will from time to time place files with an inconsistent schema
in their S3 buckets. Instead of panicking when that happens, log an
error.
Copy link
Contributor

@abcpro1 abcpro1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks to me that the schema of the first file is selected, and any subsequent file has a different schema is skipped, and the ingestion continues with the remaining files. Did I understand correctly?

@Jesse-Bakker
Copy link
Contributor Author

@abcpro1 yes, that's basically it. I think we want to completely rework this connector sometime soon, as it has quite a few sharp edges, but I think this should do the job for now.

@abcpro1 abcpro1 added this pull request to the merge queue Dec 12, 2023
@abcpro1
Copy link
Contributor

abcpro1 commented Dec 12, 2023

I see. Is it OK to skip files?

@Jesse-Bakker
Copy link
Contributor Author

What we did before was panicking, and I think skipping while logging the error is better than that, but we could make it configurable in the future. A third way might be to check if the schema is a subset and ingest with the missing columns as NULL, which is a decent alternative to skipping. We can add that behind a config value when we rework this connector. I think the more common case is that an unrelated file is added by accident though, in which it should be skipped anyway.

Merged via the queue into getdozer:main with commit a479aef Dec 12, 2023
7 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants