-
Notifications
You must be signed in to change notification settings - Fork 3
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: validate that either a ndjson or tsv file was given as input #569
Conversation
3bc4afc
to
8ee52bf
Compare
e0e2c94
to
747d59f
Compare
747d59f
to
6870236
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How is it possible that duckdb threw such a cryptic/mojibake error?
This looks a bit like out of bounds access, which isn't great.
Validating is good, but I wonder how the error could end up so mangled. Can you point me to the place where duckdb crashed/threw?
throw preprocessing::PreprocessingException(fmt::format( | ||
"Cannot specify both a ndjsonInputFilename ('{}') and metadataFilename('{}').", | ||
ndjson_input_filename.value().string(), | ||
metadata_file.value().string() | ||
)); | ||
} | ||
if (!ndjson_input_filename.has_value() && !metadata_file.has_value()) { | ||
throw preprocessing::PreprocessingException(fmt::format( | ||
"Neither a ndjsonInputFilename ('{}') nor a metadataFilename ('{}') was specified as " |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here you could list the various ways in which they can be passed, but not super important (CLI, env, config file)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I removed it after the above comment.
We have them described in the README. Although the part about default values and the configs therefore being optional is unfortunately no longer valid as of my refactoring here...
In the change log entry I suggest to:
|
If there is a bug in the CLI option part, might make sense to add some basic tests here (E2E, once you get to it :) ) this stuff should be super easy to test for regressions |
I see, and how come it produced mojibake? Because nullpointer access and C++ being unsafe language? I've only done stuff like that in Rust and there it would throw instead of producing gibberish. |
I also thought it would throw an exception (which would still puzzle the end-user) as is documented for the But it turns out the shorthand syntax is just undefined behaviour :( |
Probably this was the evil line?
|
resolves #564
Summary
We did not check that either a ndjson or tsv file is actually specified as the input.
The resulting duckdb error is not parsable for users. Instead we now return a proper error message, also hinting at the possible CLI arguments that can be given instead of amending the preprocessing_config
PR Checklist