-
Notifications
You must be signed in to change notification settings - Fork 67
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fault tolerant data input #4546
Labels
Comments
Here's a crude example of implementing the proposal using existing building blocks.
|
There was another recent request for this functionality in brimdata/zui#2933. The user made a couple suggestions that might be worth considering in our design here:
|
Note the changes over the years with "partial loads" as captured in brimdata/zui#2660. |
This was referenced Feb 21, 2024
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
At the time this issue is being opened, Zed is at commit 58e7993.
We've had a few community issues that speak to a desire tor fault tolerant data input, e.g., when a parse error is encountered when reading one of Zed's supported formats, skip over the "bad data" and continue reading more "good" data, when possible.
Indeed, we can see how this could be quite handy in some use cases, such as a user with a large amount of data that has such a parse error deep inside it somewhere. Their goal might be to find a "needle in a haystack" such that if a quick search on the non-corrupt parts reveals what they're looking for, they're done, so in that case having to pause and make the data 100% clean just to read it in and start searching is a hindrance. A quick survey of other tools does show that many CSV/JSON readers do indeed often have options to skip over parsing errors, so Zed could offer a similar option for readers where it makes sense.
In a group discussion a novel approach was proposed where Zed could turn the "bad data" into
error
values that include the bad data itself and a timestamp. As an alternative to just dropping the data as many other tools might do, this would give the user an easier way to see what got skipped & why and perhaps still do crude searches against it or even clean it up and commit the data into new values.As suggested in a comment in #4514, once we have this functionality it would probably be helpful to surface them to clients like Zui in a way that draws the user's attention to the presence & count of the error when they happen. Follow-on issues to deal with that may be opened once this base functionality exists.
The text was updated successfully, but these errors were encountered: