Why require a fieldseparator after a null value? #2
Comments
I would get rid of […]. This can even simplify the spec by having only 2 special bytes: […]
No, that won't work because you won't be able to store an empty string (which is not the same as a null value).
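To make the distinction concrete: an empty string is encoded as a value terminator with no payload, so if null were also written as a bare 0xFF the two would be indistinguishable. A minimal Python sketch of the current layout, assuming 0xFD ends a row (as implied by split(string, 0xFD) later in this thread); encode_rsv is a hypothetical name, not an API from the RSV repository:

```python
# Special bytes as discussed in this thread: 0xFF ends a value,
# 0xFE marks a null value, 0xFD ends a row (assumed).
VALUE_TERMINATOR = b"\xFF"
NULL_VALUE = b"\xFE"
ROW_TERMINATOR = b"\xFD"

def encode_rsv(rows):
    """Encode a list of rows (lists of str or None) in the current RSV layout."""
    out = bytearray()
    for row in rows:
        for value in row:
            if value is None:
                out += NULL_VALUE              # null: 0xFE, still followed by 0xFF
            else:
                out += value.encode("utf-8")   # empty string: zero payload bytes
            out += VALUE_TERMINATOR
        out += ROW_TERMINATOR
    return bytes(out)

print(encode_rsv([["", None]]).hex(" "))  # ff fe ff fd
```

Here the empty string becomes just ff, while null becomes fe ff.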
I agree with Rob's suggestion and would like to add one additional reason. I would like to interpret 0xFF as "end of value that is present" and 0xFE as "end of value that is not present/disabled". The algorithm then becomes "read bytes up to an end mark" in either case (it's just that, in the current case, all nulls are empty strings). This helps in my intended use case (or maybe it is a debugging use case?), where values can be toggled on/off by changing one byte (0xFF to 0xFE) while keeping the file otherwise unmodified. I understand it is probably outside the original scope of RSV, but I thought it would be better to speak up now and potentially avoid a confusion-inducing RSV fork for our oddball case.
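The "read bytes up to an end mark" algorithm could look roughly like the sketch below. Note that this is the *proposed* layout, not RSV as currently specified: it assumes 0xFF and 0xFE are both value terminators (0xFE yielding None and ignoring any payload before it) and that 0xFD ends a row; decode_end_marks is a hypothetical name.

```python
END_PRESENT = 0xFF      # "end of value that is present"
END_NOT_PRESENT = 0xFE  # "end of value that is not present/disabled"
END_ROW = 0xFD          # row terminator (assumed)

def decode_end_marks(data: bytes):
    """Decode rows where every value is closed by exactly one end mark."""
    rows, row, buf = [], [], bytearray()
    for b in data:  # iterating over bytes yields ints
        if b == END_PRESENT:
            row.append(buf.decode("utf-8"))
            buf.clear()
        elif b == END_NOT_PRESENT:
            row.append(None)  # disabled: payload stays in the file but is ignored
            buf.clear()
        elif b == END_ROW:
            if buf:
                raise ValueError("row ended inside an unterminated value")
            rows.append(row)
            row = []
        else:
            buf.append(b)
    if buf or row:
        raise ValueError("incomplete row at end of input")
    return rows

data = bytearray(b"a\xffb\xff\xfd")
print(decode_end_marks(bytes(data)))  # [['a', 'b']]
data[3] = END_NOT_PRESENT             # flip the end mark of "b" from 0xFF to 0xFE
print(decode_end_marks(bytes(data)))  # [['a', None]]
```

Toggling a value touches a single byte and leaves its payload in place, which is the property described above.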
I think having multiple separators complicates decoding and is a minor optimization for most use cases. As things stand, in some languages you can use split(string, 0xFD) and split(string, 0xFF) to split the rows and fields, and use iterators to operate on the results. This is nice.
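A split-based decoder for the current layout (0xFD ends a row, 0xFF ends a value, 0xFE alone is null) might look like the sketch below. It relies on Python's bytes.split and on the fact that valid UTF-8 never contains the bytes 0xFD to 0xFF. The name decode_rsv_split is hypothetical, and this is not the implementation from the RSV-Challenge repository.

```python
def decode_rsv_split(data: bytes):
    """Decode current-layout RSV by splitting on the special bytes."""
    if data and not data.endswith(b"\xFD"):
        raise ValueError("incomplete row")
    rows = []
    # Every row ends with 0xFD, so the last split piece is always empty.
    for row_bytes in data.split(b"\xFD")[:-1]:
        if row_bytes and not row_bytes.endswith(b"\xFF"):
            raise ValueError("incomplete value")
        # Every value ends with 0xFF, so drop the trailing empty piece again.
        row = [None if value == b"\xFE" else value.decode("utf-8")
               for value in row_bytes.split(b"\xFF")[:-1]]
        rows.append(row)
    return rows

print(decode_rsv_split(b"a\xff\xfe\xff\xff\xfd"))  # [['a', None, '']]
```

Because the null marker is always followed by 0xFF, the field split never has to look at which terminator ended a value.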
A […] Also, without a […]
Very true regarding streaming, though split could be set up to stream as well. I haven't examined many examples in this repo, but the ones I have seen don't appear to be set up for streaming. I'm not advocating removing the […] What I was getting at, and didn't explain well, is that adding decisions to the code adds complexity to the decoding process. My understanding is that the point of RSV is simplicity. I realize that having multiple field separators, which make a field null or not, does not add great complexity, but those sorts of decisions can quickly make an elegant solution much less so. That sort of functionality shouldn't be in RSV; add a filter one layer up.
@dcoai I'm still not sure I understand. You're advocating keeping the field separator after a null value instead of what I proposed in this issue (getting rid of the field separator after a null value)? Because, yes, that would make sense when using […] I think also the (common) […]
Yes, this is what I'm advocating and the main point I was trying to get at. I should also add, I have very little investment in RSV at this point, so my opinion doesn't/shouldn't carry much weight. It is just my opinion.
To briefly restate what I was getting at, hopefully more clearly: the way I see it, […]
@Stenway
Hi there, split is definitely not ideal, and sometimes a programming language has no built-in byte-array-based split method. But when one is available, it usually leads to fewer lines of code for the decoder. I've implemented versions in Python, Go and C# for testing purposes. The Python implementation is actually quite nice, I think: https://github.com/Stenway/RSV-Challenge/blob/main/Python/rsv.py#L70 So sacrificing this might harm adoption. Nevertheless, I've added an Experiments folder to the repository, with an implementation that omits the terminator byte after the null byte, for comparison: […] Cheers
Why would you require a field separator (0xFF) after a null value (0xFE)? Why not have 0xFE signal both the null value and "end of field"? That would make reading RSV much easier, because when reading a 0xFE you can immediately emit the null value and don't need any more shenanigans: waiting to read the 0xFF, looking back in the buffer to check whether it holds one and only one byte and whether that byte is null, etc. That way you can make it 'streaming' much easier IMHO.
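For comparison, here is what incremental decoding of the *current* layout involves: instead of looking back in the buffer, a single flag remembers that a 0xFE was seen, and the following 0xFF emits the null. This is a minimal Python sketch assuming chunked input (e.g. from a file or socket) and 0xFD as the row terminator; stream_rsv is a hypothetical name. The pending_null flag and its checks are exactly the extra state that the proposal above would remove.

```python
from typing import Iterable, Iterator, List, Optional

def stream_rsv(chunks: Iterable[bytes]) -> Iterator[List[Optional[str]]]:
    """Yield each row of current-layout RSV as soon as its 0xFD arrives."""
    row: List[Optional[str]] = []
    buf = bytearray()        # payload bytes of the value being read
    pending_null = False     # set by 0xFE, consumed by the next 0xFF
    for chunk in chunks:     # chunk boundaries may fall anywhere, even mid-character
        for b in chunk:
            if b == 0xFF:    # end of value: emit null or the decoded text
                row.append(None if pending_null else buf.decode("utf-8"))
                buf.clear()
                pending_null = False
            elif b == 0xFE:  # null marker: must be the only byte of its value
                if buf or pending_null:
                    raise ValueError("0xFE must be the only byte of a value")
                pending_null = True
            elif b == 0xFD:  # end of row
                if buf or pending_null:
                    raise ValueError("row ended inside an unterminated value")
                yield row
                row = []
            else:            # ordinary payload byte
                if pending_null:
                    raise ValueError("0xFE must be the only byte of a value")
                buf.append(b)
    if buf or pending_null or row:
        raise ValueError("incomplete row at end of input")

print(list(stream_rsv([b"a\xff\xfe", b"\xff\xfd\xff\xfd"])))  # [['a', None], ['']]
```

Decoding happens only at 0xFF, so a multi-byte UTF-8 character split across two chunks is still handled correctly.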