-
Notifications
You must be signed in to change notification settings - Fork 184
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Consider lazily deserializing non-scalar fields #713
Comments
We did this for the web version of parsing JSON. I have mixed feelings about that change. Deserializing involves some validation. How would that be handled? |
Sorry I wasn't clear in my original message, but this is about binary deserialization. For JSON it's a bit tricky because (1) we currently rely on |
I was checking C++ protobuf implementation unit tests and came across this: https://github.com/protocolbuffers/protobuf/blob/8d3a7327606b115dda871be54c8b8cb66d41ff90/src/google/protobuf/unittest.proto#L115-L116 It turns out some of the protobuf implementations support this already and they even added an option for it. |
If it was done as an explicit option I guess it would be good. I have considered this in different contexts before - and to me the biggest gripe is that if the encoded message is broken validation errors now can occur on field access, instead of on first |
The reference implementation apparently provides two options: one that eagerly validates, and one that lazily validates: https://github.com/protocolbuffers/protobuf/blob/8d3a7327606b115dda871be54c8b8cb66d41ff90/src/google/protobuf/descriptor.proto#L583-L619
Apparently these options are only supported for message-type fields, but I wonder if they're allowed for other fields? In principle, we can lazily parse any field that is encoded with a length prefix: maps, packed lists, messages, strings. Having these options on strings may be useful in Dart as protobuf strings are encoded in UTF-8 but Dart strings are UTF-16, so we could do the conversion lazily. |
Ah - so it is a per-field option. I would have thought it was something you decided on when parsing a message... |
I think deciding this when parsing might make sense, but I'm not sure what kind of API to provide for this to allow lazily deserializing e.g. field of a field of a field. One potential problem with the per-field option is when I have multiple apps that use the same proto definition, like a server app and a few client apps. It's possible that different applications want different fields to be lazy. I don't know if this is a common problem though. I guess in the worst case you could have a pre-processor to run on the proto definition to tweak |
A colleague asked about this and I think it can be a good idea, and I don't know if this was considered before.
Message, map, packed repeated, and string fields can be lazily deserialized as their encoding have a length prefix so we can efficiently skip over those fields and copy the encodings to deserialize later.
(Instead of copying encodings, we could store a view into the original buffer, though that would keep the entire buffer alive until all of the lazy fields (including those in nested fields) are deserialized.)
This can improve app startup as we delay some of the deserialization costs to field access time.
Since
_FieldSet
already holdsdynamic
elements it shouldn't be too much worse to add another value class.I wonder if this was considered before? @sigurdm @mraleph
The text was updated successfully, but these errors were encountered: