-
Notifications
You must be signed in to change notification settings - Fork 1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Protobuf string decoding in adapter google/googledatatransport-firelog-batchlog-protobuf
#71
Comments
To make debugging this easier, I created an MRE using https://www.protobufpal.com/. Schema: syntax = "proto3";
message Message {
string parsedIncorrectly = 5;
string parsedCorrectly = 7;
} Decoded message JSON: {
"parsedIncorrectly": "iPhone9,3",
"parsedCorrectly": "com.chargemap"
} Encoded message base64:
This exhibits the exact same problem. Without a schema, Cyberchef parses it as: {
"5": {
"13": 3687385302716868600
},
"7": "com.chargemap"
} With the schema, it is parsed correctly: {
"parsedIncorrectly": "iPhone9,3",
"parsedCorrectly": "com.chargemap"
} |
This is clearly some heuristic that is failing. While |
Unlike on the full protobuf from the request, Cyberchef's "Show Types" option does work for our MRE: {
"field #5: L-delim (e.g. string, message)": {
"field #13: 64-Bit (e.g. fixed64, double)": 3687385302716868600
},
"field #7: L-delim (e.g. string, message)": "com.chargemap"
} |
This is the relevant function: TrackHAR/src/common/protobuf.mjs Lines 590 to 609 in 36d2292
Both fields are length-delimited fields. For those, the code tries to recursively parse them as a Protobuf message. If that fails, it is interpreted as raw bytes/string. And the way the string |
In
google/googledatatransport-firelog-batchlog-protobuf
(which I'm writing as part of #52), I'm seeing some odd values:That is not us misinterpreting a field or otherwise specifying a wrong
dataPath
, it's a problem of our Protobuf decoder (which we took from Cyberchef).Here's an example of a request where that happens: https://data.tweasel.org/data/requests/informed-consent,101672
If we load that in Cyberchef, we get the same problem:
The fields I marked with an arrow should be
iPhone9,3
anden-GB
, respectively but they are instead interpreted as buffers.As far as I understand it, there is no definitive way to tell from the Protobuf wireformat (without a schema) whether a value is a string or buffer, so libraries apply heuristics, which can of course fail.
https://protogen.marcgravell.com/decode handles this case correctly/offers both interpretations:
I also quickly tried
rawprotoparse
(which is used by HTTPToolkit), but even withrawprotoparse(buffer, { stringMode: 'string' })
, that still interprets both fields as buffers:The text was updated successfully, but these errors were encountered: