Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Protobuf string decoding in adapter google/googledatatransport-firelog-batchlog-protobuf #71

Open
baltpeter opened this issue May 2, 2024 · 4 comments
Labels
bug Something isn't working help wanted Extra attention is needed

Comments

@baltpeter
Copy link
Member

In google/googledatatransport-firelog-batchlog-protobuf (which I'm writing as part of #52), I'm seeing some odd values:

image

That is not us misinterpreting a field or otherwise specifying a wrong dataPath, it's a problem of our Protobuf decoder (which we took from Cyberchef).

Here's an example of a request where that happens: https://data.tweasel.org/data/requests/informed-consent,101672

If we load that in Cyberchef, we get the same problem:

image

The fields I marked with an arrow should be iPhone9,3 and en-GB, respectively but they are instead interpreted as buffers.

As far as I understand it, there is no definitive way to tell from the Protobuf wireformat (without a schema) whether a value is a string or buffer, so libraries apply heuristics, which can of course fail.

https://protogen.marcgravell.com/decode handles this case correctly/offers both interpretations:

image

I also quickly tried rawprotoparse (which is used by HTTPToolkit), but even with rawprotoparse(buffer, { stringMode: 'string' }), that still interprets both fields as buffers:

image

@baltpeter baltpeter added the bug Something isn't working label May 2, 2024
@baltpeter baltpeter added the help wanted Extra attention is needed label Jun 27, 2024
@baltpeter
Copy link
Member Author

To make debugging this easier, I created an MRE using https://www.protobufpal.com/.


Schema:

syntax = "proto3";

message Message {
  string parsedIncorrectly = 5;
  string parsedCorrectly = 7;
}

Decoded message JSON:

{
  "parsedIncorrectly": "iPhone9,3",
  "parsedCorrectly": "com.chargemap"
}

Encoded message base64:

KglpUGhvbmU5LDM6DWNvbS5jaGFyZ2VtYXA=

This exhibits the exact same problem.

Without a schema, Cyberchef parses it as:

{
    "5": {
        "13": 3687385302716868600
    },
    "7": "com.chargemap"
}

With the schema, it is parsed correctly:

{
    "parsedIncorrectly": "iPhone9,3",
    "parsedCorrectly": "com.chargemap"
}

@baltpeter
Copy link
Member Author

This is clearly some heuristic that is failing. While iPhone9,3 is parsed incorrectly, bPhone9,3 (KgliUGhvbmU5LDM6DWNvbS5jaGFyZ2VtYXA=) is parsed correctly, for example.

@baltpeter
Copy link
Member Author

Unlike on the full protobuf from the request, Cyberchef's "Show Types" option does work for our MRE:

{
    "field #5: L-delim (e.g. string, message)": {
        "field #13: 64-Bit (e.g. fixed64, double)": 3687385302716868600
    },
    "field #7: L-delim (e.g. string, message)": "com.chargemap"
}

@baltpeter
Copy link
Member Author

This is the relevant function:

_lenDelim(fieldNum) {
// Read off the field length
const length = this._varInt();
const fieldBytes = this.data.slice(this.offset, this.offset + length);
let field;
try {
// Attempt to parse as a new Protobuf Object
const pbObject = new Protobuf(fieldBytes);
field = pbObject._parse();
// Set field types object
this.fieldTypes[fieldNum] = { ...this.fieldTypes[fieldNum], ...pbObject.fieldTypes };
} catch (err) {
// Otherwise treat as bytes
field = byteArrayToChars(fieldBytes);
}
// Move the offset and return the field
this.offset += length;
return field;
}

Both fields are length-delimited fields. For those, the code tries to recursively parse them as a Protobuf message. If that fails, it is interpreted as raw bytes/string.

And the way the string iPhone9,3 is encoded in a Protobuf just so happens to be a valid Protobuf message. :/ Really not sure whether there's anything we'll be able to do here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working help wanted Extra attention is needed
Projects
None yet
Development

No branches or pull requests

1 participant