binary data is not encoded correct when using json envelope #111

Open
DEvil0000 opened this issue Jul 7, 2017 · 4 comments

@DEvil0000

DEvil0000 commented Jul 7, 2017

I am trying to read binary messages from Kafka with kafkacat. The normal message output looks good, but the JSON envelope output is broken: the message is not encoded correctly.
It seems like it is trying to encode the message as Unicode and then escape it. If there is a byte which cannot be encoded that way, it seems to keep it as a raw 8-bit value. So the result is not proper Unicode in the end; it is a mix of escaped characters and raw bytes. Decoding this is quite a mess.
How about a base64 option? That is pretty much the default for binary data in JSON.

example:
a binary message of \xa3\x81\xf6\x80\x06\x04\x00\x02\x02\x01
gets encoded to \xa3\x81\xf6\x80\\u0006\\u0004\\u0000\\u0002\\u0002\\u0001
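
To make the failure concrete, here is a minimal sketch that feeds the encoded form from the example to Python's standard `json` module. The envelope is reduced to a single `payload` field for brevity, and `try_parse` is a hypothetical helper name; the real envelope carries more fields.

```python
import json

# The example output as raw bytes: high bytes are passed through unchanged,
# while control bytes are written as \u00XX escapes.
# Assumption: the envelope is reduced to just the payload field here.
line = b'{"payload":"\xa3\x81\xf6\x80\\u0006\\u0004\\u0000\\u0002\\u0002\\u0001"}'

def try_parse(raw: bytes):
    """Return the parsed object, or the exception raised while parsing."""
    try:
        return json.loads(raw)
    except ValueError as exc:  # UnicodeDecodeError is a ValueError subclass
        return exc

result = try_parse(line)
print(type(result).__name__)
```

The parser treats the bytes as UTF-8, and the raw byte 0xa3 is not a valid UTF-8 start byte, so decoding fails before any JSON is parsed.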

@Sabotaz

Sabotaz commented Sep 20, 2018

We are facing the same problem, so I started working on it.
I'll try to add an option to put the payload in base64. I hope I'll be able to open a pull request soon :)
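
For illustration only, this is roughly what a base64 payload would look like to a consumer. It is a sketch in Python, not kafkacat's code or any existing kafkacat option; `to_envelope` and `from_envelope` are hypothetical names, and the envelope is reduced to the `payload` field.

```python
import base64
import json

def to_envelope(payload: bytes) -> str:
    """Wrap binary data in a JSON object, base64-encoding the payload."""
    encoded = base64.b64encode(payload).decode("ascii")
    return json.dumps({"payload": encoded})

def from_envelope(text: str) -> bytes:
    """Recover the original bytes from an envelope made by to_envelope."""
    return base64.b64decode(json.loads(text)["payload"])

raw = b"\xa3\x81\xf6\x80\x06\x04\x00\x02\x02\x01"
envelope = to_envelope(raw)
print(envelope)
print(from_envelope(envelope) == raw)
```

Because base64 output is plain ASCII, the envelope stays valid JSON whatever bytes the message contains, and the round trip is lossless.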

@mpallone

This bug bit me too. Took me a few hours to eventually realize that the JSON-string encoding of binary Kafka events is not deserializable.

I would love it if the JSON envelope correctly encoded binary data!

Thanks for all of the work that goes into this tool.

@DEvil0000
Author

Looks like some progress was made, but sadly there was not much maintainer interaction.
See #164 and #206.

@DEvil0000
Author

DEvil0000 commented Jan 16, 2023

I would love a fix (behind a flag so it does not break existing setups) as well as a base64 output flag.
In the meantime I am happy to share my workaround with you (python3) in case you cannot modify the version you are using.
Maybe not the most beautiful code, but it works fine.

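import json
import re

# Everything before this marker is ordinary JSON; the payload string follows it.
compiledJsonSplitPattern = re.compile(b',"payload":"')

def decodeBuggyJson(jsonString):
    # jsonString is one raw envelope line (bytes) that ends in '"}\n'.
    jsonSplit = compiledJsonSplitPattern.split(jsonString)
    # Close the object early so the fields before the payload parse as valid JSON.
    jsonMap = json.loads(jsonSplit[0] + b'}\n')
    # Drop the trailing '"}\n' and unescape the payload by hand.
    jsonMap["payload"] = fixStr(jsonSplit[1][:-3])
    return jsonMap

def fixStr(data):
    # State: -1 = outside an escape, 0 = just after a backslash,
    # >= 1 = inside a \uXXXX escape (one step per hex digit read).
    unicodeConstructPos = -1
    unicodeBuffer = 0
    resultString = b""
    # Single-character escapes; "/" is included because JSON allows "\/".
    charEscapeLookup = {b"\\": b"\\", b"/": b"/", b"n": b"\n", b"t": b"\t", b"\'": b"\'", b"\"": b"\"", b"a": b"\a", b"b": b"\b", b"f": b"\f", b"r": b"\r", b"v": b"\v"}
    for charAsInt in data:
        char = int(charAsInt).to_bytes(1,"big")
        if unicodeConstructPos == -1 and char == b"\\":
            unicodeConstructPos += 1
        elif unicodeConstructPos == 0 and char == b"u":
            unicodeConstructPos += 1
        elif unicodeConstructPos == 0:
            unicodeConstructPos = -1
            resultString += charEscapeLookup[char]
        elif unicodeConstructPos >= 1:
            # Accumulate the next hex digit of the \uXXXX escape.
            unicodeConstructPos += 1
            unicodeBuffer = unicodeBuffer << 4
            unicodeBuffer += int(char, 16)
        else:
            # Raw byte (including unescaped high bytes): copy through unchanged.
            resultString += char

        if unicodeConstructPos >= 5:
            # All four hex digits read; assumes the value fits in one byte (< 0x100).
            resultString += int(unicodeBuffer).to_bytes(1,"big")
            unicodeBuffer = 0
            unicodeConstructPos = -1
    return resultString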
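
A shorter alternative to the hand-written unescaper above, offered as a sketch rather than a drop-in replacement: decode the line as Latin-1, let the standard `json` module resolve the escapes, then encode the payload back to Latin-1. This swaps in a different technique and relies on an assumption taken from the example in this thread, namely that every `\uXXXX` escape in the output is below U+0100 and all other bytes are passed through raw. `decode_envelope` and the `topic` field in the sample line are illustrative names.

```python
import json

def decode_envelope(line: bytes) -> dict:
    """Parse one envelope line and return it with "payload" as bytes."""
    # Latin-1 maps every byte 0x00-0xFF to the code point with the same value,
    # so decoding never fails and raw high bytes survive as single characters.
    envelope = json.loads(line.decode("latin-1"))
    payload = envelope.get("payload")
    if isinstance(payload, str):
        # Raises UnicodeEncodeError if an escape decoded to a code point above
        # U+00FF, i.e. if the assumption stated above does not hold.
        envelope["payload"] = payload.encode("latin-1")
    return envelope

# The example message from this issue, wrapped in an illustrative envelope.
line = b'{"topic":"demo","payload":"\xa3\x81\xf6\x80\\u0006\\u0004\\u0000\\u0002\\u0002\\u0001"}\n'
decoded = decode_envelope(line)
print(decoded["payload"])
```

Other string fields are decoded as Latin-1 too, so a binary key would need the same `encode("latin-1")` step, and any field holding real non-ASCII UTF-8 text would have to be re-encoded and decoded as UTF-8.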


3 participants