Binary data is not encoded correctly when using the JSON envelope #111
Comments
We are facing the same problem, so I started working on a fix.
This bug bit me too. It took me a few hours to realize that the JSON string encoding of binary Kafka events cannot be deserialized. I would love it if the JSON envelope encoded binary data correctly! Thanks for all of the work that goes into this tool.
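I would love a fix (behind a flag, so it does not break existing setups) as well as a base64 output flag. Workaround for decoding the current output:

import json
import re

# Assumes "payload" is the last field of the envelope, so everything after
# ,"payload":" is the mis-encoded payload followed by "} and a newline.
compiledJsonSplitPattern = re.compile(b',"payload":"')

def decodeBuggyJson(jsonString):
    # maxsplit=1 so a payload containing the same byte sequence is not split again
    jsonSplit = compiledJsonSplitPattern.split(jsonString, maxsplit=1)
    # Close the object again so the header fields parse as ordinary JSON
    jsonMap = json.loads(jsonSplit[0] + b'}')
    # Drop the trailing newline (if any), then the closing '"}'
    jsonMap["payload"] = fixStr(jsonSplit[1].rstrip(b"\r\n")[:-2])
    return jsonMap

def fixStr(data):
    # State: -1 = ordinary bytes, 0 = just read a backslash,
    # 1..4 = inside \uXXXX, expecting hex digit number 1..4
    unicodeConstructPos = -1
    unicodeBuffer = 0
    resultString = bytearray()
    # Short escapes: JSON's set (including \/) plus a few C-style extras
    charEscapeLookup = {b"\\": b"\\", b"/": b"/", b"n": b"\n", b"t": b"\t", b"'": b"'", b'"': b'"',
                        b"a": b"\a", b"b": b"\b", b"f": b"\f", b"r": b"\r", b"v": b"\v"}
    for charAsInt in data:  # iterating over bytes yields ints
        char = bytes([charAsInt])
        if unicodeConstructPos == -1 and char == b"\\":
            unicodeConstructPos = 0
        elif unicodeConstructPos == 0 and char == b"u":
            unicodeConstructPos = 1
        elif unicodeConstructPos == 0:
            # Two-character escape such as \n or \"
            unicodeConstructPos = -1
            resultString += charEscapeLookup[char]
        elif unicodeConstructPos >= 1:
            # Accumulate one hex digit of \uXXXX
            unicodeConstructPos += 1
            unicodeBuffer = (unicodeBuffer << 4) + int(char, 16)
        else:
            # Raw byte, copied through unchanged
            resultString += char
        if unicodeConstructPos >= 5:
            # All four hex digits read; assumes the value fits in one byte (<= \u00ff)
            resultString.append(unicodeBuffer)
            unicodeBuffer = 0
            unicodeConstructPos = -1
    return bytes(resultString)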
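A shorter alternative, sketched under the assumption the reporter describes (bytes >= 0x80 are written raw, and only control characters, quotes and backslashes are escaped, so no escape exceeds \u00ff): decode the whole line as Latin-1, which maps every byte to the code point with the same value, let the standard json module handle the escapes, then encode the string fields back with Latin-1. This is a swapped-in technique, not the decoder from the comment; the name decode_envelope_line and the choice of "key" and "payload" as the binary fields are assumptions.

```python
import json

def decode_envelope_line(line: bytes) -> dict:
    # Assumption (from the behaviour reported in this issue): bytes >= 0x80 are
    # written raw and only control characters, '"' and '\' are escaped, so no
    # escape exceeds \u00ff.
    # Latin-1 maps each byte 0x00-0xFF to the code point with the same value,
    # so this decode cannot fail and loses no information.
    envelope = json.loads(line.decode("latin-1"))
    for field in ("key", "payload"):
        value = envelope.get(field)
        if isinstance(value, str):
            # Reverse the byte -> code point mapping to recover the raw bytes.
            # Raises UnicodeEncodeError if an escape above \u00ff ever appears.
            envelope[field] = value.encode("latin-1")
    return envelope

example = b'{"topic":"t","partition":0,"offset":1,"key":null,"payload":"\xa3\x81\xf6\x80\\u0006\\u0004\\u0000\\u0002\\u0002\\u0001"}\n'
print(decode_envelope_line(example)["payload"])
# b'\xa3\x81\xf6\x80\x06\x04\x00\x02\x02\x01'
```

Unlike the split-based decoder, this does not depend on "payload" being the last field, and it also recovers a binary key.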
I am trying to read binary messages from Kafka with kafkacat. The normal message output looks fine, but the JSON envelope output is broken: the message is not encoded correctly.
It seems to encode the message as Unicode and then escape it. Bytes that cannot be represented as Unicode appear to be kept as raw 8-bit bytes. So the result is not proper Unicode; it is a mix of escaped Unicode and raw bytes, which is quite messy to decode.
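The behavior described above can be reproduced with a small escaper. This is a hypothetical model, not kafkacat's source: the name model_kafkacat_escape is made up, and escaping every control byte as \u00XX is an assumption (the issue only shows it for bytes 0x00 to 0x06). The demo shows why a standard parser rejects such output: Python's json.loads decodes bytes input as UTF-8 before parsing, and a raw 0xa3 is not valid UTF-8.

```python
import json

def model_kafkacat_escape(data: bytes) -> bytes:
    # Hypothetical model of the observed behaviour, NOT kafkacat's source:
    # '"' and '\' get a backslash, every control byte (< 0x20) becomes \u00XX,
    # and all other bytes, including 0x80-0xFF, are copied through raw.
    out = bytearray()
    for b in data:
        if b in (0x22, 0x5C):        # '"' or '\'
            out += b"\\" + bytes([b])
        elif b < 0x20:               # control byte -> six-character \u00XX escape
            out += b"\\u%04x" % b
        else:                        # printable ASCII or high byte: left as-is
            out.append(b)
    return bytes(out)

payload = b"\xa3\x81\xf6\x80\x06\x04\x00\x02\x02\x01"
escaped = model_kafkacat_escape(payload)
print(escaped)  # b'\xa3\x81\xf6\x80\\u0006\\u0004\\u0000\\u0002\\u0002\\u0001'

# A standard parser rejects the line: json.loads() decodes bytes as UTF-8
# first, and a raw 0xa3 is not a valid UTF-8 start byte.
try:
    json.loads(b'{"payload":"' + escaped + b'"}')
except ValueError as exc:  # UnicodeDecodeError subclasses ValueError
    print("rejected:", type(exc).__name__)  # rejected: UnicodeDecodeError
```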
How about a base64 option? Base64 is the usual way to carry binary data in JSON.
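Example: a binary message of
\xa3\x81\xf6\x80\x06\x04\x00\x02\x02\x01
gets encoded to
\xa3\x81\xf6\x80\\u0006\\u0004\\u0000\\u0002\\u0002\\u0001
(shown in Python bytes notation, so \\u is a single backslash followed by u). The control bytes (0x00 to 0x06) become \u00XX escapes, while the high bytes \xa3\x81\xf6\x80 are copied through raw.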
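A sketch of what the base64 option suggested above could look like. Nothing in this issue shows such a flag in kafkacat; encode_envelope, decode_envelope and the field set are hypothetical. It shows the property that matters: the payload travels as ASCII-only base64 text, so any JSON parser accepts the line and the original bytes round-trip exactly.

```python
import base64
import json
from typing import Optional

def encode_envelope(topic: str, partition: int, offset: int,
                    payload: Optional[bytes]) -> str:
    # Hypothetical envelope layout; the payload is carried as base64 text,
    # so the JSON document is pure ASCII regardless of the message bytes.
    return json.dumps({
        "topic": topic,
        "partition": partition,
        "offset": offset,
        "payload": None if payload is None else base64.b64encode(payload).decode("ascii"),
    })

def decode_envelope(line: str) -> dict:
    envelope = json.loads(line)
    if envelope["payload"] is not None:  # Kafka messages may have a null value
        envelope["payload"] = base64.b64decode(envelope["payload"])
    return envelope

payload = b"\xa3\x81\xf6\x80\x06\x04\x00\x02\x02\x01"
line = encode_envelope("example-topic", 0, 42, payload)
print(line)
assert decode_envelope(line)["payload"] == payload  # exact round trip
```

The trade-off is size: base64 emits 4 output characters for every 3 input bytes, so payloads grow by about a third.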