Implement new format API for JSON output #385

Neverlord · 2023-12-17T13:58:04Z

As discussed in #360: we want to offer a v2 endpoint for WebSocket clients. The new endpoint will render timestamps in UTC. This PR is prep work that re-implements the current encoding for WebSocket client in the namespace broker::format::json::v1 and swaps out the rendering logic in json_client to use the new encoding logic. All of this only touches internals and should not affect clients.

Adding a v2 endpoint is then going to be a followup.

Also contains a bugfix for include/broker/format/txt.hh where very large floating point numbers would miss the last character. We use the txt format only for internal logging, so I hope it's OK if I leave the commit here. I've noticed it only because I've used the txt implementation as a template.

@ckreibich I've split the work into multiple commits with a high-level description for each chunk of work. I hope this makes it easier to follow along when reviewing.

When calling `snprintf`, we need to pass the buffer size + 1 to account for the null terminator. Moreover, we should always respect the return value of `snprintf`, because this is the number of bytes actually being written.

The new header `broker/format/json.hh` implements the `encode`-API for `broker::variant`, akin to `broker/format/txt.hh`. The generated JSON format is tailored to the WebSocket interface. The format itself is documented in `web-socket.rst`.

A `data_message` embeds a `data` value. According to the spec, the JSON output contains the `@data-type` and `data` members of the embedded `data` value at the top level, alongside the `topic` and (message) `type` members. Hence, we need to extend the existing `encode` functions with a (template) parameter that configures whether or not to print the enclosing curly braces. The default behavior remains unchanged, we only omit the curly braces from the `encode` overload for `data_message` in order to "merge" to output of the nested `encode` call into the output.

Replace the `caf::json_writer` member with a simple character buffer. Instead of using the writer to generate JSON output, we call into the new `format::json::v1` API. Further, add an `append_field` function to the format API to allow us to render the simple JSON objects for ACK and error messages directly (instead of going through a map that we then convert to JSON).

We skip over the first character in the buffer that we pass to `strftime`. Hence, we also need subtract one from the total size that `strftime` is allowed to write to.

In order to achieve a deterministic output for a `timestamp`, we need to calculate the value dynamically since the exact value depends on the local timezone.

The new JSON rendering is more compact than previously and omits any unnecessary whitespace.

awelzel · 2023-12-18T09:11:35Z

include/broker/format/json.hh

+  if constexpr (std::is_same_v<T, bool>) {
+    return append_encoded<Policy>("boolean", value ? "true"sv : "false"sv, out);
+  } else if constexpr (std::is_same_v<T, count>) {
+    // An integer can at most have 20 digits (UINT64_MAX).


Vanilla JavaScript cannot correctly parse numbers so large. Nothing to do, but that would probably be a topic if we had more browsers interact with Zeek.

This even made it into the RFC :-)

https://datatracker.ietf.org/doc/html/rfc7159#section-6

Note that when such software is used, numbers that are integers and
are in the range [-(253)+1, (253)-1] are interoperable in the
sense that implementations will agree exactly on their numeric
values.

Probably because JavaScript will store numbers as 64-bit floating point numbers? Probably not much we could do about that. Except sending integers as strings... 🤔

Yeah, I don't think we should worry too much. If someone actually wants to work with numbers that large, they'll need to use BigInt anyhow and there's json-bigint that can parse this properly.

Where I saw this first:

In newer versions of the API, all large integer values are represented as strings by default.

https://developer.twitter.com/en/docs/twitter-ids

ckreibich

LGTM, Dominik — my suggestions are all about comments or about adding comments to help readers understand what the formatting looks like. I'll leave that to you to handle as you see fit.

Apologies, once more, for the delay.

include/broker/format/json.hh

include/broker/format/txt.hh

Neverlord · 2024-01-28T16:47:55Z

@ckreibich thanks for the feedback! I've updated the comments with the missing examples / explanations. I hope I didn't miss something. 🙂

Neverlord added 4 commits December 17, 2023 10:36

Fix text encoding for large floating point numbers

fb7ac5c

When calling `snprintf`, we need to pass the buffer size + 1 to account for the null terminator. Moreover, we should always respect the return value of `snprintf`, because this is the number of bytes actually being written.

Implement JSON encoding for broker::data

46d9d64

The new header `broker/format/json.hh` implements the `encode`-API for `broker::variant`, akin to `broker/format/txt.hh`. The generated JSON format is tailored to the WebSocket interface. The format itself is documented in `web-socket.rst`.

Neverlord added the Type: Maintenance label Dec 17, 2023

Neverlord added 4 commits December 17, 2023 15:06

Fix formatting

5a1c185

Fix buffer size we pass to strftime in json.cc

ca464eb

We skip over the first character in the buffer that we pass to `strftime`. Hence, we also need subtract one from the total size that `strftime` is allowed to write to.

Calculate the timestamp value for JSON tests

d518fa5

In order to achieve a deterministic output for a `timestamp`, we need to calculate the value dynamically since the exact value depends on the local timezone.

Update baseline for the web-socket.encrypted btest

cb6ac57

The new JSON rendering is more compact than previously and omits any unnecessary whitespace.

Neverlord force-pushed the topic/neverlord/json-format branch 3 times, most recently from 7cc6d9a to ef16f51 Compare December 17, 2023 19:32

Fix build on MSVC

e75c36f

Neverlord force-pushed the topic/neverlord/json-format branch from ef16f51 to e75c36f Compare December 17, 2023 19:54

awelzel reviewed Dec 18, 2023

View reviewed changes

ckreibich approved these changes Jan 24, 2024

View reviewed changes

Integrate review feedback

427afd1

Neverlord added a commit that referenced this pull request Jan 30, 2024

Merge pull request #385

75e94df

Neverlord merged commit bc25aec into master Jan 30, 2024
22 checks passed

Neverlord deleted the topic/neverlord/json-format branch January 30, 2024 17:33

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Implement new format API for JSON output #385

Implement new format API for JSON output #385

Neverlord commented Dec 17, 2023

awelzel Dec 18, 2023

Neverlord Dec 18, 2023

awelzel Dec 18, 2023

ckreibich left a comment

Neverlord commented Jan 28, 2024

Implement new format API for JSON output #385

Implement new format API for JSON output #385

Conversation

Neverlord commented Dec 17, 2023

awelzel Dec 18, 2023

Choose a reason for hiding this comment

Neverlord Dec 18, 2023

Choose a reason for hiding this comment

awelzel Dec 18, 2023

Choose a reason for hiding this comment

ckreibich left a comment

Choose a reason for hiding this comment

Neverlord commented Jan 28, 2024