Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Performance Benchmarking for JSON type serialization, Object vs String #1474

Closed
SpencerTorres opened this issue Jan 17, 2025 · 1 comment · Fixed by #1490
Closed

Performance Benchmarking for JSON type serialization, Object vs String #1474

SpencerTorres opened this issue Jan 17, 2025 · 1 comment · Fixed by #1490
Assignees

Comments

@SpencerTorres
Copy link
Member

The JSON type implemented in #1455 has multiple ways of serializing JSON data. The original format is object based, with paths defined along with native column data. The other is simply a String payload for the server to parse as JSON. Objects can be defined via a key/value map, or by providing a struct.

We need to test and compare the performance of these methods for reading/writing data. In some cases it may be faster to call json.Marshal with string format than it would be to have a struct with object format. This also leads to less control over how the data types are inferred on the server.

@SpencerTorres
Copy link
Member Author

I've added some benchmarks for all JSON patterns (paths, structs, strings) in #1490

Benchmark Results TL;DR

Inserting JSON

  • If you're working with strings just insert the string directly, the server will convert it. This is the absolute fastest method.
  • If you need to dynamically assemble some paths in a key/value pattern, use the clickhouse.JSON type
  • If you already have a struct and just want to insert it as-is, consider using json.Marshal to convert it to a string first
  • If you have a struct but require some fields to have specific types, insert the struct directly. You may use clickhouse.Dynamic or the chType:"SomeType" struct tag to specify preferred types. You could also use the typed paths on the JSON type itself.

Reading JSON

  • If you're just trying to get some strings from the sever, use the JSON strings setting (NOTE: due to a server-side bug the setting doesn't work. Cast your JSON to a String first in your query)
  • If you're trying to read the some specific paths, consider focusing those in your query (SELECT obj.a.b.c)
  • If you need the whole object with some specific paths, use the clickhouse.JSON type
  • If you want to simply scan the data into your struct, then just scan into your struct. You may use clickhouse.Dynamic for fields where there are multiple types.

For more details see below, along with the examples in the clickhouse-go repository.

Breakdown of Results

Insertion

The fastest method of insertion depends on the source of your data in your application.

If your JSON data is already in string form, then using the string insertion will be the fastest:

BenchmarkJSONInsert/strings-32    33972870    150.6 ns/op    2098 B/op    2 allocs/op

If your data is NOT in already in a string, the fastest is going to be using the clickhouse.JSON type. This is effectively a KV map where the keys are a list of flattened paths such as my.nested.path and the values are any. By using clickhouse.Dynamic you can also specify a preferred type for these any values.

BenchmarkJSONInsert/paths-32    5987515    982.7 ns/op    2408 B/op     11 allocs/op

If your data is in a struct, then you can either json.Marshal it to a string, or you can let the driver walk the struct recursively. The latter may be preferred for when you don't want to use server-side type inference. Using a raw struct lets you specify the type explicitly.

Based on the test struct, the json.Marshal appears to be faster, but does indeed lack the type tags. The performance between these options will vary depending on the struct, especially if it contains slices or maps.

BenchmarkJSONInsert/structs-32    1620612    3741 ns/op    5437 B/op    55 allocs/op
BenchmarkJSONInsert/marshal_strings-32    2864232    2074 ns/op    3473 B/op    28 allocs/op

Reading / Scanning rows

There is a server-side bug preventing the client from receiving JSON strings from the server, so we can't test the performance for those. If your end goal is to pass the JSON data along in string form, we can assume that this would be the fastest. Note that this isn't impossible, you can still cast your JSON column to a String column and use json.Unmarshal.

If you wish to read the JSON object by paths, then you should use the clickhouse.JSON type. This will preserve the underlying ClickHouse type name as well as the Go type in an interface{} for each path.

BenchmarkJSONRead/paths-32    1385293    4327 ns/op    5411 B/op    113 allocs/op

If you're trying to read the entire JSON into a struct, then you should simply scan into your struct. You can still use clickhouse.Dynamic to handle fields that may contain multiple types.

BenchmarkJSONRead/structs-32    971325    6202 ns/op    4697 B/op    165 allocs/op

But this may not always be the fastest for reading structs. You could consider instead receiving strings from the server and then using json.Unmarshal to load data into your struct. See the next section.

Converting back to a string

If your JSON data is in clickhouse.JSON or your custom struct, you may want to convert it back to a string, which is what the next set of tests are:

BenchmarkJSONMarshal/paths_direct-32 	  826999 	  7351 ns/op 	  4949 B/op 	  86 allocs/op
BenchmarkJSONMarshal/paths-32 	  705961 	  8643 ns/op 	  5333 B/op 	  87 allocs/op
BenchmarkJSONMarshal/structs-32 	  2479334 	  2389 ns/op 	  1340 B/op 	  26 allocs/op

paths_direct is calling (*clickhouse.JSON).MarshalJSON while paths is calling json.Marshal(clickhouseJSON). structs is simply calling json.Marshal on a struct.

Notice that the fastest is by simply using json.Marshal on your custom struct. This should influence your method for reading from the server. There's tradeoffs depending on the flow of data in your applications. While it is more performant to scan into clickhouse.JSON, it may not be the best approach if your end goal is to turn that data back into a string.

Conclusion

These tests aren't perfect, but it's a good overview for how things perform now and how you should consider writing, reading, and handling JSON data within your Go application.

Raw output for the benchmarks has been pasted below:

Raw Output
goos: linux
goarch: amd64
pkg: github.com/ClickHouse/clickhouse-go/v2/benchmark
cpu: AMD Ryzen 9 7950X 16-Core Processor            
BenchmarkJSONInsert
BenchmarkJSONInsert/paths
BenchmarkJSONInsert/paths-32         	 5987515	       982.7 ns/op	    2408 B/op	      11 allocs/op
BenchmarkJSONInsert/structs
BenchmarkJSONInsert/structs-32       	 1620612	      3741 ns/op	    5437 B/op	      55 allocs/op
BenchmarkJSONInsert/marshal_strings
BenchmarkJSONInsert/marshal_strings-32         	 2864232	      2074 ns/op	    3473 B/op	      28 allocs/op
BenchmarkJSONInsert/strings
BenchmarkJSONInsert/strings-32                 	33972870	       150.6 ns/op	    2098 B/op	       2 allocs/op
BenchmarkJSONRead
BenchmarkJSONRead/paths
BenchmarkJSONRead/paths-32                     	 1385293	      4327 ns/op	    5411 B/op	     113 allocs/op
BenchmarkJSONRead/structs
BenchmarkJSONRead/structs-32                   	  971325	      6202 ns/op	    4697 B/op	     165 allocs/op
BenchmarkJSONRead/marshal_strings
    json_test.go:327: cannot receive JSON strings
--- SKIP: BenchmarkJSONRead/marshal_strings
BenchmarkJSONRead/strings
    json_test.go:331: cannot receive JSON strings
--- SKIP: BenchmarkJSONRead/strings
BenchmarkJSONMarshal
BenchmarkJSONMarshal/paths_direct
BenchmarkJSONMarshal/paths_direct-32           	  826999	      7351 ns/op	    4949 B/op	      86 allocs/op
BenchmarkJSONMarshal/paths
BenchmarkJSONMarshal/paths-32                  	  705961	      8643 ns/op	    5333 B/op	      87 allocs/op
BenchmarkJSONMarshal/structs
BenchmarkJSONMarshal/structs-32                	 2479334	      2389 ns/op	    1340 B/op	      26 allocs/op

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

1 participant