Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[client-v2] Support Dynamic and JSON Data Types #2130

Open
wants to merge 22 commits into
base: main
Choose a base branch
from

Conversation

chernser
Copy link
Contributor

@chernser chernser commented Feb 4, 2025

Summary

This PR adds support for Variant, Dynamic and JSON for reading and writing. Here is details what is supported
All this is try only for RowBinary format. Native will need another implementation because it has another structure.

DataType POJO Binary Reader JDBC
Variant Read/Write Yes Yes
Dynamic Read/Write Yes Yes
JSON No Yes Yes/No

JSON Support

JSON in RowBinary is encoded as map of path-value (for example "a.b" = 1). Reader returns a flat map for this.
It is convenient when path is known and hardcoded. But inconvenient for parsing. It should be possible to get Map-of-Map structure. However this is most heavy-weight solution.
Writing JSON from POJO is not done in this PR. It can be done in two ways:

  • in-house solution - tedious work with reflection
  • plugin GSON/Jackson libraries to make Map-of-Map structure so can be easily transformed into a flat map.

PS: There are some test fixes.

Closes #1925
Closes #1833
Closes #1926

Checklist

Delete items not relevant to your PR:

Enum16(String.class, true, true, false, 2, 0, 0, 0, 0, false),
FixedString(String.class, true, true, false, 0, 0, 0, 0, 0, false, "BINARY"),
Int8(Byte.class, false, true, true, 1, 3, 0, 0, 0, false, "BYTE", "INT1", "INT1 SIGNED", "TINYINT",
Bool(Boolean.class, false, false, true, 1, 1, 0, 0, 0, false,0x2D, "BOOLEAN"),
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Int128(BigInteger.class, false, true, true, 16, 39, 0, 0, 0, false, 0x0B),
UInt128(BigInteger.class, false, true, false, 16, 39, 0, 0, 0, false, 0x05),
Int256(BigInteger.class, false, true, true, 32, 77, 0, 0, 0, false, 0x0C),
UInt256(BigInteger.class, false, true, false, 32, 78, 0, 0, 0, false, 0x06),
Decimal(BigDecimal.class, true, false, true, 0, 76, 0, 0, 76, false, "DEC", "FIXED", "NUMERIC"),
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Virtual type - no bin tag

return map;
}

private static Set<Class<?>> setOf(Class<?>... args) {
return Collections.unmodifiableSet(new HashSet<>(Arrays.stream(args).collect(Collectors.toList())));
}

public static final byte INTERVAL_BIN_TAG = 0x22;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

following tags are not actual types or require special processing.

@@ -258,6 +333,21 @@ private static Set<Class<?>> setOf(Class<?>... args) {

allAliases = Collections.unmodifiableSet(set);
name2type = Collections.unmodifiableMap(map);

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Creating lookup maps for types and interval kinds

@@ -51,7 +51,7 @@ protected ClickHouseEnum(Collection<String> params) {
}
}

protected ClickHouseEnum(String[] names, int[] values) {
public ClickHouseEnum(String[] names, int[] values) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no need to protect the constructor of such simple class.

@@ -641,6 +658,38 @@ public synchronized <T> List<T> asList() {
}
}

public static class EnumValue extends Number {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have to create the class to overcome problem:

  • when dynamic column then each enum value may have different set of columns
  • when enum column is requested as String we need string constant but normally it is stored in column definition but it will be dynamic.

@@ -956,4 +1005,81 @@ public byte[] allocate(int size) {
return new byte[size];
}
}

private static final Set<Byte> DECIMAL_TAGS = Collections.unmodifiableSet(new HashSet<>(Arrays.asList(
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

decimal has special handling because of arguments. This set just simplifies if statements.

ClickHouseDataType.Decimal256.getBinTag()
)));

private ClickHouseColumn readDynamicData() throws IOException {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

reads metadata for dynamic value

private static void serializeArrayData(OutputStream stream, Object value, ClickHouseColumn column) throws IOException {
private static final Map<Class<?>, ClickHouseColumn> PREDEFINED_TYPE_COLUMNS = getPredefinedTypeColumnsMap();

private static Map<Class<?>, ClickHouseColumn> getPredefinedTypeColumnsMap() {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We use Column object to properly serialize/deserialize values, so I've created predefined list to avoid creation all the time.
But it can be more lightweight - will redesign in the future.

if (val == null) {
writeNull(stream);
continue;
public static ClickHouseColumn valueToColumnForDynamicType(Object value) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we need to crate column definition for each value so serializer can use it like a normal column serialization.

@chernser chernser marked this pull request as ready for review February 13, 2025 01:11
Copy link

Quality Gate Failed Quality Gate failed

Failed conditions
41.7% Coverage on New Code (required ≥ 80%)
C Reliability Rating on New Code (required ≥ A)

See analysis details on SonarQube Cloud

Catch issues before they fail your Quality Gate with our IDE extension SonarQube for IDE

@@ -51,7 +51,7 @@ protected ClickHouseEnum(Collection<String> params) {
}
}

protected ClickHouseEnum(String[] names, int[] values) {
public ClickHouseEnum(String[] names, int[] values) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will it be more efficient to use a map here i see a lot scan operations that we have

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Current approach is more memory efficient. We can introduce map and measure performance.

@@ -207,6 +207,8 @@ public static boolean writeValuePreamble(OutputStream out, boolean defaultsSuppo
return false;//And we're done
} else if (dataType == ClickHouseDataType.Array) {//If the column is an array
SerializerUtils.writeNonNull(out);//Then we send nonNull
} else if (dataType == ClickHouseDataType.Dynamic) {
// do nothing
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if we can add an explanation why we are not doing here anything

case Enum8: {
byte enum8Val = (byte) readUnsignedByte();
String name = actualColumn.getEnumConstants().nameNullable(enum8Val);
return (T) new EnumValue(name == null ? "<unknown>" : name, enum8Val);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"" maybe can define as const

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"" will be a blank space and it will be hard to detect error. <unknown> will attract attention.

Copy link
Member

@mshustov mshustov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Writing JSON from POJO is not done in this PR. It can be done in two ways:

+1 to use Gson or jackson

JSON + JDBC

What is our plan for JSON type usage via JDBC?

@@ -268,7 +268,7 @@ jobs:
needs: compile
strategy:
matrix:
clickhouse: ["23.8", "24.3", "24.6", "latest"]
clickhouse: ["23.8", "24.3", "24.8", "latest"]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: 23.8 can be excluded as not supported https://github.com/ClickHouse/ClickHouse/blob/master/SECURITY.md

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've thought about it. Will do.

@@ -638,8 +640,57 @@ public static Object[][] testAppCompressionDataProvider() {
};
}

@Test(groups = { "integration" }, enabled = true)
public void testPOJOWithDynamicType() throws Exception {
if (isVersionMatch("(,24.8]")) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should it be a proper comparison with a semver version ( greater or equal 24.8)? as done below https://github.com/ClickHouse/clickhouse-java/pull/2130/files#diff-89c11757b9243105d9931d9b667e953b7beabe6ca45ab2f333e905bb7003f8a9R944

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This expression does the same - it is from old tests and I just reuse it. We will unify such things later when we have one client and can have a utility method.

private Object field;
}

private void testDynamicWith(String withWhat, Object[] values, String[] expectedStrValues) throws Exception {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we need separate json type tests?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think no - JSON uses the same code as Dynamic for values. Paths are just string additions to each value.
The rest of logic is tested in InsertTests/QueryTests thru POJO.

@chernser chernser requested a review from mzitnik February 13, 2025 18:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[client-v2] Variant Type Support [client-v2] Dynamic Data Type Support [client-v2] JSON Support
3 participants