-
Notifications
You must be signed in to change notification settings - Fork 562
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[client-v2] Support Dynamic and JSON Data Types #2130
base: main
Are you sure you want to change the base?
Conversation
Enum16(String.class, true, true, false, 2, 0, 0, 0, 0, false), | ||
FixedString(String.class, true, true, false, 0, 0, 0, 0, 0, false, "BINARY"), | ||
Int8(Byte.class, false, true, true, 1, 3, 0, 0, 0, false, "BYTE", "INT1", "INT1 SIGNED", "TINYINT", | ||
Bool(Boolean.class, false, false, true, 1, 1, 0, 0, 0, false,0x2D, "BOOLEAN"), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added bin tag according to the https://clickhouse.com/docs/en/sql-reference/data-types/data-types-binary-encoding
Int128(BigInteger.class, false, true, true, 16, 39, 0, 0, 0, false, 0x0B), | ||
UInt128(BigInteger.class, false, true, false, 16, 39, 0, 0, 0, false, 0x05), | ||
Int256(BigInteger.class, false, true, true, 32, 77, 0, 0, 0, false, 0x0C), | ||
UInt256(BigInteger.class, false, true, false, 32, 78, 0, 0, 0, false, 0x06), | ||
Decimal(BigDecimal.class, true, false, true, 0, 76, 0, 0, 76, false, "DEC", "FIXED", "NUMERIC"), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Virtual type - no bin tag
return map; | ||
} | ||
|
||
private static Set<Class<?>> setOf(Class<?>... args) { | ||
return Collections.unmodifiableSet(new HashSet<>(Arrays.stream(args).collect(Collectors.toList()))); | ||
} | ||
|
||
public static final byte INTERVAL_BIN_TAG = 0x22; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
following tags are not actual types or require special processing.
@@ -258,6 +333,21 @@ private static Set<Class<?>> setOf(Class<?>... args) { | |||
|
|||
allAliases = Collections.unmodifiableSet(set); | |||
name2type = Collections.unmodifiableMap(map); | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Creating lookup maps for types and interval kinds
@@ -51,7 +51,7 @@ protected ClickHouseEnum(Collection<String> params) { | |||
} | |||
} | |||
|
|||
protected ClickHouseEnum(String[] names, int[] values) { | |||
public ClickHouseEnum(String[] names, int[] values) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no need to protect the constructor of such simple class.
@@ -641,6 +658,38 @@ public synchronized <T> List<T> asList() { | |||
} | |||
} | |||
|
|||
public static class EnumValue extends Number { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Have to create the class to overcome problem:
- when dynamic column then each enum value may have different set of columns
- when enum column is requested as String we need string constant but normally it is stored in column definition but it will be dynamic.
@@ -956,4 +1005,81 @@ public byte[] allocate(int size) { | |||
return new byte[size]; | |||
} | |||
} | |||
|
|||
private static final Set<Byte> DECIMAL_TAGS = Collections.unmodifiableSet(new HashSet<>(Arrays.asList( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
decimal has special handling because of arguments. This set just simplifies if statements.
ClickHouseDataType.Decimal256.getBinTag() | ||
))); | ||
|
||
private ClickHouseColumn readDynamicData() throws IOException { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
reads metadata for dynamic value
private static void serializeArrayData(OutputStream stream, Object value, ClickHouseColumn column) throws IOException { | ||
private static final Map<Class<?>, ClickHouseColumn> PREDEFINED_TYPE_COLUMNS = getPredefinedTypeColumnsMap(); | ||
|
||
private static Map<Class<?>, ClickHouseColumn> getPredefinedTypeColumnsMap() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We use Column object to properly serialize/deserialize values, so I've created predefined list to avoid creation all the time.
But it can be more lightweight - will redesign in the future.
if (val == null) { | ||
writeNull(stream); | ||
continue; | ||
public static ClickHouseColumn valueToColumnForDynamicType(Object value) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we need to crate column definition for each value so serializer can use it like a normal column serialization.
|
@@ -51,7 +51,7 @@ protected ClickHouseEnum(Collection<String> params) { | |||
} | |||
} | |||
|
|||
protected ClickHouseEnum(String[] names, int[] values) { | |||
public ClickHouseEnum(String[] names, int[] values) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
will it be more efficient to use a map here i see a lot scan operations that we have
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Current approach is more memory efficient. We can introduce map and measure performance.
@@ -207,6 +207,8 @@ public static boolean writeValuePreamble(OutputStream out, boolean defaultsSuppo | |||
return false;//And we're done | |||
} else if (dataType == ClickHouseDataType.Array) {//If the column is an array | |||
SerializerUtils.writeNonNull(out);//Then we send nonNull | |||
} else if (dataType == ClickHouseDataType.Dynamic) { | |||
// do nothing |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if we can add an explanation why we are not doing here anything
case Enum8: { | ||
byte enum8Val = (byte) readUnsignedByte(); | ||
String name = actualColumn.getEnumConstants().nameNullable(enum8Val); | ||
return (T) new EnumValue(name == null ? "<unknown>" : name, enum8Val); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"" maybe can define as const
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
""
will be a blank space and it will be hard to detect error. <unknown>
will attract attention.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Writing JSON from POJO is not done in this PR. It can be done in two ways:
+1 to use Gson or jackson
JSON + JDBC
What is our plan for JSON type usage via JDBC?
@@ -268,7 +268,7 @@ jobs: | |||
needs: compile | |||
strategy: | |||
matrix: | |||
clickhouse: ["23.8", "24.3", "24.6", "latest"] | |||
clickhouse: ["23.8", "24.3", "24.8", "latest"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: 23.8
can be excluded as not supported https://github.com/ClickHouse/ClickHouse/blob/master/SECURITY.md
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've thought about it. Will do.
@@ -638,8 +640,57 @@ public static Object[][] testAppCompressionDataProvider() { | |||
}; | |||
} | |||
|
|||
@Test(groups = { "integration" }, enabled = true) | |||
public void testPOJOWithDynamicType() throws Exception { | |||
if (isVersionMatch("(,24.8]")) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should it be a proper comparison with a semver version ( greater or equal 24.8
)? as done below https://github.com/ClickHouse/clickhouse-java/pull/2130/files#diff-89c11757b9243105d9931d9b667e953b7beabe6ca45ab2f333e905bb7003f8a9R944
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This expression does the same - it is from old tests and I just reuse it. We will unify such things later when we have one client and can have a utility method.
private Object field; | ||
} | ||
|
||
private void testDynamicWith(String withWhat, Object[] values, String[] expectedStrValues) throws Exception { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we need separate json type tests?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think no - JSON uses the same code as Dynamic for values. Paths are just string additions to each value.
The rest of logic is tested in InsertTests/QueryTests thru POJO.
Summary
This PR adds support for Variant, Dynamic and JSON for reading and writing. Here is details what is supported
All this is try only for
RowBinary
format.Native
will need another implementation because it has another structure.JSON Support
JSON in RowBinary is encoded as map of path-value (for example "a.b" = 1). Reader returns a flat map for this.
It is convenient when path is known and hardcoded. But inconvenient for parsing. It should be possible to get Map-of-Map structure. However this is most heavy-weight solution.
Writing JSON from POJO is not done in this PR. It can be done in two ways:
PS: There are some test fixes.
Closes #1925
Closes #1833
Closes #1926
Checklist
Delete items not relevant to your PR: