Issue 86: Preserve schema field order #87
Signed-off-by: Shivesh Ranjan <[email protected]>
server/src/main/java/io/pravega/schemaregistry/storage/impl/SchemaStoreImpl.java
server/src/main/java/io/pravega/schemaregistry/service/SchemaRegistryService.java
On this comment in the PR description:
I expect the hashes to match rarely, even counting legitimate matches, so comparing the binaries when they do match should not add much cost. I'd suggest doing it to be safe.
But the hash is computed on the normalized form, whereas we store the schema in its original form.
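The point in the reply above can be illustrated with a minimal sketch. It assumes a toy normalization (dropping all whitespace) in place of the real Avro/JSON normalization, and the class and method names are hypothetical, not taken from the PR. Two submissions of the same logical schema share a fingerprint, yet their stored bytes differ, so a byte-for-byte check after a hash match would wrongly report a mismatch.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;

final class NormalizedHashDemo {
    // Toy normalization (assumption): strip all whitespace.
    // Real schema normalization is format-specific and more involved.
    static String normalize(String schema) {
        return schema.replaceAll("\\s+", "");
    }

    // SHA-256 digest of the UTF-8 bytes of the given text.
    static byte[] sha256(String text) {
        try {
            return MessageDigest.getInstance("SHA-256")
                    .digest(text.getBytes(StandardCharsets.UTF_8));
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is required on every Java platform.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String stored = "{\"type\": \"string\"}";   // as supplied by the first user
        String incoming = "{\"type\":\"string\"}";  // same schema, different whitespace

        boolean sameFingerprint = Arrays.equals(
                sha256(normalize(stored)), sha256(normalize(incoming)));
        boolean sameBytes = Arrays.equals(
                stored.getBytes(StandardCharsets.UTF_8),
                incoming.getBytes(StandardCharsets.UTF_8));

        System.out.println("fingerprints match: " + sameFingerprint);
        System.out.println("raw bytes match:    " + sameBytes);
    }
}
```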
server/src/main/java/io/pravega/schemaregistry/service/SchemaRegistryService.java
server/src/main/java/io/pravega/schemaregistry/storage/SchemaStore.java
server/src/main/java/io/pravega/schemaregistry/storage/impl/schemas/Schemas.java
@fpj
There are three APIs where this is relevant:
The original problem we were solving was how to decide that two schemas are logically equal even when the user-supplied schema binaries are structured differently. For Avro and JSON, where users supply schema strings, two schemas are considered equal even if the order of fields in the schema string differs or the string contains extra whitespace characters.
Prior to this PR, instead of parsing and comparing, we normalized the schema and stored the normalized form in its binary representation. The advantage was that we only had to normalize the incoming user schema once; from that point on, all operations and queries ran on the normalized form, and we could compare the normalized binaries directly without parsing.
However, the requirement we are now incorporating is to store and return the schema exactly as supplied by the user, while still treating a differently organized but logically identical schema as identical. So we store the user-supplied binary as is, and the service layer passes a comparator function that the storage layer can use to compare the stored schema binary with the requested schema binary by parsing both.
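A rough sketch of the comparator hand-off described above, under stated assumptions: `RawSchemaStore`, `parse`, and `SAME_SCHEMA` are hypothetical names, and the schema format is a toy `name:type,name:type` string rather than Avro or JSON. The store keeps the bytes exactly as supplied and knows nothing about formats; the format-aware comparison is injected by the caller.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.BiPredicate;

final class ComparatorSketch {
    // Hypothetical storage layer: holds schema bytes as supplied, format-agnostic.
    static final class RawSchemaStore {
        private final List<byte[]> schemas = new ArrayList<>();

        // Returns the index of a stored schema that the comparator deems equal,
        // or stores the candidate unchanged and returns its new index.
        int addIfAbsent(byte[] candidate, BiPredicate<byte[], byte[]> sameSchema) {
            for (int i = 0; i < schemas.size(); i++) {
                if (sameSchema.test(schemas.get(i), candidate)) {
                    return i;
                }
            }
            schemas.add(candidate.clone());
            return schemas.size() - 1;
        }

        // Returns the schema bytes exactly as they were first supplied.
        byte[] get(int index) {
            return schemas.get(index).clone();
        }
    }

    // Toy parser (assumption): "name:type,name:type" into a field-to-type map.
    static Map<String, String> parse(byte[] schema) {
        Map<String, String> fields = new TreeMap<>();
        for (String field : new String(schema, StandardCharsets.UTF_8).split(",")) {
            String[] parts = field.trim().split(":");
            fields.put(parts[0].trim(), parts[1].trim());
        }
        return fields;
    }

    // Service-layer comparator: parse both sides and compare the parsed forms,
    // so field order and extra whitespace do not matter.
    static final BiPredicate<byte[], byte[]> SAME_SCHEMA =
            (a, b) -> parse(a).equals(parse(b));

    public static void main(String[] args) {
        RawSchemaStore store = new RawSchemaStore();
        int first = store.addIfAbsent(
                "id:long, name:string".getBytes(StandardCharsets.UTF_8), SAME_SCHEMA);
        int second = store.addIfAbsent(
                "name:string,id:long".getBytes(StandardCharsets.UTF_8), SAME_SCHEMA);
        System.out.println(first == second);
        System.out.println(new String(store.get(first), StandardCharsets.UTF_8));
    }
}
```

The trade-off in this shape is that every lookup parses both sides, which the earlier normalized-storage approach avoided.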
Signed-off-by: Shivesh Ranjan <[email protected]>
* Issue 103: Class cast Exception
  Signed-off-by: Shivesh Ranjan <[email protected]>
* Add README.md
  Signed-off-by: Shivesh Ranjan <[email protected]>
* Update comment
  Signed-off-by: Shivesh Ranjan <[email protected]>
* license
  Signed-off-by: Shivesh Ranjan <[email protected]>
* PR comment
  Signed-off-by: Shivesh Ranjan <[email protected]>
Tested the ECS Presto connector with this branch, and it was successful. OK to merge.
Signed-off-by: Shivesh Ranjan [email protected]
Change log description
Store the user-supplied schema as is; comparison is still performed on the normalized form.
Purpose of the change
Fixes #86
What the code does
Stores the user-supplied schema binary instead of the normalized form.
We check whether a schema already exists by comparing SHA-256 hashes. Since a collision has no practical chance of occurring, we do not compare the binaries when the hashes match.
This is because the lookup and comparison happen in the storage layer, which has no awareness of schema formats and parsers. Since the stored schema is not normalized, the business layer computes the normalized form and its fingerprint, and shares the fingerprint with the storage layer.
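The division of labor described here might look roughly like the sketch below. It is illustrative only: `FingerprintIndexedStore`, `normalize`, and `fingerprint` are hypothetical names, and the whitespace-stripping normalization is a stand-in for the real format-aware one. The business layer derives the fingerprint from the normalized form; the storage layer only sees an opaque fingerprint key and the original bytes.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashMap;
import java.util.Map;

final class FingerprintStoreSketch {
    // Hypothetical storage layer: indexes original schema bytes by fingerprint.
    static final class FingerprintIndexedStore {
        private final Map<String, byte[]> byFingerprint = new HashMap<>();

        // Returns true if the schema was newly stored, false if the
        // fingerprint was already present (binaries are not compared).
        boolean addIfAbsent(String fingerprint, byte[] originalSchema) {
            return byFingerprint.putIfAbsent(fingerprint, originalSchema.clone()) == null;
        }

        // Returns the originally supplied bytes, or null if unknown.
        byte[] get(String fingerprint) {
            byte[] stored = byFingerprint.get(fingerprint);
            return stored == null ? null : stored.clone();
        }
    }

    // Toy normalization (assumption): strip all whitespace.
    static String normalize(String schema) {
        return schema.replaceAll("\\s+", "");
    }

    // Business layer: hex-encoded SHA-256 of the normalized form.
    static String fingerprint(String schema) {
        try {
            byte[] digest = MessageDigest.getInstance("SHA-256")
                    .digest(normalize(schema).getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is required on every Java platform.
            throw new IllegalStateException(e);
        }
    }
}
```

With this shape, a second submission that differs only in whitespace maps to the same key, and a read returns the bytes of whichever submission was stored first.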
How to verify it
Unit test added.