Issue 86: Preserve schema field order #87

shiveshr · 2020-07-29T12:56:48Z

Signed-off-by: Shivesh Ranjan [email protected]

Change log description
Store the user-supplied schema as is; the comparison is still performed on the normalized form.

Purpose of the change
Fixes #86

What the code does
Stores the user-supplied schema binary instead of the normalized form.
Since we check for a schema's existence by comparing the SHA-256 hash, which has no practical chance of collision, we do not compare the binaries when the hashes match.
This is because the lookup and comparison happen in the storage layer, which has no awareness of schema formats or parsers. Since the stored schema is not normalized, the business layer computes the normalized form and its fingerprint and shares the fingerprint with the storage layer.
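
To illustrate the idea, here is a minimal sketch of fingerprinting the normalized form while leaving the original bytes untouched. It is not the code in this PR: the class `FingerprintSketch`, the toy `name:type,name:type` schema format, and the sort-based `normalize` are all assumptions made for illustration. Only `MessageDigest` with `"SHA-256"` is standard Java.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;
import java.util.stream.Collectors;

// Hypothetical sketch: not the actual SchemaRegistryService code.
final class FingerprintSketch {

    // Toy normalization for an illustrative "name:type,name:type" format:
    // trims whitespace around each field and sorts the fields by their text.
    static String normalize(String schema) {
        return Arrays.stream(schema.split(","))
                .map(String::trim)
                .filter(s -> !s.isEmpty())
                .sorted()
                .collect(Collectors.joining(","));
    }

    // SHA-256 over the normalized form, rendered as lowercase hex.
    static String fingerprint(String schema) {
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-256");
            byte[] hash = digest.digest(normalize(schema).getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (byte b : hash) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is required to be available on every Java platform.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String original = "name:string, age:int";
        String reordered = " age:int,name:string ";
        // The original string is what would be stored; the fingerprint is what would be looked up.
        System.out.println(fingerprint(original).equals(fingerprint(reordered)));
        System.out.println(original.equals(reordered));
    }
}
```

In this sketch the two inputs differ as strings but share a fingerprint, which is the property the storage lookup relies on.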

How to verify it
Unit test added.

server/src/main/java/io/pravega/schemaregistry/storage/impl/SchemaStoreImpl.java

server/src/main/java/io/pravega/schemaregistry/service/SchemaRegistryService.java

fpj · 2020-07-31T10:02:09Z

On this comment in the PR description:

Since we check for schema's existence by comparing the sha 256 hash which has no practical chance of collision we will not compare the binaries if the hash matches.

I expect the hashes to match rarely, even counting legitimate matches, so comparing the binaries when they do match shouldn't add much cost. I'd suggest doing it to be safe.

shiveshr · 2020-07-31T15:23:01Z

I expect the hashes to match rarely, even counting legitimate matches, so comparing the binaries when they do match shouldn't add much cost. I'd suggest doing it to be safe.

But the hash is computed on the normalized form, and we are storing the schema in its original form.
So comparing the binaries is insufficient.
And since the storage layer doesn't know how to parse the schema and treats it as raw bytes, it cannot compare them directly.
So as a fix, I now pass a comparator function to the storage layer that parses and normalizes the binaries and compares them when the hashes match.
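
A rough sketch of that shape, assuming a fingerprint-keyed map and a `BiPredicate<byte[], byte[]>` as the comparator. The names `SchemaPoolSketch`, `addIfAbsent` and `get` are hypothetical and do not reflect the real `SchemaStore` signatures, and throwing on a fingerprint collision is a choice made for this sketch only.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;
import java.util.function.BiPredicate;

// Hypothetical sketch: names and signatures are illustrative, not the real SchemaStore API.
final class SchemaPoolSketch {
    // Fingerprint of the normalized form -> schema bytes exactly as the user supplied them.
    private final Map<String, byte[]> byFingerprint = new HashMap<>();

    // Returns true if the schema was added, false if a logically equal schema already exists.
    // The store treats schemas as opaque bytes; only the caller-supplied comparator parses them.
    boolean addIfAbsent(String fingerprint, byte[] schema, BiPredicate<byte[], byte[]> logicallyEqual) {
        byte[] existing = byFingerprint.get(fingerprint);
        if (existing == null) {
            byFingerprint.put(fingerprint, schema.clone());
            return true;
        }
        if (logicallyEqual.test(existing, schema)) {
            return false;
        }
        // Same fingerprint but logically different schemas: this sketch simply reports it.
        throw new IllegalStateException("fingerprint collision");
    }

    // Returns a copy of the stored bytes, or null if the fingerprint is unknown.
    byte[] get(String fingerprint) {
        byte[] stored = byFingerprint.get(fingerprint);
        return stored == null ? null : stored.clone();
    }

    public static void main(String[] args) {
        // Stand-in for "parse both and compare": here it only ignores surrounding whitespace.
        BiPredicate<byte[], byte[]> comparator = (a, b) ->
                new String(a, StandardCharsets.UTF_8).trim()
                        .equals(new String(b, StandardCharsets.UTF_8).trim());
        SchemaPoolSketch pool = new SchemaPoolSketch();
        System.out.println(pool.addIfAbsent("fp-1", " a:int ".getBytes(StandardCharsets.UTF_8), comparator));
        System.out.println(pool.addIfAbsent("fp-1", "a:int".getBytes(StandardCharsets.UTF_8), comparator));
    }
}
```

The point of the design is that the storage class never imports a parser: all format knowledge lives in the comparator supplied by the caller.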

server/src/main/java/io/pravega/schemaregistry/service/SchemaRegistryService.java

server/src/main/java/io/pravega/schemaregistry/storage/SchemaStore.java

server/src/main/java/io/pravega/schemaregistry/storage/impl/schemas/Schemas.java

shiveshr · 2020-08-13T13:15:06Z

@fpj
I am not sure the intent of what we are doing is clear, so I will explain here:

We want to store schemas for a group such that if a schema has already been added, we don't add it to the group again.
We also want to store the schema in a global pool of schemas.

There are three apis where this is relevant:

addSchema(schemaInfo)
getSchemaVersion(schemaInfo)
getSchemaReferences(schemaInfo)

So the original problem we were solving was: how do we decide that two schemas are logically equal even if the user-supplied schema binaries are structured differently? To explain: for Avro and JSON, where users supply schema strings, two schemas are considered equal even if the order of fields in the schema string differs or the schema string contains extra whitespace characters.
So logically we want to parse the schemas and compare the parsed forms.
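
As a small illustration of "parse and compare the parsed forms", the sketch below uses a toy `name:type,name:type` schema string rather than real Avro or JSON Schema. The class `LogicalEqualitySketch` and its methods are assumptions for illustration only. It relies on `Map.equals`, which compares entries without regard to iteration order.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch using a toy "name:type,name:type" schema string, not Avro or JSON Schema.
final class LogicalEqualitySketch {

    // Parses the toy schema into field name -> type, keeping the user's field order.
    static Map<String, String> parse(String schema) {
        Map<String, String> fields = new LinkedHashMap<>();
        for (String field : schema.split(",")) {
            String[] parts = field.split(":", 2);
            if (parts.length != 2) {
                throw new IllegalArgumentException("malformed field: " + field);
            }
            fields.put(parts[0].trim(), parts[1].trim());
        }
        return fields;
    }

    // Map.equals ignores iteration order, so field order and surrounding
    // whitespace in the input strings do not affect the result.
    static boolean logicallyEqual(String left, String right) {
        return parse(left).equals(parse(right));
    }

    public static void main(String[] args) {
        System.out.println(logicallyEqual("name:string,age:int", " age : int , name : string "));
        System.out.println(logicallyEqual("name:string", "name:int"));
    }
}
```

The parsed map still iterates in the order the user wrote the fields, so the original order remains available even though equality ignores it.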

Prior to this PR, instead of parsing and comparing, we normalized the schema and stored the normalized form in its binary representation. That had the advantage that we only had to normalize the incoming user schema, and from that point on all operations and queries were on the normalized form. We did not need to parse the schema to compare; we could just compare the normalized binaries.

However, the requirement we are trying to incorporate is to store and return the schema as supplied by the user, while still treating a differently organized but logically identical schema as identical.

So we now store the user-supplied binary form as is, and the service layer passes a comparator function that the storage layer can use to compare the stored schema binary with the requested schema binary by parsing both.

chipmaurer · 2020-09-15T15:15:13Z

Tested ECS Presto connector with this branch, and was successful. OK to merge.

shiveshr added 2 commits July 29, 2020 04:44

Issue 86: store non normalized schema

7a8457e

Signed-off-by: Shivesh Ranjan <[email protected]>

Unit test for json string normalization

6c24e12

Signed-off-by: Shivesh Ranjan <[email protected]>

shiveshr requested review from ravisharda and fpj and removed request for ravisharda July 29, 2020 12:58

fpj changed the title ~~Issue 86: Store non normalized schema so that users can retrieve the schema in the form they stored it in~~ Issue 86: Preserve schema order Jul 30, 2020

fpj requested changes Jul 30, 2020

fpj reviewed Aug 12, 2020

Issue 103: Class cast Exception

eb61c9e

Signed-off-by: Shivesh Ranjan <[email protected]>

shiveshr changed the title ~~Issue 86: Preserve schema order~~ Issue 86: Preserve schema field order Sep 2, 2020

merge with master

8339e8f

Signed-off-by: Shivesh Ranjan <[email protected]>

shiveshr force-pushed the issue86 branch from d26043a to 8339e8f September 3, 2020 07:00

shiveshr force-pushed the master branch from eb61c9e to 7597736 September 4, 2020 09:08

shiveshr added 4 commits September 4, 2020 16:26

Merge branch 'master' into issue86

138ccad

Merge branch 'master' into issue86

24d7f49

Issue 10: Add README.md (pravega#106)

493c8b5

Signed-off-by: Shivesh Ranjan <[email protected]>

shiveshr force-pushed the master branch from 72a7d2b to 493c8b5 September 11, 2020 18:27

shiveshr added 2 commits September 11, 2020 11:31

Merge branch 'master' into issue86

3600bff

Merge branch 'issue86' of https://github.com/shiveshr/schema-registry-1…

6e05342

… into issue86

shiveshr force-pushed the master branch from 493c8b5 to d61694c September 11, 2020 18:33

shiveshr added 3 commits September 11, 2020 11:37

Merge branch 'master' into issue86

1132d67

Merge branch 'master' into issue86

6d67518

Signed-off-by: Shivesh Ranjan <[email protected]>

Merge branch 'issue86' of https://github.com/shiveshr/schema-registry-1…

f131e37

… into issue86

fpj approved these changes Sep 15, 2020

Merge branch 'master' into issue86

c64732f

fpj merged commit cea7bd1 into pravega:master Sep 16, 2020
