Dependency verification #23

libretto · 2022-07-18T21:08:51Z

Added support for verifying external dependencies (including known dependencies and built-in types).

Known dependencies are organized as an index of known data types (built-in, google, confluent).

fixed docstring

fixed equation functions

change line feed

Co-authored-by: Jarkko Jaakola <[email protected]>

sujayinstaclustr · 2022-08-05T00:21:32Z

@libretto As discussed yesterday, due to the conflict in PR #1 from Aiven upstream main, you have all the changes from PR #1 in PR #2. Is that correct?

libretto · 2022-08-08T21:30:46Z

@libretto As discussed yesterday, due to the conflict in PR #1 from Aiven upstream main, you have all the changes from PR #1 in PR #2. Is that correct?

Yes

jjaakola-aiven

Compatibility API endpoint is missing support for references.

What is the need for the proto files of known types? Are all of those used?

I would like to see more comprehensive unit testing and integration testing. E.g. the dependency verifier does not have tests. As technical debt, the protobuf parser does not have tests; the changes here are minor, but I cannot verify them for regressions.

Commit history needs a cleanup.

I would also appreciate it if Instaclustr reviewed this PR.

jjaakola-aiven · 2022-08-31T11:42:30Z

karapace/protobuf/schema.py

@@ -159,3 +300,21 @@ def to_schema(self) -> str:

    def compare(self, other: "ProtobufSchema", result: CompareResult) -> CompareResult:
        self.proto_file_element.compare(other.proto_file_element, result)
+
+    def reslove_dependencies(self):


Suggested change

-    def reslove_dependencies(self):
+    def resolve_dependencies(self):

jjaakola-aiven · 2022-08-31T11:43:24Z

karapace/protobuf/schema.py

        if type(schema).__name__ != "str":
            raise IllegalArgumentException("Non str type of schema string")
        self.dirty = schema
        self.cache_string = ""
+        self.schema_reader = schema_reader


Keep the schema as a model of a single schema, i.e. separate schema validation from schema reference validation.

jjaakola-aiven · 2022-08-31T11:47:31Z

karapace/schema_reader.py

@@ -344,6 +346,7 @@ def _handle_msg_schema(self, key: dict, value: Optional[dict]) -> None:
        schema_id = value["id"]
        schema_version = value["version"]
        schema_deleted = value.get("deleted", False)
+        schema_references = value.get("references", None)


I would use an empty array here, and add it to the schema dictionary at lines 385-390. Then it can always be expected to behave as a list.

In Schema Registry, if a schema has no references, no empty references list is stored in Kafka, so None is more logical here than an empty list.
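One way to reconcile the two positions, sketched under the assumption that the stored message is a plain dict: leave the stored value as it is (no key, or `None`) and normalise to a list only at the point of use. The helper name `normalized_references` is hypothetical and not part of Karapace.

```python
from typing import Any, Dict, List, Optional


def normalized_references(value: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Return the references of a schema message as a list.

    A missing key and an explicit None both become an empty list, so callers
    can iterate without a None check. The stored message is not modified.
    """
    references: Optional[List[Dict[str, Any]]] = value.get("references")
    return list(references) if references else []
```

With this shape the Kafka message still omits references when there are none, while code that reads the value can treat it as a list.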

jjaakola-aiven · 2022-08-31T11:48:16Z

karapace/schema_reader.py

+            if schema_version in subjects_schemas:
+                LOG.info("Updating entry subject: %r version: %r id: %r", schema_subject, schema_version, schema_id)
+            else:
+                LOG.info("Adding entry subject: %r version: %r id: %r", schema_subject, schema_version, schema_id)


Unnecessary move.

Which move?

jjaakola-aiven · 2022-08-31T11:51:26Z

karapace/schema_reader.py

+                    ref_str = reference_key(ref["subject"], ref["version"])
+                    referents = self.referenced_by.get(ref_str, None)
+                    if referents:
+                        LOG.info("Adding entry subject referenced_by : %r", ref_str)


This log is identical to one in the else branch.
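A minimal sketch of how the two branches could collapse into one, assuming `referenced_by` is a dict that maps a reference key to a list of schema ids. The `reference_key` below is a stand-in with an assumed key format, and `add_referent` is a hypothetical helper; neither is taken from the PR.

```python
from typing import Dict, List


def reference_key(subject: str, version: int) -> str:
    # Stand-in for the PR's reference_key(); the real key format may differ.
    return f"{subject}_{version}"


def add_referent(
    referenced_by: Dict[str, List[int]], subject: str, version: int, schema_id: int
) -> str:
    """Record that schema_id references (subject, version) and return the key."""
    key = reference_key(subject, version)
    # setdefault covers both the "first referent" and "additional referent"
    # cases, so a single log statement after this line would be enough.
    referenced_by.setdefault(key, []).append(schema_id)
    return key
```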

jjaakola-aiven · 2022-08-31T12:26:55Z

karapace/protobuf/schema.py

@@ -97,15 +105,148 @@ def option_element_string(option: OptionElement) -> str:
    return f"option {result};\n"


+class Dependency:


Move this to its own module.

jjaakola-aiven · 2022-08-31T12:27:26Z

karapace/protobuf/schema.py

+        # result: DependencyVerifierResult = self.verify_ciclyc_dependencies()
+        # if not result.result:
+        #    return result


Commented out, not used?

jjaakola-aiven · 2022-08-31T12:27:46Z

karapace/protobuf/schema.py

+
+    # def verify_ciclyc_dependencies(self) -> DependencyVerifierResult:
+
+    # TODO: add recursion detection


Would be really nice to have.
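For illustration, recursion (cycle) detection over schema references could look like the depth-first search below. It assumes the references are available as a mapping from schema name to the names it depends on; `find_cycle` and that mapping are hypothetical and do not reflect the PR's `DependencyVerifierResult` API.

```python
from typing import Dict, List, Optional


def find_cycle(graph: Dict[str, List[str]]) -> Optional[List[str]]:
    """Return one dependency cycle as a list of names, or None if acyclic."""
    unvisited, in_progress, done = 0, 1, 2
    state: Dict[str, int] = {node: unvisited for node in graph}
    path: List[str] = []

    def visit(node: str) -> Optional[List[str]]:
        state[node] = in_progress
        path.append(node)
        for dep in graph.get(node, []):
            dep_state = state.get(dep, unvisited)
            if dep_state == in_progress:
                # dep is already on the current path, so the path loops back.
                return path[path.index(dep):] + [dep]
            if dep_state == unvisited:
                cycle = visit(dep)
                if cycle:
                    return cycle
        path.pop()
        state[node] = done
        return None

    for node in list(graph):
        if state[node] == unvisited:
            cycle = visit(node)
            if cycle:
                return cycle
    return None
```

The function uses plain recursion, so very deep reference chains would need an iterative variant.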

jjaakola-aiven · 2022-08-31T12:38:15Z

karapace/schema_models.py


 import json

+if TYPE_CHECKING:
+    from karapace.schema_reader import KafkaSchemaReader


Schema models module should not be dependent on storage logic.
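One possible way to decouple the two, sketched with `typing.Protocol` (Python 3.8+): the model code depends only on a small lookup interface, and the storage layer supplies an object that satisfies it. `SchemaLookup`, `InMemoryLookup` and `resolve_reference` are hypothetical names used for illustration only.

```python
from typing import Dict, Optional, Protocol, Tuple


class SchemaLookup(Protocol):
    """What the schema model needs from storage, and nothing more."""

    def get_schema_str(self, subject: str, version: int) -> Optional[str]:
        ...


class InMemoryLookup:
    """Toy implementation; a reader-backed class could expose the same method."""

    def __init__(self, schemas: Dict[Tuple[str, int], str]) -> None:
        self._schemas = schemas

    def get_schema_str(self, subject: str, version: int) -> Optional[str]:
        return self._schemas.get((subject, version))


def resolve_reference(lookup: SchemaLookup, subject: str, version: int) -> str:
    """Fetch a referenced schema through the interface, not the reader itself."""
    schema_str = lookup.get_schema_str(subject, version)
    if schema_str is None:
        raise LookupError(f"No schema for subject {subject!r} version {version!r}")
    return schema_str
```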

jjaakola-aiven · 2022-08-31T12:39:01Z

karapace/protobuf/schema.py

+                references = schema_data.get("references")
+                parsed_schema = ProtobufSchema(schema, references)
+                self.dependencies[name] = Dependency(name, subject, version, parsed_schema)
+        except Exception as e:


This is a very broad exception. What are the failure conditions, and which exceptions are expected?
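As a sketch of a narrower alternative, assuming the subjects are held in nested dicts shaped like `subjects[subject]["schemas"][version]`: only the lookup failure is caught and it is re-raised with a message that names the missing reference. `ReferenceNotFound` and `lookup_schema_data` are hypothetical names, not the PR's.

```python
from typing import Any, Dict


class ReferenceNotFound(Exception):
    """Hypothetical error raised when a referenced subject/version is missing."""


def lookup_schema_data(
    subjects: Dict[str, Dict[str, Any]], subject: str, version: int
) -> Dict[str, Any]:
    """Return the stored schema data for a reference, or raise ReferenceNotFound."""
    try:
        return subjects[subject]["schemas"][version]
    except KeyError as exc:
        # Only a missing subject, "schemas" entry or version lands here;
        # any other error propagates unchanged instead of being swallowed.
        raise ReferenceNotFound(
            f"No schema found for subject {subject!r} version {version!r}"
        ) from exc
```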

libretto · 2022-08-31T20:36:38Z

Compatibility API endpoint is missing support for references.

Compatibility is not supported yet, but it is under development and is to be released in the next PR.

What is the need for the proto files of known types? Are all of those used?

We used them once, at the stage where the known dependencies were generated, so the current KnownDependency class is based on these proto files. If these files change in a future version, we can track the change and apply it to the KnownDependency class.

I would like to see more comprehensive unit testing and integration testing. E.g. the dependency verifier does not have tests. As technical debt, the protobuf parser does not have tests; the changes here are minor, but I cannot verify them for regressions.

Yes, the dependency verifier has no unit test; I will add one. But the protobuf parser does have unit tests, and they are already in the aiven/karapace master branch.

Commit history needs a cleanup.

I would clean up the history before merging into the Aiven git repository. I suppose we can rebase all commits onto the root there before the merge.

I would also appreciate it if Instaclustr reviewed this PR.

I will be reviewing from Instaclustr side as well.

jjaakola-aiven

I did a bit more reviewing on this. Some comments:

When an InvalidReference exception is raised, the error message returned could be more helpful. I get back "Invalid PROTOBUF schema. Error: Provided schema is not valid", e.g. when the reference is not found.

This needs more testing of different schema layouts, chaining of schemas, evolution of schemas (adding a new field that references another schema), etc.

jjaakola-aiven · 2022-09-06T12:40:17Z

karapace/protobuf/schema.py

+                subject_data = self.schema_reader.subjects.get(subject)
+                schema_data = subject_data["schemas"][version]
+                schema = schema_data["schema"].schema_str
+                references = schema_data.get("references")


When chaining schemas, I think this fails.
Consider:

S1, root, no references
S2, ref S1
S3, ref S2

It will not fail; I just added a test for it.
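The chained case can be sketched as a post-order walk that visits each schema's references before the schema itself. This assumes the references are available as a name-to-names mapping and that there are no cycles; `resolution_order` is a hypothetical helper, not code from the PR.

```python
from typing import Dict, List, Set


def resolution_order(references: Dict[str, List[str]], root: str) -> List[str]:
    """Return root and its transitive references, dependencies first."""
    order: List[str] = []
    seen: Set[str] = set()

    def visit(name: str) -> None:
        if name in seen:
            return
        seen.add(name)
        for ref in references.get(name, []):
            visit(ref)
        # Appended only after all of its references have been visited.
        order.append(name)

    visit(root)
    return order


chain = {"S1": [], "S2": ["S1"], "S3": ["S2"]}
print(resolution_order(chain, "S3"))  # ['S1', 'S2', 'S3']
```

Resolving in this order means S1 is already parsed when S2 needs it, and S2 when S3 needs it.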

jjaakola-aiven · 2022-09-06T12:47:42Z

tests/integration/test_schema_protobuf.py

+    assert not any(x != y for x, y in zip(myjson, referents))
+
+    res = await registry_async_client.delete("subjects/customer/versions/1")
+    assert res.status_code == 404


My gut feeling is that 409 Conflict would be a better status code, but I am not sure what CSR returns. This needs to be checked so that it matches CSR.

It seems that CSR returns status_code 422, error_code 42206 and the message "One or more references exist to the schema {magic=1,keytype=SCHEMA,subject=<SUBJECT>,version=<VER>}."

We also return this error code, but with a different message. I will change the message to the SR message.
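A small sketch of building that response, using only the status code, error code and message text quoted above. The function name `reference_exists_error` and the `(status, body)` return shape are assumptions made for illustration, not Karapace's actual error handling.

```python
from typing import Any, Dict, Tuple


def reference_exists_error(subject: str, version: int) -> Tuple[int, Dict[str, Any]]:
    """Build the HTTP status and JSON body for a delete blocked by references."""
    message = (
        "One or more references exist to the schema "
        f"{{magic=1,keytype=SCHEMA,subject={subject},version={version}}}."
    )
    return 422, {"error_code": 42206, "message": message}
```

An integration test could then assert on both `status_code` and `error_code`, so a mismatch with CSR shows up in either field.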

jjaakola-aiven · 2022-10-03T10:51:33Z

Please make a PR to https://github.com/aiven/karapace

libretto and others added 30 commits April 19, 2022 21:04

add basic support of references to POST/GET subjects

efb456d

fixups

843a1da

fixup

44f853a

fixup

3498962

fixup

9e9b16d

fixup

944cd39

fixup

956b68e

merge with master and fixup conflicts

d677bd5

debugging

764facf

referencedby/delete workarounds

c1c4c50

referencedby/delete workarounds

732a9ed

Add basic support of references with basic tests

1f33b7d

Merge branch 'master' into deps

ebe889f

removed reference for ujson

93cd241

added comma at the end

2e9a8c6

dependencies verification without known deps

516d280

fixup cyclic dependencies

cc1a691

Update schema_models.py

7f2cf01

fixed docstring

Update schema_models.py

0f2fab6

fiexed equation functions

Update schema_models.py

bbbb5a3

Update schema_reader.py

1d2b7c4

change line feed

Update karapace/schema_registry_apis.py

4634373

Co-authored-by: Jarkko Jaakola <[email protected]>

fixup style

4b99fc8

update code by PR review

efe29f4

add reference_key()

0854296

Merge branch 'master' into deps

8a1b129

fixed undefined variable missing due to merge

46146dd

removed extra line feeds

967c822

implementation of protobuf dependency verifications beta

73ff0f3

fixup

c38c952

libretto requested review from mrlika, a team and sujayinstaclustr July 26, 2022 09:10

sujayinstaclustr mentioned this pull request Jul 28, 2022

Handle schema references Aiven-Open/karapace#195

Closed

libretto added 3 commits July 29, 2022 15:23

improve PR1 code

ef05bd4

merge with deps

144c5ec

resolve conflicts with aiven/main

8e57eeb

libretto added 3 commits August 8, 2022 23:51

fixup merge issues

1d4be0b

merge with aiven/master

fad2c0e

lint

45c6c54

libretto added 2 commits August 18, 2022 18:38

Merge branch 'master' into branch pr2

b454e1a

Merge remote-tracking branch 'origin' into pr2

e3ace7d

jjaakola-aiven suggested changes Aug 31, 2022

move dependency classes into separate files

4bd7ff7

jjaakola-aiven suggested changes Sep 6, 2022

libretto added 6 commits September 7, 2022 17:48

change dependencies class

d8c54e1

fixup references issues

d1fa06e

fixup issues in code

f72f855

fixup lint

4d14ff0

add dependency verifier unit test

e1f39b9

add dependency verifier unit test

90ef33c

sujayinstaclustr approved these changes Oct 3, 2022

libretto added 4 commits October 4, 2022 17:20

merge with master

341d355

fixup test bug with protobuf

7bd1ab6

fixup bug2

d5fdf59

fixup bug2

37e8adb
