-
Notifications
You must be signed in to change notification settings - Fork 2
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Dependency verification #23
base: main
Are you sure you want to change the base?
Conversation
fixed docstring
fiexed equation functions
change line feed
Co-authored-by: Jarkko Jaakola <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Compatibility API endpoint is missing support for references.
What is the need for the proto files of known types? Are all of those used?
I would like to see more comprehensive unit testing and integration testing. E.g. dependency verifier does not have tests. As a technical debt the protobuf parser does not have tests, the changes here are minor but cannot verify for regression.
Commit history needs a cleanup.
I also would appreciate that Instaclustr does a review of this PR.
karapace/protobuf/schema.py
Outdated
@@ -159,3 +300,21 @@ def to_schema(self) -> str: | |||
|
|||
def compare(self, other: "ProtobufSchema", result: CompareResult) -> CompareResult: | |||
self.proto_file_element.compare(other.proto_file_element, result) | |||
|
|||
def reslove_dependencies(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
def reslove_dependencies(self): | |
def resolve_dependencies(self): |
karapace/protobuf/schema.py
Outdated
if type(schema).__name__ != "str": | ||
raise IllegalArgumentException("Non str type of schema string") | ||
self.dirty = schema | ||
self.cache_string = "" | ||
self.schema_reader = schema_reader |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Keep the schema as a model of single schema, i.e. separate schema and schema reference validation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok
@@ -344,6 +346,7 @@ def _handle_msg_schema(self, key: dict, value: Optional[dict]) -> None: | |||
schema_id = value["id"] | |||
schema_version = value["version"] | |||
schema_deleted = value.get("deleted", False) | |||
schema_references = value.get("references", None) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would use empty array here.
And add it to schema dictionary at lines 385-390. Then it can be expected to behave as a list always.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
in SchemaRegistry if the schema has no references they do not store empty references list in kafka so None there is more logical then empty list.
if schema_version in subjects_schemas: | ||
LOG.info("Updating entry subject: %r version: %r id: %r", schema_subject, schema_version, schema_id) | ||
else: | ||
LOG.info("Adding entry subject: %r version: %r id: %r", schema_subject, schema_version, schema_id) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unnecessary move.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
which move?
karapace/schema_reader.py
Outdated
ref_str = reference_key(ref["subject"], ref["version"]) | ||
referents = self.referenced_by.get(ref_str, None) | ||
if referents: | ||
LOG.info("Adding entry subject referenced_by : %r", ref_str) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This log is identical to one in the else branch.
karapace/protobuf/schema.py
Outdated
@@ -97,15 +105,148 @@ def option_element_string(option: OptionElement) -> str: | |||
return f"option {result};\n" | |||
|
|||
|
|||
class Dependency: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To own module.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok
karapace/protobuf/schema.py
Outdated
# result: DependencyVerifierResult = self.verify_ciclyc_dependencies() | ||
# if not result.result: | ||
# return result |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Commented, not used?
karapace/protobuf/schema.py
Outdated
|
||
# def verify_ciclyc_dependencies(self) -> DependencyVerifierResult: | ||
|
||
# TODO: add recursion detection |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would be really nice to have.
karapace/schema_models.py
Outdated
|
||
import json | ||
|
||
if TYPE_CHECKING: | ||
from karapace.schema_reader import KafkaSchemaReader |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Schema models module should not be dependent on storage logic.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok
karapace/protobuf/schema.py
Outdated
references = schema_data.get("references") | ||
parsed_schema = ProtobufSchema(schema, references) | ||
self.dependencies[name] = Dependency(name, subject, version, parsed_schema) | ||
except Exception as e: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very broad exception, what are the conditions and what exceptions are expected?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok
Compatibility is not supported yet but it is under development and must be released on the next PR.
So actually we used them once in the stage of generation of known dependencies. So current KnownDependency class is based on these proto files. If in the future version of these files will be changed we can track it and apply changes to the KnownDependency class.
Yes, the dependency verifier has no unit test, I will add it. But the protobuf parser actually has unit tests and it is already in aiven/karapace master branch.
I would cleanup history before merge with aiven git repository. I suppose we can rebase all commits to root there before merge.
I will be reviewing from Instaclustr side as well. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did a bit more of reviewing on this. Some comments.
When an InvalidReference
exception is raised the error message returned could be more helpful. I get back Invalid PROTOBUF schema. Error: Provided schema is not valid
, e.g. in case where the reference is not found.
This needs more testing of different layouts of schema, chaining of schemas, evolution of schemas (adding new field that references another schema) etc.
karapace/protobuf/schema.py
Outdated
subject_data = self.schema_reader.subjects.get(subject) | ||
schema_data = subject_data["schemas"][version] | ||
schema = schema_data["schema"].schema_str | ||
references = schema_data.get("references") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When chaining schemas I think this fails.
Consider:
S1, root no references
S2, ref S1
S3, ref S2
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it will not fails I just added the test
assert not any(x != y for x, y in zip(myjson, referents)) | ||
|
||
res = await registry_async_client.delete("subjects/customer/versions/1") | ||
assert res.status_code == 404 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Gut feeling is that 409 Conflict
would be better status code, but I am not sure of CSR returned value. This needs to be looked that it would match CSR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems that CSR returns status_code 422
, error_code 42206
and message One or more references exist to the schema {magic=1,keytype=SCHEMA,subject=<SUBJECT>,version=<VER>}.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We also return this error code but with another message. l will change the message to the SR message
Please make a PR to https://github.com/aiven/karapace |
Added support for dependency verification of external dependencies, (including known dependencuies and built-in types).
Know dependencies are organized as an index of known data types (built-in, google, confluent).