Invalid SCHEMA messages are produced for deselected streams #212
Comments
@laurentS thanks for reporting this. The latest SDK was shipped with
Also, I don't think this behavior is spilling into normal sync runs since
PS: meltano/sdk#1698 might help
Thanks @edgarrmondragon for these details.
I'm reopening this issue, as we are still seeing the bug with
{"type": "SCHEMA", "stream": "traffic_clones", "schema": {"properties": {"repo": {"type": ["string", "null"]}, "org": {"type": ["string", "null"]}, "repo_id": {"type": ["integer", "null"]}, "timestamp": {"format": "date-time", "type": ["string", "null"]}, "count": {"type": ["integer", "null"]}, "uniques": {"type": ["integer", "null"]}}, "type": "object"}, "key_properties": ["repo", "org", "timestamp"], "bookmark_properties": ["timestamp"]}
is generated with
{"type": "SCHEMA", "stream": "traffic_clones", "schema": {"properties": {}, "type": "object"}, "key_properties": ["repo", "org", "timestamp"], "bookmark_properties": ["timestamp"]}
The first
I think the problem is that the test case added in #1698 does not cover the use case described here.
In #193, a set of traffic_* streams were added to the tap, with a customised metadata property, which deselects them if no catalog was passed as input to the tap.

Unfortunately, when running the tap with
poetry run tap-github --config /tmp/tmpmt8fq0pn/tmp7896kkwh.json --test=schema
with this config (which does not seem to matter much, the main thing being the test=schema CLI option):

the tap issues invalid SCHEMA messages like:

Specifically, properties is empty, so downstream targets cannot look up the key_properties.
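To make the failure concrete, here is a minimal sketch of the check a downstream target effectively relies on: every name in key_properties and bookmark_properties should be declared under schema.properties. The function name `missing_schema_properties` is made up for illustration and is not part of the SDK; the message is the invalid one quoted in this thread.

```python
import json


def missing_schema_properties(message: dict) -> list[str]:
    """Return key/bookmark properties that the SCHEMA message does not declare.

    Hypothetical helper for illustration only, not an SDK function.
    """
    declared = message.get("schema", {}).get("properties", {})
    referenced = [
        *message.get("key_properties", []),
        *message.get("bookmark_properties", []),
    ]
    missing: list[str] = []
    for name in referenced:
        # Keep first-seen order and avoid listing a name twice.
        if name not in declared and name not in missing:
            missing.append(name)
    return missing


# The invalid SCHEMA message reported for the deselected stream.
invalid = json.loads(
    '{"type": "SCHEMA", "stream": "traffic_clones", '
    '"schema": {"properties": {}, "type": "object"}, '
    '"key_properties": ["repo", "org", "timestamp"], '
    '"bookmark_properties": ["timestamp"]}'
)
print(missing_schema_properties(invalid))  # ['repo', 'org', 'timestamp']
```

An empty list means the message is at least self-consistent on this point; it says nothing about whether the rest of the schema is valid.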
.The line that causes the problem is here https://github.com/MeltanoLabs/tap-github/pull/193/files#diff-06dc9c6115cbc069ce355913de0c101fedf6956d6f6b4873c5112434596934d3R2260
I have not dug into the details yet, but it looks like the schema production does not correctly take the selection metadata into account.
Pinging @edgarrmondragon as you suggested that code, and you might have a fix for it :)
I also think the sdk should not allow a tap to produce invalid records like this. Is there a way to test against it without causing too much overhead? Obviously, we could validate each record before sending it out, but that might be a bit heavy ;)
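On the overhead question, one option might be to check only the SCHEMA messages and leave RECORD messages alone, since a tap emits far fewer of the former. The sketch below scans the tap's output lines and reports streams whose key_properties are not declared in the schema. It is a rough idea rather than a proposal for the actual SDK implementation: `invalid_schema_streams` is a hypothetical name, and the RECORD line in the sample output is made up.

```python
import json
from typing import Iterable


def invalid_schema_streams(lines: Iterable[str]) -> dict[str, list[str]]:
    """Map stream name -> key properties missing from its SCHEMA message.

    Hypothetical helper for illustration only, not an SDK function.
    """
    problems: dict[str, list[str]] = {}
    for line in lines:
        # Cheap pre-filter so RECORD lines are never parsed here.
        if '"SCHEMA"' not in line:
            continue
        message = json.loads(line)
        if message.get("type") != "SCHEMA":
            continue
        declared = message.get("schema", {}).get("properties", {})
        missing = [
            key for key in message.get("key_properties", []) if key not in declared
        ]
        if missing:
            problems[message["stream"]] = missing
    return problems


# Sample output: the invalid SCHEMA message from this issue plus a made-up RECORD.
output = [
    '{"type": "SCHEMA", "stream": "traffic_clones", '
    '"schema": {"properties": {}, "type": "object"}, '
    '"key_properties": ["repo", "org", "timestamp"], '
    '"bookmark_properties": ["timestamp"]}',
    '{"type": "RECORD", "stream": "traffic_clones", "record": {"repo": "example"}}',
]
print(invalid_schema_streams(output))
```

Because only SCHEMA lines are parsed, the cost should scale with the number of schema messages rather than the number of records.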
Interestingly, there's a test for this, _test_replication_keys_in_schema, but it does not validate against the schema messages that are sent.