Improve catalog information synchronisation with GraphQL #507

Open: ThomasCAI-mlv wants to merge 1 commit into master

ThomasCAI-mlv (Collaborator) commented Jan 31, 2025

1. Context

  • Topics can currently carry tags and a description, and ns4kafka synchronizes this information with the Confluent Cloud catalog (tags and description). The synchronization is performed with the Stream Catalog API (Stream Catalog documentation).

  • However, this API is poorly suited to the synchronization: it returns at most 500 topics per call, and it offers no filter to query only the topics that have a description (although it can query only the topics with at least one tag). As a result, all the topics of the cluster must be fetched 500 at a time to perform the synchronization, which hurts performance.
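
To make the cost concrete, here is a minimal sketch of the full scan that the Stream Catalog approach forces. It is not the ns4kafka code: `PagedCatalogScan`, `topicsWithDescription` and the `fetchPage(offset, limit)` callback are hypothetical names, and topics are reduced to name/description pairs where an empty description means "none". Only the 500-per-call limit comes from the description above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.BiFunction;

// Illustrative sketch only: all names here are hypothetical, not ns4kafka or Confluent APIs.
final class PagedCatalogScan {
    // Per-call topic limit mentioned in the PR description.
    static final int PAGE_LIMIT = 500;

    /**
     * Walks every page of topics and keeps the names of those with a description.
     * fetchPage takes (offset, limit) and returns name -> description entries;
     * an empty description stands for "no description" (assumption of this sketch).
     */
    static List<String> topicsWithDescription(
            BiFunction<Integer, Integer, List<Map.Entry<String, String>>> fetchPage) {
        List<String> described = new ArrayList<>();
        int offset = 0;
        while (true) {
            List<Map.Entry<String, String>> page = fetchPage.apply(offset, PAGE_LIMIT);
            for (Map.Entry<String, String> topic : page) {
                // There is no server-side filter on description, so filter client-side.
                if (!topic.getValue().isBlank()) {
                    described.add(topic.getKey());
                }
            }
            // A short page means the last page has been reached.
            if (page.size() < PAGE_LIMIT) {
                break;
            }
            offset += PAGE_LIMIT;
        }
        return described;
    }
}
```

With 1,200 topics this loop makes three calls even if only a handful of topics have a description, which is the overhead the GraphQL calls are meant to avoid.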

2. Proposed implementation
This PR adds GraphQL API calls to query the lists of topics together with their tags and descriptions. The GraphQL API overcomes both problems above: it can return the list of topics that have a description, and the list of topics that have at least one tag, without pagination.

This API still has drawbacks:

  • It is only available with the Advanced Stream Governance package on Confluent Cloud, which is the paid package at $1/hour. We therefore try the GraphQL API first and, if the call fails, fall back to the old Stream Catalog API to get the topics.
  • To query the topics with at least one tag, every existing tag must be listed in the GraphQL query, so an additional call to the Stream Catalog API is needed to get the list of tags.
  • I didn't find a way to query the list of topics with a description OR with at least one tag in a single request, so I had to split it: one GraphQL API call to get the topics with a description, and another one to get the topics with at least one tag. If you want to challenge this, you can try the GraphQL API (doc: https://docs.confluent.io/cloud/current/stream-governance/graphql-apis.html#definition-kafka_topic)
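
The control flow described in these bullets can be sketched as follows. This is a simplified, synchronous illustration and not the code of this PR: `CatalogTopicLookup` and the three suppliers are hypothetical stand-ins for the two GraphQL calls and the Stream Catalog fallback, and topics are reduced to their names.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Supplier;

// Illustrative sketch only: names are hypothetical, not taken from the PR's code.
final class CatalogTopicLookup {
    /**
     * Tries the two GraphQL lookups first and merges their results;
     * if either one fails, falls back to the Stream Catalog lookup.
     */
    static Set<String> topicsToSync(
            Supplier<List<String>> graphQlTopicsWithDescription,
            Supplier<List<String>> graphQlTopicsWithTags,
            Supplier<List<String>> streamCatalogFallback) {
        try {
            // Two separate calls, since no single "description OR tag" query was found.
            Set<String> merged = new LinkedHashSet<>(graphQlTopicsWithDescription.get());
            merged.addAll(graphQlTopicsWithTags.get());
            return merged;
        } catch (RuntimeException graphQlUnavailable) {
            // For example, the cluster is not on the advanced governance package.
            return new LinkedHashSet<>(streamCatalogFallback.get());
        }
    }
}
```

The merge goes through a set in this sketch so that a topic with both a description and tags appears only once in the result.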

3. Other

  • I added a sync-catalog property, which must be set to true to enable the synchronization of catalog information with Confluent Cloud. It is set to false by default.
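
As a rough illustration of the opt-in behaviour, the check could look like the sketch below. Only the `sync-catalog` name and its default of false come from the description above; where the property sits in the ns4kafka configuration is not stated here, so a flat map stands in for the real configuration object, and `SyncCatalogFlag` is a hypothetical name.

```java
import java.util.Map;

// Illustrative sketch only: a flat map stands in for the real configuration.
final class SyncCatalogFlag {
    /** Returns true only when sync-catalog is explicitly set to true. */
    static boolean isCatalogSyncEnabled(Map<String, String> properties) {
        // A missing property is treated as false, matching the stated default.
        return Boolean.parseBoolean(properties.getOrDefault("sync-catalog", "false"));
    }
}
```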

ThomasCAI-mlv added the enhancement label (This issue or pull request improves a feature) on Jan 31, 2025
ThomasCAI-mlv self-assigned this on Jan 31, 2025
ThomasCAI-mlv force-pushed the feat/graphql-tag-synchro branch from ec245b4 to fbb83e9 on February 7, 2025 09:52
ThomasCAI-mlv marked this pull request as ready for review on February 7, 2025 10:11
ThomasCAI-mlv force-pushed the feat/graphql-tag-synchro branch from 25140ef to 944be8d on February 14, 2025 10:01