How to backup a topic with partition and offset data using camel-minio-sink plugin #1561

Open
charleenklang opened this issue Sep 1, 2023 · 7 comments

charleenklang commented Sep 1, 2023

Hi,
I am trying to back up Kafka topics to an S3 bucket and restore the data to a Kafka topic with the camel-minio-sink plugin, version 3.20.6.

What I observe is that the offset is restored, but all messages are written to the same partition in the restored topic, even if they were in different partitions in the original topic.

For example:

If we consume from the original topic with kcat:

% Reached end of topic kafkatopic-sample [0] at offset 4
% Reached end of topic kafkatopic-sample [2] at offset 0
% Reached end of topic kafkatopic-sample [1] at offset 9

If we back up the data and restore it to a new topic:

% Reached end of topic kafkatopic-restore-test [1] at offset 13
% Reached end of topic kafkatopic-restore-test [0] at offset 0
% Reached end of topic kafkatopic-restore-test [2] at offset 0

Is it possible to back up the partition and offset data as well?
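
The two kcat outputs above can be compared mechanically. The following is a minimal sketch, not part of the original report: it parses the end-of-partition lines and sums the end offsets per topic. It assumes every partition started at offset 0 and that nothing was removed by retention, so that an end offset equals a message count. The names `EOF_LINE` and `end_offsets` are hypothetical.

```python
import re
from typing import Dict

# Matches kcat's end-of-partition notice, e.g.
# "% Reached end of topic kafkatopic-sample [0] at offset 4"
EOF_LINE = re.compile(r"% Reached end of topic (\S+) \[(\d+)\] at offset (\d+)")


def end_offsets(kcat_output: str) -> Dict[int, int]:
    """Map partition number -> end offset reported by kcat."""
    offsets = {}
    for line in kcat_output.splitlines():
        match = EOF_LINE.search(line)
        if match:
            offsets[int(match.group(2))] = int(match.group(3))
    return offsets


original = end_offsets("""
% Reached end of topic kafkatopic-sample [0] at offset 4
% Reached end of topic kafkatopic-sample [2] at offset 0
% Reached end of topic kafkatopic-sample [1] at offset 9
""")

restored = end_offsets("""
% Reached end of topic kafkatopic-restore-test [1] at offset 13
% Reached end of topic kafkatopic-restore-test [0] at offset 0
% Reached end of topic kafkatopic-restore-test [2] at offset 0
""")

# Under the assumption above, both topics hold 13 messages in total,
# but the restored topic has all of them in partition 1.
print(sorted(original.items()), sum(original.values()))
print(sorted(restored.items()), sum(restored.values()))
```

Under that assumption, the totals match (4 + 0 + 9 = 13), which suggests no messages were lost; only their distribution across partitions changed.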

@valdar
Member

valdar commented Sep 1, 2023

If the partition was originally assigned from the message key, you need to restore the messages using the same key. That requires some extra work that may or may not be possible out of the box, depending on whether the key can easily be extracted from the data saved in S3.
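
To illustrate why the key matters: Kafka's default partitioner hashes the serialized key (with murmur2) and takes the result modulo the number of partitions, so the same key maps to the same partition as long as the partition count is unchanged. The sketch below shows only the principle. As a loudly stated assumption, it uses `zlib.crc32` as a stand-in hash, so the partition numbers it produces will not match a real cluster; `choose_partition` is a hypothetical helper.

```python
import zlib
from typing import Optional


def choose_partition(key: Optional[bytes], num_partitions: int) -> Optional[int]:
    """Stand-in for key-based partitioning: hash(key) mod partition count.

    Assumption: zlib.crc32 replaces the murmur2 hash of Kafka's default
    partitioner, so results differ from what a real producer would pick.
    """
    if key is None:
        # No key: the producer chooses a partition by other means
        # (for example round robin), so nothing can be derived here.
        return None
    return zlib.crc32(key) % num_partitions


first = choose_partition(b"test-key", 3)
again = choose_partition(b"test-key", 3)

print(first == again)             # same key, same partition count
print(choose_partition(None, 3))  # a record restored without its key
```

A record restored without its key cannot be routed this way, which is why the key would have to be saved alongside the value.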

@charleenklang
Author

Thanks for the answer :)
What happens to messages that are sent without a message key and are assigned to a partition by round robin? In that case, will it not be possible to restore the data to the original partition?

@valdar
Member

valdar commented Sep 1, 2023

Well, in theory you could save the partition of each message in S3 and then reuse it to send the message to a specific partition. Mind that I am not sure all the bits needed to achieve this are accessible already. It sounds like a potentially interesting use case though.
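
One way to picture the idea of saving the partition with each message is an envelope that wraps the value together with its metadata before it is written to S3. This is a sketch only and not a feature of camel-minio-sink: the functions `to_envelope` and `from_envelope`, the JSON field names, and the sample record are all hypothetical. Key and value are base64-encoded because they are arbitrary bytes.

```python
import base64
import json
from typing import Optional


def to_envelope(key: Optional[bytes], value: bytes, partition: int,
                offset: int, timestamp_ms: int) -> str:
    """Serialize one record plus its metadata to a JSON string."""
    return json.dumps({
        "key": None if key is None else base64.b64encode(key).decode("ascii"),
        "value": base64.b64encode(value).decode("ascii"),
        "partition": partition,
        "offset": offset,
        "timestamp_ms": timestamp_ms,
    })


def from_envelope(text: str) -> dict:
    """Recover key, value and metadata from a stored envelope."""
    doc = json.loads(text)
    key = doc["key"]
    return {
        "key": None if key is None else base64.b64decode(key),
        "value": base64.b64decode(doc["value"]),
        "partition": doc["partition"],
        "offset": doc["offset"],
        "timestamp_ms": doc["timestamp_ms"],
    }


# Hypothetical record; the stored string is what would be written to S3.
stored = to_envelope(b"order-42", b"payload", partition=1, offset=4,
                     timestamp_ms=1693840308197)
record = from_envelope(stored)
print(record["partition"], record["key"])
```

On restore, the recovered key and partition could be passed to the producer explicitly. The stored offset is informational only, since the broker assigns offsets when records are appended.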

@charleenklang
Author

I agree that this is an interesting use case. Do you know if this is part of the roadmap?

@valdar
Member

valdar commented Sep 3, 2023

Well, there is a part that is specific to your data and not easy to generalize: do you have the partition and/or key in the data you save to S3?
Then we can start to think about how to restore it using the source connector. As I said, something might already be in place that we can use for this purpose.

@charleenklang
Author

I have the partition and key in the data I want to save to S3. But it seems to me that only the message (value) is stored in the S3 objects, and restoring results in the messages being published to the same partition without any keys.

@charleenklang
Author

For example, there is a message in topic-1 like this:

Key (1 bytes): test-key	
Value (4 bytes): test
Timestamp: 1693840308197	Partition: 1	Offset: 4

The object in the s3 bucket:

$ mc cat mybucket/myobject
test

After restoring the message to a new topic it looks like this:

Key (-1 bytes): 	
Value (4 bytes): test
Timestamp: 1693840947125	Partition: 2	Offset: 5
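
The difference between the two records above can be listed field by field. The sketch below copies the values from the example; representing the restored key (shown as `-1 bytes`) as `None` is an assumption, and `changed_fields` is a hypothetical helper.

```python
# Values copied from the example above. The restored key, shown as
# "-1 bytes", is represented as None (assumption: it is a null key).
original = {"key": "test-key", "value": "test",
            "timestamp": 1693840308197, "partition": 1, "offset": 4}
restored = {"key": None, "value": "test",
            "timestamp": 1693840947125, "partition": 2, "offset": 5}


def changed_fields(before: dict, after: dict) -> list:
    """Return the names of the fields whose values differ."""
    return [name for name in before if before[name] != after[name]]


print(changed_fields(original, restored))
# ['key', 'timestamp', 'partition', 'offset']
```

Only the value survives the round trip, which matches the S3 object containing just `test`.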
