[fix][broker] Geo Replication lost messages or frequently fails due to Deduplication is not appropriate for Geo-Replication #23697
Open
poorbarcode wants to merge 20 commits into apache:master from poorbarcode:fix/repl_sequence_id
Conversation
poorbarcode changed the title from "[draft][fix][broker] GEO replication fails due to deduplication is not aprropriate for Geo-Replication" to "[draft][fix][broker] Geo Replication fails due to Deduplication is not aprropriate for Geo-Replication" on Dec 9, 2024
poorbarcode
commented
Dec 9, 2024
...roker/src/main/java/org/apache/pulsar/broker/service/persistent/GeoPersistentReplicator.java
Outdated
poorbarcode
commented
Dec 9, 2024
pulsar-client/src/main/java/org/apache/pulsar/client/impl/GeoReplicationProducerImpl.java
poorbarcode
commented
Dec 12, 2024
pulsar-client/src/main/java/org/apache/pulsar/client/impl/GeoReplicationProducerImpl.java
Outdated
poorbarcode force-pushed the fix/repl_sequence_id branch 3 times, most recently from b9ae34a to 0d2e235 on December 17, 2024 11:56
poorbarcode changed the title from "[draft][fix][broker] Geo Replication fails due to Deduplication is not aprropriate for Geo-Replication" to "[fix][broker] Geo Replication fails due to Deduplication is not aprropriate for Geo-Replication" on Dec 17, 2024
poorbarcode changed the title from "[fix][broker] Geo Replication fails due to Deduplication is not aprropriate for Geo-Replication" to "[fix][broker] Geo Replication lost messages or frequently fails due to Deduplication is not aprropriate for Geo-Replication" on Dec 17, 2024
/pulsarbot rerun-failure-checks
poorbarcode changed the title from "[fix][broker] Geo Replication lost messages or frequently fails due to Deduplication is not aprropriate for Geo-Replication" to "[fix] [broker] Geo Replication lost messages or frequently fails due to Deduplication is not aprropriate for Geo-Replication" on Dec 18, 2024
poorbarcode force-pushed the fix/repl_sequence_id branch from 5a2f2c7 to cc44dc3 on December 20, 2024 04:37
@poorbarcode It's a good idea to just use the ledger ID and entry ID for message deduplication. In this case, we can also remove the deduplication state after the ledger gets fully replicated. For example:
BewareMyPower previously requested changes on Dec 24, 2024
lhotari added the release/3.0.10, release/4.0.3, and release/3.3.5 labels and removed the release/3.0.9, release/4.0.2, and release/3.3.4 labels on Jan 3, 2025
lhotari changed the title from "[fix] [broker] Geo Replication lost messages or frequently fails due to Deduplication is not aprropriate for Geo-Replication" to "[fix][broker] Geo Replication lost messages or frequently fails due to Deduplication is not aprropriate for Geo-Replication" on Jan 7, 2025
lhotari changed the title from "[fix][broker] Geo Replication lost messages or frequently fails due to Deduplication is not aprropriate for Geo-Replication" to "[fix][broker] Geo Replication lost messages or frequently fails due to Deduplication is not appropriate for Geo-Replication" on Jan 7, 2025
Labels
doc-not-needed
Your PR changes do not impact docs
ready-to-test
release/3.0.10
release/3.3.5
release/4.0.3
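Motivation
Background
How does deduplication work?
- The client keeps the messages that are waiting for a send response in {pendingMessages}.
- The broker responds with the position -1:-1 if the sequence ID published is lower than the previous messages.
- When a publish is rejected, the client checks whether the sequence ID of the next message in {pendingMessages} is larger than the one that was rejected.
  - {next} > {rejected}: ignore the error and continue working.
  - {next} < {rejected}: close channels and reconnect.
Conditions under which the issue happens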
- The replicator fills {pendingMessages} with message.sequenceId but ignores message.original-producer-name, which may cause the sequence IDs in {pendingMessages} to not be increasing.
- As a result, the handling of a -1:-1 send response will fail.
Issue-1: lost messages
- M1(seq: 0), M2(seq: 1)
- M3(seq: 0), M4(seq: 1)
- {pendingMessages}: [0,1]
- {pendingMessages}: [0,1,0,1]
- Send responses:
  - seq 0, position 0:0
  - seq 1, position 0:1
  - seq 0, position -1:-1
  - seq 1, position -1:-1
- {pendingMessages}: [empty]
- (… 0 now).
- [M1, M2, M1, M2]
- [M1, M2]
You can reproduce the issue with the test testDeduplicationNotLostMessage.
Issue-2: frequently fails
- 3:0 with sequence-id 10
- 3:1 with sequence-id 1
- 3:2 with sequence-id 2
- Replicator copies messages
  - {pendingMessages}: [10,1,2]
  - 3:0 is sent successfully; {pendingMessages}: [1,2]
  - 3:0 is sent again (a duplicated publishing) and gets -1:-1 (new position relates to the latest publishing) for the latest send-response.
  - failed sequence ID 10 > pendingMessages[0].sequenceId 1, so the client closes channels and reconnects.
No test for reproducing this issue.
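The Issue-2 walkthrough can be replayed with a small model of the client-side check. This is only a sketch under stated assumptions: `PendingQueueSketch`, `Action`, `send`, `ackHead`, and `onRejected` are hypothetical names invented for illustration and are not Pulsar's actual classes or methods. How the equal case is handled is also an assumption, since the description only covers `{next} > {rejected}` and `{next} < {rejected}`.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical model of the client-side check; not Pulsar's real producer code. */
class PendingQueueSketch {
    enum Action { IGNORE, RECONNECT }

    // Sequence IDs of messages still waiting for a send response, oldest first.
    private final Deque<Long> pendingMessages = new ArrayDeque<>();

    void send(long sequenceId) {
        pendingMessages.addLast(sequenceId);
    }

    // A successful send response completes the oldest pending message.
    void ackHead() {
        pendingMessages.removeFirst();
    }

    // Rule from the description: next > rejected -> ignore, next < rejected -> reconnect.
    // Assumption: an empty queue is treated as IGNORE and equality as RECONNECT.
    Action onRejected(long rejectedSequenceId) {
        Long next = pendingMessages.peekFirst();
        if (next == null || next > rejectedSequenceId) {
            return Action.IGNORE;
        }
        return Action.RECONNECT;
    }

    public static void main(String[] args) {
        PendingQueueSketch q = new PendingQueueSketch();
        q.send(10); // entry 3:0
        q.send(1);  // entry 3:1
        q.send(2);  // entry 3:2
        q.ackHead(); // 3:0 acknowledged, pending is now [1, 2]
        // The duplicated publishing of 3:0 is rejected with sequence ID 10.
        System.out.println(q.onRejected(10)); // RECONNECT
    }
}
```

Because the producers' sequence IDs are not ordered the way the entries are, the head of the queue (1) looks older than the rejected ID (10), which is what triggers the reconnect in this model.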
Modifications
Solution: replicators use a specified sequence ID (ledgerId:entryId of the original topic) instead of using the original producers' sequence IDs.
- 3:0 with sequence-id 10
- 3:1 with sequence-id 1
- 3:2 with sequence-id 2
- (… 3:0)
- {pendingMessages}: [3:0, 3:1, 3:2]
- 3:0 is sent successfully; {pendingMessages}: [3:1, 3:2]
- 3:0 is sent again (a duplicated publishing) and gets -1:-1 (new position relates to the latest publishing) for the latest send-response.
- failed sequence ID (3:0) < pendingMessages[0].sequenceId (3:2), so the client ignores the error and continues working.
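To illustrate why a ledgerId:entryId sequence ID avoids the spurious reconnect, here is a minimal sketch of the same check with positions compared first by ledger ID and then by entry ID. `PositionSequenceSketch`, `Position`, and `shouldIgnoreRejected` are hypothetical names for illustration only. They are not the types this PR adds, and the PR's actual comparison logic may differ.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical model of a position-based sequence ID; not the PR's real classes. */
class PositionSequenceSketch {

    /** ledgerId:entryId of an entry in the original topic, used as the sequence ID. */
    static final class Position implements Comparable<Position> {
        final long ledgerId;
        final long entryId;

        Position(long ledgerId, long entryId) {
            this.ledgerId = ledgerId;
            this.entryId = entryId;
        }

        @Override
        public int compareTo(Position other) {
            int byLedger = Long.compare(ledgerId, other.ledgerId);
            return byLedger != 0 ? byLedger : Long.compare(entryId, other.entryId);
        }
    }

    // Positions still waiting for a send response, oldest first.
    private final Deque<Position> pendingMessages = new ArrayDeque<>();

    void send(Position position) {
        pendingMessages.addLast(position);
    }

    void ackHead() {
        pendingMessages.removeFirst();
    }

    // True when the next pending position is past the rejected one (or nothing is pending),
    // i.e. the rejection refers to an entry that has already been handled.
    boolean shouldIgnoreRejected(Position rejected) {
        Position next = pendingMessages.peekFirst();
        return next == null || next.compareTo(rejected) > 0;
    }

    public static void main(String[] args) {
        PositionSequenceSketch s = new PositionSequenceSketch();
        s.send(new Position(3, 0));
        s.send(new Position(3, 1));
        s.send(new Position(3, 2));
        s.ackHead(); // 3:0 acknowledged
        // The duplicated publishing of 3:0 is rejected; every pending position is past 3:0.
        System.out.println(s.shouldIgnoreRejected(new Position(3, 0))); // true
    }
}
```

Since positions in the original topic only move forward as the replicator reads, the pending queue stays ordered in this model, and a rejection for an already acknowledged entry always compares lower than the head of the queue.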
Documentation
doc
doc-required
doc-not-needed
doc-complete
Matching PR in forked repository
PR in forked repository: x