[Bug] Topic partition backlog grows after bookies restart when using ExtensibleLoadManagerImpl #23908
Comments
Thanks for reporting, @szkoludasebastian. Would you also be able to share the topic stats? It would be preferable to have them for a supported version (4.0.2), since that's the version we are actively maintaining and there's also
It could be similar to Issue 23845 since I understood based on the above description that in your case you stop sending data and the backlog remains after that.
I will check that with |
@szkoludasebastian In |
@szkoludasebastian That means that the broker either hasn't received the acknowledgements or has lost them. It's very useful information if you can consistently reproduce the issue with 3.3.2 and not with 3.3.1. In that case, it should be possible to find the commit that introduced the regression. Since I don't have the reproducer, I cannot do this on your behalf. For finding the commit in v3.3.1...v3.3.2 (release notes), one possible solution would be interactive git-bisecting (or just manual bisecting): you first pick the commit in the middle of v3.3.1...v3.3.2, build a Pulsar binary and test it. If it fails, you know that the regression was introduced in that commit or an earlier one; if it passes, the regression was introduced in a later commit. You keep splitting the remaining commits until you find the commit that introduced the regression. In Pulsar, the docker images are built with the commands documented here: https://pulsar.apache.org/contribute/release-process/#release-pulsar-30-and-later (the prerequisite is https://pulsar.apache.org/contribute/release-process/#build-release-artifacts). Would you be able to perform this task so that we could find the commit that introduced the regression? |
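The bisecting procedure described above can be sketched in a few lines of Python. This is only an illustration of the search logic, not a replacement for `git bisect`: the commit ids and the `is_bad` check are hypothetical stand-ins for "build the images from this commit, deploy them and run the reproducer", and the sketch assumes that every commit after the regression also fails.

```python
from typing import Callable, Sequence


def first_bad_commit(commits: Sequence[str], is_bad: Callable[[str], bool]) -> str:
    """Return the first commit for which is_bad() is True.

    Assumes commits are ordered oldest to newest, the last one is known
    to be bad, and every commit after the regression is also bad.
    """
    lo, hi = 0, len(commits) - 1
    while lo < hi:
        mid = (lo + hi) // 2
        if is_bad(commits[mid]):
            hi = mid        # regression is at mid or earlier
        else:
            lo = mid + 1    # regression is after mid
    return commits[lo]


# Hypothetical commit ids; in practice the list could come from
# `git rev-list --reverse v3.3.1..v3.3.2`.
commits = [f"c{i}" for i in range(1, 9)]
tested = []


def is_bad(commit: str) -> bool:
    tested.append(commit)          # stands in for "build, deploy, run the reproducer"
    return int(commit[1:]) >= 6    # pretend that c6 introduced the regression


print(first_bad_commit(commits, is_bad), "found after testing", tested)
```

With 8 candidate commits only 3 builds are needed, which is why bisecting stays practical even when each build and test cycle is slow.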
Is the issue that you are facing similar to #22709? What's different this time? |
It turned out that it was a problem on our side. |
@szkoludasebastian Just confirming: do you happen to use Pulsar transactions? |
Partition-4 stats-internal: {
"entriesAddedCounter" : 285,
"numberOfEntries" : 409,
"totalSize" : 4481595,
"currentLedgerEntries" : 285,
"currentLedgerSize" : 3521965,
"lastLedgerCreatedTimestamp" : "2025-01-31T09:46:39.629Z",
"waitingCursorsCount" : 1,
"pendingAddEntriesCount" : 0,
"lastConfirmedEntry" : "2044129:284",
"state" : "LedgerOpened",
"ledgers" : [ {
"ledgerId" : 2043391,
"entries" : 124,
"size" : 959630,
"offloaded" : false,
"underReplicated" : false
}, {
"ledgerId" : 2044129,
"entries" : 0,
"size" : 0,
"offloaded" : false,
"underReplicated" : false
} ],
"cursors" : {
"microbatcher" : {
"markDeletePosition" : "2043391:112",
"readPosition" : "2044129:285",
"waitingReadOp" : true,
"pendingReadOps" : 0,
"messagesConsumedCounter" : 273,
"cursorLedger" : 2044305,
"cursorLedgerLastEntry" : 44,
"individuallyDeletedMessages" : "[(2043391:122..2043391:123],(2044129:-1..2044129:282]]",
"lastLedgerSwitchTimestamp" : "2025-01-31T09:46:39.675Z",
"state" : "Open",
"active" : true,
"numberOfEntriesSinceFirstNotAckedMessage" : 297,
"totalNonContiguousDeletedMessagesRange" : 2,
"subscriptionHavePendingRead" : true,
"subscriptionHavePendingReplayRead" : false,
"properties" : { }
}
},
"schemaLedgers" : [ ],
"compactedLedger" : {
"ledgerId" : -1,
"entries" : -1,
"size" : -1,
"offloaded" : false,
"underReplicated" : false
}
}

Topic partition-4 stats: {
"msgRateIn" : 0.016666664188055923,
"msgThroughputIn" : 16.849997494124537,
"msgRateOut" : 0.016666664151389267,
"msgThroughputOut" : 16.84999745705455,
"bytesInCounter" : 3522976,
"msgInCounter" : 832,
"systemTopicBytesInCounter" : 0,
"bytesOutCounter" : 3658127,
"msgOutCounter" : 863,
"bytesOutInternalCounter" : 0,
"averageMsgSize" : 1011.0,
"msgChunkPublished" : false,
"storageSize" : 4482606,
"backlogSize" : 3608212,
"backlogQuotaLimitSize" : -1,
"backlogQuotaLimitTime" : -1,
"oldestBacklogMessageAgeSeconds" : 956,
"oldestBacklogMessageSubscriptionName" : "microbatcher",
"publishRateLimitedTimes" : 0,
"earliestMsgPublishTimeInBacklogs" : 0,
"offloadedStorageSize" : 0,
"lastOffloadLedgerId" : 0,
"lastOffloadSuccessTimeStamp" : 0,
"lastOffloadFailureTimeStamp" : 0,
"ongoingTxnCount" : 0,
"abortedTxnCount" : 0,
"committedTxnCount" : 0,
"publishers" : [ {
"accessMode" : "Shared",
"msgRateIn" : 0.016666664188055923,
"msgThroughputIn" : 16.849997494124537,
"averageMsgSize" : 1011.0,
"chunkedMessageRate" : 0.0,
"producerId" : 5,
"supportsPartialProducer" : false,
"producerName" : "integ-pulsar-1162-18",
"address" : "/10.10.178.13:38552",
"connectedSince" : "2025-01-31T09:46:39.801991145Z",
"clientVersion" : "Pulsar-Java-v4.0.2",
"metadata" : { }
}, {
"accessMode" : "Shared",
"msgRateIn" : 0.0,
"msgThroughputIn" : 0.0,
"averageMsgSize" : 0.0,
"chunkedMessageRate" : 0.0,
"producerId" : 5,
"supportsPartialProducer" : false,
"producerName" : "integ-pulsar-1162-21",
"address" : "/10.10.178.13:38530",
"connectedSince" : "2025-01-31T09:46:39.802282863Z",
"clientVersion" : "Pulsar-Java-v4.0.2",
"metadata" : { }
} ],
"waitingPublishers" : 0,
"subscriptions" : {
"microbatcher" : {
"msgRateOut" : 0.016666664151389267,
"msgThroughputOut" : 16.84999745705455,
"bytesOutCounter" : 3658127,
"msgOutCounter" : 863,
"msgRateRedeliver" : 0.0,
"messageAckRate" : 0.049999992325001186,
"chunkedMessageRate" : 0.0,
"msgBacklog" : 10,
"backlogSize" : 3608212,
"earliestMsgPublishTimeInBacklog" : 0,
"msgBacklogNoDelayed" : 10,
"blockedSubscriptionOnUnackedMsgs" : false,
"msgDelayed" : 0,
"msgInReplay" : 0,
"unackedMessages" : 26,
"type" : "Key_Shared",
"msgRateExpired" : 0.0,
"totalMsgExpired" : 0,
"lastExpireTimestamp" : 0,
"lastConsumedFlowTimestamp" : 1738316912547,
"lastConsumedTimestamp" : 1738317704211,
"lastAckedTimestamp" : 1738317707396,
"lastMarkDeleteAdvancedTimestamp" : 0,
"consumers" : [ {
"msgRateOut" : 0.0,
"msgThroughputOut" : 0.0,
"bytesOutCounter" : 0,
"msgOutCounter" : 0,
"msgRateRedeliver" : 0.0,
"messageAckRate" : 0.0,
"chunkedMessageRate" : 0.0,
"consumerName" : "microbatcher-consumer-27b750cf-9118-4585-9136-34ecb6a90ffe",
"availablePermits" : 500,
"unackedMessages" : 0,
"avgMessagesPerEntry" : 0,
"blockedConsumerOnUnackedMsgs" : false,
"drainingHashesCount" : 0,
"drainingHashesClearedTotal" : 0,
"drainingHashesUnackedMessages" : 0,
"drainingHashes" : [ ],
"address" : "/10.10.179.78:52838",
"connectedSince" : "2025-01-31T09:46:39.803306204Z",
"clientVersion" : "Pulsar-Java-v4.0.2",
"lastAckedTimestamp" : 0,
"lastConsumedTimestamp" : 0,
"lastConsumedFlowTimestamp" : 1738316799814,
"keyHashRangeArrays" : [ [ 1515, 2038 ], [ 3849, 4299 ], [ 4464, 4577 ], [ 5188, 5234 ], [ 5449, 6708 ], [ 7236, 7239 ], [ 8129, 8361 ], [ 8914, 8926 ], [ 9655, 9829 ], [ 9959, 10040 ], [ 10788, 11077 ], [ 11782, 11963 ], [ 13985, 14384 ], [ 15106, 15200 ], [ 15782, 16093 ], [ 16122, 16315 ], [ 16335, 16759 ], [ 17059, 17076 ], [ 18068, 18300 ], [ 18488, 18578 ], [ 19599, 19681 ], [ 20131, 20189 ], [ 20514, 21012 ], [ 21891, 22294 ], [ 22450, 22468 ], [ 22776, 22891 ], [ 23152, 23237 ], [ 24151, 24397 ], [ 24642, 25135 ], [ 25570, 25656 ], [ 26289, 26293 ], [ 26423, 26843 ], [ 27974, 28209 ], [ 28860, 29208 ], [ 29779, 29816 ], [ 30007, 30317 ], [ 31403, 31610 ], [ 32887, 33005 ], [ 33401, 33717 ], [ 33891, 34068 ], [ 34205, 34262 ], [ 35101, 35158 ], [ 36566, 37028 ], [ 37881, 38931 ], [ 39256, 39782 ], [ 40111, 40142 ], [ 40267, 40604 ], [ 40715, 41186 ], [ 41403, 41667 ], [ 43168, 43394 ], [ 43405, 44313 ], [ 44795, 44960 ], [ 46331, 46365 ], [ 46706, 47175 ], [ 47590, 48125 ], [ 48549, 48675 ], [ 49501, 49517 ], [ 49831, 50064 ], [ 50422, 50460 ], [ 52018, 52338 ], [ 52529, 52991 ], [ 53814, 53849 ], [ 54740, 54763 ], [ 55544, 55616 ], [ 55732, 56176 ], [ 56501, 56621 ], [ 56727, 56815 ], [ 57158, 57334 ], [ 57860, 58229 ], [ 58242, 58476 ], [ 59275, 59361 ], [ 59970, 60019 ], [ 61151, 61210 ], [ 62030, 62291 ], [ 62590, 62610 ], [ 63450, 63735 ], [ 64945, 65168 ] ],
"metadata" : { },
"lastAckedTime" : "1970-01-01T00:00:00Z",
"lastConsumedTime" : "1970-01-01T00:00:00Z"
}, {
"msgRateOut" : 0.016666664151389267,
"msgThroughputOut" : 16.84999745705455,
"bytesOutCounter" : 3649823,
"msgOutCounter" : 851,
"msgRateRedeliver" : 0.0,
"messageAckRate" : 0.016666664146667048,
"chunkedMessageRate" : 0.0,
"consumerName" : "microbatcher-consumer-e02c3d33-2ff6-4fcb-9fdc-3e8f3773ae36",
"availablePermits" : 399,
"unackedMessages" : 26,
"avgMessagesPerEntry" : 2,
"blockedConsumerOnUnackedMsgs" : false,
"drainingHashesCount" : 0,
"drainingHashesClearedTotal" : 0,
"drainingHashesUnackedMessages" : 0,
"drainingHashes" : [ ],
"address" : "/10.10.179.105:36012",
"connectedSince" : "2025-01-31T09:46:39.803747416Z",
"clientVersion" : "Pulsar-Java-v4.0.2",
"lastAckedTimestamp" : 1738317707396,
"lastConsumedTimestamp" : 1738317704211,
"lastConsumedFlowTimestamp" : 1738316912547,
"keyHashRangeArrays" : [ [ 406, 441 ], [ 661, 1185 ], [ 3706, 3848 ], [ 4300, 4463 ], [ 4578, 5029 ], [ 6709, 6722 ], [ 6833, 6877 ], [ 7301, 7326 ], [ 7506, 7595 ], [ 9503, 9572 ], [ 10041, 10427 ], [ 10536, 10583 ], [ 10605, 10787 ], [ 11078, 11101 ], [ 11115, 11166 ], [ 11448, 11781 ], [ 12319, 13683 ], [ 14385, 14534 ], [ 14706, 15105 ], [ 15201, 15651 ], [ 15659, 15781 ], [ 17077, 17309 ], [ 17568, 18019 ], [ 18805, 19188 ], [ 19256, 19353 ], [ 19484, 19598 ], [ 20072, 20078 ], [ 20190, 20213 ], [ 21106, 21458 ], [ 21676, 21731 ], [ 21753, 21890 ], [ 23238, 24150 ], [ 25136, 25343 ], [ 26294, 26422 ], [ 27172, 27880 ], [ 27944, 27973 ], [ 28653, 28859 ], [ 30373, 30610 ], [ 31110, 31402 ], [ 32104, 32194 ], [ 32350, 32436 ], [ 33006, 33097 ], [ 35454, 35941 ], [ 36289, 36430 ], [ 37029, 37034 ], [ 37596, 37679 ], [ 37806, 37880 ], [ 38946, 39181 ], [ 39783, 40110 ], [ 40605, 40714 ], [ 41187, 41232 ], [ 41314, 41402 ], [ 42094, 42417 ], [ 42549, 42795 ], [ 44462, 44609 ], [ 46240, 46322 ], [ 46366, 46422 ], [ 46581, 46705 ], [ 48893, 49220 ], [ 49518, 49601 ], [ 49621, 49830 ], [ 50749, 51596 ], [ 52339, 52528 ], [ 53770, 53813 ], [ 53850, 54079 ], [ 54222, 54320 ], [ 54764, 55360 ], [ 56177, 56219 ], [ 56622, 56650 ], [ 56867, 56896 ], [ 58230, 58241 ], [ 58491, 58830 ], [ 60644, 60926 ], [ 61211, 61299 ], [ 62611, 62962 ], [ 63344, 63449 ], [ 64566, 64944 ], [ 65169, 65177 ] ],
"metadata" : { },
"lastAckedTime" : "2025-01-31T10:01:47.396Z",
"lastConsumedTime" : "2025-01-31T10:01:44.211Z"
}, {
"msgRateOut" : 0.0,
"msgThroughputOut" : 0.0,
"bytesOutCounter" : 0,
"msgOutCounter" : 0,
"msgRateRedeliver" : 0.0,
"messageAckRate" : 0.0,
"chunkedMessageRate" : 0.0,
"consumerName" : "microbatcher-consumer-4ddf9389-86d0-42a0-9b99-eac0c9374d16",
"availablePermits" : 500,
"unackedMessages" : 0,
"avgMessagesPerEntry" : 0,
"blockedConsumerOnUnackedMsgs" : false,
"drainingHashesCount" : 0,
"drainingHashesClearedTotal" : 0,
"drainingHashesUnackedMessages" : 0,
"drainingHashes" : [ ],
"address" : "/10.10.178.13:41010",
"connectedSince" : "2025-01-31T09:46:39.803606989Z",
"clientVersion" : "Pulsar-Java-v4.0.2",
"lastAckedTimestamp" : 0,
"lastConsumedTimestamp" : 0,
"lastConsumedFlowTimestamp" : 1738316799808,
"keyHashRangeArrays" : [ [ 1, 376 ], [ 501, 525 ], [ 1186, 1502 ], [ 2039, 2287 ], [ 2524, 2794 ], [ 3588, 3705 ], [ 6723, 6832 ], [ 6878, 7235 ], [ 7596, 8128 ], [ 8446, 8913 ], [ 8927, 9502 ], [ 9647, 9654 ], [ 9830, 9958 ], [ 10428, 10535 ], [ 10584, 10604 ], [ 11102, 11114 ], [ 15652, 15658 ], [ 16094, 16121 ], [ 17310, 17567 ], [ 18301, 18487 ], [ 20214, 20513 ], [ 22357, 22449 ], [ 22469, 22775 ], [ 22892, 22961 ], [ 24398, 24537 ], [ 25968, 26288 ], [ 26969, 27171 ], [ 27881, 27943 ], [ 28210, 28400 ], [ 29461, 29778 ], [ 29817, 30006 ], [ 30356, 30372 ], [ 30611, 30811 ], [ 31611, 32103 ], [ 32437, 32886 ], [ 33718, 33841 ], [ 34069, 34185 ], [ 34263, 35100 ], [ 35159, 35453 ], [ 37203, 37595 ], [ 37680, 37805 ], [ 38932, 38945 ], [ 39182, 39255 ], [ 40143, 40157 ], [ 41668, 42093 ], [ 42488, 42548 ], [ 44375, 44445 ], [ 44616, 44781 ], [ 44961, 45485 ], [ 45867, 45916 ], [ 46423, 46580 ], [ 47176, 47442 ], [ 50065, 50071 ], [ 50461, 50748 ], [ 51619, 51944 ], [ 52003, 52017 ], [ 54321, 54609 ], [ 55361, 55543 ], [ 55617, 55731 ], [ 56220, 56500 ], [ 56651, 56726 ], [ 56897, 57002 ], [ 57343, 57388 ], [ 59362, 59672 ], [ 60335, 60643 ], [ 60927, 61068 ], [ 61399, 61454 ], [ 61698, 62029 ], [ 62292, 62384 ], [ 62535, 62539 ], [ 62963, 63028 ], [ 63736, 63765 ], [ 64276, 64565 ], [ 65205, 65535 ] ],
"metadata" : { },
"lastAckedTime" : "1970-01-01T00:00:00Z",
"lastConsumedTime" : "1970-01-01T00:00:00Z"
}, {
"msgRateOut" : 0.0,
"msgThroughputOut" : 0.0,
"bytesOutCounter" : 8304,
"msgOutCounter" : 12,
"msgRateRedeliver" : 0.0,
"messageAckRate" : 0.033333328178334135,
"chunkedMessageRate" : 0.0,
"consumerName" : "microbatcher-consumer-7e5d4f42-6697-446e-bbb8-6eeb3349269c",
"availablePermits" : 488,
"unackedMessages" : 0,
"avgMessagesPerEntry" : 1,
"blockedConsumerOnUnackedMsgs" : false,
"drainingHashesCount" : 0,
"drainingHashesClearedTotal" : 0,
"drainingHashesUnackedMessages" : 0,
"drainingHashes" : [ ],
"address" : "/10.10.179.78:36344",
"connectedSince" : "2025-01-31T09:46:39.803620032Z",
"clientVersion" : "Pulsar-Java-v4.0.2",
"lastAckedTimestamp" : 1738317673406,
"lastConsumedTimestamp" : 1738317657199,
"lastConsumedFlowTimestamp" : 1738316799808,
"keyHashRangeArrays" : [ [ 377, 405 ], [ 442, 500 ], [ 526, 660 ], [ 1503, 1514 ], [ 2288, 2523 ], [ 2795, 3587 ], [ 5030, 5187 ], [ 5235, 5448 ], [ 7240, 7300 ], [ 7327, 7505 ], [ 8362, 8445 ], [ 9573, 9646 ], [ 11167, 11447 ], [ 11964, 12318 ], [ 13684, 13984 ], [ 14535, 14705 ], [ 16316, 16334 ], [ 16760, 17058 ], [ 18020, 18067 ], [ 18579, 18804 ], [ 19189, 19255 ], [ 19354, 19483 ], [ 19682, 20071 ], [ 20079, 20130 ], [ 21013, 21105 ], [ 21459, 21675 ], [ 21732, 21752 ], [ 22295, 22356 ], [ 22962, 23151 ], [ 24538, 24641 ], [ 25344, 25569 ], [ 25657, 25967 ], [ 26844, 26968 ], [ 28401, 28652 ], [ 29209, 29460 ], [ 30318, 30355 ], [ 30812, 31109 ], [ 32195, 32349 ], [ 33098, 33400 ], [ 33842, 33890 ], [ 34186, 34204 ], [ 35942, 36288 ], [ 36431, 36565 ], [ 37035, 37202 ], [ 40158, 40266 ], [ 41233, 41313 ], [ 42418, 42487 ], [ 42796, 43167 ], [ 43395, 43404 ], [ 44314, 44374 ], [ 44446, 44461 ], [ 44610, 44615 ], [ 44782, 44794 ], [ 45486, 45866 ], [ 45917, 46239 ], [ 46323, 46330 ], [ 47443, 47589 ], [ 48126, 48548 ], [ 48676, 48892 ], [ 49221, 49500 ], [ 49602, 49620 ], [ 50072, 50421 ], [ 51597, 51618 ], [ 51945, 52002 ], [ 52992, 53769 ], [ 54080, 54221 ], [ 54610, 54739 ], [ 56816, 56866 ], [ 57003, 57157 ], [ 57335, 57342 ], [ 57389, 57859 ], [ 58477, 58490 ], [ 58831, 59274 ], [ 59673, 59969 ], [ 60020, 60334 ], [ 61069, 61150 ], [ 61300, 61398 ], [ 61455, 61697 ], [ 62385, 62534 ], [ 62540, 62589 ], [ 63029, 63343 ], [ 63766, 64275 ], [ 65178, 65204 ] ],
"metadata" : { },
"lastAckedTime" : "2025-01-31T10:01:13.406Z",
"lastConsumedTime" : "2025-01-31T10:00:57.199Z"
} ],
"isDurable" : true,
"isReplicated" : false,
"allowOutOfOrderDelivery" : false,
"keySharedMode" : "AUTO_SPLIT",
"consumersAfterMarkDeletePosition" : { },
"drainingHashesCount" : 0,
"drainingHashesClearedTotal" : 0,
"drainingHashesUnackedMessages" : 0,
"nonContiguousDeletedMessagesRanges" : 2,
"nonContiguousDeletedMessagesRangesSerializedSize" : 70,
"delayedMessageIndexSizeInBytes" : 0,
"subscriptionProperties" : { },
"filterProcessedMsgCount" : 0,
"filterAcceptedMsgCount" : 0,
"filterRejectedMsgCount" : 0,
"filterRescheduledMsgCount" : 0,
"durable" : true,
"replicated" : false
}
},
"replication" : { },
"deduplicationStatus" : "Disabled",
"nonContiguousDeletedMessagesRanges" : 2,
"nonContiguousDeletedMessagesRangesSerializedSize" : 70,
"delayedMessageIndexSizeInBytes" : 0,
"compaction" : {
"lastCompactionRemovedEventCount" : 0,
"lastCompactionSucceedTimestamp" : 0,
"lastCompactionFailedTimestamp" : 0,
"lastCompactionDurationTimeInMills" : 0
},
"ownerBroker" : "integ-pulsar-broker-5.integ-pulsar-broker.str-integ.svc.cluster.local:8080"
}

Topic partitioned-stats: {
"msgRateIn" : 2.0333408764084946,
"msgThroughputIn" : 1704.8063034094396,
"msgRateOut" : 2.033340898884361,
"msgThroughputOut" : 1704.8063223167255,
"bytesInCounter" : 330665095,
"msgInCounter" : 78321,
"systemTopicBytesInCounter" : 0,
"bytesOutCounter" : 338307079,
"msgOutCounter" : 80097,
"bytesOutInternalCounter" : 0,
"averageMsgSize" : 612.0249999878398,
"msgChunkPublished" : false,
"storageSize" : 339672957,
"backlogSize" : 55819242,
"backlogQuotaLimitSize" : -1,
"backlogQuotaLimitTime" : -1,
"oldestBacklogMessageAgeSeconds" : 1202,
"oldestBacklogMessageSubscriptionName" : "microbatcher",
"publishRateLimitedTimes" : 0,
"earliestMsgPublishTimeInBacklogs" : 0,
"offloadedStorageSize" : 0,
"lastOffloadLedgerId" : 0,
"lastOffloadSuccessTimeStamp" : 0,
"lastOffloadFailureTimeStamp" : 0,
"ongoingTxnCount" : 0,
"abortedTxnCount" : 0,
"committedTxnCount" : 0,
"publishers" : [ {
"msgRateIn" : 1.0333367134601146,
"msgThroughputIn" : 874.5694844573518,
"averageMsgSize" : 420.425,
"chunkedMessageRate" : 0.0,
"producerId" : 0,
"supportsPartialProducer" : false
}, {
"msgRateIn" : 1.000004162948381,
"msgThroughputIn" : 830.2368189520878,
"averageMsgSize" : 387.9766666666666,
"chunkedMessageRate" : 0.0,
"producerId" : 0,
"supportsPartialProducer" : false
} ],
"waitingPublishers" : 0,
"subscriptions" : {
"microbatcher" : {
"msgRateOut" : 2.033340898884361,
"msgThroughputOut" : 1704.8063223167255,
"bytesOutCounter" : 338307079,
"msgOutCounter" : 80097,
"msgRateRedeliver" : 0.0,
"messageAckRate" : 2.050007899466562,
"chunkedMessageRate" : 0.0,
"msgBacklog" : 131,
"backlogSize" : 55819242,
"earliestMsgPublishTimeInBacklog" : 0,
"msgBacklogNoDelayed" : 131,
"blockedSubscriptionOnUnackedMsgs" : false,
"msgDelayed" : 0,
"msgInReplay" : 0,
"unackedMessages" : 214,
"type" : "Key_Shared",
"msgRateExpired" : 0.0,
"totalMsgExpired" : 0,
"lastExpireTimestamp" : 0,
"lastConsumedFlowTimestamp" : 0,
"lastConsumedTimestamp" : 0,
"lastAckedTimestamp" : 0,
"lastMarkDeleteAdvancedTimestamp" : 0,
"consumers" : [ {
"msgRateOut" : 0.600002054159385,
"msgThroughputOut" : 473.685136454395,
"bytesOutCounter" : 64861170,
"msgOutCounter" : 15575,
"msgRateRedeliver" : 0.0,
"messageAckRate" : 0.6166689861685504,
"chunkedMessageRate" : 0.0,
"availablePermits" : 46675,
"unackedMessages" : 95,
"avgMessagesPerEntry" : 0,
"blockedConsumerOnUnackedMsgs" : false,
"drainingHashesCount" : 0,
"drainingHashesClearedTotal" : 51,
"drainingHashesUnackedMessages" : 0,
"drainingHashes" : [ ],
"lastAckedTimestamp" : 0,
"lastConsumedTimestamp" : 0,
"lastConsumedFlowTimestamp" : 0,
"keyHashRangeArrays" : [ [ 1, 376 ], [ 501, 525 ], [ 1186, 1502 ], [ 2039, 2287 ], [ 2524, 2794 ], [ 3588, 3705 ], [ 6723, 6832 ], [ 6878, 7235 ], [ 7596, 8128 ], [ 8446, 8913 ], [ 8927, 9502 ], [ 9647, 9654 ], [ 9830, 9958 ], [ 10428, 10535 ], [ 10584, 10604 ], [ 11102, 11114 ], [ 15652, 15658 ], [ 16094, 16121 ], [ 17310, 17567 ], [ 18301, 18487 ], [ 20214, 20513 ], [ 22357, 22449 ], [ 22469, 22775 ], [ 22892, 22961 ], [ 24398, 24537 ], [ 25968, 26288 ], [ 26969, 27171 ], [ 27881, 27943 ], [ 28210, 28400 ], [ 29461, 29778 ], [ 29817, 30006 ], [ 30356, 30372 ], [ 30611, 30811 ], [ 31611, 32103 ], [ 32437, 32886 ], [ 33718, 33841 ], [ 34069, 34185 ], [ 34263, 35100 ], [ 35159, 35453 ], [ 37203, 37595 ], [ 37680, 37805 ], [ 38932, 38945 ], [ 39182, 39255 ], [ 40143, 40157 ], [ 41668, 42093 ], [ 42488, 42548 ], [ 44375, 44445 ], [ 44616, 44781 ], [ 44961, 45485 ], [ 45867, 45916 ], [ 46423, 46580 ], [ 47176, 47442 ], [ 50065, 50071 ], [ 50461, 50748 ], [ 51619, 51944 ], [ 52003, 52017 ], [ 54321, 54609 ], [ 55361, 55543 ], [ 55617, 55731 ], [ 56220, 56500 ], [ 56651, 56726 ], [ 56897, 57002 ], [ 57343, 57388 ], [ 59362, 59672 ], [ 60335, 60643 ], [ 60927, 61068 ], [ 61399, 61454 ], [ 61698, 62029 ], [ 62292, 62384 ], [ 62535, 62539 ], [ 62963, 63028 ], [ 63736, 63765 ], [ 64276, 64565 ], [ 65205, 65535 ] ],
"lastAckedTime" : "1970-01-01T00:00:00Z",
"lastConsumedTime" : "1970-01-01T00:00:00Z"
}, {
"msgRateOut" : 0.5166690860734111,
"msgThroughputOut" : 405.385061886289,
"bytesOutCounter" : 80591750,
"msgOutCounter" : 19173,
"msgRateRedeliver" : 0.0,
"messageAckRate" : 0.5500023426171868,
"chunkedMessageRate" : 0.0,
"availablePermits" : 46577,
"unackedMessages" : 61,
"avgMessagesPerEntry" : 0,
"blockedConsumerOnUnackedMsgs" : false,
"drainingHashesCount" : 0,
"drainingHashesClearedTotal" : 11,
"drainingHashesUnackedMessages" : 0,
"drainingHashes" : [ ],
"lastAckedTimestamp" : 0,
"lastConsumedTimestamp" : 0,
"lastConsumedFlowTimestamp" : 0,
"keyHashRangeArrays" : [ [ 377, 405 ], [ 442, 500 ], [ 526, 660 ], [ 1503, 1514 ], [ 2288, 2523 ], [ 2795, 3587 ], [ 5030, 5187 ], [ 5235, 5448 ], [ 7240, 7300 ], [ 7327, 7505 ], [ 8362, 8445 ], [ 9573, 9646 ], [ 11167, 11447 ], [ 11964, 12318 ], [ 13684, 13984 ], [ 14535, 14705 ], [ 16316, 16334 ], [ 16760, 17058 ], [ 18020, 18067 ], [ 18579, 18804 ], [ 19189, 19255 ], [ 19354, 19483 ], [ 19682, 20071 ], [ 20079, 20130 ], [ 21013, 21105 ], [ 21459, 21675 ], [ 21732, 21752 ], [ 22295, 22356 ], [ 22962, 23151 ], [ 24538, 24641 ], [ 25344, 25569 ], [ 25657, 25967 ], [ 26844, 26968 ], [ 28401, 28652 ], [ 29209, 29460 ], [ 30318, 30355 ], [ 30812, 31109 ], [ 32195, 32349 ], [ 33098, 33400 ], [ 33842, 33890 ], [ 34186, 34204 ], [ 35942, 36288 ], [ 36431, 36565 ], [ 37035, 37202 ], [ 40158, 40266 ], [ 41233, 41313 ], [ 42418, 42487 ], [ 42796, 43167 ], [ 43395, 43404 ], [ 44314, 44374 ], [ 44446, 44461 ], [ 44610, 44615 ], [ 44782, 44794 ], [ 45486, 45866 ], [ 45917, 46239 ], [ 46323, 46330 ], [ 47443, 47589 ], [ 48126, 48548 ], [ 48676, 48892 ], [ 49221, 49500 ], [ 49602, 49620 ], [ 50072, 50421 ], [ 51597, 51618 ], [ 51945, 52002 ], [ 52992, 53769 ], [ 54080, 54221 ], [ 54610, 54739 ], [ 56816, 56866 ], [ 57003, 57157 ], [ 57335, 57342 ], [ 57389, 57859 ], [ 58477, 58490 ], [ 58831, 59274 ], [ 59673, 59969 ], [ 60020, 60334 ], [ 61069, 61150 ], [ 61300, 61398 ], [ 61455, 61697 ], [ 62385, 62534 ], [ 62540, 62589 ], [ 63029, 63343 ], [ 63766, 64275 ], [ 65178, 65204 ] ],
"lastAckedTime" : "1970-01-01T00:00:00Z",
"lastConsumedTime" : "1970-01-01T00:00:00Z"
}, {
"msgRateOut" : 0.48333491160967085,
"msgThroughputOut" : 424.8514676772335,
"bytesOutCounter" : 78352903,
"msgOutCounter" : 18496,
"msgRateRedeliver" : 0.0,
"messageAckRate" : 0.5166687138351619,
"chunkedMessageRate" : 0.0,
"availablePermits" : 46754,
"unackedMessages" : 10,
"avgMessagesPerEntry" : 0,
"blockedConsumerOnUnackedMsgs" : false,
"drainingHashesCount" : 0,
"drainingHashesClearedTotal" : 3,
"drainingHashesUnackedMessages" : 0,
"drainingHashes" : [ ],
"lastAckedTimestamp" : 0,
"lastConsumedTimestamp" : 0,
"lastConsumedFlowTimestamp" : 0,
"keyHashRangeArrays" : [ [ 1515, 2038 ], [ 3849, 4299 ], [ 4464, 4577 ], [ 5188, 5234 ], [ 5449, 6708 ], [ 7236, 7239 ], [ 8129, 8361 ], [ 8914, 8926 ], [ 9655, 9829 ], [ 9959, 10040 ], [ 10788, 11077 ], [ 11782, 11963 ], [ 13985, 14384 ], [ 15106, 15200 ], [ 15782, 16093 ], [ 16122, 16315 ], [ 16335, 16759 ], [ 17059, 17076 ], [ 18068, 18300 ], [ 18488, 18578 ], [ 19599, 19681 ], [ 20131, 20189 ], [ 20514, 21012 ], [ 21891, 22294 ], [ 22450, 22468 ], [ 22776, 22891 ], [ 23152, 23237 ], [ 24151, 24397 ], [ 24642, 25135 ], [ 25570, 25656 ], [ 26289, 26293 ], [ 26423, 26843 ], [ 27974, 28209 ], [ 28860, 29208 ], [ 29779, 29816 ], [ 30007, 30317 ], [ 31403, 31610 ], [ 32887, 33005 ], [ 33401, 33717 ], [ 33891, 34068 ], [ 34205, 34262 ], [ 35101, 35158 ], [ 36566, 37028 ], [ 37881, 38931 ], [ 39256, 39782 ], [ 40111, 40142 ], [ 40267, 40604 ], [ 40715, 41186 ], [ 41403, 41667 ], [ 43168, 43394 ], [ 43405, 44313 ], [ 44795, 44960 ], [ 46331, 46365 ], [ 46706, 47175 ], [ 47590, 48125 ], [ 48549, 48675 ], [ 49501, 49517 ], [ 49831, 50064 ], [ 50422, 50460 ], [ 52018, 52338 ], [ 52529, 52991 ], [ 53814, 53849 ], [ 54740, 54763 ], [ 55544, 55616 ], [ 55732, 56176 ], [ 56501, 56621 ], [ 56727, 56815 ], [ 57158, 57334 ], [ 57860, 58229 ], [ 58242, 58476 ], [ 59275, 59361 ], [ 59970, 60019 ], [ 61151, 61210 ], [ 62030, 62291 ], [ 62590, 62610 ], [ 63450, 63735 ], [ 64945, 65168 ] ],
"lastAckedTime" : "1970-01-01T00:00:00Z",
"lastConsumedTime" : "1970-01-01T00:00:00Z"
}, {
"msgRateOut" : 0.43333484704189446,
"msgThroughputOut" : 400.884656298808,
"bytesOutCounter" : 114501256,
"msgOutCounter" : 26853,
"msgRateRedeliver" : 0.0,
"messageAckRate" : 0.36666785684566255,
"chunkedMessageRate" : 0.0,
"availablePermits" : 46647,
"unackedMessages" : 48,
"avgMessagesPerEntry" : 0,
"blockedConsumerOnUnackedMsgs" : false,
"drainingHashesCount" : 0,
"drainingHashesClearedTotal" : 0,
"drainingHashesUnackedMessages" : 0,
"drainingHashes" : [ ],
"lastAckedTimestamp" : 0,
"lastConsumedTimestamp" : 0,
"lastConsumedFlowTimestamp" : 0,
"keyHashRangeArrays" : [ [ 406, 441 ], [ 661, 1185 ], [ 3706, 3848 ], [ 4300, 4463 ], [ 4578, 5029 ], [ 6709, 6722 ], [ 6833, 6877 ], [ 7301, 7326 ], [ 7506, 7595 ], [ 9503, 9572 ], [ 10041, 10427 ], [ 10536, 10583 ], [ 10605, 10787 ], [ 11078, 11101 ], [ 11115, 11166 ], [ 11448, 11781 ], [ 12319, 13683 ], [ 14385, 14534 ], [ 14706, 15105 ], [ 15201, 15651 ], [ 15659, 15781 ], [ 17077, 17309 ], [ 17568, 18019 ], [ 18805, 19188 ], [ 19256, 19353 ], [ 19484, 19598 ], [ 20072, 20078 ], [ 20190, 20213 ], [ 21106, 21458 ], [ 21676, 21731 ], [ 21753, 21890 ], [ 23238, 24150 ], [ 25136, 25343 ], [ 26294, 26422 ], [ 27172, 27880 ], [ 27944, 27973 ], [ 28653, 28859 ], [ 30373, 30610 ], [ 31110, 31402 ], [ 32104, 32194 ], [ 32350, 32436 ], [ 33006, 33097 ], [ 35454, 35941 ], [ 36289, 36430 ], [ 37029, 37034 ], [ 37596, 37679 ], [ 37806, 37880 ], [ 38946, 39181 ], [ 39783, 40110 ], [ 40605, 40714 ], [ 41187, 41232 ], [ 41314, 41402 ], [ 42094, 42417 ], [ 42549, 42795 ], [ 44462, 44609 ], [ 46240, 46322 ], [ 46366, 46422 ], [ 46581, 46705 ], [ 48893, 49220 ], [ 49518, 49601 ], [ 49621, 49830 ], [ 50749, 51596 ], [ 52339, 52528 ], [ 53770, 53813 ], [ 53850, 54079 ], [ 54222, 54320 ], [ 54764, 55360 ], [ 56177, 56219 ], [ 56622, 56650 ], [ 56867, 56896 ], [ 58230, 58241 ], [ 58491, 58830 ], [ 60644, 60926 ], [ 61211, 61299 ], [ 62611, 62962 ], [ 63344, 63449 ], [ 64566, 64944 ], [ 65169, 65177 ] ],
"lastAckedTime" : "1970-01-01T00:00:00Z",
"lastConsumedTime" : "1970-01-01T00:00:00Z"
} ],
"isDurable" : true,
"isReplicated" : false,
"allowOutOfOrderDelivery" : false,
"consumersAfterMarkDeletePosition" : { },
"drainingHashesCount" : 0,
"drainingHashesClearedTotal" : 65,
"drainingHashesUnackedMessages" : 0,
"nonContiguousDeletedMessagesRanges" : 29,
"nonContiguousDeletedMessagesRangesSerializedSize" : 1001,
"delayedMessageIndexSizeInBytes" : 0,
"subscriptionProperties" : { },
"filterProcessedMsgCount" : 0,
"filterAcceptedMsgCount" : 0,
"filterRejectedMsgCount" : 0,
"filterRescheduledMsgCount" : 0,
"replicated" : false,
"durable" : true
}
},
"replication" : { },
"nonContiguousDeletedMessagesRanges" : 29,
"nonContiguousDeletedMessagesRangesSerializedSize" : 1001,
"delayedMessageIndexSizeInBytes" : 0,
"compaction" : {
"lastCompactionRemovedEventCount" : 0,
"lastCompactionSucceedTimestamp" : 0,
"lastCompactionFailedTimestamp" : 0,
"lastCompactionDurationTimeInMills" : 0
},
"metadata" : {
"partitions" : 100,
"deleted" : false
},
"partitions" : { }
} |
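The stats-internal output above can also be read mechanically. The following is a minimal sketch, assuming the `individuallyDeletedMessages` string uses ranges of the form `(ledger:entry..ledger:entry]` with an exclusive lower bound and an inclusive upper bound; the helper names are made up for this example. It reports the unacknowledged entries that sit directly after `markDeletePosition` and keep it from advancing.

```python
import re
from typing import List, Optional, Tuple

Position = Tuple[int, int]  # (ledgerId, entryId)

# Matches one "(ledger:entry..ledger:entry]" range.
_RANGE = re.compile(r"\((\d+):(-?\d+)\.\.(\d+):(-?\d+)\]")


def parse_position(text: str) -> Position:
    ledger, entry = text.split(":")
    return int(ledger), int(entry)


def parse_ranges(text: str) -> List[Tuple[Position, Position]]:
    return [((int(a), int(b)), (int(c), int(d))) for a, b, c, d in _RANGE.findall(text)]


def blocking_hole(mark_delete: str, individually_deleted: str) -> Optional[Tuple[int, int, int]]:
    """Return (ledgerId, firstEntry, lastEntry) of the unacked entries between
    markDeletePosition and the first individually acked range, or None when
    there is no such hole within the same ledger."""
    md_ledger, md_entry = parse_position(mark_delete)
    ranges = parse_ranges(individually_deleted)
    if not ranges:
        return None
    (lo_ledger, lo_entry), _ = ranges[0]
    if lo_ledger != md_ledger or lo_entry <= md_entry:
        return None
    # The lower bound is exclusive, so entry lo_entry itself is still unacked.
    return md_ledger, md_entry + 1, lo_entry


# Values copied from the "microbatcher" cursor of partition-4 above.
hole = blocking_hole(
    "2043391:112",
    "[(2043391:122..2043391:123],(2044129:-1..2044129:282]]",
)
print(hole)
```

For the partition-4 values this points at entries 113 to 122 of ledger 2043391, i.e. the entries whose acknowledgements the broker does not have.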
Checked on broker: |
That's fine. No transactions involved. The application code would explicitly need to use transactions if it was used. |
In the v3.3.1...v3.3.2 diff, #23072 looks like the largest change related to acknowledgements. If you would be able to run the experiments mentioned in #23908 (comment), that would confirm many things. Building a fork off v3.3.2 with #23072 commit 1996c86 reverted would be one way to check if it's impacting your use case. |
I configured that, but I am still able to reproduce the issue. |
I tried that, but I have a problem building the pulsar-all image for the linux/amd64 architecture. For linux/arm64 it works correctly, but I need to build for linux/amd64. |
What problem are you facing? The way Pulsar docker images are built during the release is documented at https://pulsar.apache.org/contribute/release-process/#release-pulsar-30-and-later . The prerequisite is to build artifacts before that. |
Unfortunately those instructions aren't up-to-date. Please check #23908 (comment) for the commands used while releasing. |
I still get the same issue when I use the commands from this link. |
It seems that you might be using the wrong commands. In the error message, you shouldn't see
|
Did you use these commands to build a multi-platform image?

mvn clean install -DskipTests
DOCKER_USER=my_docker_user
mvn install -pl docker/pulsar,docker/pulsar-all \
    -DskipTests \
    -Pmain,docker,docker-push \
    -Ddocker.platforms=linux/amd64,linux/arm64 \
    -Ddocker.organization=$DOCKER_USER \
    -Ddocker.noCache=true

There are several gotchas in building and handling multi-platform images. I'm not exactly sure, but I think that unless you build a multi-platform image in one shot and push it directly to a docker repository from the build, you might not be able to push it separately with

You might not need a multi-platform image in your case, but if you'd like to build a similar image as the Pulsar images, it's better to stick to the same set of commands. Another detail is that tagging multi-platform images contains gotchas. In Pulsar, we use the |
I was finally able to build the pulsar-all docker image. I reverted the commits you mentioned in the above comment, but I was still able to reproduce the issue. Do you know which other commit I could try? |
Thank you @szkoludasebastian, this is useful in isolating the issue. Since there's also 3.3.3 and 3.3.4 releases available, would you be able to test whether problems reproduce in those releases? If possible, it would be great to find the commit that breaks your use case. |
I performed more tests with Pulsar client versions 3.3.3 and 3.3.4 to check which version causes the issue, but failed to reproduce it. Then I checked versions 4.0.0 and 4.0.1 again, but could not reproduce it with them either. I noticed that this time I had built the images for our services from our latest master, so I checked the git history and found that we had recently reverted this change:

loadManagerClassName: "org.apache.pulsar.broker.loadbalance.extensions.ExtensibleLoadManagerImpl"
loadBalancerLoadSheddingStrategy: "org.apache.pulsar.broker.loadbalance.extensions.scheduler.TransferShedder"

We had added it to the broker configmap. Here is our broker configmap (with the two configurations mentioned above added):
|
Thank you @szkoludasebastian. This is very useful information in isolating the issue to be related to ExtensibleLoadManagerImpl. @heesung-sn recently provided this advice about using ExtensibleLoadManagerImpl: #23889 (comment) |
Tested, but I am still able to reproduce the issue. |
Thank you for reporting this bug.
Meanwhile, I am trying to reproduce this issue on my end. |
I tried to reproduce this issue with the above test setup, but I couldn't. Could you help reproduce it by modifying the above setup script? Also, for this repro step:
it appears that Step 5 kills the producer. After Step 5, how do we see the backlog increase without a producer? |
From these stats (#23908 (comment)), I noticed that "unackedMessages" is non-zero. It seems that either the broker or the consumer missed message acks during the bookie restarts. FYI, the ExtensibleLoadManager ignores message acks while the topic is in the transferring state, by

However, I think these unacked messages should be handled by the Pulsar message redelivery mechanism. You can confirm this behavior by the debug logs in the above code or
|
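To see where the non-zero "unackedMessages" comes from, the topic stats JSON can be broken down per consumer. The snippet below is a small sketch; the function name is made up for this example and the input is a trimmed copy of the partition-4 topic stats posted earlier in this thread.

```python
from typing import Dict


def unacked_by_consumer(topic_stats: dict, subscription: str) -> Dict[str, int]:
    """Map consumer name to unackedMessages for consumers that hold any."""
    consumers = topic_stats["subscriptions"][subscription]["consumers"]
    return {
        c.get("consumerName", "<unnamed>"): c.get("unackedMessages", 0)
        for c in consumers
        if c.get("unackedMessages", 0) > 0
    }


# Trimmed from the partition-4 topic stats in this thread.
stats = {
    "subscriptions": {
        "microbatcher": {
            "msgBacklog": 10,
            "unackedMessages": 26,
            "consumers": [
                {"consumerName": "microbatcher-consumer-27b750cf-9118-4585-9136-34ecb6a90ffe", "unackedMessages": 0},
                {"consumerName": "microbatcher-consumer-e02c3d33-2ff6-4fcb-9fdc-3e8f3773ae36", "unackedMessages": 26},
                {"consumerName": "microbatcher-consumer-4ddf9389-86d0-42a0-9b99-eac0c9374d16", "unackedMessages": 0},
                {"consumerName": "microbatcher-consumer-7e5d4f42-6697-446e-bbb8-6eeb3349269c", "unackedMessages": 0},
            ],
        }
    }
}

print(unacked_by_consumer(stats, "microbatcher"))
```

In that snapshot all 26 unacked messages belong to a single consumer, which narrows down which client's logs are worth checking first.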
Search before asking
Read release policy
Version
Pulsar client >= 3.3.2
Pulsar server >= 3.3.2
We ran our tests on client/server versions 3.3.1, 3.3.2, 4.0.1 and 4.0.2. We have not noticed this problem on version 3.3.1.
Minimal reproduce step
So readPosition keeps moving, but markDeletePosition is stuck. Sometimes restarting the consumers or brokers makes markDeletePosition move forward again, and the backlog goes down.
We also tried resetting the cursor to markDeletePosition to read the messages again, and that also helped. We did that with this command:
What did you expect to see?
No partition stuck and backlog does not grow
What did you see instead?
Backlog grows because markDeletePosition does not move
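One way to check for this symptom is to compare two stats-internal snapshots of the same partition taken some time apart. The sketch below assumes both snapshots are already parsed JSON (for example from `pulsar-admin topics stats-internal`); the function name is made up, and the values of the second snapshot are hypothetical.

```python
from typing import Tuple


def parse_position(text: str) -> Tuple[int, int]:
    """Turn "ledgerId:entryId" into a comparable (ledgerId, entryId) tuple."""
    ledger, entry = text.split(":")
    return int(ledger), int(entry)


def cursor_looks_stuck(earlier: dict, later: dict, cursor: str) -> bool:
    """True when readPosition advanced between the snapshots while
    markDeletePosition stayed exactly where it was."""
    old = earlier["cursors"][cursor]
    new = later["cursors"][cursor]
    read_advanced = parse_position(new["readPosition"]) > parse_position(old["readPosition"])
    mark_delete_same = new["markDeletePosition"] == old["markDeletePosition"]
    return read_advanced and mark_delete_same


# First snapshot: values from the partition-4 stats-internal in this thread.
earlier = {"cursors": {"microbatcher": {
    "markDeletePosition": "2043391:112", "readPosition": "2044129:285"}}}
# Second snapshot: hypothetical values, for illustration only.
later = {"cursors": {"microbatcher": {
    "markDeletePosition": "2043391:112", "readPosition": "2044129:300"}}}

print(cursor_looks_stuck(earlier, later, "microbatcher"))
```

A single pair of snapshots is only a hint, since markDeletePosition can legitimately pause while a slow consumer works through a batch; repeated snapshots over a longer period give a more reliable signal.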
Anything else?
No response
Are you willing to submit a PR?