
BugReport: ClusterTopologyRefreshTask is not shutdown when RedisClusterClient is shutdown #2904

Open
exceptionplayer opened this issue Jun 24, 2024 · 4 comments
Labels: type: bug A general bug
Milestone: Backlog

Comments

@exceptionplayer

Bug Report

When RedisClusterClient is shutdown, the ClusterTopologyRefreshTask may not be cancelled.

Current Behavior

When I close the RedisClusterClient by invoking RedisClusterClient.shutdown method, there is a chance that the ClusterTopologyRefreshTask is not stopped.

It seems this issue was meant to be resolved here, but the latest code does not appear to fix it.
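
A minimal sketch of the kind of setup that can hit this, assuming periodic topology refresh is enabled (the endpoint and refresh period are placeholders, not taken from the actual configuration):

```java
import java.time.Duration;

import io.lettuce.core.RedisURI;
import io.lettuce.core.cluster.ClusterClientOptions;
import io.lettuce.core.cluster.ClusterTopologyRefreshOptions;
import io.lettuce.core.cluster.RedisClusterClient;

public class ShutdownRace {

    public static void main(String[] args) {
        // Placeholder endpoint; any cluster node works.
        RedisClusterClient client = RedisClusterClient.create(RedisURI.create("redis://localhost:7000"));

        // Periodic refresh schedules ClusterTopologyRefreshTask on the event executor.
        ClusterTopologyRefreshOptions refresh = ClusterTopologyRefreshOptions.builder()
                .enablePeriodicRefresh(Duration.ofSeconds(1))
                .build();
        client.setOptions(ClusterClientOptions.builder()
                .topologyRefreshOptions(refresh)
                .build());

        client.connect().close();

        // If a refresh task is scheduled or already running at this point, it may keep
        // executing against the terminated event executor and fail as shown below.
        client.shutdown();
    }
}
```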

Stack trace
2024-06-24 23:27:04.223 ERROR 781313 --- [lettuce-eventExecutorLoop-3-59] i.n.u.c.D.rejectedExecution              : Failed to submit a listener notification task. Event loop shut down?

java.util.concurrent.RejectedExecutionException: event executor terminated
	at io.netty.util.concurrent.SingleThreadEventExecutor.reject(SingleThreadEventExecutor.java:934) ~[netty-common-4.1.109.Final.jar!/:4.1.109.Final]
	at io.netty.util.concurrent.SingleThreadEventExecutor.offerTask(SingleThreadEventExecutor.java:351) ~[netty-common-4.1.109.Final.jar!/:4.1.109.Final]
	at io.netty.util.concurrent.SingleThreadEventExecutor.addTask(SingleThreadEventExecutor.java:344) ~[netty-common-4.1.109.Final.jar!/:4.1.109.Final]
	at io.netty.util.concurrent.SingleThreadEventExecutor.execute(SingleThreadEventExecutor.java:836) ~[netty-common-4.1.109.Final.jar!/:4.1.109.Final]
	at io.netty.util.concurrent.SingleThreadEventExecutor.execute0(SingleThreadEventExecutor.java:827) ~[netty-common-4.1.109.Final.jar!/:4.1.109.Final]
	at io.netty.util.concurrent.SingleThreadEventExecutor.execute(SingleThreadEventExecutor.java:817) ~[netty-common-4.1.109.Final.jar!/:4.1.109.Final]
	at io.netty.util.concurrent.DefaultPromise.safeExecute(DefaultPromise.java:862) ~[netty-common-4.1.109.Final.jar!/:4.1.109.Final]
	at io.netty.util.concurrent.DefaultPromise.notifyListeners(DefaultPromise.java:500) ~[netty-common-4.1.109.Final.jar!/:4.1.109.Final]
	at io.netty.util.concurrent.DefaultPromise.addListener(DefaultPromise.java:185) ~[netty-common-4.1.109.Final.jar!/:4.1.109.Final]
	at io.netty.channel.DefaultChannelPromise.addListener(DefaultChannelPromise.java:95) ~[netty-transport-4.1.109.Final.jar!/:4.1.109.Final]
	at io.netty.channel.DefaultChannelPromise.addListener(DefaultChannelPromise.java:30) ~[netty-transport-4.1.109.Final.jar!/:4.1.109.Final]
	at io.lettuce.core.AbstractRedisClient.initializeChannelAsync0(AbstractRedisClient.java:439) ~[lettuce-core-6.3.2.RELEASE.jar!/:6.3.2.RELEASE/8941aea]
	at io.lettuce.core.AbstractRedisClient.lambda$initializeChannelAsync$2(AbstractRedisClient.java:412) ~[lettuce-core-6.3.2.RELEASE.jar!/:6.3.2.RELEASE/8941aea]
	at reactor.core.publisher.LambdaMonoSubscriber.onNext(LambdaMonoSubscriber.java:171) ~[reactor-core-3.6.5.jar!/:3.6.5]
	at reactor.core.publisher.FluxPeekFuseable$PeekFuseableSubscriber.onNext(FluxPeekFuseable.java:210) ~[reactor-core-3.6.5.jar!/:3.6.5]
	at reactor.core.publisher.MonoPeekTerminal$MonoTerminalPeekSubscriber.onNext(MonoPeekTerminal.java:180) ~[reactor-core-3.6.5.jar!/:3.6.5]
	at reactor.core.publisher.Operators$ScalarSubscription.request(Operators.java:2571) ~[reactor-core-3.6.5.jar!/:3.6.5]
	at reactor.core.publisher.MonoPeekTerminal$MonoTerminalPeekSubscriber.request(MonoPeekTerminal.java:139) ~[reactor-core-3.6.5.jar!/:3.6.5]
	at reactor.core.publisher.FluxPeekFuseable$PeekFuseableSubscriber.request(FluxPeekFuseable.java:144) ~[reactor-core-3.6.5.jar!/:3.6.5]
	at reactor.core.publisher.LambdaMonoSubscriber.onSubscribe(LambdaMonoSubscriber.java:121) ~[reactor-core-3.6.5.jar!/:3.6.5]
	at reactor.core.publisher.FluxPeekFuseable$PeekFuseableSubscriber.onSubscribe(FluxPeekFuseable.java:178) ~[reactor-core-3.6.5.jar!/:3.6.5]
	at reactor.core.publisher.MonoPeekTerminal$MonoTerminalPeekSubscriber.onSubscribe(MonoPeekTerminal.java:152) ~[reactor-core-3.6.5.jar!/:3.6.5]
	at reactor.core.publisher.MonoJust.subscribe(MonoJust.java:55) ~[reactor-core-3.6.5.jar!/:3.6.5]
	at reactor.core.publisher.Mono.subscribe(Mono.java:4568) ~[reactor-core-3.6.5.jar!/:3.6.5]
	at reactor.core.publisher.Mono.subscribeWith(Mono.java:4634) ~[reactor-core-3.6.5.jar!/:3.6.5]
	at reactor.core.publisher.Mono.subscribe(Mono.java:4534) ~[reactor-core-3.6.5.jar!/:3.6.5]
	at reactor.core.publisher.Mono.subscribe(Mono.java:4470) ~[reactor-core-3.6.5.jar!/:3.6.5]
	at reactor.core.publisher.Mono.subscribe(Mono.java:4442) ~[reactor-core-3.6.5.jar!/:3.6.5]
	at io.lettuce.core.AbstractRedisClient.initializeChannelAsync(AbstractRedisClient.java:407) ~[lettuce-core-6.3.2.RELEASE.jar!/:6.3.2.RELEASE/8941aea]
	at io.lettuce.core.cluster.RedisClusterClient.connectStatefulAsync(RedisClusterClient.java:818) ~[lettuce-core-6.3.2.RELEASE.jar!/:6.3.2.RELEASE/8941aea]
	at io.lettuce.core.cluster.RedisClusterClient.connectToNodeAsync(RedisClusterClient.java:554) ~[lettuce-core-6.3.2.RELEASE.jar!/:6.3.2.RELEASE/8941aea]
	at io.lettuce.core.cluster.RedisClusterClient$NodeConnectionFactoryImpl.connectToNodeAsync(RedisClusterClient.java:1287) ~[lettuce-core-6.3.2.RELEASE.jar!/:6.3.2.RELEASE/8941aea]
	at io.lettuce.core.cluster.topology.DefaultClusterTopologyRefresh.openConnections(DefaultClusterTopologyRefresh.java:315) ~[lettuce-core-6.3.2.RELEASE.jar!/:6.3.2.RELEASE/8941aea]
	at io.lettuce.core.cluster.topology.DefaultClusterTopologyRefresh.loadViews(DefaultClusterTopologyRefresh.java:99) ~[lettuce-core-6.3.2.RELEASE.jar!/:6.3.2.RELEASE/8941aea]
	at io.lettuce.core.cluster.RedisClusterClient.fetchPartitions(RedisClusterClient.java:1033) ~[lettuce-core-6.3.2.RELEASE.jar!/:6.3.2.RELEASE/8941aea]
	at io.lettuce.core.cluster.RedisClusterClient.loadPartitionsAsync(RedisClusterClient.java:985) ~[lettuce-core-6.3.2.RELEASE.jar!/:6.3.2.RELEASE/8941aea]
	at io.lettuce.core.cluster.RedisClusterClient.refreshPartitionsAsync(RedisClusterClient.java:891) ~[lettuce-core-6.3.2.RELEASE.jar!/:6.3.2.RELEASE/8941aea]
	at io.lettuce.core.cluster.ClusterTopologyRefreshScheduler$ClusterTopologyRefreshTask.doRun(ClusterTopologyRefreshScheduler.java:338) ~[lettuce-core-6.3.2.RELEASE.jar!/:6.3.2.RELEASE/8941aea]
	at io.lettuce.core.cluster.ClusterTopologyRefreshScheduler$ClusterTopologyRefreshTask.run(ClusterTopologyRefreshScheduler.java:323) ~[lettuce-core-6.3.2.RELEASE.jar!/:6.3.2.RELEASE/8941aea]
	at io.netty.util.concurrent.PromiseTask.runTask(PromiseTask.java:98) ~[netty-common-4.1.109.Final.jar!/:4.1.109.Final]
	at io.netty.util.concurrent.PromiseTask.run(PromiseTask.java:106) ~[netty-common-4.1.109.Final.jar!/:4.1.109.Final]
	at io.netty.util.concurrent.AbstractEventExecutor.runTask(AbstractEventExecutor.java:173) ~[netty-common-4.1.109.Final.jar!/:4.1.109.Final]
	at io.netty.util.concurrent.DefaultEventExecutor.run(DefaultEventExecutor.java:66) ~[netty-common-4.1.109.Final.jar!/:4.1.109.Final]
	at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:997) ~[netty-common-4.1.109.Final.jar!/:4.1.109.Final]
	at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) ~[netty-common-4.1.109.Final.jar!/:4.1.109.Final]
	at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) ~[netty-common-4.1.109.Final.jar!/:4.1.109.Final]
	at java.base/java.lang.Thread.run(Thread.java:898) [?:?]

Environment

  • Lettuce version(s): 6.3.2.RELEASE
@exceptionplayer exceptionplayer changed the title ClusterTopologyRefreshTask is not shutdown when RedisClusterClient is shutdown BugReport: ClusterTopologyRefreshTask is not shutdown when RedisClusterClient is shutdown Jun 24, 2024
@tishun tishun added the type: bug A general bug label Jul 3, 2024
@tishun tishun added this to the 7.x milestone Jul 3, 2024
@tishun tishun added the status: good-first-issue An issue that can only be worked on by brand new contributors label Jul 17, 2024
@tishun tishun modified the milestones: 7.x, Backlog Jul 17, 2024
@thachlp
Contributor

thachlp commented Sep 12, 2024

@tishun @exceptionplayer please help review my pr 🙇

@tishun
Collaborator

tishun commented Oct 22, 2024

> @tishun @exceptionplayer please help review my pr 🙇

@thachlp did you check out the comments in the PR?

@thachlp
Contributor

thachlp commented Oct 23, 2024

> @tishun @exceptionplayer please help review my pr 🙇
>
> @thachlp did you check out the comments in the PR?

Let me check

@tishun tishun removed the status: good-first-issue An issue that can only be worked on by brand new contributors label Dec 21, 2024
@tishun
Collaborator

tishun commented Dec 23, 2024

This is a bit of a cat-and-mouse problem, because there is a chain of calls that leads to the actual execution of the task.
The event loop could be alive when the task is scheduled and then be in a closing state when the task starts executing. As @mp911de mentioned in #2985 (comment) we would have to - more or less - block the event loop from closing until the scheduled task completes. Basically:

  • if we detect the event loop is active
  • then we would block it from completing with a reentrant lock
  • then we would schedule the partition refresh task
  • then, when the task completes, we would unblock it

If the event loop were asked to close while a refresh is in progress, the thread that initiated the shutdown would wait until the refresh completes.
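
A rough sketch of that locking idea (purely illustrative, not Lettuce's actual internals; the class and method names are hypothetical):

```java
import java.util.concurrent.locks.ReentrantLock;

// Illustrative only: guards the refresh task against a concurrent client shutdown.
class TopologyRefreshGuard {

    private final ReentrantLock refreshLock = new ReentrantLock(); // hypothetical field

    // Wraps the ClusterTopologyRefreshTask body: hold the lock while refreshing.
    void runRefresh(Runnable refreshPartitions) {
        refreshLock.lock(); // a concurrent shutdown now waits until the refresh finishes
        try {
            refreshPartitions.run();
        } finally {
            refreshLock.unlock();
        }
    }

    // Called from the shutdown path before the event executor group is closed.
    void awaitPendingRefresh() {
        refreshLock.lock(); // blocks if a refresh is currently in flight
        try {
            // safe to proceed with closing the event loop / executor group
        } finally {
            refreshLock.unlock();
        }
    }
}
```

The ordering race the comment describes still needs handling: the task might acquire the lock only after shutdown has already completed, so it would also have to re-check the client state before opening new connections.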
