Remove a unnecessary Thread.sleep() from ClusterCommandExecutor.collectResults() #2518

dongmyo · 2023-03-02T16:12:19Z

https://github.com/spring-projects/spring-data-redis/blob/main/src/main/java/org/springframework/data/redis/connection/ClusterCommandExecutor.java#L224

when for loop ends with all futures done or cancelled in ClusterCommandExecutor.collectResults().

https://github.com/spring-projects/spring-data-redis/blob/main/src/main/java/org/springframework/data/redis/connection/ClusterCommandExecutor.java#L258

after that, there is no need for thread to be slept, but Thread.sleep() is called unnecessarily.

it could be better to check if the done is false.

try {
    if (!done) {
        Thread.sleep(10);
    }
} catch (InterruptedException e) {
    // ...   
}

The text was updated successfully, but these errors were encountered:

jxblum · 2023-03-08T20:21:27Z

Hi @dongmyo -

I appreciate you raising this issue; nice catch!

I also think you are generally correct, here. However, I have been giving this some serious thought, off-and-on, for the past few days, while working through other SD Redis Issues in our backlog.

In summary, I am thinking this Thread.sleep() call should be removed altogether as I am not entirely convinced that 1) it is strictly necessary, and 2) could be achieved in another, more productive way (such as using Future.get(timeout, :TimeUnit), here). I need to carefully think this through.

I think the code will need to be restructured, which should help simplify the code a bit since it is rather complex to reason about anyway, IMO. It also does not appropriately handle the CancellationException that can be thrown from Future.get(), which is not documented (nor here) either, where the collectResults(..) method is invoked (i.e. from methods that are part of the public API).

In general, arbitrarily calling Thread.sleep(..) is dangerous for at least 2 reasons:

Threads in Java are native to the platform on which the JRE and Java application are built and run (unlike the Green Threads, or Project Loom, to an extent, which is now on the horizon). Therefore, Java Threads are subject to the vagaries of scheduling and behavior consistent with the underlying OS and Thread model in use.
Secondly, arbitrary Thread.sleep(..) timeouts (e.g. 10 milliseconds +/-) might be perfectly acceptable in some applications while detrimental in others. What is too much or too little? And with Redis being all about efficient/fast access to data, this sort of induced latency is questionable at best.

My plan of attack is to write some tests for this area of code first so that I can then begin to make (structural) changes that ultimately get rid of the Thread.sleep(..) altogether.

I will post back when I have more information to share.

dongmyo · 2023-03-09T05:17:34Z

@jxblum thank you for your response.
either way, it will be fine as long as unnecessary Thread.sleep() is going to be removed.

jxblum · 2023-09-27T02:32:28Z

I am finally, now circling back to this.

Knowing significantly more about Spring Data Redis and just Redis in general now, my thinking is this "arbitrary" Thread.sleep() call (here) can be simply removed altogether. It really serves no purpose.

The Future.get() calls (here and here) "waits until the computation completes" anyway.

So, I think it is also unnecessary to use Future.get(timeout, :TimeUnit) in this case given blocking/waiting nature of Future.get() in the first place. Although, it may not be the most effective way of "collecting results" across the nodes in the Redis cluster.

Additionally, after looking through this code, I also think there is a bug!

If execution fails for any node when evaluating the result, then there is no point to re-evaluate the node on a subsequent iteration (in the while loop). That is, I think the futureId should be added to the saveGuard (as so) inside the ExecutionException handler block.

Finally, it is not necessary to "handle" the CancellationException as I previously stated.

rishiraj88 · 2023-09-27T05:44:18Z

Very impressive issue. Thanks.

Keep Thread.sleep(..) to avoid the busy (hot) loop when the Future executions are not yet complete. Add test coverage for ClusterCommandExecutor collectResults(..) method. Cleanup compiler warnings in ClusterCommandExecutorUnitTests. Closes spring-projects#2518

jxblum · 2023-09-27T06:58:11Z

Opened PR #2719, which is pending review and merge. Buildable changes captured in issue branch issue/2518.

Keep Thread.sleep(..) to avoid the busy (hot) loop when the Future executions are not yet complete. Add test coverage for ClusterCommandExecutor collectResults(..) method. Cleanup compiler warnings in ClusterCommandExecutorUnitTests. Reduce Thread.sleep(1L) to 1 millisecond. Closes spring-projects#2518

jxblum · 2023-09-27T07:16:37Z

Technically, after further review and careful consideration, I should clarify that the Thread.sleep(..) is possibly useful when none of the Future results are complete yet.

Depending on the Future implementation, without a pause, the while loop (here) inside ClusterCommandExecutor.collectResults(..) could run hot and wind up being a "busy" loop.

The new implementation is here, along with this.

I also reduced the Thread.sleep(..) to 1 millisecond.

NOTE: I could have chosen to use a TimeUnit.timedWait(..) (Javadoc), thereby reducing the latency even further to MICROSECONDS (e.g. TimeUnit.MICROSECONDS.timedWait(obj, 10)) or even NANOSECONDS, but that requires a synchronized block on an Object monitor (or an object's intrinsic lock). This would limit scalability in the context of Virtual Threads.

jxblum · 2023-09-27T07:57:26Z

On second thought, recalling earlier thoughts I had, I do think there is a way to remove the Thread.sleep(..) call completely.

I switched PR #2719 to back DRAFT mode for the time being.

Replace Thread.sleep(..) with Future.get(timeout, :TimeUnit) call for 1 millisecond. As such, the Future.isDone() (and Future.isCancelled()) calls are no longer necessary. Simply try to get the results, and if a TimeoutException is thrown, then simply set done to false. Add test coverage for ClusterCommandExecutor collectResults(..) method. Cleanup compiler warnings in ClusterCommandExecutorUnitTests. Closes spring-projects#2518

Replace Thread.sleep(..) with Future.get(timeout, :TimeUnit) for 10 microseconds. As a result, Future.isDone() and Future.isCancelled() are no longer necessary. Simply try to get the results within 10 us, and if a TimeoutException is thrown, then set done to false. 10 microseconds is 1/1000 of 10 milliseconds. This means a Redis cluster with 1000 nodes will run in a similar time to Thread.sleep(10L) if all Futures are blocked waiting for the compuptation to complete and take an equal amount of time to compute the result, which is rarely the case in practice, given different hardware configurations, data access patterns, load balancing/requst routing etc. However, using Future.get(timeout, :TimeUnit) is more fair than Future.get(), which blocks until a result is returned or an ExecutionException is thrown, thereby starving computationally faster nodes vs. other nodes in the cluster that might be overloaded. In the meantime, some nodes may even complete in the short amount of time while waiting for other nodes to complete. 10 microseconds was partially arbitrary, but not anymore so than Thread.sleep(10L) (10 millisconds). The objective was to give each node a chance to complete the computation in a moments notice balanced with the need to quickly check if the computation is "done", hence Future.get(timeout, TimeUnit.MICROSECONDS), or sub-millisecond response times. This may need to be futher tuned over time, but should serve as a reasonable baseline for the time being. This was also based on https://redis.io/docs/reference/cluster-spec/#overview-of-redis-cluster-main-components in the Redis documentation, "recommending" a cluster size of no more than 1000 nodes. One optimimzation could be to reorder the Map of Futures at the end of each iteration to organize Futures that are done. Furthermore, Futures that have already completed could be removed from the Map. Of course, there is little harm in keep the future in the Map witht the "safeguard" in place. This was not included in the changes simply because the optimization is most likely neglible and should be measured. Reconstructing a TreeMap should run mostly within log(n) time, but memory should also be taken into consideration. Add test coverage for ClusterCommandExecutor collectResults(..) method. Cleanup compiler warnings in ClusterCommandExecutorUnitTests. Closes spring-projects#2518

Replace Thread.sleep(..) with Future.get(timeout, :TimeUnit) for 10 microseconds. As a result, Future.isDone() and Future.isCancelled() are no longer necessary. Simply try to get the results within 10 us, and if a TimeoutException is thrown, then set done to false. 10 microseconds is 1/1000 of 10 milliseconds. This means a Redis cluster with 1000 nodes will run in a similar time to Thread.sleep(10L) if all Futures are blocked waiting for the compuptation to complete and take an equal amount of time to compute the result, which is rarely the case in practice, given different hardware configurations, data access patterns, load balancing/requst routing etc. However, using Future.get(timeout, :TimeUnit) is more fair than Future.get(), which blocks until a result is returned or an ExecutionException is thrown, thereby starving computationally faster nodes vs. other nodes in the cluster that might be overloaded. In the meantime, some nodes may even complete in the short amount of time while waiting for other nodes to complete. 10 microseconds was partially arbitrary, but not anymore so than Thread.sleep(10L) (10 millisconds). The objective was to give each node a chance to complete the computation in a moments notice balanced with the need to quickly check if the computation is "done", hence Future.get(timeout, TimeUnit.MICROSECONDS), or sub-millisecond response times. This may need to be futher tuned over time, but should serve as a reasonable baseline for the time being. This was also based on https://redis.io/docs/reference/cluster-spec/#overview-of-redis-cluster-main-components in the Redis documentation, "recommending" a cluster size of no more than 1000 nodes. One optimimzation might be to reorder the Map of Futures at the end of each iteration organizing Futures that are done first. Furthermore, Futures that have already completed could be removed from the Map. Of course, there is little harm in keeping the future in the Map with the "safeguard" in place. This optimization was not included in the changes simply because the optimization is most likely neglible and should be measured. Reconstructing a TreeMap should run mostly within log(n) time, but memory consumption should also be taken into consideration. Add test coverage for ClusterCommandExecutor collectResults(..) method. Cleanup compiler warnings in ClusterCommandExecutorUnitTests. Closes spring-projects#2518

jxblum · 2023-09-27T18:25:16Z

Alright, PR #2719 re-opened (no longer in draft mode).

I also modified the PR to now remove Thread.sleep(..) completely! You can read the commit message for the reasoning behind the current thinking and set of changes.

Replace Thread.sleep(..) with Future.get(timeout, :TimeUnit) for 10 microseconds. As a result, Future.isDone() and Future.isCancelled() are no longer necessary. Simply try to get the results within 10 us, and if a TimeoutException is thrown, then set done to false. 10 microseconds is 1/1000 of 10 milliseconds. This means a Redis cluster with 1000 nodes will run in a similar time to Thread.sleep(10L) if all Futures are blocked waiting for the compuptation to complete and take an equal amount of time to compute the result, which is rarely the case in practice, given different hardware configurations, data access patterns, load balancing/requst routing etc. However, using Future.get(timeout, :TimeUnit) is more fair than Future.get(), which blocks until a result is returned or an ExecutionException is thrown, thereby starving computationally faster nodes vs. other nodes in the cluster that might be overloaded. In the meantime, some nodes may even complete in the short amount of time while waiting for other nodes to complete. 10 microseconds was partially arbitrary, but not anymore so than Thread.sleep(10L) (10 millisconds). The objective was to give each node a chance to complete the computation in a moments notice balanced with the need to quickly check if the computation is "done", hence Future.get(timeout, TimeUnit.MICROSECONDS), or sub-millisecond response times. This may need to be futher tuned over time, but should serve as a reasonable baseline for the time being. This was also based on https://redis.io/docs/reference/cluster-spec/#overview-of-redis-cluster-main-components in the Redis documentation, "recommending" a cluster size of no more than 1000 nodes. One optimimzation might be to reorder the Map of Futures at the end of each iteration organizing Futures that are done first. Furthermore, Futures that have already completed could be removed from the Map. Of course, there is little harm in keeping the future in the Map with the "safeguard" in place. This optimization was not included in the changes simply because the optimization is most likely neglible and should be measured. Reconstructing a TreeMap should run mostly within log(n) time, but memory consumption should also be taken into consideration. Add test coverage for ClusterCommandExecutor collectResults(..) method. Cleanup compiler warnings in ClusterCommandExecutorUnitTests. Closes spring-projects#2518

Replace Thread.sleep(..) with Future.get(timeout, :TimeUnit) for 10 microseconds. As a result, Future.isDone() and Future.isCancelled() are no longer necessary. Simply try to get the results within 10 us, and if a TimeoutException is thrown, then set done to false. 10 microseconds is 1/1000 of 10 milliseconds. This means a Redis cluster with 1000 nodes will run in a similar time to Thread.sleep(10L) if all Futures are blocked waiting for the computation to complete and take an equal amount of time to compute the result, which is rarely the case in practice, given different hardware configurations, data access patterns, load balancing/request routing, and so on. However, using Future.get(timeout, :TimeUnit) is more fair than Future.get(), which blocks until a result is returned or an ExecutionException is thrown, thereby starving computationally faster nodes vs. other nodes in the cluster that might be overloaded. In the meantime, some nodes may even complete in the short amount of time when waiting on a single node to complete. 10 microseconds was partially arbitrary, but no more so than Thread.sleep(10L) (10 milliseconds). The main objective was to give each node a chance to complete the computation in a moments notice balanced with the need to quickly check if the computation is done, hence Future.get(timeout, TimeUnit.MICROSECONDS) for sub-millisecond response times. This may need to be further tuned over time, but should serve as a reasonable baseline for the time being. Additionally, this was based on https://redis.io/docs/reference/cluster-spec/#overview-of-redis-cluster-main-components in the Redis documentation, recommending a cluster size of no more than 1000 nodes. One optimization might be to reorder the Map of Futures at the end of each iteration by organizing Futures that are done first. Furthermore, Futures that have already completed could even be removed from the Map. Of course, there is little harm in keeping the completed Futures in the Map with the safeguard in place. This optimization was not included in theses changes simply because the optimization is most likely negligible and should be measured. Reconstructing a TreeMap should run mostly within log(n) time, but memory consumption should also be taken into consideration. Add test coverage for ClusterCommandExecutor collectResults(..) method. Cleanup compiler warnings in ClusterCommandExecutorUnitTests. Closes spring-projects#2518

Simplify tests. Reuse existing interfaces from Spring. Remove inappropriate nullability annotations and introduce annotations where required. Replace Future mocking with easier to maintain and to read future method overrides. Remove superfluous code and replace with infrastructure classes provided by Spring Framework. Consistently name callbacks. Make exception collector concept explicit. Reformat code. See #2518 Original pull request: #2719

See spring-projects#2518

spring-projects-issues added the status: waiting-for-triage An issue we've not yet triaged label Mar 2, 2023

jxblum self-assigned this Mar 4, 2023

jxblum added the in: core Issues in core support label Mar 8, 2023

jxblum added status: on-hold We cannot start working on this issue yet and removed status: waiting-for-triage An issue we've not yet triaged labels Aug 8, 2023

jxblum mentioned this issue Sep 27, 2023

Replace Thread.sleep(..) with Future.get(timeout, :TimeUnit) in ClusterCommandExecutor.collectResults(..) #2719

Closed

mp911de closed this as completed in 781fda6 Oct 11, 2023

mp911de added type: task A general task and removed status: on-hold We cannot start working on this issue yet labels Oct 11, 2023

mp911de modified the milestone: 3.1.5 (2023.0.5) Oct 11, 2023

mp911de added this to the 3.2 RC1 (2023.1.0) milestone Oct 11, 2023

jxblum added a commit to jxblum/spring-data-redis that referenced this issue Oct 12, 2023

Apply additional polishing to ClusterCommandExecutor.

9bd8fda

See spring-projects#2518

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Remove a unnecessary Thread.sleep() from ClusterCommandExecutor.collectResults() #2518

Remove a unnecessary Thread.sleep() from ClusterCommandExecutor.collectResults() #2518

dongmyo commented Mar 2, 2023

jxblum commented Mar 8, 2023 •

edited

Loading

dongmyo commented Mar 9, 2023

jxblum commented Sep 27, 2023 •

edited

Loading

rishiraj88 commented Sep 27, 2023

jxblum commented Sep 27, 2023 •

edited

Loading

jxblum commented Sep 27, 2023 •

edited

Loading

jxblum commented Sep 27, 2023 •

edited

Loading

jxblum commented Sep 27, 2023 •

edited

Loading

Remove a unnecessary Thread.sleep() from ClusterCommandExecutor.collectResults() #2518

Remove a unnecessary Thread.sleep() from ClusterCommandExecutor.collectResults() #2518

Comments

dongmyo commented Mar 2, 2023

jxblum commented Mar 8, 2023 • edited Loading

dongmyo commented Mar 9, 2023

jxblum commented Sep 27, 2023 • edited Loading

rishiraj88 commented Sep 27, 2023

jxblum commented Sep 27, 2023 • edited Loading

jxblum commented Sep 27, 2023 • edited Loading

jxblum commented Sep 27, 2023 • edited Loading

jxblum commented Sep 27, 2023 • edited Loading

jxblum commented Mar 8, 2023 •

edited

Loading

jxblum commented Sep 27, 2023 •

edited

Loading

jxblum commented Sep 27, 2023 •

edited

Loading

jxblum commented Sep 27, 2023 •

edited

Loading

jxblum commented Sep 27, 2023 •

edited

Loading

jxblum commented Sep 27, 2023 •

edited

Loading