-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Remove a unnecessary Thread.sleep() from ClusterCommandExecutor.collectResults() #2518
Comments
Hi @dongmyo - I appreciate you raising this issue; nice catch! I also think you are generally correct, here. However, I have been giving this some serious thought, off-and-on, for the past few days, while working through other SD Redis Issues in our backlog. In summary, I am thinking this I think the code will need to be restructured, which should help simplify the code a bit since it is rather complex to reason about anyway, IMO. It also does not appropriately handle the In general, arbitrarily calling
My plan of attack is to write some tests for this area of code first so that I can then begin to make (structural) changes that ultimately get rid of the I will post back when I have more information to share. |
@jxblum thank you for your response. |
I am finally, now circling back to this. Knowing significantly more about Spring Data Redis and just Redis in general now, my thinking is this "arbitrary" The Future.get() calls (here and here) "waits until the computation completes" anyway. So, I think it is also unnecessary to use Additionally, after looking through this code, I also think there is a bug! If execution fails for any node when evaluating the result, then there is no point to re-evaluate the node on a subsequent iteration (in the Finally, it is not necessary to "handle" the |
Very impressive issue. Thanks. |
Keep Thread.sleep(..) to avoid the busy (hot) loop when the Future executions are not yet complete. Add test coverage for ClusterCommandExecutor collectResults(..) method. Cleanup compiler warnings in ClusterCommandExecutorUnitTests. Closes spring-projects#2518
Opened PR #2719, which is pending review and merge. Buildable changes captured in issue branch issue/2518. |
Keep Thread.sleep(..) to avoid the busy (hot) loop when the Future executions are not yet complete. Add test coverage for ClusterCommandExecutor collectResults(..) method. Cleanup compiler warnings in ClusterCommandExecutorUnitTests. Reduce Thread.sleep(1L) to 1 millisecond. Closes spring-projects#2518
Technically, after further review and careful consideration, I should clarify that the Depending on the The new implementation is here, along with this. I also reduced the
|
On second thought, recalling earlier thoughts I had, I do think there is a way to remove the I switched PR #2719 to back DRAFT mode for the time being. |
Replace Thread.sleep(..) with Future.get(timeout, :TimeUnit) call for 1 millisecond. As such, the Future.isDone() (and Future.isCancelled()) calls are no longer necessary. Simply try to get the results, and if a TimeoutException is thrown, then simply set done to false. Add test coverage for ClusterCommandExecutor collectResults(..) method. Cleanup compiler warnings in ClusterCommandExecutorUnitTests. Closes spring-projects#2518
Replace Thread.sleep(..) with Future.get(timeout, :TimeUnit) for 10 microseconds. As a result, Future.isDone() and Future.isCancelled() are no longer necessary. Simply try to get the results within 10 us, and if a TimeoutException is thrown, then set done to false. 10 microseconds is 1/1000 of 10 milliseconds. This means a Redis cluster with 1000 nodes will run in a similar time to Thread.sleep(10L) if all Futures are blocked waiting for the compuptation to complete and take an equal amount of time to compute the result, which is rarely the case in practice, given different hardware configurations, data access patterns, load balancing/requst routing etc. However, using Future.get(timeout, :TimeUnit) is more fair than Future.get(), which blocks until a result is returned or an ExecutionException is thrown, thereby starving computationally faster nodes vs. other nodes in the cluster that might be overloaded. In the meantime, some nodes may even complete in the short amount of time while waiting for other nodes to complete. 10 microseconds was partially arbitrary, but not anymore so than Thread.sleep(10L) (10 millisconds). The objective was to give each node a chance to complete the computation in a moments notice balanced with the need to quickly check if the computation is "done", hence Future.get(timeout, TimeUnit.MICROSECONDS), or sub-millisecond response times. This may need to be futher tuned over time, but should serve as a reasonable baseline for the time being. This was also based on https://redis.io/docs/reference/cluster-spec/#overview-of-redis-cluster-main-components in the Redis documentation, "recommending" a cluster size of no more than 1000 nodes. One optimimzation could be to reorder the Map of Futures at the end of each iteration to organize Futures that are done. Furthermore, Futures that have already completed could be removed from the Map. Of course, there is little harm in keep the future in the Map witht the "safeguard" in place. This was not included in the changes simply because the optimization is most likely neglible and should be measured. Reconstructing a TreeMap should run mostly within log(n) time, but memory should also be taken into consideration. Add test coverage for ClusterCommandExecutor collectResults(..) method. Cleanup compiler warnings in ClusterCommandExecutorUnitTests. Closes spring-projects#2518
Replace Thread.sleep(..) with Future.get(timeout, :TimeUnit) for 10 microseconds. As a result, Future.isDone() and Future.isCancelled() are no longer necessary. Simply try to get the results within 10 us, and if a TimeoutException is thrown, then set done to false. 10 microseconds is 1/1000 of 10 milliseconds. This means a Redis cluster with 1000 nodes will run in a similar time to Thread.sleep(10L) if all Futures are blocked waiting for the compuptation to complete and take an equal amount of time to compute the result, which is rarely the case in practice, given different hardware configurations, data access patterns, load balancing/requst routing etc. However, using Future.get(timeout, :TimeUnit) is more fair than Future.get(), which blocks until a result is returned or an ExecutionException is thrown, thereby starving computationally faster nodes vs. other nodes in the cluster that might be overloaded. In the meantime, some nodes may even complete in the short amount of time while waiting for other nodes to complete. 10 microseconds was partially arbitrary, but not anymore so than Thread.sleep(10L) (10 millisconds). The objective was to give each node a chance to complete the computation in a moments notice balanced with the need to quickly check if the computation is "done", hence Future.get(timeout, TimeUnit.MICROSECONDS), or sub-millisecond response times. This may need to be futher tuned over time, but should serve as a reasonable baseline for the time being. This was also based on https://redis.io/docs/reference/cluster-spec/#overview-of-redis-cluster-main-components in the Redis documentation, "recommending" a cluster size of no more than 1000 nodes. One optimimzation might be to reorder the Map of Futures at the end of each iteration organizing Futures that are done first. Furthermore, Futures that have already completed could be removed from the Map. Of course, there is little harm in keeping the future in the Map with the "safeguard" in place. This optimization was not included in the changes simply because the optimization is most likely neglible and should be measured. Reconstructing a TreeMap should run mostly within log(n) time, but memory consumption should also be taken into consideration. Add test coverage for ClusterCommandExecutor collectResults(..) method. Cleanup compiler warnings in ClusterCommandExecutorUnitTests. Closes spring-projects#2518
Alright, PR #2719 re-opened (no longer in draft mode). I also modified the PR to now remove |
Replace Thread.sleep(..) with Future.get(timeout, :TimeUnit) for 10 microseconds. As a result, Future.isDone() and Future.isCancelled() are no longer necessary. Simply try to get the results within 10 us, and if a TimeoutException is thrown, then set done to false. 10 microseconds is 1/1000 of 10 milliseconds. This means a Redis cluster with 1000 nodes will run in a similar time to Thread.sleep(10L) if all Futures are blocked waiting for the compuptation to complete and take an equal amount of time to compute the result, which is rarely the case in practice, given different hardware configurations, data access patterns, load balancing/requst routing etc. However, using Future.get(timeout, :TimeUnit) is more fair than Future.get(), which blocks until a result is returned or an ExecutionException is thrown, thereby starving computationally faster nodes vs. other nodes in the cluster that might be overloaded. In the meantime, some nodes may even complete in the short amount of time while waiting for other nodes to complete. 10 microseconds was partially arbitrary, but not anymore so than Thread.sleep(10L) (10 millisconds). The objective was to give each node a chance to complete the computation in a moments notice balanced with the need to quickly check if the computation is "done", hence Future.get(timeout, TimeUnit.MICROSECONDS), or sub-millisecond response times. This may need to be futher tuned over time, but should serve as a reasonable baseline for the time being. This was also based on https://redis.io/docs/reference/cluster-spec/#overview-of-redis-cluster-main-components in the Redis documentation, "recommending" a cluster size of no more than 1000 nodes. One optimimzation might be to reorder the Map of Futures at the end of each iteration organizing Futures that are done first. Furthermore, Futures that have already completed could be removed from the Map. Of course, there is little harm in keeping the future in the Map with the "safeguard" in place. This optimization was not included in the changes simply because the optimization is most likely neglible and should be measured. Reconstructing a TreeMap should run mostly within log(n) time, but memory consumption should also be taken into consideration. Add test coverage for ClusterCommandExecutor collectResults(..) method. Cleanup compiler warnings in ClusterCommandExecutorUnitTests. Closes spring-projects#2518
Replace Thread.sleep(..) with Future.get(timeout, :TimeUnit) for 10 microseconds. As a result, Future.isDone() and Future.isCancelled() are no longer necessary. Simply try to get the results within 10 us, and if a TimeoutException is thrown, then set done to false. 10 microseconds is 1/1000 of 10 milliseconds. This means a Redis cluster with 1000 nodes will run in a similar time to Thread.sleep(10L) if all Futures are blocked waiting for the computation to complete and take an equal amount of time to compute the result, which is rarely the case in practice, given different hardware configurations, data access patterns, load balancing/request routing, and so on. However, using Future.get(timeout, :TimeUnit) is more fair than Future.get(), which blocks until a result is returned or an ExecutionException is thrown, thereby starving computationally faster nodes vs. other nodes in the cluster that might be overloaded. In the meantime, some nodes may even complete in the short amount of time when waiting on a single node to complete. 10 microseconds was partially arbitrary, but no more so than Thread.sleep(10L) (10 milliseconds). The main objective was to give each node a chance to complete the computation in a moments notice balanced with the need to quickly check if the computation is done, hence Future.get(timeout, TimeUnit.MICROSECONDS) for sub-millisecond response times. This may need to be further tuned over time, but should serve as a reasonable baseline for the time being. Additionally, this was based on https://redis.io/docs/reference/cluster-spec/#overview-of-redis-cluster-main-components in the Redis documentation, recommending a cluster size of no more than 1000 nodes. One optimization might be to reorder the Map of Futures at the end of each iteration by organizing Futures that are done first. Furthermore, Futures that have already completed could even be removed from the Map. Of course, there is little harm in keeping the completed Futures in the Map with the safeguard in place. This optimization was not included in theses changes simply because the optimization is most likely negligible and should be measured. Reconstructing a TreeMap should run mostly within log(n) time, but memory consumption should also be taken into consideration. Add test coverage for ClusterCommandExecutor collectResults(..) method. Cleanup compiler warnings in ClusterCommandExecutorUnitTests. Closes spring-projects#2518
Replace Thread.sleep(..) with Future.get(timeout, :TimeUnit) for 10 microseconds. As a result, Future.isDone() and Future.isCancelled() are no longer necessary. Simply try to get the results within 10 us, and if a TimeoutException is thrown, then set done to false. 10 microseconds is 1/1000 of 10 milliseconds. This means a Redis cluster with 1000 nodes will run in a similar time to Thread.sleep(10L) if all Futures are blocked waiting for the computation to complete and take an equal amount of time to compute the result, which is rarely the case in practice, given different hardware configurations, data access patterns, load balancing/request routing, and so on. However, using Future.get(timeout, :TimeUnit) is more fair than Future.get(), which blocks until a result is returned or an ExecutionException is thrown, thereby starving computationally faster nodes vs. other nodes in the cluster that might be overloaded. In the meantime, some nodes may even complete in the short amount of time when waiting on a single node to complete. 10 microseconds was partially arbitrary, but no more so than Thread.sleep(10L) (10 milliseconds). The main objective was to give each node a chance to complete the computation in a moments notice balanced with the need to quickly check if the computation is done, hence Future.get(timeout, TimeUnit.MICROSECONDS) for sub-millisecond response times. This may need to be further tuned over time, but should serve as a reasonable baseline for the time being. Additionally, this was based on https://redis.io/docs/reference/cluster-spec/#overview-of-redis-cluster-main-components in the Redis documentation, recommending a cluster size of no more than 1000 nodes. One optimization might be to reorder the Map of Futures at the end of each iteration by organizing Futures that are done first. Furthermore, Futures that have already completed could even be removed from the Map. Of course, there is little harm in keeping the completed Futures in the Map with the safeguard in place. This optimization was not included in theses changes simply because the optimization is most likely negligible and should be measured. Reconstructing a TreeMap should run mostly within log(n) time, but memory consumption should also be taken into consideration. Add test coverage for ClusterCommandExecutor collectResults(..) method. Cleanup compiler warnings in ClusterCommandExecutorUnitTests. Closes spring-projects#2518
Replace Thread.sleep(..) with Future.get(timeout, :TimeUnit) for 10 microseconds. As a result, Future.isDone() and Future.isCancelled() are no longer necessary. Simply try to get the results within 10 us, and if a TimeoutException is thrown, then set done to false. 10 microseconds is 1/1000 of 10 milliseconds. This means a Redis cluster with 1000 nodes will run in a similar time to Thread.sleep(10L) if all Futures are blocked waiting for the computation to complete and take an equal amount of time to compute the result, which is rarely the case in practice, given different hardware configurations, data access patterns, load balancing/request routing, and so on. However, using Future.get(timeout, :TimeUnit) is more fair than Future.get(), which blocks until a result is returned or an ExecutionException is thrown, thereby starving computationally faster nodes vs. other nodes in the cluster that might be overloaded. In the meantime, some nodes may even complete in the short amount of time when waiting on a single node to complete. 10 microseconds was partially arbitrary, but no more so than Thread.sleep(10L) (10 milliseconds). The main objective was to give each node a chance to complete the computation in a moments notice balanced with the need to quickly check if the computation is done, hence Future.get(timeout, TimeUnit.MICROSECONDS) for sub-millisecond response times. This may need to be further tuned over time, but should serve as a reasonable baseline for the time being. Additionally, this was based on https://redis.io/docs/reference/cluster-spec/#overview-of-redis-cluster-main-components in the Redis documentation, recommending a cluster size of no more than 1000 nodes. One optimization might be to reorder the Map of Futures at the end of each iteration by organizing Futures that are done first. Furthermore, Futures that have already completed could even be removed from the Map. Of course, there is little harm in keeping the completed Futures in the Map with the safeguard in place. This optimization was not included in theses changes simply because the optimization is most likely negligible and should be measured. Reconstructing a TreeMap should run mostly within log(n) time, but memory consumption should also be taken into consideration. Add test coverage for ClusterCommandExecutor collectResults(..) method. Cleanup compiler warnings in ClusterCommandExecutorUnitTests. Closes spring-projects#2518
Replace Thread.sleep(..) with Future.get(timeout, :TimeUnit) for 10 microseconds. As a result, Future.isDone() and Future.isCancelled() are no longer necessary. Simply try to get the results within 10 us, and if a TimeoutException is thrown, then set done to false. 10 microseconds is 1/1000 of 10 milliseconds. This means a Redis cluster with 1000 nodes will run in a similar time to Thread.sleep(10L) if all Futures are blocked waiting for the computation to complete and take an equal amount of time to compute the result, which is rarely the case in practice, given different hardware configurations, data access patterns, load balancing/request routing, and so on. However, using Future.get(timeout, :TimeUnit) is more fair than Future.get(), which blocks until a result is returned or an ExecutionException is thrown, thereby starving computationally faster nodes vs. other nodes in the cluster that might be overloaded. In the meantime, some nodes may even complete in the short amount of time when waiting on a single node to complete. 10 microseconds was partially arbitrary, but no more so than Thread.sleep(10L) (10 milliseconds). The main objective was to give each node a chance to complete the computation in a moments notice balanced with the need to quickly check if the computation is done, hence Future.get(timeout, TimeUnit.MICROSECONDS) for sub-millisecond response times. This may need to be further tuned over time, but should serve as a reasonable baseline for the time being. Additionally, this was based on https://redis.io/docs/reference/cluster-spec/#overview-of-redis-cluster-main-components in the Redis documentation, recommending a cluster size of no more than 1000 nodes. One optimization might be to reorder the Map of Futures at the end of each iteration by organizing Futures that are done first. Furthermore, Futures that have already completed could even be removed from the Map. Of course, there is little harm in keeping the completed Futures in the Map with the safeguard in place. This optimization was not included in theses changes simply because the optimization is most likely negligible and should be measured. Reconstructing a TreeMap should run mostly within log(n) time, but memory consumption should also be taken into consideration. Add test coverage for ClusterCommandExecutor collectResults(..) method. Cleanup compiler warnings in ClusterCommandExecutorUnitTests. Closes spring-projects#2518
Simplify tests. Reuse existing interfaces from Spring. Remove inappropriate nullability annotations and introduce annotations where required. Replace Future mocking with easier to maintain and to read future method overrides. Remove superfluous code and replace with infrastructure classes provided by Spring Framework. Consistently name callbacks. Make exception collector concept explicit. Reformat code. See #2518 Original pull request: #2719
https://github.com/spring-projects/spring-data-redis/blob/main/src/main/java/org/springframework/data/redis/connection/ClusterCommandExecutor.java#L224
when for loop ends with all futures done or cancelled in
ClusterCommandExecutor.collectResults()
.https://github.com/spring-projects/spring-data-redis/blob/main/src/main/java/org/springframework/data/redis/connection/ClusterCommandExecutor.java#L258
after that, there is no need for thread to be slept, but
Thread.sleep()
is called unnecessarily.it could be better to check if the
done
is false.The text was updated successfully, but these errors were encountered: