Skip to content

Commit

Permalink
Remove Thread.sleep(..) from collectResults(..).
Browse files Browse the repository at this point in the history
Replace Thread.sleep(..) with Future.get(timeout, :TimeUnit) for 10 microseconds. As a result, Future.isDone() and Future.isCancelled() are no longer necessary. Simply try to get the results within 10 us, and if a TimeoutException is thrown, then set done to false. 10 microseconds is 1/1000 of 10 milliseconds. This means a Redis cluster with 1000 nodes will run in a similar time to Thread.sleep(10L) if all Futures are blocked waiting for the compuptation to complete and take an equal amount of time to compute the result, which is rarely the case in practice, given different hardware configurations, data access patterns, load balancing/requst routing etc. However, using Future.get(timeout, :TimeUnit) is more fair than Future.get(), which blocks until a result is returned or an ExecutionException is thrown, thereby starving computationally faster nodes vs. other nodes in the cluster that might be overloaded. In the meantime, some nodes may even complete in the short amount of time while waiting for other nodes to complete.

10 microseconds was partially arbitrary, but not anymore so than Thread.sleep(10L) (10 millisconds). The objective was to give each node a chance to complete the computation in a moments notice balanced with the need to quickly check if the computation is "done", hence Future.get(timeout, TimeUnit.MICROSECONDS), or sub-millisecond response times. This may need to be futher tuned over time, but should serve as a reasonable baseline for the time being. This was also based on https://redis.io/docs/reference/cluster-spec/#overview-of-redis-cluster-main-components in the Redis documentation, "recommending" a cluster size of no more than 1000 nodes.

One optimimzation might be to reorder the Map of Futures at the end of each iteration organizing Futures that are done first. Furthermore, Futures that have already completed could be removed from the Map. Of course, there is little harm in keeping the future in the Map with the "safeguard" in place. This optimization was not included in the changes simply because the optimization is most likely neglible and should be measured. Reconstructing a TreeMap should run mostly within log(n) time, but memory consumption should also be taken into consideration.

Add test coverage for ClusterCommandExecutor collectResults(..) method.

Cleanup compiler warnings in ClusterCommandExecutorUnitTests.

Closes #2518
  • Loading branch information
jxblum committed Sep 27, 2023
1 parent 7115f31 commit 0a5c342
Show file tree
Hide file tree
Showing 4 changed files with 765 additions and 277 deletions.
Loading

0 comments on commit 0a5c342

Please sign in to comment.