Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Remove Thread.sleep(..) from collectResults(..).
Replace Thread.sleep(..) with Future.get(timeout, :TimeUnit) for 10 microseconds. As a result, Future.isDone() and Future.isCancelled() are no longer necessary. Simply try to get the results within 10 us, and if a TimeoutException is thrown, then set done to false. 10 microseconds is 1/1000 of 10 milliseconds. This means a Redis cluster with 1000 nodes will run in a similar time to Thread.sleep(10L) if all Futures are blocked waiting for the compuptation to complete and take an equal amount of time to compute the result, which is rarely the case in practice, given different hardware configurations, data access patterns, load balancing/requst routing etc. However, using Future.get(timeout, :TimeUnit) is more fair than Future.get(), which blocks until a result is returned or an ExecutionException is thrown, thereby starving computationally faster nodes vs. other nodes in the cluster that might be overloaded. In the meantime, some nodes may even complete in the short amount of time while waiting for other nodes to complete. 10 microseconds was partially arbitrary, but not anymore so than Thread.sleep(10L) (10 millisconds). The objective was to give each node a chance to complete the computation in a moments notice balanced with the need to quickly check if the computation is "done", hence Future.get(timeout, TimeUnit.MICROSECONDS), or sub-millisecond response times. This may need to be futher tuned over time, but should serve as a reasonable baseline for the time being. This was also based on https://redis.io/docs/reference/cluster-spec/#overview-of-redis-cluster-main-components in the Redis documentation, "recommending" a cluster size of no more than 1000 nodes. One optimimzation could be to reorder the Map of Futures at the end of each iteration to organize Futures that are done. Furthermore, Futures that have already completed could be removed from the Map. Of course, there is little harm in keep the future in the Map witht the "safeguard" in place. This was not included in the changes simply because the optimization is most likely neglible and should be measured. Reconstructing a TreeMap should run mostly within log(n) time, but memory should also be taken into consideration. Add test coverage for ClusterCommandExecutor collectResults(..) method. Cleanup compiler warnings in ClusterCommandExecutorUnitTests. Closes #2518
- Loading branch information