Replace `Thread.sleep(..)` with `Future.get(timeout, :TimeUnit)` in `ClusterCommandExecutor.collectResults(..)` #2719

jxblum · 2023-09-27T06:56:52Z

Resolves #2518

OTHER NOTABLE CHANGES

Additionally, this PR fixes a couple of bugs in collectResults(..), which are called out and captured in 2 test cases.

Previously, ClusterCommandExecutor.collectResults(..) had no test coverage.

mp911de

This pull request is a no-go and impossible to review. There are changes all over the place without the possibility to distinguish relevant changes from polishing.

It is fine to address bugs while working on code, but these must at least be encapsulated in their own commit.

Also, a lot of these changes create the impression that they are done just because of a different code style and not an improvement over the existing code.

mp911de · 2023-09-28T07:03:01Z

src/main/java/org/springframework/data/redis/connection/ClusterTopology.java

-			}
-		}
-		return activeMasterNodes;
+		return getNodes().stream()


Stream usage introduces a huge performance penalty due to its excessive object allocation. We avoid stream usage to not accidentally introduce a performance regression unless code paths are guaranteed to run once or during the initialization phase.

Reverted, completely.
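
For readers following the thread, the two styles under discussion can be sketched side by side. This is only an illustration: the `Node` class, its `master` and `connected` fields, and the filter condition are hypothetical stand-ins, not the actual `ClusterTopology` code, and the sketch makes no performance claim of its own. Both methods return the same result; the loop form is the one the reviewer prefers on frequently executed code paths.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

final class ActiveMasterFilterSketch {

    /** Hypothetical stand-in for a cluster node; not the real Spring Data Redis type. */
    static final class Node {
        final String id;
        final boolean master;
        final boolean connected;

        Node(String id, boolean master, boolean connected) {
            this.id = id;
            this.master = master;
            this.connected = connected;
        }
    }

    /** Loop-based filtering: one result list, no intermediate pipeline objects. */
    static List<Node> activeMastersWithLoop(List<Node> nodes) {
        List<Node> active = new ArrayList<>(nodes.size());
        for (Node node : nodes) {
            if (node.master && node.connected) {
                active.add(node);
            }
        }
        return active;
    }

    /** Stream-based filtering: same result, expressed as a pipeline. */
    static List<Node> activeMastersWithStream(List<Node> nodes) {
        return nodes.stream()
                .filter(node -> node.master && node.connected)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Node> nodes = List.of(
                new Node("a", true, true),
                new Node("b", false, true),
                new Node("c", true, false));

        System.out.println(activeMastersWithLoop(nodes).size());   // prints 1
        System.out.println(activeMastersWithStream(nodes).size()); // prints 1
    }
}
```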

jxblum · 2023-09-29T21:11:54Z

I am not even sure how to respond to this feedback other than to say that not all PRs can be cleanly reviewed through the GitHub lens, and sometimes should be viewed as the final product, compared to the original, in your IDE.

The underlying problem from Issue #2518 boils down to this: before and after. Along the way, I refactored other redundant but related parts of the codebase to improve readability and maintainability.

You cannot properly address the unnecessary and inappropriate Thread.sleep(..) invocation without structural and logical changes to the code. This led me to write tests in the first place, since there were no individual unit tests for the ClusterCommandExecutor.collectResults(..) method. Through analyzing and testing the code, I also discovered other bugs that I provided test cases for. While edge cases, they were bugs nonetheless.

These changes have very little to do with style; they are about correctness, readability and overall long-term maintenance. I feel it is important to address issues the moment we see them and not put them off for later, or be so fine-grained that it becomes an impediment to productivity, or even contributions.

In summary, and in total, there were not a lot of wide sweeping changes in this PR, especially when you consider the final product. The goal is always to leave it better than I found it. You can tell me if I am wrong, but I believe I accomplished that.

Replace Thread.sleep(..) with Future.get(timeout, :TimeUnit) for 10 microseconds.

As a result, Future.isDone() and Future.isCancelled() are no longer necessary. Simply try to get the results within 10 us, and if a TimeoutException is thrown, then set done to false.

10 microseconds is 1/1000 of 10 milliseconds. This means a Redis cluster with 1000 nodes will run in a similar time to Thread.sleep(10L) if all Futures are blocked waiting for the computation to complete and take an equal amount of time to compute the result, which is rarely the case in practice, given different hardware configurations, data access patterns, load balancing/request routing, and so on.

However, using Future.get(timeout, :TimeUnit) is fairer than Future.get(), which blocks until a result is returned or an ExecutionException is thrown, thereby starving computationally faster nodes vs. other nodes in the cluster that might be overloaded. In the meantime, some nodes may even complete in the short amount of time spent waiting on a single node to complete.

10 microseconds was partially arbitrary, but no more so than Thread.sleep(10L) (10 milliseconds). The main objective was to give each node a chance to complete the computation at a moment's notice, balanced with the need to quickly check if the computation is done, hence Future.get(timeout, TimeUnit.MICROSECONDS) for sub-millisecond response times. This may need to be further tuned over time, but should serve as a reasonable baseline for the time being. Additionally, this was based on https://redis.io/docs/reference/cluster-spec/#overview-of-redis-cluster-main-components in the Redis documentation, recommending a cluster size of no more than 1000 nodes.

One optimization might be to reorder the Map of Futures at the end of each iteration by organizing Futures that are done first. Furthermore, Futures that have already completed could even be removed from the Map. Of course, there is little harm in keeping the completed Futures in the Map with the safeguard in place. This optimization was not included in these changes simply because the optimization is most likely negligible and should be measured. Reconstructing a TreeMap should run mostly within log(n) time, but memory consumption should also be taken into consideration.

Add test coverage for ClusterCommandExecutor collectResults(..) method.

Clean up compiler warnings in ClusterCommandExecutorUnitTests.

Closes #2518
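
The polling approach described in this commit message can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code from the PR: the class `TimedPollSketch`, the `collect` helper, and the node names are hypothetical. It shows the timed `Future.get(..)` call, the `TimeoutException` handling that sets `done` to false, a safeguard that skips Futures whose results were already collected, and the worst-case arithmetic for 1000 nodes.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

final class TimedPollSketch {

    /** Hypothetical helper: polls every Future with a short timed get until all have completed. */
    static <K, V> Map<K, V> collect(Map<K, Future<V>> futures, long timeoutMicros)
            throws InterruptedException, ExecutionException {

        Map<K, V> results = new LinkedHashMap<>();
        boolean done = false;

        while (!done) {
            done = true;
            for (Map.Entry<K, Future<V>> entry : futures.entrySet()) {
                if (results.containsKey(entry.getKey())) {
                    continue; // safeguard: this Future's result was already collected
                }
                try {
                    // Wait briefly instead of sleeping; returns immediately if already complete.
                    V value = entry.getValue().get(timeoutMicros, TimeUnit.MICROSECONDS);
                    results.put(entry.getKey(), value);
                } catch (TimeoutException ex) {
                    done = false; // not finished yet; revisit on the next pass
                }
            }
        }
        return results;
    }

    public static void main(String[] args) throws Exception {
        Map<String, Future<String>> futures = new LinkedHashMap<>();
        futures.put("node-1", CompletableFuture.completedFuture("PONG"));
        futures.put("node-2", CompletableFuture.supplyAsync(() -> {
            try {
                Thread.sleep(20); // simulate a slower node
            } catch (InterruptedException ex) {
                Thread.currentThread().interrupt();
            }
            return "PONG";
        }));

        System.out.println(collect(futures, 10L)); // prints {node-1=PONG, node-2=PONG}

        // Worst case per pass: 1000 blocked nodes x 10 microseconds each = 10 milliseconds.
        System.out.println(TimeUnit.MICROSECONDS.toMillis(1000 * 10L)); // prints 10
    }
}
```

In this sketch an `ExecutionException` from any Future simply propagates to the caller; how the real method collects and reports per-node failures is outside what the sketch covers.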

jxblum · 2023-09-29T22:17:29Z

I reworked this PR.

First, I rebased this PR on the latest changes/checkins on main.
Second, I completely removed all modifications to ClusterTopology (e.g. removing use of Streams among other things).
Finally, I reorganized the ClusterCommandExecutor logic around collectResults(..) even further along with a few other code improvements.

This brings the change set to: ClusterCommandExecutor, ClusterCommandExecutorUnitTests and MockitoUtils. MockitoUtils is purely a test utility to facilitate testing with mocks.

Simplify tests. Reuse existing interfaces from Spring. Remove inappropriate nullability annotations and introduce annotations where required. Consistently name callbacks. Make exception collector concept explicit. Reformat code.

……)`.

Replace Thread.sleep(..) with Future.get(timeout, :TimeUnit) for 10 microseconds.

As a result, Future.isDone() and Future.isCancelled() are no longer necessary. Simply try to get the results within 10 us, and if a TimeoutException is thrown, then set done to false.

10 microseconds is 1/1000 of 10 milliseconds. This means a Redis cluster with 1000 nodes will run in a similar time to Thread.sleep(10L) if all Futures are blocked waiting for the computation to complete and take an equal amount of time to compute the result, which is rarely the case in practice, given different hardware configurations, data access patterns, load balancing/request routing, and so on.

However, using Future.get(timeout, :TimeUnit) is fairer than Future.get(), which blocks until a result is returned or an ExecutionException is thrown, thereby starving computationally faster nodes vs. other nodes in the cluster that might be overloaded. In the meantime, some nodes may even complete in the short amount of time spent waiting on a single node to complete.

10 microseconds was partially arbitrary, but no more so than Thread.sleep(10L) (10 milliseconds). The main objective was to give each node a chance to complete the computation at a moment's notice, balanced with the need to quickly check if the computation is done, hence Future.get(timeout, TimeUnit.MICROSECONDS) for sub-millisecond response times. This may need to be further tuned over time, but should serve as a reasonable baseline for the time being. Additionally, this was based on https://redis.io/docs/reference/cluster-spec/#overview-of-redis-cluster-main-components in the Redis documentation, recommending a cluster size of no more than 1000 nodes.

Add test coverage for ClusterCommandExecutor collectResults(..) method.

Clean up compiler warnings in ClusterCommandExecutorUnitTests.

Closes #2518
Original pull request: #2719

Simplify tests. Reuse existing interfaces from Spring. Remove inappropriate nullability annotations and introduce annotations where required. Replace Future mocking with easier to maintain and to read future method overrides. Remove superfluous code and replace with infrastructure classes provided by Spring Framework. Consistently name callbacks. Make exception collector concept explicit. Reformat code. See #2518 Original pull request: #2719

mp911de · 2023-10-11T12:56:39Z

That's merged and polished now.

jxblum changed the title ~~Conditionally wraps Thread.sleep(..) call in ClusterCommandExecutor.collectResults(..)~~ Conditionally wrap Thread.sleep(..) call in ClusterCommandExecutor.collectResults(..) Sep 27, 2023

jxblum mentioned this pull request Sep 27, 2023

Remove a unnecessary Thread.sleep() from ClusterCommandExecutor.collectResults() #2518

Closed

jxblum added the type: bug, in: core, and type: enhancement labels Sep 27, 2023

jxblum requested review from mp911de and christophstrobl September 27, 2023 07:01

jxblum force-pushed the issue/2518 branch from 06c0d61 to e102105 Compare September 27, 2023 07:10

jxblum removed request for mp911de and christophstrobl September 27, 2023 07:54

jxblum changed the title ~~Conditionally wrap Thread.sleep(..) call in ClusterCommandExecutor.collectResults(..)~~ FIRST DRAFT: Conditionally wrap Thread.sleep(..) call in ClusterCommandExecutor.collectResults(..) Sep 27, 2023

jxblum marked this pull request as draft September 27, 2023 07:54

jxblum force-pushed the issue/2518 branch from e102105 to 19e7bc8 Compare September 27, 2023 18:07

jxblum changed the title ~~FIRST DRAFT: Conditionally wrap Thread.sleep(..) call in ClusterCommandExecutor.collectResults(..)~~ Conditionally wrap Thread.sleep(..) call in ClusterCommandExecutor.collectResults(..) Sep 27, 2023

jxblum changed the title ~~Conditionally wrap Thread.sleep(..) call in ClusterCommandExecutor.collectResults(..)~~ Replace Thread.sleep(..) with Future.get(timeout, :TimeUnit) in ClusterCommandExecutor.collectResults(..) Sep 27, 2023

jxblum force-pushed the issue/2518 branch from 19e7bc8 to 0a5c342 Compare September 27, 2023 18:12

jxblum marked this pull request as ready for review September 27, 2023 18:13

jxblum requested review from mp911de and christophstrobl September 27, 2023 18:21

jxblum force-pushed the issue/2518 branch from 0a5c342 to e70d4b4 Compare September 27, 2023 18:32

mp911de requested changes Sep 28, 2023

mp911de added the for: team-attention and status: blocked labels Sep 28, 2023

Prepare issue branch.

f7c66a6

jxblum force-pushed the issue/2518 branch 3 times, most recently from 066903c to c1890f5 Compare September 29, 2023 22:11

jxblum force-pushed the issue/2518 branch from c1890f5 to 2783082 Compare September 29, 2023 22:16

Polishing.

0cfe8bd

mp911de added the type: task label and removed the type: bug, status: blocked, type: enhancement, and for: team-attention labels Oct 11, 2023

mp911de added this to the 3.2 RC1 (2023.1.0) milestone Oct 11, 2023

mp911de closed this Oct 11, 2023

mp911de deleted the issue/2518 branch October 11, 2023 12:56
