Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Replace Thread.sleep(..) with Future.get(timeout, :TimeUnit) in ClusterCommandExecutor.collectResults(..) #2719

Closed
wants to merge 3 commits into from

Conversation

jxblum
Copy link
Contributor

@jxblum jxblum commented Sep 27, 2023

Resolves #2518

OTHER NOTABLE CHANGES

Additionally, this PR fixes a couple bugs in collectResults(..), which were called out and captured in 2 test cases.

Previously, ClusterCommandExecutor.collectResults(..) had no test coverage.

@jxblum jxblum changed the title Conditionally wraps Thread.sleep(..) call in ClusterCommandExecutor.collectResults(..) Conditionally wrap Thread.sleep(..) call in ClusterCommandExecutor.collectResults(..) Sep 27, 2023
@jxblum jxblum added type: bug A general bug in: core Issues in core support type: enhancement A general enhancement labels Sep 27, 2023
@jxblum jxblum changed the title Conditionally wrap Thread.sleep(..) call in ClusterCommandExecutor.collectResults(..) FIRST DRAFT: Conditionally wrap Thread.sleep(..) call in ClusterCommandExecutor.collectResults(..) Sep 27, 2023
@jxblum jxblum marked this pull request as draft September 27, 2023 07:54
@jxblum jxblum changed the title FIRST DRAFT: Conditionally wrap Thread.sleep(..) call in ClusterCommandExecutor.collectResults(..) Conditionally wrap Thread.sleep(..) call in ClusterCommandExecutor.collectResults(..) Sep 27, 2023
@jxblum jxblum changed the title Conditionally wrap Thread.sleep(..) call in ClusterCommandExecutor.collectResults(..) Replace Thread.sleep(..) with Future.get(timeout, :TimeUnit) in ClusterCommandExecutor.collectResults(..) Sep 27, 2023
@jxblum jxblum marked this pull request as ready for review September 27, 2023 18:13
Copy link
Member

@mp911de mp911de left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This pull request is a no-go and impossible to review. There are changes all over the place without the possibility to distinguish relevant changes from polishing.

It is fine to address bugs while working on code, but these musts be at least encapsulated in its own commit.

Also, a lot of these changes create the impression that they are done just because of a different code style and not an improvement over the existing code.

}
}
return activeMasterNodes;
return getNodes().stream()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Stream usage introduces a huge performance penalty due to its excessive object allocation. We avoid stream usage to not accidentally introduce a performance regression unless code paths are guaranteed to run once or during the initialization phase.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reverted, completely.

@mp911de mp911de added for: team-attention An issue we need to discuss as a team to make progress status: blocked An issue that's blocked on an external project change labels Sep 28, 2023
@jxblum
Copy link
Contributor Author

jxblum commented Sep 29, 2023

I am not even sure how to respond to this feedback other than not all PRs can be cleanly reviewed through the GitHub lens, and sometimes should be viewed as the final product, compared to the original, in your IDE.

The underlying problem from Issue #2518 boils down to this: before and after Along the way, I refactored other, redundant but related parts of the codebase to improve readability and maintainability.

You cannot properly address the unnecessary and inappropriate Thread.sleep(..) invocation without structural and logical changes to the code. This led me to writing tests in the first place since there were no individual unit tests for the ClusterCommandExecutor.collectResults(..) method. Through analyzing and testing the code, I also discovered other bugs that I provided test cases for. While edge cases, they were bugs none-the-less.

These changes have very little to do with style vs. correctness, readability and overall long-term maintenance. I feel it is important to address issues the moment we see them and not put them off for later, or be so fine-grained it becomes an impediment to productivity, or even contributions.

In summary, and in total, there were not a lot of wide sweeping changes in this PR, especially when you consider the final product. The goal is always to leave it better than I found it. You can tell me if I am wrong, but I believe I accomplished that.

@jxblum jxblum force-pushed the issue/2518 branch 3 times, most recently from 066903c to c1890f5 Compare September 29, 2023 22:11
Replace Thread.sleep(..) with Future.get(timeout, :TimeUnit) for 10 microseconds. As a result, Future.isDone() and Future.isCancelled() are no longer necessary. Simply try to get the results within 10 us, and if a TimeoutException is thrown, then set done to false.

10 microseconds is 1/1000 of 10 milliseconds. This means a Redis cluster with 1000 nodes will run in a similar time to Thread.sleep(10L) if all Futures are blocked waiting for the computation to complete and take an equal amount of time to compute the result, which is rarely the case in practice, given different hardware configurations, data access patterns, load balancing/request routing, and so on. However, using Future.get(timeout, :TimeUnit) is more fair than Future.get(), which blocks until a result is returned or an ExecutionException is thrown, thereby starving computationally faster nodes vs. other nodes in the cluster that might be overloaded. In the meantime, some nodes may even complete in the short amount of time when waiting on a single node to complete.

10 microseconds was partially arbitrary, but no more so than Thread.sleep(10L) (10 milliseconds). The main objective was to give each node a chance to complete the computation in a moments notice balanced with the need to quickly check if the computation is done, hence Future.get(timeout, TimeUnit.MICROSECONDS) for sub-millisecond response times. This may need to be further tuned over time, but should serve as a reasonable baseline for the time being. Additionally, this was based on https://redis.io/docs/reference/cluster-spec/#overview-of-redis-cluster-main-components in the Redis documentation, recommending a cluster size of no more than 1000 nodes.

One optimization might be to reorder the Map of Futures at the end of each iteration by organizing Futures that are done first. Furthermore, Futures that have already completed could even be removed from the Map. Of course, there is little harm in keeping the completed Futures in the Map with the safeguard in place. This optimization was not included in theses changes simply because the optimization is most likely negligible and should be measured. Reconstructing a TreeMap should run mostly within log(n) time, but memory consumption should also be taken into consideration.

Add test coverage for ClusterCommandExecutor collectResults(..) method.

Cleanup compiler warnings in ClusterCommandExecutorUnitTests.

Closes #2518
@jxblum
Copy link
Contributor Author

jxblum commented Sep 29, 2023

I reworked this PR.

  1. First, I rebased this PR on the latest changes/checkins on main.
  2. Second, I completely removed all modifications to ClusterTopology (e.g. removing use of Streams among other things).
  3. Finally, I reorganized the ClusterCommandExecutor logic around collectResults(..) even further along with a few other code improvements.

This brings the change set to: ClusterCommandExecutor, ClusterCommandExecutorUnitTests and MockitoUtils. MockitoUtils is purely a test utility to facilitate testing with mocks.

Simplify tests. Reuse existing interfaces from Spring. Remove inappropriate nullability annotations and introduce annotations where required.
Consistently name callbacks. Make exception collector concept explicit. Reformat code.
mp911de pushed a commit that referenced this pull request Oct 11, 2023
……)`.

Replace Thread.sleep(..) with Future.get(timeout, :TimeUnit) for 10 microseconds. As a result, Future.isDone() and Future.isCancelled() are no longer necessary. Simply try to get the results within 10 us, and if a TimeoutException is thrown, then set done to false.

10 microseconds is 1/1000 of 10 milliseconds. This means a Redis cluster with 1000 nodes will run in a similar time to Thread.sleep(10L) if all Futures are blocked waiting for the computation to complete and take an equal amount of time to compute the result, which is rarely the case in practice, given different hardware configurations, data access patterns, load balancing/request routing, and so on. However, using Future.get(timeout, :TimeUnit) is more fair than Future.get(), which blocks until a result is returned or an ExecutionException is thrown, thereby starving computationally faster nodes vs. other nodes in the cluster that might be overloaded. In the meantime, some nodes may even complete in the short amount of time when waiting on a single node to complete.

10 microseconds was partially arbitrary, but no more so than Thread.sleep(10L) (10 milliseconds). The main objective was to give each node a chance to complete the computation in a moments notice balanced with the need to quickly check if the computation is done, hence Future.get(timeout, TimeUnit.MICROSECONDS) for sub-millisecond response times. This may need to be further tuned over time, but should serve as a reasonable baseline for the time being. Additionally, this was based on https://redis.io/docs/reference/cluster-spec/#overview-of-redis-cluster-main-components in the Redis documentation, recommending a cluster size of no more than 1000 nodes.

Add test coverage for ClusterCommandExecutor collectResults(..) method.

Cleanup compiler warnings in ClusterCommandExecutorUnitTests.

Closes #2518
Original pull request: #2719
mp911de added a commit that referenced this pull request Oct 11, 2023
Simplify tests. Reuse existing interfaces from Spring. Remove inappropriate nullability annotations and introduce annotations where required.

Replace Future mocking with easier to maintain and to read future method overrides. Remove superfluous code and replace with infrastructure classes provided by Spring Framework.

Consistently name callbacks. Make exception collector concept explicit. Reformat code.

See #2518
Original pull request: #2719
@mp911de mp911de added type: task A general task and removed type: bug A general bug status: blocked An issue that's blocked on an external project change type: enhancement A general enhancement for: team-attention An issue we need to discuss as a team to make progress labels Oct 11, 2023
@mp911de mp911de added this to the 3.2 RC1 (2023.1.0) milestone Oct 11, 2023
@mp911de mp911de closed this Oct 11, 2023
@mp911de mp911de deleted the issue/2518 branch October 11, 2023 12:56
@mp911de
Copy link
Member

mp911de commented Oct 11, 2023

That's merged and polished now.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
in: core Issues in core support type: task A general task
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Remove a unnecessary Thread.sleep() from ClusterCommandExecutor.collectResults()
2 participants