Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Java: have a future cache #43

Merged

Conversation

dkropachev
Copy link
Collaborator

We recreate future on every request, there is no reason to do so.

@dkropachev dkropachev requested review from nyh and Bouncheck November 14, 2024 19:06
@dkropachev dkropachev force-pushed the dk/java-have-resolve-future-cache branch from b23f592 to 9e911e0 Compare November 14, 2024 19:18
@dkropachev dkropachev force-pushed the dk/java-have-resolve-future-cache branch from 9e911e0 to 7f1adff Compare November 14, 2024 20:34
Copy link

@Bouncheck Bouncheck left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM as long as URI's equals satisfies the use case and if there is no problem with threads sharing the futures if they do.

@dkropachev
Copy link
Collaborator Author

LGTM as long as URI's equals satisfies the use case and if there is no problem with threads sharing the futures if they do.

It satisfies, and completed future could be easily shared and reused.

@dkropachev dkropachev merged commit c6f2c02 into scylladb:master Nov 15, 2024
1 check passed
@nyh
Copy link
Contributor

nyh commented Nov 17, 2024

I don't understand this patch at all :-( Not it's motivation - is creating a future slow? - nor the implementation... Is it really ok in Java to return the same future twice? In Seastar, for example, it is a faux pas.

If you didn't like the idiom used in the existing code,

CompletableFuture<Endpoint> f = new CompletableFuture<Endpoint>();
f.complete(ret);
return f;

how about using

return CompletableFuture<Endpoint>.completedFuture(ret);

This is a built-in idiom to return a ready value that is presumably implemented in the most efficient way possible - why do we need something more efficient that has dubious correctness?

@dkropachev
Copy link
Collaborator Author

I don't understand this patch at all :-( Not it's motivation - is creating a future slow?

It depends, creating future allocates memory, if code works in normal circumstances it does not do much, but if it is CPU or memory bound, it causes performance degradation, if we can, we need to avoid that.
resolveEndpoint is called on every request, so it is definitely worth that.

  • nor the implementation... Is it really ok in Java to return the same future twice? In Seastar, for example, it is a faux pas.

It is ok, in java future is just a class instance, final state of which to hold an instance of underlying type or exception.
It does not matter how many times you wait it, or obtain it's result.

If you didn't like the idiom used in the existing code,

CompletableFuture<Endpoint> f = new CompletableFuture<Endpoint>();
f.complete(ret);
return f;

how about using

return CompletableFuture<Endpoint>.completedFuture(ret);

This is a built-in idiom to return a ready value that is presumably implemented in the most efficient way possible - why do we need something more efficient that has dubious correctness?

It is still allocates and deallocates memory.

@nyh
Copy link
Contributor

nyh commented Nov 17, 2024

I don't understand this patch at all :-( Not it's motivation - is creating a future slow?

It depends, creating future allocates memory, if code works in normal circumstances it does not do much, but if it is CPU or memory bound, it causes performance degradation, if we can, we need to avoid that. resolveEndpoint is called on every request, so it is definitely worth that.

You may be more up-to-date than me, as I said it's been 15 years since my last "serious" work in Java, but did work in Java heavily for 5 years before that so I have some experience that seems to contradict what you're saying and I wonder what I am missing. My experience is that much of Java code keeps creating objects that are just returned once and then abandoned. There is no "deallocation cost" - the "generational" GC is written with the knowledge that in Java most objects in the will be abandoned and not survive their first GC, and makes this use case very efficient (and in particular - NOT proportional to the amound of allocations done in the young generation).

You are suggesting that return CompletableFuture<Endpoint>.completedFuture(ret); is wasteful. If so, why does it exist? What doesn't its documentation suggest it is inefficient?

Moreover, your alternative to completedFuture() isn't free either - in the best case it involves a hash table lookup. Are you sure that the memory allocation done by completeFuture() (not much more than a pointer increment!) is more expensive than this hash lookup? Is it so much more efficient that it's worth making the code more complex?

* nor the implementation... Is it really ok in Java to return the same future twice? In Seastar, for example, it is a faux pas.

It is ok, in java future is just a class instance, final state of which to hold an instance of underlying type or exception. It does not matter how many times you wait it, or obtain it's result.

Is this documented anywhere? It's not at all obvious to me whether this approach is supposed to work or not - in Seastar it wouldn't have worked. I found a package https://github.com/jmnarloch/completable-future-cache that also uses this idea of returning a multiple reference to the same CompletableFuture object ("all of them will gain access to the same instance of CompletableFuture that they can observe for completion") so I guess you are correct, and it's not a wrong approach.

I still wonder if it's worth it. Could you do me a favor and try some sort of loop or something to measure if completedFuture() is really as slow as you assume it to be?

how about using

return CompletableFuture<Endpoint>.completedFuture(ret);

This is a built-in idiom to return a ready value that is presumably implemented in the most efficient way possible - why do we need something more efficient that has dubious correctness?

It is still allocates and deallocates memory.

Java has no "deallocation" of memory. It uses a generational GC, where allocation is O(1) but very efficient (basically, just increment a pointer) and deallocation is, believe it or not, less than O(1) - i.e., deallocation is (more-or-less) free, it is the memory that is not deallocated - i.e., lives for a significant amount of time - which takes cycles.

So I doubt the completedFuture() function is inefficient as you think it is. It might even be a tiny bit faster than the hash-table lookup you replaced it with... And certainly easier to understand.

@dkropachev
Copy link
Collaborator Author

I don't understand this patch at all :-( Not it's motivation - is creating a future slow?

It depends, creating future allocates memory, if code works in normal circumstances it does not do much, but if it is CPU or memory bound, it causes performance degradation, if we can, we need to avoid that. resolveEndpoint is called on every request, so it is definitely worth that.

You may be more up-to-date than me, as I said it's been 15 years since my last "serious" work in Java, but did work in Java heavily for 5 years before that so I have some experience that seems to contradict what you're saying and I wonder what I am missing. My experience is that much of Java code keeps creating objects that are just returned once and then abandoned. There is no "deallocation cost" - the "generational" GC is written with the knowledge that in Java most objects in the will be abandoned and not survive their first GC, and makes this use case very efficient (and in particular - NOT proportional to the amound of allocations done in the young generation).

You are suggesting that return CompletableFuture<Endpoint>.completedFuture(ret); is wasteful. If so, why does it exist? What doesn't its documentation suggest it is inefficient?

Moreover, your alternative to completedFuture() isn't free either - in the best case it involves a hash table lookup. Are you sure that the memory allocation done by completeFuture() (not much more than a pointer increment!) is more expensive than this hash lookup? Is it so much more efficient that it's worth making the code more complex?

* nor the implementation... Is it really ok in Java to return the same future twice? In Seastar, for example, it is a faux pas.

It is ok, in java future is just a class instance, final state of which to hold an instance of underlying type or exception. It does not matter how many times you wait it, or obtain it's result.

Is this documented anywhere? It's not at all obvious to me whether this approach is supposed to work or not - in Seastar it wouldn't have worked. I found a package https://github.com/jmnarloch/completable-future-cache that also uses this idea of returning a multiple reference to the same CompletableFuture object ("all of them will gain access to the same instance of CompletableFuture that they can observe for completion") so I guess you are correct, and it's not a wrong approach.

I still wonder if it's worth it. Could you do me a favor and try some sort of loop or something to measure if completedFuture() is really as slow as you assume it to be?

how about using

return CompletableFuture<Endpoint>.completedFuture(ret);

This is a built-in idiom to return a ready value that is presumably implemented in the most efficient way possible - why do we need something more efficient that has dubious correctness?

It is still allocates and deallocates memory.

Java has no "deallocation" of memory. It uses a generational GC, where allocation is O(1) but very efficient (basically, just increment a pointer) and deallocation is, believe it or not, less than O(1) - i.e., deallocation is (more-or-less) free, it is the memory that is not deallocated - i.e., lives for a significant amount of time - which takes cycles.

So I doubt the completedFuture() function is inefficient as you think it is. It might even be a tiny bit faster than the hash-table lookup you replaced it with... And certainly easier to understand.

It is not about deallocation per say, it is more about sheer amount of objects that GC needs to track, less objects to track, less problem GC has.
Let's say that you have 6 nodes, and send 1000 rq/s.
Old code will allocate 1000 futures plus 1000 endpoints a second and they are going to finalized at young or at s0.
New code will allocate 6 futures and 6 endpoints once, all of them end up in old gen where GC spends minimal possible resources on reclamation.
Even if you don't take GC part into account new code looks good and if you consider that in order to pass an object from one genertion to another it spends CPU cycles for every object in the original generation, it looks even better.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants