Java: have a future cache #43

dkropachev · 2024-11-14T19:06:36Z

We recreate future on every request, there is no reason to do so.

Bouncheck

LGTM as long as URI's equals satisfies the use case and if there is no problem with threads sharing the futures if they do.

dkropachev · 2024-11-15T15:33:05Z

LGTM as long as URI's equals satisfies the use case and if there is no problem with threads sharing the futures if they do.

It satisfies, and completed future could be easily shared and reused.

nyh · 2024-11-17T10:21:50Z

I don't understand this patch at all :-( Not it's motivation - is creating a future slow? - nor the implementation... Is it really ok in Java to return the same future twice? In Seastar, for example, it is a faux pas.

If you didn't like the idiom used in the existing code,

CompletableFuture<Endpoint> f = new CompletableFuture<Endpoint>();
f.complete(ret);
return f;

how about using

return CompletableFuture<Endpoint>.completedFuture(ret);

This is a built-in idiom to return a ready value that is presumably implemented in the most efficient way possible - why do we need something more efficient that has dubious correctness?

dkropachev · 2024-11-17T13:21:54Z

I don't understand this patch at all :-( Not it's motivation - is creating a future slow?

It depends, creating future allocates memory, if code works in normal circumstances it does not do much, but if it is CPU or memory bound, it causes performance degradation, if we can, we need to avoid that.
resolveEndpoint is called on every request, so it is definitely worth that.

nor the implementation... Is it really ok in Java to return the same future twice? In Seastar, for example, it is a faux pas.

It is ok, in java future is just a class instance, final state of which to hold an instance of underlying type or exception.
It does not matter how many times you wait it, or obtain it's result.

If you didn't like the idiom used in the existing code,
CompletableFuture<Endpoint> f = new CompletableFuture<Endpoint>();
f.complete(ret);
return f;
how about using
return CompletableFuture<Endpoint>.completedFuture(ret);
This is a built-in idiom to return a ready value that is presumably implemented in the most efficient way possible - why do we need something more efficient that has dubious correctness?

It is still allocates and deallocates memory.

nyh · 2024-11-17T14:03:16Z

I don't understand this patch at all :-( Not it's motivation - is creating a future slow?

It depends, creating future allocates memory, if code works in normal circumstances it does not do much, but if it is CPU or memory bound, it causes performance degradation, if we can, we need to avoid that. resolveEndpoint is called on every request, so it is definitely worth that.

You may be more up-to-date than me, as I said it's been 15 years since my last "serious" work in Java, but did work in Java heavily for 5 years before that so I have some experience that seems to contradict what you're saying and I wonder what I am missing. My experience is that much of Java code keeps creating objects that are just returned once and then abandoned. There is no "deallocation cost" - the "generational" GC is written with the knowledge that in Java most objects in the will be abandoned and not survive their first GC, and makes this use case very efficient (and in particular - NOT proportional to the amound of allocations done in the young generation).

You are suggesting that return CompletableFuture<Endpoint>.completedFuture(ret); is wasteful. If so, why does it exist? What doesn't its documentation suggest it is inefficient?

Moreover, your alternative to completedFuture() isn't free either - in the best case it involves a hash table lookup. Are you sure that the memory allocation done by completeFuture() (not much more than a pointer increment!) is more expensive than this hash lookup? Is it so much more efficient that it's worth making the code more complex?

* nor the implementation... Is it really ok in Java to return the same future twice? In Seastar, for example, it is a faux pas.
It is ok, in java future is just a class instance, final state of which to hold an instance of underlying type or exception. It does not matter how many times you wait it, or obtain it's result.

Is this documented anywhere? It's not at all obvious to me whether this approach is supposed to work or not - in Seastar it wouldn't have worked. I found a package https://github.com/jmnarloch/completable-future-cache that also uses this idea of returning a multiple reference to the same CompletableFuture object ("all of them will gain access to the same instance of CompletableFuture that they can observe for completion") so I guess you are correct, and it's not a wrong approach.

I still wonder if it's worth it. Could you do me a favor and try some sort of loop or something to measure if completedFuture() is really as slow as you assume it to be?

how about using
return CompletableFuture<Endpoint>.completedFuture(ret);
This is a built-in idiom to return a ready value that is presumably implemented in the most efficient way possible - why do we need something more efficient that has dubious correctness?
It is still allocates and deallocates memory.

Java has no "deallocation" of memory. It uses a generational GC, where allocation is O(1) but very efficient (basically, just increment a pointer) and deallocation is, believe it or not, less than O(1) - i.e., deallocation is (more-or-less) free, it is the memory that is not deallocated - i.e., lives for a significant amount of time - which takes cycles.

So I doubt the completedFuture() function is inefficient as you think it is. It might even be a tiny bit faster than the hash-table lookup you replaced it with... And certainly easier to understand.

dkropachev · 2024-11-17T15:02:35Z

I don't understand this patch at all :-( Not it's motivation - is creating a future slow?

It depends, creating future allocates memory, if code works in normal circumstances it does not do much, but if it is CPU or memory bound, it causes performance degradation, if we can, we need to avoid that. resolveEndpoint is called on every request, so it is definitely worth that.

You may be more up-to-date than me, as I said it's been 15 years since my last "serious" work in Java, but did work in Java heavily for 5 years before that so I have some experience that seems to contradict what you're saying and I wonder what I am missing. My experience is that much of Java code keeps creating objects that are just returned once and then abandoned. There is no "deallocation cost" - the "generational" GC is written with the knowledge that in Java most objects in the will be abandoned and not survive their first GC, and makes this use case very efficient (and in particular - NOT proportional to the amound of allocations done in the young generation).

You are suggesting that return CompletableFuture<Endpoint>.completedFuture(ret); is wasteful. If so, why does it exist? What doesn't its documentation suggest it is inefficient?

Moreover, your alternative to completedFuture() isn't free either - in the best case it involves a hash table lookup. Are you sure that the memory allocation done by completeFuture() (not much more than a pointer increment!) is more expensive than this hash lookup? Is it so much more efficient that it's worth making the code more complex?
* nor the implementation... Is it really ok in Java to return the same future twice? In Seastar, for example, it is a faux pas.
It is ok, in java future is just a class instance, final state of which to hold an instance of underlying type or exception. It does not matter how many times you wait it, or obtain it's result.
Is this documented anywhere? It's not at all obvious to me whether this approach is supposed to work or not - in Seastar it wouldn't have worked. I found a package https://github.com/jmnarloch/completable-future-cache that also uses this idea of returning a multiple reference to the same CompletableFuture object ("all of them will gain access to the same instance of CompletableFuture that they can observe for completion") so I guess you are correct, and it's not a wrong approach.

I still wonder if it's worth it. Could you do me a favor and try some sort of loop or something to measure if completedFuture() is really as slow as you assume it to be?
how about using
return CompletableFuture<Endpoint>.completedFuture(ret);
This is a built-in idiom to return a ready value that is presumably implemented in the most efficient way possible - why do we need something more efficient that has dubious correctness?
It is still allocates and deallocates memory.
Java has no "deallocation" of memory. It uses a generational GC, where allocation is O(1) but very efficient (basically, just increment a pointer) and deallocation is, believe it or not, less than O(1) - i.e., deallocation is (more-or-less) free, it is the memory that is not deallocated - i.e., lives for a significant amount of time - which takes cycles.

So I doubt the completedFuture() function is inefficient as you think it is. It might even be a tiny bit faster than the hash-table lookup you replaced it with... And certainly easier to understand.

It is not about deallocation per say, it is more about sheer amount of objects that GC needs to track, less objects to track, less problem GC has.
Let's say that you have 6 nodes, and send 1000 rq/s.
Old code will allocate 1000 futures plus 1000 endpoints a second and they are going to finalized at young or at s0.
New code will allocate 6 futures and 6 endpoints once, all of them end up in old gen where GC spends minimal possible resources on reclamation.
Even if you don't take GC part into account new code looks good and if you consider that in order to pass an object from one genertion to another it spends CPU cycles for every object in the original generation, it looks even better.

dkropachev requested review from nyh and Bouncheck November 14, 2024 19:06

dkropachev force-pushed the dk/java-have-resolve-future-cache branch from b23f592 to 9e911e0 Compare November 14, 2024 19:18

Java: have a future cache

7f1adff

dkropachev force-pushed the dk/java-have-resolve-future-cache branch from 9e911e0 to 7f1adff Compare November 14, 2024 20:34

Bouncheck approved these changes Nov 15, 2024

View reviewed changes

dkropachev merged commit c6f2c02 into scylladb:master Nov 15, 2024
1 check passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Java: have a future cache #43

Java: have a future cache #43

dkropachev commented Nov 14, 2024

Bouncheck left a comment

dkropachev commented Nov 15, 2024

nyh commented Nov 17, 2024 •

edited

Loading

dkropachev commented Nov 17, 2024

nyh commented Nov 17, 2024

dkropachev commented Nov 17, 2024

Java: have a future cache #43

Java: have a future cache #43

Conversation

dkropachev commented Nov 14, 2024

Bouncheck left a comment

Choose a reason for hiding this comment

dkropachev commented Nov 15, 2024

nyh commented Nov 17, 2024 • edited Loading

dkropachev commented Nov 17, 2024

nyh commented Nov 17, 2024

dkropachev commented Nov 17, 2024

nyh commented Nov 17, 2024 •

edited

Loading