Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: add jitter to max lifetime in connection pool #1496

Open
wants to merge 4 commits into
base: master
Choose a base branch
from

Conversation

priyanshu-d11
Copy link

Connection Storm Prevention with Jitter in Connection Pool #1495

Introduce connection jitter to randomize connection timing:

  1. Add a jitter parameter to PoolOptions
  2. When fetching a new connection from pool, add jitter to maxlifetime
  3. This spreads out connection creation/renewal across a time window

Fix: #1495

@priyanshu-d11 priyanshu-d11 force-pushed the feat/jitter-in-maxlifetime branch 2 times, most recently from dedff31 to 3629f5d Compare March 7, 2025 12:55
@priyanshu-d11
Copy link
Author

@vietj Changes for the jitter have been added. I have some doubts regarding the test cases. One way to verify this is by creating a pool and fetching a connection. Then, using a timer, check every few milliseconds whether the connection is still active. Once it closes, record the time. Repeat this process three times and ensure that each recorded time is different.

@priyanshu-d11 priyanshu-d11 force-pushed the feat/jitter-in-maxlifetime branch from 3629f5d to 2235e03 Compare March 11, 2025 10:08
@vietj
Copy link
Member

vietj commented Mar 12, 2025

how can we test this ?

@priyanshu-d11 priyanshu-d11 force-pushed the feat/jitter-in-maxlifetime branch from 2235e03 to 61d0455 Compare March 17, 2025 12:44
@vietj
Copy link
Member

vietj commented Mar 18, 2025

@priyanshu-d11 I think we could test with a TCP server that just does nothing and create a pool of 100 connections and perform a connect storm and measure the connection time between the client and server and ensure that we get a spread of the collected latencies ? we could add a min jitter time that could be helpful for testing, e.g. if we set 30 milliseconds then we can check that there are no latencies under 30 milliseconds

@priyanshu-d11
Copy link
Author

priyanshu-d11 commented Mar 19, 2025

I have added a test case. Here's what the test does:

  • We set up a pool with a maxlifetime of 2 seconds and a jitter of 400ms
  • Then we keep track of when connections change (by watching their PIDs)
  • We verify that the time between connection changes is somewhere between 1.6s and 2.4s (2s ± 400ms)
  • We run it twice to make sure it's consistent

Let me know if you want me to explain any part in more detail or if you have any other idea to test this scenario.

@vietj

import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.*;

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove star import

@@ -105,6 +110,7 @@ public class PoolOptions {
private boolean shared = DEFAULT_SHARED_POOL;
private String name = DEFAULT_NAME;
private int eventLoopSize = DEFAULT_EVENT_LOOP_SIZE;
private int jitter = DEFAULT_JITTER;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we also need a jitterTimeUnit property which should be by default Willis

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added

@vietj
Copy link
Member

vietj commented Mar 20, 2025

I am looking at the contribution, how does it work initially when the pool is empty and many connections attempts can be made simultaneously ?

Comment on lines 316 to 317
long randomizedJitter = (maxLifetime > 0 && jitter > 0 && maxLifetime > jitter) ? ThreadLocalRandom.current().nextLong(-jitter, jitter + 1) : 0;
this.lifetimeEvictionTimestamp = maxLifetime > 0 ? System.currentTimeMillis() + maxLifetime + randomizedJitter : Long.MAX_VALUE;
Copy link

@chirag-manwani chirag-manwani Mar 20, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
long randomizedJitter = (maxLifetime > 0 && jitter > 0 && maxLifetime > jitter) ? ThreadLocalRandom.current().nextLong(-jitter, jitter + 1) : 0;
this.lifetimeEvictionTimestamp = maxLifetime > 0 ? System.currentTimeMillis() + maxLifetime + randomizedJitter : Long.MAX_VALUE;
this.lifetimeEvictionTimestamp = maxLifetime > 0 ? System.currentTimeMillis() + maxLifetime + Math.max(0, ThreadLocalRandom.current().nextLong(0, jitter)) : Long.MAX_VALUE;

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added this with nextLong(0, jitter+1)

@priyanshu-d11
Copy link
Author

@vietj
Modified the code to have jitter added to max lifetime, rather than ranging from (-jitter, jitter+1).

Added a test case which does the following:
This test verifies that our connection pool correctly enforces its configured maxLifetime (5000 ms) and jitter (1 second) settings. It does so by:

Acquiring 50 connections in parallel.

  1. For each connection, query the backend process ID (PID) and its connection start time.
  2. Closing the connection and then polling PostgreSQL’s pg_stat_activity until the connection is no longer visible (i.e., it has been closed by the pool).
  3. Recording the time when the connection is detected as closed.
  4. Calculate each connection's duration (close time – start time).
  5. Grouping these durations into buckets (each with a width of jitter/5 = 200 ms) based on an offset from max lifetime.
  6. Finally, asserting that at least 5 distinct buckets are populated ensures that the jitter setting effectively introduces a spread in connection lifetimes.

@priyanshu-d11
Copy link
Author

@vietj While reviewing the test cases, I came across one for a maxLifetime where a connection is created and immediately closed. The test correctly verified that the connection was closed after the expected duration. I tried replicating this by setting the pool size to 10 and requesting multiple connections, but I wasn't able to get it working as expected. Is there a better approach to test this scenario?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Connection Storm Prevention with Jitter in Connection Pool
3 participants