Feature timeouts and stats #251

Open
wants to merge 4 commits into base: master

Conversation

seanmcevoy

This pull request allows us to configure separate queue and service timeouts instead of just the total timeout.

The problem it addresses is that each request is blocking, so if the client is used in a near-capacity application and request queues start to build up, it can become very inefficient.

The inefficiency in the current timeout mechanism is caused by the fact that the timeout covers both the queue and the service time. If the request queue builds up, the requests we put on the wire are the oldest ones, which are the most likely to expire while being serviced. If one expires while being serviced, the connection is broken and re-formed, and the next-oldest request is taken. So if we're in a constantly overloaded state, the client will spend all its time disconnecting and reconnecting to the server and serve no requests at all.

To avoid this, the new mechanism allows us to configure separate queue and servicing timeouts.
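
A minimal sketch of the idea (this is not the code in this pull request; the module, record fields and timeout names are made up for illustration): once the two budgets are separate, a request that has already used up its queue allowance can be failed at dequeue time instead of being put on the wire and then expiring mid-service.

```erlang
%% Sketch only: hypothetical split of queue vs. service timeouts,
%% not the implementation in this pull request.
-module(timeout_split_sketch).
-export([dequeue/2]).

-record(req, {from, payload, enqueued_at}).

%% QueueTimeout and ServiceTimeout are milliseconds; enqueued_at is
%% taken from erlang:monotonic_time(millisecond) at enqueue time.
dequeue(#req{from = From, enqueued_at = T0} = Req,
        {QueueTimeout, ServiceTimeout}) ->
    Waited = erlang:monotonic_time(millisecond) - T0,
    case Waited > QueueTimeout of
        true ->
            %% Stale in the queue: fail it here rather than sending it
            %% and breaking the connection when it expires mid-service.
            gen_server:reply(From, {error, timeout}),
            dropped;
        false ->
            %% Still within its queue budget: only the service timeout
            %% applies from this point on.
            {send, Req, ServiceTimeout}
    end.
```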

This is demonstrated in overload_demo_test, where we set up a dummy server with a delay and then overload it with 200 requests. The old mechanism can only service the first 2 or 3 requests and will time out for the rest of the test, while the new mechanism services a constant ~60% of the requests.
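
For a sense of the shape of such a check (a hand-written sketch, not the actual overload_demo_test; run/2 and the request fun are invented for illustration), the test essentially fires N blocking calls concurrently and counts how many come back ok rather than {error, timeout}:

```erlang
-module(overload_sketch).
-export([run/2]).

%% Fire N concurrent requests. RequestFun is a 0-arity fun standing in
%% for a blocking client call against the slow dummy server; it should
%% return ok on success or {error, timeout} on expiry.
run(RequestFun, N) ->
    Parent = self(),
    Refs = [begin
                Ref = make_ref(),
                spawn(fun() -> Parent ! {Ref, RequestFun()} end),
                Ref
            end || _ <- lists:seq(1, N)],
    Results = [receive {Ref, Result} -> Result end || Ref <- Refs],
    Served = length([ok || ok <- Results]),
    io:format("served ~p of ~p requests~n", [Served, N]),
    Served.
```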

The old mechanism is retained and works exactly as before; backward compatibility is tested in timeout_conn_test & timeout_no_conn_test.

@seanmcevoy mentioned this pull request on Nov 20, 2015 (closed)
@lukebakken
Contributor

If you have time, could you do some research into the CI failures? Let me know if you can / can't reproduce them locally when you run the test suite.

@seanmcevoy
Author

The changes affect the timeout mechanism, so the tests make a lot of concurrent calls and check the time taken to return. That gives me a dilemma over how tight to set the tolerances on the execution times. On my machine these tests pass >95% of the time, but on your CI box the failure seems consistent. So it's just a test issue; I'll see if I can figure out a better way to test it. Just loosening the tolerances until it passes feels like cheating!

@seanmcevoy
Author

CI tests are passing now after tweaking the tolerances. They're still probabilistic, so unfortunately there will be a low failure rate if they're run repeatedly. Possibly they're a little too high-level to be eunit tests; I don't know of any better solution, though.

@lukebakken
Contributor

@seanmcevoy - just so you know, we haven't had time to review this yet. I have added it to a future milestone and it won't be lost.
