Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Java: Add UDS read/write error handling and tests with glide core mock #949

Merged

Conversation

Yury-Fridlyand
Copy link
Collaborator

Features:

  1. Error handling on UDS write
  2. Error handling on UDS read
  3. Glide core mock
  4. Test superclass with glide core mock
  5. Tests for error handling

Missing:

  1. Error handling on channel creation. Left TODO there. Requires significant rework of ChannelHandler.
  2. IT - can't be reproduced in IT unfortunately.

* Add UDS read/write error handling and tests with glide core mock.

Signed-off-by: Yury-Fridlyand <[email protected]>
@Yury-Fridlyand Yury-Fridlyand added the java issues and fixes related to the java client label Feb 13, 2024
@Yury-Fridlyand Yury-Fridlyand requested a review from a team as a code owner February 13, 2024 01:04
@shachlanAmazon
Copy link
Contributor

what's the performance cost of all of these new allocations on the hot path?

@Yury-Fridlyand
Copy link
Collaborator Author

what's the performance cost of all of these new allocations on the hot path?

I merged this branch with #896 and run benchmark 5 times with old and new code

./gradlew :benchmarks:run --args="-clients GLIDE -dataSize 100 -concurrentTasks 100 -clientCount 1 -resultsFile ../new.json"

New has 62320 tps
Old has 62305 tps

I feel that measurement precision is less the difference.
Do you want me to run more tests with more parameters?

@shachlanAmazon
Copy link
Contributor

@Yury-Fridlyand, in order to reduce noise, we usually run the benchmark multiple times (usually 10), with full data/concurrency configurations, and then check the difference between averages in each data/concurrency tuple - that way we catch issues that stem mostly in one scenario and not others. I'll share the system & the analysis file I use in the evening meeting.

@acarbonetto acarbonetto changed the title Add UDS read/write error handling and tests with glide core mock Java: Add UDS read/write error handling and tests with glide core mock Feb 16, 2024
@barshaul
Copy link
Collaborator

@Yury-Fridlyand @acarbonetto Please comment when it's ready for another round

@Yury-Fridlyand
Copy link
Collaborator Author

TL;DR
It reduces average performance on 1.04%

What is our measurement precision?
is_cluster data_size num_of_tasks client_count baseline this PR delta
FALSE 20 1 1 3171,7 3301,4 104,0892897
FALSE 20 1 2 3165,8 3156,5 99,70623539
FALSE 20 10 1 20935,55 19334,2 92,35104881
FALSE 20 10 2 20913,7 20048,15 95,86132535
FALSE 20 100 1 133024,4 133509,45 100,3646324
FALSE 20 100 2 162364,7 164485,2 101,3060105
FALSE 500 1 1 3288,4 3171,8 96,45420265
FALSE 500 1 2 3111,7 3142,65 100,9946332
FALSE 500 10 1 19895,65 18955,85 95,27635438
FALSE 500 10 2 20285,5 20726 102,1715018
FALSE 500 100 1 125628,95 127746,8 101,6857977
FALSE 500 100 2 155886,65 156529,95 100,4126716
TRUE 20 1 1 1021,65 1023,5 100,1810796
TRUE 20 1 2 1014,1 1017,45 100,3303422
TRUE 20 10 1 6618,7 6332,7 95,67890976
TRUE 20 10 2 7409,9 7327,85 98,89269761
TRUE 20 100 1 61875,25 61480,5 99,36202278
TRUE 20 100 2 69192,55 68918,05 99,60328099
TRUE 500 1 1 1025,9 1020,55 99,47850668
TRUE 500 1 2 1014,75 1020,15 100,5321508
TRUE 500 10 1 6640,8 6284,45 94,63392965
TRUE 500 10 2 7582,45 7180,05 94,6930082
TRUE 500 100 1 63112,7 63712 100,9495712
TRUE 500 100 2 69007,6 69047,15 100,0573125

See performance results and script I used for running benchmark in zip: 949.zip

I run benchmarking 20 times on an EC2 instance vs standaloe and cluster redis running on another EC2. All tests were with TLS.
80 runs it total (20 x 2 redis types x 2 branches)

@shachlanAmazon
Copy link
Contributor

@Yury-Fridlyand 1% is well within our noise range, but I do see that there's a clear degradation throughout all scenarios on the 10 concurrent tasks values. It's not critical, but are there alternatives that don't require allocations on the hotpath?

BTW, why do you use these custom values for the benchmark? they're not the defaults we use.

@Yury-Fridlyand
Copy link
Collaborator Author

Yury-Fridlyand commented Feb 22, 2024

clear degradation throughout all scenarios on the 10 concurrent tasks

I propose to make a task to investigate it later unless it is a blocker now.
I have only a guess here that 10 tasks have the most optimal CPU utilization. There are 16 cores on the benchmarking host (so thread pool has 16 threads). It is still unclear why we have such degradation with 10 tasks, but not with 100 or 1000.

are there alternatives that don't require allocations

Unfortunately, not. CompletableFuture support in Netty is planned for Netty 5 major release.
We can make NettyFutureErrorHandler a singletone-like object and keep a concurrent map between netty future and client promise... I'm sure that it will be slower and would consume more memory.

why do you use these custom values for the benchmark?

My mistake. Do you want me to re-run?

@shachlanAmazon
Copy link
Contributor

I propose to make a task to investigate it later unless it is a blocker now.

Doesn't even need to be a task. If you'll have perf issues (once the benchmark actually works correctly), we'll investigate everything. Let's just remember that this might be a contributor.

My mistake. Do you want me to re-run?

No, thanks. You've spent enough time on this.

Copy link
Contributor

@shachlanAmazon shachlanAmazon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.
@barshaul do you want to take a look?

@Yury-Fridlyand Yury-Fridlyand merged commit 5d5a36c into valkey-io:main Feb 27, 2024
11 checks passed
@Yury-Fridlyand Yury-Fridlyand deleted the java/integ_yuryf_handle_uds_errors branch February 27, 2024 18:16
cyip10 pushed a commit to Bit-Quill/valkey-glide that referenced this pull request Jun 24, 2024
valkey-io#949)

Add UDS read/write error handling and tests with glide core mock. (#76)

* Add UDS read/write error handling and tests with glide core mock.

Signed-off-by: Yury-Fridlyand <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
java issues and fixes related to the java client
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants