Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Read each partition individually rather than using WHERE {set} IN query #66

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

0xF0D0
Copy link

@0xF0D0 0xF0D0 commented Nov 22, 2021

This PR fixes cdc stream read query to read each stream individually by having Driver3Reader per stream.

Every single Driver3Reader now uses prepared statement which is where equals ? not where in ?

scylladb/scylla-cdc-source-connector#15

@haaawk
Copy link
Contributor

haaawk commented Dec 1, 2021

@0xF0D0 did you check that this actually improves the performance?

Are you working with @pkgonan ?

@0xF0D0
Copy link
Author

0xF0D0 commented Dec 7, 2021

Yep I am :) I'll upload the result of test in a couple of days

@0xF0D0
Copy link
Author

0xF0D0 commented Dec 22, 2021

Sorry for the late reply. We've done some tests and proved that it reduces latency drastically.

current_slow_log.txt

improved_slow_log.txt

@0xF0D0
Copy link
Author

0xF0D0 commented Jan 3, 2022

image
image

First image is when I connected with current scylla driver
Second image is when I connected this pr's driver

@haaawk
Copy link
Contributor

haaawk commented Jan 10, 2022

Please squash the commits so that in the history you don't fix errors in commits that you've introduced just before.

Signed-off-by: fodo <[email protected]>

add missing consistency level.

Signed-off-by: fodo <[email protected]>

add missing findNext

Signed-off-by: fodo <[email protected]>
@0xF0D0
Copy link
Author

0xF0D0 commented Jan 11, 2022

@haaawk done

@haaawk
Copy link
Contributor

haaawk commented Jan 11, 2022

Thanks @0xF0D0

@0xF0D0
Copy link
Author

0xF0D0 commented Jan 18, 2022

@haaawk is there any further action needed?

@haaawk
Copy link
Contributor

haaawk commented Jan 18, 2022

We're having problem to replicate your performance results @0xF0D0. Would you be open to have a video call with us? We would like to try to understand why we don't see the same results.

@0xF0D0
Copy link
Author

0xF0D0 commented Jan 19, 2022

@0xF0D0 of course :) please send invite to [email protected]!

@0xF0D0
Copy link
Author

0xF0D0 commented Feb 20, 2022

Is there problem sending invitation? I haven't got any mails... could you check it again?

FYI. My test env is

Cluster: i3en.3xlarge * 12, 2DC, 6 nodes per DC, on k8s

Stress nodes
- two write 100% nodes
- two mixed(ratio=read7,write1) nodes

CDC enabled on stress table.

Scylla Kafka Connector
- Kafka connect 3 nodes

You can check on default driver, Scylla connector consumes with latency over 100ms, but with this patch it reduces to under 10ms

@haaawk
Copy link
Contributor

haaawk commented Feb 22, 2022

I've sent an invite over the email @0xF0D0.

@hansh0801
Copy link

track

@fee-mendes
Copy link
Member

@avelanarius ping

@kbr-scylla
Copy link

@avelanarius any chance to pick this up?

If it works better with IN queries on some clusters but better with individual queries on others, maybe the library should offer a config option to use one or the other?

@mykaul
Copy link

mykaul commented Apr 27, 2023

@avelanarius any chance to pick this up?

If it works better with IN queries on some clusters but better with individual queries on others, maybe the library should offer a config option to use one or the other?

How will the user know which value to use? Testing of their own workload against their own cluster (and its version) ?

@kbr-scylla
Copy link

How will the user know which value to use? Testing of their own workload against their own cluster (and its version) ?

I guess so. Apparently that's what @0xF0D0 did and they are using this PR on production for 6 months now (discussion on Slack).

@hansh0801
Copy link

hansh0801 commented Apr 27, 2023

@kbr-scylla
we are using in production about 6 months, and it works well. i can show detailed benchmark results and could discuss about it.

@mykaul
Copy link

mykaul commented Feb 25, 2024

@avelanarius - ping

@roydahan roydahan requested a review from avelanarius February 26, 2024 13:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants