
how to write multiple topics to single scylla table #55

Open
dhgokul opened this issue Jul 8, 2021 · 2 comments


dhgokul commented Jul 8, 2021

At present we have three topics: topic1, topic2 and topic3, with a separate sink connector (on a separate server) for each topic. Instead of writing each topic's messages to a different Scylla table, we are looking to write all topics into a single common table.

At present we use the Confluent RegexRouter ("regex property") to achieve the above, but it is not efficient, as overwriting happens on the sink connectors that use the regex.

Is there any method to achieve this efficiently?
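
For reference, a minimal sketch of the RegexRouter ("regex property") setup referred to above, assuming the sink connector derives the Scylla table name from the record's topic (topic and table names are illustrative):

# rewrite topic1/topic2/topic3 to a single name, so all records land in one common table
transforms=toCommon
transforms.toCommon.type=org.apache.kafka.connect.transforms.RegexRouter
transforms.toCommon.regex=topic[0-9]+
transforms.toCommon.replacement=common_table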

@avelanarius

Could you be more specific on what the problem is? Are you observing poor performance of RegexRouter ("regex property")? Or is another part of the system slow (the connector itself)? It seems (after looking at the RegexRouter source code and doing some micro-benchmarks) that this transform should not add a major amount of overhead.


dhgokul commented Jul 16, 2021

Using the sink connector we are trying to write multiple topics from Redpanda to a single Scylla table.
In our case we ran a 100 million message test using 5 topics and 5 sink connectors; the topics are: topic1, sub1-topic1, sub2-topic1, sub3-topic1, sub4-topic1.
In Redpanda we have a 3-node cluster and each topic has 10 partitions. We tried with and without replication in Redpanda.

**Connector [JSON] Config:**

bootstrap.servers=redpanda_cluster_1_ip:9092,redpanda_cluster_2_ip:9092,redpanda_cluster_3_ip:9092
key.converter=org.apache.kafka.connect.storage.StringConverter
value.converter=org.apache.kafka.connect.json.JsonConverter
offset.storage.file.filename=/tmp/connect.offsets
plugin.path=target/components/packages/

**Sink Connector-1 Config [Json]:**
name=scylladb-sink-connector
connector.class=io.connect.scylladb.ScyllaDbSinkConnector
tasks.max=56
topics=topic1
scylladb.contact.points=scylla_cluster_1_ip,scylla_cluster_2_ip,scylla_cluster_3_ip
scylladb.port=9042
scylladb.keyspace=streamprocess
key.converter=org.apache.kafka.connect.storage.StringConverter
value.converter=org.apache.kafka.connect.json.JsonConverter
key.converter.schemas.enable=true
value.converter.schemas.enable=true
transforms=createKey
transforms.createKey.fields=id
transforms.createKey.type=org.apache.kafka.connect.transforms.ValueToKey



**Sink Connector-2 Config [Json]:**

name=scylladb-sink-connector2
connector.class=io.connect.scylladb.ScyllaDbSinkConnector
tasks.max=56
topics=sub1-topic1
scylladb.contact.points=scylla_cluster_1_ip,scylla_cluster_2_ip,scylla_cluster_3_ip
scylladb.consistency.level=QUORUM
scylladb.keyspace.replication.factor=3
scylladb.port=9042
scylladb.keyspace=streamprocess
key.converter=org.apache.kafka.connect.storage.StringConverter
value.converter=org.apache.kafka.connect.json.JsonConverter
key.converter.schemas.enable=true
value.converter.schemas.enable=true
transforms=createKey,dropPrefix
transforms.createKey.type=org.apache.kafka.connect.transforms.ValueToKey
transforms.createKey.fields=id
transforms.dropPrefix.type=org.apache.kafka.connect.transforms.RegexRouter
transforms.dropPrefix.regex=sub1-(.*)
transforms.dropPrefix.replacement=$1

Using the Sink Connector-1 config for topic1 and the Sink Connector-2 config for the rest of the topics, running on 5 separate machines,
we are facing 2 issues:

  1. Compared to the sink connector 1 config, the sink connector 2 configs are slower.
  2. Overwriting of messages is happening in the Scylla cluster, i.e. once 100 million messages were dumped into Scylla, we restarted only the sink connectors, but messages are being overwritten; when checking nodetool tablestats, instead of a local write count of 10 million it is showing 100+ million writes.

Are there any changes needed in the config?
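
For reference, a minimal, untested sketch of what a single consolidated connector could look like: one connector subscribed to all five topics with a single dropPrefix transform, so only one connector instance writes the topic1 table (connector name and tasks.max values are illustrative):

name=scylladb-sink-connector-all
connector.class=io.connect.scylladb.ScyllaDbSinkConnector
tasks.max=10
topics=topic1,sub1-topic1,sub2-topic1,sub3-topic1,sub4-topic1
scylladb.contact.points=scylla_cluster_1_ip,scylla_cluster_2_ip,scylla_cluster_3_ip
scylladb.port=9042
scylladb.keyspace=streamprocess
scylladb.consistency.level=QUORUM
key.converter=org.apache.kafka.connect.storage.StringConverter
value.converter=org.apache.kafka.connect.json.JsonConverter
value.converter.schemas.enable=true
transforms=createKey,dropPrefix
transforms.createKey.type=org.apache.kafka.connect.transforms.ValueToKey
transforms.createKey.fields=id
transforms.dropPrefix.type=org.apache.kafka.connect.transforms.RegexRouter
# only topics matching "subN-..." are rewritten; records from topic1 pass through unchanged
transforms.dropPrefix.regex=sub[0-9]-(.*)
transforms.dropPrefix.replacement=$1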
