Add documentation for MilliSecondsBehindSource growing exponentially. #700

Open: wants to merge 1 commit into base: develop

18 changes: 18 additions & 0 deletions documentation/faq.asciidoc
@@ -351,3 +351,21 @@ To solve the issue the configuration option `producer.max.request.size` must be
If the global change is not desirable then the connector can override the default setting using configuration option `producer.override.max.request.size` set to a larger value.

In the latter case it is also necessary to configure `connector.client.config.override.policy=ALL` option in Kafka Connect worker config file `connect-distributed.properties`. For Debezium `connect` Docker image the environment variable `CONNECT_CONNECTOR_CLIENT_CONFIG_OVERRIDE_POLICY` can be used to configure the option.
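
The two settings above might look as follows. This is only a sketch: the value `2097152` is an illustrative assumption rather than a recommendation, and the connector property is shown in key=value form although connector configurations are often submitted as JSON.

```properties
# Kafka Connect worker config (connect-distributed.properties):
# allow connectors to override client settings.
connector.client.config.override.policy=ALL

# Connector configuration: raise the producer request limit for this connector only.
# 2097152 (2 MiB) is an illustrative value; choose one that fits your largest messages.
producer.override.max.request.size=2097152
```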

== Why is MilliSecondsBehindSource growing exponentially?
Member

Thanks for this change, this sounds a bit concerning though. Can you provide some more context about this situation: what's your change event rate and your set-up in general? I don't quite agree with the statement on round-trip times below; the connection throughput may be a limiting factor, but latency itself should be constant, even with larger round-trip times between the DB and the Kafka Connect host.

Author

Hi @gunnarmorling,

Thanks for your comments.
The issue of MilliSecondsBehindSource growing exponentially occurred in our environment, and I reached out to Jiri Pechanec about it. Link to the Gitter thread with details: https://gitter.im/debezium/user?at=6050ec1ed1aee44e2def7b89

I created this pull request because Jiri recommended contributing to the documentation.

Please let me know if you need more details.

Member

Hi @rnatarajan, I definitely think adding an FAQ entry for cases where MilliSecondsBehindSource is high is worthwhile given the scenario you describe, but I think the way this is conveyed, particularly the use of "exponentially", gives a bad impression.

Perhaps, this could be reworded to something like "High MilliSecondsBehindSource when connector deployed with WAN connection to database" or similar. I think speaking in terms of higher than usual latency when using a WAN connection rather than a LAN connection is what we should strive for on this PR.

Author

@Naros Thanks for the comment. I can reword the README.

Member

Awesome, once you have, if you could just request a re-review of the PR that'd be wonderful.


When the Debezium connector is streaming changes from the binlog, it may not be able to keep up with the rate at which CDC events are generated in the upstream database.
In that case the streaming metric `MilliSecondsBehindSource` keeps increasing and can appear to grow exponentially.

To diagnose the issue, measure the round-trip time of a packet from the machine on which Kafka Connect is running to the database host:

```
ping -c 10 <database.hostname>
```
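
To read the average round-trip time directly, the summary line of the `ping` output can be parsed. The following is a minimal sketch, not part of Debezium: `avg_rtt_ms` is a hypothetical helper name, and it assumes the Linux/macOS summary format (`... min/avg/max/... = a/b/c/d ms`).

```shell
# Hypothetical helper (not part of Debezium): print the average round-trip
# time in milliseconds from the summary line of `ping` output read on stdin.
# Assumes the Linux/macOS summary format, for example:
#   rtt min/avg/max/mdev = 20.1/25.3/30.2/2.1 ms
avg_rtt_ms() {
  # Splitting the summary line on "/" puts the average value in field 5.
  awk -F'/' '/min\/avg\/max/ { print $5 }'
}
```

It could be used as `ping -c 10 <database.hostname> | avg_rtt_ms`. If the output format differs on your platform, the function prints nothing.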

If the round-trip time is in the tens of milliseconds (for example, 20 or 30 milliseconds rather than 0.1 or 0.5 milliseconds), then the network latency between Kafka Connect and the upstream database is high.

For streams that generate few CDC events, Kafka Connect can keep up even with a high round-trip time.
For a stream that generates a high volume of CDC data, however, Kafka Connect may fall behind, and `MilliSecondsBehindSource` will keep growing.

To resolve the issue, move Kafka Connect with the Debezium connector to a host from which the database can be reached faster, ideally with a round-trip time of less than a millisecond.