Potential data loss in cdc caused by non-cancelation/non-connection related errors during sync #1485

serprex · 2024-03-13T23:07:20Z

#1365 (one-sync) introduced potential data loss through incorrect state management across Temporal activity retries. We maintain state that tracks the CDC offset of the replication connection, but this state was not properly updated on error paths. Temporal would then retry the activity on the same connection, and our session state would tell the logic that no reconnection was needed to reread the CDC stream. As a result, the portion of the CDC stream already read by the failed attempt was skipped.

#1481 fixes this by updating Offset on the error path, so that the replState comparison correctly forces a connection restart when the activity is retried.
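
The failure mode and the fix can be illustrated with a small in-memory model. This is a sketch under stated assumptions, not PeerDB code: `stream`, `session`, `replState`, and the `fixed` switch are hypothetical names, and the "connection" is a slice whose read position only moves forward. A retry reuses the connection only when the tracked Offset equals the offset it wants to resume from. Without the error-path update, that check passes even though the connection already consumed records, so those records are skipped.

```go
package main

import (
	"errors"
	"fmt"
)

// stream is a hypothetical stand-in for a CDC replication connection:
// reading advances pos, and a reused connection never rewinds.
type stream struct {
	records []string
	pos     int // index of the next record the connection will return
}

// replState is a hypothetical model of the tracked offset of the live connection.
type replState struct {
	Offset int
}

type session struct {
	source []string
	conn   *stream
	state  replState
	fixed  bool // true: also update Offset on the error path (the #1481 behavior)
}

// sync reads up to want records starting at resumeFrom. failAfter simulates an
// error after that many records were pulled (-1 means no error).
func (s *session) sync(resumeFrom, want, failAfter int) ([]string, error) {
	// Reuse the connection only if state says it sits exactly at resumeFrom.
	if s.conn == nil || s.state.Offset != resumeFrom {
		s.conn = &stream{records: s.source, pos: resumeFrom}
		s.state.Offset = resumeFrom
	}
	var batch []string
	for i := 0; i < want && s.conn.pos < len(s.conn.records); i++ {
		if i == failAfter {
			if s.fixed {
				// Record how far the connection really advanced, so a retry
				// from resumeFrom sees a mismatch and reconnects.
				s.state.Offset = s.conn.pos
			}
			return nil, errors.New("simulated sync error")
		}
		batch = append(batch, s.conn.records[s.conn.pos])
		s.conn.pos++
	}
	s.state.Offset = s.conn.pos
	return batch, nil
}

func main() {
	src := []string{"r0", "r1", "r2", "r3"}
	for _, fixed := range []bool{false, true} {
		s := &session{source: src, fixed: fixed}
		_, err := s.sync(0, 4, 2) // first attempt fails after reading r0, r1
		fmt.Println("attempt 1:", err)
		batch, _ := s.sync(0, 4, -1) // retry from the same offset, as Temporal would
		fmt.Printf("fixed=%v retry batch=%v\n", fixed, batch)
	}
}
```

With `fixed=false` the retry returns `[r2 r3]` (r0 and r1 are lost); with `fixed=true` the offset mismatch forces a reconnect and the retry returns `[r0 r1 r2 r3]`.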

#1482 addresses a second issue with Temporal errors: an error built as `fmt.Errorf("%w", temporal.NewNonRetryableApplicationError(msg, type, cause))` still caused the activity to be retried, so checks were added to avoid wrapping `temporal.ApplicationError`.

#1483: while assessing logs for customer impact, the lack of offset logging was a problem, so offset logging was added to help debug similar issues in the future.

We reviewed all cloud errors from the last two weeks. All but one were unaffected by this bug, because they were either context cancelations (which prevent the activity from being retried) or closed connections (which prevent the replication connection from being reused).

serprex closed this as completed Mar 13, 2024