#1365 one-sync introduced potential data loss due to incorrect state management across Temporal activity retries. We maintain state to track the CDC offset of the replication connection, but this state was not being properly updated in error paths. Temporal would then retry with the same connection, and our session state made it look to the logic as though no reconnection was needed to reread the CDC stream, so the previously read portion of the stream was skipped
#1481 fixes Offset not being updated in the error path, so that the replState comparison properly causes a connection restart when the activity is retried
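To illustrate the failure mode and the shape of the fix, here is a minimal sketch, not the actual code from #1481. The `conn` type, `needsReconnect`, `pull`, and the integer offsets are hypothetical; only the names `replState` and `Offset` come from the description above. The point is that the consumed offset is recorded on every exit path, so a retry that asks for the original offset no longer matches the stored state and forces a reconnect.

```go
package main

import (
	"errors"
	"fmt"
)

// replState is a hypothetical stand-in for the state kept next to a
// replication connection: how far the stream has been read.
type replState struct {
	Offset int64
}

// conn is a hypothetical wrapper around a replication connection.
type conn struct {
	state      replState
	reconnects int
}

// needsReconnect reports whether the stream must be restarted to serve
// a read that begins at wantOffset.
func (c *conn) needsReconnect(wantOffset int64) bool {
	return c.state.Offset != wantOffset
}

// pull reads n records starting at wantOffset. If failAfter >= 0, it
// simulates a failure after that many records have been consumed.
func (c *conn) pull(wantOffset int64, n, failAfter int) error {
	if c.needsReconnect(wantOffset) {
		// Restart the stream at the requested offset.
		c.reconnects++
		c.state.Offset = wantOffset
	}
	consumed := wantOffset
	// Record how far the stream was consumed on every exit path,
	// including the error path. Skipping this on error is the bug
	// described above.
	defer func() { c.state.Offset = consumed }()
	for i := 0; i < n; i++ {
		if i == failAfter {
			return errors.New("simulated sync failure")
		}
		consumed++
	}
	return nil
}

func main() {
	c := &conn{}
	err := c.pull(0, 10, 3) // fails partway through
	fmt.Println("first attempt:", err, "offset:", c.state.Offset)
	err = c.pull(0, 10, -1) // retry from the same requested offset
	fmt.Println("retry:", err, "offset:", c.state.Offset, "reconnects:", c.reconnects)
}
```

Without the deferred update, the stored offset would still equal the requested one after the failure, and the retry would continue reading from the middle of the stream.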
#1482 however, there was another issue with Temporal errors: fmt.Errorf("%w", temporal.NonRetryableApplicationError(msg, type, cause)) would still have the activity retried, so this adds checks to avoid wrapping temporal.ApplicationError
#1483 while assessing logs for customer impact, the lack of offset logging was a problem, so this adds that logging to help debug issues in future
We reviewed all cloud errors from the last two weeks. All but one were unaffected by this bug, because they were either context cancellation (which prevents the activity from being retried) or connection closed (which prevents the replication connection from being reused)