Nakadi clients resilience to partial outage and partial success #217

adyach · 2023-09-08T12:18:48Z

The Nakadi publishing API accepts events in batches. It can fail to publish some of the events in a batch to the underlying storage (Apache Kafka). In that case, the publishing API returns an error indicating that the batch was only partially successful.
This can cause the following problems, depending on how the Nakadi client and the publishing application deal with the partial-success response:

an increase in traffic on the Nakadi publishing API, because Nakadi clients retry the whole batch over and over
the application retries identical batches, which prevents it from making progress
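To make the partial-success case concrete, here is a minimal sketch of how a client could turn a per-event response into a result object that callers cannot misread. It assumes a response shaped roughly like Nakadi's per-event batch item response (as we understand it: a list of items, each with an `eid` and a `publishing_status` of `submitted`, `failed` or `aborted`). The `PublishResult` type and `parse_batch_response` helper are hypothetical names, not part of any existing client.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class PublishResult:
    """Outcome of publishing one batch (hypothetical type).

    A batch can partially succeed: some events are stored while others
    fail or are aborted. Callers must check `is_complete`/`is_partial`.
    """
    submitted: List[str] = field(default_factory=list)  # eids that were stored
    failed: List[str] = field(default_factory=list)     # eids that failed or were aborted

    @property
    def is_complete(self) -> bool:
        return not self.failed

    @property
    def is_partial(self) -> bool:
        return bool(self.submitted) and bool(self.failed)


def parse_batch_response(items: List[Dict[str, Any]]) -> PublishResult:
    """Split per-event response items into submitted and failed event ids.

    Assumes each item carries an `eid` and a `publishing_status`.
    """
    result = PublishResult()
    for item in items:
        if item["publishing_status"] == "submitted":
            result.submitted.append(item["eid"])
        else:
            # "failed" and "aborted" events both still need to be published.
            result.failed.append(item["eid"])
    return result
```

Exposing `failed` explicitly lets the application re-publish just those events instead of the whole batch.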

The following should be done to reduce the likelihood of these problems:

The Nakadi client should tell developers that publishing can end in partial success. This note belongs in the client documentation and ideally also in the in-code documentation (e.g. docstrings), to raise awareness among users.
An optional batch-level retry method can be provided that re-publishes the whole batch, but its default strategy must include backoff in case publishing to Nakadi keeps failing.
An optional retry method can be provided that re-publishes only the unsuccessful events to Nakadi. This retry must also use a backoff strategy by default.
Clients must expose the result of a publishing request in a way that makes it clear to developers that batch publishing can partially succeed.
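A batch-level retry with a default backoff could look like the following sketch. `publish_batch` is a hypothetical stand-in for whatever call actually sends the batch; it is assumed to return the list of events that were not stored (empty on full success). Delays use capped exponential backoff with "full jitter" (a random delay between 0 and the current cap), one common way to keep many clients from retrying in lockstep. The defaults are illustrative, not values recommended by Nakadi.

```python
import random
import time
from typing import Callable, List, Optional, Sequence, TypeVar

E = TypeVar("E")


def backoff_delays(attempts: int, base: float = 0.1, cap: float = 10.0,
                   rng: Optional[random.Random] = None) -> List[float]:
    """Capped exponential backoff with full jitter: U(0, min(cap, base * 2**i))."""
    rng = rng or random.Random()
    return [rng.uniform(0, min(cap, base * 2 ** i)) for i in range(attempts)]


def retry_whole_batch(publish_batch: Callable[[Sequence[E]], List[E]],
                      batch: Sequence[E], max_retries: int = 5,
                      sleep: Callable[[float], None] = time.sleep) -> List[E]:
    """Re-send the *entire* batch until it fully succeeds or retries run out.

    Returns the events that still failed after the last attempt, so the
    caller can decide what to do instead of retrying forever.
    """
    failed = publish_batch(batch)
    for delay in backoff_delays(max_retries):
        if not failed:
            break
        sleep(delay)  # back off before calling the publishing API again
        failed = publish_batch(batch)  # identical batch: stored events are sent again
    return failed
```

Because the whole batch is re-sent, events that were already stored get published again, so this option only fits consumers that tolerate duplicates.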
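The "re-publish only unsuccessful events" option, with backoff on by default and a docstring that flags partial success, might be sketched like this. As before, `publish_batch` is a hypothetical callable assumed to return the events that were not stored; the function name and defaults are illustrative only.

```python
import random
import time
from typing import Callable, List, Sequence, TypeVar

E = TypeVar("E")


def publish_with_partial_retry(publish_batch: Callable[[Sequence[E]], List[E]],
                               batch: Sequence[E], max_retries: int = 5,
                               base: float = 0.1, cap: float = 10.0,
                               sleep: Callable[[float], None] = time.sleep) -> List[E]:
    """Publish `batch`, then re-publish only the events that were not stored.

    Note: publishing can PARTIALLY succeed. `publish_batch` returns the
    events that failed; any event it does not return is considered stored
    and is never sent again. Retries use capped exponential backoff with
    full jitter. Returns the events still failing after `max_retries`
    retries; an empty list means every event was published.
    """
    pending = publish_batch(batch)
    for attempt in range(max_retries):
        if not pending:
            break
        sleep(random.uniform(0, min(cap, base * 2 ** attempt)))
        pending = publish_batch(pending)  # shrink the batch to the failures only
    return pending
```

Unlike a whole-batch retry, stored events are not re-sent, so retry traffic stays proportional to the number of failures. Whatever is still returned after the last retry is left to the application (for example, to log or park), so it can move on instead of retrying the same batch indefinitely.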

gchudnov · 2023-09-12T08:41:15Z

Hi @adyach
Thank you for reaching out,

At the moment, there should already be backoff logic that retries publishing.
The result of a partial publish is also already exposed to clients.

I'll update the readme to clarify the behavior.

adyach · 2023-09-12T08:49:51Z

@gchudnov a docs update would be great. Thank you!

adyach · 2023-10-09T07:52:45Z

@gchudnov any update on this?

gchudnov · 2023-10-09T10:14:57Z

@adyach
sorry, I've been quite busy :(
I will finish the doc by the end of tomorrow (Oct 10, 2023)

gchudnov · 2023-10-10T19:50:05Z

@adyach
just made a PR, please take a look: #219

gchudnov · 2024-07-30T11:43:30Z

Closed, as this was migrated to the internal repo.

gchudnov mentioned this issue on Oct 10, 2023 in "Update publishing guidelines #219" (Closed)

gchudnov closed this as completed Jul 30, 2024
