diff --git a/README.md b/README.md index 36e003c4..5c09dc63 100644 --- a/README.md +++ b/README.md @@ -172,7 +172,7 @@ NakadiClient nakadiClient = NakadiClient.builder(NAKADI_URI, new SimpleRequestFa .build(); ``` -## Exception handling +## Retries and exception handling for event consumption Exception handling while streaming events follows some simple rules @@ -189,6 +189,29 @@ Fahrschein supports different exponential backoff strategies when streaming even * `EqualJitterBackoffStrategy` (default) - extends `ExponentialBackoffStrategy` with the same defaults. For each delay it takes half of the delay value and adds the other half multiplied by a random factor [0..1). * `FullJitterBackoffStrategy` - extends `ExponentialBackoffStrategy` with the same defaults and multiplies each delay by a random factor [0..1). +## Retries and exception handling for event publishing + +Fahrschein does not have sophisticated mechanisms for retry handling when publishing to Nakadi yet. The recommended way +to handle exceptions when publishing is to create a retry-wrapper around the `NakadiClient.publish` method. + +In case of a partial success or also in cases like validation errors, which are complete failures, Fahrschein +will throw an `EventPublishingException` with the `BatchItemResponse`s (as returned from Nakadi) for the failed + items in the responses property. + +These objects have the eid of the failed event, a `publishingStatus` (failed/aborted/submitted - but successful items are +filtered out), the step where it failed and a detail string. + +If the application sets the eids itself (i.e. doesn't let Nakadi do it) and keeps track of them, this allows it +to resend only the failed items later. + +It also allows differentiating between validation errors, which likely don't need to be retried, as they are +unlikely to succeed the next time, unless the event type definition is changed, and publishing errors +which should be retried with some back-off. 
+ +Recommendation: Implement a retry-with-backoff handler for `EventPublishingException`s, which, depending on +your ordering consistency requirements, either retries the full batch, or retries the failed events based +on the event-ids. + ## Stopping and resuming streams The stream implementation gracefully handles thread interruption, so it is possible to stop a running thread and resume consuming events by re-submitting the `Runnable`: diff --git a/fahrschein/src/main/java/org/zalando/fahrschein/NakadiClient.java b/fahrschein/src/main/java/org/zalando/fahrschein/NakadiClient.java index 80a599c1..1ff35ab4 100644 --- a/fahrschein/src/main/java/org/zalando/fahrschein/NakadiClient.java +++ b/fahrschein/src/main/java/org/zalando/fahrschein/NakadiClient.java @@ -5,6 +5,7 @@ import org.slf4j.Logger; import org.slf4j.LoggerFactory; import org.zalando.fahrschein.domain.Authorization; +import org.zalando.fahrschein.domain.BatchItemResponse; import org.zalando.fahrschein.domain.Cursor; import org.zalando.fahrschein.domain.Partition; import org.zalando.fahrschein.domain.Subscription; @@ -92,6 +93,22 @@ public List<Partition> getPartitions(String eventName) throws IOException { /** * Writes the given events to the endpoint provided by the eventName. + * + *

In case of a partial success (or also in cases like validation errors, which are complete failures), Fahrschein + * will throw an {@link EventPublishingException} with the {@link BatchItemResponse}s (as returned from Nakadi) for the failed + * items in the responses property. + * These objects have the eid of the failed event, a publishingStatus (failed/aborted/submitted - but successful items are + * filtered out), the step where it failed and a detail string. + * If the application sets the eids itself (i.e. doesn't let Nakadi do it) and keeps track of them, this allows it + * to resend only the failed items later. + * It also allows differentiating between validation errors, which likely don't need to be retried, as they are + * unlikely to succeed the next time, unless the event type definition is changed, and publishing errors + * which should be retried with some back-off.

+ * + *

Recommendation: Implement a retry-with-backoff handler for {@link EventPublishingException}s, which, depending on + * your ordering consistency requirements, either retries the full batch, or retries the failed events based + * on the event-ids.

+ * * @param eventName where the event should be written to * @param events that should be written * @param <T> Type of the Event
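
Note on the recommendation above: the retry-with-backoff handler could look roughly like the following minimal sketch. It is not part of this change and does not use Fahrschein's classes. `Event`, `PartialFailureException` (standing in for `EventPublishingException`) and `Publisher` (standing in for the `NakadiClient.publish` call) are hypothetical stand-ins so that the example is self-contained. It assumes the application assigns the eids itself, and it resends only the failed items with a doubling delay.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

class PublishRetrySketch {

    // Hypothetical stand-in for an event with an application-assigned eid.
    public static final class Event {
        public final String eid;
        public final String payload;

        public Event(String eid, String payload) {
            this.eid = eid;
            this.payload = payload;
        }
    }

    // Hypothetical stand-in for EventPublishingException: carries the eids of the failed items.
    public static final class PartialFailureException extends Exception {
        private final Set<String> failedEids;

        public PartialFailureException(Set<String> failedEids) {
            super("failed items: " + failedEids);
            this.failedEids = failedEids;
        }

        public Set<String> failedEids() {
            return failedEids;
        }
    }

    // Hypothetical stand-in for the publish call being wrapped.
    public interface Publisher {
        void publish(List<Event> batch) throws PartialFailureException;
    }

    // Publishes the batch and, on failure, resends only the failed events,
    // doubling the delay between attempts.
    public static void publishWithRetry(Publisher publisher, List<Event> batch,
                                        int maxAttempts, long initialDelayMillis)
            throws PartialFailureException, InterruptedException {
        List<Event> pending = new ArrayList<>(batch);
        long delay = initialDelayMillis;
        for (int attempt = 1; ; attempt++) {
            try {
                publisher.publish(pending);
                return;
            } catch (PartialFailureException e) {
                if (attempt >= maxAttempts) {
                    throw e; // give up and surface the last failure
                }
                List<Event> failed = new ArrayList<>();
                for (Event event : pending) {
                    if (e.failedEids().contains(event.eid)) {
                        failed.add(event);
                    }
                }
                if (failed.isEmpty()) {
                    throw e; // the failed eids cannot be matched to pending events
                }
                pending = failed;
                Thread.sleep(delay);
                delay *= 2;
            }
        }
    }
}
```

Resending only the failed items can change the order in which events reach Nakadi, so applications with strict ordering requirements would retry the full batch instead. The sketch also does not distinguish validation errors from publishing errors. A real handler would likely inspect the step and detail of each failed item and skip retries for validation failures.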