Polishing

zalando-nakadi · Sep 18, 2023 · 4ad3160
1 parent a5ed6fb
commit 4ad3160

Showing 2 changed files with 30 additions and 7 deletions.
diff --git a/README.md b/README.md
@@ -172,7 +172,7 @@ NakadiClient nakadiClient = NakadiClient.builder(NAKADI_URI, new SimpleRequestFa
         .build();
 ```
 
-## Exception handling
+## Retries and exception handling for event consumption
 
 Exception handling while streaming events follows some simple rules
 
@@ -189,6 +189,29 @@ Fahrschein supports different exponential backoff strategies when streaming even
 * `EqualJitterBackoffStrategy` (default) - extends `ExponentialBackoffStrategy` with the same defaults. For each delay it takes half of the delay value and adds the other half multiplied by a random factor [0..1).
 * `FullJitterBackoffStrategy` - extends `ExponentialBackoffStrategy` with the same defaults and multiplies each delay by a random factor [0..1).
 
+## Retries and exception handling for event publishing
+
+Fahrschein does not have sophisticated mechanisms for retry handling when publishing to Nakadi yet. The recommended way
+to handle exceptions when publishing is to create a retry-wrapper around the `NakadiClient.publish` method.
+
+In case of a partial success or also in cases like validation errors, which are complete failures, Fahrschein
+will throw an `EventPublishingException` with the `BatchItemResponse`s (as returned from Nakadi) for the failed
+items in the responses property.
+
+These objects have the eid of the failed event, a `publishingStatus` (failed/aborted/submitted - but successful items are
+filtered out), the step where it failed and a detail string.
+
+If the application sets the eids itself (i.e. doesn't let Nakadi do it) and keeps track of them, this allows it
+to resend only the failed items later.
+
+It also allows differentiating between validation errors, which likely don't need to be retried, as they are
+unlikely to succeed the next time, unless the event type definition is changed, and publishing errors
+which should be retried with some back-off.
+
+Recommendation: Implement a retry-with-backoff handler for `EventPublishingException`s, which, depending on
+your ordering consistency requirements, either retries the full batch, or retries the failed events based
+on the event-ids.
+
 ## Stopping and resuming streams
 
 The stream implementation gracefully handles thread interruption, so it is possible to stop a running thread and resume consuming events by re-submitting the `Runnable`:
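
The retry-with-backoff recommendation in the publishing section of the README hunk above could be sketched roughly as follows. This is a minimal illustration, not Fahrschein code: `FailedItem`, `PublishFailure` and `Publisher` are hypothetical stand-ins for `BatchItemResponse`, `EventPublishingException` and `NakadiClient.publish`, and the step names `"publishing"` and `"validating"` are assumptions about what Nakadi reports. It takes the "retry only the failed events by eid" route, so it does not preserve ordering across the batch.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Sketch only: all types here are hypothetical stand-ins, not Fahrschein's real classes. */
final class PublishRetrySketch {

    /** Stand-in for a failed batch item: the event id plus the step where it failed. */
    static final class FailedItem {
        final String eid;
        final String step; // assumed values, e.g. "validating" or "publishing"
        FailedItem(String eid, String step) { this.eid = eid; this.step = step; }
    }

    /** Stand-in for EventPublishingException, carrying only the failed items. */
    static final class PublishFailure extends RuntimeException {
        final List<FailedItem> failed;
        PublishFailure(List<FailedItem> failed) { super("publish failed"); this.failed = failed; }
    }

    /** Stand-in for the publish call: events keyed by eid; throws PublishFailure on failure. */
    interface Publisher {
        void publish(Map<String, String> eventsByEid);
    }

    /**
     * Publishes the batch and retries, with exponentially growing delay, only the items
     * that failed at the "publishing" step. Items that failed elsewhere (for example
     * validation) are not retried. Returns the eids that were not published.
     */
    static List<String> publishWithRetry(Publisher publisher, Map<String, String> eventsByEid,
                                         int maxAttempts, long initialDelayMs) {
        Map<String, String> pending = new LinkedHashMap<>(eventsByEid);
        List<String> givenUp = new ArrayList<>();
        long delay = initialDelayMs;
        for (int attempt = 1; attempt <= maxAttempts && !pending.isEmpty(); attempt++) {
            try {
                publisher.publish(pending);
                pending.clear(); // no exception: everything in this attempt was accepted
            } catch (PublishFailure e) {
                Map<String, String> retry = new LinkedHashMap<>();
                for (FailedItem item : e.failed) {
                    if ("publishing".equals(item.step)) {
                        retry.put(item.eid, pending.get(item.eid));
                    } else {
                        givenUp.add(item.eid); // unlikely to succeed on a retry
                    }
                }
                pending = retry;
                if (!pending.isEmpty() && attempt < maxAttempts) {
                    try {
                        Thread.sleep(delay);
                    } catch (InterruptedException ie) {
                        Thread.currentThread().interrupt(); // preserve the interrupt and stop retrying
                        break;
                    }
                    delay *= 2; // plain exponential back-off, no jitter
                }
            }
        }
        givenUp.addAll(pending.keySet()); // still failing after the last attempt
        return givenUp;
    }
}
```

If strict ordering matters, the same loop could resend the full original batch instead of the `retry` subset, at the cost of possible duplicates on the consumer side.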

diff --git a/fahrschein/src/main/java/org/zalando/fahrschein/NakadiClient.java b/fahrschein/src/main/java/org/zalando/fahrschein/NakadiClient.java
@@ -94,15 +94,15 @@ public List<Partition> getPartitions(String eventName) throws IOException {
      * Writes the given events to the endpoint provided by the eventName.
      *
      * In case of a partial success (or also in cases like validation errors, which are complete failures), Fahrschein
-     * will throw an EventPublishingException with the BatchItemResponses (as returned from Nakadi) for the failed
+     * will throw an {{EventPublishingException}} with the {{BatchItemResponse}}s (as returned from Nakadi) for the failed
      * items in the responses property.
-     * These objects have the eid of the failed event, a publishingStatus (failed/aborted/submitted - but these are
-     * filtered out)), the step where it failed and a detail string.
+     * These objects have the eid of the failed event, a publishingStatus (failed/aborted/submitted - but successful items are
+     * filtered out), the step where it failed and a detail string.
      * If the application sets the eids itself (i.e. doesn't let Nakadi do it) and keeps track of them, this allows it
      * to resend only the failed items later.
-     * It also allows differentiating between validation errors (which likely don't need to be retried, as they are
-     * unlikely to succeed the next time, unless the event type definition is changed) and publishing errors
-     * (which should be retried, possibly with some back-off).
+     * It also allows differentiating between validation errors, which likely don't need to be retried, as they are
+     * unlikely to succeed the next time, unless the event type definition is changed, and publishing errors
+     * which should be retried with some back-off.
      *
      * Recommendation: Implement a retry-with-backoff handler for {{EventPublishingException}}s, which, depending on
      * your ordering consistency requirements, either retries the full batch, or retries the failed events based