This repository has been archived by the owner on May 3, 2024. It is now read-only.

Document Nakadi Publishing failure handling #416

Merged
merged 2 commits into from
Sep 18, 2023

25 changes: 24 additions & 1 deletion README.md
@@ -172,7 +172,7 @@ NakadiClient nakadiClient = NakadiClient.builder(NAKADI_URI, new SimpleRequestFa
.build();
```

## Exception handling
## Retries and exception handling for event consumption

Exception handling while streaming events follows some simple rules

@@ -189,6 +189,29 @@ Fahrschein supports different exponential backoff strategies when streaming even
* `EqualJitterBackoffStrategy` (default) - extends `ExponentialBackoffStrategy` with the same defaults. For each delay it takes half of the delay value and adds the other half multiplied by a random factor [0..1).
* `FullJitterBackoffStrategy` - extends `ExponentialBackoffStrategy` with the same defaults and multiplies each delay by a random factor [0..1).
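
As a rough illustration of how the two jitter variants described above differ, here is a minimal Java sketch. It is not Fahrschein's actual code: the class `JitterSketch`, its method names and the parameter values used with it are hypothetical, and the random factor is passed in explicitly so that the arithmetic is easy to follow.

```java
public class JitterSketch {

    // Plain exponential delay, capped at maxDelayMillis.
    // All parameters are illustrative; they are not the library's defaults.
    public static long exponentialDelay(long initialDelayMillis, double backoffFactor,
                                        long maxDelayMillis, int retries) {
        double delay = initialDelayMillis * Math.pow(backoffFactor, retries);
        return (long) Math.min(delay, (double) maxDelayMillis);
    }

    // Equal jitter: keep one half of the delay, scale the other half
    // by a random factor in [0..1).
    public static long equalJitter(long delayMillis, double random) {
        long fixedHalf = delayMillis / 2;
        long jitteredHalf = delayMillis - fixedHalf;
        return fixedHalf + (long) (jitteredHalf * random);
    }

    // Full jitter: scale the whole delay by a random factor in [0..1).
    public static long fullJitter(long delayMillis, double random) {
        return (long) (delayMillis * random);
    }
}
```

In real use the random factor would come from something like `ThreadLocalRandom.current().nextDouble()`. With equal jitter the delay never drops below half of the exponential value, whereas full jitter can produce delays close to zero.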

## Retries and exception handling for event publishing

Fahrschein does not have sophisticated mechanisms for retry handling when publishing to Nakadi yet. The recommended way
to handle exceptions when publishing is to create a retry-wrapper around the `NakadiClient.publish` method.

In case of a partial success or also in cases like validation errors, which are complete failures, Fahrschein
will throw an `EventPublishingException` with the `BatchItemResponse`s (as returned from Nakadi) for the failed
items in the responses property.

These objects have the eid of the failed event, a `publishingStatus` (failed/aborted/submitted - but successful items are
filtered out), the step where it failed and a detail string.

If the application sets the eids itself (i.e. doesn't let Nakadi do it) and keeps track of them, this allows it

Nakadi does not assign eid to events — the id has to be provided by the event publisher.

Member

Huh, that's new to me. From my experience, when no eid is in the metadata, Nakadi will provide a random UUID. Did this change?

Member

I just tried it, and events without eid are rejected. I guess I mixed this up with some other metadata fields. Sorry.

to resend only the failed items later.

It also allows differentiating between validation errors, which likely don't need to be retried, as they are
unlikely to succeed the next time, unless the event type definition is changed, and publishing errors
which should be retried with some back-off.

Recommendation: Implement a retry-with-backoff handler for `EventPublishingException`s, which, depending on
your ordering consistency requirements, either retries the full batch, or retries the failed events based
on the event-ids.
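
The "retry only the failed events" variant of this recommendation could be sketched roughly as follows. This is a self-contained illustration under stated assumptions, not part of Fahrschein: `Publisher` and `PartialFailureException` are hypothetical stand-ins for a call to `NakadiClient.publish` and for an `EventPublishingException` whose responses carry the eids of the failed items, and `publishWithRetry` is an invented helper name.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.function.Function;

public class PublishRetrySketch {

    /** Hypothetical stand-in for a publish call such as NakadiClient.publish. */
    @FunctionalInterface
    public interface Publisher<T> {
        void publish(List<T> events) throws IOException;
    }

    /** Hypothetical stand-in for an exception that reports the eids of failed items. */
    public static class PartialFailureException extends IOException {
        private final Set<String> failedEids;

        public PartialFailureException(Set<String> failedEids) {
            super("failed items: " + failedEids);
            this.failedEids = failedEids;
        }

        public Set<String> getFailedEids() {
            return failedEids;
        }
    }

    /**
     * Publishes the events and, on a partial failure, resends only the items
     * whose eid was reported as failed, doubling the delay after each attempt.
     * Rethrows the last exception once maxAttempts is reached.
     */
    public static <T> void publishWithRetry(Publisher<T> publisher, List<T> events,
                                            Function<T, String> eidOf, int maxAttempts,
                                            long initialDelayMillis)
            throws IOException, InterruptedException {
        List<T> pending = new ArrayList<>(events);
        long delayMillis = initialDelayMillis;
        for (int attempt = 1; ; attempt++) {
            try {
                publisher.publish(pending);
                return;
            } catch (PartialFailureException e) {
                if (attempt >= maxAttempts) {
                    throw e;
                }
                Set<String> failed = e.getFailedEids();
                // Keep only the events that were reported as failed.
                pending.removeIf(event -> !failed.contains(eidOf.apply(event)));
                Thread.sleep(delayMillis);
                delayMillis *= 2;
            }
        }
    }
}
```

The sketch does not distinguish validation errors from publishing errors; a real handler would probably inspect the step and detail of each failed item and skip retries for validation failures. Resending only a subset can also change the relative order of events, so applications with strict ordering requirements may prefer to retry the full batch instead.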

## Stopping and resuming streams

The stream implementation gracefully handles thread interruption, so it is possible to stop a running thread and resume consuming events by re-submitting the `Runnable`:
17 changes: 17 additions & 0 deletions fahrschein/src/main/java/org/zalando/fahrschein/NakadiClient.java
@@ -5,6 +5,7 @@
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.zalando.fahrschein.domain.Authorization;
import org.zalando.fahrschein.domain.BatchItemResponse;
import org.zalando.fahrschein.domain.Cursor;
import org.zalando.fahrschein.domain.Partition;
import org.zalando.fahrschein.domain.Subscription;
@@ -92,6 +93,22 @@ public List<Partition> getPartitions(String eventName) throws IOException {

/**
* Writes the given events to the endpoint provided by the eventName.
*
* <p>In case of a partial success (or also in cases like validation errors, which are complete failures), Fahrschein
* will throw an {@link EventPublishingException} with the {@link BatchItemResponse}s (as returned from Nakadi) for the failed
* items in the responses property.
* These objects have the eid of the failed event, a publishingStatus (failed/aborted/submitted - but successful items are
* filtered out), the step where it failed and a detail string.
* If the application sets the eids itself (i.e. doesn't let Nakadi do it) and keeps track of them, this allows it
* to resend only the failed items later.
* It also allows differentiating between validation errors, which likely don't need to be retried, as they are
* unlikely to succeed the next time, unless the event type definition is changed, and publishing errors
* which should be retried with some back-off.</p>
*
* <p>Recommendation: Implement a retry-with-backoff handler for {@link EventPublishingException}s, which, depending on
* your ordering consistency requirements, either retries the full batch, or retries the failed events based
* on the event-ids.</p>
*
* @param eventName where the event should be written to
* @param events that should be written
* @param <T> Type of the Event