Pull Request type: Bugfix
Changes in this PR: Improve Workflow Reliability during Gradual Restarts with Optional Redis Idempotence
Description: This PR addresses an issue that occurs when Conductor is deployed across multiple machines, particularly during a gradual (rolling) restart. Previously, if a machine consumed a workflow event from RabbitMQ and saved it to Redis but shut down before processing it, the unacknowledged event was returned to the RabbitMQ queue. Another machine could then re-consume the event, but because the original Redis entry had not yet expired, Conductor incorrectly assumed the event had already been processed elsewhere and acknowledged it to RabbitMQ early. The event was therefore lost and its workflow was never triggered, which made restarts unreliable and hard to predict.
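The sequence above can be reproduced in a small simulation. This is only a sketch: the class `RestartRaceSketch`, the `consume` method, and the in-memory set standing in for Redis are all hypothetical names chosen for illustration, not Conductor's actual classes.

```java
import java.util.HashSet;
import java.util.Set;

public class RestartRaceSketch {
    // In-memory stand-in for the Redis store of event-execution records (assumption).
    static final Set<String> executionStore = new HashSet<>();
    static int workflowsStarted = 0;

    /**
     * Hypothetical consumer step. Returns true if the message is acknowledged.
     * crashBeforeProcessing simulates a machine shutting down after the record is saved.
     */
    static boolean consume(String eventId, boolean crashBeforeProcessing) {
        // Set.add returns false when a record for this event already exists.
        boolean firstSeen = executionStore.add(eventId);
        if (!firstSeen) {
            return true; // treated as a duplicate: acknowledged without processing
        }
        if (crashBeforeProcessing) {
            return false; // never acknowledged, so the broker requeues the message
        }
        workflowsStarted++;
        return true;
    }

    public static void main(String[] args) {
        boolean ackedByA = consume("evt-1", true);  // machine A shuts down mid-flight
        boolean ackedByB = consume("evt-1", false); // machine B re-consumes the requeued message
        System.out.println("ackedByA=" + ackedByA + " ackedByB=" + ackedByB
                + " workflowsStarted=" + workflowsStarted);
        // prints: ackedByA=false ackedByB=true workflowsStarted=0
    }
}
```

Machine B acknowledges the message even though no workflow was ever started, which is the lost-event case described above.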
The Fix: This update introduces a new property, eventExecutionPersistenceEnabled, which offers greater flexibility and control over event handling. When set to false, this property disables the Redis persistence mechanism for idempotence, delegating responsibility for deduplication and idempotence to business services. By default, eventExecutionPersistenceEnabled is set to true, preserving the current behavior for backward compatibility.
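As a rough illustration of how such a flag could gate the deduplication check, consider the sketch below. Only the property name `eventExecutionPersistenceEnabled` comes from this PR; the class `PersistenceFlagSketch`, its methods, and the in-memory set standing in for Redis are assumptions made for the example and do not mirror Conductor's real code.

```java
import java.util.HashSet;
import java.util.Set;

public class PersistenceFlagSketch {
    private final boolean eventExecutionPersistenceEnabled;
    // In-memory stand-in for the Redis event-execution records (assumption).
    private final Set<String> executionStore = new HashSet<>();
    private int processed = 0;

    public PersistenceFlagSketch(boolean eventExecutionPersistenceEnabled) {
        this.eventExecutionPersistenceEnabled = eventExecutionPersistenceEnabled;
    }

    /** Returns true when the event is handed on for processing. */
    public boolean handle(String eventId) {
        // The store is consulted only when persistence is enabled.
        if (eventExecutionPersistenceEnabled && !executionStore.add(eventId)) {
            return false; // a record already exists: skipped as a duplicate
        }
        processed++;
        return true;
    }

    public int processedCount() {
        return processed;
    }

    public static void main(String[] args) {
        PersistenceFlagSketch enabled = new PersistenceFlagSketch(true);
        enabled.handle("evt-1");
        enabled.handle("evt-1"); // redelivery is skipped

        PersistenceFlagSketch disabled = new PersistenceFlagSketch(false);
        disabled.handle("evt-1");
        disabled.handle("evt-1"); // redelivery is processed again

        System.out.println("enabled=" + enabled.processedCount()
                + " disabled=" + disabled.processedCount());
        // prints: enabled=1 disabled=2
    }
}
```

With the flag off, a redelivered event is processed a second time, so the downstream business service has to tolerate or detect duplicates itself.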
Key Benefits:
Enhanced Reliability: With eventExecutionPersistenceEnabled set to false, events redelivered after a machine restart are processed rather than dropped, and business services handle idempotence directly.
Backward Compatibility: Maintains the existing Redis persistence behavior by default, ensuring seamless integration for existing deployments.
Greater Control: Empowers operators to manage idempotence in ways that best suit their distributed environments.
With the property disabled, events that are redelivered during a gradual restart are no longer acknowledged without being processed, provided the consuming business services tolerate duplicate deliveries.