Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PAUSE MIRROR support #605

Merged
merged 9 commits into from
Nov 8, 2023
Merged

PAUSE MIRROR support #605

merged 9 commits into from
Nov 8, 2023

Conversation

heavycrystal
Copy link
Contributor

@heavycrystal heavycrystal commented Nov 2, 2023

PAUSE MIRROR supports both CDC and QRep mirrors. This is implemented by signalling the Temporal workflow. A PauseSignal delivered to the workflow should pause it, while a NoopSignal delivered to the workflow should resume it. Signals that don't make sense for the current workflow state are ignored. Currently supported only via the query layer, not from the UI. The following caveats exist for now:

  1. QRep mirrors cannot be paused during partition reading or replicating, only after a run completes successfully and the signals are read. This is usually not an issue but could be problematic for QRep runs with a lot of partitions are running.
  2. CDC mirrors similarly will only process signals and pause after a pull + sync + normalize cycle is run. Due to 1), a CDC mirror cannot pause during the initial snapshot phase, which is implemented as a QRep mirror internally. This will be marked as unpausable in the UI.

if v == shared.ShutdownSignal {
w.logger.Info("received shutdown signal")
state.ActiveSignal = v
} else if v == shared.PauseSignal {
Copy link
Contributor

@serprex serprex Nov 6, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So PauseSignal triggers flipping pause state? Seems like a source of flakiness compared to having explicit Pause/Unpause signal. But then you'd have to consider what to do when both signals are set. Are signals not able to carry boolean data?

Guess it also doesn't matter since handler manages sequencing, so you handle pause/unpause in order received

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that the current logic is a little flaky, but at the same time I can't think of an obvious way it will fail, since we only read a signal at a time and we only maintain one state variable. Temporal Signals are able to carry serializable data, but carrying a boolean value on the Pause signal isn't very clear IMO.

We could also use explicit Pause and Unpause signals and ignore an illegal one [pausing when already paused] but this would require two RPC endpoints exposed. Another consideration is that the UI can prevent illegal state changes by reading the current state of the workflow, but our query layer has no such ability.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the minute while it's asleep one can imagine someone sending redundant pause signals

@@ -254,6 +279,33 @@ func (q *QRepFlowExecution) waitForNewRows(ctx workflow.Context, lastPartition *
return nil
}

func (q *QRepFlowExecution) signalHandler(v shared.CDCFlowSignal) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is there some way to share this shutdown/pause logic?

Also seems like activeSignal shouldn't be switched from ShutdownSignal to PauseSignal. So should be checking activeSignal to prevent that sort of state transition

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The data structures associated with CDCFlow and QRepFlow are different enough for sharing the logic to be not trivial.

I don't think there's a way to transition from ShutdownSignal to PauseSignal, as we only process one signal at a time and the return from shutdown happens immediately.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could have utility function in shared that takes signal value + active signal value & returns new active signal value

flow/workflows/qrep_flow.go Outdated Show resolved Hide resolved
nexus/server/src/main.rs Outdated Show resolved Hide resolved
@heavycrystal heavycrystal force-pushed the pause-mirror-initial branch 2 times, most recently from ae54c7d to 72a9ce0 Compare November 7, 2023 21:33
flow/workflows/cdc_flow.go Show resolved Hide resolved
flow/workflows/cdc_flow.go Show resolved Hide resolved
flow/workflows/qrep_flow.go Show resolved Hide resolved
@heavycrystal heavycrystal enabled auto-merge (squash) November 8, 2023 15:08
@heavycrystal heavycrystal merged commit 319cde6 into main Nov 8, 2023
9 checks passed
@serprex serprex mentioned this pull request Nov 9, 2023
@serprex serprex deleted the pause-mirror-initial branch July 19, 2024 15:23
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants