Implement ZK Overflow Status Checker and Idempotency Key Management #13496

0xnogo · 2024-06-11T08:07:24Z

Motivation

Improve handling of ZK overflow chains in CCIP
Implement a mechanism to query transaction statuses using the Transaction Manager (TXM): hence avoiding having a shared state between Transmission and Execution Plugin

Solution

Status Checker Implementation:
- Implemented CheckMessageStatus to retrieve transaction statuses.
- Query TXM to get all the transactions sent for a specific messageId.
The method will be used by the Transmission component to obtain the next retry count and construct the IdempotencyKey.
The Execution plugin will use this method to retrieve all transaction statuses and snooze a message if any status is Fatal (indicating ZK overflow).
Transmission changes are backward compatible with non-ZK messages. A unique IdempotencyKey is set for each batch, ensuring the TXM submits the transaction without a no-op. This change does not impact other products as it checks if TxMeta contains the MessageIDs field (specific to CCIP) and if the status checker is set. Returning a nil IdempotencyKey maintains the existing behavior.

Notes

Need to wait for Add transaction status function for external components to use #13040 to be merged first

core/services/relay/evm/statuschecker/txm_status_checker.go

makramkd · 2024-06-12T13:33:41Z

core/services/relay/evm/statuschecker/txm_status_checker_test.go

+// MockTxManager is a mock implementation of TxManager
+type MockTxManager struct {
+	mock.Mock
+}
+
+func (m *MockTxManager) GetTransactionStatus(ctx context.Context, transactionID string) (TransactionStatus, error) {
+	args := m.Called(ctx, transactionID)
+	return args.Get(0).(TransactionStatus), args.Error(1)
+}


You can generate this using mockery, avoid hand-writing mocks (at least in this repo)

This is supposed to mock a TXM object. As the PR is not ready yet decided to mock my fake TxManager. I think we should use the TXM mock present in TXM mock here

core/services/relay/evm/statuschecker/txm_status_checker.go

makramkd · 2024-06-12T13:35:54Z

core/services/relay/evm/statuschecker/txm_status_checker.go

+		transactionID := fmt.Sprintf("%s-%d", msgID, counter)
+		status, err := tsc.txManager.GetTransactionStatus(ctx, transactionID)
+		if err != nil {
+			if status == Unknown && strings.Contains(err.Error(), fmt.Sprintf("failed to find transaction with IdempotencyKey %s", transactionID)) {


If err != nil then status may not have a usable value, or at least, shouldn't. Lets make sure of the API contract here because this is kinda unidiomatic code

https://github.com/smartcontractkit/chainlink/pull/13040/files

func (b *Txm[CHAIN_ID, HEAD, ADDR, TX_HASH, BLOCK_HASH, R, SEQ, FEE]) GetTransactionStatus(ctx context.Context, transactionID string) (status commontypes.TransactionStatus, err error) { // Loads attempts and receipts in the transaction tx, err := b.txStore.FindTxWithIdempotencyKey(ctx, transactionID, b.chainID) if err != nil { return status, fmt.Errorf("failed to find transaction with IdempotencyKey %s: %w", transactionID, err) } // This check is required since a no-rows error returns nil err if tx == nil { return status, fmt.Errorf("failed to find transaction with IdempotencyKey %s", transactionID) } switch tx.State { case TxUnconfirmed, TxConfirmedMissingReceipt: // Return unconfirmed for ConfirmedMissingReceipt since a receipt is required to determine if it is finalized return commontypes.Unconfirmed, nil case TxConfirmed: // TODO: Check for finality and return finalized status // Return unconfirmed if tx receipt's block is newer than the latest finalized block return commontypes.Unconfirmed, nil case TxFatalError: // Use an ErrorClassifier to determine if the transaction is considered Fatal txErr := b.newErrorClassifier(tx.GetError()) if txErr != nil && txErr.IsFatal() { return commontypes.Fatal, tx.GetError() } // Return failed for all other tx's marked as FatalError return commontypes.Failed, tx.GetError() default: // Unstarted and InProgress are classified as unknown since they are not supported by the ChainWriter interface return commontypes.Unknown, nil } }

Status will always return the default value Unknown in this case. But I agree that this check is useless. We could just rely on the error check.

makramkd · 2024-06-12T13:38:39Z

core/services/relay/evm/statuschecker/txm_status_checker.go

+		counter++
+	}
+
+	return allStatuses, counter - 1, nil


Do we need all statuses or just the latest is good enough? Need all right?

Does it make sense to put a cap (even if its like 1,000) on the # of statuses to return to avoid getting into an infinite loop here?

We probably would want to skip that msg anyway if it was retried 1000s of times

Do we need all statuses or just the latest is good enough? Need all right?

Yes we would need all. The main scenario is a message (mi-0) submitted at round r0 and only gets resolved at r2. if we fetch the status mi-2 we won't know what the message is a zko has only mi-0 resolved

Does it make sense to put a cap (even if its like 1,000) on the # of statuses to return to avoid getting into an infinite loop here?

Good point. And yes, we should probably return an error and consider that something is wrong with the message. So returning an error here will signal the batching logic to snooze it.

Rough math:
deltaRound of 30s and 1000 retries so that 1000*30/3600 = 8h.

Say the chain has an outage, we will allow up to 8 hours of outage before dropping the message. It is unlikely to happen but it is leaving us enough room

core/services/ocrcommon/transmitter.go

0xnogo · 2024-06-14T09:02:28Z

Closing this PR in favor of smartcontractkit/ccip#947

0xnogo added 3 commits June 10, 2024 21:22

status checker

64990ca

transmitter zk logic

e58adbb

update transmitter for exec plugin

e16f2a0

0xnogo temporarily deployed to sdlc June 11, 2024 08:07 — with GitHub Actions Inactive

improving comments

b5d69a6

0xnogo temporarily deployed to sdlc June 11, 2024 08:13 — with GitHub Actions Inactive

0xnogo mentioned this pull request Jun 11, 2024

Zk Overflow in Execution Plugin smartcontractkit/ccip#947

Merged

add changeset

38b0b02

0xnogo temporarily deployed to sdlc June 11, 2024 08:46 — with GitHub Actions Inactive

0xnogo changed the title ~~status checker~~ Implement ZK Overflow Status Checker and Idempotency Key Management Jun 11, 2024

0xnogo requested a review from a team June 11, 2024 09:20

fix: go linting

9af0c07

0xnogo temporarily deployed to sdlc June 11, 2024 11:27 — with GitHub Actions Inactive

0xnogo marked this pull request as ready for review June 11, 2024 11:36

0xnogo requested a review from a team as a code owner June 11, 2024 11:36

makramkd reviewed Jun 12, 2024

View reviewed changes

0xnogo closed this Jun 14, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Implement ZK Overflow Status Checker and Idempotency Key Management #13496

Implement ZK Overflow Status Checker and Idempotency Key Management #13496

0xnogo commented Jun 11, 2024 •

edited

Loading

makramkd Jun 12, 2024

0xnogo Jun 13, 2024

makramkd Jun 12, 2024

0xnogo Jun 13, 2024

0xnogo Jun 13, 2024

makramkd Jun 12, 2024

makramkd Jun 12, 2024

makramkd Jun 12, 2024

0xnogo Jun 13, 2024

0xnogo Jun 13, 2024

0xnogo Jun 13, 2024

0xnogo commented Jun 14, 2024

Implement ZK Overflow Status Checker and Idempotency Key Management #13496

Implement ZK Overflow Status Checker and Idempotency Key Management #13496

Conversation

0xnogo commented Jun 11, 2024 • edited Loading

Motivation

Solution

Notes

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

0xnogo commented Jun 14, 2024

0xnogo commented Jun 11, 2024 •

edited

Loading