Ks 119/remote transmission protocol #13293
Conversation
```go
	return nil, fmt.Errorf("failed to marshal capability request: %w", err)
}

deterministicMessageID := sha256.Sum256(rawRequest)
```
Your unique identifier should be (workflowID, executionID). That is available inside req.Metadata - see engine.go:executeStep().
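A minimal sketch of what that could look like, assuming the metadata fields are named WorkflowID and WorkflowExecutionID (check engine.go:executeStep() for the authoritative names):

```go
// Sketch: build the deterministic ID from workflow metadata instead of
// hashing the raw payload. Field names are assumptions, not verified here.
func messageID(req commoncap.CapabilityRequest) string {
	return req.Metadata.WorkflowID + "::" + req.Metadata.WorkflowExecutionID
}
```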
```go
}

select {
case <-responseReceived:
```
Execute shouldn't block on that.
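One non-blocking shape this could take, purely as a sketch: Execute returns its response channel immediately and a goroutine does the waiting. The responseReceived field and the exact channel types are illustrative assumptions.

```go
// Sketch: Execute returns immediately; a goroutine forwards the eventual response.
func (c *remoteTargetCaller) Execute(ctx context.Context, req commoncap.CapabilityRequest) (<-chan commoncap.CapabilityResponse, error) {
	out := make(chan commoncap.CapabilityResponse, 1)
	go func() {
		defer close(out)
		select {
		case resp := <-c.responseReceived: // hypothetical per-request channel
			out <- resp
		case <-ctx.Done(): // caller gave up; don't leak the goroutine
		}
	}()
	return out, nil
}
```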
```go
}

func (r *remoteTargetReceiver) Receive(msg *types.MessageBody) {
	// TODO should the dispatcher be passing in a context?
```
Good question. I thought about improving goroutine management in the Dispatcher; let me think about it.
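One option, sketched under the assumption that the Dispatcher owns a shutdown context it can thread through (this is not the current interface):

```go
// Sketch (assumed interface change): Receive takes the dispatcher's context
// so handlers can stop blocking on shutdown.
func (r *remoteTargetReceiver) Receive(ctx context.Context, msg *types.MessageBody) {
	select {
	case r.msgCh <- msg: // hand off to the processing loop (msgCh is hypothetical)
	case <-ctx.Done(): // dispatcher shutting down; drop the message
	}
}
```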
```go
executeReq.fromPeers[sender] = true
minRequiredRequests := int(callerDon.F + 1)
if len(executeReq.fromPeers) >= minRequiredRequests {
```
You can try leveraging the messageCache object.
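Roughly the behavior the messageCache encapsulates, with method signatures assumed for illustration (see the real MessageCache in core/capabilities/remote):

```go
// Sketch of the counting the messageCache can take over: one message per peer
// under a shared event ID, firing once F+1 distinct peers have delivered.
// Insert/Ready signatures here are assumptions.
func onExecuteRequest(cache *messagecache.MessageCache[string, p2ptypes.PeerID], reqID string, sender p2ptypes.PeerID, ts int64, payload []byte, f uint32) bool {
	cache.Insert(reqID, sender, ts, payload)
	ready, _ := cache.Ready(reqID, f+1, 0, true)
	return ready // true once F+1 distinct callers delivered matching payloads
}
```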
```go
	return fmt.Errorf("failed to get peer ID to transmission delay: %w", err)
}

for peerID, delay := range peerIDToDelay {
```
It's interesting that you put the strategy inside the shim. I initially thought it would live outside of it, but maybe this is better. Let me think about it more.
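For comparison, the "outside the shim" version might look something like this hypothetical interface (nothing in the repo defines it):

```go
// Hypothetical: lift the transmission schedule out of the shim so scheduling
// policies can be swapped without touching transport code.
type TransmissionStrategy interface {
	// Schedule returns the delay to apply before transmitting to each peer.
	Schedule(msgID string, peers []p2ptypes.PeerID) (map[p2ptypes.PeerID]time.Duration, error)
}
```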
```go
}

func (c *remoteTargetCaller) RegisterToWorkflow(ctx context.Context, request commoncap.RegisterToWorkflowRequest) error {
	return errors.New("not implemented")
```
Could these just be no-ops for targets?
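The suggested no-op would just be (sketch):

```go
// Sketch: registration carries no state for targets, so succeed silently.
func (c *remoteTargetCaller) RegisterToWorkflow(_ context.Context, _ commoncap.RegisterToWorkflowRequest) error {
	return nil // no-op for targets
}
```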
```go
	}
}

func (e *remoteTargetCapabilityRequest) receive(ctx context.Context, msg *types.MessageBody) error {
```
Did you consider using MessageCache for this logic? It would be nice to implement similar behaviors in a consistent way across all remote capabilities.
```go
	return
}

// A request is uniquely identified by the message id and the hash of the payload
```
This sounds risky. A message to a target should be identified by (CallerDonID, WorkflowExecutionID) - available in the metadata. WorkflowExecutionID is something that all nodes in the workflow DON reached consensus on. If we track things only by payload hashes and we have multiple buggy or malicious nodes, it will be very hard for us to make sense of any metrics. And you also need scoping to caller DON.
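A sketch of that keying, with field names assumed from the comment above (CallerDonId on the message, WorkflowExecutionID in the metadata):

```go
// Sketch: scope the receiver's request registry by caller DON and execution ID
// rather than a payload hash. Field names are assumptions.
type requestKey struct {
	callerDonID         uint32
	workflowExecutionID string
}

func keyFor(callerDonID uint32, meta commoncap.RequestMetadata) requestKey {
	return requestKey{callerDonID: callerDonID, workflowExecutionID: meta.WorkflowExecutionID}
}
```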
@@ -0,0 +1,186 @@

```go
package target
```
inconsistent file names - rename to receiver_request.go
```go
	requestIDToExecuteRequest: make(map[string]*callerExecuteRequest),
}

go func() {
```
Can we convert to services.Service style with Start()/Stop() for consistency with other objects that launch their own goroutines?
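A sketch of that conversion using the StartOnce/StopOnce helpers from chainlink-common's services package (the embedding and loop name are assumptions):

```go
// Sketch: move the constructor's `go func()` behind Start()/Close().
type remoteTargetCaller struct {
	services.StateMachine
	stopCh services.StopChan
	wg     sync.WaitGroup
}

func (c *remoteTargetCaller) Start(_ context.Context) error {
	return c.StartOnce("remoteTargetCaller", func() error {
		c.wg.Add(1)
		go func() {
			defer c.wg.Done()
			c.expiryLoop() // hypothetical: the loop currently launched in the constructor
		}()
		return nil
	})
}

func (c *remoteTargetCaller) Close() error {
	return c.StopOnce("remoteTargetCaller", func() error {
		close(c.stopCh)
		c.wg.Wait()
		return nil
	})
}
```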
```go
if msg.Error != types.Error_OK {
	c.lggr.Warnw("received error response for pending request", "requestID", requestID, "sender", sender, "receiver", msg.Receiver, "error", msg.Error)
	return
```
Why not aggregate them like successful responses? If all remote nodes return the same error, shouldn't we also pass it back to the underlying caller?
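Sketched, the error path could mirror the success path: count identical error codes and complete the request once a quorum agrees. The errorCount, requiredCount, and respCh fields below are assumptions, as is CapabilityResponse carrying an Err field.

```go
// Sketch: aggregate identical errors like successful responses and surface
// the error to the caller once enough nodes report it.
func (e *remoteTargetCapabilityRequest) onErrorResponse(msg *types.MessageBody) {
	e.errorCount[msg.Error]++
	if e.errorCount[msg.Error] >= e.requiredCount {
		e.respCh <- commoncap.CapabilityResponse{
			Err: fmt.Errorf("remote nodes returned error: %s", msg.Error),
		}
	}
}
```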
SonarQube: Quality Gate failed. See analysis details on SonarQube.
Very much still a WIP - created just to facilitate an early review meeting with Bolek