Build customerd architecture with an RPC interface #347

marsella · 2022-01-11T23:23:44Z

Motivation

The customer architecture was originally created to be completely ephemeral: it would execute an operation (like a payment) and then cease to exist. However, this was an incorrect assumption. The customer needs a chain watcher to be running at any time that it has an open channel, to monitor for closing behaviors.

We added a long-running watcher process, but this is not tightly linked to other customer processes. For example, it's possible to create a new channel without first starting the watcher, which is a protocol violation and can result in loss of funds #241. Instead, there should be a customer daemon, without which no ephemeral customer operations can be executed.

Goal

customerd is a long-running daemon process that runs in the background. It holds a database connection, maintains a chain watcher, and executes a queue of customer operations as they are requested.

Customer operations are initiated on the command line. They cannot run if the customerd is not initialized. They communicate with customerd using an RPC protocol.
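A minimal sketch of the "cannot run without the daemon" rule, assuming customerd listens on a loopback TCP port. The helper name `connect_to_daemon`, the one-second timeout, and the stand-in listener are all hypothetical and are not part of the existing code:

```rust
use std::io;
use std::net::{SocketAddr, TcpListener, TcpStream};
use std::time::Duration;

/// Hypothetical helper: try to reach the daemon before doing anything else.
/// An ephemeral operation would refuse to run if this returns an error.
fn connect_to_daemon(addr: SocketAddr) -> io::Result<TcpStream> {
    TcpStream::connect_timeout(&addr, Duration::from_secs(1))
}

fn main() -> io::Result<()> {
    // Stand-in for customerd: bind an OS-assigned port on loopback only.
    let listener = TcpListener::bind("127.0.0.1:0")?;
    let addr = listener.local_addr()?;

    match connect_to_daemon(addr) {
        Ok(_stream) => println!("daemon reachable, operation may proceed"),
        Err(err) => println!("daemon not running, refusing to run: {err}"),
    }
    Ok(())
}
```

Failing at connection time means an operation like opening a channel cannot start unless the process that owns the chain watcher is up.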

The RPC protocol itself is a specification that describes the ways other processes can communicate with the daemon. Each current command line operation will have a corresponding well-defined request. These will need to be documented, probably in zkchannels-spec. There will be an additional command to kill the daemon and shut down the chain watcher.

Advantages of this approach include

Fixes Only allow new channels if the customer has a chain watcher up #241
Adds consistency of internal state, like the database connection or the configuration details, across ephemeral operations
Prevents some order-of-operations bugs, like only processing expiry close after all running payments have completed

Existing work

There's a (commented-out) daemon architecture in the customer watcher. This was designed to be a ping-only daemon that would check the chain when it got a ping from another function. The ping interaction is a Dialectic protocol. It sets up a Server, broadcasts on localhost, and refreshes the watcher on receiving a request.

I think the reusable part of this architecture is the Server. It was refactored in #146 to remove the TLS requirement, so requests can come in on the local network (the network options are now either TLS or TCP).

The server is parameterized by a dialectic protocol.

Next steps

Figure out the difference between TLS and TCP, and determine whether we need a different IoStream for this application
Determine what the dialectic protocol for an RPC interaction looks like (just Request -> Response <- ?)
Queuing protocol: sketch out some options to achieve the goal of executing operations as they arrive, but also checking the chain every 1 minute
Draft the RPC request / response spec
Modify the watch command to set up a simple server in addition to the polling service
Update a simple command (list?) to send an RPC request instead of executing itself


marsella · 2022-01-13T22:10:11Z

TCP or TCP+TLS?

TCP is a communication protocol that describes how to move information from point A to point B. TLS is an encryption protocol. There's no impediment to using TLS with JSON-RPC. However, TLS requires valid certificates approved by a certificate authority.

Most processes would be accessing the daemon from the same machine, so it's a bit of a weird trust model to require external validation of your local daemon -- if an attacker can impersonate and run a daemon on your local machine, you probably have bigger problems.

The tradeoffs are less clear to me for processes that access the daemon from a different machine on the same LAN, or for a scenario where we configure the daemon to be accessible from outside the LAN. I am willing to make the assumption that such access should not be possible, and use unencrypted communication (TCP only) for the daemon for now.

marsella · 2022-01-13T22:26:36Z

JSON-RPC Dialectic protocol

The JSON-RPC spec is straightforward: the client must send a Request. The server must respond with a Response, or with nothing if the Request is a notification type.

Then the dialectic protocol is probably going to be

Customer chooses from two options (Request or Notification)
If Notification, the customer sends a Notification
If Request, the customer sends a Request and receives a Response
Close the connection

The response will either have a result or an error, but I think it makes sense to encode this locally (like, parse a Response type to get a Result<RpcResult, RpcError>) rather than as a second choice in the RPC network protocol.
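A rough sketch of that local encoding, using only the standard library. The struct layouts are assumptions for illustration (a real implementation would deserialize from JSON), and mapping a malformed response to the JSON-RPC "Internal error" code -32603 is my choice here, not something decided in this issue:

```rust
/// Error object carried by a failed JSON-RPC response.
#[derive(Debug, Clone, PartialEq)]
struct RpcError {
    code: i64,
    message: String,
}

/// Hypothetical stand-in for a successful result payload.
#[derive(Debug, Clone, PartialEq)]
struct RpcResult(String);

/// Wire-level response: both fields are optional here, mirroring the
/// JSON object, where exactly one of them should be present.
#[derive(Debug, Clone, PartialEq)]
struct Response {
    id: u64,
    result: Option<RpcResult>,
    error: Option<RpcError>,
}

impl Response {
    /// Collapse the wire format into a Result on the client side, so the
    /// network protocol does not need a second choice.
    fn into_result(self) -> Result<RpcResult, RpcError> {
        match (self.result, self.error) {
            (Some(result), None) => Ok(result),
            (None, Some(error)) => Err(error),
            // Both or neither present: treat as malformed (assumed policy).
            _ => Err(RpcError {
                code: -32603,
                message: "malformed response".to_string(),
            }),
        }
    }
}

fn main() {
    let ok = Response { id: 1, result: Some(RpcResult("paid".into())), error: None };
    println!("{:?}", ok.into_result());
}
```

From the protocol's point of view there is still a single Response message; the success/failure split only exists after parsing.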

I am not sure that this will be compatible with standard (non-Dialectic) RPC servers. The response-parsing is, but the initial choice is not. I think this issue has already been raised in Dialectic. However, I am willing to make this compromise in order to not re-write the server from scratch. It may be the case that our spec will not use any Notifications, in which case we can be fully compatible.

marsella · 2022-01-14T20:22:05Z

Queuing protocols

With the current chain-watching infrastructure, we want the daemon to check the chain every 1 minute, and we want it to process other requests sent in via RPC. Eventually, the chain watcher should get updated to a notification service, where the daemon should receive push notifications and react to them as they arrive.

In general, requests cannot be parallelized (e.g. you can only execute one payment at a time). So the request queue should just be a normal queue, maybe holding futures, and then we can await them in order. When requests come in, they go at the back of the queue. When chain-watching steps come in, they go at the front. This could still cause chain-watching steps to get highly delayed, though, if e.g. the previous thing on the queue involves posting to chain and it takes 20 minutes.
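For reference, the front/back insertion rule described above could look roughly like this. `Job` and `JobQueue` are hypothetical names, and the sketch holds plain values rather than futures:

```rust
use std::collections::VecDeque;

/// Hypothetical unit of work for the daemon.
#[derive(Debug, PartialEq)]
enum Job {
    ChainCheck,
    Request(String),
}

/// Single queue: chain checks jump to the front, requests go to the back.
struct JobQueue {
    jobs: VecDeque<Job>,
}

impl JobQueue {
    fn new() -> Self {
        JobQueue { jobs: VecDeque::new() }
    }

    fn submit(&mut self, job: Job) {
        if matches!(job, Job::ChainCheck) {
            self.jobs.push_front(job);
        } else {
            self.jobs.push_back(job);
        }
    }

    /// Jobs are taken from the front and run one at a time.
    fn next(&mut self) -> Option<Job> {
        self.jobs.pop_front()
    }
}

fn main() {
    let mut queue = JobQueue::new();
    queue.submit(Job::Request("pay".to_string()));
    queue.submit(Job::ChainCheck);
    while let Some(job) = queue.next() {
        println!("running {:?}", job);
    }
}
```

The weakness mentioned above is visible here: a chain check only reaches the front of the queue, so it still waits behind whatever job is already executing.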

Instead, the queue should just run as a separate task, like the merchant server. In this architecture, we spawn two tasks, one with a running server and one with a looping polling service. Both of these are expected to run forever, unless they encounter an error. If one of them raises a fatal error or if the server gets a "kill" request, they both shut down (in particular, the function ends without waiting for the uncompleted task to terminate).

With this architecture, we don't need any kind of special queue algorithm. It's just a standard FIFO queue.
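A sketch of the two-task shape, with std threads standing in for async tasks so the example has no dependencies. `run_daemon`, `Exit`, and the closures are hypothetical; the real server and polling loop would replace them:

```rust
use std::sync::mpsc;
use std::thread;
use std::time::Duration;

/// Which of the two long-running tasks ended first.
#[derive(Debug, PartialEq)]
enum Exit {
    ServerStopped,
    PollerStopped,
}

/// Spawn the server and the polling loop, then return as soon as either one
/// ends. The other thread is detached and is not joined.
fn run_daemon<F, G>(serve: F, poll: G) -> Exit
where
    F: FnOnce() + Send + 'static,
    G: FnOnce() + Send + 'static,
{
    let (tx, rx) = mpsc::channel();
    let server_tx = tx.clone();
    thread::spawn(move || {
        serve();
        let _ = server_tx.send(Exit::ServerStopped);
    });
    thread::spawn(move || {
        poll();
        let _ = tx.send(Exit::PollerStopped);
    });
    // Blocks until the first task reports that it has finished.
    rx.recv().expect("at least one task reports its exit")
}

fn main() {
    let exit = run_daemon(
        // Stand-in for a server that receives a "kill" request after 50 ms.
        || thread::sleep(Duration::from_millis(50)),
        // Stand-in for a polling loop that would otherwise keep running.
        || thread::sleep(Duration::from_secs(60)),
    );
    println!("daemon shutting down: {:?}", exit);
}
```

When `main` returns, the process exits and the remaining thread goes with it, which matches the "function ends without waiting" behavior described above.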

marsella · 2022-01-18T22:35:29Z

Server

Upon further reflection, I think we can't use our Server code until Dialectic fixes the self-documenting choice issue. We need to be able to reject Notifications (even if they aren't allowed in our protocol) by closing the channel, but Dialectic won't allow that without another choice message. A normal RPC client won't know what to do with that choice.

Next step: determine what the server in json-rpc provides. Look at other options for Rust JSON-RPC libraries and see what they provide. Edit this comment.

marsella self-assigned this Jan 11, 2022

marsella added the Epic label Jan 11, 2022
