v3 docs updates (#150)
agouin authored May 16, 2023
1 parent cf9a0ce commit eda0df9
Showing 1 changed file with 66 additions and 44 deletions.
110 changes: 66 additions & 44 deletions docs/migrating.md
@@ -19,20 +19,20 @@ This document will describe a migration from a "starting system" to a 2-of-3 mul
- Sentries: 3x VM w/ 4 CPU, 16GB RAM, 500GB SSD storage running fully synced chain daemon
- These chain daemons should only expose the `:26656` (p2p) port to the open internet
- The daemons will need to expose `:1234` (priv validator port) to the `horcrux` nodes, but not to the open internet
- Signers: 3x VM w/ 1 CPU, 1 GB RAM, 20 GB SSD storage running `horcux`
- Signers: 3x VM w/ 1 CPU, 1 GB RAM, 20 GB SSD storage running `horcrux`
- These nodes should not expose any ports to the open internet and should only connect with the sentries

## Migration Steps

### 1. Setup Full Nodes
### 1. Setup Chain Nodes

The first step to the migration is to sync the full nodes you will be using as sentries. To follow this guide, ensure that you have 3 nodes from the chain you are validating on which are synced. Follow the instructions for the individual chain for spinning up those nodes. This is the part of setting up `horcrux` that takes the longest.
The first step to the migration is to sync the chain nodes (also known as full nodes) that you will be using as sentries. To follow this guide, ensure that you have nodes from the chain you are validating on that are in sync with the latest height of the chain. You can validate with a minimum of one sentry node, but more are recommended for redundancy/availability. We will use three chain nodes for this example. Follow the instructions for the individual chain for spinning up those nodes. This is the part of setting up `horcrux` that takes the longest.

> **NOTE:** This is also a great use case for [state sync](https://blog.cosmos.network/cosmos-sdk-state-sync-guide-99e4cf43be2f) or one of the existing [quick sync services](https://quicksync.io/).
### 2. Setup Signer Nodes

To setup the signer nodes, start by recording the private IPs for each of the signer and sentry nodes. Order matters, and you will need these values to configure the signers. Make a table like so:
To setup the signer nodes, start by recording the private IPs or DNS hostnames for each of the signer and sentry (chain) nodes. Order matters, and you will need these values to configure the signers. Make a table like so:

```bash
# EXAMPLE
@@ -45,35 +45,36 @@ signer-2: 10.168.1.2
signer-3: 10.168.1.3
```

When installing `horcrux` we recommend using the prebuilt binaries from the [releases page](https://github.com/strangelove-ventures/horcrux/releases). Pick the release corresponding to the `tendermint` dependency for the `go.mod` of your chain binary. You should be able to get this with `{binary} version --long`. Install like so:
When installing `horcrux` we recommend using either the [container image](https://github.com/strangelove-ventures/horcrux/pkgs/container/horcrux) or the [prebuilt binary](https://github.com/strangelove-ventures/horcrux/releases) for the latest stable release.

The image or binary will be used on each cosigner (bare virtual machine, docker container, kubernetes pod, etc.)
The binary should also be installed on your local machine for working with the config and key files before distributing to the cosigner nodes.

Run the following on your local machine. If you are using the binary on the cosigners rather than container image, run this on each cosigner node VM also.
```bash
# On each signer VM
$ wget https://github.com/strangelove-ventures/horcrux/releases/download/v2.1.0/horcrux_2.1.0_linux_amd64.tar.gz
$ tar -xzf horcrux_2.1.0_linux_amd64.tar.gz
$ sudo mv horcrux /usr/bin/horcrux && rm horcrux_2.1.0_linux_amd64.tar.gz README.md LICENSE.md
TAG=v3.0.0
$ wget https://github.com/strangelove-ventures/horcrux/releases/download/${TAG}/horcrux_${TAG}_linux_amd64.tar.gz
$ tar -xzf horcrux_${TAG}_linux_amd64.tar.gz
$ sudo mv horcrux /usr/bin/horcrux && rm horcrux_${TAG}_linux_amd64.tar.gz README.md LICENSE.md
```

Once the binary is installed in `/usr/bin`, install the `systemd` unit file. You can find an [example here](./horcrux.service):
For each cosigner node (not required on local machine): once the binary is installed in `/usr/bin`, install the `systemd` unit file. You can find an [example here](./horcrux.service):

```bash
# On each signer VM
# On each horcrux cosigner
$ sudo nano /etc/systemd/system/horcrux.service
# copy file contents and modify to fit your environment
$ sudo systemctl daemon-reload
```

After that is done, initialize the configuration for the cosigners using the `horcrux` cli. If you would like different cosigners to connect to different sentry node(s), modify the `--node` flag values for each cosigner.
After that is done, initialize the shared configuration for the cosigners on your local machine using the `horcrux` cli. If you would like different cosigners to connect to different sentry node(s): repeat this command and modify the `--node` flag values for each cosigner, or modify the config after the initial generation.

```bash
# Run this command to generate a config file that can be used on all cosigner nodes.


$ horcrux config init --node "tcp://10.168.0.1:1234" --node "tcp://10.168.0.2:1234" --node "tcp://10.168.0.3:1234" --cosigner "tcp://10.168.1.1:2222" --cosigner "tcp://10.168.1.2:2222" --cosigner "tcp://10.168.1.3:2222" --threshold 2 --grpc-timeout 1000ms --raft-timeout 1000ms
```

> **Note**
> Note the use of multiple `--node` and `--cosigner` flags. In this example, there are 3 sentry(chain) nodes that each horcrux cosigner will connect to. There are 3 horcrux cosigners, with a threshold of 2 cosigners required to sign a valid block signature.
> Note the use of multiple `--node` and `--cosigner` flags. In this example, there are 3 sentry (chain) nodes that each horcrux cosigner will connect to. There are 3 horcrux cosigners, with a threshold of 2 cosigners required to sign a valid block signature.
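
As a rough sketch of how the repeated flags relate to the IP table, the command can be assembled from two lists and printed for review before running it. The addresses are the example values from this guide, and the function name `build_config_init` is only illustrative; the flags themselves are the ones shown in the command above.

```bash
#!/usr/bin/env bash
# Sketch: assemble the `horcrux config init` invocation from the IP table.
# The addresses are the example values from this guide; replace them with your own.
SENTRIES=(10.168.0.1 10.168.0.2 10.168.0.3)   # sentry (chain) nodes, priv validator port 1234
COSIGNERS=(10.168.1.1 10.168.1.2 10.168.1.3)  # horcrux cosigners, port 2222
THRESHOLD=2

# build_config_init is a hypothetical helper name, not part of horcrux.
build_config_init() {
  local cmd=(horcrux config init)
  local ip
  for ip in "${SENTRIES[@]}"; do
    cmd+=(--node "tcp://${ip}:1234")
  done
  for ip in "${COSIGNERS[@]}"; do
    cmd+=(--cosigner "tcp://${ip}:2222")
  done
  cmd+=(--threshold "${THRESHOLD}" --grpc-timeout 1000ms --raft-timeout 1000ms)
  # Print the command for review instead of running it.
  echo "${cmd[*]}"
}

build_config_init
```

Printing rather than executing keeps the step reviewable: one `--node` flag should appear per sentry and one `--cosigner` flag per cosigner, in the same order as the table.
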
#### Flags

@@ -87,29 +88,46 @@ $ horcrux config init --node "tcp://10.168.0.1:1234" --node "tcp://10.168.0.2:12
> **Warning**
> SINGLE-SIGNER MODE SHOULD NOT BE USED FOR MAINNET! Horcrux single-signer mode does not give the level of improved key security and fault tolerance that Horcrux MPC/cosigner mode provides. While it is a simpler deployment configuration, single-signer should only be used for experimentation as it is not officially supported by Strangelove.
### 3. Split `priv_validator_key.json` and distribute key material

> **CAUTION:** **The security of any key material is outside the scope of this guide. The suggested procedure here is not necessarily the one you will use. We aim to make this guide easy to understand, not necessarily the most secure. The tooling here is all written in go and can be compiled and used in an airgapped setup if needed. Please open issues if you have questions about how to fit `horcrux` into your infra.**
### 3. Generate cosigner communication encryption keys

On some computer that contains your `priv_validator_key.json` create a folder to split the key through the following command. This may take a moment to complete:
Horcrux uses RSA 4096 keys to encrypt cosigner-to-cosigner p2p communication. This is done by encrypting the payloads that are sent over GRPC between cosigners. Open your shell to a working directory and generate the RSA keys that will be used on each cosigner using the `horcrux` CLI on your local machine.

```bash
$ ls
priv_validator_key.json

$ horcrux create-ed25519-shards --chain-id cosmoshub-4 --key-file priv_validator_key.json --threshold 2 --shards 3
Created Ed25519 Shard cosigner_1/cosmoshub-4_shard.json
Created Ed25519 Shard cosigner_2/cosmoshub-4_shard.json
Created Ed25519 Shard cosigner_3/cosmoshub-4_shard.json

$ horcrux create-rsa-shards --shards 3
Created RSA Shard cosigner_1/rsa_keys.json
Created RSA Shard cosigner_2/rsa_keys.json
Created RSA Shard cosigner_3/rsa_keys.json

$ ls -R
.:
cosigner_1 cosigner_2 cosigner_3 priv_validator_key.json
cosigner_1 cosigner_2 cosigner_3

./cosigner_1:
rsa_keys.json

./cosigner_2:
rsa_keys.json

./cosigner_3:
rsa_keys.json
```

### 4. Shard `priv_validator_key.json` for each chain.

> **CAUTION:** **The security of any key material is outside the scope of this guide. The suggested procedure here is not necessarily the one you will use. We aim to make this guide easy to understand, not necessarily the most secure. This guide assumes that your local machine is a trusted computer. The tooling here is all written in go and can be compiled and used in an airgapped setup if needed. Please open issues if you have questions about how to fit `horcrux` into your infra.**
Horcrux uses threshold Ed25519 cryptography to sign a block payload on the cosigners and combine the resulting signatures to produce a signature that can be validated against your validator's Ed25519 public key. On your local machine which contains your full `priv_validator_key.json` key file(s), shard the key using the `horcrux` CLI in the same working directory as the previous command.

```bash
$ horcrux create-ed25519-shards --chain-id cosmoshub-4 --key-file /path/to/cosmoshub/priv_validator_key.json --threshold 2 --shards 3
Created Ed25519 Shard cosigner_1/cosmoshub-4_shard.json
Created Ed25519 Shard cosigner_2/cosmoshub-4_shard.json
Created Ed25519 Shard cosigner_3/cosmoshub-4_shard.json

$ ls -R
.:
cosigner_1 cosigner_2 cosigner_3

./cosigner_1:
cosmoshub-4_shard.json rsa_keys.json
@@ -121,13 +139,15 @@ cosmoshub-4_shard.json rsa_keys.json
cosmoshub-4_shard.json rsa_keys.json
```

The files need to be moved to their corresponding signer nodes in the `~/.horcrux/` directory. It is important to make sure the files for the cosigner `{id}` (in `cosigner_{id}`) are placed on the corresponding cosigner node. If not, the cluster will not produce valid signatures. If you have named your nodes with their index as the signer index, as in this guide, this operation should be easy to check.
If you will be signing for multiple chains with this single horcrux cluster, repeat this step with the `priv_validator_key.json` for each additional chain ID.
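
Before distributing anything, it can help to confirm that every `cosigner_{id}` directory contains both files for a given chain. The following is a minimal sketch, assuming the directory layout shown in the `ls -R` output above; `check_cosigner_dirs` is a hypothetical helper name, not a `horcrux` command.

```bash
#!/usr/bin/env bash
# Sketch: confirm each cosigner_N directory holds a non-empty shard file and
# RSA key file. Arguments: working directory, chain ID, number of shards.
check_cosigner_dirs() {
  local base="$1" chain_id="$2" shards="$3"
  local id f missing=0
  for id in $(seq 1 "$shards"); do
    for f in "${chain_id}_shard.json" rsa_keys.json; do
      if [ ! -s "${base}/cosigner_${id}/${f}" ]; then
        echo "missing: cosigner_${id}/${f}"
        missing=1
      fi
    done
  done
  return "$missing"
}

# Example: run from the working directory used for the commands above.
if check_cosigner_dirs . cosmoshub-4 3; then
  echo "all shard files present"
fi
```

For multiple chains, the same check can be repeated with each chain ID.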

### 5. Distribute config file and key shards to each cosigner.

At the end of this step, each of your horcrux nodes should have a `~/.horcrux/{chain-id}_shard.json` file with the contents matching the appropriate `cosigner_{id}/{chain-id}_shard.json` file corresponding to the node number. Additionally, each of your horcrux nodes should have a `~/.horcrux/rsa_keys.json` file with the contents matching the appropriate `cosigner_{id}/rsa_keys.json` file corresponding to the node number.
The files need to be moved to their corresponding signer nodes in the `~/.horcrux/` directory. It is important to make sure the files for the cosigner `{id}` (in `cosigner_{id}`) are placed on the corresponding cosigner node. If not, the cluster will not produce valid signatures. If you have named your nodes with their index as the signer index, as in this guide, this operation should be easy to check.

If you will be signing for multiple chains with this single horcrux cluster, repeat the `horcrux create-ed25519-shards` command with the `priv_validator_key.json` for each additional chain ID, and again place on the corresponding horcrux nodes.
At the end of this step, each of your horcrux nodes should have a `~/.horcrux/{chain-id}_shard.json` file for each `chain-id` with the contents matching the appropriate `cosigner_{id}/{chain-id}_shard.json` file corresponding to the node number. Additionally, each of your horcrux nodes should have a `~/.horcrux/rsa_keys.json` file with the contents matching the appropriate `cosigner_{id}/rsa_keys.json` file corresponding to the node number.
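
One way to keep the cosigner index and the target host aligned is to generate the copy commands from a single ordered list and review them before running anything. This is a dry-run sketch only: it assumes SSH access to the signer IPs from the example table, that the shared config was written to `~/.horcrux/config.yaml` on your local machine, and that `~/.horcrux/` already exists on each cosigner. The helper name `print_distribution_cmds` is illustrative.

```bash
#!/usr/bin/env bash
# Sketch: print (not run) one scp command per cosigner.
# Assumptions: example signer IPs from this guide, config at
# ~/.horcrux/config.yaml locally, and an existing ~/.horcrux/ on each host.
SIGNERS=(10.168.1.1 10.168.1.2 10.168.1.3)
CHAIN_ID=cosmoshub-4

print_distribution_cmds() {
  local idx n host
  for idx in "${!SIGNERS[@]}"; do
    host="${SIGNERS[$idx]}"
    # Array indexes start at 0; cosigner IDs start at 1.
    n=$((idx + 1))
    echo "scp ~/.horcrux/config.yaml cosigner_${n}/${CHAIN_ID}_shard.json cosigner_${n}/rsa_keys.json ${host}:.horcrux/"
  done
}

print_distribution_cmds
```

Each printed line pairs `cosigner_{id}` with the signer of the same index, which is the property this step depends on. How the files are actually transferred should follow your own key-handling procedure.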

### 4. Halt your validator node and supply signer state data to `horcrux` nodes
### 6. Halt your validator node and supply signer state data to `horcrux` nodes

Now is the moment of truth. There will be a few minutes of downtime for this step, so ensure you have read the following directions completely before moving forward.

@@ -159,9 +179,9 @@ You will need to replace the contents of the `~/.horcrux/state/{chain-id}_priv_v

`horcrux state import` can be used to import an existing `priv_validator_state.json`

### 5. Start the signer cluster
### 7. Start the cosigner cluster

Once you have all of the signer nodes fully configured, it's time to start them. Start all of them at roughly the same time:
Once you have all of the cosigner nodes fully configured, it's time to start them. Start all of them at roughly the same time:

```bash
sudo systemctl start horcrux && journalctl -u horcrux -f
@@ -170,12 +190,14 @@ sudo systemctl start horcrux && journalctl -u horcrux -f
The following logs should be flowing on each signer node:

```log
I[2021-09-24|02:10:09.022] Tendermint Validator module=validator mode=mpc priv_key=...
I[2021-09-24|02:10:09.023] Starting CosignerRPCServer service module=validator impl=CosignerRPCServer
I[2021-09-24|02:10:09.025] Signer module=validator pubkey=PubKeyEd25519{9A66109B69C...
I[2021-09-24|02:10:09.025] Starting RemoteSigner service module=validator impl=RemoteSigner
E[2021-09-24|02:10:09.027] Dialing module=validator err="dial tcp 10.180.0.16:1234...
I[2021-09-24|02:10:09.027] Retrying module=validator sleep(s)=3 address=tcp://10.180...
I[2023-05-15|20:09:22.988] Horcrux Validator module=validator mode=threshold priv-state-dir=/root/.horcrux/state
I[2023-05-15|20:09:22.990] service start module=validator msg="Starting CosignerRaftStore service" impl=CosignerRaftStore
I[2023-05-15|20:09:22.991] Local Raft Listening module=validator port=2222
I[2023-05-15|20:09:22.993] service start module=validator msg="Starting RemoteSigner service" impl=RemoteSigner
I[2023-05-15|20:09:22.993] service start module=validator msg="Starting RemoteSigner service" impl=RemoteSigner
I[2023-05-15|20:09:22.993] service start module=validator msg="Starting RemoteSigner service" impl=RemoteSigner
E[2023-05-15|20:09:22.994] Dialing module=validator err="dial tcp 10.180.0.16:1234...
I[2023-05-15|20:09:22.995] Retrying module=validator sleep(s)=3 address=tcp://10.180...
...
```

@@ -186,7 +208,7 @@ The signer will continue retrying attempts to reach the sentries until we turn t

> **NOTE:** Leaving these logs streaming in separate terminal windows will enable you to watch the cluster connect to the sentries.
### 6. Configure and start your full nodes
### 8. Configure and start your full nodes

Once the signer cluster has started successfully, it's time to reconfigure and restart your sentry nodes. On each node, enable the priv validator listener and verify config changes with the following commands:

@@ -204,14 +226,14 @@ $ sudo systemctl restart {node_service} && journalctl -u {node_service} -f

Common failure modes:

- Ports on your cloud service aren't properly configured and prevent signers/sentries from communicating
- Ports on the firewall (cosigner VM, cloud service, LAN port-forwards, etc.) aren't properly opened and prevent signers/sentries from communicating
- The node crashes because the signer didn't retry in time. This can be fixed by trying again and/or restarting the signer, and may take some fiddling
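
To narrow down the first failure mode, a quick reachability check from a cosigner to each sentry's priv validator port can help. This is a minimal sketch, assuming `bash` (for its `/dev/tcp` feature) and the GNU coreutils `timeout` command are available; `check_port` is a hypothetical helper, and the addresses in the usage comment are the example values from this guide.

```bash
#!/usr/bin/env bash
# Sketch: report whether a TCP port accepts connections from this machine.
check_port() {
  local host="$1" port="$2"
  # /dev/tcp is a bash feature; `timeout` bounds the attempt to 3 seconds.
  if timeout 3 bash -c "exec 3<>/dev/tcp/${host}/${port}" 2>/dev/null; then
    echo "open: ${host}:${port}"
  else
    echo "unreachable: ${host}:${port}"
    return 1
  fi
}

# Example usage from a cosigner, checking each sentry's priv validator port:
#   for ip in 10.168.0.1 10.168.0.2 10.168.0.3; do check_port "$ip" 1234; done
```

A port reported as unreachable while the chain daemon is running with the priv validator listener enabled usually points at a firewall or security-group rule rather than at `horcrux` itself.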

### 7. CONGRATS!
### 9. CONGRATS!

You now can sleep much better at night because you are much less likely to have a down validator wake you up in the middle of the night. You have also completed a stressful migration on a production system. Go run around outside screaming, pet your dog, eat a nice meal, hug your kids/significant other, etc... and enjoy the rest of your day!

### 8. Administration Commands
### 10. Administration Commands

`horcrux elect` - Elect a new cluster leader. Pass an optional argument with the intended leader ID to elect that cosigner as the new leader, e.g. `horcrux elect 3` to elect cosigner with `shardID: 3` as leader. This is an optimistic leader election; it is not guaranteed that the exact requested leader will be elected.
