
Support for Mithril Signer deployment. #1693

Merged
merged 23 commits into cardano-community:alpha from feature/mithril-build
Oct 31, 2023

Conversation

TrevorBenson
Collaborator

@TrevorBenson TrevorBenson commented Oct 21, 2023

Adds support to download, install, and set up a mithril-signer for the appropriate network. Includes changes to deploy-as-systemd.sh to provide an option to run it as a Systemd service.

New scripts

mithril-client.sh

  • Opt -d downloads the latest mithril snapshot.
    • If the environment file does not exist, it will create it.
  • Opt -u creates or updates the environment file.
    • If POOL_NAME is undefined or set to CHANGE_ME, it will create a minimal environment suitable for the mithril client.
    • If POOL_NAME is defined, it will create a complete environment suitable for the mithril signer.
      • Presumes a naive deployment without a mithril relay, for automated environment setup on first start.
      • Use either deploy-as-systemd.sh, or mithril-signer.sh -u directly, to define a complex mithril environment with a mithril relay (or the sidecar).

mithril-signer.sh

  • Follows the ogmios.sh format, such that it can be used to start the mithril signer.
  • Opt -d sets up the Systemd service unit.
    • If the environment file does not exist, it will create it.
  • Opt -u creates or updates the environment file (a sketch of the resulting file follows this list).
    • Uses read to ask whether this is a naive deployment or whether to use a mithril relay (or the sidecar) IP and port to set RELAY_ENDPOINT.
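
A rough idea of the kind of environment file these options produce, based on the standard mithril-signer configuration variables; the exact values written by mithril-signer.sh -u (endpoints, paths, PARTY_ID handling) may differ:

    # Illustrative values; the file only holds VAR=value assignments
    NETWORK=preprod
    AGGREGATOR_ENDPOINT=https://aggregator.release-preprod.api.mithril.network/aggregator
    # RELAY_ENDPOINT is only set when a mithril relay (or the nginx sidecar) is used
    RELAY_ENDPOINT=http://127.0.0.1:3132
    PARTY_ID=<pool id, when POOL_NAME is defined>
    RUN_INTERVAL=60000
    DB_DIRECTORY=/opt/cardano/cnode/db
    CARDANO_NODE_SOCKET_PATH=/opt/cardano/cnode/sockets/node0.socket
    DATA_STORES_DIRECTORY=/opt/cardano/cnode/mithril-signer/data-stores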

mithril-relay.sh

  • Opt -d
    • Installs squid.
    • Uses read to gather variables for the block producer IP and relay port.
    • Creates a squid.conf for a reverse proxy (see the sketch after this list).
      • Applies an ACL in the config for each block producer.
    • Starts the squid systemd service.
  • Opt -l
    • Installs nginx.
    • Uses read to gather variables for the relay IP address(es), the relay port, and the IP to bind the sidecar to (the relay port is shared between all relays and the sidecar).
    • Creates an nginx.conf load balancing over the relays.
    • Starts the nginx systemd service.
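
To illustrate the -d flow, a minimal squid.conf along these lines; the block producer IP 10.0.0.10 and port 3132 are placeholders for the values gathered via read, and the script's actual output may include further directives:

    # Listen on the relay port gathered via read
    http_port 3132
    # One ACL per block producer IP
    acl block_producer src 10.0.0.10
    http_access allow block_producer
    # Deny everything else and disable caching
    http_access deny all
    cache deny all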

Additional Details

  • Increases storage requirements for binaries by ~42MB.
  • Currently uses a static mithril_release set to 2337.0.
    • Includes a commented-out option to obtain the latest release via tag_name, similar to cardano-signer (see the sketch after this list).
    • Open to whichever method is preferred.
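
The commented-out alternative mentioned above would look roughly like this, assuming curl and jq are available:

    # Resolve the latest mithril release tag via the GitHub API instead of pinning 2337.0
    mithril_release="$(curl -sL https://api.github.com/repos/input-output-hk/mithril/releases/latest | jq -r '.tag_name')"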

The mithril relay and squid configuration comes from Mithril docs. The HA setup via an nginx sidecar is a Proof of Concept at the moment. I have tested with a single relay, but not yet with the sidecar and multiple relays.

@TrevorBenson TrevorBenson requested review from rdlrt and Scitz0 October 21, 2023 19:07
@TrevorBenson TrevorBenson self-assigned this Oct 21, 2023
@TrevorBenson TrevorBenson marked this pull request as draft October 21, 2023 19:09
@TrevorBenson TrevorBenson force-pushed the feature/mithril-build branch 3 times, most recently from 8331eeb to 4722705 on October 21, 2023 21:08
@TrevorBenson TrevorBenson force-pushed the feature/mithril-build branch 2 times, most recently from d77aef0 to 5d4a746 on October 21, 2023 23:29
@TrevorBenson TrevorBenson changed the title guild-deploy option to download mithril binaries Support for Mithril Signer deployment. Oct 21, 2023
@TrevorBenson TrevorBenson force-pushed the feature/mithril-build branch from 5d4a746 to 1c8ad58 on October 21, 2023 23:42
@TrevorBenson TrevorBenson marked this pull request as ready for review October 21, 2023 23:45
@TrevorBenson TrevorBenson requested a review from gufmar October 22, 2023 17:01
@TrevorBenson
Collaborator Author

I opened #1694 to address the pre-merge check failures observed in this PR.

@TrevorBenson TrevorBenson requested a review from rdlrt October 22, 2023 20:46
@TrevorBenson
Collaborator Author

TrevorBenson commented Oct 23, 2023

With commit 9b82d64 the environment file contains each variable required for the cnode-mithril-signer.service to start once the node is up.

I was hoping to just source /opt/cardano/cnode/scripts/env; however, EnvironmentFile only takes variable assignments, so sourcing inside the ${CNODE_HOME}/mithril-signer/service.env file does not work. I also tried ExecStartPre=. /opt/cardano/cnode/scripts/env, but that caused an error when parsing the service unit.

In the end, since mithril-signer.sh -u is a forced update of the entire environment file, it seemed simplest to embed NETWORK and CARDANO_NODE_SOCKET_PATH into the environment file as a workaround. Since these would not normally change after deployment, it seems acceptable to have the operator re-run the environment update if they change paths.

$ sudo systemctl status cnode-mithril-signer.service 
● cnode-mithril-signer.service - Cardano Mithril signer service
     Loaded: loaded (/etc/systemd/system/cnode-mithril-signer.service; enabled; vendor preset: enabled)
     Active: active (running) since Mon 2023-10-23 00:25:47 UTC; 3s ago
   Main PID: 38606 (mithril-signer)
      Tasks: 14 (limit: 4555)
     Memory: 3.3M
        CPU: 143ms
     CGroup: /system.slice/cnode-mithril-signer.service
             └─38606 /home/tbenson/.local/bin/mithril-signer -vv

Oct 23 00:25:47 mithril cnode-mithril-signer[38606]: {"msg":"Storing/Getting immutables digests cache from: /opt/cardano/cnode/mithril-signer/data-stores/immutables_di>
Oct 23 00:25:48 mithril cnode-mithril-signer[38606]: {"msg":"New AggregatorHTTPClient created","v":0,"name":"slog-rs","level":20,"time":"2023-10-23T00:25:48.071677604Z>
Oct 23 00:25:48 mithril cnode-mithril-signer[38606]: {"msg":"Started","v":0,"name":"slog-rs","level":20,"time":"2023-10-23T00:25:48.071709418Z","hostname":"mithril","p>
Oct 23 00:25:48 mithril cnode-mithril-signer[38606]: {"msg":"STATE MACHINE: launching","v":0,"name":"slog-rs","level":30,"time":"2023-10-23T00:25:48.071733362Z","hostn>
Oct 23 00:25:48 mithril cnode-mithril-signer[38606]: {"msg":"================================================================================","v":0,"name":"slog-rs",">
Oct 23 00:25:48 mithril cnode-mithril-signer[38606]: {"msg":"STATE MACHINE: new cycle: Init","v":0,"name":"slog-rs","level":30,"time":"2023-10-23T00:25:48.071764699Z",>
Oct 23 00:25:48 mithril cnode-mithril-signer[38606]: {"msg":"RUNNER: get_current_epoch","v":0,"name":"slog-rs","level":20,"time":"2023-10-23T00:25:48.071779925Z","host>
Oct 23 00:25:48 mithril cnode-mithril-signer[38606]: {"msg":"RUNNER: update_era_checker","v":0,"name":"slog-rs","level":20,"time":"2023-10-23T00:25:48.104549893Z","hos>
Oct 23 00:25:48 mithril cnode-mithril-signer[38606]: {"msg":"Current Era is thales (Epoch 101).","v":0,"name":"slog-rs","level":20,"time":"2023-10-23T00:25:48.52686054>
Oct 23 00:25:48 mithril cnode-mithril-signer[38606]: {"msg":"… Cycle finished, Sleeping for 60000 ms","v":0,"name":"slog-rs","level":30,"time":"2023-10-23T00:25:48.526>
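
For reference, a rough sketch of the relevant unit pieces (the ExecStart path is illustrative; the unit generated by deploy-as-systemd.sh may differ):

    [Unit]
    Description=Cardano Mithril signer service
    After=network-online.target

    [Service]
    Type=simple
    # EnvironmentFile only accepts VAR=value assignments, hence NETWORK and
    # CARDANO_NODE_SOCKET_PATH are embedded in this file rather than sourced from env
    EnvironmentFile=/opt/cardano/cnode/mithril-signer/service.env
    ExecStart=/home/<user>/.local/bin/mithril-signer -vv
    Restart=on-failure
    RestartSec=5

    [Install]
    WantedBy=multi-user.target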

@TrevorBenson
Collaborator Author

I believe the final check failures are due to opening the PR from my forked branch (origin) instead of pushing to cardanocommunity (upstream), which leads the premerge check to expect the branch/commit to exist upstream when it does not.

@TrevorBenson
Collaborator Author

I believe the final check failures are due to opening the PR from my forked branch (origin) instead of pushing to cardanocommunity (upstream), which leads the premerge check to expect the branch/commit to exist upstream when it does not.

I pushed the branch from my fork upstream and ran a manual premerge workflow, which completed successfully.

@TrevorBenson
Collaborator Author

TrevorBenson commented Oct 23, 2023

@rdlrt @Scitz0
I created a new branch named feature/mithril-build-with-relays. It adds support for Mithril relays via a new script, mithril-relay.sh, which can configure a mithril relay (i.e. a squid reverse proxy) on relay nodes or dedicated mithril relays using -d.

The mithril signer service.env takes a single RELAY_ENDPOINT, and mithril-signer.sh will ask whether to use a relay or to leave it as a naive deployment for testnets, as the current branch defaults to.

For production, an HA option exists with mithril-relay.sh -l, which sets up an nginx sidecar (local) load balancer. It creates a backend pool out of the Mithril relay IPs and round-robins between them. The script prompts for the IP the load balancer should bind to (preferably 127.0.0.1, but it does take user input in case the sidecar is not local to the signer) and for the relay port to use for the load balancer and each mithril relay in the nginx configuration's backend pool.
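
As a rough sketch of the sidecar configuration (illustrative only; the relay IPs, port 3132, and the use of the nginx stream module are assumptions, not necessarily what the script writes):

    events {}
    stream {
        # Backend pool built from the Mithril relay IPs entered at the prompts;
        # nginx round-robins across them by default
        upstream mithril_relays {
            server 10.0.0.21:3132;
            server 10.0.0.22:3132;
        }
        server {
            # Sidecar bound locally on the shared relay port
            listen 127.0.0.1:3132;
            proxy_pass mithril_relays;
        }
    }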

Would you like this added to this PR for review or make a new PR after this branch reaches approval and merges?

@rdlrt
Contributor

rdlrt commented Oct 23, 2023

Would you like this added to this PR for review or make a new PR after this branch reaches approval and merges?

Up to you, happy to use this PR itself so they're merged together

@TrevorBenson
Collaborator Author

TrevorBenson commented Oct 23, 2023

Would you like this added to this PR for review or make a new PR after this branch reaches approval and merges?

Up to you, happy to use this PR itself so they're merged together

Commit cherry-picked into this branch. I think the trunk (flake8) formatting adjusted the layout, so it may not match the indenting etc. of existing scripts. I'll fix that, and other general linter suggestions, prior to merge.

Improve status updates about installation and service starts.
@Scitz0
Contributor

Scitz0 commented Oct 24, 2023

The one important thing to add alongside the scripts themselves is documentation. Describe how the scripts should be used and how it all ties together. It doesn't have to explain Mithril itself beyond one or two high-level sentences, and can then link to https://mithril.network/doc/ for more information.

As I have mentioned before, I have yet to take the time to read up on Mithril, so my knowledge isn't extensive yet. cardano-signer.sh is to be deployed on the block producing node, and cardano-relay.sh on the relay node. Then we have the client to actually download the blockchain snapshot. Are there plans to add something for this as well?

@TrevorBenson
Collaborator Author

TrevorBenson commented Oct 24, 2023

The one important thing to add alongside the scripts themselves is documentation. Describe how the scripts should be used and how it all ties together. It doesn't have to explain Mithril itself beyond one or two high-level sentences, and can then link to https://mithril.network/doc/ for more information.

Agreed. Once the logic and script functionality is approved I'll include a commit for Documentation before merge.

As I have mentioned before, I have yet to take the time to read up on Mithril, so my knowledge isn't extensive yet. cardano-signer.sh is to be deployed on the block producing node, and cardano-relay.sh on the relay node.

mithril-signer and mithril-relay in this instance.

For a naive deployment (testnets), the Mithril signer can run directly on a Cardano relay node using mithril-signer.sh -d, and no Mithril relay is required.

For a production deployment, the Mithril signer should be on the block producer node, set up using mithril-signer.sh -d, with a Mithril relay on a Cardano relay node set up using mithril-relay.sh -d. If the operator wants relay redundancy, they repeat the mithril-relay.sh -d setup on more than one relay. Then, to allow the Mithril signer to load balance across all relays, they run mithril-relay.sh -l on the block producer node and provide all the Mithril relay IPs; an nginx load balancer is set up with all relays in its backend server pool.
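
Put as commands, roughly (exact prompts aside):

    # On the block producer: deploy the Mithril signer (systemd unit + environment)
    ./mithril-signer.sh -d
    # On each Cardano relay that should act as a Mithril relay: install/configure squid
    ./mithril-relay.sh -d
    # Optional HA, back on the block producer: nginx sidecar load balancing all relay IPs
    ./mithril-relay.sh -l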

Then we have the client to actually download the blockchain snapshot. Are there plans to add something for this as well?

Potentially, if that's something the guild would like. Automating the download and extraction of snapshots could be done. I could see this as a mithril-client.sh which determines when to obtain and extract snapshots, or to skip.

It could be directly at the cnode.sh level, using a flag like ENABLE_MITHRIL_CLIENT. When true, cnode.sh calls mithril-client.sh, which either gets the needed snapshots and exits, or decides no snapshots are needed and exits, returning to cnode.sh.

Otherwise, to avoid touching cnode.sh, it could be a Systemd-only option: mithril-client.sh -d could apply a unit override for cnode.service using a cnode.service.d directory, adding an ExecStartPre=bash -c '/opt/cardano/cnode/scripts/mithril-client.sh -XX' (flags undecided ATM), causing it to check before the service's main ExecStart.
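
For example, a drop-in like /etc/systemd/system/cnode.service.d/mithril-client.conf (the -XX flag is a placeholder, as noted):

    [Service]
    # Runs the mithril client check before cnode.service's main ExecStart
    ExecStartPre=/bin/bash -c '/opt/cardano/cnode/scripts/mithril-client.sh -XX'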

@Scitz0
Contributor

Scitz0 commented Oct 24, 2023

Potentially, if that's something the guild would like. Automating the download and extraction of snapshots could be done. I could see this as a mithril-client.sh which determines when to obtain and extract snapshots, or to skip.

It could be directly at the cnode.sh level, using a flag like ENABLE_MITHRIL_CLIENT. When true, cnode.sh calls mithril-client.sh, which either gets the needed snapshots and exits, or decides no snapshots are needed and exits, returning to cnode.sh.

Otherwise, to avoid touching cnode.sh, it could be a Systemd-only option: mithril-client.sh -d could apply a unit override for cnode.service using a cnode.service.d directory, adding an ExecStartPre=bash -c '/opt/cardano/cnode/scripts/mithril-client.sh -XX' (flags undecided ATM), causing it to check before the service's main ExecStart.

I'm a bit split on the best deployment for this. But I'm leaning towards the second option to add an ExecStartPre override. This would be a cleaner option keeping it totally separate from normal deployment.

@TrevorBenson
Collaborator Author

I'm a bit split on the best deployment for this. But I'm leaning towards the second option to add an ExecStartPre override. This would be a cleaner option keeping it totally separate from normal deployment.

Mithril client is responsible for restoring the Cardano blockchain on an empty node from a certified snapshot. This was the logic I intended to have the script follow.

@Scitz0 Unless you think the script should attempt to determine how far out of sync a node is when it already has content in the DB folder, and whether to rely on block synchronization or get a new Mithril snapshot?

To be honest, I've not spent much time thinking about how to determine this when query tip would not be available. I used the Mithril client when I had existing DB data, except I knew how far out of sync my node was, or that my filesystem had filled and corrupted the current DB anyway, so I made an informed decision to take the action manually.

@Scitz0
Contributor

Scitz0 commented Oct 24, 2023

I'm a bit split on the best deployment for this. But I'm leaning towards the second option to add an ExecStartPre override. This would be a cleaner option keeping it totally separate from normal deployment.

Mithril client is responsible for restoring the Cardano blockchain on an empty node from a certified snapshot. This was the logic I intended to have the script follow.

@Scitz0 Unless you think the script should attempt to determine how far out of sync a node is when it already has content in the DB folder, and whether to rely on block synchronization or get a new Mithril snapshot?

To be honest, I've not spent much time thinking about how to determine this when query tip would not be available. I used the Mithril client when I had existing DB data, except I knew how far out of sync my node was, or that my filesystem had filled and corrupted the current DB anyway, so I made an informed decision to take the action manually.

My idea was only for an initial fresh deployment, not to keep track of sync state. I think we can keep it easy like this. I'm also in theory fine with just adding it to cnode.sh if priyank feels this is ok as well.

@TrevorBenson
Collaborator Author

TrevorBenson commented Oct 24, 2023

My idea was only for an initial fresh deployment, not to keep track of sync state. I think we can keep it easy like this. I'm also in theory fine with just adding it to cnode.sh if priyank feels this is ok as well.

@rdlrt What do you think?

  • Using a Systemd override file to ExecStartPre mithril-client.sh
    • Automatic for Systemd use case.
    • Manual for (cnode.sh non Systemd) and container use cases
  • Using cnode.sh to call mithril-client.sh
    • Automatic for all use cases.

The latter definitely gives the most coverage with identical behavior. The only question, if we go with the latter, is whether the check for an empty ${CNODE_HOME}/db should live in cnode.sh before calling mithril-client.sh, or in mithril-client.sh itself. Either works, but I figured there may be a preference for separation of responsibilities.

@Scitz0
Contributor

Scitz0 commented Oct 26, 2023

My idea was only for an initial fresh deployment, not to keep track of sync state. I think we can keep it easy like this. I'm also in theory fine with just adding it to cnode.sh if priyank feels this is ok as well.

@rdlrt What do you think?

  • Using a Systemd override file to ExecStartPre mithril-client.sh
    • Automatic for Systemd use case.
    • Manual for (cnode.sh non Systemd) and container use cases
  • Using cnode.sh to call mithril-client.sh
    • Automatic for all use cases.

The latter definitely gives the most coverage with identical behavior. The only question, if we go with the latter, is whether the check for an empty ${CNODE_HOME}/db should live in cnode.sh before calling mithril-client.sh, or in mithril-client.sh itself. Either works, but I figured there may be a preference for separation of responsibilities.

I think everything related to mithril, i.e. whether the db is empty or not, should be a job for mithril-client.sh. We should also add a boolean user config setting to enable/disable mithril. I'm thinking that mithril could be on by default, though we need to make sure it properly handles networks that might not have any snapshots, like the guild network currently, so that it silently just starts syncing normally in this case, or perhaps prints a line about it and continues.

Other than this, I think the cnode.sh route is preferred as you mention, else manual execution of cnode.sh wouldn't work the same way as the systemd-deployed version of cnode.sh.

@TrevorBenson
Collaborator Author

I think everything related to mithril, i.e. whether the db is empty or not, should be a job for mithril-client.sh. We should also add a boolean user config setting to enable/disable mithril. I'm thinking that mithril could be on by default, though we need to make sure it properly handles networks that might not have any snapshots, like the guild network currently, so that it silently just starts syncing normally in this case, or perhaps prints a line about it and continues.

Other than this, I think the cnode.sh route is preferred as you mention, else manual execution of cnode.sh wouldn't work the same way as the systemd-deployed version of cnode.sh.

Great, that's what I did in my local branch. I'll do some more tests this evening and get it pushed.
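
Roughly along these lines (a sketch; the MITHRIL_DOWNLOAD variable and flags are placeholders, not necessarily what the branch uses):

    # In cnode.sh, before starting the node
    if [[ "${MITHRIL_DOWNLOAD}" == "Y" ]]; then
      "${CNODE_HOME}/scripts/mithril-client.sh" -d
    fi

    # In mithril-client.sh: only restore a snapshot onto an empty db, and skip
    # quietly when db already has content or the network has no snapshots (e.g. guild)
    if [[ -n "$(ls -A "${CNODE_HOME}/db" 2>/dev/null)" ]]; then
      echo "INFO: db directory is not empty, skipping Mithril snapshot download"
      exit 0
    fi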

@TrevorBenson
Collaborator Author

TrevorBenson commented Oct 30, 2023

@Scitz0 @rdlrt I've pushed the commit with mithril-client.sh. This changes a few things around in naming conventions.

  • After discussion with Arnaud about mithril-aggregator being a temporary artifact, using /opt/cardano/cnode/mithril as the directory for both signer and client.
  • Both signer and client require a set of variables. Renamed ${CNODE_HOME}/mithril/service.env to ${CNODE_HOME}/mithril/mithril.env.
  • Converted from using NETWORK to NETWORK_NAME as the "source of truth" for the network.
    • NETWORK is common for the container entrypoint.sh setting configs, but not for general cnode.sh or cnode.service Systemd usage.
    • Leads to a minor logic loop for containers:
      1. The container sets NETWORK to determine the configs to download through entrypoint.sh.
      2. ${CNODE_HOME}/scripts/env uses the configuration files to determine NWMAGIC and sets NETWORK_NAME.
      3. When generating a mithril environment file, mithril-signer.sh or mithril-client.sh use NETWORK_NAME to set the NETWORK variable in the ${CNODE_HOME}/mithril/mithril.env file.

While it is somewhat cyclical, it should result in essentially the same NETWORK variable already used by the container entrypoint.sh being added to the mithril environment file, and it uses identical logic to determine the variable regardless of the cnode use case (standalone binary execution, Systemd service, containerized).
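
Roughly, step 3 amounts to something like this (a sketch; the per-network handling is illustrative):

    # Derive the mithril NETWORK value from NETWORK_NAME set by ${CNODE_HOME}/scripts/env
    case "${NETWORK_NAME,,}" in
      mainnet|preprod|preview)
        NETWORK="${NETWORK_NAME,,}" ;;
      *)
        echo "INFO: no known Mithril aggregator for ${NETWORK_NAME}, skipping mithril environment generation" ;;
    esac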

Review threads (outdated, resolved):
  • scripts/cnode-helper-scripts/env
  • scripts/cnode-helper-scripts/env
  • scripts/cnode-helper-scripts/mithril-client.sh
@TrevorBenson TrevorBenson requested a review from rdlrt October 31, 2023 13:42
@TrevorBenson
Collaborator Author

The summary of the successful pre-merge check (manually run) is available in this job: https://github.com/cardano-community/guild-operators/actions/runs/6708797371

@rdlrt rdlrt merged commit 934f059 into cardano-community:alpha Oct 31, 2023
1 check passed
@TrevorBenson TrevorBenson deleted the feature/mithril-build branch November 28, 2023 06:07