
Database encryption key backup / recovery feature for entropy-tss #1249

Open

wants to merge 43 commits into peg/non-persistant-tss-keys

Conversation

Contributor

@ameba23 ameba23 commented Jan 14, 2025

Part of #1247

This adds a way to back up and recover the symmetric encryption key for the key-value store, which is used to store the secret keys of the TSS account and x25519 encryption keypair, as well as the network keyshare.

This is used to recover the key following a VM process restart, meaning we can keep the contents of the key-value database persistent but inaccessible to the TSS node operator.

Backups are provided by other TSS nodes - that is, every TSS node can hold encryption keys for other TSS nodes' data.

When a TSS node starts, it checks for the presence of an existing database.

  • If there is no database, a fresh encryption key is generated, a new database is initialized, and as part of the 'prerequisite' checks, another TSS node is chosen pseudo-randomly, the encryption key is sent to them to be backed up, and their details are stored on disk in plaintext.
  • If there is an existing database, the details of the backup provider are retrieved and a recovery of the encryption key is requested. This is a two-step process: a quote nonce is requested first, and then a TDX quote containing that nonce is sent. On verifying the quote, the backup provider responds with the encryption key.
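
To make the startup flow concrete, here is a minimal sketch of that decision. Only make_key_backup and the nonce-then-quote recovery exchange come from this PR; the other helper names, the error type and the BackupProviderDetails shape are illustrative stand-ins, and the signatures are simplified (the real functions take chain-connection and signer arguments not shown here).

```rust
use std::path::Path;

/// Hypothetical error type standing in for the PR's BackupProviderError.
#[derive(Debug)]
struct BackupProviderError(String);

/// Details of the node holding our key backup, kept on disk in plaintext.
/// Illustrative shape only; the real struct may differ.
struct BackupProviderDetails {
    tss_account: [u8; 32],
    endpoint: String,
}

// Stand-in helpers; in the PR these involve chain queries, HTTP requests
// and TDX quote generation, and take more arguments than shown here.
fn generate_key() -> [u8; 32] {
    todo!("generate a fresh random symmetric key")
}
fn store_provider_details(_details: &BackupProviderDetails) -> Result<(), BackupProviderError> {
    todo!("write the provider details to disk in plaintext")
}
fn load_provider_details() -> Result<BackupProviderDetails, BackupProviderError> {
    todo!("read the provider details from disk")
}
async fn make_key_backup(_key: [u8; 32]) -> Result<BackupProviderDetails, BackupProviderError> {
    todo!("choose a TSS node pseudo-randomly and send it the key")
}
async fn recover_encryption_key(
    _provider: &BackupProviderDetails,
) -> Result<[u8; 32], BackupProviderError> {
    todo!("request a nonce, send a TDX quote containing it, receive the key")
}

/// On startup, either create a fresh encryption key and back it up with
/// another TSS node, or recover the existing key from the stored provider.
async fn setup_encryption_key(db_path: &Path) -> Result<[u8; 32], BackupProviderError> {
    if !db_path.exists() {
        // Fresh start: generate a key, have another TSS node back it up,
        // and remember who holds the backup.
        let key = generate_key();
        let provider = make_key_backup(key).await?;
        store_provider_details(&provider)?;
        Ok(key)
    } else {
        // Existing database: ask the stored backup provider for the key.
        let provider = load_provider_details()?;
        recover_encryption_key(&provider).await
    }
}
```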

This PR is based on #1216 because I prefer having the keypair directly accessible from AppState, but it could be rebased onto master.

Unresolved issues / TODOs:

  • We need an additional request-response cycle when requesting to recover a key, to get a quote nonce
  • Quote verification logic should probably be moved somewhere common (e.g. entropy-shared), as it is now duplicated in the attestation pallet and entropy-tss
  • We need pre-known encryption keys for test validators alice, bob, etc., as we don't have access to their KvManager during testing.
  • Need to merge master in order to have the TDX quote errors implement the Error trait, which comes with the latest version
  • Remove unwraps from setup_kv_store
  • Originally I planned to have the HTTP endpoints to backup and recover work even when not yet in a ready state. However, since we need to make on-chain queries, this is not possible. This could result in a deadlock where the initial genesis nodes cannot become ready because they cannot find another ready node to provide a backup. Since we only need read access to the chain, this could be solved by replacing ready: bool with a multi-state enum, so we can require access to the chain, but not yet require being funded or having made a backup ourselves, in order to provide a backup (see the sketch after this list).
  • A way for the chain to notify entropy-tss when another TSS node leaves the validator set (either 'cleanly' by unbonding intentionally, or by being slashed after being unresponsive). This is needed so we can make another encryption key backup (and maybe rotate the encryption key).
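
For the ready-state point above, a rough sketch of what a multi-state replacement for ready: bool could look like - the variant names here are hypothetical, not this PR's final API:

```rust
/// Hypothetical replacement for `ready: bool`, letting a node serve backup
/// requests as soon as it can read the chain, before it is funded and has
/// made its own backup.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum TssState {
    /// Not yet connected to the chain
    NoChainConnection,
    /// Can make read-only chain queries - enough to act as a backup provider
    ReadOnlyChainAccess,
    /// Funded, backed up, and fully ready to participate in protocols
    Ready,
}

impl TssState {
    /// Providing a backup only requires read access to the chain.
    fn can_provide_backups(&self) -> bool {
        matches!(self, TssState::ReadOnlyChainAccess | TssState::Ready)
    }
}
```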

@ameba23 ameba23 changed the base branch from master to peg/non-persistant-tss-keys January 14, 2025 11:49
@ameba23 ameba23 marked this pull request as draft January 14, 2025 11:50
@ameba23 ameba23 self-assigned this Jan 14, 2025
@ameba23 ameba23 added the Feature introduces a new feature label Jan 14, 2025
@ameba23 ameba23 added this to the v0.4.0 milestone Jan 14, 2025
)
.await
.map_err(|e| {
tracing::error!("Could not make key backup: {}", e);
Contributor Author

Rather than giving up here, we could have a loop which attempts to request backups from other TSS servers until we find one which works, and also report the ones which failed to make a backup for us. I would leave this as a follow-up.
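
A rough sketch of what such a retry loop could look like - Provider, try_backup_with and the error type here are hypothetical stand-ins, not this PR's API:

```rust
/// Hypothetical stand-ins for the real types in this PR.
#[derive(Debug, Clone)]
struct Provider {
    endpoint: String,
}

#[derive(Debug)]
struct BackupProviderError(String);

impl std::fmt::Display for BackupProviderError {
    fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
        write!(f, "{}", self.0)
    }
}

async fn try_backup_with(_provider: &Provider, _key: [u8; 32]) -> Result<(), BackupProviderError> {
    todo!("send the encryption key to this provider and await confirmation")
}

/// Try candidate providers in turn until one accepts the backup, collecting
/// the ones that failed so they can be reported.
async fn backup_with_first_working_provider(
    candidates: Vec<Provider>,
    key: [u8; 32],
) -> Result<(Provider, Vec<Provider>), BackupProviderError> {
    let mut failed = Vec::new();
    for provider in candidates {
        match try_backup_with(&provider, key).await {
            Ok(()) => return Ok((provider, failed)),
            Err(err) => {
                tracing::warn!("Backup with {} failed: {}", provider.endpoint, err);
                failed.push(provider);
            }
        }
    }
    Err(BackupProviderError("No provider accepted the backup".into()))
}
```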

@ameba23 ameba23 changed the title Key provider / recovery feature for entropy-tss Database encryption key backup / recovery feature for entropy-tss Jan 21, 2025
@ameba23 ameba23 marked this pull request as ready for review January 21, 2025 13:14
Collaborator

@HCastano HCastano left a comment

Good work on this 💪

Still making my way through the review of this, will try and finish it up soon 🙏

crates/shared/src/types.rs (outdated, resolved)
) -> Result<(), BackupProviderError> {
let tss_account = SubxtAccountId32(sr25519_pair.public().0);
// Select a provider by making chain query and choosing a tss node
let key_provider_details = select_backup_provider(api, rpc, tss_account).await?;
Collaborator

What happens if we try and recover our DB at a later point in time and the selected server isn't available anymore?

Contributor Author

@ameba23 ameba23 Jan 22, 2025

Good point.

If the 'backup provider' unbonds cleanly, we need a way of knowing this so we can make another backup. If they just disappear, then maybe through the slashing mechanism we could figure out that they have gone and make another backup. For example, the propagation pallet makes a request with the TSS ids of leaving validator(s), and if they match our backup provider then we make another call to make_key_backup. Otherwise, well, we just cannot recover.

@JesseAbram suggested sending the key out to several other tss nodes: #1247 (comment)

But I want to keep things simple in this PR and then iterate, as deciding how many backups to make is a tricky design decision where there's a security tradeoff.

I think it's worth bearing in mind that not being able to recover is only really a big problem if you are a signer, and even then we can tolerate losing n - t keyshares (although we haven't yet figured out the practicalities of doing a reshare in that situation).

Collaborator

> If the 'backup provider' unbonds cleanly, we need a way of knowing this so we can make another backup.

If we put the onus onto the TSS here, they'd need to be querying the chain at every session change or anytime a validator changes, which might be annoying. If we make it the chain's responsibility to inform TSS servers then we might have to put backup information on-chain, or just send a request to all TSS servers informing them of this (even if it doesn't necessarily affect them).

> If they just disappear, then maybe through the slashing mechanism we could figure out that they have gone and make another backup. For example, the propagation pallet makes a request with the TSS ids of leaving validator(s), and if they match our backup provider then we make another call to make_key_backup. Otherwise, well, we just cannot recover.

Yeah, similar to my second point above, we'll end up sending a bunch of messages out to the TSS servers. Kind of annoying but I guess it can work. I wouldn't implement this soon though, seems a bit overkill for now.

> @JesseAbram suggested sending the key out to several other tss nodes: #1247 (comment)
>
> But I want to keep things simple in this PR and then iterate, as deciding how many backups to make is a tricky design decision where there's a security tradeoff.

I'd like to avoid spreading sensitive material around the network, as the design starts to become similar to our previous approach where we were sending keyshares to other TSS servers. And yep, it also does add complexity to this whole feature.

> I think it's worth bearing in mind that not being able to recover is only really a big problem if you are a signer, and even then we can tolerate losing n - t keyshares (although we haven't yet figured out the practicalities of doing a reshare in that situation).

Yeah fair enough

Member

Ya, but if the keys are stored in the memory of the TDX they are unreachable, and within our current security model this does not weaken the system - it only strengthens redundancy.

crates/threshold-signature-server/src/attestation/api.rs (outdated, resolved)
@@ -114,107 +112,3 @@ pub type EncodedVerifyingKey = [u8; VERIFICATION_KEY_LENGTH as usize];
#[cfg(not(feature = "wasm"))]
pub type BoundedVecEncodedVerifyingKey =
sp_runtime::BoundedVec<u8, sp_runtime::traits::ConstU32<VERIFICATION_KEY_LENGTH>>;

/// Input data to be included in a TDX attestation
pub struct QuoteInputData(pub [u8; 64]);
Contributor Author

I ended up doing a refactor here to put all the attestation-related stuff into an attestation module, as it was starting to get messy.

I should really have put this in a separate PR as it affects the pallets, which are otherwise untouched by this PR.

Collaborator

Yeah, agreed probably should've been done in a follow up PR 😅

Collaborator

@HCastano HCastano left a comment

Looks good, and gives us a bit of resilience in case of process crashes 👍

I'd say we first merge the PR this is built on top of, rebase this PR onto master, and then merge it. If it's not too much work I'd also consider splitting out the Attestation refactor stuff into a follow-up PR, but nbd if you can't do it.

crates/threshold-signature-server/src/lib.rs (outdated, resolved)
@@ -221,30 +225,31 @@ pub struct AppState {
pub configuration: Configuration,
/// Key-value store
pub kv_store: KvManager,
/// Storage for encryption key backups for other TSS nodes
/// Maps TSS account id to encryption key
pub encryption_key_backups: Arc<RwLock<HashMap<[u8; 32], [u8; 32]>>>,
Collaborator

The subxt variant doesn't, but the sp-core one does - and I think we can convert between them
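
For reference, both account id types wrap the same 32 bytes, so converting between them (or to the raw [u8; 32] used as the HashMap key above) just moves the bytes across. A minimal sketch, assuming current subxt and sp-core APIs:

```rust
use sp_core::crypto::AccountId32 as SpAccountId32;
use subxt::utils::AccountId32 as SubxtAccountId32;

/// Convert a subxt account id to the sp-core one via its inner bytes.
fn subxt_to_sp(account: SubxtAccountId32) -> SpAccountId32 {
    SpAccountId32::from(account.0)
}

/// Convert an sp-core account id back to the subxt one.
fn sp_to_subxt(account: SpAccountId32) -> SubxtAccountId32 {
    let bytes: [u8; 32] = account.into();
    SubxtAccountId32(bytes)
}
```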

crates/threshold-signature-server/src/lib.rs (outdated, resolved)
crates/threshold-signature-server/src/main.rs (resolved)
tss_account: SubxtAccountId32,
/// An ephemeral encryption public key used to receive an encrypted response
response_key: X25519PublicKey,
/// A TDX quote
Collaborator

We may want to specify what fields are expected here

Contributor Author

I don't get what you mean. You mean what we expect as input data to the quote?

Collaborator

Yeah exactly. Like if somebody were crafting a quote, how would they do so in order for it to pass verification?
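
For illustration only, a hedged sketch of the kind of quote input construction being discussed: the verifier recomputes the same 64-byte report data from the fields it knows and compares it against the quote. The choice of fields and the use of Blake2b here are guesses, not necessarily the exact scheme in entropy-shared:

```rust
use blake2::{Blake2b512, Digest};

/// Illustrative version of the 64-byte quote input data; the real
/// construction lives in entropy-shared and may differ.
pub struct QuoteInputData(pub [u8; 64]);

impl QuoteInputData {
    /// Bind the quote to the requesting TSS account, the ephemeral x25519
    /// response key and the nonce issued by the backup provider, so a quote
    /// cannot be replayed for a different recovery request.
    pub fn new(
        tss_account: &[u8; 32],
        response_x25519_public_key: &[u8; 32],
        nonce: &[u8; 32],
    ) -> Self {
        let mut hasher = Blake2b512::new();
        hasher.update(tss_account);
        hasher.update(response_x25519_public_key);
        hasher.update(nonce);
        let digest: [u8; 64] = hasher
            .finalize()
            .as_slice()
            .try_into()
            .expect("Blake2b512 output is 64 bytes");
        Self(digest)
    }
}
```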


@ameba23
Contributor Author

ameba23 commented Jan 24, 2025

I think I have now addressed everything from @HCastano's review. Gonna leave this up till early next week so @JesseAbram can have a look.

…ready (#1263)

* Add an extra TSS state for connected to chain but not funded / fully ready

* Clippy
@ameba23 ameba23 requested a review from JesseAbram January 30, 2025 08:03
@ameba23
Contributor Author

ameba23 commented Jan 30, 2025

@JesseAbram I'd like to get your thumbs up on this before merging - it doesn't need to be a deep dive as Hernando has already done some nitpicking, I just wanna be sure you are into the idea generally. We can still iterate on the design - I'm not planning to close the feature issue on merging this.

Labels
Feature introduces a new feature
Projects
Status: 🏗 In progress
Development

Successfully merging this pull request may close these issues.

Design for a recovery feature using other entropy-tss nodes as key-providers
3 participants