rfc: distributed coordinator #1078

Draft
Conversation

burgerdev (Contributor)

No description provided.

@burgerdev burgerdev added the no changelog PRs not listed in the release notes label Dec 17, 2024
Comment on lines +145 to +146
bytes MeshCAKey = 5;
bytes MeshCACert = 6;
Member
I think we need to explain the security of the HA part a bit more. If we allow directly setting the mesh components of the Coordinator during automatic recovery, we lose protection against the Kubernetes admin/workload owner: they can redirect recovery to themselves and "provision" a coordinator with the values above.

The simplest case to think about is a single Coordinator needing recovery, with the workload owner answering that Coordinator's recovery call. The next time the user verifies the deployment, they see a valid chain of manifests and therefore trust the new MeshCACert. This allows the workload owner to man-in-the-middle the TLS connection from the data owner to the application.

While we have excluded this threat model from our current recovery, I think HA and auto-recovery can be implemented while also securing against it, e.g. by only allowing recovery of coordinators that have the same hashes. Of course, this breaks the upgrade process, but I think we have to drop coordinator upgrades in this threat model anyway.
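The same-hash restriction could be enforced with a check like the following Go sketch. All names here are hypothetical illustrations, not Contrast APIs:

```go
package main

import (
	"bytes"
	"fmt"
)

// allowRecovery reports whether a serving coordinator may hand mesh CA
// material to a recovering one. Under the stricter threat model above,
// recovery is only allowed between coordinators with identical launch
// policy hashes, so a Kubernetes admin cannot substitute a coordinator
// under their own control. Hypothetical helper, not Contrast API.
func allowRecovery(servingPolicyHash, recoveringPolicyHash []byte) bool {
	return bytes.Equal(servingPolicyHash, recoveringPolicyHash)
}

func main() {
	fmt.Println(allowRecovery([]byte{0xaa}, []byte{0xaa})) // true
	fmt.Println(allowRecovery([]byte{0xaa}, []byte{0xbb})) // false
}
```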

burgerdev (Contributor, Author)

Good catch, thanks!

We should aim to support recovery from heterogeneous coordinators if they are explicitly allowed by the manifest. An upgrading workload owner could first set a manifest including new coordinator policies, then deploy new coordinators, then remove old coordinators, then remove old coordinator policies from the manifest. A data owner would need to verify not only the current manifest, but also the history of allowed coordinators.
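That history check might look roughly like the following sketch, using simplified stand-in types (the real manifest format differs):

```go
package main

import "fmt"

// Manifest is a minimal stand-in for a Contrast manifest, reduced to the
// set of policy hashes that carry the coordinator role. (Illustrative only.)
type Manifest struct {
	CoordinatorPolicies map[string]bool
}

// historyTrusted walks the chain of manifests oldest to newest and reports
// whether every coordinator policy that was ever allowed is one the data
// owner trusts. A single untrusted coordinator anywhere in the history
// could have injected mesh CA material, so one bad entry fails the check.
func historyTrusted(history []Manifest, trusted map[string]bool) bool {
	for _, m := range history {
		for p := range m.CoordinatorPolicies {
			if !trusted[p] {
				return false
			}
		}
	}
	return true
}

func main() {
	trusted := map[string]bool{"oldCoord": true, "newCoord": true}
	history := []Manifest{
		{CoordinatorPolicies: map[string]bool{"oldCoord": true}},
		{CoordinatorPolicies: map[string]bool{"oldCoord": true, "newCoord": true}},
		{CoordinatorPolicies: map[string]bool{"newCoord": true}},
	}
	fmt.Println(historyTrusted(history, trusted)) // true
}
```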

I think what we need are the following invariants:

  • (A) A coordinator with current manifest M only sends key material to pods that have the coordinator role in M.
  • (B) A coordinator with current manifest M uses only key material that it generated locally or that was received from a pod with the coordinator role in M.
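Both invariants share the same admission check on the peer's policy; a minimal sketch, again with hypothetical types rather than the actual manifest schema:

```go
package main

import "fmt"

// Manifest stands in for the current manifest M, mapping policy hashes to
// roles. (Simplified, illustrative; not the Contrast schema.)
type Manifest struct {
	Roles map[string]string // policy hash -> role
}

// mayExchangeKeyMaterial is the common precondition of (A) and (B): key
// material may be sent to, or accepted from, a pod only if its attested
// policy carries the coordinator role in the current manifest.
func mayExchangeKeyMaterial(m Manifest, peerPolicyHash string) bool {
	return m.Roles[peerPolicyHash] == "coordinator"
}

func main() {
	m := Manifest{Roles: map[string]string{"abc": "coordinator", "def": "workload"}}
	fmt.Println(mayExchangeKeyMaterial(m, "abc"), mayExchangeKeyMaterial(m, "def")) // true false
}
```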

(A) should be covered sufficiently by the current proposal text, and we could modify it to achieve (B) as follows:

  1. Load the existing latest transition from the store, keeping the signature around but not checking it yet.
  2. Fetch the corresponding manifest, but don't set the state yet.
  3. Create a validator from the temporary manifest's reference values.
  4. Connect to the serving coordinator and validate its reference values.
  5. Check that the serving coordinator's policy corresponds to a coordinator role in the temp manifest.
  6. Receive the RecoverResponse.
  7. Verify the signature from (1).
  8. Initialize the state with received seed, keys, certs and the temp manifest.
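The eight steps above could be wired together as in this sketch; all types and callbacks are hypothetical stand-ins for the store, the attested connection, and the coordinator state:

```go
package main

import (
	"errors"
	"fmt"
)

// Simplified stand-ins for the store entry, manifest, and RPC response.
type Transition struct {
	ManifestHash string
	Signature    []byte
}

type Manifest struct {
	CoordinatorPolicies map[string]bool // policy hash -> has coordinator role
}

type RecoverResponse struct {
	Seed, MeshCAKey, MeshCACert []byte
}

// recoverState follows steps 1-8: the serving coordinator is checked
// against the temporary manifest before any key material is accepted
// (attestation itself, steps 3-4, is assumed done by the caller, which
// supplies the attested peerPolicyHash), and the transition signature is
// verified before the state is initialized.
func recoverState(
	latest Transition, // step 1: loaded, signature not yet checked
	fetchManifest func(hash string) (Manifest, error), // step 2
	peerPolicyHash string, // steps 3-5: attested policy of the serving coordinator
	fetchRecovery func() (RecoverResponse, error), // step 6
	verifySig func(Transition) bool, // step 7
	setState func(Manifest, RecoverResponse), // step 8
) error {
	tmp, err := fetchManifest(latest.ManifestHash)
	if err != nil {
		return err
	}
	if !tmp.CoordinatorPolicies[peerPolicyHash] { // step 5
		return errors.New("serving coordinator lacks coordinator role in temp manifest")
	}
	resp, err := fetchRecovery() // step 6
	if err != nil {
		return err
	}
	if !verifySig(latest) { // step 7
		return errors.New("latest transition has invalid signature")
	}
	setState(tmp, resp) // step 8
	return nil
}

func main() {
	m := Manifest{CoordinatorPolicies: map[string]bool{"coordPolicy": true}}
	var initialized bool
	err := recoverState(
		Transition{ManifestHash: "m1", Signature: []byte("sig")},
		func(string) (Manifest, error) { return m, nil },
		"coordPolicy",
		func() (RecoverResponse, error) { return RecoverResponse{Seed: []byte{1}}, nil },
		func(Transition) bool { return true },
		func(Manifest, RecoverResponse) { initialized = true },
	)
	fmt.Println(err == nil, initialized) // true true
}
```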
