-
Notifications
You must be signed in to change notification settings - Fork 9
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
rfc: workload secrets #777
Conversation
rfc/007-workload-secrets.md
Outdated
### Secret source | ||
|
||
Workload secrets would need to be either persisted or deterministically derived. | ||
The only secure persistence option for the Coordinator is encryption with a key derived from the secret seed. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could also get a key from the hardware, though there are a lot of limitations:
Currently, AFAIK TDX doesn't have a way to derive a key, though there might be one in the future (the documentation contains mentions of TDG.MR.KEY.GET
, but doesn't document it).
AMD SEV-SNP has a way to derive a key, but ...
- the key is tied to the host data.
- the key is tied to a VCEK or VLEK.
Still, we could use this to locally cache the secret seed in a secure way: If either the VCEK or the VLEK are fixed (VCEK should be static on bare-metal deployments, VLEK should be static for supporting CSPs i.e AWS), we could ask the hardware the generate a key, use that key to encrypt the secret seed and save the ciphertext on the node. When the coordinator gets restarted, it could ask the hardware to generate the same key again, use that to decrypt the secret seed and do recovery on its own.
Obviously this would break on updating the deployment, but I'd say that there's still value in allowing a coordinator to recover without a human present, especially so, if the coordinator downtime wasn't planned to begin with and the coordinator restart was spurious.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like the idea of recovering the coordinator with hardware-sealed keys! Do you think such a mechanism should be used for workload secrets, too, or would it be sufficient to recover the Coordinator seed and derive the workload key?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like it, but that should be part of another RFC. ;) Switching the source afterwards shouldn't impact the design much.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you think such a mechanism should be used for workload secrets, too, or would it be sufficient to recover the Coordinator seed and derive the workload key?
I didn't think about storing workload secrets, that's an interesting idea! I'm not quite sure what the differences are in practice.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another great RFC, thanks!
|
||
Due to being derived from the secret seed, the workload key is stable across manifest updates. | ||
This implies that no migration is needed, as long as the `WorkloadSecretID` doesn't change. | ||
On the other hand, this means that the workload secret can be obtained by the workload owner, and should be treated accordingly. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Definitely something we should keep in mind when documenting this.
Instead of deriving the identity from workload characteristics, users are going to label workloads with an identity. | ||
To allow that, we're' going to change the `Manifest` schema. | ||
The policies field of the manifest is currently a plain map of policy hashes to SANs. | ||
|
||
```json | ||
{ | ||
"Policies": { | ||
"99dd77cbd7fe2c4e1f29511014c14054a21a376f7d58a48d50e9e036f4522f6b": [ | ||
"web", | ||
"*", | ||
"203.0.113.34" | ||
] | ||
} | ||
} | ||
``` | ||
|
||
Changing this to an object with named fields allows adding new metadata, for example a `WorkloadSecretID`. | ||
|
||
```json | ||
{ | ||
"Policies": { | ||
"99dd77cbd7fe2c4e1f29511014c14054a21a376f7d58a48d50e9e036f4522f6b": { | ||
"SAN": ["web", "*", "203.0.113.34"], | ||
"WorkloadSecretID": "openbao-prod", | ||
// ... | ||
} | ||
} | ||
} | ||
``` | ||
|
||
### Implementation | ||
|
||
After successful workload verification, the Coordinator derives a key with HKDF, setting the info argument to `workload-key:$WorkloadSecretID`. | ||
This key is returned as a new field `WorkloadKey` in the `NewMeshCertResponse`. | ||
The initializer writes the key to a file in the shared tmpfs, from where it can be used by the workload. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This part is similar to MarbleRun. In addition, MarbleRun allows to inject the secret at different places and in different formats. With this, you can often avoid adapting your workload. If we would want something similar for Contrast, I guess it should be build upon the primitive you introduce with this RFC, right? For example, one could create an init container that transforms the secret such that the actual workload wouldn't need to be modified. Do you think this works or is there anything we'd need to consider for this RFC to enable this use case?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess it's a good sign that the proposals converge, even though I did not peek at the MarbleRun implementation :)
The scheme proposed here is intentionally designed to support derivation of more keys, and these use cases should be enabled by it.
The initializer (and other init containers) is somewhat limited in terms of what it can inject. Shared files are definitely possible, but env vars are not. I think the annotation pattern we use for the service mesh would be a good starting point for this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm, thanks!
Implementation of 1) secret derivation and distribution and 2) workload identity in manifest should be possible in parallel.
Will the design of a guest-component for disk encryption be part of another RFC?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
I believe it should be in a separate RFC, because delivering a secret seed is useful on its own (see use case). By the way, the capabilities of CDH are evolving rapidly and might be worth taking a look at first. |
No description provided.