Private Keys for Agent ↔ Tenant association during registration #4

Open
jhunt opened this issue Sep 7, 2019 · 2 comments
Labels: resolved (This concern / issue / complaint has been resolved.)

jhunt commented Sep 7, 2019

@thomasmitchell raised this concern in a discussion we had offline; moving it here so that we can discuss.

Prospective agents should not identify the tenant they wish to be owned by through tenant ID, as this has been considered a fairly public token. There should be a new private agent token provisioned to identify the tenant.

Consider a system where a tenant has the SHIELD core provision an authentication key for each agent ahead of time. This is simpler than a whitelist system, provides a source of unique identification regardless of naming, and allows for easy transfer of tenant ownership. The agent then does not need to generate its own key.

Downsides are:

  • the core must be deployed before all agents
  • runtime config agent deployments become impossible
  • randomly provisioned Kubernetes agents become difficult (but they are already cumbersome under the proposed system).

jhunt commented Sep 7, 2019

Here's one way this approach might work (WF1):

  1. Deploy SHIELD
  2. Create a Tenant; as that Tenant:
  3. Fill out the "Provision New Agent" form (SHIELD will generate a keypair)
  4. Download the private key for the Agent
  5. Deploy the Agent with the private key

This is the most convenient workflow, but it does mean that the SHIELD Core is in charge of the key material for some (albeit small) window of time.

A more rigorously secure workflow might look like this (WF2):

  1. Deploy SHIELD
  2. Generate an RSA keypair offline
  3. Create a Tenant; as that Tenant:
  4. Fill out the "Provision New Agent" form (providing the public key)
  5. Deploy the Agent with the private key

(These workflows differ only in who generates the keypair, and when.)

For comparison, here is the workflow we are currently proposing (WFP):

  1. Deploy SHIELD
  2. Create a Tenant (to get its UUID)
  3. Generate an RSA keypair offline
  4. Deploy the Agent with the private key
  5. Approve the agent registration in the SHIELD Core (for the Tenant)

aside: I don't personally think WFP qualifies as "cumbersome"
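
To make the last two steps of WFP concrete, here is a minimal Python sketch of a registration queue: the agent announces itself with a tenant UUID and a public key, and nothing is trusted until someone approves it for that tenant. The class and method names (`PendingRegistrations`, `register`, `approve`) and the hex fingerprint format are hypothetical, chosen for illustration; they are not SHIELD's actual API.

```python
import hashlib
from dataclasses import dataclass, field


def fingerprint(public_key: bytes) -> str:
    """Hex SHA-256 digest of the raw public key bytes (illustrative format)."""
    return hashlib.sha256(public_key).hexdigest()


@dataclass
class PendingRegistrations:
    # Hypothetical in-memory stores: (tenant UUID, agent name) -> key fingerprint
    pending: dict = field(default_factory=dict)
    approved: dict = field(default_factory=dict)

    def register(self, tenant_uuid: str, agent_name: str, public_key: bytes) -> str:
        """Agent-originated step: record the claim, but do not trust it yet."""
        fp = fingerprint(public_key)
        self.pending[(tenant_uuid, agent_name)] = fp
        return fp

    def approve(self, tenant_uuid: str, agent_name: str, expected_fp: str) -> bool:
        """Operator step: approve only if the key matches what was deployed."""
        key = (tenant_uuid, agent_name)
        fp = self.pending.get(key)
        if fp is None or fp != expected_fp:
            return False
        self.approved[key] = self.pending.pop(key)
        return True
```

Passing the expected fingerprint to `approve` models the verify-before-approving case; an operator who approves blindly would simply echo back whatever fingerprint the queue shows.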

These three workflows all look fairly equal in terms of steps, involved systems, and level of complexity, so the base case (everything is manual) is identical. Let's consider the more automated deployment scenarios of BOSH and Kubernetes.

BOSH

Here is what the proposed workflow, WFP, looks like when deploying under BOSH:

  1. Deploy SHIELD
  2. Create a Tenant (to get its UUID)
  3. Deploy Agent (and data system, presumably) via BOSH, using -v tenant_uuid=$x and leveraging CredHub to create the RSA keypair
  4. Approve the agent registration in the SHIELD Core (for the Tenant)

Notably, the handling of key material is 100% taken care of by automated systems, and the operator can remain unaware of the existence of the key. If they want to, they can retrieve the public key from CredHub, post-deployment, for visual verification in the approval screens of the SHIELD Web UI (or CLI).
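
The visual verification step amounts to comparing key fingerprints. As a rough sketch, the Python below computes the `SHA256:` fingerprint that `ssh-keygen -l` prints for a public key in OpenSSH one-line format. It assumes the public key retrieved from CredHub is available in that format; whether the SHIELD approval screens display fingerprints this way is also an assumption, and the function names are hypothetical.

```python
import base64
import hashlib


def openssh_fingerprint(public_key_line: str) -> str:
    """Return the SHA256 fingerprint of an OpenSSH-format public key line."""
    # The line is "<type> <base64 blob> [comment]"; the decoded blob is hashed.
    blob = base64.b64decode(public_key_line.split()[1])
    digest = hashlib.sha256(blob).digest()
    # OpenSSH prints the digest as unpadded base64 with a "SHA256:" prefix.
    return "SHA256:" + base64.b64encode(digest).decode("ascii").rstrip("=")


def keys_match(deployed_key_line: str, displayed_fingerprint: str) -> bool:
    """Compare the deployed key against the fingerprint shown for approval."""
    return openssh_fingerprint(deployed_key_line) == displayed_fingerprint
```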

Here is what WF2 looks like under BOSH (WF2-B1):

  1. Deploy SHIELD
  2. Generate an RSA keypair offline
  3. Create a Tenant; as that Tenant:
  4. Fill out the "Provision New Agent" form (providing the public key)
  5. Deploy Agent (and data system, presumably) via BOSH, using -v agent_key=$y

This has the disadvantage that it does not work under BOSH's runtime-config, so it is impossible to colocate the agent transparently based on deployment composition. Ideally, I should be able to do this:

# runtime-config.yml
variables:
  - name: shield-agent-key
    type: rsa

addons:
  - name: shield-agent
    include:
      - jobs: [{ release: postgres, name: postgres }]
    jobs:
      - name: shield-agent
        release: shield
        properties:
          key: ((shield-agent-key.private_key))

And every deployment would get a unique key. To make this possible, we have to amend the application of WF2 on BOSH, resulting in (WF2-B2):

  1. Deploy SHIELD
  2. Deploy Agent (and data system, presumably) via BOSH, leveraging CredHub (via the runtime-config) to get the key
  3. Retrieve the RSA public key from CredHub
  4. Log back into SHIELD and fill out the "Provision New Agent" form (providing the public key)

The added bounce through CredHub involves a new system (with a CLI we have heretofore not needed). This is more complicated, but not untenable.

Kubernetes

Let's turn to Kubernetes. A bespoke "vanilla" deployment (i.e. not using any CRDs, Operators, or ex post facto configuration) looks like this for the proposed workflow (WFP-K1):

  1. Deploy SHIELD
  2. Create a Tenant (to get its UUID)
  3. Generate an RSA keypair offline (ssh-keygen -t rsa -f id_rsa and kubectl create secret generic my-secret --from-file=ssh-privatekey=$PWD/id_rsa)
  4. Deploy the Agent with the private key (via kubectl apply -f ...)
  5. Approve the agent registration in the SHIELD Core (for the Tenant)

Applying WF2 to Kubernetes is more of the same as the non-runtime-config BOSH story (WF2-K1):

  1. Deploy SHIELD
  2. Generate an RSA keypair offline (ssh-keygen -t rsa -f id_rsa and kubectl create secret generic my-secret --from-file=ssh-privatekey=$PWD/id_rsa)
  3. Create a Tenant; as that Tenant:
  4. Fill out the "Provision New Agent" form (providing the public key)
  5. Deploy the Agent with the private key (via kubectl apply -f ...)

A (Secure) Compromise

I believe the crux of the disagreement between people who prefer WF2 and people who prefer WFP boils down to agent autonomy, which is a policy decision that is best left to operators (we should concern ourselves with Mechanism, not Policy).

The WFP workflow gives greater power/convenience to people deploying agents, whereas the WF2 workflow gives greater control to the people operating tenants.

In the spirit of mechanism, not policy, what if we do this:

  1. Continue to identify agents by identity and private key (i.e. we authorize keys for subsets of identity)
  2. Enable SHIELD site administrators to enable or disable Agent-originated registration.

I think this allows us to support both workflows.
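
A minimal sketch of how those two mechanisms could combine into a single decision, assuming keys are tracked by fingerprint per tenant. The names here (`Outcome`, `decide`, `allow_agent_registration`) are hypothetical and only illustrate the idea; they are not taken from the SHIELD codebase.

```python
from enum import Enum
from typing import Dict, Set


class Outcome(Enum):
    ACCEPTED = "accepted"          # key was provisioned ahead of time
    PENDING = "pending approval"   # unknown key, queued for an operator
    REJECTED = "rejected"          # unknown key, agent registration disabled


def decide(tenant_uuid: str, key_fingerprint: str,
           provisioned: Dict[str, Set[str]],
           allow_agent_registration: bool) -> Outcome:
    """Decide what to do with an agent presenting a key for a tenant.

    `provisioned` maps tenant UUID -> fingerprints provisioned ahead of time;
    `allow_agent_registration` is the site-wide administrator switch.
    """
    if key_fingerprint in provisioned.get(tenant_uuid, set()):
        return Outcome.ACCEPTED
    if allow_agent_registration:
        return Outcome.PENDING
    return Outcome.REJECTED
```

With the switch off, only pre-provisioned keys get through (the WF2 style); with it on, unknown keys land in an approval queue (the WFP style), and pre-provisioned keys still work.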

If you are convenience-minded: enable automatic registration and let your deployment tooling generate unique keys and identities (given a tenant UUID to start from). When it is time to approve keys that show up in the web interface, you can either blindly approve them (super-convenience-minded) or verify the public key against what you think you deployed (security+convenience, or the trust-but-verify model).

If you are security-minded: disable automatic registration and manually provision all of your keys ahead of time.

Some Historical Context

The two real-world systems I've been basing this analysis on (and indeed most of the design of the agent registration protocol) are Puppet auto-enrollment and SSH Host Keys.

Section 4.1 of RFC-4251 deals with the (now common) practice of trust-on-first-use:

The protocol provides the option that the server name - host key
association is not checked when connecting to the host for the first
time. This allows communication without prior communication of host
keys or certification. The connection still provides protection
against passive listening; however, it becomes vulnerable to active
man-in-the-middle attacks. Implementations SHOULD NOT normally allow
such connections by default, as they pose a potential security
problem. However, as there is no widely deployed key infrastructure
available on the Internet at the time of this writing, this option
makes the protocol much more usable during the transition time until
such an infrastructure emerges, while still providing a much higher
level of security than that offered by older solutions (e.g., telnet
[RFC0854] and rlogin [RFC1282]).
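
For readers less familiar with trust-on-first-use, the behavior described in that passage boils down to something like the Python sketch below: pin the first key seen for a host, and treat any later change as suspect. This is an illustration only, with a hypothetical function name and return values; it is not how any particular SSH client is implemented.

```python
from typing import Dict


def check_host_key(known: Dict[str, str], host: str, fingerprint: str) -> str:
    """Trust-on-first-use: pin the first key seen, flag later mismatches."""
    pinned = known.get(host)
    if pinned is None:
        known[host] = fingerprint  # first contact: nothing to verify against
        return "trusted-on-first-use"
    if pinned == fingerprint:
        return "ok"
    return "mismatch"  # possible man-in-the-middle; do not connect
```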

jhunt commented Oct 4, 2019

Hearing no concerns, complaints, or rebuttals, we will adopt the following:

  1. Continue to identify agents by identity and private key (i.e. we authorize keys for subsets of identity)
  2. Enable SHIELD site administrators to enable or disable Agent-originated registration.

This allows us to support both workflows.

If you are convenience-minded: enable automatic registration and let your deployment tooling generate unique keys and identities (given a tenant UUID to start from). When it is time to approve keys that show up in the web interface, you can either blindly approve them (super-convenience-minded) or verify the public key against what you think you deployed (security+convenience, or the trust-but-verify model).

If you are security-minded: disable automatic registration and manually provision all of your keys ahead of time.

jhunt added the "resolved" label Oct 4, 2019