Nomad ACL support for meta tag filtering. #24715

Open
dkyanakiev opened this issue Dec 19, 2024 · 4 comments

dkyanakiev commented Dec 19, 2024

Proposal

Hi there 👋
Currently, ACL rules for namespaces have to be written per namespace, or with a wildcard on the namespace name.

namespace "dev-*" {
  policy = "read"
  capabilities = [
    "read-logs",
    "submit-job"
  ]
}

The problem comes when you want a policy that applies to several namespaces that don't follow a common naming pattern, so a wildcard can't work. If there are only a handful, you can always duplicate the policy, but this scales really poorly. Since namespaces already have meta tags, it makes sense to be able to create a policy applicable to many namespaces based on those meta tags.

Use-cases

Example of a policy with a filter

namespace {
  filter {
    meta = "team == 'abc'"
  }
  policy = "read"
  capabilities = [
    "alloc-lifecycle",
    "dispatch-job",
    "read-logs",
    "submit-job",
    "csi-mount-volume"
  ]
}

This way, in a scenario where namespaces change ownership and the teams that interact with them change too, you could easily have a policy that matches all the namespaces with the correct meta.

This basically moves the name match from the block label into the filter block, which also allows for combinations:

namespace {
  filter {
    name = "dev-*"
  }
  ...
}

This would work as before, when you have a name with a wildcard.
It would also allow for a combination:

namespace {
  filter {
    meta = "team == 'abc'"
    name = "dev-*"
  }
  ...
}

That way a policy is applied only when both the name and the meta tag match. For reference, this is a namespace that carries meta tags:

namespace "foo" {
  meta = {
    env = "dev"
    team = "abc"
  }
}

Because meta changes on a namespace do not impact the workloads, unlike meta changes on a job, this would allow for generic policies controlled by the meta tags.
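To make the matching semantics above concrete, here is a minimal Go sketch of how a `filter` block with `name` and `meta` could be checked against a namespace. Everything in it is an assumption for illustration: the `Filter` type, `ParseMetaEquals`, and `Matches` are hypothetical names, the parser only understands the single form `key == 'value'` (a real implementation would presumably use a full expression evaluator), and `path.Match` stands in for whatever glob matching Nomad applies to namespace names.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// Namespace mirrors only the fields the proposal relies on.
type Namespace struct {
	Name string
	Meta map[string]string
}

// Filter is a hypothetical compiled form of the proposed filter block.
type Filter struct {
	NameGlob string // e.g. "dev-*"; empty means "any name"
	MetaKey  string // left-hand side of `key == 'value'`; empty means "no meta check"
	MetaVal  string // right-hand side of the expression
}

// ParseMetaEquals handles only the simplified form `key == 'value'`.
func ParseMetaEquals(expr string) (key, val string, err error) {
	parts := strings.SplitN(expr, "==", 2)
	if len(parts) != 2 {
		return "", "", fmt.Errorf("unsupported expression %q", expr)
	}
	key = strings.TrimSpace(parts[0])
	val = strings.Trim(strings.TrimSpace(parts[1]), "'")
	if key == "" {
		return "", "", fmt.Errorf("missing key in %q", expr)
	}
	return key, val, nil
}

// Matches reports whether every condition set on the filter holds.
func (f Filter) Matches(ns Namespace) bool {
	if f.NameGlob != "" {
		ok, err := path.Match(f.NameGlob, ns.Name)
		if err != nil || !ok {
			return false
		}
	}
	if f.MetaKey != "" {
		got, present := ns.Meta[f.MetaKey]
		if !present || got != f.MetaVal {
			return false
		}
	}
	return true
}
```

With this sketch, a filter built from `name = "dev-*"` and `meta = "team == 'abc'"` matches a namespace only when both conditions hold, while a filter with just one of the two fields set behaves like the single-condition examples.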

Attempted Solutions

I could not find any existing feature that solves this. Right now the workaround is to tie a namespace naming pattern to a team, for example, but that is not ideal: teams can change, and moving workloads to updated namespaces causes lots of issues.

@dkyanakiev dkyanakiev changed the title Nomad Namespace ACL support for meta tags. Nomad ACL support for meta tag filtering. Dec 19, 2024

tgross commented Dec 19, 2024

Hi @dkyanakiev, this is a cool idea!

There are some interesting architectural challenges here. The ACL policies you write are turned into a set of in-memory immutable radix tries for fast lookups. There are two for each top-level type of policy, one for direct lookups and one for wildcard lookups. (Ex. there are 2 tries for all namespace lookups in a compiled policy). When an RPC handler checks namespace authentication, it just has to look up which trie applies to the token used and the policy type, and then look up the request's namespace by name (i.e. just a string) in the trie. This is very fast. A naive implementation of this feature would require that we also go to the in-memory state store and look up the namespace object, and then perform the bexpr evaluation on that object. This could get quite expensive at the time of authorization.

That being said, the transformation from the ACL policy document into the trie is itself fairly expensive already, and we amortize that cost by caching the ACL policy trie (with a bounded LRU cache). So in theory if we could figure out a way to transform the expressions you're talking about into an index represented as one of those immutable radix tries (or heck, some other data structure we can do cheap lookups in), we could do the expensive work up front and then the ACL policy checks would still be cheap.
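As a rough illustration of "do the expensive work up front", the following Go sketch expands meta-based rules into a name-keyed index once, so that the authorization check is a plain lookup by namespace name. This is not Nomad's ACL code: `FilterRule`, `CompiledIndex`, and `Compile` are hypothetical names, a plain map stands in for the immutable radix trie, and the filter is reduced to a single meta key/value equality.

```go
package main

import "sort"

// Namespace carries only the fields needed for the sketch.
type Namespace struct {
	Name string
	Meta map[string]string
}

// FilterRule is a hypothetical, already-parsed meta filter with its grants.
type FilterRule struct {
	MetaKey      string
	MetaVal      string
	Capabilities []string
}

// CompiledIndex maps namespace name -> set of granted capabilities.
// A map is used here as a stand-in for a trie keyed by namespace name.
type CompiledIndex map[string]map[string]struct{}

// Compile evaluates every rule against every namespace once. This is the
// expensive step that would be cached alongside the compiled policy.
func Compile(rules []FilterRule, namespaces []Namespace) CompiledIndex {
	idx := make(CompiledIndex)
	for _, ns := range namespaces {
		for _, r := range rules {
			if got, ok := ns.Meta[r.MetaKey]; !ok || got != r.MetaVal {
				continue
			}
			if idx[ns.Name] == nil {
				idx[ns.Name] = make(map[string]struct{})
			}
			for _, c := range r.Capabilities {
				idx[ns.Name][c] = struct{}{}
			}
		}
	}
	return idx
}

// Allows is the cheap check done at authorization time: no namespace
// object lookup and no expression evaluation, only string keys.
func (idx CompiledIndex) Allows(namespace, capability string) bool {
	_, ok := idx[namespace][capability]
	return ok
}

// Capabilities returns the granted capabilities in sorted order.
func (idx CompiledIndex) Capabilities(namespace string) []string {
	out := make([]string, 0, len(idx[namespace]))
	for c := range idx[namespace] {
		out = append(out, c)
	}
	sort.Strings(out)
	return out
}
```

The open question in this sketch is invalidation: the index depends on namespace meta as well as on the policy document, so a cached index would have to be rebuilt when either one changes.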

I'm going to tag this for further discussion and roadmapping. Would love to hear if you have implementation thoughts based on what I've written above as well.

dkyanakiev commented

Thanks for the detailed explanation @tgross. I'll definitely have to sit and think about this in a bit more detail, and maybe look at the code.
At first glance, would it work if ACL policies are pre-evaluated?
Let's say the policies that use the filter option get evaluated every X, independently of the auth process, and the allowed resources are stored in some form of data structure. The data would be fairly straightforward.

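// PolicyEvals holds the pre-evaluated result for one filter-based policy.
type PolicyEvals struct {
	PolicyName string
	Namespaces []Namespace // namespaces the policy's filter currently matches
	// ...
}

// Namespace is one matched namespace and the capabilities granted on it.
type Namespace struct {
	Name         string
	Capabilities []string
	// ...
}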

The old/current lookups would still work as expected, with one additional lookup to check whether any of the broader, filter-based policies apply. For the sake of simplicity, the two could be prevented from working together in the same policy.
If you have a name-based policy, you can't use the filter, and the other way around. That way each policy type is evaluated/mapped in its own specific way and does not add additional complexity.
Working with both mutable and immutable radix tries might not be the ideal scenario, but it allows for a lot more flexibility when it comes to resource mappings in Nomad.
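The two-step check described in this comment could look roughly like the Go sketch below. It is only an assumption about how the pieces might fit: `Authorizer`, `NewAuthorizer`, `Refresh`, and `Allows` are hypothetical names, the name-based side is reduced to exact names (wildcards are left out), and the maps stand in for whatever trie or index would actually be used. `Refresh` is what a periodic job would call, independently of request handling.

```go
package main

import "sync"

// Namespace is one matched namespace and its granted capabilities.
type Namespace struct {
	Name         string
	Capabilities []string
}

// PolicyEvals is the pre-evaluated result for one filter-based policy.
type PolicyEvals struct {
	PolicyName string
	Namespaces []Namespace
}

// Authorizer keeps the static name-based grants and a replaceable
// snapshot of the pre-evaluated, filter-based grants.
type Authorizer struct {
	byName map[string][]string // existing name-based policies

	mu        sync.RWMutex
	evaluated map[string][]string // namespace -> capabilities from filters
}

func NewAuthorizer(byName map[string][]string) *Authorizer {
	return &Authorizer{byName: byName, evaluated: map[string][]string{}}
}

// Refresh swaps in a new snapshot built from the latest evaluations.
func (a *Authorizer) Refresh(evals []PolicyEvals) {
	next := make(map[string][]string)
	for _, pe := range evals {
		for _, ns := range pe.Namespaces {
			next[ns.Name] = append(next[ns.Name], ns.Capabilities...)
		}
	}
	a.mu.Lock()
	a.evaluated = next
	a.mu.Unlock()
}

func contains(caps []string, want string) bool {
	for _, c := range caps {
		if c == want {
			return true
		}
	}
	return false
}

// Allows tries the existing name-based lookup first and only then the
// additional lookup in the pre-evaluated snapshot.
func (a *Authorizer) Allows(namespace, capability string) bool {
	if contains(a.byName[namespace], capability) {
		return true
	}
	a.mu.RLock()
	defer a.mu.RUnlock()
	return contains(a.evaluated[namespace], capability)
}
```

One consequence of evaluating on an interval is that a namespace whose meta just changed keeps its old grants until the next `Refresh`, so the interval bounds how stale an authorization decision can be.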


tgross commented Jan 6, 2025

Another possible approach could be to evaluate the ACL policy meta blocks any time an ACL policy is updated or a namespace is added, and do some kind of denormalization in the state store. That'd make those updates expensive but keep the lookups cheap, because they'd be the same as static lookups. Large clusters, especially large federated clusters, might hurt with that, so there'd be some performance analysis to do there for sure.
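A small Go sketch of that write-time denormalization, under heavy assumptions: `Store`, `FilterPolicy`, `UpsertNamespace`, and `UpsertPolicy` are hypothetical names, the state store is reduced to in-memory maps, and the filter is a single meta key/value equality. It is meant to show where the cost lands, not how Nomad's state store works.

```go
package main

// Namespace carries only the fields needed for the sketch.
type Namespace struct {
	Name string
	Meta map[string]string
}

// FilterPolicy is a hypothetical policy with one meta equality filter.
type FilterPolicy struct {
	Name         string
	MetaKey      string
	MetaVal      string
	Capabilities []string
}

// Store keeps the source objects plus a denormalized table:
// denorm[namespace][policy] = capabilities granted by that policy.
type Store struct {
	namespaces map[string]Namespace
	policies   map[string]FilterPolicy
	denorm     map[string]map[string][]string
}

func NewStore() *Store {
	return &Store{
		namespaces: map[string]Namespace{},
		policies:   map[string]FilterPolicy{},
		denorm:     map[string]map[string][]string{},
	}
}

func matches(p FilterPolicy, ns Namespace) bool {
	got, ok := ns.Meta[p.MetaKey]
	return ok && got == p.MetaVal
}

// UpsertNamespace rebuilds one row: cost grows with the number of policies.
func (s *Store) UpsertNamespace(ns Namespace) {
	s.namespaces[ns.Name] = ns
	row := make(map[string][]string)
	for _, p := range s.policies {
		if matches(p, ns) {
			row[p.Name] = p.Capabilities
		}
	}
	s.denorm[ns.Name] = row
}

// UpsertPolicy touches every row: cost grows with the number of namespaces.
func (s *Store) UpsertPolicy(p FilterPolicy) {
	s.policies[p.Name] = p
	for name, ns := range s.namespaces {
		if matches(p, ns) {
			s.denorm[name][p.Name] = p.Capabilities
		} else {
			delete(s.denorm[name], p.Name)
		}
	}
}

// Allows is a static lookup; no filter is evaluated at request time.
func (s *Store) Allows(policy, namespace, capability string) bool {
	for _, c := range s.denorm[namespace][policy] {
		if c == capability {
			return true
		}
	}
	return false
}
```

In this sketch a policy update is the operation that scales with cluster size, since it visits every namespace, while a namespace update only scales with the number of filter-based policies.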

dkyanakiev commented

Namespace creation shouldn't be too taxing, but later meta changes on namespaces could be.
Depending on the usage, the performance may differ.
Thinking about it from my current use case, in a fairly large cluster: today it takes more time and effort to maintain policies for namespaces when ownership changes, and that results in more policy changes than the meta block evaluation would. But I guess this is all too subjective.

Projects
Status: Needs Roadmapping