
feat: Allow to restrict the CRs watched according to their labels #1832

Merged: 10 commits merged into grafana:master on Feb 21, 2025

Conversation

@wilfriedroset (Contributor) commented Jan 20, 2025

The context could be two operators and two Grafana CRs, the first with the label shard: 1 and the second with shard: 2. I should be able to configure operator 1 to watch only the CRs with the label shard: 1 and operator 2 to watch only shard: 2.
This example might seem simple, but I'm aiming for a bigger scale: tens of shards, each hosting tens of thousands of CRs. I'm anticipating the heavy workload that a single operator would need to handle given the number of CRs I will create, which is in the tens of thousands. Having that magnitude handled by a single operator might be hard, especially when the operator restarts.

Sharding the operators would help lower the number of CRs tracked by any single one.

Edit: this is a follow-up PR after a discussion on Slack: https://kubernetes.slack.com/archives/C019A1KTYKC/p1737370451358769

@theSuess (Member)

Thanks for the PR! The approach looks good to me. @Baarsgaard since you took a look at the caching logic not too long ago, does this also seem good to you?

The only thing I might change is to use comma-separated values for the label selector instead of JSON, as JSON complicates quoting quite a bit.

Do you think something like cluster=prod,shard=1 would work as well? IMHO this is cleaner than {"cluster":"prod","shard":"1"} as it also reduces the ambiguity in the distinction between integers and strings

@wilfriedroset (Contributor, Author)

Thank you for your review @theSuess. I've simplified the implementation; it should be more generic now that I'm using labels.Parse, so it will support label selectors such as the one you mention, cluster=prod,shard=1, or even more complex ones like partition in (customerA, customerB),environment!=qa, as per the documentation here: https://kubernetes.io/docs/concepts/overview/working-with-objects/labels/#set-based-requirement

WDYT?
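
For reference, here is a minimal, self-contained sketch of what labels.Parse accepts (assuming only the standard k8s.io/apimachinery module; the actual flag wiring in main.go is omitted):

package main

import (
	"fmt"

	"k8s.io/apimachinery/pkg/labels"
)

func main() {
	// Equality-based selector, as suggested above.
	sel, err := labels.Parse("cluster=prod,shard=1")
	if err != nil {
		panic(err)
	}
	fmt.Println(sel.Matches(labels.Set{"cluster": "prod", "shard": "1"})) // true

	// Set-based selector from the Kubernetes docs parses with the same function.
	sel, err = labels.Parse("partition in (customerA, customerB),environment!=qa")
	if err != nil {
		panic(err)
	}
	fmt.Println(sel.Matches(labels.Set{"partition": "customerA", "environment": "prod"})) // true
}

Both forms produce the same labels.Selector type, so the cache configuration does not need to distinguish between them.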

@wilfriedroset force-pushed the labelSelectors branch 2 times, most recently from 43ae851 to 70c948d, on January 21, 2025 13:36
@theSuess (Member)

This is great! I'll test this locally and see if I can come up with an E2E test to make sure this keeps working when refactoring the caching logic.

@Baarsgaard (Contributor) left a comment

I actually noticed this yesterday and started to compile some notes @theSuess.
For the go-lint cyclomatic complexity warning, I opened #1833 to get rid of it, as I had the same issue in #1818.

main.go (outdated)
var labelSelectors labels.Selector
var err error
if watchLabelSelectors != "" {
labelSelectors, err = labels.Parse(watchLabelSelectors)
@Baarsgaard (Contributor) commented Jan 21, 2025

Note: I like this quite a lot and would prefer to swap the existing namespace selector parsing to this, or to ValidatedSelectorFromSet to maintain the current behaviour.
To properly support sharding at scale, changing to labels.Parse might be better for the namespace selector.
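
To illustrate the difference (a hedged sketch with hypothetical helper names, not the operator's actual code): ValidatedSelectorFromSet keeps today's exact key=value semantics while validating keys and values, whereas labels.Parse additionally accepts set-based expressions.

package selectors

import "k8s.io/apimachinery/pkg/labels"

// namespaceSelectorFromSet mirrors the current behaviour: an exact match on
// the given label set, with key/value validation.
func namespaceSelectorFromSet(raw map[string]string) (labels.Selector, error) {
	return labels.ValidatedSelectorFromSet(labels.Set(raw))
}

// namespaceSelectorFromExpression is the more flexible variant: it also
// understands expressions such as "environment in (prod, staging)".
func namespaceSelectorFromExpression(expr string) (labels.Selector, error) {
	return labels.Parse(expr)
}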

@Baarsgaard (Contributor) commented Jan 21, 2025

With the above said, I have a ton of questions related to the Slack comment (wanting to keep all CRs in the same namespace) and other unknowns:

If startup/performance is a concern:

  • I would look into allowing configuration of MaxConcurrentReconciles, letting the operator reconcile multiple resources of the same kind concurrently and potentially speeding up startup if enough CPUs are allocated (see the sketch after this list).
  • I think the plan is to deprecate the status lists on Grafana CRs, which would remove the startup sync, @theSuess?
    On startup, we currently fetch all defined Grafanas and loop through them.
  • Do you plan on applying Grafana CRs directly, or on using Helm/Kustomize/other tooling for creating the Deployments?
  • What is the expected resyncPeriod you're targeting at this scale? Depending on the use case it could be set as low as 8h.
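
For context, here is a hedged sketch of how MaxConcurrentReconciles is typically wired up with controller-runtime; the reconciler below is a stand-in and the grafana-operator import path is an assumption, not confirmed code from this repo:

package controllers

import (
	"context"

	ctrl "sigs.k8s.io/controller-runtime"
	"sigs.k8s.io/controller-runtime/pkg/controller"

	grafanav1beta1 "github.com/grafana/grafana-operator/v5/api/v1beta1" // assumed import path
)

// GrafanaReconciler is a placeholder for the operator's real reconciler.
type GrafanaReconciler struct{}

func (r *GrafanaReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
	// The real reconcile logic lives in the operator; this is only a stub.
	return ctrl.Result{}, nil
}

func (r *GrafanaReconciler) SetupWithManager(mgr ctrl.Manager, workers int) error {
	return ctrl.NewControllerManagedBy(mgr).
		For(&grafanav1beta1.Grafana{}).
		// Let several Grafana CRs be reconciled in parallel; the worker count
		// would likely come from a CLI flag rather than a hard-coded value.
		WithOptions(controller.Options{MaxConcurrentReconciles: workers}).
		Complete(r)
}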

Sharding:

  • Do you plan to migrate resources between shards? If so, what would your mechanism for controlling this be? Manual or automatic?
  • In this context, namespaces and labels essentially accomplish the same thing, except one is more limiting than the other when listing resources. Is there a specific reason for preferring a single namespace?
    Is the plan to make use of Kubernetes to prevent name collisions?
  • I don't know if there's a good way to drop part of the cache in order to migrate resources/namespaces between shards.

Potential blockers/worries:

  • Have you factored in the "eventually consistent" nature of applying new Grafana CRs?
    Applying a new resource CR will instantly show up in all matching instances, but after applying a Grafana CR you have to either wait for the resyncPeriod to elapse, restart the operator, or create a new generation/update all resources that need to be reconciled.
  • If you do end up with 500k CRs in a single namespace, what protections do you have in place against kubectl get grafana-operator --all or similar large requests? Would they interfere with the operators?
  • From a usability standpoint, I worry there's a significant overhead when working with individually labelled resources compared to namespaced ones, unless it's entirely abstracted away.

I do not have a great understanding of compute at that scale, but you could spin up a lot of resources quickly with the steps below and stress the operator:

  1. Check out "Fix: Do not cache native resources created without CommonLabels" #1818, as I've been doing some cache tuning there lately.
  2. Create N Grafana CRs + N*7 resource CRs:
    for i in {0..N}; do kubectl create ns test-$i && kubectl apply -n test-$i -f tests/example-resources.yml; done
  3. Get an understanding of resource usage with pprof:
    kubectl port-forward -n grafana-operator-system deploy/grafana-operator-controller-manager-v5 8888 &
    go tool pprof -pdf http://localhost:8888/debug/pprof/heap > heap.pdf
    go tool pprof -pdf http://localhost:8888/debug/pprof/goroutine > goroutine.pdf

If a lot of these resources are provided by a central team and not by consumers, are there features you'd like to see that could reduce the total number of CRs?
I have been playing with an idea to support replacing/prefixing/suffixing values of resources with values from the matching Grafana CRs,
essentially using the name/namespace of a Grafana CR in dashboard/folder names or datasource URLs.

PS: It's not every day you get to work on stuff at this scale, so I am interested/slightly invested in this already.

@wilfriedroset (Contributor, Author)

Thank you @Baarsgaard for taking the time to thoroughly review my PR. Here is more context.

I would look into allowing configuration of MaxConcurrentReconciles, letting the operator reconcile multiple resources of the same kind concurrently and potentially speeding up startup if enough CPUs are allocated.

Indeed, I will need to adjust this setting as the number of CRs grows.

Do you plan on applying Grafana CRs directly, or on using Helm/Kustomize/other tooling for creating the Deployments?

I have a proof-of-concept that allows CRUD operations on the Grafana CRs via API calls; the underlying API creates the CRs directly in k8s.
Basically you can do something like this:

# Create a new Grafana
curl  -v -XPOST  -H "Content-Type: application/json" localhost:8080/v1/grafana -d '{"service_id": "bbb", "version": "11.0.0"}'
 
# Update a Grafana, for example the version
curl  -v -XPUT  -H "Content-Type: application/json" localhost:8080/v1/grafana/bbb -d '{"service_id": "bbb", "version": "11.1.0"}'
 
# Delete a Grafana
curl  -v -XDELETE  -H "Content-Type: application/json" localhost:8080/v1/grafana/bbb
 
# Add datasources to a given Grafana
curl -v -XPOST -H "Content-Type: application/json" localhost:8085/v1/grafana/bbb/datasources -d '{"datasources": [{"url": "http://my-prom-not-hosted-on-k8s:8080/", "name": "prometheus", "type": "prometheus"}]}'

What is the expected resyncPeriod you're targeting at this scale? Depending on the use case it could be set as low as 8h.

I will have to work more on my proof-of-concept to find the correct value. I'm also investigating a double layer of sharding:

  • multiple operators, each watching a subset of the CRs
  • multiple k8s clusters

My control plane will be responsible for spreading the CRs evenly wherever there is room (on the less crowded cluster and the less crowded operator shard).

Do you plan to migrate resources between shards? If so, what would your mechanism for controlling this be? Manual or automatic?

I do not plan to migrate resources between shards.

In this context, namespaces and labels essentially accomplish the same thing, except one is more limiting than the other when listing resources. Is there a specific reason for preferring a single namespace?

It is simpler to have everything in the same namespace from a provisioning and operations point of view.

Is the plan to make use of Kubernetes to prevent name collisions?

I have full control over the name of each CR; my API is responsible for crafting the correct name (e.g. a compact UUID).

Have you factored in the "eventually consistent" nature of applying new Grafana CRs?

This is OK on my end. Moreover, the double sharding should reduce the amount of work done by a single operator.

If you do end up with 500k CRs in a single namespace, what protections do you have in place against kubectl get grafana-operator --all or similar large requests? Would they interfere with the operators?

See my comment about the double sharding. The scale I'm aiming for is 500k CRs, but I'm considering splitting the workload across tens of k8s clusters. For example, with 50 clusters each hosting 10k CRs split across 10 shards, each operator shard would only be responsible for 1k CRs. However, it is true that kubectl get grafana-operator --all or similar large requests would be heavy on the operators. I don't expect to do that often, and I can always go to 20 or 100 shards.

I do not have a great understanding of compute at that scale, but you could spin up a lot of resources quickly with the steps below and stress the operator:

I'm progressing on my proof-of-concept with a quick and dirty Python script like so:

from kubernetes import client, config, utils
import yaml
import logging
import argparse


def cmdline_parser():
    parser = argparse.ArgumentParser()

    parser.add_argument(
        "-d",
        "--debug",
        help="Print lots of debugging statements",
        action="store_const",
        dest="loglevel",
        const=logging.DEBUG,
        default=logging.WARNING,
    )  # mind the default value

    parser.add_argument(
        "-v",
        "--verbose",
        help="Be verbose",
        action="store_const",
        dest="loglevel",
        const=logging.INFO,
    )

    parser.add_argument(
        "-q",
        "--quiet",
        help="Be quiet",
        action="store_const",
        dest="loglevel",
        const=logging.CRITICAL,
    )

    parser.add_argument(
        "-n",
        "--namespace",
        help="namespace where to deploy the CR",
        default="test-grafana-operator",
    )

    parser.add_argument(
        "-c", "--count", type=int, help="number of CR to deploy", default=1
    )

    args = parser.parse_args()
    logging.basicConfig(level=args.loglevel)
    return args


def main():
    args = cmdline_parser()
    config.load_kube_config()
    api = client.CustomObjectsApi()

    body = None
    with open("./grafana.yaml") as fd:
        body = yaml.safe_load(fd)
    for i in range(args.count):
        body["metadata"]["name"] = f"grafana-{i}"
        logging.debug(f"creating grafana {i}")
        try:
            api.create_namespaced_custom_object(
                group="grafana.integreatly.org",
                version="v1beta1",
                namespace=args.namespace,
                plural="grafanas",
                body=body,
            )
        except client.ApiException as e:
            if e.reason == "Conflict":
                api.patch_namespaced_custom_object(
                    name=body["metadata"]["name"],
                    group="grafana.integreatly.org",
                    version="v1beta1",
                    namespace=args.namespace,
                    plural="grafanas",
                    body=body,
                )
            else:
                raise


if __name__ == "__main__":
    main()

With the following CR

apiVersion: grafana.integreatly.org/v1beta1
kind: Grafana
metadata:
  name: grafana-1
  namespace: test-grafana-operator
spec:
  deployment:
    spec:
      template:
        spec:
          containers:
            - image: docker.io/grafana/grafana:11.0.0
              name: grafana
              resources:
                limits:
                  cpu: "1"
                  memory: 512Mi
                requests:
                  cpu: 20m
                  memory: 100Mi

I will report what I find out with pprof when I get to a comfortable spot. I could spin up a Pyroscope instance to ease the work 😇

If a lot of these resources are provided by a central team and not by consumers, are there features you'd like to see that could reduce the total number of CRs?

The current state of the operator seems to suit my use case. Upon a request for a Grafana, I create a new CR in the right place (right cluster, right shard). It was more or less straightforward to implement my API and test it.

@theSuess added the "feature" label (this PR introduces a new feature) on Jan 23, 2025
@theSuess (Member)

I also think migrating between shards is something we don't want to explicitly support. This is a very specific edge case, so I wouldn't want to spend too much effort here as long as it covers @wilfriedroset's use case and is easy enough to implement/understand.

@theSuess (Member)

@wilfriedroset can you rebase this onto the latest changes?

The last missing part before merging would be to join the label selector with the "app.kubernetes.io/managed-by" label: #1832 (comment)

@wilfriedroset (Contributor, Author)

@theSuess let me know what you think about the implementation of the merge with the "app.kubernetes.io/managed-by" label.
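
For reviewers, here is a hedged sketch of one way such a merge can be expressed with apimachinery; commonLabels stands in for model.CommonLabels from github.com/grafana/grafana-operator/v5/controllers/model, and this is not necessarily the exact code in the PR:

package main

import (
	"fmt"

	"k8s.io/apimachinery/pkg/labels"
)

// commonLabels mimics model.CommonLabels; the managed-by value matches the
// label discussed in this thread.
var commonLabels = map[string]string{
	"app.kubernetes.io/managed-by": "grafana-operator",
}

// mergeWithCommonLabels appends the operator's common-label requirements to a
// user-provided selector such as "cluster=prod,shard=1".
func mergeWithCommonLabels(userSelector labels.Selector) labels.Selector {
	reqs, _ := labels.SelectorFromSet(commonLabels).Requirements()
	return userSelector.Add(reqs...)
}

func main() {
	sel, err := labels.Parse("cluster=prod,shard=1")
	if err != nil {
		panic(err)
	}
	// Prints a selector equivalent to:
	// app.kubernetes.io/managed-by=grafana-operator,cluster=prod,shard=1
	fmt.Println(mergeWithCommonLabels(sel))
}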

@wilfriedroset (Contributor, Author)

@Baarsgaard

You can import "github.com/grafana/grafana-operator/v5/controllers/model" and create a labelSelector using the existing const: labels.SelectorFromSet(model.CommonLabels)

This is done.

Since you do not expect shard migrations to happen on Grafana instances, we still need to solve the issue of child resources (Deployment, PVC, Service, Route, ...) of internal instances being hidden by the defaultLabelSelector.
One way this could be solved is by enforcing that all labels on a Kind: Grafana are inherited by its child resources.

I'm not sure what to do about this comment; I'm thinking the same as @theSuess.
I will let you run the tests you were referring to 🙏🏾

@Baarsgaard (Contributor)

Sure, I will try to break the cache 😉 soon ™️

@Baarsgaard (Contributor) commented Feb 13, 2025

To solve the above, I can only think of one solution, which is to force labels on a Grafana CR to be inherited by its sub-resources (Secrets, ConfigMaps, PVCs, Deployments, ServiceAccounts, Ingresses, Services).
That way the resources are included in the cache as long as the labelSelector does not explicitly include app.kubernetes.io/managed-by!=grafana-operator.

EDIT: If the above is implemented, it should technically be possible to migrate shards as well.

@wilfriedroset (Contributor, Author) commented Feb 17, 2025

Thank you @Baarsgaard, I will try to have a closer look at the solution you are suggesting. I'm still new to this codebase, so any pointers would help greatly 🙏🏾

I can only think of one solution, which is to force labels on a Grafana CR to be inherited by its sub-resources (Secrets, ConfigMaps, PVCs, Deployments, ServiceAccounts, Ingresses, Services).

Should I patch only the Grafana resources, or should I patch all of them?

My initial idea is to naively patch SetCommonLabels to take an additional map[string]string and then call it with the CR labels.

Alternatively, we can create a new dedicated function that does the patching and replace the calls to SetCommonLabels with the new func:

func SetCommonLabelsExtended(obj metav1.ObjectMetaAccessor, extraLabels map[string]string) {
    SetCommonLabels(obj)
    meta := obj.GetObjectMeta()
    labels := meta.GetLabels()
    for k, v := range extraLabels {
        labels[k] = v
    }
}

model.SetCommonLabelsExtended(configMap, cr.Labels)

see:

func SetCommonLabels(obj metav1.ObjectMetaAccessor) {
    meta := obj.GetObjectMeta()
    labels := meta.GetLabels()
    if labels == nil {
        labels = CommonLabels
    } else {
        for k, v := range CommonLabels {
            labels[k] = v
        }
    }
    meta.SetLabels(labels)
}

wdyt?

@Baarsgaard (Contributor) commented Feb 17, 2025

You can have a look at #1661 to see where we need the labels propagated.
From what I can see, the Grafana instance with the labels is in the function args for all of them as cr.
My initial idea would be to refactor model.SetCommonLabels into SetInheritedLabels, which takes the Grafana instance as an input, and then handle the cr.Labels nil check inside the function as well:
model.SetInheritedLabels(cr, <object>)
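
To make that concrete, here is a minimal sketch of what such a function could look like if it lives next to SetCommonLabels in the model package; the name, argument order, and nil handling are assumptions rather than the final implementation:

package model

import (
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// SetInheritedLabels applies the operator's CommonLabels and then copies every
// label from the owning Grafana CR onto the child object, so that child
// resources keep matching the shard's label selector.
func SetInheritedLabels(cr metav1.Object, obj metav1.ObjectMetaAccessor) {
	meta := obj.GetObjectMeta()
	merged := meta.GetLabels()
	if merged == nil {
		merged = map[string]string{}
	}
	for k, v := range CommonLabels {
		merged[k] = v
	}
	for k, v := range cr.GetLabels() {
		merged[k] = v
	}
	meta.SetLabels(merged)
}

Call sites would then switch from model.SetCommonLabels(configMap) to model.SetInheritedLabels(cr, configMap).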

@Baarsgaard (Contributor)

Ah, I did not notice you edited your previous comment, but it looks like we're thinking along the same lines!

@wilfriedroset (Contributor, Author)

@Baarsgaard I've managed to patch the operator as described above and reproduce your test.

➜  grafana-operator git:(labelSelectors) ✗ k get grafana grafana-normal -o yaml
apiVersion: grafana.integreatly.org/v1beta1
kind: Grafana
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"grafana.integreatly.org/v1beta1","kind":"Grafana","metadata":{"annotations":{},"name":"grafana-normal","namespace":"default"},"spec":{"config":{"auth":{"disable_login_form":"false"},"log":{"mode":"console"},"security":{"admin_password":"secret","admin_user":"root"}}}}
  creationTimestamp: "2025-02-18T16:10:26Z"
  generation: 1
  name: grafana-normal
  namespace: default
  resourceVersion: "723"
  uid: 5c599b1e-1c66-4978-adae-1f93cac1178b
spec:
  config:
    auth:
      disable_login_form: "false"
    log:
      mode: console
    security:
      admin_password: secret
      admin_user: root
➜  grafana-operator git:(labelSelectors) ✗ k get grafana grafana-shard -o yaml
apiVersion: grafana.integreatly.org/v1beta1
kind: Grafana
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"grafana.integreatly.org/v1beta1","kind":"Grafana","metadata":{"annotations":{},"labels":{"manual":"test"},"name":"grafana-shard","namespace":"default"},"spec":{"config":{"auth":{"disable_login_form":"false"},"log":{"mode":"console"},"security":{"admin_password":"secret","admin_user":"root"}}}}
  creationTimestamp: "2025-02-18T16:10:26Z"
  generation: 2
  labels:
    manual: test
  name: grafana-shard
  namespace: default
  resourceVersion: "748"
  uid: 5924a376-0d9b-4f9e-820f-02277ec77a27
spec:
  config:
    auth:
      disable_login_form: "false"
    log:
      mode: console
    security:
      admin_password: secret
      admin_user: root
  version: 11.3.0
status:
  adminUrl: http://grafana-shard-service.default.svc:3000
  stage: complete
  stageStatus: success

WDYT?

@Baarsgaard (Contributor) commented Feb 18, 2025

Sidenote: When sharding, Secrets/ConfigMaps with referenced values also need to comply with the shard labels, otherwise they break with:

status:
  conditions:
  - message: 'getting referenced value: Secret "contact-mails" not found'
    reason: InvalidSettings
    status: "True"
    type: InvalidSpec
  lastResync: "2025-02-18T22:26:51Z"

Similarly, internal Grafana instances need to be labelled before turning on the sharding feature, to ensure labels are inherited.

Both should be emphasised in the documentation for this feature.

If we start seeing false-positive bugs/issues/discussions on this specific condition, logging a hint when sharding is enabled could make sense?

@wilfriedroset (Contributor, Author)

If we start seeing false positive bugs/issues/discussions on this specific condition, logging a hint when sharding is enabled could make sense?

You are right, I've added a dedicated commit.

@Baarsgaard (Contributor) left a comment

I generally approve of this, but I would recommend mentioning the need to label Internal Grafana CRs in advance.

@wilfriedroset (Contributor, Author)

I would recommend mentioning the need to label Internal Grafana CRs in advance.

I don't understand what I must do on top of the current commits to address this feedback.

@Baarsgaard (Contributor) commented Feb 20, 2025

I'm not a maintainer, so you'll need to wait for one, likely @theSuess, to do a final review and run the CI.
You can run most checks locally with:
make all
make e2e-local-gh-actions

The above runs all the lints and compilation, as well as the existing E2E tests in a local kind cluster.
If you're up for it, you can have a look at the E2E tests implemented with kyverno/chainsaw and add a new case for the sharding implementation here.

@theSuess (Member) left a comment

Huge thanks to both of you @wilfriedroset and @Baarsgaard.

I pushed two minor fixes and have one last question; other than that, I'm happy to merge this.

Sorry that this took so long 😓

@theSuess added this pull request to the merge queue on Feb 21, 2025
Merged via the queue into grafana:master with commit 2dd5065 on Feb 21, 2025
15 checks passed
Labels: feature (this PR introduces a new feature)
4 participants