[grafana-sampling] add sampling helm chart #2918
Doing RED and service graph metrics generation plus sampling, with the load balancing needed for scalability, is what I expected. I don't understand why there is also a K8s attributes processor. I think we don't want to enrich the incoming signals with any contextual metadata; we require metadata to be provided before the signals enter the sampling box.
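A minimal sketch of the load-balancing layer, assuming a River (Flow mode) config: the first layer receives OTLP and uses `otelcol.exporter.loadbalancing` routed on trace ID, so every span of a trace reaches the same second-layer agent. Tail sampling depends on that, and service graph generation benefits from it. The component labels and the Service name `sampling-layer2.monitoring` are hypothetical, not taken from the chart's templates.

```river
// Layer 1 (hypothetical labels): accept OTLP from applications and
// forward every span of a given trace to the same layer-2 agent.
otelcol.receiver.otlp "default" {
  grpc {}
  http {}

  output {
    traces = [otelcol.exporter.loadbalancing.default.input]
  }
}

otelcol.exporter.loadbalancing "default" {
  // Route on trace ID so tail sampling sees complete traces.
  routing_key = "traceID"

  resolver {
    kubernetes {
      // Hypothetical "<service>.<namespace>" in front of the layer-2 agents.
      service = "sampling-layer2.monitoring"
    }
  }

  protocol {
    otlp {
      client {
        // Plaintext inside the cluster; an assumption for this sketch.
        tls {
          insecure = true
        }
      }
    }
  }
}
```

Routing on trace ID rather than on service keeps all spans of a trace together, at the cost of a service's spans being spread across every layer-2 replica.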
@cyrille-leclerc, with @rlankfo: I might not be able to get this addressed until morning. FYI @gouthamve @mar4uk
As discussed with @mar4uk, we don't see a reason to enrich incoming OTel signals with K8s metadata. The
Shall we add this diagram (the one Cyrille added) to the design doc? The data enrichment processors are going to be moved out of the sampling box and will be part of the k8s-monitoring Helm chart (they are already part of it). WDYT, Robbie?
Do we want to enrich meta-monitoring metrics and logs? I guess Grafana Agent's OTel Collector components publish meta-monitoring metrics similarly to what the OTel Collector does. Note that meta-monitoring could be tackled in a subsequent milestone.
I'll remove these for now, but my gut feeling is that not everyone who uses this Helm chart will also use the k8s-monitoring Helm chart, and eventually we may get requests to make some of these processors configurable in this chart.
I don't think we want to enrich any debug metrics coming out of the agent. I believe these metrics would need to be scraped from the agents' /metrics endpoint.
Let's make sure we keep meta-monitoring on our horizon for future work. cc: @rlankfo
I'd love to see a quick README to briefly explain what's going on here.
Otherwise, looking good.
We are going to advise users to use the k8s-monitoring Helm chart if they run their applications in Kubernetes. If they prefer not to use it anyway (I'm wondering what the reasons for that could be), they would still deploy their own custom Grafana Agent or Collector, and we can probably provide some recommended configurations for it. The downside of having enrichment processors in the sampling box, when users don't use the k8s-monitoring Helm chart, is that we would enrich only traces. Users would still need to enrich metrics and logs somewhere before the sampling box.
We provide the recommended configurations, but we don't provide a way to deploy them. The suggestion was to make the processors configurable in this Helm chart for users who do not deploy k8s-monitoring. There could be several reasons for that, such as cost or already having another monitoring solution in place for k8s.
Not necessarily. As it was originally configured, it allowed metrics and logs to be sent through the first layer as well. For now, we've chosen to remove this functionality anyway, but if there are more comments about it, let's iterate on the doc.
Hey folks. I've been pointed at this chart draft and want to raise a few comments, not about the code itself but about usage in general. We now have several different Agent charts for different purposes, including the 'vanilla' Agent install and the k8s one. An issue with layering in more charts is that we're starting to advise users to potentially deploy entire pipelines of Agents in their infrastructure to get the appropriate signals into our backend. This becomes problematic, and it's something we've seen in GTM. Could some of this work be carried out using Agent Modules instead? The alternative is that users pick the chart that does the 'heaviest lifting' for them to start with, then override config or, at worst, rewrite parts of charts to actually do what they need. I would very much welcome a conversation here, or elsewhere, to discuss this.
Thanks for reaching out, @hedss. I'd be glad to have that conversation; I'll DM you.
Signed-off-by: Robbie Lankford <[email protected]>
Looks good! I left a couple of comments; I think they should be addressed before merging. Did you exclude the logs and metrics pipelines because they should be handled by a separate agent? Let me know if you need help updating the diagram (the logs and metrics arrows should be removed, right?).
Thanks, Irina! I'll make these updates. I'd appreciate any help with the diagram. The chart should only support traces.
LGTM!
* add sampling helm chart
* wire metrics generation toggle
* add simplified sampling policies
* set 2 replicas and disable autoscaling by default
* set back to 1 replicas by default to pass ci tests
* use kubernetes resolver for loadbalancing exporter
* add README.md
* helm-docs
* helm-docs
* update helm-docs; add decision wait
* helm-docs and fix typo
* quote decision_wait
* add transform to drop unneeded resource attributes for spanmetrics
* more doc updates
* more doc updates
* move sampling to grafana-sampling
* additional docs updates
* remove sample file
* shorten names to pass tests
* update png and metrics pipeline order based on PR review
* remove k8s.pod.name from default dimensions

---------

Signed-off-by: Robbie Lankford <[email protected]>
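The commit "add transform to drop unneeded resource attributes for spanmetrics" can be illustrated with a minimal sketch, assuming the transform sits only on the branch that feeds span metrics, so the traces that go on to sampling keep their attributes. It uses the OTTL `keep_keys` function; the component label and the list of kept keys are hypothetical, and `otelcol.connector.spanmetrics.default` is assumed to be defined elsewhere.

```river
// Hypothetical sketch: trim resource attributes on the span-metrics
// branch so the generated metrics carry fewer resource attributes.
otelcol.processor.transform "spanmetrics" {
  // Skip statements that fail instead of dropping the data.
  error_mode = "ignore"

  trace_statements {
    context = "resource"
    // Keep only these resource attributes; every other key is removed.
    statements = [
      `keep_keys(attributes, ["service.name", "service.namespace", "deployment.environment"])`,
    ]
  }

  output {
    traces = [otelcol.connector.spanmetrics.default.input]
  }
}
```

Keeping an allow-list rather than deleting known keys means new high-cardinality attributes added by instrumentation do not silently reach the metrics.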
This Helm chart deploys a layered set of agents for OTLP load balancing, metrics generation, and trace sampling. It provides an opinionated solution that customers can use as a "black box" on top of the agent, and it does much of the River/Flow configuration heavy lifting.
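A minimal sketch of what the second (sampling) layer might look like, assuming it receives load-balanced OTLP over gRPC, generates RED and service graph metrics from every span, and then tail-samples the traces it forwards. The labels, decision wait, and policies are hypothetical; `otelcol.exporter.prometheus.default` and `otelcol.exporter.otlp.tempo` are assumed to be defined elsewhere.

```river
// Layer 2 (hypothetical labels): metrics are generated from all spans,
// before sampling, so RED metrics are not skewed by dropped traces.
otelcol.receiver.otlp "default" {
  grpc {}

  output {
    traces = [
      otelcol.connector.spanmetrics.default.input,
      otelcol.connector.servicegraph.default.input,
      otelcol.processor.tail_sampling.default.input,
    ]
  }
}

otelcol.connector.spanmetrics "default" {
  histogram {
    explicit {}
  }

  output {
    metrics = [otelcol.exporter.prometheus.default.input]
  }
}

otelcol.connector.servicegraph "default" {
  output {
    metrics = [otelcol.exporter.prometheus.default.input]
  }
}

otelcol.processor.tail_sampling "default" {
  // River durations are written as strings.
  decision_wait = "5s"

  // A trace is kept if any policy decides to sample it.
  policy {
    name = "keep-errors"
    type = "status_code"

    status_code {
      status_codes = ["ERROR"]
    }
  }

  policy {
    name = "sample-rest"
    type = "probabilistic"

    probabilistic {
      sampling_percentage = 10
    }
  }

  output {
    traces = [otelcol.exporter.otlp.tempo.input]
  }
}
```

Fanning out from the receiver keeps metrics generation independent of the sampling decision, which is the usual reason to place the connectors before, rather than after, tail sampling.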