RFC: Observability API #97

R-Lawton · 2024-07-23T15:55:59Z

What are people's thoughts on the original suggestion of the observability CR? After a discussion i had with Jim, I think the observability policy is good but I would like others' opinions too.

rfcs/0000-observability-api.md

maleck13 · 2024-07-31T11:49:06Z

So discussed with @R-Lawton in person, but there seems to be some redundancy in the configuration and needs a user to understand the names and configuration options for underlying components. Also a separate CRD doesn't seem needed IMO if we condense the config

I think we can simplify this down somewhat into a two phase approach

offer log and tracing options in the kuadrant CR (but they will only affect Authorino and Limitador) in phase 1 and be as simple as:

logging:
   level: debug
tracing:
  common tracing config applied to both limitador and authorino
metrics:
  enabled: false (removes service monitors) enabled by default

Where by default it is set to error level for logging and a default logging config plus , metrics defaulted to on.

We can then expand beyond this in a second phase

add comming configmap for expressing observability values that can be loaded as env vars (logging for example) so that it changes all components including the operators . Add in new options around alerts and dashboards as needed.

The status of the kuadrant resource can reflect what has been configured:

logging:
   authorino: debug
   limitador: debug
tracing:
  ...

WDYT @Boomatang @R-Lawton

R-Lawton · 2024-08-01T13:47:25Z

offer log and tracing options in the kuadrant CR (but they will only affect Authorino and Limitador) in phase 1 and be as simple as:
logging:
   level: debug
tracing:
  common tracing config applied to both limitador and authorino
metrics:
  enabled: false (removes service monitors) enabled by default
Where by default it is set to error level for logging and a default logging config plus , metrics defaulted to on.

@maleck13

By default the log level is info, what are your thoughts about changing it to error?

Boomatang · 2024-08-01T14:18:00Z

@maleck13

To me this makes sense, I do think we need the focus more on the action being preformed over the component being configured.

How I see this now is we should remove the current ability to configure Limitador from the kuadrant CR in favor of the more action based configuration. If you have not seen the discussion, #102 that suggest removing the work done for RFC0006 I will take your comment as the answer for that discussion.

After talking with @R-Lawton, we can not see any iteration of this RFC that would not require the removal of the past configuration implantation. So I have opened a PR to remove that work, Kuadrant/kuadrant-operator#795

rfcs/0000-observability-api.md

Signed-off-by: R-Lawton <[email protected]> Signed-off-by: R-Lawton <[email protected]>

KevFan

Looks generally good, I've left some minor suggestions 👍

rfcs/0000-observability-api.md

…e location to the Kuadrant CR Signed-off-by: R-Lawton <[email protected]> Signed-off-by: R-Lawton <[email protected]>

KevFan

Looks good to me 👍

Boomatang · 2025-01-13T10:31:22Z

As this would be configured inside the kuadrant CR, it would be worth looking at #112 as it is trying to define a standardized API design for new features added to the kuadrant CR

maleck13 · 2025-01-24T08:43:43Z

Wanted to call out that there is a dicussion and investigation outlined here about some of these areas
Kuadrant/kuadrant-operator#1130
We have multiple "lower level" things that a user may want to configure. We want to have a consistent experience around how these are achieved. So let hold this PR until that is completed

david-martin · 2025-01-24T09:09:31Z

Since the Kuadrant CRD is v1, I don't see any avenue to trying things out with new APIs exposed through the Kuadrant CR.
For example, introducing some alpha API, seeing how it gets used and adapting it based on feedback.

@maleck13 @Boomatang @KevFan @R-Lawton @guicassolato What are your thoughts on exploring new APIs through a separate CRD. In this case, that would be an Observability CRD at v1alpha1.
There are a few ways that API could go then.
It could eventually evolve into a v1 (as the Observability CRD).
It could eventually fizzle out and be removed.
Or it could be merged into the Kuadrant CR.

I worry that the outcome of discussions and linked topics in Kuadrant/kuadrant-operator#1130 could end up stifling or delaying potential improvements and new features while we agonise over getting the perfect API.

maleck13 · 2025-01-24T09:15:52Z

The Kuadrant CRD is not v1 at the moment (correctly or incorrectly) https://github.com/Kuadrant/kuadrant-operator/blob/main/config/install/configure/standard/kuadrant.yaml

david-martin · 2025-01-24T09:29:54Z

Apologies, you're right.
Well that does change my argument above.
I think there's still an argument for using separate CRDs so as not to complicate versioning of the Kuadrant CRD.
There may be a conversation about how the Kuadrant CRD is versioned in general, and if/when it makes it to v1.

I'd still be interested in peoples thoughts on the above argument about using a separate v1alpha1 CRD to introduce this new feature.

Boomatang · 2025-01-24T14:48:42Z

What are your thoughts on exploring new APIs through a separate CRD. In this case, that would be an Observability CRD at v1alpha1.

Why yet another CRD? As this would be a very experimental feature, would it not be better to use a configmap with labels for the first pass at least. Once the structure has being refined a bit more, then we can look at creating a dedicated CRD or moving it into Kuadrant CRD.

One advantage of using a configmap would be allowing the feature to be behind a feature flag that doesn't require the installation of a CRD. There is currently similar things happening in kuadrant operator turning on and off workflows based on the CRDs installed for the different supported gateways. The same concept can work for a feature flag.

If you are wanting to get this in front of user fast, I think the configmap is the fastest approach.

david-martin · 2025-01-24T15:25:06Z

Why yet another CRD

I like CRDs. To me, they make configuration feel more controlled and kube native.
A big part of that is having a spec, field validation and taking advantage of built in kubectl cli args like output formatting and templating.

That being said, if there's push back on a CRD approach, the configmap solution can certainly work as you've outlined, and in a safe way.

If you are wanting to get this in front of user fast

There's no users screaming out for this.
It's something I think can make installation easier overall for some users.
For now, I'm content to let the discussion continue here and in Kuadrant/kuadrant-operator#1130, before refining this RFC.

KevFan · 2025-01-24T15:25:41Z

@david-martin Personally, I'm leaning towards just prototyping with the Kuadrant CR regardless of its current API version since the RFC "consenus" (at the moment) is to use the Kuadrant CR to configure Observability. I don't think we should care too much on complicating the versioning of the Kuadrant CR given this 🤔 Say even if it was at v1, we can add a v2alpha1, which makes a new alpha version that allows us to experiment with it 🤔 If the consenus had been to use separate CRD from the start, then that would make complete sense.

I like the general conventions set by the k8s api for introducing any changes to their api and the understanding of "the internal representation of an API object is decoupled from any one API version":

https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/api_changes.md

We can prototype this as a non-breaking change to either v1beta1 as an optional field since the spec is currently empty (until there's an new upstream release) or better yet introduce a v1beta2 that we prototype with but keep v1beta1 as the stored version and the only version we generally recommend. If we're saying users can try this feature, use v1beta2 but knowing that it's a feature that is in the early stages and can have breaking changes until it's the recommended version 🤔 Achieving this is probably not "easy", but may be the "simpler/cleaner" way eventually. Eventually we'll reach a v1betaN version which will be promoted as v1 of Kuadrant CR if that's the goal.

I'm guessing what may add to this complexity and what would affect deciding on whether we want to or not use the Kuadrant CR to prototype with this approach is possibly the upgrades and release differences with upstream and downstream 🤔 (e.g. upstream with many releases now at v1betaN, and removed v1beta1/no longer reconcile/convert from this version, but downstream takes in the latest release at v1betaN)

If the goal is to just experiment quick on observability, then a separate CRD makes sense that I'm not totally against anyhow.

R-Lawton force-pushed the rfc-observability branch 3 times, most recently from 34cb471 to 3b91376 Compare July 25, 2024 13:42

R-Lawton marked this pull request as ready for review July 25, 2024 13:42

R-Lawton changed the title ~~RFC: Observability Policy~~ RFC: Observability API Jul 25, 2024

R-Lawton force-pushed the rfc-observability branch from afbb0c4 to 2864c28 Compare July 25, 2024 13:59

R-Lawton commented Jul 25, 2024

View reviewed changes