From 29fa1e145dfb3fb6d158e97c0e566b5555af6e9a Mon Sep 17 00:00:00 2001 From: R-Lawton Date: Fri, 26 Jul 2024 11:00:38 +0100 Subject: [PATCH] RFC Observability Signed-off-by: R-Lawton --- rfcs/0000-observability-api.md | 76 +++++++++++++++++++--------------- 1 file changed, 42 insertions(+), 34 deletions(-) diff --git a/rfcs/0000-observability-api.md b/rfcs/0000-observability-api.md index 87dcddf..6654784 100644 --- a/rfcs/0000-observability-api.md +++ b/rfcs/0000-observability-api.md @@ -34,22 +34,29 @@ The different aspects a user might want to modify could be the following: ###### **Note**: Observability pieces with a * denotes these are post v1 milestone -### Example CR with everything +### Example: Every component has every option available +The example below is a scenario where every available option is used, all configurable through the same type of API. +Although the example shows a CR with every value filled in, the changes will come into full effect through a phased approach, because some aspects are not available yet. The phased approach can follow the Kubernetes versioning syntax, such as v1alpha1 or v1beta1, and be reflected in the CRD. 
+ ```yaml apiVersion: kuadrant.io/v1alpha1 kind: observability spec: logging: component: authorino: logLevel: debug + logMode: production limitador: - logMode: debug + logLevel: info + logMode: production authorino: + logLevel: error logMode: development limitador: + logLevel: debug logMode: development tracing: @@ -58,9 +65,9 @@ spec: endpoint: rpc://tempo.tempo.svc.cluster.local:4317 insecure: true tags: tag1, tag2 - strategyRules: + strategyRules: # Note: this is a mock-up of a potential new feature that is currently being discussed in the following [RFC](https://github.com/Kuadrant/architecture/pull/96) rule1: - best-rule-in-the-world-1 + best-rule-in-the-world-1 rule2: best-rule-in-the-world-2 authorino: @@ -79,21 +86,21 @@ spec: enableService: true port: 8080 deep: true - authorino-operator: + authorinoOperator: enableService: true port: 8080 limitador: enableService: true port: 8080 deep: false - limitador-operator: + limitadorOperator: enableService: true port: 8080 - kuadrant-operator: + kuadrantOperator: enableService: true port: 8080 deep: true - dns-operator: + dnsOperator: enableService: true port: 8080 @@ -101,36 +108,36 @@ spec: namespace: my-amazing-namespace component: authorino: - operator-level: true - component-level: true + operatorLevel: true + componentLevel: true limitador: - operator-level: true - component-level: true - kuadrant: - operator-level: true - component-level: true + operatorLevel: true + componentLevel: true + kuadrant: + operatorLevel: true + componentLevel: true dashboards: namespace: my-amazing-namespace component: authorino: - operator-level: true - component-level: true + operatorLevel: true + componentLevel: true limitador: - operator-level: true - component-level: true - kuadrant: - operator-level: true + operatorLevel: true + componentLevel: true + kuadrant: + operatorLevel: true ``` -### Sample use case +### Sample use case (with what's currently available) A use case a user might have 
would be they desire setting up tracing in the Limitador operator implementing the required endpoints and optional tag. The user also wants metrics setup with custom ports and requires service and serviceMonitors to be created for Kuadrant-operator and Authorino-operator as well as have the Authorino have a log level of Debug. ```yaml apiVersion: kuadrant.io/v1alpha1 kind: Observability spec: tracing: @@ -153,7 +160,7 @@ spec: logLevel: debug ``` ### Status -The status of the Observability CR will not be the observability stack is in a "healthy" state i.e Prometheus and Grafana is up and running. It should be the status of only the things that we contribute for example is new Logging and Tracing now in place or is the console plugin responding. We will not be taking responsibility for aspects we don't have control over. +The status of the Observability CR will not report whether the observability stack is in a "healthy" state, i.e. whether Prometheus and Grafana are up and running. It should only report the status of the things that we contribute, for example whether the new logging and tracing configuration is now in place. We will not be taking responsibility for aspects we don't have control over. # Reference-level explanation @@ -165,7 +172,7 @@ directly - like setting flags or configuration directly on the deployments of th indirectly - Passing the information to Authorino & limitador via the Authorino & Limitador CRs indirectly - Passing the information to Authorino & limitador in the form of there own Observability CRs -The best approach would be the indirect approach, meaning once the Observability CR is updated the information is passed to the relevant component CR. For example the tracing section in the Authorino CR spec would be updated with the required endpoints and other configuration in the Observability CR, this would then be updated in the Authorino CR. 
+The best approach would be the indirect approach: once the Observability CR is updated, the information is passed to the relevant component CR. For example, the required endpoints and other tracing configuration set in the Observability CR would then be written to the tracing section of the Authorino CR spec. This does mean that the spec exposed is limited to what's in the Kuadrant component CRs, but new changes can be requested and implemented in said components. #### Adding, modifying and deleting values or no values @@ -176,17 +183,18 @@ If the value is removed from the Observability CR it will also be removed from t If no value is provided as the default is acceptable and the component CR is updated to something that changed the default. The value will get overridden to what the default was. If no value is given and there is no default in the Observability CR and the component CR is updated to add a value it will be overridden back to no value. - +# Unresolved questions +Do we want a value for a person who just wants everything "on", i.e. they want all the observability features? Could a value at the root of the observability section be set to say "give me all you got"? And potentially #### Work thats needed The indirect approach allows for not much if any changes to the Authorino operator and the Limitador operator etc . The bulk of the work that would be needed would be in the Kuadrant operator. -In terms of if this piece of work would require its own observability controller the answer needs more discussion. Some of the work could be done by the kuadrant CR but not everything for example the alerts or the dashboards dont make sense to have the kuadrant operator implement them so a new "observability" controller would be needed. This then begs the question if we need a new controller for some parts of the CR it might make sense to have the new controller handle the full CR and not have the Kuadrant CR reconcile it. 
+Whether this piece of work requires its own observability controller needs more discussion. Some of the work could be done by the controller that reconciles the Kuadrant CR, but not everything: for example, it doesn't make sense to have the Kuadrant operator implement the alerts or the dashboards, so a new "observability" controller would be needed. This raises a question: if we need a new controller for some parts of the CR, it might make sense to have the new controller handle the full CR and not have the Kuadrant CR's controller reconcile it. The changes will come into full affect through a phased approach due to the nature of some aspects not being available yet. The phased approach can follow the versioning syntax that k8s like v1beta1 or v1alpha1 and be portrayed in the CRD. ### Default configuration -By default if no observability CR or if values are left blank the current default values will still be used the plan is to not change these and keep them as is. For new features like alerts and dashboard the default values will folllow the same style and trend as the current approach with some features being disabled by default like the tracing or have certain default value like info for logging for example. +By default, if there is no Observability CR or if values are left blank, the current default values will still be used. The plan is to keep these as is. For new features like alerts and dashboards, the default values will follow the same style as the current approach, with some features disabled by default (like tracing) and others given a default value (like info for logging). # Rationale and alternatives @@ -194,18 +202,18 @@ By default if no observability CR or if values are left blank the current defaul The above approach allows for the following: * User experience: The observability CR can be easily read to see what the current state of the observability configuration is. Also theres only one place to update rather then multiple. 
-* Abstraction: The [RFC 0006](https://github.com/Kuadrant/architecture/blob/main/rfcs/0006-kuadrant_sub_components_configurations.md) suggests having observability in the Kuadrant CR with other non observability related variables. With new proposed ideas and aspects for observability coming down the line and the current quite extensive options users can choose from, the Kuadrant CR will become "muddied", hard to maintain and hard to read. +* Abstraction: [RFC 0006](https://github.com/Kuadrant/architecture/blob/main/rfcs/0006-kuadrant_sub_components_configurations.md) suggests having observability in the Kuadrant CR alongside other, non-observability-related variables. With new ideas and aspects for observability coming down the line, and the already extensive options users can choose from, the Kuadrant CR will become "muddied", hard to maintain and hard to read. * Future proof: Observability currently is Logging, metrics and tracing but theres plans for more configuration. Having it has a standalone API allows for engineering to easily add new features. * Single source of truth: Rather then having multiple crs to check what the current configuration is theres a single source of truth. Preventing users from accidentally changing values by mistake ## Other options: -An other option that has been investigated which is very similar to above, is having observability configuration as a element of the kuadrant CR spec. The majority of the work itself would be largely the same with operators having to move configuration to the Kuadrant CR and having new observability features use the kuadrant CR as the source of truth. +Another option that has been investigated, which is very similar to the above, is having observability configuration as an element of the Kuadrant CR spec. 
The majority of the work would be largely the same, with operators having to move configuration to the Kuadrant CR and new observability features using the Kuadrant CR as the source of truth. -The reason why we should go with the above method is Abstraction. The Kuadrant CR quite quickly can get "muddied" with observability and be harder to read and maintain also losing its main purpose; being the install kick of and maintainer for the kuadrant operators as well as the single source of truth aspect. From a user point of view having a users have to change configuration in 3+ separate crs in some cases is tedious and not slow. +The reason why we should go with the above method is abstraction. The Kuadrant CR can quickly get "muddied" with observability and become harder to read and maintain, losing both its main purpose (being the install kick-off and maintainer for the Kuadrant operators) and its single source of truth aspect. From a user's point of view, having to change configuration in 3+ separate CRs in some cases is tedious and slow. If we don't decide to any of these options the user will have to manually in multiple places add there configuration of their desired observability stack which can result in poor user experience, mistakes being made and values not being tracked properly. 
-There was also a previous RFC [RFC 0006](https://github.com/Kuadrant/architecture/blob/main/rfcs/0006-kuadrant_sub_components_configurations.md), that suggests adding everything to the Kuadrant CR, why this RFC should replace the observability aspect are for the reasons stated above: +There was also a previous RFC, [RFC 0006](https://github.com/Kuadrant/architecture/blob/main/rfcs/0006-kuadrant_sub_components_configurations.md), that suggests adding everything to the Kuadrant CR. This RFC should replace its observability aspect for the reasons stated above: * Abstraction * User experience * Readability @@ -219,5 +227,5 @@ Manually adding configuration to Kuadrant operator crs. # Future possibilities [future-possibilities]: #future-possibilities -Currently we only have configuration for Logging, Tracing and Metrics. Post v1 the plan is to add alerts, dashboards and potentially other 3rd party like Kiali +Currently we only have configuration for Logging, Tracing and Metrics. Post v1 the plan is to add alerts, dashboards and potentially other third-party tools like Kiali.
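
The override rules described under "Adding, modifying and deleting values or no values" can be illustrated with a minimal sketch. This is only an illustration under stated assumptions, not the proposed implementation: the function names `desired_value` and `reconcile_field` are hypothetical, a plain dictionary stands in for a component CR spec, and a real controller would live in the Kuadrant operator rather than in Python.

```python
from typing import Any, Dict, Optional


def desired_value(observability_value: Optional[Any],
                  default: Optional[Any] = None) -> Optional[Any]:
    """Pick the value a component CR field should end up with.

    Hypothetical helper: a value set in the Observability CR wins,
    otherwise the default (if any) applies, otherwise there is no value.
    """
    if observability_value is not None:
        return observability_value
    return default


def reconcile_field(component_spec: Dict[str, Any], key: str,
                    observability_value: Optional[Any],
                    default: Optional[Any] = None) -> Dict[str, Any]:
    """Return a copy of the component spec with one field forced to the
    desired value, discarding any manual edit made on the component CR."""
    result = dict(component_spec)
    want = desired_value(observability_value, default)
    if want is None:
        # No value and no default: the field is removed again.
        result.pop(key, None)
    else:
        result[key] = want
    return result


# A manual edit on the component CR is overridden back to the default.
print(reconcile_field({"logLevel": "debug"}, "logLevel", None, "info"))
# A manual edit with no default and no Observability value is removed.
print(reconcile_field({"logLevel": "debug"}, "logLevel", None))
```

The sketch treats the Observability CR as the single source of truth: whatever is edited directly on a component CR is replaced on the next reconcile.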