📖 Add OLMv1 Overview doc #692
Conversation
docs/olmv1_overview.md
Outdated
### Single-tenant control planes

One choice for customers would be to adopt low-overhead single-tenant control planes (e.g. hypershift?) in which every tenant can have full control over their APIs and controllers and be truly isolated (at the control plane layer at least) from other tenants. With this option, the things OLMv1 cannot do (listed above) are irrelevant, because the purpose of all of those features is to support multi-tenant control planes in OLM.
The Kubernetes docs on multi-tenancy also mention the Cluster API, Kamaji, and vcluster as options for creating virtual control planes per tenant for tenant isolation. It might be worth linking to the kubernetes docs and/or some of the projects that are specifically designed for addressing this (hypershift can be included in this but shouldn't be the only example IMO).
I think including links to some of these other projects trying to solve the multi-tenancy problem will help get the point across that using something specifically for enabling multi-tenancy is a better solution than trying to force multi-tenancy support into OLMv1
Yep, I'll update this. Thanks! Missed the hypershift callout in my RH-specific scrub.
docs/olmv1_overview.md
Outdated
Using the [Operator Capability Levels](https://sdk.operatorframework.io/docs/overview/operator-capabilities/) as a rubric, operators that fall into Level 1 and some that fall into Level 2 are not making full use of the operator pattern. If content authors had the choice to ship their content without also shipping an operator that performs simple installation and upgrades, many supporting these Level 1 and Level 2 operators might make that choice to decrease their overall maintenance and support burden while losing very little in terms of value to their customers.

## What will OLM doo that a generic package manager doesn't?
Suggested change:
## What will OLM doo that a generic package manager doesn't?
## What will OLM do that a generic package manager doesn't?
Overall looks good to me. Had a couple comments, but nothing that I think is worth really holding this PR for.
### Watched namespaces cannot be configured in a first-class API

OLMv1 will not have a first-class API for configuring the namespaces that a controller will watch. |
How is "first-class API" being defined? Does this mean that OLM will not provide this capability itself, but this could be provided by someone else?
My understanding is that not having a "first-class API" means that there will be no field on the APIs introduced by OLMv1 to set that information explicitly. There is nothing stopping the author of a controller from providing configuration options to users for this or hardcoding the namespaces it should watch.
I don't think we have really talked about this, but I could imagine the APIs introduced by OLMv1 having a kind of "pass through" field that users can use to set some arbitrary values on the manifests deployed by OLM, and nothing here prevents that (e.g. setting env vars on a deployment).
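A minimal sketch of what such a pass-through could look like (the kind, API group, and `config` field below are hypothetical illustrations, not an agreed-upon OLMv1 API):

```yaml
# Hypothetical sketch only: the Extension kind, API group, and config field
# are invented for illustration and are not a committed OLMv1 API.
apiVersion: example.olm.io/v1alpha1
kind: Extension
metadata:
  name: my-operator
spec:
  packageName: my-operator
  channel: stable
  config:                 # opaque to OLM; its schema would be defined by the extension author
    watchNamespaces:      # an author-defined knob that OLM itself knows nothing about
      - tenant-a
    env:
      - name: LOG_LEVEL
        value: debug
```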
Rather than saying "not", let's say what it does provide. e.g. "Namespace watching is provided via "
@tmshort This is somewhat nuanced, and I want to make sure that the overall message is clear.
Namespace watching is not provided by OLMv1 at all. However, OLMv1 will support arbitrary configuration with schemas provided by extension authors and values that must match those schemas provided by extension admins. If authors decide to include knobs in their configuration schemas that control scoping, that's fine. But it isn't anything OLMv1 knows about or can build features around.
I call this out with a bit more detail in the Approach -> Don't Fight Kubernetes section further down.
There is one and only one exception. In order to maintain backward compatibility with registry+v1 bundles, the OLM maintainers will define the parameterization schema for registry+v1 bundles and will include the ability to define the watched namespaces. But again, this is particular to this bundle format and is not something that the broader OLM system will have awareness of.
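For reference, the registry+v1 compatibility mentioned above builds on the existing OLMv0 convention, where OLM injects the target namespaces as a pod annotation and the operator reads them through the downward API, roughly like this (illustrative excerpt of a Deployment's container spec):

```yaml
# OLMv0 registry+v1 convention (illustrative): the operator discovers its
# watched namespaces from an annotation injected by OLM, via the downward API.
env:
  - name: WATCH_NAMESPACE
    valueFrom:
      fieldRef:
        fieldPath: metadata.annotations['olm.targetNamespaces']
```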
However, Kubernetes does not assume that a controller will be successful when it reconciles an object.

The Kubernetes design assumptions are:
- CRDs and their controllers are trusted cluster extensions. |
I understand that a CRD (the API) is global, but it is not clear to me why a controller cannot be namespaced. If the controller is running in a namespace, with a service account that has RBAC limited to the namespace, and only reconciles CRs within that namespace -- I'm not sure I would consider that a cluster extension?
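For concreteness, the namespace-confined setup described in this comment might look roughly like the following (the `widgets.example.com` API is made up for illustration):

```yaml
# Illustrative only: RBAC that confines a controller's ServiceAccount to a
# single namespace, so it can only list/watch/reconcile CRs in that namespace.
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: widget-controller
  namespace: tenant-a
rules:
  - apiGroups: ["widgets.example.com"]
    resources: ["widgets", "widgets/status"]
    verbs: ["get", "list", "watch", "update", "patch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: widget-controller
  namespace: tenant-a
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: widget-controller
subjects:
  - kind: ServiceAccount
    name: widget-controller
    namespace: tenant-a
```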
I could be wrong, but in this case I am interpreting "cluster extension" as referring to it literally being something that extends the functionality of the Kubernetes cluster. I don't think there is anything in Kubernetes preventing a controller from running with reconciliation scoped only to a namespace.
I read cluster extension as things that extend the functionality of the cluster, further than what is shipped with a default install of Kubernetes. This has no bearing on the scope that an operator author chooses to implement within a controller. Even if the controller in question is just namespace scoped, it still extends a k8s cluster's functionality.
A controller can be namespaced, but it requires that all controllers for a given CRD manage non-overlapping namespaces. That can be tricky. This is why we are/were considering splitting up "global components" (e.g. CRDs) from potentially "namespace-able components" (i.e. the controllers).
The Kubernetes design assumptions are:
- CRDs and their controllers are trusted cluster extensions.
- If an object for an API exists a controller WILL reconcile it, no matter where it is in the cluster.
If a controller is running with a service account with namespaced RBAC, would the controller even see a CR created in another namespace?
No. My understanding is that the design assumption we are stating here is that Kubernetes assumes that there is a controller somewhere that will reconcile an object for an API no matter where it is in the cluster. As far as I am aware this doesn't necessarily mean it has to be a single instance of a controller. I do think in general it makes more sense to use a single controller over many namespace specific controllers though.
OLMv1 will make the same assumption that Kubernetes does and that users of Kubernetes APIs do. That is: If a user has RBAC to create an object in the cluster, they can expect that a controller exists that will reconcile that object. If this assumption does not hold, it will be considered a configuration issue, not an OLMv1 bug.

This means that it is a best practice to implement and configure controllers to have cluster-wide permission to read and update the status of their primary APIs. It does not mean that a controller needs cluster-wide access to read/write secondary APIs. If a controller can update the status of its primary APIs, it can tell users when it lacks permission to act on secondary APIs.
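A sketch of that best practice (resource names are again made up): the controller gets cluster-wide access to its primary API and its status, while secondary resources such as Secrets or Deployments can be granted separately through namespaced Roles only where they are actually needed:

```yaml
# Illustrative: cluster-wide permissions limited to the primary API and its
# status subresource; secondary APIs are intentionally not listed here.
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: widget-controller-primary
rules:
  - apiGroups: ["widgets.example.com"]
    resources: ["widgets"]
    verbs: ["get", "list", "watch"]
  - apiGroups: ["widgets.example.com"]
    resources: ["widgets/status"]
    verbs: ["get", "update", "patch"]
```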
I wonder if it would be useful to challenge this. Although APIs are inherently cluster-scoped in Kubernetes, can we achieve some level of multi-tenancy by splitting the APIs from the controllers?
Can a cluster admin make an API available -- with no controller installed with cluster-wide RBAC? Can a namespace admin then install a controller in their namespace with limited RBAC to reconcile CRs in that namespace? This may require allowing for the installation of individual components of a bundle.
I don't think we are going to intentionally prevent any of those possible approaches to achieving multi-tenancy, but I don't think we are going to intentionally support them either. A user absolutely should be able to do all of those things by creating individual bundles that way, but going that route fights the design of Kubernetes and comes with a host of its own issues, making it difficult for OLM to effectively handle the automatic life cycling of the controllers. Taking this approach, it will be entirely on the user to understand the impacts of this decision and why OLM may not behave as expected in this case.
For example, I would not see it as a bug if automatic upgrades for a controller were to fail and require manual intervention in this scenario. I would also not see it as a bug if OLM successfully installed multiple controllers that reconcile the same resource in the same namespace in this scenario.
The fact that the APIs are global means that there isn't true multi-tenancy of controllers either. All of the controllers for that global API MUST agree on the single API they will all use. Therefore tenants will be limited by the choices made by other tenants when it comes to lifecycling controllers.
As Bryce said, OLMv1 will not get in the way of multiple controller installations, but it also won't help de-conflict between them.
### Dependencies based on watched namespaces

Since there will be no first-class support for configuration of watched namespaces, OLMv1 cannot resolve dependencies among bundles based on where controllers are watching. |
If we have the following scenario:
- Customer goes to install Operator A at the cluster scope (CRD + controller with cluster-wide RBAC)
- Operator A has a dependency on Operator B (A will be creating CRs from B and expects B to reconcile them)
I'm not sure why the absence of a watched namespaces concept prevents this dependency fulfillment.
OLMv1, given its limited RBAC, can likely only see that CRD B is not present. If it was present, OLMv1 couldn't see if controller B is available and reconciling.
OLMv1 however, if its RBAC was limited to a single namespace, could create a CR for B and see if controller B picks it up and sets some status field. This would give OLMv1 the information it needs about whether controller B is running and reconciling at the cluster scope (which as presented is the recommended -- possibly only? -- install mode).
> Customer goes to install Operator A at the cluster scope (CRD + controller with cluster-wide RBAC)
OLMv1 will be completely unaware of the scoping configuration of Operator A. It doesn't know if Operator A is watching the entire cluster or is watching just a subset.
The fact that an opaque configuration applied to the bundle results in cluster-wide RBAC for the service account tied to a deployment is maybe a good enough proxy for watch namespace. But you are correct that a dependency resolver would also need awareness of the RBAC in use by any dependents, which is not available to OLM and may not be available to a user.
> OLMv1 however, if its RBAC was limited to a single namespace, could create a CR for B and see if controller B picks it up and sets some status field. This would give OLMv1 the information it needs about whether controller B is running and reconciling at the cluster scope (which as presented is the recommended -- possibly only? -- install mode).
This seems fragile and complex and lots could go wrong. Off the top of my head:
- OLM would have to evaluate the schema of each CRD and be able to produce a valid object
- This assumes that all objects have a status
- Operators may have admission webhooks that reject creates unless arbitrary conditions are met.
- Creating a CR may have major implications on the operations of a cluster, and would likely incur costs.
- The fact that a CR for B can be created in a particular namespace provides no signal about whether a controller would reconcile B in another namespace.
The most likely dependency resolver implementation given the constraints is probably:
- Client-based
- Limited to cluster admins who can see RBAC and Extensions cluster-wide.
- Required to use RBAC as a proxy for watch namespaces, which may result in assumptions that a controller is watching a namespace, even if it is not (i.e. it has RBAC, but for whatever reason isn't actually watching there)
1. How would a dependency resolver know which extensions were installed (let alone which extensions were watching which namespaces)? If a user is running the resolver, they would be blind to an installed extension that is watching their namespace if they don't have permission to list extensions in the installation namespace. If a controller is running the resolver, then it might leak information to a user about installed extensions that the user is not otherwise entitled to know.
2. Even if (1) could be overcome, the lack of awareness of watched namespaces means that the resolver would have to make assumptions. If only one controller is installed, is it watching the right set of namespaces to meet the constraint? If multiple controllers are installed, are any of them watching the right set of namespaces? Without knowing the watched namespaces of the parent and child controllers, a correct dependency resolver implementation is not possible to implement.
Note that regardless of the ability of OLMv1 to perform dependency resolution (now or in the future), OLMv1 will not automatically install a missing dependency when a user requests an operator. The primary reasoning is that OLMv1 will err on the side of predictability and cluster-administrator awareness. |
Given it was properly conveyed via the UI (or logging in a CLI), I think you can achieve administrator awareness while still fulfilling dependencies.
Predictability is definitely the harder thing to achieve though. All sorts of inputs go into which dependency is chosen. In other package managers, there is almost always an imperative flow where an admin has a chance to review the chosen dependencies before they are installed.
If it was possible to overcome the difficulty of building a dependency resolver without awareness of controller scope (that's a big if, and not something we're pursuing), then a client-based resolver that presents the user with the chosen set of Extensions to install and lets them decide how to proceed would be the best of both worlds.
With OLMv1's focus on GitOps friendliness and security posture, we have decided not to pursue a controller-based dependency resolver/installer.
And again, this is all fairly moot because we are not pursuing a dependency resolver. However, part of the beauty of this design is that it lends itself more to extensibility. A third party should be able to implement a dependency resolver over the APIs provided by core OLMv1:
- Catalog contents are available to clients
- Catalog metadata is extensible, so third parties could include their own dependency metadata (e.g. about "requires", "provides", "conflicts", etc.); a sketch follows below.
- Extension API is available to clients
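As a sketch of that extensibility (the property type below is invented; only the standard `olm.bundle` fields are real file-based catalog fields), a third party could attach its own dependency metadata to a bundle entry and resolve against it with client-side tooling:

```yaml
# File-based catalog bundle entry carrying a hypothetical third-party
# dependency property; OLM itself would not interpret the custom property type.
schema: olm.bundle
name: operator-a.v1.2.0
package: operator-a
image: quay.io/example/operator-a-bundle:v1.2.0
properties:
  - type: example.com/requires        # made-up property type for illustration
    value:
      package: operator-b
      versionRange: ">=2.0.0 <3.0.0"
```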
OLMv1 will not provide dependency resolution among packages in the catalog (see [Dependencies based on watched namespaces](#dependencies-based-on-watched-namespaces)).

OLMv1 will provide constraint checking based on available cluster state. Constraint checking will be limited to checking whether the existing constraints are met. If so, install proceeds. If not, unmet constraints will be reported and the install/upgrade waits until constraints are met. |
In OLMv0 this seems to exist via `nativeAPIs` defined in CSVs. The way this is built today, if you try to install an operator via the UI and the `nativeAPI` requirements are not fulfilled, the failure is not made apparent to the user with the missing dependencies. The installation appears to just be stuck pending. I'd suggest we make this more visible and easily debuggable in OLMv1.
Yes, there are a few rough edges of OLMv0 like this. Another is `minKubeVersion` in the CSV. The goal is to get all of the constraint-related information into the catalog where it can be evaluated before pulling, extracting, and applying bundle contents.
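For context, both of those OLMv0 fields live in the CSV, i.e. inside the bundle itself, which is why they can only be evaluated after the bundle has been pulled. An illustrative excerpt:

```yaml
# Illustrative OLMv0 ClusterServiceVersion excerpt: these constraint-like
# fields ship inside the bundle, so the catalog alone cannot evaluate them.
apiVersion: operators.coreos.com/v1alpha1
kind: ClusterServiceVersion
metadata:
  name: operator-a.v1.2.0
spec:
  minKubeVersion: 1.27.0
  nativeAPIs:
    - group: ""
      version: v1
      kind: ConfigMap
```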
docs/olmv1_overview.md
Outdated
TL;DR: OLMv1 cannot feasibly support multi-tenancy or any feature that assumes multi-tenancy. All multi-tenancy features end up falling over because of the global API system of Kubernetes. While this short conclusion may be unsatisfying, the reasons are complex and intertwined.

Nearly every engineer in the Operator Framework group contributed to design explorations and prototypes over an entire year. For each of these design explorations, there are complex webs of features and assumptions that are necessary to understand the context that ultimately led to a conclusion of infeasibility that led us to today’s conclusion. |
Suggested change:
Nearly every engineer in the Operator Framework group contributed to design explorations and prototypes over an entire year. For each of these design explorations, there are complex webs of features and assumptions that are necessary to understand the context that ultimately led to a conclusion of infeasibility that led us to today’s conclusion.
Nearly every engineer in the Operator Framework group contributed to design explorations and prototypes over an entire year. For each of these design explorations, there are complex webs of features and assumptions that are necessary to understand the context that ultimately led to a conclusion of infeasibility.
This whole paragraph seems odd. It's describing the process/history but not necessarily the present state of OLMv1. (I.e. the subject of these paragraphs is not OLMv1, but people and tasks.) I think we may want to reconsider this paragraph, or put it into a historical section, even if it means just adding a `### Historical Context` above it.
README.md
Outdated
OLM v1 is the follow-up to OLM v0, located [here](https://github.com/operator-framework/operator-lifecycle-manager).

It consists of four different components, including this one, which are as follows: |
The last "it" referenced was "OLM v0". You may want to distinguish this as OLM v1 (even though it's repetitive, it's more precise).
OLM v1 is the follow-up to OLM v0, located [here](https://github.com/operator-framework/operator-lifecycle-manager).

It consists of four different components, including this one, which are as follows:
* operator-controller
* [deppy](https://github.com/operator-framework/deppy) |
Bye-bye deppy?
It is still in use by the ClusterExtension controller. As is rukpak. Let's leave those in until we actually stop using them?
README.md
Outdated
* operator-controller
* [deppy](https://github.com/operator-framework/deppy)
* [rukpak](https://github.com/operator-framework/rukpak)
* [catalogd](https://github.com/operator-framework/catalogd)

For a more complete overview of OLM v1 and how it will differ from OLM v0, see our [overview](./docs/olmv1_overview.md). |
Future tense ("will differ") vs present tense ("differs")?
docs/olmv1_overview.md
Outdated
## What will OLM doo that a generic package manager doesn't?

OLM will provide multiple features that are absent in generic package managers. Some items listed below are already implemented, while others are most likely planned for the future. |
Remove "are most likely". Prefer simply "are", or "may be".
To sum up my thoughts on reviewing this:
OLM v1 feels like a wasted opportunity in the form presented and I really hoped to see more focus around the previously discussed separation of API (cluster-scoped) from controller (not necessarily cluster-scoped) and the impact on lifecycle that such a split would necessitate.
I thought/hoped that OLM v1 would be trying to provide a solution for that challenging problem, rather than what appears to me to be a view of "that's someone else's problem to solve, we are going to design OLM to not get in the way of whoever tries to solve that"
In a model where the CRD and the controller(s) that implement that CRD are distinct entities I don't think something that wants to be a lifecycle manager can just "opt out" of such a key aspect that drives lifecycle events as the relationship between CRD and controller(s).
I think that the discussion around tenancy in the cluster has got in the way a little here perhaps, because this is not about multi-tenancy really, and - unlike multi-tenancy - this separation of CRD from controller is something that is a natural fit for Kubernetes, certainly a more natural fit than having them bound together as it is today IMO.
OLM v1 should be the glue that binds and manages the CRD and its controllers; that's what I'd expect of a lifecycle manager for cluster extensions. Put the discussion about tenancy aside, this isn't about being able to isolate one namespace from changes in another (which as has been said many times really isn't possible due to the nature of Kubernetes), but as a cluster administrator I should be able to install an API in my cluster, and then control which namespaces can use that API.
- Manage a Kubernetes API extension (CRD)
- Manage a single cluster-scoped controller for an installed API extension
- Manage one or more namespace-scoped controllers for an installed API extension
I'd expect OLM to be the thing that can manage the extension as a whole including:
- Ensuring either use of multiple controllers with namespace scopes or a single cluster-scoped controller, and preventing a mix of the two
- Managing the relationship between controller and CRD, and e.g. preventing controller updates happening which would create incompatibility with the CRD version, or at least flagging them as moving into a warning state if the controller and the CRD are not compatible
- Moving from a model where the CRD and controller are delivered in one package to offering distinct packages for CRDs and controllers, as well as a convenience bundle for both perhaps
The customers with large clusters that I have worked with in the last 2 years do not want to be using cluster-scoped controllers, and they are not seeking to address tenancy concerns through the use of multiple namespace-scoped controllers; they see namespace-scoped controllers as a way to enable the extensions on a namespace-by-namespace basis. Yes, the extension is installed to the cluster, but it's only been enabled for use in namespace1, 2, & 3.
They are seeking to limit the exposure/impact of introducing a new API and updating controllers in their large clusters rather than the ability to independently operate in namespace1 and namespace2 through some form of isolation/tenancy.
This is what I think the next evolution of OLM should be: a transition to a first-class data model and lifecycle built around the separation of CRD from controller.
Example: Strimzi and Red Hat AMQ Streams
Today use of these two major Kafka operators is problematic: if you install both in a cluster, things go bad for you because both respond to the same API (`kafka.kafka.strimzi.io`).
I would expect/hope for OLM v1 to address this with first class support for something like this:
- Install the `kafka.strimzi.io` API extension to say "we support Kafka in this cluster" (no controller(s) included in this action)
- Install the Strimzi controller in `namespace1` (no change to the API included in this action)
- Install the AMQ Streams controller in `namespace2` (no change to the API included in this action, it can not break anything in `namespace1`)
- Update the `kafka.strimzi.io` API extension ... at this point we are performing a cluster-scoped operation that may impact all namespaces with a controller (`namespace1` and `namespace2`)
- Update the AMQ Streams controller in `namespace2` (no need to worry about impacting `namespace1`)
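Sketched as manifests, that flow could look something like the following; both kinds shown here are purely hypothetical, invented to illustrate the proposal, and are not existing or planned OLM APIs:

```yaml
# Hypothetical kinds, used only to illustrate the proposed CRD/controller split.
apiVersion: example.olm.io/v1alpha1
kind: APIExtension              # cluster-scoped: delivers only the CRDs
metadata:
  name: kafka.strimzi.io
spec:
  package: kafka-crds
  version: 1.x
---
apiVersion: example.olm.io/v1alpha1
kind: ControllerInstallation    # namespaced: delivers a controller for an already-installed API
metadata:
  name: strimzi
  namespace: namespace1
spec:
  package: strimzi-kafka-operator
  implements: kafka.strimzi.io  # would have to stay compatible with the installed API version
```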
To me, this is what I think about when I hear people talking about tenancy, because this is what the customers I work with are seeking, and it's a perfect fit for Kubernetes if we just broke apart the delivery mechanism for CRDs and controllers.
I would liken this change in approach to the difference it made when file-based catalogs came around and removed the channel graph information from the operator bundles. It never made sense that an individual operator bundle had to know "what channel will I be in", or "what's the default channel of the package I belong to" and once we were able to define that relationship in the correct place (in the catalog itself) it was a game-changer for managing operator packages. I feel that the same thing needs to happen for CRD/Controllers as part of the evolution of OLM.
However, Kubernetes does not assume that a controller will be successful when it reconciles an object.

The Kubernetes design assumptions are:
- CRDs and their controllers are trusted cluster extensions. |
The example which is a counter-argument is the Ingress API, which has a reference to the ingress class name, which may or may not have a running controller on the cluster.
I am not arguing APIs are global.
I am arguing that APIs could be separated from the controllers, with different lifecycles.
### "Watch namespace"-aware operator discoverability

When operators add APIs to a cluster, these APIs are globally visible. As stated before, there is an assumption in this design that a controller will reconcile an object of that API anywhere it exists in the cluster. |
Such an assumption is dangerous. Controllers might have privileges to read and modify Secrets, to properly manage the operands. Cluster admins will not allow global access to all Secrets on the cluster; they will allow such access only to selected namespaces.
Example: let's assume we are installing a Kafka operator at the cluster scope, which will have Secrets get/update permission. Cluster admins will not let the Kafka operator access namespaces running other workloads where confidential/sensitive data is stored as Secrets.
Therefore, controllers must be provided a way to restrict their access. It could be done such that RBAC is granted only in selected namespaces and it is up to the controller to somehow know which namespaces to watch. But then, it begs for some API to understand what the scope of a controller is, or at least some thought being given to best practices for how controller developers should handle scope discovery.
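One way such a restriction is commonly expressed today with plain Kubernetes RBAC (names below are illustrative; this is not an OLM feature) is to define the Secrets permissions once in a ClusterRole and bind it only in the namespaces the operator is allowed to manage:

```yaml
# Illustrative: the Kafka operator's ServiceAccount can touch Secrets only in
# namespaces where this RoleBinding exists, because the ClusterRole is bound
# with a namespaced RoleBinding instead of a ClusterRoleBinding.
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: kafka-operator-secrets
  namespace: team-a              # repeat this binding for each allowed namespace
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: kafka-operator-secrets   # a ClusterRole granting get/list/watch/update on secrets
subjects:
  - kind: ServiceAccount
    name: kafka-operator
    namespace: operators
```

How the controller then discovers which namespaces those are (configuration, an env var, label selection) is exactly the best-practice question this comment raises.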
RH maintainers had a series of meetings last week and determined some updates/clarifications are needed here, but it would be much easier to discern the updates in a separate PR, so let's merge this and do a follow-up to capture updates.
Reviewers: please interpret this as an attempt to ensure that your comment is interpreted in the updated context