[epic] v1.0.0 Performance and Scale #920

Open
joelanford opened this issue Jun 11, 2024 · 9 comments
Assignees
Labels
epic · triage/needs-information (Indicates an issue needs more information in order to work on it.) · v1.x (Issues related to OLMv1 features that come after 1.0)

Comments

@joelanford
Member

Epic Goal

  • Measure OLM v1.0.0 performance and scalability, and implement any improvements necessary to meet or exceed OCP guidelines in those areas.

Why is this important?

  • OLM v1.0.0 will be a payload component that always runs in OCP clusters. In order to reduce SD and customer costs, we need to minimize this overhead.
  • OLM v1.0.0 is intended to be used on a wide variety of clusters, ranging from single-node clusters with just a few namespaces to clusters 2-3 orders of magnitude larger. We need to make sure that it runs just as well on a small cluster as it does on a large one.
  • In order to reduce user frustration, we need to provide a responsive user experience. Reconciliation needs to be fast and non-blocking to ensure users receive the experience they have come to expect from OCP. To the extent possible, long-running tasks (e.g. catalog fetching/caching and image pulling) should be performed asynchronously.

Scenarios

  1. Collect pprof profiles for CPU and memory when running standard user flows around installing, upgrading, and removing operators from public catalogs (e.g. operatorhub); a minimal profiling setup is sketched after this list.
  2. Find the most resource intensive code paths. Provide documentation and recommendations related to making improvements in those areas.
  3. Coordinate with OLM maintainers to make improvements in areas deemed to provide the most significant performance and scale gain.
  4. Implement automated performance and scale regression tests in the existing upstream CI test suite.
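
To make scenario 1 concrete: controller-runtime managers can expose the standard pprof endpoints directly, which is one low-friction way to collect CPU and heap profiles while exercising the user flows above. A minimal sketch, not the project's actual wiring; the bind address is illustrative and this assumes controller-runtime v0.15+:

```go
package main

import (
	"fmt"
	"os"

	ctrl "sigs.k8s.io/controller-runtime"
	"sigs.k8s.io/controller-runtime/pkg/manager"
)

func main() {
	// Expose the standard net/http/pprof handlers on a dedicated port.
	// CPU and heap profiles can then be pulled while running install/upgrade/remove
	// flows against a public catalog, e.g.:
	//   go tool pprof http://localhost:8082/debug/pprof/profile?seconds=30
	//   go tool pprof http://localhost:8082/debug/pprof/heap
	mgr, err := ctrl.NewManager(ctrl.GetConfigOrDie(), manager.Options{
		PprofBindAddress: ":8082", // hypothetical port; pprof is disabled when empty
	})
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}

	// ... register reconcilers here, then run the manager.
	if err := mgr.Start(ctrl.SetupSignalHandler()); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
}
```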

Examples of known areas for improvement include:

  • When reconciling a ClusterExtension to resolve a bundle from the criteria provided by a user, the reconciler should return a desired bundle within 100ms and allocate no more memory than the size of the catalog metadata for the named spec.packageName.
  • When the ClusterExtension reconciler does not have the contents of a resolved image bundle available, it does not block waiting for the image to be pulled and processed. Rather, it starts an asynchronous job, reports the pending image pull via the ClusterExtension status, and returns from reconcile (a minimal sketch of this pattern follows below).
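
A self-contained sketch of the non-blocking pattern described in the second bullet; all names here (puller, ensurePulling, the simulated pull) are illustrative stand-ins rather than the actual operator-controller implementation:

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// puller tracks in-flight pulls so a reconcile can start one and return immediately.
type puller struct {
	mu       sync.Mutex
	inFlight map[string]bool
	done     map[string]bool
}

// ensurePulling starts an asynchronous pull for ref if one is not already running.
func (p *puller) ensurePulling(ref string) {
	p.mu.Lock()
	defer p.mu.Unlock()
	if p.inFlight[ref] || p.done[ref] {
		return
	}
	p.inFlight[ref] = true
	go func() {
		time.Sleep(2 * time.Second) // stand-in for the actual image pull + unpack
		p.mu.Lock()
		defer p.mu.Unlock()
		delete(p.inFlight, ref)
		p.done[ref] = true
	}()
}

// reconcile returns immediately: it either applies the already-cached bundle or
// reports that a pull is pending, mirroring the non-blocking behaviour described above.
func (p *puller) reconcile(ref string) string {
	p.mu.Lock()
	ready := p.done[ref]
	p.mu.Unlock()
	if ready {
		return "bundle contents available; applying"
	}
	p.ensurePulling(ref)
	return "status: PullingBundleImage (reconcile returns without blocking)"
}

func main() {
	p := &puller{inFlight: map[string]bool{}, done: map[string]bool{}}
	fmt.Println(p.reconcile("quay.io/example/bundle:v1")) // first pass: pull started
	time.Sleep(3 * time.Second)
	fmt.Println(p.reconcile("quay.io/example/bundle:v1")) // later pass: contents ready
}
```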
@joelanford joelanford added the v1.0 (Issues related to the initial stable release of OLMv1) and epic labels Jun 11, 2024
@OchiengEd

/assign

@joelanford
Member Author

joelanford commented Jul 14, 2024

I think I've found one unexpected slowdown: the bundle handler that converts a registry+v1 bundle to plain and then to helm. It takes 5s on my machine in the "Force upgrade" e2e test.
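
One way to track this kind of regression over time is a Go benchmark around the conversion path; the entrypoint below is a hypothetical placeholder for the real converter, not its actual API:

```go
// bundle_convert_bench_test.go — hypothetical benchmark around the
// registry+v1 -> plain -> helm conversion path mentioned above.
package convert_test

import "testing"

// convertRegistryV1ToHelm is a placeholder standing in for the real conversion entrypoint.
func convertRegistryV1ToHelm(bundleDir string) error {
	// ... call into the actual converter here ...
	return nil
}

func BenchmarkRegistryV1ToHelm(b *testing.B) {
	b.ReportAllocs()
	for i := 0; i < b.N; i++ {
		if err := convertRegistryV1ToHelm("testdata/registry-v1-bundle"); err != nil {
			b.Fatal(err)
		}
	}
}
```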

@kevinrizza kevinrizza moved this to Implementing in OLM v1 Jul 16, 2024
@joelanford
Member Author

@OchiengEd just wanted to check to make sure you didn't find any critical (as in "must fix for 1.0.0") issues in your performance and scale research?

If not, can we move the remaining scope of this epic to v1.x?

@everettraven
Contributor

everettraven commented Aug 20, 2024

Item to include in performance and scale (although maybe more of a release blocker based on other discussions):

EDIT: After discussion in the community meeting, this issue is not in scope for this epic.

@everettraven everettraven added the v1.x (Issues related to OLMv1 features that come after 1.0) label and removed the v1.0 (Issues related to the initial stable release of OLMv1) label Aug 20, 2024
@OchiengEd

No critical issues were identified. This epic was slated to be moved to v1.x.

@OchiengEd OchiengEd removed their assignment Oct 24, 2024
@LalatenduMohanty LalatenduMohanty added the triage/needs-information (Indicates an issue needs more information in order to work on it.) label Oct 29, 2024
@LalatenduMohanty
Member

Let's re-assess this and identify the acceptance criteria for this epic.

@joelanford joelanford added the v1.1 label and removed the v1.x (Issues related to OLMv1 features that come after 1.0) label Nov 5, 2024
@LalatenduMohanty LalatenduMohanty moved this from Implementing to Accepted in OLM v1 Dec 3, 2024
@LalatenduMohanty LalatenduMohanty removed the status in OLM v1 Dec 3, 2024
@LalatenduMohanty LalatenduMohanty added the v1.x (Issues related to OLMv1 features that come after 1.0) label Dec 10, 2024
@LalatenduMohanty LalatenduMohanty self-assigned this Jan 21, 2025
@grokspawn
Contributor

We should remodel this issue to cover designing the assessment/reporting infrastructure, along with an MVP implementation. It should also include the scope of the new catalogd query web API discussed in #1607.
We can then create subsequent epics to give us measurable progress and continue to refine the implementation.

@LalatenduMohanty
Member

Next step is to call a meeting and agree on the things we want to achieve with this epic. cc @dtfranz

@grokspawn
Contributor

grokspawn commented Jan 28, 2025

From the committee meeting, this is analogous to the work we're doing w.r.t. feature gates, where we have

  1. a general framework which provides measurement/assessment/detection capabilities in a standardized way; and
  2. features which hook into the framework to inform their measurement/assessment/detection functionality, starting with catalogd web API benchmarking #1607 (a rough interface sketch follows below)
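
As a very rough illustration of item 1 (names and shapes are hypothetical, not an agreed design), the shared framework could expose a small interface that each feature, such as the catalogd web API, implements to report standardized measurements:

```go
// Hypothetical sketch of per-feature hooks into a shared performance framework.
package perf

import (
	"context"
	"time"
)

// Result is a single measurement reported back to the framework.
type Result struct {
	Name     string
	Duration time.Duration
	Passed   bool
}

// Assessment is implemented by each feature (e.g. the catalogd web API)
// that wants its performance measured in a standardized way.
type Assessment interface {
	Name() string
	Run(ctx context.Context) (Result, error)
}

// RunAll executes every registered assessment and collects the results.
func RunAll(ctx context.Context, assessments []Assessment) ([]Result, error) {
	results := make([]Result, 0, len(assessments))
	for _, a := range assessments {
		r, err := a.Run(ctx)
		if err != nil {
			return results, err
		}
		results = append(results, r)
	}
	return results, nil
}
```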

@LalatenduMohanty LalatenduMohanty moved this to Designing in OLM v1 Feb 4, 2025