✨ Multiple Language Support + Application Discovery in Hub #173

jortel · 2024-04-25T16:56:30Z

Ref: #155

Signed-off-by: Jeff Ortel <[email protected]>

shawn-hurley · 2024-05-02T13:23:03Z

Can we copy the details over here so that we can comment on specific details

Signed-off-by: Jeff Ortel <[email protected]>

jortel · 2024-05-02T14:14:44Z

Can we copy the details over here so that we can comment on specific details

done.

shawn-hurley

Some concerns and nits.

I really don't think that we want to be in the business of scheduling pods beyond priority class; we should allow the platform to be responsible for the things that it is supposed to be responsible for, IMO.

shawn-hurley · 2024-05-02T15:00:43Z

enhancements/multiple-provider/README.md

+selector: ! tag:Language=
+
+Java and Linux
+selector:  tag:Language=Java && tag:OperatingSystem=Linux


pretty concerned with a new format for selector, the concept of tag: does not exist, and does not make it clear what I can do with this. I am worried that we keep running into each other with concepts, and considering that the hub is another way of running the CLI at scale, I believe that we need to make the UX as cohesive as possible between the two.

I would prefer that either we make the selectors specific (for tag, whatever else you are thinking of selcting on) or we just start with selector and assume that it will only ever work off of tags

I think the systemic use of a generic concept like selector is a good thing and makes the UX more cohesive. The analysis rules are clearly selected using a label selectors. Extensions, they are clearly extension selectors (by nature of scope). The same is true for addons.
I don't see how introducing a completely new term/convention for expressing how addons and extensions are selected makes for a better UX. IMHO, it seems more confusing.
The tag: and platform: prefix provides for extensibility without changing the API and seems like a minor conceptual addition.

Also, there is not reason to expect that extensions will only be selected by tags and addons will only be selected by tags. Given that both will need to be selected for source analysis, platform analysis and TBD.
For source analysis, the addon may be selected by tag.
For platform analysis, the addon will like be selected by platform (kind).
It's reasonable to anticipate the need the same flexibility for different types of discovery tasks.

The selector is a powerful, compact and extensible mechanism for supporting this and future needs.

The tag: and platform: prefix provides for extensibility without changing the API and seems like a minor conceptual addition.

I worry that this prefix will be expected in other selectors and will not exist; that is the confusion that I am worried about. This may not make sense as a worry, but I wanted to bring it up.

I agree that the <prefix>: to selectors is a suitable mechanism for extending selectors as we move forward for "things" as you said.

There are things that I would want to make sure that we have if we do go forward (and I think that the worry above is worth ignoring).

we should have good validation, for if the is valid. As you said, in the others it is assumed what it is and they are valid, but when we open it up, people can make mistakes that would be hard to track down (why isn't my thing being selected, OHH i am use language:Java instead of tag:Language=java as an example)

we should have a discovery mechanism to know what is valid, and because these selectors are fields in k8s resources, and we want them generic, there is no good way (see csi drivers IMO). Maybe this is a UX that we have to live with, and just make sure that we have good documentation.

WDYT? does that make sense?

Definitely. The implementation reports errors for an unknown selector prefix. And, if the tag: selector references an unknown tag category.

Not sure I understand this exactly. I'd anticipated the extension and addon controllers would validate the k8s references in the CRs. Is that what you meant?

Awesome re: validation, can you add a bit about that in the enhancement and how that works? if it is there, and I missed it I am sorry.

I was kind of hoping that we started to think of a "documentation" screen. Maybe this is outside the scope of this enhancement. One of the things that we are trying to do in Kantra is exposed via open API what providers give you for rules, and I was hoping that for this, we could do something similar.

I think that the outcome here is that we need to track two things:

A validation feature in the UI for when selecting the implicit selectors, that you can not use <prefix>:

A validation feature in kantra to do the same.

@ibolton336 @sjd78 @pranavgaikwad let me know if you folks see issues with that.

enhancements/multiple-provider/README.md

shawn-hurley · 2024-05-02T15:04:24Z

enhancements/multiple-provider/README.md

+- `selector` - defines selector for inclusion in the addon task pod.
+- `metadata` - defines unstructured information.
+
+Changed _Addon_ CR (Spec):


I can not understand the changes the spec.

Also if we are changing a k8s API, then we need to make sure that we understand what to do on upgrade with all the fields.

The incompatible changes are is part of a more holistic approach and better symmetry with extensions. The additions are needed for dynamic selection of addons we'll almost certainly needed for platform discovery (future). As for upgrade. We can include a conversion web hook and version our API but:

web hooks are a pain.

the API is at alpha.1

the only addon CRs are ours which are installed/updated by our operator.

That said, if folks feel strongly, we can include a conversion web hook. But I really don't think it's worth it.

After relooking I missed the - and + that you add that was not the -/+ from the diff, that is my fault I am sorry :)

If we are just adding fields and not changing fields or removing them, then we just need to handle the sane defaults (for the things that exist).

I agree webhook should be a last resort and is not necessary unless we are changing a field.

Field removal, on the other hand, is something that is not something we should do in one release. Instead, we should deprecate and deal with the consequences of them being left and removed in a future release. I know this is a pain, but is probably the right thing to do for our users. I think we need to come up with a plan here.

I would suggest that we should follow a similar process for our APIs that the hub API is providing as well.

I share the same philosophy here. However, I am all but certain there are no addon CRs in our namespace besides the (1) we install and manage. Mainly because there is no way of running other from the UI. Also, I would argue that an alpha1 version offer no guarantee against this kind of change. It's only as a matter of practicality that I propose we don't do as your suggesting.
Leaving the resources and image fields in the spec introduces the opportunity for anomolies. What if the image and resources are defined in both the (old) fields and the new container field? Seems like we would be creating a problem complexity unnecessarily.

As discussed w/ Jeff offline:

Plan A:
we remove the fields, but turn on strict enforcing. This will enable the users who were using the API that the fields that were removed are no longer valid.

Plan B:
If there are issues with upgrade/downgrade and removing the fields, we will leave them in place but add validation that makes it so they can't be set, with a deprecation warning, and will hopefully be able to be removed in a future release.

If anyone has thoughts let us know :)

shawn-hurley · 2024-05-02T15:17:32Z

enhancements/multiple-provider/README.md

+
+Pods are created with the appropriate _PriorityClass_ to ensure node scheduling matches. Task dependencies will be escalated as needed to prevent priority inversion. When (dep) tasks are already scheduled (Pending), the pod will be recreated referencing the updated priority class.
+
+When the nodes are saturated and task pods are phase=Pending, the task manager may reschedule (delete) task  with lower priority in an effort for higher priority pods to be run by the node scheduler.  PriorityClass preemption should handle this.


I am worried about anything that starts to try and reschedule things IIUC. If we are using PriorityClass already then leaving the pods in phase pending should be sufficient unless I am misunderstanding something.

Trust me, I pushed back on this as well.

Actually references to PriorityClass needs to be removed ( I thought I did).

The sections here about priority escalation and preemption are a necessary reaction to issues introduced by the automated discovery which is required by the "multi-provider feature". Mainly that analysis tasks cannot be delayed by discovery tasks.

But a few words about PriorityClasses. I looked pretty hard at them with high hopes.
A few problems:

they are a cluster resource that affect all pods in all namepaces. I highly doubt any customer will let our operator install them (and should not). We would need to convince customers to add the classes we need and define them on the tackle CR. Customers would need to define a default (if one does not exist) to prevent our task pods from running at hiher priority than controllers. According to @rromannissen , requiring users to do this a pretty much a deal breaker.

priority classes seem to only define a preference. I cannot find a claim in the documentation that the node scheduler will postpone scheduling lower priority pods (for which there is a node with available resources) to clear capacity for pending higher priority pods requiring more resources than available. This produces a priority inversion. In practice (what I observed) was that in a resource bound cluster, when many small pods are pending, larger, higher priority pods will are significantly delayed because it's easier to find small holes than large ones. Unfortunately, our analysis pods require a very large amount of memory which is likely to produce resource-based scheduling issues except on extremely large clusters.

Note: The tasking system already supports scheduling tasks (pods) in priority order. This is essential for installations where a ResourceQuota is placed in the konveyor namespace. According to @rromannissen, the quota will be common.

IMHO, this is the only complicated topic addressed by proposal. I'm continuing to run test to profile scheduling behaviors. I will add a use case to the Use Case section as told to me by @rromannissen .

That is good information about priority classes; thanks for explaining that!

I want to push back on the need. IMO, the benefit is a premature optimization for a theoretical problem. From using Kantra to do language discovery, I am seeing times in milliseconds, and if we end up delaying by that time frame, it is hard for me to see the benefit as analysis is in the 10's of minutes, as far as I know.

To be clear, I would agree that this is a problem that is needed if the delay is in the minutes/10's of minutes.

Can we do this without the this, and come back if/when this does become an issue. Maybe we do something simple like an option to turn off automatic discovery for customers that do hit this issue as an escape hatch?

The priority escalation features is simpler and only involves redeploying pods that are still pending. This seems straight forward and simply ensures that the pods submitted to the node scheduler match task priority. This is critical when a quota exists in the namespace.

To us, the preemption feature is the controversial one and what we pushed back on for these reasons:

As you stated, It seems like a premature optimization.

There is no way to determine which pods should be preempted. The pods blocked the analysis pod are just as likely to be belonging to other operators as our own. Even within our own, it impossible to tell. In all cases it's a best-effort attempt using a sledge hammer.

For these reasons, I made the preemption feature optional. Enabled with a setting. I agree we should consider disabling by default. Users can enable it should they experience problem described in the use case, they can enabled it. However, we'll need to convince @rromannissen.

I am not opposed to making automated discovery enabled/disabled via setting as well. Again, we'll need to get with @rromannissen as to the default.

I think the confusion here is what would be introducing the potential delay for the user requested analysis. My understanding is that language discovery does indeed take milliseconds to execute, but automated discovery would also include the technology stack/insights discovery, which implies the execution of an analysis via LSP and can effectively take minutes to complete. The end goal for automated discovery is to surface language and technology stack information at scale throughout the entire portfolio, so applications get automatically associated to their corresponding archetypes ASAP. But that shouldn't imply that the user is unable to do anything until automated tasks are finished, as that would lead to the system being perceived as unresponsive. The problem and the request are stated in #170

Signed-off-by: Jeff Ortel <[email protected]>

djzager

This is very detailed. One concern that was brought up when we were talking about the .NET support effort was how to properly get the container running on the windows node. Assuming that we have analyzer-addon and dotnet-external-provider images that can be run on a windows node, we need a way to:

let the hub know that this extension/provider can be run on a specific os/platform
let the hub know that an addon can be run on a specific os/platform
mechanism for determining under which conditions the addon/provider will be run on a windows node.

djzager · 2024-05-07T19:17:12Z

enhancements/multiple-provider/README.md

+- Targets (rules) may be filtered by language.
+- Targets (rules) may be pre-filtered by the UI when language is known.
+- Language defaults to Java (until automated discovery feature added).
+- User initiated task execution (Example: analysis) must be prioritized.


I do think we need to talk in this enhancement about how we are going to deal with providers that support being run on different platforms (ie. linux/amd64 and windows/amd64).

Since the addon and extensions run on the same pod, the addon would need to selected by OS. The addon selector could be used for this. For example: tag:OperatingSystem=Windows would select the Windows version of the analyzer addon.

djzager · 2024-05-07T19:19:34Z

enhancements/multiple-provider/README.md

+Fields:
+- `addon` - declares compatibility with addons.
+- `container` - defines the extension container.
+- `selector` - defines selector for inclusion in the addon task pod.
+- `metadata` - defines unstructured information.
+


Will need a way to expose the platform(s) an extension can run on.

djzager · 2024-05-07T19:20:13Z

enhancements/multiple-provider/README.md

+Fields:
+- `kind` - declares compatibility with a _kind_ of task.
+- `container` - defines the addon container.
+- `selector` - defines addon selector for task (kind).
+- `metadata` - defines unstructured information.
+- `schema` - _FUTURE_ defines the Task.Data schema.
+
+For symmetry and completeness, the `metadata` field is added and the separate addon fields for the container are replaced with the entire k8s _Container_.   This is mainly a general improvement while we're changing the CR.
+


Will need a mechanism to declare the platform/os an addon image can be run on.

shawn-hurley

@pranavgaikwad @eemcmullan @JustinXHale

After talking with Jeff, there are some changes on the preemption that we are adding; after this, I am a +1 on this enhancement!

Signed-off-by: Jeff Ortel <[email protected]>

rromannissen

@JustinXHale we need to add a new dialog/step in the analysis wizard to ask users if they want to prioritize their analysis over any background tasks that might be running and warn about the potential consequences. No big deal, we can figure it out offline and attach it to the enhancement afterwards. Aside from that, for me it's a +1 for the enhancement!

jortel added 3 commits April 25, 2024 09:53

New enhancement for multiple languages.

2b42133

Signed-off-by: Jeff Ortel <[email protected]>

checkpoint

f988971

Signed-off-by: Jeff Ortel <[email protected]>

checkpoint

fff64c5

Signed-off-by: Jeff Ortel <[email protected]>

dymurray added this to the v0.5.0 milestone May 2, 2024

Details copied from: hub/issues/599

5779b7f

Signed-off-by: Jeff Ortel <[email protected]>

shawn-hurley reviewed May 2, 2024

View reviewed changes

jortel added 4 commits May 2, 2024 13:54

References to PriorityClass removed.

4f59021

Signed-off-by: Jeff Ortel <[email protected]>

Added use case.

92bf9ce

Signed-off-by: Jeff Ortel <[email protected]>

Minor correction.

77735cb

Signed-off-by: Jeff Ortel <[email protected]>

Describe preemption.

e3c9128

Signed-off-by: Jeff Ortel <[email protected]>

djzager reviewed May 7, 2024

View reviewed changes

shawn-hurley approved these changes May 8, 2024

View reviewed changes

jortel added 2 commits May 8, 2024 10:06

requested updates.

ce501fb

Signed-off-by: Jeff Ortel <[email protected]>

checkpoint

eb163c7

Signed-off-by: Jeff Ortel <[email protected]>

rromannissen approved these changes May 9, 2024

View reviewed changes

dymurray changed the title ~~✨ Support multiple languages.~~ ✨ Multiple Language Support + Application Discovery in Tackle May 9, 2024

dymurray changed the title ~~✨ Multiple Language Support + Application Discovery in Tackle~~ ✨ Multiple Language Support + Application Discovery in Hub May 9, 2024

mansam approved these changes May 16, 2024

View reviewed changes

jortel merged commit 6af286c into konveyor:master May 28, 2024
1 check passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

✨ Multiple Language Support + Application Discovery in Hub #173

✨ Multiple Language Support + Application Discovery in Hub #173

jortel commented Apr 25, 2024

shawn-hurley commented May 2, 2024

jortel commented May 2, 2024

shawn-hurley left a comment

shawn-hurley May 2, 2024

jortel May 2, 2024

jortel May 2, 2024 •

edited

Loading

shawn-hurley May 3, 2024

jortel May 3, 2024 •

edited

Loading

shawn-hurley May 3, 2024

shawn-hurley May 3, 2024

shawn-hurley May 2, 2024

jortel May 2, 2024

shawn-hurley May 3, 2024

jortel May 3, 2024

shawn-hurley May 3, 2024

shawn-hurley May 2, 2024

jortel May 2, 2024 •

edited

Loading

shawn-hurley May 3, 2024

jortel May 3, 2024

rromannissen May 8, 2024

djzager left a comment

djzager May 7, 2024

jortel May 7, 2024

djzager May 7, 2024

djzager May 7, 2024

shawn-hurley left a comment

rromannissen left a comment •

edited

Loading


		Pods are created with the appropriate _PriorityClass_ to ensure node scheduling matches. Task dependencies will be escalated as needed to prevent priority inversion. When (dep) tasks are already scheduled (Pending), the pod will be recreated referencing the updated priority class.

		When the nodes are saturated and task pods are phase=Pending, the task manager may reschedule (delete) task with lower priority in an effort for higher priority pods to be run by the node scheduler. PriorityClass preemption should handle this.

✨ Multiple Language Support + Application Discovery in Hub #173

✨ Multiple Language Support + Application Discovery in Hub #173

Conversation

jortel commented Apr 25, 2024

shawn-hurley commented May 2, 2024

jortel commented May 2, 2024

shawn-hurley left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

jortel May 2, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

jortel May 3, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

jortel May 2, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

djzager left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

shawn-hurley left a comment

Choose a reason for hiding this comment

rromannissen left a comment • edited Loading

Choose a reason for hiding this comment

jortel May 2, 2024 •

edited

Loading

jortel May 3, 2024 •

edited

Loading

jortel May 2, 2024 •

edited

Loading

rromannissen left a comment •

edited

Loading