[Draft] Assets generation and Platform Awareness enhancement #210

Draft · wants to merge 2 commits into base: `master`
**File:** `enhancements/assets-generation/README.md` (+321 lines)
---
title: assets-generation-and-platform-awareness
authors:
- "rromannissen"
reviewers:
- "@dymurray"
- "@jortel"
- "@eemcmullan"
- "@JonahSussman"
- "@jwmatthews"
approvers:
- "@dymurray"
- "@jortel"
- "@eemcmullan"
- "@JonahSussman"
- "@jwmatthews"
creation-date: 2024-11-20
last-updated: 2024-11-22
status: provisional
see-also:
-
replaces:
-
superseded-by:
-
---

# Assets generation and Platform Awareness


## Release Signoff Checklist

- [ ] Enhancement is `implementable`
- [ ] Design details are appropriately documented from clear requirements
- [ ] Test plan is defined
- [ ] User-facing documentation is created

## Open Questions

- Should there be a dynamic way of registering Platform Types, Discovery Providers and Generator Types? Should that be managed by CRs or could there be an additional mechanism? That would imply adding some dynamic behavior in the UI to render the different fields associated with each of them.

**Comment:**

Dynamic behavior in the UI is solved by products like OCP with frontend plugins for different operators, or by plugins in RHDH (Backstage). I think the question should be: "which mechanism would be a good match for the existing architecture?"

**Comment (Contributor):**

If we are just doing dynamic fields, I think it would make sense to just focus on that, as it is a much more constrained problem (read the OpenAPI spec for a "thing" to determine its type and render the right field for that type). Having a full frontend plugin system is hard, and if we don't need it we shouldn't focus on it, IMO.

In the future we may need it, but I think we should do that work when it becomes an acute problem users are feeling.

**Comment:**

I am OK with limiting the scope. Based on this open question it was not clear to me what we intend to provide. Still, "just" dynamic fields may grow beyond our initial design.

- How can we store sensitive data retrieved by the Discovery Providers?
- How could we handle the same file being rendered by two different _Generators_ (charts)? Is there a way to calculate the intersection of two different Helm charts?

**Comment:**

> How could we handle the same file being rendered by two different Generators (charts)?

One approach may be to use a different release name for each Generator. WDYT?

> Is there a way to calculate the intersection of two different Helm charts?

I'm not aware of a way to intersect; maybe the closest is to use dependency management.

**Reply (PR author):**

We wouldn't be using the Helm release concept, as I wouldn't expect the asset generator to have any direct contact with a k8s cluster (that would be something more for a CI/CD pipeline). We are mostly using Helm to render assets via the `helm template` command.

**Comment:**

Keep in mind that `helm template` might also reach out to the cluster unless the `--dry-run=client` flag is used. It depends on what has been coded in the charts: if the chart checks for the existence of a certain resource (a secret, for instance), then Helm will attempt to retrieve it unless the option is specified, but then the generated template cannot be guaranteed to be the correct one:
https://helm.sh/docs/helm/helm_template/

```
--dry-run string[="client"]  simulate an install. If --dry-run is set with no option being specified or as '--dry-run=client', it will not attempt cluster connections. Setting '--dry-run=server' allows attempting cluster connections.
```

**Comment (Contributor):**

Is the open question how you can layer the file changes on top of each other, or merge them together, so that the generators work together?

**Reply (PR author):**

If the OpenShift generator (chart) generates a Deployment.yaml and the EAP on OpenShift generator (chart) generates a different Deployment.yaml, how can we merge them? It just came to my mind that we could establish an explicit order of preference when assigning Generators to a Target Platform, so if some resources (files) overlap, the ones with the top preference override the others. That would mean no file merging, but the end result would be a composition (should we call this merge?) of the files rendered by all generators.

**Comment:**

I may be missing some context here, but my understanding is that we would have one or more generators (configured by the users) which may provide one or more ways to deploy the same app. In my opinion we should not merge anything; we should provide generated manifests (in different folders) per user request and let the user decide what to do about duplication.

**Comment (Contributor):**

I think this makes sense for the first pass. I believe this could get cumbersome, but waiting for actual user pain makes sense to me.


## Summary

Since its first release, the insights Konveyor could gather about a given application came either from the application's own source code (analysis) or from information provided by the different stakeholders involved in managing the application lifecycle (assessment). This enhancement proposes a third way of surfacing insights about an application: gathering both runtime and deployment configuration from the very platform the application is running on (discovery), and storing that configuration in a canonical model that can be leveraged by different Konveyor modules or addons.

Aside from that, the support Konveyor provided for the migration process stopped once the application source code had been modified for the target platform, leaving the application ready to be deployed but without the assets required to actually deploy it on the target platform. For example, for an application to be deployed on Kubernetes, it is not only necessary to adapt the application source code to run in containers; it is also necessary to have deployment manifests that define how the application can be deployed in a cluster, a Containerfile to build the image, and potentially some runtime configuration files. This enhancement proposes a way to automate the generation of those assets by leveraging the configuration and insights gathered by Konveyor.


## Motivation

This enhancement aims at enabling Konveyor to eventually tackle the following use cases:

- Fast-tracking the migration of containerized applications by automating the translation of deployment assets from one platform to another. For example, automating configuration gathering from an application deployed in Cloud Foundry and leveraging that information to generate custom-tailored deployment manifests for Kubernetes.
- Generation of deployment assets for applications that haven't been containerized.
- Translation of application configuration between application servers (for example Weblogic to EAP).


### Goals

- Platform Awareness:
- Enable Konveyor to retrieve information about applications directly from the platform in which they are running:
- Deployment configuration.
- Runtime configuration.
- Flexible enough to obtain information from multiple platform types:
- Container platforms.
- Application servers.
- Hypervisors and VMs.
- Others...
- Assets generation:
- Flexible enough to generate all assets required to deploy an application on k8s (and potentially other platforms in the future)

**Comment:**

The applications may be multilayered, with complex deployments running on different platforms, like a stateless web service (PCF) and a DB or cache (VM). Should we limit ourselves to only parts of the app or attempt to generate all the deployment assets? Depending on our choices we may or may not need to think about network layout and corresponding manifests.

**Reply (PR author):**

How deep we go would totally depend on the Discovery Provider logic and the Helm charts (and potentially other templating technologies in the future) associated with the generator for the target platform. The goal is to provide a framework to enable us and users to do this in a structured way.

**Comment (@istein1, Dec 10, 2024):**

There can be all sorts of assets, and it might be tricky to support every kind. In case discovery detects an asset Konveyor doesn't have in its arsenal, could we consider asking the user to provide that asset's source, so that Konveyor could generate it? Or maybe I'm getting this wrong and Konveyor is fine with any asset: it would propagate it into a Helm chart and then CI/CD would handle installing that asset?

**Reply (PR author):**

Discovery providers will discover configuration, have it stored in canonical form, and then the generators will generate assets for different target platforms. Considering we will be in control of the discovery providers and generators we ship out of the box, we should take special care to coordinate them to tackle meaningful migration paths such as Cloud Foundry to Kubernetes (meaning shipping a CF discovery provider and a default Kubernetes generator).

- Provide opinionated best practices out of the box.
- Allow organizations to create their own corporate assets easily:
- Use templating as much as possible.
- Build on industry standards.
- Avoid requiring new users to learn new programming languages or proprietary APIs.

### Non-Goals

- Define a transformation logic when migrating between platforms or runtimes. The way the pieces in this enhancement are meant to work is the following:
- Platform awareness is able to retrieve information about how an application is deployed in a certain platform, potentially including runtime configuration as well.
- That information is translated into a well known canonical configuration model.
- Assets generation allows users to use a standard templating engine to create assets (deployment manifests, configuration files, etc.) that suit their needs, leveraging the canonical configuration that gets exposed to the templates in a similar way to Ansible facts.
- **Configuration discovery is orchestrated by Konveyor, but the actual transformation logic is modeled in templates like Helm charts, and managed by the template authors, who are assumed to be knowledgeable in the transformations required to meet their needs. Konveyor exposes the discovered configuration in a well known format so template authors can leverage it to establish equivalences between the source and target platforms (a hypothetical sketch follows this list).**
- Defining the logic of any Discovery Provider, as each of them should have its own dedicated enhancement specifying its behavior in relation to the platform it tackles. The aim of this enhancement is to establish the overall framework and component infrastructure to enable discovery and assets generation.
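
To make the template-author model above concrete, here is a minimal, hypothetical sketch of a Helm template consuming discovered configuration. The `.Values.discovered.*` keys are assumptions for illustration only; the actual canonical dictionary schema is left to design and implementation.

```yaml
# Hypothetical Helm template rendering a ConfigMap from discovered runtime
# configuration. All .Values.discovered.* keys are illustrative, not part
# of this proposal.
apiVersion: v1
kind: ConfigMap
metadata:
  name: {{ .Values.discovered.application.name }}-config
data:
  {{- range .Values.discovered.runtime.env }}
  {{ .name }}: {{ .value | quote }}
  {{- end }}
```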

## Proposal

### Personas / Actors

**Comment:**

do we see a place for SRE or platform engineering in this effort?

**Reply (PR author):**

That is something I would consider once we expose this functionality via the Backstage plugin.

**Comment:**

I raised this question since the original intention was to connect to the source runtime as well as make sure it will run in the target runtime without issues. This clearly requires work from SREs to configure access, CI/CD, etc., although based on our discussions with customers we know it is not the highest priority atm.


#### Administrator

The administrator of the tool, who has access to application-wide configuration parameters that other users can consume but not change or browse.

#### Architect

A technical lead for the migration project who can create and modify applications and related information.


### User Stories

#### Platform Awareness

##### PA001

*As an Architect I want to be able to discover and retrieve configuration from applications deployed in a certain platform*

##### PA002

*As an Administrator I want to be able to manage different platform instances*

##### PA003

*As an Architect I want to be able to associate Source platforms from existing platform instances to applications*

##### PA004

*As an Architect I want to be able to retrieve configuration for existing applications that have an associated source platform instance, in bulk or on a per-application basis*

##### PA005

*As an Architect I want to be able to discover applications deployed in an existing platform instance and use that to populate the application inventory, including the configuration for each individual application*

#### Assets generation

##### AG001

*As an Architect I want to be able to generate assets (configuration files, deployment manifests or any file that might be relevant) to deploy an application in a given target platform.*

##### AG002

*As an Architect I want to be able to author templates to render the required assets using well known templating engines*

##### AG003

*As an Architect I want to be able to author templates using Helm Charts*

##### AG004

*As an Architect I want to be able to manage repositories that contain templates to be used to render assets*

##### AG005

*As an Architect I want to be able to assign templates to target platforms*

##### AG006

*As an Architect I want to be able to associate target platforms with archetypes*

##### AG007

*As an Architect I want to be able to override variables contained in the application configuration retrieved during discovery*

##### AG008

*As an Architect I want to be able to store the generated assets in a repository, which could be the application's own repository or a separate configuration repository*


### Design Details

#### Platform Awareness

##### Platform Instance

First class entity to model a platform instance in Konveyor.

- Managed in the administration perspective
- Potential fields:
- Name
- Platform Type (Kubernetes, Cloud Foundry, EAP, WebSphere…)
**Comment:**

Does this list contain all the supported platforms? Asking in terms of the design and infra needed to test this.

**Reply (PR author):**

Those are just examples. In a first iteration we should focus on Kubernetes and Cloud Foundry.

- URL
- Credentials (From the credentials vault in Konveyor)
- Extra fields depending on the type (TBD)
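
As an illustration only (field names below are hypothetical, not a committed API), a Platform Instance could look like this:

```yaml
# Hypothetical Platform Instance; fields mirror the list above.
name: prod-cf-01
platformType: CloudFoundry        # Kubernetes | Cloud Foundry | EAP | WebSphere…
url: https://api.cf.example.com
credentials: cf-discovery-creds   # reference to the Konveyor credentials vault
extra:                            # type-dependent fields (TBD); a CF org is one example
  organization: retail-apps
```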

##### Changes in the Application entity

There will be platform-related fields in the Application entity. These fields should be considered optional, as applications can still be managed manually or via the CSV import without the need for awareness of the source platform.

**Comment:**

Just something to note: we probably want to have these platform-specific fields be mutually exclusive to each other so the intermediate representation doesn't get in a weird state.

**Reply (PR author):**

@JonahSussman yeah, that's what the platform type field would be for


A _Source Platform_ section (similar to the Source Code and Binary sections) should be included in the _Application Profile_, including the following fields:
**Comment:**

@rromannissen is it possible to have multiple source platforms for a particular application? If we think of the current functionality around cloning the source from a git repo as a "platform" (git repository, there is an API, we populate the source code from this info, and the "analyze" phase is strictly the static code analysis), then we'd definitely need multiple. Maybe if there is an EAP app on k8s, then you would have two different source platforms, each responsible for their own details?

**Reply (PR author):**

Code retrieval remains part of the analysis phase, as repositories are not a platform in which the application is deployed, but rather a place where the application source code is stored. The analysis process should be able to surface configuration though, and I think that we should (and can) leverage analysis findings (probably coming from Insights) to populate the configuration dictionary, aside from the technology tags to automate archetype association as we do now. That should remain independent from the discovery process for different platforms described in this document.

For a "compound" scenario like EAP on K8s, I imagine having a dedicated discovery provider that can handle the specifics of that situation and be able to retrieve information for both the k8s objects and the EAP configuration. Bear in mind that using a vanilla EAP discovery provider would not work for an EAP on k8s scenario, as some (if not all) of the EAP management APIs are disabled in the image.

**Comment (Contributor):**

If I could add to that, we would probably have to start scanning container layers at that point to get the information out of them. This is not impossible, and there are many ways to do this, but it is not something that we have implemented.

We should also consider this, but I think it is outside the scope of this enhancement.

Thoughts?

**Comment:**

do we consider multilayered apps (many source repos, different components deployed in more than one platform) to be in scope of this work?

**Reply (PR author):**

As per the cardinality we currently have in Konveyor, each component of a distributed application would be treated as what we call an application in the inventory. All components of the same distributed application could be related via runtime dependencies and common tags.

**Comment (Contributor):**

Couldn't they be associated via a migration wave as well or is that the wrong tool for the job?

**Reply (PR author):**

Migration Waves are meant to break the migration effort into different sprints to enable an iterative approach, so probably not the best tool for that. We discussed in the past the possibility of having the concept of applications and components as first-class entities, but that would require further changes in the API and UI/UX that I think go beyond the scope of this enhancement.

**Comment:**

Based on our discussion yesterday, it seems they are mostly using stateless apps. With that said, it is OK to keep this out of scope for this work.


- _Platform Type_ (Kubernetes, Cloud Foundry, EAP, WebSphere…).
- _Platform Instance_ (From the list of available Platform Instance entities in the system).
- _Location_: A field section expressing the coordinates of the application inside the associated Platform Instance. Fields on this section will depend on the selected Platform Type, as they will be platform dependent. For example, depending on the platform they could be:
- K8s: Namespace, Service ID…
- EAP: Profile, Server Group…

A read-only _Configuration_ dictionary should also be browsable in the _Application Profile_. For more about _Configuration_, see the [Canonical Configuration model](#canonical-configuration-model) section.

_Target Platforms_ will be surfaced in the _Application Profile_ as read-only data (they can't be manually and individually associated with a single application) and inherited from the archetype.

**Comment (@jortel, Jan 14, 2025):**

What if the user isn't using archetypes?

**Reply (PR author):**

@jortel considering the complexity that assigning target platforms to individual applications would bring, I think we can assume that archetypes are a requirement for the moment, and consider other options if requested in the future.

##### Canonical Configuration model


- YAML dictionary containing platform and runtime configuration for a given application.
- Sections are populated by [discovery](#discovery) and analysis.
- Documented way of storing configuration (a hypothetical sketch appears at the end of this section):

**Comment:**

do we want to define or do we already have a schema definition for the structure of this dictionary?

**Reply (PR author):**

Defining the dictionary will be part of the design and implementation.

- Keys are documented and have a precise meaning.
- Similar to Ansible facts, but surfacing different concerns related to the application runtime and platform configuration.
- RBAC protected.
**Comment (Contributor):**

I would like to explore this a little more.

Is the whole configuration RBAC-protected, or just some fields? How is the RBAC managed from the hub?

**Reply (PR author):**

At the moment I think it should be something simple along the lines of "only Admins and/or Architects can see the config" considering how RBAC works now. Once we move authorization over to Konveyor itself (as we've discussed several times in the past), I think we'd have something more flexible that would allow users to have a more fine grained control over this.

- Injected in tasks by the hub.
**Comment (Contributor):**

Since the discovered platform configuration is stored in the hub and addons have access to the inventory, I think the application ID should be sufficient.

Note: the model we have been following is that, rather than anticipating and injecting everything an addon may need, addons fetch whatever they need.

**Reply (@rromannissen, Jan 20, 2025):**

@jortel if that's how it's been done so far, sounds good to me! It makes sense for each generator task to take responsibility for retrieving the data in canonical form from the API and then transforming it into the format that each templating engine requires (for example a values.yaml file for Helm).
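
Although the dictionary schema is explicitly left to design and implementation, a hypothetical sketch helps illustrate the idea (every key below is illustrative only):

```yaml
# Hypothetical canonical Configuration dictionary for one application.
application:
  name: inventory-service
platform:
  type: CloudFoundry
  instances: 2
  memory: 1024M
  routes:
    - inventory.apps.example.com
runtime:
  env:
    - name: SPRING_PROFILES_ACTIVE
      value: production
    - name: DB_PASSWORD
      value: REDACTED             # sensitive data redacted at initial discovery
```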


##### Discovery

Discovery should be considered the act of retrieving application information and configuration from a platform. There will be two differentiated scenarios:

- When applications already exist in the inventory and contain _Source Platform_ information (_Platform Type_, _Platform Instance_ and _Location_), users should be able to run discovery and populate the _Configuration_ dictionary. This discovery could be run on a per-application basis or in bulk if all selected applications contain _Source Platform_ information.
- On application import, targeting an existing platform with some criteria to look for applications and associated configuration. Criteria would depend on the source platform and its associated [Discovery Providers](#discovery-providers). Some examples depending on potential source platforms could be:
- Kubernetes:
- Namespace patterns (include/exclude)
- Match criteria (service, deployment, route…)
- EAP:
- Server group
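
As a sketch of what import-time criteria might look like for a Kubernetes Discovery Provider (all field names are hypothetical):

```yaml
# Hypothetical import-time discovery request; the exact shape would be
# defined by each Discovery Provider.
platformInstance: prod-k8s-01
criteria:
  namespaces:
    include: ["retail-*"]
    exclude: ["retail-sandbox"]
  match: [service, deployment, route]
```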

##### Discovery Providers

Abstraction layer responsible for collecting configuration for an application on a given platform:

**Comment:**

This assumes network connectivity to the platform/agent and admin-level permissions. Is this something we can expect? What should be the process for an agent to be deployed/installed?

**Reply (PR author):**

Yes, for the live connection approach we'll need some valid credentials and network access. Agents will have to be deployed by the infrastructure teams managing the platforms and exposed to the Hub somehow (TBD).

- Live connection via API or similar methods.
- Through the filesystem accessing the path in which the platform is installed (suitable for Application Servers and Servlet containers). This would likely be modeled as an agent deployed on the platform host itself.

Configuration discovery could happen in different stages during the lifecycle of an application to avoid storing sensitive data:
**Comment (Contributor):**

How is the sensitive data different from creds? Can we store it encrypted, like creds?

**Reply (PR author):**

@jortel when this was discussed, some folks argued that storing sensitive data such as database credentials coming from, for example, an application server in yet another place like Konveyor could be considered a security threat; that's why this passthrough approach was suggested. I guess having the credentials to the platform that stores that sensitive data (the application server in our example) carries exactly the same threat level, so I'm not sure about this one myself; maybe it's something we could consider in subsequent iterations if requested.

If we were to store this encrypted, I'd consider thinking about an overarching entity like secrets, and then credentials being a type of secret and sensitive data being another.

- *Initial discovery*:
  - The Configuration dictionary gets populated with non-sensitive data. Sensitive data gets redacted or defaults to dummy values.
- *Template instantiation*:
  - A second discovery retrieval happens to obtain the sensitive data and inject it into the instantiated templates (the actual generated assets) without storing the data in the Configuration dictionary (a sketch follows at the end of this section).

**Comment:**

is there a need to protect access to generated assets with sensitive data?

**Reply (PR author):**

Very likely, but that would be the responsibility of the user generating the assets, meaning they should take care of storing the assets in a secured repository.


Discovery providers are custom-tailored to the particularities of each platform and should be able to differentiate regular configuration from sensitive data.
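
To illustrate the two-stage flow with hypothetical values: only a redacted placeholder is ever persisted, while the real value is fetched again at template instantiation and injected directly into the rendered assets:

```yaml
# Stage 1: initial discovery, stored in the Configuration dictionary.
runtime:
  datasources:
    - name: ordersDB
      password: REDACTED   # the real value is never stored in Konveyor
# Stage 2: at template instantiation, the provider re-fetches the real value
# and injects it into the generated assets only, bypassing the dictionary.
```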

#### Assets Generation

##### Generators

First class entity in Konveyor to wrap templates. Fields include:
- _Name_
- _Icon_
- _Generator Type_: Will only include Helm for the moment, but in the future we could include other types like Ansible or other templating engines. Generator type will determine the image that gets used to handle the generator task.
- _Description_
- Repository containing the template files:
**Comment (Contributor):**

Repository type: (Git|SVN)?
What if users don't want to manage templates in a repository?

**Reply (PR author):**

@jortel I think that would bring too much unnecessary complexity for a first iteration. Let's stick to templates being managed in a repo and consider other options if requested. I'll add a field for repository type.

- _Repository type_ (Git/SVN)
- _URL_
- _Root Path_
- _Branch_
- _Credentials_
- _Variables_: List of prefixed variables that will be injected on template instantiation (the `helm template` command, for example). Variables whose names match keys coming from the Configuration dictionary will override their values.
**Comment (Contributor):**

if variables will come from the discovered configuration and be found in the templates, what is the point of defining them here? The template instantiation could simply resolve any variables found in the templates, right?

**Reply (PR author):**

@jortel it's a way to enable users to override certain values that might have been found in the source config and enforce configuration values that might have changed between environments (for example, a domain that is different in the target k8s cluster). It's also inspired by Variables from Job Templates in Ansible AWX.

- _Parameters_: List of parameters the user will be asked for when generating assets with this template. Similar to [Surveys](https://ansible.readthedocs.io/projects/awx/en/latest/userguide/job_templates.html#surveys) in Ansible AWX.
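
Putting the fields above together, a hypothetical Generator definition could look like this (names and values are illustrative only):

```yaml
# Hypothetical Generator; fields mirror the list above.
name: kubernetes-default
generatorType: Helm               # determines the image used for the generator task
description: Opinionated Kubernetes manifests for stateless web applications
repository:
  type: git                       # Git | SVN
  url: https://git.example.com/platform/k8s-generator.git
  rootPath: /charts/webapp
  branch: main
  credentials: generator-repo-creds
variables:                        # override matching keys from the Configuration dictionary
  ingress.domain: apps.k8s.example.com
parameters:                       # prompted at generation time, like AWX surveys
  - name: replicas
    type: integer
    default: 2
```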

##### Archetypes, Target Platforms and Generators
**Comment (Contributor):**

Again, what if users are not using archetypes?
Should we support users selecting a generator in the generation wizard?

**Reply (PR author):**

@jortel again, I think we can assume that archetypes are a requirement for the moment, and consider other options if requested in the future.


Multiple _Generators_ can be associated with an _Archetype_ through [_Target Platforms_](https://github.com/konveyor/enhancements/issues/186):
- Foster reusability.
- The generated assets for an archetype would be the product of composing the instantiation (the `helm template` command, for example) of all templates from all _Generators_ associated with that _Archetype_. For Helm, we could consider leveraging [helmfile](https://github.com/helmfile/helmfile) to achieve this (a sketch follows the figure below).

![Archetypes, Target Platforms and Generators](images/archetypes-targetplatforms-generators.png?raw=true "Archetypes, Target Platforms and Generators")
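
If helmfile were adopted for this composition, a minimal sketch could look as follows (chart paths, release names and the values file are hypothetical):

```yaml
# Hypothetical helmfile.yaml composing all Generators assigned to an Archetype.
releases:
  - name: openshift-base
    chart: ./generators/openshift
    values:
      - ./values.yaml             # produced by the hub from the Configuration dictionary
  - name: eap-on-openshift
    chart: ./generators/eap-on-openshift
    values:
      - ./values.yaml
```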

##### Templating engine

In a first iteration, leverage the [Helm templating engine](https://helm.sh/docs/chart_template_guide/functions_and_pipelines/), as it is the lingua franca for Kubernetes-related resource definition, although the solution should be open to other technologies in the future, such as Ansible for more complex assets generation.

##### Template instantiation

Template instantiation should be considered the act of injecting values in a template to render the target assets (deployment descriptors, configuration files...). For the Helm use case in this first iteration, the process could be as follows:

- The hub generates a values.yaml file by merging the _Configuration_ dictionary for the target application, the fixed _Variables_ set in the Generator, and the _Parameters_ the user might have provided when requesting the generation, in inverse order of preference (_Parameters_ take top preference, then _Variables_, and finally the _Configuration_ dictionary; a sketch follows the discussion below). That file should also include values inferred from other information stored in the application profile, such as tags.
- The values.yaml file is injected by the hub in a _Generator_ task pod that will execute the `helm template` command to render the assets.
**Comment (@savitharaghunathan, Jan 17, 2025):**

Do we need to validate the generated manifests/assets, or is that out of scope? I have seen this validation step as part of CI/CD automation and local dev validation.

**Reply (PR author):**

@savitharaghunathan what kind of validation were you thinking about?

**Comment (Member):**

For Kubernetes, there are tools like https://github.com/kubernetes-sigs/kubectl-validate or https://github.com/yannh/kubeconform. For others, maybe validate the generated YAML using yamllint or something.

**Comment (Contributor):**

If Helm is creating invalid YAML, I don't know what we or the user can do about it at this step in the process. They may have to fix it locally. I would just as soon assume that Helm is generating valid YAML and that they have their own steps for making sure it is safe.

What I really don't want is a small mistake causing a long process to be re-run, or a bug in Helm blocking a user. If we do add this, we should still allow users to download the files and fix them locally, IMO.

**Reply (PR author):**

I agree with @shawn-hurley, not much we can do on our side.

- The generated assets are then placed in a branch of the repository associated with the application.
**Comment (Contributor):**

This would require giving Konveyor write access to a repository; as far as I know, it only needs read access today.

I wonder if being able to download the generated assets from the hub/ui might be a solution worth exploring.

This would allow users to put the files in GitOps in another repo, or just to use locally to test with before committing. They could even mutate the resource before committing.

Just something to consider, not tied to it one way or the other.

**Reply (PR author):**

@shawn-hurley AFAIK @jortel already has writing to a repository figured out.

Having everything committed to a repo seems cleaner to me, and a user can always make changes in the repo with a clear log of where each of those changes comes from. If we were to allow users to download the files, it would be difficult to tell which parts came from Konveyor and which ones came from a manual change.

In the end, this is all about organizations being able to enforce standards. If someone wants to override some of those standards, then they should be accountable for that.

**Comment:**

What about pushing a PR or MR? It would be up to repo owners to merge the change, and we may not need write permission to the repository.

**Reply (PR author):**

That implies having to integrate with the different APIs of the most common Git wrappers out there: GitHub, GitLab, Bitbucket, Gitea... That means not only implementing but also maintaining compatibility with all these APIs over time, which would require a considerable investment. I don't think that is a priority at the moment considering the resources we have.

**Comment (Contributor):**

I agree that would not necessarily be hard, but it adds a larger support burden than we would like.

We have talked about this offline, and one of the things we discussed is that this entire flow only works for source applications, not binaries. I think for the first pass this makes sense, and we can pivot if there are issues that customers bring up. There is no need to boil the ocean to get something working into the user's hands.

**Comment (Contributor):**

So, no requirement for users to see/download the generated templates in UI?

**Reply (PR author):**

@jortel I'd say not for the moment, only committing to a repository.
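
A minimal sketch of the values merge described above, with hypothetical values (_Parameters_ override _Variables_, which override the _Configuration_ dictionary):

```yaml
# Inputs (hypothetical):
#   Configuration dictionary: replicas: 2, ingress.domain: apps.source.example.com
#   Generator Variables:      ingress.domain: apps.k8s.example.com
#   User Parameters:          replicas: 4
# Resulting values.yaml handed to `helm template`:
replicas: 4                       # from Parameters (top preference)
ingress:
  domain: apps.k8s.example.com    # from Variables, overriding the discovered value
```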


![Template Instantiation](images/template-instantiation.png?raw=true "Template Instantiation")

From a UI/UX perspective, when requesting assets generation for a given application, users would be prompted with the following:
- Values for the _Parameters_ configured in the associated _Generator_(s)
- Target repository for the generated assets. Will default to the application repository and the `generated_assets` branch, but could be used to store configuration in a different configuration repository if that is the pattern the organization uses:
- _URL_
- _Root Path_
- _Branch_
- _Credentials_
- Option to skip template instantiation and simply copy the charts to the target repository and inject the configuration as a values file. This is helpful when template instantiation is handled by an external orchestrator like a CI/CD pipeline.

##### Repository Augmentation

- Generated assets could be stored in a branch of the target application repository or, if needed, in a separate configuration repository if the application has adopted a GitOps approach to configuration management.

**Comment:**

Generated assets could be stored in a branch. We need to keep in mind that we may have sensitive information added as part of asset generation; I am not sure whether it is a good idea to store that in a repository.

**Reply (PR author):**

That's exactly how things are done in a full GitOps approach, with configuration for different environments being stored in different configuration repositories with different security levels. Nevertheless, I think it might be interesting to add an additional parameter for Template Instantiation to allow the user to prevent sensitive data from being injected into the generated assets.

- Allow architects to seed repositories for migrators to start their work with everything they need to deploy the applications they are working on right away → Ease the change, test, repeat cycle.

**Comment:**

What do you mean with "seed repositories"?

**Reply (PR author):**

Add everything developers need to start deploying the application on the target platform from the very first minute. If a developer can only interact with the source code to adapt the application for the target platform, but is not able to actually deploy the app there to see if it works, it becomes difficult for them to know when the migration is done, at least to a point at which the organization can test that everything behaves as expected.

**Comment:**

Deployment could be done by existing CI/CD infrastructure. We implemented this approach for a customer in a workflow: when move2kube generated the Dockerfile and manifests, we triggered Tekton to build the image and deploy. We provided a place for customers to define how the pipeline should be triggered.

**Reply (PR author):**

That's the idea: our assets generator leaves the assets in a place where the corporate CI/CD can pick them up and orchestrate the deployment in whatever way it has been designed. That last mile, the deployment itself, is delegated to the corporate CI/CD system; Konveyor doesn't have anything to do with it.

**Comment (Contributor):**

I think, IIUC @rromannissen, you are saying that nothing is stopping the generator from creating the TektonPipeline, but applying and using that pipeline is an exercise left to users outside of Konveyor.

Is that correct?

**Reply (PR author):**

@shawn-hurley that's it!

- Aligned with the Seed work step from the Do work stage in the [Konveyor Unified Experience enhancement](https://github.com/konveyor/enhancements/tree/master/enhancements/unified_experience#step-4-do-work).


### Functional Specification

TBD

### Implementation Details/Notes/Constraints

TBD

### Security, Risks, and Mitigations

TBD

## Design Details

### Test Plan
**Comment:**

@mguetta1 @ibragins, @nachandr,
Would you please add here questions/thoughts/ideas on testing?


TBD
**Comment:**

@rromannissen, could you please suggest one high-level end-to-end test for a common use case? I think that would provide more clarification on what the tests should be focused on.


### Upgrade / Downgrade Strategy

TBD

## Implementation History

TBD

## Drawbacks

TBD

## Alternatives

TBD

## Infrastructure Needed

TBD