[Draft] Assets generation and Platform Awareness enhancement #210

Conversation

rromannissen
Contributor

Since its first release, the insights that Konveyor could gather from a given application came either from the application source code itself (analysis) or from information provided by the different stakeholders involved in managing the application lifecycle (assessment). This enhancement proposes a third way of surfacing insights about an application: gathering both runtime and deployment configuration from the very platform on which the application is running (discovery), and storing that configuration in a canonical model that can be leveraged by different Konveyor modules or addons.

Aside from that, the support that Konveyor provided for the migration process stopped once the application source code had been modified for the target platform, leaving the application ready to be deployed but without the assets required to actually deploy it on that platform. For example, for an application to be deployed on Kubernetes, it is not only necessary to adapt the application source code to run in containers; it is also necessary to have deployment manifests that define how the application is deployed in a cluster, a Containerfile to build the image, and potentially some runtime configuration files. This enhancement proposes a way to automate the generation of those assets by leveraging the configuration and insights gathered by Konveyor.


- Should there be a dynamic way of registering Platform Types, Discovery Providers and Generator Types? Should that be managed by CRs, or could there be an additional mechanism? That would imply adding some dynamic behavior to the UI to render the different fields associated with each of them.
- How can we store sensitive data retrieved by the Discovery Providers?
- How could we handle the same file being rendered by two different _Generators_ (charts)? Is there a way to calculate the intersection of two different Helm charts?


> How could we handle the same file being rendered by two different Generators (charts)?

One approach may be to use a different release name for each Generator. WDYT?

> Is there a way to calculate the intersection of two different Helm charts?

I'm not aware of a way to intersect two charts; maybe the closest is to use Helm's dependency management.

Contributor Author

We wouldn't be using the Helm release concept, as I wouldn't expect the asset generator to have any direct contact with a k8s cluster (that would be more of a CI/CD pipeline concern). We are mostly using Helm to render assets via the `helm template` command.
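To make the rendering model concrete, here is a minimal sketch of an offline rendering pass under that assumption; the chart name, values and output directory are hypothetical, not part of the enhancement:

```yaml
# Rendered with no cluster connection and no Helm release, e.g.:
#   helm template inventory-service ./openshift-generator-chart \
#     --values values.yaml --output-dir ./generated
#
# Hypothetical values.yaml the hub could inject into the Generator task pod:
application:
  name: inventory-service        # hypothetical application name
  targetNamespace: retail-prod   # hypothetical target namespace
runtime:
  jvmOptions: "-Xms512m -Xmx1024m"
```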

##### Repository Augmentation

- Generated assets could be stored in a branch of the target application repository or, if needed, in a separate configuration repository if the application has adopted a GitOps approach to configuration management.
- Allow architects to seed repositories so migrators can start their work with everything they need to deploy the applications they are working on right away → ease the change, test, repeat cycle.


What do you mean by "seed repositories"?

Contributor Author

Add everything developers need to start deploying the application on the target platform from the very first minute. If a developer can only interact with the source code to adapt the application for the target platform, but is not able to actually deploy the app to see if it works, it becomes difficult for them to know when the migration is done, at least to a point where the organization can test that everything behaves as expected.


Deployment could be done by existing CI/CD infrastructure. We implemented this approach for our customer in a workflow: when Move2Kube generated the Dockerfile and manifests, we triggered Tekton to build the image and deploy. We provided a place for customers to define how the pipeline should be triggered.

Contributor Author

That's the idea: our assets generator leaves the assets in a place where the corporate CI/CD can pick them up and orchestrate the deployment in whatever way they have designed. That last mile, the deployment itself, is delegated to the corporate CI/CD system; Konveyor doesn't have anything to do with it.

Contributor

IIUC @rromannissen, you are saying that nothing is stopping the generator from creating the Tekton Pipeline, but applying and using that pipeline is an exercise left to users outside of Konveyor.

Is that correct?

Contributor Author

@shawn-hurley that's it!
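To make that hand-off concrete, here is a minimal sketch of the kind of pipeline asset a generator could render and commit; the names and task are hypothetical, and applying the manifest would be left entirely to the user's CI/CD tooling:

```yaml
# Hypothetical generated Tekton Pipeline; Konveyor would render it, never apply it.
apiVersion: tekton.dev/v1
kind: Pipeline
metadata:
  name: inventory-service-deploy   # hypothetical name
spec:
  tasks:
    - name: build-image
      taskRef:
        name: buildah              # assumes a buildah Task exists in the target cluster
```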


- Should there be a dynamic way of registering Platform Types, Discovery Providers and Generator Types? Should that be managed by CRs, or could there be an additional mechanism? That would imply adding some dynamic behavior to the UI to render the different fields associated with each of them.
- How can we store sensitive data retrieved by the Discovery Providers?
- How could we handle the same file being rendered by two different _Generators_ (charts)? Is there a way to calculate the intersection of two different Helm charts?
Contributor

Is the open question how you can layer the file changes on top of each other, or merge them together, so that the generators work together?

Contributor Author

If the OpenShift generator (chart) generates a Deployment.yaml and the EAP on OpenShift generator (chart) generates a different Deployment.yaml, how can we merge them? It just came to my mind that we could establish an explicit order of preference when assigning Generators to a Target Platform, so if some resources (files) overlap, the ones with top preference override the others. That would mean no file merging, but the end result would be a composition (should we call this a merge?) of the files rendered by all generators.
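A minimal sketch of that precedence idea, with hypothetical generator names and file sets:

```yaml
# Generators assigned to a Target Platform, ordered by preference:
generators:
  - name: eap-on-openshift      # preference 1 (highest)
    renders: [Deployment.yaml, ConfigMap.yaml]
  - name: openshift-base        # preference 2
    renders: [Deployment.yaml, Route.yaml]

# Resulting composition (no file merging; the highest preference wins per file):
result:
  Deployment.yaml: eap-on-openshift
  ConfigMap.yaml: eap-on-openshift
  Route.yaml: openshift-base
```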


I may be missing some context here, but my understanding is that we would have one or more generators (configured by the users) which may provide one or more ways to deploy the same app. In my opinion we should not merge anything: provide the generated manifests (in different folders) per user request and let the user decide what to do about duplication.

Contributor

I think this makes sense for a first pass. I believe this could get cumbersome, but waiting for actual user pain makes sense to me.

- Documented way of storing configuration:
  - Keys are documented and have a precise meaning.
  - Similar to Ansible facts, but surfacing different concerns related to the application runtime and platform configuration.
  - RBAC protected.
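As a rough sketch of what such a documented dictionary could look like (all keys and values here are hypothetical, purely for illustration):

```yaml
# Hypothetical entries in the canonical Configuration dictionary,
# discovered from the source platform; each key would be documented:
runtime:
  type: jvm
  heap: 1024m                    # discovered JVM heap setting
platform:
  type: cloud-foundry
  instances: 3                   # discovered scaling configuration
datasources:
  - name: inventory-db
    kind: postgresql
    url: jdbc:postgresql://db.internal:5432/inventory
```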
Contributor

I would like to explore this a little more.

Is the whole configuration RBAC protected, or just some fields? How is the RBAC managed from the hub?

Contributor Author

At the moment I think it should be something simple along the lines of "only Admins and/or Architects can see the config", considering how RBAC works now. Once we move authorization over to Konveyor itself (as we've discussed several times in the past), I think we'd have something more flexible that would allow users to have more fine-grained control over this.


- The hub generates a `values.yaml` file based on the intersection of the _Configuration_ dictionary for the target application, the fixed _Variables_ set in the Generator, and the _Parameters_ the user might have provided when requesting the generation, in inverse order of preference (_Parameters_ take precedence over the others, then _Variables_, and finally the _Configuration_ dictionary). That file should also include values inferred from other information stored in the application profile, such as tags.
- The `values.yaml` file is injected by the hub into a _Generator_ task pod that will execute the `helm template` command to render the assets.
- The generated assets are then placed in a branch of the repository associated with the application.
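A minimal sketch of that order of preference when building `values.yaml`, with hypothetical keys (_Parameters_ override _Variables_, which override the _Configuration_ dictionary):

```yaml
# Configuration dictionary (discovered, lowest preference):
#   replicas: 3
#   registry: registry.internal
# Generator Variables (fixed in the Generator):
#   registry: quay.io/acme
# User Parameters (highest preference):
#   replicas: 5

# Resulting values.yaml:
replicas: 5                  # from Parameters (overrides the Configuration dictionary)
registry: quay.io/acme       # from Variables (no Parameter override)
```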
Contributor

This would require giving Konveyor write access to a repository; as far as I know, it only needs read access today.

I wonder if being able to download the generated assets from the hub/UI might be a solution worth exploring.

This would allow users to put the files in a GitOps setup in another repo, or just to use them locally to test with before committing. They could even mutate the resources before committing.

Just something to consider; not tied to it one way or the other.

Contributor Author

@shawn-hurley AFAIK @jortel already has writing to a repository figured out.

Having everything committed to a repo seems cleaner to me, and a user can always make changes in the repo with a clear log of where each of those changes came from. If we were to allow users to download the files, it would be difficult to tell which parts came from Konveyor and which came from a manual change.

In the end, this is all about organizations being able to enforce standards. If someone wants to override some of those standards, then they should be accountable for that.


What about pushing a PR or MR? It would be up to the repo owners to merge the change, and we might not need write permission to the repository.

Contributor Author

That implies having to integrate with the different APIs of the most common Git hosting services out there: GitHub, GitLab, Bitbucket, Gitea... That means not only implementing but also maintaining compatibility with all these APIs over time, which would require a considerable investment. I don't think that is a priority at the moment considering the resources we have.

Contributor

I agree that would not necessarily be hard, but it adds a larger support burden than we would like.

We have talked about this offline, and one of the things we discussed is that this entire flow only works for source applications, not binaries. I think this makes sense for the first pass, and we can pivot if customers bring up issues. There is no need to boil the ocean when something that works can get into users' hands.


There will be platform-related fields in the Application entity. These fields should be considered optional, as applications can still be managed manually or via the CSV import without the need for awareness of the source platform.

A _Source Platform_ section (similar to the Source Code and Binary sections) should be included in the _Application Profile_, including the following fields:

@rromannissen is it possible to have multiple source platforms for a particular application? If we think of the current functionality around cloning the source from a Git repo as a "platform" (Git repository, there is an API, we populate the source code from this info, and the "analyze" phase is strictly the static code analysis), then we'd definitely need multiple. Maybe if there is an EAP app on k8s you would have two different source platforms, each responsible for its own details?

Contributor Author

Code retrieval remains part of the analysis phase, as repositories are not a platform on which the application is deployed, but rather a place where the application source code is stored. The analysis process should be able to surface configuration though, and I think we should (and can) leverage analysis findings (probably coming from Insights) to populate the configuration dictionary, aside from the technology tags used to automate archetype association as we do now. That should remain independent from the discovery process for different platforms described in this document.

For a "compound" scenario like EAP on K8s, I imagine having a dedicated discovery provider that can handle the specifics of that situation and retrieve information for both the k8s objects and the EAP configuration. Bear in mind that using a vanilla EAP discovery provider would not work for an EAP on k8s scenario, as some (if not all) of the EAP management APIs are disabled in the image.

Contributor

If I could add to that: we would probably have to start scanning container layers at that point to get the information out of them. This is not impossible, and there are many ways to do it, but it is not something we have implemented.

We should also consider this, but I think it is outside the scope of this enhancement.

Thoughts?


Do we consider multilayered apps (many source repos, different components deployed on more than one platform) to be in scope for this work?

Contributor Author

As per the cardinality we currently have in Konveyor, each component of a distributed application would be treated as what we call an application in the inventory. All components of the same distributed application could be related via runtime dependencies and common tags.

Contributor

Couldn't they be associated via a migration wave as well, or is that the wrong tool for the job?

Contributor Author

Migration Waves are meant to break the migration effort into different sprints to enable an iterative approach, so that's probably not the best tool for this. We discussed in the past the possibility of having the concepts of application and component as first-class entities, but that would require further changes in the API and UI/UX that I think go beyond the scope of this enhancement.


Based on our discussion yesterday, it seems they are mostly using stateless apps. With that said, it is OK to keep this out of scope for this work.


## Open Questions

- Should there be a dynamic way of registering Platform Types, Discovery Providers and Generator Types? Should that be managed by CRs, or could there be an additional mechanism? That would imply adding some dynamic behavior to the UI to render the different fields associated with each of them.


Dynamic behavior in the UI is solved by products like OCP with frontend plugins for different operators, or by plugins in RHDH (Backstage). I think the question should be: "which mechanism would be a good match for the existing architecture?"

Contributor

If we are just doing dynamic fields, I think it would make sense to focus on that, as it is a much more constrained problem (read the OpenAPI spec for a "thing" to determine the type and render the right field for that type). Having a full frontend plugin system is hard IMO, and if we don't need it we shouldn't focus on it.

In the future we may need it, but I think we should do that work when it becomes an acute problem users are feeling.


I am OK with limiting the scope. Based on this open question it was not clear to me what we intend to provide. Still, "just" dynamic fields may grow beyond our initial design.



- Hypervisors and VMs.
- Others...
- Assets generation:
  - Flexible enough to generate all assets required to deploy an application on k8s (and potentially other platforms in the future)


Applications may be multilayered, with complex deployments running on different platforms, like a stateless web service (PCF) plus a DB or cache (VMs). Should we limit ourselves to only parts of the app, or attempt to generate all the deployment assets? Depending on our choices we may or may not need to think about network layout and corresponding manifests.

Contributor Author

How deep we go would totally depend on the Discovery Provider logic and the Helm charts (and potentially other templating technologies in the future) associated with the generator for the target platform. The goal is to provide a framework that enables us and users to do this in a structured way.


## Proposal

### Personas / Actors


Do we see a place for SREs or platform engineering in this effort?

Contributor Author

That is something I would consider once we expose this functionality via the Backstage plugin.


I raised this question since the original intention was to connect to the source runtime as well as to make sure the application will run in the target runtime without issues. This clearly requires work from SREs to configure access, CI/CD, etc., although based on our discussion with the customer we know it is not the highest priority at the moment.




##### Discovery Providers

Abstraction layer responsible for collecting configuration about an application on a given platform:


This assumes network connectivity to a platform/agent and admin-level permissions. Is this something we can expect? What should be the process for an agent to be deployed/installed?

Contributor Author

Yes, for the live connection approach we'll need some valid credentials and network access. Agents will have to be deployed by the infrastructure teams managing the platforms and exposed to the Hub somehow (TBD).

- *Initial discovery*:
  - The Configuration dictionary gets populated with non-sensitive data. Sensitive data gets redacted or defaults to dummy values.
- *Template instantiation*:
  - A second discovery retrieval happens to obtain the sensitive data and inject it into the instantiated templates (the actual generated assets) without storing the data in the Configuration dictionary.
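A minimal sketch of that two-phase handling with a hypothetical datasource secret (all names illustrative):

```yaml
# Phase 1 - initial discovery: what gets stored in the Configuration dictionary
datasource:
  url: jdbc:postgresql://db.internal:5432/inventory
  username: app_user
  password: "<redacted>"        # sensitive value is never stored

# Phase 2 - template instantiation: what lands in the generated asset
# (the secret is fetched again at generation time, injected, never persisted)
stringData:
  password: s3cr3t-example      # hypothetical injected value
```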


Is there a need to protect access to generated assets containing sensitive data?

Contributor Author

Very likely, but that would be the responsibility of the user generating the assets, meaning that they should take care of storing the assets in a secured repository.




##### Repository Augmentation

- Generated assets could be stored in a branch of the target application repository or, if needed, in a separate configuration repository if the application has adopted a GitOps approach to configuration management.


Generated assets could be stored in a branch, but we need to keep in mind that we may have sensitive information added as part of asset generation. I am not sure whether it is a good idea to store that in a repository.

Contributor Author

That's exactly how things are done in a full GitOps approach, with configuration for different environments being stored in different configuration repositories with different security levels. Nevertheless, I think it might be interesting to add an additional parameter for Template Instantiation that allows the user to prevent sensitive data from being injected into the generated assets.
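For illustration, a hypothetical generation request with such a parameter (the field names are made up, not part of the proposal):

```yaml
# Hypothetical Template Instantiation request:
application: inventory-service
generator: openshift-base
parameters:
  replicas: 5
injectSensitiveData: false    # keep redacted placeholders instead of real secrets
```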


- Hypervisors and VMs.
- Others...
- Assets generation:
  - Flexible enough to generate all assets required to deploy an application on k8s (and potentially other platforms in the future)
@istein1 Dec 10, 2024

There can be all sorts of assets, and it might be tricky to support every kind.
In case discovery detects an asset Konveyor doesn't have in its arsenal, could we consider asking the user to provide that asset's source so that Konveyor could generate it?
Or maybe I'm getting this wrong, and Konveyor is good with any asset: it would propagate it into a Helm chart and then the CI/CD would make sure the asset is installed?

Contributor Author

Discovery providers will discover configuration and have it stored in canonical form, and then the generators will generate assets for different target platforms. Considering we will be in control of the discovery providers and generators we ship out of the box, we should take special care to coordinate them to tackle meaningful migration paths such as Cloud Foundry to Kubernetes (meaning shipping a CF discovery provider and a default Kubernetes generator).

- Managed in the administration perspective
- Potential fields:
  - Name
  - Platform Type (Kubernetes, Cloud Foundry, EAP, WebSphere…)

Does this list contain all the supported platforms?
Asking in terms of the design and infra needed to test this.

Contributor Author

Those are just examples. In a first iteration we should focus on Kubernetes and Cloud Foundry.


### Test Plan

TBD

@rromannissen, could you please suggest one high-level end-to-end test for a common use case? I think that would provide more clarification on what the tests should focus on.


## Design Details

### Test Plan

@mguetta1 @ibragins @nachandr,
Would you please add questions/thoughts/ideas on testing here?
