Skip to content
This repository has been archived by the owner on Nov 3, 2023. It is now read-only.

Create k8s asset collector #38

Merged
merged 12 commits into from
Mar 23, 2023
Merged

Create k8s asset collector #38

merged 12 commits into from
Mar 23, 2023

Conversation

MichaelKatsoulis
Copy link
Collaborator

This PR creates a k8s asset input which collects the Kubernetes Resources from a cluster and publishes them.

Related issue: #1

@MichaelKatsoulis MichaelKatsoulis marked this pull request as draft February 24, 2023 15:38
@ruflin
Copy link
Contributor

ruflin commented Feb 27, 2023

Are there plans to support asset_types like discussed in #20 ? The reason I bring this up because k8s will have so many different asset types, I wonder if all should be collected on the same period. But could also well be a premature optimisation, so don't hold back on this one.

@MichaelKatsoulis
Copy link
Collaborator Author

Are there plans to support asset_types like discussed in #20 ?

Thanks for bringing this to my attention. I will raise this issue in our upcoming catchup with the rest of the team.
Assets_types is a very good idea for a configuration option and the different period for each type is valid.
Nodes for example do no change that often as a pod or container, so collection period can be different.

}

func collectK8sPods(ctx context.Context, log *logp.Logger, client kubernetes.Interface, publisher stateless.Publisher) error {
pods, err := client.CoreV1().Pods("").List(context.TODO(), metav1.ListOptions{})
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @MichaelKatsoulis! Keep in mind, this kind of querying might not scale so better to use a "Watcher" approach and on ticker's tick just return what is cached. We have hit this issue at elastic/beats#33307 (and elastic/elastic-agent-autodiscover#31).

Maybe in this way you can also re-use the elastic-agent-autodiscovery library for some parts.

Copy link
Collaborator Author

@MichaelKatsoulis MichaelKatsoulis Mar 8, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a first implementation of the k8s asset collector as part of the Topology initiative. Regarding the watcher approach, I am not sure that these queries would need to happen at very short periods as their main purpose is to gather topology data.
But maybe watchers is the better tested solution and we already use it in autodiscovery. But regarding your second comment as well, the inpurunner may be run by agent in the future but it shouldn't be confused with the kubernetes integration. Its goal is different.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this sounds interesting, let's also discuss this in the context of the other asset inputs

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Watchers' usage example: https://github.com/ChrsMark/k8sdiscovery

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

func collectK8sNodes(ctx context.Context, log *logp.Logger, client kubernetes.Interface, publisher stateless.Publisher) error {

// collect the nodes using the client
nodes, err := client.CoreV1().Nodes().List(context.TODO(), metav1.ListOptions{})
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sth related to my other comment: have we considered what would be the result of querying the k8s API multiple times from multiple places? I'm not sure how this input runner would be integrated with Elastic Agent, but Elastic Agent is already querying the k8s API from a couple (++) of places so I wonder if this is sth really efficient or if we should consider re-using already existing information.

For example the kubernetes provider knows at any given time what are the Pods or Nodes running. So why to perform the very same process from another different place?

@tommyers-elastic
Copy link
Collaborator

hey @MichaelKatsoulis, is this ready for review? let's get it out of draft and merged so we can work on tightening up the k8s<->CSP asset associations. any outstanding issues that can't be quickly resolved, let's just open issues for and address in smaller PRs.

@MichaelKatsoulis MichaelKatsoulis marked this pull request as ready for review March 21, 2023 11:50
Copy link
Member

@ChrsMark ChrsMark left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As a PoC looks good. I left some minor cleanup suggestions.

Moving forward however would require to deal with the 2 issues I already mentioned:

  1. Use Informers/caching instead of direct API calls (Create k8s asset collector #38 (comment))
  2. Figure out if information can be re-used since Elastic Agent is already collecting such information (Create k8s asset collector #38 (comment))

I would suggest filing a follow up issue for these to not forget.

deploy/inputrunner-kubernetes-manifest.yml Outdated Show resolved Hide resolved
input/assets/k8s/assets_k8s.go Show resolved Hide resolved
return &assetsK8s{cfg}, nil
}

type config struct {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would move structs definitions to the top.

input/assets/k8s/assets_k8s.go Outdated Show resolved Hide resolved
input/assets/k8s/assets_k8s.go Outdated Show resolved Hide resolved
@MichaelKatsoulis
Copy link
Collaborator Author

  1. Use Informers/caching instead of direct API calls (#38 (comment))

Yes this will come in follow up PR

  1. Figure out if information can be re-used since Elastic Agent is already collecting such information (#38 (comment))

I am not sure I follow. This input is not run by agent for now. So how can I use information collected by agent?

@ChrsMark
Copy link
Member

  1. Figure out if information can be re-used since Elastic Agent is already collecting such information (#38 (comment))

I am not sure I follow. This input is not run by agent for now. So how can I use information collected by agent?

Well this is a high level architectural comment mostly.
I'm not fully aware of how these inputs would be running but if those will be running along with Agent then we will have the same data being collected from 2 different places.
If these inputs are completely independent of Agent then we cannot do a lot to re-use the information.

@MichaelKatsoulis
Copy link
Collaborator Author

Well this is a high level architectural comment mostly. I'm not fully aware of how these inputs would be running but if those will be running along with Agent then we will have the same data being collected from 2 different places. If these inputs are completely independent of Agent then we cannot do a lot to re-use the information.

I keep your comment as it is a valid concern. When everything is more clear on how things will run together, we need to revisit such things.

Copy link
Member

@ChrsMark ChrsMark left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

@MichaelKatsoulis MichaelKatsoulis added this pull request to the merge queue Mar 23, 2023
@MichaelKatsoulis MichaelKatsoulis merged commit 361c461 into main Mar 23, 2023
@dmathieu dmathieu deleted the asset_k8s-input branch March 23, 2023 09:48
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants