Process metrics for Linux #7870

kubotat · 2023-08-25T18:08:40Z

Is your feature request related to a problem? Please describe.

The in_process plugin is available today and can check how healthy a process is. Having process-level CPU and memory metrics in addition to health information would be beneficial for system operation.

Describe the solution you'd like

As far as I can tell, node_exporter does not support process metrics as of today, so I suggest developing a new plugin which captures process-level metrics from /proc//stat. Here is the expected configuration for the plugin. The process_name_regex and process_status_regex options give users great flexibility to control which processes are captured and to reduce the amount of data by cutting off unnecessary metrics.

[INPUT]
    Name  process_metrics_exporter
    scrape_interval  60
    path.procfs  /proc/
    process_name_regex  /fluent-bit/
    process_status_regex  /R/
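
To make the proposal concrete, here is a minimal Python sketch of what such a collector might do: walk procfs, parse each per-process `stat` file, and keep only the processes whose name and state match the two regexes. The names `ProcStat`, `parse_stat_line`, and `scan_procfs` are hypothetical and exist only for illustration; the field positions follow the proc(5) man page. This is not how Fluent Bit implements it, only an illustration of the idea.

```python
import re
from dataclasses import dataclass
from pathlib import Path
from typing import Iterator


@dataclass
class ProcStat:
    pid: int
    comm: str          # process name as reported by the kernel
    state: str         # one-letter state, e.g. R, S, D, Z, T, I
    utime_ticks: int   # user-mode CPU time, in clock ticks
    stime_ticks: int   # kernel-mode CPU time, in clock ticks
    rss_pages: int     # resident set size, in pages


def parse_stat_line(line: str) -> ProcStat:
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so split on the first '(' and the last ')'.
    left = line.index("(")
    right = line.rindex(")")
    pid = int(line[:left])
    comm = line[left + 1:right]
    rest = line[right + 1:].split()
    # rest[0] is field 3 (state), so field N of proc(5) is rest[N - 3]:
    # utime is field 14, stime is field 15, rss is field 24.
    return ProcStat(pid, comm, rest[0],
                    int(rest[11]), int(rest[12]), int(rest[21]))


def scan_procfs(procfs: str = "/proc",
                name_regex: str = ".*",
                status_regex: str = ".*") -> Iterator[ProcStat]:
    # Hypothetical equivalents of process_name_regex / process_status_regex.
    name_re = re.compile(name_regex)
    status_re = re.compile(status_regex)
    for entry in Path(procfs).iterdir():
        if not entry.name.isdigit():
            continue  # only numeric directories are processes
        try:
            stat = parse_stat_line((entry / "stat").read_text())
        except (OSError, ValueError, IndexError):
            continue  # the process exited or the file was unreadable
        if name_re.search(stat.comm) and status_re.search(stat.state):
            yield stat
```

For example, `list(scan_procfs("/proc", "fluent-bit", "R"))` would return only the fluent-bit processes that happened to be in the running state at the moment of the scan.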

Describe alternatives you've considered

I considered the in_process plugin as an alternative. It helps me check the health status and memory metrics, but it doesn't capture CPU metrics and doesn't work when users don't know the name of the process.

Additional context

None

cosmo0920 · 2023-08-28T05:43:15Z

The official node_exporter can handle process metrics; it provides this feature as one of its collectors. So we need to provide it as one of the metrics implemented in node_exporter_metrics.
Thus, it needs to be implemented as process metrics.

The configuration for process metrics should be as follows:

[INPUT]
    Name  node_metrics_exporter
    collector.process.scrape_interval  60
    metrics process
    path.procfs  /proc/
    ne.process_name_regex  /fluent-bit/
    ne.process_status_regex  /R/

cosmo0920 · 2023-08-30T09:20:06Z

I already opened a PR that implements the processes metrics, meaning the system-level statuses of processes and threads, in in_node_exporter_metrics: #7880
So this feature request should be handled by implementing process metrics or some equivalent, but it should use different metric names.

kubotat · 2023-08-30T17:24:02Z

@cosmo0920 Thanks for your help. I looked into #7880 and confirmed that the system level metrics are captured.

2023-08-30T09:15:10.449031472Z node_process_threads = 1668
2023-08-30T09:15:10.449031472Z node_process_max_threads = 513122
2023-08-30T09:15:10.449031472Z node_process_threads_state{thread_state="R"} = 1
2023-08-30T09:15:10.449031472Z node_process_threads_state{thread_state="S"} = 1558
2023-08-30T09:15:10.449031472Z node_process_threads_state{thread_state="D"} = 0
2023-08-30T09:15:10.449031472Z node_process_threads_state{thread_state="Z"} = 3
2023-08-30T09:15:10.449031472Z node_process_threads_state{thread_state="T"} = 0
2023-08-30T09:15:10.449031472Z node_process_threads_state{thread_state="I"} = 106
2023-08-30T09:15:10.449031472Z node_process_state{state="R"} = 0
2023-08-30T09:15:10.449031472Z node_process_state{state="S"} = 367
2023-08-30T09:15:10.449031472Z node_process_state{state="D"} = 0
2023-08-30T09:15:10.449031472Z node_process_state{state="Z"} = 3
2023-08-30T09:15:10.449031472Z node_process_state{state="T"} = 0
2023-08-30T09:15:10.449031472Z node_process_state{state="I"} = 106
2023-08-30T09:15:10.449031472Z node_process_pids = 476
2023-08-30T09:15:10.449031472Z node_process_max_processes = 4194304

> So this feature request should be handled by implementing process metrics or some equivalent, but it should use different metric names.

Do you mean process-level CPU/memory metrics should be discussed in another PR?

cosmo0920 · 2023-08-31T03:01:08Z

Yes. I wanted to discuss process-level metrics in this issue and in another PR.

cosmo0920 · 2023-08-31T03:42:23Z

For reference, we need to implement process metrics like this: https://github.com/ncabatoff/process-exporter/blob/master/collector/process_collector.go

kubotat · 2023-08-31T05:22:40Z

@cosmo0920 Thanks.

> For reference, we need to implement process metrics like this: https://github.com/ncabatoff/process-exporter/blob/master/collector/process_collector.go

Yes, I was checking exactly the same code :)

Do you think it is possible to implement a feature to scrape the top 10 processes in the input plugin, or should it be implemented in a filter plugin? I would like to hear your thoughts on it.

cosmo0920 · 2023-08-31T06:20:16Z

I think that scraping only the top 10 processes is highly costly, because determining the order requires traversing procfs. As in the link above, we should implement it by traversing all of the metrics of every process under its procfs entry.
This is because we would need to dig through all of them to sort by CPU, memory, network bandwidth, or other points of view.

Ordering the top 10 processes by a metric should be handled on the monitoring solution side.
For instance, Splunk can display the top 10 metrics in each graph. That depends on the configuration, but as far as I remember, showing the top 10 metrics is the default.

Another plan: perhaps we need to implement a filtering feature for metrics in cmetrics?

patrick-stephens · 2023-08-31T08:02:51Z

Agreed @cosmo0920, plus the choice of top 10/9/8/100 will be arbitrary, so it should be left to the user to tune as required.
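
The trade-off discussed above can be illustrated with a short Python sketch: once samples for every process have been collected, picking the top N for any one dimension is a cheap post-processing step, but both the dimension and N are choices that belong to whoever consumes the metrics. The sample layout and the name `top_n` are hypothetical and not part of any Fluent Bit or cmetrics API.

```python
import heapq
from typing import Iterable, List, Mapping


def top_n(samples: Iterable[Mapping[str, object]],
          key: str, n: int = 10) -> List[Mapping[str, object]]:
    # Every sample must already be collected: the ranking cannot be known
    # without visiting each process first.
    return heapq.nlargest(n, samples, key=lambda s: s[key])


# Hypothetical per-process samples, one dict per process.
samples = [
    {"name": "fluent-bit", "cpu_seconds": 1.0, "rss_bytes": 300},
    {"name": "postgres", "cpu_seconds": 5.0, "rss_bytes": 100},
    {"name": "nginx", "cpu_seconds": 3.0, "rss_bytes": 200},
]

by_cpu = [s["name"] for s in top_n(samples, "cpu_seconds", n=2)]
by_rss = [s["name"] for s in top_n(samples, "rss_bytes", n=1)]
```

Here `by_cpu` is `["postgres", "nginx"]` while `by_rss` is `["fluent-bit"]`: the same data yields a different "top" list for each dimension, which is why leaving the ranking to the monitoring side keeps the input plugin simple.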

kubotat · 2023-08-31T21:47:01Z

@cosmo0920 @patrick-stephens Thanks.

> Ordering the top 10 processes by a metric should be handled on the monitoring solution side.
> For instance, Splunk can display the top 10 metrics in each graph. That depends on the configuration, but as far as I remember, showing the top 10 metrics is the default.

That makes sense to me.

cosmo0920 · 2023-09-20T07:40:12Z

@kubotat I sent a PR for covering this issue at: #7943

I have a question about your request. As I noticed, a process's status changes rapidly, so capturing the R (running) status has significant timing issues. For now, I dropped the feature for filtering by process status.
Even with this difficulty, do you need support for a regex/parameter to filter process statuses?

I mean that the Linux process scheduler depends on this parameter for preemption latency: https://elixir.bootlin.com/linux/v6.5.4/source/kernel/sched/fair.c#L72

This could be too small compared with the scrape interval:
default: 6ms * (1 + ilog(number of online CPUs)) (unit: nanoseconds) vs. 5 seconds (the default scraping interval)

This means the scheduling latency is roughly two to three orders of magnitude smaller than the scrape interval, depending on the CPU count.
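
The comparison can be worked through with a small Python sketch. It assumes that `ilog` in the kernel comment means the integer base-2 logarithm and ignores any other scaling the kernel may apply; the function names are hypothetical.

```python
def sched_latency_ns(online_cpus: int, base_ns: int = 6_000_000) -> int:
    """Default latency per the formula 6ms * (1 + ilog(ncpus)), in ns."""
    if online_cpus < 1:
        raise ValueError("online_cpus must be >= 1")
    ilog2 = online_cpus.bit_length() - 1  # integer log2 (assumed meaning of ilog)
    return base_ns * (1 + ilog2)


def scrape_to_latency_ratio(online_cpus: int,
                            scrape_interval_s: float = 5.0) -> float:
    """How many scheduling-latency periods fit into one scrape interval."""
    return scrape_interval_s * 1e9 / sched_latency_ns(online_cpus)
```

Under these assumptions, a single-CPU machine has a 6 ms latency, so about 833 periods fit into one 5-second scrape; with 8 online CPUs the latency is 24 ms and the ratio drops to about 208. Either way, a process can enter and leave the R state many times between two scrapes.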

kubotat · 2023-09-21T06:36:28Z

@cosmo0920 Thank you so much for your feedback.

> Even with this difficulty, do you need support for a regex/parameter to filter process statuses?

Filtering processes by status is not a mandatory requirement.
Do process_include_pattern and process_exclude_pattern filter the metrics by process name?

cosmo0920 · 2023-09-21T07:22:33Z

> Filtering processes by status is not a mandatory requirement.
> Do process_include_pattern and process_exclude_pattern filter the metrics by process name?

OK. I understand. And yes, they are already implemented in #7943.

kubotat · 2023-10-16T18:32:17Z

@cosmo0920 Is there any timeline for when PR #7943 will be merged into the main branch?

cosmo0920 · 2023-10-17T02:01:00Z

Not sure, but we might be able to include this feature in the 2.2 development cycle...

cosmo0920 added the enhancement label Aug 31, 2023

cosmo0920 mentioned this issue Sep 19, 2023: in_process_exporter_metrics: implement process exporter metrics #7943 (merged)

edsiper closed this as completed in #7943 Nov 8, 2023
