issue-465 Create a documentation section to use Grafana DataSource with SonataFlow Prometheus metrics #693

Open: wants to merge 2 commits into base: main
3 changes: 3 additions & 0 deletions serverlessworkflow/modules/ROOT/nav.adoc
@@ -83,6 +83,9 @@
*** xref:cloud/operator/service-discovery.adoc[Service Discovery]
*** xref:cloud/operator/using-persistence.adoc[Workflow Persistence]
*** xref:cloud/operator/configuring-workflow-eventing-system.adoc[Workflow Eventing System]
*** Monitoring
**** xref:cloud/operator/monitoring-workflows.adoc[Workflow Monitoring]
**** xref:cloud/operator/sonataflow-metrics.adoc[Prometheus Metrics for Workflows]
// *** xref:cloud/operator/configuring-knative-eventing-resources.adoc[Knative Eventing]
*** xref:cloud/operator/known-issues.adoc[Roadmap and Known Issues]
*** xref:cloud/operator/add-custom-ca-to-a-workflow-pod.adoc[Add Custom CA to Workflow Pod]
@@ -0,0 +1,135 @@
== Overview
Contributor:
@jianrongzhang89 this common sonataflow_metrics document is great, many thanks.
My only observation is that the order in which the metrics appear in the document is not the same as the order shown in the initial paragraph, which roughly corresponds to the workflow's "natural" life cycle.

see: [screenshot from 2024-12-12 13-18-27]

Contributor Author:
Good suggestion. Done!


In {product_name}, you can check the following metrics:

* `kogito_process_instance_started_total`: Number of started workflows.
* `kogito_process_instance_running_total`: Number of running workflows.
* `kogito_process_instance_completed_total`: Number of completed workflows.
* `kogito_process_instance_error`: Number of workflows that report an error.
* `kogito_process_instance_duration_seconds`: Duration of a workflow instance in seconds.
* `kogito_node_instance_duration_milliseconds`: Duration of relevant nodes in milliseconds.
* `sonataflow_input_parameters_counter_total`: Number of occurrences of each input parameter pair <"param_name","param_value"> per `processId`.

[NOTE]
====
Internally, workflows are referred to as processes. Therefore, `processId` and `processName` are the workflow id and name, respectively.
====

Each of the metrics mentioned previously contains a label for a specific workflow id. For example, the `kogito_process_instance_completed_total` metric below contains the labels for `callbackstatetimeouts` workflow:

.Example `kogito_process_instance_completed_total` metric
[source,yaml]
----
# HELP kogito_process_instance_completed_total Completed Process Instances
# TYPE kogito_process_instance_completed_total counter
kogito_process_instance_completed_total{app_id="sonataflow-process-monitoring-listener",artifactId="serverless-workflow-project",process_id="callbackstatetimeouts",process_state="Completed",version="1.0.0-SNAPSHOT",} 3.0
----
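
The labels in a sample line such as the one above can be read programmatically. The following is a minimal Python sketch, not part of {product_name}, that splits one line of the Prometheus text exposition format into its metric name, labels, and value. The helper name `parse_sample` and the regular expressions are illustrative assumptions; a production setup would more likely rely on a Prometheus client library or on Prometheus itself.

```python
import re

# One sample line: metric name, optional {labels}, then the numeric value.
SAMPLE_RE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(?P<labels>.*)\})?\s+(?P<value>\S+)$'
)
# One label pair: key="value" (escaped characters are kept as written).
LABEL_RE = re.compile(r'(\w+)="((?:[^"\\]|\\.)*)"')


def parse_sample(line):
    """Split one exposition-format sample line into (name, labels, value)."""
    match = SAMPLE_RE.match(line.strip())
    if match is None:
        raise ValueError(f"not a sample line: {line!r}")
    labels = dict(LABEL_RE.findall(match.group("labels") or ""))
    return match.group("name"), labels, float(match.group("value"))


line = (
    'kogito_process_instance_completed_total{app_id="sonataflow-process-monitoring-listener",'
    'artifactId="serverless-workflow-project",process_id="callbackstatetimeouts",'
    'process_state="Completed",version="1.0.0-SNAPSHOT",} 3.0'
)
name, labels, value = parse_sample(line)
print(name, labels["process_id"], labels["process_state"], value)
```

Here `labels["process_id"]` yields the workflow id, which is the label that every metric in this document carries.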

[NOTE]
====
Internally, {product_name} uses Quarkus Micrometer extension, which also exposes built-in metrics. You can disable the Micrometer metrics in {product_name}. For more information, see link:https://quarkus.io/guides/micrometer[Quarkus - Micrometer Metrics].
====

== Metrics Description

=== kogito_process_instance_started_total
Counts the number of started workflow instances.

[source, yaml]
----
# HELP kogito_process_instance_started_total Started Process Instances
# TYPE kogito_process_instance_started_total counter
kogito_process_instance_started_total{app_id="sonataflow-process-monitoring-listener",artifactId="serverless-workflow-project",process_id="callbackstatetimeouts",version="1.0.0-SNAPSHOT",} 7.0
----

=== kogito_process_instance_running_total
Records the number of running workflow instances.

[NOTE]
====
This includes workflow instances that are in the `Error` state, since the error state is not a terminal state.
Process instances that have reached a terminal status, i.e. `Completed` or `Aborted`, are not present in this metric.
====

[source, yaml]
----
# HELP kogito_process_instance_running_total Running Process Instances
# TYPE kogito_process_instance_running_total gauge
kogito_process_instance_running_total{app_id="sonataflow-process-monitoring-listener",artifactId="serverless-workflow-project",process_id="callbackstatetimeouts",version="1.0.0-SNAPSHOT",} 4.0
----

=== kogito_process_instance_completed_total
Counts the workflow instances that have reached a terminal status, `Aborted` or `Completed`, and are therefore considered completed.

[NOTE]
====
These are the only two terminal statuses. The `Error` state is not terminal.
Additionally, the metric carries a `process_state` label, set to either `Completed` or `Aborted`, that records which of the two terminal statuses was reached.
====

[source, yaml]
----
# HELP kogito_process_instance_completed_total Completed Process Instances
# TYPE kogito_process_instance_completed_total counter
kogito_process_instance_completed_total{app_id="sonataflow-process-monitoring-listener",artifactId="serverless-workflow-project",process_id="callbackstatetimeouts",process_state="Completed",version="1.0.0-SNAPSHOT",} 3.0
----
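
The sample values used in this document are consistent with a simple relationship between the three instance metrics: 7 started minus 3 completed leaves 4 running. The Python sketch below expresses that check. It is an illustration only, and it assumes that all three series cover the same period (for example, no restart in between) and that the completed counter is summed over every `process_state` value. The function name `expected_running` is hypothetical.

```python
def expected_running(started_total, completed_by_state):
    """Started instances minus those that reached a terminal status."""
    return started_total - sum(completed_by_state.values())


# Values taken from the sample outputs in this document.
started = 7.0
completed = {"Completed": 3.0}  # an "Aborted" series, if present, would be added here
running = expected_running(started, completed)
print(running)
```

Instances in the `Error` state stay in the running count because they have not reached a terminal status.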

=== kogito_process_instance_error
Records the number of errors that have occurred per processId and error, including the error message.

[source, yaml]
----
# HELP kogito_process_instance_error Number of errors that has occurred
# TYPE kogito_process_instance_error counter
----

=== kogito_process_instance_duration_seconds
Records the duration of a workflow instance that has reached a terminal state, i.e. `Aborted` or `Completed`. This metric is registered when the process reaches the terminal state.

[source, yaml]
----
# HELP kogito_process_instance_duration_seconds_max Process Instances Duration
# TYPE kogito_process_instance_duration_seconds_max gauge
kogito_process_instance_duration_seconds_max{app_id="sonataflow-process-monitoring-listener",artifactId="serverless-workflow-project",process_id="callbackstatetimeouts",version="1.0.0-SNAPSHOT",} 30.0


# HELP kogito_process_instance_duration_seconds Process Instances Duration
# TYPE kogito_process_instance_duration_seconds summary
kogito_process_instance_duration_seconds_count{app_id="sonataflow-process-monitoring-listener",artifactId="serverless-workflow-project",process_id="callbackstatetimeouts",version="1.0.0-SNAPSHOT",} 3.0
kogito_process_instance_duration_seconds_sum{app_id="sonataflow-process-monitoring-listener",artifactId="serverless-workflow-project",process_id="callbackstatetimeouts",version="1.0.0-SNAPSHOT",} 90.0
----
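
A summary exposes a `_sum` and a `_count` series, so the average duration can be derived by dividing one by the other. The short Python sketch below does this for the sample values above. The helper name `mean_duration` is an assumption made for illustration.

```python
def mean_duration(sum_value, count_value):
    """Average observation of a summary, or None when nothing was recorded."""
    if count_value == 0:
        return None
    return sum_value / count_value


# kogito_process_instance_duration_seconds_sum and _count from the sample above.
duration_sum = 90.0
duration_count = 3.0
average = mean_duration(duration_sum, duration_count)
print(average)
```

With the sample values, the three completed instances took 30 seconds each on average.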

=== kogito_node_instance_duration_milliseconds
Records the duration of the execution for nodes “relevant” to the workflows. The metric is calculated when a given node has finished executing.

[source, yaml]
----
# HELP kogito_node_instance_duration_milliseconds_max Relevant nodes duration in milliseconds
# TYPE kogito_node_instance_duration_milliseconds_max gauge
kogito_node_instance_duration_milliseconds_max{artifactId="serverless-workflow-project",node_name="CallbackState",process_id="callbackstatetimeouts",version="1.0.0-SNAPSHOT",} 30014.0


# HELP kogito_node_instance_duration_milliseconds Relevant nodes duration in milliseconds
# TYPE kogito_node_instance_duration_milliseconds summary
kogito_node_instance_duration_milliseconds_count{artifactId="serverless-workflow-project",node_name="CallbackState",process_id="callbackstatetimeouts",version="1.0.0-SNAPSHOT",} 3.0
kogito_node_instance_duration_milliseconds_sum{artifactId="serverless-workflow-project",node_name="CallbackState",process_id="callbackstatetimeouts",version="1.0.0-SNAPSHOT",} 90128.0
----

=== sonataflow_input_parameters_counter_total

Records the occurrences of <"param_name", "param_value"> per processId.

[NOTE]
====
Parameters that are JSON objects or arrays are flattened.
====

[source, yaml]
----
# HELP sonataflow_input_parameters_counter_total Input parameters
# TYPE sonataflow_input_parameters_counter_total counter
sonataflow_input_parameters_counter_total{app_id="sonataflow-process-monitoring-listener",artifactId="serverless-workflow-project",param_name="name",param_value="walter",process_id="callbackstatetimeouts",version="1.0.0-SNAPSHOT",} 1.0
sonataflow_input_parameters_counter_total{app_id="sonataflow-process-monitoring-listener",artifactId="serverless-workflow-project",param_name="surname.sur1",param_value="Medvedeo",process_id="callbackstatetimeouts",version="1.0.0-SNAPSHOT",} 1.0
sonataflow_input_parameters_counter_total{app_id="sonataflow-process-monitoring-listener",artifactId="serverless-workflow-project",param_name="name",param_value="bob",process_id="callbackstatetimeouts",version="1.0.0-SNAPSHOT",} 5.0
sonataflow_input_parameters_counter_total{app_id="sonataflow-process-monitoring-listener",artifactId="serverless-workflow-project",param_name="surname",param_value="esponja",process_id="callbackstatetimeouts",version="1.0.0-SNAPSHOT",} 5.0
----
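
To illustrate the flattening, the Python sketch below turns nested input data into dotted `param_name` keys such as `surname.sur1` and counts each pair. This is not the {product_name} implementation. In particular, the `[index]` naming for array elements is an assumption, since the sample output above only shows a nested object, and the `flatten` helper is hypothetical.

```python
from collections import Counter


def flatten(value, prefix=""):
    """Yield (param_name, param_value) pairs for nested input data."""
    if isinstance(value, dict):
        for key, child in value.items():
            yield from flatten(child, f"{prefix}.{key}" if prefix else key)
    elif isinstance(value, list):
        # Assumption: array elements are indexed; the real naming may differ.
        for index, child in enumerate(value):
            yield from flatten(child, f"{prefix}[{index}]")
    else:
        yield prefix, str(value)


# Count each <param_name, param_value> pair across two workflow inputs.
counter = Counter()
for workflow_input in (
    {"name": "walter", "surname": {"sur1": "Medvedeo"}},
    {"name": "bob", "surname": "esponja"},
):
    counter.update(flatten(workflow_input))

for (param_name, param_value), occurrences in counter.items():
    print(param_name, param_value, occurrences)
```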
16 changes: 16 additions & 0 deletions serverlessworkflow/modules/ROOT/pages/cloud/index.adoc
@@ -128,6 +128,22 @@ xref:cloud/operator/using-persistence.adoc[]
Learn how to define the workflow `Persistence` field to allow the workflow to store its context
--

[.card]
--
[.card-title]
xref:cloud/operator/monitoring-workflows.adoc[]
[.card-description]
Learn how to configure Prometheus, Grafana and Grafana Dashboard for monitoring of workflow instances
--

[.card]
--
[.card-title]
xref:cloud/operator/sonataflow-metrics.adoc[]
[.card-description]
Learn about the Prometheus metrics available for workflow monitoring
--

[.card]
--
[.card-title]