issue-465 Create a documentation section to use Grafana DataSource with SonataFlow Prometheus metrics #693
dd9e4a9
to
1537dee
Compare
@jianrongzhang89 can you please take a look at CI?
3250a02
to
d95b765
Compare
🎊 PR Preview ea1866d has been successfully built and deployed. See the documentation preview: https://sonataflow-docs-preview-pr-693.surge.sh |
@ricardozanini fixed CI errors. |
…ith SonataFlow Prometheus metrics
d95b765
to
79a31d8
Compare
image::cloud/operator/monitoring/grafana-dashboard-example.png[] | ||
|
||
=== Customize or build your own dashboard | ||
You can customize or build your own dashboard. For more information, see xref:https://grafana.com/docs/grafana/latest/dashboards[Grafana Dashboards] and xref:cloud/operator/sonataflow-metrics.adoc[SonataFlow Metrics]. |
This link does not work: xref:https://grafana.com/docs/grafana/latest/dashboards[Grafana Dashboards].
I think that for external URLs you must use the `link:` macro instead of `xref:`.
Fixed.
== Additional resources | ||
|
||
* xref:cloud/operator/sonataflow-metrics.adoc[SonataFlow Metrics] | ||
* xref:https://grafana.com/docs/grafana/latest/dashboards[Grafana Dashboards] |
Same here: the link does not work.
Fixed
secureJsonData: | ||
httpHeaderValue1: 'Bearer ${TOKEN}' | ||
name: Prometheus | ||
url: https://thanos-querier.openshift-monitoring.svc.cluster.local:9091 |
So, access to metrics is in the end "protected", and can be accessed only if we give the cluster-monitoring-view to grafana, right?
That's correct: the metrics are exposed to Prometheus, but Grafana cannot read them unless it is granted the `cluster-monitoring-view` role.
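To make the token-based access concrete, here is a minimal Python sketch of what the data source does on each query: it sends the `Authorization: Bearer ...` header to the Thanos querier. The `/api/v1/query` path is the standard Prometheus HTTP query endpoint; the helper name `build_query_request` and the placeholder token are assumptions for illustration and are not part of this PR.

```python
import urllib.parse
import urllib.request


def build_query_request(base_url, token, promql):
    """Build (but do not send) an authenticated instant-query request.

    base_url: querier URL, e.g. the thanos-querier service URL from the diff.
    token:    service account token that carries cluster-monitoring-view.
    promql:   PromQL expression, e.g. a SonataFlow metric name.
    """
    query = urllib.parse.urlencode({"query": promql})
    request = urllib.request.Request(f"{base_url}/api/v1/query?{query}")
    # Same header that the data source sets through httpHeaderValue1.
    request.add_header("Authorization", f"Bearer {token}")
    return request


# Hypothetical usage: the token value is a placeholder, not a real credential.
example_request = build_query_request(
    "https://thanos-querier.openshift-monitoring.svc.cluster.local:9091",
    "PLACEHOLDER_TOKEN",
    "kogito_process_instance_started_total",
)
```

Sending the request with `urllib.request.urlopen(example_request)` would only work from inside the cluster, with the cluster CA trusted and a token whose service account has the `cluster-monitoring-view` role; without that role the querier is expected to reject the call.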
@@ -0,0 +1,99 @@ | |||
= SonataFlow Metrics |
I think this page is not shown in the Cloud -> Operator menu or in any other menu entry.
On the other hand, we already have a metrics page of sorts that lists the metrics, and I'm personally not 100% happy with it.
I think we should merge this metrics content with what is shown on the page below and provide something good.
But that is outside @jianrongzhang89's scope.
Feel free to merge as is so we don't lose this content; we can restructure it in a follow-up PR.
@ricardozanini @domhanak
@ricardozanini @wmedvede @domhanak I tried to merge them. Please review.
Would you mind checking the procedure for regular Kubernetes clusters? @domhanak
eb1bac5
to
4892c25
Compare
@@ -0,0 +1,134 @@ | |||
== Overview |
@jianrongzhang89 this common sonataflow_metrics document is great, many thanks.
My only observation is that the order in which the metrics appear in the document is not the same as the order in the initial paragraph, which roughly corresponds to the workflow's "natural" life cycle.
Good suggestion. Done!
|
||
== Metrics Description | ||
=== kogito_process_instance_completed_total | ||
Workflow instances that have reached a terminal status, “Aborted” or “Completed”, and thus are considered as completed. |
Workflow instances that have reached a terminal status, “Aborted” or “Completed”, and thus are considered as completed. | |
Workflow instances that have reached a terminal status, `Aborted` or `Completed`, and thus are considered as completed. |
Accepted
|
||
[NOTE] | ||
==== | ||
These are the only two terminal status. The “Error” state is not terminal. |
These are the only two terminal status. The “Error” state is not terminal. | |
These are the only two terminal status. The `Error` state is not terminal. |
Accepted
|
||
[NOTE] | ||
==== | ||
This includes workflow instances that are in the "Error" state, since the error state is not a terminal state. |
This includes workflow instances that are in the "Error" state, since the error state is not a terminal state. | |
This includes workflow instances that are in the `Error` state, since the error state is not a terminal state. |
Accepted
[NOTE] | ||
==== | ||
This includes workflow instances that are in the "Error" state, since the error state is not a terminal state. | ||
Process instances that have reached a terminal status, i.e. "Completed" or "Aborted", are not present in this metric. |
Process instances that have reached a terminal status, i.e. "Completed" or "Aborted", are not present in this metric. | |
Process instances that have reached a terminal status, i.e. `Completed` or `Aborted`, are not present in this metric. |
Accepted
---- | ||
|
||
=== kogito_process_instance_duration_seconds | ||
Calculates duration of a workflow instance that has reached a terminal state,, i.e. "Aborted" or "Completed". This metric is registered when the process reaches the terminal state. |
Calculates duration of a workflow instance that has reached a terminal state,, i.e. "Aborted" or "Completed". This metric is registered when the process reaches the terminal state. | |
Calculates duration of a workflow instance that has reached a terminal state, i.e. `Aborted` or `Completed`. This metric is registered when the process reaches the terminal state. |
Accepted
= Monitoring Workflows | ||
:compat-mode!: | ||
// Metadata: | ||
:description: Workflows monitoring configuration configuration |
:description: Workflows monitoring configuration configuration | |
:description: Workflows monitoring configuration |
Accepted
Many thanks, @jianrongzhang89. This documentation seems good. Thanks, @wmedvede, for verifying the steps in the cluster!
@kaldesai mind taking a look too? |
4892c25
to
c406023
Compare
Hi @jianrongzhang89, I couldn't resist adding a few more nitpicks when re-reading 😄
|
||
In {product_name}, you can check the following metrics: | ||
|
||
* `kogito_process_instance_started_total`: Number of started workflows (a workflow that has started might be running or completed) |
* `kogito_process_instance_started_total`: Number of started workflows (a workflow that has started might be running or completed) | |
* `kogito_process_instance_started_total`: Number of started workflows. |
ok
|
||
* `kogito_process_instance_started_total`: Number of started workflows (a workflow that has started might be running or completed) | ||
* `kogito_process_instance_running_total`: Number of running workflows | ||
* `kogito_process_instance_completed_total`: Number of completed workflows |
* `kogito_process_instance_completed_total`: Number of completed workflows | |
* `kogito_process_instance_completed_total`: Number of completed workflows. |
ok
* `kogito_process_instance_started_total`: Number of started workflows (a workflow that has started might be running or completed) | ||
* `kogito_process_instance_running_total`: Number of running workflows | ||
* `kogito_process_instance_completed_total`: Number of completed workflows | ||
* `kogito_process_instance_error`: Number of workflows that report an error ( a workflow with an error might be still running or have been completed) |
* `kogito_process_instance_error`: Number of workflows that report an error ( a workflow with an error might be still running or have been completed) | |
* `kogito_process_instance_error`: Number of workflows that report an error. |
* `kogito_process_instance_running_total`: Number of running workflows | ||
* `kogito_process_instance_completed_total`: Number of completed workflows | ||
* `kogito_process_instance_error`: Number of workflows that report an error ( a workflow with an error might be still running or have been completed) | ||
* `kogito_process_instance_duration_seconds`: Duration of a process instance in seconds |
* `kogito_process_instance_duration_seconds`: Duration of a process instance in seconds | |
* `kogito_process_instance_duration_seconds`: Duration of a workflow instance in seconds. |
done
* `kogito_process_instance_completed_total`: Number of completed workflows | ||
* `kogito_process_instance_error`: Number of workflows that report an error ( a workflow with an error might be still running or have been completed) | ||
* `kogito_process_instance_duration_seconds`: Duration of a process instance in seconds | ||
* `kogito_node_instance_duration_milliseconds`: Duration of relevant nodes in milliseconds (a workflow is composed by nodes, user might be interested on the time consumed by an specific node type) |
* `kogito_node_instance_duration_milliseconds`: Duration of relevant nodes in milliseconds (a workflow is composed by nodes, user might be interested on the time consumed by an specific node type) | |
* `kogito_node_instance_duration_milliseconds`: Duration of relevant nodes in milliseconds. |
ok
|
||
[NOTE] | ||
==== | ||
Internally, workflows are referred as processes. Therefore, the `processId` and `processName` is workflow ID and name respectively. |
Internally, workflows are referred as processes. Therefore, the `processId` and `processName` is workflow ID and name respectively. | |
Internally, workflows are referred as processes. Therefore, the `processId` and `processName` are workflow id and name respectively. |
ok
Internally, workflows are referred as processes. Therefore, the `processId` and `processName` is workflow ID and name respectively. | ||
==== | ||
|
||
Each of the metrics mentioned previously contains a label for a specific workflow ID. For example, the `kogito_process_instance_completed_total` metric below contains the labels for `callbackstatetimeouts` workflow: |
Each of the metrics mentioned previously contains a label for a specific workflow ID. For example, the `kogito_process_instance_completed_total` metric below contains the labels for `callbackstatetimeouts` workflow: | |
Each of the metrics mentioned previously contains a label for a specific workflow id. For example, the `kogito_process_instance_completed_total` metric below contains the labels for `callbackstatetimeouts` workflow: |
ok
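As a rough illustration of the per-workflow labels discussed in this thread, the following Python sketch parses samples in the Prometheus text exposition format and sums one metric for a single workflow. The label name `processId` follows the note quoted above; the sample values, the `otherworkflow` entry, and the helper names are assumptions made up for the example, and the actual label set emitted by a workflow may differ.

```python
import re

# One sample line of the Prometheus text exposition format:
#   metric_name{label="value",...} 12.0
SAMPLE_RE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(?P<labels>.*)\})?\s+(?P<value>\S+)'
)
LABEL_RE = re.compile(r'(\w+)="((?:[^"\\]|\\.)*)"')


def parse_sample(line):
    """Return (metric name, labels dict, value), or None if the line is not a sample."""
    match = SAMPLE_RE.match(line.strip())
    if match is None:
        return None
    labels = dict(LABEL_RE.findall(match.group("labels") or ""))
    return match.group("name"), labels, float(match.group("value"))


def total_for_workflow(text, metric, process_id):
    """Sum every sample of `metric` whose processId label equals `process_id`."""
    total = 0.0
    for line in text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        parsed = parse_sample(line)
        if parsed is None:
            continue
        name, labels, value = parsed
        if name == metric and labels.get("processId") == process_id:
            total += value
    return total


# Made-up scrape output; values and the second workflow are hypothetical.
EXAMPLE = """
# TYPE kogito_process_instance_completed_total counter
kogito_process_instance_completed_total{processId="callbackstatetimeouts"} 4.0
kogito_process_instance_completed_total{processId="otherworkflow"} 7.0
"""

completed = total_for_workflow(
    EXAMPLE, "kogito_process_instance_completed_total", "callbackstatetimeouts"
)
```

In Grafana the same filtering would normally be expressed as a label matcher in the panel query rather than in client code; the sketch only shows how the label ties a sample to one workflow.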
---- | ||
|
||
=== kogito_process_instance_duration_seconds | ||
Calculates duration of a workflow instance that has reached a terminal state,, i.e. `Aborted` or `Completed`. This metric is registered when the process reaches the terminal state. |
Calculates duration of a workflow instance that has reached a terminal state,, i.e. `Aborted` or `Completed`. This metric is registered when the process reaches the terminal state. | |
Calculates duration of a workflow instance that has reached a terminal state, i.e. `Aborted` or `Completed`. This metric is registered when the process reaches the terminal state. |
ok
* `kogito_process_instance_error`: Number of workflows that report an error ( a workflow with an error might be still running or have been completed) | ||
* `kogito_process_instance_duration_seconds`: Duration of a process instance in seconds | ||
* `kogito_node_instance_duration_milliseconds`: Duration of relevant nodes in milliseconds (a workflow is composed by nodes, user might be interested on the time consumed by an specific node type) | ||
* `sonataflow_input_parameters_counter`: Records input parameters, the occurrences of <"param_name","param_value"> per `processId`. |
* `sonataflow_input_parameters_counter`: Records input parameters, the occurrences of <"param_name","param_value"> per `processId`. | |
* `sonataflow_input_parameters_counter_total`: Records input parameters, the occurrences of <"param_name","param_value"> per `processId`. |
ok
…ith SonataFlow Prometheus metrics: address review comments
c406023
to
1a39e31
Compare
Fix apache/incubator-kie-kogito-serverless-operator#465
Update the document to include Prometheus and Grafana installation, Grafana Data Source configuration, and importing the default dashboard.