Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add taskStatus dimension to service/heartbeat metric #17488

Open
wants to merge 10 commits into
base: master
Choose a base branch
from

Conversation

zachjsh
Copy link
Contributor

@zachjsh zachjsh commented Nov 18, 2024

Description

Added a taskStatus dimension to the service/heartbeat metric. This can be used to detect cases when a particular task has been reporting its heartbeat in a particular state for an unusually long time; for example if a streaming task is stuck in paused state. Only tasks derived from SeekableStreamIndexTask report their taskStatus with this metric for now, other task types do not.

This PR has:

  • been self-reviewed.
  • added documentation for new or modified features or behaviors.
  • a release note entry in the PR description.
  • added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
  • added or updated version, license, or notice information in licenses.yaml
  • added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader.
  • added unit tests or modified existing tests to cover new code paths, ensuring the threshold for code coverage is met.
  • added integration tests.
  • been tested in a test Druid cluster.

Copy link
Contributor

@kfaraz kfaraz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left some suggestions, similar to what had been suggested in an older PR #17268 (comment).

Comment on lines +394 to +395
|Metric|Description| Dimensions |Normal value|
|------|-----------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|------------|
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please revert these formatting changes.

* @return The status of the task. Note: this interface method is unstable at this time.
*/
@Nullable
default String getStatus()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rather than adding a getStatus() method here, I would prefer it if we just cast to SeekableStreamTask in CliPeon.

The Task.run() method runs the task and returns the final status.
We shouldn't have or need access to the status of the task while the task is running.

Moreover, the status being returned here is not the same as the task status.

@@ -308,6 +308,9 @@ public Supplier<Map<String, Object>> heartbeatDimensions(Task task)
builder.put(DruidMetrics.DATASOURCE, task.getDataSource());
builder.put(DruidMetrics.TASK_TYPE, task.getType());
builder.put(DruidMetrics.GROUP_ID, task.getGroupId());
if (task.getStatus() != null) {
builder.put(DruidMetrics.TASK_STATUS, task.getStatus());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we should use this dimension as it corresponds to the final status of a task which only takes values SUCCESS or FAILURE. We could use DruidMetrics.STATUS or some other dimension instead.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants