Issue #525: Add job duration metrics to connectors #528

Merged
@@ -1209,6 +1209,81 @@ hw.status{state="degraded"} 1

In this case, only the `degraded` state is reported, and the zero values for `ok` and `failed` are suppressed after the initial state transition.

## Self-Monitoring

**MetricsHub** includes **self-monitoring capabilities** to track its own performance. This feature can monitor key aspects such as **job duration metrics**.

### Configuration: `enableSelfMonitoring`

This configuration controls whether **MetricsHub** reports internal signals such as job duration metrics.

#### Supported Values

- `true` (default): Enables self-monitoring capabilities.
- `false`: Disables self-monitoring capabilities.

#### Configuration Scopes

You can configure `enableSelfMonitoring` at the following levels:

1. **Global Configuration**
   Applies to all monitored resources.

```yaml
enableSelfMonitoring: true # Set to "false" to disable
resourceGroups: ...
```

2. **Per Resource Group**
   Applies to all resources within a specific group.

```yaml
resourceGroups:
  <resource-group-name>:
    enableSelfMonitoring: true # Set to "false" to disable
    resources: ...
```

3. **Per Resource**
   Applies to an individual resource.

```yaml
resourceGroups:
  <resource-group-name>:
    resources:
      <resource-id>:
        enableSelfMonitoring: true # Set to "false" to disable
```

### Examples of Self-Monitoring Metrics

When enabled, **MetricsHub** reports the `metricshub.job.duration` metrics, for example:

```
metricshub.job.duration{job.type="discovery", monitor.type="enclosure", connector_id="HPEGen10IloREST"} 0.020
metricshub.job.duration{job.type="discovery", monitor.type="cpu", connector_id="HPEGen10IloREST"} 0.030
metricshub.job.duration{job.type="discovery", monitor.type="temperature", connector_id="HPEGen10IloREST"} 0.025
metricshub.job.duration{job.type="discovery", monitor.type="connector", connector_id="HPEGen10IloREST"} 0.015
metricshub.job.duration{job.type="collect", monitor.type="cpu", connector_id="HPEGen10IloREST"} 0.015
```

Where:
- **`job.type`**: Specifies the type of operation performed by MetricsHub.
  - Possible values:
    - `discovery`: Identifies and registers components.
    - `collect`: Gathers telemetry data from the monitored components.
    - `simple`: Executes a single straightforward task.
    - `beforeAll` or `afterAll`: Runs preparatory or cleanup operations.
- **`monitor.type`**: Indicates the specific category of component being monitored.
  - Examples:
    - Hardware components like `cpu`, `memory`, `physical_disk`, or `disk_controller`.
    - Environmental metrics like `temperature` or `battery`.
    - Logical entities like `connector`.
- **`connector_id`**: The unique identifier of the connector defining the method and protocol to collect metrics for the specified component.
  - Example: `"HPEGen10IloREST"` denotes the HPE Gen10 iLO REST connector.

These metrics provide granular insights into task execution times, enabling the identification of bottlenecks or inefficiencies and helping optimize monitoring performance.
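
To make the label layout concrete, here is a minimal Java sketch of how one such sample could be assembled from a pair of timestamps. The class `JobDurationExample` and its two methods are hypothetical names used for illustration only and are not part of MetricsHub. The sketch assumes the key shape shown in the samples above and a duration reported in seconds, computed from millisecond timestamps.

```java
public class JobDurationExample {

	// Builds a key with the same label layout as the sample metrics above.
	// This helper is illustrative, not the MetricsHub implementation.
	static String metricKey(String jobType, String monitorType, String connectorId) {
		return String.format(
			"metricshub.job.duration{job.type=\"%s\", monitor.type=\"%s\", connector_id=\"%s\"}",
			jobType,
			monitorType,
			connectorId
		);
	}

	// Converts a start/end pair expressed in milliseconds to a duration in seconds.
	static double durationSeconds(long startMillis, long endMillis) {
		return (endMillis - startMillis) / 1000.0;
	}

	public static void main(String[] args) {
		long start = 1_000L;
		long end = 1_020L; // 20 ms after the start
		String key = metricKey("discovery", "enclosure", "HPEGen10IloREST");
		System.out.println(key + " " + durationSeconds(start, end));
	}
}
```

Comparing the value reported for the same `monitor.type` across several connectors is one way to spot which connector dominates the cycle time.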

#### Timeout, duration and period format

Timeouts, durations and periods are specified with the below format:
@@ -115,7 +115,7 @@ public class Connector implements Serializable {
private Map<String, Source> afterAll = new HashMap<>();

/**
- * Map of monitor jobs, where each key is the name of the monitor job and the value is its definition.
+ * Map of monitor jobs, where each key is the name of the monitor type and the value is the monitor job instance.
*/
@Default
private Map<String, MonitorJob> monitors = new LinkedHashMap<>();
@@ -94,11 +94,18 @@ protected AbstractAllAtOnceStrategy(
/**
* This method processes each connector
*
- * @param currentConnector
- * @param hostname
+ * @param currentConnector The current connector
+ * @param hostname The host name
*/
private void process(final Connector currentConnector, final String hostname) {
- if (!validateConnectorDetectionCriteria(currentConnector, hostname)) {
+ // Check whether the strategy job name matches at least one of the monitor jobs names of the current connector
+ final boolean connectorHasExpectedJobTypes = hasExpectedJobTypes(currentConnector, getJobName());
+ // If the connector doesn't define any monitor job that matches the given strategy job name, log a message then exit the current discovery or simple operation
+ if (!connectorHasExpectedJobTypes) {
+ log.debug("Connector doesn't define any monitor job of type {}.", getJobName());
+ return;
+ }
+ if (!validateConnectorDetectionCriteria(currentConnector, hostname, getJobName())) {
log.error(
"Hostname {} - The connector {} no longer matches the host. Stopping the connector's {} job.",
hostname,
@@ -255,13 +262,8 @@ private void processMonitorJob(

processSameTypeMonitors(currentConnector, mapping, monitorType, hostname, monitorJob);
final long jobEndTime = System.currentTimeMillis();
- setJobDurationMetricInHostMonitor(
- getJobName(),
- monitorType,
- currentConnector.getCompiledFilename(),
- jobStartTime,
- jobEndTime
- );
+ // Set the job duration metric in the host monitor
+ setJobDurationMetric(getJobName(), monitorType, currentConnector.getCompiledFilename(), jobStartTime, jobEndTime);
}

/**
@@ -45,6 +45,8 @@
import org.sentrysoftware.metricshub.engine.common.helpers.TextTableHelper;
import org.sentrysoftware.metricshub.engine.configuration.HostConfiguration;
import org.sentrysoftware.metricshub.engine.connector.model.Connector;
import org.sentrysoftware.metricshub.engine.connector.model.monitor.SimpleMonitorJob;
import org.sentrysoftware.metricshub.engine.connector.model.monitor.StandardMonitorJob;
import org.sentrysoftware.metricshub.engine.connector.model.monitor.task.source.Source;
import org.sentrysoftware.metricshub.engine.connector.model.monitor.task.source.compute.Compute;
import org.sentrysoftware.metricshub.engine.extension.ExtensionManager;
@@ -400,17 +402,59 @@ public long getStrategyTimeout() {
return telemetryManager.getHostConfiguration().getStrategyTimeout();
}

/**
* Determines if the given strategy job name matches any monitor job in the connector.
* Matching is case-insensitive and based on the job type and its components.
* Supported strategy job names:
* - "discovery": Matches a {@link StandardMonitorJob} with a non-null discovery component.
* - "collect": Matches a {@link StandardMonitorJob} with a non-null collect component.
* - "simple": Matches a {@link SimpleMonitorJob} with a non-null simple component.
*
* @param currentConnector the connector containing monitor jobs
* @param strategyJobName the strategy job name to check (case-insensitive)
* @return {@code true} if a monitor job matches the strategy, {@code false} otherwise
*/
protected boolean hasExpectedJobTypes(final Connector currentConnector, final String strategyJobName) {
if (currentConnector == null || currentConnector.getMonitors() == null) {
return false;
}

return currentConnector
.getMonitors()
.values()
.stream()
.anyMatch(monitorJob -> {
switch (strategyJobName.toLowerCase()) {
case "discovery":
return monitorJob instanceof StandardMonitorJob standardJob && standardJob.getDiscovery() != null;
case "collect":
return monitorJob instanceof StandardMonitorJob standardJob && standardJob.getCollect() != null;
case "simple":
return monitorJob instanceof SimpleMonitorJob simpleJob && simpleJob.getSimple() != null;
default:
throw new IllegalArgumentException("Unknown strategy job name: " + strategyJobName);
}
});
}

/**
* Validates the connector's detection criteria
*
* @param currentConnector Connector instance
* @param hostname Hostname
* @param jobName The strategy job name
* @return boolean representing the success of the tests
*/
- protected boolean validateConnectorDetectionCriteria(final Connector currentConnector, final String hostname) {
+ protected boolean validateConnectorDetectionCriteria(
+ final Connector currentConnector,
+ final String hostname,
+ final String jobName
+ ) {
if (currentConnector.getConnectorIdentity().getDetection() == null) {
return true;
}
// Track the connector detection criteria execution start time
final long jobStartTime = System.currentTimeMillis();

final ConnectorTestResult connectorTestResult = new ConnectorSelection(
telemetryManager,
@@ -419,6 +463,19 @@ protected boolean validateConnectorDetectionCriteria(final Connector currentConn
extensionManager
)
.runConnectorDetectionCriteria(currentConnector, hostname);

// Track the connector detection criteria execution end time
final long jobEndTime = System.currentTimeMillis();

// Set the job duration metric of the connector monitor in the host monitor
setJobDurationMetric(
jobName,
KnownMonitorType.CONNECTOR.getKey(),
currentConnector.getCompiledFilename(),
jobStartTime,
jobEndTime
);

final String connectorId = currentConnector.getCompiledFilename();
final Monitor monitor = telemetryManager.findMonitorByTypeAndId(
KnownMonitorType.CONNECTOR.getKey(),
@@ -508,43 +565,122 @@ public boolean isMonitorFiltered(final String monitorType) {
}

/**
- * Sets the job duration metric in the host monitor.
+ * Sets the job duration metric in the host monitor with a monitor type.
*
* @param jobName the name of the job
- * @param monitorType the type of monitor
+ * @param monitorType the monitor type in the job
* @param connectorId the ID of the connector
- * @param startTime the start time of the job in milliseconds
- * @param endTime the end time of the job in milliseconds
+ * @param jobStartTime the start time of the job in milliseconds
+ * @param jobEndTime the end time of the job in milliseconds
*/
- protected void setJobDurationMetricInHostMonitor(
+ protected void setJobDurationMetric(
final String jobName,
final String monitorType,
final String connectorId,
final long jobStartTime,
final long jobEndTime
) {
setJobDurationMetric(
() -> generateJobDurationMetricKey(jobName, monitorType, connectorId),
jobStartTime,
jobEndTime
);
}

Review comment from a project member:

Reduce code duplication using a supplier:

	/**
	 * Sets the job duration metric in the host monitor with a monitor type.
	 *
	 * @param jobName      the name of the job
	 * @param monitorType  the monitor type in the job
	 * @param connectorId  the ID of the connector
	 * @param jobStartTime the start time of the job in milliseconds
	 * @param jobEndTime   the end time of the job in milliseconds
	 */
	protected void setJobDurationMetric(final String jobName, final String monitorType, final String connectorId, final long jobStartTime,
			final long jobEndTime) {
		setJobDurationMetric(() -> generateJobDurationMetricKey(jobName, monitorType, connectorId), jobStartTime, jobEndTime);
	}

	/**
	 * Sets the job duration metric in the host monitor without a monitor type.
	 *
	 * @param jobName      the name of the job
	 * @param connectorId  the ID of the connector
	 * @param jobStartTime the start time of the job in milliseconds
	 * @param jobEndTime   the end time of the job in milliseconds
	 */
	protected void setJobDurationMetric(final String jobName, final String connectorId, final long jobStartTime,
			final long jobEndTime) {
		setJobDurationMetric(() -> generateJobDurationMetricKey(jobName, connectorId), jobStartTime, jobEndTime);
	}

	/**
	 * Sets the job duration metric in the host monitor using the supplied metric key.
	 *
	 * @param metricKeySupplier the supplier of the metric key
	 * @param startTime the start time of the job in milliseconds
	 * @param endTime   the end time of the job in milliseconds
	 */
	private void setJobDurationMetric(
		final Supplier<String> metricKeySupplier,
		final long startTime,
		final long endTime
	) {
		// If the enableSelfMonitoring flag is set to true, or it's not configured at all,
		// set the job duration metric on the monitor. Otherwise, don't set it.
		// By default, self monitoring is enabled
		if (telemetryManager.getHostConfiguration().isEnableSelfMonitoring()) {
			// Build the job duration metric key
			final String jobDurationMetricKey = metricKeySupplier.get();
			// Collect the job duration metric
			collectJobDurationMetric(jobDurationMetricKey, startTime, endTime);
		}
	}

	/**
	 * Generates the job duration metric key.
	 * @param jobName     the name of the job
	 * @param monitorType the monitor type
	 * @param connectorId the ID of the connector
	 * @return the job duration metric key.
	 */
	private String generateJobDurationMetricKey(final String jobName, final String monitorType, final String connectorId) {
		return new StringBuilder()
			.append("metricshub.job.duration{job.type=\"")
			.append(jobName)
			.append("\", monitor.type=\"")
			.append(monitorType)
			.append("\", connector_id=\"")
			.append(connectorId)
			.append("\"}")
			.toString();
	}

	/**
	 * Generates the job duration metric key without a monitor type.
	 * @param jobName      the name of the job
	 * @param connectorId  the ID of the connector
	 * @return the job duration metric key.
	 */
	private String generateJobDurationMetricKey(final String jobName, final String connectorId) {
		return new StringBuilder()
			.append("metricshub.job.duration{job.type=\"")
			.append(jobName)
			.append("\", connector_id=\"")
			.append(connectorId)
			.append("\"}")
			.toString();
	}

	/**
	 * Collects and records the job duration metric.
	 *
	 * @param jobDurationMetricKey the key identifying the job duration metric
	 * @param startTime the start time of the job in milliseconds
	 * @param endTime the end time of the job in milliseconds
	 */
	private void collectJobDurationMetric(final String jobDurationMetricKey, final long startTime, final long endTime) {
		final Monitor endpointHostMonitor = telemetryManager.getEndpointHostMonitor();
		final MetricFactory metricFactory = new MetricFactory();
		metricFactory.collectNumberMetric(
			endpointHostMonitor,
			jobDurationMetricKey,
			(endTime - startTime) / 1000.0, // Job duration in seconds
			strategyTime
		);
	}
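
The point of passing a `Supplier<String>` instead of a prebuilt key can be shown in isolation: the key is only built when the metric is actually going to be recorded. The sketch below is a standalone illustration under that assumption. `LazyKeyExample`, its `enabled` parameter, and the `keysBuilt` counter are hypothetical stand-ins and do not exist in MetricsHub.

```java
import java.util.function.Supplier;

public class LazyKeyExample {

	// Counts how many keys were actually built (for illustration only).
	static int keysBuilt = 0;

	// Hypothetical key builder following the key layout used in this change.
	static String buildKey(String jobName, String connectorId) {
		keysBuilt++;
		return "metricshub.job.duration{job.type=\"" + jobName + "\", connector_id=\"" + connectorId + "\"}";
	}

	// Returns the key when enabled; otherwise returns null without invoking the supplier.
	static String record(boolean enabled, Supplier<String> keySupplier) {
		return enabled ? keySupplier.get() : null;
	}

	public static void main(String[] args) {
		// Disabled: the lambda is never evaluated, so no key is built.
		record(false, () -> buildKey("collect", "HPEGen10IloREST"));
		System.out.println("Keys built while disabled: " + keysBuilt);

		// Enabled: the supplier runs exactly once.
		String key = record(true, () -> buildKey("collect", "HPEGen10IloREST"));
		System.out.println("Keys built after enabling: " + keysBuilt);
		System.out.println(key);
	}
}
```

With this shape, both public overloads reduce to a one-line delegation, and the self-monitoring check lives in a single place.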

/**
* Sets the job duration metric in the host monitor without a monitor type.
*
* @param jobName the name of the job
* @param connectorId the ID of the connector
* @param jobStartTime the start time of the job in milliseconds
* @param jobEndTime the end time of the job in milliseconds
*/
protected void setJobDurationMetric(
final String jobName,
final String connectorId,
final long jobStartTime,
final long jobEndTime
) {
setJobDurationMetric(() -> generateJobDurationMetricKey(jobName, connectorId), jobStartTime, jobEndTime);
}

/**
* Sets the job duration metric in the host monitor using the supplied metric key.
*
* @param metricKeySupplier the supplier of the metric key
* @param startTime the start time of the job in milliseconds
* @param endTime the end time of the job in milliseconds
*/
private void setJobDurationMetric(
final Supplier<String> metricKeySupplier,
final long startTime,
final long endTime
) {
// If the enableSelfMonitoring flag is set to true, or it's not configured at all,
// set the job duration metric on the monitor. Otherwise, don't set it.
// By default, self monitoring is enabled
if (telemetryManager.getHostConfiguration().isEnableSelfMonitoring()) {
final Monitor endpointHostMonitor = telemetryManager.getEndpointHostMonitor();
final MetricFactory metricFactory = new MetricFactory();
// Build the job duration metric key
final String jobDurationMetricKey = metricKeySupplier.get();
// Collect the job duration metric
- final String jobDurationMetricKey = new StringBuilder()
- .append("metricshub.job.duration{job.type=\"")
- .append(jobName)
- .append("\", monitor.type=\"")
- .append(monitorType)
- .append("\", connector_id=\"")
- .append(connectorId)
- .append("\"}")
- .toString();
- metricFactory.collectNumberMetric(
- endpointHostMonitor,
- jobDurationMetricKey,
- (endTime - startTime) / 1000.0, // Job duration in seconds
- strategyTime
- );
+ collectJobDurationMetric(jobDurationMetricKey, startTime, endTime);
}
}

/**
* Generates the job duration metric key.
* @param jobName the name of the job
* @param monitorType the monitor type
* @param connectorId the ID of the connector
* @return the job duration metric key.
*/
private String generateJobDurationMetricKey(
final String jobName,
final String monitorType,
final String connectorId
) {
return new StringBuilder()
.append("metricshub.job.duration{job.type=\"")
.append(jobName)
.append("\", monitor.type=\"")
.append(monitorType)
.append("\", connector_id=\"")
.append(connectorId)
.append("\"}")
.toString();
}

/**
* Generates the job duration metric key without a monitor type.
* @param jobName the name of the job
* @param connectorId the ID of the connector
* @return the job duration metric key.
*/
private String generateJobDurationMetricKey(final String jobName, final String connectorId) {
return new StringBuilder()
.append("metricshub.job.duration{job.type=\"")
.append(jobName)
.append("\", connector_id=\"")
.append(connectorId)
.append("\"}")
.toString();
}

/**
* Collects and records the job duration metric.
*
* @param jobDurationMetricKey the key identifying the job duration metric
* @param startTime the start time of the job in milliseconds
* @param endTime the end time of the job in milliseconds
*/
private void collectJobDurationMetric(final String jobDurationMetricKey, final long startTime, final long endTime) {
final Monitor endpointHostMonitor = telemetryManager.getEndpointHostMonitor();
final MetricFactory metricFactory = new MetricFactory();
metricFactory.collectNumberMetric(
endpointHostMonitor,
jobDurationMetricKey,
(endTime - startTime) / 1000.0, // Job duration in seconds
strategyTime
);
}
}