Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Issue #525: Add job duration metrics to connectors #528

Merged

Conversation

MedMaalej
Copy link
Contributor

* Compute job duration in validateConnectorDetectionCriteria
* Remove monitorType('none') from surrounding strategy job duration metric
* Add documentation for enableSelfMonitoring in configure-monitoring.md
* Add unit tests

* Compute job duration in validateConnectorDetectionCriteria
* Remove monitorType('none') from surrounding strategy job duration metric
* Add documentation for enableSelfMonitoring in configure-monitoring.md
* Add unit tests
* Simplify strategy name matching code in hasExpectedJobTypes
@MedMaalej MedMaalej added the enhancement New feature or request label Dec 27, 2024
@MedMaalej MedMaalej self-assigned this Dec 27, 2024
@MedMaalej
Copy link
Contributor Author

I tested this ticket on the host emc-vnxe.

Before the new development

We were not able to have job duration metrics in the connector monitors.

beforeChange

The logs of metricshub reported that 53 monitors were discovered.

beforeLog

After the new development

We are able to have job duration metrics in the connector monitors.

newFeature

new_feature2

The logs of metricshub also reported that 53 monitors were discovered ✔️

afterLog

Non regression test

The other job duration metrics were not impacted and they still show up correctly.

Non-regression1

Grafana test

The monitor type connector is now displayed in the drop-down list values

Capture d'écran 2024-12-27 155755
Capture d'écran 2024-12-27 155744

Surrounding strategy job duration metric test

Metricshub logs display the job duration metric of the beforeAll and afterAll strategies without a monitor type

"metricshub.job.duration{job.type=\"beforeAll\", connector_id=\"EMCuemcli\"}" : {
        "name" : "metricshub.job.duration{job.type=\"beforeAll\", connector_id=\"EMCuemcli\"}",
        "collectTime" : 1735310474120,
        "previousCollectTime" : 1735310400078,
        "attributes" : {
          "job.type" : "beforeAll",
          "connector_id" : "EMCuemcli"
        },
        "resetMetricTime" : false,
        "value" : 0.955,
        "previousValue" : 0.945,
        "type" : "NumberMetric",
        "updated" : true
} 

@NassimBtk NassimBtk requested a review from iguitton December 27, 2024 17:37
@NassimBtk NassimBtk added the documentation Improvements or additions to documentation label Dec 27, 2024
@iguitton
Copy link
Contributor

iguitton commented Dec 30, 2024

@MedMaalej , @NassimBtk : I suggest documenting the self-monitoring feature in a dedicated markdown file that will be placed under a Troubleshooting section (like we did for the hws doc: https://www.aws-dev.sentrysoftware.com/docs/hws-doc/latest/troubleshooting/self-monitoring.html). If you both agree, I will move forward and make the required changes. I can perform this task through #529

…ure/issue-525-Add-job-duration-metrics-to-connectors
@NassimBtk
Copy link
Member

@MedMaalej , @NassimBtk : I suggest documenting the self-monitoring feature in a dedicated markdown file that will be placed under a Troubleshooting section (like we did for the hws doc: https://www.aws-dev.sentrysoftware.com/docs/hws-doc/latest/troubleshooting/self-monitoring.html). If you both agree, I will move forward and make the required changes. I can perform this task through #529

Yes, I agree with you @iguitton showing self-monitoring in the Troubleshooting section is an excellent idea. We could showcase a real use case there and include graphs to highlight jobs that run slowly and might affect collection performance. However, I would still keep the instructions for disabling and re-enabling the option in configure-monitoring.md, since each variable in metricshub.yaml is documented there. We shouldn’t add too much detail in this section; instead, we can link from self-monitoring.md to configure-monitoring.md#self-monitoring. What do you think, @iguitton?

Copy link
Member

@NassimBtk NassimBtk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review done.

case "simple":
return monitorJob instanceof SimpleMonitorJob && ((SimpleMonitorJob) monitorJob).getSimple() != null;
default:
return false;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
return false;
throw new IllegalArgumentException("Unknown strategy job name: " + strategyJobName);

.anyMatch(monitorJob -> {
switch (strategyJobName.toLowerCase()) {
case "discovery":
return monitorJob instanceof StandardMonitorJob && ((StandardMonitorJob) monitorJob).getDiscovery() != null;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
return monitorJob instanceof StandardMonitorJob && ((StandardMonitorJob) monitorJob).getDiscovery() != null;
return monitorJob instanceof StandardMonitorJob standardJob && standardJob.getDiscovery() != null;

case "discovery":
return monitorJob instanceof StandardMonitorJob && ((StandardMonitorJob) monitorJob).getDiscovery() != null;
case "collect":
return monitorJob instanceof StandardMonitorJob && ((StandardMonitorJob) monitorJob).getCollect() != null;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
return monitorJob instanceof StandardMonitorJob && ((StandardMonitorJob) monitorJob).getCollect() != null;
return monitorJob instanceof StandardMonitorJob standardJob && standardJob.getCollect() != null;

case "collect":
return monitorJob instanceof StandardMonitorJob && ((StandardMonitorJob) monitorJob).getCollect() != null;
case "simple":
return monitorJob instanceof SimpleMonitorJob && ((SimpleMonitorJob) monitorJob).getSimple() != null;
Copy link
Member

@NassimBtk NassimBtk Jan 2, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
return monitorJob instanceof SimpleMonitorJob && ((SimpleMonitorJob) monitorJob).getSimple() != null;
return monitorJob instanceof SimpleMonitorJob simpleJob && simpleJob.getSimple() != null;

collectJobDurationMetric(jobDurationMetricKey, startTime, endTime);
}
}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reduce code duplicate using a supplier:

	/**
	 * Sets the job duration metric in the host monitor with a monitor type.
	 *
	 * @param jobName      the name of the job
	 * @param monitorType  the monitor type in the job
	 * @param connectorId  the ID of the connector
	 * @param jobStartTime the start time of the job in milliseconds
	 * @param jobEndTime   the end time of the job in milliseconds
	 */
	protected void setJobDurationMetric(final String jobName, final String monitorType, final String connectorId, final long jobStartTime,
			final long jobEndTime) {
		setJobDurationMetric(() -> generateJobDurationMetricKey(jobName, monitorType, connectorId), jobStartTime, jobEndTime);
	}

	/**
	 * Sets the job duration metric in the host monitor with a monitor type.
	 *
	 * @param jobName      the name of the job
	 * @param connectorId  the ID of the connector
	 * @param jobStartTime the start time of the job in milliseconds
	 * @param jobEndTime   the end time of the job in milliseconds
	 */
	protected void setJobDurationMetric(final String jobName, final String connectorId, final long jobStartTime,
			final long jobEndTime) {
		setJobDurationMetric(() -> generateJobDurationMetricKey(jobName, connectorId), jobStartTime, jobEndTime);
	}

	/**
	 * Sets the job duration metric in the host monitor with a monitor type.
	 *
	 * @param metricKeySupplier the supplier of the metric key
	 * @param startTime the start time of the job in milliseconds
	 * @param endTime   the end time of the job in milliseconds
	 */
	private void setJobDurationMetric(
		final Supplier<String> metricKeySupplier,
		final long startTime,
		final long endTime
	) {
		// If the enableSelfMonitoring flag is set to true, or it's not configured at all,
		// set the job duration metric on the monitor. Otherwise, don't set it.
		// By default, self monitoring is enabled
		if (telemetryManager.getHostConfiguration().isEnableSelfMonitoring()) {
			// Build the job duration metric key
			final String jobDurationMetricKey = metricKeySupplier.get();
			// Collect the job duration metric
			collectJobDurationMetric(jobDurationMetricKey, startTime, endTime);
		}
	}

	/**
	 * Generates the job duration metric key.
	 * @param jobName     the name of the job
	 * @param monitorType the monitor type
	 * @param connectorId the ID of the connector
	 * @return the job duration metric key.
	 */
	private String generateJobDurationMetricKey(final String jobName, final String monitorType, final String connectorId) {
		return new StringBuilder()
			.append("metricshub.job.duration{job.type=\"")
			.append(jobName)
			.append("\", monitor.type=\"")
			.append(monitorType)
			.append("\", connector_id=\"")
			.append(connectorId)
			.append("\"}")
			.toString();
	}

	/**
	 * Generate the job duration metric key.
	 * @param jobName      the name of the job
	 * @param connectorId  the ID of the
	 * @return the job duration metric key.
	 */
	private String generateJobDurationMetricKey(final String jobName, final String connectorId) {
		return new StringBuilder()
			.append("metricshub.job.duration{job.type=\"")
			.append(jobName)
			.append("\", connector_id=\"")
			.append(connectorId)
			.append("\"}")
			.toString();
	}

	/**
	 * Collects and records the job duration metric.
	 *
	 * @param jobDurationMetricKey the key identifying the job duration metric
	 * @param startTime the start time of the job in milliseconds
	 * @param endTime the end time of the job in milliseconds
	 */
	private void collectJobDurationMetric(final String jobDurationMetricKey, final long startTime, final long endTime) {
		final Monitor endpointHostMonitor = telemetryManager.getEndpointHostMonitor();
		final MetricFactory metricFactory = new MetricFactory();
		metricFactory.collectNumberMetric(
			endpointHostMonitor,
			jobDurationMetricKey,
			(endTime - startTime) / 1000.0, // Job duration in seconds
			strategyTime
		);
	}

* @param startTime the start time of the job in milliseconds
* @param endTime the end time of the job in milliseconds
*/
protected void setJobDurationMetricInHostMonitorWithoutMonitorType(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The method names (setJobDurationMetricInHostMonitorWithoutMonitorType and setJobDurationMetricInHostMonitorWithMonitorType) are overly verbose. setJobDurationMetric is already sufficiently descriptive, especially when supplemented by clear Javadoc that explains the method's behavior.

NassimBtk and others added 3 commits January 2, 2025 17:34
…s' of github.com:sentrysoftware/metricshub into feature/issue-525-Add-job-duration-metrics-to-connectors
@MedMaalej MedMaalej requested a review from NassimBtk January 2, 2025 17:40
@iguitton
Copy link
Contributor

iguitton commented Jan 3, 2025

@MedMaalej , @NassimBtk : I suggest documenting the self-monitoring feature in a dedicated markdown file that will be placed under a Troubleshooting section (like we did for the hws doc: https://www.aws-dev.sentrysoftware.com/docs/hws-doc/latest/troubleshooting/self-monitoring.html). If you both agree, I will move forward and make the required changes. I can perform this task through #529

Yes, I agree with you @iguitton showing self-monitoring in the Troubleshooting section is an excellent idea. We could showcase a real use case there and include graphs to highlight jobs that run slowly and might affect collection performance. However, I would still keep the instructions for disabling and re-enabling the option in configure-monitoring.md, since each variable in metricshub.yaml is documented there. We shouldn’t add too much detail in this section; instead, we can link from self-monitoring.md to configure-monitoring.md#self-monitoring. What do you think, @iguitton?

Sounds like a good idea to me @NassimBtk and @MedMaalej

@NassimBtk NassimBtk linked an issue Jan 3, 2025 that may be closed by this pull request
@iguitton
Copy link
Contributor

iguitton commented Jan 3, 2025

@MedMaalej , @NassimBtk : this new feature will be documented through #533. Feel free to merge the PR when dev is ready.

Copy link
Contributor

@iguitton iguitton left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Documentation will be performed through a different issue.

@NassimBtk NassimBtk merged commit c82942d into main Jan 3, 2025
2 checks passed
@NassimBtk NassimBtk deleted the feature/issue-525-Add-job-duration-metrics-to-connectors branch January 3, 2025 15:56
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
documentation Improvements or additions to documentation enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Ability to enable or disable self-monitoring
3 participants