Modify metrics aggregator to use AggregatedExecution #479

kathy-t · 2024-01-08T15:24:18Z

Description
Corresponding PRs:

This PR modifies the metrics aggregator to use AggregatedExecution instead of Metrics as a result of the webservice changes in dockstore/dockstore#5778.

Also did some slight re-organizing to create a ValidationStatusAggregator, similar to the aggregators for the other metrics. There are no functional code changes there.

Note that builds will fail until I update the Dockstore webservice version to a tag containing the corresponding webservice changes.

Review Instructions
Build should pass.

Issue
SEAB-5943

Security
If there are any concerns that require extra attention from the security team, highlight them here.

Please make sure that you've checked the following before submitting your pull request. Thanks!

Check that you pass the basic style checks and unit tests by running mvn clean install in the project that you have modified (until https://ucsc-cgl.atlassian.net/browse/SEAB-5300 adds multi-module support properly)
Ensure that the PR targets the correct branch. Check the milestone or fix version of the ticket.
If you are changing dependencies, check with dependabot to ensure you are not introducing new high/critical vulnerabilities
If this PR is for a user-facing feature, create and link a documentation ticket for this feature (usually in the same milestone as the linked issue). Style points if you create a documentation PR directly and link that instead.

kathy-t · 2024-01-08T16:21:58Z

...gregator/src/main/java/io/dockstore/metricsaggregator/helper/ValidationStatusAggregator.java

This file can be skimmed. There are no functional code changes; I copied code from AggregationHelper into this new class for organization.

pom.xml

denis-yuen

Minor comments, but curious about updates

THIRD-PARTY-LICENSES.txt

denis-yuen · 2024-01-08T19:35:02Z

...regator/src/main/java/io/dockstore/metricsaggregator/client/cli/MetricsAggregatorClient.java

@@ -184,6 +186,7 @@ private void submitValidationData(MetricsAggregatorConfig config, ValidatorToolE
                    .validatorToolVersion(validatorVersion)
                    .isValid(isValid);
            validationExecution.setDateExecuted(dateExecuted);
+            validationExecution.setExecutionId(UUID.randomUUID().toString()); // No execution ID was provided by DNAstack, generate a random one


Should be a parameter? (i.e. to keep the execution id consistent between qa, staging, prod even if it is fake/arbitrary)

Note, a random execution id is also used for the tests below, but that is ok

Hmm, this command only uploads to one environment, so it's currently not possible to keep the execution ID consistent between QA, staging, and prod for this command.

I meant, would it make sense to provide it as a parameter? For example: java -jar target/metricsaggregator-*-SNAPSHOT.jar submit-validation-data --config my-custom-config --data <path-to-my-data-file> --validator MINIWDL --validatorVersion 1.0 --platform DNA_STACK --execution-id first-run-after-ai-omics-release

Ah I see, hmm yes I think that would work assuming that the file DNAstack provided us contains no duplicate combo of TRS ID and version, otherwise the duplicates will fail because the execution ID already exists for the TRS ID and version (and platform).

I think it's a safe assumption to make given that they provided us with a list of TRS IDs and versions that validated correctly with miniwdl (there's no point in duplicate TRS ID and version entries because it conveys the same info). I'll make the change so it's consistent between environments

I may be getting turned around. Could you refresh my memory as to how the execution ID enters the system, and is there a difference between validations and normal executions in light of the following?

> In order to efficiently find the execution to update, the S3 key must contain the execution ID. The file name in S3 was previously the current time in ms, but it's now the user-provided execution ID. This means that each file in the metrics S3 bucket contains only 1 execution, whereas it previously could contain multiple.

> could you refresh my memory as to how the execution id enters the system

In dockstore/dockstore#5778, the execution ID enters the system through the POST /api/ga4gh/v2/extended/{id}/versions/{version_id}/executions endpoint. In that endpoint's request body, users specify the executions that they want to submit, whether it's a list of workflow executions, a list of validation executions, and/or a list of taskExecutions sets. Each execution in each list of executions requires a user-provided ID that will become the file name in S3.

> is there a difference between validations and normal executions in light of
>
> In order to efficiently find the execution to update, the S3 key must contain the execution ID. The file name in S3 was previously the current time in ms, but it's now the user-provided execution ID. This means that each file in the metrics S3 bucket contains only 1 execution, whereas it previously could contain multiple.

There is no difference between validations and normal executions. Validations are a type of execution and are covered by the comment quoted above, i.e. they need a user-provided execution ID and they can be updated. There are three types of executions: workflow executions, validation executions, and task executions.

denis-yuen · 2024-01-08T20:03:25Z

...gregator/src/main/java/io/dockstore/metricsaggregator/helper/ValidationStatusAggregator.java

+        return Optional.of(new ValidationStatusMetric().validatorTools(newValidatorToolToValidatorInfo));
+    }
+
+    static Optional<ValidationExecution> getLatestValidationExecution(List<ValidationExecution> executions) {


These two methods look the same. Can ValidationExecution and ValidatorVersionInfo share an interface that has getDateExecuted?
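A minimal sketch of what that suggestion could look like. The `Dated` interface, the two `Fake...` records, and `getLatest` are hypothetical stand-ins, not the PR's code; the real ValidationExecution and ValidatorVersionInfo are API model classes, so whether they can implement a shared interface is an assumption. The sketch also assumes dateExecuted is an ISO 8601 UTC string.

```java
import java.time.Instant;
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

// Hypothetical shared interface exposing only the date accessor.
interface Dated {
    String getDateExecuted(); // assumed to be an ISO 8601 UTC timestamp
}

// Stand-ins for the two real classes, reduced to the one field that matters here.
record FakeValidationExecution(String dateExecuted) implements Dated {
    @Override
    public String getDateExecuted() {
        return dateExecuted;
    }
}

record FakeValidatorVersionInfo(String dateExecuted) implements Dated {
    @Override
    public String getDateExecuted() {
        return dateExecuted;
    }
}

final class LatestByDateSketch {
    // One generic method replaces two near-identical "get latest" methods.
    static <T extends Dated> Optional<T> getLatest(List<T> items) {
        return items.stream()
                .filter(item -> item.getDateExecuted() != null)
                .max(Comparator.comparing((T item) -> Instant.parse(item.getDateExecuted())));
    }

    public static void main(String[] args) {
        List<FakeValidationExecution> executions = List.of(
                new FakeValidationExecution("2024-01-01T00:00:00Z"),
                new FakeValidationExecution("2024-01-08T00:00:00Z"));
        System.out.println(getLatest(executions).map(Dated::getDateExecuted).orElse("none"));
    }
}
```

If the model classes are generated and cannot implement an interface, passing a date-extractor function to a single generic method would give the same de-duplication.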

coverbeck · 2024-01-09T23:17:32Z

...gregator/src/main/java/io/dockstore/metricsaggregator/helper/ValidationStatusAggregator.java

+     * @param executions
+     * @return
+     */
+    @SuppressWarnings("checkstyle:magicnumber")


Nit: I'd add a const for 100 and get rid of this suppression.
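For illustration, a sketch of the nit, assuming the literal 100 is a percentage multiplier (the surrounding method body is not shown in this thread, so that is a guess). `PassingRateSketch`, `PERCENT`, and `getPassingRate` are hypothetical names.

```java
final class PassingRateSketch {
    // A named constant replaces the literal 100, so the
    // @SuppressWarnings("checkstyle:magicnumber") annotation is no longer needed.
    private static final double PERCENT = 100.0;

    // Hypothetical helper: percentage of validation runs that were valid.
    static double getPassingRate(int numberOfValid, int numberOfRuns) {
        if (numberOfRuns == 0) {
            return 0.0; // avoid dividing by zero when there are no runs
        }
        return PERCENT * numberOfValid / numberOfRuns;
    }

    public static void main(String[] args) {
        System.out.println(getPassingRate(1, 4));
    }
}
```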

kathy-t · 2024-01-12T22:09:37Z

Re-requesting reviews due to the new changes in dockstore/dockstore#5778.

The new change is that the metrics aggregator now takes the newest execution if there are multiple executions with the same ID.
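A rough sketch of that rule, not the aggregator's actual code. `ExecutionStub` and its `timestamp` field are hypothetical; how the real aggregator decides which duplicate is "newest" is not stated in this thread, so a generic timestamp is assumed here.

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.Collection;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

// Hypothetical stand-in for an execution; timestamp decides which duplicate is newest.
record ExecutionStub(String executionId, Instant timestamp, String status) { }

final class NewestExecutionSketch {
    // Keeps one execution per ID, preferring the one with the later timestamp.
    static List<ExecutionStub> keepNewestPerId(Collection<ExecutionStub> executions) {
        Map<String, ExecutionStub> newestById = executions.stream()
                .collect(Collectors.toMap(
                        ExecutionStub::executionId,
                        Function.identity(),
                        (a, b) -> b.timestamp().isAfter(a.timestamp()) ? b : a,
                        LinkedHashMap::new)); // preserves first-seen order of IDs
        return new ArrayList<>(newestById.values());
    }

    public static void main(String[] args) {
        List<ExecutionStub> executions = List.of(
                new ExecutionStub("run-1", Instant.parse("2024-01-01T00:00:00Z"), "FAILED"),
                new ExecutionStub("run-2", Instant.parse("2024-01-02T00:00:00Z"), "SUCCESSFUL"),
                new ExecutionStub("run-1", Instant.parse("2024-01-03T00:00:00Z"), "SUCCESSFUL"));
        keepNewestPerId(executions).forEach(System.out::println);
    }
}
```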

kathy-t · 2024-01-15T14:57:47Z

metricsaggregator/src/main/java/io/dockstore/metricsaggregator/MetricsAggregatorS3Client.java

+            // Note: executions that were submitted to S3 prior to the existence of execution IDs don't have an execution ID.
+            // For the purposes of aggregation, generate one so that the execution is considered unique.


Important note: we have executions in S3 that were submitted prior to the existence of execution IDs, so they don't have one. The metrics aggregator generates a random one in this case because the executions are assumed to be unique. The aggregator does not send the object with the randomly generated execution ID back to S3.

Should I create a ticket to migrate previous executions in S3 to have an execution ID or do we just live with this work-around?

We could also wipe the old executions and submit anew with execution IDs.

Hmm, true, everything in the metrics S3 bucket is submitted by us and it would allow us to get rid of this special case. If we're okay with that, sounds like we can do the following steps:

1. Wipe the entire metrics bucket (everything in it was submitted by us).
2. Re-submit the DNAstack validation metrics using the submit-validation-data command.
3. Re-upload the tool tester AGC executions using the upload-results command, which uploads the executions in this directory. There aren't many, so a quick fix is to manually modify those files to include execution IDs.
4. Ingest Terra metrics.
5. Aggregate metrics.

@denis-yuen if we're okay with that, I'll take out that PR change that I commented on and add release notes steps to do the above. Thoughts?

I updated the PR so that we can wipe the metrics bucket and re-ingest the DNAstack validation metrics and tool tester AGC executions.

For the DNAstack validation metrics, I modified the metricsaggregator/scripts/format-dnastack-validation-data.sh script so that it takes a dateExecuted argument that is used by the executions submitted to Dockstore. This way, we can set the dateExecuted to the date of the first time we submitted the metrics.

For the tooltester AGC executions, I added a UUID as the execution ID for each execution, and I had to add a dateExecuted field because that's required by our schema now. Luckily, the date executed is the file name, so I just converted those epoch milliseconds to ISO date format.
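The epoch-milliseconds-to-ISO conversion described above can be sketched as follows. This is an illustration only: the files were edited manually in the PR, and `EpochFileNameSketch`, `toIsoDate`, and the `.json` extension are assumptions.

```java
import java.time.Instant;

final class EpochFileNameSketch {
    // Converts a file name such as "1704067200000.json" (epoch milliseconds)
    // into an ISO 8601 UTC timestamp suitable for a dateExecuted field.
    static String toIsoDate(String fileName) {
        int dot = fileName.indexOf('.');
        String stem = dot >= 0 ? fileName.substring(0, dot) : fileName;
        return Instant.ofEpochMilli(Long.parseLong(stem)).toString();
    }

    public static void main(String[] args) {
        System.out.println(toIsoDate("1704067200000.json")); // prints 2024-01-01T00:00:00Z
    }
}
```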

I will add the steps mentioned in my previous comment to the release checklist. @denis-yuen re-requesting your review

codecov · 2024-01-16T21:19:08Z

Codecov Report

Attention: 23 lines in your changes are missing coverage. Please review.

Comparison is base (2936d16) 52.79% compared to head (be583c1) 54.31%.

Files	Patch %	Lines
...saggregator/helper/ValidationStatusAggregator.java	85.71%	6 Missing and 7 partials ⚠️
...e/metricsaggregator/MetricsAggregatorS3Client.java	86.66%	6 Missing ⚠️
...re/metricsaggregator/helper/AggregationHelper.java	81.25%	1 Missing and 2 partials ⚠️
...csaggregator/helper/ExecutionStatusAggregator.java	80.00%	1 Missing ⚠️

Additional details and impacted files

@@              Coverage Diff              @@
##             develop     #479      +/-   ##
=============================================
+ Coverage      52.79%   54.31%   +1.52%     
- Complexity       248      267      +19     
=============================================
  Files             30       31       +1     
  Lines           1665     1725      +60     
  Branches         141      143       +2     
=============================================
+ Hits             879      937      +58     
- Misses           721      725       +4     
+ Partials          65       63       -2

Flag	Coverage Δ
metricsaggregator	`45.27% <87.76%> (+1.85%)`	⬆️
toolbackup	`41.27% <52.12%> (-1.49%)`	⬇️
tooltester	`32.23% <52.12%> (-1.17%)`	⬇️

denis-yuen · 2024-01-16T22:35:40Z

metricsaggregator/src/main/java/io/dockstore/metricsaggregator/MetricsAggregatorS3Client.java

+            // Note: executions that were submitted to S3 prior to the existence of execution IDs don't have an execution ID.
+            // For the purposes of aggregation, generate one so that the execution is considered unique.


We could also wipe the old executions and submit anew with execution IDs.

svonworl · 2024-01-17T02:56:55Z

...gregator/src/main/java/io/dockstore/metricsaggregator/helper/ValidationStatusAggregator.java

+        validatorToolToValidations.forEach((validatorTool, validatorToolExecutions) -> {
+            Optional<ValidationExecution> latestValidationExecution = getLatestValidationExecution(validatorToolExecutions);
+
+            if (latestValidationExecution.isPresent()) {


FWIW, no need to change this, but the isPresent/get in this block (and elsewhere) has the same risk as a null check followed by use of a nullable: if we call get on an Optional with no value, an exception gets thrown, and it's easy for someone to later remove the check and break the code. To eliminate that possibility, there's an ifPresent that accepts a consumer:
https://docs.oracle.com/en/java/javase/21/docs/api/java.base/java/util/Optional.html#ifPresent(java.util.function.Consumer)

With it, you can do something like:

getLatestValidationExecution(validatorToolExecutions).ifPresent(latestValidationExecution -> { /* some stuff */ });

There's also an ifPresentOrElse to implement both sides of a conditional.
https://docs.oracle.com/en/java/javase/21/docs/api/java.base/java/util/Optional.html#ifPresentOrElse(java.util.function.Consumer,java.lang.Runnable)
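A small self-contained comparison of the three styles mentioned above. `OptionalConsumerSketch` and `describe` are hypothetical names, and a plain `Optional<String>` stands in for `Optional<ValidationExecution>`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

final class OptionalConsumerSketch {
    static List<String> describe(Optional<String> latest) {
        List<String> log = new ArrayList<>();

        // Style 1: isPresent()/get(). Works, but get() throws NoSuchElementException
        // if someone later removes the check.
        if (latest.isPresent()) {
            log.add("checked: " + latest.get());
        }

        // Style 2: ifPresent(Consumer). The value is only reachable inside the lambda,
        // so there is no unchecked get() to break.
        latest.ifPresent(value -> log.add("ifPresent: " + value));

        // Style 3: ifPresentOrElse(Consumer, Runnable) covers both branches (Java 9+).
        latest.ifPresentOrElse(
                value -> log.add("ifPresentOrElse: " + value),
                () -> log.add("orElse: none"));

        return log;
    }

    public static void main(String[] args) {
        System.out.println(describe(Optional.of("x")));
        System.out.println(describe(Optional.empty()));
    }
}
```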

svonworl · 2024-01-17T03:06:48Z

metricsaggregator/src/main/java/io/dockstore/metricsaggregator/helper/AggregationHelper.java

+        // Set run metrics
+        Optional<ExecutionStatusMetric> aggregatedExecutionStatus = new ExecutionStatusAggregator().getAggregatedMetricFromMetricsList(aggregatedMetrics);
+        boolean containsRunMetrics = aggregatedExecutionStatus.isPresent();
+        if (aggregatedExecutionStatus.isPresent()) {


The data is such that if we can't aggregate an ExecutionStatusMetric, we can't aggregate these other things, either?

Yeah, it was previously like this. The idea is that execution status is a metric that we require from all run executions, so ExecutionStatusMetric is an aggregated metric that the webservice requires.

metricsaggregator/src/main/java/io/dockstore/metricsaggregator/helper/ExecutionAggregator.java

svonworl · 2024-01-17T03:31:26Z

metricsaggregator/src/main/java/io/dockstore/metricsaggregator/helper/ExecutionAggregator.java

 * @param <M> The aggregated metric from Metrics
 * @param <E> The execution metric from RunExecution
 */
-public interface RunExecutionAggregator<M, E> {
+public interface ExecutionAggregator<T, M, E> {


Would there be a benefit to bounding any of the generic types, here or elsewhere? (I don't know the answer.)

I bound T because it has to be a type of Execution, but the others are not boundable because they don't share a common type
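To illustrate the bounding, here is a sketch with stand-in types. Only the type parameter `T` is bounded, as described above. `SketchExecution`, `SketchRunExecution`, `SketchExecutionAggregator`, `StatusCountAggregator`, and both method names are hypothetical and are not the PR's actual interface.

```java
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.TreeMap;

// Stand-in for the common execution base type.
class SketchExecution {
    final String executionId;

    SketchExecution(String executionId) {
        this.executionId = executionId;
    }
}

// Stand-in for one concrete kind of execution.
class SketchRunExecution extends SketchExecution {
    final String status;

    SketchRunExecution(String executionId, String status) {
        super(executionId);
        this.status = status;
    }
}

// T is bounded because every input must be some kind of execution.
// M (aggregated metric) and E (per-execution metric) share no common supertype, so they stay unbounded.
interface SketchExecutionAggregator<T extends SketchExecution, M, E> {
    E getMetricFromExecution(T execution);

    Optional<M> getAggregatedMetricFromExecutions(List<T> executions);
}

// Example implementation: counts executions per status.
final class StatusCountAggregator
        implements SketchExecutionAggregator<SketchRunExecution, Map<String, Integer>, String> {
    @Override
    public String getMetricFromExecution(SketchRunExecution execution) {
        return execution.status;
    }

    @Override
    public Optional<Map<String, Integer>> getAggregatedMetricFromExecutions(List<SketchRunExecution> executions) {
        Map<String, Integer> counts = new TreeMap<>();
        for (SketchRunExecution execution : executions) {
            String status = getMetricFromExecution(execution);
            if (status != null) {
                counts.merge(status, 1, Integer::sum);
            }
        }
        return counts.isEmpty() ? Optional.empty() : Optional.of(counts);
    }
}
```

With the bound in place, an implementation declared over a non-execution type (for example `String`) fails to compile instead of failing at runtime.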

sonarcloud · 2024-01-17T21:22:19Z

Quality Gate failed

Failed conditions

0.0% Coverage on New Code (required ≥ 80%)

kathy-t added 2 commits January 6, 2024 21:07

Changes required for changes in webservice

d48070e

Forgot this

e3f6939

kathy-t self-assigned this Jan 8, 2024

kathy-t mentioned this pull request Jan 8, 2024

Add update execution metrics endpoint dockstore/dockstore#5778

Merged

9 tasks

kathy-t marked this pull request as ready for review January 8, 2024 15:59

kathy-t commented Jan 8, 2024

kathy-t requested review from denis-yuen, svonworl and coverbeck January 8, 2024 16:46

denis-yuen reviewed Jan 8, 2024

svonworl approved these changes Jan 8, 2024

coverbeck approved these changes Jan 9, 2024

kathy-t added 3 commits January 10, 2024 20:38

PR feedback

1640cb8

Filter for non nulls

b93d2db

Aggregator should only consider newest execution if there are duplicate execution IDs

f93059a

kathy-t requested review from denis-yuen, coverbeck and svonworl January 12, 2024 22:09

coverbeck approved these changes Jan 13, 2024

Generate execution ID for legacy executions without execution IDs

8607fea

kathy-t commented Jan 15, 2024

Use 1.15.0-rc.0 webservice version

9347f95

kathy-t mentioned this pull request Jan 16, 2024

Ingest Terra metrics #478

Merged

4 tasks

denis-yuen approved these changes Jan 16, 2024

svonworl approved these changes Jan 17, 2024

kathy-t added 2 commits January 17, 2024 10:13

Add more javadoc, bound Execution in interface

12d9801

Modify tooltester executions to upload and submit validation data for re-ingestion

be583c1

kathy-t requested a review from denis-yuen January 17, 2024 21:19

denis-yuen approved these changes Jan 17, 2024

kathy-t merged commit 81eb362 into develop Jan 18, 2024
9 of 10 checks passed

kathy-t deleted the feature/seab-5943/update-executions branch January 18, 2024 14:02
