Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Compare expected RF results against actual using synthetic data for Reporting V2 integration tests #1147

Conversation

tristanvuong2021
Copy link
Contributor

No description provided.

@wfa-reviewable
Copy link

This change is Reviewable

@tristanvuong2021 tristanvuong2021 force-pushed the tristanvuong-compare-expected-rf-metric-results-against-actual-for-integration-test branch from d86d8a3 to 41eb76f Compare August 2, 2023 17:52
@tristanvuong2021 tristanvuong2021 marked this pull request as ready for review August 2, 2023 18:03
@tristanvuong2021 tristanvuong2021 changed the title Compare expected RF results against actual for Reporting V2 integration tests Compare expected RF results against actual using synthetic data for Reporting V2 integration tests Aug 2, 2023
Copy link
Member

@SanjayVas SanjayVas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 3 of 3 files at r1, all commit messages.
Reviewable status: all files reviewed, 3 unresolved discussions (waiting on @riemanli and @tristanvuong2021)


src/main/kotlin/org/wfanet/measurement/integration/common/reporting/v2/InProcessLifeOfAReportIntegrationTest.kt line 510 at r1 (raw file):

    assertThat(actualResult)
      .reachValue()
      .isWithinPercent(0.5)

Add TODOs to all these to use variance rather than fixed thresholds.

Code quote:

isWithinPercent(0.5)

src/main/kotlin/org/wfanet/measurement/integration/common/reporting/v2/InProcessLifeOfAReportIntegrationTest.kt line 631 at r1 (raw file):

          }
      }
    assertThat(actualResult).reachValue().isWithinPercent(500.0).of(1)

Just to confirm, your expected reach value here is 1 and you want to be within 500% of that?

I guess that's fine, it just looks weird. You can probably narrow it down if you use a query with a higher expected reach value. I think with reasonable DP params and an expected reach >=2000 you should be able to reduce the threshold to like 10%.

Code quote:

.isWithinPercent(500.0).of(1)

src/main/kotlin/org/wfanet/measurement/integration/common/reporting/v2/InProcessLifeOfAReportIntegrationTest.kt line 1751 at r1 (raw file):

  ): Measurement.Result {
    val reachAndFrequency =
      MeasurementResults.computeReachAndFrequency(sampledVids.asIterable(), maximumFrequencyPerUser)

This parameter is the overall max frequency (largest bucket), not the max frequency per user. We don't actually support the latter for R/F measurements (see world-federation-of-advertisers/cross-media-measurement-api#159).

Code quote:

maximumFrequencyPerUser

Copy link
Contributor Author

@tristanvuong2021 tristanvuong2021 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 2 of 3 files reviewed, 3 unresolved discussions (waiting on @riemanli and @SanjayVas)


src/main/kotlin/org/wfanet/measurement/integration/common/reporting/v2/InProcessLifeOfAReportIntegrationTest.kt line 510 at r1 (raw file):

Previously, SanjayVas (Sanjay Vasandani) wrote…

Add TODOs to all these to use variance rather than fixed thresholds.

Done.


src/main/kotlin/org/wfanet/measurement/integration/common/reporting/v2/InProcessLifeOfAReportIntegrationTest.kt line 631 at r1 (raw file):

Previously, SanjayVas (Sanjay Vasandani) wrote…

Just to confirm, your expected reach value here is 1 and you want to be within 500% of that?

I guess that's fine, it just looks weird. You can probably narrow it down if you use a query with a higher expected reach value. I think with reasonable DP params and an expected reach >=2000 you should be able to reduce the threshold to like 10%.

Done.


src/main/kotlin/org/wfanet/measurement/integration/common/reporting/v2/InProcessLifeOfAReportIntegrationTest.kt line 1751 at r1 (raw file):

Previously, SanjayVas (Sanjay Vasandani) wrote…

This parameter is the overall max frequency (largest bucket), not the max frequency per user. We don't actually support the latter for R/F measurements (see world-federation-of-advertisers/cross-media-measurement-api#159).

Done.

@riemanli riemanli requested a review from SanjayVas August 3, 2023 21:01
Copy link
Contributor

@riemanli riemanli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 2 of 3 files at r1, 1 of 1 files at r2, all commit messages.
Reviewable status: all files reviewed, 6 unresolved discussions (waiting on @SanjayVas and @tristanvuong2021)


src/main/kotlin/org/wfanet/measurement/integration/common/reporting/v2/InProcessLifeOfAReportIntegrationTest.kt line 1132 at r2 (raw file):

        } else {
          assertThat(actualResult).reachValue().isWithinPercent(500.0).of(1)
        }

Is the expected value equal to 1 because the event group used here is too small? If so, maybe we could use a larger event group?

Code quote:

        if (resultAttribute.groupingPredicatesList.contains(grouping2Predicate1)) {
          assertThat(actualResult).reachValue().isWithinPercent(500.0).of(1)
        } else {
          assertThat(actualResult).reachValue().isWithinPercent(500.0).of(1)
        }

src/main/kotlin/org/wfanet/measurement/integration/common/reporting/v2/InProcessLifeOfAReportIntegrationTest.kt line 1391 at r2 (raw file):

        }
        .binResult
        .value

Is it equivalent to

retrievedMetric.result.frequencyHistogram.binsList.sumOf { bin -> bin.binResult.value }

Code quote:

      retrievedMetric.result.frequencyHistogram.binsList
        .reduce { acc, bin ->
          acc.copy { binResult = acc.binResult.copy { value += bin.binResult.value } }
        }
        .binResult
        .value

src/main/kotlin/org/wfanet/measurement/integration/common/reporting/v2/InProcessLifeOfAReportIntegrationTest.kt line 1511 at r2 (raw file):

    // TODO(@tristanvuong2021): Calculate watch duration using synthetic spec. Current calculation
    // is too large.

IIUC, we don't have duration measurements yet, and currently the EDP simulator simply outputs externalDataProviderId. Do we use a fake Kingdom here?

Code quote:

    // TODO(@tristanvuong2021): Calculate watch duration using synthetic spec. Current calculation
    // is too large.

Copy link
Contributor Author

@tristanvuong2021 tristanvuong2021 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 2 of 5 files reviewed, 6 unresolved discussions (waiting on @riemanli and @SanjayVas)


src/main/kotlin/org/wfanet/measurement/integration/common/reporting/v2/InProcessLifeOfAReportIntegrationTest.kt line 1132 at r2 (raw file):

Previously, riemanli (Rieman) wrote…

Is the expected value equal to 1 because the event group used here is too small? If so, maybe we could use a larger event group?

These are a result of filters that result in no events. I updated the synthetic data to include events for this test case.


src/main/kotlin/org/wfanet/measurement/integration/common/reporting/v2/InProcessLifeOfAReportIntegrationTest.kt line 1391 at r2 (raw file):

Previously, riemanli (Rieman) wrote…

Is it equivalent to

retrievedMetric.result.frequencyHistogram.binsList.sumOf { bin -> bin.binResult.value }

Done.


src/main/kotlin/org/wfanet/measurement/integration/common/reporting/v2/InProcessLifeOfAReportIntegrationTest.kt line 1511 at r2 (raw file):

Previously, riemanli (Rieman) wrote…

IIUC, we don't have duration measurements yet, and currently the EDP simulator simply outputs externalDataProviderId. Do we use a fake Kingdom here?

I was planning to just leave this until we change the calculation in the EDP simulator.

Copy link
Contributor

@riemanli riemanli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 2 of 5 files reviewed, 4 unresolved discussions (waiting on @SanjayVas and @tristanvuong2021)


src/main/kotlin/org/wfanet/measurement/integration/common/reporting/v2/InProcessLifeOfAReportIntegrationTest.kt line 1132 at r2 (raw file):

Previously, tristanvuong2021 (Tristan Vuong) wrote…

These are a result of filters that result in no events. I updated the synthetic data to include events for this test case.

With the updated synthetic data, should we update the expected value and the tolerance?

Copy link
Member

@SanjayVas SanjayVas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 3 of 3 files at r3, all commit messages.
Reviewable status: all files reviewed, 2 unresolved discussions (waiting on @tristanvuong2021)


src/main/k8s/testing/data/synthetic_event_group_spec_1.textproto line 83 at r3 (raw file):

    vid_range_specs {
      vid_range {
        start: 9000000

nit: since 0 is an invalid VID and the population VID range starts at 1, prefer to define event group spec ranges offset by 1

e.g. [9000001, 9001001)


src/main/kotlin/org/wfanet/measurement/integration/common/reporting/v2/InProcessLifeOfAReportIntegrationTest.kt line 1511 at r2 (raw file):

Previously, tristanvuong2021 (Tristan Vuong) wrote…

I was planning to just leave this until we change the calculation in the EDP simulator.

FYI #1150 switches EventQuery to actually return labeled events, so it would be possible to update the simulator to do "real" durations after that. Of course, it would mean adding duration to the synthetic generator specs.

Copy link
Contributor

@riemanli riemanli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: all files reviewed, 2 unresolved discussions (waiting on @tristanvuong2021)


src/main/kotlin/org/wfanet/measurement/integration/common/reporting/v2/InProcessLifeOfAReportIntegrationTest.kt line 1132 at r2 (raw file):

Previously, riemanli (Rieman) wrote…

With the updated synthetic data, should we update the expected value and the tolerance?

Sorry did you mean you intentionally to obtain the expected result == 1?

Copy link
Contributor Author

@tristanvuong2021 tristanvuong2021 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 3 of 5 files reviewed, 1 unresolved discussion (waiting on @riemanli and @SanjayVas)


src/main/kotlin/org/wfanet/measurement/integration/common/reporting/v2/InProcessLifeOfAReportIntegrationTest.kt line 1132 at r2 (raw file):

Previously, riemanli (Rieman) wrote…

Sorry did you mean you intentionally to obtain the expected result == 1?

That was before, but for this test case, the expected results are more normal now.

Copy link
Contributor

@riemanli riemanli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 1 of 3 files at r3, 2 of 2 files at r4, all commit messages.
Reviewable status: :shipit: complete! all files reviewed, all discussions resolved (waiting on @tristanvuong2021)

Copy link
Member

@kungfucraig kungfucraig left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you run a bazel coverage report for this test against the reporting server code base? I'd like to make sure we're covering all of the major functionality.

Reviewable status: :shipit: complete! all files reviewed, all discussions resolved (waiting on @Marco-Premier)

Copy link
Contributor Author

@tristanvuong2021 tristanvuong2021 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I get 81.2% line coverage. The error paths aren't covered with these tests.

Reviewable status: 3 of 7 files reviewed, all discussions resolved (waiting on @Marco-Premier, @riemanli, and @SanjayVas)

Copy link
Collaborator

@stevenwarejones stevenwarejones left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 1 of 3 files at r1, 2 of 2 files at r4, 1 of 2 files at r5, 1 of 3 files at r6, all commit messages.
Reviewable status: 5 of 7 files reviewed, 2 unresolved discussions (waiting on @Marco-Premier, @riemanli, @SanjayVas, and @tristanvuong2021)


src/test/kotlin/org/wfanet/measurement/integration/postgres/BUILD.bazel line 37 at r6 (raw file):

        "//src/main/kotlin/org/wfanet/measurement/reporting/deploy/v2/common/server/postgres:services",
        "//src/main/kotlin/org/wfanet/measurement/reporting/deploy/v2/postgres/testing",
        "//src/test/kotlin/org/wfanet/measurement/integration/common/duchy:spanner_duchy_dependency_provider_rule",

ditto


src/test/kotlin/org/wfanet/measurement/integration/postgres/V2PostgresInProcessLifeOfAReportIntegrationTest.kt line 27 at r6 (raw file):

import org.wfanet.measurement.integration.common.ALL_DUCHY_NAMES
import org.wfanet.measurement.integration.common.InProcessDuchy
import org.wfanet.measurement.integration.common.duchy.SpannerDuchyDependencyProviderRule

what was the reason for this change?

Copy link
Contributor Author

@tristanvuong2021 tristanvuong2021 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 5 of 7 files reviewed, 2 unresolved discussions (waiting on @Marco-Premier, @riemanli, @SanjayVas, and @stevenwarejones)


src/test/kotlin/org/wfanet/measurement/integration/postgres/V2PostgresInProcessLifeOfAReportIntegrationTest.kt line 27 at r6 (raw file):

Previously, stevenwarejones (Steven Ware Jones) wrote…

what was the reason for this change?

The postgres duchy test rule is not fast enough for this test. With my test changes, it leads to frequent time outs.

Copy link
Collaborator

@stevenwarejones stevenwarejones left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 5 of 7 files reviewed, 2 unresolved discussions (waiting on @Marco-Premier, @riemanli, @SanjayVas, and @tristanvuong2021)


src/test/kotlin/org/wfanet/measurement/integration/postgres/V2PostgresInProcessLifeOfAReportIntegrationTest.kt line 27 at r6 (raw file):

Previously, tristanvuong2021 (Tristan Vuong) wrote…

The postgres duchy test rule is not fast enough for this test. With my test changes, it leads to frequent time outs.

The name of the file is V2PostgrestInProcess....

Copy link
Contributor Author

@tristanvuong2021 tristanvuong2021 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 5 of 7 files reviewed, 2 unresolved discussions (waiting on @Marco-Premier, @riemanli, @SanjayVas, and @stevenwarejones)


src/test/kotlin/org/wfanet/measurement/integration/postgres/V2PostgresInProcessLifeOfAReportIntegrationTest.kt line 27 at r6 (raw file):

Previously, stevenwarejones (Steven Ware Jones) wrote…

The name of the file is V2PostgrestInProcess....

The main point of this is to test the Reporting Server.

Copy link
Contributor Author

@tristanvuong2021 tristanvuong2021 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 5 of 7 files reviewed, 2 unresolved discussions (waiting on @Marco-Premier, @riemanli, @SanjayVas, and @stevenwarejones)


src/test/kotlin/org/wfanet/measurement/integration/postgres/V2PostgresInProcessLifeOfAReportIntegrationTest.kt line 27 at r6 (raw file):

Previously, tristanvuong2021 (Tristan Vuong) wrote…

The main point of this is to test the Reporting Server.

The file name is in reference to the Reporting Server.

Copy link
Collaborator

@stevenwarejones stevenwarejones left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 5 of 7 files reviewed, 1 unresolved discussion (waiting on @Marco-Premier, @riemanli, and @SanjayVas)


src/test/kotlin/org/wfanet/measurement/integration/postgres/V2PostgresInProcessLifeOfAReportIntegrationTest.kt line 27 at r6 (raw file):

Previously, tristanvuong2021 (Tristan Vuong) wrote…

The file name is in reference to the Reporting Server.

Why is the Postgres rule so much slower? Is it that the EmbeddedPostgresDatabaseProvider is so much slower? This will become an issue as we port over to Postgres from Spanner. We should open a task or issue do deal with this imo FYI @SanjayVas

Copy link
Contributor Author

@tristanvuong2021 tristanvuong2021 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 5 of 7 files reviewed, 1 unresolved discussion (waiting on @Marco-Premier, @riemanli, and @SanjayVas)


src/test/kotlin/org/wfanet/measurement/integration/postgres/V2PostgresInProcessLifeOfAReportIntegrationTest.kt line 27 at r6 (raw file):

Previously, stevenwarejones (Steven Ware Jones) wrote…

Why is the Postgres rule so much slower? Is it that the EmbeddedPostgresDatabaseProvider is so much slower? This will become an issue as we port over to Postgres from Spanner. We should open a task or issue do deal with this imo FYI @SanjayVas

The Spanner rule uses an emulated Spanner database, but Postgres uses a docker container that runs an actual Postgres database in it.

Copy link
Member

@SanjayVas SanjayVas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 5 of 7 files reviewed, 1 unresolved discussion (waiting on @Marco-Premier, @riemanli, @stevenwarejones, and @tristanvuong2021)


src/test/kotlin/org/wfanet/measurement/integration/postgres/V2PostgresInProcessLifeOfAReportIntegrationTest.kt line 27 at r6 (raw file):

Previously, tristanvuong2021 (Tristan Vuong) wrote…

The Spanner rule uses an emulated Spanner database, but Postgres uses a docker container that runs an actual Postgres database in it.

  1. The "Postgres" distinction in the file name was incorrectly added in Add in process postgres duchy correctness tests #1152, which is also the PR that switched it to use the Postgres Duchy implementation. The reason the distinction is inaccurate is that the test still uses a Spanner-based Kingdom.
  2. I'm looking into what we can to do possibly speed up Postgres testing. Unfortunately there's no lightweight Postgres emulator the way there is for Spanner.

Copy link
Member

@SanjayVas SanjayVas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 5 of 7 files reviewed, 1 unresolved discussion (waiting on @Marco-Premier, @riemanli, @stevenwarejones, and @tristanvuong2021)


src/test/kotlin/org/wfanet/measurement/integration/postgres/V2PostgresInProcessLifeOfAReportIntegrationTest.kt line 27 at r6 (raw file):

Previously, SanjayVas (Sanjay Vasandani) wrote…
  1. The "Postgres" distinction in the file name was incorrectly added in Add in process postgres duchy correctness tests #1152, which is also the PR that switched it to use the Postgres Duchy implementation. The reason the distinction is inaccurate is that the test still uses a Spanner-based Kingdom.
  2. I'm looking into what we can to do possibly speed up Postgres testing. Unfortunately there's no lightweight Postgres emulator the way there is for Spanner.

world-federation-of-advertisers/common-jvm#211 attempts to improve the postgres testing situation. I did some informal testing with that change and I end up with the InProcessLifeOfAMeasurementIntegrationTest taking the same amount of time to run for both the Spanner and Postgres Duchy implementations.

Copy link
Collaborator

@stevenwarejones stevenwarejones left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 1 of 3 files at r6.
Reviewable status: 6 of 7 files reviewed, 1 unresolved discussion (waiting on @Marco-Premier, @riemanli, and @tristanvuong2021)

Copy link
Member

@kungfucraig kungfucraig left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed all commit messages.
Reviewable status: 6 of 7 files reviewed, 1 unresolved discussion (waiting on @Marco-Premier, @riemanli, and @tristanvuong2021)

Copy link
Contributor Author

@tristanvuong2021 tristanvuong2021 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 1 of 3 files at r1, 2 of 2 files at r4, 1 of 2 files at r5, 3 of 3 files at r6, all commit messages.
Reviewable status: :shipit: complete! all files reviewed, all discussions resolved (waiting on @Marco-Premier and @riemanli)

Copy link
Contributor Author

@tristanvuong2021 tristanvuong2021 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 3 of 3 files at r7, all commit messages.
Reviewable status: :shipit: complete! all files reviewed, all discussions resolved (waiting on @Marco-Premier and @riemanli)

@tristanvuong2021 tristanvuong2021 merged commit 19587e5 into main Aug 16, 2023
@tristanvuong2021 tristanvuong2021 deleted the tristanvuong-compare-expected-rf-metric-results-against-actual-for-integration-test branch August 16, 2023 16:30
ple13 pushed a commit that referenced this pull request Aug 16, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants