-
Notifications
You must be signed in to change notification settings - Fork 11
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Compare expected RF results against actual using synthetic data for Reporting V2 integration tests #1147
Conversation
d86d8a3
to
41eb76f
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 3 of 3 files at r1, all commit messages.
Reviewable status: all files reviewed, 3 unresolved discussions (waiting on @riemanli and @tristanvuong2021)
src/main/kotlin/org/wfanet/measurement/integration/common/reporting/v2/InProcessLifeOfAReportIntegrationTest.kt
line 510 at r1 (raw file):
assertThat(actualResult) .reachValue() .isWithinPercent(0.5)
Add TODOs to all these to use variance rather than fixed thresholds.
Code quote:
isWithinPercent(0.5)
src/main/kotlin/org/wfanet/measurement/integration/common/reporting/v2/InProcessLifeOfAReportIntegrationTest.kt
line 631 at r1 (raw file):
} } assertThat(actualResult).reachValue().isWithinPercent(500.0).of(1)
Just to confirm, your expected reach value here is 1
and you want to be within 500% of that?
I guess that's fine, it just looks weird. You can probably narrow it down if you use a query with a higher expected reach value. I think with reasonable DP params and an expected reach >=2000 you should be able to reduce the threshold to like 10%.
Code quote:
.isWithinPercent(500.0).of(1)
src/main/kotlin/org/wfanet/measurement/integration/common/reporting/v2/InProcessLifeOfAReportIntegrationTest.kt
line 1751 at r1 (raw file):
): Measurement.Result { val reachAndFrequency = MeasurementResults.computeReachAndFrequency(sampledVids.asIterable(), maximumFrequencyPerUser)
This parameter is the overall max frequency (largest bucket), not the max frequency per user. We don't actually support the latter for R/F measurements (see world-federation-of-advertisers/cross-media-measurement-api#159).
Code quote:
maximumFrequencyPerUser
…lts-against-actual-for-integration-test
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 2 of 3 files reviewed, 3 unresolved discussions (waiting on @riemanli and @SanjayVas)
src/main/kotlin/org/wfanet/measurement/integration/common/reporting/v2/InProcessLifeOfAReportIntegrationTest.kt
line 510 at r1 (raw file):
Previously, SanjayVas (Sanjay Vasandani) wrote…
Add TODOs to all these to use variance rather than fixed thresholds.
Done.
src/main/kotlin/org/wfanet/measurement/integration/common/reporting/v2/InProcessLifeOfAReportIntegrationTest.kt
line 631 at r1 (raw file):
Previously, SanjayVas (Sanjay Vasandani) wrote…
Just to confirm, your expected reach value here is
1
and you want to be within 500% of that?I guess that's fine, it just looks weird. You can probably narrow it down if you use a query with a higher expected reach value. I think with reasonable DP params and an expected reach >=2000 you should be able to reduce the threshold to like 10%.
Done.
src/main/kotlin/org/wfanet/measurement/integration/common/reporting/v2/InProcessLifeOfAReportIntegrationTest.kt
line 1751 at r1 (raw file):
Previously, SanjayVas (Sanjay Vasandani) wrote…
This parameter is the overall max frequency (largest bucket), not the max frequency per user. We don't actually support the latter for R/F measurements (see world-federation-of-advertisers/cross-media-measurement-api#159).
Done.
…lts-against-actual-for-integration-test
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 2 of 3 files at r1, 1 of 1 files at r2, all commit messages.
Reviewable status: all files reviewed, 6 unresolved discussions (waiting on @SanjayVas and @tristanvuong2021)
src/main/kotlin/org/wfanet/measurement/integration/common/reporting/v2/InProcessLifeOfAReportIntegrationTest.kt
line 1132 at r2 (raw file):
} else { assertThat(actualResult).reachValue().isWithinPercent(500.0).of(1) }
Is the expected value equal to 1 because the event group used here is too small? If so, maybe we could use a larger event group?
Code quote:
if (resultAttribute.groupingPredicatesList.contains(grouping2Predicate1)) {
assertThat(actualResult).reachValue().isWithinPercent(500.0).of(1)
} else {
assertThat(actualResult).reachValue().isWithinPercent(500.0).of(1)
}
src/main/kotlin/org/wfanet/measurement/integration/common/reporting/v2/InProcessLifeOfAReportIntegrationTest.kt
line 1391 at r2 (raw file):
} .binResult .value
Is it equivalent to
retrievedMetric.result.frequencyHistogram.binsList.sumOf { bin -> bin.binResult.value }
Code quote:
retrievedMetric.result.frequencyHistogram.binsList
.reduce { acc, bin ->
acc.copy { binResult = acc.binResult.copy { value += bin.binResult.value } }
}
.binResult
.value
src/main/kotlin/org/wfanet/measurement/integration/common/reporting/v2/InProcessLifeOfAReportIntegrationTest.kt
line 1511 at r2 (raw file):
// TODO(@tristanvuong2021): Calculate watch duration using synthetic spec. Current calculation // is too large.
IIUC, we don't have duration measurements yet, and currently the EDP simulator simply outputs externalDataProviderId
. Do we use a fake Kingdom here?
Code quote:
// TODO(@tristanvuong2021): Calculate watch duration using synthetic spec. Current calculation
// is too large.
…lts-against-actual-for-integration-test
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 2 of 5 files reviewed, 6 unresolved discussions (waiting on @riemanli and @SanjayVas)
src/main/kotlin/org/wfanet/measurement/integration/common/reporting/v2/InProcessLifeOfAReportIntegrationTest.kt
line 1132 at r2 (raw file):
Previously, riemanli (Rieman) wrote…
Is the expected value equal to 1 because the event group used here is too small? If so, maybe we could use a larger event group?
These are a result of filters that result in no events. I updated the synthetic data to include events for this test case.
src/main/kotlin/org/wfanet/measurement/integration/common/reporting/v2/InProcessLifeOfAReportIntegrationTest.kt
line 1391 at r2 (raw file):
Previously, riemanli (Rieman) wrote…
Is it equivalent to
retrievedMetric.result.frequencyHistogram.binsList.sumOf { bin -> bin.binResult.value }
Done.
src/main/kotlin/org/wfanet/measurement/integration/common/reporting/v2/InProcessLifeOfAReportIntegrationTest.kt
line 1511 at r2 (raw file):
Previously, riemanli (Rieman) wrote…
IIUC, we don't have duration measurements yet, and currently the EDP simulator simply outputs
externalDataProviderId
. Do we use a fake Kingdom here?
I was planning to just leave this until we change the calculation in the EDP simulator.
…lts-against-actual-for-integration-test
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 2 of 5 files reviewed, 4 unresolved discussions (waiting on @SanjayVas and @tristanvuong2021)
src/main/kotlin/org/wfanet/measurement/integration/common/reporting/v2/InProcessLifeOfAReportIntegrationTest.kt
line 1132 at r2 (raw file):
Previously, tristanvuong2021 (Tristan Vuong) wrote…
These are a result of filters that result in no events. I updated the synthetic data to include events for this test case.
With the updated synthetic data, should we update the expected value and the tolerance?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 3 of 3 files at r3, all commit messages.
Reviewable status: all files reviewed, 2 unresolved discussions (waiting on @tristanvuong2021)
src/main/k8s/testing/data/synthetic_event_group_spec_1.textproto
line 83 at r3 (raw file):
vid_range_specs { vid_range { start: 9000000
nit: since 0
is an invalid VID and the population VID range starts at 1, prefer to define event group spec ranges offset by 1
e.g. [9000001, 9001001)
src/main/kotlin/org/wfanet/measurement/integration/common/reporting/v2/InProcessLifeOfAReportIntegrationTest.kt
line 1511 at r2 (raw file):
Previously, tristanvuong2021 (Tristan Vuong) wrote…
I was planning to just leave this until we change the calculation in the EDP simulator.
FYI #1150 switches EventQuery to actually return labeled events, so it would be possible to update the simulator to do "real" durations after that. Of course, it would mean adding duration to the synthetic generator specs.
…lts-against-actual-for-integration-test
…lts-against-actual-for-integration-test
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: all files reviewed, 2 unresolved discussions (waiting on @tristanvuong2021)
src/main/kotlin/org/wfanet/measurement/integration/common/reporting/v2/InProcessLifeOfAReportIntegrationTest.kt
line 1132 at r2 (raw file):
Previously, riemanli (Rieman) wrote…
With the updated synthetic data, should we update the expected value and the tolerance?
Sorry did you mean you intentionally to obtain the expected result == 1?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 3 of 5 files reviewed, 1 unresolved discussion (waiting on @riemanli and @SanjayVas)
src/main/kotlin/org/wfanet/measurement/integration/common/reporting/v2/InProcessLifeOfAReportIntegrationTest.kt
line 1132 at r2 (raw file):
Previously, riemanli (Rieman) wrote…
Sorry did you mean you intentionally to obtain the expected result == 1?
That was before, but for this test case, the expected results are more normal now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 1 of 3 files at r3, 2 of 2 files at r4, all commit messages.
Reviewable status: complete! all files reviewed, all discussions resolved (waiting on @tristanvuong2021)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you run a bazel coverage report for this test against the reporting server code base? I'd like to make sure we're covering all of the major functionality.
Reviewable status: complete! all files reviewed, all discussions resolved (waiting on @Marco-Premier)
…lts-against-actual-for-integration-test
…lts-against-actual-for-integration-test
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I get 81.2% line coverage. The error paths aren't covered with these tests.
Reviewable status: 3 of 7 files reviewed, all discussions resolved (waiting on @Marco-Premier, @riemanli, and @SanjayVas)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 1 of 3 files at r1, 2 of 2 files at r4, 1 of 2 files at r5, 1 of 3 files at r6, all commit messages.
Reviewable status: 5 of 7 files reviewed, 2 unresolved discussions (waiting on @Marco-Premier, @riemanli, @SanjayVas, and @tristanvuong2021)
src/test/kotlin/org/wfanet/measurement/integration/postgres/BUILD.bazel
line 37 at r6 (raw file):
"//src/main/kotlin/org/wfanet/measurement/reporting/deploy/v2/common/server/postgres:services", "//src/main/kotlin/org/wfanet/measurement/reporting/deploy/v2/postgres/testing", "//src/test/kotlin/org/wfanet/measurement/integration/common/duchy:spanner_duchy_dependency_provider_rule",
ditto
src/test/kotlin/org/wfanet/measurement/integration/postgres/V2PostgresInProcessLifeOfAReportIntegrationTest.kt
line 27 at r6 (raw file):
import org.wfanet.measurement.integration.common.ALL_DUCHY_NAMES import org.wfanet.measurement.integration.common.InProcessDuchy import org.wfanet.measurement.integration.common.duchy.SpannerDuchyDependencyProviderRule
what was the reason for this change?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 5 of 7 files reviewed, 2 unresolved discussions (waiting on @Marco-Premier, @riemanli, @SanjayVas, and @stevenwarejones)
src/test/kotlin/org/wfanet/measurement/integration/postgres/V2PostgresInProcessLifeOfAReportIntegrationTest.kt
line 27 at r6 (raw file):
Previously, stevenwarejones (Steven Ware Jones) wrote…
what was the reason for this change?
The postgres duchy test rule is not fast enough for this test. With my test changes, it leads to frequent time outs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 5 of 7 files reviewed, 2 unresolved discussions (waiting on @Marco-Premier, @riemanli, @SanjayVas, and @tristanvuong2021)
src/test/kotlin/org/wfanet/measurement/integration/postgres/V2PostgresInProcessLifeOfAReportIntegrationTest.kt
line 27 at r6 (raw file):
Previously, tristanvuong2021 (Tristan Vuong) wrote…
The postgres duchy test rule is not fast enough for this test. With my test changes, it leads to frequent time outs.
The name of the file is V2PostgrestInProcess....
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 5 of 7 files reviewed, 2 unresolved discussions (waiting on @Marco-Premier, @riemanli, @SanjayVas, and @stevenwarejones)
src/test/kotlin/org/wfanet/measurement/integration/postgres/V2PostgresInProcessLifeOfAReportIntegrationTest.kt
line 27 at r6 (raw file):
Previously, stevenwarejones (Steven Ware Jones) wrote…
The name of the file is
V2PostgrestInProcess....
The main point of this is to test the Reporting Server.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 5 of 7 files reviewed, 2 unresolved discussions (waiting on @Marco-Premier, @riemanli, @SanjayVas, and @stevenwarejones)
src/test/kotlin/org/wfanet/measurement/integration/postgres/V2PostgresInProcessLifeOfAReportIntegrationTest.kt
line 27 at r6 (raw file):
Previously, tristanvuong2021 (Tristan Vuong) wrote…
The main point of this is to test the Reporting Server.
The file name is in reference to the Reporting Server.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 5 of 7 files reviewed, 1 unresolved discussion (waiting on @Marco-Premier, @riemanli, and @SanjayVas)
src/test/kotlin/org/wfanet/measurement/integration/postgres/V2PostgresInProcessLifeOfAReportIntegrationTest.kt
line 27 at r6 (raw file):
Previously, tristanvuong2021 (Tristan Vuong) wrote…
The file name is in reference to the Reporting Server.
Why is the Postgres rule so much slower? Is it that the EmbeddedPostgresDatabaseProvider
is so much slower? This will become an issue as we port over to Postgres from Spanner. We should open a task or issue do deal with this imo FYI @SanjayVas
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 5 of 7 files reviewed, 1 unresolved discussion (waiting on @Marco-Premier, @riemanli, and @SanjayVas)
src/test/kotlin/org/wfanet/measurement/integration/postgres/V2PostgresInProcessLifeOfAReportIntegrationTest.kt
line 27 at r6 (raw file):
Previously, stevenwarejones (Steven Ware Jones) wrote…
Why is the Postgres rule so much slower? Is it that the
EmbeddedPostgresDatabaseProvider
is so much slower? This will become an issue as we port over to Postgres from Spanner. We should open a task or issue do deal with this imo FYI @SanjayVas
The Spanner rule uses an emulated Spanner database, but Postgres uses a docker container that runs an actual Postgres database in it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 5 of 7 files reviewed, 1 unresolved discussion (waiting on @Marco-Premier, @riemanli, @stevenwarejones, and @tristanvuong2021)
src/test/kotlin/org/wfanet/measurement/integration/postgres/V2PostgresInProcessLifeOfAReportIntegrationTest.kt
line 27 at r6 (raw file):
Previously, tristanvuong2021 (Tristan Vuong) wrote…
The Spanner rule uses an emulated Spanner database, but Postgres uses a docker container that runs an actual Postgres database in it.
- The "Postgres" distinction in the file name was incorrectly added in Add in process postgres duchy correctness tests #1152, which is also the PR that switched it to use the Postgres Duchy implementation. The reason the distinction is inaccurate is that the test still uses a Spanner-based Kingdom.
- I'm looking into what we can to do possibly speed up Postgres testing. Unfortunately there's no lightweight Postgres emulator the way there is for Spanner.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 5 of 7 files reviewed, 1 unresolved discussion (waiting on @Marco-Premier, @riemanli, @stevenwarejones, and @tristanvuong2021)
src/test/kotlin/org/wfanet/measurement/integration/postgres/V2PostgresInProcessLifeOfAReportIntegrationTest.kt
line 27 at r6 (raw file):
Previously, SanjayVas (Sanjay Vasandani) wrote…
- The "Postgres" distinction in the file name was incorrectly added in Add in process postgres duchy correctness tests #1152, which is also the PR that switched it to use the Postgres Duchy implementation. The reason the distinction is inaccurate is that the test still uses a Spanner-based Kingdom.
- I'm looking into what we can to do possibly speed up Postgres testing. Unfortunately there's no lightweight Postgres emulator the way there is for Spanner.
world-federation-of-advertisers/common-jvm#211 attempts to improve the postgres testing situation. I did some informal testing with that change and I end up with the InProcessLifeOfAMeasurementIntegrationTest taking the same amount of time to run for both the Spanner and Postgres Duchy implementations.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 1 of 3 files at r6.
Reviewable status: 6 of 7 files reviewed, 1 unresolved discussion (waiting on @Marco-Premier, @riemanli, and @tristanvuong2021)
…lts-against-actual-for-integration-test
…lts-against-actual-for-integration-test
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed all commit messages.
Reviewable status: 6 of 7 files reviewed, 1 unresolved discussion (waiting on @Marco-Premier, @riemanli, and @tristanvuong2021)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 1 of 3 files at r1, 2 of 2 files at r4, 1 of 2 files at r5, 3 of 3 files at r6, all commit messages.
Reviewable status: complete! all files reviewed, all discussions resolved (waiting on @Marco-Premier and @riemanli)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 3 of 3 files at r7, all commit messages.
Reviewable status: complete! all files reviewed, all discussions resolved (waiting on @Marco-Premier and @riemanli)
…eporting V2 integration tests (#1147)
No description provided.