Add imprecise comparison variants to SnapshotTesting extension #143

NickEntin · 2023-08-15T00:33:00Z

This adds variants of the Snapshotting extensions that allow for imprecise comparisons, i.e. using the precision and perceptualPrecision parameters.

Why is this necessary?

Adding precision parameters has been a highly requested feature (see #63) to work around some simulator changes introduced in iOS 13. Historically the simulator has supported CPU-based rendering, giving us very stable image representations of views that we can compare pixel-by-pixel. Unfortunately, with iOS 13, Apple changed the simulator to use exclusively GPU-based rendering, which means that the resulting snapshots may differ slightly across machines (see pointfreeco/swift-snapshot-testing#313).

The negative effects of this were mitigated in SnapshotTesting by adding two precision controls to snapshot comparisons: a perceptual precision that controls how close in color two pixels need to be to count as unchanged (using the Lab ΔE distance between colors) and an overall precision that controls what portion of pixels between two images need to be the same (based on the per-pixel calculation) for the images to be considered unchanged. Setting these precisions to values below one enables engineers to record tests on one machine and run them on another (e.g. record new reference images on their laptop and then run tests on CI) without worrying about the tests failing due to differences in GPU rendering. This is great in theory, but in our testing we've found that even the tightest tolerances (near-one precision values) that consistently handle GPU differences between machine types still let through a significant number of visual regressions. In other words, there is no magic set of precision values that avoids false negatives (images reported as different only because of GPU rendering) and also avoids false positives (minor visual regressions reported as unchanged).
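To make the interaction between the two controls concrete, here is a minimal sketch of a two-level comparison. It is not SnapshotTesting's implementation: the `Pixel` type and both function names are made up for illustration, and the color distance is plain Euclidean distance in RGB as a stand-in for the Lab ΔE distance the library uses.

```swift
// Simplified stand-in for a two-level image comparison (illustrative only).
// Assumption: color distance is Euclidean distance in RGB, not Lab ΔE.
struct Pixel {
    var r: Double, g: Double, b: Double  // components in 0...1
}

/// True when two pixels are close enough in color to count as unchanged.
func pixelsMatch(_ a: Pixel, _ b: Pixel, perceptualPrecision: Double) -> Bool {
    let dr = a.r - b.r, dg = a.g - b.g, db = a.b - b.b
    // The largest possible RGB distance is sqrt(3), so divide to get 0...1.
    let distance = (dr * dr + dg * dg + db * db).squareRoot() / (3.0).squareRoot()
    return distance <= 1 - perceptualPrecision
}

/// True when at least `precision` of the pixels match per `pixelsMatch`.
func imagesMatch(
    _ lhs: [Pixel],
    _ rhs: [Pixel],
    precision: Double,
    perceptualPrecision: Double
) -> Bool {
    guard lhs.count == rhs.count else { return false }
    guard !lhs.isEmpty else { return true }
    let matching = zip(lhs, rhs).filter { pair in
        pixelsMatch(pair.0, pair.1, perceptualPrecision: perceptualPrecision)
    }.count
    return Double(matching) / Double(lhs.count) >= precision
}
```

With both values at 1 this reduces to an exact pixel-by-pixel comparison. Lowering either value widens what counts as "unchanged", which is where both the GPU noise and the real regressions get through.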

This is especially true for accessibility snapshots. To start, tolerances seem to be more reliable when applied to relatively small snapshot images, but accessibility snapshots tend to be fairly large since they include both the view and the legend. Additionally, the text in the legend can change meaningfully while only a small number of pixels change. For example, I ran a test of a full-screen snapshot on an iPhone 12 Pro with two columns of legend. Even a precision of 0.9999 (99.99%) was loose enough to let through a regression where one of the elements lost its .link trait (represented by the text "Link." appended to the element's description in the snapshot). But this high a precision wasn't enough to handle the GPU rendering differences between a MacBook Pro and a Mac Mini. This is a simplified example since it only uses precision, not perceptualPrecision, but we've found that many similar situations arise even with the combination.
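For a rough sense of scale, the sketch below works out how many pixels a given precision allows to differ. The function name is hypothetical, and the dimensions are an assumption: they are the native iPhone 12 Pro screen size (1170 x 2532 pixels), whereas the snapshot in the test above also included two legend columns, so its actual budget would have been larger.

```swift
/// Number of pixels that may differ while still meeting `precision`.
func allowedDifferingPixels(width: Int, height: Int, precision: Double) -> Int {
    let total = Double(width * height)
    return Int((total * (1 - precision)).rounded(.down))
}

// Assumption: screen-only dimensions of an iPhone 12 Pro, without the legend.
let budget = allowedDifferingPixels(width: 1170, height: 2532, precision: 0.9999)
print(budget)  // prints 296
```

Even at 99.99%, a few hundred pixels are free to change, and the budget grows with the image. That is consistent with a short piece of legend text disappearing without failing the comparison.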

Some teams have developed infrastructure to allow snapshots to run on the same hardware consistently and have built a developer process around that infrastructure, but many others have accepted lowering precision as a necessity today.

Why create separate "imprecise" variants?

The simplest approach to adding tolerances would be to add the precision and perceptualPrecision parameters to the existing snapshot methods. However, I feel that adding separate methods with an "imprecise" prefix is better in the long run. The naming is motivated by the idea that it needs to be very obvious when what you're doing might result in unexpected/undesirable behavior. In other words, when using one of the core snapshot variants, you should have extremely high confidence that a passing test means there are no regressions. When you use an "imprecise" variant, it's up to you to set your confidence levels according to your chosen precision values. This is similar to the "unsafe" terminology around memory in the Swift API. You should generally feel very confident in the memory safety of your code, but any time you see "unsafe" it's a sign to be extra careful and not gather unwarranted confidence from the compiler.
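As a sketch of the naming idea only, the snippet below shows how an exact variant and an explicitly named imprecise variant might sit side by side. `ComparisonStrategy`, `exact`, `imprecise(precision:perceptualPrecision:)`, and `isExact` are hypothetical names for illustration and are not the API added by this PR.

```swift
// Hypothetical sketch of the naming convention; not the API added by this PR.
struct ComparisonStrategy {
    let precision: Double
    let perceptualPrecision: Double

    /// Core variant: every pixel must match exactly.
    static let exact = ComparisonStrategy(precision: 1, perceptualPrecision: 1)

    /// "Imprecise" variant: the tolerances must be spelled out at the call site.
    static func imprecise(
        precision: Double,
        perceptualPrecision: Double
    ) -> ComparisonStrategy {
        ComparisonStrategy(precision: precision, perceptualPrecision: perceptualPrecision)
    }

    /// True when a passing comparison implies the images are pixel-identical.
    var isExact: Bool { precision == 1 && perceptualPrecision == 1 }
}

let strict = ComparisonStrategy.exact
let tolerant = ComparisonStrategy.imprecise(precision: 0.999, perceptualPrecision: 0.98)
```

The point of the split is that a reviewer can spot the word "imprecise" in a test and know the passing result carries weaker guarantees, much like spotting "unsafe" in memory-handling code.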

Longer term, I'm hopeful we can find alternative comparison algorithms that allow for GPU rendering differences without opening the door to regressions. We can integrate these into the core snapshot variants as long as they do not introduce opportunities for regressions, or add additional comparison variants to iterate on different approaches.

NickEntin · 2023-08-15T03:41:42Z

We've run into a bit of a snag here. SnapshotTesting added perceptual precision in version 1.10.0, which is really the type of precision we want to support here. However, in the same version SnapshotTesting dropped support for installing via CocoaPods. Our demo app uses CocoaPods, so we can't bump the minimum version to 1.10.0 without dropping it from the demo app, which would kill our test suite.

NickEntin · 2024-01-20T08:56:02Z

@luispadron and I are working on Bazel support right now (#166). My plan is to rewrite the demo app in Bazel, which will unblock us running the test suite with the latest version of SnapshotTesting.

alexey1312 · 2024-09-23T12:06:19Z

Hello!
Bazel PR has merged, are you planning to update this PR?

NickEntin force-pushed the entin/imprecise-snapshottesting branch from fa1fffb to b6426fb Compare August 15, 2023 03:27

NickEntin force-pushed the entin/rm-dynamic-type branch 2 times, most recently from 690e585 to 751d34c Compare August 16, 2023 03:30

Base automatically changed from entin/rm-dynamic-type to master August 16, 2023 03:39

NickEntin force-pushed the entin/imprecise-snapshottesting branch from b6426fb to 69d57bb Compare August 16, 2023 05:14

This was referenced Aug 16, 2023

Feature/precision parameter #121

Closed

Move SnapshotTesting tests to SPM test target #149

Closed

NickEntin linked an issue Oct 24, 2023 that may be closed by this pull request

Add precision and tolerance apis #63

Open

NickEntin added the blocked This issue or pull request requires another change to be made first label Oct 24, 2023

NickEntin mentioned this pull request Jan 20, 2024

SnapshotTesting: Allow users to specify precision when using accessibilityImage helper #179

Closed
