Add precision and tolerance apis #63

fruitcoder · 2021-05-25T12:00:38Z

The missing tolerance was mentioned here #15 but no PR was added yet. I'll leave this here while I'm experimenting with the implementation to fix failing snapshots on my CI.

mattprowse-xero · 2022-09-15T04:58:35Z

I'm in the process of adding AccessibilitySnapshot to a project I'm working on, and this is our last remaining blocker. I noticed #64 from a bit over a year ago: is there anything I can do to help get this over the line? More than happy to create a new PR if needed.

fruitcoder · 2022-09-18T19:07:43Z

We're still using our fork. It would be great to get this merged, or to get any info on what would help it get merged.

NickEntin · 2022-09-23T14:13:16Z

Hey! Sorry, this one fell through my GitHub queue.

We've been exploring snapshot precision/tolerance APIs internally for snapshots in general. Reducing precision is unfortunately necessary when working across multiple types of machines, but we've seen a lot of cases where it's easy to let non-trivial regressions slip through with reasonable precision levels. I'm hoping we can find some alternative comparison methods that resolve the issue while reducing the possibility of letting through regressions.

cc @jhneves

NickEntin · 2022-09-23T14:17:32Z

@fruitcoder Does using per-pixel tolerance alone (without overall tolerance) fix your CI issues? I think that one is a bit safer than overall tolerance.

fruitcoder · 2022-09-23T20:03:02Z

Are you talking about the perceptualPrecision that's mentioned in the new swift-snapshot-testing 1.10.0 release, or was this something I could have used before? If so, can you point me in the right direction? 😅

NickEntin · 2022-09-26T03:19:21Z

Ahh you're using SnapshotTesting. Per-pixel tolerance is only supported out of the box in iOSSnapshotTestCase right now (see overallTolerance vs perPixelTolerance here).

Perceptual precision is very similar to per-pixel precision, just using a different definition for how "similar" two pixels are. It potentially could be more reliable than standard per-pixel precision, but I haven't had a chance to run it through our test suite yet.

NickEntin · 2022-12-15T01:12:01Z

Hey @fruitcoder, just following up here - did you get a chance to test whether per-pixel/perceptual precision is sufficient to fix your CI issues?

fruitcoder · 2022-12-21T10:01:39Z

Hey @NickEntin! Is there anything I can test here? I'm using snapshot testing and still can't find a way to get an image strategy that uses any kind of tolerance 🤔

NickEntin · 2023-01-03T03:53:25Z

@fruitcoder Try this branch. That should give you a parameter for .accessibilityImage(perceptualPrecision: ...). Try something like 0.995 to start.

This adds variants of the `SnapshotVerify*(...)` methods that allow for imprecise comparisons, i.e. using the `perPixelTolerance` and `overallTolerance` parameters.

## Why is this necessary?

Adding tolerances has been a highly requested feature (see #63) to work around some simulator changes introduced in iOS 13. Historically the simulator has supported CPU-based rendering, giving us very stable image representations of views that we can compare pixel-by-pixel. Unfortunately, with iOS 13, Apple changed the simulator to use exclusively GPU-based rendering, which means that the resulting snapshots may differ slightly across machines (see uber/ios-snapshot-test-case#109).

The negative effects of this were mitigated in iOSSnapshotTestCase by adding two tolerances to snapshot comparisons: a **per-pixel tolerance** that controls how close in color two pixels need to be to count as unchanged and an **overall tolerance** that controls what portion of pixels between two images need to be the same (based on the per-pixel calculation) for the images to be considered unchanged. Setting these tolerances to non-zero values enables engineers to record tests on one machine and run them on another (e.g. record new reference images on their laptop and then run tests on CI) without worrying about the tests failing due to differences in GPU rendering.

This is great in theory, but from our testing we've found even the lowest tolerance values to consistently handle GPU differences between machine types let through a significant number of visual regressions. In other words, there is no magic tolerance threshold that avoids false negatives based on GPU rendering and also avoids false positives based on minor visual regressions.

This is especially true for accessibility snapshots. To start, tolerances seem to be more reliable when applied to relatively small snapshot images, but accessibility snapshots tend to be fairly large since they include both the view and the legend. Additionally, the text in the legend can change meaningfully and reflect only a small number of pixel changes.

For example, I ran a test of full screen snapshot on an iPhone 12 Pro with two columns of legend. Even an overall tolerance of only `0.0001` (0.01%) was enough to let through a regression where one of the elements lost its `.link` trait (represented by the text "Link." appended to the element's description in the snapshot). But this low a tolerance _wasn't_ enough to handle the GPU rendering differences between a MacBook Pro and a Mac Mini. This is a simplified example since it only uses `overallTolerance`, not `perPixelTolerance`, but we've found many similar situations arise even with the combination.

Some teams have developed infrastructure to allow snapshots to run on the same hardware consistently and have built a developer process around that infrastructure, but many others have accepted tolerances as a necessity today.

## Why create separate "imprecise" variants?

The simplest approach to adding tolerances would be adding the `perPixelTolerance` and `overallTolerance` parameters to the existing snapshot methods, however I feel adding separate methods with an "imprecise" prefix is better in the long run.

The naming is motivated by the idea that **it needs to be very obvious when what you're doing might result in unexpected/undesirable behavior**. In other words, when using one of the core snapshot methods, you should have extremely high confidence that a test passing means there's no regressions. When you use an "imprecise" variant, it's up to you to set your confidence levels according to your chosen tolerances. This is similar to the "unsafe" terminology around memory in the Swift API. You should generally feel very confident in the memory safety of your code, but any time you see "unsafe" it's a sign to be extra careful and not gather unwarranted confidence from the compiler.

Longer term, I'm hopeful we can find alternative comparison algorithms that allow for GPU rendering differences without opening the door to regressions. We can integrate these into the core snapshot methods as long as they do not introduce opportunities for regressions, or add additional comparison variants to iterate on different approaches.
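To make the two tolerances described above concrete, here is a minimal, language-agnostic sketch in Python. It is not the iOSSnapshotTestCase implementation: the function names, the flat list-of-RGBA-tuples image model, and the choice of "per-channel difference as a fraction of 255" are all assumptions made for illustration. It reproduces the shape of the problem in the example: an overall tolerance of `0.0001` lets a tiny but meaningful change through, yet still rejects low-amplitude noise spread over many pixels.

```python
from typing import Sequence, Tuple

Pixel = Tuple[int, int, int, int]  # RGBA, each channel in 0...255 (assumed model)


def pixels_match(a: Pixel, b: Pixel, per_pixel_tolerance: float) -> bool:
    """True when every channel differs by at most the given fraction of 255."""
    return all(abs(ca - cb) / 255 <= per_pixel_tolerance for ca, cb in zip(a, b))


def images_match(
    reference: Sequence[Pixel],
    candidate: Sequence[Pixel],
    per_pixel_tolerance: float = 0.0,
    overall_tolerance: float = 0.0,
) -> bool:
    """True when the fraction of mismatched pixels is within overall_tolerance."""
    if len(reference) != len(candidate):
        return False  # differently sized images never match
    if not reference:
        return True
    mismatched = sum(
        1
        for a, b in zip(reference, candidate)
        if not pixels_match(a, b, per_pixel_tolerance)
    )
    return mismatched / len(reference) <= overall_tolerance


# Hypothetical 100,000-pixel all-white reference image.
WHITE = (255, 255, 255, 255)
reference = [WHITE] * 100_000

# "GPU noise": 5% of pixels are off by a single shade.
gpu_noise = [(254, 254, 254, 255)] * 5_000 + [WHITE] * 95_000

# "Regression": 8 pixels flip to black, standing in for a few glyphs of legend text.
text_change = [(0, 0, 0, 255)] * 8 + [WHITE] * 99_992

print(images_match(reference, gpu_noise, overall_tolerance=0.0001))
print(images_match(reference, text_change, overall_tolerance=0.0001))
print(images_match(reference, gpu_noise, per_pixel_tolerance=0.01))
print(images_match(reference, text_change, per_pixel_tolerance=0.01))
```

In this toy setup the overall tolerance alone fails the noisy image while passing the text change, and the per-pixel tolerance alone does the opposite. Real snapshots are messier (anti-aliased text produces small per-channel differences too), which is consistent with the observation above that no single threshold separates the two cases reliably.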

This adds variants of the `Snapshotting` extensions that allow for imprecise comparisons, i.e. using the `precision` and `perceptualPrecision` parameters.

## Why is this necessary?

Adding precision parameters has been a highly requested feature (see #63) to work around some simulator changes introduced in iOS 13. Historically the simulator has supported CPU-based rendering, giving us very stable image representations of views that we can compare pixel-by-pixel. Unfortunately, with iOS 13, Apple changed the simulator to use exclusively GPU-based rendering, which means that the resulting snapshots may differ slightly across machines (see pointfreeco/swift-snapshot-testing#313).

The negative effects of this were mitigated in SnapshotTesting by adding two precision controls to snapshot comparisons: a **perceptual precision** that controls how close in color two pixels need to be to count as unchanged (using the Lab ΔE distance between colors) and an overall **precision** that controls what portion of pixels between two images need to be the same (based on the per-pixel calculation) for the images to be considered unchanged. Setting these precisions to non-one values enables engineers to record tests on one machine and run them on another (e.g. record new reference images on their laptop and then run tests on CI) without worrying about the tests failing due to differences in GPU rendering.

This is great in theory, but from our testing we've found even the lowest tolerances (near-one precision values) to consistently handle GPU differences between machine types let through a significant number of visual regressions. In other words, there is no magic set of precision values that avoids false negatives based on GPU rendering and also avoids false positives based on minor visual regressions.

This is especially true for accessibility snapshots. To start, tolerances seem to be more reliable when applied to relatively small snapshot images, but accessibility snapshots tend to be fairly large since they include both the view and the legend. Additionally, the text in the legend can change meaningfully and reflect only a small number of pixel changes.

For example, I ran a test of full screen snapshot on an iPhone 12 Pro with two columns of legend. Even a precision of `0.9999` (99.99%) was enough to let through a regression where one of the elements lost its `.link` trait (represented by the text "Link." appended to the element's description in the snapshot). But this high a precision _wasn't_ enough to handle the GPU rendering differences between a MacBook Pro and a Mac Mini. This is a simplified example since it only uses `precision`, not `perceptualPrecision`, but we've found many similar situations arise even with the combination.

Some teams have developed infrastructure to allow snapshots to run on the same hardware consistently and have built a developer process around that infrastructure, but many others have accepted lowering precision as a necessity today.

## Why create separate "imprecise" variants?

The simplest approach to adding tolerances would be adding the `precision` and `perceptualPrecision` parameters to the existing snapshot methods, however I feel adding separate methods with an "imprecise" prefix is better in the long run.

The naming is motivated by the idea that **it needs to be very obvious when what you're doing might result in unexpected/undesirable behavior**. In other words, when using one of the core snapshot variants, you should have extremely high confidence that a test passing means there's no regressions. When you use an "imprecise" variant, it's up to you to set your confidence levels according to your chosen precision values. This is similar to the "unsafe" terminology around memory in the Swift API. You should generally feel very confident in the memory safety of your code, but any time you see "unsafe" it's a sign to be extra careful and not gather unwarranted confidence from the compiler.

Longer term, I'm hopeful we can find alternative comparison algorithms that allow for GPU rendering differences without opening the door to regressions. We can integrate these into the core snapshot variants as long as they do not introduce opportunities for regressions, or add additional comparison variants to iterate on different approaches.
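As a rough illustration of what a Lab ΔE based per-pixel check looks like, here is a small Python sketch. Several things in it are assumptions rather than facts from this thread: it uses the simple CIE76 formula (Euclidean distance in CIELAB) instead of whatever ΔE variant SnapshotTesting actually computes, it assumes sRGB input with a D65 white point, and it assumes a perceptual precision `p` corresponds to a ΔE threshold of `(1 - p) * 100`. The function names are hypothetical.

```python
import math
from typing import Tuple

RGB = Tuple[int, int, int]  # sRGB, each channel in 0...255


def srgb_to_lab(rgb: RGB) -> Tuple[float, float, float]:
    """Convert an sRGB color to CIELAB (D65 white point)."""

    def to_linear(channel: int) -> float:
        c = channel / 255
        return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4

    r, g, b = (to_linear(c) for c in rgb)
    # Linear sRGB -> CIE XYZ
    x = 0.4124564 * r + 0.3575761 * g + 0.1804375 * b
    y = 0.2126729 * r + 0.7151522 * g + 0.0721750 * b
    z = 0.0193339 * r + 0.1191920 * g + 0.9503041 * b

    def f(t: float) -> float:
        delta = 6 / 29
        return t ** (1 / 3) if t > delta ** 3 else t / (3 * delta ** 2) + 4 / 29

    # Normalize by the D65 reference white before applying f
    fx, fy, fz = f(x / 0.95047), f(y / 1.0), f(z / 1.08883)
    return (116 * fy - 16, 500 * (fx - fy), 200 * (fy - fz))


def delta_e76(a: RGB, b: RGB) -> float:
    """CIE76 color difference: Euclidean distance between the two Lab colors."""
    return math.dist(srgb_to_lab(a), srgb_to_lab(b))


def pixels_match_perceptually(a: RGB, b: RGB, perceptual_precision: float) -> bool:
    """Assumed mapping: precision p allows a delta E of up to (1 - p) * 100."""
    return delta_e76(a, b) <= (1 - perceptual_precision) * 100


white = (255, 255, 255)
print(pixels_match_perceptually((254, 254, 254), white, 0.995))
print(pixels_match_perceptually((250, 250, 250), white, 0.995))
print(pixels_match_perceptually((0, 0, 0), white, 0.995))
```

Under these assumptions, a starting value such as `0.995` accepts a one-shade difference from white but rejects a five-shade difference, which shows how narrow the accepted band is at near-one precision values.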

fruitcoder mentioned this issue May 25, 2021

Add precision/tolerance to api #64

Closed

NickEntin mentioned this issue Oct 6, 2022

Update workflow with macOS 11 #104

Closed

NickEntin assigned jhneves Dec 15, 2022

NickEntin added the enhancement Request for a new feature or improvement to an existing feature label Jan 20, 2023

alexey1312 mentioned this issue Mar 18, 2023

Feature/precision parameter #121

Closed

DimitarNestorov mentioned this issue Apr 19, 2023

Profile screen accessibility snapshot callstack-internal/React-Native-Accesibility-Snapshot-Example#2

Merged

NickEntin mentioned this issue Aug 15, 2023

Add imprecise comparison variants to iOSSnapshotTestCase extension #144

Merged

NickEntin linked a pull request Aug 15, 2023 that will close this issue

Add imprecise comparison variants to iOSSnapshotTestCase extension #144

Merged

NickEntin removed a link to a pull request Aug 15, 2023

Add imprecise comparison variants to iOSSnapshotTestCase extension #144

Merged

NickEntin linked a pull request Aug 16, 2023 that will close this issue

Add imprecise comparison variants to SnapshotTesting extension #143

Draft

NickEntin unassigned jhneves Sep 28, 2023

NickEntin mentioned this issue Oct 17, 2023

Using imageWithSmartInvert with isRecording = true/false - fails everytime #127

Closed

NickEntin linked a pull request Oct 24, 2023 that will close this issue

Add imprecise comparison variants to SnapshotTesting extension #143

Draft

NickEntin mentioned this issue Jan 20, 2024

SnapshotTesting: Allow users to specify precision when using accessibilityImage helper #179

Closed

NickEntin added the blocked This issue or pull request requires another change to be made first label Jan 26, 2024
