From fba8de6b91bef7370ffcdd794988b2c0ecefaa62 Mon Sep 17 00:00:00 2001 From: Andreas Reich Date: Tue, 11 Feb 2025 11:30:44 +0100 Subject: [PATCH 1/7] Document testing in Rerun --- CONTRIBUTING.md | 161 ++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 161 insertions(+) diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md index ea608d9b4657..08ec355be0d6 100644 --- a/CONTRIBUTING.md +++ b/CONTRIBUTING.md @@ -108,6 +108,163 @@ To learn about the viewer, run: cargo run -p rerun -- --help ``` +## Tests + +There are various kinds of automated tests throughout the repository. +If not noted otherwise, all tests run automatically on CI; however, their frequency (per PR, on `main`, nightly, etc.) and platform coverage may vary. + +### Rust tests + +```sh +cargo nextest run --all-targets --all-features +cargo test --all-features --doc +``` +or alternatively: +```sh +cargo test --all-targets --all-features +``` + +Runs both unit & integration tests for Rust crates, including the Rerun viewer. + +Tests are written using the standard `#[test]` attribute. + +#### `insta` snapshot tests + +Some of the tests in the `rerun` family of crates are [`insta`](https://docs.rs/insta/latest/insta/) snapshot tests. +These tests work by comparing a textual output of a test against a checked-in reference. + +They run as part of the regular Rust test suite, no extra action is required to include them in a test run. + +#### Image comparision tests + +Some of the tests in the `rerun` family of crates are image comparision tests. +These tests work by rendering an image an image and then comparing it with a checked-in reference image. + +They run as part of the regular Rust test suite, no extra action is required to include them in a test run. + +Comparision tests are driven by [egui_kittest](https://github.com/rerun-io/egui_kittest)'s `Harness::snapshot` method.
+Typically, we use [TestContext](rerun/crates/viewer/re_viewer_context/src/test_context.rs) in order to mock +relevant parts of the Rerun viewer. + +##### Comparing results & updating images + +Each run of the comparision tests will produce new images that are saved to the comparision images. +(typically at `/snapshots`) + +Upon failure, an additional `diff.png` file is added that highlights all differences between the reference and the new image. +To update the reference images, run with the `UPDATE_SNAPSHOTS=1` environment variable set. + +Use `pixi run snapshots` to compare the results of all failed tests in Rerun. + +##### Guidelines for writing image comparision tests + +* avoid! Whenever **possible** prefer regular Rust tests or `insta` snapshot tests over image comparision tests. +* images should… + * …be checked in as LFS file + * …depict exactly what's tested and nothing else + * …have a low resolution to avoid growth in repo size + * …have a low comparision threshold to avoid false positives + +##### Why does CI / another computer produce a different image? + +First check whether the difference is due to a change in enabled rendering features, potentially due to difference in hardware (/software renderer) capabilitites - our setup generally avoids this by enforcing the same set of features, but this may happen nonetheless.
+ +However, for smaller discrepancies that may make the output differ depending on GPU/driver implementation details: +* multi-sample anti-aliasing + * sample placement and sample resolve steps are implementation defined + * alpha-to-coverage algorithm/pattern can wary wildly between implementations + * by default we render without anti-aliasing +* texture filtering + * different implementations may apply different optimizations *even* for simple linear texture filtering +* out of bounds texture access (via `textureLoad`) + * implementations are free to return indeterminate values instead of clamping +* floating point evaluation, for details see [WGSL spec § 15.7. Floating Point Evaluation](https://www.w3.org/TR/WGSL/#floating-point-evaluation). Notably: + * rounding mode may be inconsistent + * floating point math "optimizations" may occur + * depending on output shading language, different arithmetic optimizations may be performed upon floating point operations even if they change the result + * floating point denormal flush + * even on modern implementations, denormal float values may be flushed to zero + * `NaN`/`Inf` handling + * whenever the result of a function should yield `NaN`/`Inf`, implementations may free to yield an indeterminate value instead + * builtin-function function precision & error handling (trigonometric functions and others) +* [partial derivatives (dpdx/dpdx)](https://www.w3.org/TR/WGSL/#dpdx-builtin) + * implementations are free to use either `dpdxFine` or `dpdxCoarse` +* [...] 
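GPU-dependent differences like the ones above are why image comparisons are tolerance-based rather than exact. Conceptually, the comparison boils down to something like the following pure-Rust sketch (an illustration only, not the actual egui_kittest implementation; the function name and signature are made up):

```rust
/// Illustrative sketch: compare two images channel-by-channel, tolerating
/// small per-channel differences and a limited number of "broken" pixels.
fn images_match(a: &[u8], b: &[u8], channel_threshold: u8, max_broken: usize) -> bool {
    assert_eq!(a.len(), b.len(), "images must have the same dimensions");
    let broken = a
        .iter()
        .zip(b)
        .filter(|(x, y)| x.abs_diff(**y) > channel_threshold)
        .count();
    broken <= max_broken
}
```

Raising either tolerance hides real regressions, which is why the guidelines above ask for low thresholds.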
+ + +Whenever you can't avoid these problems there's two types of threshholds you can tweak: +* threshold for when a pixel is considered different (see [`egui_kittest::SnapshotOptions::treshhold`]) +* how many pixels are allowed to differ (see [`HarnessExt::snapshot_with_broken_pixels_threshold`]) + TODO(emilk/egui#5683): this should be natively supported by kittest + +##### Rendering backend + +Just like for drawing the viewer itself, drawing for comparision tests requires a `wgpu` compatible driver. +As of writing comparision tests are only run via Vulkan & Metal. +For CI / headless environments we use a recent version of `llvmpipe` for software rendering on Linux & Windows. +On macOS we currently always use hardware rendering. +For details on how to set this up refer to the [CI setup](./.github/workflows/reusable_checks_rust.yml). + + + +### Python tests + +```sh +pixi run py-test +``` + +The Python SDK is tested using [`pytest`](https://docs.pytest.org/). +Tests are located in the [./rerun_py/tests/](./rerun_py/tests/) folder. + +### C++ tests + +```sh +pixi run cpp-test +``` + +The C++ SDK is tested using [`catch2`](https://github.com/catchorg/Catch2). +Tests are located in the [./rerun_cpp/tests/](./rerun_cpp/tests/) folder. + + +### Snippet comparision tests + +```sh +pixi run -e py docs/snippets/compare_snippet_output.py +``` + +More details in the [README.md](./docs/snippets/README.md). + +Makes sure all of the snippets in the [snippets/](./docs/snippets/) folder are working and yield the same output in all of the supported languages, unless configured otherwise in the [snippets.toml](./docs/snippets/snippets.toml) file. + +### "Roundtrip" tests + +```sh +pixi run ./tests/roundtrips.py +``` + +A set of cross SDK language tests that makes sure that the same logging commands for a select group of archetypes +yield the same output in all of the supported languages. + +Nowadays largely redundant with the snippet comparision tests.
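Both the snippet-comparison and roundtrip tests above boil down to producing one output per SDK language and checking that all outputs are identical. The comparison step itself can be sketched as follows (illustrative only; the real drivers are the Python scripts linked above, and this helper is hypothetical):

```rust
/// Illustrative: given (language, output) pairs, verify all outputs are identical.
fn all_outputs_equal(outputs: &[(String, String)]) -> Result<(), String> {
    // Compare neighbors pairwise; if all neighbors agree, all outputs agree.
    for pair in outputs.windows(2) {
        let (lang_a, out_a) = &pair[0];
        let (lang_b, out_b) = &pair[1];
        if out_a != out_b {
            return Err(format!("output of `{lang_b}` differs from `{lang_a}`"));
        }
    }
    Ok(())
}
```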
+ +### Release checklists + +```sh +pixi run -e examples python tests/python/release_checklist/main.py +``` + +More details in the [README.md](./tests/python/release_checklist/README.md). + +A set of **manual** checklist-style tests that should be run prior to each release. +Introduction of new release checklists should be avoided as they add a lot of friction to the release process, +and failures are easily missed. + +### Other ad-hoc manual tests + +There are various additional test scenes located at [./tests/cpp/](./tests/cpp/), [./tests/python/](./tests/python/) and [./tests/rust/](./tests/rust/). +We generally build those as a CI step, but they are run only irregularly. +See the respective README files for more details. + ## Tools We use the [`pixi`](https://prefix.dev/) for managing dev-tool versioning, download and task running. To see available tasks, use `pixi task list`. @@ -116,6 +273,10 @@ We use [cargo deny](https://github.com/EmbarkStudios/cargo-deny) to check our de Configure your editor to run `cargo fmt` on save. Also configure it to strip trailing whitespace, and to end each file with a newline. Settings for VSCode can be found in the `.vscode` folder and should be applied automatically. If you are using another editor, consider adding good setting to this repository! Depending on the changes you made run `cargo test --all-targets --all-features`, `pixi run py-test` and `pixi run -e cpp cpp-test` locally. +For details see [the test section above](CONTRIBUTING.md#tests). + +It is not strictly required, but we recommend [`cargo nextest`](https://nexte.st/) for running Rust tests as it is significantly faster than `cargo test` and yields much more readable output. +Note, however, that as of writing `cargo nextest` does not yet support doc tests; those need to be run with `cargo test`. ### Linting Prior to pushing changes to a PR, at a minimum, you should always run `pixi run fast-lint`.
This is designed to run From b411fc7c4d48742612bff0a9e77ed5bc66cb58fd Mon Sep 17 00:00:00 2001 From: Andreas Reich Date: Tue, 11 Feb 2025 11:40:06 +0100 Subject: [PATCH 2/7] typos --- CONTRIBUTING.md | 30 +++++++++++++++--------------- 1 file changed, 15 insertions(+), 15 deletions(-) diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md index 08ec355be0d6..60aed8aac32d 100644 --- a/CONTRIBUTING.md +++ b/CONTRIBUTING.md @@ -135,20 +135,20 @@ These tests work by comparing a textual output of a test against a checked-in re They run as part of the regular Rust test suite, no extra action is required to include them in a test run. -#### Image comparision tests +#### Image comparison tests -Some of the tests in the `rerun` family of crates are image comparision tests. +Some of the tests in the `rerun` family of crates are image comparison tests. These tests work by rendering an image an image and then comparing it with a checked-in reference image. They run as part of the regular Rust test suite, no extra action is required to include them in a test run. -Comparision tests are driven by [egui_kittest](https://github.com/rerun-io/egui_kittest)'s `Harness::snapshot` method. -Typically, we use [TestContext](rerun/crates/viewer/re_viewer_context/src/test_context.rs) in order to mock +Comparison tests are driven by [egui_kittest](https://github.com/emilk/egui/tree/master/crates/egui_kittest)'s `Harness::snapshot` method. +Typically, we use [TestContext](./crates/viewer/re_viewer_context/src/test_context.rs) in order to mock relevant parts of the Rerun viewer. ##### Comparing results & updating images -Each run of the comparision tests will produce new images that are saved to the comparision images. +Each run of the comparison tests will produce new images that are saved next to the checked-in reference images. (typically at `/snapshots`) Upon failure, an additional `diff.png` file is added that highlights all differences between the reference and the new image.
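Conceptually, the compare-or-update flow described above works like this (an illustrative pure-Rust sketch of the mechanism using a textual snapshot, not the actual egui_kittest code; the helper and file layout are hypothetical):

```rust
use std::{env, fs, path::Path};

/// Illustrative sketch: compare `actual` against a checked-in reference file,
/// or overwrite the reference when `UPDATE_SNAPSHOTS` is set.
fn check_snapshot(dir: &Path, name: &str, actual: &str) -> Result<(), String> {
    let path = dir.join(format!("{name}.txt"));
    if env::var("UPDATE_SNAPSHOTS").is_ok() {
        // Update mode: write the new output as the reference and succeed.
        fs::create_dir_all(dir).map_err(|e| e.to_string())?;
        return fs::write(&path, actual).map_err(|e| e.to_string());
    }
    // Compare mode: fail if the reference is missing or differs.
    let expected =
        fs::read_to_string(&path).map_err(|e| format!("missing snapshot `{name}`: {e}"))?;
    if expected == actual {
        Ok(())
    } else {
        Err(format!("snapshot `{name}` differs from the reference"))
    }
}
```

The real tests do the same with images, additionally emitting the `diff.png` described above on failure.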
@@ -156,14 +156,14 @@ In order to update reference with the new image, run with `UPDATE_SNAPSHOTS=1` e Use `pixi run snapshots` to compare the results of all failed tests in Rerun. -##### Guidelines for writing image comparision tests +##### Guidelines for writing image comparison tests -* avoid! Whenever **possible** prefer regular Rust tests or `insta` snapshot tests over image comparision tests. +* avoid! Whenever **possible** prefer regular Rust tests or `insta` snapshot tests over image comparison tests. * images should… * …be checked in as LFS file * …depict exactly what's tested and nothing else * …have a low resolution to avoid growth in repo size - * …have a low comparision threshold to avoid false positives + * …have a low comparison threshold to avoid false positives ##### Why does CI / another computer produce a different image? @@ -180,7 +180,7 @@ However, for smaller discrepancies that may make the output differ depending on * implementations are free to return indeterminate values instead of clamping * floating point evaluation, for details see [WGSL spec § 15.7. Floating Point Evaluation](https://www.w3.org/TR/WGSL/#floating-point-evaluation). Notably: * rounding mode may be inconsistent - * floating point math "optimizations" may occur + * floating point math "optimizations" may occur * depending on output shading language, different arithmetic optimizations may be performed upon floating point operations even if they change the result * floating point denormal flush * even on modern implementations, denormal float values may be flushed to zero @@ -192,15 +192,15 @@ However, for smaller discrepancies that may make the output differ depending on * [...] 
-Whenever you can't avoid these problems there's two types of threshholds you can tweak: -* threshold for when a pixel is considered different (see [`egui_kittest::SnapshotOptions::treshhold`]) +Whenever you can't avoid these problems there's two types of thresholds you can tweak: +* threshold for when a pixel is considered different (see [`egui_kittest::SnapshotOptions::threshold`]) * how many pixels are allowed to differ (see [`HarnessExt::snapshot_with_broken_pixels_threshold`]) TODO(emilk/egui#5683): this should be natively supported by kittest ##### Rendering backend -Just like for drawing the viewer itself, drawing for comparision tests requires a `wgpu` compatible driver. -As of writing comparision tests are only run via Vulkan & Metal. +Just like for drawing the viewer itself, drawing for comparison tests requires a `wgpu` compatible driver. +As of writing comparison tests are only run via Vulkan & Metal. For CI / headless environments we use a recent version of `llvmpipe` for software rendering on Linux & Windows. On macOS we currently always use hardware rendering. For details on how to set this up refer to the [CI setup](./.github/workflows/reusable_checks_rust.yml). @@ -226,7 +226,7 @@ The C++ SDK is tested using [`catch2`](https://github.com/catchorg/Catch2). Tests are located in the [./rerun_cpp/tests/](./rerun_cpp/tests/) folder. -### Snippet comparision tests +### Snippet comparison tests ```sh pixi run -e py docs/snippets/compare_snippet_output.py ``` More details in the [README.md](./docs/snippets/README.md). @@ -245,7 +245,7 @@ pixi run ./tests/roundtrips.py A set of cross SDK language tests that makes sure that the same logging commands for a select group of archetypes yield the same output in all of the supported languages. -Nowadays largely redundant with the snippet comparision tests. +Nowadays largely redundant with the snippet comparison tests.
### Release checklists From adfb1904594b4b5546c316f717bdd389ea507b5c Mon Sep 17 00:00:00 2001 From: Andreas Reich Date: Tue, 11 Feb 2025 11:41:27 +0100 Subject: [PATCH 3/7] improve wording --- CONTRIBUTING.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md index 60aed8aac32d..519b6bd726dc 100644 --- a/CONTRIBUTING.md +++ b/CONTRIBUTING.md @@ -163,7 +163,7 @@ Use `pixi run snapshots` to compare the results of all failed tests in Rerun. * …be checked in as LFS file * …depict exactly what's tested and nothing else * …have a low resolution to avoid growth in repo size - * …have a low comparison threshold to avoid false positives + * …have a low comparison threshold to avoid the test passing despite unwanted differences ##### Why does CI / another computer produce a different image? From 8c91fafd55c87f4e0d593132da3a85623966c0da Mon Sep 17 00:00:00 2001 From: Andreas Reich Date: Tue, 11 Feb 2025 11:44:25 +0100 Subject: [PATCH 4/7] fix wording --- CONTRIBUTING.md | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md index 519b6bd726dc..41392140e824 100644 --- a/CONTRIBUTING.md +++ b/CONTRIBUTING.md @@ -169,7 +169,8 @@ Use `pixi run snapshots` to compare the results of all failed tests in Rerun. First check whether the difference is due to a change in enabled rendering features, potentially due to difference in hardware (/software renderer) capabilitites - our setup generally avoids this by enforcing the same set of features, but this may happen nonetheless. -However, for smaller discrepancies that may make the output differ depending on GPU/driver implementation details: +However, smaller discrepancies may be caused by a variety of implementation details depending on the concrete GPU/driver (even between different versions of the same driver). 
+For instance: * multi-sample anti-aliasing * sample placement and sample resolve steps are implementation defined * alpha-to-coverage algorithm/pattern can wary wildly between implementations From c8a342c761c5d48810e5ac05bde774c6fa431dfb Mon Sep 17 00:00:00 2001 From: Andreas Reich Date: Tue, 11 Feb 2025 16:29:48 +0100 Subject: [PATCH 5/7] fix typo, shorter link --- CONTRIBUTING.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md index 41392140e824..d6484130d51e 100644 --- a/CONTRIBUTING.md +++ b/CONTRIBUTING.md @@ -138,7 +138,7 @@ They run as part of the regular Rust test suite, no extra action is required to #### Image comparison tests Some of the tests in the `rerun` family of crates are image comparison tests. -These tests work by rendering an image an image and then comparing it with a checked-in reference image. +These tests work by rendering an image and then comparing it with a checked-in reference image. They run as part of the regular Rust test suite, no extra action is required to include them in a test run. @@ -274,7 +274,7 @@ We use [cargo deny](https://github.com/EmbarkStudios/cargo-deny) to check our de Configure your editor to run `cargo fmt` on save. Also configure it to strip trailing whitespace, and to end each file with a newline. Settings for VSCode can be found in the `.vscode` folder and should be applied automatically. If you are using another editor, consider adding good setting to this repository! Depending on the changes you made run `cargo test --all-targets --all-features`, `pixi run py-test` and `pixi run -e cpp cpp-test` locally. -For details see [the test section above](CONTRIBUTING.md#tests). +For details see [the test section above](#tests). It is not strictly required, but we recommend [`cargo nextest`](https://nexte.st/) for running Rust tests as it is significantly faster than `cargo test` and yields much more readable output. 
Note, however, that as of writing `cargo nextest` does not yet support doc tests; those need to be run with `cargo test`. From 472f7c6d04b4ef57eac190840c3f6051f30f9879 Mon Sep 17 00:00:00 2001 From: Andreas Reich Date: Tue, 11 Feb 2025 16:34:40 +0100 Subject: [PATCH 6/7] [unrelated] remove broken gamedev.net links --- crates/viewer/re_renderer/src/renderer/lines.rs | 4 +--- 1 file changed, 1 insertion(+), 3 deletions(-) diff --git a/crates/viewer/re_renderer/src/renderer/lines.rs b/crates/viewer/re_renderer/src/renderer/lines.rs index e73a3380bcf4..cd38c049ff18 100644 --- a/crates/viewer/re_renderer/src/renderer/lines.rs +++ b/crates/viewer/re_renderer/src/renderer/lines.rs @@ -9,9 +9,7 @@ //! The quads are oriented and spanned in a vertex shader. //! //! It is tempting to use instancing and store per-instance (==quad) data in a instance-stepped vertex buffer. -//! However, GPUs are notoriously bad at processing instances with a small batch size as -[various](https://gamedev.net/forums/topic/676540-fastest-way-to-draw-quads/5279146/) -[people](https://gamedev.net/forums/topic/702292-performance-fastest-quad-drawing/5406023/) +//! However, at least historically GPUs are notoriously bad at processing instances with a small batch size, as people //! [point](https://www.reddit.com/r/vulkan/comments/le74sr/why_gpu_instancing_is_slow_for_small_meshes/) //! [out](https://www.reddit.com/r/vulkan/comments/47kfve/instanced_rendering_performance/) //! […](https://www.reddit.com/r/opengl/comments/q7yikr/how_to_draw_several_quads_through_instancing/).
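The approach this doc comment prefers instead — spanning quads in a vertex shader — derives each quad corner purely from the flat vertex index, with no per-instance vertex buffer. An illustrative sketch of that index math in plain Rust (the real logic lives in the WGSL shader; a triangle list with 6 vertices per quad is an assumption of this sketch):

```rust
/// Illustrative: map a flat vertex index to (quad index, corner position in the unit square),
/// assuming a triangle list with 6 vertices per quad (triangles 0-1-2 and 2-1-3).
fn quad_corner(vertex_index: u32) -> (u32, [f32; 2]) {
    let quad_index = vertex_index / 6;
    let corner_id = match vertex_index % 6 {
        0 => 0,
        1 | 4 => 1,
        2 | 3 => 2,
        _ => 3,
    };
    // Corner id bits encode x (bit 0) and y (bit 1) of the unit quad.
    let corner = [(corner_id & 1) as f32, (corner_id >> 1) as f32];
    (quad_index, corner)
}
```

The shader then fetches the per-quad data for `quad_index` and offsets the corner, avoiding small-batch instancing entirely.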
From 2fb4121d6152266172f417d6bd8cc261cb728b02 Mon Sep 17 00:00:00 2001 From: Andreas Reich Date: Wed, 12 Feb 2025 11:52:38 +0100 Subject: [PATCH 7/7] review improvements, refer to (to be updated) egui_kittest readme --- CONTRIBUTING.md | 52 +++++++------------------------------------------- 1 file changed, 7 insertions(+), 45 deletions(-) diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md index d6484130d51e..c64b0178f627 100644 --- a/CONTRIBUTING.md +++ b/CONTRIBUTING.md @@ -116,12 +116,12 @@ If not noted otherwise, all tests run automatically on CI; however, their frequency ( ### Rust tests ```sh -cargo nextest run --all-targets --all-features -cargo test --all-features --doc +cargo test --all-targets --all-features ``` -or alternatively: +or alternatively (if you've [installed cargo nextest](https://nexte.st/)): ```sh -cargo test --all-targets --all-features +cargo nextest run --all-targets --all-features +cargo test --all-features --doc ``` Runs both unit & integration tests for Rust crates, including the Rerun viewer. @@ -135,6 +135,8 @@ These tests work by comparing a textual output of a test against a checked-in re They run as part of the regular Rust test suite, no extra action is required to include them in a test run. +If their output changes (either intentionally or not), they will fail, and you can review the results by running `cargo insta review` (you first need to install it with `cargo install cargo-insta`). + #### Image comparison tests Some of the tests in the `rerun` family of crates are image comparison tests. @@ -156,47 +158,7 @@ In order to update reference with the new image, run with `UPDATE_SNAPSHOTS=1` e Use `pixi run snapshots` to compare the results of all failed tests in Rerun. -##### Guidelines for writing image comparison tests - -* avoid! Whenever **possible** prefer regular Rust tests or `insta` snapshot tests over image comparison tests.
-* images should… - * …be checked in as LFS file - * …depict exactly what's tested and nothing else - * …have a low resolution to avoid growth in repo size - * …have a low comparison threshold to avoid the test passing despite unwanted differences - -##### Why does CI / another computer produce a different image? - -First check whether the difference is due to a change in enabled rendering features, potentially due to difference in hardware (/software renderer) capabilitites - our setup generally avoids this by enforcing the same set of features, but this may happen nonetheless. - -However, smaller discrepancies may be caused by a variety of implementation details depending on the concrete GPU/driver (even between different versions of the same driver). -For instance: -* multi-sample anti-aliasing - * sample placement and sample resolve steps are implementation defined - * alpha-to-coverage algorithm/pattern can wary wildly between implementations - * by default we render without anti-aliasing -* texture filtering - * different implementations may apply different optimizations *even* for simple linear texture filtering -* out of bounds texture access (via `textureLoad`) - * implementations are free to return indeterminate values instead of clamping -* floating point evaluation, for details see [WGSL spec § 15.7. Floating Point Evaluation](https://www.w3.org/TR/WGSL/#floating-point-evaluation). 
Notably: - * rounding mode may be inconsistent - * floating point math "optimizations" may occur - * depending on output shading language, different arithmetic optimizations may be performed upon floating point operations even if they change the result - * floating point denormal flush - * even on modern implementations, denormal float values may be flushed to zero - * `NaN`/`Inf` handling - * whenever the result of a function should yield `NaN`/`Inf`, implementations may free to yield an indeterminate value instead - * builtin-function function precision & error handling (trigonometric functions and others) -* [partial derivatives (dpdx/dpdx)](https://www.w3.org/TR/WGSL/#dpdx-builtin) - * implementations are free to use either `dpdxFine` or `dpdxCoarse` -* [...] - - -Whenever you can't avoid these problems there's two types of thresholds you can tweak: -* threshold for when a pixel is considered different (see [`egui_kittest::SnapshotOptions::threshold`]) -* how many pixels are allowed to differ (see [`HarnessExt::snapshot_with_broken_pixels_threshold`]) - TODO(emilk/egui#5683): this should be natively supported by kittest +For best practices & unexpected sources of image differences, refer to the [egui_kittest README](https://github.com/emilk/egui/tree/master/crates/egui_kittest#snapshot-testing). ##### Rendering backend