Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Avoid printing full file paths in cranelift generated code #9553

Open
nmattia opened this issue Nov 5, 2024 · 3 comments · May be fixed by #9580
Open

Avoid printing full file paths in cranelift generated code #9553

nmattia opened this issue Nov 5, 2024 · 3 comments · May be fixed by #9580

Comments

@nmattia
Copy link

nmattia commented Nov 5, 2024

Feature

It would be great to avoid printing full paths of source files in the generated code.

Benefit

This would allow for better or more correct caching in third party build systems like Bazel or Nix.

Implementation

  1. The naive solution is to remove references to the source files:
diff --git a/cranelift/codegen/meta/src/srcgen.rs b/cranelift/codegen/meta/src/srcgen.rs
index d3c321e5b..5b94ddd19 100644
--- a/cranelift/codegen/meta/src/srcgen.rs
+++ b/cranelift/codegen/meta/src/srcgen.rs
@@ -94,7 +94,6 @@ impl Formatter {
         directory: &std::path::Path,
     ) -> Result<(), error::Error> {
         let path = directory.join(&filename);
-        eprintln!("Writing generated file: {}", path.display());
         let mut f = fs::File::create(path)?;
 
         for l in self.lines.iter().map(|l| l.as_bytes()) {
diff --git a/cranelift/isle/isle/src/codegen.rs b/cranelift/isle/isle/src/codegen.rs
index 158285832..d292e43c0 100644
--- a/cranelift/isle/isle/src/codegen.rs
+++ b/cranelift/isle/isle/src/codegen.rs
@@ -127,9 +127,6 @@ impl<'a> Codegen<'a> {
             "// Generated automatically from the instruction-selection DSL code in:",
         )
         .unwrap();
-        for file in &self.files.file_names {
-            writeln!(code, "// - {file}").unwrap();
-        }
 
         if !options.exclude_global_allow_pragmas {
             writeln!(
@@ -641,12 +638,11 @@ impl<L: Length, C> Length for ContextIterWrapper<L, C> {{
                             stack.push((Self::validate_block(ret_kind, body), "", scope));
                         }
 
-                        &ControlFlow::Return { pos, result } => {
+                        &ControlFlow::Return { pos: _pos, result } => {
                             writeln!(
                                 ctx.out,
-                                "{}// Rule at {}.",
+                                "{}",
                                 &ctx.indent,
-                                pos.pretty_print_line(&self.files)
                             )?;
                             write!(ctx.out, "{}", &ctx.indent)?;
                             match ret_kind {
  1. The behavior could be enabled/disabled based on a crate feature or environment variable.
  2. The path could be made relative to CARGO_MANIFEST_DIR; alternatively only the basename of the source file could be retained and written to the generated files.

Not very familiar with the crate so can't make an educated decision! But this would be a nice improvement in the way of system-independent reproducible builds.

@nmattia nmattia changed the title Avoid printing full file paths in generated code Avoid printing full file paths in cranelift generated code Nov 5, 2024
nmattia added a commit to dfinity/ic that referenced this issue Nov 5, 2024
This includes a couple of fixes to make the replica build more
deterministic.

This in theory allows anyone building `//rs/replica` with Bazel in the
Docker image to get a bit-for-bit reproducible replica executable, and
this should also improve Bazel cache hit rates.

This fixes for the following issues:

* Non-reproducible `jemalloc` build: the `tikv-jemalloc-sys` crate
  vendors `jemalloc` and builds it as part of `build.rs`. Unfortunately
  that build in not deterministic. To make it more deterministic we
  build `jemalloc` separately, which also speeds up rebuilds as the C
  code does not need to be rebuilt when rust versions change. We only
  enable this in Linux; this also includes a patch to support this in
  `rules_rust`: bazelbuild/rules_rust#2981

* Non-reproducible `rules_rust` build log: this includes a backport of a
  fix that disables build logs in `rules_rust` by default (backported
  because our build is not compatible with the latest `rules_rust`)
  bazelbuild/rules_rust#2974

* Non-reproducible make & pkgconfig toolchains: some toolchains packaged
  by `rules_foreign_cc` cause build determinism issues so instead we use
  the ones installed in the container and remove build logs: bazel-contrib/rules_foreign_cc#1313

* Non-reproducible obj file generation in cc-rs: the `cc-rs` crate used
  in many C builds, including the (ASM) build of `ring`'s crypto bits,
  generates object files that include the Bazel sandbox full path:
  rust-lang/cc-rs#1271

* Non-reproducible codegen: `cranelift-isle` and
  `cranelift-codegen-meta` include references to source files as
  absolute paths that include the Bazel sandbox path:
  bytecodealliance/wasmtime#9553
nmattia added a commit to dfinity/ic that referenced this issue Nov 5, 2024
This includes a couple of fixes to make the replica build more
deterministic.

This in theory allows anyone building `//rs/replica` with Bazel in the
Docker image to get a bit-for-bit reproducible replica executable, and
this should also improve Bazel cache hit rates.

This fixes for the following issues:

* Non-reproducible `jemalloc` build: the `tikv-jemalloc-sys` crate
  vendors `jemalloc` and builds it as part of `build.rs`. Unfortunately
  that build in not deterministic. To make it more deterministic we
  build `jemalloc` separately, which also speeds up rebuilds as the C
  code does not need to be rebuilt when rust versions change. We only
  enable this in Linux; this also includes a patch to support this in
  `rules_rust`: bazelbuild/rules_rust#2981

* Non-reproducible `rules_rust` build log: this includes a backport of a
  fix that disables build logs in `rules_rust` by default (backported
  because our build is not compatible with the latest `rules_rust`)
  bazelbuild/rules_rust#2974

* Non-reproducible make & pkgconfig toolchains: some toolchains packaged
  by `rules_foreign_cc` cause build determinism issues so instead we use
  the ones installed in the container and remove build logs: bazel-contrib/rules_foreign_cc#1313

* Non-reproducible obj file generation in cc-rs: the `cc-rs` crate used
  in many C builds, including the (ASM) build of `ring`'s crypto bits,
  generates object files that include the Bazel sandbox full path:
  rust-lang/cc-rs#1271

* Non-reproducible codegen: `cranelift-isle` and
  `cranelift-codegen-meta` include references to source files as
  absolute paths that include the Bazel sandbox path:
  bytecodealliance/wasmtime#9553
github-merge-queue bot pushed a commit to dfinity/ic that referenced this issue Nov 5, 2024
This includes a couple of fixes to make the replica build more
deterministic.

This in theory allows anyone building `//rs/replica` with Bazel in the
Docker image to get a bit-for-bit reproducible replica executable, and
this should also improve Bazel cache hit rates.

This fixes for the following issues:

* Non-reproducible `jemalloc` build: the `tikv-jemalloc-sys` crate
vendors `jemalloc` and builds it as part of `build.rs`. Unfortunately
that build in not deterministic. To make it more deterministic we build
`jemalloc` separately, which also speeds up rebuilds as the C code does
not need to be rebuilt when rust versions change. We only enable this in
Linux; this also includes a patch to support this in `rules_rust`:
bazelbuild/rules_rust#2981

* Non-reproducible `rules_rust` build log: this includes a backport of a
fix that disables build logs in `rules_rust` by default (backported
because our build is not compatible with the latest `rules_rust`)
bazelbuild/rules_rust#2974

* Non-reproducible make & pkgconfig toolchains: some toolchains packaged
by `rules_foreign_cc` cause build determinism issues so instead we use
the ones installed in the container and remove build logs:
bazel-contrib/rules_foreign_cc#1313

* Non-reproducible obj file generation in cc-rs: the `cc-rs` crate used
in many C builds, including the (ASM) build of `ring`'s crypto bits,
generates object files that include the Bazel sandbox full path:
rust-lang/cc-rs#1271

* Non-reproducible codegen: `cranelift-isle` and
`cranelift-codegen-meta` include references to source files as absolute
paths that include the Bazel sandbox path:
bytecodealliance/wasmtime#9553
@cfallin
Copy link
Member

cfallin commented Nov 5, 2024

Hi @nmattia -- thanks for filing this issue!

I don't think we will remove the paths altogether -- let me explain a bit more why. We emit these paths in the generated Rust code to help us debug and to see where rule-matching cases are defined in the original source. It was an explicit design goal for the output of the ISLE metacompiler to be relatively comprehensible to humans. The ability to refer back to source locations is useful for this.

That said, I'd be happy to review a PR that makes the paths relative to the source root (your option 3).

I'm also a bit curious about your use-case. You say "more correct caching" -- do you mean, higher cache hit-rate for a cache shared across many builds at different paths? We have not made it an explicit design goal to avoid, e.g., absolute paths in comments, though we do take care to ensure the final built code is semantically deterministic (i.e. our only variances should be in comments). I think it'd be reasonable to aim for this tighter intermediate-textual-form definition of determinism, just want to clarify the exact requirement.

@fitzgen
Copy link
Member

fitzgen commented Nov 6, 2024

FWIW, I think we do (or did at one point do) this kind of prefix removal in cranelift's build script or the meta crate.

@nmattia
Copy link
Author

nmattia commented Nov 7, 2024

We emit these paths in the generated Rust code to help us debug

That makes complete sense!

I'd be happy to review a PR that makes the paths relative to the source root

I'll prepare something and send it your way then :)

higher cache hit-rate for a cache shared across many builds at different paths?

Yes, that's right! Some build systems read and hash inputs to see if any of the outputs that depend on said inputs need to be rebuilt -- unlike Make for instance, which uses mtime. So when some of those inputs (generated code with changing comments) change, the previously built outputs are discarded and need to be rebuilt, even though the changed comments should not have an impact on the actual built code.

@nmattia nmattia linked a pull request Nov 7, 2024 that will close this issue
alin-at-dfinity pushed a commit to dfinity/ic that referenced this issue Nov 7, 2024
This includes a couple of fixes to make the replica build more
deterministic.

This in theory allows anyone building `//rs/replica` with Bazel in the
Docker image to get a bit-for-bit reproducible replica executable, and
this should also improve Bazel cache hit rates.

This fixes for the following issues:

* Non-reproducible `jemalloc` build: the `tikv-jemalloc-sys` crate
vendors `jemalloc` and builds it as part of `build.rs`. Unfortunately
that build in not deterministic. To make it more deterministic we build
`jemalloc` separately, which also speeds up rebuilds as the C code does
not need to be rebuilt when rust versions change. We only enable this in
Linux; this also includes a patch to support this in `rules_rust`:
bazelbuild/rules_rust#2981

* Non-reproducible `rules_rust` build log: this includes a backport of a
fix that disables build logs in `rules_rust` by default (backported
because our build is not compatible with the latest `rules_rust`)
bazelbuild/rules_rust#2974

* Non-reproducible make & pkgconfig toolchains: some toolchains packaged
by `rules_foreign_cc` cause build determinism issues so instead we use
the ones installed in the container and remove build logs:
bazel-contrib/rules_foreign_cc#1313

* Non-reproducible obj file generation in cc-rs: the `cc-rs` crate used
in many C builds, including the (ASM) build of `ring`'s crypto bits,
generates object files that include the Bazel sandbox full path:
rust-lang/cc-rs#1271

* Non-reproducible codegen: `cranelift-isle` and
`cranelift-codegen-meta` include references to source files as absolute
paths that include the Bazel sandbox path:
bytecodealliance/wasmtime#9553
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants