created: 2022-07-21
last updated: 2022-09-07
status: Approved
reviewers:
- @Wyverald
- @oquenchil
- @phst
title: Locating runfiles with Bzlmod
authors:
- @fmeum
With Bazel's new external dependency management system Bzlmod, the actual names of repositories under the output base are no longer static constants, but depend on the result of its dependency resolution algorithm. As a consequence, functionality that relies on the static nature of these identifiers, most notably runfiles libraries and Stardoc, needs to be made aware of the mapping between the names used to refer to repositories and their actual canonical names on disk. This proposal suggests a way to serialize this information into a new type of manifest, as well as changes to rulesets and runfiles libraries that allow end users to locate runfiles under Bzlmod with no or minimal changes to their existing code.
Bazel has had a concept of repository mappings for a long time.
A repository can specify a mapping applied to the repository part of label strings used within this repository.
For example, a repository can specify that every label `@foo//:target` specified in it is actually interpreted as `@bar//:target`.
Here, `foo` is called the apparent repository name and `bar` is called the canonical repository name.
Repository mappings are rarely used in the `WORKSPACE` system for external dependency management and, crucially, all mapping entries are static.
Bazel's new external dependency management system Bzlmod implements "strict deps" checks for repositories by using repository mappings.
Every Bazel module depending on a Bazel module `foo` can specify its own apparent repository name for the repository generated by `foo`.
The canonical name of the repository corresponding to `foo` is an implementation detail of Bzlmod and typically of the form `@foo~1.0.3`, where `1.0.3` is the version of `foo`.
This version is determined by Bzlmod's dependency resolution algorithm and thus not known in advance by any of the modules participating in this resolution.
In contrast to the situation for `WORKSPACE` files, the canonical repository names thus aren't known by the Bazel modules themselves.
The fact that canonical repository names have to be treated as dynamic with Bzlmod poses problems for at least two parts of Bazel that so far rely on repository names being statically known:
- If an executable target `@foo//:exe` in a Bazel module `foo` has a `data` dependency on a file target `@bar//pkg:runfile`, where `bar` is a Bazel module that is a direct dependency of `foo`, then the runfiles path of `@bar//pkg:runfile` will be `<canonical repository name of bar>/pkg/runfile`. In order to locate the file at runtime, `@foo//:exe` needs to pass this path to one of the runfiles libraries. This is a problem since the path contains the version of `bar`, which isn't static, and thus can't be referenced as a constant literal string anymore. In fact, unless `foo` is the main module, the same problem already arises for files in `foo` itself. This issue is tracked by bazelbuild/bazel#15029.
- Stardoc generates documentation from `.bzl` files and for that has to resolve labels in `load` statements to `.bzl` files on disk. Since it doesn't actually evaluate `WORKSPACE` or `MODULE.bazel` files, it has no way of discovering repository mappings in the same way as Bazel. Thus, with Bzlmod, Stardoc fails to generate documentation for any `.bzl` file that loads `.bzl` files from other repositories. This issue is tracked by bazelbuild/bazel#14140 and bazelbuild/stardoc#117.
At runtime, an executable target has to be able to translate an apparent repository name to a canonical repository name. This requires serializing the repository mapping of every transitive dependency of the target that may perform runfiles lookups to a file that can be accessed at runtime, similar to the existing runfiles manifest.
In more formal terms, the complete repository mapping maintained by Bazel is a function `repo_mapping` that maps a pair `(C, A)` of a canonical repository name `C` and an apparent repository name `A` to the canonical repository name `repo_mapping(C, A)` to which `A` resolves when referred to from within `C`.
The repository mapping manifest would then look as follows:
- The repository mapping manifest is an ASCII file named `<executable>.repo_mapping` and placed in the same directory as the target's executable.
- Every triple `(C, A, repo_mapping(C, A))` is written as a single line terminated by `\n`, with the three components separated by `,`. Since neither canonical nor apparent repository names can contain commas, this is unambiguous. As a special case, if `repo_mapping(C, A)` is the empty string (i.e., when the apparent name resolves to the main repository), it is serialized as the workspace name (either the value of the `name` attribute of the `workspace` function or `__main__`) instead. This is necessary since the main repository is stored under this name in the runfiles tree. As a further special case, if `A` is the empty string, which happens only for the main workspace, the corresponding entry is skipped.
- The lines of the manifest are sorted lexicographically. This ordering is equivalent to the lexicographical ordering of the triples comprising the repository mapping.
- The repository mapping manifest for an executable target `T` consists only of those entries `(C, A, repo_mapping(C, A))` for which both of the following conditions are satisfied:
  - `T` transitively depends on a target in `C`;
  - `T`'s runfiles contain an artifact owned by a target in `repo_mapping(C, A)`.

  This property ensures that the repository mapping manifest only contains the entries that are actually needed for the target to resolve its runfiles. If all entries (instead of just this limited set) were emitted into the manifest, all actions depending on `T` would have to rerun whenever any repository containing a transitive dependency of `T` declared a new dependency or changed an apparent repository name, even if the affected repository neither contributes a runfile nor is contained in the transitive closure of `T`.
In order to help runfiles libraries find the repository mapping manifest, it is added to the runfiles manifest (and thus the runfiles directory) under the fixed path `_repo_mapping`.
Since canonical repository names do not start with `_`, there is no risk of this synthetic part being a prefix of an actual runfile.
Consider the following Bazel module:

    # WORKSPACE
    workspace(name = "my_workspace")

    # MODULE.bazel
    module(
        name = "my_module",
        version = "1.2.3",
    )
    bazel_dep(name = "protobuf", version = "3.19.2", repo_name = "my_protobuf")
    bazel_dep(name = "rules_go", version = "0.33.0")

    # BUILD.bazel
    cc_binary(
        name = "some_tool",
        srcs = ["main.cpp"],
        data = [
            "data.txt",
            "@my_protobuf//:protoc",
        ],
        deps = ["@bazel_tools//tools/cpp/runfiles"],
    )

With `my_module` as the root module and Bzlmod enabled, the repository mapping manifest of `//:some_tool` would look as follows:

    ,my_module,my_workspace
    ,my_protobuf,protobuf~3.19.2
    ,my_workspace,my_workspace
    protobuf~3.19.2,protobuf,protobuf~3.19.2
- The manifest contains the mapping to the canonical repository name for the workspace and module name since `//:some_tool` has a data dependency on `//:data.txt` and a direct dependency on a runfiles library.
- The manifest contains the mapping of `my_protobuf` (used by the main module) to the canonical repository name for protobuf since `//:some_tool` has a data dependency on `@@protobuf~3.19.2//:protoc`.
- The manifest contains the mapping of `protobuf` (used by the protobuf module) to the canonical repository name for protobuf since `//:some_tool` has a data dependency on `@@protobuf~3.19.2//:protoc`.
- The manifest does not contain any entries where `C` is `rules_go~0.33.0` since `rules_go` does not contribute a target to the transitive closure of `//:some_tool`.
- The manifest does not contain any entries where `C` is `bazel_tools` since `bazel_tools` does not depend on any module providing runfiles (`protobuf` or the main module).
- The manifest does not contain any entries where `repo_mapping(C, A)` is `rules_go~0.33.0` or `bazel_tools` since these repositories do not contribute runfiles to the transitive closure of `//:some_tool`.
If a module `other_module` depends on `my_module` and contains a target that depends on `@@my_module~1.2.3//:some_tool`, then that target's repository mapping manifest would contain the following lines (among others):

    my_module~1.2.3,my_module,my_module~1.2.3
    my_module~1.2.3,my_protobuf,protobuf~3.19.2
    protobuf~3.19.2,protobuf,protobuf~3.19.2
- Add a new field to `RunfilesProvider` to track the `RepositoryName` and `RepositoryMapping` of transitive dependencies of a given target.
- When creating an instance of `RunfilesProvider`, collect this information for all non-tool dependencies and add the `RepositoryName` and `RepositoryMapping` of the current target. This is very similar to the logic in `InstrumentedFilesCollector#forwardAll`, which forwards information about source files instrumented for coverage. Toolchain dependencies built in the target configuration have to be included as well, since a language runtime may need to look up runfiles at runtime.
- Alternatively, use `RuleConfiguredTargetValue#getTransitivePackagesForPackageRootResolution` if available at the right times.
- Pass the `RepositoryMapping`s to `RunfilesSupport` and let it register a new action that writes the repository mapping manifest for only those repositories that are actually contributing runfiles. This requires maintaining the inverse of each `RepositoryMapping` in a `Multimap`, since repository mappings are not necessarily injective.
- Add the repository mapping manifest artifact to the runfiles middleman and the runfiles source manifest.
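To illustrate why the inverse has to be a multimap rather than a plain map, the following Python sketch inverts the main repository's mapping from the example above, in which both `my_module` and `my_workspace` resolve to the main repository (the empty string). The helper name `invert_repo_mapping` is hypothetical and the sketch is unrelated to Bazel's Java data structures.

```python
from collections import defaultdict


def invert_repo_mapping(mapping):
    """Inverts an apparent-to-canonical mapping of a single repository.

    Returns a dict from canonical name to the sorted list of all apparent
    names that resolve to it, since several apparent names may share one
    canonical name.
    """
    inverse = defaultdict(set)
    for apparent_name, canonical_name in mapping.items():
        inverse[canonical_name].add(apparent_name)
    return {canonical: sorted(names) for canonical, names in inverse.items()}


# Mapping of the main repository in the example above ("" is the main repo).
main_repo_mapping = {
    "my_module": "",
    "my_workspace": "",
    "my_protobuf": "protobuf~3.19.2",
}
inverse = invert_repo_mapping(main_repo_mapping)
print(inverse)
```

Given the set of repositories that contribute runfiles, such an inverse allows looking up every apparent name that has to be written to the manifest for each of them.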
Emitting a new file into the runfiles tree is a backwards-compatible change.
Based on Change 2, runfiles libraries can now apply the repository mapping at runtime. Concretely, runfiles libraries should modify their current lookup procedure described in the original design doc as follows:
1. After the `Create` method of the runfiles library has located either the runfiles directory or the runfiles manifest, it uses either to look up the repository mapping manifest under the fixed runfiles path `_repo_mapping`.

   a. If the runfiles library is shipped with Bazel, it should fail with a descriptive error message if the manifest is not found.

   b. If the runfiles library is not shipped with Bazel, it should fall back to the original lookup procedure. This is necessary to support the use of runfiles libraries with versions of Bazel that do not yet create the manifest.

2. The runfiles library parses the repository mapping manifest and stores a map of `(C, A)` to `repo_mapping(C, A)` for all entries in the manifest.

3. When `Rlocation` is called with a runfiles-root-relative path `rpath`, after validating `rpath`, the runfiles library extracts the first path segment of `rpath`, understood to be an apparent repository name `A`. It also determines `C`, the canonical name of the repository from which it was called, in a language-specific way (see the suggestions in Change 4 below). If `C` can't be determined, the runfiles library sets `C` to the empty string, which corresponds to performing runfiles lookups using the mapping of the main repository. It then looks up the map entry for the key `(C, A)` created in Step 2.

   a. If `(C, A)` is contained in the map, the runfiles library continues the original lookup procedure for the modified path `repo_mapping(C, A) + rpath[rpath.find('/'):]`.

   b. If `(C, A)` is not contained in the map and `A` contains a tilde `~`, the runfiles library continues the original lookup procedure for the unmodified path `rpath`. Otherwise, it emits a descriptive error message.
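The parsing and path rewriting parts of this procedure can be sketched as follows. This is a minimal Python illustration, assuming the manifest format described in Change 1; `parse_repo_mapping` and `apply_repo_mapping` are hypothetical names rather than the API of any existing runfiles library, and both locating the manifest and the original lookup procedure are out of scope.

```python
def parse_repo_mapping(manifest_text):
    """Parses manifest lines of the form `C,A,repo_mapping(C, A)`."""
    mapping = {}
    for line in manifest_text.splitlines():
        if not line:
            continue
        source_repo, apparent_name, target_repo = line.split(",")
        mapping[(source_repo, apparent_name)] = target_repo
    return mapping


def apply_repo_mapping(mapping, rpath, source_repo=""):
    """Rewrites the first segment of `rpath` as seen from `source_repo`.

    `source_repo` defaults to the empty string, which selects the mapping of
    the main repository when the calling repository can't be determined.
    """
    apparent_name, separator, rest = rpath.partition("/")
    target_repo = mapping.get((source_repo, apparent_name))
    if target_repo is not None:
        # Continue the original lookup with the mapped path.
        return target_repo + separator + rest
    if "~" in apparent_name:
        # The first segment already looks like a canonical name.
        return rpath
    raise ValueError(
        f"no repository mapping entry for {apparent_name!r} "
        f"when called from repository {source_repo!r}"
    )


# Manifest of //:some_tool from the example in Change 1.
MANIFEST = (
    ",my_module,my_workspace\n"
    ",my_protobuf,protobuf~3.19.2\n"
    ",my_workspace,my_workspace\n"
    "protobuf~3.19.2,protobuf,protobuf~3.19.2\n"
)
mapping = parse_repo_mapping(MANIFEST)
print(apply_repo_mapping(mapping, "my_protobuf/protoc"))
```

The returned path would then be handed to the existing manifest-based or directory-based lookup unchanged.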
The precise way in which this procedure is implemented depends on the particular implementation of the runfiles library for each language.
For example, the runfiles library for Bash doesn't offer a `Create` method and implicitly looks up the runfiles directory or manifest on each invocation of the `rlocation` function.
The implementation details will have to be handled by the individual libraries' maintainers.
Step 3 falls back to using the empty string as the canonical repository name of the current repository so that existing code performing runfiles lookups can be kept working to the extent possible: while this will not lead to successful runfiles lookups from modules that are used as dependencies of other modules, it will correctly resolve in the typical case of ruleset end users that only look up files in their main repository or direct dependencies of that repository. It is possible that this behavior leads to cases where a runfiles lookup succeeds, but resolves to a different file than intended, when a dependency that hasn't been updated uses a runfiles library. The author believes that the risk of this happening in practice is outweighed by the chance to remain compatible with existing end user codebases.
Step 3 b) falls back to looking up the unmodified runfiles path `rpath` so that alternative approaches for generating post-repo-mapping runfiles paths remain viable.
This includes:
- Passing in runfiles paths from Starlark via environment variables or arguments.
- Generating code constants containing already repo-mapped runfiles paths (see rules_runfiles).
This fallback behavior is not prone to ambiguity: with Bzlmod, canonical names of non-main repositories always contain a `~`, but apparent repository names do not.
End users that do not use Bzlmod will experience no changes when using updated runfiles libraries.
End users that do use Bzlmod currently can't use runfiles libraries successfully in all but the simplest cases (lookups of runfiles contained in the main repository), thus backwards compatibility is not a concern.
The following changes solve the two problems described in the background section of this document based on Changes 1, 2, and 3 described above. However, due to the federated nature of rulesets, which in many cases aren't even released with Bazel, and the wildly different natures of the languages they cover, these changes are at this point only to be treated as suggestions. Further feedback from ruleset and Stardoc maintainers as well as language experts is required to ensure that these changes actually provide the best possible API and work correctly in all cases.
Using the new features described in Part 1 and 2, repository-mapping-aware runfiles lookups can be performed knowing the canonical repository name of the repository containing the code that performs the lookup. The canonical repository name is not an absolute constant known to the end user: it is an implementation detail of Bzlmod and, for example, contains module versions that may change depending on the outcome of the dependency resolution algorithm.
For compiled languages, some amount of language-specific code generation will likely be needed to make that information available at runtime. For example, the following changes could be made to the rules for compiled languages shipped with Bazel:
- C/C++: Unconditionally add a `local_define` of `BAZEL_CURRENT_REPOSITORY` whose value is the canonical repository name of the repository containing the target. Then, require users to pass `BAZEL_CURRENT_REPOSITORY` as an argument to the runfiles library's `Create` method.
- Java: Generate a source file providing a `static final String` constant `com.google.devtools.build.runfiles.RunfilesHelper.CURRENT_REPOSITORY` whose value is the canonical repository name of the repository containing the target. Separately compile this source file with `neverlink = True` and implicitly add it as a compile-time dependency of the target. Since the Java Language Specification mandates that `static final` compile-time constants are inlined, this will ensure that no reference to the `RunfilesHelper` class is emitted into the compilation output of the target. Each target can then reference the same `CURRENT_REPOSITORY` constant and pass it to the `Create` method of `Runfiles`. Since this would require adding two additional actions to each Java target, it will be more efficient to implement this directly in `buildjar`.
For other compiled as well as dynamic languages, the canonical repository name may instead be obtained directly by the runfiles library by parsing the path of the calling source file. For example, Go has `runtime.Caller`, Python has `inspect.getframeinfo(sys._getframe(1)).filename` and Bash has `caller`.
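For Python, parsing the path of the calling source file could look roughly like the sketch below. It rests on an assumption that is not spelled out in this proposal: the caller's source file is located inside the runfiles directory at `<runfiles root>/<repository directory>/...`, with the main repository stored under the workspace name. The function names `repository_from_path` and `current_repository` are hypothetical.

```python
import inspect
import posixpath
import sys


def repository_from_path(source_path, runfiles_root, workspace_name="__main__"):
    """Derives a canonical repository name from a source file path.

    Assumes a layout of <runfiles_root>/<repository directory>/... and
    returns "" (the main repository) for files outside the runfiles root or
    inside the directory named after the workspace.
    """
    relative = posixpath.relpath(source_path, runfiles_root)
    if relative == "." or relative.startswith(".."):
        # Not below the runfiles root: fall back to the main repository.
        return ""
    repository_dir = relative.split("/", 1)[0]
    return "" if repository_dir == workspace_name else repository_dir


def current_repository(runfiles_root, workspace_name="__main__"):
    """Returns the repository name of the code calling this function."""
    caller_file = inspect.getframeinfo(sys._getframe(1)).filename
    return repository_from_path(
        caller_file.replace("\\", "/"), runfiles_root, workspace_name
    )


print(repository_from_path(
    "/out/bin/some_tool.runfiles/protobuf~3.19.2/pkg/lib.py",
    "/out/bin/some_tool.runfiles",
))
```

Falling back to the empty string when the path can't be interpreted matches the fallback behavior described in Change 3.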
Using the new functionality suggested by this proposal, Stardoc can be made aware of repository mappings as follows:
1. Create a new `stardoc_runfiles` rule that consumes `bzl_library` targets and advertises a `DefaultInfo` with all transitive `.bzl` files in `runfiles` and a no-op executable in `executable`.
2. Replace the current `stardoc` rule with a macro that evaluates to both a `stardoc_runfiles` and a `stardoc` target, with the latter referencing the former and adding it to the `tools` of the actual Stardoc action. This stages all `.bzl` files as runfiles and, due to 1., ensures that they are covered by the generated `.repo_mapping` manifest.
3. At runtime, parse the runfiles path of a root `.bzl` file to obtain its canonical repository name and use the Java runfiles library to look up the paths of dependencies according to this repository's repository mapping.
Steps 1 and 2 could alternatively be replaced by implementing and using one of the new features proposed in bazelbuild/bazel#15486.