Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[native_assets_builder] Report all dependencies (or none and mandate to always be invoked) #1770

Open
mkustermann opened this issue Nov 29, 2024 · 6 comments · May be fixed by #1780
Open

[native_assets_builder] Report all dependencies (or none and mandate to always be invoked) #1770

mkustermann opened this issue Nov 29, 2024 · 6 comments · May be fixed by #1780
Assignees

Comments

@mkustermann
Copy link
Member

The dart builder returns BuildResult.dependencies and LinkResult.dependencies. Those dependencies are the ones reported by the hooks themselves - which are dependencies that should cause re-run of hooks if changed.

Though it seems that the source files of the hooks isn't reported to the bundling tool at all today.

=> The purpose of the dependencies is to tell a bundling tool when it needs to re-run the build/link steps.
=> The bundling tool needs to know all dependencies, including hook sources, dart sdk version, ... - anything that if changed should re-trigger of hooks

Alternatively one could say bundling tools always have to re-run the dart build and it may detect it's a NOP quickly and return.

=> If we want that, its questionable whether we should report dependencies at all to the bundling tool.
=> Then a bundling tool probably wants to know an additional piece of information, namely whether calling the dart build resulted in a NOP build or it actually changed the assets.

@mkustermann
Copy link
Member Author

That brings up another interesting topic: We report dependencies as List<Uri> to the bundling tool. That means the bundling tool can only take advantage of that to avoid re-running the dart build by using file stamps (e.g. it may have remembered the last dart build's timestamp and will see if any of those deps are newer than the start of the last dart build invocation). It cannot take advantage of hashing-based cache invalidation.

=> Then we have are in a very awkward spot: Even though dart builds use content-based hashing to determine whether rebuilding is needed, users of the dart build may use time stamps and therefore the system doesn't actually guarantee we re-run when content changes.

If we wanted to have hashing-based caching consistently across the system then we would need to tell the bundling tool Set<({Uri file, String hash})> then it can do the same validation that is done internally.

=> Another aspect is that if bundling tool uses reported dependencies to avoid re-running the dart build, we will, if something changed, do the same dependency validation again in the package:native_assets_builder, so we check all the deps twice (and if we use hashing, we hash all the files twice).

@mkustermann
Copy link
Member Author

/cc @dcharkes

@mkustermann
Copy link
Member Author

As another comment on this caching business: I have some worries that using content-based hashes for everything will make this system overall very slow - especially concerning in hot-reload/hot-restart developer flow where we should keep this in the <=100s ms (and we'll have to perform this did-anything-chance every single time on all the deps of all assets & hook sources)

We could contemplate to seggregate this, e.g.:

  • ignore some dependencies entirely: e.g. files that aren't writiable by the current user (e.g. system headers like /usr/include/*)
  • use timestamps for dependencies that are very unlikely to be updated and if get updated should get new timestamps (~/.pub-cache/*, source files (**), ...)
  • use content based hashing only for things that are risk of having new content without getting modified timestamp updated.

Do we even have strong evidence that we need content-based file-change detection?

This may be an aspect that can influence protocol as the hooks report dependencies (if we did do segregation that may be visible in the protocol). Then hooks that e.g. download files and set an old timestamp could explicitly opt into content-based hashing.

(**) Very intentionally git for example will always touch (aka modify) files when changing their content. Even when checking out old branches it will make the files have newest timestamp. It does so because many incremental build systems only rely on file stamps.

@dcharkes
Copy link
Collaborator

Though it seems that the source files of the hooks isn't reported to the bundling tool at all today.

Right, DartBuildResult has dependencies, but the NativeAssetsBuildRunner kind of assumes it is always invoked and it itself does the caching.

Besides the Dart sources of the hooks, the dependencies should also include a way to detect that a new package with a hook has been introduced. (Very similar to how we use directories as dependencies to indicate a new asset could have appeared.) Maybe it should depend on the packages_config.json, but then if there is a path dependency on another package that doesn't contain a hook, and someone introduces a hook, that will be missed. So we'll need to add all the $pkgName/hook/ directories of all packages to dependencies.

Another thing that dependencies should account for is environment variables. This is not supported in Flutters' Targets at the moent. So, that would require some refactoring in Flutter.

Alternatively one could say bundling tools always have to re-run the dart build and it may detect it's a NOP quickly and return.

This didn't use to be possible from a Flutter Targets perspective. It causes all dependent Targets to be rerun which included Dart kernel compilation before my recent refactoring, which caused regressions in benchmarks (e.g. flutter run, ctrl+c, flutter run became much slower). Now it only includes the copying the assets bundle. Not sure how fast/slow that is. And also not sure if the flutter Targets change detection can stop propagating changes if a previous Target did not make any changes at the end.

This is worth exploring.

If we wanted to have hashing-based caching consistently across the system then we would need to tell the bundling tool Set<({Uri file, String hash})> then it can do the same validation that is done internally.

Right, the best the bundling tool can do is (1) hash after the fact, and (2) check that none of the files were modified after the bundling tool started its build. Which is rather wasteful, as we already do this inside native_assets_builder.

It would be more efficient to indeed report hashes as well. (And it means that changing the hashing algorithm is a change that needs a manual roll.)

Do we even have strong evidence that we need content-based file-change detection?

Yes. It doesn't work properly on windows with timestamps. Especially with downloaded files that have timestamps in the past. That's why Flutter uses content hashes.

I have some worries that using content-based hashes for everything will make this system overall very slow - especially concerning in hot-reload/hot-restart developer flow where we should keep this in the <=100s ms (and we'll have to perform this did-anything-chance every single time on all the deps of all assets & hook sources)

This is exactly why Flutter uses timestamps for hot reload and accepts it's not fully correct. And uses hashing for non-hot-reload.
This is an argument for not blindly treating the invocation for hot-reload the same as non-hot-reload and only passing in buildAssetTypes = ['data'] as was our last idea in #1207.

(**) Very intentionally git for example will always touch (aka modify) files when changing their content. Even when checking out old branches it will make the files have newest timestamp.

@blaugold and I had the idea that we could touch all assets reported on a build hook. The problem though is that it's unclear when to not touch them, if you touch them every time the hook runs, you will never have caching. And it kind of goes against the idea that config.outputDirectory is untouched by the native_assets_builder so that the hook authors can rely on that when implementing their hooks to do more efficient runs on multiple subsequent invocations (#1375).

It does so because many incremental build systems only rely on file stamps.

Some incremental build systems. For example make does, but ninja does not. See my investigation in:

Reflecting on this, I think we have two options as you mentioned:

  1. Bundling tools must always have to re-run the dart build and it may detect it's a NOP quickly and return. - If this at all possible in Flutter tools.
  2. Also report (a) the Dart sources, (b) the package_config, (c) all non-hook-package hook directories; and (d) extend Flutter tools to be able to consume a list of environment variables that is being listened to.

Side note on (d) @mosuem suggested instead of pre-defining the list of env vars we pass in, we could also have the hook output the env vars it used in it's output.

@dcharkes dcharkes added this to the Native Assets v1.0 milestone Nov 29, 2024
@dcharkes dcharkes changed the title Make dart builder report dependencies from hooks themselves [native_assets_builder] Report all dependencies (or none and mandate to always be invoked) Nov 29, 2024
@mkustermann
Copy link
Member Author

Some incremental build systems. For example make does, but ninja does not. See my investigation in:

#1593 (comment)

Actually that issue says: "last-modified timestamp & last-build-started timestamp ...This is the strategy used by make and ninja"

This is an argument for not blindly treating the invocation for hot-reload the same as non-hot-reload and only passing in buildAssetTypes = ['data'] as was our last idea in #1207.

Not sure if it's related. Asking a package for data assets only instead of data assets and code assets is one thing. How dependencies are handled is another thing. A bundling tool (such as flutter) could always be less precise (e.g. imagine we have BuildResult.dependencies being List<({Uri file, String hash})>, it could still decide to only use timestamps).

Why do you see this as related to buildAssetTypes = ['data']?

Also report (a) the Dart sources,

If the hook itself changes (e.g. adding another data asset may require modifying it) we should re-run.
I could be convinced that stuff in ~/.pub-cache is immutable and doesn't need to be part of deps (if we have evidence this speeds things up significantly, then it's something we can consider).

(b) the package_config,

I would assume that the package config is already included because when compiling a hook to kernel the CFE will have to read the package config to resolve package imports. So I'd assume it already emits .dart_tool/package_config.json as part of the dependencies, does it not do that?

(c) all non-hook-package hook directories

Yes, we should detect if a hook/build.dart is added and handle it in hot-reload. If we didn't do it, it's an inconvenience but may not be the end of the world - as it's not something that happens too often. But modifying an existing hook to e.g. emit another data asset or dropping a new file into asset/ folder are more common and have to work.

and (d) extend Flutter tools to be able to consume a list of environment variables that is being listened to.

If we restrict the env variables (which I suggested we do) then we won't re-run the whole world if another env variable is set by the user.

I'd say the importance of re-running (if the env vars we allow through change) would be more important for cold runs (i.e. new flutter run invocation) and less important for reload runs. (I'd assume env vars can affect the native build of plugins as well and if flutter doesn't handle that already, then us not handling it is around the same).

Maybe we can put the short cuts / handling of hot-reload into flutter tools entirely?

@dcharkes
Copy link
Collaborator

Actually that issue says: "last-modified timestamp & last-build-started timestamp ...This is the strategy used by make and ninja"

Ah, it was Bazel. (Remind myself to actually check things instead of doing from memory...)

This is an argument for not blindly treating the invocation for hot-reload the same as non-hot-reload and only passing in buildAssetTypes = ['data'] as was our last idea in #1207.

Not sure if it's related. Asking a package for data assets only instead of data assets and code assets is one thing. How dependencies are handled is another thing. A bundling tool (such as flutter) could always be less precise (e.g. imagine we have BuildResult.dependencies being List<({Uri file, String hash})>, it could still decide to only use timestamps).

For the completely-nothing-changed scenario, it doesn't matter if we do file hashes inside native_assets_builder if we don't invoke it at all from flutter_tools and it decides to use timestamps.

However, if we want the hook-just-passes-through-data-assets-on-disk scenario and one of those files changed to be also blazing fast. Then it potentially matters that we do file-content-hashing inside native_assets_builder. Then native_assets_builder itself should maybe have a param that controls whether its file hashing or timestamps.

(b) the package_config,

I would assume that the package config is already included because when compiling a hook to kernel the CFE will have to read the package config to resolve package imports. So I'd assume it already emits .dart_tool/package_config.json as part of the dependencies, does it not do that?

Apparently not 🙈 dart-lang/sdk#59637

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants