Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[native_assets_cli] Input.outputDirectorySystemShared #1962

Open
mkustermann opened this issue Jan 31, 2025 · 6 comments
Open

[native_assets_cli] Input.outputDirectorySystemShared #1962

mkustermann opened this issue Jan 31, 2025 · 6 comments

Comments

@mkustermann
Copy link
Member

Currently the build/link hooks are given two output directories:

  • outDir: Unique to the configuration and package
  • outDirShared: Unique to the package

Both of those are located in the .dart_tool folder that the developer is working on. We can leave it like this, but I just wanted to file this issue so we could discuss it a bit:

Once our ecosystem uses hooks pervasively in many packages, we really care about it being fast: Running & Building apps should be fast.

It does raise a few questions:

  • What if someone upgrades Flutter SDK or Dart SDK: It will cause rerunning of all hooks, but the hooks are invoked with the same directories, making it an incremental build (afaik).

    This is probably what we want, as the particular version of the Flutter/Dart SDK used to invoke the hook shouldn't affect the hook output much (if at all), so we really want incremental build here.

  • What if someone upgrades dependencies: It will cause rerunning of those dependencies, but the hooks are invoked with the same directories, making it an incremental build (afaik)

    => There's a question about expectations here: Will hook authors of packages write the hooks in a way that allows them to have "left over stuff" from different (most likely older) version of the package. So e.g. user has package:crypto dependency 1.1.0, runs pub upgrade and gets 2.0.0. Will the hook from 2.0.0 be disturbed by finding old build results from 1.1.0?
    => One could consider making the package directory also include the version number, but not sure if this makes sense.
    => The hook author could of course make a sub directory with the version number to avoid this.

  • What if someone builds app A and then app B: This will currently cause a re-build from scratch as the two apps will have different .dart_tool folders which don't share hook outputs. Not only will the user pay the price in terms of waiting time (for build hooks to run) but also in disc space.

    => Let's assume developers will have reasonably many .dart_tool folders on their machine (at least I do).
    => Let's further assume that very common packages will start using hooks (imagine a future package:io replacing dart:io, package:http, ...)
    => Let's further assume those common packages's build output will be MBs in size.

    (When actually building an app (e.g. dart build, flutter build) one then copies those binaries again around, ending up with more duplicates)

That will probably lead to a lot of wasted disc space (and possibly waiting time, network bandwith used, ...).

So I'm wondering if we should have a config.outDirSystemShared which is a system-wide per-package directory where the package's hook can store things. Those directories would be shared across all uses of the hook. So a package:io would store it's downloaded shared libraries into one place on the system and all hook invocations of package:io versions in various .dart_tool app folders will all just emit assets from this shared directory (which may have version number of package:io in it).

Another angle to view this is: The particular SDK (or other user-machine specifics) in question shouldn't matter for building artifacts of a package. One may want to build artifacts

  • as part of releasing a package (e.g. as git workflow hook)
  • on the package hosting server
  • on the user machine when a package is put into the pub cache

All of these workflows would have one thing in common:

  • They wouldn't be dependent on the particular .dart_tool/package_config.json of a particular app (which they shouldn't depend on anyway)
  • They may build the package's assets with a separate version resolution

=> We probably would do the build always this way if pub had build_dependencies.

So given that, I'm wondering how much it makes sense to have the directories per-app and/or whether we should have an additional config.outDirSystemShared (e.g. ~/.cache/dart/native_assets/<pkg-version>/).

Of course hook authors can do that themselves (just write to some place in the home directory), but:

  • if we ever did want to have sandboxing, we'd want the hooks to only write into places we tell it to (i.e. the output directories we give it)
  • different packages may put it into different places, making it non-uniform

The arguments whether to have it system-wide or not is somewhat similar to whether one may want to have a system-wide ~/.pub-cache or one per app

/cc @dcharkes @jonasfj

@dcharkes
Copy link
Collaborator

Good idea input.outDirSystemShared. 👍

Both of those are located in the .dart_tool folder that the developer is working on.

Just to clarify, this is the workspace .dart_tool (which can be a dart tool directory in a parent directory).

  • => There's a question about expectations here: Will hook authors of packages write the hooks in a way that allows them to have "left over stuff" from different (most likely older) version of the package. So e.g. user has package:crypto dependency 1.1.0, runs pub upgrade and gets 2.0.0. Will the hook from 2.0.0 be disturbed by finding old build results from 1.1.0?

If you upgrade package:cryptofrom 1.1.0 to 1.1.1 you might not even have a new binary. So if you have a hook that downloads and it checks the hash of the file it should reuse the cache. If the hash differs, it should redownload. I think this is working as intended.

Another angle to view this is: The particular SDK (or other user-machine specifics) in question shouldn't matter for building artifacts of a package.

I believe that to be the default case, but not the general case.

Examples for exceptions:

  • package:intl4x wants a user-define to have fetch mode or build local mode
  • Some dependency of yours has a native lib, you somehow have a crash in that lib, the owner responds with could you please add some-package-user-define-debug-mode=true in the user-defines.

If we design the whole system with that user-machine specifics shouldn't matter for building artifacts of a package, then also package resolution and the resulting package_config.json should be disallowed to have an influence. In that case we should have a separate pub spec for hooks, and that pubspec must pin all dependencies.

=> We probably would do the build always this way if pub had build_dependencies.

Exactly.

But there are considerable downsides to that approach as well. The code in your hook/ is not running against the same package resolution as the code in lib/. Your hook should then also not import from lib/. @jonasfj Made a good argument yesterday that you want a single package resolution that is used for all the Dart files in a workspace.

I think it might be good enough if hook authors decide whether their assets depend on the resolution in package_config.json with the specific versions of their dependencies or not. Instead of us enforcing it.

So I'm wondering if we should have a config.outDirSystemShared which is a system-wide per-package directory where the package's hook can store things. Those directories would be shared across all uses of the hook. So a package:io would store it's downloaded shared libraries into one place on the system and all hook invocations of package:io versions in various .dart_tool app folders will all just emit assets from this shared directory (which may have version number of package:io in it).

👍

config.outDirSystemShared ~/.cache/dart/hooks/build/<pkg>/<version>/output

It should document it's shared across the system, but not across versions.
(I think for WIP, instead of pub-published versions, it would probably work if we simply use the whatever version is currently the wip version in the pubspec. For WIP cache invalidation should be working as intended.)

What do we do with user-defines though? #39

  • If there are more than 0 user-defines we invalidate the global cache? (That would be undesirable.)
  • Or if there are more than 0 user-defines, we don't provide the outDirSystemShared?
  • Or do we want to put the user-defines as part of the cache directory? (That would likely lead to many cache directories again, so probably no.)
  • Other options?

(Off-topic: This probably also means we want the user-defines to have a target package and only make them available in those build and link hooks, so that we don't invalidate the cache for all other hooks.)

And with input.outputDirectory, input.outputDirectoryShared and input.outputDirectorySystemShared we might make config.outputDirectory({...}). (Then we could for example return the non-shared output directory if there are any user-defines.)

  • The output directory is shared yes/no across different asset types (data/code) and different (code-configs). (I don't know if we still need this though.)
  • The output directory is shared yes/no outside the workspace.

Note that link hooks would never be shared. They always depend on the entry point with tree shaking enabled. So they shouldn't even be shared in the workspace (#1956).

@dcharkes dcharkes changed the title Discussion around output directories [native_assets_cli] Input.outputDirectorySystemShared Jan 31, 2025
@dcharkes dcharkes added this to the Native Assets v1.0 milestone Jan 31, 2025
@dcharkes
Copy link
Collaborator

dcharkes commented Feb 3, 2025

Notes from discussion with @mkustermann @mosuem @liamappelbe and @HosseinYousefi:

  • We also have version sharing/isolation to think about.
    • This brings us to 8 options with the cross product.
  • We can put this in the protocol or let the hook make subdirs itself for:
    • config (it can do hashing itself)
    • versions (we'd need to add the current version of the package to the protocol though!)
    • But it's a foot gun if we don't provide an API that lets users do this easily.
  • We lose the ability to remove .dart_tool to reset native assets.
  • We'd still do the input.json and output.json workspace, config, and version isolated.
    • So a build hook would always be reinvoked if anything changes; and not be reinvoked on an identical config in an identical workspace with identical versions.
    • And a hook will always be reinvoked in a new project. (The hook invoker doesn't know whether the hook only depends on the system shared output.) The hook should internally cache to not redownload.
    • User defines ([native_assets_cli] User defines #39) will input that specific input.json.
      • (Hook writers should not output anything to the system-wide shared directories based on user-defines).

Dart API

To provide a nicer API for users, we could provide optional named arguments for the various types of sharing. These arguments can also be the place for documentation. A sketch:

  /// The directory in which output and intermediate artifacts can be placed.
  ///
  /// If [workspaceIsolated], the directory will be scoped to the pub workspace.
  /// If not [workspaceIsolated], the directory is system global. Only use the
  /// system global output directory for assets that are _only_ dependent on the
  /// hook itself. For example: Downloading prebuilt assets from the web.
  ///
  /// If [configIsolated], the directory is unique per [config]. Use this to
  /// avoid having config-dependent artifacts conflicting with each other.
  ///
  /// If [versionIsolated], the directory is unique per version. For packages
  /// with assets changing less frequent than the version, consider not using
  /// version isolation and not using workspace isolation. This will prevent
  /// building or redownloading assets multiple times.
  ///
  /// The contents of this directory will not be modified by anything else than
  /// the hook itself.
  ///
  /// The invoker of the the hook will ensure concurrent invocations wait on
  /// each other.
  Uri getOutputDirectory({
    bool workspaceIsolated = true,
    bool versionIsolated = true,
    bool configIsolated = true,
  })

Protocol

We can chose various levels of support in the protocol:

  • (not an option) only the workspaceIsolated=true option (This is currently outputDirectoryShared)
  • workspaceIsolated=[true,false] options
    • subdirectories for configIsolated=[true,false] and versionIsolated=[true,false] can be created inside the hook.
    • SDKs would not have the capability to prune old verions. ([native_assets_builder] Purge old directories #1353)
    • SDKs would not have the capability to prune outputs for old configs (e.g. not used in a month).
  • workspaceIsolated=[true,false] * configIsolated=[true,false] * versionIsolated=[true,false] options
    • SDKs can prune old directories ([native_assets_builder] Purge old directories #1353)
    • Requires a whopping 8 directories in the protocol.
    • If hooks want to cache across different user-defines, they'd still make their own subdirectories.
      • It's unlikely that we'd want to prune old-user defines.

Related design questions

  • @mkustermann suggested that fetch hooks ([native_assets_cli] fetch and offline mode #1620) should not have a Config. This would mean that if we want to share directories between fetch and build we could do so as long as configIsolated=false. Gradle fetches jars that are multi-architecture, so they'd use configIsolated=false. So that nicely plays together.

@mkustermann
Copy link
Member Author

mkustermann commented Feb 4, 2025

Maybe it would be best to make the protocol have two directories, e.g.:

  • ~/.cache/dart/build_hooks/<package-name>-<version> (maybe in pub cache?)
  • .dart_tool/build_hooks/<package-name>-<version> (this is debatable: once one upgrades, the old ones aren't really needed anymore, we could consider making this non-version specific, i.e. only ~/.dart_tool/build_hooks/<package-name> as it's now)

and allow a hook to get a sub-directory of that that is config-specific via an API like @dcharkes suggested.

This would

  • ensure that on version upgrades the hooks re-run on clean directories and don't have to deal about being run in a dirty directory from a hook of an earlier package version
    => Though this is only true for "immutable" package versions. If one has a path dependency / dependency override then the package version may not change but the package itself.
  • may allow pruning in the future: if the system knows all used .dart_tool/package_config.jsons on the system (i.e. back links) then it can know from there which package versions are used and delete all other directories
  • mirror how pub stores packages atm
  • it gives the hook finer granularity how to cache things across configs (maybe it doesn't use the whole config hash but only architecture, or similar things - maybe it has to include user defines in the hash or not, ...)

The downside is that it may store some binaries repeatedly, which is a bit like a pub-cache getting large with old unused versions. If we want to solve this problem, we probably want the same pruning solution that works for pub packages and their build artifacts.

@dcharkes
Copy link
Collaborator

dcharkes commented Feb 4, 2025

(this is debatable: once one upgrades, the old ones aren't really needed anymore, we could consider making this non-version specific, i.e. only ~/.dart_tool/build_hooks/ as it's now)

Agreed workspaceIsolated=true + versionIsolated=true would basically immediately lead to the other versions being never used and they could/should be pruned immediately.

It would be very weird to have the global cache versionIsolated and the workspace cache be version shared. I think that would lead to a lot of confusion.

  • it gives the hook finer granularity how to cache things across configs (maybe it doesn't use the whole config hash but only architecture, or similar things - maybe it has to include user defines in the hash or not, ...)

I like this argument. Putting configIsolated=true in the protocol is too rigid. And the only thing we lose by not putting it in the config is being able to prune individual configurations. That most likely doesn't matter, it's more important we can prune old versions.

The downside is that it may store some binaries repeatedly, which is a bit like a pub-cache getting large with old unused versions. If we want to solve this problem, we probably want the same pruning solution that works for pub packages and their build artifacts.

Additionally the downside is that it will download binaries repeatedly.

Let's investigate these downsides:

We already download and store multiple versions of the Dart package already, so we'd be following that behavior. The question is whether the binary artifacts are significantly larger in size than the Dart sources.

$ du -h -d 0 webcrypto-0.5.7/
 16M	webcrypto-0.5.7/

(or 3 MB if third_party/ sources were excluded, which they would if a published version of the package would only support downloading prebuilt binaries. However, if there would be a webcrypto-built-from-source user define, they would need to be included.)

$ du -h .dart_tool/webcrypto/libwebcrypto.dylib 
1.5M	.dart_tool/webcrypto/libwebcrypto.dylib

So for sample size N=1 the dylibs are the same size as the package (with cross compiling you'd get N * 1.5M leading to the same order of magnitude storage space).

So @mkustermann's suggestion is:

  Uri getOutputDirectory({
    bool workspaceIsolated = true,
    bool configIsolated = true,
    // bool versionIsolated = true, // always true
  })

Protocol:

  • workspaceIsolated=[true,false] * versionIsolated=true options

Some more gotcha's to think about when having version isolation:

  • If you write your package in a way where you use workspaceIsolate=false and someone uses a git dependency on you hash abc and version 1.2.3-wip, and on the same machine uses a git dependency hash def with the same version 1.2.3-wip, it would overrride the global cache in ~/.cache/dart/build_hooks/<package-name>/1.2.3-wip.
    • pub deals with this by having a directory per git hash.
    • Ditto for you developing that package locally with workspaceIsolate=false (either the root package or a path dependency or a package in the same workspace).
      • But we wouldn't have a git hash (and wouldn't want a new directory all the time).
      • Recognizing that a package is hosted/ dependency, git/ dependency or a path dependency (workspace local package or package root) could be done by relying on dart pub deps --json. We already do this for the dependency graph anyway.

I believe versionIsolated=true for a git dependency should then be the git hash.
And, for a root it should just be shared across all roots. It would only potentially pollute the global cache for developers working on that package on their machine.

@dcharkes
Copy link
Collaborator

dcharkes commented Feb 4, 2025

Notes from a discussion with @jonasfj

workspace isolated / global cache

  • The only use case for which we want a global cache is downloading prebuilt assets.
  • The only thing we should allow in a global cache is downloading prebuilt assets with known hashes. Running a CMake build should not be something we would allow hooks to do in the global cache dir.
  • Downloading binaries from GitHub artifacts at scale could be an issue. (And what if GitHub goes down.) We should strongly consider hosting these artifacts on pub.
  • We should probably be designing something more specific/declarative for this use case.
    • This could then for example integrated in pub. Pub could download them all at the same time. Download them together with the .tar for the package etc.
    • The prebuilt assets could be passed in as Input to the hooks.
    • Hooks would only be able to write to the output directory in .dart_tool.
    • (Still a lot of open questions on how this would work ultimately.)
  • We don't have "hooks pervasively in many packages" yet
    • We can make this addition to the protocol after v1.0.

cc @mkustermann Thoughts?

version isolated / version shared

  • "version isolated" in the workspace would mean deleting the outdir before reinvoking the hook. But hooks should already be written in a way that doesn't require an empty dir in the dev workflow on their own package.
  • And version isolated in a global cache is a necessity. But if we let pub download artifacts, "version isolated" is not a property of an output directory for hook writers, it just the file paths that pub provides to the hook input.

config isolated

  • Indeed little value for having configIsolated=true in the protocol.
  • We can still have it on the Dart helper package.
  • Implementation considerations
    • This would lead us to deprecating outputDirectory and only having outputDirectoryShared.
    • We need to migrate the helper packages such as native_toolchain_c to use outputDirectoryShared from the json before we can remove outputDirectory

@jonasfj
Copy link
Member

jonasfj commented Feb 6, 2025

=> Let's assume developers will have reasonably many .dart_tool folders on their machine (at least I do).
=> Let's further assume that very common packages will start using hooks (imagine a future package:io replacing dart:io, package:http, ...)
=> Let's further assume those common packages's build output will be MBs in size.

In this scenario, I think we should consider adding support for such binary artifacts through pub.

I think @mkustermann suggested we do this a while back (something like python wheels), and I pushed back.
I'll change my position to: I think that we should do this, but not until we actually need it.

And we shouldn't block native assets 1.0 on this.


=> We probably would do the build always this way if pub had build_dependencies.

I'm not entirely sure what we think build_dependencies would be.

But I'll say a few general things here:

  • Having all dependencies locked in pubspec.lock brings a lot of value.
    • We could have multiple resolutions inside a pubspec.lock, but there are downsides.
  • Having a single version of each package in a workspace makes many things simpler to reason about.
    • If we want multiple versions of the same package, I think we should ensure that such packages can't be "observed" (whatever that means) by the same third-package (whatever that is).
  • Global caches are extremely difficult to do, we should be extremely careful.
    • We don't want users growing accustomed to running rm -rf ~/.cache/.
    • There is a LOT of value in rm -rf .dart_tool resetting all build state!
  • I think hooks will read package_config.json use code from my dependencies to build assets :D

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: No status
Development

No branches or pull requests

3 participants