Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support precompiled library binaries #116

Open
wants to merge 2 commits into
base: main
Choose a base branch
from
Open

Conversation

eerii
Copy link

@eerii eerii commented Nov 14, 2024

This is a work in progress and there are still open questions.

This PR attempts to support precompiled binaries for libraries. Projects using system-deps will be able to specify an url1 for downloading and unpacking an archive containing static libraries and pkg-config descriptions. Its paths are added to the PKG_CONFIG_PATH for each library. This makes it possible to use projects with -sys crate dependencies without having to install system dependencies.

Usage example

Before going into the technical explanation, here is a brief example on how the proposed configuration could work. We have a project P that depends on the crate A, a safe wrapper for the bindings of liba, in crate A_sys. Then, A depends on B, similarly a wrapper around B_sys.

flowchart LR
    P --> A
    A --> As{{_A_sys_}}
    A --> B
    B --> Bs{{_B_sys_}}
Loading

To use prebuilt binaries for B_sys, we can add it in the same place we configure the system_deps metadata.

P/Cargo.toml A/Cargo.toml B/Cargo.toml B/sys/Cargo.toml
[package]
name = "P"

[dependencies]
A = { ... }
[package]
name = "A"

[dependencies]
A_sys = { ... }
B = { ... }
[package]
name = "B"

[dependencies]
B_sys = { ... }
[package]
name = "B_sys"

[build-dependencies]
system-deps = "7"

[features]
binary = [ "system-deps/binary" ]

[package.metadata.system-deps.B]
name = "libb"
version = "1.0"
url = "https://download/libb.tar.gz"
checksum = "..."
paths = [ "lib/pkgconfig" ]

When building P, libb.tar.gz will be downloaded. To find libb, system-deps will instruct pkg-config to look first at DOWNLOAD_DIR/B/lib/pkgconfig, and if there is a valid library it will be linked. Users of P don't have to install libb.

Let's say that A_sys also has available binaries. The configuration is the same as before. However, imagine that the A_sys download already includes a version of libb and we want to use that instead. We can make B_sys use it like so:

[package]
name = "A_sys"

[package.metadata.system-deps.A]
name = "liba"
version = "1.0"
url = "https://download/liba.tar.gz"
checksum = "..."
paths = [ "lib/pkgconfig" ]
provides = [ "B" ]

This time, liba.tar.gz will be downloaded and it will be used to link both liba and libb. Note that overriding options is only allowed from downstream crates, the outermost configuration will be used.

Finally, if an user building P wants to use a local copy of the binaries instead of the provided one, or another url, it can overwrite yet again the configuration set by A_sys:

[package]
name = "P"

[package.metadata.system-deps.A]
url = "file:///path/to/liba"

Previous work

Some crates in the rust ecosystem already provide some similar functionality. Two of the most relevant are skia and openssl, with different approaches.

  • Skia uses skia-bindings to compile the source code in the CI and upload it to a release repository, with different versions for certain feature sets. For users, it first checks if there is a package that matches the version and features requested. If there is none, it falls backs to compiling the code locally.
  • OpenSSL has a vendored feature to control if it should use precompiled binaries. By default it queries the system for the needed libraries, but if vendored is enabled it uses openssl-src to download and build the source code.

While these are great implementations, they are very specific to the crates they are in and they need certain prerequisites that may not work well for all libraries. For instance, they are working with a base crate with no other -sys dependencies that may also want binaries.

Requisites

As system-deps is a general library, the proposal for using precompiled binaries needs to be as general as possible. The intended goal is to abstract the difficulty of managing the download and linking but still allow crates to configure when and how their binaries should be used. The only requirement is that they use pkg-config files.

It also needs to allow for multiple crates in the tree to request and link binaries, since many projects have a combination of libraries using system-deps. Additionally, some crates may want to share the same binary archive or may need to override the options of a dependency.

The configuration should be as easy as possible for the end user. Ideally, the sys crates should provide the default binary location and the projects using would be able to use it without further configuration. It needs to be possible to specify different downloads for each target and version of the library.

Extra care needs to be taken to avoid unnecessary recompilation or downloads. Using precompiled binaries is specially useful for less powerful systems since it avoid extra compute work, so it is important to optimize the solution.

Configuration

The main issues when developing this implementation were related to dependency ordering and metadata reading. Cargo is very strict with some of its features, and communicating between crates during compilation is difficult. system-deps runs in the build script of the system libraries, once for each dependency that uses it in the project.

Reuse current system

We need a way of specifying a set of attributes (url, checksum, paths, ...) for each binary archive we want to download. system-deps already does something similar with cargo metadata, with one caveat: It only reads the manifest of the crate that it is being compiled. For this use case this is too limiting, since it would mean that every dependency would have to specify its own binary url and that downstream projects would have no way of overriding this behaviour later. It also doesn't allow sharing binary archives across crates, since the downloads would be ran in parallel.

flowchart LR
    P --> A
    A --> As
    As{{_A_sys_}} --> sda[system_deps]
    sda --Read--> As
    A --> B
    B --> Bs
    Bs{{_B_sys_}} --> sdb[system_deps]
    sdb --Read--> Bs
Loading

Environment variables

Using environment variables is also not enough. While we could ask the user to set overrides for certain crates (SYSTEM_DEPS_BINARY_CRATE_NAME_URL, SYSTEM_DEPS_BINARY_CRATE_NAME_PKG_PATHS, ...), this is too verbose, forces the user to add configuration, and it misses one feature: intermediate dependencies overriding their dependencies. Even if we supported adding environment in the build script (#114), we can't do that in middle crates since they are actually running a different instance of system_deps.

flowchart LR
    P --> A
    A --> As
    As{{_A_sys_}} --> sda[system_deps]
    A --> B
    B --> Bs
    Bs{{_B_sys_}} --> sdb[system_deps]
    subgraph .
    Env(**Env**)
    end
    sda & sdb --Read--> .
    As & P -.Can't interact.-x sdb
Loading

Patch dependency

A previous prototype tried to "inject" a dependency before system-deps ran. This was used as its configuration provider. Since we needed something that ran before every system_deps instance, we added a dependency to system-deps that could be patched by the caller. This worked because it didn't break the dependency build order, but it was very hacky and it required that users added a [patch] pointing to the correct subcrate since that directive has to appear in the top level manifest. Additionally, this didn't allow configuration to come from multiple crates and it was not straightforward to write.

flowchart LR
    P --> A
    A --> As
    As{{_A_sys_}} --> sda[system_deps]
    A --> B
    B --> Bs
    Bs{{_B_sys_}} --> sdb[system_deps]
    
    P -.Patches.-> .
    sda & sdb --Read--> .
    subgraph .
        direction TB
        sdp{{_patched_meta_}} --Replaces--> sdm{{_system_deps_meta_}}
    end
Loading

Read metadata from project root

Finally, we arrived at the current solution. If we manage to obtain the top level project directory (the one that has the outmost Cargo.toml), we can recursively search the dependency tree from there and read the metadata of its crates.

There is an ongoing discussion on how should cargo provide the workspace root of the project (rust-lang/cargo#3946). Other crates such as bevy and toml-conf have a similar use case for it as we do. Even if there is no official method to do so at the moment (which we would really appreciate), we found a reliable alternative, inspired by what toml-conf does.

Build scripts get passed one argument that can be read using std::env::args. This points to the build directory they have assigned in the root target directory. This can be different than reading OUT_DIR, which is a target directory in the folder where the dependency is cloned (/project/dir/target vs .cargo/checkout/dep/target). Because that directory is inside the project, we can run cargo locate-project to get the workspace root, which contains the top level Cargo.toml manifest. This functionality is implemented in meta/build::manifest, more details on the design later.

It works by default in most cases: On different compilation profiles, if the target directory is renamed, using dependencies from crates.io, git and local paths. The only instance in which we found that we can't find it is TARGET_DIR is explicitly set to a folder that is not a subdirectory of the top level project (understandably). We provide the SYSTEM_DEPS_MANIFEST variable that allows to set the Cargo.toml file that will be used. In the rare cases where it fails to locate the project root, it shows the user how to set this variable, which can be as easy as SYSTEM_DEPS_MANIFEST=$(pwd)/Cargo.toml or writing [env] SYSTEM_DEPS_MANIFEST = { value = "Cargo.toml", relative = true } to .cargo/config.toml.

Ideally Rust could provide this path (or at least some way of defining global metadata) so we wouldn't need to do this. However, we feel like this is a good solution that works with minimal user configuration, and the worst case scenario is setting a variable. We could also support alternative ways of providing the binary config, like environment variables as mentioned before.

flowchart LR
    P --> A
    A --> As
    As{{_A_sys_}} --> sda[system_deps]
    A --> B
    B --> Bs
    Bs{{_B_sys_}} --> sdb[system_deps]
    sda & sdb --Read--> sdm{{_system_deps_meta_}}
    sdm -.Finds.-> P
    P -.Read.-> A
    A -.Read.-> B
Loading

Design

The final design for the proposed changes is divided in two modules.

Metadata and binaries

The system-deps-meta crate has two purposes:

First, it's build script finds the root manifest as we discussed before. This needs to happen in a build script because cargo passes different arguments to libraries and build scripts, and the path we need is only passed for the latter. Additionally, it also exports a target directory (defaults to it's OUT_DIR) where the binaries will be stored. This also needs to be defined in the build script because all of the libraries depending on system_deps need to have access to the same binary directory to share binaries, so we need to compile it as a constant.

The library part of the crate uses cargo-metadata to parse the root manifest and recursively search the dependency tree for metadata. It returns a table with all of the metadata values related to system-deps. The merging function needs some improvements, specially to make sure that only downstream crates can overwrite the previously set metadata, and to handle multiple dependents on the same crate.

The good thing is that this crate can be used from other dependencies to also query the workspace metadata without having to write complex logic. For example, we tested it as a method of specifying which plugins to link in gstreamer using the [package.metadata.gstreamer-plugins] manifest key, following the same rules as before.

System deps and pkg config

The system-deps build script (in a previous version this was in another separate crate) handles the downloading, unpacking and linking of the binary archives. It almost exclusively happens during its build script for two reasons: The binary download needs to happen before any crate that uses system_deps gets built, and so that it only happens once.

Finally, in system-deps, when we are building each library, we get the paths to the folders that contain the .pc files from the downloaded files. We modify the environment in the call to pkg_config prepending these paths to the PKG_CONFIG_PATH so that the libraries from the binaries are prioritized in the search. The .pc files must use a relative prefix so that it works inside of the extracted folder, but this is a common option in meson and other build systems.

Build process diagram

Since explaining how the build dependencies work can be a mouthful, I think this diagram can be useful to understand which crates are compiled before which. The order is from right to left. Note that the only guarantee is that dependencies are compiled before dependents, and that the build script goes before the library, but the actual order can vary for each run.

flowchart LR
    P --> A
    A --> As
    As{{_A_sys_}} --> sda[system_deps]
    A --> B
    B --> Bs
    Bs{{_B_sys_}} --> sdb[system_deps]
    sda & sdb --> sdbinb
    sdbinb --> sdm
    sdm --> sdmb
    Bs --> sdm

    subgraph download binaries
    sdbinb{{system_deps}}
    end
    subgraph "read
    metadata"
    sdm[meta]
    end
    subgraph find root
    sdmb{{_meta_}}
    end
Loading

Motivation

We are working towards adding prebuilt binaries for gstreamer-rs. It has multiple system libraries, and it depends on external ones such as glib-sys. Our goal is for the project to provide the official prebuilt binaries so that other crates can use them directly, by only having to enable the binary feature (similar to openssl using vendored). This proposal started as a gstreamer-rs only project, and went through quite a bit of iterations, but we realized some of the ideas were generalizable to other crates using system-deps.

I still need to rebase, clean and document the gstreamer-rs part of the implementation, so it might be messy at the moment, but the development is happening in this branch.

TODO:

  • Respect library version keys as system-deps does.
  • Add readme and information to binary and meta.
  • Workshop naming for crates, features and public API.

Followups

  • Replace the uses of cargo-metadata with a similar implementation based on serde-toml so that deserialization libraries are not duplicated. It would make sense to move the current logic from metadata.rs to meta.
  • Improve caching for reading metadata.
  • Templated URIs.

Footnotes

  1. Even if this mentions "download" and "url", local files or folders are allowed using links that start with file://. We took inspiration from skia for this one.

Copy link

codecov bot commented Nov 14, 2024

Codecov Report

Attention: Patch coverage is 96.02555% with 56 lines in your changes missing coverage. Please review.

Project coverage is 93.78%. Comparing base (38acdbc) to head (0040b68).

Files with missing lines Patch % Lines
meta/src/error.rs 40.00% 30 Missing ⚠️
meta/src/binary.rs 95.13% 9 Missing ⚠️
meta/src/parse.rs 93.39% 7 Missing ⚠️
src/lib.rs 92.04% 7 Missing ⚠️
src/metadata.rs 60.00% 2 Missing ⚠️
src/test_binary.rs 99.76% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #116      +/-   ##
==========================================
+ Coverage   90.91%   93.78%   +2.86%     
==========================================
  Files           3        9       +6     
  Lines        2257     3619    +1362     
==========================================
+ Hits         2052     3394    +1342     
- Misses        205      225      +20     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@gdesmott
Copy link
Owner

Thanks for this very detailed proposal @eerii !

As I said I'm all in favor of having this feature as part of system-deps but having no experience with this kind of use cases I'd like to hear the input of my fellow GStreamer developers on the design.

cc @nirbheek @thiblahute @xclaesse @sdroege

@xclaesse
Copy link

I'm not a big fan of downloading prebuilt, how do you know it's built for the right platform? OS, arch, libc, etc...

@xclaesse
Copy link

From a license POV if that tarball contains (L)GPL code you'll also have to provide script to rebuild it.

@eerii
Copy link
Author

eerii commented Nov 15, 2024

I'm not a big fan of downloading prebuilt, how do you know it's built for the right platform

The idea is to allow binaries to be specified using cfg target directives, so each platform can download an appropriate archive. I ommited it from the example for brevity.

[package.metadata.system-deps.A.'cfg(target = "...")']
url = "..."

If binaries are requested and there is not a prebuilt archive available for that target, we can always fall back to the default behaviour of system-deps and emit a warning.

From a license POV if that tarball contains (L)GPL code you'll also have to provide script to rebuild it

That would be something that the projects creating the tarballs would have to include in them, not something system-deps has to manage, right? Or am I missing something.

Supporting source code downloads is easy. However, I worry that there is too much complexity on the compilation part. Projects have many differences on their build systems and prerequisites, so we would have to find a way of accounting for that. I'm all for adding it as an option, but I think having prebuilts can be very useful in some situations, specially for users with low end hardware that can't afford the compilation time.

And using prebuilts would always be an user configuration option in any case.

@xclaesse
Copy link

That would be something that the projects creating the tarballs would have to include in them, not something system-deps has to manage, right? Or am I missing something.

That's indeed the problem of projects using this feature, not system-deps' problem. Maybe worth mentioning in system-deps doc?

The idea is to allow binaries to be specified using cfg target directives, so each platform can download an appropriate archive.

Makes sense, that can at least help building for major platforms that has a prebuilt provided.

Another question: What's the expected file tree contained inside the tarball? How do you typically generate such tarball? In the GStreamer case I imagine that's going to be a sysroot generated by cerbero? If that's the case, I believe you'll have the problem that .pc files it contains won't be relocatable, that means it will have -L flags pointing to /usr instead of where you extracted the tarball. It's been a while I haven't touched cerbero, but IIRC it has a tool to rewrite .pc files to relocate them.

@xclaesse
Copy link

BTW, Meson can now generate relocatable pc files, not sure cerbero uses that feature, and it won't work for non-Meson libraries anyway: https://mesonbuild.com/Pkgconfig-module.html#relocatable-pkgconfig-files.

@eerii
Copy link
Author

eerii commented Nov 15, 2024

Maybe worth mentioning in system-deps doc?

Yes, that is a good idea!

What's the expected file tree contained inside the tarball?

That's what the pkg_paths metadata attribute is for. It is a list of folders inside the tarball in which to look for the .pc files. This way we don't enforce any specific structure on them. For the tests I have been doing with gstreamer I have been using this (the main one and the plugins directory):

pkg_paths = [ "lib/x86_64-linux-gnu/pkgconfig", "lib/x86_64-linux-gnu/gstreamer-1.0/pkgconfig" ]

I believe you'll have the problem that .pc files it contains won't be relocatable

Another thing to make sure to specify in the docs is that this only works with relocatable .pc files, they need to point to relative paths inside the binary. I think cerbero already does this since looking at the tarball I built locally they do have relative paths and I didn't change any settings for that:

# lib/x86_64-linux-gnu/pkgconfig/gstreamer-1.0.pc 
prefix=${pcfiledir}/../../..
datadir=${prefix}/share
includedir=${prefix}/include
libdir=${prefix}/lib/x86_64-linux-gnu
libexecdir=${prefix}/libexec

exec_prefix=${prefix}
toolsdir=${exec_prefix}/bin
pluginsdir=${libdir}/gstreamer-1.0
girdir=${datadir}/gir-1.0
typelibdir=${libdir}/girepository-1.0
pluginscannerdir=${libexecdir}/gstreamer-1.0

Name: gstreamer-1.0
Description: Streaming media framework
Version: 1.25.0.1
Requires: glib-2.0 >=  2.64.0, gobject-2.0, gmodule-no-export-2.0, libunwind
Libs: -L${libdir} -lgstreamer-1.0 -lm -ldl -lnsl
Cflags: -I${includedir}/gstreamer-1.0 -DGST_STATIC_COMPILATION

@xclaesse
Copy link

@eerii ok all good if cerbero uses ${pcfiledir} already, that's what Meson does as well with -Dpkgconfig:relocatable=true. That just need to be documented then :)

@eerii eerii force-pushed the binary branch 3 times, most recently from cbe7515 to 41ecc6c Compare November 27, 2024 14:11
@nirbheek
Copy link

cerbero uses ${pcfiledir} already

Yeah, we use -Dpkgconfig.relocatable=true when building meson recipes, and we modify the pc files in post-install for all other recipes. Relocatable prefixes are a requirement in cerbero.

@nirbheek
Copy link

nirbheek commented Nov 28, 2024

I finally had time to read through the whole proposal. It's good and seems sound. Some comments:

Does it make sense to concatenate pkg_paths lists?

Yes. The default should be @LIBDIR@/pkgconfig, and the metadata can provide more. Note that @LIBDIR@ can vary per-platform, so it must not be hard-coded in the Cargo.toml, like in the examples provided above.

The way this is usually solved, is that the prebuilt prefix ships with its own pkg-config, which has a value hard-coded in it for where the pc files are. This is obviously wasteful when you're dealing with several different platforms, and might not even work when cross-compiling (the bindir will only contain host binaries, not binaries for your build machine).

There are two possible solutions:

  1. Search the prebuilt prefix for .pc files and add all such dirs to the list of pkg_paths
  2. Require prebuilt prefixes to ship with a info.toml file that species this
    • This can be extended for other use-cases later

I don't have a strong preference here.

Does follow or url have priority?

I don't understand this question. "Outermost" values should override "inner" values.

How to handle two crates overwriting the same dependency with conflicting values?

Throw an error and let the user deal with it. It's always better to be strict, because it's easier to allow use-cases that weren't allowed previously, than disallow use-cases that have always worked incorrectly.

@eerii
Copy link
Author

eerii commented Dec 2, 2024

Thank you so much for the thorough comments!

The default should be @libdir@/pkgconfig

That makes sense. What I was thinking was using the same cfg directives that handle downloads for different platforms to specify the library paths for each binary:

[package.metadata.system-deps.A.'cfg(target = "linux_x86...")']
url = "linuxdownload.tar.gz"
pkg_paths = [ "linuxlib/pkg-config" ]

[package.metadata.system-deps.A.'cfg(target = "windows_x86...")']
url = "windowsdownload.tar.gz"
pkg_paths = [ "windowslib/pkg-config" ]

But I agree that this can quickly become too verbose and difficult to handle. Searching the prebuilt for .pc files sounds good, as long as the packages don't contain conflicting files. I think this is a reasonable expectation, if there are no clear use cases that require otherwise.

I don't understand this question. "Outermost" values should override "inner" values.

Sorry, I wasn't very clear with this. But yes, the idea is that outermost values override inner ones.

Throw an error and let the user deal with it. It's always better to be strict, because it's easier to allow use-cases that weren't allowed previously, than disallow use-cases that have always worked incorrectly.

And yes, I have been working on ensuring that this is done correctly and analyzing the different package graph cases and which are conflicting. I'm almost done fixing the implementation to account for this, I'll upload the revised version later today.

@thiblahute
Copy link

thiblahute commented Dec 2, 2024

The default should be @libdir@/pkgconfig

That makes sense. What I was thinking was using the same cfg directives that handle downloads for different platforms to specify the library paths for each binary:

[package.metadata.system-deps.A.'cfg(target = "linux_x86...")']
url = "linuxdownload.tar.gz"
pkg_paths = [ "linuxlib/pkg-config" ]

[package.metadata.system-deps.A.'cfg(target = "windows_x86...")']
url = "windowsdownload.tar.gz"
pkg_paths = [ "windowslib/pkg-config" ]

But I agree that this can quickly become too verbose and difficult to handle. Searching the prebuilt for .pc files sounds good, as long as the packages don't contain conflicting files. I think this is a reasonable expectation, if there are no clear use cases that require otherwise.

I think I prefer your approach of requiring the user to specify the paths instead of trying to guess tbh, we should also probably show that right from the start in the documentation.

@nirbheek
Copy link

nirbheek commented Dec 2, 2024

I think I prefer your approach of requiring the user to specify the paths instead of trying to guess tbh, we should also probably show that right from the start in the documentation.

I think that's a property of the prefix (and hence the tarball). It doesn't make sense to put in the configuration, and it just makes it easy to get it wrong. That's why I suggested that the prefix/tarball could specify it in a config.toml or something, if we don't want to do autodetection.

@nirbheek
Copy link

nirbheek commented Dec 2, 2024

For example, if you want to add support for a new target (say, something based on RISC-V, or NixOS), you don't need to update every -sys crate (and potentially every project that uses that -sys crate). You just build some new binary tarballs and everything works.

@thiblahute
Copy link

thiblahute commented Dec 3, 2024

For example, if you want to add support for a new target (say, something based on RISC-V, or NixOS), you don't need to update every -sys crate (and potentially every project that uses that -sys crate). You just build some new binary tarballs and everything works.

You will have to anyway to add the URI to the new tarball, unless we add some templating logic, which could also make sense.

I think I prefer your approach of requiring the user to specify the paths instead of trying to guess tbh, we should also probably show that right from the start in the documentation.

I think that's a property of the prefix (and hence the tarball). It doesn't make sense to put in the configuration, and it just makes it easy to get it wrong. That's why I suggested that the prefix/tarball could specify it in a config.toml or something, if we don't want to do autodetection.

That approach is fine by me, I do not like autodetection. Also it could be added later fmopv as we already have a working solution.

Also, what is the point of having the .pc files in the tarballs platform dependent? They are never meant to be used on a system wide prefix, but should be consumed by intermediary apps. I have the impression that we could default to lib/pkgconfig and make it all simpler no?

@eerii eerii force-pushed the binary branch 4 times, most recently from 0756f2d to d52c97b Compare December 9, 2024 04:37
@nirbheek
Copy link

nirbheek commented Dec 9, 2024

You will have to anyway to add the URI to the new tarball, unless we add some templating logic, which could also make sense.

I thought that was the plan already, the manylinux URIs look templated :)

Also, what is the point of having the .pc files in the tarballs platform dependent? They are never meant to be used on a system wide prefix, but should be consumed by intermediary apps. I have the impression that we could default to lib/pkgconfig and make it all simpler no?

That's how distros work. Libraries and executables built for a particular distro will look for that distro's' libdir in whatever prefix you use. We tried to not do this in cerbero already, and that failed.

@thiblahute
Copy link

Also, what is the point of having the .pc files in the tarballs platform dependent? They are never meant to be used on a system wide prefix, but should be consumed by intermediary apps. I have the impression that we could default to lib/pkgconfig and make it all simpler no?

That's how distros work. Libraries and executables built for a particular distro will look for that distro's' libdir in whatever prefix you use. We tried to not do this in cerbero already, and that failed.

Interesting, how did that fail?

@nirbheek
Copy link

nirbheek commented Dec 9, 2024

Interesting, how did that fail?

I don't remember anymore. Something around how modules are searched. All I remember is that in cerbero we tried to enforce prefix/lib everywhere and had to revert it.

@eerii eerii force-pushed the binary branch 5 times, most recently from c6e2884 to a3a40b7 Compare December 13, 2024 16:06
@eerii
Copy link
Author

eerii commented Dec 13, 2024

Hi! Update on this, I have been refactoring the proposal, cleaning old changes and improving the interface. I think that it is now at a point where I'm pretty happy with the public API and how the parsing system works. There may be a few details to still work on, but I think it is a good place to start reviewing this.

I improved the error handling and added tests for both the metadata parsing and the binary downloads and filetypes. Since there are complex dependency cases with multiple pacakges, each new test generates the needed Cargo.toml files. This makes it easier to write tests and check a few variations on each.

I updated the PR description, but here is a list of the important changes:

Improvements on metadata parsing

The naive previous implementation only worked for very simple package trees and it didn't fail properly on conflicts and sometimes it produced wrong results. I have reworked this part from scratch and it should hopefully be more robust and able to handle more generic cases. It is also reusable outside of system-deps (for example, to read gstreamer-rs plugins).

First, the base package(s) is read from the root manifest. For each one, we read the metadata and descend into its (unvisited) dependencies, building an "upside down" dependency tree containing the metadata. Starting from the leaves we reduce the value applying the correct overwrites and checking for conflicts. The final result is a single map with the calculated values.

It also has many little improvements, such as type checking for values, extending lists instead of overwriting them, using cfg to gate configurations behind a target and more. Additionally, as @nirbheek suggested, paths can be read from a info.toml at the base level of the downloaded archive:

# info.toml
paths = [ "lib/pkgconfig" ]

Reworked how prebuilt binaries are shared:

The follows key has been changed to provides. Instead of placing it on each package that wants to reuse binaries from another, the original package can have a list of what libraries are contained in the binary it is downloading. This is more concise and avoids having to modify other manifests.

# New version
[package.metadata.system-deps.gstreamer_1_0]
url = "..."
checksum = "..."
provides = [ "glib_2_0", "gobject_2_0", "gio_2_0", "gst*" ]

# Before
[package.metadata.system-deps.gstreamer_1_0]
url = "..."
checksum = "..."

[package.metadata.system-deps.glib_2_0]
follows = "gstreamer_1_0"

[package.metadata.system-deps.gobject_2_0]
follows = "gstreamer_1_0"

# Many more lines ...

Internally, it is actually adding the follows key to the packages when building the final metadata tree (this is done to reuse the same overwriting and merging rules and to ensure that there are no conflicts), so the previous version still technically works.

Additionally, wilcards are supported for provided packages. This is particularly useful for gstreamer where there are too many libraries to reliably list. It is important to note that wildcards are a bit special and they never overwrite a previous package url, as opposed to writing the full name in provides. There are a few technical reasons why this was kept like that, mainly code complexity and performance overhead in metadata resolution. I think they are useful but we should clearly explain how they work in the documentation.

Downloading and decompressing

These actions should now be faster and more reliable. They now happen in parallel and only if there wasn't a valid version. Checksums are properly calculated and checked.

The library for handling the downloads has been changed from reqwest to attohttpc, since it is more lightweight and it doesn't depend on an async runtime.

Reorganize the proposal structure

Instead of having three crates and everything all over the place, now the logic is on the system-deps-meta crate (gated by features) so that it can be easily reused by other projects.

  • Build system-deps-meta: Finds project root.
  • Lib system-deps-meta: Utils for parsing metadata and downloading binaries.
  • Build system-deps: Binaries are downloaded and paths are serialized (before any dependent crate is built).
  • Lib system-deps: Probe libraries, set pkg-config paths, ...

What's next?

The current version now finally covers the use cases indicated in the proposal, works for projects with complex dependency trees, and it has tests and error handling. There are a few details missing that are important before merging:

  • Although it is somewhat complete, finish the bits of documentation left and add an example to the README.
  • system-deps allows to specify multiple versions of a package with [system-deps.name.version]. At the moment the url can only be specified at the top level, but adding version support shouldn't be too hard.
  • Up until now I have been calling these feature "binary" but I'm not too sold on the name. Other options are "prebuilt", "precompiled", "download", "vendored". What do you think we should refer to this feature as? I think it is a good idea to do a pass rewriting the docs and naming of everything to be more cohesive.

There are other features that I think are more suited to a followup PR, since this is already big enough and they are not essential:

  • Unify system-deps-meta and metadata.rs. This is not hard, we could probably do it now, but it will be tricky to do without adding dependencies. I was more lenient on system-deps-meta since it is opt-in and it will not affect existing users, but cargo-metadata uses serde-json so we are duplicating the serialization libraries. So I think this will require more careful planning.
  • Templated URIs and other similar features.
  • Caching is not a big issue since read_metadata is only called once on the build script, which all crates then share. However, I'm sure some improvements can be made to make this even more performant.

@eerii eerii changed the title Draft: Support precompiled library binaries Support precompiled library binaries Dec 13, 2024
@eerii eerii marked this pull request as ready for review December 13, 2024 16:32
@gdesmott
Copy link
Owner

gdesmott commented Dec 13, 2024

Thanks for all your work on this!

I'll let @thiblahute and @nirbheek review the design and user API as they know much more than me about this use case.

Btw, feel free to clean up the branch (squash commits, etc) as it will be required before merging.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants