Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Proposal] Using vcpkg in the .NET product #311

Closed
wants to merge 6 commits into from
Closed
Show file tree
Hide file tree
Changes from 2 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions INDEX.md
Original file line number Diff line number Diff line change
Expand Up @@ -100,4 +100,5 @@ Use update-index to regenerate it:
| | [Ref returns in C# and IL verification.](proposed/verifiable-ref-returns.md) | |
| | [Swift Interop](proposed/swift-interop.md) | [Andy Gocke](https://github.com/agocke), [Jeremy Koritzinsky](https://github.com/jkoritzinsky) |
| | [Target AVX2 in R2R images](proposed/vector-instruction-set-default.md) | [Richard Lander](https://github.com/richlander) |
| | [Using vcpkg in .NET projects and the VMR](proposed/vcpkg.md) | [Jeremy Koritzinsky](https://github.com/jkoritzinsky) |

91 changes: 91 additions & 0 deletions proposed/vcpkg.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,91 @@
# Using vcpkg in .NET projects and the VMR

**Owner** [Jeremy Koritzinsky](https://github.com/jkoritzinsky)

## Scenarios and User Experience

In .NET today, we handle native dependencies and native dependency flow by a few different mechanisms:

- Vendoring dependencies, aka including a copy of the source with our applied patches in the product repository.
- Git submodules pointing to the source repo of the dependency.
- NuGet packages that flow pre-built between our repositories.

For many of our dependencies, these build options work decently well. However, these mechanisms have a few drawbacks:

- Vendoring code increases the size of our repositories and increases build times.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So far, we have been vendoring fairly small projects. Vendoring does not substantially increase the repo footprint.

Personally, I love the simplicity and flexibility of vendoring for our use cases.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

dotnet/runtime uses vendoring for its native dependencies today, but dotnet/aspnetcore does not (they use submodules), possibly because their dependencies are more optional (Windows-only as the native code is IIS-specific) and they're larger (GoogleTest is pretty chunky).

Our mechanism for vendoring code also requires us to add additional (although I agree small) CMake logic to support our source-build customers opting-out of our copies of libraries and using the system libraries instead.

I believe we could adjust how we vendor our projects to help mitigate the second issue as well as other issues.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Vendoring is not SB friendly in cases where the dependency is already included in a distro. Distro maintainers would prefer our applied patches were upstreamed. This reduces the maintenance overhead for them. From a security perspective there is only one version to maintain.

- Submodules aren't cloned by default and have a worse UX than vendoring code, and cannot support patches.
- NuGet packages are not easy to consume from native CMake projects, so we end up having to wire up MSBuild to consume them and construct paths to pass to CMake.

Additionally, [the VMR design documents on external source](https://github.com/dotnet/arcade/blob/main/Documentation/UnifiedBuild/VMR-Strategy-For-External-Source.md) defines that we shouldn't use submodules directly in our build.

For native dependencies, we could alternatively use Vcpkg, a cross-platform C++ package manager provided by another team at Microsoft.
Vcpkg would make it easier for us to both consume native dependencies in our projects, as many C++ libraries that we may consider using are already available in vcpkg.
Copy link
Member

@jkotas jkotas Jan 9, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The motivation for this proposal seems to be based on an assumption that we will add a bunch more C++ dependencies to the .NET project. Is that correct? What are those dependencies that you expect to add and why?

Copy link
Member Author

@jkoritzinsky jkoritzinsky Jan 9, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We've spoken before about adding the GSL in the future. In dnmd (which we hope to productize in a few years), we're adding unit tests based on googletest and using microsoft/wil (the "product" code isn't using these dependencies). We've had plans on and off for switching our zlib dependencies to zlib-ng each release, which will require a much more expensive CMake integration than our existing vendoring solutions for zlib and zlib-intel for maximum performance.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, it would be useful to list existing dependencies of .NET project to be migrated to use vcpkg as part of this proposal.

It would also make it easier for us to contribute our own native projects, like the nethost library, back to vcpkg in an official manner.

## Requirements

- We must be able to version and patch dependencies ourselves.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, we need to be able to build with private patches, without disclosing the patches publicly.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is that a point-in-time requirement? Like some security fixes that you want to apply and then publicly disclose later?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this is primarily for security fixes.

- We must be able to consume dependencies from vcpkg in our projects.
- It must be possible to use vcpkg in offline build scenarios.
- It must be possible to use vcpkg in the VMR builds.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we add something about how it should be possible (maybe not trivial, but possible) for distro packages to be able to use the native dependency that's packaged in a distro over the dependency specified/vendored in .NET?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a requirement for that. I've already addressed that this is doable later in the proposal without much work.

### Goals

- Enable developers of .NET to consume native dependencies from vcpkg.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do not think that "use tech XYZ" should be the goal for eng infrastructure investments.

The goal for infrastructure investment should be to make the eng infrastructure cheaper, simpler, faster, etc. For example, check https://github.com/dotnet/arcade/blob/05493e05d4bbf262f1be1bd517ac95f5bff3a2ef/Documentation/UnifiedBuild/Overview.md#goals .

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll rephrase this point. My main idea here was to show how this engineering improvement could assist us in more easily shipping an official nethost vcpkg package (something our customers want and some members of our team unofficially maintain) and provide some product benefit in addition to the infrastructure benefits.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could assist us in more easily shipping an official nethost vcpkg package

What makes vcpkg packages official? My understanding is that the vcpkg packages are published by merging a PR to https://github.com/microsoft/vcpkg. Would that step go away somehow?

(In any case, we can reduce the patch set required to publish the package, but that should not require introducing vcpkg official build dependency.)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure exactly what makes the packages official, but I know that they explicitly mark unofficial packages as unofficial by putting targets generated by them in the unofficial namespace to ensure they won't clash with official packages. I agree that we can make official vcpkg packages without using vcpkg ourselves, but using vcpkg ourselves makes things a little easier.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I took a closer look at what makes a package "official". The "official" vs "unofficial" terminology mainly refers to which CMake targets are available. If a target is exported from CMake by a patch added in the portfile, then it is an "unofficial" target and should be namespaced as such. If the target is exported by the CMake scripts in the source repository, then the targets are considered "official".

I'll remove this goal as it looks like we would just need to upstream the patches to install CMake exports in our build scripts to make them "official"


### Non-Goals

- Replace all existing native dependency mechanisms.
- Initially, using vcpkg from MSBuild is considered out of scope.
- Using vcpkg in .NET 9 is out of scope.

## Stakeholders and Reviewers

- @jkoritzinsky
- @mmitche
- @jkotas
- @AaronRobinsonMSFT
- @wtgodbe

## Design

### Acquiring vcpkg

Arcade should provide functionality in the scripts in `eng/common` to acquire the vcpkg tool.
vcpkg will be able requested by an entry in the global.json file for a repository.
The Arcade build scripts will define which version of vcpkg is used.

As we will be using our own custom registry, Arcade can do a sparse checkout of the vcpkg repository to reduce the amount of data that needs to be downloaded, excluding the default registry.

To support offline builds, the vcpkg repo and a matching version of the [vcpkg-tool repository](https://github.com/microsoft/vcpkg-tool) will be included as a submodule in the source-build-externals repository.

### Custom registry

We can either create a new repository for the registry or use the source-build-externals repository as a vcpkg registry as well. This registry will, similar to the `dotnet-public` NuGet feed, contain all of the public packages that we want to consume from within our projects. However, unlike the `dotnet-public` NuGet feed, we will need to modify the portfiles for the packages that we want to consume to not use online sources in an offline build. This likely means that for each package, we'll need to create a submodule in source-build-externals or another repository for each package that we want to consume.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Curious about the following:

How often you see the versions of these packages changing?
You mentioned the importance of patching earlier. Where would that be done if the source-build-externals(SBE) repo were used? Would the patching be done in a dotnet fork of the original source and then SBE would reference the fork?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We usually pull a new version of our dependencies about once a year, mainly to get the upstream version that has the patches that we made locally and sent to the upstream in the prior release.

For patches, we'd either patch in a dotnet fork (which SBE would then reference) or we'd use patch files which vcpkg can apply as part of the package configuration.


Alternatively, we could build an alternative mechanism in the VMR that pre-clones all dependencies when updating the VMR. This option seems more expensive, and as such we should avoid it if possible.

### Using vcpkg

All repositories that use vcpkg must use the manifest mode of vcpkg to declare their dependencies. Classic mode is not supported.

### Source-build partners and conditionally system dependencies

For some dependencies, we may want to use a system-installed version of the dependency in source-build scenarios.
For these cases, our source-build partners can remove the dependencies from the vcpkg manifest.
Since the CMake integration with vcpkg uses the standard `find_library` and `find_package` CMake commands, the system dependencies will be found instead of the vcpkg dependencies with no change to the CMake source code.

### Flowing native dependencies with binary caching

vcpkg has a binary caching feature that can use NuGet.
We should be able to utilize this feature for flowing native dependencies that are consumed by native code (such as `dotnet/llvm-project` assets) to avoid having to run the build through MSBuild to locate the dependencies or rebuilding the dependencies in each repository. This is not something that needs to be done immediately, but is something we can consider in the future instead of our existing package flow model.

## Q & A

- Why use vcpkg instead of another package manager?
- Vcpkg is already used by many teams at Microsoft, and is already used by many of the dependencies that we may want to consume.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand that vcpkg is useful for applications.
 
What are the open source projects similar to .NET runtime that use vcpkg? For example, are there open source projects that use vcpkg and that are included in mainstream Linux distros? How do they deal with the offline build requirements, etc.?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure if there are any projects that use vcpkg and are included in mainstream Linux distros. To my knowledge, most projects with non-test dependencies that are included in mainstream Linux distros have their build-time dependencies available as dependencies in the distro (through -dev packages or the source packages) and they reference them as shared libraries at runtime. Testing dependencies are usually vendored in (LLVM vendors GoogleTest and Google Benchmark).

We may be the first in this space.

The offline scenario is why I was thinking of vcpkg over another package manager. Using vcpkg allows us to use the same CMake whether the package is a system-provided dependency or a vcpkg-provided dependency, enabling our Linux distro partners to easily switch dependencies to system-provided if desired.

- Vcpkg is cross-platform.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this true for all platforms we support today? what about s390x, ppc64le, riscv or other community maintained ports like FreeBSD?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As we'd be defining our own registry, we can define custom tuples for the various different platforms we support. I don't see anything in the vcpkg tool repository that wouldn't work on our various target machines, but they also don't document which platforms it works on. This does provide some risk.

Copy link
Member

@akoeplinger akoeplinger Jan 21, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah I'm less concerned about the vcpkg packages themselves since they can be patched (which we'd need to do anyway to support a new platform in case a dependency doesn't support it) but about vcpkg itself.

It looks like it is written in C++ and either downloaded or built as part of the bootstrapping (https://github.com/microsoft/vcpkg/blob/a1a1cbc975abf909a6c8985a6a2b8fe20bbd9bd6/scripts/bootstrap.sh#L196) so that doesn't sound too bad. Though there is a new vcpkg-artifacts which seems to rely on TypeScript/nodejs for the CLI but I'm not sure how that relates to the main vcpkg and what the future plans for that are.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From what I can tell, vcpkg-artifacts is used for pulling down the binary cache packages/artifacts. I believe we can avoid using that in the VMR (which is where we need to be able to do offline builds).

- Vcpkg allows CMake consumers to consume dependencies as if they were installed on the machine instead of requiring custom CMake logic to find dependencies. Supporting changing a package between a vcpkg and system dependency would require more work with other package managers.
- What is vcpkg's minimum toolset requirements? Do these conflict with the requirements for .NET?
- Vcpkg requires CMake 3.14 or newer. We require 3.20 or newer, so we should be fine.
- Vcpkg requires a C++ compiler that supports C++17. Although we don't use C++17, all of the compilers we support do support C++17.
Copy link
Member

@huoyaoyuan huoyaoyuan Jan 22, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This sounds interesting. Is it confirmed that all our target platforms, including non-popular ones like FreeBSD/Tizen/ARMv6, support C++ 17?
I'd like to see the exact compiler versions gathered so that we can confidently use new features.

Loading