Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Query facts from root during resolution #263

Closed
konstin opened this issue Oct 13, 2024 · 10 comments
Closed

Query facts from root during resolution #263

konstin opened this issue Oct 13, 2024 · 10 comments
Labels
enhancement New feature or request

Comments

@konstin
Copy link
Member

konstin commented Oct 13, 2024

Background When uv queries multiple versions of a package in a row, it will start to prefetch many versions. In this, I tried resolving https://github.com/getsentry/sentry/blob/51281a6abd8ff4a93d2cebc04e1d5fc7aa9c4c11/requirements-base.txt, which has sentry-kafka-schemas>=0.1.50. We try a lot of versions (until eventually realizing we went down the wrong path), but we're also trying versions <0.1.50 since for the motivating example (boto/botocore/urllib3), the partial solution would forbid prefetching.

It would be great to have an API to query unchangeable facts about package version, i.e. incompatibilities with root, such as sentry-kafka-schemas>=0.1.50 being always true, to avoid doing the wrong batch prefetching. We can subsequently apply this to our other prefetching, too.

@konstin konstin added the enhancement New feature or request label Oct 13, 2024
@mpizenberg
Copy link
Member

mpizenberg commented Oct 13, 2024

Oh I see I think. Basically preloading the resolution with all the root dependencies as incompatibilities. WAIT, isn’t that supposed to be already?

@konstin
Copy link
Member Author

konstin commented Oct 13, 2024

For this specific case, I could query the root requirements from our preprocessing. But PubGrub learns more facts during resolution, so it would be great if I could ask the state of what we currently know for certain; it would certainly be the cleaner solution than doing this without pubgrub.

Say we start with foo==1, bar>=2,<5 and some other requirements. From foo=1, we learn bar>=3. We know this an incompatibility with root due to root -> foo==1 -> bar>=3 => root -> bar>=3. We selected some other packages, then we selected a few versions of bar, and we find them conflicting with the partial solution. At this point, I'd like to ask PubGrub "what do we know about bar definitely" and get bar>=3,<5. With this information, I can then prefetch only in this range and speedup our search for a matching bar version in the remaining space.

@mpizenberg
Copy link
Member

mpizenberg commented Oct 13, 2024

Right, so what we know for sure, at any given time, is the set of incompatibilities inside internal::core::State

incompatibilities: Map<DP::P, Vec<IncompDpId<DP>>>,

As far as I remember, we haven’t yet worked on optimizing that set of incompatibilities (except removing duplicates maybe?) so these are basically unstructured info.

The accumulated structured info, solved per package, is actually inside the partial solution, and more precisely, inside the PackageAssignments where we keep the intersection of all assignements.

struct PackageAssignments<P: Package, VS: VersionSet, M: Eq + Clone + Debug + Display> {

So I think what you are interested in an assignments_intersection state that we could know for certain, with only the facts from the accumulated incompatibilities until that time.

@mpizenberg
Copy link
Member

The only assignments that you can know for sure are those gathered from either the root, or another package where there is no other choice anymore, for example packages picked with exact versions, or that are definitely needed from the root, and for which there is only 1 version choice left. I’m wondering if there could be a way to snapshot somehow these states, and/or if they could just be derived from the reduction work still needed on the set of incompatibilities.

@mpizenberg
Copy link
Member

mpizenberg commented Oct 13, 2024

Actually, I had forgotten about it because it was framed as an error message improvements, but Jacob did improve the merging of incompats in #163, at least in one case.
So I guess you can look at merged_incompatibilities instead of raw incompatibilities.
I’m still not sure I fully understand your problem, so I’ll wait for what Jacob has to say.

@Eh2406
Copy link
Member

Eh2406 commented Oct 14, 2024

Most of what I'm about to say you already know, but I'm talking it out to clarify my own thoughts.

I would love to provide better APIs for querying the incompatibility/partial_solution data during resolution. Along with other heuristics about how resolution is going. (In my case I have packages that represent "features" that I know a priori are going to have singleton requirements on the package they augment, so I would like choose_version straight to the only version that will work.)

All Incompatibilities are unchangeable facts in some literal but unhelpful way. Given that your writing your own resolution loop you can look directly at incompatibilities. Unfortunately, there only indexed by package and most of them are only conditionally relevant. The unchangeable fact that d depends on f or that three versions of q are unavailable are probably not helpful.

PackageAssignments may be more helpful. It is a list of all facts that are relevant, given the decisions that are made so far. You want the facts that are relevant independent of the decisions that have been made so far. Answering the question of which ones are relevant universally exactly correctly is (I think) isomorphic to actually solving the resolution problem. So we can only efficiently answer the question "which ones have we proven to be relevant ..." Which should be the ones that are in decision level 0 (do not depend on any decisions, like the root being required and versions being unavailable) and decision level 1 (the roots direct dependencies and transitive dependencies proven to be reachable from the root). The individual assignments should be sorted by decision level. There may be some corner cases when they're not, but that can be fixed with #257

@konstin
Copy link
Member Author

konstin commented Oct 14, 2024

I'm looking for querying only the facts that don't depend on the decisions made so far. The background are the following two cases:

We have a pathological but very common case with urllib3, boto3 and botocore. We select a version of urllib3, then have to go through a lot of version of boto3 to find one that is compatible with that version of urllib3. We speed this up from 45s to 2s (cold resolve) by batch prefetching boto3 (astral-sh/uv#2452). Each version of boto3 depends on one version of botocore, so we also need to prefetch botocore, or we have all those versions of boto3, but spend 25s waiting on all the botocore versions (astral-sh/uv#2452, fig. 3). But for any given boto3, only one botocore version remains allowed, we only have compatibility with one botocore version so we need to prefetch incompatible botocore versions.

The other case is sentry: We have sentry-kafka-schemas>=0.1.50 given. We need to prefetch sentry-kafka-schemas (in this case due to an assignment of python-rapidjson we will eventually reject, but still), and due to botocore, we prefetch even below 0.1.50, which is bad.

Instead, I want to ask what we know for sure: For botocore, it's Range::full() (we aren't seeing enough of the graph to know if we need it at all), for sentry-kafka-schemas, it's sentry-kafka-schemas>=0.1.50.

I'm trying to not touch incompatibilities directly, i see this as an implementation detail, it's easy to misuse it and would like to make it private again in our fork; Hence i'm asking if we can make an API specific for these kinds of prefetching asks.

@Eh2406
Copy link
Member

Eh2406 commented Oct 15, 2024

I think adding something to PartialSolution like:

    /// Retrieve intersection of terms related to package that will not chage.
    // It is pub for UV
    pub fn unchanging_term_intersection_for_package(
        &self,
        package: &DP::P,
    ) -> Option<&Term<DP::VS>> {
        let pa = self.package_assignments.get(package)?;

        let idx_newer = pa
            .dated_derivations
            .as_slice()
            .partition_point(|dd| dd.decision_level <= DecisionLevel(1));
        let idx = idx_newer.checked_sub(1)?;
        Some(&pa.dated_derivations[idx].accumulated_intersection)
    }

which should be the intersection of all of the incompatibilities that have been proven to be unchangeable. (As I mentioned above, the "has been proven" is doing a lot of work here.) I would also experiment with putting that on top of #257 which is more aggressive about declaring things unchangeable.

konstin added a commit to astral-sh/pubgrub that referenced this issue Oct 16, 2024
When prefetching, we want to avoid prefetching packages we will certainly reject. `PartialSolution::unchanging_term_for_package` returns the terms from root that a certain independent from the current partial solution. See pubgrub-rs#263 (comment)
@konstin
Copy link
Member Author

konstin commented Oct 16, 2024

Thank you that's perfect!

@konstin
Copy link
Member Author

konstin commented Oct 17, 2024

Started using this in astral-sh#32 and astral-sh/uv#8246 🎉

@konstin konstin closed this as completed Oct 17, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

3 participants