Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[PULP-276] Add support for prns #3864

Merged
merged 2 commits into from
Feb 5, 2025
Merged

Conversation

pedro-psb
Copy link
Member

@pedro-psb pedro-psb commented Jan 21, 2025

Notes:

  • The copy API used to check if objects belonged to the current domain. For PRN references, we'll have to do it in the task context to avoid response timeouts.

@github-actions github-actions bot added multi-commit Added when a PR consists of more than one commit no-changelog no-issue labels Jan 21, 2025
@pedro-psb pedro-psb changed the title Add support for prns [PULP-276] Add support for prns Jan 23, 2025
@pedro-psb pedro-psb force-pushed the add-support-for-prns branch 2 times, most recently from f0697ee to 3e948d0 Compare January 23, 2025 18:38
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was copied from the PR adding PRN to pulpcore: pulp/pulpcore#5813.
It did catch the DistributionTree case, where it did not had prn specified.

@pedro-psb pedro-psb force-pushed the add-support-for-prns branch from 3e948d0 to 0664071 Compare January 23, 2025 18:45
@pedro-psb pedro-psb linked an issue Jan 24, 2025 that may be closed by this pull request
@pedro-psb
Copy link
Member Author

I've ran a few experiments after on-demand syncing >10,000 package from centos and here are some conclusions:

  1. The distinct query suggested by @dralley is better! [0]
  2. The extract_pk performs very poorly with pulp_hrefs because of the use of django's resolve. [1]

[0] Query comparision w/ list of 10,000 packages prns

In [54]: timeit(lambda: Content.objects.filter(pk__in=(extract_pk(n) for n in prns)).values("pulp_domain").distinct(), number=10)
Out[55]: 0.17054506699787453

In [56]: timeit(lambda: Content.objects.filter(pk__in=(extract_pk(n) for n in prns),pulp_domain=domain).count(), number=10)
Out[57]: 0.6148349700015387

[1] Very bad performance with extract_pk from pulp_href

In [34]: timeit(lambda: [extract_pk(n) for n in prns], number=10)
Out[34]: 0.023607832001289353

In [35]: timeit(lambda: [extract_pk(n) for n in hrefs], number=10)
Out[35]: 22.53728126399801

In [36]: len(prns)
Out[36]: 10000

In [37]: len(hrefs)
Out[37]: 10000

@dralley
Copy link
Contributor

dralley commented Jan 27, 2025

Great!

If we can do all validation in the view layer with clever queries, that would be greatly preferable to deferring it to the task.

If we can reimplement extract_pk such that it doesn't use Django's resolve(), but still do the validation we want to do, that would be ideal.

@pedro-psb pedro-psb force-pushed the add-support-for-prns branch 7 times, most recently from 34fb058 to 2d58627 Compare January 31, 2025 19:18
@pedro-psb pedro-psb marked this pull request as ready for review January 31, 2025 19:18
@pedro-psb pedro-psb marked this pull request as draft January 31, 2025 21:20
@pedro-psb pedro-psb force-pushed the add-support-for-prns branch 2 times, most recently from adff5a2 to 6c4fdad Compare January 31, 2025 21:23
@pedro-psb pedro-psb marked this pull request as ready for review January 31, 2025 21:38
@pedro-psb
Copy link
Member Author

This is ready to review, but I'm not sure how to handle lowerbounds incompatibility.
I'm expecting that when we declare compatibility with pulpcore 3.70 our requirements will be >=3.70, <3.85.
If not, some conditionals will be required to handle it.

help_text=_(
dedent(
"""\
A JSON document describing sources, destinations, and content to be copied.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Content to be copied into the given destinations from the given sources.


def get_prn(uri):
"""Utility to get prn without having to setup django.
TODO: This is a Copy-paste from pulpcore. Make it public there.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove the TODO

.values("repository__pulp_domain")
.get(pk=resource_pk)["repository__pulp_domain"]
)
else:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At the very least this function should have a comment stating that it shouldn't be used on anything but RpmRepository and RepositoryVersion types. But it would also be good to throw a ValueError if it's not one of those types.

Copy link
Contributor

@dralley dralley left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approved apart from the last request

* Add cross_domain validation for PRNs on copy API serializer. The new
  validation doesn't capture the first href/prn that failed anymore,
  as it relies on querying distinct domains in the bulk of references.

Fixes: pulp#3853

fixup: review changes

fixup: review changes on class type validation
@pedro-psb pedro-psb force-pushed the add-support-for-prns branch from 88b1a10 to d738251 Compare February 4, 2025 21:01
@dralley dralley merged commit 6267099 into pulp:main Feb 5, 2025
10 of 12 checks passed
@pedro-psb pedro-psb deleted the add-support-for-prns branch February 5, 2025 19:47
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
multi-commit Added when a PR consists of more than one commit no-changelog no-issue
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Advanced copy API needs PRN support
2 participants