Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow for public key id checking when adding packages to repos #2954

Closed
wants to merge 1 commit into from

Conversation

sdherr
Copy link
Contributor

@sdherr sdherr commented Feb 10, 2023

closes #2258

The overall structure of this patch is that:

  1. On package ingest, we read the rpm header and store the key id of the public key it was signed by. Since this only happens at package ingest this is a one-time check.
    1. For the package-upload scenario, we already have a local copy of the RPM available to look at, so we just read it's header with a new library that gives us access to signature information (createrepo_c doesn't).
    2. For the sync scenario, I added the parsing into the RpmContentSaver step. I'm not sure if there's a better place.
  2. We then set on the repos a list of key_ids that are acceptable (if we want to enable this feature, if not leave it empty). Any time (content altering or syncing!) after that we attempt to add a package to a repo it'll check that these fields match and reject any change where they don't.

There is a new command added, pulpcore-manager rpm-datarepair 2258 that re-download every rpm out of storage to examine its header and set the pub key id.

There is currently no attempt to ensure that existing repo contents are valid. I'm not sure it would be a good idea to try, it seems like there are a lot of edge cases here and I'm not sure what you'd do on failure.

import shutil
from hashlib import sha256
from pgpy.pgp import PGPSignature
import rpmfile
Copy link
Contributor

@dralley dralley Feb 13, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These would be significant dependencies, I need to see if these would actually be acceptable to release as-is. Filing a createrepo_c as you've done is great, though.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed, and I'm absolutely not tied to these deps in particular, these are just what I happened to get it working with.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, totally reasonable. Having looked at rpmfile it looks like basically a partial reimplementation of RPM which I'm not thrilled about, I would definitely prefer to use RPM proper since createrepo_c needs it anyway.

I'll ping them to get the ball rolling a bit faster.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So the feedback I got is that pulpcore and the debian plugin use python-gnupg, so it would be best if we can depend on that instead.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We already chatted about this on Matrix, but I'll respond here too for posterity.

There is no equivalent in python-gnupg to the in-memory parsing of the signature packet that pgpy does. Nor does it even give access to the signer key id. However, all python-gnupg is is a wrapper around a subprocess call-out to gpg, so we could just cut out the middleman and do that ourselves if we don't mind the extra overhead of writing the signature to disk and calling gpg in a subprocess. That change would look something like this.

As for dropping the rpmfile dep, that's harder unless createrepo_c adds the python bindings that we're asking for. You can definitely access those headers through the rpm module, if python3-rpm is installed, which is not a given. And since that's a module that wants to be installed as on OS package instead of through pip, requiring it is harder. The best solution is if createrepo makes these headers available since that's already installed, so we'll see what they say.

Copy link
Contributor

@dralley dralley Apr 3, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've looked over the code for rpmfile and it looks.. OK. There's nothing terribly objectionable, just that it's overly simple for the number of edge cases that I know exist, but for the sake of a small number of specific tags it should probably be fine.

Copy link
Contributor

@dralley dralley Apr 3, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I should have an answer on the package signing questions I have by the end of the week. The people I've been trying to ask are very busy with working on DNF5 but I have a 1hr meeting with them directly soon.

We can also reference the pulp 2 code... I just don't want to express 100% confidence that it was (or remains after X years) correct.

In any case I don't want to hold up this feature any longer, so one way or another we can hopefully merge it soon. If we find issues later we can fix them so long as it's tech-preview.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@sdherr It looks like pgpy does not support EdDSA. I think that's probably a longer-term issue since RSA is used everywhere... it may eventually become one but I'd rather pgpy be a short-term solution anyway.

@sdherr sdherr force-pushed the check_signer_key_upstream branch from b595f25 to fd47a2b Compare February 13, 2023 22:09
pulp_rpm/app/serializers/package.py Show resolved Hide resolved
if self.allowed_pub_keys:
rejected_nevras = []
for package in Package.objects.filter(pk__in=new_version.added()).iterator():
if str(package.signer_key_id) not in self.allowed_pub_keys:
Copy link
Member

@ipanova ipanova Feb 14, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what will happen if between repo_version 1 and 2 allowed_pub_keys is changed? repo version 2 will fail to be created because existing content won't be adhering to the allowed list. should this field be immutable then? or maybe just additive i.e only extend the list of keys?
or maybe this verification should be done at sync/upload time of incoming content and not during the whole version finalization
however if we move the verification check at sync/upload time, then we cannot really say for sure what package signatures repo version X contains, it can be a mixture.

Copy link
Member

@ipanova ipanova Feb 14, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

another unfortunate aspect is that we store the allowed_pub-key_list on the repo, and when creating repo version 1 only key A might have been in place, however since things can change repo version 10 can have A,B,C. So we lose the track of what rules were applied when repo version 1 was created because the only data we see is field on the repo

Copy link
Contributor Author

@sdherr sdherr Feb 14, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Existing content is not checked, only added packages. This seems like the least surprising way to handle this restriction, but if there are better options I'm all ears. Perhaps the RepositoryVersion should copy and make immutable the list of pubkeys that were allowed when it was created? I don't know what the purpose of that would be though. You can't go back and edit the content of a previous RepoVersion can you? Just create a new one. So the only thing that matters is what the pub key list is now.

Copy link
Contributor Author

@sdherr sdherr Feb 14, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand what you're saying about moving the check to upload/sync time. Adding content to a repo and syncing both of course create a new repo version, so this check does currently run at sync/content adding time.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

@dralley dralley Feb 15, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't believe that's true. I think createrepo_c re-implements a lot of information about the RPM file structure in order to avoid that dependency. If I'm wrong let me know, because if rpmlib is installed then there are a lot more options here.

It has RPM as a hard dependency. There are only certain bits of functionality that it doesn't rely on librpm for to increase flexibility, but the basic file format processing it does. If you hunt down cr_package_from_file() and trace it backwards you can see that it's relying on librpm.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not as obvious to me as I thought it would be how we could take advantage of that to drop the rpmfile dependency, but if you have a suggestion feel free to mention it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The same functionality should be available, though given their docs are offline it's probably hard to tell. I'm trying to prod them to fix that currently

rpm-software-management/rpm-web#34

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@sdherr Could you try out this implementation? rpm-software-management/createrepo_c#346 (comment)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That works if python3-rpm is installed, yes, but that's not a given.

pulp_rpm/app/tasks/synchronizing.py Show resolved Hide resolved
@sdherr sdherr force-pushed the check_signer_key_upstream branch from fd47a2b to 99414aa Compare February 14, 2023 15:20
@ipanova
Copy link
Member

ipanova commented Feb 14, 2023

@sdherr I gave it a bit more thought. I think the main question to answer is what is our goal of these checks a) I am checking only incoming content is complaint the current rules and I accept that previous content might not be, as a result I can end up distributing repo version that contains a mixture of content or b) I am sure whole repo version is complaint the current rules.

Example1:

  1. repoTest, list_of_keys=["A"]
  2. add rpm foo signed with A, check incoming content, check passed, repo version 1 created
  3. key A got compromised, list_of_keys=["B"]
  4. add rpm bar signed with B, check incoming content, check passed, repo version 2 created
  5. serve repo version 2 containing foo and bar, it contains content signed with compromised key

Example2:

  1. repoTest, list_of_keys=["A"]
  2. add rpm foo signed with A, check overall repo version content, check passed, repo version 1 created
  3. key A got compromised, list_of_keys=["B"] ( maybe even a trigger should be added that would be executed on this field change, this way repo version content will be audited)
  4. add rpm bar signed with B, check overall repo version content, check failed
  5. remove rpm foo, repo version 2 created
  6. add rpm bar signed with B, check overall repo version content, check passed, repo version 3 created
  7. serve repo version 3 containing rpm bar, it contains content adhering to the repo rules

@sdherr
Copy link
Contributor Author

sdherr commented Feb 14, 2023

@ipanova Yes, so in my personal scenario we mostly care about (a), incoming content is valid.

In my scenario the set of people who can add packages to a repo is different from the set of people who can create or edit the repos themselves. And we want to gate the package-adders behind a set of restrictions, one of which is that their packages must be signed appropriately.

As you point out that's a very different scenario from some kind of consistency checking scenario. Like I said in the description:

There is currently no attempt to ensure that existing repo contents are valid. I'm not sure it would be a good idea to try, it seems like there are a lot of edge cases here and I'm not sure what you'd do on failure.

If you were going for (b), what should you do if the repo's key list changes? Remove all content that doesn't match the new policy? Probably not, the repo might not even be dependecy-complete at that point, although if you're dealing with a key-revocation scenario maybe that's what you naively think you want. Reject the key-list update? Better than nothing, but that doesn't seem very useful. Reject all future content updates to the repo until someone comes along and fixes things, like you suggest? Maybe?

It just seems too unclear and like you'd want different behavior in different scenarios / edge cases. But right now, given that this is a net-new feature that does not exist in Pulp at all at this point, I'm happy with the reduced scope of the simpler incoming-content checks. If others want a more complicated and thoroughly-enforced set of Managed Security Policies then they can propose their own changes.

@ipanova
Copy link
Member

ipanova commented Feb 15, 2023

@sdherr @dralley Compliance and trust is a difficult problem indeed. Solution (a) is exactly how it was done in pulp2 and there was a general misperception and expectation that content present in the repo adheres to the allowed keys list, which can be not true as shown in example1. (a) and (b) are conceptually different, as long as we state all the nuances explicitly to the users I am down with it.

@sdherr sdherr force-pushed the check_signer_key_upstream branch from 99414aa to 4954043 Compare March 15, 2023 14:26
for package in packages:
_, pub_key_id = read_crpackage_from_artifact(package._artifacts.first()) # Only one.
if pub_key_id is not None:
print(f"Fixing stored signature for {package.nevra}")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nitpick, it's not the whole signature just the key ID

"verification. Signature verification requires examination of the actual RPM, so "
"the current Repository 'allowed_pub_keys' setting conflicts with the Remote "
"'policy' setting."
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

return rpm.headers.get(header, "")

# What the `rpm -qi` command does. See "--info" definition in /usr/lib/rpm/rpmpopt-*
signature = hdr("dsaheader") or hdr("rsaheader") or hdr("siggpg") or hdr("sigpgp")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

log.info(traceback.format_exc())
return None

signer_key_id = "" # If package is unsigned, store empty str
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Uncertain about this. It makes sense from the perspective of being able to tell "have I processed it or not" but it makes it more difficult to determine whether or not any particular RPM is signed. It's a "two values of null" situation.

@ipanova @ggainey Thoughts?

# through the python bindings. Until that is available we'll have to read the header
# a second time with a utility that actually makes it available.
# https://github.com/rpm-software-management/createrepo_c/issues/346
# TODO: When the above is resolved re-evaluate and potentially drop extra dependencies.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@sdherr You can drop all of this now

@@ -1534,6 +1538,12 @@ def _handle_distribution_tree(declarative_content):
for update_reference in update_references:
update_reference.update_record = update_record
update_references_to_save.append(update_reference)
elif isinstance(declarative_content.content, Package):
artifact = declarative_content.d_artifacts[0] # Packages have 1 artifact
signer_key_id = parse_signer_id(artifact.artifact.file.path)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think path will work when Pulp's artifact storage is S3 / Azure. We could use .read() and save it into a temporary file, then parse that, but it would not be terribly efficient. But short term it might be the only way... what we need is a way to parse the downloaded files before they get shipped off from temporary storage to the storage backend.

That is too invasive of a change to handle in this PR though.

The question is whether we want to live with that inefficiency or disable the feature if the user is using a cloud storage backend.

if self.allowed_pub_keys:
rejected_nevras = []
for package in Package.objects.filter(pk__in=new_version.added()).iterator():
if str(package.signer_key_id) not in self.allowed_pub_keys:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this line should be

if package.signer_key_id is None or str(package.signer_key_id) not in self.allowed_pub_keys

Otherwise it will try to compare "None" which will still work as expected in practice but isn't the best code

@dralley
Copy link
Contributor

dralley commented Apr 5, 2023

@sdherr Strictly speaking this doesn't close #2258 since this only implements a signing key ID allow list but doesn't perform verification of the signature. Could you file a new issue pertaining to just the key allow list portion and link that instead?

@dralley
Copy link
Contributor

dralley commented Apr 6, 2023

@sdherr Sorry about the delay, but I think we can start moving forwards on this again.

@dralley
Copy link
Contributor

dralley commented Apr 10, 2023

@sdherr Are you still interested?

@sdherr
Copy link
Contributor Author

sdherr commented Apr 10, 2023

@sdherr Are you still interested?

Hey @dralley , I was out for a few days there. I can look at updating this PR soon, I guess the larger question is if this is still something that the larger community would find useful. Neither of us are very enthusiastic about the implementation, and doing it "better" would have to involve a much more complicated change involving public key management in pulpcore and such. I can (and have) solved my own signature-checking needs internally and was just trying to be a good open-source citizen and contribute something back. But if no one will use it then there's no point in saddling pulp_rpm with a sub-optimal implementation.

@dralley
Copy link
Contributor

dralley commented Apr 10, 2023

I think the general idea is good and we have had requests from elsewhere for the feature (https://bugzilla.redhat.com/show_bug.cgi?id=2172980) so I don't think nobody would use it, but the volume isn't super high. If your situation is handled and you're not burning to have it merged, we can close this out and gradually out some of the kinks with e.g. the sync pipeline, then revisit? As long as the changes are kept around somewhere for future reference.

@sdherr
Copy link
Contributor Author

sdherr commented Apr 10, 2023

Okay, barring any outpouring of public support for this change I think I am going to abandon it for now. I'll leave the branch in my fork up. To summarize, the basic problems with "checking signatures when adding RPMs to repos" are:

  1. Actually checking the signatures requires that you have the public key. For a generic pulp solution allowing for a wide variety of use-cases this probably means adding CURDL api/views for public key management in pulpcore, and allowing for many-to-many associations with Repos.
  2. RPM signature checking is complicated, and definitely not something that pulp should re-implement. Instead it should call out to rpmlib, however rpmlib assumes that there is a file on-disk to check, and does not support checking the signature of an in-memory (possibly streaming) file.
  3. Even just checking that the signature's signer_id is the expected value (and not doing full signature verification with the public key) either involves significant new dependencies or writing the file to disk and calling out to other utilities.
  4. Checking signer_id also runs into problems because there is no good "what is the signature of this RPM"-level API in rpmlib. Instead it exposes lower level headers that forces us to make assumptions about signature algorithms and deprecated headers and unenforced behaviors of the rpm-building toolchain that we do not want to have to make.
  5. In all cases this requires that you actually have the rpm, which does not work with deferred syncing.
  6. It's not obvious what the correct thing to do is when matching signature checking with repo syncing. Should a "repo sync" always result in a mirror of the upstream repo? Or should it reject RPMs that are not signed appropriately if that's what the local settings say?

@sdherr sdherr closed this Apr 10, 2023
@dralley
Copy link
Contributor

dralley commented Apr 10, 2023

Good summary. I'll add:

  1. Even in a context where the file is immediately downloaded, it gets shipped off to the storage backend (which may be Azure/S3) fairly quickly, so we need to adjust the sync pipeline so that there is a place to work with the file while it remains in Pulp's local temporary storage.

@knumskull
Copy link

As this i unlikely to be implemented in pulp, but a wanted feature by some industry, what could be an alternative workflow to achieve similar approach?
Something which should be put in front of the pulp sync service?

@ggainey
Copy link
Contributor

ggainey commented Oct 10, 2024

The RPM team had another discussion on this need, and identified even more edge-cases involving "getting content into a repository" that would need to be handled if we want to verify-signatures as-content-appears. Also, verifying-signatures-on-the-fly has a nontrivial impact on, say, sync-performance.

One observation that came up was, "The Thing we're trying to address here, is making sure that clients pulling content do not get rpms that are signed with unaccepted/unknown keys". One way to accomplish this would be to to "get involved" not at sync-time, but between "sync into Pulp" and "distribute content to my users". What is being envisioned is a workflow that goes something like this:

  • Sync from far-away repo, into Pulp. Repo-version is not yet Distributed
  • run a (pulp-provided) tool, that you point at the repo-version, and hand a (list of?) acceptable keys. The tool does the veification, and reports back status.
  • if everything passes, then you can publish and dsitribute the repo-version, confident that it contains only rpms signed with the keys you expect.

How does this sound as a possible workflow?

@dralley
Copy link
Contributor

dralley commented Oct 14, 2024

We also discussed a possible implementation - a very different one from the present implementation which absolutely comes with downsides - which is that we could potentially query the package header for each package. That would remove dependence on fields that are or are not im the metadata, at the cost of many many many more web requests being made and more data being downloaded.

@sdherr
Copy link
Contributor Author

sdherr commented Oct 14, 2024

I think adding an optional validation step before publishing makes sense, and could be useful to enforce other rules as well (rpmlint?) if it were in some way generic and configurable, although signature checking is definitely the big one. We have pretty much decided that's the long-term direction we want to take - a validation / quality check job that runs on pulp-worker - so anything pulp provides in that direction would be helpful.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

As a user, Pulp is able to verify package signatures, and reject unsigned or invalidly-signed packages
5 participants