Allow for public key id checking when adding packages to repos #2954

sdherr · 2023-02-10T21:59:28Z

The overall structure of this patch is that:

On package ingest, we read the rpm header and store the key id of the public key it was signed by. Since this only happens at package ingest this is a one-time check.
1. For the package-upload scenario, we already have a local copy of the RPM available to look at, so we just read it's header with a new library that gives us access to signature information (createrepo_c doesn't).
2. For the sync scenario, I added the parsing into the RpmContentSaver step. I'm not sure if there's a better place.
We then set on the repos a list of key_ids that are acceptable (if we want to enable this feature, if not leave it empty). Any time (content altering or syncing!) after that we attempt to add a package to a repo it'll check that these fields match and reject any change where they don't.

There is a new command added, pulpcore-manager rpm-datarepair 2258 that re-download every rpm out of storage to examine its header and set the pub key id.

There is currently no attempt to ensure that existing repo contents are valid. I'm not sure it would be a good idea to try, it seems like there are a lot of edge cases here and I'm not sure what you'd do on failure.

pulp_rpm/app/management/commands/rpm-datarepair.py

pulp_rpm/app/models/repository.py

dralley · 2023-02-13T05:54:45Z

pulp_rpm/app/shared_utils.py

 import shutil
 from hashlib import sha256
+from pgpy.pgp import PGPSignature
+import rpmfile


These would be significant dependencies, I need to see if these would actually be acceptable to release as-is. Filing a createrepo_c as you've done is great, though.

Agreed, and I'm absolutely not tied to these deps in particular, these are just what I happened to get it working with.

Right, totally reasonable. Having looked at rpmfile it looks like basically a partial reimplementation of RPM which I'm not thrilled about, I would definitely prefer to use RPM proper since createrepo_c needs it anyway.

I'll ping them to get the ball rolling a bit faster.

So the feedback I got is that pulpcore and the debian plugin use python-gnupg, so it would be best if we can depend on that instead.

We already chatted about this on Matrix, but I'll respond here too for posterity.

There is no equivalent in python-gnupg to the in-memory parsing of the signature packet that pgpy does. Nor does it even give access to the signer key id. However, all python-gnupg is is a wrapper around a subprocess call-out to gpg, so we could just cut out the middleman and do that ourselves if we don't mind the extra overhead of writing the signature to disk and calling gpg in a subprocess. That change would look something like this.

As for dropping the rpmfile dep, that's harder unless createrepo_c adds the python bindings that we're asking for. You can definitely access those headers through the rpm module, if python3-rpm is installed, which is not a given. And since that's a module that wants to be installed as on OS package instead of through pip, requiring it is harder. The best solution is if createrepo makes these headers available since that's already installed, so we'll see what they say.

I've looked over the code for rpmfile and it looks.. OK. There's nothing terribly objectionable, just that it's overly simple for the number of edge cases that I know exist, but for the sake of a small number of specific tags it should probably be fine.

I should have an answer on the package signing questions I have by the end of the week. The people I've been trying to ask are very busy with working on DNF5 but I have a 1hr meeting with them directly soon.

We can also reference the pulp 2 code... I just don't want to express 100% confidence that it was (or remains after X years) correct.

In any case I don't want to hold up this feature any longer, so one way or another we can hopefully merge it soon. If we find issues later we can fix them so long as it's tech-preview.

@sdherr It looks like pgpy does not support EdDSA. I think that's probably a longer-term issue since RSA is used everywhere... it may eventually become one but I'd rather pgpy be a short-term solution anyway.

pulp_rpm/app/serializers/package.py

ipanova · 2023-02-14T11:51:50Z

pulp_rpm/app/models/repository.py

+        if self.allowed_pub_keys:
+            rejected_nevras = []
+            for package in Package.objects.filter(pk__in=new_version.added()).iterator():
+                if str(package.signer_key_id) not in self.allowed_pub_keys:


what will happen if between repo_version 1 and 2 allowed_pub_keys is changed? repo version 2 will fail to be created because existing content won't be adhering to the allowed list. should this field be immutable then? or maybe just additive i.e only extend the list of keys?
or maybe this verification should be done at sync/upload time of incoming content and not during the whole version finalization
however if we move the verification check at sync/upload time, then we cannot really say for sure what package signatures repo version X contains, it can be a mixture.

another unfortunate aspect is that we store the allowed_pub-key_list on the repo, and when creating repo version 1 only key A might have been in place, however since things can change repo version 10 can have A,B,C. So we lose the track of what rules were applied when repo version 1 was created because the only data we see is field on the repo

Existing content is not checked, only added packages. This seems like the least surprising way to handle this restriction, but if there are better options I'm all ears. Perhaps the RepositoryVersion should copy and make immutable the list of pubkeys that were allowed when it was created? I don't know what the purpose of that would be though. You can't go back and edit the content of a previous RepoVersion can you? Just create a new one. So the only thing that matters is what the pub key list is now.

I don't understand what you're saying about moving the check to upload/sync time. Adding content to a repo and syncing both of course create a new repo version, so this check does currently run at sync/content adding time.

here is a check added to ansible at upload time https://github.com/pulp/pulp_ansible/blob/5ae04f630a5c1ed679f7395323dcf0cf94656c97/pulp_ansible/app/tasks/signature.py#L52
and for the sync i was thinking to right away raise error here https://github.com/pulp/pulp_rpm/pull/2954/files#r1105711127

I don't believe that's true. I think createrepo_c re-implements a lot of information about the RPM file structure in order to avoid that dependency. If I'm wrong let me know, because if rpmlib is installed then there are a lot more options here.

It has RPM as a hard dependency. There are only certain bits of functionality that it doesn't rely on librpm for to increase flexibility, but the basic file format processing it does. If you hunt down cr_package_from_file() and trace it backwards you can see that it's relying on librpm.

It's not as obvious to me as I thought it would be how we could take advantage of that to drop the rpmfile dependency, but if you have a suggestion feel free to mention it.

The same functionality should be available, though given their docs are offline it's probably hard to tell. I'm trying to prod them to fix that currently

rpm-software-management/rpm-web#34

@sdherr Could you try out this implementation? rpm-software-management/createrepo_c#346 (comment)

That works if python3-rpm is installed, yes, but that's not a given.

pulp_rpm/app/tasks/synchronizing.py

ipanova · 2023-02-14T17:18:42Z

@sdherr I gave it a bit more thought. I think the main question to answer is what is our goal of these checks a) I am checking only incoming content is complaint the current rules and I accept that previous content might not be, as a result I can end up distributing repo version that contains a mixture of content or b) I am sure whole repo version is complaint the current rules.

Example1:

repoTest, list_of_keys=["A"]
add rpm foo signed with A, check incoming content, check passed, repo version 1 created
key A got compromised, list_of_keys=["B"]
add rpm bar signed with B, check incoming content, check passed, repo version 2 created
serve repo version 2 containing foo and bar, it contains content signed with compromised key

Example2:

repoTest, list_of_keys=["A"]
add rpm foo signed with A, check overall repo version content, check passed, repo version 1 created
key A got compromised, list_of_keys=["B"] ( maybe even a trigger should be added that would be executed on this field change, this way repo version content will be audited)
add rpm bar signed with B, check overall repo version content, check failed
remove rpm foo, repo version 2 created
add rpm bar signed with B, check overall repo version content, check passed, repo version 3 created
serve repo version 3 containing rpm bar, it contains content adhering to the repo rules

sdherr · 2023-02-14T19:50:49Z

@ipanova Yes, so in my personal scenario we mostly care about (a), incoming content is valid.

In my scenario the set of people who can add packages to a repo is different from the set of people who can create or edit the repos themselves. And we want to gate the package-adders behind a set of restrictions, one of which is that their packages must be signed appropriately.

As you point out that's a very different scenario from some kind of consistency checking scenario. Like I said in the description:

There is currently no attempt to ensure that existing repo contents are valid. I'm not sure it would be a good idea to try, it seems like there are a lot of edge cases here and I'm not sure what you'd do on failure.

If you were going for (b), what should you do if the repo's key list changes? Remove all content that doesn't match the new policy? Probably not, the repo might not even be dependecy-complete at that point, although if you're dealing with a key-revocation scenario maybe that's what you naively think you want. Reject the key-list update? Better than nothing, but that doesn't seem very useful. Reject all future content updates to the repo until someone comes along and fixes things, like you suggest? Maybe?

It just seems too unclear and like you'd want different behavior in different scenarios / edge cases. But right now, given that this is a net-new feature that does not exist in Pulp at all at this point, I'm happy with the reduced scope of the simpler incoming-content checks. If others want a more complicated and thoroughly-enforced set of Managed Security Policies then they can propose their own changes.

ipanova · 2023-02-15T15:27:29Z

@sdherr @dralley Compliance and trust is a difficult problem indeed. Solution (a) is exactly how it was done in pulp2 and there was a general misperception and expectation that content present in the repo adheres to the allowed keys list, which can be not true as shown in example1. (a) and (b) are conceptually different, as long as we state all the nuances explicitly to the users I am down with it.

closes pulp#2258

dralley · 2023-04-04T02:53:37Z

pulp_rpm/app/management/commands/rpm-datarepair.py

+        for package in packages:
+            _, pub_key_id = read_crpackage_from_artifact(package._artifacts.first())  # Only one.
+            if pub_key_id is not None:
+                print(f"Fixing stored signature for {package.nevra}")


Nitpick, it's not the whole signature just the key ID

dralley · 2023-04-04T21:38:31Z

pulp_rpm/app/viewsets/repository.py

+                "verification. Signature verification requires examination of the actual RPM, so "
+                "the current Repository 'allowed_pub_keys' setting conflicts with the Remote "
+                "'policy' setting."
+            )


dralley · 2023-04-04T21:44:05Z

pulp_rpm/app/shared_utils.py

+                return rpm.headers.get(header, "")
+
+            # What the `rpm -qi` command does. See "--info" definition in /usr/lib/rpm/rpmpopt-*
+            signature = hdr("dsaheader") or hdr("rsaheader") or hdr("siggpg") or hdr("sigpgp")


dralley · 2023-04-04T21:46:38Z

pulp_rpm/app/shared_utils.py

+        log.info(traceback.format_exc())
+        return None
+
+    signer_key_id = ""  # If package is unsigned, store empty str


Uncertain about this. It makes sense from the perspective of being able to tell "have I processed it or not" but it makes it more difficult to determine whether or not any particular RPM is signed. It's a "two values of null" situation.

@ipanova @ggainey Thoughts?

dralley · 2023-04-04T21:47:51Z

pulp_rpm/app/shared_utils.py

+    # through the python bindings. Until that is available we'll have to read the header
+    # a second time with a utility that actually makes it available.
+    # https://github.com/rpm-software-management/createrepo_c/issues/346
+    # TODO: When the above is resolved re-evaluate and potentially drop extra dependencies.


@sdherr You can drop all of this now

dralley · 2023-04-05T00:40:01Z

pulp_rpm/app/tasks/synchronizing.py

@@ -1534,6 +1538,12 @@ def _handle_distribution_tree(declarative_content):
                for update_reference in update_references:
                    update_reference.update_record = update_record
                    update_references_to_save.append(update_reference)
+            elif isinstance(declarative_content.content, Package):
+                artifact = declarative_content.d_artifacts[0]  # Packages have 1 artifact
+                signer_key_id = parse_signer_id(artifact.artifact.file.path)


I don't think path will work when Pulp's artifact storage is S3 / Azure. We could use .read() and save it into a temporary file, then parse that, but it would not be terribly efficient. But short term it might be the only way... what we need is a way to parse the downloaded files before they get shipped off from temporary storage to the storage backend.

That is too invasive of a change to handle in this PR though.

The question is whether we want to live with that inefficiency or disable the feature if the user is using a cloud storage backend.

dralley · 2023-04-05T00:52:49Z

pulp_rpm/app/models/repository.py

+        if self.allowed_pub_keys:
+            rejected_nevras = []
+            for package in Package.objects.filter(pk__in=new_version.added()).iterator():
+                if str(package.signer_key_id) not in self.allowed_pub_keys:


I think this line should be

if package.signer_key_id is None or str(package.signer_key_id) not in self.allowed_pub_keys

Otherwise it will try to compare "None" which will still work as expected in practice but isn't the best code

dralley · 2023-04-05T00:53:56Z

@sdherr Strictly speaking this doesn't close #2258 since this only implements a signing key ID allow list but doesn't perform verification of the signature. Could you file a new issue pertaining to just the key allow list portion and link that instead?

dralley · 2023-04-06T20:41:02Z

@sdherr Sorry about the delay, but I think we can start moving forwards on this again.

dralley · 2023-04-10T13:28:37Z

@sdherr Are you still interested?

sdherr · 2023-04-10T13:39:10Z

@sdherr Are you still interested?

Hey @dralley , I was out for a few days there. I can look at updating this PR soon, I guess the larger question is if this is still something that the larger community would find useful. Neither of us are very enthusiastic about the implementation, and doing it "better" would have to involve a much more complicated change involving public key management in pulpcore and such. I can (and have) solved my own signature-checking needs internally and was just trying to be a good open-source citizen and contribute something back. But if no one will use it then there's no point in saddling pulp_rpm with a sub-optimal implementation.

dralley · 2023-04-10T13:47:51Z

I think the general idea is good and we have had requests from elsewhere for the feature (https://bugzilla.redhat.com/show_bug.cgi?id=2172980) so I don't think nobody would use it, but the volume isn't super high. If your situation is handled and you're not burning to have it merged, we can close this out and gradually out some of the kinks with e.g. the sync pipeline, then revisit? As long as the changes are kept around somewhere for future reference.

sdherr · 2023-04-10T14:38:22Z

Okay, barring any outpouring of public support for this change I think I am going to abandon it for now. I'll leave the branch in my fork up. To summarize, the basic problems with "checking signatures when adding RPMs to repos" are:

Actually checking the signatures requires that you have the public key. For a generic pulp solution allowing for a wide variety of use-cases this probably means adding CURDL api/views for public key management in pulpcore, and allowing for many-to-many associations with Repos.
RPM signature checking is complicated, and definitely not something that pulp should re-implement. Instead it should call out to rpmlib, however rpmlib assumes that there is a file on-disk to check, and does not support checking the signature of an in-memory (possibly streaming) file.
Even just checking that the signature's signer_id is the expected value (and not doing full signature verification with the public key) either involves significant new dependencies or writing the file to disk and calling out to other utilities.
Checking signer_id also runs into problems because there is no good "what is the signature of this RPM"-level API in rpmlib. Instead it exposes lower level headers that forces us to make assumptions about signature algorithms and deprecated headers and unenforced behaviors of the rpm-building toolchain that we do not want to have to make.
In all cases this requires that you actually have the rpm, which does not work with deferred syncing.
It's not obvious what the correct thing to do is when matching signature checking with repo syncing. Should a "repo sync" always result in a mirror of the upstream repo? Or should it reject RPMs that are not signed appropriately if that's what the local settings say?

dralley · 2023-04-10T15:09:00Z

Good summary. I'll add:

Even in a context where the file is immediately downloaded, it gets shipped off to the storage backend (which may be Azure/S3) fairly quickly, so we need to adjust the sync pipeline so that there is a place to work with the file while it remains in Pulp's local temporary storage.

knumskull · 2024-09-25T07:05:51Z

As this i unlikely to be implemented in pulp, but a wanted feature by some industry, what could be an alternative workflow to achieve similar approach?
Something which should be put in front of the pulp sync service?

ggainey · 2024-10-10T17:50:26Z

The RPM team had another discussion on this need, and identified even more edge-cases involving "getting content into a repository" that would need to be handled if we want to verify-signatures as-content-appears. Also, verifying-signatures-on-the-fly has a nontrivial impact on, say, sync-performance.

One observation that came up was, "The Thing we're trying to address here, is making sure that clients pulling content do not get rpms that are signed with unaccepted/unknown keys". One way to accomplish this would be to to "get involved" not at sync-time, but between "sync into Pulp" and "distribute content to my users". What is being envisioned is a workflow that goes something like this:

Sync from far-away repo, into Pulp. Repo-version is not yet Distributed
run a (pulp-provided) tool, that you point at the repo-version, and hand a (list of?) acceptable keys. The tool does the veification, and reports back status.
if everything passes, then you can publish and dsitribute the repo-version, confident that it contains only rpms signed with the keys you expect.

How does this sound as a possible workflow?

dralley · 2024-10-14T03:52:57Z

We also discussed a possible implementation - a very different one from the present implementation which absolutely comes with downsides - which is that we could potentially query the package header for each package. That would remove dependence on fields that are or are not im the metadata, at the cost of many many many more web requests being made and more data being downloaded.

sdherr · 2024-10-14T14:15:56Z

I think adding an optional validation step before publishing makes sense, and could be useful to enforce other rules as well (rpmlint?) if it were in some way generic and configurable, although signature checking is definitely the big one. We have pretty much decided that's the long-term direction we want to take - a validation / quality check job that runs on pulp-worker - so anything pulp provides in that direction would be helpful.

dralley reviewed Feb 13, 2023

View reviewed changes

pulp_rpm/app/management/commands/rpm-datarepair.py Show resolved Hide resolved

dralley reviewed Feb 13, 2023

View reviewed changes

pulp_rpm/app/models/repository.py Outdated Show resolved Hide resolved

dralley reviewed Feb 13, 2023

View reviewed changes

pulp_rpm/app/models/repository.py Outdated Show resolved Hide resolved

dralley reviewed Feb 13, 2023

View reviewed changes

sdherr force-pushed the check_signer_key_upstream branch from b595f25 to fd47a2b Compare February 13, 2023 22:09

ipanova reviewed Feb 14, 2023

View reviewed changes

sdherr force-pushed the check_signer_key_upstream branch from fd47a2b to 99414aa Compare February 14, 2023 15:20

Allow for public key id checking when adding packages to repos

4954043

closes pulp#2258

sdherr force-pushed the check_signer_key_upstream branch from 99414aa to 4954043 Compare March 15, 2023 14:26

dralley reviewed Apr 4, 2023

View reviewed changes

dralley reviewed Apr 5, 2023

View reviewed changes

sdherr closed this Apr 10, 2023

pedro-psb mentioned this pull request Jun 17, 2024

Provide basic gpg public key management pulp/pulpcore#3024

Closed

Allow for public key id checking when adding packages to repos #2954

Allow for public key id checking when adding packages to repos #2954

Conversation

sdherr commented Feb 10, 2023

dralley Feb 13, 2023 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

dralley Apr 3, 2023 • edited Loading

Choose a reason for hiding this comment

dralley Apr 3, 2023 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

ipanova Feb 14, 2023 • edited Loading

Choose a reason for hiding this comment

ipanova Feb 14, 2023 • edited Loading

Choose a reason for hiding this comment

sdherr Feb 14, 2023 • edited Loading

Choose a reason for hiding this comment

sdherr Feb 14, 2023 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

dralley Feb 15, 2023 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

ipanova commented Feb 14, 2023 • edited Loading

sdherr commented Feb 14, 2023 • edited Loading

ipanova commented Feb 15, 2023

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

dralley commented Apr 5, 2023

dralley commented Apr 6, 2023

dralley commented Apr 10, 2023

sdherr commented Apr 10, 2023 • edited Loading

dralley commented Apr 10, 2023

sdherr commented Apr 10, 2023 • edited Loading

dralley commented Apr 10, 2023

knumskull commented Sep 25, 2024

ggainey commented Oct 10, 2024

dralley commented Oct 14, 2024

sdherr commented Oct 14, 2024

dralley Feb 13, 2023 •

edited

Loading

dralley Apr 3, 2023 •

edited

Loading

dralley Apr 3, 2023 •

edited

Loading

ipanova Feb 14, 2023 •

edited

Loading

ipanova Feb 14, 2023 •

edited

Loading

sdherr Feb 14, 2023 •

edited

Loading

sdherr Feb 14, 2023 •

edited

Loading

dralley Feb 15, 2023 •

edited

Loading

ipanova commented Feb 14, 2023 •

edited

Loading

sdherr commented Feb 14, 2023 •

edited

Loading

sdherr commented Apr 10, 2023 •

edited

Loading

sdherr commented Apr 10, 2023 •

edited

Loading