Allow for public key id checking when adding packages to repos #2954
Conversation
import shutil
from hashlib import sha256
from pgpy.pgp import PGPSignature
import rpmfile
These would be significant dependencies; I need to see if they would actually be acceptable to release as-is. Filing a createrepo_c issue as you've done is great, though.
Agreed, and I'm absolutely not tied to these deps in particular; these are just what I happened to get it working with.
Right, totally reasonable. Having looked at rpmfile, it looks like basically a partial reimplementation of RPM, which I'm not thrilled about; I would definitely prefer to use RPM proper since createrepo_c needs it anyway.
I'll ping them to get the ball rolling a bit faster.
So the feedback I got is that pulpcore and the debian plugin use python-gnupg, so it would be best if we can depend on that instead.
We already chatted about this on Matrix, but I'll respond here too for posterity.
There is no equivalent in python-gnupg to the in-memory parsing of the signature packet that pgpy does, nor does it even give access to the signer key ID. However, python-gnupg is just a wrapper around a subprocess call-out to gpg, so we could cut out the middleman and do that ourselves if we don't mind the extra overhead of writing the signature to disk and calling gpg in a subprocess. That change would look something like this.
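A rough, hypothetical sketch of what that gpg call-out could look like (the helper name is made up; it assumes the gpg binary is on PATH and that the raw OpenPGP signature bytes have already been pulled out of the RPM header):

```python
import re
import subprocess
import tempfile


def signer_key_id_via_gpg(signature_bytes):
    """Sketch: write the signature packet to disk and let gpg parse it."""
    with tempfile.NamedTemporaryFile(suffix=".sig") as sig_file:
        sig_file.write(signature_bytes)
        sig_file.flush()
        # `gpg --list-packets` dumps packet details, including the issuer key ID,
        # without needing the signer's public key in any keyring.
        result = subprocess.run(
            ["gpg", "--list-packets", sig_file.name],
            capture_output=True,
            text=True,
            check=True,
        )
    match = re.search(r"(?:issuer key ID|keyid) ([0-9A-Fa-f]{8,16})", result.stdout)
    return match.group(1) if match else None
```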
As for dropping the rpmfile dep, that's harder unless createrepo_c adds the Python bindings that we're asking for. You can definitely access those headers through the rpm module, if python3-rpm is installed, which is not a given. And since that's a module that wants to be installed as an OS package instead of through pip, requiring it is harder. The best solution is if createrepo_c makes these headers available, since that's already installed, so we'll see what they say.
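For reference, a minimal sketch of reading those signature headers through the rpm module, assuming python3-rpm is installed (the function name is hypothetical; the tag lookup order mirrors what rpm -qi does):

```python
import rpm


def signature_header(path):
    """Sketch: pull the signature header tags out of an RPM via the librpm bindings."""
    ts = rpm.TransactionSet()
    # Don't require the signing key to be present in a keyring just to read the header.
    ts.setVSFlags(rpm._RPMVSF_NOSIGNATURES)
    with open(path, "rb") as fd:
        hdr = ts.hdrFromFdno(fd.fileno())
    return (
        hdr[rpm.RPMTAG_DSAHEADER]
        or hdr[rpm.RPMTAG_RSAHEADER]
        or hdr[rpm.RPMTAG_SIGGPG]
        or hdr[rpm.RPMTAG_SIGPGP]
    )
```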
I've looked over the code for rpmfile and it looks... OK. There's nothing terribly objectionable, just that it's overly simple for the number of edge cases that I know exist, but for the sake of a small number of specific tags it should probably be fine.
I should have an answer on the package-signing questions I have by the end of the week. The people I've been trying to ask are very busy working on DNF5, but I have a one-hour meeting with them directly soon.
We can also reference the Pulp 2 code... I just don't want to express 100% confidence that it was (or remains, after X years) correct.
In any case I don't want to hold up this feature any longer, so one way or another we can hopefully merge it soon. If we find issues later we can fix them so long as it's tech-preview.
@sdherr It looks like pgpy does not support EdDSA. I think that's probably only a longer-term issue, since RSA is used everywhere... it may eventually become a problem, but I'd rather pgpy be a short-term solution anyway.
Force-pushed from b595f25 to fd47a2b
if self.allowed_pub_keys:
    rejected_nevras = []
    for package in Package.objects.filter(pk__in=new_version.added()).iterator():
        if str(package.signer_key_id) not in self.allowed_pub_keys:
What will happen if allowed_pub_keys is changed between repo version 1 and 2? Repo version 2 will fail to be created because the existing content won't adhere to the allowed list. Should this field be immutable then? Or maybe just additive, i.e. only able to extend the list of keys?
Or maybe this verification should be done at sync/upload time of the incoming content, and not during whole-version finalization.
However, if we move the verification check to sync/upload time, then we cannot really say for sure what package signatures repo version X contains; it can be a mixture.
Another unfortunate aspect is that we store the allowed_pub_keys list on the repo, and when creating repo version 1 only key A might have been in place; however, since things can change, repo version 10 can have A, B, C. So we lose track of what rules were applied when repo version 1 was created, because the only data we see is the field on the repo.
Existing content is not checked, only added packages. This seems like the least surprising way to handle this restriction, but if there are better options I'm all ears. Perhaps the RepositoryVersion should copy and make immutable the list of pubkeys that were allowed when it was created? I don't know what the purpose of that would be, though. You can't go back and edit the content of a previous RepoVersion, can you? Just create a new one. So the only thing that matters is what the pub key list is now.
I don't understand what you're saying about moving the check to upload/sync time. Adding content to a repo and syncing both create a new repo version, of course, so this check does currently run at sync / content-adding time.
Here is a check added to pulp_ansible at upload time: https://github.com/pulp/pulp_ansible/blob/5ae04f630a5c1ed679f7395323dcf0cf94656c97/pulp_ansible/app/tasks/signature.py#L52
And for the sync, I was thinking to raise an error right away here: https://github.com/pulp/pulp_rpm/pull/2954/files#r1105711127
I don't believe that's true. I think createrepo_c re-implements a lot of information about the RPM file structure in order to avoid that dependency. If I'm wrong let me know, because if rpmlib is installed then there are a lot more options here.
It has RPM as a hard dependency. There are only certain bits of functionality that it doesn't rely on librpm for, to increase flexibility, but the basic file-format processing it does. If you hunt down cr_package_from_file() and trace it backwards you can see that it's relying on librpm.
It's not as obvious to me as I thought it would be how we could take advantage of that to drop the rpmfile dependency, but if you have a suggestion feel free to mention it.
The same functionality should be available, though given that their docs are offline it's probably hard to tell. I'm currently trying to prod them to fix that.
@sdherr Could you try out this implementation? rpm-software-management/createrepo_c#346 (comment)
That works if python3-rpm is installed, yes, but that's not a given.
Force-pushed from fd47a2b to 99414aa
@sdherr I gave it a bit more thought. I think the main question to answer is what the goal of these checks is: a) I am checking only that incoming content is compliant with the current rules, and I accept that previous content might not be; as a result I can end up distributing a repo version that contains a mixture of content. Or b) I am sure the whole repo version is compliant with the current rules.
Example1:
Example2:
@ipanova Yes, so in my personal scenario we mostly care about (a), that incoming content is valid. In my scenario the set of people who can add packages to a repo is different from the set of people who can create or edit the repos themselves, and we want to gate the package-adders behind a set of restrictions, one of which is that their packages must be signed appropriately. As you point out, that's a very different scenario from some kind of consistency-checking scenario. Like I said in the description:
If you were going for (b), what should you do if the repo's key list changes? Remove all content that doesn't match the new policy? Probably not; the repo might not even be dependency-complete at that point, although if you're dealing with a key-revocation scenario maybe that's what you naively think you want. Reject the key-list update? Better than nothing, but that doesn't seem very useful. Reject all future content updates to the repo until someone comes along and fixes things, like you suggest? Maybe? It just seems too unclear, and like you'd want different behavior in different scenarios / edge cases. But right now, given that this is a net-new feature that does not exist in Pulp at all at this point, I'm happy with the reduced scope of the simpler incoming-content checks. If others want a more complicated and thoroughly-enforced set of Managed Security Policies then they can propose their own changes.
@sdherr @dralley Compliance and trust is a difficult problem indeed. Solution (a) is exactly how it was done in Pulp 2, and there was a general misperception and expectation that content present in the repo adheres to the allowed keys list, which may not be true, as shown in Example1. (a) and (b) are conceptually different; as long as we state all the nuances explicitly to the users, I am down with it.
Force-pushed from 99414aa to 4954043
for package in packages:
    _, pub_key_id = read_crpackage_from_artifact(package._artifacts.first())  # Only one.
    if pub_key_id is not None:
        print(f"Fixing stored signature for {package.nevra}")
Nitpick: it's not the whole signature, just the key ID.
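Presumably the rest of that loop just stores the parsed key ID back on the package; a rough sketch of the idea (the signer_key_id field name is taken from elsewhere in this PR, and the save call is an assumption):

```python
for package in packages:
    _, pub_key_id = read_crpackage_from_artifact(package._artifacts.first())  # Only one.
    if pub_key_id is not None:
        print(f"Fixing stored key ID for {package.nevra}")
        # Assumed follow-up: persist the key ID parsed out of the RPM header.
        package.signer_key_id = pub_key_id
        package.save(update_fields=["signer_key_id"])
```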
"verification. Signature verification requires examination of the actual RPM, so " | ||
"the current Repository 'allowed_pub_keys' setting conflicts with the Remote " | ||
"'policy' setting." | ||
) |
👍
    return rpm.headers.get(header, "")

# What the `rpm -qi` command does. See "--info" definition in /usr/lib/rpm/rpmpopt-*
signature = hdr("dsaheader") or hdr("rsaheader") or hdr("siggpg") or hdr("sigpgp")
👍
        log.info(traceback.format_exc())
        return None

    signer_key_id = ""  # If package is unsigned, store empty str
# through the python bindings. Until that is available we'll have to read the header
# a second time with a utility that actually makes it available.
# https://github.com/rpm-software-management/createrepo_c/issues/346
# TODO: When the above is resolved re-evaluate and potentially drop extra dependencies.
@sdherr You can drop all of this now
@@ -1534,6 +1538,12 @@ def _handle_distribution_tree(declarative_content):
            for update_reference in update_references:
                update_reference.update_record = update_record
                update_references_to_save.append(update_reference)
        elif isinstance(declarative_content.content, Package):
            artifact = declarative_content.d_artifacts[0]  # Packages have 1 artifact
            signer_key_id = parse_signer_id(artifact.artifact.file.path)
I don't think path will work when Pulp's artifact storage is S3 / Azure. We could use .read() and save it into a temporary file, then parse that, but it would not be terribly efficient. But short term it might be the only way... what we need is a way to parse the downloaded files before they get shipped off from temporary storage to the storage backend.
That is too invasive of a change to handle in this PR though.
The question is whether we want to live with that inefficiency or disable the feature if the user is using a cloud storage backend.
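A rough sketch of that .read()-into-a-temporary-file workaround, assuming artifact.file is a Django FieldFile (the helper name is made up):

```python
import shutil
import tempfile


def local_copy_for_parsing(artifact):
    """Sketch: copy an artifact from a remote backend (S3/Azure) to a local temp file."""
    # .path only exists for filesystem-backed storage; streaming the file works
    # for any backend, at the cost of downloading the whole RPM again.
    tmp = tempfile.NamedTemporaryFile(suffix=".rpm", delete=False)
    with artifact.file.open("rb") as stored, tmp:
        shutil.copyfileobj(stored, tmp)
    # Caller is responsible for deleting tmp.name once the header has been parsed.
    return tmp.name
```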
if self.allowed_pub_keys:
    rejected_nevras = []
    for package in Package.objects.filter(pk__in=new_version.added()).iterator():
        if str(package.signer_key_id) not in self.allowed_pub_keys:
I think this line should be:
if package.signer_key_id is None or str(package.signer_key_id) not in self.allowed_pub_keys
Otherwise it will try to compare "None", which will still work as expected in practice but isn't the best code.
@sdherr Sorry about the delay, but I think we can start moving forwards on this again.
@sdherr Are you still interested?
Hey @dralley, I was out for a few days there. I can look at updating this PR soon; I guess the larger question is whether this is still something that the larger community would find useful. Neither of us is very enthusiastic about the implementation, and doing it "better" would have to involve a much more complicated change involving public key management in pulpcore and such. I can (and have) solved my own signature-checking needs internally and was just trying to be a good open-source citizen and contribute something back. But if no one will use it then there's no point in saddling pulp_rpm with a sub-optimal implementation.
I think the general idea is good and we have had requests from elsewhere for the feature (https://bugzilla.redhat.com/show_bug.cgi?id=2172980), so I don't think nobody would use it, but the volume isn't super high. If your situation is handled and you're not burning to have it merged, we can close this out, gradually work out some of the kinks with e.g. the sync pipeline, then revisit? As long as the changes are kept around somewhere for future reference.
Okay, barring any outpouring of public support for this change, I think I am going to abandon it for now. I'll leave the branch up in my fork. To summarize, the basic problems with "checking signatures when adding RPMs to repos" are:
Good summary. I'll add:
As this is unlikely to be implemented in Pulp, but is a feature wanted by parts of industry, what could be an alternative workflow to achieve a similar result?
The RPM team had another discussion on this need, and identified even more edge cases involving "getting content into a repository" that would need to be handled if we want to verify signatures as content appears. Also, verifying signatures on the fly has a nontrivial impact on, say, sync performance. One observation that came up was: "the thing we're trying to address here is making sure that clients pulling content do not get RPMs that are signed with unaccepted/unknown keys". One way to accomplish this would be to "get involved" not at sync time, but between "sync into Pulp" and "distribute content to my users". What is being envisioned is a workflow that goes something like this:
How does this sound as a possible workflow?
We also discussed a possible implementation - a very different one from the present implementation, which absolutely comes with downsides - which is that we could potentially query the package header for each package. That would remove dependence on fields that are or are not in the metadata, at the cost of many, many more web requests being made and more data being downloaded.
I think adding an optional validation step before publishing makes sense, and it could be useful to enforce other rules as well (rpmlint?) if it were in some way generic and configurable, although signature checking is definitely the big one. We have pretty much decided that's the long-term direction we want to take - a validation / quality-check job that runs on pulp-worker - so anything pulp provides in that direction would be helpful.
closes #2258
The overall structure of this patch is that:
There is a new command added, pulpcore-manager rpm-datarepair 2258, that re-downloads every RPM out of storage to examine its header and set the pub key ID.
There is currently no attempt to ensure that existing repo contents are valid. I'm not sure it would be a good idea to try; it seems like there are a lot of edge cases here and I'm not sure what you'd do on failure.