-
Notifications
You must be signed in to change notification settings - Fork 6
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add non-admin controller design #18
Add non-admin controller design #18
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@shubham-pampattiwar I also was writing similar doc, pushed to my branch (without PR) but it's different as I am using mermaid to create diagrams.
Let's discuss on our sync which way to go. I like the clarity of your diagram a bit better, mine is more detailed, but maybe too much?
https://github.com/mpryc/oadp-non-admin/blob/docs_design/docs/design/NonAdminBackup.md
### Backup Operation | ||
- As a non-admin user/namespace owner with administrative priviledges for a particular namespace, the user should be able to: | ||
- Create a Backup/Schedule of the namespace | ||
- Update the Backup/Schedule spec of the namespace |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this one I am confused of what will happen. Example: if I update after Velero started (or finished) the backup process, nothing will happen, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not even sure what happens to Velero if you update a backup after it starts. Not sure we need to make any promises here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(for backups -- for schedules, those can be updated, but see above as to whether we need schedules for the first iteration)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess if you update backup spec, nac could create deleteBackupRequests, mark NAB as pending, then once old backup is gone, NAB will be processing with new backup of the same name.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In any case, updating backup spec makes sense as it just does exactly what a normal velero (admin) user modifying backup spec would do. If there's any modification that makes sense in velero, then it could be done here.
…lation and implementation details
|
||
## Open Questions and Know Limitations | ||
- Velero command and pod logs | ||
- Multiple instance of OADP Operator |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please add:
- status regarding the queue of backups pending or running that may be blocking the non-admin backup.
- e.g. There are currently 5 OADP backups queued.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Multiple instances of OADP operator can exist, but we must make sure that no more than one of them enable non-admin. If a second DPA adds enableNonAdmin, that should trigger a validation error.
### Backup Operation | ||
- As a non-admin user/namespace owner with administrative priviledges for a particular namespace, the user should be able to: | ||
- Create a Backup/Schedule of the namespace | ||
- Update the Backup/Schedule spec of the namespace |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not even sure what happens to Velero if you update a backup after it starts. Not sure we need to make any promises here.
### Backup Operation | ||
- As a non-admin user/namespace owner with administrative priviledges for a particular namespace, the user should be able to: | ||
- Create a Backup/Schedule of the namespace | ||
- Update the Backup/Schedule spec of the namespace |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(for backups -- for schedules, those can be updated, but see above as to whether we need schedules for the first iteration)
|
||
- The Non-Admin Controller (NAC) will be installed via OADP Operator. | ||
- The Data Protection Application (DPA) CR will consist of a root level spec flag called `enableNonAdmin` | ||
- If the `enableNonAdmin` flag is set to `true`, the OADP Operator will install the NAC in OADP Operator's install namespace, the default value for `enableNonAdmin` will be `false` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need some validation to ensure that only one DPA in the cluster sets this to true
. Otherwise, if there are 2 OADP instances, both enabling non-admin, we will have race conditions around which velero instance handles each NonAdminBackup.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if there are 2 OADP instances, both enabling non-admin
Both instances in a supported scenario would be in separate oadp namespaces, I don't see a possible overlap here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
by default it watches all Namespaces
nvm. agree with sseago.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right. scenario is this:
- OADP installed in openshift-adp
- OADP installed in openshift-adp2
- user creates NonAdminBackup in mysql-persistent
If both OADPs have non-admin enabled, it's a race condition as to which one grabs and labels/annotates it first (and creates Velero CR), and possibly in some cases you end up with both attempting to modify it at the same time. If only one of them has non-admin enabled, then that OADP will manage it, and the other OADP will stay out of the way.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So I think non admin use case will prop up only in env where multiple users share a cluster. Therefore I think we need to ensure that multiple non admin controllers per velero is supported, so we need a way to limit the scope of each nac so it doesn't overlap.
- Non-Admin Backup (NAB) CRD: This iCRD will encapsulate the whole Velero Backup CRD and some additional spec felds that will be needed for non-admin feature. | ||
- Non-Admin Restore (NAR) CRD | ||
|
||
**Note:** Currently, this design considers that the responsibility of the BackupStorageLocation configuration is that of the cluster admin and not non-admin/namespace admin. Hence, no introduction of non-admin BSL controllers and CRDs. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To clarify, this means users will not be managing their own BSLs/buckets. They will have to know the name of the configured BSL (given by admin), and the NAC will assume that any user who knows the name of a BSL is permitted to back up to that BSL. Is this correct?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, any NAC user who does not specify/know the BSL name will be considered approved to use the default BSL.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we will have a way for admin to scope who can use which bsl, the nab/nar controller will validate accordingly before processing the CRs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will that require a CR, or will it be managed via annotations/labels/etc on the BSL itself?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
up for discussion.
|
||
## Open Questions and Know Limitations | ||
- Velero command and pod logs | ||
- Multiple instance of OADP Operator |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Multiple instances of OADP operator can exist, but we must make sure that no more than one of them enable non-admin. If a second DPA adds enableNonAdmin, that should trigger a validation error.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If admin creates dpa on another cluster, and use the bsl containing NAC backups, will NAC backups.velero.io synced back to cluster regenerate backup.nac.oadp.openshift.io?
This is a good question. Might be required for disaster recovery of non-admin backups (i.e. cluster is corrupted, rebuilt from scratch, add BSL to new OADP instance, user backups have returned. This would require a non-admin backup sync controller to watch in-cluster velero Backups, and if one is found with NAC labels/annotations without a corresponding NAC backup, it would regenerate (creating the namespace if necessary). |
- Non-Admin Backup (NAB) Controller: The responsibilities of the NAB controller are: | ||
- Validate whether the non-admin user has appropriate administrative namepsace access | ||
- Validate Wehther the non-admin user has appropriate access to create/view/update /delete Non-Admin Backup CR | ||
- Listen to requests pertaining to Non-Admin Backup CRD across all the namespaces |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@shubham-pampattiwar please update this such that more than one OADP/NAC combo can be installed on the same cluster. Requests will have to be namespace filtered. @mpryc @mateusoliveira43 I suppose the namespaces for the NAC will be configured in the DPA?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
still not discussed, I believe. But one option, yes
…ntroller, Queuing mechanism, Validating Webhooks, NAC watch NS filtering and control over spec exposed to non-admin users
This latest commit addresses the following:
|
Need to add a note regarding tech preview: #47 (comment) |
# Non-Admin Backup/Restore Design | ||
|
||
## Background | ||
[OADP (Openshift API for Data Protection)](https://github.com/openshift/oadp-operator) Operator currently requires cluster admin access for performing Backup and Restore operations of applications deployed on the OpenShift platform. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
here "has" would be better than "requires"?
suppose someone tries to use OADP in non admin way (without NAC). Installs it in a non admin namespace. the non admin user can only create things in that namespace, but it can tell OADP to backup/restore anything (in any namespace) in the cluster, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think either is fine, currently we advertise OADP functionalities as cluster admin tasks so that way requires makes sense.
- View the status of the Restore created for the particular namespace | ||
- Delete the Restore of the namespace | ||
|
||
### BackupStorageLocation (BSL) Configuration Operation (optional use case) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we mention the use case of that BSL in the other operations such as Backup? How this will be used?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added the use-case
- The Data Protection Application (DPA) CR will consist of a root level struct called `nonAdmin` under the features. This struct will have a boolean to `enable` or `disable` the non-admin feature. | ||
- If the `features.nonAdmin.enable` flag is set to `true`, the OADP Operator will install the NAC in OADP Operator's install namespace, by default this feature will be disabled. | ||
- **Multiple OADP:NAC installations:** Due to performance limitations of single threaded nature of Velero controller, it might be desirable in some environments to have more than one instance of OADP:NAC installed. | ||
- We need to ensure that one OADP operator instance is mapped to only one NAC instance |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This isn't clear to me. One instance of NAC for one OADP operator, and above we have the multiple OADP:NAC installations, so does it mean we can only scale the NAC by scaling OADP?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OADP Operator is NS scoped operator, so you can have multiple OADP Operator installations in the cluster. But each of the OADP Operator will have one NAC controller instance if NAC is enabled. So we need to ensure that if there are multiple OADP/NAC installations in one particular cluster then no 2 NAC controllers should serve once single non-admin application/user namespace.
- If the `features.nonAdmin.enable` flag is set to `true`, the OADP Operator will install the NAC in OADP Operator's install namespace, by default this feature will be disabled. | ||
- **Multiple OADP:NAC installations:** Due to performance limitations of single threaded nature of Velero controller, it might be desirable in some environments to have more than one instance of OADP:NAC installed. | ||
- We need to ensure that one OADP operator instance is mapped to only one NAC instance | ||
- Also, the namespaces catered by each NAC instance should be separate and not merge with each other |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also not clear - NAC will create namespace per instance? Or it's OADP namespace to which the NAC controller(s) get installed, or if I think what it is:
The non-admin namespaces on which NAC controllers will operate....
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes your understanding that The non-admin namespaces on which NAC controllers will operate....
is correct
## Pre-requisites | ||
- **OADP installed**: OADP Operator must be installed and configured to use non-admin controller | ||
- **Non Admin Controller configured**: Data Protection Application (DPA) instance must configure Non Admin Controller to watch user namespace(s), by default it watches all Namespaces | ||
- **RBAC privileges for the user**: User must have the appropriate RBAC privileges to create Non Admin objects within the Namespace where Backup will be taken. An example of such ClusterRole, which may be added to the user with `RoleBinding`: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the user has admin rights to a particular namespace, the RBAC is not required right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah makes sense, added a note.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/lgtm
It's really nice doc. If requires further updates we can address those in additional PR's as NAC design updates.
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: mateusoliveira43, mpryc, shubham-pampattiwar The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
/override "Continuous Integration / oadp-compatibility-check (pull_request)" |
@mateusoliveira43: /override requires failed status contexts, check run or a prowjob name to operate on.
Only the following failed contexts/checkruns were expected:
If you are trying to override a checkrun that has a space in it, you must put a double quote on the context. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
/override oadp-compatibility-check |
@mateusoliveira43: Overrode contexts on behalf of mateusoliveira43: oadp-compatibility-check In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
Fixes #14
Fixes openshift/oadp-operator#1043
To review: https://github.com/shubham-pampattiwar/oadp-non-admin/blob/add-nac-design/docs/design/Non_Admin_Controller_design.md