OSC-Platform-031: Secret management for proprietary data ingestion #81
See whether this can be part of Operate First.
ODH team to provide guidance on secret management (Landon Smith), not just for notebooks; Heather to ask on the ODH support channel in Slack. This issue is to be broken up: key management requires overall architecture and planning. A discussion/meeting is needed; @HeatherAck to set it up.
@HeatherAck to schedule for the week of 14-Nov.
Meeting planned for 8-Dec; will mark as blocked until that date.
Prefer to use Airflow as the scheduler, since it can handle more complex data pipelines. Other factors to consider: OpenMetadata and Airflow are tightly coupled. Consider externalizing Airflow to enable other functionality. See also #243.
(3) Need to know the use cases where the keys will be used.
(4) Need to establish different rules/restrictions for CL2 to ensure sandbox development and testing is not slowed down; CL3 will be the stable cluster. (5) Document the policy and its use.
@redmikhail is still investigating (1) above. Re: (3) above: customize the plugin. Certificates need to be updated (they expire on 1-Jan); there is a bug in cert-manager that needs fixing, so manual updates are required every 3 months (see operate-first/apps#1998).
@bryonbaker will help with HashiCorp Vault. @redmikhail to reply to the email; still validating Airflow's use. @HeatherAck to ensure a ticket is created in LF.
I have started a PoC of HashiCorp Vault and various means of injecting and accessing secrets, but I would like to get a broader view of the team's needs to make sure I come up with the right solution.
Here is a list of secrets currently required for data pipeline ingestion:
Ideally the secrets should be injected into Airflow at runtime, including the generation of a JWT token for Trino access, which is used for both data ingestion and reads (probably the most complex requirement). We also need to check whether we can automate the execution of OpenMetadata at the end of the data pipeline, which may require secrets for the OpenMetadata admin account (automation not tested yet; we need to open a dedicated issue for this).
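To make the JWT requirement concrete, here is a minimal stdlib-only sketch of minting a short-lived HS256 token for Trino access at pipeline runtime. All names (issuer, subject, key) are illustrative assumptions, not the actual deployment's values; in practice the signing key would itself come from the secret store, and Trino would be configured to trust the issuer.

```python
import base64
import hashlib
import hmac
import json
import time


def _b64url(data: bytes) -> str:
    # JWTs use unpadded base64url encoding for each segment
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()


def mint_trino_token(signing_key: bytes, user: str, ttl_seconds: int = 1800) -> str:
    """Build header.claims.signature with an HMAC-SHA256 signature."""
    header = {"alg": "HS256", "typ": "JWT"}
    now = int(time.time())
    claims = {
        "sub": user,                 # principal Trino would see
        "iss": "airflow-pipeline",   # assumed issuer name
        "iat": now,
        "exp": now + ttl_seconds,    # short-lived: avoids long-lived static creds
    }
    signing_input = (
        f"{_b64url(json.dumps(header).encode())}."
        f"{_b64url(json.dumps(claims).encode())}"
    )
    sig = hmac.new(signing_key, signing_input.encode(), hashlib.sha256).digest()
    return f"{signing_input}.{_b64url(sig)}"


token = mint_trino_token(b"not-a-real-key", "ingestion-bot")
print(token.count("."))  # 2: header.claims.signature
```

A real deployment would more likely use an asymmetric algorithm (RS256) so Trino only needs the public key, but the token structure is the same.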
Just to add to the list stated above, we also need to manage
While maybe not perfect, we have a solution that is already being used for cases where the corresponding component/application consumes Kubernetes secrets: the External Secrets Operator (https://external-secrets.io/). It syncs entries in an external KMS (including HashiCorp Vault) into Kubernetes secrets. It is fairly lightweight and non-invasive (no sidecar containers and so on). Some of the use cases mentioned above may already be covered; however, if the target application does not allow easy access to the Kubernetes secret, we may need a different way of implementing it.
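For illustration, an `ExternalSecret` resource that syncs a Vault entry into a Kubernetes secret looks roughly like this; the store name, namespace, and key paths are hypothetical placeholders, not values from our clusters:

```yaml
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: pipeline-s3-creds        # hypothetical name
  namespace: data-pipelines      # hypothetical namespace
spec:
  refreshInterval: 1h            # re-sync from Vault hourly
  secretStoreRef:
    name: vault-backend          # a (Cluster)SecretStore configured separately
    kind: ClusterSecretStore
  target:
    name: pipeline-s3-creds      # the Kubernetes Secret ESO creates/updates
    creationPolicy: Owner
  data:
    - secretKey: AWS_SECRET_ACCESS_KEY
      remoteRef:
        key: pipelines/s3        # path in the Vault KV store
        property: secret_key
```

Any pod in the namespace can then mount or env-reference `pipeline-s3-creds` like an ordinary Kubernetes secret, which is what makes the approach non-invasive.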
Is there any reason we could not use HashiCorp Vault with Kubernetes service accounts to give all pods in a namespace access to a set of secrets? I know Airflow lets you assign attributes to the scheduled pods, so that would enable injection.
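One way this pattern is commonly wired up is Vault's Kubernetes auth method, which binds a Vault role to a service account, combined with the Vault Agent injector's pod annotations. A rough sketch, with all role names, paths, and images as assumptions:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: airflow-worker
  annotations:
    vault.hashicorp.com/agent-inject: "true"
    vault.hashicorp.com/role: "airflow-worker"          # Vault role bound to the SA below
    vault.hashicorp.com/agent-inject-secret-s3-creds: "secret/data/pipelines/s3"
spec:
  serviceAccountName: airflow-worker   # identity Vault authenticates via Kubernetes auth
  containers:
    - name: worker
      image: apache/airflow:2.4.0      # illustrative image
```

Note this injector approach does use a sidecar agent, unlike the External Secrets Operator mentioned above; the trade-off is that secrets never land in a Kubernetes Secret object at all.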
We have two issues currently with secret management for AWS S3 buckets:
We would prefer to move toward a secret management solution that works with a secretless broker, to make the process seamless for developers, for example with the Conjur secret store:
https://github.com/cyberark/secretless-broker