Refactored workload API token management for better security and implemented generic API token dispenser #3154
Conversation
Important: Review skipped. Auto reviews are disabled on this repository. Please check the settings in the CodeRabbit UI.

📝 Walkthrough
The changes introduced in this pull request include the addition of a new attribute.
Sequence Diagram(s)

sequenceDiagram
    participant User
    participant AuthServer
    participant TokenManager
    User->>AuthServer: Request login
    AuthServer->>TokenManager: Generate access token
    TokenManager-->>AuthServer: Return access token
    AuthServer-->>User: Send access token
@coderabbitai review

✅ Actions performed: Review triggered.
Actionable comments posted: 5
🧹 Outside diff range and nitpick comments (2)
src/zenml/orchestrators/utils.py (1)
Line range hint 113-128: Security implementation looks robust

The code properly handles different token expiration scenarios and includes appropriate security warnings. The implementation:
- Uses expiring tokens for regular pipeline runs
- Clearly warns about security implications of non-expiring tokens
- Guides users toward more secure service account authentication
Consider implementing a maximum lifetime limit for non-expiring tokens as an additional security measure.
src/zenml/constants.py (1)
272-272: Consider making the token lifetime configurable via an environment variable.

The constant follows the naming convention and its value aligns with the PR objective. However, consider making it configurable via an environment variable, similar to how PIPELINE_API_TOKEN_EXPIRES_MINUTES is handled.

Add an environment variable and update the constant:

+ENV_ZENML_SERVER_GENERIC_API_TOKEN_LIFETIME = f"{ENV_ZENML_SERVER_PREFIX}GENERIC_API_TOKEN_LIFETIME"
-DEFAULT_ZENML_SERVER_GENERIC_API_TOKEN_LIFETIME = 60 * 60  # 1 hour
+DEFAULT_ZENML_SERVER_GENERIC_API_TOKEN_LIFETIME = handle_int_env_var(
+    ENV_ZENML_SERVER_GENERIC_API_TOKEN_LIFETIME,
+    default=60 * 60,  # 1 hour
+)
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
📒 Files selected for processing (4)
- src/zenml/config/server_config.py (3 hunks)
- src/zenml/constants.py (1 hunks)
- src/zenml/orchestrators/utils.py (2 hunks)
- src/zenml/zen_server/routers/auth_endpoints.py (5 hunks)
🧰 Additional context used
📓 Path-based instructions (4)
src/zenml/config/server_config.py (1)
Pattern src/zenml/**/*.py: Review the Python code for conformity with Python best practices.

src/zenml/constants.py (1)
Pattern src/zenml/**/*.py: Review the Python code for conformity with Python best practices.

src/zenml/orchestrators/utils.py (1)
Pattern src/zenml/**/*.py: Review the Python code for conformity with Python best practices.

src/zenml/zen_server/routers/auth_endpoints.py (1)
Pattern src/zenml/**/*.py: Review the Python code for conformity with Python best practices.
🔇 Additional comments (3)
src/zenml/orchestrators/utils.py (1)
37-37: LGTM: Logger initialization follows best practices

The logger is properly initialized using __name__ as the logger name, which is the recommended approach in Python.

src/zenml/constants.py (1)

272-272: Verify the constant usage in the codebase.

Let's verify how this constant is being used in the codebase.
✅ Verification successful
Constant is properly integrated and used
The constant DEFAULT_ZENML_SERVER_GENERIC_API_TOKEN_LIFETIME is correctly defined in constants.py and appropriately used in server_config.py as a default value for the generic_api_token_lifetime configuration parameter. The usage is consistent with its intended purpose of setting a default lifetime for generic API tokens.

🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Search for usages of the new constant to ensure it's properly integrated.
rg "DEFAULT_ZENML_SERVER_GENERIC_API_TOKEN_LIFETIME"

Length of output: 320
Script:
#!/bin/bash
# Check the usage context in server_config.py
rg -A 5 -B 5 "DEFAULT_ZENML_SERVER_GENERIC_API_TOKEN_LIFETIME" src/zenml/config/server_config.py

Length of output: 948
src/zenml/config/server_config.py (1)
123-124: LGTM! Documentation is clear and follows the existing style.

The documentation for the new attribute is well-written and clearly explains its purpose.
src/zenml/orchestrators/utils.py
Outdated
logger.warning(
    "An API token without an expiration time will be generated "
    "and used to run this pipeline on a schedule. This is very "
    "insecure because the API token cannot be revoked in case "
    "of potential theft without disabling the entire user "
    "accountWhen deploying a pipeline on a schedule, it is "
    "strongly advised to use a service account API key to "
    "authenticate to the ZenML server instead of your regular "
    "user account. For more information, see "
    "https://docs.zenml.io/how-to/connecting-to-zenml/connect-with-a-service-account"
)
Fix typo and improve warning message readability
There's a typo in the warning message where "account" and "When" are merged. Additionally, the message could be split into multiple lines for better readability.
Apply this fix:
logger.warning(
- "An API token without an expiration time will be generated "
- "and used to run this pipeline on a schedule. This is very "
- "insecure because the API token cannot be revoked in case "
- "of potential theft without disabling the entire user "
- "accountWhen deploying a pipeline on a schedule, it is "
- "strongly advised to use a service account API key to "
- "authenticate to the ZenML server instead of your regular "
- "user account. For more information, see "
- "https://docs.zenml.io/how-to/connecting-to-zenml/connect-with-a-service-account"
+ "An API token without an expiration time will be generated "
+ "and used to run this pipeline on a schedule.\n\n"
+ "SECURITY WARNING: This is very insecure because the API token "
+ "cannot be revoked in case of potential theft without disabling "
+ "the entire user account.\n\n"
+ "When deploying a pipeline on a schedule, it is strongly advised "
+ "to use a service account API key to authenticate to the ZenML "
+ "server instead of your regular user account.\n\n"
+ "For more information, see: "
+ "https://docs.zenml.io/how-to/connecting-to-zenml/connect-with-a-service-account"
)
src/zenml/config/server_config.py
Outdated
generic_api_token_lifetime: int = (
    DEFAULT_ZENML_SERVER_GENERIC_API_TOKEN_LIFETIME
)
Add validation for the token lifetime value.
The generic_api_token_lifetime value should be validated to ensure it's a positive integer to prevent potential security issues with invalid token lifetimes.

Add the following validation to the _validate_config method:
@model_validator(mode="before")
@classmethod
@before_validator_handler
def _validate_config(cls, data: Dict[str, Any]) -> Dict[str, Any]:
+ # Validate generic API token lifetime
+ if "generic_api_token_lifetime" in data:
+ lifetime = data["generic_api_token_lifetime"]
+ if not isinstance(lifetime, int) or lifetime <= 0:
+ raise ValueError(
+ "generic_api_token_lifetime must be a positive integer"
+ )
Committable suggestion was skipped due to low confidence.
💡 Codebase verification
Add validation for maximum token lifetime
The current implementation allows setting arbitrary token lifetimes without an upper bound. The default value is 1 hour, which is a good security practice, but there's no enforcement to prevent setting much longer lifetimes. Consider adding a validation rule to limit the maximum token lifetime to a reasonable duration (e.g., 24 hours) to prevent potential security risks from long-lived tokens.
- In src/zenml/config/server_config.py, add a validator for generic_api_token_lifetime to enforce a maximum value (see the sketch below).
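A rough sketch of what such an upper-bound check could look like, using a plain pydantic model_validator in the same spirit as the existing _validate_config hook. The 24-hour cap, the constant name, and the standalone ServerConfiguration stand-in are illustrative assumptions, not what the PR implements:

from typing import Any

from pydantic import BaseModel, model_validator

# Assumed upper bound -- the actual limit would be a project decision.
MAX_GENERIC_API_TOKEN_LIFETIME = 60 * 60 * 24  # 24 hours


class ServerConfiguration(BaseModel):
    generic_api_token_lifetime: int = 60 * 60  # 1 hour default

    @model_validator(mode="before")
    @classmethod
    def _validate_generic_api_token_lifetime(cls, data: Any) -> Any:
        # Only validate when the field is explicitly provided; the default is in range.
        if isinstance(data, dict) and "generic_api_token_lifetime" in data:
            lifetime = data["generic_api_token_lifetime"]
            if (
                not isinstance(lifetime, int)
                or lifetime <= 0
                or lifetime > MAX_GENERIC_API_TOKEN_LIFETIME
            ):
                raise ValueError(
                    "generic_api_token_lifetime must be a positive integer "
                    f"no greater than {MAX_GENERIC_API_TOKEN_LIFETIME} seconds"
                )
        return data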
🔗 Analysis chain
Consider adding a maximum lifetime limit.
For security best practices, consider adding an upper bound to the token lifetime to prevent the creation of tokens with excessive lifetimes.
Let's check if there are any existing token lifetime limits in the codebase:
Let me check if there are any validation rules for the token lifetime:
Let me check the default value for the token lifetime:
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Search for token lifetime related constants and validations
rg -i "token.*lifetime|lifetime.*token" -g "!*.{log,md,txt}"
Length of output: 752
Script:
#!/bin/bash
# Check for any validation rules around token lifetime
ast-grep --pattern 'generic_api_token_lifetime'
# Also get the content of the server config file
cat src/zenml/config/server_config.py
Length of output: 25570
Script:
#!/bin/bash
# Check the default token lifetime value
cat src/zenml/constants.py | grep -A 1 "DEFAULT_ZENML_SERVER_GENERIC_API_TOKEN_LIFETIME"
Length of output: 157
return generate_access_token(
    user_id=token.user_id,
    expires_in=expires_minutes * 60 if expires_minutes else None,
    pipeline_id=pipeline_id,
    schedule_id=schedule_id,
).access_token
Passing unused pipeline_id and schedule_id to generate_access_token

In this call to generate_access_token, the parameters pipeline_id and schedule_id are provided but not utilized within the function. Since generate_access_token does not handle these parameters, this could lead to confusion or unintended behavior.

Consider removing these parameters from the function call or updating generate_access_token to handle them appropriately.
response: Optional[Response] = None,
device: Optional[OAuthDeviceInternalResponse] = None,
api_key: Optional[APIKeyInternalResponse] = None,
expires_in: Optional[int] = None,
pipeline_id: Optional[UUID] = None,
schedule_id: Optional[UUID] = None,
Unused parameters pipeline_id and schedule_id in generate_access_token

The parameters pipeline_id and schedule_id are added to the generate_access_token function signature but are not utilized within the function body. If these parameters are intended to scope the access token, they should be incorporated into the JWTToken or otherwise appropriately used.
Apply this diff to include pipeline_id and schedule_id in the JWTToken:
access_token = JWTToken(
user_id=user_id,
device_id=device.id if device else None,
api_key_id=api_key.id if api_key else None,
+ pipeline_id=pipeline_id,
+ schedule_id=schedule_id,
).encode(expires=expires)
Committable suggestion was skipped due to low confidence.
This seems to be missing indeed
@schustmi Small note: If we can get this PR over the line and give it to aagya, we might be able to do it before she leaves... not sure, but just wanted to add that if you can "take this over" so to speak and it's an easy win, I'd do that.
Yeah, I wanted to, but I'm really unsure about the early return comment. I think we should probably wait for Stefan's input on this.
@coderabbitai review

✅ Actions performed: Review triggered.
Love the way this turned out, just one small comment from my side
)
logger.error(error)
raise CredentialsNotValid(error)
If I get this right, every time such a workload token is used to send a request to the server, this code is running, correct? Do you think this will cause any trouble at scale as this is running up to 3 additional database queries per request?
I guess I could implement a local time-based cache that would delay re-fetching the same object from the DB for up to 30 seconds. We should keep this contained to a limited number of entries though, otherwise the memory requirements will explode during high loads.
Potentially yeah, but not necessary for now, I hope. We're considerably reducing the amount of server requests during pipeline execution in this and the next release, which means this hopefully doesn't matter as much.
@schustmi I added a simple in-memory cache for objects that we want to cache in the ZenML server
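For context, a minimal sketch of what such a bounded, time-based in-memory cache could look like. The class name MemoryCache, the method names, and the 30-second default expiry are assumptions based on this discussion (and assume Python 3.9+); the actual implementation lives in src/zenml/zen_server/cache.py and may differ, as the snippets below show:

import time
from collections import OrderedDict
from typing import Any, Optional, Tuple
from uuid import UUID


class MemoryCache:
    """Bounded in-memory cache with per-entry time-based expiry."""

    def __init__(self, max_capacity: int = 1000, expiry_seconds: int = 30) -> None:
        # OrderedDict keeps insertion order, so the oldest entries can be
        # evicted first once the capacity limit is reached.
        self.cache: OrderedDict[UUID, Tuple[Any, float]] = OrderedDict()
        self.max_capacity = max_capacity
        self.expiry_seconds = expiry_seconds

    def set(self, key: UUID, value: Any) -> None:
        self.cache[key] = (value, time.monotonic())
        self.cache.move_to_end(key)
        # Ensure we don't exceed max capacity.
        while len(self.cache) > self.max_capacity:
            self.cache.popitem(last=False)

    def get(self, key: UUID) -> Optional[Any]:
        entry = self.cache.get(key)
        if entry is None:
            return None
        value, inserted_at = entry
        if time.monotonic() - inserted_at > self.expiry_seconds:
            # Stale entry: drop it so the caller re-fetches from the database.
            del self.cache[key]
            return None
        return value

A request handler could then call cache.get(key), fall back to the database on a miss, and store the fetched object with cache.set(key, value), so repeated requests within the expiry window skip the extra database queries.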
src/zenml/zen_server/cache.py
Outdated
cache = MemoryCache()
uuid_key = UUID("12345678123456781234567812345678")

cached_or_real_object = cache.get_or_cache(
I don't see the get_or_cache method anywhere.
Ah, you're right, this was left behind from an older implementation version.
done
src/zenml/zen_server/cache.py
Outdated
self.cache.popitem(last=False)  # type: ignore[call-arg]

F = Callable[[UUID], Optional[Any]]
I think Any already includes Optional, so this is sort of redundant. Can also leave it like this if needed for clarity though, up to you.
done
src/zenml/zen_server/cache.py
Outdated
# Ensure we don't exceed max capacity
while len(self.cache) > self.max_capacity:
    self.cache.popitem(last=False)  # type: ignore[call-arg]
Is it possible to annotate self.cache: OrderedDict[...] to get rid of this type ignore?
I read that OrderedDict isn't available as a type annotation in Python 3.8 and decided to use Dict instead. But then I realized we don't support Python 3.8 anymore, so... yeah, sure :)
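For illustration: the original ignore was needed because dict.popitem takes no arguments, so a Dict-typed attribute makes mypy flag last=False; with an OrderedDict annotation the call checks cleanly. A minimal sketch (assumes Python 3.9+ and a placeholder value type):

from collections import OrderedDict
from typing import Any
from uuid import UUID


class MemoryCache:
    def __init__(self) -> None:
        # With an explicit OrderedDict element type, popitem(last=False)
        # type-checks and the "type: ignore[call-arg]" comment can be dropped.
        self.cache: OrderedDict[UUID, Any] = OrderedDict()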
done
Describe changes
This PR heavily reworks the current mechanism of generating API tokens used by pipeline workloads to authenticate to the ZenML server API. The goals are to implement additional scoping and security validation checks for generated workload API tokens and to remove the previous time-based limitations. Here are some highlights:
In addition to this, the current restrictions concerning which types of API tokens can be used to generate workload API tokens have been removed. All types of accounts and API tokens (authorized devices, service accounts, dashboard session tokens) can be used to generate API tokens. Additional workload API tokens can be generated by pipeline workloads authenticated with workload API tokens, and the scoping rules are applied incrementally:
These improvements add a layer of security to workload API tokens that was previously provided only by time-based expiration, which was difficult to work with in the context of long-running pipelines:
The existing api_token endpoint can now also be used to issue generic API tokens that expire after one hour and can be used to make programmatic calls to the API for automation purposes, as sketched below.
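As a rough illustration of the automation use case, a client could request a generic token and reuse it for follow-up API calls. This is only a sketch: the URL paths, the bearer-token auth scheme, and the response shape are assumptions here, not confirmed details of the endpoint.

import requests

ZENML_SERVER_URL = "https://zenml.example.com"  # hypothetical server URL
SESSION_TOKEN = "<existing access token>"  # e.g. obtained via a service account login

# Request a short-lived generic API token (endpoint path assumed to be /api/v1/api_token).
response = requests.get(
    f"{ZENML_SERVER_URL}/api/v1/api_token",
    headers={"Authorization": f"Bearer {SESSION_TOKEN}"},
    timeout=10,
)
response.raise_for_status()
api_token = response.json()

# Use the generic token for subsequent automation calls until it expires (~1 hour).
pipelines = requests.get(
    f"{ZENML_SERVER_URL}/api/v1/pipelines",
    headers={"Authorization": f"Bearer {api_token}"},
    timeout=10,
)
pipelines.raise_for_status()
print(pipelines.json())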
Pre-requisites
Please ensure you have done the following:
My branch is based on develop and the open PR is targeting develop. If your branch wasn't based on develop, read the Contribution guide on rebasing a branch to develop.

Types of changes
Summary by CodeRabbit
New Features
Bug Fixes
Documentation