-
Notifications
You must be signed in to change notification settings - Fork 37
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add schema versioning to flow #504
Conversation
…specify the config.
for more information, see https://pre-commit.ci
I'm explicitly requesting review from @bdice and @csadorf on this one since they did most of the work on glotzerlab/signac#253 and this should be easiest for them to review. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall looks good. I have some questions and possibly inconsistent suggestions about how to write the error and log messages. We need more consistent names for things like "signac-flow configuration schema" vs. "flow schema" vs. "flow configuration." Also I prefer to use the name signac-flow instead of flow, unless specifically referring to the package you import. If we standardize those names throughout this PR I think it'll be ready to go. I haven't explicitly tested the logic, but I trust from the tests and your forward planning for future PRs that this is going to work as desired.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have a few minor comments to resolve. Otherwise this is fine. @vyasr I don't have a complete understanding of what next steps are needed, so please continue leading this effort through the sequence of pull requests you have in mind.
|
||
|
||
class IncompatibleSchemaVersion(RuntimeError): | ||
"""The FlowProject's configuration schema version is incompatible with this version of signac-flow.""" # noqa: E501 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shorter docstring, to align with other error docstrings.
"""The FlowProject's configuration schema version is incompatible with this version of signac-flow.""" # noqa: E501 | |
"""Indicates FlowProject schema is not supported by this signac-flow version.""" |
|
||
if get_config_schema_version() > schema_version: | ||
# Project config schema version is newer and therefore not supported. | ||
raise RuntimeError( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this be IncompatibleSchemaVersion
? Seems like the messages are very similar.
yield (origin, destination), migration | ||
break | ||
else: | ||
raise RuntimeError( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this be IncompatibleSchemaVersion
? Seems like the messages are very similar.
"The signac-flow schema version used by this project is '{}', but flow {} " | ||
"only supports up to schema version '{}'. Try updating flow.".format( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Make this error message consistent with the one in flow/migration/__init__.py
. The terms signac-flow
and flow
are interchanged (and possibly other differences).
"The flow schema version used by this project is '{}', but flow {} " | ||
"requires schema version '{}'. Please use '$ flow migrate' to " | ||
"irreversibly migrate this project's schema to the supported " | ||
"version.".format(config_schema_version, __version__, schema_version) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"The flow schema version used by this project is '{}', but flow {} " | |
"requires schema version '{}'. Please use '$ flow migrate' to " | |
"irreversibly migrate this project's schema to the supported " | |
"version.".format(config_schema_version, __version__, schema_version) | |
"The signac-flow schema version used by this project is '{}', but signac-flow {} " | |
"requires schema version '{}'. Please use '$ flow migrate' to " | |
"irreversibly migrate this project's schema to a supported " | |
"version.".format(config_schema_version, __version__, schema_version) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it possible that the package will support multiple schema versions concurrently? That is, should this be able to say "requires 3 <= schema version <= 5" or similar?
@@ -3,3 +3,5 @@ coverage==5.5 | |||
pytest-cov==2.11.1 | |||
pytest==6.2.3 | |||
ruamel.yaml==0.17.4 | |||
filelock>=3.0 | |||
packaging>=15.0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These requirements are in addition to the core requirements.txt
. Only testing requirements should be listed here.
I don't understand this. Why should flow be indepenent of signac? The package strongly and unidirectionally depends on signac which means that there is no cross-dependency and we also have full control over what signac API is available to us. A user cannot install a version of signac that is incompatible with flow unless they break their environment. Do we have this much trouble with this one dependency? Maybe for 2.0, we should consider to use one of the standard Python plugin approaches for a better integration of signac-core and other packages. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have major reservations to this change set and hopefully have outlined my arguments clear enough. If not, I am happy to discuss them either on this PR or on a call.
@@ -0,0 +1,115 @@ | |||
# Copyright (c) 2021 The Regents of the University of Michigan |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I honestly think it is a terrible idea to duplicate all of this logic here. Is it really that hard to adjust the existing framework to perform the migration?
# 1. User updates signac before flow. In this case, signac's config API may | ||
# already have changed, and we cannot rely on it. Therefore, we need add |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That makes no sense. flow depends on signac ergo we can rely on its API.
# 2. User updates signac and attempts to migrate before migrating flow | ||
# schema. In this case, `signac migrate` should error and tell the user | ||
# to update flow and run `flow migrate` first. This can be accomplished |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why would flows schema be affected by a schema change of signac core? signac's API is how access the config, any schema change in the core package should be transparent to dependencies, including flow.
# "flow" key in the config and error if it is present. We will | ||
# introduce the flow v1->v2 migration before signac's to ensure that | ||
# this is possible. | ||
# 3. Users update signac and create a new project, but still use an old |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's not possible unless they break their environment.
Just chatted with @csadorf. We agree that shared migration logic and zero dependence of flow on signac's internals (including API but also extending to configuration) would be great. We mainly differ on whether we need to protect users with invalid Python environments (specifically, improper signac/flow dependency management) from corrupting their metadata. The duplication of logic in this PR, many of the inline comments, and the general move to decouple flow's config from signac's are targeted at making that safer. However, @csadorf feels strongly that sticking to a single configuration file will make our lives much easier in the long term, and that we should not be trying to protect against those invalid Python environments. While we were chatting, we identified the major interdependencies between signac and flow that make flow's reliance on signac's internals problematic. Removing flow's monkey-patching of signac's configuration and moving validation of the flow components of the config into flow would allow flow to safely depend on signac's config API. Since moving in that direction may make it less important to consider separating out flow's config, I'm going to focus on trying to make that change in the shorter term and then work my way back to this PR later and see if I can't find a better compromise solution that address both Simon and my concerns. |
This PR is no longer really relevant. signac and flow's configs are now completely decoupled. We may eventually need to do some form of schema maintenance for flow beyond the simple validation introduced in #573, but we can cross that bridge when we reach a point that flow tries to modify its schema, and we can solve that use post-load jsonschema-based validation. The extensive replication of signac's schema code is no longer necessary. |
Description
This PR adds schema versioning logic to flow. It is a slightly modified version of glotzerlab/signac#253, and it will likely be helpful to reviewers to have that PR open for comparison. It adds the ability to track flow's internal schema, which includes all internal files and configuration information used to implement flow that are generally internal details transparent to the user.
Note to reviewers: Currently I'm targeting this PR directly into master because it doesn't actually include any changes to flow's internal schema. If preferred, I can create an alternate branch to collect all such changes, but for now I decided that could wait until I make a PR that actually changes the schema (since we'd definitely want to collect all those changes together before merging them so that we only have a single schema version bump).
Motivation and Context
This PR is the first in a chain for resolving glotzerlab/signac#527. It will allow us to decouple flow's internals more completely from signac's and ensure that we can update them independently of one another in the future. Once this is merged
I considered making use of signac's migration logic directly in flow, but I decided against this because our primary goal right now is to get to a point where flow is independent of signac and we're going to have enough trouble disentangling the two without introducing additional version dependencies between them. We can consider refactoring the code in the future to expose a common migration API for any package in the signac framework, but for now I think the duplication of logic is a preferable path forward.
Types of Changes
1The change breaks (or has the potential to break) existing functionality.
Checklist:
If necessary: