Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add schema versioning to flow #504

Closed
wants to merge 20 commits into from
Closed

Add schema versioning to flow #504

wants to merge 20 commits into from

Conversation

vyasr
Copy link
Contributor

@vyasr vyasr commented Apr 25, 2021

Description

This PR adds schema versioning logic to flow. It is a slightly modified version of glotzerlab/signac#253, and it will likely be helpful to reviewers to have that PR open for comparison. It adds the ability to track flow's internal schema, which includes all internal files and configuration information used to implement flow that are generally internal details transparent to the user.

Note to reviewers: Currently I'm targeting this PR directly into master because it doesn't actually include any changes to flow's internal schema. If preferred, I can create an alternate branch to collect all such changes, but for now I decided that could wait until I make a PR that actually changes the schema (since we'd definitely want to collect all those changes together before merging them so that we only have a single schema version bump).

Motivation and Context

This PR is the first in a chain for resolving glotzerlab/signac#527. It will allow us to decouple flow's internals more completely from signac's and ensure that we can update them independently of one another in the future. Once this is merged

I considered making use of signac's migration logic directly in flow, but I decided against this because our primary goal right now is to get to a point where flow is independent of signac and we're going to have enough trouble disentangling the two without introducing additional version dependencies between them. We can consider refactoring the code in the future to expose a common migration API for any package in the signac framework, but for now I think the duplication of logic is a preferable path forward.

Types of Changes

  • Documentation update
  • Bug fix
  • New feature
  • Breaking change1

1The change breaks (or has the potential to break) existing functionality.

Checklist:

If necessary:

  • I have updated the API documentation as part of the package doc-strings.
  • I have created a separate pull request to update the framework documentation on signac-docs and linked it here.
  • I have updated the changelog.

@vyasr vyasr requested review from a team as code owners April 25, 2021 17:48
@vyasr vyasr requested review from tcmoore3 and SchoeniPhlippsn and removed request for a team April 25, 2021 17:48
@vyasr vyasr changed the title Feature/schema Add schema versioning to flow Apr 25, 2021
@vyasr vyasr requested review from bdice and csadorf and removed request for tcmoore3 and SchoeniPhlippsn April 25, 2021 17:55
@vyasr
Copy link
Contributor Author

vyasr commented Apr 25, 2021

I'm explicitly requesting review from @bdice and @csadorf on this one since they did most of the work on glotzerlab/signac#253 and this should be easiest for them to review.

@vyasr vyasr mentioned this pull request Apr 25, 2021
12 tasks
Copy link
Member

@bdice bdice left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall looks good. I have some questions and possibly inconsistent suggestions about how to write the error and log messages. We need more consistent names for things like "signac-flow configuration schema" vs. "flow schema" vs. "flow configuration." Also I prefer to use the name signac-flow instead of flow, unless specifically referring to the package you import. If we standardize those names throughout this PR I think it'll be ready to go. I haven't explicitly tested the logic, but I trust from the tests and your forward planning for future PRs that this is going to work as desired.

flow/errors.py Outdated Show resolved Hide resolved
flow/migration/__init__.py Outdated Show resolved Hide resolved
flow/migration/__init__.py Outdated Show resolved Hide resolved
flow/migration/__init__.py Outdated Show resolved Hide resolved
flow/migration/__init__.py Outdated Show resolved Hide resolved
flow/migration/v0_to_v1.py Outdated Show resolved Hide resolved
flow/migration/v0_to_v1.py Outdated Show resolved Hide resolved
setup.py Outdated Show resolved Hide resolved
@vyasr vyasr requested a review from bdice June 1, 2021 04:54
Copy link
Member

@bdice bdice left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have a few minor comments to resolve. Otherwise this is fine. @vyasr I don't have a complete understanding of what next steps are needed, so please continue leading this effort through the sequence of pull requests you have in mind.



class IncompatibleSchemaVersion(RuntimeError):
"""The FlowProject's configuration schema version is incompatible with this version of signac-flow.""" # noqa: E501
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shorter docstring, to align with other error docstrings.

Suggested change
"""The FlowProject's configuration schema version is incompatible with this version of signac-flow.""" # noqa: E501
"""Indicates FlowProject schema is not supported by this signac-flow version."""


if get_config_schema_version() > schema_version:
# Project config schema version is newer and therefore not supported.
raise RuntimeError(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be IncompatibleSchemaVersion? Seems like the messages are very similar.

yield (origin, destination), migration
break
else:
raise RuntimeError(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be IncompatibleSchemaVersion? Seems like the messages are very similar.

Comment on lines +1629 to +1630
"The signac-flow schema version used by this project is '{}', but flow {} "
"only supports up to schema version '{}'. Try updating flow.".format(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Make this error message consistent with the one in flow/migration/__init__.py. The terms signac-flow and flow are interchanged (and possibly other differences).

Comment on lines +1636 to +1639
"The flow schema version used by this project is '{}', but flow {} "
"requires schema version '{}'. Please use '$ flow migrate' to "
"irreversibly migrate this project's schema to the supported "
"version.".format(config_schema_version, __version__, schema_version)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
"The flow schema version used by this project is '{}', but flow {} "
"requires schema version '{}'. Please use '$ flow migrate' to "
"irreversibly migrate this project's schema to the supported "
"version.".format(config_schema_version, __version__, schema_version)
"The signac-flow schema version used by this project is '{}', but signac-flow {} "
"requires schema version '{}'. Please use '$ flow migrate' to "
"irreversibly migrate this project's schema to a supported "
"version.".format(config_schema_version, __version__, schema_version)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible that the package will support multiple schema versions concurrently? That is, should this be able to say "requires 3 <= schema version <= 5" or similar?

@@ -3,3 +3,5 @@ coverage==5.5
pytest-cov==2.11.1
pytest==6.2.3
ruamel.yaml==0.17.4
filelock>=3.0
packaging>=15.0
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These requirements are in addition to the core requirements.txt. Only testing requirements should be listed here.

@csadorf
Copy link
Contributor

csadorf commented Jun 4, 2021

I considered making use of signac's migration logic directly in flow, but I decided against this because our primary goal right now is to get to a point where flow is independent of signac and we're going to have enough trouble disentangling the two without introducing additional version dependencies between them. We can consider refactoring the code in the future to expose a common migration API for any package in the signac framework, but for now I think the duplication of logic is a preferable path forward.

I don't understand this. Why should flow be indepenent of signac? The package strongly and unidirectionally depends on signac which means that there is no cross-dependency and we also have full control over what signac API is available to us. A user cannot install a version of signac that is incompatible with flow unless they break their environment.

Do we have this much trouble with this one dependency? Maybe for 2.0, we should consider to use one of the standard Python plugin approaches for a better integration of signac-core and other packages.

Copy link
Contributor

@csadorf csadorf left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have major reservations to this change set and hopefully have outlined my arguments clear enough. If not, I am happy to discuss them either on this PR or on a call.

@@ -0,0 +1,115 @@
# Copyright (c) 2021 The Regents of the University of Michigan
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I honestly think it is a terrible idea to duplicate all of this logic here. Is it really that hard to adjust the existing framework to perform the migration?

Comment on lines +21 to +22
# 1. User updates signac before flow. In this case, signac's config API may
# already have changed, and we cannot rely on it. Therefore, we need add
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That makes no sense. flow depends on signac ergo we can rely on its API.

Comment on lines +32 to +34
# 2. User updates signac and attempts to migrate before migrating flow
# schema. In this case, `signac migrate` should error and tell the user
# to update flow and run `flow migrate` first. This can be accomplished
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why would flows schema be affected by a schema change of signac core? signac's API is how access the config, any schema change in the core package should be transparent to dependencies, including flow.

# "flow" key in the config and error if it is present. We will
# introduce the flow v1->v2 migration before signac's to ensure that
# this is possible.
# 3. Users update signac and create a new project, but still use an old
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's not possible unless they break their environment.

@vyasr
Copy link
Contributor Author

vyasr commented Jun 4, 2021

Just chatted with @csadorf. We agree that shared migration logic and zero dependence of flow on signac's internals (including API but also extending to configuration) would be great. We mainly differ on whether we need to protect users with invalid Python environments (specifically, improper signac/flow dependency management) from corrupting their metadata. The duplication of logic in this PR, many of the inline comments, and the general move to decouple flow's config from signac's are targeted at making that safer. However, @csadorf feels strongly that sticking to a single configuration file will make our lives much easier in the long term, and that we should not be trying to protect against those invalid Python environments.

While we were chatting, we identified the major interdependencies between signac and flow that make flow's reliance on signac's internals problematic. Removing flow's monkey-patching of signac's configuration and moving validation of the flow components of the config into flow would allow flow to safely depend on signac's config API. Since moving in that direction may make it less important to consider separating out flow's config, I'm going to focus on trying to make that change in the shorter term and then work my way back to this PR later and see if I can't find a better compromise solution that address both Simon and my concerns.

@vyasr vyasr marked this pull request as draft June 4, 2021 21:55
@bdice bdice requested review from bdice and removed request for bdice June 4, 2021 22:50
@bdice
Copy link
Member

bdice commented Jun 4, 2021

@vyasr and @csadorf I’m happy to let you come to whatever solution you mutually feel is workable. I have removed my request for review for now.

@vyasr
Copy link
Contributor Author

vyasr commented Dec 13, 2022

This PR is no longer really relevant. signac and flow's configs are now completely decoupled. We may eventually need to do some form of schema maintenance for flow beyond the simple validation introduced in #573, but we can cross that bridge when we reach a point that flow tries to modify its schema, and we can solve that use post-load jsonschema-based validation. The extensive replication of signac's schema code is no longer necessary.

@vyasr vyasr closed this Dec 13, 2022
@vyasr vyasr deleted the feature/schema branch December 13, 2022 18:32
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants