-
Notifications
You must be signed in to change notification settings - Fork 37
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Establish clear flow schema #506
Comments
@vyasr Generally looks good to me, and I think it's close enough that we can begin work on this set of proposals. Comments on Concrete ProposalsMy understanding is that names like
# project.py
from flow import FlowProject
class Project(FlowProject):
pass
# operations...
if __name__ == '__main__':
Project.get_project("/data/my_project_data").main() Comments on Possible Proposals
The current config values are:
and account names, which are specified in a sub-namespace like
I would be comfortable with all of the first set of values being per-project or global, and to the second point, I would also be perfectly fine with making these values FlowProject constructor arguments. I don't think it matters. I do think that it is important that users can specify account names globally, ideally via an equivalent to
I don't understand this suggestion enough to evaluate whether it's a good or bad idea. Can you be more concrete about what you're thinking? Lastly, I support json/gzip for status and bundling. I would even suggest that we make that a SyncedCollection backend that specializes in large documents. 😉 Comments on Open QuestionsThe concern about orphaning bundles/status when renaming a FlowProject class is a good question. I'd be willing to just accept the cost of orphaning data and warn users not to do that in the documentation. Hope that hit all the highlights. |
Thanks for the feedback! Responses inline:
Yes, paths are relative to the
The question of the proper owner of config info (system-wide, signac project, or FlowProject) is closely intertwined with the question about public vs. private information, and also begs the question of whether we need to support "stacking" configs (e.g. That also means that the public/private question is partly moot. However, accounts are a bit different from our other config variables: they're really submission directives masquerading as config data so that we can hide them in
It's probably no longer relevant if we're sticking to config files, but for completeness' sake I was basically envisioning something like
I think you skipped the second sentence in my proposal 😉:
👍 |
All directives are applied to operations/groups, while accounts are not. While both directives and accounts end up generating lines in the scheduler's script preamble, I don't think we should adhere to the same usage/storage paradigm because of the distinct types of functionality. |
I actually have a slightly different view on how we would want to handle flow-specific configuration.
from pathlib import Path
# config associated with class
class Project(FlowProject):
# config = """my hardcoded config"
# config = dict(my_hard="coded", config="as dict")
# config = flow.read_config("project.cfg") # config read from file (alternative 1)
config = Path("project.cfg") # config read from file (alternative 2)
# config associated with instance:
if __name__ == '__main__':
Project.get_project().main(config=...) # config argument like above I am not sure whether it would work exactly like this since The bigger question for me is whether we consider the flow-configuration to be directly associated to the class definition or a class instance. I would assume it's the latter, but I'm interested in your thoughts. As a final note: Let's make sure that we enable and encourage to store "staticy" information in static files. Trying to parse static information from code is basically impossible and might cause huge headaches down the line. It is really important that those two things are separated, a good example is the evolution from |
I agree with most of that. I think my question about moving some of the config information into the class definition was made in the hope of simplification without sufficient consideration of the implications. I think the configuration should be associated with the instance, not the definition. The class definition in my view should be restricted to defining the workflow, and any customization beyond that belongs to the instance. I don't think this is very different from how things work now. Technically your approach specifies a file rather than the project root directory, but ultimately I think the two are interchangeable since the location of configuration files relative to the project root must be clearly specified by the schema anyway. The one thing I'm not quite aligned with is the the idea of using the same files. I think that's a reasonable goal to move towards in the long term, but so long as 1) flow remains a separate package from signac with a different release cycle, and 2) flow remains in a less mature state than signac with respect to its internals, I think coupling the two will present problems to the evolution of either. I would like to think that the changes we make here will obviate that, but I think we should be realistic and expect some tweaking to be necessary. In the event that the tweaking comes in the form of renaming files (e.g. if the move from |
This issue is part of glotzerlab/signac#527 but requires a more focused thread.
signac-flow currently stores a number of pieces of internal data, such as bundles, submission status, and configuration information for FlowProjects. The current data organization is ad hoc and relies on signac's config, and we want to decouple flow from these signac internals. #504 is the first step, establishing an explicit schema versioning for flow's internal data and enabling migration. The next step is defining a clear schema for signac-flow.
Concrete proposals:
flow init
in a directory should runsignac init
if necessary (already happens), then create an additional directory.flow
and a file.flow/config
for flow config information..flow/${FLOWPROJECT_NAME}_bundles
..flow/${FLOWPROJECT_NAME}_status
. It should not be stored in the project document, which is user-facing.Possible proposals:
project.py
files (note that we already do this in signac.rc, but maybe we can find a way to avoid that). One possibility is that some information ends up belonging to the FlowProject class, while other information (like accounts) goes in a config file.Open questions:
FlowProjectSubclass.init_project("/path/to/root")
in order for that to work, but we need to make sure that's the case. Decoupling flow's schema from signac's should help somewhat since it'll remove some of the fragile monkey-patching of the config that we do.The text was updated successfully, but these errors were encountered: