
Back-end configuration for clusters and batch systems #106

Closed
benkrikler opened this issue Nov 8, 2019 · 2 comments
Labels: Backend (Relates to a processing system), enhancement (New feature or request)

Comments

@benkrikler
Member

We will likely need a way to configure back-ends for the cluster being run on:

@benkrikler Given that all clusters are a bit different and you have to tweak settings, you'll probably need some configuration boilerplate in your yaml file where the workflow is defined.

Perhaps something like --mode coffea:local and --mode coffea:cluster, and if it's coffea:cluster, it looks for a cluster config in the yaml file and sets up the right call to coffea.

Originally posted by @lgray in #88 (comment)

Relates to #55, although the original scope of that was smaller.
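A minimal sketch of how the proposed `--mode coffea:local` / `--mode coffea:cluster` switch could be parsed, assuming the workflow YAML has already been loaded into a dict. The `cluster` key, the function names, and the returned settings are hypothetical, not an existing fast-flow or coffea API.

```python
import argparse

# Hypothetical sketch: split a "--mode" value such as "coffea:local" or
# "coffea:cluster" into a backend and a variant, then, for cluster runs,
# pull the cluster settings out of the already-loaded workflow YAML.


def parse_mode(mode):
    """Split 'backend:variant' into a (backend, variant) tuple."""
    backend, _, variant = mode.partition(":")
    return backend, (variant or None)


def select_backend(mode, workflow):
    """Return a description of the back-end call to set up.

    `workflow` stands in for the parsed YAML that defines the workflow;
    the "cluster" key is an assumed name, not an existing schema.
    """
    backend, variant = parse_mode(mode)
    if backend != "coffea":
        raise ValueError(f"unsupported backend: {backend!r}")
    if variant == "local":
        return {"backend": "coffea", "executor": "local"}
    if variant == "cluster":
        cluster_cfg = workflow.get("cluster")
        if cluster_cfg is None:
            raise KeyError("--mode coffea:cluster needs a 'cluster' section in the YAML")
        return {"backend": "coffea", "executor": "cluster", **cluster_cfg}
    raise ValueError(f"unknown coffea variant: {variant!r}")


if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("--mode", default="coffea:local")
    args = parser.parse_args(["--mode", "coffea:cluster"])
    workflow = {"stages": [], "cluster": {"n_workers": 10, "queue": "short"}}
    print(select_backend(args.mode, workflow))
```

Failing loudly when `coffea:cluster` is requested without a cluster section avoids silently falling back to a local run.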

@benkrikler added the enhancement (New feature or request) and Backend (Relates to a processing system) labels on Nov 8, 2019
@benkrikler
Member Author

I've been thinking about this too. I can think of two ways to support this:

  1. Total user flexibility, via some cluster configuration mechanism (a Python module or a YAML config), either included in the processing config or kept in a separate file, or
  2. A built-in configuration system which identifies which cluster it is on and uses that configuration.

Option 1 is more general and flexible, and being less "clever" it leaves less room for strange, unexpected bugs than option 2. Option 2, however, reduces the amount of code and configuration a user has to write, so it should give them a nicer package, and it also increases how much code is shared between users at the same site.
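For option 2, discovery could be as simple as matching the host name against a shared table of known sites. This sketch assumes that approach; the `KNOWN_SITES` table, its host patterns, and its settings are made up for illustration.

```python
import fnmatch
import socket

# Hypothetical built-in site table: host-name patterns mapped to the
# back-end settings for that site. Every entry here is illustrative.
KNOWN_SITES = {
    "*.cern.ch": {"scheduler": "htcondor", "queue": "workday"},
    "*.example-uni.ac.uk": {"scheduler": "slurm", "partition": "short"},
}


def discover_site(hostname=None, sites=KNOWN_SITES):
    """Return the built-in configuration for this host, or None if unknown."""
    # Host names are case-insensitive, so compare in lower case.
    hostname = (hostname or socket.getfqdn()).lower()
    for pattern, config in sites.items():
        if fnmatch.fnmatchcase(hostname, pattern):
            return dict(config)  # copy so callers cannot mutate the shared table
    return None
```

Keeping the table in one shared module is what lets users on the same site reuse the same configuration.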

I think we can probably try to do both 1 and 2: we build a mechanism to configure sites on a user-by-user basis, but by default fill this in "automatically" using some cluster discovery service...
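Doing both could mean treating the discovered site configuration as defaults and layering the user's own settings on top. A sketch assuming both are plain nested dicts (for example, loaded from YAML); `merge_config` and `resolve_site_config` are hypothetical names.

```python
import copy


def merge_config(defaults, overrides):
    """Recursively merge `overrides` onto `defaults` without mutating either."""
    merged = copy.deepcopy(defaults)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Both sides are sections: merge them key by key.
            merged[key] = merge_config(merged[key], value)
        else:
            # Scalars, lists, or new keys: the user's value wins.
            merged[key] = copy.deepcopy(value)
    return merged


def resolve_site_config(discovered, user_config):
    """Combine auto-discovered defaults (possibly None) with the user's settings."""
    return merge_config(discovered or {}, user_config or {})
```

With this design a user only writes the settings that differ from the site defaults, and a host that is not discovered falls back to the user's config alone.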

One thing I don't want to do, however, is mix the description of the cluster with the description of the analysis itself (as interpreted by fast-flow). This means a third config file will have to be passed in, and it will need to be parsed before the others so that the correct back-ends can be provided to them.
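A sketch of that ordering, assuming the three inputs are already loaded into Python objects. The cluster description is turned into a back-end first, the analysis description is checked for stray cluster settings, and only then is it handed to the back-end. `Backend`, `CLUSTER_KEYS`, and the other names are illustrative placeholders.

```python
from dataclasses import dataclass, field

# Hypothetical keys that belong only in the cluster description.
CLUSTER_KEYS = {"backend", "scheduler", "queue", "n_workers"}


@dataclass
class Backend:
    """Stand-in for the object that actually submits and runs jobs."""
    name: str
    settings: dict = field(default_factory=dict)

    def run(self, datasets, processing):
        # A real back-end would submit work here; this only reports the plan.
        return f"{self.name}: {len(datasets)} dataset(s), {len(processing)} stage(s)"


def build_backend(cluster_cfg):
    settings = {k: v for k, v in cluster_cfg.items() if k != "backend"}
    return Backend(name=cluster_cfg.get("backend", "local"), settings=settings)


def run_analysis(cluster_cfg, datasets, processing):
    # 1. The cluster description is parsed first, on its own.
    backend = build_backend(cluster_cfg)
    # 2. The analysis description must not carry cluster settings.
    for stage in processing:
        leaked = CLUSTER_KEYS.intersection(stage)
        if leaked:
            raise ValueError(f"cluster settings {sorted(leaked)} found in processing config")
    # 3. Only then are the analysis configs handed to the chosen back-end.
    return backend.run(datasets, processing)
```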

@benkrikler
Member Author

As of PR #129 we now have an additional command-line option to provide a config file that configures the backend. The exact format of this file is left up to the selected backend, which is currently chosen using the --mode option. With this in place, I think we have enough to close this off for now, although we will probably return to it as we explore what is needed from this config, what should be standard, etc.
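One way to leave the file format up to each backend is a registry mapping each `--mode` value to a function that interprets that backend's config. In this sketch the `--backend-cfg` option name, the per-mode settings, and the use of JSON (standing in for YAML so only the standard library is needed) are all assumptions, not the interface added in #129.

```python
import argparse
import json

# Hypothetical per-mode interpreters: each back-end decides what its own
# config file must contain; the top-level CLI only passes the file through.


def configure_multiprocessing(cfg):
    return {"n_cores": int(cfg.get("n_cores", 1))}


def configure_htcondor(cfg):
    if "queue" not in cfg:
        raise ValueError("htcondor back-end config needs a 'queue' entry")
    return {"queue": cfg["queue"], "n_jobs": int(cfg.get("n_jobs", 10))}


BACKEND_CONFIGURERS = {
    "multiprocessing": configure_multiprocessing,
    "htcondor": configure_htcondor,
}


def main(argv=None):
    parser = argparse.ArgumentParser()
    parser.add_argument("--mode", choices=sorted(BACKEND_CONFIGURERS), default="multiprocessing")
    # Placeholder name for the back-end config option; JSON stands in for YAML.
    parser.add_argument("--backend-cfg", default=None)
    args = parser.parse_args(argv)

    raw = {}
    if args.backend_cfg:
        with open(args.backend_cfg) as handle:
            raw = json.load(handle)
    # The selected back-end alone interprets (and validates) its config.
    return args.mode, BACKEND_CONFIGURERS[args.mode](raw)
```

Adding a new back-end then only means registering one more function, without touching the shared CLI.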


1 participant