Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Envvars #439

Closed
wants to merge 6 commits into from
Closed

Envvars #439

wants to merge 6 commits into from

Conversation

ankostis
Copy link
Contributor

@ankostis ankostis commented Aug 12, 2017

I believe I Implemented #99 for receiving trait-values from environment-variables with only minimal changes to the codebase.
The priority for the source of trait-values is the following (bottom wins):

  1. program defaults (trait.default_value, dynamic_defaults),
  2. config-files,
  3. env-var,
  4. command-line arguments,
  5. values directly assigned.

Example:

class App(Application):
    a = Unicode('def').tag(config=True,
                           envvar='MY_ENVVAR')
    b = Unicode().tag(config=True,
                      envvar='MY_ENVVAR')
    aliases = {'a': 'App.a', 'b': 'App.b'}

    @default('b')
    def set_a_dyn(self):
        return 'def'


cfg = Config()
cfg.App.a = cfg.App.b = 'cfg'

app = App()
assert app.a == app.b == 'def'

app = App(config=cfg)
assert app.a == app.b == 'cfg'

app.parse_command_line(['-a=cmd', '-b=cmd'])
assert app.a == app.b == 'cmd'


os.environ['MY_ENVVAR'] = 'env'


app = App()
assert app.a == app.b == 'env'

app = App(config=cfg)
assert app.a == app.b == 'env'

app.parse_command_line(['-a=cmd', '-b=cmd'])
assert app.a == app.b == 'cmd'

And this is the help string:

>>> print(App.class_get_trait_help(App.a))
--App.a=<Unicode>
    Env-var: MY_ENVVAR
    Default: 'def'

Implementation:

The feature is implemented in 3 forward-dependent commits:

  1. trait-layer:
    The TraitType.metadata['envvar'] stores the name of the environment-variable.
    That metadata is checked in TraitType.get(), just before defaults.
    If the env-var exists, its value fetched and cached in _trait_values dict, as a normal value.
    No specific effort is made for converting string env-vars - use of coercing traits is recommended.

  2. configurable-layer:
    the Configurable._load_config() does an exception for traits with an existing env-var;
    in that case, it does not load their values from configs but from the env-var.

  3. application-layer:
    the Application.parse_command_line() invokes Configurable._load_config(skip_env=True)
    to bypass the above rule for its command-line config-params.
    (the skip_env keywords had to be added on update_config() for this to work).

Rational:

Open issues:

  • Documentation: Where to describe the new capability? Maybe collect all metadata usages in some place?
  • Should we print the envvar-value in the help description, if var exists?
  • Should we add a new Configurable.skip_env trait (not tagged as config of course)?
    I would vote for *yes. Otherwise, it's impossible to affect this behavior from the constructor, when a config object is given - as it is now, environment always applies, and it overwrites any configs.
    But adding a new special-purpose trait on Configurable class is too obtrusive for me to decide.
    [edit:] A cheaper alternative would be to add a special skip_env kwd in configurable's cstor.
  • Is there any other code-path that needs to become env-aware? Is _load_config() & parse_command_line() enough?

ankostis added a commit to ankostis/traitlets that referenced this pull request Aug 12, 2017
+ When getting trait-value, try 1st env-var, then fallback to defaults.
ankostis added a commit to ankostis/traitlets that referenced this pull request Aug 12, 2017
+ Don't load config-values on traits with existing ENV-VAR.
+ Print ENV-VAR used in class-help, config and RsT msgs.
+ Add/update TCs.
ankostis added a commit to ankostis/traitlets that referenced this pull request Aug 12, 2017
+ Cmd-line configs override ENV-VARS
+ Add/update TCs.
+ When getting trait-value, try 1st env-var, then fallback to defaults.
+ Don't load config-values on traits with existing ENV-VAR.
+ Print ENV-VAR used in class-help, config and RsT msgs.
+ Add/update TCs.
+ Cmd-line configs override ENV-VARS
+ Add/update TCs.
@minrk minrk added this to the 5.1 milestone Aug 14, 2017
@@ -402,6 +424,10 @@ def class_config_rst_doc(cls):
termline += ' : ' + ttype
lines.append(termline)

env_var = trait.metadata.get('envvar')
if env_var:
lines.append(indent(indent('Env-var: ``%s``' % env_var, 4)))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

indent(indent( typo?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.
Were missng also '# ' prefix in write-config comment-line.
Now had added new parameterized pytest TC checking new comment-line in all 3 cases.


env_var = trait.metadata.get('envvar')
if env_var:
lines.append('Env-var: %s' % env_var)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

'Environment variable:'

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

that no `envvar` metadata is defined.
"""
env_var = self.metadata.get('envvar')
return env_var and os.environ.get(env_var)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general, any environment variable defined and empty should behave the same as that environment variable being undefined. It's probably appropriate to return Undefined in this case.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not necessarily.
I may want to override some config-param NOT to the empty-string('') but to None - that's a common need.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mind that the code for coercing config-values does not run for env-vars (correct?).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(oups, was to ambitious, and had forgotten the purpose of my here)

The point above was NOT to allow for None; it was for empty-string('') to pass through to the trait - that's a very common case.
None is impossible to signify from environment; that is why I used it as a sentinel to mean "no-env" var exists at all.

Copy link
Member

@minrk minrk Aug 15, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that's a very common case.

It is also a very common case where setting an environment variable to an empty string is meant to be treated the same as undefined. I have not personally encountered a case where setting an environment variable to an empty string should be treated differently to the environment variable not being set at all. Please make sure that env being an empty string is treated the same as the value not being defined. Also make sure that this value returns Undefined for no value being set, so that traits that allow None can still use this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But then how can I set a trait to the empty-string with an envvar?

@@ -203,8 +210,13 @@ def _config_changed(self, change):
section_names = self.section_names()
self._load_config(change.new, traits=traits, section_names=section_names)

def update_config(self, config):
"""Update config and load the new values"""
def update_config(self, config, skip_env=False):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe override_env=True as the default? skip_env is ambigous as to whether it means "skip loading environment variables" or "skip config also set by environment variables".

Also, the default behavior should be reversed. Calling .update_config explicitly should load all of the config by default. The exception is loading default config files, which should be applied at lower-than-env priority.

Copy link
Contributor Author

@ankostis ankostis Aug 14, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I always try to have booleans defaulting to False.
That's the default for traits, and many other programming domains.
Also it allows to work as planned if None is given.

Now why should it be reversed?
If someone puts in his code the tag(envvar='SOMEVAR'), why would he not want to use this, by default?
If he didn't like it, he wouldn't put the env-var there, it in the first place; he could have used dynamic-defaults.
That's the essence of this PR, no?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

... ambigous as to whether it means "skip loading environment variables" or "skip config also set by environment variables".

Sorry, I think i may have missunderstood what you meant.
Can you elaborate a bit?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When I run:

cfg = Config(key=value)
obj.update_config(cfg)

all of that configuration should take effect. Not just what hasn't been specified via a particular mechanism. It is not update_config in general that should be lower priority than environment variables, but load_config_files in particular.

I always try to have booleans defaulting to False.

I don't think that matters. The name should be the clearest choice and the value should be the logical default behavior. But if you have a clear name where False clearly means "override environment variables", that's okay, too.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is not update_config in general that should be lower priority than environment variables, but load_config_files in particular.

I see now.
respect_env=False for update_config() should be clear, no?

There is also here the API-backward-compatibility problem, with the new kwd in update_config() (_load_config() is private, so i guess it's ok).
So what? Should better add it as a trait on configurable? Then it would be difficult to enforce a different default from load_config_file().
Maybe breaking API change for this kwd is the only way here.

Copy link
Contributor Author

@ankostis ankostis Aug 14, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I came to the conclusion that not respecting the environment, by default,
is not the correct behavior for this new feature.
Let me explain why:

Previous versions of traitlets had not any envvar capability.
Newer versions of traitlets would support that, but client-apps would NOT get that for free; they have to add tag(envvar='FOO') to specific traits. Otherwise, they will continue to behave EXACTLY as before.

Now, having by default respect_env=False means that for clients, modifying their tags is not enough; they must also do changes to all update_config()/load_config_file() sites.
For instance, in my application that's in 4 places + ~20 sites in test-cod.
That's a big effort. But why?

That's the point of a new feature:
it should come preloaded with all hassle-free defaults, no?
For this PR, it is to follow the priority given in the OP of #439.

That practical point is this:

  • For old clients, nothing will change.
  • For those wanting the new functionality, the current defaults makes more sense.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a big effort. But why?

Because it breaks expectations of existing code, that .update_config will load the config given.

For those wanting the new functionality, the current defaults makes more sense.

I disagree. update_config(config) loading the config given makes more sense to me.

it should come preloaded with all hassle-free defaults, no?

Yes, though I think your assessment of hassle-free is the reverse of mine. Plus backward-compatibility is of absolute utmost priority for traitlets. Changing the default behavior of .update_config to have no effect if a lower priority config source is given is not better than preserving the existing behavior of respecting the arguments of the function call. User calls to .update_config(config) not loading that config is a backward-incompatible change and more surprising behavior.

The only calls to .update_config that ought to have lower priority than environment variables are those in Application.load_config_file. That should not be true for all config objects in general loaded at any time.

The default in this PR would require updates to IPython, because the only calls to .update_config in IPython/Jupyter projects are explicit user-provided high priority calls that should override environment variables. If the default were override_env=True and Application.load_config_file set override_env=False, then everything would behave correctly and nothing would need updating.

Copy link
Contributor Author

@ankostis ankostis Aug 16, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is uttermost important to keep this PR backward-compatible, so I did a small study.

I searched through all jupyter/ipython sources for update_config() invocations
and I singled out the following x4 (+ x5) config_update() usages:

  1. ipython/ipython@IPython/core/application.py#L462: self.update_config(<cmd-line>)
    in overridden Application.initialize().
  2. jupyter/jupyter_core@jupyter_core/application.py#L246: self.update_config(<cmd-line>)
    same as above.
  3. jupyter/notebook@notebook/notebookapp.py#L1091: self.update_config(<notebook-dir>)
    in overridden Application.parse_command_line().
  4. ipython/ipython@IPython/core/magics/config.py#L156: configurable.update_config(<notebook-locals>)
    in %config line-magic (is the "user-provided high priority calls" you talked about @minrk?)
  5. x5 more usages in test-cases or other projects (e.g. nbgrader).

Now, for all of them, simply upgrading to this branch of traitlets would cause no harm, because:

  • no tag(envvar=...) would exist in these downstream projects;
  • any pre-existing trait-default from os.environ patterns would continue to work before, and configs would overwrite envvars.

Assuming that you Jupyter-devs want to really take advantage of this PR so you start adding tags to various traits.

  • Current defaults of respecting envvars would benefit x5 places of (5).
  • Of course (4) must NOT respect the environment, but apply strictly all user-defined locals from notebook-cell - so intervention needed.

For the rest, the situation requires intervention for both cases.
Specifically:

  • for cases (1) & (2), the overridden Application.initialize() method, seems "strange",
    since it replicates the logic already present in traitlet's Application.parse_command_line(),
    which it already invokes, first thing at the top.
    So maybe this method needs attention anyway, not to duplicate stuff.
    But even if defaults were NOT to respect env, they would require modifications, since
    [edit:] without any modifications,

    • envs-over-cfgs defaults --> envs ovewrite clashing cmd-lines;
    • cfgs-over-envs defaults --> cfgs ovewrite clashing envs;

    both behaviors are problematic.

  • case (3) would suffer from current defaults [edit:] and needs changes in both cases, as above.
    But i'm wondering why an intermediate config and calling update_config() is needed?
    Why not simply assign the filename to both traits, like this:

    if os.path.isdir(f):
        self.notebook_dir = f
    elif os.path.isfile(f):
        self.file_to_run = f
    

To conclude the situation of the x9 update_config() sites with current env-wins defaults:

  • x5 are fine;
  • x2 need modifications whatever defaults are choosen - and most probably need other changes as well (i.e. for not duplicating stuff or even stop overriding methods);
  • x1 suffers, but can retrofit not to depend on config_update() at all;
  • x1 suffers (the "user-provided high priority calls"),

Anyhow, choosing current defaults is 4 vs 7 changes on downstream code.
I believe backward compatibility of this PR is not a problem.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another proof that compatibility does not suffer under this PR is that ALL pre-existing TCs passed this PR without further modifications - any TC-changes were only to strengthen this new feature.

So the remaining concern which is the "correct" order.
I vote for configs-->env-->cmdline, so bash-commands like this run as expected:

JUPYTER_HOME=/opt/foo jupyter notebook

and not having to worry about clashes with config-file.
And update_config() should enforce this rule by default, UNLESS some exceptional reason exists, i.e. configs sourced from cmd-line args or user-locals.

Choosing the opposite defaults, and hoping that downstream code will somehow achieve the above order, is too "optimistic".

Note-1: I did some corrections in my previous comment - please read it from GH.
Note-2: I can address your points one-by-one, @minrk, if you think it would facilitate discussion.

@@ -274,6 +286,11 @@ def class_get_trait_help(cls, trait, inst=None, helptext=None):
# include Enum choices
lines.append(indent('Choices: %s' % trait.info()))

env_var = trait.metadata.get('envvar')
if env_var:
env_info = 'Env-var: %s' % env_var
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"Environment variable"

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not dash between?

@ankostis
Copy link
Contributor Author

Why is ENV-VARS destined for 5.1.x?

It is an important change that would deserve a major bump - all downstream-libraries would become environment-aware, if their devs decide to tag their traits.
So I would suggest either 5.0 or 6.0, to help library users know what's inside.
Remembering that 5.0 had no env, but 5.1 had it, is a bit of a torture...no?

@minrk
Copy link
Member

minrk commented Aug 14, 2017

Why is ENV-VARS destined for 5.1.x?

Mainly because I'm trying (mostly in vain) to get traitlets close to ready for 5.0 release. Every new feature makes that farther and farther away as it increases the amount of docs and testing required before 5.0 can be released.

Remembering that 5.0 had no env, but 5.1 had it, is a bit of a torture...no?

Not at all. New non-breaking features are normally introduced in minor releases. Only breaking changes need major releases, but we can also mark major releases with sufficiently significant new features. A relatively minor convenience feature like this doesn't cross that bar, for me.

+ Had a double-indent in rst, forgotten `#  ` prefix in config file.
+ Add parametrized pytest TC.
@ankostis ankostis mentioned this pull request Aug 14, 2017
@rmorshea
Copy link
Contributor

rmorshea commented Aug 14, 2017

@ankostis, @minrk is this PR preferable to #246?

If so, I will close it.

Also, how does this differ from #157?

@minrk
Copy link
Member

minrk commented Aug 15, 2017

@rmorshea I think it is preferable to #246. I don't think #246 makes it much more convenient than what you can already do with a default method. I think we have to be able to do it with a single declarative tag (no method, no decorator) in order for it to be worthwhile. Differences to #157:

@minrk minrk modified the milestones: 5.0, 5.1 Aug 15, 2017
@minrk
Copy link
Member

minrk commented Aug 15, 2017

Another option would be to match #157, and make env vars uniformly lower priority than config. This would have the following benefits:

  1. it matches the behavior of ~all existing use of config via env vars in traitlets (@default methods that load from env are very common already, and the source of this feature request)
  2. it eliminates any ambiguity and complexity in .update_config

Downside: ignoring anything about implementation complexity, environment variables logically have higher priority than config files, since priority should generally correspond with time (files < env < cli < runtime .update_config())

But, of course this only comes up for cases where both config files and env vars specify a single value. So maybe it isn't worth the complexity of re-ordering priority, and we should simplify it to just a trait default, which is easy to understand in all edge cases, even if it's not ideal in the case of certain conflicts.

One reason I feel okay having only the trait default is that the pattern:

@default('trait')
def _trait_default(self):
    return os.environ.get('envvar') or true_default

is already very common, and I haven't encountered a single complaint about an environment variable failing to override a configuration file.

Since the origin of this feature request is making exactly this existing pattern more convenient, perhaps we should just match it.

@minrk minrk mentioned this pull request Aug 15, 2017
2 tasks
@ankostis
Copy link
Contributor Author

Since the origin of this feature request is making exactly this existing pattern more convenient, perhaps we should just match it.

Canceling the order of this PR makes this bash-cmd pattern impossible:

JUPYTER_HOME=/opt/foo jupyter notebook

ankostis added a commit to ankostis/traitlets that referenced this pull request Aug 16, 2017
@ankostis
Copy link
Contributor Author

After studying the situation with downstream jupyter/ipython, i think there is an alternative that @minrk you would agree:

  • Do NOT modify update_config () at all, and
  • add instead a new apply_env() method that
  • gets to be invited at the end by parse_command_line(), and maybe by load_config_files().

That way we don't have the new kwd breaking the API.
What do you think?
(If you agree, disregard all my previous comments).

On a minus side, traits for which both config and environ values exist, will get to be written twice.

@minrk
Copy link
Member

minrk commented Aug 17, 2017

An explicit update_config_from_env seems like a good idea, since it takes away all of the internal magic of priority resolution.

@ankostis
Copy link
Contributor Author

ankostis commented Aug 17, 2017

...i was a bit too quick to jump to conclusions :-(

An explicit update_config_from_env seems like a good idea, since it takes away all of the internal magic of priority resolution.

...unfortunately, along with the magic, you loose the free ride...in order to enable envars,
you need to modify ALL update_config(cfg) call sites and add a new line below, replace the line with another one calling update_config_from_env(cfg).

For instance, the table below roughly describes the "effort" required to enable envvars for the various combinations
(number of update_config() call-sites requiring modifications):

ipython/jupyter co2dice(*)
env-over-cfg 4 2 0
cfg-over-env 7 44 24
new update_config_from_env() method 9 24 26

(*) our app relying on traitlets.

ankostis added a commit to ankostis/traitlets that referenced this pull request Aug 17, 2017
…aking API

+ Have `update_config()` delegate to NEW METHOD
`update_config_with_env(..., skip_env)`.
+ Keep "env-over-cfg" as default behavior of `update_config()`.
+ Add `skip_env` kwd also in `load_config_file()`;
  this is not an API-break change, bc is not called from anywhere.
@ankostis
Copy link
Contributor Author

ankostis commented Aug 17, 2017

Pushed 1 more commit fixing the API-breaking new keyword skip_env of update_config() method.
My solution was to add a new method update_config_with_env(..., skip_env=False) that update_config() delegates to. Client code whishing to controll the default (such as the high-priority ipython user invocations) have to use this new method.

Regarding the optimal "default" for update_config() I've kept them unchanged (env-over-cfg);
the discussion i think boils down to

  • which case needs more work downstream (see my table above), and
  • which is the "expected" behavior when calling update_config(),

assuming that the client indeed wants to enable the new feature.
Regardless of the above, the update_config() method is central, and the defaults MUST apply here, or else, changes will proliferate to client code (particularly test-cases).

I remind other open issues for this PR (see also OP):

  • Documentation?
  • Print the envvar value in the help description?
  • Control skip_env default from Configurable constructor?
  • [edit:] Any other code-paths needing inspection (e.g. parse_command_line())?
  • Merged configs in Configurable.config do not contain env-vars; is that an issue?

@minrk
Copy link
Member

minrk commented Aug 17, 2017

I realize now where I was a bit wrong, and it's that update_config doesn't load environment variables into the config object. This means that evaluating the env flag has to be done every time config is loaded.

This doesn't fit with the fact that we construct a single config object from config files and CLI options, some of which belongs lower priority than env and some of which belongs higher priority than env. So update_config with env priority either front or back is always wrong - either CLI config is overridden (wrong) or config files override env (also wrong).

I think this is why @ellisonbg pointed toward implementing env vars entirely in traitlets.config rather than in the base traitlets, because to implement the priority correctly, env handling really needs to be a fundamental part of constructing the config object, and not handled by traits directly.

I think we have the following choices:

  • change how config is constructed from env (populate the config object at application launch time from env variables in between config files and CLI options)
  • implement env only at the traits level, and only simplify the default traits constructor (always lower priority than config), which people have already been doing for years
  • change how config objects are loaded, so that a series of config objects are stored, with clear priority level, rather than merging into a single config object. This will be extremely hard to make backward-compatible, and probably not appropriate given the conservative nature of this repo.

Given this reality, my inclination is to only define the convenience methods for getting defaults for traits from the env, and preserve these as lower priority than all config (i.e. traitlets.config has no awareness of this feature at all). This is the pattern that's widely in use today and easy to understand, just a bit tedious to type.

There is precisely one, narrow use case not met by this: overriding already-specified config with environment variables. Since this is already something that doesn't exist, I don't think it's a big loss given the complexity of implementing it.

@ankostis
Copy link
Contributor Author

ankostis commented Aug 17, 2017

I think this is why @ellisonbg pointed toward implementing env vars entirely in traitlets.config rather than in the base traitlets, ...

I have carefully structured this PR to cover functionality all 2 3 layers (<==>modules): traitlet, config, application; i believe it's not possible to achieve full env-var functionality by affecting any one layer alone. Actually, traitlets alone support env-vars, without config-layer involvement (i.e. tag(config=False)).

... because to implement the priority correctly, env handling really needs to be a fundamental part of constructing the config object, and not handled by traits directly.
...

  • change how config is constructed from env (populate the config object at application launch time from env variables in between config files and CLI options)

At some point I contemplated going down that road - but stopped couldn't think of all possible use-cases. Config instances are swiss-army knifes, and fitting-in "magic" functionality seemed easier in the trait/_load_config() logic. I'm not equipped to judge this, and not that familiar with the code to implement it here. But in principle i don't disagree either.

  • implement env only at the traits level, and only simplify the default traits constructor (always lower priority than config), which people have already been doing for years

People can continue doing this, and nothing here will get in their way. This PR is for something different, env having higher priority than configs.

change how config objects are loaded, so that a series of config objects are stored, with clear priority ...

Yes, I know, I've already started doing something similar in my app for "runtime" configs stored in different files, etc, But as you said, too ambitious. I aimed for the low hanging fruit:

JUPYTER_NOTEBOOK_DIR=/opt/foo jupyter notebook

There is precisely one, narrow use case not met by this: overriding already-specified config with environment variables. Since this is already something that doesn't exist, I don't think it's a big loss given the complexity of implementing it.

Well, this use-case, having env-var for values already in config-files, is pretty-common, as shown just above.
If we don't cover this, this PR won't worth its weight.

Still, i haven't understand what is the penalty for hooking in the new functionality in the config layer?
Let me be clear on this: existing clients don't see ANY side-effects!

@ellisonbg
Copy link
Member

I haven't been following this, but one thing to keep in mind. I fully agree with @minrk 's assessment that ENV VAR handling is very subtle and can't be only before/after other stages.

ankostis added a commit to ankostis/traitlets that referenced this pull request Aug 17, 2017
…aking API

+ Have `update_config()` delegate to NEW METHOD
`update_config_with_env(..., skip_env)`.
+ Keep "env-over-cfg" as default behavior of `update_config()`.
+ Add `skip_env` kwd also in `load_config_file()`;
  this is not an API-break change, bc is not called from anywhere.
ankostis added a commit to ankostis/traitlets that referenced this pull request Aug 17, 2017
…aking API

+ Have `update_config()` delegate to NEW METHOD
`update_config_with_env(..., skip_env)`.
+ Keep "env-over-cfg" as default behavior of `update_config()`.
+ Add `skip_env` kwd also in `load_config_file()`;
  this is not an API-break change, bc is not called from anywhere.
ankostis added a commit to ankostis/traitlets that referenced this pull request Aug 17, 2017
…aking API

+ Have `update_config()` delegate to NEW METHOD
`update_config_with_env(..., skip_env)`.
+ Keep "env-over-cfg" as default behavior of `update_config()`.
+ Add `skip_env` kwd also in `load_config_file()`;
  this is not an API-break change, bc is not called from anywhere.
…aking API

+ Have `update_config()` delegate to NEW METHOD
`update_config_with_env(..., skip_env)`.
+ Keep "env-over-cfg" as default behavior of `update_config()`.
+ Add `skip_env` kwd also in `load_config_file()`;
  this is not an API-break change, bc is not called from anywhere.
- Use that method instead, if you don't want any env-vars to apply.
"""
## Had to split methods to preserve BW-compatibility for new `skip_env`.
self.update_config_with_env(config)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This needs to have skip_env=True to preserve behavior.

class App(Application):
a = Unicode('def').tag(config=True, envvar='MY_ENVVAR')
b = Unicode().tag(config=True, envvar='MY_ENVVAR')
aliases = {'a': 'App.a', 'b': 'App.b'}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Make sure to instantiate an additional Configurable in initialize to test that priority resolves correctly.

@minrk
Copy link
Member

minrk commented Aug 18, 2017

Given the defined priority order:

  1. program defaults (trait.default_value, dynamic_defaults),
  2. config-files,
  3. env-var,
  4. command-line arguments,
  5. values directly assigned.

This PR cannot achieve the correct ordering of env-var without introducing much more complicated changes than I think we should allow. The priority resolves correctly for the Application object itself, but not any of the objects instantiated later by the Application. It can only reliably achieve env higher than all config or lower than all config, due to the use of a single config object.

Consider the order of events for initializing an application with configurables:

  1. load CLI arguments into app.config
  2. load config files into app.config, allowing CLI config to take priority
  3. instantiate configurable objects with parent=app, loading the app config

At this point, all objects receive a single config object with values that should be both higher and lower priority than env, with no way to distinguish them. These instantiations are deeply nested down the chain of all objects, and happen at arbitrary points in time. The only guarantee is that there is a single config object being passed around.

So the only achievable goal I see is to pick between:

  • env variables override all config (breaks several APIs)
  • all config overrides env variables

The only thing that seems achievable and consistent is traitlets-level env-defaults that traitlets.config is not aware of. This behavior is consistent and understandable, and already in wide use today, if slightly tedious. Addressing that inconvenience is all that we are after with #99. Solving the priority issue would be nice, but doesn't seem doable without introducing significant new problems.

Still, i haven't understand what is the penalty for hooking in the new functionality in the config layer?

It's not that there's a penalty, it's that implementing it via two systems makes it more complicated and more difficult to implement correctly, as illustrated by the fact that this PR doesn't implement the priority it aims to, and I can't see a way for it to accomplish that without much more complicated changes.

Here is a test demonstrating several very basic cases where the priority resolution of this PR is not as intended.

In every case, the foo and app sources should match because they have config defined at the same levels.

Let me be clear on this: existing clients don't see ANY side-effects!

I think this is a lot harder to say than that. Jupyter applications instantiate many objects that can be provided by extensions. If those extensions, which are already using the existing @default implementation of environment variable defaults (the goal of #99), it is important that they behave in a consistent way, with no changes to the Application object that instantiates them.

@ankostis
Copy link
Contributor Author

ankostis commented Aug 18, 2017

Thank you for pointing me the problem with child-configurables.
BTW, you've just answered 1 open-task question right on-the-spot (and read my respective comment in the code for this), and the answer unfortunately has grave consequences!

That's a blocker.

So although I could revert all changes except the 1st and half of the 2nd commit (for the env-var info msgs), I prefer not to!

I'm afraid that if we go down that road, we will stuck with cfg-overwrites-env for ever, due to BW-compatibility pressure from client code already having adopted the new envvar tag in traits-level,
end expecting this behavior.

So I prefer not to push anything at all!
For me (and anyone else working with cmd-line programs), the ordering of things is of paramount importance.
Without it, i can fallback to ad-hoc @property methods that enforce the right resolution order on access-time; it's a decent workaround.

@minrk
Copy link
Member

minrk commented Aug 18, 2017

So I prefer not to push anything at all!

That's okay! After this, I think we understand the challenges associated with supporting env properly in config much better. I think I'll go ahead with the traitlets-level implementation of env defaults (approximately #157, unchanged), and we can come back to the idea of something to do in config if it's deemed a priority.

Applications can always define their own env -> config in their initialize step, which would allow proper priority resolution.

@minrk minrk closed this Aug 18, 2017
@ankostis
Copy link
Contributor Author

ankostis commented Aug 18, 2017

Just to leave it as a trace for the future:

This PR closes NOT because it proposed BW-incompatible changes!
It couldn't deliver the orderings promised.

I think I'll go ahead with the traitlets-level implementation
...
Applications can always define their own env -> config in their initialize step, which would allow proper priority resolution.

@minrk But how?
Please, don't just add envvar tag to traits-level, because THEN you do introduce BW-incompatible changes if you try to fix the mess in the future.

@minrk
Copy link
Member

minrk commented Aug 18, 2017

But how?

in initialize:

def initialize(self, argv=None):
    super().initialize(argv)
    self.load_config_file(self.config_file)
    env_config = self.config_from_env() # env -> config defined entirely by the Application
    env_config.update(self.cli_config) # preserve CLI priority
    self.update_config(env_config)

This will precisely implement environment-variable config support at the right priority in a single config object. The missing piece is the env->config transformation, which Applications are free to define. The most important thing about this: environment variables are not taken into account during traitlets initialization without config, which is the source of the conflict that breaks priority resolution.

What I think is never going to work: env defined on traits hooked up to trait defaults, resolved at config loading time with appropriate priority.

Please, don't just add envvar tag to traits-level, because THEN you do introduce BW-incompatible changes if you try to fix the mess in the future.

I believe we should do this, for these reasons:

  1. I frankly don't expect we will deliver proper env support in traitlets.config, due to the complexity inherent there, and the relatively low benefit over the low hanging fruit of env-based trait defaults (independent of config)
  2. getting defaults from env has proven useful for years and should be more convenient. It's not a mess.
  3. If traitlets.config is to get proper env support, it should be done in a totally different way, and not use the same metadata, so there would be no conflict.

If anything, this discussion has clarified that getting trait defaults from environment variables, a common existing pattern that we can make more convenient, is largely unrelated and should be implemented in a totally different way than loading config from environment variables, which should always and exclusively result in populating the single global Config object.

@ankostis
Copy link
Contributor Author

ankostis commented Aug 18, 2017

environment variables are not taken into account during traitlets initialization without config, which is the source of the conflict that breaks priority resolution.

You mean "during traitlets Application initialization" , no?
Even with this correction, i think I cannot fully understand the "without config" term.
You mean "without the env-values ending-up into the final global unique Config object", correct?

  1. If traitlets.config is to get proper env support, it should be done in a totally different way, and not use the same metadata, so there would be no conflict.

I would really appreciate any hint on that.

... should be implemented in a totally different way than loading config from environment variables, which should always and exclusively result in populating the single global Config object.

Why do you say a "single global Config object"? Application already uses 2 config-objects (Configurable.config and Application.cli_config).
Are you thinking of deprecating the later?

BTW, implementing Application.config_from_env() based on the classes attribute seems like a feasible task, no?

@minrk
Copy link
Member

minrk commented Aug 18, 2017

You mean "during traitlets Application initialization" , no?

No, I mean traitlets. Instantiating the object without the config object shouldn't load from the env. It should go env -> config -> trait, not env -> trait -> config -> trait.

I would really appreciate any hint on that.

I don't know how to do it, which is why I said that I don't think it's likely to happen. All I know is that based on this experience, it will not look like what we've tried here.

Why do you say a "single global Config object"? Application already uses 2 config-objects (Configurable.config and Application.cli_config).

No, it uses one. There are a few temporary variables used for ensuring that the single global config loads things with the right priority, but it always ends up in a single global self.config, which is the only config that any objects other than the Application do or should get. The Application has exactly two responsibilities:

  1. construct a single Config object
  2. instantiate classes, passing the single config to them, as appropriate.

BTW, implementing Application.config_from_env() based on the classes attribute seems like a feasible task, no?

For simple, non-extensible cases where .classes is comprehensive, yes, I think it is. In general, no, because there is no guarantee that all classes are (or ever can be) in that list. .classes generally only includes the classes necessary to produce aliases, not every possible class that might be configured during the lifetime of the application. It has to work correctly, even if that list is empty and arbitrary user-defined classes are imported and instantiated at an arbitrary time, only with:

Class(config=cfg)

So the config object has to be correct for all objects that might be instantiated by the time Application is done constructing it, even for objects the Application doesn't know about that may not have been imported.

@ankostis
Copy link
Contributor Author

ankostis commented Aug 18, 2017

So you are thinking of annotating the values in a Config objects with a "source" tag, like cli, or env cfg?
[edit: deciding the env tagged values is impossible to know without visiting all HasTrait classes, see @minrk problem explained above]

And then use this tag on update_config() to enforce the desired ordering?
Or even better, let the user extend it with it's own tags with tag-->order_num mapping,
for allowing arbitrary many config-sources.

Does that sound like a an alternative worth exploring?

@rmorshea
Copy link
Contributor

rmorshea commented Aug 18, 2017

@minrk, @ankostis perhaps I'm naive, but what prevents someone from simply integrating env-vars into a python config file and loading that in the desired order? This requires someone to load the env-var config first, but perhaps we could include an env_config keyword in Configurable's constructor that would handle that.

@ankostis
Copy link
Contributor Author

ankostis commented Aug 18, 2017

@rmorshea that would work, assuming it is feasible to convert all envvar-tagged traits into file-configs.
A problem is how to discover them all such traits, WITHOUT relying on the Application.classes attribute, as @minrk explained just above.

And assuming that problem is solvable, why create an intermediate file (or a special Loader for instance) and not just merge those trait-values straight back into the main Application.config object?

@rmorshea
Copy link
Contributor

@ankostis, I don't mean that you would be able to tag traits with envvar="MY_ENVVAR", but rather, that all env-var specifications would be done by the user in a python config file:

import os

C.MyClass.my_trait = os.environ.get("MY_ENVVAR", "default")

That or, as you said we could create some special env-var ConfigLoader that is expected in the keyword env_config. I don't what the best way to implement such a loader would be, but it's a thought.

I suggest this option simply because it leverages features that traitlets already provides, and avoids the need to dig deep into it's depths to implement env-vars priority.

@minrk minrk modified the milestones: no action, 5.0 Aug 18, 2017
@minrk
Copy link
Member

minrk commented Aug 18, 2017

And then use this tag on update_config() to enforce the desired ordering?
Or even better, let the user extend it with it's own tags with tag-->order_num mapping,
for allowing arbitrary many config-sources.
Does that sound like a an alternative worth exploring?

No, nothing that complicated. Just load the config in increasing priority order, like we already do for CLI config and file config, as I described in the snippet above:

  1. load file config
  2. load env config (application defined, because this isn't part of traitlets)
  3. load cli config

(with the bit of fiddling we have because we have to load CLI config first and last because we use it to find the config files)

Done. One config object with all the right values. No tagging, no tracking of config sources. Very straightforward. The missing piece is how to construct the env config without access to the classes that will be configured. If you have that, it's not a problem. But we don't in general, and a partial solution is no good.

If your application knows all of its potential configurable classes, you can go ahead and implement env config right now in your Application. This can be done today, right now, with no modifications to traitlets. Just not in the base traitlets.config.Application due to the generality requirements. Here's an example:

def load_env_config(class_list):
    """Create a config object for any traits with `config_env` metadata"""
    config = Config()
    seen = set()
    for config_cls in class_list:
        for cls in config_cls.mro():
            if cls is Configurable:
                break
            if cls in seen:
                continue
            seen.add(cls)
            for name, trait in cls.class_own_traits(config=True).items():
                key = trait.metadata.get('config_env')
                if key and os.environ.get(key):
                    config[cls.__name__][name] = os.environ[key]
    return config

example

@rmorshea loading env in a config files certainly does work, and people do it all the time. It's just that the burden of defining and handing env is then on the user rather than the developer, i.e. env vars are a feature of your personal config file, rather than a feature of the application. This can be nice, and I use it sometimes. It doesn't achieve our primary goal, which is to make this common pattern:

trait = Unicode(config=True)
@default('trait')
def _trait_default(self):
    return os.environ.get('ENV_VAR') or 'actual-default'

a bit more convenient.

I really think we should stop looking for a general solution for environment-based config and implement simple, straightforward environment defaults for traitlets via a metadata tag. It solves a real issue that real people have and doesn't cause any.

@rmorshea
Copy link
Contributor

I'd need to look into it tomorrow, but I think I have an idea for tags that won't intrude on traitlets.py

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants