Envvars #439

ankostis · 2017-08-12T21:36:04Z

I believe I Implemented #99 for receiving trait-values from environment-variables with only minimal changes to the codebase.
The priority for the source of trait-values is the following (bottom wins):

program defaults (trait.default_value, dynamic_defaults),
config-files,
env-var,
command-line arguments,
values directly assigned.

Example:

class App(Application):
    a = Unicode('def').tag(config=True,
                           envvar='MY_ENVVAR')
    b = Unicode().tag(config=True,
                      envvar='MY_ENVVAR')
    aliases = {'a': 'App.a', 'b': 'App.b'}

    @default('b')
    def set_a_dyn(self):
        return 'def'


cfg = Config()
cfg.App.a = cfg.App.b = 'cfg'

app = App()
assert app.a == app.b == 'def'

app = App(config=cfg)
assert app.a == app.b == 'cfg'

app.parse_command_line(['-a=cmd', '-b=cmd'])
assert app.a == app.b == 'cmd'


os.environ['MY_ENVVAR'] = 'env'


app = App()
assert app.a == app.b == 'env'

app = App(config=cfg)
assert app.a == app.b == 'env'

app.parse_command_line(['-a=cmd', '-b=cmd'])
assert app.a == app.b == 'cmd'

And this is the help string:

>>> print(App.class_get_trait_help(App.a))
--App.a=<Unicode>
    Env-var: MY_ENVVAR
    Default: 'def'

Implementation:

The feature is implemented in 3 forward-dependent commits:

trait-layer:
The TraitType.metadata['envvar'] stores the name of the environment-variable.
That metadata is checked in TraitType.get(), just before defaults.
If the env-var exists, its value fetched and cached in _trait_values dict, as a normal value.
No specific effort is made for converting string env-vars - use of coercing traits is recommended.
configurable-layer:
the Configurable._load_config() does an exception for traits with an existing env-var;
in that case, it does not load their values from configs but from the env-var.
application-layer:
the Application.parse_command_line() invokes Configurable._load_config(skip_env=True)
to bypass the above rule for its command-line config-params.
(the skip_env keywords had to be added on update_config() for this to work).

Rational:

Dynamic defaults (default initialization of configurable traits from os.environ #158, add envvar default handler #246) wouldn't achieve the desired functionality because configs would overwrite ENV-VAR values.
Using a metadata at the trait-layer allows for config and app layers to do informed decisions.
It satisfies the @carreau's call for carefullness on the precendence rule
and for automatic doc-generation in the presence of env-vars..

Open issues:

Documentation: Where to describe the new capability? Maybe collect all metadata usages in some place?
Should we print the envvar-value in the help description, if var exists?
Should we add a new Configurable.skip_env trait (not tagged as config of course)?
I would vote for *yes. Otherwise, it's impossible to affect this behavior from the constructor, when a config object is given - as it is now, environment always applies, and it overwrites any configs.
But adding a new special-purpose trait on Configurable class is too obtrusive for me to decide.
[edit:] A cheaper alternative would be to add a special skip_env kwd in configurable's cstor.
Is there any other code-path that needs to become env-aware? Is _load_config() & parse_command_line() enough?

+ When getting trait-value, try 1st env-var, then fallback to defaults.

+ Don't load config-values on traits with existing ENV-VAR. + Print ENV-VAR used in class-help, config and RsT msgs. + Add/update TCs.

+ Cmd-line configs override ENV-VARS + Add/update TCs.

+ When getting trait-value, try 1st env-var, then fallback to defaults.

+ Don't load config-values on traits with existing ENV-VAR. + Print ENV-VAR used in class-help, config and RsT msgs. + Add/update TCs.

+ Cmd-line configs override ENV-VARS + Add/update TCs.

minrk · 2017-08-14T12:06:35Z

traitlets/config/configurable.py

@@ -402,6 +424,10 @@ def class_config_rst_doc(cls):
                termline += ' : ' + ttype
            lines.append(termline)

+            env_var = trait.metadata.get('envvar')
+            if env_var:
+                lines.append(indent(indent('Env-var: ``%s``' % env_var, 4)))


indent(indent( typo?

Fixed.
Were missng also '# ' prefix in write-config comment-line.
Now had added new parameterized pytest TC checking new comment-line in all 3 cases.

minrk · 2017-08-14T12:06:51Z

traitlets/config/configurable.py

+
+                env_var = trait.metadata.get('envvar')
+                if env_var:
+                    lines.append('Env-var: %s' % env_var)


'Environment variable:'

minrk · 2017-08-14T12:11:38Z

traitlets/traitlets.py

+            that no `envvar` metadata is defined.
+        """
+        env_var = self.metadata.get('envvar')
+        return env_var and os.environ.get(env_var)


In general, any environment variable defined and empty should behave the same as that environment variable being undefined. It's probably appropriate to return Undefined in this case.

Not necessarily.
I may want to override some config-param NOT to the empty-string('') but to None - that's a common need.

Mind that the code for coercing config-values does not run for env-vars (correct?).

(oups, was to ambitious, and had forgotten the purpose of my here)

The point above was NOT to allow for None; it was for empty-string('') to pass through to the trait - that's a very common case.
None is impossible to signify from environment; that is why I used it as a sentinel to mean "no-env" var exists at all.

that's a very common case.

It is also a very common case where setting an environment variable to an empty string is meant to be treated the same as undefined. I have not personally encountered a case where setting an environment variable to an empty string should be treated differently to the environment variable not being set at all. Please make sure that env being an empty string is treated the same as the value not being defined. Also make sure that this value returns Undefined for no value being set, so that traits that allow None can still use this.

But then how can I set a trait to the empty-string with an envvar?

minrk · 2017-08-14T12:21:13Z

traitlets/config/configurable.py

@@ -203,8 +210,13 @@ def _config_changed(self, change):
        section_names = self.section_names()
        self._load_config(change.new, traits=traits, section_names=section_names)

-    def update_config(self, config):
-        """Update config and load the new values"""
+    def update_config(self, config, skip_env=False):


Maybe override_env=True as the default? skip_env is ambigous as to whether it means "skip loading environment variables" or "skip config also set by environment variables".

Also, the default behavior should be reversed. Calling .update_config explicitly should load all of the config by default. The exception is loading default config files, which should be applied at lower-than-env priority.

I always try to have booleans defaulting to False.
That's the default for traits, and many other programming domains.
Also it allows to work as planned if None is given.

Now why should it be reversed?
If someone puts in his code the tag(envvar='SOMEVAR'), why would he not want to use this, by default?
If he didn't like it, he wouldn't put the env-var there, it in the first place; he could have used dynamic-defaults.
That's the essence of this PR, no?

... ambigous as to whether it means "skip loading environment variables" or "skip config also set by environment variables".

Sorry, I think i may have missunderstood what you meant.
Can you elaborate a bit?

When I run:

cfg = Config(key=value) obj.update_config(cfg)

all of that configuration should take effect. Not just what hasn't been specified via a particular mechanism. It is not update_config in general that should be lower priority than environment variables, but load_config_files in particular.

I always try to have booleans defaulting to False.

I don't think that matters. The name should be the clearest choice and the value should be the logical default behavior. But if you have a clear name where False clearly means "override environment variables", that's okay, too.

It is not update_config in general that should be lower priority than environment variables, but load_config_files in particular.

I see now.
respect_env=False for update_config() should be clear, no?

There is also here the API-backward-compatibility problem, with the new kwd in update_config() (_load_config() is private, so i guess it's ok).
So what? Should better add it as a trait on configurable? Then it would be difficult to enforce a different default from load_config_file().
Maybe breaking API change for this kwd is the only way here.

I came to the conclusion that not respecting the environment, by default,
is not the correct behavior for this new feature.
Let me explain why:

Previous versions of traitlets had not any envvar capability.
Newer versions of traitlets would support that, but client-apps would NOT get that for free; they have to add tag(envvar='FOO') to specific traits. Otherwise, they will continue to behave EXACTLY as before.

Now, having by default respect_env=False means that for clients, modifying their tags is not enough; they must also do changes to all update_config()/load_config_file() sites.
For instance, in my application that's in 4 places + ~20 sites in test-cod.
That's a big effort. But why?

That's the point of a new feature:
it should come preloaded with all hassle-free defaults, no?
For this PR, it is to follow the priority given in the OP of #439.

That practical point is this:

For old clients, nothing will change.

For those wanting the new functionality, the current defaults makes more sense.

That's a big effort. But why?

Because it breaks expectations of existing code, that .update_config will load the config given.

For those wanting the new functionality, the current defaults makes more sense.

I disagree. update_config(config) loading the config given makes more sense to me.

it should come preloaded with all hassle-free defaults, no?

Yes, though I think your assessment of hassle-free is the reverse of mine. Plus backward-compatibility is of absolute utmost priority for traitlets. Changing the default behavior of .update_config to have no effect if a lower priority config source is given is not better than preserving the existing behavior of respecting the arguments of the function call. User calls to .update_config(config) not loading that config is a backward-incompatible change and more surprising behavior.

The only calls to .update_config that ought to have lower priority than environment variables are those in Application.load_config_file. That should not be true for all config objects in general loaded at any time.

The default in this PR would require updates to IPython, because the only calls to .update_config in IPython/Jupyter projects are explicit user-provided high priority calls that should override environment variables. If the default were override_env=True and Application.load_config_file set override_env=False, then everything would behave correctly and nothing would need updating.

It is uttermost important to keep this PR backward-compatible, so I did a small study.

I searched through all jupyter/ipython sources for update_config() invocations
and I singled out the following x4 (+ x5) config_update() usages:

ipython/ipython@IPython/core/application.py#L462: self.update_config(<cmd-line>)
in overridden Application.initialize().

jupyter/jupyter_core@jupyter_core/application.py#L246: self.update_config(<cmd-line>)
same as above.

jupyter/notebook@notebook/notebookapp.py#L1091: self.update_config(<notebook-dir>)
in overridden Application.parse_command_line().

ipython/ipython@IPython/core/magics/config.py#L156: configurable.update_config(<notebook-locals>)
in %config line-magic (is the "user-provided high priority calls" you talked about @minrk?)

x5 more usages in test-cases or other projects (e.g. nbgrader).

Now, for all of them, simply upgrading to this branch of traitlets would cause no harm, because:

no tag(envvar=...) would exist in these downstream projects;

any pre-existing trait-default from os.environ patterns would continue to work before, and configs would overwrite envvars.

Assuming that you Jupyter-devs want to really take advantage of this PR so you start adding tags to various traits.

Current defaults of respecting envvars would benefit x5 places of (5).

Of course (4) must NOT respect the environment, but apply strictly all user-defined locals from notebook-cell - so intervention needed.

For the rest, the situation requires intervention for both cases.
Specifically:

for cases (1) & (2), the overridden Application.initialize() method, seems "strange",
since it replicates the logic already present in traitlet's Application.parse_command_line(),
which it already invokes, first thing at the top.
So maybe this method needs attention anyway, not to duplicate stuff.
But even if defaults were NOT to respect env, they would require modifications, since
[edit:] without any modifications,

envs-over-cfgs defaults --> envs ovewrite clashing cmd-lines;

cfgs-over-envs defaults --> cfgs ovewrite clashing envs;

both behaviors are problematic.

case (3) would suffer from current defaults [edit:] and needs changes in both cases, as above.
But i'm wondering why an intermediate config and calling update_config() is needed?
Why not simply assign the filename to both traits, like this:

if os.path.isdir(f): self.notebook_dir = f elif os.path.isfile(f): self.file_to_run = f

To conclude the situation of the x9 update_config() sites with current env-wins defaults:

x5 are fine;

x2 need modifications whatever defaults are choosen - and most probably need other changes as well (i.e. for not duplicating stuff or even stop overriding methods);

x1 suffers, but can retrofit not to depend on config_update() at all;

x1 suffers (the "user-provided high priority calls"),

Anyhow, choosing current defaults is 4 vs 7 changes on downstream code.
I believe backward compatibility of this PR is not a problem.

Another proof that compatibility does not suffer under this PR is that ALL pre-existing TCs passed this PR without further modifications - any TC-changes were only to strengthen this new feature.

So the remaining concern which is the "correct" order.
I vote for configs-->env-->cmdline, so bash-commands like this run as expected:

JUPYTER_HOME=/opt/foo jupyter notebook

and not having to worry about clashes with config-file.
And update_config() should enforce this rule by default, UNLESS some exceptional reason exists, i.e. configs sourced from cmd-line args or user-locals.

Choosing the opposite defaults, and hoping that downstream code will somehow achieve the above order, is too "optimistic".

Note-1: I did some corrections in my previous comment - please read it from GH.
Note-2: I can address your points one-by-one, @minrk, if you think it would facilitate discussion.

minrk · 2017-08-14T12:21:22Z

traitlets/config/configurable.py

@@ -274,6 +286,11 @@ def class_get_trait_help(cls, trait, inst=None, helptext=None):
            # include Enum choices
            lines.append(indent('Choices: %s' % trait.info()))

+        env_var = trait.metadata.get('envvar')
+        if env_var:
+            env_info = 'Env-var: %s' % env_var


"Environment variable"

Not dash between?

ankostis · 2017-08-14T13:38:17Z

Why is ENV-VARS destined for 5.1.x?

It is an important change that would deserve a major bump - all downstream-libraries would become environment-aware, if their devs decide to tag their traits.
So I would suggest either 5.0 or 6.0, to help library users know what's inside.
Remembering that 5.0 had no env, but 5.1 had it, is a bit of a torture...no?

minrk · 2017-08-14T14:06:44Z

Why is ENV-VARS destined for 5.1.x?

Mainly because I'm trying (mostly in vain) to get traitlets close to ready for 5.0 release. Every new feature makes that farther and farther away as it increases the amount of docs and testing required before 5.0 can be released.

Remembering that 5.0 had no env, but 5.1 had it, is a bit of a torture...no?

Not at all. New non-breaking features are normally introduced in minor releases. Only breaking changes need major releases, but we can also mark major releases with sufficiently significant new features. A relatively minor convenience feature like this doesn't cross that bar, for me.

+ Had a double-indent in rst, forgotten `# ` prefix in config file. + Add parametrized pytest TC.

…mment-line

rmorshea · 2017-08-14T23:00:39Z

@ankostis, @minrk is this PR preferable to #246?

If so, I will close it.

Also, how does this differ from #157?

minrk · 2017-08-15T09:40:13Z

@rmorshea I think it is preferable to #246. I don't think #246 makes it much more convenient than what you can already do with a default method. I think we have to be able to do it with a single declarative tag (no method, no decorator) in order for it to be worthwhile. Differences to #157:

ENH: Support default initialization from os.environ. #157 defines default_from_env, which allows traits to define how to interpret environment strings (this PR only allows text traits to come from envs)
This PR aims to address a possible priority issue in ENH: Support default initialization from os.environ. #157 (env vars have higher priority than config files)
ENH: Support default initialization from os.environ. #157 signals envvar as a top-level trait attribute, while this PR uses a metadata tag. I'm not sure I have a preference there.

minrk · 2017-08-15T12:07:48Z

Another option would be to match #157, and make env vars uniformly lower priority than config. This would have the following benefits:

it matches the behavior of ~all existing use of config via env vars in traitlets (@default methods that load from env are very common already, and the source of this feature request)
it eliminates any ambiguity and complexity in .update_config

Downside: ignoring anything about implementation complexity, environment variables logically have higher priority than config files, since priority should generally correspond with time (files < env < cli < runtime .update_config())

But, of course this only comes up for cases where both config files and env vars specify a single value. So maybe it isn't worth the complexity of re-ordering priority, and we should simplify it to just a trait default, which is easy to understand in all edge cases, even if it's not ideal in the case of certain conflicts.

One reason I feel okay having only the trait default is that the pattern:

@default('trait')
def _trait_default(self):
    return os.environ.get('envvar') or true_default

is already very common, and I haven't encountered a single complaint about an environment variable failing to override a configuration file.

Since the origin of this feature request is making exactly this existing pattern more convenient, perhaps we should just match it.

ankostis · 2017-08-16T16:01:04Z

Since the origin of this feature request is making exactly this existing pattern more convenient, perhaps we should just match it.

Canceling the order of this PR makes this bash-cmd pattern impossible:

JUPYTER_HOME=/opt/foo jupyter notebook

ankostis · 2017-08-16T22:53:34Z

After studying the situation with downstream jupyter/ipython, i think there is an alternative that @minrk you would agree:

Do NOT modify update_config () at all, and
add instead a new apply_env() method that
gets to be invited at the end by parse_command_line(), and maybe by load_config_files().

That way we don't have the new kwd breaking the API.
What do you think?
(If you agree, disregard all my previous comments).

On a minus side, traits for which both config and environ values exist, will get to be written twice.

minrk · 2017-08-17T07:40:01Z

An explicit update_config_from_env seems like a good idea, since it takes away all of the internal magic of priority resolution.

ankostis · 2017-08-17T09:36:06Z

...i was a bit too quick to jump to conclusions :-(

An explicit update_config_from_env seems like a good idea, since it takes away all of the internal magic of priority resolution.

...unfortunately, along with the magic, you loose the free ride...in order to enable envars,
you need to modify ALL update_config(cfg) call sites and ~~add a new line below,~~ replace the line with another one calling update_config_from_env(cfg).

For instance, the table below roughly describes the "effort" required to enable envvars for the various combinations
(number of update_config() call-sites requiring modifications):

	ipython/jupyter	co2dice(*)
env-over-cfg	4	2 0
cfg-over-env	7	44 24
new `update_config_from_env()` method	9	24 26

(*) our app relying on traitlets.

…aking API + Have `update_config()` delegate to NEW METHOD `update_config_with_env(..., skip_env)`. + Keep "env-over-cfg" as default behavior of `update_config()`. + Add `skip_env` kwd also in `load_config_file()`; this is not an API-break change, bc is not called from anywhere.

ankostis · 2017-08-17T11:35:01Z

Pushed 1 more commit fixing the API-breaking new keyword skip_env of update_config() method.
My solution was to add a new method update_config_with_env(..., skip_env=False) that update_config() delegates to. Client code whishing to controll the default (such as the high-priority ipython user invocations) have to use this new method.

Regarding the optimal "default" for update_config() I've kept them unchanged (env-over-cfg);
the discussion i think boils down to

which case needs more work downstream (see my table above), and
which is the "expected" behavior when calling update_config(),

assuming that the client indeed wants to enable the new feature.
Regardless of the above, the update_config() method is central, and the defaults MUST apply here, or else, changes will proliferate to client code (particularly test-cases).

I remind other open issues for this PR (see also OP):

Documentation?
Print the envvar value in the help description?
Control skip_env default from Configurable constructor?
[edit:] Any other code-paths needing inspection (e.g. parse_command_line())?
Merged configs in Configurable.config do not contain env-vars; is that an issue?

minrk · 2017-08-17T12:02:49Z

I realize now where I was a bit wrong, and it's that update_config doesn't load environment variables into the config object. This means that evaluating the env flag has to be done every time config is loaded.

This doesn't fit with the fact that we construct a single config object from config files and CLI options, some of which belongs lower priority than env and some of which belongs higher priority than env. So update_config with env priority either front or back is always wrong - either CLI config is overridden (wrong) or config files override env (also wrong).

I think this is why @ellisonbg pointed toward implementing env vars entirely in traitlets.config rather than in the base traitlets, because to implement the priority correctly, env handling really needs to be a fundamental part of constructing the config object, and not handled by traits directly.

I think we have the following choices:

change how config is constructed from env (populate the config object at application launch time from env variables in between config files and CLI options)
implement env only at the traits level, and only simplify the default traits constructor (always lower priority than config), which people have already been doing for years
change how config objects are loaded, so that a series of config objects are stored, with clear priority level, rather than merging into a single config object. This will be extremely hard to make backward-compatible, and probably not appropriate given the conservative nature of this repo.

Given this reality, my inclination is to only define the convenience methods for getting defaults for traits from the env, and preserve these as lower priority than all config (i.e. traitlets.config has no awareness of this feature at all). This is the pattern that's widely in use today and easy to understand, just a bit tedious to type.

There is precisely one, narrow use case not met by this: overriding already-specified config with environment variables. Since this is already something that doesn't exist, I don't think it's a big loss given the complexity of implementing it.

ankostis · 2017-08-17T15:50:44Z

I think this is why @ellisonbg pointed toward implementing env vars entirely in traitlets.config rather than in the base traitlets, ...

I have carefully structured this PR to cover functionality all 2 3 layers (<==>modules): traitlet, config, application; i believe it's not possible to achieve full env-var functionality by affecting any one layer alone. Actually, traitlets alone support env-vars, without config-layer involvement (i.e. tag(config=False)).

... because to implement the priority correctly, env handling really needs to be a fundamental part of constructing the config object, and not handled by traits directly.
...

change how config is constructed from env (populate the config object at application launch time from env variables in between config files and CLI options)

At some point I contemplated going down that road - but stopped couldn't think of all possible use-cases. Config instances are swiss-army knifes, and fitting-in "magic" functionality seemed easier in the trait/_load_config() logic. I'm not equipped to judge this, and not that familiar with the code to implement it here. But in principle i don't disagree either.

implement env only at the traits level, and only simplify the default traits constructor (always lower priority than config), which people have already been doing for years

People can continue doing this, and nothing here will get in their way. This PR is for something different, env having higher priority than configs.

change how config objects are loaded, so that a series of config objects are stored, with clear priority ...

Yes, I know, I've already started doing something similar in my app for "runtime" configs stored in different files, etc, But as you said, too ambitious. I aimed for the low hanging fruit:

JUPYTER_NOTEBOOK_DIR=/opt/foo jupyter notebook

There is precisely one, narrow use case not met by this: overriding already-specified config with environment variables. Since this is already something that doesn't exist, I don't think it's a big loss given the complexity of implementing it.

Well, this use-case, having env-var for values already in config-files, is pretty-common, as shown just above.
If we don't cover this, this PR won't worth its weight.

Still, i haven't understand what is the penalty for hooking in the new functionality in the config layer?
Let me be clear on this: existing clients don't see ANY side-effects!

ellisonbg · 2017-08-17T16:00:10Z

I haven't been following this, but one thing to keep in mind. I fully agree with @minrk 's assessment that ENV VAR handling is very subtle and can't be only before/after other stages.

…aking API + Have `update_config()` delegate to NEW METHOD `update_config_with_env(..., skip_env)`. + Keep "env-over-cfg" as default behavior of `update_config()`. + Add `skip_env` kwd also in `load_config_file()`; this is not an API-break change, bc is not called from anywhere.

minrk · 2017-08-18T08:12:13Z

traitlets/config/configurable.py

+        - Use that method instead, if you don't want any env-vars to apply.
+        """
+        ## Had to split methods to preserve BW-compatibility for new `skip_env`.
+        self.update_config_with_env(config)


This needs to have skip_env=True to preserve behavior.

minrk · 2017-08-18T08:13:50Z

traitlets/config/tests/test_application.py

+    class App(Application):
+        a = Unicode('def').tag(config=True, envvar='MY_ENVVAR')
+        b = Unicode().tag(config=True, envvar='MY_ENVVAR')
+        aliases = {'a': 'App.a', 'b': 'App.b'}


Make sure to instantiate an additional Configurable in initialize to test that priority resolves correctly.

minrk · 2017-08-18T09:18:24Z

Given the defined priority order:

program defaults (trait.default_value, dynamic_defaults),

config-files,

env-var,

command-line arguments,

values directly assigned.

This PR cannot achieve the correct ordering of env-var without introducing much more complicated changes than I think we should allow. The priority resolves correctly for the Application object itself, but not any of the objects instantiated later by the Application. It can only reliably achieve env higher than all config or lower than all config, due to the use of a single config object.

Consider the order of events for initializing an application with configurables:

load CLI arguments into app.config
load config files into app.config, allowing CLI config to take priority
instantiate configurable objects with parent=app, loading the app config

At this point, all objects receive a single config object with values that should be both higher and lower priority than env, with no way to distinguish them. These instantiations are deeply nested down the chain of all objects, and happen at arbitrary points in time. The only guarantee is that there is a single config object being passed around.

So the only achievable goal I see is to pick between:

env variables override all config (breaks several APIs)
all config overrides env variables

The only thing that seems achievable and consistent is traitlets-level env-defaults that traitlets.config is not aware of. This behavior is consistent and understandable, and already in wide use today, if slightly tedious. Addressing that inconvenience is all that we are after with #99. Solving the priority issue would be nice, but doesn't seem doable without introducing significant new problems.

Still, i haven't understand what is the penalty for hooking in the new functionality in the config layer?

It's not that there's a penalty, it's that implementing it via two systems makes it more complicated and more difficult to implement correctly, as illustrated by the fact that this PR doesn't implement the priority it aims to, and I can't see a way for it to accomplish that without much more complicated changes.

Here is a test demonstrating several very basic cases where the priority resolution of this PR is not as intended.

In every case, the foo and app sources should match because they have config defined at the same levels.

Let me be clear on this: existing clients don't see ANY side-effects!

I think this is a lot harder to say than that. Jupyter applications instantiate many objects that can be provided by extensions. If those extensions, which are already using the existing @default implementation of environment variable defaults (the goal of #99), it is important that they behave in a consistent way, with no changes to the Application object that instantiates them.

ankostis · 2017-08-18T10:30:28Z

Thank you for pointing me the problem with child-configurables.
BTW, you've just answered 1 open-task question right on-the-spot (and read my respective comment in the code for this), and the answer unfortunately has grave consequences!

That's a blocker.

So although I could revert all changes except the 1st and half of the 2nd commit (for the env-var info msgs), I prefer not to!

I'm afraid that if we go down that road, we will stuck with cfg-overwrites-env for ever, due to BW-compatibility pressure from client code already having adopted the new envvar tag in traits-level,
end expecting this behavior.

So I prefer not to push anything at all!
For me (and anyone else working with cmd-line programs), the ordering of things is of paramount importance.
Without it, i can fallback to ad-hoc @property methods that enforce the right resolution order on access-time; it's a decent workaround.

minrk · 2017-08-18T13:08:08Z

So I prefer not to push anything at all!

That's okay! After this, I think we understand the challenges associated with supporting env properly in config much better. I think I'll go ahead with the traitlets-level implementation of env defaults (approximately #157, unchanged), and we can come back to the idea of something to do in config if it's deemed a priority.

Applications can always define their own env -> config in their initialize step, which would allow proper priority resolution.

ankostis · 2017-08-18T13:14:19Z

Just to leave it as a trace for the future:

This PR closes NOT because it proposed BW-incompatible changes!
It couldn't deliver the orderings promised.

I think I'll go ahead with the traitlets-level implementation
...
Applications can always define their own env -> config in their initialize step, which would allow proper priority resolution.

@minrk But how?
Please, don't just add envvar tag to traits-level, because THEN you do introduce BW-incompatible changes if you try to fix the mess in the future.

minrk · 2017-08-18T14:59:16Z

But how?

in initialize:

def initialize(self, argv=None):
    super().initialize(argv)
    self.load_config_file(self.config_file)
    env_config = self.config_from_env() # env -> config defined entirely by the Application
    env_config.update(self.cli_config) # preserve CLI priority
    self.update_config(env_config)

This will precisely implement environment-variable config support at the right priority in a single config object. The missing piece is the env->config transformation, which Applications are free to define. The most important thing about this: environment variables are not taken into account during traitlets initialization without config, which is the source of the conflict that breaks priority resolution.

What I think is never going to work: env defined on traits hooked up to trait defaults, resolved at config loading time with appropriate priority.

Please, don't just add envvar tag to traits-level, because THEN you do introduce BW-incompatible changes if you try to fix the mess in the future.

I believe we should do this, for these reasons:

I frankly don't expect we will deliver proper env support in traitlets.config, due to the complexity inherent there, and the relatively low benefit over the low hanging fruit of env-based trait defaults (independent of config)
getting defaults from env has proven useful for years and should be more convenient. It's not a mess.
If traitlets.config is to get proper env support, it should be done in a totally different way, and not use the same metadata, so there would be no conflict.

If anything, this discussion has clarified that getting trait defaults from environment variables, a common existing pattern that we can make more convenient, is largely unrelated and should be implemented in a totally different way than loading config from environment variables, which should always and exclusively result in populating the single global Config object.

ankostis · 2017-08-18T15:59:08Z

environment variables are not taken into account during traitlets initialization without config, which is the source of the conflict that breaks priority resolution.

You mean "during ~~traitlets~~ Application initialization" , no?
Even with this correction, i think I cannot fully understand the "without config" term.
You mean "without the env-values ending-up into the final global unique Config object", correct?

If traitlets.config is to get proper env support, it should be done in a totally different way, and not use the same metadata, so there would be no conflict.

I would really appreciate any hint on that.

... should be implemented in a totally different way than loading config from environment variables, which should always and exclusively result in populating the single global Config object.

Why do you say a "single global Config object"? Application already uses 2 config-objects (Configurable.config and Application.cli_config).
Are you thinking of deprecating the later?

BTW, implementing Application.config_from_env() based on the classes attribute seems like a feasible task, no?

minrk · 2017-08-18T16:26:39Z

You mean "during traitlets Application initialization" , no?

No, I mean traitlets. Instantiating the object without the config object shouldn't load from the env. It should go env -> config -> trait, not env -> trait -> config -> trait.

I would really appreciate any hint on that.

I don't know how to do it, which is why I said that I don't think it's likely to happen. All I know is that based on this experience, it will not look like what we've tried here.

Why do you say a "single global Config object"? Application already uses 2 config-objects (Configurable.config and Application.cli_config).

No, it uses one. There are a few temporary variables used for ensuring that the single global config loads things with the right priority, but it always ends up in a single global self.config, which is the only config that any objects other than the Application do or should get. The Application has exactly two responsibilities:

construct a single Config object
instantiate classes, passing the single config to them, as appropriate.

BTW, implementing Application.config_from_env() based on the classes attribute seems like a feasible task, no?

For simple, non-extensible cases where .classes is comprehensive, yes, I think it is. In general, no, because there is no guarantee that all classes are (or ever can be) in that list. .classes generally only includes the classes necessary to produce aliases, not every possible class that might be configured during the lifetime of the application. It has to work correctly, even if that list is empty and arbitrary user-defined classes are imported and instantiated at an arbitrary time, only with:

Class(config=cfg)

So the config object has to be correct for all objects that might be instantiated by the time Application is done constructing it, even for objects the Application doesn't know about that may not have been imported.

ankostis · 2017-08-18T16:33:10Z

So you are thinking of annotating the values in a Config objects with a "source" tag, like cli, or ~~env~~ cfg?
[edit: deciding the env tagged values is impossible to know without visiting all HasTrait classes, see @minrk problem explained above]

And then use this tag on update_config() to enforce the desired ordering?
Or even better, let the user extend it with it's own tags with tag-->order_num mapping,
for allowing arbitrary many config-sources.

Does that sound like a an alternative worth exploring?

rmorshea · 2017-08-18T16:33:25Z

@minrk, @ankostis perhaps I'm naive, but what prevents someone from simply integrating env-vars into a python config file and loading that in the desired order? This requires someone to load the env-var config first, but perhaps we could include an env_config keyword in Configurable's constructor that would handle that.

ankostis · 2017-08-18T16:41:03Z

@rmorshea that would work, assuming it is feasible to convert all envvar-tagged traits into file-configs.
A problem is how to discover them all such traits, WITHOUT relying on the Application.classes attribute, as @minrk explained just above.

And assuming that problem is solvable, why create an intermediate file (or a special Loader for instance) and not just merge those trait-values straight back into the main Application.config object?

rmorshea · 2017-08-18T17:08:25Z

@ankostis, I don't mean that you would be able to tag traits with envvar="MY_ENVVAR", but rather, that all env-var specifications would be done by the user in a python config file:

import os

C.MyClass.my_trait = os.environ.get("MY_ENVVAR", "default")

That or, as you said we could create some special env-var ConfigLoader that is expected in the keyword env_config. I don't what the best way to implement such a loader would be, but it's a thought.

I suggest this option simply because it leverages features that traitlets already provides, and avoids the need to dig deep into it's depths to implement env-vars priority.

minrk · 2017-08-18T19:05:13Z

And then use this tag on update_config() to enforce the desired ordering?
Or even better, let the user extend it with it's own tags with tag-->order_num mapping,
for allowing arbitrary many config-sources.
Does that sound like a an alternative worth exploring?

No, nothing that complicated. Just load the config in increasing priority order, like we already do for CLI config and file config, as I described in the snippet above:

load file config
load env config (application defined, because this isn't part of traitlets)
load cli config

(with the bit of fiddling we have because we have to load CLI config first and last because we use it to find the config files)

Done. One config object with all the right values. No tagging, no tracking of config sources. Very straightforward. The missing piece is how to construct the env config without access to the classes that will be configured. If you have that, it's not a problem. But we don't in general, and a partial solution is no good.

If your application knows all of its potential configurable classes, you can go ahead and implement env config right now in your Application. This can be done today, right now, with no modifications to traitlets. Just not in the base traitlets.config.Application due to the generality requirements. Here's an example:

def load_env_config(class_list):
    """Create a config object for any traits with `config_env` metadata"""
    config = Config()
    seen = set()
    for config_cls in class_list:
        for cls in config_cls.mro():
            if cls is Configurable:
                break
            if cls in seen:
                continue
            seen.add(cls)
            for name, trait in cls.class_own_traits(config=True).items():
                key = trait.metadata.get('config_env')
                if key and os.environ.get(key):
                    config[cls.__name__][name] = os.environ[key]
    return config

example

@rmorshea loading env in a config files certainly does work, and people do it all the time. It's just that the burden of defining and handing env is then on the user rather than the developer, i.e. env vars are a feature of your personal config file, rather than a feature of the application. This can be nice, and I use it sometimes. It doesn't achieve our primary goal, which is to make this common pattern:

trait = Unicode(config=True)
@default('trait')
def _trait_default(self):
    return os.environ.get('ENV_VAR') or 'actual-default'

a bit more convenient.

I really think we should stop looking for a general solution for environment-based config and implement simple, straightforward environment defaults for traitlets via a metadata tag. It solves a real issue that real people have and doesn't cause any.

rmorshea · 2017-08-19T06:00:26Z

I'd need to look into it tomorrow, but I think I have an idea for tags that won't intrude on traitlets.py

ankostis force-pushed the envvars branch from c43e749 to 61d11d0 Compare August 12, 2017 21:43

ankostis added a commit to ankostis/traitlets that referenced this pull request Aug 12, 2017

feat(trait.ipython#439): impl Trait-level metadata[ENV-VAR]

d12fee3

+ When getting trait-value, try 1st env-var, then fallback to defaults.

ankostis added a commit to ankostis/traitlets that referenced this pull request Aug 12, 2017

feat(trait.ipython#439): impl Application-level metadata[ENV-VAR]

61d11d0

+ Cmd-line configs override ENV-VARS + Add/update TCs.

ankostis added 3 commits August 13, 2017 00:45

feat(trait.ipython#99): impl Trait-level metadata[ENV-VAR]

88be1d9

+ When getting trait-value, try 1st env-var, then fallback to defaults.

feat(trait.ipython#99): impl Configurable-level metadata[ENV-VAR]

c1afac2

+ Don't load config-values on traits with existing ENV-VAR. + Print ENV-VAR used in class-help, config and RsT msgs. + Add/update TCs.

feat(trait.ipython#99): impl Application-level metadata[ENV-VAR]

20162ca

+ Cmd-line configs override ENV-VARS + Add/update TCs.

ankostis force-pushed the envvars branch from 61d11d0 to 20162ca Compare August 12, 2017 21:45

minrk added this to the 5.1 milestone Aug 14, 2017

minrk reviewed Aug 14, 2017

View reviewed changes

ankostis added 2 commits August 14, 2017 18:03

fix(help.ipython#439): errors in new env-var help-line went unchecked

83bf01c

+ Had a double-indent in rst, forgotten `# ` prefix in config file. + Add parametrized pytest TC.

fix(help.ipython#439): 'Env-var: ' --> 'Environment variable: ' in co…

db0a1a8

…mment-line

ankostis mentioned this pull request Aug 14, 2017

Traitlets 5.0 #401

Closed

minrk modified the milestones: 5.0, 5.1 Aug 15, 2017

minrk mentioned this pull request Aug 15, 2017

allow traits to parse string config #441

Closed

2 tasks

ankostis added a commit to ankostis/traitlets that referenced this pull request Aug 16, 2017

feat(app.ipython#439): add skip_env kwd also in load_config_file()

d8192eb

ankostis force-pushed the envvars branch from bca6920 to dc3b4a9 Compare August 17, 2017 18:25

ankostis force-pushed the envvars branch from dc3b4a9 to ec64149 Compare August 17, 2017 18:40

ankostis force-pushed the envvars branch from ec64149 to db64862 Compare August 17, 2017 18:40

ankostis force-pushed the envvars branch from db64862 to 6f05aad Compare August 17, 2017 18:40

minrk reviewed Aug 18, 2017

View reviewed changes

minrk closed this Aug 18, 2017

minrk modified the milestones: no action, 5.0 Aug 18, 2017

This was referenced Aug 19, 2017

feat(traits): Implement ENV-VARS in traits-level #447

Closed

feat(config): support MERGE-PRIORITIES for Config items #448

Closed

feat(app): support ENV-VAR with priority against CLI-configs #449

Closed

Envvars #439

Envvars #439

Conversation

ankostis commented Aug 12, 2017 • edited Loading

Example:

Implementation:

Rational:

Open issues:

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

minrk Aug 15, 2017 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

ankostis Aug 14, 2017 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

ankostis Aug 14, 2017 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

ankostis Aug 16, 2017 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

ankostis commented Aug 14, 2017

minrk commented Aug 14, 2017

rmorshea commented Aug 14, 2017 • edited Loading

minrk commented Aug 15, 2017

minrk commented Aug 15, 2017 • edited Loading

ankostis commented Aug 16, 2017

ankostis commented Aug 16, 2017

minrk commented Aug 17, 2017

ankostis commented Aug 17, 2017 • edited Loading

ankostis commented Aug 17, 2017 • edited Loading

minrk commented Aug 17, 2017

ankostis commented Aug 17, 2017 • edited Loading

ellisonbg commented Aug 17, 2017

Choose a reason for hiding this comment

Choose a reason for hiding this comment

minrk commented Aug 18, 2017

ankostis commented Aug 18, 2017 • edited Loading

minrk commented Aug 18, 2017

ankostis commented Aug 18, 2017 • edited Loading

minrk commented Aug 18, 2017

ankostis commented Aug 18, 2017 • edited Loading

minrk commented Aug 18, 2017 • edited Loading

ankostis commented Aug 18, 2017 • edited Loading

rmorshea commented Aug 18, 2017 • edited Loading

ankostis commented Aug 18, 2017 • edited Loading

rmorshea commented Aug 18, 2017

minrk commented Aug 18, 2017

rmorshea commented Aug 19, 2017

ankostis commented Aug 12, 2017 •

edited

Loading

minrk Aug 15, 2017 •

edited

Loading

ankostis Aug 14, 2017 •

edited

Loading

ankostis Aug 14, 2017 •

edited

Loading

ankostis Aug 16, 2017 •

edited

Loading

rmorshea commented Aug 14, 2017 •

edited

Loading

minrk commented Aug 15, 2017 •

edited

Loading

ankostis commented Aug 17, 2017 •

edited

Loading

ankostis commented Aug 17, 2017 •

edited

Loading

ankostis commented Aug 17, 2017 •

edited

Loading

ankostis commented Aug 18, 2017 •

edited

Loading

ankostis commented Aug 18, 2017 •

edited

Loading

ankostis commented Aug 18, 2017 •

edited

Loading

minrk commented Aug 18, 2017 •

edited

Loading

ankostis commented Aug 18, 2017 •

edited

Loading

rmorshea commented Aug 18, 2017 •

edited

Loading

ankostis commented Aug 18, 2017 •

edited

Loading