
Support for various advanced functionality? #176

Closed
ben-arnao opened this issue Aug 7, 2019 · 9 comments

Comments

@ben-arnao

A few questions regarding functionality/support:

  • Is there support for activation functions not called by name (lrelu, for example)?
  • How do we handle multiple layers? I get an error when I try to create a random number of layers.
  • Is there a good way to select a random number of columns to feature engineer on? I.e., let's say the optimal way to scale my data would be to only scale columns 1 and 10.
  • I also noticed when adding more advanced params to callbacks, e.g. min_delta on reduce_lr, I get an error that Real is not compatible with float. Any way around this?
  • Last but not least is learning rate: you have to define this inside your optimizer. Is there support for this, or a way I can do this?

Thanks, and sorry if some of these have already been answered.

@HunterMcGushion
Owner

Thanks for your questions! I'm going to answer them in separate comments as I can. Sorry for the delay!

Regarding your last question (on optimizer lr), you can define lr inside one of Keras' optimizer classes for an Experiment.

So instead of the below line in examples/keras_examples/experiment_example.py:

model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

... we can import a Keras optimizer, and modify the above model.compile call like so:

from keras.optimizers import Adam

...  # Everything up to line 20, linked above

model.compile(
    optimizer=Adam(lr=0.01),
    loss="binary_crossentropy",
    metrics=["accuracy"],
)
return model

...  # Everything after `build_fn` definition

This definitely needs to be documented or added to a Keras example script, so thank you for bringing it up.

Also, it isn't yet possible to optimize parameters inside Keras optimizers defined like this, so Adam(lr=Real(0.0001, 0.1)) isn't working. A separate issue should be made to track progress on this, but I have some ideas if you (or anyone else) are interested in taking a shot at it with a PR.

... More answers to come...

@HunterMcGushion
Owner

Regarding your first question (advanced activations), yes, they can be used.
Using examples/keras_examples/experiment_example.py again, we can do the following:

from keras.layers.advanced_activations import LeakyReLU

Then, instead of

Dense(100, kernel_initializer="uniform", input_shape=input_shape, activation="relu"),

use two separate lines, like so:

Dense(100, kernel_initializer="uniform", input_shape=input_shape),
LeakyReLU(),

One thing to be aware of here (noted in the first Keras question in the README's FAQs) is that if you start using separate activation layers like this, you'll want to be consistent even when using normal activations. Details can be found in the above-linked FAQ, but for optimization to correctly match with older Experiments, you would need to build all your models using separate activation layers, rather than the activation kwarg of Dense (or any other layer).
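
For example, a build_fn written in this consistent style might contain a layers fragment like the following (a minimal sketch; the layer sizes are just placeholders):

from keras.layers import Activation, Dense
from keras.layers.advanced_activations import LeakyReLU

# No `activation` kwarg anywhere; every activation is its own layer
Dense(100, kernel_initializer="uniform", input_shape=input_shape),
Activation("relu"),  # "normal" activation as a separate layer
Dense(50, kernel_initializer="uniform"),
LeakyReLU(),  # advanced activation as a separate layer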

This is inconvenient, I know (sorry), and we should open up another issue to get this fixed up.

@HunterMcGushion
Owner

Could you expand on questions 2 and 4, and provide a reproducible example for each, please?

Regarding question 3, are you trying to use an OptPro to determine which columns of all your input columns you should apply StandardScaler to, for example?

@ben-arnao
Author

ben-arnao commented Aug 8, 2019

Regarding your last question (on optimizer lr), you can define lr inside one of Keras' optimizer classes for an Experiment.

Hi, thanks for the response.

Regarding your comment on learning rate and custom activation functions, now that I'm thinking about it, a way I've gotten past similar limitations in my own custom hyperparameter optimizer is to use wrapper functions, where the appropriate layer is returned by string key. So for example:

from keras.layers import Activation
from keras.layers.advanced_activations import LeakyReLU
from hyperparameter_hunter import Categorical

def actv_custom_wrapper_func(activation):
    # Return the appropriate activation layer for a string key
    if activation == 'lrelu':
        return LeakyReLU()
    if activation == 'relu':
        return Activation('relu')

model.add(actv_custom_wrapper_func(Categorical(['relu', 'lrelu'])))

And I think this should solve my issue. I think I could use a similar method to get a variable optimizer and learning rate; I would just need my wrapper function to take two arguments, something like the sketch below.
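
The exact string-to-class mapping here is just a guess on my part, untested:

from keras.optimizers import Adam, Nadam, Adamax

def opt_custom_wrapper_func(optimizer, lr):
    # Return a Keras optimizer instance for a string key, with the given learning rate
    if optimizer == 'adam':
        return Adam(lr=lr)
    if optimizer == 'nadam':
        return Nadam(lr=lr)
    if optimizer == 'adamax':
        return Adamax(lr=lr)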

model.compile(
    optimizer=opt_custom_wrapper_func(Categorical(['adam', 'nadam', 'adamax']), Real(0.0001, 0.1)),
    loss="binary_crossentropy",
    metrics=["accuracy"],
)

I can test this out and see if it works, but before I do, is there a better way to do this?

As for the layers question, if I do something like the following:

for x in range(Integer(1, 5)):
    # add a layer, e.g.:
    model.add(Dense(1))

I get an error, which I assume is caused by an inconsistent number of parameters on different runs, which is understandable. Is there a way to have the number of layers as a variable, or is it incompatible with this type of parameter optimization?
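
A minimal illustration of the immediate failure (assuming Integer is HyperparameterHunter's search-space class; plain Python rejects this before any optimization logic runs):

from hyperparameter_hunter import Integer

# range() requires an actual int; a search-space object is not one,
# so this raises TypeError regardless of the optimizer
for x in range(Integer(1, 5)):
    pass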

For the error about params on callbacks, I think any time you use a float for a callback param you get an error. For example:

model_extra_params=dict(
    callbacks=[ReduceLROnPlateau(monitor='loss', patience=Integer(5, 25), min_delta=Real(0.01, 0.0001), verbose=0)]
)

I believe that inside the ReduceLROnPlateau callback there are comparison operators for min_delta, like greater-than, that cause an error when it tries to compare a Real object to a float.

Lastly, to your follow-up on question 3: yes, I am trying to find out if there is a good way to select a random number of columns to scale. For example, if there were a selection entity, e.g. Selection(0, 1000, 30), which would select 30 random columns in the range of 0 to 1000. I'm not sure if this is feasible given how your program works, but I think it would be an important feature to have.
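
Something like this numpy sketch captures the behavior I mean (Selection itself is hypothetical):

import numpy as np

# Hypothetical semantics for Selection(0, 1000, 30):
# pick 30 distinct column indices from the range [0, 1000)
cols_to_scale = np.random.choice(np.arange(0, 1000), size=30, replace=False)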

Thanks again.

@HunterMcGushion
Owner

HunterMcGushion commented Aug 9, 2019

I’m sorry, but I’m having some trouble tracking which of your five questions we’re talking about haha. I know it’s inconvenient, but it’d be very helpful if you could split these up into separate issues for all the questions that haven’t been answered yet. Doing this will also make it easier to keep track of any bug fixes or new features we make that relate to your questions. Does this sound ok to you?

I want to be careful here because all of your questions are great, and I want to make sure they're all addressed clearly. Then we can migrate parts of our conversation here to the appropriate new issues.

Am I correct in saying that I’ve at least answered your first question?

Is there support for activation functions not called by name (lrelu, for example)?

Or have I just embarrassed myself by not answering anything at all? Hahaha

@HunterMcGushion
Owner

I may also be able to answer your fourth point:

For the error about params on callbacks, I think any time you use a float for a callback param you get an error. For example:

   model_extra_params=dict(
      callbacks=[
         ReduceLROnPlateau(
            monitor='loss', 
            patience=Integer(5, 25), 
            min_delta=Real(0.01, 0.0001),
            verbose=0
         )
      ]
   )

I believe that inside the ReduceLROnPlateau callback there are comparison operators for min_delta, like greater-than, that cause an error when it tries to compare a Real object to a float.

I received the following error using the above ReduceLROnPlateau configuration:

ValueError: Lower bound (0.01) must be less than the upper bound (0.0001)

This is because Real expects the lower bound to be the first argument, followed by the upper bound. So just switching your min_delta value from Real(0.01, 0.0001) to Real(0.0001, 0.01) did the trick for me.
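
For reference, the corrected configuration:

   model_extra_params=dict(
      callbacks=[
         ReduceLROnPlateau(
            monitor='loss',
            patience=Integer(5, 25),
            min_delta=Real(0.0001, 0.01),  # lower bound first, then upper
            verbose=0
         )
      ]
   )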

Would you mind seeing if that solves your fourth issue, as well?

@ben-arnao
Author

Would you mind seeing if that solves your fourth issue, as well?

Sure, no problem, I can definitely split these up. And try to clarify a little bit more.

I know it's inconvenient, but it'd be very helpful if you could split these up into separate issues for all the questions that haven't been answered yet.

Sure, no problem, I can definitely split these up. And try to clarify a little bit more.

@HunterMcGushion
Owner

Sorry to comment on a closed issue. Thanks for splitting this into #181 and #182! Just wanted to clarify that using Real in ReduceLROnPlateau is working for you. I think there may have been a copy/paste mishap in the response as I’m seeing the same thing twice:

Would you mind seeing if that solves your fourth issue, as well?

Sure, no problem, I can definitely split these up. And try to clarify a little bit more.

I also wanted to check on your third question:

Is there a good way to select a random number of columns to feature engineer on? I.e., let's say the optimal way to scale my data would be to only scale columns 1 and 10.

Were you able to get this working, or is this still an issue?

@ben-arnao
Author

Thanks for following up! Yes, the callback issue was resolved. I must have been doing something wrong before; sorry for the false alarm.

As for the question of scaling optimization on a per-feature basis, I think this is a much bigger/more fundamental question as to how this could be done, or whether it is worth it. For now it is probably not something worth getting into. Feature scaling is also a lot more intuitive, so I wouldn't say it's really necessary to include atm; one could pick the right scalings themselves for most problems.

Maybe something to think about in the future.
