I am interested in using BO with derivative observations (FOBO, as you term it). I scanned through the BoTorch discussions and found some of your comments, questions, and this commit. I also have code running with FOBO that uses basic acquisition functions and works well; it looks very much like the tutorial you're suggesting.
I would like to test d-KG as an acquisition function (as you do), but I ran into problems with `fantasize` during the implementation. I see you have been working around this. Since I'm not a GitHub expert, I'm not sure I'm reading the status of this implementation correctly. Can you tell me how far you have gotten? The last update I found was an open pull request. So are you waiting for approval, or did you still run into problems?
I would be very pleased to receive a short reply, and thank you for your contribution!
Hello @mreumi,
Yes, I think I solved the problem here (cornellius-gp/gpytorch#2452); try getting those changes locally and let me know if that fixes your problem! I'm not sure what the status of the changes in the main repository is, since that is currently out of my control.
@yyexela I tested your code briefly and it seems to be running just fine, thanks!
By the way, I ran into some numerical issues with `fit_gpytorch_mll`, which uses L-BFGS to train the derivative-enabled GP model, in some cases (using other test functions and analytical gradients than in your tutorial case).
As far as I understand the error messages, there seems to be an issue with the line search. Training with stochastic gradient descent seems to be significantly more robust (I use the code from https://botorch.org/tutorials/fit_model_with_torch_optimizer), in case you end up with similar problems.
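For reference, my fitting loop looks roughly like the one in that tutorial (a minimal sketch; the function name is mine, `model` and `mll` stand for the derivative-enabled GP and its marginal log likelihood, and I use Adam here, though any torch optimizer works the same way):

```python
import torch

def fit_with_torch_optimizer(mll, model, train_X, num_steps=200, lr=0.1):
    # put the model (and its likelihood) into training mode
    model.train()
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(num_steps):
        optimizer.zero_grad()
        output = model(train_X)                    # model output at the training inputs
        loss = -mll(output, model.train_targets)   # negative marginal log likelihood
        loss.backward()
        optimizer.step()
    model.eval()
    return model
```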
@mreumi great, thanks for letting me know! Glad the code is working.
Hi @yyexela - I dug a bit deeper into FOBO. For some test functions, I only get qualitatively nice surrogate models, which are off in their quantitative range. I am not sure whether this is a scaling issue or a model-training error. Thinking about it, I realized that output scaling is not so trivial for FOBO.
In case you are still working in this direction, I have a question that you might have come across already and that also affects your tutorial:
In your tutorial, you apply a scaling to the training data:
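If I read it correctly, it is essentially a column-wise standardization, something along these lines (my paraphrase, not your exact code; `train_Y` stands for the stacked targets):

```python
# column-wise standardization of [f, df/dx1, df/dx2, ...]
mean_Y = train_Y.mean(dim=0)
std_Y = train_Y.std(dim=0)
train_Y_scaled = (train_Y - mean_Y) / std_Y
```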
Thus `mean_Y` is something like `[mean(f), mean(dfdx1), mean(dfdx2), ...]` and `std_Y` accordingly.
Given that `train_Y` includes function values and the corresponding derivative values, doesn't this scaling lead to a mismatch between the function values and their derivatives?
Since $d(f - c)/dx = df/dx$ for a constant $c$, I would think that when subtracting the mean of the function values we should not subtract anything from the gradients. But because $d(c \cdot f)/dx = c \cdot df/dx$, we should scale the function values and the derivatives by the same std to keep them consistent. I would therefore expect the scaling of the target data to look more like this:
```python
import torch

def normalize_target_data(ytrain_nominal):
    # columns of ytrain: [f, df/dx1, df/dx2, ...]
    ytrain_normalized = torch.clone(ytrain_nominal)
    # subtract the mean from f only (a constant shift does not change the gradient):
    ytrain_normalized[:, 0] -= ytrain_normalized[:, 0].mean()
    # scale f and all gradient columns by the std of f, so they stay consistent:
    ytrain_normalized /= ytrain_normalized[:, 0].std()
    return ytrain_normalized
```
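As a quick sanity check on a toy example of mine (not from the tutorial): with $f(x) = x^2$ and $df/dx = 2x$, central differences of the scaled function values should still match the scaled gradient column, since both were divided by the same std:

```python
x = torch.linspace(-2.0, 2.0, 50)
ytrain = torch.stack([x**2, 2 * x], dim=-1)              # columns: [f, df/dx]
ynorm = normalize_target_data(ytrain)
fd = (ynorm[2:, 0] - ynorm[:-2, 0]) / (x[2:] - x[:-2])   # central differences of scaled f
print(torch.allclose(fd, ynorm[1:-1, 1], atol=1e-4))     # expected: True
```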
I couldn't find a clear answer from the different tests I ran, so I thought maybe you have an idea? If so, I'd appreciate it if you could share it!
Hey @mreumi, I'm glad you're interested in FOBO! Unfortunately, if I remember correctly, the scaling simply fixed an error in the code where the values were overflowing; I don't have any theoretical justification for it. Sorry I can't give any more insight.
One thing we noticed when working on FOBO was that the theory Peter Frazier presented for FOBO may not be optimal. As a simple experiment to try on your own:
1. Create a zeroth-order Bayesian optimization (ZOBO) model.
2. Run some initialization (i.e., grid search) for your objective and obtain gradient information as well.
3. Use a linear approximation around each sampled point to generate additional data, and use that data in your ZOBO model (a rough sketch is below).
Doing this gave much better results than FOBO in what I've seen empirically. This suggests there may be a better way to do FOBO than what Frazier showed and what's implemented in BoTorch.
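A rough sketch of step 3, just to illustrate the idea (the function name and the perturbation scheme are my own, not the exact code we used):

```python
import torch

def augment_with_linear_approximation(X, f, G, n_extra=4, radius=0.05):
    """X: (n, d) sampled points, f: (n,) observed values, G: (n, d) observed gradients."""
    n, d = X.shape
    # small random perturbations around each sampled point
    deltas = radius * (2 * torch.rand(n, n_extra, d) - 1)
    X_new = X.unsqueeze(1) + deltas                              # (n, n_extra, d)
    # first-order Taylor approximation: f(x + delta) ~ f(x) + grad(x) . delta
    f_new = f.unsqueeze(1) + (deltas * G.unsqueeze(1)).sum(-1)   # (n, n_extra)
    X_aug = torch.cat([X, X_new.reshape(-1, d)], dim=0)
    f_aug = torch.cat([f, f_new.reshape(-1)], dim=0)
    return X_aug, f_aug
```

The augmented `(X_aug, f_aug)` can then be fed to an ordinary ZOBO model without any derivative-enabled machinery.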
Thanks for the reply, I will look into that!