Implement gradient computation w.r.t. initial conditions #290
base: develop
Conversation
      controls = pyadjoint.enlisting.Enlist(self._controls)
      with pyadjoint.stop_annotating():
-         with tape.marked_nodes(m):
+         with tape.marked_nodes(controls):
              tape.evaluate_adj(markings=True)

          # Compute the gradient on the first subinterval
          if i == 0 and compute_gradient:
              self._gradient = controls.delist(
                  [control.get_derivative() for control in controls]
              )
Note: the logic here is based on the logic in the Pyadjoint driver: https://github.com/dolfin-adjoint/pyadjoint/blob/dbf923e656b37333e45267da2e0264a84f160be5/pyadjoint/drivers.py#L5-L33.
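For context, that driver boils down to roughly the following pattern (a paraphrase for illustration, not a verbatim copy; compute_gradient_sketch is just an illustrative name, and the linked drivers.py is the authoritative version):

    import pyadjoint
    from pyadjoint.enlisting import Enlist

    def compute_gradient_sketch(J, m, options=None, tape=None, adj_value=1.0):
        # Roughly the logic of pyadjoint.drivers.compute_gradient: reset the
        # tape, seed the functional's adjoint value, sweep the tape backwards
        # with only the blocks relevant to the controls marked, then collect
        # each control's derivative.
        options = options or {}
        tape = tape or pyadjoint.get_working_tape()
        tape.reset_variables()
        J.block_variable.adj_value = adj_value
        controls = Enlist(m)
        with pyadjoint.stop_annotating():
            with tape.marked_nodes(controls):
                tape.evaluate_adj(markings=True)
        return controls.delist([c.get_derivative(options=options) for c in controls])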
This looks really good. I only have some broader questions/thoughts about how we'd expand this, in particular also interface-wise, to be used in a gradient-based optimisation loop. But yes, happy for it to go in as is.
- How do we actually see this being used in optimisation? So far we've not really dealt with the model being parameterised and needing to be rerun for different parameters. So is the idea to do something like:

      class MeshSeqObjective(AdjointMeshSeq):
          _initial_conditions = None

          def get_initial_conditions(self):
              return AttrDict({'u': self._initial_conditions})

          def value(self, x):  # evaluate objective function at x
              self._initial_conditions.assign(x)
              # now what? how do I rerun the model and return the functional value
              self.solve()
              return self.J

          def derivative(self, x):  # evaluate derivative at x
              self._initial_conditions.assign(x)
              self.solve_adjoint(compute_gradient=True)  # does this rerun the forward?
              return self.gradient
  The reason I ask is that I get a little nervous about these magic properties (like self.gradient here) that rely on having been computed previously, dependent on some changing internal state, before you access them. It might be worth resetting ._gradient back to None when we rerun the model without compute_gradient=True (the kind of guard I mean is in the first sketch after this list).
- I think in your implementation the gradient for a Function control will be the L2 Riesz representative, which is a good default I guess. I would advocate being consistent (as I think you are) from the start in our use of the words "derivative" and "gradient": a derivative is a linear functional on the tangent space of the control space, and thus a Cofunction if the control is a Function, whereas a gradient is a vector (in the tangent space of the control space) that is the Riesz representative of the derivative w.r.t. some choice of inner product. Note that this is not (yet) consistently done in pyadjoint. So at the moment you are using control.get_derivative(), which, without any options, by default now returns the L2 Riesz representative (a Function). This is so that compute_gradient by default indeed returns a gradient. If you want the derivative (a Cofunction) you have to pass options={'riesz_representation': None}. The reason I bring this up is that sometimes you need to provide both a gradient implementation and a (directional) derivative implementation, e.g.: https://docs.trilinos.org/dev/packages/rol/doc/html/classROL_1_1Objective.html#details (the second sketch below illustrates the Function/Cofunction distinction).
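On the stale-state point in the first bullet, a minimal sketch of the kind of guard that could help (the class, method, and attribute names simply mirror the snippet above and are illustrative, not the actual Goalie API):

    class AdjointMeshSeqSketch:
        """Illustrative only: invalidate any cached gradient whenever the model is rerun."""

        def __init__(self):
            self._gradient = None

        def solve_adjoint(self, compute_gradient=False):
            # Rerunning without compute_gradient=True clears any previously
            # computed gradient, so a stale value can never be returned.
            if not compute_gradient:
                self._gradient = None
            ...  # forward/adjoint solves go here; set self._gradient if requested

        @property
        def gradient(self):
            if self._gradient is None:
                raise AttributeError(
                    "No gradient available; rerun solve_adjoint(compute_gradient=True) first."
                )
            return self._gradient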
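To make the derivative/gradient distinction from the second bullet concrete, a minimal Firedrake sketch (illustrative only, not part of this PR; it assumes a recent Firedrake in which Cofunction.riesz_representation is available):

    from firedrake import *

    mesh = UnitIntervalMesh(16)
    V = FunctionSpace(mesh, "CG", 1)
    u = Function(V).interpolate(SpatialCoordinate(mesh)[0])

    # Functional J(u) = 0.5 * ||u||^2 in L2
    J = 0.5 * u * u * dx

    # "Derivative": the assembled first variation, a Cofunction living in the dual space V*
    dJ = assemble(derivative(J, u))

    # "Gradient": the Riesz representative g in V satisfying (g, v)_L2 = dJ(v) for all v in V
    g = dJ.riesz_representation("L2")

    print(type(dJ).__name__, type(g).__name__)  # Cofunction Function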
Closes #284.
This PR sets up the ability to compute gradients of the QoI w.r.t. the initial conditions. This will be extended to a wider range of controls in #289.
Unit tests are added, which solve a simple R-space problem, where the solution field is multiplied by a scalar at each timestep. Gradients of steady-state, end-time, and time-integrated QoIs are computed and compared against expected values.
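As a concrete illustration of the kind of expected value involved (a hand-worked check, not the test code itself; the numbers are made up): if the R-space solution obeys u_{k+1} = a * u_k over N timesteps and the end-time QoI is J = u_N^2, then u_N = a^N u_0, so J = a^(2N) u_0^2 and dJ/du_0 = 2 a^(2N) u_0.

    # Hand-worked check of the end-time QoI gradient for the scalar recurrence
    # u_{k+1} = a * u_k (values below are illustrative, not those used in the tests).
    a, N, u0 = 1.1, 5, 2.0

    def qoi(u0):
        u = u0
        for _ in range(N):
            u = a * u
        return u ** 2  # end-time QoI J = u_N^2

    analytic = 2 * a ** (2 * N) * u0  # dJ/du_0 = 2 a^(2N) u_0
    h = 1e-6
    finite_diff = (qoi(u0 + h) - qoi(u0 - h)) / (2 * h)
    assert abs(analytic - finite_diff) < 1e-4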
(Note a5d87aa is not part of this PR and will be added in #287, while b300b24 and bc2589b will be added in #288.)