
Implement gradient computation w.r.t. initial conditions #290

Open
wants to merge 10 commits into base: develop

Conversation

jwallwork23 (Member)

Closes #284.

This PR sets up the ability to compute gradients of the QoI w.r.t. the initial conditions. This will be extended to a wider range of controls in #289.
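Roughly, the intended usage looks like the following (a sketch based on the interface discussed in the review below; the exact entry points may differ in the final API):

mesh_seq = AdjointMeshSeq(...)  # with get_initial_conditions() providing the controls
solutions = mesh_seq.solve_adjoint(compute_gradient=True)
dJ_dic = mesh_seq.gradient  # derivative of the QoI w.r.t. the initial conditions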

Unit tests are added that solve a simple R-space problem in which the solution field is multiplied by a scalar at each timestep. Gradients of steady-state, end-time, and time-integrated QoIs are computed and compared against expected values.
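For reference, here is a minimal standalone sketch of this kind of R-space test, written directly with Firedrake/pyadjoint rather than through Goalie's AdjointMeshSeq; the names factor and num_steps are illustrative and this is not the PR's actual test code:

from firedrake import *
from firedrake.adjoint import (
    continue_annotation,
    pause_annotation,
    Control,
    ReducedFunctional,
)

mesh = UnitIntervalMesh(1)
R = FunctionSpace(mesh, "R", 0)

continue_annotation()
u0 = Function(R).assign(2.0)  # initial condition: the control
u = Function(R).assign(u0)
factor, num_steps = 1.1, 5
for _ in range(num_steps):
    u = Function(R).assign(factor * u)  # "timestep": multiply the solution by a scalar
J = assemble(u * dx)  # end-time QoI
pause_annotation()

# dJ/du0 should equal factor**num_steps
Jhat = ReducedFunctional(J, Control(u0))
dJdu0 = Jhat.derivative()
print(dJdu0.dat.data)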

(Note a5d87aa is not part of this PR and will be added in #287, while b300b24 and bc2589b will be added in #288.)

@jwallwork23 jwallwork23 added the enhancement New feature or request label Feb 24, 2025
@jwallwork23 jwallwork23 self-assigned this Feb 24, 2025
@jwallwork23 jwallwork23 linked an issue Feb 28, 2025 that may be closed by this pull request
Comment on lines +530 to +540
        controls = pyadjoint.enlisting.Enlist(self._controls)
        with pyadjoint.stop_annotating():
            with tape.marked_nodes(m):
                with tape.marked_nodes(controls):
                    tape.evaluate_adj(markings=True)

        # Compute the gradient on the first subinterval
        if i == 0 and compute_gradient:
            self._gradient = controls.delist(
                [control.get_derivative() for control in controls]
            )

stephankramer (Collaborator) left a comment:

This looks really good. I only have some broader questions/thoughts about how we'd expand this, in particular interface-wise, to be used in a gradient-based optimisation loop. But yes, happy for it to go in as is.

  1. How do we actually see this being used in optimisation? So far we've not really dealt with the model being parameterised and needing to be rerun for different parameters. Is the idea to do something like the following?
class MeshSeqObjective(AdjointMeshSeq):
    _initial_conditions = None

    def get_initial_conditions(self):
        return AttrDict({'u': self._initial_conditions})

    def value(self, x):  # evaluate objective function at x
        self._initial_conditions.assign(x)
        # now what? how do I rerun the model and return the functional value?
        self.solve()
        return self.J

    def derivative(self, x):  # evaluate derivative at x
        self._initial_conditions.assign(x)
        self.solve_adjoint(compute_gradient=True)  # does this rerun the forward?
        return self.gradient

The reason I ask is that I get a little nervous about magic properties (like self.gradient here) that rely on having been computed previously, dependent on some changing internal state, before you access them. It might be worth resetting ._gradient back to None whenever we rerun the model without compute_gradient=True (see the sketch after this list).

  2. I think in your implementation the gradient for a Function control will be the L2 Riesz representative, which is a good default I guess. I would advocate being consistent (as I think you are) from the start in our use of the words "derivative" and "gradient": a derivative being an element of the dual of the tangent space of the control space, and thus a Cofunction if the control is a Function, and a gradient being a vector in the tangent space of the control space that is the Riesz representative of the derivative w.r.t. some choice of inner product. Note that this is not (yet) consistently done in pyadjoint. At the moment you are using control.get_derivative() which, without any options, by default returns the L2 Riesz representative (a Function). This is so that compute_gradient by default indeed returns a gradient. If you want the derivative (a Cofunction) you have to pass options={'Riesz_representative': None}. The reason I bring this up is that sometimes you need to provide both a gradient implementation and a (directional) derivative implementation, e.g. https://docs.trilinos.org/dev/packages/rol/doc/html/classROL_1_1Objective.html#details (a small sketch of the distinction follows this list).
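Regarding the first point, here is a minimal sketch of the suggestion (illustrative only, not Goalie's actual implementation) that invalidates the cached gradient whenever the model is rerun and fails loudly if a stale value would otherwise be returned:

class AdjointMeshSeq:
    def __init__(self, *args, **kwargs):
        self._gradient = None

    @property
    def gradient(self):
        if self._gradient is None:
            raise AttributeError(
                "Gradient not available: rerun solve_adjoint with compute_gradient=True."
            )
        return self._gradient

    def solve_adjoint(self, compute_gradient=False, **kwargs):
        # Reset ._gradient so a value from a previous run cannot leak out after
        # the model is rerun without compute_gradient=True.
        self._gradient = None
        ...  # forward/adjoint solves; populate self._gradient if requested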
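On the second point, a small Firedrake-only sketch of the derivative/gradient distinction (no Goalie or pyadjoint involved; the functional is made up, and Cofunction.riesz_representation is assumed to be available):

from firedrake import *

mesh = UnitSquareMesh(4, 4)
V = FunctionSpace(mesh, "CG", 1)
u = Function(V)
u.interpolate(SpatialCoordinate(mesh)[0])

J = u**2 * dx  # a functional of u
dJ = assemble(derivative(J, u))  # the derivative: a Cofunction in the dual space V*
g = dJ.riesz_representation("L2")  # the gradient: a Function in V (L2 Riesz representative)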
