-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[bindings] Add custom ref_cycle annotation #22068
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+(release notes: none)
+@jwnimmer-tri or delegate for feature review.
FYI @EricCousineau-TRI
Reviewable status: LGTM missing from assignee jwnimmer-tri(platform), needs at least two assigned reviewers
fb3aacf
to
5a65cd3
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
feature. After the next push, I suggest assigning Eric as a second feature reviewer, prior to platform review.
Reviewed 5 of 5 files at r1, all commit messages.
Reviewable status: 8 unresolved discussions, needs at least two assigned reviewers
bindings/pydrake/common/ref_cycle_pybind.h
line 16 at r1 (raw file):
Wherever pybind11::keep_alive<M, N>() can be used, ref_cycle<M, N>() can be used similarly. */
BTW When I read this API doc before the implementation, for some reason I took ref_cycle<M, N>
to mean a one-way reference, i.e., the same semantics as keep_alive<M, N>
. Of course the name "cycle" should have clued me otherwise, but I suppose its worth adding really clear text in the docs here:
- M keeps N alive, and N keeps M alive.
- Neither one is collected until both are unreachable.
*The template order of M vs N does not matter.
bindings/pydrake/common/ref_cycle_pybind.h
line 26 at r1 (raw file):
/* This function creates the cycle, given peers as pybind11::handles. */ void do_ref_cycle_impl(pybind11::handle p0, pybind11::handle p1);
I don't see any reason why this should be declared in the header? It is only ever called from the cc file.
bindings/pydrake/common/ref_cycle_pybind.cc
line 31 at r1 (raw file):
// if it is not yet existing. auto make_link = [](handle a, handle b) { static const char refcycle_peers[] = "_pydrake_ref_cycle_peers";
BTW Given the consequences of a user ever accessing this (the DEMAND on line 44 will fail), maybe it should have an even scarier name, like a double underscore or "do not use" or something?
bindings/pydrake/common/ref_cycle_pybind.cc
line 63 at r1 (raw file):
return call.args[n - 1]; } return handle();
Why doesn't this throw an exception? It seems to me like we only reach this spot if a developer types a number wrong in a bindings file.
If it's because it already trips the "Could not activate" clause up top, that should be more clear. But in that case, it seems like if we throw here, then we could give a more useful exception message, e.g., with the name of the function in the error message. It might be best for this function to check that truth condition (for all of its return branches?) so that developer errors are clear.
bindings/pydrake/common/test/ref_cycle_test.py
line 28 at r1 (raw file):
# Recall that sys.getrefcount() adds 1 to the actual count. self.assertEqual(sys.getrefcount(x.__dict__), 2) self.assertEqual(sys.getrefcount(x._pydrake_ref_cycle_peers), 2)
BTW more clear?
Suggestion:
self.assertEqual(sys.getrefcount(x.__dict__) - 1, 1)
self.assertEqual(sys.getrefcount(x._pydrake_ref_cycle_peers) - 1, 1)
bindings/pydrake/common/test/ref_cycle_test.py
line 73 at r1 (raw file):
self.check_no_cycle(dut, peer) # Annotated call dies because dut is not py::dynamic_attr(). with self.assertRaisesRegex(SystemExit, ".*PyType_IS_GC.*"):
BTW Here is an example where "AddIsCycle" in the message would be a nice win for Drake developers.
Ditto throughout.
bindings/pydrake/common/test/ref_cycle_test.py
line 89 at r1 (raw file):
# Un-annotated call is fine. self.assertIsNone(dut.ReturnNullIs()) # Annotated call does not die because one peer is missing.
This result is fine in case it's hard to implement otherwise, but really this situation (using ref_cycle on two classes where either one is not dynamic-attr) is a mistake by the drake developer and should definitely throw, if we can swing it.
Failing that, repeating my comment here into the test case for posterity (that we would hope for a throw) would be fine, too.
Ditto throughout.
bindings/pydrake/common/test/ref_cycle_test_util_py.cc
line 29 at r1 (raw file):
}; using IsDynamic = TestDummyBase<true>; using NotDynamic = TestDummyBase<false>;
nit These lines duplicate the in-class using
above. To better communicate they we are using the same meaning everywhere, these should cite the nested types instead of defining them de-novo.
5a65cd3
to
2898791
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 3 unresolved discussions, needs at least two assigned reviewers
bindings/pydrake/common/ref_cycle_pybind.h
line 26 at r1 (raw file):
Previously, jwnimmer-tri (Jeremy Nimmer) wrote…
I don't see any reason why this should be declared in the header? It is only ever called from the cc file.
Done.
bindings/pydrake/common/ref_cycle_pybind.cc
line 63 at r1 (raw file):
Previously, jwnimmer-tri (Jeremy Nimmer) wrote…
Why doesn't this throw an exception? It seems to me like we only reach this spot if a developer types a number wrong in a bindings file.
If it's because it already trips the "Could not activate" clause up top, that should be more clear. But in that case, it seems like if we throw here, then we could give a more useful exception message, e.g., with the name of the function in the error message. It might be best for this function to check that truth condition (for all of its return branches?) so that developer errors are clear.
Done.
bindings/pydrake/common/test/ref_cycle_test.py
line 89 at r1 (raw file):
Previously, jwnimmer-tri (Jeremy Nimmer) wrote…
This result is fine in case it's hard to implement otherwise, but really this situation (using ref_cycle on two classes where either one is not dynamic-attr) is a mistake by the drake developer and should definitely throw, if we can swing it.
Failing that, repeating my comment here into the test case for posterity (that we would hope for a throw) would be fine, too.
Ditto throughout.
This is not a case that should throw. I'm following the lead of the py::keep_alive<>()
implementers here, as far as the action to take. In this case, we happen to be using invalid types, but the check for None in either peer index position has to supersede that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+@EricCousineau-TRI for second feature review.
Reviewable status: 3 unresolved discussions, LGTM missing from assignee EricCousineau-TRI(platform)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 5 of 5 files at r2, all commit messages.
Reviewable status: LGTM missing from assignee EricCousineau-TRI(platform)
bindings/pydrake/common/test/ref_cycle_test.py
line 89 at r1 (raw file):
Previously, rpoyner-tri (Rick Poyner (rico)) wrote…
This is not a case that should throw. I'm following the lead of the
py::keep_alive<>()
implementers here, as far as the action to take. In this case, we happen to be using invalid types, but the check for None in either peer index position has to supersede that.
Yes, now I realize we're only using the dynamic type, not the static type. If we were checking the static type we could complain, but we don't always have that, and it's not easy to obtain even when we do.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: LGTM missing from assignee EricCousineau-TRI(platform)
bindings/pydrake/common/ref_cycle_pybind.h
line 26 at r1 (raw file):
Previously, rpoyner-tri (Rick Poyner (rico)) wrote…
Done.
I will likely need to re-declare this later, to implement the (ZOMGWTFBBQ) AddMultibodyPlantSceneGraph bindings. That's fine -- I'll just plumb it when I need it.
2898791
to
4aaa667
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 3 of 5 files at r2, 2 of 2 files at r3, all commit messages.
Reviewable status: complete! all discussions resolved, LGTM from assignees jwnimmer-tri(platform),EricCousineau-TRI(platform)
bindings/pydrake/common/ref_cycle_pybind.h
line 14 at r3 (raw file):
https://pybind11.readthedocs.io/en/stable/advanced/functions.html#additional-call-policies `ref_cycle` creates a reference count cycle that Python's cyclic garbage
FYI Given this design, this will prevent cycles when replaced directly.
However, we could still eventually have non-discoverable cycles if we somehow design bindings that allow transitive keep_alive
calls to be strung together.
The more holistic solution would be to replace keep_alive
with this custom implementation (but only in one direction), s.t. it's no longer an issue.
However, the cost is that anything wanting to extend lifetimes must use dynamic_attr()
declarations.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! all discussions resolved, LGTM from assignees jwnimmer-tri(platform),EricCousineau-TRI(platform)
bindings/pydrake/common/ref_cycle_pybind.h
line 14 at r3 (raw file):
Previously, EricCousineau-TRI (Eric Cousineau) wrote…
FYI Given this design, this will prevent cycles when replaced directly.
However, we could still eventually have non-discoverable cycles if we somehow design bindings that allow transitivekeep_alive
calls to be strung together.
The more holistic solution would be to replacekeep_alive
with this custom implementation (but only in one direction), s.t. it's no longer an issue.However, the cost is that anything wanting to extend lifetimes must use
dynamic_attr()
declarations.
Good point. To restate things, it's still possible to accidentally create immortal objects by arranging py::keep_alive
links in a loop.
I suppose I was at least unconsciously aware of that, but I think the danger is slight, or at least no greater than it is for any other pybind11 users. I think I am content with the notion that we make obvious badness better, and remain aware of potential problems.
If we were to discover a new immortality problem, I would recommend:
- (maybe) build some instrumentation that can dump out graphs of gc-trackable and
py::keep_alive
reference relations. - Modify this code to support a call policy that makes a one-way gc-trackable reference relation.
I don't think either of those is difficult, if we hit the problem.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Still needs platform: +@SeanCurtis-TRI for platform review, by schedule.
Reviewable status: LGTM missing from assignee SeanCurtis-TRI(platform)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 1 unresolved discussion, LGTM missing from assignee SeanCurtis-TRI(platform)
a discussion (no related file):
a thread to disable the squash button while awaiting platform review.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 3 of 5 files at r2, 2 of 2 files at r3, all commit messages.
Reviewable status: 8 unresolved discussions
bindings/pydrake/common/ref_cycle_pybind.h
line 16 at r3 (raw file):
`ref_cycle` creates a reference count cycle that Python's cyclic garbage collection can find and collect, once the cycle's objects are no longer reachable. Both peer objects must by pybind11 classes that were defined with
nit: typo
Suggestion:
be
bindings/pydrake/common/ref_cycle_pybind.h
line 17 at r3 (raw file):
collection can find and collect, once the cycle's objects are no longer reachable. Both peer objects must by pybind11 classes that were defined with the dynamic_attr() annotation.
BTW It's worth elaborating on this requirement. It's detected at runtime with an exception. (Although, the exception message doesn't allude to this language.) I'd like to see this documentation and that error message converge a bit closer.
Code quote:
dynamic_attr() annotation
bindings/pydrake/common/ref_cycle_pybind.h
line 26 at r3 (raw file):
* M keeps N alive, and N keeps M alive. * Neither object is finalized until:
BTW I realize it's a python term of art, but given that this is in the context of Drake, the term "finalize" is a bit overloaded. Perhaps a bit less terse?
Code quote:
finalized
bindings/pydrake/common/ref_cycle_pybind.h
line 68 at r3 (raw file):
private: static constexpr bool needs_result() { return Peer0 == 0 || Peer1 == 0; }
BTW Initially, this whole thing was completely opaque to me (as a non-member of the Jeremy/Eric/Rico group). Initially, I was calling for a life ring.
After reading this a bit more, I have a sense of what this is about. I think it's still worth making a note for drake developers that aren't steeped in pybind.
bindings/pydrake/common/ref_cycle_pybind.cc
line 37 at r3 (raw file):
DRAKE_DEMAND(PyType_IS_GC(Py_TYPE(peers.ptr()))); a.attr(refcycle_peers) = peers; Py_DECREF(peers.ptr());
BTW An explanation of why this explicit reference management is necessary would be nice.
bindings/pydrake/common/test/ref_cycle_test.py
line 25 at r3 (raw file):
# python call) cost. @functools.cache def wrapped_refcount_cost():
BTW CRAZY!
bindings/pydrake/common/test/ref_cycle_test_util_py.cc
line 12 at r3 (raw file):
namespace { template <bool AmIDynamic>
BTW While this only appears twice, bool expressions are typically statements and not questions. However, given that you've taken the most likely candidate for predicate and turned it into a thing, meh.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed all commit messages.
Reviewable status: 5 unresolved discussions
bindings/pydrake/common/ref_cycle_pybind.h
line 68 at r3 (raw file):
Previously, SeanCurtis-TRI (Sean Curtis) wrote…
BTW Initially, this whole thing was completely opaque to me (as a non-member of the Jeremy/Eric/Rico group). Initially, I was calling for a life ring.
After reading this a bit more, I have a sense of what this is about. I think it's still worth making a note for drake developers that aren't steeped in pybind.
I suppose what's missing is "zero refers to the return value" with a citation to here?
4aaa667
to
6413baa
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! all discussions resolved, LGTM from assignees jwnimmer-tri(platform),SeanCurtis-TRI(platform),EricCousineau-TRI(platform)
bindings/pydrake/common/ref_cycle_pybind.cc
line 37 at r3 (raw file):
Previously, SeanCurtis-TRI (Sean Curtis) wrote…
BTW An explanation of why this explicit reference management is necessary would be nice.
got rid of it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 1 of 2 files at r3, 2 of 2 files at r4, all commit messages.
Reviewable status: complete! all discussions resolved, LGTM from assignees jwnimmer-tri(platform),SeanCurtis-TRI(platform),EricCousineau-TRI(platform)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 2 of 2 files at r4, all commit messages.
Reviewable status: complete! all discussions resolved, LGTM from assignees jwnimmer-tri(platform),SeanCurtis-TRI(platform),EricCousineau-TRI(platform)
bindings/pydrake/common/ref_cycle_pybind.h
line 18 at r4 (raw file):
reachable. Both peer objects must be pybind11 classes that were defined with the
BTW From reading the code, I'm not sure if it stictly "must" be a pybind11
class.
For example, someone could define their own Python class and pass it here.
bindings/pydrake/common/ref_cycle_pybind.h
line 86 at r4 (raw file):
private: // Returns true if either template parameter denotes the return value. static constexpr bool needs_result() { return Peer0 == 0 || Peer1 == 0; }
BTW result()
is ambiguous. Consider renaming to needs_return_value()
. I was initially confused until I had read the source code in my first review.
bindings/pydrake/common/ref_cycle_pybind.cc
line 33 at r4 (raw file):
// N.B.: `object` does automatic ref counting; `handle` does not. if (!hasattr(a, refcycle_peers)) { auto new_set = pybind11::reinterpret_steal<object>(PySet_New(nullptr));
BTW This could be simplified by using py::set()
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 2 of 2 files at r4, all commit messages.
Reviewable status: 3 unresolved discussions
bindings/pydrake/common/ref_cycle_pybind.h
line 68 at r3 (raw file):
Previously, jwnimmer-tri (Jeremy Nimmer) wrote…
I suppose what's missing is "zero refers to the return value" with a citation to here?
The new documentation does that, and @EricCousineau-TRI 's comment disambiguating "result" from "return value" also captures it.
bindings/pydrake/common/ref_cycle_pybind.h
line 86 at r4 (raw file):
Previously, EricCousineau-TRI (Eric Cousineau) wrote…
BTW
result()
is ambiguous. Consider renaming toneeds_return_value()
. I was initially confused until I had read the source code in my first review.
Eric has succinctly summarized my confusion.
Introduce the new annotation internal::ref_cycle<M, N>(). It will eventually replace existing cyclic py::keep_alive<>() annotations (which do their job, but leak their participants forever). The participants of ref_cycle<>() will be garbage collectible, so that applications can run loops that use various drake components without exhausting memory. This patch just adds the implementation and its unit test.
6413baa
to
6e2eda9
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 2 of 2 files at r5, all commit messages.
Reviewable status: complete! all discussions resolved, LGTM from assignees jwnimmer-tri(platform),SeanCurtis-TRI(platform),EricCousineau-TRI(platform)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 2 of 2 files at r5, all commit messages.
Reviewable status: complete! all discussions resolved, LGTM from assignees jwnimmer-tri(platform),SeanCurtis-TRI(platform),EricCousineau-TRI(platform)
Introduce the new annotation internal::ref_cycle<M, N>(). It will eventually replace existing cyclic py::keep_alive<>() annotations (which do their job, but leak their participants forever). The participants of ref_cycle<>() will be garbage collectable, so that applications can run loops that use various drake components without exhausting memory.
This patch just adds the implementation and its unit test.
Full WIP at #22022.
This change is