
Fix object destruction order #497

Merged — 11 commits from ivanpauno/fix-destruction-order-2 into master, Mar 4, 2020
Conversation

ivanpauno (Member)

Supersedes #470.
See #470 (comment).

Solves strange bugs related to destruction order.

Included in this PR:

  • rclpy_handle_t struct, related methods, and its usage in the Handle class (a simplified sketch follows the lists below).
  • Replaces usage of PyCapsule with rclpy_handle_t where needed.
  • Some formatting changes (avoiding unnecessary casts, etc.).

Not included (subject of a follow up):

  • rmw_request_id_t capsule is leaking.
  • rmw_qos_profile_t capsule is leaking.
  • Refactor wait set related code, so it can be wrapped by a Handle.
  • Refactor rclpy_action objects, so they can be wrapped in a Handle.
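For readers unfamiliar with the mechanism, here is a minimal Python sketch of the destruction-order idea (the real implementation is the C rclpy_handle_t wrapped by the Python Handle class; the class name, counter, and destructor callback here are illustrative assumptions, not the PR's exact API):

```python
class SketchHandle:
    """Illustration only: destroy an object after all of its dependents."""

    def __init__(self, name, destructor):
        self._name = name
        self._destructor = destructor
        self._ref_count = 1   # the owner's own reference
        self._required = []   # handles this one depends on

    def requires(self, other):
        """Declare that self must be destroyed before other."""
        self._required.append(other)
        other._ref_count += 1

    def destroy(self):
        self._dec_ref()

    def _dec_ref(self):
        self._ref_count -= 1
        if self._ref_count > 0:
            return
        self._destructor(self._name)  # safe: no dependents remain
        for dep in self._required:
            dep._dec_ref()            # dependencies may be released now


context = SketchHandle('rcl_context_t', lambda n: print('destroyed', n))
node = SketchHandle('rcl_node_t', lambda n: print('destroyed', n))
node.requires(context)  # the context must outlive the node
context.destroy()       # prints nothing: the node still depends on it
node.destroy()          # prints both, node first, then context
```

With this scheme it no longer matters in which order the Python wrappers happen to be finalized: the underlying objects are always torn down dependents-first.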

@ivanpauno (Member, Author)

CI above rclpy:

  • Linux Build Status

@ivanpauno (Member, Author)

New CI, after fixing failures:

  • Linux Build Status

@ivanpauno added the bug (Something isn't working) label on Jan 21, 2020
Review threads:

  • rclpy/CMakeLists.txt (resolved)
  • rclpy/rclpy/handle.py (outdated, resolved)
  • rclpy/src/rclpy/_rclpy.c (outdated, resolved)
@@ -2766,7 +2744,7 @@ rclpy_wait(PyObject * Py_UNUSED(self), PyObject * args)
   if (!PyArg_ParseTuple(args, "O|K", &pywait_set, &timeout)) {
     return NULL;
   }
-  rcl_wait_set_t * wait_set = (rcl_wait_set_t *)PyCapsule_GetPointer(pywait_set, "rcl_wait_set_t");
+  rcl_wait_set_t * wait_set = PyCapsule_GetPointer(pywait_set, "rcl_wait_set_t");
Contributor

@ivanpauno why doesn't rcl_wait_set_t get the same treatment as other rcl types' allocations, i.e. using rclpy_handle_t too?

Member Author

Good question. Currently, the rcl_wait_set_t capsule doesn't have a deleter, and deletion is handled manually in Python.
To avoid making this PR bigger, I didn't refactor how wait sets are used. That should be done in a follow-up PR.

Contributor

Let's open an issue to track that work.

Review threads:

  • rclpy/src/rclpy_common/include/rclpy_common/handle.h (outdated, resolved)
  • rclpy/src/rclpy_common/src/handle.c (resolved)
  • rclpy/src/rclpy_common/src/handle.c (resolved)
  • rclpy/src/rclpy_common/src/handle.c (outdated, resolved)
  • rclpy/src/test/rclpy_common/test_handle.cpp (outdated, resolved)
@ivanpauno requested a review from hidmic on January 28, 2020
@hidmic (Contributor) left a comment

LGTM but we should follow up on leaking resources ASAP

Review threads:

  • rclpy/src/test/rclpy_common/test_handle.cpp (outdated, resolved)
  • rclpy/src/test/rclpy_common/test_handle.cpp (outdated, resolved)
@sloretz (Contributor) left a comment

Partial review. Will continue later.

Review threads:

  • rclpy/rclpy/handle.py (resolved)
  • rclpy/rclpy/handle.py (outdated, resolved)
  • rclpy/rclpy/handle.py (outdated, resolved)
  • rclpy/rclpy/handle.py (outdated, resolved)
  • rclpy/src/rclpy_common/src/handle.c (outdated, resolved)
The managed `rclpy_handle_t` object will not be destroyed immediately if another handle
has called `other.requires(this)`.
In that case, the managed object will be destroyed after all its
dependents are destroyed.
Contributor

Can Handle call destroy() on all the downstream handles too? Without that, there must be something else that has duplicate information of which handle requires what so they can be destroyed in the right order. If I want to destroy Node, I must destroy the publishers, subscribers, etc created from it first. In practice this responsibility is owned by Node.destroy_node(), but that only works because Node also knows which handles require its handle.

That duplicate knowledge is hard to do for Context. Without that knowledge duplicated, rclpy.shutdown() doesn't actually shut down the library; instead everything must wait for the garbage collector.

Member Author

That duplicate knowledge is hard to do for Context. Without that knowledge duplicated, rclpy.shutdown() doesn't actually shut down the library; instead everything must wait for the garbage collector.

Up to now, shutting down rclpy wasn't destroying all the rcl-created objects, as Context wasn't using a Handle. If we want that feature, I would rather add it in a follow-up PR and limit this one to fixing the destruction order problem (it can be done by keeping track of all the nodes created by a context).

There's also some inconsistency in what happens when a node is destroyed:

  • Publishers, Subscriptions, Clients, Services, Timers, and Guards created by a node are destroyed.
  • Action clients and action servers aren't.

Contributor

We can also rely on things being explicitly destroyed in reverse order, and fail if destruction is attempted on an object that other objects still depend on.


If I may, I have to ask, where does this need of explicitly destroying objects come from? It's somewhat incompatible with Python, it creates invalid objects, it makes resource management more complex and it could be easily circumvented by calling gc.collect if need be. It almost seems to me like we're fighting the language to do things in a way it was not meant to.

Contributor

If I may, I have to ask, where does this need of explicitly destroying objects come from? It's somewhat incompatible with Python, it creates invalid objects, it makes resource management more complex and it could be easily circumvented by calling gc.collect if need be. It almost seems to me like we're fighting the language to do things in a way it was not meant to.

I hear you. This is an old discussion. I've been on that side before, and I'm still mostly on that side. I agree we don't need to free memory right away. There's nothing special about our RAM usage vs Python's, so if the Python garbage collector is good enough for some then it is good enough for all.

The reason to destroy these objects is not about their memory usage; it's about freeing system resources. Explicitly freeing system resources without waiting for the garbage collector is not fighting the language; it's pretty normal.

Maybe it could be circumvented by calling gc.collect(), but that's not how Python objects usually solve this situation: https://docs.python.org/3/tutorial/inputoutput.html

If you’re not using the with keyword, then you should call f.close() to close the file and immediately free up any system resources used by it. If you don’t explicitly close a file, Python’s garbage collector will eventually destroy the object and close the open file for you, but the file may stay open for a while. Another risk is that different Python implementations will do this clean-up at different times.

Yes rcl objects have to be destroyed in order, but that has nothing to do with why we want to destroy them. We want to destroy them because that's the only way the rcl API gives us to free system resources they use. Depending on the rmw implementation, nodes/pubs/subs/etc open sockets, they're using network bandwidth by continuing to participate in discovery, subscribers are receiving data from other publishers on the network, service servers may be accepting requests even if the intent is for another server to take their place, etc. That's where the need of explicitly destroying objects comes from.
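By way of comparison, the rclpy analogue of the f.close() pattern looks roughly like this (a sketch of ordinary public API usage, not something this PR changes; the node name is arbitrary):

```python
import rclpy

rclpy.init()
node = rclpy.create_node('example_node')
try:
    pass  # use the node: create publishers, spin, etc.
finally:
    # Free the underlying rcl resources (sockets, discovery traffic)
    # deterministically, instead of waiting for the garbage collector
    # to finalize the objects at some unspecified later time.
    node.destroy_node()
    rclpy.shutdown()
```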

Member Author

I understand the point of freeing resources. But my point is that rclpy.shutdown currently isn't doing so. I think that responsibility should belong to the Context itself, and not to Handle. I can add that, but IMO it's subject of a follow-up.

Contributor

my point is that rclpy.shutdown currently isn't doing so. I think that responsibility should belong to the Context itself, and not to Handle. I can add that, but IMO it's subject of a follow-up.

That's a fair point. I think I'm getting hung up on what I intended Handle to do vs the reality of that never being fully implemented. I do like the approach this PR takes, and I think it makes it pretty easy to get to a state of everything being cleaned up right away.

If h1.requires(h0) caused h0 to store a weakref to h1, then h0.destroy() could call h1.destroy() without worrying about the destruction order of h1's dependents, because the reference counting in this PR would take care of that. I think doing that is important for the reasons above, but you're right that it doesn't need to be done right away if the feature doesn't fully exist now. I'll dismiss my review since it seems to be blocking progress. Sorry about that.
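A rough, hypothetical sketch of that idea, reusing the simplified counting from the sketch earlier in this conversation (none of this is the PR's actual code):

```python
import weakref


class CascadingHandle:
    """Sketch: destroying a handle first destroys its dependents,
    tracked via weakrefs so a dependency never keeps dependents alive."""

    def __init__(self, name, destructor):
        self._name = name
        self._destructor = destructor
        self._ref_count = 1
        self._required = []     # strong refs: handles self depends on
        self._dependents = []   # weak refs: handles depending on self

    def requires(self, other):
        self._required.append(other)
        other._ref_count += 1
        other._dependents.append(weakref.ref(self))

    def destroy(self):
        # Tear down dependents first; reference counting then releases
        # this handle and, transitively, the handles it requires.
        for ref in self._dependents:
            dependent = ref()
            if dependent is not None:
                dependent.destroy()
        self._dec_ref()

    def _dec_ref(self):
        self._ref_count -= 1
        if self._ref_count == 0:
            self._destructor(self._name)
            for dep in self._required:
                dep._dec_ref()


context = CascadingHandle('context', lambda n: print('destroyed', n))
node = CascadingHandle('node', lambda n: print('destroyed', n))
node.requires(context)
context.destroy()  # prints: destroyed node, then destroyed context
```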

@sloretz dismissed their stale review on February 3, 2020

Can be done in a follow-up PR

@ivanpauno force-pushed the ivanpauno/fix-destruction-order-2 branch from 267e393 to cd37c5a on February 12, 2020
@ivanpauno requested a review from hidmic on February 12, 2020
Review threads:

  • rclpy/rclpy/handle.py (outdated, resolved)
  • rclpy/src/rclpy/_rclpy.c (outdated, resolved)
  • rclpy/src/rclpy/_rclpy.c (outdated, resolved)
  • rclpy/src/rclpy/_rclpy.c (resolved)
  • rclpy/src/rclpy/_rclpy_qos_event.c (resolved)
  • rclpy/src/rclpy_common/include/rclpy_common/handle.h (outdated, resolved)
@ivanpauno force-pushed the ivanpauno/fix-destruction-order-2 branch from cd37c5a to 034da37 on February 14, 2020
@ivanpauno requested a review from hidmic on February 14, 2020
@ivanpauno force-pushed the ivanpauno/fix-destruction-order-2 branch from c0c56b9 to b08ad1b on February 17, 2020
@ivanpauno (Member, Author)

  • Linux Build Status

@ivanpauno (Member, Author)

@hidmic About the regression test you asked for: I gave it a try, but it's super hard to reproduce the problem.
The only example of the problem I found is #470, which is no longer reproducible after #488 and #490.

My main reason for moving lifetime management to C is that neither destruction order nor how the gc works is documented in Python. Both are implementation details of a particular Python interpreter, and users shouldn't make any assumptions about how they work.

From PEP 442:

Cyclic isolate (CI)
A standalone subgraph of objects in which no object is referenced from the outside, containing one or several reference cycles, and whose objects are still in a usable, non-broken state: they can access each other from their respective finalizers.

For CI objects, the order in which finalizers are called (step 2 above) is undefined.
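To make the quoted point concrete, a small demonstration (hypothetical class names; the behavior shown is exactly what PEP 442 leaves undefined):

```python
import gc


class Noisy:
    def __init__(self, name):
        self.name = name
        self.other = None

    def __del__(self):
        print('finalizing', self.name)


a = Noisy('a')
b = Noisy('b')
a.other = b
b.other = a   # a and b now form a reference cycle: a cyclic isolate
del a, b
gc.collect()  # which __del__ runs first is undefined by the language
```

CPython will pick some order, but a program that depends on it is making exactly the kind of supposition warned against above.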

@hidmic (Contributor) commented Feb 27, 2020

I think listing that PEP to back up the rclpy_handle_t rationale would be great for our future selves.

@hidmic (Contributor) left a comment

LGTM but for a question, though some green CI would be nice too

Review threads:

  • rclpy/src/rclpy/_rclpy.c (outdated, resolved)
  • rclpy/src/rclpy_common/include/rclpy_common/handle.h (outdated, resolved)
with node1.handle:
    pass
finally:
    rclpy.shutdown(context=context)
Contributor

@ivanpauno why were these tests dropped? they certainly have to be rewritten but still apply conceptually, no?

Member Author

Those tests relied on using requires, which is no longer exposed.
I think the tests I wrote in https://github.com/ros2/rclpy/blob/39a93ac5cde247b53137441e34f431d68ba211c4/rclpy/src/test/rclpy_common/test_handle.cpp are a good replacement.

Contributor

I see, fair enough.

Add node handle dependency on context handle

Signed-off-by: Ivan Santiago Paunovic <[email protected]>
@ivanpauno force-pushed the ivanpauno/fix-destruction-order-2 branch from b08ad1b to 39a93ac on February 27, 2020
@ivanpauno (Member, Author)

I think listing that PEP to back up the rclpy_handle_t rationale would be great for our future selves.

See 826e6cc.

@ivanpauno requested a review from hidmic on February 27, 2020
@hidmic (Contributor) left a comment

LGTM again but for a few nits, and pending green CI

Review threads:

  • rclpy/src/rclpy_common/include/rclpy_common/handle.h (five threads; outdated, resolved)
@ivanpauno (Member, Author) commented Mar 2, 2020

  • Linux Build Status (unrelated failures)
  • Linux-aarch64 Build Status (unrelated failures)
  • macOS Build Status
  • Windows Build Status (unrelated failures)

@ivanpauno (Member, Author) commented Mar 3, 2020

  • macOS Build Status (unrelated failures)

@ivanpauno requested review from sloretz and hidmic and removed the request for dirk-thomas and sloretz on March 3, 2020
@hidmic (Contributor) left a comment

LGTM! :shipit:

@ivanpauno merged commit 131a91d into master on Mar 4, 2020.
The delete-merged-branch bot deleted the ivanpauno/fix-destruction-order-2 branch on March 4, 2020.