-
Notifications
You must be signed in to change notification settings - Fork 229
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Context destruction order bug #470
Conversation
That's the theory anyways. From the print statements it looks like the context is being destroyed after the first node, but before the second node. Maybe there's a bug in |
I understand the same.
That's possible, though I unfortunately haven't found a bug.
Thanks!!! |
No conclusions yet, but I do have a workaround. The destruction order appears correct using diff --git a/rclpy/rclpy/context.py b/rclpy/rclpy/context.py
index 76a2fcc..2a26696 100644
--- a/rclpy/rclpy/context.py
+++ b/rclpy/rclpy/context.py
@@ -59,6 +59,7 @@ class Context:
with self._handle as capsule, self._lock:
rclpy_implementation.rclpy_shutdown(capsule)
self._call_on_shutdown_callbacks()
+ self.handle.destroy()
def try_shutdown(self):
"""Shutdown this context, if not already shutdown."""
|
Yeah, that fixes this specific problem. |
This. Should test that the garbage collector behaves as we'd expect it to? If ref counting does not result in ordered destruction, then the |
Yeah, that seems to be the case. It looks like the garbage collector is destroying If I modify the example so |
I'm surprised that Python doesn't ensure destruction order in this case at all, where there is no reference cycle nor anything strange, but I reached the same conclusion. About how to solve the problem, I'm thinking about implementing the handle type in C: typedef struct rclpy_handle_t {
void * ptr; // opaque pointer to the wrapped object.
size_t ref_count; // Reference count.
rclpy_handle_t ** dependencies; // array of pointers to dependencies.
size_t num_of_dependencies; // size of the array.
} rclpy_handle_t;
// Start with ref count 1, empty dependency list.
// A deleter might be added to the signature, though we just use malloc/free in rclpy.
rclpy_handle_t *
rclpy_create_handle(void * ptr);
// Increments ref count of the dependency, and add the dependency to the dependent list.
void
rclpy_handle_add_dependency(rclpy_handle_t * dependent, rclpy_handle_t * dependency);
// Decrements the ref count of the handle.
// If reaches zero:
// - Calls `rclpy_handle_dec_ref` of its dependencies.
// - Deallocates the handels.
void
rclpy_handle_dec_ref(rclpy_handle_t * handle); We can wrap that in a normal @hidmic @sloretz @dirk-thomas Does that sound reasonable? |
Signed-off-by: Ivan Santiago Paunovic <[email protected]>
Signed-off-by: Ivan Santiago Paunovic <[email protected]>
I tested a bit more today, and I think I understand way Python doesn't destruct the object without respecting the order: Nodes create reference cycles when constructing a Lines 203 to 207 in d2c30d9
That means that all The I commented the mentioned lines, and the example works correctly. It's pretty hard to confirm that my supposal is correct, but I think it is. I don't think that the problem is that we're creating reference cycles when creating nodes (which probably is not ideal), the problem is the whole I think that all this "dependency order of objects" problem has to be solved in C, I don't think that it can be solved in Python. @dirk-thomas @sloretz @hidmic Friedly ping. I proposed in my last comment a solution, I would like to have some feedback. |
Aha! That's fun. I see why we may need to extend |
Would breaking these cycles solve the problem? If yes, I would suggest to do that instead of a custom handle class. E.g. considering the same in C++ we wouldn't use |
I'm not sure I understand here. Would rclpy_create_context() return a PyCapsule with a pointer to an instance of
What's the separation between I think I understand how |
That partially solves the problem. The problem is that if the user creates any reference cycle, and one object in that cycle references a Node, then that node can't be destroyed by reference counting more.
I think that's ok, maybe a little strange though.
That's a good question. Following that idea, what should happen when someone call |
Then lets fix those first.
And worry about user code which does create their own cycles in a second step. |
5f7f249
to
670673c
Compare
Yeah I agree it's a strange thing to do in Python. The goal is to remove stuff from the middleware asap. rclpy can't rely on the gc because it doesn't make any guarantees about when it will collect stuff.
I can see how this would work for nodes. If In the context of this PR, * It can be delayed by subscribers/timers/services/etc being used in |
The
I think that users shouldn't call directly
IMO, I don't think that I also want to mention that I consider my proposal "complex", but I have failed to find a simpler alternative. |
Right, not the same since there aren't any dependencies, but similar in that there's a resource to be free'd without waiting for the garbage collector.
That sums it up.
If a context manager is used to cleanup things without waiting for the garbage collector, then nesting context managers (nesting
It can be called |
I forgot about The proposal I made I think can also satisfy that requirement (as commented above). |
Essentially recreating |
Closing, superseded by #497. |
Intro
As part of ros2/rmw#183, I needed to ensure that the context is destructed after the nodes/publishers/etc. That's needed as now the
Participant
will be part of thecontext
, and theParticipant
is needed in the destruction of most of the entities (e.g.: https://github.com/ros2/rmw_fastrtps/blob/a16c45ef11a19361a15c585497bf373a7ed0bae6/rmw_fastrtps_shared_cpp/src/rmw_service.cpp#L75).Also,
rcl
documentation says that not doing so, isundefined behavior
https://github.com/ros2/rcl/blob/3a5c3a34191bd95491cb72c7458b9409452a28f0/rcl/include/rcl/context.h#L101-L106 (before, undefined meant nothing happened. Now it means a segfault).Commits
Handle
object to wrap the context capsule, and establishing a dependency of the node handle on the context handle. The other changes are just a consequence of this.Buggy example
The context is sometimes being destructed out of order, though it's hard to reproduce it.
Here is one example:
Example
Code
Output
Conclusion
The context is being destructed out of order. All the example where I observed this have two nodes (maybe someone is decrementing a reference twice (?)).Original bug
In the branches where I'm moving the
Participant
to the context, the following test failedrclpy/rclpy/test/test_handle.py
Lines 77 to 98 in d375c84
Some interesting facts:
colcon test --packages-select rclpy
python3 -m pytest -s test_handle.py
python3 -m pytest -s test_handle.py::test_handle_does_not_destroy_requirements
.I have been debugging, and found that the objects are being destructed by the
gc
.AFAIU, cpython
gc
doesn't ensure any destruction order (my understanding of cpython garbage collection process is really poor).