Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RoundRobin in recovery is not reset when main branch success #4737

Open
FrGrQuim opened this issue Oct 31, 2024 · 2 comments
Open

RoundRobin in recovery is not reset when main branch success #4737

FrGrQuim opened this issue Oct 31, 2024 · 2 comments

Comments

@FrGrQuim
Copy link

Bug report

Required Info:

  • Operating System:
    • Custom Linux
  • ROS2 Version:
    • Iron source
  • Version or commit hash:
    • 1.2.9
  • DDS implementation:
    • CycloneDDS

Steps to reproduce issue

In our Behavior Tree, we have a RecoveryNode with a RoundRobin in the recovery branch, structured as follows:

revovery_round_robin drawio

Expected behavior

We expect that each time the main action succeeds, the round_robin is reset. Therefore, the next time the main action fails, the first recovery action is triggered.

This behavior is illustrated by the following sequence:
1 - Main action: Fail
2 - Recovery A: Success
3 - Main action: Fail
4 - Recovery B: Success
5 - Main action: Success
... (later)
6 - Main action: Fail
7 - Recovery A is call

Actual behavior

Currently, we observe that the roundRobin is not reset when the main branch succeeds. Therefore, the next time the main action fails, the roundRobin continues incrementing the child index.

This behavior is illustrated by the following sequence:
1 - Main action: Fail
2 - Recovery A: Success
3 - Main action: Fail
4 - Recovery B: Success
5 - Main action: Success
... (later)
6 - Main action: Fail
7 - Recovery C is call

Additional information

After checking the code, we noticed that the only way to reset the roundRobin is by calling the halt function. However, in the BehaviorTreeCPP library, the halt function is only called when the target node is in the RUNNING state (which seems consistent with the function name). The issue is that the Recovery node does not call the roundRobin's halt function when it is in the RUNNING state.

My question, then, is: Is this the intended behavior? And if so, do you have any suggestions on how to implement the expected behavior sequence?

@SteveMacenski
Copy link
Member

SteveMacenski commented Nov 5, 2024

This seems to be the intended behavior as also evidenced by the documentation: https://docs.nav2.org/behavior_trees/overview/nav2_specific_nodes.html#control-roundrobin, but this is worth some discussion and potential behavior changes (I'd suggest reading this page before continuing, I myself refer to it in these questions). The text below is a bit of my own logical process to document how I came to my conclusions / suggestions for potential improvements.

The control node in a BT regulates the execution of the child nodes it has access to (in this case, your RoundRobin control node has child nodes of Recoveries A, B, and C & your RecoveryNode has children of your main action and the RoundRobin).

The result of the main action is not communicated to the RoundRobin directly as they are both children of the same parent node and are thus independent of each other to communicate statuses. Rather the handling of RoundRobin's state based on the state of the main action is on the logic of the parent they share, i.e. RecoveryNode.

It sounds like you would like the RecoveryNode to reset the state of the recovery child (second child, in this case RoundRobin) when the primary task (in your case, main action) succeeds. When halt() is called in the RoundRobin, we seem to already reset the state of the child IDs to start back at its first child node (RecoveryA) like you want.

So it seems that the behavior you want is really tied into the fact that halt() isn't being called on the recovery child of the RecoveryNode once the primary task succeeds. In implementation, ControlNode::haltChild(1);. Right now, we call RecoveryNode's halt() which resets its state and calls ControlNode::halt(), but that only calls halt() on currently RUNNING nodes. There's been discussions about that in BT.CPP, but ultimately that's the right move according to the maintainers.

So, I think perhaps adding ControlNode::haltChild(1); when the first child is successful would fix that problem. I would appreciate if you tested that to verify this resolves the behavior issue you expect and get back to me.

The follow up question is: should we make this change in the RecoveryNode in Nav2 by default. We shouldn't halt the recovery child each time we exit the RecoveryNode (since we want to maintain some state on FAILURE to tick the next node), but would SUCCESS be a special case given the intent of the BT Node's second child as a recovery mechanism? Would we also have some SUCCESS state being maintained - perhaps if we're doing something like a PipelineSequence or other Control Node as a container for more complex logic under the RecoveryNode's second child (i.e. recovery child).

I think ... possibly so, someone might want to maintain some state. But also, if the action is now successful, wouldn't it make sense to start again from the top - given the explicit intent of the RecoveryNode? So I'm leaning towards approving that change for Nav2's RecoveryNode more generally. I think restarting the recovery child's work each time its newly failing is sensible for the design and intent.

Let me know what you think! We'd just need to add a sentence or two to that documentatation I linked above about that behavior & a couple of sentences in the migration guide explicitly telling people this has changed and we should be good to go.

@SteveMacenski
Copy link
Member

@tonynajjar what do you think? I know you're thinking about BTs these days

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants