-
Notifications
You must be signed in to change notification settings - Fork 310
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Handle on waiting #1483
Handle on waiting #1483
Conversation
Note, I didn't touch the other |
Does it make sense to add this to the unspawner script as well? |
Good question! That doesn't have the same check in it. Coming to think of this, since asking for permission instead of forgiveness is an antipattern (and actually breaks stuff: #1200). I would opt for modifying this PR to align spawner and unspawner to both retry-and-log instead of the more expensive check-and-try. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks! LGTM and tested with/without running CM.
would you mind adding hardware_spawner
also to the helper scripts section in userdoc.rst, as you have fixed the arguments of the other scripts already there?
edit: the tests are failing now, could you also have a look there please? (see RHEL/debian workflows)
I need some assistance here. CI is failing on:
But I didn't change anything related to |
This was some issue with the rolling CI due to the transition to Ubuntu noble. The issues related to your changes are
|
121a8a5
to
1941427
Compare
I didn't have a massive amount of time to look into things the last couple of days but I wanted to finally fix #1182 today. My result is in #1501. This PR basically solves the first problem from #1182, as well, so +1 from me. But seeing the
indicates that there is still something missing. Edit: Actually, looking at the test it is obvious that this fails since this PR explicitly changes the spawner not to fail if it can't find the CM while the test expects it to fail. If that's the desired behavior that test probably simply has to be removed. |
I dropped the test. As it was testing if false inputs gave no result. It could be written if a warning was issued, but that would mean parsing stdout or stderr in tests. Which is also a bit frowned upon. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The changes look good and I manually tested them in my setup.
If the removal of the controller_manager_timeout
parameter is acceptable I think this is actually a good solution. However, this is a breaking change, so now would be the best time to merge this.
Porting this back to Iron & Humble isn't that straightforward, we could use my implementation from #1501 for that where I also removed the wait for controller_manager functionality but added the controller_manager_timeout
for the first service call. Or we do indeed remove the functionality and print a deprecation warning on those if the timeout is not the default value.
Discussing in the working group today, we agreed on the following things:
@Timple Would you like to update this PR according to the first point please? |
Regarding backporting, keeping the original timeout with the option of setting it to indefinitely makes sense! However, I strongly opt for this to be the default in future releases. We've had flaky simulation and hardware startups because the default timeout was close to the boot time of the nodes. Flaky behavior seems the worst of all possible scenarios. |
I think @bmagyar might have an opinion on that. In the end that boils down to the default value of the timeout. I agree that a larger default timeout than 10 seconds might make a lot of sense, but in the spirit of making it fail by default and not leave launch files hanging indefinitely it might be good to still have one. |
Yes, but defaults are very important. Only scenario I can think of is some kind of catch mechanism that restarts/reboots. But very likely the result will be exactly the same, but debugging much harder as everything keeps restarting and printing a lot. If I'm missing a scenario here, or proper scenarios were discussed at the working group, let me know! |
One scenario is having launchfiles in an autostart job, e.g. a systemd unit where the shell output will never been seen by the user. In this case a failed unit is much more obvious to find than a unit repeating a log output. |
That's a valid use case. If the logs of the autostart jobs are evaluated before the ROS logs. Although I'd rather have that specific case as exception than flaky simulation and hardware startups. |
Aligns spawner and unspawner logic
9368265
to
7c2cdcc
Compare
We should keep timeout at it was and just add new option here. |
This pull request is in conflict. Could you fix it @Timple? |
Closing in favour of #1562 |
That was easier than anticipated, fixes: #1482 and #1200