Dispatcher deadlock #10482
Comments
Hi @v-lopez, may I first ask whether the problem still occurs if you do not include initial_reset:=true in the launch instruction? initial_reset is optional and can be useful if problems occur during or after launch, but it is not a requirement to set it to true.
It doesn't occur without
I also note that you are using kernel 5.13.0-40-generic. The librealsense SDK does not officially support kernel 5.13, and although the SDK can work with unsupported kernels, doing so can have unforeseen consequences for stability. The most recent supported kernels are 5.4 when building the SDK from Debian packages, and 5.8 and 5.11 when building it from source code. The kernel dependency can be bypassed, though, by building the SDK from source code with the RSUSB backend installation method, which is not dependent on Linux versions or kernel versions and does not require patching.
I am running it with
What other issues have you been experiencing when initial_reset is not true, please?
I don't have the logs anymore, but some of the streams such as Infra1 were not starting.
The reference to _devices_changed_callbacks_mtx gives me the impression that when a camera is reset during launch, the computer cannot find it again after the reset disconnects it (devices_changed_callback handles events related to listening for camera connection and disconnection). Are you using the official 1 meter USB cables supplied with the camera or your own choice of USB cable?
No, another 0.5m USB3.2 cable that we found was able to deliver what we needed. I wrote a workaround in the library that stops the dispatcher thread at the beginning of the device destructor. With it I avoid the deadlock and the system works as expected, but I don't know the code well enough to determine whether this makes sense.
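The idea behind that workaround can be sketched in isolation. This is only an illustration under stated assumptions, not the actual patch and not librealsense code: toy_device, dispatch_loop, create_and_destroy and every member name below are hypothetical. The destructor stops and joins the worker thread before it takes the callbacks lock, so the worker can never be left waiting on a lock the destructor holds.

```cpp
#include <atomic>
#include <mutex>
#include <thread>

// Hypothetical toy device; none of these names come from librealsense.
class toy_device {
public:
    toy_device() : _worker([this] { dispatch_loop(); }) {}

    ~toy_device() {
        // Step 1: stop and join the dispatcher while holding no locks,
        // so it can finish any work that needs _callbacks_mtx.
        _running = false;
        _worker.join();

        // Step 2: only now take the callbacks lock to unregister.
        std::lock_guard<std::mutex> lock(_callbacks_mtx);
        _callback_registered = false;
    }

private:
    void dispatch_loop() {
        while (_running) {
            {
                // Same order as the dispatcher in the report:
                // dispatch lock first, callbacks lock second.
                std::lock_guard<std::mutex> d(_dispatch_mutex);
                std::lock_guard<std::mutex> c(_callbacks_mtx);
                ++_dispatched;
            }
            std::this_thread::yield();
        }
    }

    std::mutex _dispatch_mutex;
    std::mutex _callbacks_mtx;
    std::atomic<bool> _running{true};
    std::atomic<int> _dispatched{0};
    bool _callback_registered = true;
    std::thread _worker;  // declared last so it starts after the other members exist
};

// Constructs and destroys the toy device repeatedly; returns once every
// destructor has completed without hanging.
bool create_and_destroy(int rounds) {
    for (int i = 0; i < rounds; ++i) {
        toy_device device;
    }
    return true;
}
```

Whether the same ordering is safe inside the real SDK depends on what else the dispatcher and destructor touch, which this sketch does not model.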
There is no information that I know of regarding the effects of using a USB cable shorter than 1 meter with RealSense cameras, unfortunately, as USB cable references usually relate to lengths of 1 m or greater. Quality matters with cables used with RealSense cameras, though, because of the high volume of data that the cameras can transmit along the cable, and it should be a cable designed for data transfer rather than just device charging. Having said that, if your 0.5 m cable works well with your workaround then it indicates that your cable choice is fine. As to whether your workaround is 'correct', I am a Support Engineer and am not involved in SDK development, so I do not have advice that I can offer on your method. I would recommend performing long-run tests with the 5 cameras for periods such as 12 hours or longer to confirm whether your changes are stable. If they are stable after repeated successful long-run tests then the workaround is likely fine.
Hi @v-lopez, do you require further assistance with this case, please? Thanks!
No, if you want me to submit my patch as a PR let me know.
A PR would be a useful reference for other RealSense users with a similar problem and you are very welcome to submit one. Whether you do so is of course completely optional. Thanks again!
This deadlock is now fixed, as far as we know, on the development branch. Please see #12275. |
Issue Description
I have a deadlock when starting my realsense node.
I have 5 cameras connected, but I can reproduce this with just one camera connected.
I am launching it with initial_reset:=true, and the device enumeration phase gets stuck and never ends.
Upon attaching with GDB I can see that thread 17 is a dispatcher thread stuck on this line, having acquired _dispatch_mutex. Following the call stack, it is stuck on this line, waiting for _devices_changed_callbacks_mtx.
On the other hand, thread 20 is destroying the same realsense device here, which calls unregister_internal_device_callback, acquiring _devices_changed_callbacks_mtx, and then calls _device_watcher->stop(); here. That call requires _dispatch_mutex, but it is held by thread 17, causing the deadlock.
The full call stacks are below:
thread_20.txt
thread_17.txt
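The lock-order inversion described above can be reproduced in isolation. The following is a minimal sketch, not librealsense code: lock_pair, take_in_order and opposite_order_blocks are hypothetical names, and the two mutexes are only stand-ins for _dispatch_mutex and _devices_changed_callbacks_mtx. Timed mutexes are an assumption made for the demonstration, so that each thread gives up after a short timeout and reports the inversion instead of hanging forever.

```cpp
#include <atomic>
#include <chrono>
#include <mutex>
#include <thread>

// Hypothetical stand-ins for the two locks named in the report.
struct lock_pair {
    std::timed_mutex dispatch_mutex;
    std::timed_mutex callbacks_mtx;
};

// Locks `first`, waits until the peer thread holds its own first lock,
// then tries to take `second` with a timeout. `first` stays held until
// both threads have finished trying, so the outcome is deterministic.
bool take_in_order(std::timed_mutex& first, std::timed_mutex& second,
                   std::atomic<int>& holding, std::atomic<int>& finished) {
    std::lock_guard<std::timed_mutex> hold(first);
    ++holding;
    while (holding.load() < 2) std::this_thread::yield();

    bool got_second = second.try_lock_for(std::chrono::milliseconds(50));
    if (got_second) second.unlock();

    ++finished;
    while (finished.load() < 2) std::this_thread::yield();
    return got_second;
}

// Returns true when neither thread could obtain its second lock, i.e. the
// situation that would be a permanent deadlock with plain blocking locks.
bool opposite_order_blocks() {
    lock_pair locks;
    std::atomic<int> holding{0};
    std::atomic<int> finished{0};
    bool dispatcher_ok = true;
    bool destructor_ok = true;

    // Plays the role of thread 17: dispatch lock first, callbacks lock second.
    std::thread dispatcher([&] {
        dispatcher_ok = take_in_order(locks.dispatch_mutex, locks.callbacks_mtx,
                                      holding, finished);
    });
    // Plays the role of thread 20: callbacks lock first, dispatch lock second.
    std::thread destructor([&] {
        destructor_ok = take_in_order(locks.callbacks_mtx, locks.dispatch_mutex,
                                      holding, finished);
    });

    dispatcher.join();
    destructor.join();
    return !dispatcher_ok && !destructor_ok;
}
```

With blocking lock() calls in place of try_lock_for, both threads would wait on each other indefinitely, which matches the behaviour seen in the two call stacks.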
This has happened on every attempt I have made. I have restarted all realsense applications but have not rebooted my computer yet. A reboot may clear the issue, but I am worried about this happening in the future in a customer facility.