Getting master going offline error #1

weithegreat · 2024-12-04T05:55:09Z

I am running eRobTest on RTLinux.
It connects to the slaves fine, but very quickly it hits a "master goes offline" error.
Is there an error tolerance I can set on the devices so that they do not report this error?

ZeroErrControl · 2024-12-04T07:28:17Z

Thank you for reporting this issue.

The problem you described is likely caused by system latency or insufficient real-time performance, which affects stable communication between the master and slaves.

Even with RTLinux, we recommend running the following command to check whether the maximum scheduling latency reaches the millisecond range:

sudo cyclictest -m -p99 -t1 -i100 -a3

In a similar case with Jetson Nano (without RTLinux), we achieved 4 hours of stable testing through the following optimizations:

Reducing system latency: Disable unnecessary tasks to lower system load.
Maximizing system performance: Ensure all cores are running at maximum frequency.
Kernel isolation: Bind EtherCAT tasks to specific CPU cores to avoid interference.

Regarding error tolerance, EtherCAT's strict real-time requirements mean that disconnections caused by system latency cannot be resolved by adjusting error thresholds. Instead, we recommend focusing on optimizing system performance and resource allocation.
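As a rough illustration of the core-binding item above, here is a minimal Linux sketch that pins the calling thread to one CPU with sched_setaffinity. It is not taken from eRob_test.cpp: the function names pin_current_thread_to_cpu and first_allowed_cpu are hypothetical, and isolating the core from the general scheduler (for example with the isolcpus kernel parameter) is a separate step that is not shown.

```cpp
#include <sched.h>   // sched_setaffinity, sched_getaffinity, cpu_set_t (Linux/glibc)
#include <cerrno>

// Hypothetical helper: pin the calling thread to a single CPU.
// Returns 0 on success, or an errno value on failure.
int pin_current_thread_to_cpu(int cpu) {
    if (cpu < 0 || cpu >= CPU_SETSIZE) return EINVAL;
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    // On Linux, pid 0 means "the calling thread".
    return sched_setaffinity(0, sizeof(set), &set) == 0 ? 0 : errno;
}

// Hypothetical helper: lowest-numbered CPU the calling thread may run on,
// or -1 if the affinity mask cannot be read.
int first_allowed_cpu() {
    cpu_set_t set;
    CPU_ZERO(&set);
    if (sched_getaffinity(0, sizeof(set), &set) != 0) return -1;
    for (int cpu = 0; cpu < CPU_SETSIZE; ++cpu) {
        if (CPU_ISSET(cpu, &set)) return cpu;
    }
    return -1;
}
```

Calling pin_current_thread_to_cpu(3) at the start of the EtherCAT cyclic thread would match the core selected by -a3 in the cyclictest command above; core 3 is only an example and depends on your system.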

Additionally:

Check slave AL status codes to identify the disconnection cause.
Verify that Distributed Clock (DC) synchronization is stable, as desynchronization may exacerbate the issue.

Let us know if you need further assistance or have additional logs to share.

weithegreat · 2024-12-04T07:50:11Z

I noticed you commented out ec_configdc(), unlike the simple test. Why is it not needed?
What is the right sequence for calling
ecx_dcsync0();
ec_configdc();
ec_config_map(&IOmap_);

and do they all have to be called in the PRE_OP state, or in SAFE_OP?

weithegreat · 2024-12-04T07:55:19Z

Also, I noticed you're calling ec_send_processdata immediately after ec_receive_processdata.
Is it possible to do some data processing after ec_receive_processdata and then call ec_send_processdata? Would that cause a problem?

ZeroErrControl · 2024-12-04T08:34:32Z

This is an excellent question. Here's what we found:

When using ec_configdc(), we noticed that the slave devices were unable to enter DC mode, which in turn prevented them from reaching the OP state. To investigate this, we analyzed the process of initializing eRob with the TwinCAT master (a standard EtherCAT master) by capturing network packets. During this analysis, we observed that the TwinCAT master writes the value 3 to the register at address 0x0981, and the value of object dictionary 1C32 is set to 2 to indicate the slave has correctly entered DC mode.

Based on these observations, we found that calling ecx_dcsync0() in the PRE_OP state, before ec_config_map(), satisfies these conditions. While ec_configdc() is theoretically meant to be called in the SAFE_OP state to calculate the master’s reference clock, we haven’t identified any significant differences when calling it in PRE_OP or SAFE_OP. You might want to try both and see how it works for your setup.

Thus, the correct sequence should be:

ecx_dcsync0() -> PRE_OP
ec_config_map() -> PRE_OP
ec_configdc() -> SAFE_OP

Let us know if you have any further questions or issues.
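For reference, the two values seen in the TwinCAT capture can be decoded with a small sketch like the one below. It assumes the standard EtherCAT slave controller layout for the DC activation register 0x0981 (bit 0 cyclic operation, bit 1 SYNC0, bit 2 SYNC1) and assumes the "2" in object 0x1C32 refers to the synchronization type entry, which the thread does not state explicitly. The function names are hypothetical.

```cpp
#include <cstdint>
#include <string>

// Bit masks for ESC register 0x0981 (DC activation), standard layout assumed.
constexpr std::uint8_t kCyclicOperation = 0x01;  // bit 0: sync unit active
constexpr std::uint8_t kSync0 = 0x02;            // bit 1: SYNC0 pulse generated
constexpr std::uint8_t kSync1 = 0x04;            // bit 2: SYNC1 pulse generated

// Hypothetical helper: describe a value written to register 0x0981.
std::string describe_dc_activation(std::uint8_t value) {
    auto on_off = [value](std::uint8_t mask) {
        return (value & mask) ? "on" : "off";
    };
    return std::string("cyclic=") + on_off(kCyclicOperation) +
           " sync0=" + on_off(kSync0) + " sync1=" + on_off(kSync1);
}

// Hypothetical helper: describe the synchronization type commonly stored in
// 0x1C32 (assumed here to be the value the capture showed as 2).
std::string describe_sync_type(std::uint16_t type) {
    switch (type) {
        case 0: return "free run";
        case 1: return "SM-synchronous";
        case 2: return "DC SYNC0";
        case 3: return "DC SYNC1";
        default: return "other";
    }
}
```

Under these assumptions, the value 3 corresponds to cyclic operation with SYNC0 enabled and SYNC1 disabled, and sync type 2 corresponds to DC mode synchronized to SYNC0.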

ZeroErrControl · 2024-12-04T08:43:33Z

Also, I noticed you're calling ec_send_processdata immediately after ec_receive_processdata.
Is it possible to do some data processing after ec_receive_processdata, then call ec_send_processdata? Would it cause problem?

We haven't done much testing on this yet. This example is primarily designed to address the issue where the SOEM master cannot bring eRob into the OP state, with simple extensions for enabling and motion functionality.

We recommend experimenting with this on your own setup, and getting the master running stably first before developing and optimizing the code further.
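If you want to experiment with processing between receive and send, one common pattern is to schedule each cycle against an absolute deadline so that the processing time does not stretch the period. The sketch below is untested against eRob and uses placeholder callables instead of the real SOEM calls; run_cycles, CycleStats and add_ns are hypothetical names. Note that the send instant still moves with the processing time inside each cycle, and this thread does not establish whether eRob tolerates that shift.

```cpp
#include <time.h>    // clock_gettime, clock_nanosleep (POSIX/Linux)
#include <cstdint>

struct CycleStats {
    int cycles;                // cycles completed
    std::int64_t max_late_ns;  // worst observed wake-up lateness
};

// Advance a timespec by ns nanoseconds (ns assumed to be non-negative).
static void add_ns(timespec& t, std::int64_t ns) {
    t.tv_sec += ns / 1000000000LL;
    t.tv_nsec += ns % 1000000000LL;
    if (t.tv_nsec >= 1000000000L) {
        t.tv_nsec -= 1000000000L;
        ++t.tv_sec;
    }
}

// Run a fixed number of cycles. receive/process/send are placeholders for
// ec_receive_processdata, the user's processing, and ec_send_processdata.
template <typename Receive, typename Process, typename Send>
CycleStats run_cycles(int cycles, std::int64_t period_ns,
                      Receive receive, Process process, Send send) {
    timespec next{};
    clock_gettime(CLOCK_MONOTONIC, &next);
    CycleStats stats{0, 0};
    for (int i = 0; i < cycles; ++i) {
        // The deadline is derived from the previous deadline, not from "now",
        // so time spent in process() does not accumulate as drift.
        add_ns(next, period_ns);
        clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME, &next, nullptr);

        timespec now{};
        clock_gettime(CLOCK_MONOTONIC, &now);
        std::int64_t late = (now.tv_sec - next.tv_sec) * 1000000000LL +
                            (now.tv_nsec - next.tv_nsec);
        if (late > stats.max_late_ns) stats.max_late_ns = late;

        receive();
        process();
        send();
        ++stats.cycles;
    }
    return stats;
}
```

Logging max_late_ns alongside the moment the fault appears may help show whether a late wake-up coincides with the error; the sketch ignores interrupted sleeps and cycle overruns for brevity.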

weithegreat · 2024-12-04T08:47:01Z

If I set the EtherCAT DC cycle time to 1 ms, what is the maximum delay before an eRob unit reports "master goes offline"? Can I set the cycle time to 2 ms but still send and receive data every 1 ms, to avoid the master-offline error? We have 20 devices on the EtherCAT line, 15 of them eRob actuators, and only the eRob units report that the master went offline. There has to be some threshold value I can set to avoid this error. It is catastrophic for our system: we can tolerate a delay in communication, but we cannot tolerate a master-offline error.

ZeroErrControl · 2024-12-04T09:31:57Z

This is a highly technical question. We will conduct experiments and tests based on your issue to try and resolve it. However, we currently do not have a specific solution. Here are our suggestions:

Try adjusting the watchdog timeout settings in SOEM.
Configure the DC mode offset time properly.
Capture the error codes reported when the master goes offline to facilitate further troubleshooting.
Set both the cycle time and the data transmission time to 2 ms for consistency.

Additionally, setting the DC cycle time to 2 ms while maintaining a transmission interval of 1 ms can also result in the master going offline.
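For the watchdog suggestion, the arithmetic below may help as a starting point. It assumes eRob follows the standard EtherCAT slave controller register layout (watchdog divider at 0x0400, process data watchdog time at 0x0420, 40 ns base tick), which this thread does not confirm, and it is not established here that the master-offline fault is driven by this watchdog at all. As far as we know SOEM has no dedicated watchdog call, so the values would be written with a generic register write such as ec_FPWR; the helper names below are hypothetical.

```cpp
#include <cstdint>

// Standard ESC register addresses (assumed to apply to eRob).
constexpr std::uint16_t kRegWatchdogDivider = 0x0400;
constexpr std::uint16_t kRegWatchdogTimeProcessData = 0x0420;

// Timeout in nanoseconds for a given divider and watchdog time value.
// One watchdog increment lasts (divider + 2) ticks of 40 ns each.
constexpr std::int64_t watchdog_timeout_ns(std::uint16_t divider,
                                           std::uint16_t time_value) {
    return (static_cast<std::int64_t>(divider) + 2) * 40 * time_value;
}

// Watchdog time value needed to reach a desired timeout, rounded down.
constexpr std::int64_t watchdog_time_for(std::uint16_t divider,
                                         std::int64_t timeout_ns) {
    return timeout_ns / ((static_cast<std::int64_t>(divider) + 2) * 40);
}
```

With the common defaults of divider 2498 and time value 1000, the increment is 100 µs and the timeout is 100 ms, so a watchdog at these defaults would not normally trip on a single late 1 ms cycle. If the fault appears after much shorter glitches, it is more likely raised by a different mechanism in the drive.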

weithegreat · 2024-12-04T09:43:19Z

How can I adjust the watchdog timeout settings in SOEM?

weithegreat · 2024-12-04T19:22:59Z

The "master station goes offline" problem is a real headache for us; only the eRob units report it.

ZeroErrControl · 2024-12-05T01:44:47Z

You can refer to the development documentation of the SOEM master station.

ZeroErrControl · 2024-12-05T01:46:00Z

We plan to use the SOEM master to reproduce the issue you mentioned. If there are any results, we will notify you at the earliest opportunity.

ZeroErrControl · 2024-12-06T07:33:23Z

We have removed the CPU affinity binding for Thread 1 and Thread 2 in the new program (eRob_test.cpp) to avoid unpredictable scheduling delays. After this change, we tested the program on an RT Linux system driving six eRob units, and it ran stably for over one hour. We recommend first testing with the eRob units enabled for an extended period. If they still drop out of OP, I will optimize the master program further.

weithegreat · 2024-12-06T07:37:24Z

Did you reproduce the "master goes offline" problem? I don't care about dropping out of OP; "master goes offline" is the fatal issue for us.
We're using an RT system with a core dedicated solely to the EtherCAT update thread, but the actuator drops out immediately if there is even a slight glitch in timing. Once the master station goes offline, there is no way to clear the fault: the actuator stays in fault and won't work.

ZeroErrControl · 2024-12-06T08:02:51Z

To be precise, I haven't fully understood what you mean by "master going offline." Could you provide the specific scenarios, the messages printed by the master, and the EtherCAT slave messages during the disconnection? This would help me better reproduce the issue. Previously, I interpreted the master going offline as the same as dropping out of OP state.

weithegreat · 2024-12-06T15:39:22Z

The EtherCAT device sends error code 0xA000 and enters the fault state.

Rohithossain007 · 2024-12-06T22:06:22Z

Hey, do you have a screenshot or anything else showing the error you are facing?

ZeroErrControl · 2024-12-16T09:53:09Z

Through my attempts with the SOEM master, I identified several key points for optimization:

Thread Isolation and CPU Affinity: Bind the EtherCAT thread to a specific CPU core and isolate the core to reduce interference from network management tasks.
State Monitoring and Automatic Recovery: Add state monitoring and automatic recovery mechanisms to ensure quick recovery in case of exceptions.
Thread CPU Affinity: Set CPU affinity for critical threads to minimize scheduling delays.
Improve System Real-Time Performance: Optimize the real-time kernel configuration and thread priorities to ensure timely execution of cyclic tasks.
Communication Timeout Handling: Implement timeout detection to prevent system stalls due to communication issues.

These improvements significantly enhance the stability and real-time performance of the SOEM master.

ZeroErrControl · 2024-12-16T11:22:42Z

In the latest upload, I have included the optimized PP mode master project, which has undergone multiple one-hour stability tests. Additional considerations have already been mentioned in my previous responses and will not be repeated here. The eRob_eCoder project can be used for testing purposes, but it should not be directly applied to real-world applications to avoid potential risks or unforeseen losses.
