-
-
Notifications
You must be signed in to change notification settings - Fork 722
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Possible race condition between Uart::write(data) and Uart::IrqHandler() ?? #708
Comments
Based on the title, I think #472 could also be related. I haven't looked at your issue in detail, but I think it's not exactly the same issue, but they might be good to compare. Also, I'd suggest you use |
Thanks @matthijskooijman for looking at this. However, the scenario I have identified by code inspection only occurs with an empty transmit Ringbuffer. It is a timing issue where interrupts are enabled and the samd Uart::IrqHandler is driven with a TXC interrupt from a previous write of a character to the UDR register AND this interrupts the Uart::write(data) method which is running in the normal (non IRQ) processing. I am assuming that any Uart IRQ interrupt will suspend non IRQ processing. I also assume there is only one processing thread on the MKRNB1500 Cortex processor. These seem like reasonable assumptions for a simple IOT device. I am using the write(data) code from the \samd\1.8.13\cores\arduino\Uart.cpp:
The window I am talking about is very small. It occurs when the Uart IrqHandler interrupts this code right after the else statement logic begins execution. The else logic is executed because the UDRE bit has not yet been set from the previous write on the Uart UDR register. If the UDRE bit is then set and TXC interrupt occurs right at the beginning of the else logic, the Uart IrqHandler would suspend this thread and begin execution. When the thread execution returns after the TXC interrupt is processed by the Uart IrqHandler, the while loop code is not executed because the transmit RingBuffer is empty. The data is then stored on the RingBuffer and UDRE interrupts are enabled. Unfortunately, the Uart IrqHandler will not get driven since the TXC interrupt for the previous write to the UDR has already completed processing. |
Yeah, I think those are indeed true. From a quick reading, your analysis sounds good (and it does seem plausible that the code has race conditions), though I have not looked at the code in detail. I also won't find time for that, so I will leave this for others to comment on for now (I just wanted to point out the other issue, which is also related but, as you say, indeed a different issue). |
To summarize I am running the modified Uart::flush() and Uart::write() methods documented at the top of this thread to make sure I don't hit the suspected issue I described above. I also believe these changes eliminate the need to define txBuffer as a SafeRingBuffer. There have been previous discussions about the lack of RingBuffer safety. See [https://github.com/arduino/ArduinoCore-API/issues/195] created by @KurtE. I also made a change to protect against a double read of the UDR. It looks like that could happen if there is a frame error.
I believe these modified Uart::flush() and Uart::write() methods would allow txBuffer to be defined as a simple RingBuffer for the following reasons:
I have not looked at the changes needed to allow the rxBuffer to be defined as a simple RingBuffer, however, @KurtE has discussed the need to eliminate the _numElems variable and only use the _iHead and_iTail indices in the RingBuffer class. This would require changes in almost all of the RingBuffer methods. I have made no changes in the RingBuffer class methods for txBuffer safety. |
I am working with a MKRNB1500 with MKRNB. Sometimes I see what appears to be an incomplete or no response from the SARA modem. The interface to the SARA modem is a UART serial interface.
I started looking at the UART interface code and I think I have identified a possible race condition between the Uart::write(data) method and the Uart::IrqHandler(). Here is the sequence:
I am not sure how the SARA modem would react to an incomplete AT command but it probably isn’t good.
In order to avoid a problem like this it seems like it would be better to first add the write data to the transmit RingBuffer and then use the Uart::flush() method to initiate the data transfer over the UART interface. The proposed code would require modifications to the Uart::write(data) and Uart::flush() methods:
Note that the Uart::flush() method must always be invoked to push the data out the UART interface. This doesn't seem to be much of a problem for the MKRNB1500 using the MKRNB library because the Modem.send() method in the Modem.cpp source file of the MKRNB library always does this.
I am now running a modified Uart.cpp with the above code changes to see if I still get my infrequent hangs. Note that the Uart::write(data) method above does not have the logic to address Issue #262. That issue only appears to happen when the Uart::write(data) is invoked with interrupts disabled or inside a higher priority ISR. Specifically, it appears that the Issue #262 reproduction test used println() commands inside an ISR to reproduce the problem and I am not doing anything like that. The Uart::write(data) issued on the SARA modem UART interface is only invoked in the background (non ISR) task priority.
I am not sure if I am missing something, and I would appreciate it if someone would comment on this hang scenario and possible solution.
The text was updated successfully, but these errors were encountered: