-
Notifications
You must be signed in to change notification settings - Fork 90
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
TMC2209 with SKR 2.0 using software serial now appears broken #136
Comments
I just hooked up a TMC2209 driver to a F446 Nucleo board and it initializes ok. This with the SKR 2 map and UART X routed to an available pin.
Changes were for bringing the code in line with other drivers and available for more STM32F4xx boards. If you decide to pull out the scope then the |
@terjeio
Then I started looking for what is different. There is a #define in my_machine.h for ESTOP_ENABLE in the latest code but was not present in the code I used last year. The default is commented out. When I leave it commented out, I cannot communicate with the 2209s. But if I uncomment the line, so ESTOP_ENABLE is 0, then I can communicate with the 2209s and all is as it was before. I do not understand why this would change anything. When I install new firmware, with it commented out I had to do a $14=71 to get rid of the ESTOP error. When I uncomment the line, I only need $14=7 to get everything going. Perhaps you know why this would affect the comms to the 2209s. If so can you explain to me what just happened? Regarding the uart timing: I noticed that the response baud rate coming back from the 2209 is actually faster than the baud rate sent to the 2209. At the very beginning of a read, the mcu is sampling the bit at approximately the center of the bit period, but after many bits, the sampling period slowly walks to later positions in time and eventually gets within ns of the bit transition from the 2209. This is not new behavior, it has existed since day 1, I just never noticed it before. I think the 2209 may not be measuring the bit width very accurately and with a baud rate of 100000. When it tries to reply with the same baud rate as we used to make the request, this is as close as it gets. Problem is we may be just on the edge of things working. I will look into the 2209 data sheet to see if there might be a baud rate that it can duplicate more precisely. Maybe a more traditional rate like 115200 would actually be better. If I make any improvements in this area I will let you know. Thanks, |
@terjeio Bill |
@terjeio What I have found is that things sort of work, but only if there is a single 2209 in the system. When I boot from this code, and I change the stepper driver selection to only include X, then I can indeed get status from that driver. Resetting the board then shows status that includes TMC=1. But at that point it does not like my Y or Z drivers. In contrast, the version on 2/5/2022 12:41:09 AM works just fine. Since this is related to 1 vs multiple 2209s, it may have been missed unless tested with multiple drivers. My quick comparison of the code from before and after did not find anything, but I was not looking to see how the X, Y, and Z calls worked, I only looked at the algorithm for the software serial which I thought was OK. The only changes in this checkin were related to the UART code. You may see an issue quicker than I can since I have to figure out how the multi-channel stuff works, but I will see if I can find it. Appreciate your help if you can. Thanks, |
Some initial findings: There is a bug in trinamic.c that will cause a hardfault if the X driver is not enabled. Change this line to After setting some breakpoints and reiniting the drivers multiple times I've found that the crashes seems to related to transmit, not receive. This is with a single physical driver connected to a single UART pin that is mapped to all logical drivers. |
Here is the end of a read at 38400 baud, sampling looks near perfect to me: I wonder if an interrupt delaying the next timer interrupt during transmission of the sync bits can affect driver baud rate detection? It is a pretty small window to hit though, and the timer interrupt has the highest priority. I am using UART to connect to the controller, USB might have longer latencies and thus be problematic? |
I took the latest code from a fresh clone. I deleted tmc_uart.c and replaced btt_skr_2.0.c and btt_skr_2.0_map.h with those from 2/5/2022. There was a warning and an error due to changes in enumerate_pins, but those are easily fixed. Basically, I wanted to see if going back to the old code works with everything else up to date. And it works just fine. I think I really need to dig out the debugger and just step through things, but being lazy I have started swapping old code for new code to see if I can quickly determine what the culprit is. I expect it will turn out to be something very subtle. I will let you know once I find it. Bill |
If we do delay one of the bits from another interrupt, as long as the delay is less than about 1/2 the baud rate we should be OK. My theory is that on a read we are trying to sample in the middle of the bit. We could be delayed all the way until the end of the bit and still get the same sample value. On a write, I am thinking the UART inside the 2209 is more sophisticated and could tolerate a comparable amount of jitter. My observation of the sample point walking toward the end of the bit cycle worries me. This is caused by 2209 behavior as the SKR uses the same timer for transmit as for receive. The 2209 has to measure the baud rate of the SKR during the request, and attempt to output its data using the same baud rate, but I think it is only able to get close, not perfect. With other interrupts causing delays, I can see the SKR getting a bad sample. I think once I understand why the new uart code does not work I will see if there is something that can be done to make this a bit more robust as well. |
Try this with baud rate set to 100000:
It sends a longer initial start bit - and seems to fix the issue. Bedtime is long overdue so I quit for today.... |
Disregard the previous comment, too tired when I wrote it so forgot a change elsewhere. The solution seems to be to resync the timer on each byte (wait for the start bit), the following version seems to be robust at 200 Kbaud, I have not tested any faster. FYI I use 230400 baud for hardware UART, I have not seen any problems with that so the drivers can handle it. Relying on no delay between bytes from the driver was a tad too optimistic? |
Terje, Did you eliminate the slave addressing at some point? |
Terje, Now the mystery is solved for me. Now that I have my debug environment all set up, I will take a look at your latest code you sent me. By the way, regarding re-syncing for each byte: The 2209 data transfer is not really asynchronous. When it sends data back, it sends it as a block, not as a bunch of individual bytes, so it should not be necessary to re-sync per byte. However, I don't see any down side to it other than a bit more work per byte to locate the START bit. If that means being able to run faster, that is probably a good thing, too. |
When using separate UARTs for each driver slave addressing does not make sense, and some boards does not even allow it. So it is disabled in the shared code. I missed that when I switched to the shared code for the SKR2, sorry about that.
I am lazy to look it up myself - is that stated in the datasheet? Anyway, drifting of the sample point that I see is not always present and only randomly causes read failures - sometimes very seldom. Either the initial sync bits sent from the controller is sligthly out of timing (due to another interrupt interfering?) or the response is. Anyway, resyncing helps so whatever triggers the drifting does not matter. From the datasheet: "The bit time is calculated by measuring the time from the beginning of start bit (1 to 0 transition) to the end of the sync frame (1 to 0 transition from bit 2 to bit 3)." |
Terje, Now regarding your code resampling for START every byte: I like it! The only change I would make is you have called stop_listening() in a few places where SetTx() had been called. In all cases, either a very long timeout has occurred or the mandatory 12 bits times have already elapsed, so waiting another 5 bit times I think is unnecessary. And since START_DELAY is no longer used, I would just pull it out of the code to eliminate any confusion. If a bit error occurs, there should be a CRC error. Does the driver detect this and try again? |
I have reverted this already, I made the change when experimenting and having trouble with the timer IRQ enabled state.
It is detected and the command silently fails. Retries can be added but should be done with raising an alarm if times out? |
Terje, thanks |
You have to add
Not yet, a programmers reference manual (PRM) should be written - but I have no idea about when I am able to do that. I have started to add markup to the source though, this in order to generate some of the documentation. See here for both the HAL API and some core helper functions. |
Terje, I never noticed before, but I apparently did not have 100% communication with my 2209s. I could reset the board and watch the traffic on my scope and maybe 1 out of 10 or 20 times, things would croak on a driver. I could get the Y driver to fail pretty consistently in the same place, and it was always a read where it just did not respond. I figured it must be getting a CRC error on the read request. This would be a good argument to have error checking on the reads and writes. So that gave me an idea. I added some code to TMC2209_WriteRegister() to check the ifcnt value after each write, and retry the write until the count incremented. I added some code the TMC2209_ReadRegister() to retry if it timed out or if it got a CRC error. With these changes in place, I am unable to see any failures during initialization, but things take a bit more time. Then I switched to the new driver that re-syncs on every received byte, and again could not see any problems. I removed my error checking and still cannot see any problems, but I don't really have any formal test. What are your thoughts regarding some error checking when using software serial? I see that during driver init that the code checks to make sure the writes were good, but after that the return status of _WriteRegister an _ReadRegister is ignored. I think retrying when an error occurs could be beneficial, especially if the return codes are ignored. Is there a way to raise an alarm if some maximum number of retries was exceeded? Not sure what else makes sense to do. Thanks. |
I have finally returned to my CNC project and built latest code for BTT SKR 2.0 with TMC2209. It appears the software serial code no longer works, or I need to configure something else that I am not aware of. The last build I made was 1/24/2022 and that works.
I took a quick look at the new tmc_uart.c file and compared it to the code from btt_skr_2.0.c where it used to be. I found a couple things that looked to be issues but I have been unsuccessful fixing them. I am trying to avoid pulling out my scope to look at the comms:-).
The status in iosender does not show a TMC in NEWOPT the way it did last year. But it does show Trinamic as a PLUGIN. My old firmware shows TMC=7. I cant do a M122 or M122I since the comm doesnt work.
I am happy to help resolve this, but need a starting point. There are a lot of changes to the code that I was not a part of.
Thanks!
The text was updated successfully, but these errors were encountered: