Sonoff RF bridge OB38S003 Portisch FW freezes after 24-48 hours #19
I got some problems right after receiving one long code (66 bytes, I think). It seems to keep receiving, but beeping causes issues. This might be something other than your problem, but I will dig into mine further, and hopefully we can find a resolution for both. I already added a stack trap for the last two bytes of internal RAM, and it does not seem to be a stack overflow, although I am not really sure about that. I plan to add traps after the arrays in external RAM to see whether those are overflowing.
I think my problem with beeping after receiving a large bucket is a different bug, as nothing seems to be overflowing. I need to add more traps, keep the board running until it stops receiving buckets, and then check the traps. It will take some time.
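Below is a minimal host-side sketch of the "trap" idea. It fills a few bytes past a buffer's usable area with a known pattern and checks later whether they changed. The names `region`, `trap_init`, `trap_intact` and `unchecked_write` are hypothetical, as are the sizes. None of them come from the repository. The guard lives in the same array here so the demonstration write stays defined C. On the real firmware the trap is a separate variable, and whether it sits right after the array depends on the linker's memory map.

```c
#include <stdint.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical sizes; the real firmware's arrays are sized differently. */
#define DATA_LEN   8u    /* usable bytes in the buffer */
#define GUARD_LEN  2u    /* trap bytes placed right after the usable part */
#define GUARD_BYTE 0xA5u /* pattern that should never be written by valid code */

/* One allocation: usable part followed by the guard, so an off-by-one
   write lands in the guard instead of in some unrelated variable. */
static uint8_t region[DATA_LEN + GUARD_LEN];

void trap_init(void)
{
    memset(region, 0, DATA_LEN);
    memset(region + DATA_LEN, GUARD_BYTE, GUARD_LEN);
}

/* Returns true while every guard byte still holds the pattern. */
bool trap_intact(void)
{
    for (uint8_t i = 0; i < GUARD_LEN; i++) {
        if (region[DATA_LEN + i] != GUARD_BYTE) {
            return false;
        }
    }
    return true;
}

/* Deliberately unchecked write, standing in for the code under test.
   The caller must keep index < DATA_LEN + GUARD_LEN. */
void unchecked_write(uint8_t index, uint8_t value)
{
    region[index] = value;
}
```

If an overflowing write happens to store the guard value itself, it goes unnoticed. A trap can only show that an overflow occurred. It cannot prove that none did.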
I found the beep bug. https://github.com/mightymos/RF-Bridge-OB38S003/blob/main/src/main_portisch.c#L880 should be I think I found the B1 overflow too, but I am not sure about that: the build with the memory traps keeps working, yet the memory trap after the buckets (https://github.com/mightymos/RF-Bridge-OB38S003/blob/main/src/portisch.c#L53) gets changed. I think the check here https://github.com/mightymos/RF-Bridge-OB38S003/blob/main/src/portisch.c#L729-L733 should be done before writing the duration on row 724. As I understand it, the code currently checks the overflow condition after the overflow has already happened, but correct me if I am thinking about this wrongly. There seems to be the same kind of bug with RF_DATA (https://github.com/mightymos/RF-Bridge-OB38S003/blob/main/src/portisch.c#L737-L755), if I understand the logic correctly. I haven't tried those changes yet. I tried to understand from the memory map what happens with the overflow. It seems to overwrite the bucket_sync variable, and I just don't understand why that disables B1 listening. Maybe I don't need to understand everything. Or maybe my build with the memory traps maps the memory differently from the original.
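The ordering problem raised above can be illustrated with a sketch of a bounded store that tests the capacity before the write. `MAX_BUCKETS`, `buckets`, `bucket_count` and `bucket_store` are hypothetical stand-ins. They are not the identifiers used in portisch.c.

```c
#include <stdint.h>
#include <stdbool.h>

/* Hypothetical capacity and names, not taken from portisch.c. */
#define MAX_BUCKETS 8u

static uint16_t buckets[MAX_BUCKETS];
static uint8_t  bucket_count = 0;

void bucket_reset(void)
{
    bucket_count = 0;
}

/* Stores one pulse duration. The bound is tested BEFORE the write.
   The ordering suspected in the thread is "write, then test". With that
   ordering, by the time the test fails, one element past the end of
   buckets[] has already been overwritten, along with whatever variable
   the linker placed there. */
bool bucket_store(uint16_t duration)
{
    if (bucket_count >= MAX_BUCKETS) {
        return false;            /* table full: caller should drop this frame */
    }
    buckets[bucket_count] = duration;
    bucket_count++;
    return true;
}
```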
I'm sorry, this goes beyond what I know about these microcontrollers. I should also report something strange: my window blind (B0 transmit) goes up on its own without anyone's input. I checked the ESPHome logs and found nothing.
@otsoni thanks for the catch on the beep. I think you are right, and I think I fixed it. I do not have the buzzer installed on my board, so I do not actually hear any beep. I will look at the original Portisch logic and see whether it has the issues you highlighted. I think I previously made a mistake by setting the state for the state machines in the main file instead of splitting it between the two files. I think it is back to the original Portisch steps now, but I only did a quick check that the hardware could send, receive, etc., with no long-term testing. I will eventually need to set the state in one file, because the logic is too confusing to follow when the state is set in two files. For now, though, I am hoping that matching the original Portisch will make testing more stable. Sorry for all the testing needed. I will make the code more readable later. A new release is published. EDIT: I am concerned about the blinds opening on their own and will try to focus on that soon. I observed that once ESPHome receives a bucket decoding (0xB1), it sends back an acknowledge. On the original Portisch, receiving an ACK kicks us out of sniffing mode and back to standard decoding (as tested manually with Tasmota). The same now happens with ESPHome. So my question is: is there a reason to leave it in sniffing mode, or was that just to stress the firmware and look for overflows or buggy behavior?
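The mode change described above can be sketched as a small transition function. The command values are assumptions. 0xB1 is the bucket-sniffing command mentioned in the thread, and 0xA0 is assumed to be the ACK byte. The enum and function names are hypothetical. Real firmware also changes mode on other commands, which this sketch ignores.

```c
#include <stdint.h>

/* Assumed command bytes: 0xB1 starts bucket sniffing (from the thread);
   0xA0 is ASSUMED to be the acknowledge command. Verify both against the
   Portisch command list before relying on them. */
#define CMD_ACK          0xA0u
#define CMD_SNIFF_BUCKET 0xB1u

typedef enum {
    MODE_STANDARD_DECODE,  /* normal decoding, e.g. 0xA4 reports */
    MODE_SNIFF_BUCKET      /* 0xB1 bucket sniffing */
} rf_mode_t;

/* Sketch of the behavior observed with original Portisch: an ACK from the
   host ends sniffing and returns to standard decoding. In this sketch,
   any other command leaves the mode unchanged. */
rf_mode_t next_mode(rf_mode_t current, uint8_t command)
{
    switch (command) {
    case CMD_SNIFF_BUCKET:
        return MODE_SNIFF_BUCKET;
    case CMD_ACK:
        return MODE_STANDARD_DECODE;
    default:
        return current;
    }
}
```

This matches the "sniffs one code then stops" behavior reported later. If the host acknowledges each 0xB1 report, sniffing ends after the first one.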
Thanks guys for looking into it. The bridge was not in sniffing mode, just in standard decoding. I am not sure why the bridge is sending commands on its own; no one is touching the physical remote either.
I just observed that my original Portisch black box receives 00C303 codes after I flash a new YAML to the white Sonoff. My guess is that ESPHome executes a default action on startup so that the visual toggle in HA matches the last executed action. However, I am not ready to blame ESPHome until this behavior can be confirmed. EDIT:
I still think this should fix the code-receiving problem. I can't make a build for you with those fixes, as for some reason I get I think I could try to use another RF bridge in passthrough mode to send a code with too many buckets and test whether I can get the other RF bridge to freeze straight away.
Nice that you found the bug. Next I will try to create a radio signal that causes the bridge to freeze. I think it needs at least 9 different bucket lengths. I hope I can create it with the passthrough firmware and ESPHome. I'll let you know when I have more information.
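One way to check a candidate test signal before sending it is to count how many distinct bucket lengths its pulse durations would need. This sketch assumes a capacity of 8 buckets, inferred from the "at least 9" remark. It also assumes a hypothetical matching tolerance of 100 µs. The real firmware may group durations differently, and `distinct_buckets` is a hypothetical helper.

```c
#include <stdint.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumptions: the decoder holds at most 8 buckets, and two durations share
   a bucket when they differ by no more than TOLERANCE_US. Both values are
   hypothetical. */
#define BUCKET_CAPACITY 8u
#define TOLERANCE_US    100u

static bool close_enough(uint16_t a, uint16_t b)
{
    uint16_t diff = (a > b) ? (uint16_t)(a - b) : (uint16_t)(b - a);
    return diff <= TOLERANCE_US;
}

/* Counts how many distinct buckets a pulse train would need by matching each
   pulse against the buckets seen so far. Counting stops at
   BUCKET_CAPACITY + 1, which is enough to show that the table would overflow. */
size_t distinct_buckets(const uint16_t *pulses, size_t n)
{
    uint16_t seen[BUCKET_CAPACITY + 1];
    size_t count = 0;

    for (size_t i = 0; i < n && count <= BUCKET_CAPACITY; i++) {
        bool matched = false;
        for (size_t j = 0; j < count; j++) {
            if (close_enough(pulses[i], seen[j])) {
                matched = true;
                break;
            }
        }
        if (!matched) {
            seen[count++] = pulses[i];
        }
    }
    return count;
}
```

A frame intended to trigger the overflow should make `distinct_buckets` return `BUCKET_CAPACITY + 1`.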
Support for learning mode has been added. The Portisch port should now be feature complete with the original. I do not fully understand it, but Portisch computes a checksum along with an 800 millisecond delay for repeated codes. I also made some organizational changes to make the state machines more readable for me. @otsoni If you are making changes to the code, you need to monitor the .MEM file to make sure you are not exceeding the microcontroller memory limits (RAM, XRAM, flash). Thanks for trying out your testing strategy; it is difficult for me to test on my own. @zd3sf Thanks for the additional work with the blinds. I know your example YAML is probably diverging from mine. If you want to consolidate the changes later, or post your own custom YAML file, I can include it in the example folder and leave it unedited once it seems complete.
Thanks @mightymos, we can merge the YAMLs at some point. Most of the changes are situation-specific and not related to the general operation of the bridge. I added one line
I tried the latest firmware. Standard and bucket receive work as intended, but I couldn't get transmit to work at all, either standard or bucket. I tried re-sniffing the codes and transmitting them, but that didn't work.
I have been monitoring the .MEM file, but it gives me the error even when building the main branch without any changes. I will create a separate issue about that.
My bad. It seems that my Ubuntu WSL had an old version of SDCC (I think it was 4.0.0) in its repositories, and updating to 4.4.0 seems to fix the build problems. I will continue debugging the freezing problem.
@zd3sf I just realized that your freezing problems are with 0xA4 codes, while I have been using only 0xB1 sniffing. The buffer overflow problems seem to be related only to 0xB1 sniffing, not to 0xA4. So there might be two separate freezing problems (or one cause behind both, and I have just been tracing the wrong things).
I made a new release to declare the port feature complete with learning mode. During development I used one of the hardware timers for a software UART to output debug information. Anyway, using two timers for delays was the correct architecture choice. If it is still not working with your devices, I will need to hook up an oscilloscope to check the signal timings. If you could once again be patient and try it, that would help. I hope for good news.
Ah okay, good catch. |
Okay, testing 0.4.8: standard transmit/receive is working fine. For the door sensor, I receive 10 signals per trigger (like rcswitch), while the broken 0.4.7 would receive only one due to the checksum timer.
B0/B1 are working; I can control the blinds. As in previous versions, bucket sniffing captures one code and then stops, so you have to reset the MCU and start sniffing again. That doesn't bother me, because I only need the codes for the blinds.
I disabled the MCU reset automation in Home Assistant to see whether the MCU will freeze.
Freezing happened overnight. Standard and B0 transmit kept working, but standard receive stopped working until an MCU reset.
This affects receive only. Transmit keeps working fine!
Sonoff RF bridge V2.2 with [v0.4.5] portisch firmware and ESPHome 2024.11.2
After 24-48 hours of continuous operation, the MCU stops receiving 0xA4 codes. Restarting the ESP doesn't resolve the problem.
Workaround: reset the MCU using rfraw AA FE 55, briefly send a command (e.g. bucket sniffing, advanced transmit, etc.), or unplug the power. Any of these works! The MCU can be reset on a schedule using Home Assistant or ESPHome.
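If the scheduled reset comes from something that writes raw bytes to the MCU's UART, the frame consists of exactly the three bytes quoted above. Below is a sketch of building such frames. It assumes the pattern visible in `AA FE 55`: start byte, command byte, optional payload, end byte. `build_frame` and the constants are hypothetical names. Length fields or escaping that other commands may need are not handled.

```c
#include <stdint.h>
#include <stddef.h>

/* Framing inferred from the "AA FE 55" reset command: start byte 0xAA,
   one command byte, optional payload, end byte 0x55. Length fields and any
   escaping that other commands may require are NOT handled here. */
#define FRAME_START 0xAAu
#define FRAME_END   0x55u
#define CMD_RESET   0xFEu

/* Writes the frame into out (capacity out_cap bytes). Returns the frame
   length, or 0 if out is too small. payload may be NULL when payload_len
   is 0. */
size_t build_frame(uint8_t cmd, const uint8_t *payload, size_t payload_len,
                   uint8_t *out, size_t out_cap)
{
    size_t needed = payload_len + 3u;   /* start + command + end */

    if (out_cap < needed) {
        return 0;
    }
    out[0] = FRAME_START;
    out[1] = cmd;
    for (size_t i = 0; i < payload_len; i++) {
        out[2 + i] = payload[i];
    }
    out[2 + payload_len] = FRAME_END;
    return needed;
}
```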
Logs: none yet. I just changed the ESPHome log level to Debug and will record the results in an update.
Update: no logs from the MCU show up in ESPHome.