[Bug] MI Inverters not working with startFastWrite usage #1434

Open
1 task
joerocklin opened this issue Feb 17, 2024 · 26 comments

@joerocklin

Platform

ESP32

Assembly

I did the assembly myself

nRF24L01+ Module

nRF24L01+ plus

Antenna

external antenna

Power Stabilization

Elko (~100uF)

Connection picture

  • I will attach/upload an Image of my wiring

Version

0.8.83

Github Hash

5ebfe5a

Build & Flash Method

VSCode - Platform IO (build & flash)

Setup

I have MQTT set up, pin settings for my device (a feather huzzah32), and updates for region/timezone. I don't know what else has been changed.

Debug Serial Log output

No response

Error description

The commit in 0.8.77 (1534952) caused all communication with my MI inverters to cease. I've tracked it down to the change in hmRadio.h from mNrf24->startWrite to mNrf24->startFastWrite:

mNrf24->startFastWrite(mTxBuf.data(), len, false, true); // false (3) = request ACK response; true (4) reset CE to high after transmission

Changing this back to the following restores communication with the inverters:

mNrf24->startWrite(mTxBuf.data(), len, false); // false = request ACK response

The documentation for the startFastWrite call mentions needing to call txStandBy, and I don't see that anywhere. I'm not very familiar with the ahoy code, so I'm not sure where it would be appropriate to put that call.
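To make the difference between the two calls concrete, here is a minimal sketch of how each one leaves the CE pin, following the behavior described in the RF24 documentation. `MockRadio` and its `ceHigh` flag are hypothetical stand-ins for illustration only; they are not part of RF24 or Ahoy, and the real methods also transfer the payload over SPI.

```cpp
#include <cstdint>

// Hypothetical model of the CE pin handling only. It is NOT the RF24 class:
// no SPI traffic, no FIFO, no timing. Method names mirror the RF24 calls
// discussed above.
struct MockRadio {
    bool ceHigh = false;  // current CE pin level

    // startWrite(): CE is pulsed high briefly, then released again.
    void startWrite(const void* /*buf*/, uint8_t /*len*/, bool /*multicast*/) {
        ceHigh = true;   // begin transmission
        ceHigh = false;  // pulse over, CE is low when the call returns
    }

    // startFastWrite(): with startTx == true, CE goes high and stays high.
    void startFastWrite(const void* /*buf*/, uint8_t /*len*/, bool /*multicast*/,
                        bool startTx = true) {
        if (startTx) ceHigh = true;
    }

    // txStandBy(): the real call waits for the TX FIFO to drain first;
    // this model only drops CE.
    bool txStandBy() {
        ceHigh = false;
        return true;  // assumption: the transmission succeeded
    }
};
```

In this model, CE is low again after `startWrite()` returns, but stays high after `startFastWrite()` until something such as `txStandBy()` pulls it low.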

@joerocklin joerocklin added the bug Something isn't working label Feb 17, 2024
@lumapu
Owner

lumapu commented Feb 17, 2024

@rejoe2 @tictrick have you heard of the txStandBy command?
I can't imagine that MI inverters aren't accessible any more because rejoe2 has at least 4 of them.

@knickohr

Please specify your MI-Inverters.

@joerocklin
Author

I have MI-1200s and possibly a couple of MI-1000s.

I'm not 100% sure though. Part of my interest in this project was because I don't think the people who installed my system used what they told me they were going to.

@rejoe2
Contributor

rejoe2 commented Feb 18, 2024

Do the serial numbers start with 1061 (2nd gen MI) or 1062 (3rd gen)?

Nevertheless, issuing a txStandBy() when entering the rx mode in hmRadio might be a good idea.
We did some tests with 2nd gen 4ch MI as well, so basically the startFastWrite() command imo isn't the root cause.
But there is at least one known bug with MI (state machine) breaking MQTT coms in some cases after new day has started.

@joerocklin
Author

joerocklin commented Feb 18, 2024

All of them start with 1061. I know I have 4 inverters each with 4 panels connected and 2 more that each have 3 panels. The serial numbers on them make me think they are all 1200's.

I was wondering about the MQTT pieces as well. I'm happy to help figure things out any way I can. I'm comfortable with C/C++, just don't know a lot about Ahoy codebase.

@rejoe2
Contributor

rejoe2 commented Feb 19, 2024

All of them start with 1061. I know I have 4 inverters each with 4 panels connected and 2 more that each have 3 panels. The serial numbers on them make me think they are all 1200's.

1061 imo just is "MI (2nd Gen), 4 channels". Could be anything from MI-1000 to MI-1500. Afaik, only some additional info requested from the inverter reveals more on the max. power, that's encoded on the inverter's EEPROM (besides that hardware seems to be identical...).
In hmDefines.h (starting with line 295) you'll find a list on the known subnumbers and the max power (more or less, at least afa we know).

I was wondering about the MQTT pieces as well. I'm happy to help figure things out any way I can. I'm comfortable with C/C++, just don't know a lot about Ahoy codebase.

For better understanding and communication with us what's going on, I'd recommend to activate "Serial debug" and "privacy mode" in "Settings/System config". Then you'll get a shortened output on "Webserial". (I assume, with 4ch MI you'll get a lot of buffer overflow messages as well...)

As Ahoy is very "asynchronous" in what's happening, here's a rough walkthrough:
app::tickSend() checks if it's time to send any request to the inverters and will enqueue the respective command. In case of 4ch MI this basically will be 0x36, in the starting phase there will be some other requests too, e.g. to get the power details mentioned above.
Communication.h (loop) will then check if there's anything to be sent out now and will forward the next command to the respective radio. After the message has been sent out via nRF24, this transceiver will raise an interrupt. loop() in hmRadio.h (for HM/MI) will then react to that and switch to rx mode.

In case of 2 and 4ch MI, after reception of the first (data) response, the code will follow up with more requests (in case of 4ch: 0x37-0x39).
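The follow-up chain for a 4ch MI can be sketched as a small lookup. This is a hypothetical helper, not taken from the Ahoy sources. It assumes what the log at the end of this comment shows: requests 0x36 to 0x39, each answered by a frame whose first byte is the request byte with bit 7 set (b6 to b9).

```cpp
#include <cstdint>

// Hypothetical constants, derived from the log excerpt in this comment.
constexpr uint8_t kFirstRequest = 0x36;  // first 4ch MI data request
constexpr uint8_t kLastRequest  = 0x39;  // last follow-up request
constexpr uint8_t kResponseFlag = 0x80;  // bit 7 marks a response (0x36 -> 0xb6)

// Returns the next request to send after receiving `responseCmd`,
// or 0 when the chain is complete or the byte is not part of it.
uint8_t nextMiRequest(uint8_t responseCmd) {
    if ((responseCmd & kResponseFlag) == 0) {
        return 0;  // not a response frame
    }
    const uint8_t request = static_cast<uint8_t>(responseCmd & ~kResponseFlag);
    if (request < kFirstRequest || request >= kLastRequest) {
        return 0;  // outside the chain, or the last answer (0xb9) arrived
    }
    return static_cast<uint8_t>(request + 1);
}
```

With this sketch, a response of 0xb6 leads to request 0x37, and 0xb9 ends the chain, which corresponds to the "got all data msgs" line in the log.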

For an example output of a 4ch MI see e.g. https://drive.google.com/drive/folders/1BwN2WV4zxEumq4SlQfkA3gD4yJeOG9V2 and code at the end here.

As far as we know, the change to startFastWrite() is ok, as

If the auto retransmit/autoAck is enabled, the nRF24L01 is never in TX mode long enough to disobey this rule.

The change was only meant to leave the CE pin high to improve the chance of getting ACK messages from the inverters (when using a pa+lna version nRF24 board). So in case this really changes your rf results, this may be due to the fact that the code is now faster at issuing the follow-up commands (0x37 etc.). In that case we should look for other solutions rather than lowering the chances of being fast enough for most inverters to get all message frames...
Some more explanation on that: my personal assumption is that Hoymiles noticed this way of communicating to be rather inefficient (one request per channel), so they changed it to an appropriate number of answer frames to a single (more complex) request. So my 3rd gen MI-1500 will answer a "production data request" (MainCmd 0x15 / SubCmd 0x0b) with 4 response frames, ideally without the need for more than one request to be sent out for the entire result.
The Ahoy code base is more orientated towards this kind of complex request => multiframe answer, the "MI way" is always some kind of "strange exception"...

Hope this will help you to get into the code.

12:41:39.098 I: (#1) Radio infos: 0 4 0 0 0 | t: 3, s: 3, f: 0, n: 0 | p: 0
12:41:39.099 I: (#1) TX 11 CH23, 9 ret., rx offset: 2 | 36 75 36
12:41:39.168 I: (#1) RX  22ms | 27 CH61 | b6 01
12:41:39.169 W: (#1) next request (5 attempts left): 0x37
12:41:39.170 I: (#1) TX 11 CH23, 9 ret., rx offset: 2 | 37 42 00
12:41:39.262 I: (#1) RX  44ms | 27 CH61 | b7 01
12:41:39.263 W: (#1) next request (6 attempts left): 0x38
12:41:39.264 I: (#1) TX 11 CH23, 9 ret., rx offset: 2 | 38 4d 00
12:41:39.333 I: (#1) RX  22ms | 27 CH61 | b8 01
12:41:39.334 W: (#1) next request (7 attempts left): 0x39
12:41:39.335 I: (#1) TX 11 CH23, 9 ret., rx offset: 2 | 39 4c 00
12:41:39.402 I: (#1) RX  19ms | 27 CH61 | b9 01
12:41:39.403 I: (#1) got all data msgs

@rejoe2
Contributor

rejoe2 commented Feb 19, 2024

Additional remarks:
As pa+lna is kept enabled, this may slightly consume more power than before. So make sure to use a power stabilisation capacitor as recommended!

@rejoe2
Contributor

rejoe2 commented Feb 22, 2024

@joerocklin - any news on this?
Note: there's been an update on the MI state issue as well, but unfortunately this is more or less just to get the code more readable and more logically structured. To be honest: I no longer understood what I had coded before...

The problem itself seems to be that the Alarm array gets reorganized somehow at some point, so the single values stored there are no longer in the same place as before. (Or there's another mistake I haven't been able to find so far.)

@joerocklin
Author

I haven't been able to get back into the code too much yet. I did pull in the changes and tested again though. A quick check just showed that it still doesn't seem to get anything from the inverters with the startFastWrite call (I waited about 5 minutes). Putting the startWrite call back started receiving after the 3rd comm loop.

@rejoe2
Contributor

rejoe2 commented Feb 22, 2024

Would you mind adding some serial output from both versions?
For the settings see

For better understanding and communication with us what's going on, I'd recommend to activate "Serial debug" and "privacy mode" in "Settings/System config". Then you'll get a shortened output on "Webserial". (I assume, with 4ch MI you'll get a lot of buffer overflow messages as well...)

@rejoe2
Contributor

rejoe2 commented Feb 22, 2024

Additionally: please provide some more info on the nRF module you are using. As you mentioned an external antenna, it's a pa+lna version, right? Shielded or unshielded? Are you sure there's a genuine Nordic transceiver in it?

@joerocklin
Author

I am using a pa+lna version. It is not shielded. I thought it was an authentic Nordic transceiver, but it looks like there's a round dot on the IC, so I would call that suspect.

Here are some logs, pulled from a direct serial connection and not the webserial output (I missed the boot part of the startFastWrite, but it starts before wifi is connected):
MI-StartWrite.log
MI-StartFastWrite.log

@rejoe2
Contributor

rejoe2 commented Feb 22, 2024

OK, so a "blob" nRF module is really suspicious... In the good old MySensors days these had often been described as "rubbish", esp. the range is really poor with these afai remember.

But: First make sure, Ahoy can reach a NTP server (or set time with your browser). Just yesterday we had a (regular HM) user here with a rather similar log without any reception and no time info in it (see #1440 (comment)).

Additionally: Even when it's working with startWrite(), the rf performance is rather poor. Giving a slightly increased power level a try might be a good idea (no guarantee, esp. with the "blob" "nRFs").

@rejoe2
Contributor

rejoe2 commented Feb 22, 2024

@lumapu - any idea why keeping CE high seems to conflict with a non set time in the mcu? Should we actively put it to low after sending is over?
Additionally: In the oscilloscope pictures you showed on discord to evaluate the CE state with both of the functions (and before as well!), the CE state was HIGH long (?) before the Write command had been transferred to the nRF. This imo is suspicious. CE should be low as long as we neither send nor wait for frames from the inverters. Or did I miss something?

@lumapu
Owner

lumapu commented Feb 22, 2024

I have no idea and I see no correlation between both. The time should be synced directly after booting the ESP.
Are there more logs than the two in the previous post that you refer to?

@rejoe2
Contributor

rejoe2 commented Feb 22, 2024

I have no idea and I see no correlation between both. The time should be synced directly after booting the ESP. Are there more logs than the two in the previous post that you refer to?

No, and there's no confirmation from @joerocklin until now on startFastWrite() working as expected as soon as time is synced. Nevertheless this coincidence imo is remarkable, so let's see what @joerocklin reports on that...

I had a look in the code wrt. why there's the tx log entry at all, even though it seemed to be "night time" in the other case. @joerocklin - do you have "no communication at night" activated as well? Does deactivating this feature make a difference also when using startFastWrite()?
@lumapu - not sure, but perhaps we have the commands sent out in the startup phase but no longer listen if it's "night"? Strange...

@joerocklin
Author

@rejoe2 I do have 'Pause communicate at night' activated for all inverters.

I tried pushing the power up to 'High', but that didn't show any change. So I tried the 'Max' setting, and I started getting data. But then it stopped communicating with all inverters. I reverted back to the startWrite method and lower power and it started working. I'll see if I can move the board closer to the inverters this weekend and do some more testing.

@rejoe2
Contributor

rejoe2 commented Feb 24, 2024

@rejoe2 I do have 'Pause communicate at night' activated for all inverters.

Do you have a synced time? Or are the missing timestamps just caused by using USB serial debugging? (You may simply use the Webserial debug. At least here, it takes around 11 seconds to log in there; the output starting from then on is sufficient imo.)

I tried pushing the power up to 'High', but that didn't show any change. So I tried the 'Max' setting, and I started getting data. But then it stopped communicating with all inverters. I reverted back to the startWrite method and lower power and it started working. I'll see if I can move the board closer to the inverters this weekend and do some more testing.

Sounds to some extent like a problem with the power consumption of the longer activated pa+lna? (This is the effective difference between the two "write" methods.) Could you please try a different power supply (and/or just a different USB cable) for the DTU?

@rejoe2
Contributor

rejoe2 commented Mar 1, 2024

@joerocklin - any news?

@joerocklin
Author

Work and life got really busy and I haven't had a chance to sit down and investigate.

@rejoe2
Contributor

rejoe2 commented Mar 5, 2024

OK, thanks for the short update.

@stefan123t
Collaborator

stefan123t commented Oct 29, 2024

The documentation for the startFastWrite call mentions needing to call txStandBy, and I don't see that anywhere. I'm not very familiar with the ahoy code, so I'm not sure where it would be appropriate to put that call.

@joerocklin @rejoe2 you may want to read up on our discussion in the upstream nRF24/RF24#877

@2bndy5 could you have a quick look at this and give us a short recap of the right way to use the RF24 lib, please 😊 ?
startWrite vs. startFastWrite and what the txStandBy may be used for.

@2bndy5

2bndy5 commented Oct 30, 2024

We had a brief discussion about the various write*() confusion (see nRF24/RF24#816). We ended up augmenting the docs to try and relieve the confusion (see "which write should I use?").

Some opinionated ranting

Personally, I never liked how manicBug originally wrote the multitude of write*() methods (over 10 years ago). They all focus on what I call "babysitting" as if all RF24 users are infants that won't understand async concepts.

I personally have written a few different implementations for the nRF24L01. And each implementation only has a blocking send() and a non-blocking write() (which would be used in conjunction with other API about processing the Status/IRQ flags). I'm currently writing an embedded rust implementation (with python & node.js bindings) using this approach (again).

The documentation for the startFastWrite call mentions needing to call txStandBy, and I don't see that anywhere. I'm not very familiar with the ahoy code, so I'm not sure where it would be appropriate to put that call.

If you are using interrupts, then the code doesn't need to explicitly call txStandBy(). Instead, you would just handle the interrupt to see if the transmission succeeded or not (using RF24::whatHappened()).
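For readers new to the interrupt route: the three flags reported by whatHappened() correspond to three bits of the nRF24L01 STATUS register (RX_DR, TX_DS, MAX_RT). The sketch below decodes a raw status byte in the same spirit. `IrqFlags` and `decodeStatus` are hypothetical names used for illustration, not RF24 API, and the sketch does not touch any hardware.

```cpp
#include <cstdint>

// Bit positions in the nRF24L01 STATUS register.
constexpr uint8_t kRxDr  = 1u << 6;  // RX_DR: a payload arrived in the RX FIFO
constexpr uint8_t kTxDs  = 1u << 5;  // TX_DS: payload sent (and ACKed if auto-ack is on)
constexpr uint8_t kMaxRt = 1u << 4;  // MAX_RT: retry limit reached, transmission failed

// Hypothetical result type mirroring the three whatHappened() outputs.
struct IrqFlags {
    bool txOk;
    bool txFail;
    bool rxReady;
};

// Decodes a raw STATUS byte into the three interrupt causes.
IrqFlags decodeStatus(uint8_t status) {
    IrqFlags flags;
    flags.txOk    = (status & kTxDs) != 0;
    flags.txFail  = (status & kMaxRt) != 0;
    flags.rxReady = (status & kRxDr) != 0;
    return flags;
}
```

An interrupt handler built this way would treat `txFail` as the signal that no ACK came back, which is the case the startFastWrite() change was trying to improve.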

I reverted back to the startWrite method and lower power and it started working

If startFastWrite() is failing but startWrite() is working, then it is likely that the radio's CE pin is not being held HIGH long enough (mandatory minimum 10 microseconds) to initiate the transmission. This is the only real difference between the 2 functions.

Warning

Lowering the PA level (referred to here as the "power setting") is a software hack that avoids insufficient power supply problems when using PA/LNA modules. We use this tactic in our examples solely for that reason. If you find that lowering the PA level is the solution, then you really need to reassess your module's power supply and make sure it complies with whatever power requirements are stated by the manufacturer (ebyte or others).

See also "My PA/LNA module fails to transmit".

Looking at the source link in OP

I see the code flushes the RX FIFO (👍🏼) and enters TX mode (stopListening() also calls flush_tx() when ACK payloads are enabled 👍🏼) immediately before calling startFastWrite(). This makes me think that the mandatory 10 us pulse on CE pin is not (always) satisfied.

@rejoe2
Contributor

rejoe2 commented Oct 30, 2024

Sorry for being out of all that stuff for quite some time now. Atm. doing some tests would be a little complicated as well..

This makes me think that the mandatory 10 us pulse on CE pin is not (always) satisfied.

On the one hand, that sounds quite logical, but on the other hand, afair the reason for using startFastWrite() was that startWrite() puts CE low too soon. Our intention was to keep CE high for much longer than the hardcoded 10us in startWrite(), as that had turned out to be counterproductive for receiving ACKs from the inverters for those using PA+LNA modules.

So where's the point in the code that puts CE low too early? A short look points to stopListening(), but that's called when a (much longer) timeout is reached or we got all fragments expected. Confused.

On the other hand, adding a short (10ms) delay after the mMillis = millis(); line should not do any big harm and should make sure, the requirements are met.
Right, or did I miss something essential?

@lumapu - Your opinion on that?

@2bndy5

2bndy5 commented Oct 30, 2024

To clarify, I said mandatory 10 microseconds, not 10 milliseconds. This is important when interrupts are involved. I'm not familiar with all the code here, but typically you shouldn't delay() during an ISR.

@rejoe2
Contributor

rejoe2 commented Oct 30, 2024

Sorry, typo, 10us had been meant as delay.

This piece of code is outside the ISR, so a short delay here may just lead to other tasks like webfrontend or MQTT transmissions not being executed as quickly.
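A minimal sketch of the guard discussed here, assuming the caller records a microsecond timestamp (e.g. from Arduino's micros()) at the moment CE goes high. `remainingCePulseUs` and `kMinCePulseUs` are hypothetical names, not part of RF24 or Ahoy. The helper only computes how long is still left of the 10 us minimum, so the caller can wait exactly that long instead of adding a fixed delay.

```cpp
#include <cstdint>

// Minimum CE high time named in this thread, in microseconds.
constexpr uint32_t kMinCePulseUs = 10;

// Returns how many microseconds are still missing until CE has been high
// for at least kMinCePulseUs. Unsigned subtraction keeps the result correct
// when the microsecond counter wraps around between the two timestamps.
uint32_t remainingCePulseUs(uint32_t ceHighAtUs, uint32_t nowUs) {
    const uint32_t elapsed = nowUs - ceHighAtUs;
    return (elapsed >= kMinCePulseUs) ? 0 : (kMinCePulseUs - elapsed);
}
```

On Arduino-style targets the result could be passed to delayMicroseconds() before any code path that pulls CE low, and only outside the ISR, as noted above.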
