Sofar ME3000-SP requests often result in V5FrameError (modbus exception 5) or CRCError #307

coserotondo · 2024-12-09T08:08:19Z

Description

Very often all entities become unavailable for few seconds. The inverter is "always on" as it is a storage inverter Sofar ME3000SP.
The definition file is sofar_wifikit.yaml. I tried to manually query the registers and it works without errors.

Reproduction Steps

No specific actions to reproduce, it happens regularly.

Log

home-assistant_solarman_2024-12-09T07-35-27.188Z.log.filterline.log

Version

24.12.06_4

Home Assistant Version

2024.11.3

The text was updated successfully, but these errors were encountered:

davidrapan · 2024-12-09T09:45:27Z

Hi @coserotondo, it looks like your inverter from time to time replies w/ nonsense data... Did you tried to increase update_interval in your file?

coserotondo · 2024-12-09T13:30:44Z

Hi @davidrapan, nope, I left the default 5.

I tried only with min_span to read few registers at a time but if too low it doesn't initialize.

What would be a reasonable update_interval?

davidrapan · 2024-12-09T13:42:44Z

I mean it should be good w/ 5. But try incrementing by 5 and observe if it has any impact!

coserotondo · 2024-12-09T14:02:07Z

Trying 10.

Now I have: Error fetching FV data: V5 Modbus EXCEPTION: AcknowledgeError that never had before...

davidrapan · 2024-12-09T14:23:53Z

It's just correctly parsed exception but it's the same as invalid frame.

coserotondo · 2024-12-09T14:37:37Z

I see. Any changes to catch this retaining the previous state of the sensors (so without making them unavailable)?

davidrapan · 2024-12-09T14:53:57Z

What is stick's firmware version?

@githubDante, could you take a look when (and if) you have a moment?

coserotondo · 2024-12-09T15:06:59Z

Stick firmware version: H4.01.51MW.2.01W1.0.72(2018-11-271-D)

davidrapan · 2024-12-09T15:13:40Z

Could you also give me some "background"? Has it started behaving badly now or are you a new user?

coserotondo · 2024-12-09T15:23:57Z

Sure. I was long time user of Stephan Joubert's integration and just recently migrated to your integration.
Honestly, I didn't know if I had CRC errors in the past, but I remember not having the sensors in unavailable state.

davidrapan · 2024-12-09T15:28:35Z

What version it was? Latest or the one before?

If I remember correctly there is update interval 30 seconds, did you tried to increase it to update_interval: 30 in your custom yaml file?

coserotondo · 2024-12-09T16:19:22Z

It was the latest, yes 30 seconds. I'm trying now.

githubDante · 2024-12-09T23:11:24Z

It looks like overlapping. @coserotondo do you have another client querying the inverter at the same time/same interval?!?

coserotondo · 2024-12-10T05:06:09Z

I was also thinking about this @githubDante. I’ve a HA dev instance where I tested the upgrade from Stephan Joubert's integration but it was on for few seconds, just to check the data collection. The inverter is also configured (by default) to send the data over the internet to the solarmanpv servers every 5 minutes but the disconnections seems to be more frequent. Attached the settings in config_hide.html of the stick.

davidrapan · 2024-12-10T08:00:08Z

That would be something way more frequent than solarman servers.

PaulOckenden · 2024-12-12T09:37:28Z

I see this too. Both of my ME3000SPs go ‘unavailable’ for all parameters about once a minute, with the values returning a second or two later. It’s been like this for a long time, but seems worse now than it has been previously.

It only really affects me if one of my automations tries to read, say, battery SOC and if it coincides with an unavailable period the automation fails.

PaulOckenden · 2024-12-12T09:38:57Z

Do you clear the values before writing the new data, rather than just overwriting the old data with the new?

coserotondo · 2024-12-12T10:15:16Z

Do you clear the values before writing the new data, rather than just overwriting the old data with the new?

I remember with the old integration, although maybe I had CRC errors, I never found the sensors in unavailable state.

PaulOckenden · 2024-12-12T10:31:52Z

This is how one of my parameters changes over a ten minute period, but all parameters do this.

The main thing I take from this is that it isn’t random (which might imply poor connection, for example). It’s extremely regular.

PaulOckenden · 2024-12-12T10:41:46Z

I'm also seeing this in my logs:

This error originated from a custom integration.

Logger: custom_components.solarman.coordinator
Source: helpers/update_coordinator.py:411
integration: Solarman (documentation, issues)
First occurred: 11 December 2024 at 09:34:51 (3466 occurrences)
Last logged: 10:38:59

Error fetching Battery2 data: V5 Modbus EXCEPTION: AcknowledgeError
Error fetching Battery1 data: V5 Modbus EXCEPTION: AcknowledgeError
Error fetching Battery2 data: unpack requires a buffer of 24 bytes

davidrapan · 2024-12-12T14:20:06Z

Error fetching Battery2 data: unpack requires a buffer of 24 bytes

This could indicate overlapping of requests.

githubDante · 2024-12-12T14:44:49Z

Can any of you try this proxy to see if it will improve the situation. Instructions how to be used are available in the README. Just make sure to start it in -buffered mode. It's tested with simultaneous access from 5 clients and no errors were reported from any of them (the inverter used for the test is Deye though).

coserotondo · 2024-12-12T19:01:05Z

Trying now. Still having CRC errors. and so disconnections (from the proxy). I'm leaving the integration in debug. See tomorrow.

From the proxy:

[2024-12-12 18:49:40.447] [WARN] - Client [192.168.200.2:41542] will use serial number <***>
[2024-12-12 18:55:37.296] [ERROR] - Client read error: EOF

@githubDante what is Client read error: EOF?

githubDante · 2024-12-13T10:17:00Z

@githubDante what is Client read error: EOF?

EOF is a connection closed by the client/remote side. This is not an actual error, but is reported as such on read. It requires extra check to be explicitly skipped.

PaulOckenden · 2024-12-13T12:00:11Z

Error fetching Battery2 data: unpack requires a buffer of 24 bytes

This could indicate overlapping of requests.

Because I have two inverters? Any way to stop that happening?

I guess first step is to disable one of them in HA and see whether the problem persists.

coserotondo · 2024-12-13T13:30:03Z

@githubDante what is Client read error: EOF?

EOF is a connection closed by the client/remote side. This is not an actual error, but is reported as such on read. It requires extra check to be explicitly skipped.

So any suggestion to improve the situation? Better stay with the proxy? @PaulOckenden did you tried the proxy?

PaulOckenden · 2024-12-13T14:12:21Z

I didn’t try the proxy - sorry. It struck me as one more thing to maintain, and another possible point of failure. I thought it would be better to help get the integration working properly, like the old version used to.

davidrapan · 2024-12-13T14:55:20Z

I didn’t try the proxy - sorry. It struck me as one more thing to maintain, and another possible point of failure. I thought it would be better to help get the integration working properly, like the old version used to.

Trying out the proxy wasn't about getting it working, but about debugging the issues you're having. 😆

coserotondo · 2024-12-13T15:07:24Z

I didn’t try the proxy - sorry. It struck me as one more thing to maintain, and another possible point of failure. I thought it would be better to help get the integration working properly, like the old version used to.

Trying out the proxy wasn't about getting it working, but about debugging the issues you're having. 😆

Of course. So, do you need more info before I'll switch back (without proxy).

githubDante · 2024-12-13T18:29:54Z

It is unclear why the inverter returns these zeroed frames, but their CRC is definitely wrong/invalid. If the proxy doesn't help try to increase/change the query interval.

Edit: Looking at the log again, this inverter seems slow in its responses, 400-500ms for a few registers when the read is OK and up to a second when the CRC error occurs. It's obviously busy with something, but I really have no idea what. The only reasonable explanation is data upload to the Solarman platform, but this is usually a relatively small packet, the data for which should not be problem at all.

githubDante · 2024-12-13T18:54:03Z

What about this:

2024-12-09 08:32:04.944 DEBUG (MainThread) [custom_components.solarman.api] [*REDACTED*] Querying 04 ~ 8192 - 8203 | 0x2000 - 0x200B # 012 ...
2024-12-09 08:32:04.944 DEBUG (MainThread) [custom_components.solarman.api] [*REDACTED*] SENT: a5 17 00 10 45 cf 00 a1 f6 51 30 02 00 00 00 00 00 00 00 00 00 00 00 00 00 00 01 04 20 00 00 0c fb cf 50 15
2024-12-09 08:32:07.245 DEBUG (MainThread) [custom_components.solarman.api] [*REDACTED*] RECD: a5 2b 00 10 15 cf ec a1 f6 51 30 02 01 8e 24 38 08 66 5b 00 00 68 78 1e 5f 01 04 18 00 00 5a 45 31 45 53 33 33 30 4b 32 55 34 36 34 56 31 39 30 56 31 30 30 3f ee c5 15
2024-12-09 08:32:07.246 DEBUG (MainThread) [custom_components.solarman.api] [*REDACTED*] Querying 04 ~ 8192 - 8203 | 0x2000 - 0x200B # 012 succeeded.

2.5 seconds for 11 registers, and soon after that:

2024-12-09 08:33:13.939 DEBUG (MainThread) [custom_components.solarman.api] [*REDACTED*] Querying 04 ~ 4272 - 4284 | 0x10B0 - 0x10BC # 013 failed, attempts left: 1 [CRCError: CRC validation failed.]
2024-12-09 08:33:14.740 DEBUG (MainThread) [custom_components.solarman.api] [*REDACTED*] SENT: a5 17 00 10 45 f8 00 a1 f6 51 30 02 00 00 00 00 00 00 00 00 00 00 00 00 00 00 01 04 10 b0 00 0d 34 e8 6c 15
2024-12-09 08:33:14.778 DEBUG (MainThread) [custom_components.solarman.api] [*REDACTED*] RECD: a5 2d 00 10 15 f8 16 a1 f6 51 30 02 01 ce 24 38 08 a9 5b 00 00 6c 78 1e 5f 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 12 15
2024-12-09 08:33:14.779 DEBUG (MainThread) [custom_components.solarman.api] [*REDACTED*] Querying 04 ~ 4272 - 4284 | 0x10B0 - 0x10BC # 013 failed, attempts left: 0, aborting. [CRCError: CRC validation failed.]

zeroed packet for under 40ms.

It seems that the request interrupted something and the logger is throwing away what is already in its buffer.

davidrapan · 2024-12-13T19:03:36Z

Yeah it's obvious that there is something wrong going on.

This integration treats every "not successful" request as if the device is not online for obvious reason (data accuracy over data availability). And it won't be changed nor workarounds (for issues that are clearly from the outside) added as the code contains way too many of those already for devices which do not follow standards.

You should focus on finding out what the hell is going on. Few hints:

Try to disconnect your device from solarman servers
Isolate your device to be 100% sure that HA is the only one who (and only using single connection) is talking to the device
Sniff and observe the traffic to/from the device using tcpdump (MITM)

PaulOckenden · 2024-12-15T10:36:04Z

I can't say that I really understand the logs, but wonder whether this is a clue:

It looks like there's two attempts to read this block, some starting at 4272 (which succeed) and some starting at 4273 (which fail). Maybe. I dunno - I'm out of my depth here...

davidrapan · 2024-12-15T12:01:29Z

Try change (sofar_wifikit.yaml, row 366):

    update_interval: 300

to

    update_interval: 5

PaulOckenden · 2024-12-15T22:01:55Z

Try change (sofar_wifikit.yaml, row 366):
    update_interval: 300
to
    update_interval: 5

Thanks David - that appears to have helped. I'd have thought more frequent updates would cause more frequent problems, but apparently not!

davidrapan · 2024-12-15T23:27:04Z

It looks like your device requires requests to specific register address ranges and this change just does exactly that so it's more about what and when than how often.

davidrapan · 2024-12-15T23:44:19Z

fix: ME3000SP V5FrameError and CRCError, try these changes as it also has 1 less request in the 0x2000 up range.

PaulOckenden · 2024-12-16T09:25:08Z

It looks like your device requires requests to specific register address ranges and this change just does exactly that so it's more about what and when than how often.

Ah. I assumed ‘update_interval’ would be about timing!

davidrapan · 2024-12-16T11:53:01Z

It is but in this case it's about timing that one register the same as others. 😉

PaulOckenden · 2024-12-16T12:33:35Z

Ah ha, that makes sense.

davidrapan · 2024-12-16T14:06:02Z

Can you show me 10 minute debug log now?

PaulOckenden · 2024-12-16T21:36:02Z

Here's a ten minute log file using the complete new wifikit.yaml (not just the 300 -> 5 change).

Note that I have two inverters, but I think that's obvious from the logs.

home-assistant_solarman_2024-12-16T21-30-20.071Z.log

davidrapan · 2024-12-16T23:04:59Z

Sure, just wanted to check if that interrupting error disappeared completely and it looks like it did. 😉

coserotondo · 2024-12-17T07:30:07Z

@davidrapan a quick question: after changing the inverter definition (custom) do I need to restart HA or an integration reload is enough?

davidrapan · 2024-12-17T07:34:14Z

Reload is enough.

PaulOckenden · 2024-12-17T10:11:22Z

Thanks for fixing this David. Are the BTC details shown in the Wallet link here still the best way to reward you for the time you've put into this integration?

davidrapan · 2024-12-17T14:46:13Z

Sure thing. 😉

Either that or by Github Sponsors by selecting One-time tier:

coserotondo added the bug Something isn't working label Dec 9, 2024

davidrapan removed the bug Something isn't working label Dec 10, 2024

davidrapan changed the title ~~Sofar ME300SP entities very often unavailable for few seconds.~~ Sofar ME300SP requests often result in V5FrameError (modbus exception 5) or CRCError Dec 13, 2024

davidrapan changed the title ~~Sofar ME300SP requests often result in V5FrameError (modbus exception 5) or CRCError~~ Sofar ME3000-SP requests often result in V5FrameError (modbus exception 5) or CRCError Dec 15, 2024

Sofar ME3000-SP requests often result in V5FrameError (modbus exception 5) or CRCError #307

Sofar ME3000-SP requests often result in V5FrameError (modbus exception 5) or CRCError #307

Comments

coserotondo commented Dec 9, 2024

Description

Reproduction Steps

Log

Version

Home Assistant Version

davidrapan commented Dec 9, 2024

coserotondo commented Dec 9, 2024

davidrapan commented Dec 9, 2024

coserotondo commented Dec 9, 2024

davidrapan commented Dec 9, 2024 • edited Loading

coserotondo commented Dec 9, 2024

davidrapan commented Dec 9, 2024

coserotondo commented Dec 9, 2024

davidrapan commented Dec 9, 2024

coserotondo commented Dec 9, 2024

davidrapan commented Dec 9, 2024

coserotondo commented Dec 9, 2024

githubDante commented Dec 9, 2024

coserotondo commented Dec 10, 2024 • edited Loading

davidrapan commented Dec 10, 2024

PaulOckenden commented Dec 12, 2024

PaulOckenden commented Dec 12, 2024

coserotondo commented Dec 12, 2024

PaulOckenden commented Dec 12, 2024 • edited Loading

PaulOckenden commented Dec 12, 2024 • edited Loading

davidrapan commented Dec 12, 2024

githubDante commented Dec 12, 2024

coserotondo commented Dec 12, 2024 • edited Loading

githubDante commented Dec 13, 2024

PaulOckenden commented Dec 13, 2024

coserotondo commented Dec 13, 2024 • edited Loading

PaulOckenden commented Dec 13, 2024

davidrapan commented Dec 13, 2024

coserotondo commented Dec 13, 2024

githubDante commented Dec 13, 2024 • edited Loading

githubDante commented Dec 13, 2024 • edited Loading

davidrapan commented Dec 13, 2024 • edited Loading

PaulOckenden commented Dec 15, 2024

davidrapan commented Dec 15, 2024 • edited Loading

PaulOckenden commented Dec 15, 2024

davidrapan commented Dec 15, 2024 • edited Loading

davidrapan commented Dec 15, 2024

PaulOckenden commented Dec 16, 2024

davidrapan commented Dec 16, 2024 • edited Loading

PaulOckenden commented Dec 16, 2024

davidrapan commented Dec 16, 2024

PaulOckenden commented Dec 16, 2024

davidrapan commented Dec 16, 2024

coserotondo commented Dec 17, 2024

davidrapan commented Dec 17, 2024

PaulOckenden commented Dec 17, 2024

davidrapan commented Dec 17, 2024

davidrapan commented Dec 9, 2024 •

edited

Loading

coserotondo commented Dec 10, 2024 •

edited

Loading

PaulOckenden commented Dec 12, 2024 •

edited

Loading

PaulOckenden commented Dec 12, 2024 •

edited

Loading

coserotondo commented Dec 12, 2024 •

edited

Loading

coserotondo commented Dec 13, 2024 •

edited

Loading

githubDante commented Dec 13, 2024 •

edited

Loading

githubDante commented Dec 13, 2024 •

edited

Loading

davidrapan commented Dec 13, 2024 •

edited

Loading

davidrapan commented Dec 15, 2024 •

edited

Loading

davidrapan commented Dec 15, 2024 •

edited

Loading

davidrapan commented Dec 16, 2024 •

edited

Loading