Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Sofar ME3000-SP requests often result in V5FrameError (modbus exception 5) or CRCError #307

Open
coserotondo opened this issue Dec 9, 2024 · 47 comments

Comments

@coserotondo
Copy link

Description

Very often all entities become unavailable for few seconds. The inverter is "always on" as it is a storage inverter Sofar ME3000SP.
The definition file is sofar_wifikit.yaml. I tried to manually query the registers and it works without errors.

Reproduction Steps

No specific actions to reproduce, it happens regularly.

Log

home-assistant_solarman_2024-12-09T07-35-27.188Z.log.filterline.log

Version

24.12.06_4

Home Assistant Version

2024.11.3

@coserotondo coserotondo added the bug Something isn't working label Dec 9, 2024
@davidrapan
Copy link
Owner

Hi @coserotondo, it looks like your inverter from time to time replies w/ nonsense data... Did you tried to increase update_interval in your file?

@coserotondo
Copy link
Author

Hi @davidrapan, nope, I left the default 5.

I tried only with min_span to read few registers at a time but if too low it doesn't initialize.

What would be a reasonable update_interval?

@davidrapan
Copy link
Owner

I mean it should be good w/ 5. But try incrementing by 5 and observe if it has any impact!

@coserotondo
Copy link
Author

Trying 10.

Now I have: Error fetching FV data: V5 Modbus EXCEPTION: AcknowledgeError that never had before...

@davidrapan
Copy link
Owner

davidrapan commented Dec 9, 2024

It's just correctly parsed exception but it's the same as invalid frame.

@coserotondo
Copy link
Author

I see. Any changes to catch this retaining the previous state of the sensors (so without making them unavailable)?

@davidrapan
Copy link
Owner

What is stick's firmware version?

@githubDante, could you take a look when (and if) you have a moment?

@coserotondo
Copy link
Author

Stick firmware version: H4.01.51MW.2.01W1.0.72(2018-11-271-D)

@davidrapan
Copy link
Owner

Could you also give me some "background"? Has it started behaving badly now or are you a new user?

@coserotondo
Copy link
Author

Sure. I was long time user of Stephan Joubert's integration and just recently migrated to your integration.
Honestly, I didn't know if I had CRC errors in the past, but I remember not having the sensors in unavailable state.

@davidrapan
Copy link
Owner

What version it was? Latest or the one before?

If I remember correctly there is update interval 30 seconds, did you tried to increase it to update_interval: 30 in your custom yaml file?

@coserotondo
Copy link
Author

It was the latest, yes 30 seconds. I'm trying now.

@githubDante
Copy link

It looks like overlapping. @coserotondo do you have another client querying the inverter at the same time/same interval?!?

@coserotondo
Copy link
Author

coserotondo commented Dec 10, 2024

I was also thinking about this @githubDante. I’ve a HA dev instance where I tested the upgrade from Stephan Joubert's integration but it was on for few seconds, just to check the data collection. The inverter is also configured (by default) to send the data over the internet to the solarmanpv servers every 5 minutes but the disconnections seems to be more frequent. Attached the settings in config_hide.html of the stick.
screencapture-192-168-220-60-config-hide-html-2024-12-09-16_01_25

@davidrapan
Copy link
Owner

That would be something way more frequent than solarman servers.

@davidrapan davidrapan removed the bug Something isn't working label Dec 10, 2024
@PaulOckenden
Copy link

I see this too. Both of my ME3000SPs go ‘unavailable’ for all parameters about once a minute, with the values returning a second or two later. It’s been like this for a long time, but seems worse now than it has been previously.

It only really affects me if one of my automations tries to read, say, battery SOC and if it coincides with an unavailable period the automation fails.

@PaulOckenden
Copy link

Do you clear the values before writing the new data, rather than just overwriting the old data with the new?

@coserotondo
Copy link
Author

Do you clear the values before writing the new data, rather than just overwriting the old data with the new?

I remember with the old integration, although maybe I had CRC errors, I never found the sensors in unavailable state.

@PaulOckenden
Copy link

PaulOckenden commented Dec 12, 2024

This is how one of my parameters changes over a ten minute period, but all parameters do this.

Screenshot_2024-12-12 10 30 26_u7CQwG

The main thing I take from this is that it isn’t random (which might imply poor connection, for example). It’s extremely regular.

@PaulOckenden
Copy link

PaulOckenden commented Dec 12, 2024

I'm also seeing this in my logs:

This error originated from a custom integration.

Logger: custom_components.solarman.coordinator
Source: helpers/update_coordinator.py:411
integration: Solarman (documentation, issues)
First occurred: 11 December 2024 at 09:34:51 (3466 occurrences)
Last logged: 10:38:59

Error fetching Battery2 data: V5 Modbus EXCEPTION: AcknowledgeError
Error fetching Battery1 data: V5 Modbus EXCEPTION: AcknowledgeError
Error fetching Battery2 data: unpack requires a buffer of 24 bytes

@davidrapan
Copy link
Owner

Error fetching Battery2 data: unpack requires a buffer of 24 bytes

This could indicate overlapping of requests.

@githubDante
Copy link

Can any of you try this proxy to see if it will improve the situation. Instructions how to be used are available in the README. Just make sure to start it in -buffered mode. It's tested with simultaneous access from 5 clients and no errors were reported from any of them (the inverter used for the test is Deye though).

@coserotondo
Copy link
Author

coserotondo commented Dec 12, 2024

Trying now. Still having CRC errors. and so disconnections (from the proxy). I'm leaving the integration in debug. See tomorrow.

From the proxy:

[2024-12-12 18:49:40.447] [WARN] - Client [192.168.200.2:41542] will use serial number <***>
[2024-12-12 18:55:37.296] [ERROR] - Client read error: EOF

@githubDante what is Client read error: EOF?

@githubDante
Copy link

@githubDante what is Client read error: EOF?

EOF is a connection closed by the client/remote side. This is not an actual error, but is reported as such on read. It requires extra check to be explicitly skipped.

@PaulOckenden
Copy link

Error fetching Battery2 data: unpack requires a buffer of 24 bytes

This could indicate overlapping of requests.

Because I have two inverters? Any way to stop that happening?

I guess first step is to disable one of them in HA and see whether the problem persists.

@coserotondo
Copy link
Author

coserotondo commented Dec 13, 2024

@githubDante what is Client read error: EOF?

EOF is a connection closed by the client/remote side. This is not an actual error, but is reported as such on read. It requires extra check to be explicitly skipped.

So any suggestion to improve the situation? Better stay with the proxy? @PaulOckenden did you tried the proxy?

@PaulOckenden
Copy link

I didn’t try the proxy - sorry. It struck me as one more thing to maintain, and another possible point of failure. I thought it would be better to help get the integration working properly, like the old version used to.

@davidrapan
Copy link
Owner

I didn’t try the proxy - sorry. It struck me as one more thing to maintain, and another possible point of failure. I thought it would be better to help get the integration working properly, like the old version used to.

Trying out the proxy wasn't about getting it working, but about debugging the issues you're having. 😆

@coserotondo
Copy link
Author

I didn’t try the proxy - sorry. It struck me as one more thing to maintain, and another possible point of failure. I thought it would be better to help get the integration working properly, like the old version used to.

Trying out the proxy wasn't about getting it working, but about debugging the issues you're having. 😆

Of course. So, do you need more info before I'll switch back (without proxy).

@githubDante
Copy link

githubDante commented Dec 13, 2024

It is unclear why the inverter returns these zeroed frames, but their CRC is definitely wrong/invalid. If the proxy doesn't help try to increase/change the query interval.

Edit: Looking at the log again, this inverter seems slow in its responses, 400-500ms for a few registers when the read is OK and up to a second when the CRC error occurs. It's obviously busy with something, but I really have no idea what. The only reasonable explanation is data upload to the Solarman platform, but this is usually a relatively small packet, the data for which should not be problem at all.

@githubDante
Copy link

githubDante commented Dec 13, 2024

What about this:

2024-12-09 08:32:04.944 DEBUG (MainThread) [custom_components.solarman.api] [*REDACTED*] Querying 04 ~ 8192 - 8203 | 0x2000 - 0x200B # 012 ...
2024-12-09 08:32:04.944 DEBUG (MainThread) [custom_components.solarman.api] [*REDACTED*] SENT: a5 17 00 10 45 cf 00 a1 f6 51 30 02 00 00 00 00 00 00 00 00 00 00 00 00 00 00 01 04 20 00 00 0c fb cf 50 15
2024-12-09 08:32:07.245 DEBUG (MainThread) [custom_components.solarman.api] [*REDACTED*] RECD: a5 2b 00 10 15 cf ec a1 f6 51 30 02 01 8e 24 38 08 66 5b 00 00 68 78 1e 5f 01 04 18 00 00 5a 45 31 45 53 33 33 30 4b 32 55 34 36 34 56 31 39 30 56 31 30 30 3f ee c5 15
2024-12-09 08:32:07.246 DEBUG (MainThread) [custom_components.solarman.api] [*REDACTED*] Querying 04 ~ 8192 - 8203 | 0x2000 - 0x200B # 012 succeeded.

2.5 seconds for 11 registers, and soon after that:

2024-12-09 08:33:13.939 DEBUG (MainThread) [custom_components.solarman.api] [*REDACTED*] Querying 04 ~ 4272 - 4284 | 0x10B0 - 0x10BC # 013 failed, attempts left: 1 [CRCError: CRC validation failed.]
2024-12-09 08:33:14.740 DEBUG (MainThread) [custom_components.solarman.api] [*REDACTED*] SENT: a5 17 00 10 45 f8 00 a1 f6 51 30 02 00 00 00 00 00 00 00 00 00 00 00 00 00 00 01 04 10 b0 00 0d 34 e8 6c 15
2024-12-09 08:33:14.778 DEBUG (MainThread) [custom_components.solarman.api] [*REDACTED*] RECD: a5 2d 00 10 15 f8 16 a1 f6 51 30 02 01 ce 24 38 08 a9 5b 00 00 6c 78 1e 5f 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 12 15
2024-12-09 08:33:14.779 DEBUG (MainThread) [custom_components.solarman.api] [*REDACTED*] Querying 04 ~ 4272 - 4284 | 0x10B0 - 0x10BC # 013 failed, attempts left: 0, aborting. [CRCError: CRC validation failed.]

zeroed packet for under 40ms.

It seems that the request interrupted something and the logger is throwing away what is already in its buffer.

@davidrapan
Copy link
Owner

davidrapan commented Dec 13, 2024

Yeah it's obvious that there is something wrong going on.

This integration treats every "not successful" request as if the device is not online for obvious reason (data accuracy over data availability). And it won't be changed nor workarounds (for issues that are clearly from the outside) added as the code contains way too many of those already for devices which do not follow standards.

You should focus on finding out what the hell is going on. Few hints:

  • Try to disconnect your device from solarman servers
  • Isolate your device to be 100% sure that HA is the only one who (and only using single connection) is talking to the device
  • Sniff and observe the traffic to/from the device using tcpdump (MITM)

@davidrapan davidrapan changed the title Sofar ME300SP entities very often unavailable for few seconds. Sofar ME300SP requests often result in V5FrameError (modbus exception 5) or CRCError Dec 13, 2024
@PaulOckenden
Copy link

I can't say that I really understand the logs, but wonder whether this is a clue:
Screenshot_2024-12-15 10 31 59_okh46O
It looks like there's two attempts to read this block, some starting at 4272 (which succeed) and some starting at 4273 (which fail). Maybe. I dunno - I'm out of my depth here...

@davidrapan
Copy link
Owner

davidrapan commented Dec 15, 2024

Try change (sofar_wifikit.yaml, row 366):

    update_interval: 300

to

    update_interval: 5

@PaulOckenden
Copy link

Try change (sofar_wifikit.yaml, row 366):

    update_interval: 300

to

    update_interval: 5

Thanks David - that appears to have helped. I'd have thought more frequent updates would cause more frequent problems, but apparently not!

@davidrapan
Copy link
Owner

davidrapan commented Dec 15, 2024

It looks like your device requires requests to specific register address ranges and this change just does exactly that so it's more about what and when than how often.

@davidrapan davidrapan changed the title Sofar ME300SP requests often result in V5FrameError (modbus exception 5) or CRCError Sofar ME3000-SP requests often result in V5FrameError (modbus exception 5) or CRCError Dec 15, 2024
@davidrapan
Copy link
Owner

fix: ME3000SP V5FrameError and CRCError, try these changes as it also has 1 less request in the 0x2000 up range.

@PaulOckenden
Copy link

It looks like your device requires requests to specific register address ranges and this change just does exactly that so it's more about what and when than how often.

Ah. I assumed ‘update_interval’ would be about timing!

@davidrapan
Copy link
Owner

davidrapan commented Dec 16, 2024

It is but in this case it's about timing that one register the same as others. 😉

@PaulOckenden
Copy link

Ah ha, that makes sense.

@davidrapan
Copy link
Owner

Can you show me 10 minute debug log now?

@PaulOckenden
Copy link

Here's a ten minute log file using the complete new wifikit.yaml (not just the 300 -> 5 change).

Note that I have two inverters, but I think that's obvious from the logs.

home-assistant_solarman_2024-12-16T21-30-20.071Z.log

@davidrapan
Copy link
Owner

Sure, just wanted to check if that interrupting error disappeared completely and it looks like it did. 😉

@coserotondo
Copy link
Author

@davidrapan a quick question: after changing the inverter definition (custom) do I need to restart HA or an integration reload is enough?

@davidrapan
Copy link
Owner

Reload is enough.

@PaulOckenden
Copy link

Thanks for fixing this David. Are the BTC details shown in the Wallet link here still the best way to reward you for the time you've put into this integration?

@davidrapan
Copy link
Owner

Sure thing. 😉

Either that or by Github Sponsors by selecting One-time tier:
github_sponsors_select_tier_one-time

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants