Home Assistant hangs when the inverters aren't available #344

Open
lenwar opened this issue Dec 24, 2024 · 70 comments
Labels
bug Something isn't working

Comments

@lenwar

lenwar commented Dec 24, 2024

Description

When using the ha-solarman integration while my micro-inverters aren't available, my entire Home Assistant instance 'hangs' periodically (automations don't run, the web interface hangs, and so on). I'm not sure how to troubleshoot this, as there is no specific logging.

What I notice is that everything hangs while the ha-solarman integration tries to connect to inverters that are switched off. Once it disconnects, everything works fine for about 4 seconds, until it tries to reconnect. So during the 4 seconds between the top 5 entries and the bottom 5 entries in the log below, everything is fine.

So when the last log-lines say 'Connecting', then everything hangs.

I'm using 5 dual-port Solarman/Deye micro inverters using the 2MPPT profile.
Firmware: MW3_16U_5408_5.0C-S

I made a similar bug-report some time ago, but I didn't have the opportunity to respond and troubleshoot, so I closed the issue. The behaviour was different back then
#101

Reproduction Steps

Have the ha-solarman integration running while my micro-inverters aren't available (for example, because the sun is down).

Log

2024-12-24 16:22:22.113 INFO (MainThread) [custom_components.solarman.api] [3903803151] Disconnecting from 192.168.60.xx:8899
2024-12-24 16:22:22.206 INFO (MainThread) [custom_components.solarman.api] [3903253142] Disconnecting from 192.168.60.xx:8899
2024-12-24 16:22:22.207 INFO (MainThread) [custom_components.solarman.api] [3903232270] Disconnecting from 192.168.60.xx:8899
2024-12-24 16:22:22.207 INFO (MainThread) [custom_components.solarman.api] [3903123148] Disconnecting from 192.168.60.xx:8899
2024-12-24 16:22:22.207 INFO (MainThread) [custom_components.solarman.api] [4185689303] Disconnecting from 192.168.60.xx:8899

2024-12-24 16:22:26.483 INFO (MainThread) [custom_components.solarman.api] [3903253142] Connecting to 192.168.60.xx:8899
2024-12-24 16:22:26.603 INFO (MainThread) [custom_components.solarman.api] [3903232270] Connecting to 192.168.60.xx:8899
2024-12-24 16:22:26.690 INFO (MainThread) [custom_components.solarman.api] [3903123148] Connecting to 192.168.60.xx:8899
2024-12-24 16:22:26.768 INFO (MainThread) [custom_components.solarman.api] [3903803151] Connecting to 192.168.60.xx:8899
2024-12-24 16:22:26.770 INFO (MainThread) [custom_components.solarman.api] [4185689303] Connecting to 192.168.60.xx:8899

Version

v24.12.22

Home Assistant Version

2024.12.5

@lenwar lenwar added the bug Something isn't working label Dec 24, 2024
@J-Smits

J-Smits commented Dec 24, 2024

I have the same issue: when the inverters are disconnected, HA gets slow and I get reconnect messages from the website.

As a workaround I disabled the integrations, and HA is running correctly again.

I have a Deye sun1600G3-230 connected with the 4mppt profile and 3x the sun800G3-230 with the 2mppt profile.

I don't have any logs of the issue yet, but I just want to let you know you are not the only one with this behavior when the inverters go offline :-) I noticed the issue yesterday after upgrading to v24.12.22.

@davidrapan
Owner

Yeah, that's unfortunate, but I'm not able to replicate your problem, and since there are many users with microinverters, it must be in combination with something else, or there must be more to it than just running the integration.

You can, for example, try a clean install of HA with just this integration added as a starting point.

What type of HA installation are you running, anyway?

@J-Smits

J-Smits commented Dec 25, 2024

I created a new HA in Proxmox with the following setup (see the script's webpage):

https://community-scripts.github.io/ProxmoxVE/scripts?id=haos-vm

I deleted all default integrations and installed only this Solarman integration with my 4 inverters. I will check the status tonight when they are offline. I also enabled debug logging.

fingers crossed :-)

@davidrapan
Owner

davidrapan commented Dec 25, 2024

I'm also running haos but just in qemu (which shouldn't make any difference) and I think it's the most reliable way. 😉

BTW run it with debug logging enabled nonstop so we have as much data as possible.

@Skarabaen

Skarabaen commented Dec 25, 2024

Hello,
I have the same issue with Home Assistant (in Proxmox). As soon as the Deye is no longer online (no sun), HA needs a long time to show any menu. As soon as the Solarman addon is deactivated, everything is normal.

Attached please find some debug logging.

FYI:
I replaced the inverter ID with "INVERTER_ID". My inverter is a BosswerkMI600 (a rebrand of the Deye SUN600G3-EU-230).
The addon tries to use IP 172.30.232.0, but this IP is not used on my side.
2024-12-25 17:04:32.357 DEBUG (MainThread) [custom_components.solarman.discovery] _discover_all: Broadcasting on [IPv4Network('192.168.1.0/24'), IPv4Network('172.30.232.0/23'), IPv4Network('172.30.32.0/23')]
In Proxmox I can't see any high CPU usage from Home Assistant when the slowdown happens.

I hope this will help you.

home-assistant_solarman_2024-12-25T16-08-53.112Z.log

@davidrapan
Owner

The addon tries to use IP 172.30.232.0, but this IP is not used on my side.

It's used by HA for docker.

@davidrapan
Owner

Another thing to try:

  • Add a fake device with a made-up IP and serial number, restart HA, and observe the behavior.

@radektkacik

I am also facing this problem when the inverters go offline. I have DEYE M80G3 and M80G4.
solarman.txt

@davidrapan
Owner

davidrapan commented Dec 25, 2024

I am also facing this problem when the inverters go offline. I have DEYE M80G3 and M80G4.

This exception is to be "expected" when the device is unreachable.

@davidrapan
Owner

davidrapan commented Dec 25, 2024

@radektkacik, it's a stretch but changes in feat: Improve inverter load method exception resolution could potentially help.

@radektkacik

@radektkacik, it's a stretch but changes in feat: Improve inverter load method exception resolution could potentially help.

Unfortunately it didn't help. I think there must be some synchronous call which hangs the whole of HA for a while. After disabling the integration it behaves normally.
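
To illustrate the suspected mechanism only: the sketch below shows how a single synchronous call made from a coroutine stalls every other task on the same asyncio event loop, and how handing it to a worker thread avoids that. Nothing here is taken from the integration; `blocking_probe`, `heartbeat`, `max_gap`, and all timings are made up for the example.

```python
import asyncio
import time


def blocking_probe(seconds: float) -> None:
    # Stand-in for any synchronous call (for example a blocking socket operation).
    time.sleep(seconds)


async def heartbeat(ticks: list, interval: float, count: int) -> None:
    # Records a timestamp every `interval` seconds; a large gap means the loop was stalled.
    for _ in range(count):
        ticks.append(time.perf_counter())
        await asyncio.sleep(interval)


async def max_gap(run_in_thread: bool) -> float:
    """Return the largest pause between two heartbeats while the probe runs."""
    ticks: list = []
    beat = asyncio.create_task(heartbeat(ticks, 0.01, 20))
    await asyncio.sleep(0.03)  # let the heartbeat start ticking
    if run_in_thread:
        await asyncio.to_thread(blocking_probe, 0.1)  # the loop keeps running
    else:
        blocking_probe(0.1)  # nothing else on the loop runs for 0.1 s
    await beat
    return max(b - a for a, b in zip(ticks, ticks[1:]))


if __name__ == "__main__":
    print("largest gap, blocking call:", asyncio.run(max_gap(False)))
    print("largest gap, worker thread:", asyncio.run(max_gap(True)))
```

With the blocking variant the largest gap is at least the full duration of the call; with the worker thread it should stay close to the heartbeat interval.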

@davidrapan davidrapan changed the title Home Assistant hangs when the inverters aren't available. Home Assistant hangs when the inverters aren't available Dec 25, 2024
@davidrapan
Owner

davidrapan commented Dec 25, 2024

Unfortunately it didn't help. I think there must be some synchronous call which hangs the whole of HA for a while. After disabling the integration it behaves normally.

This would mean it would also happen w/ my installation, which is not the case.

Did at least the exception change?

@radektkacik

Unfortunately it didn't help. I think there must be some synchronous call which hangs the whole of HA for a while. After disabling the integration it behaves normally.

This would mean it would also happen w/ my installation, which is not the case.

Maybe you can try adding entities to the main dashboard, and also create an automation which uses the inverter's entities. I can confirm the problem is the same as lenwar described.

@davidrapan
Owner

Maybe you can try adding entities to the main dashboard, and also create an automation which uses the inverter's entities. I can confirm the problem is the same as lenwar described.

Not sure what you mean by that. Even so, it can't affect the integration in any way. Of course I'm using entities on my dashboards and have plenty of automations which interact w/ the inverter (some of which even start alongside every HA start).

@lenwar
Author

lenwar commented Dec 25, 2024

I have template sensors that use the entities. Maybe that is something?

@davidrapan
Owner

Me too and again this just can't in any way affect the integration.

@radektkacik

Maybe you can try adding entities to the main dashboard, and also create an automation which uses the inverter's entities. I can confirm the problem is the same as lenwar described.

Not sure what you mean by that. Even so, it can't affect the integration in any way. Of course I'm using entities on my dashboards and have plenty of automations which interact w/ the inverter (some of which even start alongside every HA start).

Maybe it is related specifically to microinverters only? Just guessing.
How are you trying to reproduce this issue? With a real inverter that goes offline, or with a fake one?

@lenwar
Author

lenwar commented Dec 25, 2024

Me too and again this just can't in any way affect the integration.

My idea was that the integration sets the entities to ‘available’, but it won’t actually fill them or lock them somehow, which causes it to hang.

I haven’t checked anything, but that was something that came to mind. (I have no idea how the internals of the template sensors work, so maybe that behaviour is covered.)

@davidrapan
Owner

Maybe it is related specifically to microinverters only?

The integration has no idea what the device on the other end is until it successfully reads some modbus registers.

How are you trying to reproduce this issue? With a real inverter that goes offline, or with a fake one?

Both.

My idea was that the integration sets the entities to ‘available’, but it won’t actually fill them or lock them somehow, which causes it to hang.

There are no entities until after successful connection.

@davidrapan
Owner

davidrapan commented Dec 25, 2024

In your case everything that is called/processed is inside this method

await coordinator.async_config_entry_first_refresh()

and any entities are created only if this action is successful.

Internally, this method calls (and depends on a successful call of)

async def load(self):

which tries to discover and connect to your devices and read registers 0-22.

@Skarabaen

Maybe it is something with Home Assistant itself? I can't see any big CPU load on my system when it happens, but it takes about 10 seconds before a new page loads if I click something in HA. It feels like waiting for a timeout. Also, if I click on a sensor, the graph loads after ~10 seconds.

My setup: (Proxmox) 2 vCPU cores with 8 GB RAM and HA OS

@radektkacik

I can confirm the same behaviour as Skarabaen described. HA waits for something in case of unavailable inverter.
I am using proxmox too on mini PC with Intel N5105.

@J-Smits

J-Smits commented Dec 26, 2024

Attached the debug log file, this is from a fresh HA installation with only the Solarman integration added with 4 deye microinverters.

At about 16:21 the inverter(s) went offline because of the darkness; you can then see the disconnects in the error log.

At that moment HA gets slow with loading the pages and i got reconnect messages from the HA webpage

home-assistant_solarman_2024-12-25T16-13-16.155Z.zip

(Because the txt file is 28 MB, I needed to zip it.)

@J-Smits

J-Smits commented Dec 26, 2024

I don't know which other debug logs I need to enable in HA itself, but I still have this test instance running, so just tell me if more is needed and I can collect the logs.

@lenwar
Author

lenwar commented Dec 26, 2024

Would it be useful to maybe write out specific aspects of our network setups? Even though they seem or are irrelevant for the behaviour? Possibly, it's a combination of network topology/hardware/setups?

For myself:

  • Running HAOS on an Odroid N2+ - cabled.
  • 5 microinverters. ( Deye SUN1000G3-EU-230 )
  • Running the latest/current stable versions of everything.
    -- OS 14.1
    -- Supervisor 2024.12.0
    -- Core 2024.12.5
  • My network is based on Unifi. My Home Assistant is in another vlan than my Solar Inverters.
  • I use the nginx reverse-proxy addon (required for my doorbell-camera if I want to keep using https, which I do)
  • The name servers in my network are AdGuard Home instances.
  • IPv6 is enabled.

I don't see how the items above could be relevant or explain the behaviour, but perhaps we can find some correlation.

@Skarabaen

Skarabaen commented Dec 26, 2024

I don't think it is something with the network or hardware, because as soon as the inverters have enough sunlight to power on there is no issue. As soon as the sun goes down and the inverters go offline, the issue starts. So in my opinion it is something with the plugin or its interface to HA. Could we put something in the code that pings the device and only runs the rest if it is available, and otherwise waits x seconds before pinging again, and so on?
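
A minimal sketch of that idea, under stated assumptions: instead of an ICMP ping it tries to open a TCP connection to the data-logger port (8899, as seen in the logs above) with a short timeout, and doubles the pause after every failed attempt. The helper names `is_reachable` and `wait_until_reachable` and the delay values are hypothetical and not part of the integration.

```python
import asyncio


async def is_reachable(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port opens within `timeout` seconds."""
    try:
        _, writer = await asyncio.wait_for(asyncio.open_connection(host, port), timeout)
    except (OSError, asyncio.TimeoutError):
        return False
    writer.close()
    try:
        await writer.wait_closed()
    except OSError:
        pass  # the peer dropped the connection first; it was reachable anyway
    return True


async def wait_until_reachable(host, port, first_delay=5.0, max_delay=300.0, attempts=None):
    """Probe host:port until it answers, doubling the pause after each failure.

    Returns True once the device answers, or False after `attempts` failed probes
    (attempts=None keeps trying forever).
    """
    delay = first_delay
    tried = 0
    while attempts is None or tried < attempts:
        if await is_reachable(host, port):
            return True
        tried += 1
        await asyncio.sleep(delay)  # yields to the event loop while waiting
        delay = min(delay * 2, max_delay)
    return False
```

Because every wait is an `await`, other tasks on the event loop keep running between probes; whether this would actually remove the slowdown reported here is not established by this sketch.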

A question about the profile: it is set to "auto" in my setup. How can I see which profile the addon is using? I have now set it to "deye_micro.yaml" for my MI600 inverter. Under additional options, the number of MPPTs was set to 4 (I changed it to 2) and the number of phases was set to 3 (I changed it to 1). Maybe this will help.

@davidrapan
Owner

So in that case it has to happen w/ a fake inverter too, right? Did anyone test it?

@radektkacik

Maybe some library is causing it (modbus..) ?

@davidrapan
Owner

I then made the changes you suggested. (removal of discovery() and commented out the two lines).

You need to remove the await too.

@lenwar
Author

lenwar commented Dec 29, 2024

I removed the await this afternoon.

The sun is down, and everything is fine, now!
I also restarted HA and I’m getting the expected startup-errors (as it can’t find my inverters)

Everything keeps working snappy.

I’ll check in tomorrow when the inverters are back up again.

@iwannatalk

I got the same issue once when the inverter was unreachable because one mesh instance was off, so there was no connection. HA froze and only started working again after a restart.

@Skarabaen

It seems to work now. Install the latest available version (24.12.22) and replace api.py and discover.py with the versions from the repo. No lags anymore.

@lenwar
Author

lenwar commented Dec 30, 2024

Hi,

A few minutes ago, my last inverter came back online. From my angle, this resolves the issue.

For completeness:
I installed the 24.12.22 version. This version causes issues.

I replaced api.py and discover.py with the current 'Main' version.
This made the interface okay (Negligible delays. Probably only visible because I was focussed on it. My family didn't notice.) But there were still some delays in automations. (so, when pressing wall-buttons and so on. A little smaller than half a second, I think, but very visible.)

By removing the 'await' and 'discover()' phrases and commenting out line 110 and 111, everything is good. The interface is snappy (for as much as we can expect from the HA-interface ;) ) and the wall-buttons respond 'instantly'.

@iwannatalk

iwannatalk commented Dec 30, 2024

It seems to work now. Install the latest available version (24.12.22) and replace api.py and discover.py with the versions from the repo. No lags anymore.

I don't have a very good Wi-Fi signal at the inverter. After this change a couple of hours ago, I had sensors unavailable for 9 minutes and then another time for 11 minutes. But it's better than an HA freeze 🙂

Screenshot_20241230_124225_Home Assistant

@davidrapan
Owner

Hi @lenwar, can you please test for me which one of those edits has the bigger impact?

@lenwar
Author

lenwar commented Jan 1, 2025

Hi,

I was able to test in the evening (restarting Home Assistant with the inverters offline)
Both line 52 and lines 110/111 seem to cause the same behaviour. If either one is 'standard', then it causes noticeable delays. But only while the Integrations are 'Initialising'. When they are all 'red' in the GUI, then everything is fine.

If you like, I can also test tomorrow when the Integrations are running and try to firewall my inverters. (so that HA can't connect to them any more, to mimic the behaviour of them going offline after sundown)

@davidrapan
Copy link
Owner

Maybe you could try to measure runtime of both code blocks? So we can distil specific parts of the code which are causing the delays.

@lenwar
Author

lenwar commented Jan 1, 2025

I would like to try, but I wouldn’t have a clue on how to do that. Could you provide info on that?
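
One possible way to do this, as a minimal sketch: wrap each suspect block in a small timing context manager and read the durations from the Home Assistant log. The helper `timed` and the label strings are hypothetical; only Python's standard library is used.

```python
import logging
import time
from contextlib import contextmanager

_LOGGER = logging.getLogger(__name__)


@contextmanager
def timed(label, results=None):
    """Log (and optionally store in `results`) the wall-clock time of the wrapped block."""
    start = time.perf_counter()
    try:
        yield
    finally:
        elapsed = time.perf_counter() - start
        _LOGGER.warning("%s took %.3f s", label, elapsed)
        if results is not None:
            results[label] = elapsed


# Usage inside the code being measured (also works around `await` calls):
#
#     with timed("discovery block"):
#         ...  # the first code block under suspicion
#     with timed("second block"):
#         ...  # the second code block under suspicion
```

The measured value is wall-clock time, so it includes time spent waiting on the network, which is the quantity of interest here.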

@Thiiib

Thiiib commented Jan 4, 2025

Hi all

I just installed it. I had the other integration before, and without any uninstall or anything else, it blocked my HA with big lags and a reboot.

I needed to delete the configuration of my micro inverter, which looks to have been transferred here. As the inverters are offline, maybe that's the reason…

I hope it will work tomorrow for the first configuration, and will keep running when they are offline but « known ».

I didn't get this problem with the other integration… but I came here for up-to-date support 😉 and am looking for integration of the smartmeter DDZY422-D2

@gedger

gedger commented Jan 5, 2025

I have got into the habit of disabling this integration if I need to restart HA whilst the inverter is offline and then manually enabling the next day. I haven't noticed the slow down but I do get a huge log output if I forget. Fingers crossed for #72.

@Thiiib

Thiiib commented Jan 5, 2025

I have got into the habit of disabling this integration if I need to restart HA whilst the inverter is offline and then manually enabling the next day. I haven't noticed the slow down but I do get a huge log output if I forget. Fingers crossed for #72.

I will try to think about it before restarting… as a beginner with HA, I restart very often 😂

@lenwar
Author

lenwar commented Jan 5, 2025

2025-01-05 15:18:14.632 DEBUG (MainThread) [custom_components.solarman.discovery] discover  <-- FROM HERE
2025-01-05 15:18:14.632 DEBUG (MainThread) [custom_components.solarman.discovery] _discover: Broadcasting on 192.168.60.41
2025-01-05 15:18:14.632 DEBUG (MainThread) [custom_components.solarman.discovery] _discover
2025-01-05 15:18:15.635 DEBUG (MainThread) [custom_components.solarman.discovery] _discover_all   <-- TO HERE
2025-01-05 15:18:15.635 DEBUG (MainThread) [custom_components.solarman.discovery] _discover_all: Broadcasting on [IPv4Network('192.168.60.16/29'), IPv4Network('172.30.232.0/23'), IPv4Network('172.30.32.0/23')]
2025-01-05 15:18:15.635 DEBUG (MainThread) [custom_components.solarman.discovery] _discover  
2025-01-05 15:18:16.637 DEBUG (MainThread) [custom_components.solarman.discovery] discover: attempts left: 0, aborting.

I was able to perform some tests with debugging on.
The above is the moment it starts to hang, specifically between FROM HERE and TO HERE (so everything seems to 'stall' up to the point where '_discover_all' is executed).
I tested it with only one integration active. The delay was very minimal then. But I have 5 (micro)inverters, and the delay seems to stack by a lot. (from 'barely noticeable' with one inverter-integration active to 'annoyingly visible' with 5 inverter-integrations active.)

What exactly does 'broadcasting on ' do? (network-wise)

I'm asking because my inverters are in another subnet (as you can see later on, in the logging. My inverter is on 192.168.60.41 and my Home Assistant is in 192.168.60.16/29, which is another subnet.) I only have TCP ports 80 and 8899 open towards the subnet of my inverters.

Looking at the logging: is it correct that the integrations always wait for one another? (They don't seem to work completely in parallel, but I may be mistaken. I thought I noticed that from how they write their debug logging to the Home Assistant log file.)
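
To make the "stacking" concrete, here is a toy model rather than the integration's code: if each of n config entries waits out a discovery timeout one after another, the total wait grows with n, whereas waits that run concurrently cost roughly one timeout in total. `fake_discover` is a hypothetical stand-in; the log above shows about one second between `_discover` and `_discover_all`, which the example scales down to 0.05 s.

```python
import asyncio
import time


async def fake_discover(timeout: float) -> None:
    # Stand-in for one discovery attempt that waits `timeout` for a reply that never comes.
    await asyncio.sleep(timeout)


async def one_after_another(n: int, timeout: float) -> float:
    """Total time when each entry waits for the previous one to finish."""
    start = time.perf_counter()
    for _ in range(n):
        await fake_discover(timeout)
    return time.perf_counter() - start


async def all_at_once(n: int, timeout: float) -> float:
    """Total time when all entries wait concurrently."""
    start = time.perf_counter()
    await asyncio.gather(*(fake_discover(timeout) for _ in range(n)))
    return time.perf_counter() - start


if __name__ == "__main__":
    print("sequential:", asyncio.run(one_after_another(5, 0.05)))
    print("concurrent:", asyncio.run(all_at_once(5, 0.05)))
```

Whether the five integrations in this report really run one after another is exactly the open question; the sketch only shows why the delay would scale with the number of inverters if they do.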

@gedger

gedger commented Jan 5, 2025

I'm not sure if I should post this here or start a new issue but with the current 24.12.14 variant it no longer handles the inverter going off line cleanly, in fact it's the worst it's ever been!

IMG_0314

You can see from the log message that the issue occurred 3420 times!! It's not happy about an event that happens every day. I dare not turn on debug logging, as I don't think my server will cope with the volume.

This error originated from a custom integration.

Logger: custom_components.solarman.discovery
Source: custom_components/solarman/discovery.py:40
integration: Solarman (documentation, issues)
First occurred: 16:29:54 (3420 occurrences)
Last logged: 17:18:52

_discover: OSError: [Errno 126] Required key not available
Traceback (most recent call last):
  File "/usr/local/lib/python3.13/asyncio/tasks.py", line 507, in wait_for
    return await fut
           ^^^^^^^^^
  File "/usr/local/lib/python3.13/asyncio/queues.py", line 186, in get
    await getter
asyncio.exceptions.CancelledError

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/config/custom_components/solarman/api.py", line 101, in try_read_write
    if (response := await self.read_write(code, start, arg)) and (length := ilen(response)) is None and (expected := arg if code < CODE.WRITE_SINGLE_COIL else 1) and length != expected:
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/config/custom_components/solarman/api.py", line 78, in read_write
    return await self.modbus.read_holding_registers(start, arg)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/config/custom_components/solarman/include/pysolarmanv5/pysolarman.py", line 87, in read_holding_registers
    return await super().read_holding_registers(register_addr, quantity)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/config/custom_components/solarman/include/pysolarmanv5/pysolarmanv5_async.py", line 342, in read_holding_registers
    modbus_values = await self._get_modbus_response(mb_request_frame)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/config/custom_components/solarman/include/pysolarmanv5/pysolarmanv5_async.py", line 298, in _get_modbus_response
    mb_response_frame = await self._send_receive_modbus_frame(mb_request_frame)
                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/config/custom_components/solarman/include/pysolarmanv5/pysolarmanv5_async.py", line 285, in _send_receive_modbus_frame
    v5_response_frame = await self._send_receive_v5_frame(v5_request_frame)
                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/config/custom_components/solarman/include/pysolarmanv5/pysolarmanv5_async.py", line 249, in _send_receive_v5_frame
    v5_response = await asyncio.wait_for(
                  ^^^^^^^^^^^^^^^^^^^^^^^
        self.data_queue.get(), self.socket_timeout
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    )
    ^
  File "/usr/local/lib/python3.13/asyncio/tasks.py", line 506, in wait_for
    async with timeouts.timeout(timeout):
               ~~~~~~~~~~~~~~~~^^^^^^^^^
  File "/usr/local/lib/python3.13/asyncio/timeouts.py", line 116, in __aexit__
    raise TimeoutError from exc_val
TimeoutError

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/config/custom_components/solarman/discovery.py", line 40, in _discover
    await loop.sock_sendto(sock, DISCOVERY_MESSAGE[0], (ip, DISCOVERY_PORT))
  File "/usr/local/lib/python3.13/asyncio/selector_events.py", line 592, in sock_sendto
    return sock.sendto(data, address)
           ~~~~~~~~~~~^^^^^^^^^^^^^^^
OSError: [Errno 126] Required key not available

@lenwar
Author

lenwar commented Jan 5, 2025

Hi,
I've been going through the Discovery.py file.
I'm not very familiar with Python, but is it correct that the discovery is done via a UDP-broadcast towards a set ip-address?
(I'm not entirely sure how to interpret the code.)

If that is the case, then that concept is 'uncommon' at best and that won't work across networks. (UDP-broadcasting isn't forwarded on routers by default, as it is a common DoS-method.)

Normally you would send a UDP broadcast to the broadcast address (255.255.255.255) to hit every possible node within a subnet (which is the purpose of a broadcast). You then loop through all nodes that respond and perform the relevant actions for each response.

I'm not entirely familiar with HA's network stack/hygiene, but I can imagine that this can cause issues. If you remained within the same subnet, then technically it would work, but it's still an uncommon way of working.

What is the purpose of the discovery-packet towards the configured ip-address?

Edit: If my assumption is correct, you can replicate the behaviour yourself by defining 5 (or more) (fake) integrations with target IP addresses outside HA's own subnet, where no UDP broadcasting is forwarded. The more integrations you define, the stronger the effect.
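
For reference, the conventional pattern described above can be sketched as follows. This is not the integration's discovery.py; `broadcast_discover` is a hypothetical helper, and the payload and port used in the usage note are the ones visible in the packet capture later in this thread.

```python
import socket
import time


def broadcast_discover(message: bytes, port: int, address: str = "255.255.255.255",
                       timeout: float = 1.0) -> list:
    """Send one UDP datagram and collect (sender_ip, payload) replies until `timeout`."""
    replies = []
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        # Without SO_BROADCAST the OS refuses to send to a broadcast address.
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
        sock.sendto(message, (address, port))
        deadline = time.monotonic() + timeout
        while (remaining := deadline - time.monotonic()) > 0:
            sock.settimeout(remaining)
            try:
                data, (ip, _) = sock.recvfrom(1024)
            except socket.timeout:
                break  # nobody (else) answered in time
            replies.append((ip, data))
    return replies
```

For example, `broadcast_discover(b"WIFIKIT-214028-READ", 48899)` would return one entry per device that answered within the timeout, and an empty list when the broadcast cannot reach any device, as in the routed setup described here.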

@gedger

gedger commented Jan 5, 2025

In fact when the inverter goes offline it never stops trying, now at over 19,000 attempts!! I'm reverting back to a previous version....

IMG_0315

@davidrapan
Owner

I'm gonna need full debug log from that stack to be able to do something about it.

@lenwar
Author

lenwar commented Jan 7, 2025

@gedger
Hi,
"Required key not available" in relation to network connections is usually related to UDP/TCP availability.

Do you have any specific non-standard (for consumers) network layout? Is your inverter in another network segment? Or do you have any type of firewalling/proxying going on around Home Assistant and/or your inverters?

So anything other than "having HA in the same network segment as your inverter(s) without any firewall/proxy/etc."

@gedger

gedger commented Jan 7, 2025

No, everything is on the same subnet, the inverter just goes off-line as it does every night. It's something that has been introduced in the last few releases as I reverted back to an earlier version and everything is fine. I'm rarely in a position to grab a debug log at sunset at the moment but I'll see what I can do. I can't leave debug logging on as when it happens it already swamps the system with normal logging and I suspect debug may well bring it to its knees. Once I can get a debug log I'll create a new issue as it is obviously something different to your issues. I think my slowdown is caused by the sheer mass of logging rather than anything else?

@davidrapan
Owner

Nothing new was really introduced, and this type of error should not even happen in the usual course of action. That being said it could be thrown out I guess.

@lenwar
Author

lenwar commented Jan 7, 2025

I did some tcpdumping:

For completeness:
My Home Assistant's subnet is:

Network: 192.168.60.16/29
IP-address: 192.168.60.18
(Which makes broadcast address: 192.168.60.23 )

My inverters are in:
192.168.60.32/27
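
These figures can be double-checked with Python's standard `ipaddress` module. The networks are the ones listed above; 192.168.60.41 is the inverter address from the earlier debug log.

```python
import ipaddress

ha_net = ipaddress.ip_network("192.168.60.16/29")
inverter_net = ipaddress.ip_network("192.168.60.32/27")
inverter_ip = ipaddress.ip_address("192.168.60.41")

print(ha_net.broadcast_address)       # 192.168.60.23
print(inverter_ip in ha_net)          # False
print(inverter_ip in inverter_net)    # True
print(ha_net.overlaps(inverter_net))  # False
```

So a datagram sent to 192.168.60.23 is a broadcast for Home Assistant's own /29 only, and the inverter at 192.168.60.41 lies outside that range.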

tcpdump -nn -i eth0.616 'ip and src host 192.168.60.18 and (ether broadcast)' -vvvvv -X

12:00:52.691676 IP (tos 0x0, ttl 64, id 25501, offset 0, flags [DF], proto UDP (17), length 47)
    192.168.60.18.50057 > 192.168.60.23.48899: [udp sum ok] UDP, length 19
        0x0000:  4500 002f 639d 4000 4011 dda6 c0a8 3c12  E../c.@.@.....<.
        0x0010:  c0a8 3c17 c389 bf03 001b f779 5749 4649  ..<........yWIFI
        0x0020:  4b49 542d 3231 3430 3238 2d52 4541 44    KIT-214028-READ
12:00:55.931891 IP (tos 0x0, ttl 64, id 25752, offset 0, flags [DF], proto UDP (17), length 47)
    192.168.60.18.45045 > 192.168.60.23.48899: [udp sum ok] UDP, length 19
        0x0000:  4500 002f 6498 4000 4011 dcab c0a8 3c12  E../d.@.@.....<.
        0x0010:  c0a8 3c17 aff5 bf03 001b 0b0e 5749 4649  ..<.........WIFI
        0x0020:  4b49 542d 3231 3430 3238 2d52 4541 44    KIT-214028-READ
12:00:58.143712 IP (tos 0x0, ttl 64, id 26193, offset 0, flags [DF], proto UDP (17), length 47)
    192.168.60.18.33546 > 192.168.60.23.48899: [udp sum ok] UDP, length 19
        0x0000:  4500 002f 6651 4000 4011 daf2 c0a8 3c12  E../fQ@.@.....<.
        0x0010:  c0a8 3c17 830a bf03 001b 37f9 5749 4649  ..<.......7.WIFI
        0x0020:  4b49 542d 3231 3430 3238 2d52 4541 44    KIT-214028-READ
12:01:00.150557 IP (tos 0x0, ttl 64, id 26328, offset 0, flags [DF], proto UDP (17), length 47)
    192.168.60.18.46987 > 192.168.60.23.48899: [udp sum ok] UDP, length 19
        0x0000:  4500 002f 66d8 4000 4011 da6b c0a8 3c12  E../f.@.@..k..<.
        0x0010:  c0a8 3c17 b78b bf03 001b 0378 5749 4649  ..<........xWIFI
        0x0020:  4b49 542d 3231 3430 3238 2d52 4541 44    KIT-214028-READ

Looking on Home Assistant, I see the same packets: (the bad udp chksum are normal on Linux when broadcasting UDP)

tcpdump -i end0 'ip and (ether broadcast)' -nn -vvvvv -X

12:17:17.050730 IP (tos 0x0, ttl 64, id 61931, offset 0, flags [DF], proto UDP (17), length 47)
    192.168.60.18.44536 > 192.168.60.23.48899: [udp sum ok] UDP, length 19
        0x0000:  4500 002f f1eb 4000 4011 4f58 c0a8 3c12  E../..@.@.OX..<.
        0x0010:  c0a8 3c17 adf8 bf03 001b 0d0b 5749 4649  ..<.........WIFI
        0x0020:  4b49 542d 3231 3430 3238 2d52 4541 44    KIT-214028-READ
12:17:19.713701 IP (tos 0x0, ttl 64, id 61999, offset 0, flags [DF], proto UDP (17), length 47)
    192.168.60.18.46588 > 192.168.60.23.48899: [udp sum ok] UDP, length 19
        0x0000:  4500 002f f22f 4000 4011 4f14 c0a8 3c12  E.././@.@.O...<.
        0x0010:  c0a8 3c17 b5fc bf03 001b 0507 5749 4649  ..<.........WIFI
        0x0020:  4b49 542d 3231 3430 3238 2d52 4541 44    KIT-214028-READ
12:17:21.722759 IP (tos 0x0, ttl 64, id 62015, offset 0, flags [DF], proto UDP (17), length 47)
    192.168.60.18.33115 > 192.168.60.23.48899: [udp sum ok] UDP, length 19
        0x0000:  4500 002f f23f 4000 4011 4f04 c0a8 3c12  E../.?@.@.O...<.
        0x0010:  c0a8 3c17 815b bf03 001b 39a8 5749 4649  ..<..[....9.WIFI
        0x0020:  4b49 542d 3231 3430 3238 2d52 4541 44    KIT-214028-READ
12:17:23.731956 IP (tos 0x0, ttl 64, id 62016, offset 0, flags [DF], proto UDP (17), length 47)
    192.168.60.18.33238 > 192.168.60.23.48899: [udp sum ok] UDP, length 19
        0x0000:  4500 002f f240 4000 4011 4f03 c0a8 3c12  E../.@@.@.O...<.
        0x0010:  c0a8 3c17 81d6 bf03 001b 392d 5749 4649  ..<.......9-WIFI
        0x0020:  4b49 542d 3231 3430 3238 2d52 4541 44    KIT-214028-READ

So this looks okay. The NIC of HA is actually forwarding the packets to my router. (So I think we can rule out some low-level thing in the operating system.)
The wifikit-message appears to be specific for inverters??
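
As a quick check of what the captures contain, the bytes after the 20-byte IP header of the first packet can be decoded with the standard library. The hex strings below are copied from that dump; nothing else is assumed.

```python
import struct

# UDP header and payload of the first captured packet (bytes following the IP header).
udp_header = bytes.fromhex("c389 bf03 001b f779")
payload = bytes.fromhex("5749 4649 4b49 542d 3231 3430 3238 2d52 4541 44")

src_port, dst_port, length, checksum = struct.unpack("!HHHH", udp_header)

print(src_port, dst_port)       # 50057 48899
print(length - 8)               # 19 (UDP length minus the 8-byte header)
print(payload.decode("ascii"))  # WIFIKIT-214028-READ
```

This matches the tcpdump summary line: a 19-byte datagram to UDP port 48899 carrying the text WIFIKIT-214028-READ.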

Is there any reason why you are trying to 'broadcast' this, instead of 'multicasting' it, or even better unicasting it directly to the configured node? Is that something specific for the inverters?

When I look at the behaviour of the iOS-app (on my phone). I also get one broadcast-message when I start it up, but with the payload 'tryme', but it only tries once and then never again.

12:06:55.626671 IP (tos 0x0, ttl 64, id 55902, offset 0, flags [none], proto UDP (17), length 33)
    192.168.25.80.49999 > 192.168.25.255.48899: [udp sum ok] UDP, length 5
        0x0000:  4500 0021 da5e 0000 4011 ebcd c0a8 1950  E..!.^..@......P
        0x0010:  c0a8 19ff c34f bf03 000d 7600 7472 796d  .....O....v.trym
        0x0020:  6500 0000 0000 0000 0000                 e.........

(No broadcast traffic is happening in the subnet in which my inverters live. I've been staring at it for a few minutes now.)
Also, Home Assistant doesn't send any other broadcast traffic at all (except for the Solarman integration).

What exactly is the purpose of the UDP broadcast? I have commented it out as per your suggestion, and everything works fine then. It is impossible that it ever received any form of response ever, because of how my network is set up (it never reaches my inverters).

@davidrapan
Owner

The wifikit-message appears to be specific for inverters?

Yes, it's specific to the hw of the stick.

Is there any reason why you are trying to 'broadcast' this, instead of 'multicasting' it, or even better unicasting it directly to the configured node? Is that something specific for the inverters?

Yes.

What exactly is the purpose of the UDP broadcast?

I thought it's obvious. Mainly inverter discovery but can also react to IP change or detect incorrect serial number configured. 😉

@lenwar
Author

lenwar commented Jan 7, 2025

I thought it's obvious. Mainly inverter discovery but can also react to IP change or detect incorrect serial number configured.

Ah... Far too obvious for me to have thought of that 😃 duh... (I'm a Linux engineer, not a software developer, obviously 😉 )

I thought it's obvious. Mainly inverter discovery but can also react to IP change or detect incorrect serial number configured.

If you can make the discovery optional, then it would resolve the issue. (Give it a checkmark for detecting IP-changes or something like that, make a suggestion to give the inverter a static address or something like that, and so on) Technically, it would only be a workaround, of course.

I was just thinking. Could it be a bug in HA itself? (As broadcasting isn't common, it may not happen often.)

@davidrapan
Owner

That wasn't meant to be an insult or anything. 😆

@gedger

gedger commented Jan 7, 2025

I agree it seems unnecessary to do a full discovery every time you connect. In my view discovery should be an option at configuration time, and once the inverter is discovered the address should be hard coded. 99% of routers will always give the same address to a device, but it always makes sense to me to fix the address.
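
A minimal sketch of what such an option could look like, purely as an illustration of the proposal: the names `Endpoint`, `pick_endpoint`, and `rediscover` are hypothetical and do not exist in the integration. The discovery step is passed in as a callable, so with the option off no network probe happens at all.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Endpoint:
    host: str
    port: int = 8899  # data-logger port seen in the logs of this thread


def pick_endpoint(configured: Endpoint, rediscover: bool,
                  discover: Callable[[], Optional[str]]) -> Endpoint:
    """Return the endpoint to connect to; only probe the network when asked to.

    `discover` is any callable returning the device's current IP address, or None
    if nothing answered. It is never called when `rediscover` is False.
    """
    if not rediscover:
        return configured
    found = discover()
    return Endpoint(found, configured.port) if found else configured
```

With `rediscover=False` the configured (static) address is used as-is; with `rediscover=True` a changed IP address is picked up, and the configured one remains the fallback when discovery returns nothing.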

@lenwar
Author

lenwar commented Jan 7, 2025

That wasn't meant to be an insult or anything. 😆

I know. It probably got lost in translation somewhere. 😄

If you can make the (re)discovery optional, then the issue will be gone at least. It may be worth it to investigate further (outside of this Issue), as it could be a bug in HA itself?

@Nobeernogman

Same problem here. My home-assistant will become very slow and unresponsive when my inverter is offline.

When I disable the addon, it's butter smooth again.

@desolator7

Hi,

A few minutes ago, my last inverter came back online. From my angle, this resolves the issue.

For completeness: I installed the 24.12.22 version. This version causes issues.

I replaced api.py and discover.py with the current 'Main' version. This made the interface okay (Negligible delays. Probably only visible because I was focussed on it. My family didn't notice.) But there were still some delays in automations. (so, when pressing wall-buttons and so on. A little smaller than half a second, I think, but very visible.)

By removing the 'await' and 'discover()' phrases and commenting out line 110 and 111, everything is good. The interface is snappy (for as much as we can expect from the HA-interface ;) ) and the wall-buttons respond 'instantly'.

Hi, I have the same issues, editing this file solves them for me, it seems stable so far.
