daily HA crash at 12am with HA 2024.12 #380
I am working on v3.1.4 to get the Apple password login working again (it is using the old v3.0.5.9 method) and want to try changing the time the end-of-day tasks are done from midnight to 12:15 or another time to see if that solves your problem. Edit icloud3/icloud3_main.py. Change line 604: From : To: Restart HA. If you still have problems, try another time. |
Thanks. I will and report back. I'm also planning to htop my system at 12am (and 12.15am) to see if I get any clues. |
Changed the midnight setting to the following. I will report back in the next couple of days.
|
I can confirm that the CPU/load spike is related to icloud. It happened at 12:15am after I changed the config, with the same error in the HA log relating to the following. However, I have HA set up on a test server with a copy of my prod config and it is not having the same issue, so I am stumped. My prod server is an RPi 5 and test is an RPi 4, both running the exact same version of Raspbian etc.
|
One of the end of day tasks is doing some maintenance on the Waze History Database. If it needs to recalculate the time/distance, a message may be displayed in the first device’s info sensor. Check and disable the Waze History on the Configure Waze Screen and see if the problem occurs. |
You can keep using Waze. The db maintenance is controlled by the Waze history db option so that is the only one that really needs to be disabled |
Thanks. Understood. I'll turn it all off and then re-enable as we work it through. |
I am still getting these errors and a big load spike related to something that is happening with icloud3. Anything else that you can suggest I look into? It's just this one server/container as I can't reproduce it on either of my test setups. Very strange...
|
I found a problem with the Waze history maintenance. If there were duplicate or incorrect entries in the database, the maintenance would delete them. If there were a lot of them, the updates were not being 'committed' and that could lead to error notifications. Unzip the file below into the icloud3/support directory and restart HA. Maybe the history database on your production system is fine but the one on your test system has these problems. |
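To illustrate the kind of fix described above, here is a minimal sketch of deleting duplicate rows from a SQLite database and committing after each batch so no large uncommitted transaction is left open. It is not the actual iCloud3 code: the `locations` table, its columns, and both function names are hypothetical and only stand in for whatever the Waze history database really uses.

```python
import sqlite3


def delete_duplicate_rows(conn, batch_size=500):
    """Delete duplicate rows in batches, committing after each batch.

    Assumes a hypothetical table locations(id, lat_long_key, minutes, km);
    the row with the lowest id for each lat_long_key is kept.
    Returns the number of rows deleted.
    """
    cur = conn.execute(
        "SELECT id FROM locations WHERE id NOT IN "
        "(SELECT MIN(id) FROM locations GROUP BY lat_long_key)"
    )
    dup_ids = [row[0] for row in cur.fetchall()]

    deleted = 0
    for start in range(0, len(dup_ids), batch_size):
        batch = dup_ids[start:start + batch_size]
        conn.executemany(
            "DELETE FROM locations WHERE id = ?", [(i,) for i in batch]
        )
        conn.commit()  # make this batch permanent before starting the next
        deleted += len(batch)
    return deleted


def build_demo_db():
    """Create an in-memory database with a few duplicate rows for testing."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE locations ("
        "id INTEGER PRIMARY KEY, lat_long_key TEXT, minutes REAL, km REAL)"
    )
    rows = [
        ("a", 1.0, 2.0), ("a", 1.0, 2.0),
        ("b", 3.0, 4.0), ("b", 3.0, 4.0), ("b", 3.0, 4.0),
        ("c", 5.0, 6.0),
    ]
    conn.executemany(
        "INSERT INTO locations (lat_long_key, minutes, km) VALUES (?, ?, ?)",
        rows,
    )
    conn.commit()
    return conn
```

Committing per batch keeps each transaction small, which may matter on a low-powered system where one very large delete could hold the database busy for a long time.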
Thanks. I didn't have time to make this change yesterday so I will see what happens tonight after I update waze history. Not sure if this helps but last night I had the issue and it caused the HA container to crash. So I have now reproduced the error that causes HA to crash and it happened at 12.15am and does seem to be related to waze/icloud. log from HA at the time:
icloud log at the time:
|
Sorry, I forgot to say that I don't really need Waze history, so I can delete the database if that will help and is possible without causing other issues. For now I will just use the new waze_history.py and report back. |
Hey Gary - for some reason this update to waze_history causes issues with icloud services/actions. None are available after updating and this breaks a number of automations that rely on icloud3.action for my device trackers. This wouldn't be a problem if this issue was on my test setup but the issue is on prod... related log entries.
|
My bad. I sent you a waze_history.py file for v3.1.4. That would create all kinds of problems and iCloud3 would not start due to version incompatibilities. The v3.0.5.9 file is below. |
sorted. thanks. |
Not sure if this is related, but my wife's iPhone is no longer getting GPS updates from icloud3. My iPhone and my daughter's are still working fine. All phones are in 'family sharing' on my iCloud account and all worked previously. Location is working fine in the Find My app, just not with icloud3. Note that this issue is not related to using the waze_history.py (as I noticed this issue yesterday). This error originated from a custom integration.
|
A quick guess would be that HA has appended a '_2' to the device_tracker and sensor entities of your wife. iCloud3 is using device_tracker and sensor entities (without the '_2'). Check 'Developer tools > States' to see what entities HA is using. |
Nope. Her phone/setup does not use the HA app (just me but I don't have the xx_2 issue either) |
Check the iCloud3 entities, not the HA app entities. HA may have changed the ic3 entity names and added a _2 on a restart. Then they would both exist and ic3 would update device_tracker.wife_phone_2 while the Dashboard and automations would be looking at device_tracker.wife_phone. Same for the sensors. The best/easiest way to fix this is to disable the iCloud3 integration and restart HA 2 times (yes, 2 times, to clear its cache). |
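The check described above can be sketched in a few lines of Python. This is only an illustration that works on a plain list of entity ids (for example, copied from Developer tools > States); the function name `find_suffixed_duplicates` is hypothetical and it does not call any Home Assistant API.

```python
import re


def find_suffixed_duplicates(entity_ids):
    """Return (base, suffixed) pairs where both 'x' and 'x_<n>' exist.

    A pair such as device_tracker.wife_phone / device_tracker.wife_phone_2
    suggests that a second copy of the entity was created with a numeric
    suffix.
    """
    ids = set(entity_ids)
    pairs = []
    for entity_id in sorted(ids):
        match = re.fullmatch(r"(.+)_(\d+)", entity_id)
        if match and match.group(1) in ids:
            pairs.append((match.group(1), entity_id))
    return pairs


if __name__ == "__main__":
    sample = [
        "device_tracker.wife_phone",
        "device_tracker.wife_phone_2",
        "sensor.wife_phone_battery",
    ]
    for base, suffixed in find_suffixed_duplicates(sample):
        print(f"{base} also exists as {suffixed}")
```

An empty result means no entity in the list has a numerically suffixed twin, which matches the "none of my devices have _2" outcome reported in the next reply.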
none of my icloud3 devices/entities have _2 at the end. They all look correct... |
That's actually good. I assume you mean that iCloud3 is no longer getting location updates from her phone. If the FindMy app on your phone shows her location, then her Location Sharing is enabled. Check the d_t entity assigned to her phone in Stage 4 & 5 when iC3 starts to make sure that is correct. Stage 4 shows the phone names Apple is sending to iC3. |
Maybe her phone needs a reboot. I will update it to latest iOS and see if this clears the issue |
Some progress... This morning at 12.15am the cpu/load spike was very short/quick. I am still seeing some messages relating to system load but not a message specific to waze. So I think there is still something weird going on. I will continue to monitor. Let me know if you have any other thoughts/suggestions.
|
... and in other good news. Updating my wife's phone to iOS 18.2 and a restart seems to have cleared the issue with location updates to icloud3. So I think this issue was a glitch on the client (iphone). |
I was mistaken that turning off the Waze history would stop the end-of-day maintenance from running. It will run anyway on v3.0.5.9. I will change that in v3.1.4. Rename the .storage/icloud3/waze_location_history.db to something else and restart HA. If Waze history is disabled, it will not be recreated on a restart, and that will prove whether it is the source. When the maintenance runs at night, it calls Waze to update the location time and distance info, which issues HTTP calls to Waze. It waits for one update to finish before starting the next one, and it might be that your system cannot handle the load. My RPi 4 and 5 chug along without any problems. The purpose of keeping the history is to eliminate Waze requests for the same locations as you drive, especially when close to home with short location request intervals. |
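The rename step above can also be done with a short script instead of by hand. This is a minimal sketch, assuming the `.storage` path is relative to the Home Assistant config directory (often `/config`, but that is an assumption about your install); the function name and the `.bak` suffix are hypothetical. Stop or restart HA around it as described above.

```python
from pathlib import Path


def set_aside_waze_history(config_dir, suffix=".bak"):
    """Rename waze_location_history.db so iCloud3 no longer finds it.

    Returns the new path, or None if the database file does not exist.
    The original file is kept under the new name so it can be restored.
    """
    db_path = (
        Path(config_dir) / ".storage" / "icloud3" / "waze_location_history.db"
    )
    if not db_path.exists():
        return None

    backup_path = db_path.with_name(db_path.name + suffix)
    if backup_path.exists():
        # Never overwrite an earlier backup silently.
        raise FileExistsError(f"{backup_path} already exists")

    db_path.rename(backup_path)
    return backup_path
```

For example, `set_aside_waze_history("/config")` would leave the data in `waze_location_history.db.bak`; renaming it back restores the history.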
thanks. I will fully disable and delete the db. Report back tomorrow |
More progress. After totally disabling Waze and renaming the Waze history db, there is no more CPU/load spike. So it does look like something has gone wrong with my Waze db (but it only affects my prod system). My test systems are both direct copies of prod (including the Waze db), so it is still strange. I will re-enable Waze when you release the next update to icloud3 and see if the issue is resolved. Will a new db for Waze history be created if/when I re-enable Waze? |
@gcobb321 I am seeing this same behaviour - and it moved when you moved the db rebuild. I'm not going to troubleshoot further or press on the issue as I'm planning to move my setup off of my RPi to a MiniPC in the coming weeks. Just wanted you to be aware. |
It would be interesting to see what happens if you copy the .storage/icloud3/waze_location_history.db file from the production to the test system. In any case, I do not see any problems. It’s just cleaning up the database. |
Thanks for your detailed post and investigation. I have Waze turned off and the problem (core dump/system crash) has not returned. I am not getting nearly the same load spike or CPU load warnings on my server. If a service/integration doing maintenance (anything, really) causes the container/service to crash, there is something going badly wrong. I'm happy to keep monitoring and do any testing if that could help to get it resolved. Happy New Year all. |
Hey Gary - hope all is well and you are having a nice holiday. Thanks as always for the great work on this integration.
I'm still investigating this issue and I am not 100% sure it is caused by icloud3 - but it might be worth looking into.
I am currently running 3.0.5.9 due to the issue with apple authentication.
I think HAOS 14 (so 2024.12.x) includes an updated version of python. Since updating to 2024.12 I have been experiencing an issue where my HA container crashes at 12am. It recovers/restarts but the crash is screwing with a number of my sensors that get reset at 12am. This is how I worked out I had an issue at all.
I have done a lot of investigating and I have a feeling it could be related to the icloud3 integration. My clue is that I set up a test HA environment in a VM (HAOS) and have been running it in parallel with my prod system. I disabled icloud in my test environment and HA does not crash.
In my prod environment (where I am still experiencing this issue), I see in the HA log at the time it crashes the following entry.
2024-12-11 00:00:10.446 ERROR (MainThread) [homeassistant.components.websocket_api.http.connection] [140732429842848] Jago from 172.1.0.7: Client unable to keep up with pending messages. Reached 4096 pending messages. The system's load is too high or an integration is misbehaving; Last message was: b'{"type":"event","event":{"c":{"sensor.jago_iphone_info":{"+":{"lc":1733835610.4462795,"c":"01JERAFPAENT5WWBKX6QX48CMP"}}}},"id":7}'
Happy to provide additional info as needed. Good if you could take a look when you get back.
Today I have reverted back to 2024.11.3 in prod as I want to confirm the issue is linked to 2024.12 (HAOS 14).