description |
---|
This page is designed to be printed |
As much as we don't want it to happen, sometimes things die. Sometimes power runs out.
This documentation can be followed for the scenarios below:
- Power outage
- Server hardware failure
- Press the display button on the UPS and confirm
- It is powered on
- It has power from mains
- Check all infrastructure is powered on (look for power lights)
Refer to the Physical Hardware section on the left
- Remove the faceplate from the NTD and confirm it is powered on
- Confirm networking equipment is powered on
- Network Test
- Ping 8.8.8.8 to confirm internet is working
- Ping google.com to confirm external DNS is working
- Ping setup.ui.com to confirm internal DNS is working
- Confirm Proxmox VE is accessible
- Internal link loads login page (use Linux credentials)
- All storage pools are online
- VMs show and are booting (there is a delay between boots, so some may be on and others off)
- Confirm Proxmox Backup Server is accessible
- Internal link loads login page (use Linux credentials)
- All storage pools are online
- Confirm the NAS is accessible (creds in vault)
- Open Storage Manager and confirm 'system is healthy'
- Log into UptimeKuma and confirm that all services are green. It may take 15 minutes for them all to report as online
- Confirm servers are reporting data back to NetData and check for any alerts
Alerts related to disk backlog, IO delay or disk usage can be ignored for now. Backlog and IO delay can be caused by multiple VMs starting up at once
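If a laptop with Python is handy, the ping and login page checks above can be roughly scripted. This is only a sketch under a few assumptions: it is run from a Linux machine on the internal network (the `ping` flags are the Linux ones), the two hostnames are placeholders rather than the real internal links, and 8006 and 8007 are the default web UI ports for Proxmox VE and Proxmox Backup Server.

```python
import socket
import subprocess

# Ping targets from the checklist; each one exercises a different layer.
PING_TARGETS = {
    "8.8.8.8": "internet",
    "google.com": "external DNS",
    "setup.ui.com": "internal DNS",
}

# Hypothetical hostnames (replace with the real internal links).
# 8006 and 8007 are the default Proxmox VE / Proxmox Backup Server ports.
WEB_UIS = {
    "Proxmox VE": ("pve.example.internal", 8006),
    "Proxmox Backup Server": ("pbs.example.internal", 8007),
}


def ping(host, run=subprocess.run):
    """Return True if a single ping to host succeeds (Linux ping flags)."""
    try:
        result = run(
            ["ping", "-c", "1", "-W", "2", host],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
    except OSError:
        return False  # ping binary missing or not executable
    return result.returncode == 0


def port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def run_checks(ping_fn=ping, port_fn=port_open):
    """Run every check and return a list of (label, passed) tuples."""
    results = [(f"ping {host} ({what})", ping_fn(host))
               for host, what in PING_TARGETS.items()]
    results += [(f"{name} web UI on port {port}", port_fn(host, port))
                for name, (host, port) in WEB_UIS.items()]
    return results
```

Calling `run_checks()` and printing each pair gives a quick pass/fail list. An open port only shows that the web UI is listening. Storage pools, VM boot state and the NAS health message still need to be checked by eye.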
An excessive but very thorough way to check that all services are online is to go through each page in this doco and try to access any "link to app" links
Unfortunately I'm unable to write specific doco here as there is too much to capture. Please refer to the troubleshooting section on the left panel and/or the hints below
- Compare the down services against the Cloudflare tunnels - are they all on the one server?
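The tunnel comparison can be done on paper, but a small Python sketch shows the idea. The service and server names below are made up for illustration. The real mapping would come from the Cloudflare tunnel configuration.

```python
from collections import defaultdict


def group_down_by_server(down_services, service_to_server):
    """Group down services by the server their tunnel runs on."""
    grouped = defaultdict(list)
    for service in down_services:
        # Services missing from the mapping are collected under "unknown".
        grouped[service_to_server.get(service, "unknown")].append(service)
    return dict(grouped)


# Hypothetical example data; replace with the real tunnel-to-server mapping.
service_to_server = {
    "wiki": "server-a",
    "vault": "server-a",
    "media": "server-b",
}
down = ["wiki", "vault"]

print(group_down_by_server(down, service_to_server))
# prints {'server-a': ['wiki', 'vault']}
```

If every down service lands under the same server, that server (or the tunnel running on it) is the first place to look.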