Use this procedure when StatusCake raises an alert about a library site being down.
- Determine whether the site is accessible by requesting the /health endpoint for the site
- Note the response
- Log into StatusCake
- Locate the uptime test for the library site in question
- Find the downtime root cause for the alert based on the time of the alert
- Click Extra details
- Note the error
- Log into Grafana
- Click Explore
- Add a label filter with the label namespace and the name of the affected site as the value
- Add a label filter with the label app and the value php
- Set the time range to include the start of the alert
- Note any errors
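The first step of the procedure, requesting /health and noting the response, can be sketched in Python as follows. This is a minimal sketch, not part of the platform tooling: the function name `check_health`, the timeout value, and the example domain are assumptions made for illustration.

```python
import urllib.error
import urllib.request


def check_health(site_url, timeout=10.0):
    """Request the site's /health endpoint and return (status_code, detail).

    status_code is None when no HTTP response was received at all.
    The function name and the default timeout are illustrative assumptions.
    """
    url = site_url.rstrip("/") + "/health"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status, "ok"
    except urllib.error.HTTPError as error:
        # The server answered, but with an error status such as 500.
        return error.code, str(error.reason)
    except OSError as error:
        # DNS failures, refused connections, and timeouts end up here
        # (urllib.error.URLError is a subclass of OSError).
        return None, str(error)
```

For example, `check_health("https://library.example.org")` (a placeholder domain) returns the status code to note down, or `None` together with the error text if the site could not be reached.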
The following is a list of known problems and how to address them.
- StatusCake reports Request timeout and EAI_AGAIN in the additional data.
You can ignore this error. StatusCake is experiencing DNS lookup issues. These are outside the scope of the platform.
- /health reports HTTP status code 500 and Grafana logs contain PHP exceptions
- Log into Lagoon UI
- Locate the project in question
- Locate the environment in question
- Locate Tasks
- Run the task Clear Drupal caches for the environment in question
- Wait for the task to finish
- Verify that /health reports HTTP status code 200
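The final verification step can be repeated until the site recovers. The following is a minimal sketch of such a retry loop; `wait_for_status` and its parameters are hypothetical, and `probe` stands for any callable that returns the current HTTP status code of /health.

```python
import time


def wait_for_status(probe, expected=200, attempts=10, delay=30.0, sleep=time.sleep):
    """Call probe() until it returns `expected`.

    Returns True as soon as the expected status is seen, or False once
    all attempts are used up. The attempt count and delay are assumptions.
    """
    for attempt in range(attempts):
        if probe() == expected:
            return True
        # Pause between attempts, but not after the last one.
        if attempt < attempts - 1:
            sleep(delay)
    return False
```

Passing `sleep` as a parameter keeps the loop easy to exercise without actually waiting.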
- /health reports HTTP status code 500 and Grafana logs contain PDOException: SQLSTATE[HY000]
Do nothing. This is likely caused by a restart of the underlying database. Experience shows that it takes about 20 minutes for the restart to complete.
Note that such errors will affect all sites running on the platform and will result in multiple alerts being raised.
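The known problems above can be summarised as a small decision function. This is a sketch for illustration only: the function name `triage`, its parameters, and the returned strings are assumptions, and matching the substring "Exception" is only a rough stand-in for "Grafana logs contain PHP exceptions".

```python
def triage(health_status, statuscake_details="", grafana_logs=""):
    """Map the noted observations to a known problem, if one matches.

    health_status: HTTP status code returned by /health (or None)
    statuscake_details: text from StatusCake's Extra details
    grafana_logs: log lines found in Grafana for the time of the alert
    """
    if health_status == 500:
        # Check the database error first: a PDOException is also a PHP exception.
        if "PDOException: SQLSTATE[HY000]" in grafana_logs:
            return "Do nothing: likely a database restart, which takes about 20 minutes"
        # Assumption: any other log line mentioning "Exception" is a PHP exception.
        if "Exception" in grafana_logs:
            return "Run the Lagoon task Clear Drupal caches, then recheck /health"
    if "EAI_AGAIN" in statuscake_details:
        return "Ignore: StatusCake is experiencing DNS lookup issues"
    return "Unknown problem: investigate further"
```

The database check comes before the generic exception check so that a database restart is not mistaken for a problem that clearing caches would fix.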