Runbook: Maintenance
Description
This alert triggers when the exporter for the health-check-proxy detects that a host is in maintenance.
Possible Causes
- Server is in planned maintenance mode
- Maintenance file was not removed after maintenance
Severity estimation
This alert is not critical unless the majority of the hosts of a specific streamcloud component in a geoCluster are in maintenance, or the workload for the streamcloud components in a geoCluster is too high.
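The majority rule above can be sketched as a small shell helper. This is only an illustration: the function name `maintenance_severity` is hypothetical, the host counts are assumed to be read manually from the dashboard, and "majority" is assumed to mean strictly more than half. The workload condition is not covered here and still has to be judged from the dashboards.

```shell
# Hypothetical helper, not part of any existing tooling.
# Usage: maintenance_severity <hosts_in_maintenance> <total_hosts>
# Both numbers are assumed to be read off the dashboard by hand.
maintenance_severity() {
    local in_maintenance="$1" total="$2"
    if [ "$total" -le 0 ]; then
        echo "unknown"
        return 1
    fi
    # Assumption: "majority" means strictly more than half of the hosts.
    if [ $(( in_maintenance * 2 )) -gt "$total" ]; then
        echo "critical"
    else
        echo "not-critical"
    fi
}
```

For example, `maintenance_severity 3 5` reports `critical`, while `maintenance_severity 2 4` reports `not-critical` because exactly half is not a majority.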
Troubleshooting Steps
- Check maintenance dashboard
  - Action:
    - Use the Streamcloud Load Overview dashboard to get an overview of how many hosts are in maintenance
- Check for maintenance announcements in the CloudStatus channel
  - Action:
    - Check the CloudStatus channel in Mattermost for any deployment or maintenance announcements
- Check alerts
  - Action:
    - Check the currently firing alerts and the alert history of the server with the Host Alert Overview dashboard in Grafana
- Remove maintenance file if certain that no reason for maintenance exists
  - Command / Action:
    - Before removing the maintenance file, check for firing alerts or the alert history to figure out the status of the host
      sudo rm /var/www/maintenance
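The removal step can be wrapped in a more cautious form. The following is a minimal sketch, not an established tool: the function name `remove_maintenance_file` is hypothetical, and the only detail taken from this runbook is the default path `/var/www/maintenance`. It lists the file first, since the modification time may hint at when maintenance was set, and removes it only after explicit confirmation.

```shell
# Hypothetical helper, not part of any existing tooling.
# Usage: remove_maintenance_file [path]
# The default path is the one named in this runbook.
remove_maintenance_file() {
    local file="${1:-/var/www/maintenance}"
    if [ ! -e "$file" ]; then
        echo "no maintenance file at $file"
        return 0
    fi
    # Show owner and modification time before touching anything.
    ls -l "$file"
    printf 'Remove %s? [y/N] ' "$file"
    read -r answer
    if [ "$answer" = "y" ]; then
        sudo rm "$file"
        echo "removed $file"
    else
        echo "kept $file"
        return 1
    fi
}
```

Any answer other than `y` leaves the file in place, so an accidental Enter does not take the host out of maintenance.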
Additional resources
Grafana dashboards: