ExporterDown
Runbook: ExporterDown
Description
This alert triggers when the exporter for the health-check-proxy job is down across all instances in the specified nanocosmosGroup and environment.
Possible Causes:
- The exporter service is not running.
- Network issues preventing the exporter from reporting.
- Configuration errors in the exporter setup.
Severity estimation
This alert is critical and must be resolved. The exporter exposes the metric that shows whether a server is in maintenance.
Troubleshooting steps
- Check Exporter Service Status
  - Command / Action:
    - Open a terminal session on the server and check the systemd service status:
      systemctl status hcp
  - Expected Result:
    - The service is running:
      $ systemctl status hcp
      ● hcp.service - health check proxy
           Loaded: loaded (/etc/systemd/system/hcp.service; enabled; vendor preset: e>
           Active: active (running) since Sun 2025-07-27 08:51:32 UTC; 1 weeks 2 days>
         Main PID: 2225299 (hcp)
            Tasks: 8 (limit: 9417)
           Memory: 150.5M
           CGroup: /system.slice/hcp.service
                   └─2225299 /opt/nanostream/hcp/hcp
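The status check can also be done in a form that is easy to script. The following is a minimal sketch, assuming the unit is named hcp as in the command above. The helper name next_step_for_state and its messages are illustrative and not part of hcp.

```shell
#!/bin/sh
# Map the state string printed by `systemctl is-active <unit>` to a next step.
# Hypothetical helper: the name and the messages are illustrative only.
next_step_for_state() {
  case "$1" in
    active)     echo "ok: service is running" ;;
    activating) echo "wait: service is still starting" ;;
    *)          echo "down: service is not running" ;;
  esac
}

# Usage on the affected server:
#   next_step_for_state "$(systemctl is-active hcp)"
```

Any state other than active or activating (for example inactive or failed) is treated as down, which leads to the restart step.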
- Restart Exporter Service
  - Command / Action:
    - If hcp is not running, try to restart the service:
      sudo systemctl restart hcp
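A restart only helps if the service stays up afterwards. The sketch below restarts the unit and re-checks it after a short wait. The function name restart_and_verify and the default wait of 5 seconds are assumptions for illustration.

```shell
#!/bin/sh
# Restart a unit and confirm that it is active afterwards.
# Arguments: unit name, seconds to wait before re-checking (default 5).
# Hypothetical helper: name and default wait are illustrative only.
restart_and_verify() {
  unit="$1"
  wait_s="${2:-5}"
  sudo systemctl restart "$unit" || return 1
  sleep "$wait_s"
  # `is-active --quiet` exits 0 only when the unit is active
  if systemctl is-active --quiet "$unit"; then
    echo "$unit is running"
  else
    echo "$unit is still down, check the logs" >&2
    return 1
  fi
}

# Usage on the affected server:
#   restart_and_verify hcp
```

If the function reports that the service is still down, continue with the log check.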
- Check Logs for Errors
  - Command / Action:
    - Look for any error messages in the hcp journal from the last hour:
      journalctl -u hcp --since "1 hour ago"
  - Expected Result:
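An hour of journal output can be long, so it may help to narrow it down to lines that look like failures. The following is a sketch only: the keyword list is an assumption about how hcp words its messages and should be adjusted to the real log format.

```shell
#!/bin/sh
# Keep only lines that look like failures.
# Hypothetical helper: the keyword list is an assumption, not taken from hcp.
filter_errors() {
  grep -iE 'error|fail|fatal|panic|refused|denied'
}

# Usage on the affected server:
#   journalctl -u hcp --since "1 hour ago" --no-pager | filter_errors
# Priority-based alternative (only useful if hcp logs with syslog priorities):
#   journalctl -u hcp --since "1 hour ago" --no-pager -p err
```

The keyword filter works on the message text, while -p err relies on the priority the service attaches to each entry, so the two can return different results.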
- Check Network Connectivity
  - Command / Action:
      ping -c 4 <hostname|IP>
  - Example:
      $ ping -c 4 t3b-vtrans-sa-gc-gru-02.vtrans-b.nanocosmos.de
      PING t3b-vtrans-sa-gc-gru-02.vtrans-b.nanocosmos.de (213.156.149.183) 56(84) bytes of data.
      64 bytes from 213.156.149.183: icmp_seq=1 ttl=48 time=216 ms
      64 bytes from 213.156.149.183: icmp_seq=2 ttl=48 time=215 ms
      64 bytes from 213.156.149.183: icmp_seq=3 ttl=48 time=218 ms
      64 bytes from 213.156.149.183: icmp_seq=4 ttl=48 time=215 ms
      --- t3b-vtrans-sa-gc-gru-02.vtrans-b.nanocosmos.de ping statistics ---
      4 packets transmitted, 4 received, 0% packet loss, time 3004ms
      rtt min/avg/max/mdev = 214.672/215.831/217.789/1.177 ms
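A successful ping only shows that the host answers ICMP; it does not show that the exporter port is reachable. The sketch below probes the exporter over HTTP. Port 9590 is taken from the Server section of the example configuration in this runbook; the /metrics path and both helper names are assumptions.

```shell
#!/bin/sh
# Build the URL to probe. Port 9590 comes from the example config in this
# runbook; the /metrics path is an assumption (common exporter convention).
metrics_url() {
  echo "http://$1:${2:-9590}/metrics"
}

# Interpret the HTTP status code reported by curl (000 means no connection).
interpret_code() {
  case "$1" in
    200) echo "reachable" ;;
    000) echo "unreachable" ;;
    *)   echo "unexpected status $1" ;;
  esac
}

# Usage, replacing the placeholder with the real host:
#   code="$(curl -s -o /dev/null -m 5 -w '%{http_code}' "$(metrics_url <hostname|IP>)")"
#   interpret_code "$code"
```

An "unreachable" result while ping succeeds points to a firewall rule or to the service not listening, rather than to a general network outage.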
- Verify Exporter Configuration
  - Command / Action:
      cat /etc/nanostream/hcp/config.yml
  - Expected Output:
    - The contents of the configuration file. Ensure all settings are correct.
  - Example:
      $ cat /etc/nanostream/hcp/config.yml
      Server:
        port: 9590
      Elastic:
        url: https://api.elasticsearch.fsn.hz.k8s.nanostream.cloud
        user: $secret
        password: $secret
        index: streamcloud_accounting
        timePeriode: 600
        services: h5live;nginx-rtmp
      LogFile: /var/log/hcp/hcp.log
      MaintenanceFile: /var/www/maintenance
      TestStream:
        url: https://bintu-play.nanocosmos.de/h5live/http/stream.mp4?url=rtmp://bintu-play.nanocosmos.de:1935/play&stream=CD6oL-2kE1g
        duration: 5
        interval: 15
      PromHealth:
        url: https://mimir.nanocosmos.cloud
        urlPath: /prometheus
        query: (group by (instance, environment, component, geoCluster, nanocosmosGroup, datacenterRegion) (ALERTS{instance="%s",alertstate="firing", health="unhealthy", nanocosmosGroup="streamcloud"}) ) or ( group by (instance, environment, component, geoCluster, nanocosmosGroup, datacenterRegion) ( up{nanocosmosGroup="streamcloud",instance="%s"}) ) * 0
        user: $secret
        pass: $secret
        interval: 60
      NetworkSpeedFile: /etc/nanostream/hcp/networkspeed
      Debug:
        on: false
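Reading the whole file by eye makes it easy to miss a section that is absent. The following sketch lists top-level sections that are missing from a configuration file. The section names mirror the example configuration in this runbook; whether each of them is mandatory for hcp is an assumption, and the helper name missing_sections is illustrative.

```shell
#!/bin/sh
# Print every expected top-level section that is missing from a config file.
# Hypothetical helper: the section list mirrors the example config in this
# runbook; whether each section is mandatory is an assumption.
missing_sections() {
  file="$1"
  for key in Server Elastic LogFile MaintenanceFile TestStream PromHealth; do
    grep -q "^${key}:" "$file" || echo "$key"
  done
}

# Usage on the affected server:
#   missing_sections /etc/nanostream/hcp/config.yml
```

Empty output means that all listed sections are present. The check only looks at section names and does not validate the values inside them.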
Additional resources
Dashboards:
Git repos: