Runbook: ExporterDown

Alert Details

Alert Name: ExporterDown
Expression: sum by (hostname) (up{job=~".+", nanocosmosGroup=~".+", instance=~".+", environment=~".+"}) == 0

Description

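This alert fires when every scrape target on a host is down. The expression sums the up metric per hostname label, so it triggers only when all matching targets for that hostname report 0; a single failed exporter on a host where other exporters are still up does not trigger it. The =~".+" matchers restrict the alert to series that carry a non-empty value for each of the four labels. Exporters collect metrics and expose them for Prometheus to scrape.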
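To make the aggregation concrete, the sketch below mimics sum by (hostname) over a few made-up up samples using awk. The hostnames, job names, values, and function names are hypothetical and only serve to show which hosts would end up with a sum of 0.

```shell
#!/usr/bin/env bash
# Mimic `sum by (hostname) (up) == 0` on sample data.
# Input format (hypothetical): "<hostname> <job> <up value>" per line.
hosts_with_all_targets_down() {
  awk '
    { sum[$1] += $3 }                 # add up the up values per hostname
    END {
      for (h in sum)
        if (sum[h] == 0) print h      # only hosts whose total is 0 would alert
    }
  ' | sort
}

# Made-up samples: web-01 has one target up, db-01 has none.
sample_up_values() {
  cat <<'EOF'
web-01 node 1
web-01 nginx 0
db-01 node 0
db-01 postgres 0
EOF
}

sample_up_values | hosts_with_all_targets_down
```

In this sample only db-01 is listed: web-01 still has one target up, so its sum is 1 and it would not alert.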

Possible Causes

Exporter service is not running
Network issues preventing Prometheus from reaching the exporter
Configuration errors in the exporter or Prometheus
Resource constraints on the host running the exporter
Firewall or security group rules blocking access

Troubleshooting Steps

1. Check the Status of Exporters

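The up metric is generated by Prometheus for each scrape target; exporters do not expose it themselves. First confirm that the exporter answers on its metrics endpoint:

curl -s -o /dev/null -w '%{http_code}\n' http://<exporter_host>:<exporter_port>/metrics

Then ask Prometheus how it sees the target:

curl -s http://<prometheus_host>:<prometheus_port>/api/v1/query --data-urlencode 'query=up{instance="<exporter_host>:<exporter_port>"}'

Expected Output

If the exporter is up, the first command prints 200 and the query result reports the value 1 for the series:

up{instance="<exporter_host>:<exporter_port>",job="<job_name>"} 1

If the exporter is down or unreachable, the first command prints 000 and the query result reports the value 0:

up{instance="<exporter_host>:<exporter_port>",job="<job_name>"} 0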
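When checking several exporters, it can help to turn the HTTP status that curl reports for the metrics endpoint into a short verdict. The helper below is a minimal sketch; the function name and the wording of the messages are assumptions, not part of any tool.

```shell
#!/usr/bin/env bash
# Interpret the HTTP status code curl reports for an exporter's /metrics endpoint.
# curl prints 000 for %{http_code} when no HTTP response was received.
interpret_metrics_status() {
  case "$1" in
    200) echo "exporter is serving metrics" ;;
    000) echo "no response: exporter down or unreachable" ;;
    *)   echo "unexpected HTTP status $1: check exporter configuration" ;;
  esac
}

# Example call (the placeholders are not real hosts):
# code=$(curl -s -o /dev/null -m 5 -w '%{http_code}' "http://<exporter_host>:<exporter_port>/metrics")
# interpret_metrics_status "$code"
```

The -m 5 option caps the request at five seconds so that an unreachable host does not stall the check.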

2. Restart the Exporter Service

If the exporter is down, restart its service. For example, Node Exporter running as a systemd unit can be restarted with:

sudo systemctl restart node_exporter

Expected Output

Check the status again to ensure the exporter is up:

sudo systemctl status node_exporter

The output should show the service as active (running).
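A service can take a moment to come up after a restart, so a single status check may be misleading. The sketch below polls systemctl is-active a few times before giving up. The function name, the default of five attempts, and the SYSTEMCTL override variable are assumptions made for illustration.

```shell
#!/usr/bin/env bash
# Poll a unit until it reports "active", or give up after a number of attempts.
# SYSTEMCTL can be overridden (for example for a dry run); it defaults to systemctl.
wait_until_active() {
  unit=$1
  attempts=${2:-5}
  i=0
  state=""
  while [ "$i" -lt "$attempts" ]; do
    # is-active prints the unit state, e.g. active, activating, failed
    state=$("${SYSTEMCTL:-systemctl}" is-active "$unit" 2>/dev/null)
    if [ "$state" = "active" ]; then
      echo "$unit is active"
      return 0
    fi
    i=$((i + 1))
    sleep 1
  done
  echo "$unit did not become active (last state: ${state:-unknown})"
  return 1
}

# Example call:
# sudo systemctl restart node_exporter && wait_until_active node_exporter
```

If the function reports that the unit did not become active, the service logs are the next place to look.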

3. Check Logs for Errors

If the exporter does not start, check the logs for any errors. For example, for Node Exporter:

sudo journalctl -u node_exporter

Expected Output

Look for any error messages that might indicate why the exporter is failing to start. Common issues include configuration errors, missing dependencies, or port conflicts.
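Long journals are easier to read when filtered for likely culprits. The sketch below greps log text for a few patterns that often accompany a failed start. The pattern list is illustrative rather than exhaustive, and the function name is hypothetical.

```shell
#!/usr/bin/env bash
# Filter log text on stdin for a few patterns that often explain a failed start.
# The pattern list is illustrative, not exhaustive.
scan_exporter_log() {
  grep -i -E 'address already in use|permission denied|no such file or directory|unknown (long |short )?flag|level=error' \
    || echo "no known error pattern found"
}

# Example call:
# sudo journalctl -u node_exporter --since "15 minutes ago" --no-pager | scan_exporter_log
```

A line containing "address already in use" points to a port conflict, while "permission denied" or "no such file or directory" usually points to a configuration or path problem.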

4. Verify Network Connectivity

Confirm that Prometheus can reach the exporter over the network. Run the checks from the Prometheus host, since that is the path the scrape takes. You can use tools like ping or telnet:

ping <exporter_host>
telnet <exporter_host> <exporter_port>

Expected Output

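ping should show successful replies. ICMP may be blocked even when the port is reachable, so treat a failed ping on its own as inconclusive.
telnet should establish a connection to the specified port. A refused connection or a timeout points to a stopped exporter, a wrong port, or a firewall rule.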
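On hosts where telnet is not installed, bash can test a TCP port on its own through its /dev/tcp redirection. The sketch below assumes bash and the coreutils timeout command are available; the function name and the three-second default are hypothetical choices.

```shell
#!/usr/bin/env bash
# Test whether a TCP port accepts connections, using bash's /dev/tcp redirection.
# Arguments: host, port, optional timeout in seconds (default 3).
# Returns 0 if the connection succeeds, non-zero otherwise.
port_open() {
  timeout "${3:-3}" bash -c ': >"/dev/tcp/$1/$2"' _ "$1" "$2" 2>/dev/null
}

# Example call, run from the Prometheus host (placeholders are not real hosts):
# port_open "<exporter_host>" "<exporter_port>" && echo reachable || echo unreachable
```

The timeout guards against firewalls that silently drop packets, which would otherwise leave the connection attempt hanging.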

5. Update Prometheus Configuration

If the exporter has been moved to a different host or port, update the Prometheus configuration and reload it:

Edit the Prometheus configuration file (usually prometheus.yml):

- job_name: '<job_name>'
  static_configs:
  - targets: ['<new_exporter_host>:<new_exporter_port>']

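Reload the Prometheus configuration. The HTTP reload endpoint is only available when Prometheus was started with --web.enable-lifecycle; otherwise, send SIGHUP to the Prometheus process instead: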

curl -X POST http://<prometheus_host>:<prometheus_port>/-/reload

Expected Output

Prometheus should reload the configuration without errors, and the exporter should be scraped successfully.
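A broken configuration file is rejected on reload, so it is worth validating the file first. The sketch below runs promtool check config and only then posts to the reload endpoint. It assumes promtool is installed and that the reload endpoint is enabled; the function name and the PROMTOOL and CURL override variables are hypothetical.

```shell
#!/usr/bin/env bash
# Validate the configuration, then ask Prometheus to reload it.
# Assumes promtool is installed and Prometheus runs with --web.enable-lifecycle.
# PROMTOOL and CURL can be overridden (for example for a dry run).
reload_prometheus() {
  config=$1
  base_url=$2
  if ! "${PROMTOOL:-promtool}" check config "$config" >/dev/null; then
    echo "config check failed, not reloading"
    return 1
  fi
  code=$("${CURL:-curl}" -s -o /dev/null -w '%{http_code}' -X POST "$base_url/-/reload")
  if [ "$code" = "200" ]; then
    echo "reload accepted"
  else
    echo "reload failed with HTTP status $code"
    return 1
  fi
}

# Example call (placeholders are not real hosts or paths):
# reload_prometheus /etc/prometheus/prometheus.yml "http://<prometheus_host>:<prometheus_port>"
```

After a successful reload, the target should be listed as up on the Prometheus targets page once the next scrape completes.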

Additional Steps

1. Monitor Exporter Performance

Continuously monitor the performance and availability of the exporter to ensure it remains operational. Use Prometheus and Grafana to set up dashboards and alerts.

2. Scale Exporter Instances

If the exporter frequently goes down due to high load, consider scaling the number of exporter instances to distribute the load.

By following these steps, you should be able to troubleshoot and resolve the “ExporterDown” alert. If the issue persists, further investigation into the specific exporter and its environment may be necessary.
