Runbook: ExporterDown Alert

Alert Details

  • Alert Name: ExporterDown
  • Expression: filebeat_up{job="filebeat-metrics", instance=~".+", environment=~".+"} == 0

Description

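This alert fires when the filebeat_up metric equals zero for any instance in the filebeat-metrics job. The instance=~".+" and environment=~".+" matchers only require both labels to be present and non-empty, so the alert applies to every environment rather than one specific environment. This indicates that the Filebeat exporter is down or not reporting metrics.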
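To see which instances currently match the alert, the expression can be run as an instant query against the Prometheus HTTP API. The following is a minimal sketch, assuming curl is installed and Prometheus is reachable at a base URL supplied by the operator. The function names are hypothetical and not part of this runbook.

```shell
# Print the alert expression exactly as listed under Alert Details.
exporter_down_expr() {
    printf '%s\n' 'filebeat_up{job="filebeat-metrics", instance=~".+", environment=~".+"} == 0'
}

# Run the expression as an instant query and print the raw JSON response.
# $1 = Prometheus base URL (assumption: supplied by the operator).
query_exporter_down() {
    curl -sS -G --max-time 10 \
        --data-urlencode "query=$(exporter_down_expr)" \
        "$1/api/v1/query"
}
```

Usage: query_exporter_down http://<prometheus_host>:<port>. Each entry in data.result of the response is one affected series, with its instance and environment labels.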

Possible Causes

  1. The Filebeat service is not running.
  2. Network issues affecting the reachability of the instance.
  3. Misconfiguration of the Filebeat exporter.
  4. Resource constraints on the instance.
  5. Firewall or security group rules blocking the exporter.

Troubleshooting Steps

1. Check Filebeat Service Status

Ensure that the Filebeat service is running on the affected instance.

# Example: Check the status of the Filebeat service
ssh <instance_hostname_or_ip> 'systemctl status filebeat'

Expected Output:

● filebeat.service - Filebeat sends log files to Logstash or directly to Elasticsearch.
   Loaded: loaded (/etc/systemd/system/filebeat.service; enabled; vendor preset: enabled)
   Active: active (running) since <date>; <time> ago
...
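For scripted checks, systemctl is-active prints a single state word, which is easier to act on than the full status output. The sketch below maps that word to the next step in this runbook. The helper names and the suggested actions are illustrative assumptions, not an established procedure.

```shell
# Map a systemd unit state to a suggested next runbook action.
# $1 = state word as printed by `systemctl is-active`.
next_action_for_state() {
    case "$1" in
        active)     echo "running: continue with step 3 (logs)" ;;
        inactive)   echo "stopped: continue with step 2 (start)" ;;
        failed)     echo "failed: check logs (step 3) before restarting" ;;
        activating) echo "starting: wait and re-check" ;;
        *)          echo "unknown state '$1': inspect manually" ;;
    esac
}

# Query the state on a remote host and print the suggested action.
# $1 = instance hostname or IP.
check_filebeat_state() {
    # is-active exits non-zero for any state other than active, so ignore the status.
    state=$(ssh "$1" 'systemctl is-active filebeat' || true)
    next_action_for_state "$state"
}
```

Usage: check_filebeat_state <instance_hostname_or_ip>.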

2. Start Filebeat Service

If the Filebeat service is not running, start it.

# Example: Start the Filebeat service
ssh <instance_hostname_or_ip> 'sudo systemctl start filebeat'

Expected Output:

(systemctl start prints nothing on success. Rerun the status check from step 1 to confirm that the service is active.)
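Because a successful start is silent, it helps to poll the service state for a short time afterwards. The following is a sketch, assuming five attempts two seconds apart are enough. Both values and the function names are arbitrary choices for illustration.

```shell
# Retry a command until it succeeds or the attempts run out.
# $1 = max attempts, $2 = delay in seconds between attempts, rest = command.
retry_until_ok() {
    attempts=$1
    delay=$2
    shift 2
    i=1
    while [ "$i" -le "$attempts" ]; do
        if "$@"; then
            return 0
        fi
        # Only sleep when another attempt will follow.
        if [ "$i" -lt "$attempts" ]; then
            sleep "$delay"
        fi
        i=$((i + 1))
    done
    return 1
}

# Start Filebeat on a remote host, then wait for it to reach the active state.
# $1 = instance hostname or IP.
start_and_verify_filebeat() {
    ssh "$1" 'sudo systemctl start filebeat' || return 1
    retry_until_ok 5 2 ssh "$1" 'systemctl is-active --quiet filebeat'
}
```

Usage: start_and_verify_filebeat <instance_hostname_or_ip>. A non-zero exit status means the service did not become active, in which case continue with the log review in step 3.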

3. Check Filebeat Logs

Review the Filebeat logs for any errors or warnings.

# Example: Review Filebeat logs
ssh <instance_hostname_or_ip> 'sudo journalctl -u filebeat --since "1 hour ago"'

Expected Output:

Nov 13 12:00:00 <hostname> filebeat[1234]: Starting filebeat.
Nov 13 12:00:01 <hostname> filebeat[1234]: <Log message>
...
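An hour of logs can be long, so filtering for problem lines first is often quicker. The sketch below reads log text on stdin and keeps lines that mention common severity keywords. The keyword list is an assumption and may need adjusting to the log format in use.

```shell
# Keep only log lines that mention an error-like or warning-like keyword.
# Reads log text on stdin, so it works with journalctl output or a saved file.
filter_problem_lines() {
    grep -iE 'error|warn|fatal|panic' || true
}

# Count matching lines, useful for a quick before/after comparison.
count_problem_lines() {
    grep -ciE 'error|warn|fatal|panic' || true
}
```

Usage: ssh <instance_hostname_or_ip> 'sudo journalctl -u filebeat --since "1 hour ago"' | filter_problem_lines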

4. Verify Network Connectivity

Ensure that the instance is reachable over the network.

# Example: Check network connectivity to the instance (send 4 probes, then exit)
ping -c 4 <instance_hostname_or_ip>

Expected Output:

PING <instance_hostname_or_ip> (<ip_address>) 56(84) bytes of data.
64 bytes from <instance_hostname_or_ip>: icmp_seq=1 ttl=64 time=0.123 ms
...
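A successful ping only shows that ICMP gets through. Prometheus scrapes over HTTP, so probing the exporter endpoint directly is a closer test. The sketch below assumes the exporter serves plain HTTP on the /metrics path (the Prometheus scrape defaults) and that the port is the same <port> used in the scrape configuration. The function names are hypothetical.

```shell
# Build the scrape URL for a target.
# $1 = host or IP, $2 = exporter port (assumption: same port Prometheus scrapes).
metrics_url() {
    printf 'http://%s:%s/metrics\n' "$1" "$2"
}

# Fetch the metrics page and print the filebeat_up sample, if any.
# Returns non-zero when the endpoint is unreachable or the metric is missing.
probe_exporter() {
    curl -sSf --max-time 5 "$(metrics_url "$1" "$2")" | grep '^filebeat_up'
}
```

Usage: probe_exporter <instance_hostname_or_ip> <port>. Run it from the Prometheus host so that the probe follows the same network path as the scrape.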

5. Check Filebeat Configuration

Ensure that the Filebeat configuration is valid and that its inputs and output point at the intended paths and hosts.

# Example: Check Filebeat configuration (sudo in case the file is readable only by root)
ssh <instance_hostname_or_ip> 'sudo cat /etc/filebeat/filebeat.yml'

Expected Output:

filebeat.inputs:
- type: log
  enabled: true
  paths:
    - /var/log/*.log
...
output.elasticsearch:
  hosts: ["http://localhost:9200"]
...
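Reading the file by eye can miss syntax errors, and comment lines can make the active settings hard to spot. The sketch below shows one way to print only the active lines and to let Filebeat validate the file itself with its test config subcommand. The function names are hypothetical, and both functions are meant to run on the instance.

```shell
# Print a config file without comment-only and blank lines.
# $1 = path to a YAML file such as filebeat.yml.
strip_yaml_comments() {
    grep -Ev '^[[:space:]]*(#|$)' "$1"
}

# Ask Filebeat to validate the file (run on the instance itself).
# $1 = path to the config file.
validate_filebeat_config() {
    sudo filebeat test config -c "$1"
}
```

Usage: validate_filebeat_config /etc/filebeat/filebeat.yml. A non-zero exit status points to a configuration problem, and the error text usually names the offending setting.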

6. Review Prometheus Configuration

Ensure that Prometheus is correctly configured to scrape metrics from the Filebeat exporter.

# Example: Check Prometheus configuration (run on the Prometheus server)
grep -A 10 'scrape_configs:' /etc/prometheus/prometheus.yml

Expected Output:

scrape_configs:
  - job_name: 'filebeat-metrics'
    static_configs:
      - targets: ['<instance_hostname_or_ip>:<port>']
...
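If the filebeat-metrics job is not the first entry under scrape_configs, the first ten lines may not include it. The sketch below prints only the block for a named job. It assumes each job starts with a "- job_name:" line, as in the expected output above, and it is a line-based filter rather than a real YAML parser.

```shell
# Print the scrape_configs entry for one job.
# $1 = path to prometheus.yml, $2 = job name.
show_job_block() {
    awk -v job="$2" -v q="'" '
        # A non-indented key (for example "remote_write:") ends any job block.
        /^[^[:space:]#-]/ { printing = 0 }
        # A new job starts: print from here only if the name matches.
        /^[[:space:]]*-[[:space:]]*job_name:/ {
            name = $0
            sub(/^[^:]*:[[:space:]]*/, "", name)  # drop the "- job_name:" prefix
            gsub("[\"" q "]", "", name)           # drop surrounding quotes
            sub(/[[:space:]]+$/, "", name)        # drop trailing whitespace
            printing = (name == job)
        }
        printing { print }
    ' "$1"
}
```

Usage: show_job_block /etc/prometheus/prometheus.yml filebeat-metrics. Empty output means no job with that name was found in the file.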

Additional Steps

If the issue persists, consider:

  • Restarting the affected instance.
  • Checking for resource constraints on the instance (CPU, memory, disk space).
  • Contacting the network or system administrator for further investigation.
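As one concrete resource check, the sketch below flags filesystems at or above a usage threshold by parsing df -P output. The 90 percent threshold in the usage line and the function name are assumptions. Mount points that contain spaces are not handled.

```shell
# Print mount points whose usage is at or above a threshold.
# $1 = threshold in percent; reads `df -P` output on stdin.
flag_full_filesystems() {
    awk -v limit="$1" 'NR > 1 {
        use = $5
        sub(/%/, "", use)   # "95%" -> "95"
        if (use + 0 >= limit + 0) print $6 " " $5
    }'
}
```

Usage: ssh <instance_hostname_or_ip> 'df -P' | flag_full_filesystems 90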