Runbook: ExporterDown Alert
Alert Details
- Alert Name: ExporterDown
- Expression:
filebeat_up{job="filebeat-metrics", instance=~".+", environment=~".+"} == 0
Description
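This alert fires when the filebeat_up metric equals zero for any instance in the filebeat-metrics job. The instance=~".+" and environment=~".+" matchers only require those labels to be present and non-empty, so the alert covers every environment rather than one specific environment. A value of zero indicates that the Filebeat exporter is down or not reporting metrics.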
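To see which instances currently match the alert expression, the expression can be run against the Prometheus HTTP API. The following is a minimal sketch: the Prometheus address (http://localhost:9090), the PROM_URL variable, and the function name list_down_exporters are assumptions for illustration, not part of the original runbook.

```shell
# Assumption: Prometheus listens here; override PROM_URL for your environment.
PROM_URL="${PROM_URL:-http://localhost:9090}"

# The alert expression exactly as defined in this runbook.
QUERY='filebeat_up{job="filebeat-metrics", instance=~".+", environment=~".+"} == 0'

# Run an instant query; the JSON response lists every series that matches.
list_down_exporters() {
  curl -sS --get "${PROM_URL}/api/v1/query" --data-urlencode "query=${QUERY}"
}
```

Calling list_down_exporters returns JSON whose result array contains one entry per affected instance; an empty array means nothing currently matches.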
Possible Causes
- The Filebeat service is not running.
- Network issues affecting the reachability of the instance.
- Misconfiguration of the Filebeat exporter.
- Resource constraints on the instance.
- Firewall or security group rules blocking the exporter.
Troubleshooting Steps
1. Check Filebeat Service Status
Ensure that the Filebeat service is running on the affected instance.
# Example: Check the status of the Filebeat service
ssh <instance_hostname_or_ip> 'systemctl status filebeat'
Expected Output:
● filebeat.service - Filebeat sends log files to Logstash or directly to Elasticsearch.
Loaded: loaded (/etc/systemd/system/filebeat.service; enabled; vendor preset: enabled)
Active: active (running) since <date>; <time> ago
...
2. Start Filebeat Service
If the Filebeat service is not running, start it.
# Example: Start the Filebeat service
ssh <instance_hostname_or_ip> 'sudo systemctl start filebeat'
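Expected Output:
(none: systemctl start prints nothing on success and exits with status 0. Re-run the status check from step 1 to confirm that the service is active.)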
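Steps 1 and 2 can be combined so that the service is started only when it is not already active. This is a sketch under the assumption that the operator has SSH access and sudo rights on the instance; the function name ensure_filebeat_running is hypothetical.

```shell
# Start Filebeat only if it is not already active, then report the final state.
# $1: hostname or IP of the affected instance.
ensure_filebeat_running() {
  host="$1"
  if ssh "$host" 'systemctl is-active --quiet filebeat'; then
    echo "filebeat already active on $host"
    return 0
  fi
  ssh "$host" 'sudo systemctl start filebeat' || return 1
  # is-active prints the unit state (for example "active" or "failed").
  ssh "$host" 'systemctl is-active filebeat'
}
```

Usage: ensure_filebeat_running <instance_hostname_or_ip>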
3. Check Filebeat Logs
Review the Filebeat logs for any errors or warnings.
# Example: Review Filebeat logs
ssh <instance_hostname_or_ip> 'sudo journalctl -u filebeat --since "1 hour ago"'
Expected Output:
Nov 13 12:00:00 <hostname> filebeat[1234]: Starting filebeat.
Nov 13 12:00:01 <hostname> filebeat[1234]: <Log message>
...
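An hour of logs can be long, so it may help to reduce the output to lines that mention an error or warning. The filter below is a rough sketch: it matches the words "error" and "warn" case-insensitively, which is an assumption about how problems appear in the log text, and the function name filter_filebeat_problems is hypothetical.

```shell
# Keep only lines that mention an error or warning (case-insensitive).
# Reads log text on stdin, so it works with journalctl output or a saved file.
filter_filebeat_problems() {
  # "|| true" keeps the exit status zero when no line matches.
  grep -iE 'error|warn' || true
}
```

Usage: ssh <instance_hostname_or_ip> 'sudo journalctl -u filebeat --since "1 hour ago" --no-pager' | filter_filebeat_problems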
4. Verify Network Connectivity
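Ensure that the instance is reachable over the network. A successful ping confirms ICMP reachability only; it does not prove that the exporter port is open.
# Example: Check network connectivity to the instance (send 4 probes, then exit)
ping -c 4 <instance_hostname_or_ip>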
Expected Output:
PING <instance_hostname_or_ip> (<ip_address>) 56(84) bytes of data.
64 bytes from <instance_hostname_or_ip>: icmp_seq=1 ttl=64 time=0.123 ms
...
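Because a firewall or security group can block the exporter port while ICMP still works, it is worth requesting the metrics endpoint directly from the Prometheus host. This sketch assumes the exporter serves metrics over plain HTTP at /metrics (the Prometheus default scrape path); the function name fetch_filebeat_up is hypothetical, and the port is whatever <port> is configured in the scrape target.

```shell
# Fetch the exporter's metrics page and print only the filebeat_up sample.
# $1: hostname or IP, $2: exporter port (the <port> scraped by Prometheus).
# Assumption: metrics are served at /metrics, the Prometheus default path.
fetch_filebeat_up() {
  curl -sS --max-time 5 "http://$1:$2/metrics" | grep '^filebeat_up'
}
```

Usage: fetch_filebeat_up <instance_hostname_or_ip> <port>. A non-zero exit status means the endpoint was unreachable or did not expose filebeat_up.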
5. Check Filebeat Configuration
Ensure that the Filebeat configuration is correct.
# Example: Check Filebeat configuration
ssh <instance_hostname_or_ip> 'cat /etc/filebeat/filebeat.yml'
Expected Output:
filebeat.inputs:
  - type: log
    enabled: true
    paths:
      - /var/log/*.log
...
output.elasticsearch:
  hosts: ["http://localhost:9200"]
...
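Reading the file by eye can miss syntax errors, so Filebeat's built-in test subcommands are a useful complement. The sketch below assumes the configuration lives at /etc/filebeat/filebeat.yml, as in the example above, and that sudo is available on the instance; the function name validate_filebeat_config is hypothetical.

```shell
# Validate the configuration file, then the connection to the configured output.
# $1: hostname or IP of the affected instance.
validate_filebeat_config() {
  host="$1"
  ssh "$host" 'sudo filebeat test config -c /etc/filebeat/filebeat.yml' || return 1
  ssh "$host" 'sudo filebeat test output -c /etc/filebeat/filebeat.yml'
}
```

Usage: validate_filebeat_config <instance_hostname_or_ip>. The output test is skipped if the configuration test fails.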
6. Review Prometheus Configuration
Ensure that Prometheus is correctly configured to scrape metrics from the Filebeat exporter.
# Example: Check Prometheus configuration
grep -A 10 'scrape_configs:' /etc/prometheus/prometheus.yml
Expected Output:
scrape_configs:
  - job_name: 'filebeat-metrics'
    static_configs:
      - targets: ['<instance_hostname_or_ip>:<port>']
...
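Beyond inspecting the file, the configuration can be validated with promtool, and the running server can be asked which targets it is actually scraping. This is a sketch under assumptions: promtool is installed on the Prometheus host, Prometheus listens on http://localhost:9090 unless another base URL is passed, and both function names are hypothetical.

```shell
# Validate the Prometheus configuration file.
# $1 (optional): path to the configuration file.
check_prometheus_config() {
  promtool check config "${1:-/etc/prometheus/prometheus.yml}"
}

# Ask the running server which targets it scrapes and whether each is healthy.
# $1 (optional): Prometheus base URL. Assumption: defaults to localhost:9090.
list_scrape_targets() {
  curl -sS "${1:-http://localhost:9090}/api/v1/targets"
}
```

In the JSON returned by list_scrape_targets, each active target carries a health field and a lastError field, which together show whether the filebeat-metrics target is being scraped successfully.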
Additional Steps
If the issue persists, consider:
- Restarting the affected instance.
- Checking for resource constraints on the instance.
- Contacting the network or system administrator for further investigation.
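For the resource-constraint check, a quick snapshot of load, memory, and disk usage is often enough to rule out an exhausted instance. The sketch below assumes a Linux instance with uptime, free, and df available; the function name resource_snapshot is hypothetical.

```shell
# Print load average, memory usage, and root filesystem usage for an instance.
# $1: hostname or IP of the affected instance.
resource_snapshot() {
  ssh "$1" 'uptime; free -m; df -h /'
}
```

Usage: resource_snapshot <instance_hostname_or_ip>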