Runbook: FilebeatReadRateStagnating Alert
Alert Details
- Alert Name: FilebeatReadRateStagnating
- Expression:
rate(filebeat_libbeat_output_read_bytes_total{instance=~".+", environment=~".+"}[5m]) == 0
Description
This alert fires when the 5-minute rate of filebeat_libbeat_output_read_bytes_total is exactly zero for an instance. The label matchers instance=~".+" and environment=~".+" select every series with non-empty instance and environment labels, so the alert can fire for any instance in any environment. A zero rate means Filebeat read no data during the window, which can have several causes.
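To see which series currently match before logging in to any host, the alert expression can be run against the Prometheus HTTP API. The following is a minimal sketch and not part of the original runbook: PROM_URL and the function name list_stagnant_series are assumptions, and the default URL should be replaced with your own Prometheus address.

```shell
# Assumption: PROM_URL is the base URL of your Prometheus server.
PROM_URL="${PROM_URL:-http://localhost:9090}"

# The alert expression, copied verbatim from the Alert Details section.
ALERT_EXPR='rate(filebeat_libbeat_output_read_bytes_total{instance=~".+", environment=~".+"}[5m]) == 0'

# list_stagnant_series (hypothetical helper): run the expression as an
# instant query and print the raw JSON response from /api/v1/query.
list_stagnant_series() {
  curl -sS -G "${PROM_URL}/api/v1/query" --data-urlencode "query=${ALERT_EXPR}"
}

# Usage: list_stagnant_series
```

Each entry in the returned result array carries the instance and environment labels of one affected series, which tells you where to run the steps below.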
Possible Causes
- Filebeat service is not running.
- Network issues affecting the connection to the input source.
- Misconfiguration of Filebeat.
- No new log data to read.
- Resource constraints on the instance running Filebeat.
Troubleshooting Steps
1. Check Filebeat Service Status
Ensure that the Filebeat service is running on the affected instance.
# Example: Check the status of the Filebeat service
ssh <instance_hostname_or_ip> 'systemctl status filebeat'
Expected Output:
● filebeat.service - Filebeat sends log files to Logstash or directly to Elasticsearch.
Loaded: loaded (/etc/systemd/system/filebeat.service; enabled; vendor preset: enabled)
Active: active (running) since <date>; <time> ago
...
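For scripted checks, the one-word state printed by systemctl is-active is easier to branch on than the full status output. The sketch below maps that state to the next step of this runbook; next_action is a hypothetical helper name, and the step numbers refer to the steps in this document.

```shell
# next_action (hypothetical helper): map the state printed by
# `systemctl is-active filebeat` to a suggested next step in this runbook.
next_action() {
  case "$1" in
    active)          echo "running: continue with step 3 (logs)" ;;
    inactive|failed) echo "not running: go to step 2 (start service)" ;;
    *)               echo "unexpected state '$1': inspect systemctl status output" ;;
  esac
}

# Usage (on the instance):
#   next_action "$(systemctl is-active filebeat)"
```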
2. Start Filebeat Service
If the Filebeat service is not running, start it.
# Example: Start the Filebeat service
ssh <instance_hostname_or_ip> 'sudo systemctl start filebeat'
Expected Output:
(No output: systemctl start prints nothing on success. Confirm with 'systemctl is-active filebeat', which should print 'active'.)
3. Check Filebeat Logs
Review the Filebeat logs for any errors or warnings.
# Example: Review Filebeat logs
ssh <instance_hostname_or_ip> 'sudo journalctl -u filebeat --since "1 hour ago"'
Expected Output:
Nov 13 12:00:00 <hostname> filebeat[1234]: Starting filebeat.
Nov 13 12:00:01 <hostname> filebeat[1234]: <Log message>
...
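An hour of logs can be long, so a rough count of suspicious lines helps decide whether to read them in full. The following is a sketch under the assumption that problem lines contain the words "error" or "warn" in some capitalization; count_problem_lines is a hypothetical helper, and the pattern may need adjusting to your Filebeat log format.

```shell
# count_problem_lines (hypothetical helper): read log lines on stdin and
# print how many contain "error" or "warn", case-insensitively.
# `|| true` keeps the exit status zero when grep finds no matches.
count_problem_lines() {
  grep -Eic 'error|warn' || true
}

# Usage (on the instance):
#   sudo journalctl -u filebeat --since "1 hour ago" --no-pager | count_problem_lines
```

A count of 0 does not prove the service is healthy. It only means this coarse filter found nothing in the window.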
4. Verify Network Connectivity
Ensure that the instance can reach the input source.
# Example: Check network connectivity from the instance to the input source (4 probes, then exit)
ssh <instance_hostname_or_ip> 'ping -c 4 <input_source_hostname_or_ip>'
Expected Output:
PING <input_source_hostname_or_ip> (<ip_address>) 56(84) bytes of data.
64 bytes from <input_source_hostname_or_ip>: icmp_seq=1 ttl=64 time=0.123 ms
...
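ping only exercises ICMP, so a successful reply does not show that a TCP service is reachable. If the endpoint of interest is a TCP service, such as the output.elasticsearch host in the Filebeat configuration, a port-level check may be more telling. The sketch below assumes nc (netcat) is installed on the instance; host_port_from_url and check_tcp are hypothetical helper names.

```shell
# host_port_from_url (hypothetical helper): print "host port" for a URL
# such as http://example.internal:9200. Assumes an explicit port is present.
host_port_from_url() {
  hp="${1#*://}"   # drop the scheme, if any
  hp="${hp%%/*}"   # drop any path
  echo "${hp%:*} ${hp##*:}"
}

# check_tcp (hypothetical helper): succeed if a TCP connection to host $1,
# port $2 opens within 3 seconds. Assumes nc (netcat) is installed.
check_tcp() {
  nc -z -w 3 "$1" "$2"
}

# Usage (on the instance):
#   set -- $(host_port_from_url "http://<output_destination_hostname_or_ip>:9200")
#   check_tcp "$1" "$2" && echo reachable || echo unreachable
```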
5. Check Filebeat Configuration
Ensure that the Filebeat configuration is correct and points to the right input source.
# Example: Check Filebeat configuration
ssh <instance_hostname_or_ip> 'cat /etc/filebeat/filebeat.yml'
Expected Output:
filebeat.inputs:
  - type: log
    enabled: true
    paths:
      - /var/log/*.log
...
output.elasticsearch:
  hosts: ["http://<output_destination_hostname_or_ip>:9200"]
...
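Reading the file by eye can miss syntax errors. Filebeat ships `test config` and `test output` subcommands that validate the configuration and attempt a connection to the configured output. The wrapper below is a sketch: validate_filebeat, FILEBEAT_BIN, and FILEBEAT_CONFIG are names introduced here for illustration, and the default config path is the one used in the example above.

```shell
# Assumptions: the filebeat binary is on PATH and the config lives at the
# path used earlier in this step. Override either variable if not.
FILEBEAT_BIN="${FILEBEAT_BIN:-filebeat}"
FILEBEAT_CONFIG="${FILEBEAT_CONFIG:-/etc/filebeat/filebeat.yml}"

# validate_filebeat (hypothetical helper): check config syntax first, then
# try to connect to the configured output. Stops at the first failure.
validate_filebeat() {
  "$FILEBEAT_BIN" test config -c "$FILEBEAT_CONFIG" &&
  "$FILEBEAT_BIN" test output -c "$FILEBEAT_CONFIG"
}

# Usage (on the instance, likely with sudo so the config is readable):
#   validate_filebeat
```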
6. Verify Log Data Availability
Ensure that there is new log data for Filebeat to read.
# Example: Check for new log data
ssh <instance_hostname_or_ip> 'ls -l /var/log/*.log'
Expected Output:
-rw-r--r-- 1 root root 12345 Nov 13 13:00 /var/log/kern.log
-rw-r--r-- 1 root root 67890 Nov 13 13:00 /var/log/auth.log
...
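Scanning timestamps in a long listing is error-prone. Since the alert looks at a 5-minute window, it can help to list only the files modified within that window. The sketch below uses find with -mmin; recent_logs is a hypothetical helper, and the directory and glob are taken from the example configuration above.

```shell
# recent_logs (hypothetical helper): list *.log files directly under
# directory $1 that were modified within the last $2 minutes.
recent_logs() {
  find "$1" -maxdepth 1 -type f -name '*.log' -mmin "-$2"
}

# Usage (on the instance):
#   recent_logs /var/log 5
```

Empty output means no matching file changed in the window, in which case a zero read rate may simply reflect quiet logs rather than a fault.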
Additional Steps
If the issue persists, consider:
- Restarting the affected instance.
- Checking for resource constraints on the instance.
- Contacting the network or system administrator for further investigation.
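The resource check mentioned above can be started with standard tools. The following is a sketch for a Linux host with procps installed (the ps -C option is procps-specific); parse_df_pct and check_resources are hypothetical helper names, and /var/log is assumed to be the directory Filebeat reads from.

```shell
# parse_df_pct (hypothetical helper): read `df -P` output on stdin and
# print the use percentage of each filesystem row, without the % sign.
parse_df_pct() {
  awk 'NR > 1 { gsub("%", "", $5); print $5 }'
}

# check_resources (hypothetical helper): print disk usage for the log
# directory, the load average, and the Filebeat process's CPU/memory share.
check_resources() {
  echo "disk use% for /var/log: $(df -P /var/log | parse_df_pct)"
  uptime
  ps -o pid,pcpu,pmem,comm -C filebeat
}

# Usage (on the instance):
#   check_resources
```

A nearly full disk or a sustained high load average would be worth ruling out before restarting the instance.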