Runbook: FilebeatReadRateStagnating Alert

Alert Details

  • Alert Name: FilebeatReadRateStagnating
  • Expression: rate(filebeat_libbeat_output_read_bytes_total{instance=~".+", environment=~".+"}[5m]) == 0

Description

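This alert fires when the per-second rate of filebeat_libbeat_output_read_bytes_total, calculated over a 5-minute window, is zero for any Filebeat instance. The instance and environment labels on the alert identify the affected host and its environment. Because the counter is reported by libbeat's output, a zero rate means no data has been read over Filebeat's output connection during that window, which usually indicates that Filebeat has stopped shipping events. This can have several causes, listed below.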
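The alert condition only holds when the counter has stopped increasing. The sketch below uses a hypothetical helper, counter_rate, to show that arithmetic for two samples taken a known number of seconds apart. It is a simplification of PromQL rate(): like rate(), it treats a decrease as a counter reset (for example, after a Filebeat restart), but it does not extrapolate to the edges of the window. Note also that if the series disappears entirely, rate() returns no sample and this alert cannot fire, so a Filebeat that has stopped reporting metrics may need a separate check to be caught.

```shell
#!/usr/bin/env bash
# counter_rate is a hypothetical helper: a simplified per-second rate between
# two samples of a monotonically increasing counter.
#   $1 = earlier sample, $2 = later sample, $3 = seconds between them
counter_rate() {
  awk -v a="$1" -v b="$2" -v s="$3" 'BEGIN {
    a += 0; b += 0; s += 0            # force numeric values
    # A decrease is treated as a reset to zero, so the later value is the increase.
    inc = (b >= a) ? b - a : b
    printf "%.2f\n", inc / s
  }'
}

# The counter did not move in 300 s, so the rate is zero and the alert condition holds.
counter_rate 1000 1000 300   # prints 0.00
```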

Possible Causes

  1. Filebeat service is not running.
  2. Network issues affecting the connection to the input source or to the output destination (Elasticsearch or Logstash).
  3. Misconfiguration of Filebeat.
  4. No new log data to read.
  5. Resource constraints (CPU, memory, or disk space) on the instance running Filebeat.

Troubleshooting Steps

1. Check Filebeat Service Status

Ensure that the Filebeat service is running on the affected instance.

# Example: Check the status of the Filebeat service
ssh <instance_hostname_or_ip> 'systemctl status filebeat'

Expected Output:

● filebeat.service - Filebeat sends log files to Logstash or directly to Elasticsearch.
   Loaded: loaded (/etc/systemd/system/filebeat.service; enabled; vendor preset: enabled)
   Active: active (running) since <date>; <time> ago
...
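For scripted triage, systemctl is-active prints a single word (such as active, inactive, or failed) instead of the full status block. The hypothetical helper below is a minimal sketch that maps that word to the next step in this runbook; the helper name and messages are illustrative, not part of any tool.

```shell
#!/usr/bin/env bash
# filebeat_state_hint is a hypothetical helper that maps the one-word state
# printed by `systemctl is-active` to the next step in this runbook.
filebeat_state_hint() {
  case "$1" in
    active)                          echo "running: continue with step 3" ;;
    activating|deactivating|reloading) echo "transitioning: wait and re-check" ;;
    inactive|failed)                 echo "stopped: start it (step 2)" ;;
    *)                               echo "unexpected state: $1" ;;
  esac
}

# Usage (placeholder host as in step 1):
#   filebeat_state_hint "$(ssh <instance_hostname_or_ip> 'systemctl is-active filebeat')"
filebeat_state_hint failed   # prints "stopped: start it (step 2)"
```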

2. Start Filebeat Service

If the Filebeat service is not running, start it.

# Example: Start the Filebeat service
ssh <instance_hostname_or_ip> 'sudo systemctl start filebeat'

Expected Output:

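No output on success: systemctl start is silent when the unit starts. Re-run the status command from step 1 and confirm that it shows Active: active (running). If the unit fails to start, systemctl prints a "Job for filebeat.service failed ..." message instead; continue with step 3 to find the cause.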
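One way to confirm that the start actually took effect is to poll until the unit reports active. The sketch below is a hypothetical helper; it relies on systemctl is-active --quiet, which prints nothing and exits with status 0 only when the unit is active.

```shell
#!/usr/bin/env bash
# wait_until is a hypothetical helper: re-run a command once per second until
# it succeeds, giving up after the given number of seconds.
#   $1 = timeout in seconds, remaining arguments = command to run
wait_until() {
  local timeout_s=$1 elapsed=0
  shift
  until "$@"; do
    if [ "$elapsed" -ge "$timeout_s" ]; then
      return 1   # timed out without the command succeeding
    fi
    sleep 1
    elapsed=$((elapsed + 1))
  done
}

# Usage: wait up to 30 s for the unit to become active after `systemctl start`.
#   wait_until 30 ssh <instance_hostname_or_ip> 'systemctl is-active --quiet filebeat' \
#     && echo "filebeat is active" || echo "filebeat did not become active"
```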

3. Check Filebeat Logs

Review the Filebeat logs for any errors or warnings.

# Example: Review Filebeat logs
ssh <instance_hostname_or_ip> 'sudo journalctl -u filebeat --since "1 hour ago"'

Expected Output:

Nov 13 12:00:00 <hostname> filebeat[1234]: Starting filebeat.
Nov 13 12:00:01 <hostname> filebeat[1234]: <Log message>
...
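An hour of journal output can be long. The hypothetical helper below is a rough triage filter that keeps only lines mentioning "error" or "warn" in any case; it is a heuristic on plain text, not a parser for Filebeat's log format, so it can also match message text that merely contains those words.

```shell
#!/usr/bin/env bash
# filebeat_problem_lines is a hypothetical helper: print log lines from stdin
# that mention "error" or "warn" (case-insensitive).
filebeat_problem_lines() {
  # grep exits 1 when nothing matches; `|| true` keeps that from looking like a failure.
  grep -Ei 'error|warn' || true
}

# Usage:
#   ssh <instance_hostname_or_ip> 'sudo journalctl -u filebeat --since "1 hour ago" --no-pager' \
#     | filebeat_problem_lines
```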

4. Verify Network Connectivity

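Ensure that the instance itself can reach the input source (for network inputs) and the output destination. Run the check from the affected instance rather than from your workstation, and limit the number of probes so that the command terminates.

# Example: Check network connectivity from the affected instance
ssh <instance_hostname_or_ip> 'ping -c 4 <input_source_hostname_or_ip>'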

Expected Output:

PING <input_source_hostname_or_ip> (<ip_address>) 56(84) bytes of data.
64 bytes from <input_source_hostname_or_ip>: icmp_seq=1 ttl=64 time=0.123 ms
...
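ping only tests ICMP, which firewalls often block, so a failed ping is not conclusive on its own. For the output destination, the filebeat test output subcommand, run on the instance (typically with sudo), checks the connection Filebeat itself uses. For scripted checks of the ping result, the hypothetical helper below extracts the packet-loss percentage from the summary line; it assumes the output format of Linux iputils ping.

```shell
#!/usr/bin/env bash
# packet_loss_pct is a hypothetical helper: read `ping -c N` output on stdin
# and print the number before "% packet loss" (Linux iputils ping format).
packet_loss_pct() {
  sed -n 's/.* \([0-9.]*\)% packet loss.*/\1/p'
}

# Usage:
#   ssh <instance_hostname_or_ip> 'ping -c 4 <input_source_hostname_or_ip>' | packet_loss_pct
# Anything above 0 deserves a closer look; 100 means no ICMP replies came back at all.
```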

5. Check Filebeat Configuration

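Ensure that the Filebeat configuration is correct: the input paths should match the logs you expect to ship, and the output hosts should point to the right destination.

# Example: Check Filebeat configuration (the file is usually readable only by root)
ssh <instance_hostname_or_ip> 'sudo cat /etc/filebeat/filebeat.yml'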

Expected Output:

filebeat.inputs:
- type: log
  enabled: true
  paths:
    - /var/log/*.log
...
output.elasticsearch:
  hosts: ["http://<output_destination_hostname_or_ip>:9200"]
...
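Reading the file only shows what is configured. Filebeat can also validate it: filebeat test config parses the configuration and reports errors, and filebeat test output tries to connect to the configured output. A common misconfiguration is an input path glob that matches no files; the hypothetical helper below, meant to be run on the instance, checks a pattern copied from the paths list. It assumes patterns without spaces.

```shell
#!/usr/bin/env bash
# glob_has_matches is a hypothetical helper: succeed if the glob in $1 matches
# at least one existing path. When nothing matches, the shell leaves the
# pattern unexpanded, so the -e test fails.
glob_has_matches() {
  local f
  for f in $1; do           # unquoted on purpose so the shell expands the glob
    [ -e "$f" ] && return 0
  done
  return 1
}

# Usage on the instance, with a pattern taken from filebeat.inputs paths:
#   glob_has_matches '/var/log/*.log' && echo "matches" || echo "no files match"
# Built-in checks (run on the instance):
#   sudo filebeat test config -c /etc/filebeat/filebeat.yml
#   sudo filebeat test output -c /etc/filebeat/filebeat.yml
```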

6. Verify Log Data Availability

Ensure that there is new log data for Filebeat to read: the files matched by the paths in filebeat.yml should have recent modification times.

# Example: Check for new log data
ssh <instance_hostname_or_ip> 'ls -l /var/log/*.log'

Expected Output:

-rw-r--r-- 1 root root 67890 Nov 13 13:00 /var/log/auth.log
...
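ls -l shows modification times, but comparing them with the current time by eye is error-prone. The hypothetical helper below is a minimal sketch that uses find -mmin to list only files changed within the last N minutes; if it prints nothing for the configured paths, there is most likely no new data for Filebeat to ship. It relies on -maxdepth, which GNU and BSD find support but POSIX does not require.

```shell
#!/usr/bin/env bash
# recently_modified is a hypothetical helper: print the given files that were
# modified less than N minutes ago.
#   $1 = minutes, remaining arguments = files (for example, an expanded glob)
recently_modified() {
  local minutes=$1
  shift
  # -maxdepth 0 evaluates only the paths given, without descending into directories;
  # errors for missing paths (such as an unmatched glob) are discarded.
  find "$@" -maxdepth 0 -type f -mmin "-$minutes" 2>/dev/null
}

# Usage on the instance itself:
#   recently_modified 15 /var/log/*.log
```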

Additional Steps

If the issue persists, consider:

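  • Restarting Filebeat (sudo systemctl restart filebeat) and, only if that does not help, the affected instance.
  • Checking the instance for resource constraints such as CPU saturation, memory pressure, or a full disk (for example with top, free -h, and df -h).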
  • Contacting the network or system administrator for further investigation.
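A full disk is one resource constraint worth ruling out early. The hypothetical helper below reads df -P output, whose fifth column is the capacity percentage and sixth the mount point, and prints the filesystems at or above a threshold. It assumes mount points contain no spaces.

```shell
#!/usr/bin/env bash
# full_filesystems is a hypothetical helper: read `df -P` output on stdin and
# print "<mount point> <use%>" for filesystems at or above a threshold percent.
#   $1 = threshold percent (for example, 90)
full_filesystems() {
  awk -v t="$1" 'NR > 1 {          # skip the df header line
    use = $5
    sub(/%/, "", use)               # "95%" -> "95"
    if (use + 0 >= t + 0) print $6, $5
  }'
}

# Usage:
#   ssh <instance_hostname_or_ip> 'df -P' | full_filesystems 90
```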