Runbook: FilebeatWriteRateStagnating Alert

Alert Details

  • Alert Name: FilebeatWriteRateStagnating
  • Expression: rate(filebeat_libbeat_output_write_bytes_total{instance=~".+", environment=~".+"}[5m]) == 0

Description

This alert fires when the 5-minute rate of filebeat_libbeat_output_write_bytes_total is exactly zero for any instance in any environment. The instance and environment matchers (=~".+") only require those labels to be non-empty; they do not restrict the alert to a particular host or environment. A zero rate means Filebeat wrote no bytes to its output during the window, which can have several causes.
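
To see which instances currently match the alert, the same expression can be sent to the Prometheus HTTP API. The following is a minimal sketch, not part of the alert definition: the helper name stagnation_query and the PROM_URL variable are assumptions made for illustration.

```shell
# Build the alert expression, narrowed to a single environment.
# stagnation_query is a hypothetical helper name used only in this sketch.
stagnation_query() {
  local env="$1"
  printf 'rate(filebeat_libbeat_output_write_bytes_total{instance=~".+", environment="%s"}[5m]) == 0\n' "$env"
}

# Usage (PROM_URL is an assumed variable holding the Prometheus base URL):
#   curl -sG "$PROM_URL/api/v1/query" \
#        --data-urlencode "query=$(stagnation_query <environment>)"
```

Each series in the response carries an instance label identifying a host to investigate with the steps below.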

Possible Causes

  1. Filebeat service is not running.
  2. Network issues affecting the connection to the output destination (e.g., Elasticsearch).
  3. Misconfiguration of Filebeat.
  4. No new log data to process.
  5. Resource constraints on the instance running Filebeat.

Troubleshooting Steps

1. Check Filebeat Service Status

Ensure that the Filebeat service is running on the affected instance.

# Example: Check the status of the Filebeat service
ssh <instance_hostname_or_ip> 'systemctl status filebeat'

Expected Output:

● filebeat.service - Filebeat sends log files to Logstash or directly to Elasticsearch.
   Loaded: loaded (/etc/systemd/system/filebeat.service; enabled; vendor preset: enabled)
   Active: active (running) since <date>; <time> ago
...
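
For a quicker check than reading the full status output, systemctl is-active prints a single state word. The sketch below maps that word to a suggested next step; the function name next_step_for_state and the wording of the suggestions are illustrative assumptions.

```shell
# Map the one-word state printed by `systemctl is-active filebeat` to a next step.
# next_step_for_state is a hypothetical helper name.
next_step_for_state() {
  case "$1" in
    active)
      echo "running: continue with the log and network checks" ;;
    inactive|failed)
      echo "not running: start the service and review its logs" ;;
    activating|deactivating|reloading)
      echo "in transition: re-check in a few seconds" ;;
    *)
      echo "unknown state '$1': inspect 'systemctl status filebeat'" ;;
  esac
}

# Usage:
#   state="$(ssh <instance_hostname_or_ip> 'systemctl is-active filebeat')"
#   next_step_for_state "$state"
```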

2. Start Filebeat Service

If the Filebeat service is not running, start it.

# Example: Start the Filebeat service
ssh <instance_hostname_or_ip> 'sudo systemctl start filebeat'

Expected Output:

(No output. systemctl start prints nothing on success; re-run the status check from step 1 to confirm the service is active.)

3. Check Filebeat Logs

Review the Filebeat logs for any errors or warnings.

# Example: Review Filebeat logs
ssh <instance_hostname_or_ip> 'sudo journalctl -u filebeat --since "1 hour ago"'

Expected Output:

Nov 13 12:00:00 <hostname> filebeat[1234]: Starting filebeat.
Nov 13 12:00:01 <hostname> filebeat[1234]: <Log message>
...
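
An hour of Filebeat logs can be long, so it may help to narrow the output to lines that look like problems. This is a rough sketch: the keyword list is an assumption and will not catch every failure message, and filter_filebeat_problems is a hypothetical helper name.

```shell
# Keep only log lines that contain a likely problem keyword (case-insensitive).
# The keyword list is an assumption; extend it for your environment.
filter_filebeat_problems() {
  # `|| true` keeps the exit status at 0 when nothing matches.
  grep -Ei 'error|warn|fail|refused|timeout|denied' || true
}

# Usage:
#   ssh <instance_hostname_or_ip> \
#     'sudo journalctl -u filebeat --since "1 hour ago" --no-pager' \
#     | filter_filebeat_problems
```

An empty result does not prove the service is healthy; fall back to the full log if nothing matches.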

4. Verify Network Connectivity

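Ensure that the affected instance can reach the output destination (e.g., Elasticsearch). Run the check from the instance itself, not from your workstation. A successful ping only shows that the host answers ICMP; it does not confirm that the output port accepts connections.

# Example: Check network connectivity from the affected instance to the output destination
ssh <instance_hostname_or_ip> 'ping -c 4 <output_destination_hostname_or_ip>'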

Expected Output:

PING <output_destination_hostname_or_ip> (<ip_address>) 56(84) bytes of data.
64 bytes from <output_destination_hostname_or_ip>: icmp_seq=1 ttl=64 time=0.123 ms
...
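
Because ping does not exercise the output port, a TCP connection test is a useful complement. The sketch below splits a hosts entry of the form used in filebeat.yml and attempts a connection. It assumes bash with /dev/tcp support and the coreutils timeout command are available on the instance; both function names are hypothetical.

```shell
# Split a URL such as http://<host>:9200 into "host port".
# Assumes an explicit port is present (as in the hosts entry of filebeat.yml).
split_host_port() {
  local hostport="${1#*://}"   # drop the scheme, if any
  hostport="${hostport%%/*}"   # drop any path
  echo "${hostport%:*} ${hostport##*:}"
}

# Return 0 if a TCP connection to host:port succeeds within 5 seconds.
tcp_reachable() {
  timeout 5 bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$1" "$2" 2>/dev/null
}

# Usage (run on the affected instance):
#   set -- $(split_host_port "http://<output_destination_hostname_or_ip>:9200")
#   tcp_reachable "$1" "$2" && echo "port open" || echo "port unreachable"
```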

5. Check Filebeat Configuration

Ensure that the Filebeat configuration is correct and points to the right output destination.

# Example: Check Filebeat configuration (the file is usually readable only by root)
ssh <instance_hostname_or_ip> 'sudo cat /etc/filebeat/filebeat.yml'

Expected Output:

filebeat.inputs:
- type: log
  enabled: true
  paths:
    - /var/log/*.log
...
output.elasticsearch:
  hosts: ["http://<output_destination_hostname_or_ip>:9200"]
...
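
Reading the file by eye can miss syntax errors, so Filebeat's own test subcommands are worth running as well. The wrapper below is a sketch: filebeat_selfcheck is a hypothetical name and the default config path is assumed from the example above, while test config and test output are Filebeat subcommands.

```shell
# Run Filebeat's built-in self-tests against a config file.
# filebeat_selfcheck is a hypothetical wrapper; the default path is an assumption.
filebeat_selfcheck() {
  local config="${1:-/etc/filebeat/filebeat.yml}"
  if ! command -v filebeat >/dev/null 2>&1; then
    echo "filebeat not found on PATH" >&2
    return 127
  fi
  # Validate the configuration file.
  filebeat test config -c "$config" || return 1
  # Attempt a connection to the configured output.
  filebeat test output -c "$config"
}

# Usage (on the affected instance, as a user that can read the config file):
#   filebeat_selfcheck /etc/filebeat/filebeat.yml
```

Stopping at the first failure keeps a configuration error from being mistaken for a connectivity problem.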

6. Verify Log Data Availability

Ensure that there is new log data for Filebeat to process. Files whose modification time is not advancing give Filebeat nothing to ship.

# Example: List the watched log files, most recently modified first
ssh <instance_hostname_or_ip> 'ls -lt /var/log/*.log'

Expected Output:

-rw-r--r-- 1 root root 67890 Nov 13 13:00 /var/log/auth.log
-rw-r--r-- 1 root root 12345 Nov 13 12:58 /var/log/kern.log
...
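
Comparing timestamps by eye is error-prone, so find can list only the files that changed recently. This is a minimal sketch: recently_written_logs is a hypothetical helper name, and the default directory and the 10-minute window are assumptions to adjust to the paths configured in filebeat.yml.

```shell
# List *.log files directly under a directory that were modified
# within the last N minutes. Defaults are assumptions.
recently_written_logs() {
  local dir="${1:-/var/log}" minutes="${2:-10}"
  find "$dir" -maxdepth 1 -type f -name '*.log' -mmin "-$minutes"
}

# Usage (on the affected instance):
#   recently_written_logs /var/log 10
```

If nothing is printed, no watched file has changed in the window, and a zero write rate may simply reflect idle sources rather than a Filebeat fault.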

Additional Steps

If the issue persists, consider:

  • Restarting the Filebeat service and, if that does not help, the affected instance.
  • Checking for resource constraints on the instance (CPU, memory, and disk space).
  • Contacting the network or system administrator for further investigation.
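
For the resource check, disk space is a common thing to rule out first. The sketch below reads the used-space percentage from df; the function names and the 90% threshold are assumptions, not values taken from this runbook.

```shell
# Print the used-space percentage (integer) of the filesystem holding a path.
disk_used_pct() {
  df -P "${1:-/}" | awk 'NR==2 { sub(/%/, "", $5); print $5 }'
}

# Warn when usage reaches a threshold (90 is an assumed default).
warn_if_disk_full() {
  local path="${1:-/}" limit="${2:-90}" used
  used="$(disk_used_pct "$path")"
  if [ "$used" -ge "$limit" ]; then
    echo "WARNING: $path is ${used}% full"
    return 1
  fi
  echo "OK: $path is ${used}% full"
}

# Usage (on the affected instance); memory and load can be checked alongside:
#   warn_if_disk_full /var/log 90
#   free -m
#   uptime
```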