Runbook: NodeFilesystemFilesFillingUp

Alert Details

  • Alert Name: NodeFilesystemFilesFillingUp
  • Expression: predict_linear(node_filesystem_files_free{nanocosmosGroup=~".+", instance=~".+", fsSelector=~".+", diskDeviceSelector=~".+", environment=~".+"}[6h], 24*60*60) < 0 and node_filesystem_readonly{nanocosmosGroup=~".+", instance=~".+", fsSelector=~".+", diskDeviceSelector=~".+", environment=~".+"} == 0 and node_filesystem_files_free{nanocosmosGroup=~".+", instance=~".+", fsSelector=~".+", diskDeviceSelector=~".+", environment=~".+"} / node_filesystem_files{nanocosmosGroup=~".+", instance=~".+", fsSelector=~".+", diskDeviceSelector=~".+", environment=~".+"} * 100 <=

Description

This alert fires when a filesystem is predicted to run out of free inodes (which node_exporter reports as "files") within the next 24 hours. All three conditions must hold: a linear extrapolation of node_filesystem_files_free over the last 6 hours falls below zero 24 hours from now, the filesystem is not mounted read-only, and the percentage of free inodes is at or below the configured threshold.
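To make the prediction concrete, the arithmetic can be approximated by hand. This is a simplified sketch: predict_linear fits a least-squares line through every sample in the 6-hour window, while the hypothetical helper below (project_free_inodes is not part of any tool) draws a line through only two points, the free-inode count 6 hours ago and the count now.

```shell
# Two-point approximation of the 24h projection (assumption: the real
# predict_linear uses a regression over all samples, not just two).
# $1 = free inodes 6 hours ago, $2 = free inodes now
project_free_inodes() {
    awk -v old="$1" -v now="$2" 'BEGIN {
        slope = (now - old) / 6            # change in free inodes per hour
        printf "%d\n", now + slope * 24    # projected free inodes in 24 hours
    }'
}

# Example: 160000 free inodes 6h ago, 100000 now
project_free_inodes 160000 100000   # prints -140000
```

A negative result means the filesystem is on track to exhaust its inodes within the 24-hour horizon at the current rate.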

Possible Causes

  • Large number of small files consuming inodes
  • Log files or temporary files not being cleaned up
  • Insufficient inode allocation during filesystem creation
  • Unused or orphaned files

Troubleshooting Steps

1. Check Inode Usage

Use the following command to check the inode usage on the affected instance:

df -i

Expected Output

You should see output similar to this. A filesystem whose IUse% is close to 100% is the one running out of inodes:

Filesystem      Inodes   IUsed   IFree IUse% Mounted on
/dev/sda1      1000000  900000  100000   90% /
...
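On hosts with many mounts, the df -i output can be filtered down to the filesystems that matter. The following is a minimal sketch; high_inode_usage is a hypothetical helper name, and it assumes the six-column POSIX layout produced by df -iP and mount points without spaces.

```shell
# Print mount point and IUse% for filesystems at or above a threshold.
# Reads `df -iP` output on stdin; $1 = threshold in percent.
high_inode_usage() {
    awk -v t="$1" 'NR > 1 {
        use = $5
        sub(/%/, "", use)              # strip the percent sign
        # Some pseudo filesystems report "-" instead of a number; skip them
        if (use != "-" && use + 0 >= t) print $6, $5
    }'
}

# Usage on the affected instance:
#   df -iP | high_inode_usage 80
```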

2. Identify Directories with High Inode Usage

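To find the directories that hold the most files, count the entries directly inside each directory. The -xdev flag keeps find on the root filesystem; if the alert points at a different filesystem, replace / with its mount point: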

sudo find / -xdev -printf '%h\n' | sort | uniq -c | sort -k 1 -n | tail -n 20

Expected Output

This command lists the 20 directories that directly contain the most entries. The list is sorted in ascending order, so the largest directory appears last:

...
  50000 /home/user/temp
 100000 /var/log
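The command above counts only the entries directly inside each directory. When the files are spread over a deep tree, a recursive total per subdirectory can narrow the search faster. A minimal sketch, assuming a hypothetical helper named count_inodes_per_dir:

```shell
# For each immediate subdirectory of $1, print the number of entries
# beneath it (including the subdirectory itself), smallest first.
# Hidden subdirectories are not matched by the glob and are skipped.
count_inodes_per_dir() {
    for sub in "$1"/*/; do
        [ -d "$sub" ] || continue      # glob matched nothing
        count=$(find "$sub" -xdev | wc -l | tr -d ' ')
        printf '%s %s\n' "$count" "$sub"
    done | sort -n
}

# Usage (run with sudo for directories that need it):
#   count_inodes_per_dir /var
```

Running it again on the largest subdirectory reported walks down the tree one level at a time.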

3. Clean Up Log Files

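Truncating a log file frees disk blocks but not its inode, so it does not help with this alert. Inodes are released only when files are deleted. If a directory such as /var/log holds many old rotated logs, delete those instead. For example, to remove compressed logs older than 7 days:

sudo find /var/log -xdev -type f -name '*.gz' -mtime +7 -delete

Expected Output

The command prints nothing on success. Confirm that the IFree value has increased:

df -i /var/log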

4. Remove Unnecessary Files

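Remove any files or directories that are no longer needed. rm -rf is irreversible, so double-check the path first. For example:

sudo rm -rf /path/to/unnecessary/files

Expected Output

The command prints nothing on success. The inodes are freed once no running process still holds the deleted files open.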

5. Check for Temporary Files

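Clean up temporary files that are no longer needed. Avoid running rm -rf /tmp/* on a live system: it can remove sockets and files that running processes still use, and the glob skips hidden files. Deleting only files that have not been modified recently is safer, for example those older than 7 days:

sudo find /tmp -xdev -type f -mtime +7 -delete

Expected Output

Files in /tmp older than 7 days are removed and their inodes are freed.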
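Before deleting anything from /tmp, it can help to preview what would be removed. The following is a minimal sketch; purge_old_files is a hypothetical helper, and the 7-day cutoff in the usage lines is only an example value.

```shell
# List regular files under $1 not modified in more than $2 days.
# Deletes them only when the third argument is "delete".
purge_old_files() {
    purge_dir=$1
    purge_days=$2
    purge_mode=${3:-dry-run}
    if [ "$purge_mode" = "delete" ]; then
        find "$purge_dir" -xdev -type f -mtime +"$purge_days" -delete
    else
        find "$purge_dir" -xdev -type f -mtime +"$purge_days" -print
    fi
}

# Usage:
#   sudo sh -c '. ./purge.sh; purge_old_files /tmp 7'          # preview
#   sudo sh -c '. ./purge.sh; purge_old_files /tmp 7 delete'   # remove
```

Defaulting to a dry run means a mistyped directory results in a listing rather than data loss.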

6. Recreate Filesystem with More Inodes

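If the filesystem was created with too few inodes for its workload, cleanup will only help temporarily. On ext4 the inode density is fixed when the filesystem is created, so the remaining option is to recreate the filesystem with more inodes. This is a drastic step: mkfs destroys all data on the device. Back up the data first, unmount the filesystem, and restore the data afterwards: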

sudo mkfs.ext4 -N <number_of_inodes> /dev/sda1

Replace <number_of_inodes> with the desired number of inodes and /dev/sda1 with the actual device.
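Choosing a value for <number_of_inodes> is easier when thinking in bytes per inode. A minimal sketch of the arithmetic follows; inodes_for_size is a hypothetical helper, and the ratio of 16384 bytes per inode is an assumption based on the commonly shipped mke2fs default, so check /etc/mke2fs.conf on the affected host.

```shell
# Estimate the inode count for a filesystem.
# $1 = filesystem size in bytes, $2 = bytes per inode
inodes_for_size() {
    echo $(( $1 / $2 ))
}

# 100 GiB at the assumed default ratio of 16384 bytes per inode
inodes_for_size 107374182400 16384   # prints 6553600

# The same size at 4096 bytes per inode, for many small files
inodes_for_size 107374182400 4096    # prints 26214400
```

mkfs.ext4 also accepts the ratio directly through its -i bytes-per-inode option, which can be used instead of -N.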

Additional Steps

1. Monitor Inode Usage

Monitor inode usage continuously so that growth is noticed before the alert fires. Tools such as Prometheus and Grafana can provide dashboards and alerts based on node_filesystem_files_free and node_filesystem_files.

2. Automate Log Rotation

Set up log rotation so that old log files are removed on a schedule instead of accumulating. Edit the logrotate configuration (usually /etc/logrotate.conf, or a per-service file under /etc/logrotate.d/):

sudo nano /etc/logrotate.conf

Example Configuration

/var/log/large_log_file.log {
    daily
    rotate 7
    compress
    missingok
    notifempty
    create 0640 root utmp
    sharedscripts
    postrotate
        /usr/bin/systemctl reload rsyslog > /dev/null 2>&1 || true
    endscript
}

By following these steps, you should be able to troubleshoot and resolve the “NodeFilesystemFilesFillingUp” alert. If the issue persists, further investigation into the specific filesystem and its usage may be necessary.