Runbook: NodeFilesystemFilesFillingUp
Alert Details
- Alert Name: NodeFilesystemFilesFillingUp
- Expression:
predict_linear(node_filesystem_files_free{nanocosmosGroup=~".+", instance=~".+", fsSelector=~".+", diskDeviceSelector=~".+", environment=~".+"}[6h], 24*60*60) < 0 and node_filesystem_readonly{nanocosmosGroup=~".+", instance=~".+", fsSelector=~".+", diskDeviceSelector=~".+", environment=~".+"} == 0 and node_filesystem_files_free{nanocosmosGroup=~".+", instance=~".+", fsSelector=~".+", diskDeviceSelector=~".+", environment=~".+"} / node_filesystem_files{nanocosmosGroup=~".+", instance=~".+", fsSelector=~".+", diskDeviceSelector=~".+", environment=~".+"} * 100 <=
Description
This alert fires when a linear extrapolation of the last 6 hours of free-inode samples (node_filesystem_files_free) predicts that the filesystem will run out of inodes within the next 24 hours. It fires only if the filesystem is not read-only and the percentage of free inodes is already at or below a configured threshold.
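To make the prediction concrete, the sketch below fits an ordinary least-squares line to timestamp/value pairs and extrapolates it by a given horizon. It illustrates the idea behind predict_linear and is not Prometheus's implementation; the function name predict_free_inodes and the two-column input format are assumptions made for this example.

```shell
# Least-squares linear extrapolation over "timestamp value" pairs on stdin.
# Usage: predict_free_inodes HORIZON_SECONDS < samples.txt
# (hypothetical helper; the name and input format are assumptions)
predict_free_inodes() (
    horizon="$1"
    awk -v horizon="$horizon" '
        NF >= 2 {
            if (n == 0) t0 = $1          # shift timestamps so the sums stay small
            x = $1 - t0; y = $2
            sx += x; sy += y; sxx += x * x; sxy += x * y
            last = x; n++
        }
        END {
            denom = n * sxx - sx * sx
            if (n < 2 || denom == 0) {
                print "need at least two samples with distinct timestamps" > "/dev/stderr"
                exit 1
            }
            slope = (n * sxy - sx * sy) / denom
            intercept = (sy - slope * sx) / n
            # Value of the fitted line HORIZON seconds after the last sample.
            printf "%.0f\n", intercept + slope * (last + horizon)
        }'
)
```

For samples of 1000, 900, and 800 free inodes taken one hour apart, a horizon of 86400 seconds yields -1600, which is below zero and would satisfy the first condition of the alert.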
Possible Causes
- Large number of small files consuming inodes
- Log files or temporary files not being cleaned up
- Insufficient inode allocation during filesystem creation
- Unused or orphaned files
Troubleshooting Steps
1. Check Inode Usage
Use the following command to check the inode usage on the affected instance:
df -i
Expected Output
You should see output similar to the following. The IUse% column shows the share of inodes in use on each filesystem:
Filesystem Inodes IUsed IFree IUse% Mounted on
/dev/sda1 1000000 900000 100000 90% /
...
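On hosts with many mounts it can help to filter this output. The following is a minimal sketch that reads df -i style text on standard input and prints only the mount points at or above a usage limit; the function name high_inode_mounts and the default limit of 80 are assumptions for illustration, and mount points containing spaces are not handled.

```shell
# Print "mountpoint IUse%" for filesystems at or above LIMIT percent (default 80).
# Reads `df -i` style output on stdin, so it also works on saved output.
# (hypothetical helper; the name and default limit are assumptions)
high_inode_mounts() (
    limit="${1:-80}"
    awk -v limit="$limit" '
        NR == 1 { next }                 # skip the header line
        $5 ~ /^[0-9]+%$/ {               # ignore rows that report "-" instead of a percentage
            pct = $5; sub(/%/, "", pct)
            if (pct + 0 >= limit + 0) print $6, $5
        }'
)

# Example: df -iP | high_inode_mounts 85
```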
2. Identify Directories with High Inode Usage
To identify directories consuming a large number of inodes, use:
sudo find / -xdev -printf '%h\n' | sort | uniq -c | sort -k 1 -n | tail -n 20
Expected Output
This command counts the entries directly inside each directory and prints the 20 directories with the most entries, highest count last:
...
50000 /home/user/temp
100000 /var/log
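Scanning the whole filesystem can be slow, so it may be quicker to drill down one level at a time. The sketch below counts entries under each immediate subdirectory of a starting directory; the function name count_entries_per_subdir is an assumption for this example, and hidden subdirectories are skipped because the shell glob does not match them.

```shell
# Count filesystem entries under each immediate subdirectory of DIR,
# without crossing filesystem boundaries (-xdev), largest count last.
# Each count includes the subdirectory itself.
# (hypothetical helper; the name is an assumption)
count_entries_per_subdir() (
    dir="${1:-/}"
    dir="${dir%/}"                       # "/" becomes "" so the glob below is "/*/"
    for sub in "$dir"/*/; do
        [ -d "$sub" ] || continue
        n=$(find "$sub" -xdev 2>/dev/null | wc -l | tr -d ' ')
        printf '%s %s\n' "$n" "${sub%/}"
    done | sort -n
)

# Example: sudo sh -c '. ./helpers.sh; count_entries_per_subdir /var'
```

Re-running it on the largest subdirectory narrows the search step by step.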
3. Clean Up Log Files
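If log files are consuming a lot of inodes, delete the ones that are no longer needed, for example old rotated logs; inodes are released only when files are removed. Truncating a file frees disk blocks but keeps its inode, so use it only to empty a large log that a running process still holds open:
sudo truncate -s 0 /var/log/large_log_file.log
Expected Output
The log file size should be reduced to 0 bytes, while the inode counts reported by df -i stay the same:
ls -lh /var/log/large_log_file.log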
4. Remove Unnecessary Files
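Remove files or directories that are no longer needed. Double-check the path first, because rm -rf deletes recursively without asking for confirmation. For example: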
sudo rm -rf /path/to/unnecessary/files
Expected Output
The specified files or directories should be deleted, freeing up inodes.
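To confirm that a cleanup actually released inodes, the free-inode count can be compared before and after. This is a small sketch that assumes a df implementation accepting both -i and -P (GNU coreutils does); the function name free_inodes is made up for this example.

```shell
# Print the number of free inodes on the filesystem that contains PATH.
# -P keeps each filesystem on one line, so the value is row 2, column 4.
# (hypothetical helper; the name is an assumption)
free_inodes() (
    df -iP "$1" | awk 'NR == 2 { print $4 }'
)

# Example:
#   before=$(free_inodes /var)
#   ... remove files ...
#   after=$(free_inodes /var)
#   echo "reclaimed $((after - before)) inodes"
```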
5. Check for Temporary Files
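Clean up temporary files that are no longer needed. The command below removes everything in /tmp, including files that running processes may still be using, and the * glob does not match hidden files: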
sudo rm -rf /tmp/*
Expected Output
The /tmp directory should be cleared of temporary files.
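A more conservative option is to remove only files that have not been modified for some time. The sketch below lists matching files by default and deletes them only when asked; the function name purge_old_files and its --delete argument are assumptions for illustration, and -delete requires a find that supports it (GNU and BSD find do).

```shell
# List, or with --delete remove, regular files under DIR whose age in whole
# days is greater than DAYS (find rounds ages down, so +7 means 8 days or more).
# Usage: purge_old_files DIR DAYS [--delete]
# (hypothetical helper; the name and the --delete argument are assumptions)
purge_old_files() (
    dir="$1"; days="$2"; mode="${3:-}"
    if [ "$mode" = "--delete" ]; then
        find "$dir" -xdev -type f -mtime +"$days" -delete
    else
        find "$dir" -xdev -type f -mtime +"$days" -print
    fi
)

# Example: purge_old_files /tmp 7            # dry run, prints candidates
#          sudo sh -c '. ./helpers.sh; purge_old_files /tmp 7 --delete'
```

Reviewing the dry-run listing before deleting reduces the risk of removing files that are still in use.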
6. Recreate Filesystem with More Inodes
If the filesystem was created with too few inodes, consider recreating it with more. mkfs erases all data on the device, so treat this as a last resort: back up the data, unmount the filesystem, recreate it, and then restore the data:
sudo mkfs.ext4 -N <number_of_inodes> /dev/sda1
Replace <number_of_inodes> with the desired number of inodes and /dev/sda1 with the actual device.
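One way to choose the number is to work from a bytes-per-inode ratio, which mke2fs also accepts directly through its -i option. The sketch below only does the arithmetic; the function name inodes_for_ratio and the sample figures are assumptions, and a suitable ratio depends on the average file size of the workload.

```shell
# Inode count that gives one inode per BYTES_PER_INODE bytes of device size.
# Usage: inodes_for_ratio SIZE_BYTES BYTES_PER_INODE
# (hypothetical helper; the name is an assumption)
inodes_for_ratio() (
    size_bytes="$1"; bytes_per_inode="$2"
    echo $(( size_bytes / bytes_per_inode ))
)

# Example: a 100 GiB device with one inode per 8 KiB
#   inodes_for_ratio 107374182400 8192
# The device size in bytes can be read with: sudo blockdev --getsize64 <device>
```

A smaller ratio gives more inodes at the cost of slightly more space reserved for inode tables.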
Additional Steps
1. Monitor Inode Usage
Continuously monitor inode usage to ensure it remains within acceptable limits. Tools such as Prometheus and Grafana can provide the dashboards and alerts for this.
2. Automate Log Rotation
Set up log rotation so that old log files are removed automatically and the number of retained files stays bounded. Edit the logrotate configuration (usually /etc/logrotate.conf, or a per-service file under /etc/logrotate.d/):
sudo nano /etc/logrotate.conf
Example Configuration
/var/log/large_log_file.log {
    daily
    rotate 7
    compress
    missingok
    notifempty
    create 0640 root utmp
    sharedscripts
    postrotate
        /usr/bin/systemctl reload rsyslog > /dev/null 2>&1 || true
    endscript
}
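Each retained rotation occupies one inode, so with rotate 7 the log above should never have more than seven rotated copies next to the live file. The sketch below counts them as a quick check that rotation is working; the function name count_rotations is an assumption, and it simply counts files whose names start with the log path followed by a dot.

```shell
# Count rotated copies of LOG, i.e. files named LOG.<suffix>
# such as app.log.1 or app.log.2.gz.
# (hypothetical helper; the name is an assumption)
count_rotations() (
    log="$1"
    count=0
    for f in "$log".*; do
        [ -e "$f" ] && count=$((count + 1))   # skip the unexpanded glob when nothing matches
    done
    echo "$count"
)

# Example: count_rotations /var/log/large_log_file.log
# A configuration can be checked without changing any files using:
#   sudo logrotate -d /etc/logrotate.conf
```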
By following these steps, you should be able to troubleshoot and resolve the “NodeFilesystemFilesFillingUp” alert. If the issue persists, further investigation into the specific filesystem and its usage may be necessary.