Runbook: NodeFilesystemAlmostOutOfSpace
Alert Details
- Alert Name: NodeFilesystemAlmostOutOfSpace
- Expression:
node_filesystem_readonly{nanocosmosGroup=~".+", instance=~".+", fsSelector=~".+", diskDeviceSelector=~".+", environment=~".+"} == 0 and node_filesystem_avail_bytes{nanocosmosGroup=~".+", instance=~".+", fsSelector=~".+", diskDeviceSelector=~".+", environment=~".+"} / node_filesystem_size_bytes{nanocosmosGroup=~".+", instance=~".+", fsSelector=~".+", diskDeviceSelector=~".+", environment=~".+"} * 100 <=
Description
This alert fires when a filesystem that is not mounted read-only is running out of space: its available space, expressed as a percentage of the filesystem's total size, is at or below the threshold configured in the rule.
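To confirm on the host what Prometheus is seeing, the alert's ratio can be recomputed from df. node_filesystem_avail_bytes counts the space available to unprivileged users and node_filesystem_size_bytes the total size, which correspond to the fourth and second columns of df -P -B1. Because df computes its Use% column as used / (used + available), that column is not simply 100 minus the alert's percentage when blocks are reserved for root. The helper names below are hypothetical, and the threshold argument stands in for whatever value the rule actually uses.

```shell
#!/usr/bin/env bash
# Sketch: reproduce the alert's percentage on the host for one mount point.
# fs_avail_pct and fs_below_threshold are hypothetical helper names.

# Print available space as a percentage of total size, mirroring
# node_filesystem_avail_bytes / node_filesystem_size_bytes * 100.
fs_avail_pct() {
  local mount="${1:-/}"
  # -P: one line per filesystem; -B1: sizes in bytes.
  # Field 2 is the total size, field 4 the space available to non-root users.
  df -P -B1 "$mount" | awk 'NR == 2 && $2 > 0 { printf "%.2f\n", $4 / $2 * 100 }'
}

# Exit with status 0 when the available percentage is at or below THRESHOLD,
# like the "<=" comparison in the alert rule; status 2 if it cannot be read.
fs_below_threshold() {
  local mount="$1" threshold="$2" pct
  pct="$(fs_avail_pct "$mount")"
  [ -n "$pct" ] || return 2
  awk -v p="$pct" -v t="$threshold" 'BEGIN { exit !(p <= t) }'
}

# Example (10 is a placeholder for the rule's real threshold):
#   fs_below_threshold /var 10 && echo "/var is at or below 10% available"
```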
Possible Causes
- Large files consuming disk space
- Log files growing uncontrollably
- Temporary files not being cleaned up
- Insufficient disk space allocation
- Unused or orphaned files
Troubleshooting Steps
1. Check Disk Usage
Use the following command to check the disk usage on the affected instance:
df -h
Expected Output
You should see output similar to this:
Filesystem Size Used Avail Use% Mounted on
/dev/sda1 50G 45G 5G 90% /
...
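When a host has several mounts, ranking them by the same avail / size ratio the alert uses makes the affected one stand out. This is a sketch: list_fs_by_avail_pct is a hypothetical name, and skipping devtmpfs and squashfs is an assumption about which filesystems are uninteresting.

```shell
#!/usr/bin/env bash
# Sketch: rank mounted filesystems by available space, lowest percentage first,
# using the same avail / size ratio as the alert. list_fs_by_avail_pct is a
# hypothetical helper; skipping devtmpfs and squashfs is an assumption.
list_fs_by_avail_pct() {
  df -P -B1 -x devtmpfs -x squashfs |
    awk 'NR > 1 && $2 > 0 {
      # Rejoin the mount point in case it contains spaces (fields 6..NF).
      mp = $6
      for (i = 7; i <= NF; i++) mp = mp " " $i
      printf "%6.2f%%  %s\n", $4 / $2 * 100, mp
    }' |
    sort -n
}
```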
2. Identify Large Files
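To identify the largest files and directories, run the following, replacing / with the affected mount point. The -x option keeps du on that single filesystem, and 2>/dev/null hides errors for files that vanish or cannot be read:
sudo du -ahx / 2>/dev/null | sort -rh | head -n 20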
Expected Output
This command lists the top 20 largest files and directories:
45G /
10G /var/log
5G /home/user/largefile
...
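The du listing mixes directories and files. To see only the biggest individual files on one filesystem, find can report sizes directly. This sketch assumes GNU find (for -printf), and largest_files is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch: list the COUNT largest regular files on the filesystem holding DIR.
# largest_files is a hypothetical helper; requires GNU find for -printf.
largest_files() {
  local dir="${1:-/}" count="${2:-20}"
  # -xdev: stay on the filesystem containing DIR.
  # %s is the size in bytes, %p the path; errors (permissions, races) are hidden.
  find "$dir" -xdev -type f -printf '%s\t%p\n' 2>/dev/null |
    sort -rn |
    head -n "$count"
}

# Example (run as root to see every file):
#   largest_files /var 20
```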
3. Clean Up Log Files
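If log files are consuming a lot of space, consider cleaning them up. For example, to empty a specific log file in place (truncating rather than deleting frees the space immediately, even if a process still has the file open):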
sudo truncate -s 0 /var/log/large_log_file.log
Expected Output
The file size should now be 0 bytes, which you can confirm with:
ls -lh /var/log/large_log_file.log
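When several logs have grown too large, the same truncation can be applied to every log above a size limit in one pass. This is a sketch: truncate_large_logs is a hypothetical helper, matching on the *.log suffix is an assumption, and SIZE uses find's -size syntax (for example 500M or 1G).

```shell
#!/usr/bin/env bash
# Sketch: empty every *.log file under DIR that is larger than SIZE,
# printing each path as it is truncated. truncate_large_logs is hypothetical.
truncate_large_logs() {
  local dir="$1" size="$2"
  # -xdev keeps the search on the filesystem that holds DIR.
  # Truncating instead of deleting releases the blocks even if a process
  # still has the file open.
  find "$dir" -xdev -type f -name '*.log' -size "+$size" -print -exec truncate -s 0 {} +
}

# Example:
#   sudo bash -c "$(declare -f truncate_large_logs); truncate_large_logs /var/log 1G"
```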
4. Remove Unnecessary Files
Remove any unnecessary files or directories. For example:
sudo rm -rf /path/to/unnecessary/files
Expected Output
The specified files or directories should be deleted, freeing up space.
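If df still shows little free space after files have been removed, a running process may still hold one of them open; the kernel releases the blocks only when the last descriptor is closed. The sketch below (Linux only, hypothetical helper name) scans /proc for descriptors that point at deleted files; where lsof is installed, lsof +L1 reports similar information.

```shell
#!/usr/bin/env bash
# Sketch (Linux only): list file descriptors that still point at deleted
# files. Their space is not freed until the owning process closes them or
# exits. list_deleted_open_files is a hypothetical helper name.
list_deleted_open_files() {
  # Each /proc/PID/fd/N is a symlink; the kernel appends " (deleted)" to the
  # target when the file has been unlinked. Run as root to see all processes;
  # permission errors and vanished processes are silenced.
  find /proc/[0-9]*/fd -maxdepth 1 -type l -lname '*(deleted)' \
    -printf '%p -> %l\n' 2>/dev/null || true
}

# Restarting or reloading the process named by the PID in each path usually
# releases the space.
```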
5. Check for Temporary Files
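Clean up temporary files that are no longer needed. Rather than wiping /tmp entirely, which can break running services that keep sockets or working files there, delete only files that have not been accessed recently, for example in the last 7 days:
sudo find /tmp -xdev -type f -atime +7 -delete
Expected Output
Files in /tmp that have not been accessed for more than 7 days should be removed.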
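Before deleting anything under /tmp, a read-only preview of old files and the space they occupy can show whether cleanup is worthwhile. This sketch assumes GNU find; old_files_report is a hypothetical helper name and the 7-day default is an arbitrary example.

```shell
#!/usr/bin/env bash
# Sketch: preview files under DIR not modified for more than DAYS days and
# their total size. old_files_report is hypothetical; it never deletes.
old_files_report() {
  local dir="${1:-/tmp}" days="${2:-7}"
  # -mtime +N matches files whose age, rounded down to whole days, exceeds N.
  # %s = size in bytes, %p = path.
  find "$dir" -xdev -type f -mtime "+$days" -printf '%s\t%p\n' 2>/dev/null |
    awk -F '\t' '{ total += $1; print $2 } END { printf "total bytes: %.0f\n", total }'
}
```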
6. Extend Disk Space
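If the disk is simply too small, extend it through your cloud provider’s management console or the command-line tools for your environment. Enlarging the volume is not enough on its own: the partition (if there is one) and the filesystem on it must also be grown before the new space becomes usable.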
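After the volume has been enlarged (and any partition on it grown, for example with growpart), the filesystem itself still has to be resized: resize2fs does this for ext3/ext4 via the block device, and xfs_growfs for XFS via the mount point. The sketch below only prints the matching command so it can be reviewed before running; the helper names are hypothetical, it relies on findmnt from util-linux, and other filesystem types are deliberately left unhandled.

```shell
#!/usr/bin/env bash
# Sketch: print (not run) the command that grows the filesystem at MOUNT to
# fill its already-enlarged volume. Helper names are hypothetical.

# resize_cmd_for FSTYPE SOURCE TARGET -> prints the resize command, or fails.
resize_cmd_for() {
  local fstype="$1" source="$2" target="$3"
  case "$fstype" in
    ext3|ext4) echo "resize2fs $source" ;;   # ext3/ext4 are resized via the device
    xfs)       echo "xfs_growfs $target" ;;  # XFS is grown via its mount point
    *) echo "unsupported filesystem type: $fstype" >&2; return 1 ;;
  esac
}

# suggest_resize MOUNT -> looks up type, device and mount point with findmnt.
suggest_resize() {
  local mount="${1:-/}" fstype source target
  fstype="$(findmnt -n -o FSTYPE --target "$mount")" || return 1
  source="$(findmnt -n -o SOURCE --target "$mount")" || return 1
  target="$(findmnt -n -o TARGET --target "$mount")" || return 1
  resize_cmd_for "$fstype" "$source" "$target"
}
```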
Additional Steps
1. Monitor Disk Usage
Keep monitoring disk usage so that it stays within acceptable limits. Tools such as Prometheus and Grafana can provide the dashboards and alerts for this.
2. Automate Log Rotation
Set up log rotation to prevent log files from growing uncontrollably. Edit the logrotate configuration file (usually /etc/logrotate.conf or /etc/logrotate.d/*):
sudo nano /etc/logrotate.conf
Example Configuration
/var/log/large_log_file.log {
    daily
    rotate 7
    compress
    missingok
    notifempty
    create 0640 root utmp
    sharedscripts
    postrotate
        /usr/bin/systemctl reload rsyslog > /dev/null 2>&1 || true
    endscript
}
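Before relying on a new rule, logrotate's debug mode (-d) reports what it would do without touching the logs. In this sketch, check_logrotate_rule is a hypothetical helper, and pointing -s at a scratch state file is an assumption made to keep the check independent of the system's own logrotate state.

```shell
#!/usr/bin/env bash
# Sketch: dry-run a logrotate rule file before relying on it.
# check_logrotate_rule is a hypothetical helper name.
check_logrotate_rule() {
  local rule="$1" dir status
  # Scratch directory for a throwaway state file (assumption of this sketch).
  dir="$(mktemp -d)"
  # -d: debug mode; logrotate prints its plan but makes no changes.
  logrotate -d -s "$dir/status" "$rule"
  status=$?
  rm -rf "$dir"
  return "$status"
}

# Example (as root, with the rule saved under /etc/logrotate.d/):
#   check_logrotate_rule /etc/logrotate.d/large_log_file
```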
By following these steps, you should be able to troubleshoot and resolve the “NodeFilesystemAlmostOutOfSpace” alert. If the issue persists, further investigation into the specific filesystem and its usage may be necessary.