Runbook: NodeFilesystemAlmostOutOfSpace

Alert Details

  • Alert Name: NodeFilesystemAlmostOutOfSpace
  • Expression: node_filesystem_readonly{nanocosmosGroup=~".+", instance=~".+", fsSelector=~".+", diskDeviceSelector=~".+", environment=~".+"} == 0 and node_filesystem_avail_bytes{nanocosmosGroup=~".+", instance=~".+", fsSelector=~".+", diskDeviceSelector=~".+", environment=~".+"} / node_filesystem_size_bytes{nanocosmosGroup=~".+", instance=~".+", fsSelector=~".+", diskDeviceSelector=~".+", environment=~".+"} * 100 <=

Description

This alert fires when a filesystem is running low on space. It matches filesystems that are not mounted read-only and whose available space, as a percentage of total size, is at or below the configured threshold.
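The ratio the expression evaluates can be reproduced by hand on the affected node. The following is a minimal sketch, assuming GNU df (for the -B1 and --output options); the helper name avail_pct is hypothetical and not part of the alerting setup. Note that this ratio (available / size) is not the same as 100 minus the Use% column of df, because df computes Use% without the blocks reserved for root.

```shell
# avail_pct AVAIL_BYTES SIZE_BYTES
# Prints available space as a percentage of total size, i.e. the same
# ratio the alert expression computes (avail / size * 100).
avail_pct() {
    awk -v a="$1" -v s="$2" 'BEGIN { if (s > 0) printf "%.1f\n", a / s * 100 }'
}

# Print the ratio for every mounted filesystem.
# -B1 and --output are GNU df options (an assumption about the node).
df -B1 --output=avail,size,target | tail -n +2 |
while read -r avail size mount; do
    # Skip pseudo filesystems that report no size.
    case "$size" in ''|*[!0-9]*|0) continue ;; esac
    printf '%s %s%%\n' "$mount" "$(avail_pct "$avail" "$size")"
done
```

Any mount point whose value is at or below the alert threshold is a candidate for the steps that follow.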

Possible Causes

  • Large files consuming disk space
  • Log files growing uncontrollably
  • Temporary files not being cleaned up
  • Insufficient disk space allocation
  • Unused or orphaned files

Troubleshooting Steps

1. Check Disk Usage

Use the following command to check the disk usage on the affected instance:

df -h

Expected Output

You should see an output similar to this:

Filesystem      Size  Used Avail Use% Mounted on
/dev/sda1        50G   45G    5G  90% /
...

2. Identify Large Files

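To identify the files and directories consuming the most disk space, use the command below. The -x option keeps du on the root filesystem, so /proc and other mounted filesystems are skipped, and errors for unreadable paths are discarded:

sudo du -xah / 2>/dev/null | sort -rh | head -n 20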

Expected Output

This command lists the 20 largest files and directories, largest first:

45G    /
10G    /var/log
5G     /home/user/largefile
...
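Because du reports directories as well as files, the largest entries are often just parent directories. As a complementary sketch, the helper below lists only regular files. It assumes GNU find (for -printf); the name largest_files is hypothetical.

```shell
# largest_files DIR [COUNT]
# Lists the COUNT largest regular files under DIR (default 20), largest
# first, one per line as "<size in bytes> <path>".
# -xdev keeps find on a single filesystem; -printf is a GNU find extension.
largest_files() {
    find "$1" -xdev -type f -printf '%s %p\n' 2>/dev/null |
        sort -rn | head -n "${2:-20}"
}
```

For example, largest_files /var/log 10 would show the ten largest files under /var/log. Run it with sudo privileges if parts of the tree are not readable by your user.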

3. Clean Up Log Files

If log files are consuming a lot of space, empty them in place rather than deleting them, so that the process writing to the log keeps a valid file. For example, to clear a specific log file:

sudo truncate -s 0 /var/log/large_log_file.log

Expected Output

The log file size should be reduced to 0 bytes. Verify with:

ls -lh /var/log/large_log_file.log
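Truncation is used here instead of rm because space held by a deleted file is not released while a process still has it open. As a minimal sketch of applying the same step conditionally, the helper below empties a file only when it exceeds a size limit. It assumes GNU stat (for -c %s); the name truncate_if_larger and the limit you pass are illustrative choices, not part of this runbook's tooling.

```shell
# truncate_if_larger FILE MAX_BYTES
# Empties FILE in place when it is larger than MAX_BYTES and leaves it
# untouched otherwise. Returns non-zero if FILE cannot be inspected.
truncate_if_larger() {
    size=$(stat -c %s "$1") || return 1   # GNU stat: size in bytes
    if [ "$size" -gt "$2" ]; then
        truncate -s 0 "$1"
    fi
}
```

For example, sudo would be needed for files under /var/log, and a call such as truncate_if_larger /var/log/large_log_file.log 1073741824 would empty the file only if it is larger than 1 GiB.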

4. Remove Unnecessary Files

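Remove files or directories that are no longer needed. Double-check the path first, because rm -rf deletes recursively without asking for confirmation. For example: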

sudo rm -rf /path/to/unnecessary/files

Expected Output

The specified files or directories should be deleted, freeing up space.

5. Check for Temporary Files

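Clean up temporary files that are no longer needed. Avoid deleting everything under /tmp, because running processes may still be using files there. Remove only files that have not been modified recently, for example in the last 7 days:

sudo find /tmp -xdev -type f -mtime +7 -delete

Expected Output

Regular files under /tmp that have not been modified for more than 7 days should be removed.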
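Before deleting anything from /tmp, it can help to preview which files are old enough to be safe candidates. The following is a minimal sketch of such a dry run; the helper name stale_files and the age cutoff are illustrative assumptions.

```shell
# stale_files DIR DAYS
# Lists regular files under DIR that have not been modified for more
# than DAYS days. Nothing is deleted.
stale_files() {
    find "$1" -xdev -type f -mtime +"$2" -print
}
```

For example, stale_files /tmp 7 prints the files that have gone unmodified for more than a week, which can be reviewed before any cleanup is run.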

6. Extend Disk Space

If cleanup does not free enough space, extend the disk. This can be done through your cloud provider’s management console or with command-line tools specific to your environment.
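Enlarging the underlying volume does not by itself enlarge the filesystem; the filesystem usually has to be grown afterwards (and the partition first, if one is in use). The sketch below only prints the command that would grow the filesystem, so it can be reviewed before being run. It assumes an ext2/3/4 or XFS filesystem; the helper name grow_fs_command is hypothetical, while resize2fs and xfs_growfs are the standard tools for those filesystem types.

```shell
# grow_fs_command FSTYPE DEVICE MOUNTPOINT
# Prints (does not run) the command that grows a filesystem to fill its
# already-enlarged block device. Only ext2/3/4 and xfs are covered.
grow_fs_command() {
    case "$1" in
        ext2|ext3|ext4) echo "resize2fs $2" ;;     # takes the block device
        xfs)            echo "xfs_growfs $3" ;;    # takes the mount point
        *)              echo "unsupported filesystem type: $1" >&2
                        return 1 ;;
    esac
}
```

For example, grow_fs_command ext4 /dev/sda1 / prints resize2fs /dev/sda1. The filesystem type and device of a mount point can be read from the output of df -T.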

Additional Steps

1. Monitor Disk Usage

Continuously monitor disk usage to ensure it remains within acceptable limits. Use tools like Prometheus and Grafana to set up dashboards and alerts.

2. Automate Log Rotation

Set up log rotation to prevent log files from growing uncontrollably. Edit the logrotate configuration (usually /etc/logrotate.conf, or a per-service file under /etc/logrotate.d/):

sudo nano /etc/logrotate.conf

Example Configuration

/var/log/large_log_file.log {
    daily
    rotate 7
    compress
    missingok
    notifempty
    create 0640 root utmp
    sharedscripts
    postrotate
        /usr/bin/systemctl reload rsyslog > /dev/null 2>&1 || true
    endscript
}

By following these steps, you should be able to troubleshoot and resolve the “NodeFilesystemAlmostOutOfSpace” alert. If the issue persists, further investigation into the specific filesystem and its usage may be necessary.